
@arthurschreiber

When data is encoded to BERT, each individual encoded piece of the result is stored in an `Array`-based Buffer. At the end, every piece is written sequentially to a `StringIO` object and the underlying `String` is returned. Unfortunately, these sequential writes to `StringIO` cause the internal `String` object to grow and be reallocated many times. By calling `#join` on the Buffer's `Array` instead, Ruby allocates a single `String` large enough to hold the whole result in one step.
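To make the before/after concrete, here is a minimal Ruby sketch of the two strategies. `PieceBuffer` and its method names are hypothetical stand-ins, not the gem's actual Buffer API, and the sample bytes are arbitrary. It illustrates that, in MRI, `Array#join` on an Array made entirely of Strings adds up the piece lengths first and allocates the result once, whereas the `StringIO` path grows its backing String as each write arrives.

```ruby
require "stringio"

# Hypothetical stand-in for the encoder's Array-based buffer; the real
# BERT gem's class and method names may differ.
class PieceBuffer
  def initialize
    @pieces = []
  end

  # Each encoded piece is appended to an Array, not to a String.
  def <<(piece)
    @pieces << piece
    self
  end

  # Before: copy every piece into a StringIO. Its backing String grows
  # as writes arrive, which can trigger repeated reallocations.
  def to_s_via_stringio
    io = StringIO.new(String.new(encoding: Encoding::BINARY))
    @pieces.each { |piece| io.write(piece) }
    io.string
  end

  # After: for an all-String Array, MRI's Array#join adds up the piece
  # lengths first and allocates the result String once.
  def to_s_via_join
    @pieces.join
  end
end

buf = PieceBuffer.new
buf << "\x83".b << "abc".b << "\xFF\x00".b
buf.to_s_via_join == buf.to_s_via_stringio # => true
```

The `StringIO` is seeded with a binary String here only so both paths produce byte-identical output; the thread does not say how the original encoder configured its `StringIO`.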

See benchmarks in #7 (comment).


This is an easy win for reducing memory usage, with no noticeable impact on CPU usage. 🎆

@carlosmn
Collaborator

Yeah, this looks like a gain even for smaller payloads.

@carlosmn merged commit 3edcf49 into github:master on Aug 12, 2016
@haileys

haileys commented Jan 11, 2017

It looks like this PR causes some encoding-related test failures:

https://travis-ci.org/github/bert/builds/190846984

@arthurschreiber, @carlosmn: Do you mind if I revert this PR? I'm trying to get master CI'd and in a state where it reflects what we're running in production.
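The thread does not say which tests failed or what #10 changed. As an assumption only, one way a switch to `#join` can surface encoding errors is that `Array#join` enforces encoding compatibility between pieces: a binary (ASCII-8BIT) piece containing high bytes cannot be joined with a UTF-8 piece containing non-ASCII characters. The hypothetical `join_raw` and `join_binary` helpers below sketch that failure and one possible workaround, relabeling every piece with `String#b` before joining.

```ruby
# Hypothetical encoded pieces: a binary tag byte followed by a UTF-8
# string that contains a non-ASCII character.
pieces = ["\x83".b, "caf\u00e9"]

# Joins the pieces as-is. Array#join checks encoding compatibility, so
# a binary string with high bytes plus non-ASCII UTF-8 raises
# Encoding::CompatibilityError.
def join_raw(pieces)
  pieces.join
end

# One possible workaround (an assumption, not necessarily what #10
# does): String#b returns a binary-labeled copy of each piece, so the
# join becomes a plain byte concatenation.
def join_binary(pieces)
  pieces.map(&:b).join
end

begin
  join_raw(pieces)
rescue Encoding::CompatibilityError => e
  puts "join_raw failed: #{e.class}"
end

p join_binary(pieces).bytes # => [131, 99, 97, 102, 195, 169]
```

This workaround only makes sense if every piece already holds the intended wire bytes, which the sketch assumes; `String#b` relabels bytes and does not transcode them.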

@arthurschreiber
Author

@charliesome See the fix over in #10.

@arthurschreiber deleted the arthur/reduce-mem-usage branch on January 11, 2017 at 09:19