Don't use a StringIO when encoding data. #8

arthurschreiber · 2016-08-12T09:46:54Z

When data is encoded to BERT, each encoded piece of the result is stored in an Array-backed Buffer. At the end, the pieces are written one by one to a StringIO object and the underlying String is returned. Unfortunately, this sequential writing causes the internal String object to grow and be reallocated repeatedly. Calling `#join` on the Buffer's Array instead lets Ruby allocate a single String large enough for the whole result in one step.
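A minimal sketch of the two approaches described above, for illustration only. The `PIECES` array and both method names are hypothetical stand-ins, not the library's actual Buffer code; the point is just that both paths should produce the same bytes.

```ruby
require "stringio"

# Hypothetical stand-in for the Buffer's internal Array of encoded pieces.
PIECES = [[131].pack("C"), [109, 5].pack("CN"), "hello".b].freeze

# Old approach (as described above): write every piece to a StringIO,
# whose backing String has to grow as more data is appended.
def flush_with_stringio(pieces)
  io = StringIO.new(String.new) # String.new yields an empty binary String
  pieces.each { |piece| io.write(piece) }
  io.string
end

# New approach: let Array#join build the result String from all pieces.
def flush_with_join(pieces)
  pieces.join
end
```

Comparing `flush_with_stringio(PIECES).bytes` with `flush_with_join(PIECES).bytes` is a quick way to confirm the two paths agree before looking at allocation numbers.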

See benchmarks in #7 (comment).

This is an easy win in reducing memory usage without any noticeable impact on CPU usage. 🎆

carlosmn · 2016-08-12T10:23:34Z

Yeah, this looks like a gain even for smaller payloads.

haileys · 2017-01-11T05:00:28Z

It looks like this PR causes some encoding-related test failures:

https://travis-ci.org/github/bert/builds/190846984

@arthurschreiber, @carlosmn: Do you mind if I revert this PR? I'm trying to get master CI'd and in a state where it reflects what we're running in production.
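The thread does not say what the failing tests were or what #10 changed, so the following is only an assumption about one way `Array#join` can be sensitive to encodings: it raises when pieces with incompatible encodings both contain non-ASCII bytes. The constants and the helper name are hypothetical.

```ruby
# Hypothetical pieces: raw non-ASCII bytes tagged as binary, and UTF-8 text.
BINARY_PIECE = "\xFF".b
UTF8_PIECE = "\u00e9"

# Returns true if joining the pieces raises an encoding compatibility error.
def join_raises?(pieces)
  pieces.join
  false
rescue Encoding::CompatibilityError
  true
end

# Tagging every piece as binary before joining avoids the mismatch.
def join_as_binary(pieces)
  pieces.map(&:b).join
end
```

Under this assumption, `join_raises?([BINARY_PIECE, UTF8_PIECE])` is true, while `join_as_binary` returns a binary String containing the bytes of both pieces.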

arthurschreiber · 2017-01-11T09:19:05Z

@charliesome See the fix over in #10.

carlosmn merged commit 3edcf49 into github:master Aug 12, 2016

arthurschreiber deleted the arthur/reduce-mem-usage branch January 11, 2017 09:19
