Reduce memory usage when reading response body #451

Merged
merged 1 commit into httprb:master on Jan 29, 2018

@janko-m
Member

janko-m commented Jan 29, 2018

Currently, a new string is allocated for each chunk read from the socket. This isn't necessary, because we feed that string object into the HTTP parser and never use it again.

So we initialize a buffer string and pass it in when reading chunks. Each chunk is then read into the same buffer object, and no new string is allocated in that operation.

When profiling the download of an 18MB file, 48MB of strings were allocated before this change and about 18MB after it, a reduction of more than 50% in allocated memory.

Reduce memory usage when reading response body
Currently, a new string is allocated for each chunk read from the
socket. This isn't necessary, because we feed that string object into
the HTTP parser and never use it again.

So we initialize a buffer string and pass it in when reading chunks.
Each chunk is then read into the same buffer object, and no new string
is allocated in that operation.

In my benchmark this halves the allocated memory when downloading
large response bodies.
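
For illustration, here is a minimal sketch of the pattern being described, in plain Ruby rather than http.rb's actual internals (the socket, parser, and chunk size are stand-ins). Ruby's IO#readpartial accepts an optional buffer string as its second argument, so every read overwrites that same String object instead of allocating a fresh one per chunk:

# Minimal sketch of the buffer-reuse pattern; `socket` and `parser`
# are illustrative stand-ins, not http.rb internals.
buffer = String.new

begin
  loop do
    socket.readpartial(16_384, buffer) # fills `buffer` in place, no new String
    parser << buffer                   # the parser copies out what it needs
  end
rescue EOFError
  # connection closed; every chunk has already been fed to the parser
end
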
@zanker

Contributor

zanker commented Jan 29, 2018

Seems reasonable to me. We have enough specs on request states that I'd expect them to have broken if the http_parser gem were somehow reusing the string, and it otherwise does look like @buffer isn't reused once it's passed to the C gem.
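
A hedged illustration of that point, in plain Ruby with no http.rb internals: reusing a single buffer is only safe because the consumer copies the bytes it needs during the call and never keeps a reference to the String it was handed.

# Illustrative only: why in-place reuse is safe when the consumer copies.
buffer = String.new

kept_reference = []   # an unsafe consumer that holds on to the object itself
copied_chunks  = []   # a safe consumer that copies what it needs

buffer.replace("chunk 1")
kept_reference << buffer
copied_chunks  << buffer.dup

buffer.replace("chunk 2") # the next read overwrites the same object

kept_reference.first # => "chunk 2" -- the retained chunk silently changed
copied_chunks.first  # => "chunk 1" -- the copy is unaffected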

@ixti

Member

ixti commented Jan 29, 2018

Looks good to me as well. Merging this in!
@janko-m thank you!

@ixti ixti merged commit 91722b1 into httprb:master Jan 29, 2018

1 check passed

continuous-integration/travis-ci/pr: The Travis CI build passed

@janko-m janko-m deleted the janko-m:reduce-memory-usage-when-reading-response-body branch Jan 29, 2018

@janko-m

Member

janko-m commented Apr 20, 2018

I know that master has been switched to 4.0.0.dev, but is there any chance that a new 3.0.x patch version could be released with this change? It makes a huge difference when timeouts are used:

HTTP
  .timeout(read: 2)
  .get("http://example.com/a-50MB-file")
  .body.each { |chunk| chunk.clear }

Before: 533MB
After:  15MB
@ixti

Member

ixti commented Apr 22, 2018

@janko-m I'll work on this.

@ixti

Member

ixti commented Apr 22, 2018

Released as 3.2.0 (somehow I merged every bugfix from master into 3-x-stable except this one and released that as 3.1.0, so I had to release 3.2.0 immediately afterwards). :D

@janko-m

Member

janko-m commented Apr 22, 2018

@ixti Awesome, thank you very much! 🙏

The memory difference isn't as drastic as I initially posted (I had read that figure from the MemoryProfiler output). I have now tried watching how much the actual process memory grows with GC disabled. Testing with a 50MB file, the process allocated about 150MB before this change and only about 15MB after it. So, while not as drastic as 533MB to 15MB, it's still 10x less, which is a lot 😃
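
For reference, a rough sketch of the two measurement approaches mentioned above (the URL is a placeholder, and the RSS read is Linux-only): MemoryProfiler reports total allocations, while checking RSS with GC disabled approximates how much the process actually grows.

require "http"
require "memory_profiler"

url = "http://example.com/a-50MB-file" # placeholder

# 1. Total allocated memory, as reported by MemoryProfiler.
report = MemoryProfiler.report do
  HTTP.get(url).body.each { |chunk| chunk.clear }
end
puts report.total_allocated_memsize

# 2. Actual process growth with GC disabled (reads VmRSS from /proc, Linux-only).
rss_kb = -> { File.read("/proc/self/status")[/VmRSS:\s+(\d+) kB/, 1].to_i }
GC.disable
before = rss_kb.call
HTTP.get(url).body.each { |chunk| chunk.clear }
puts "process grew by roughly #{(rss_kb.call - before) / 1024} MB"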

I also compared this with open-uri (which uses Net::HTTP), and the memory usage was also about 150MB. So it's nice that for the Down gem I can now say that one more advantage of the Down::Http backend over Down::NetHttp is that it consumes 10x less memory 👌
