Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory usage when reading response body #451

Merged

Conversation

@janko
Copy link
Member

@janko janko commented Jan 29, 2018

Currently for each chunk read from the socket a new string is being allocated. This is not necessary, because we're feeding that string object into the HTTP parser and never using it again.

So we initialize a buffer string and pass it in when reading chunks. This will make each chunk be read into the same buffer object, no new string will be allocated in that operation.

When profiling download of a 18MB file, before this change there was 48MB of strings being allocated, and after this change it's about 18MB. This is more than 50% reduction in memory allocation.

Currently for each chunk read from the socket a new string is being
allocated. This is not necessary, because we're feeding that string
object into the HTTP parser and never using it again.

So we initialize a buffer string and pass it in when reading chunks.
This will make each chunk be read into the same buffer object, no new
string will be allocated in that operation.

In my benchmark this halves the allocated memory when downloading
large response bodies.
@janko janko force-pushed the janko:reduce-memory-usage-when-reading-response-body branch to ddaa8a4 Jan 29, 2018
@zanker
Copy link
Contributor

@zanker zanker commented Jan 29, 2018

Seems reasonable to me. We have enough specs on request states that I'd expect it to have broken if the http_parser gem was somehow reusing the string, and otherwise it does look like @buffer isn't reused once it's passed to the C gem.

@ixti
Copy link
Member

@ixti ixti commented Jan 29, 2018

Looks good to me as well. Merging this in!
@janko-m thank you!

@ixti ixti merged commit 91722b1 into httprb:master Jan 29, 2018
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@janko janko deleted the janko:reduce-memory-usage-when-reading-response-body branch Jan 29, 2018
@janko
Copy link
Member Author

@janko janko commented Apr 20, 2018

I know that the master has been switched to 4.0.0.dev, but any chance that a new 3.0.x patch version can be released with this change? It makes a huge difference when timeouts are used:

HTTP
  .timeout(read: 2)
  .get("http://example.com/a-50MB-file")
  .body.each { |chunk| chunk.clear }
Before: 533MB
After:  15MB
@ixti
Copy link
Member

@ixti ixti commented Apr 22, 2018

@janko-m I'll work on this.

@ixti
Copy link
Member

@ixti ixti commented Apr 22, 2018

Released as 3.2.0 (somehow I merged EVERY bugfix from master to 3-x-stable but this and released as 3.1.0, so had to release 3.2.0 immediately after that one). :D

@janko
Copy link
Member Author

@janko janko commented Apr 22, 2018

@ixti Awesome, thank you very much! 🙏

The memory difference isn't as drastic as I initially posted (I read that from the MemoryProfiler output). I now tried watching how much the actual process memory grew when GC was disabled. So, testing with a 50MB file, before this change the amount allocated memory was about 150MB, while after this change it's only about 15MB. So, while not as drastic as 500MB to 15MB, it's still 10x less which is a lot 😃

I also compared this with open-uri (which uses Net::HTTP), and the memory usage was also about 150MB. So it's nice that for the Down gem I can now say that one more advantage of the Down::Http backend over Down::NetHttp is that it consumes 10x less memory 👌

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jun 7, 2020
Update ruby-http to 4.4.1.


## 4.4.1 (2020-03-29)

* Backport [#590](httprb/http#590)
  Fix parser failing on some edge cases.
  ([@ixti])

## 4.4.0 (2020-03-25)

* Backport [#587](httprb/http#587)
  Fix redirections when server responds with multiple Location headers.
  ([@ixti])

* Backport [#599](httprb/http#599)
  Allow passing HTTP::FormData::{Multipart,UrlEncoded} object directly.
  ([@ixti])

## 4.3.0 (2020-01-09)

* Backport [#581](httprb/http#581)
  Add Ruby-2.7 compatibility.
  ([@ixti], [@janko])


## 4.2.0 (2019-10-22)

* Backport [#489](httprb/http#489)
  Fix HTTP parser.
  ([@ixti], [@fxposter])


## 4.1.1 (2019-03-12)

* Add `HTTP::Headers::ACCEPT_ENCODING` constant.
  ([@ixti])


## 4.1.0 (2019-03-11)

* [#533](httprb/http#533)
  Add URI normalizer feature that allows to swap default URI normalizer.
  ([@mamoonraja])


## 4.0.5 (2019-02-15)

* Backport [#532](httprb/http#532) from master.
  Fix pipes support in request bodies.
  ([@ixti])


## 4.0.4 (2019-02-12)

* Backport [#506](httprb/http#506) from master.
  Skip auto-deflate when there is no body.
  ([@Bonias])


## 4.0.3 (2019-01-18)

* Fix missing URL in response wrapped by auto inflate.
  ([@ixti])

* Provide `HTTP::Request#inspect` method for debugging purposes.
  ([@ixti])


## 4.0.2 (2019-01-15)

* [#506](httprb/http#506)
  Fix instrumentation feature.
  ([@paul])


## 4.0.1 (2019-01-14)

* [#515](httprb/http#515)
  Fix `#build_request` and `#request` to respect default options.
  ([@RickCSong])


## 4.0.0 (2018-10-15)

* [#482](httprb/http#482)
  [#499](httprb/http#499)
  Introduce new features injection API with 2 new feaures: instrumentation
  (compatible with ActiveSupport::Notification) and logging.
  ([@paul])

* [#473](httprb/http#473)
  Handle early responses.
  ([@janko-m])

* [#468](httprb/http#468)
  Rewind `HTTP::Request::Body#source` once `#each` is complete.
  ([@ixti])

* [#467](httprb/http#467)
  Drop Ruby 2.2 support.
  ([@ixti])

* [#436](httprb/http#436)
  Raise ConnectionError when writing to socket fails.
  ([@janko-m])

* [#438](httprb/http#438)
  Expose `HTTP::Request::Body#source`.
  ([@janko-m])

* [#446](httprb/http#446)
  Simplify setting a timeout.
  ([@mikegee])

* [#451](httprb/http#451)
  Reduce memory usage when reading response body.
  ([@janko-m])

* [#458](httprb/http#458)
  Extract HTTP::Client#build_request method.
  ([@tycoon])

* [#462](httprb/http#462)
  Fix HTTP::Request#headline to allow two leading slashes in path.
  ([@scarfacedeb])

* [#454](httprb/http#454)
  [#464](httprb/http#464)
  [#384](httprb/http#384)
  Fix #readpartial not respecting max length argument.
  ([@janko-m], [@marshall-lee])
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants