
Huge GC pressure when downloading a file over http (but not https) #345

Closed
eras opened this issue May 11, 2015 · 7 comments

Comments

@eras

eras commented May 11, 2015

I considered piggy-backing on issue #207, but as that one was specifically about the server side, I decided to open a new one.

Downloading a file over HTTP becomes a CPU-bound operation on a three-year-old PC, limited to 800 kilobytes/second even after some GC tuning (the same internet resource can be loaded at 1.4M/s with curl). I have source code reproducing the issue here: https://www.modeemi.fi/~flux/software/ocaml/downloader/ (this is perhaps easier to test than the server-side operation).

The application is able to download a separate HTTPS internet resource at 4.3M/s while consuming 17% CPU. Before GC tuning, HTTPS had a similar issue as well. If I increase Gc.minor_heap_size to (512 * (1 lsl 20)), the download does get faster, as does the CPU%, but after a few seconds the CPU usage climbs back to 100%, the virtual size of the process reaches 12 gigabytes, and the resident size creeps up towards at least 1 gigabyte during the transfer. (I tried a few Gc values in between without better results; in particular, the downloader uses the value 32 lsl 20.)
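For reference, the kind of GC tuning described above can be done programmatically with the standard library's `Gc` module. This is only a sketch of the workaround being attempted, using the 32 lsl 20 figure mentioned in the text; the value is illustrative, not a recommendation:

```ocaml
(* Sketch: enlarge the minor heap so short-lived download buffers are
   collected less often. 32 lsl 20 words matches the value the
   downloader in this report uses. *)
let () =
  let ctrl = Gc.get () in
  Gc.set { ctrl with Gc.minor_heap_size = 32 lsl 20 };
  (* Confirm the setting took effect. *)
  assert ((Gc.get ()).Gc.minor_heap_size = 32 lsl 20)
```

As the report shows, this only masks the symptom: the per-chunk allocations still happen, so memory use grows and the major GC eventually dominates anyway.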

Here's what perf top says:

  94.62%  libc-2.19.so         [.] __memmove_ssse3
   1.17%  downloader.native    [.] mark_slice
   1.01%  downloader.native    [.] caml_page_table_lookup
   0.44%  downloader.native    [.] do_compaction
   0.42%  downloader.native    [.] sweep_slice
   0.35%  downloader.native    [.] invert_pointer_at
   0.14%  downloader.native    [.] caml_darken
   0.11%  downloader.native    [.] caml_do_roots
   0.10%  downloader.native    [.] compact_allocate

In this case I would eventually want to save downloaded content to a file.

@l1x

l1x commented May 21, 2015

Can you control the buffer size? It seems this is just spinning the CPU. I would guess that a small buffer size would produce exactly this symptom.

@rgrinberg
Member

You cannot control the buffer size in cohttp. I'm in the process of fixing this problem, but only for the server side; the client side will follow afterwards.

I'll look into whether a quick fix is possible until then, but I wouldn't count on it.

@l1x

l1x commented May 21, 2015

Thanks. Out of curiosity, what is/was the root cause?

@rgrinberg
Member

Sure. Your comment made me take another look just now, and I think this is fixable relatively easily!

If we follow along the code path for fetching a response we get something like:

  1. https://github.com/mirage/ocaml-cohttp/blob/master/lwt/cohttp_lwt.ml#L147

  2. https://github.com/mirage/ocaml-cohttp/blob/master/lib/transfer_io.ml#L70

  3. https://github.com/mirage/ocaml-cohttp/blob/master/lwt/cohttp_lwt_unix_io.ml#L45

  4. https://github.com/mirage/ocaml-cohttp/blob/master/lwt/cohttp_lwt_unix_io.ml#L41

Connecting steps 2, 3, and 4, we realize that we are trying to allocate a rather hefty string on every read! (That string is also immediately thrown away after being read instead of being reused, but that's a separate problem.)
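To make the allocation pattern concrete, here is a hypothetical illustration (not cohttp's actual code, and the function names are made up): a reader that allocates a fresh buffer per chunk versus one that reuses a single preallocated buffer. The per-chunk allocations are what generate the garbage driving the GC pressure reported above.

```ocaml
(* Naive variant: allocates a new buffer every iteration, so every
   chunk read produces garbage for the GC to collect. *)
let read_naive ic chunk_size consume =
  let rec loop () =
    let buf = Bytes.create chunk_size in
    let n = input ic buf 0 chunk_size in
    if n > 0 then begin
      consume (Bytes.sub_string buf 0 n);
      loop ()
    end
  in
  loop ()

(* Reusing variant: one buffer for the whole transfer; only the
   consumed sub-strings are allocated per chunk. *)
let read_reuse ic chunk_size consume =
  let buf = Bytes.create chunk_size in
  let rec loop () =
    let n = input ic buf 0 chunk_size in
    if n > 0 then begin
      consume (Bytes.sub_string buf 0 n);
      loop ()
    end
  in
  loop ()
```

This uses plain `in_channel` I/O rather than Lwt to keep the sketch self-contained; the same reuse idea applies to the Lwt read path linked above.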

If you're interested, a fix for this would be appreciated. If not, then I'll try and get it done next week.

@artemkin already fixed a similar problem with async on issue #330 so you can have a look there.

@rgrinberg
Member

@marklrh if you want, you can try tackling this as well

@eras
Author

eras commented May 24, 2015

@rgrinberg, it appears the fix resolves my issue completely.

Testing from localhost, I was able to get 400+ megabytes per second even with standard GC settings, which I think is completely acceptable :), even if at that point it was CPU-bound. (But netcat wasn't far off at 65%.)

Thanks!

@rgrinberg
Member

@eras NP. This is a pretty significant issue, so I will make a release for it as well (0.17.2).
