Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak #169

Closed
AlexDaniel opened this issue Apr 1, 2017 · 23 comments
Closed

Memory leak #169

AlexDaniel opened this issue Apr 1, 2017 · 23 comments

Comments

@AlexDaniel
Copy link
Collaborator

Sometimes around ≈20MB per fetched page.

Feel free to try it yourself (I was using blead rakudo):

use HTTP::UserAgent;
my $ua = HTTP::UserAgent.new;
loop {
    my $response = $ua.get(https://perl6.org);
    say $response.content.chars
}
@MasterDuke17
Copy link
Contributor

With sergot/openssl#37 applied the leak is significantly reduced. I changed the code to loop just ten times and profiled with heaptrack. It showed 10Gb allocated before and only 268Mb allocated after.

@AlexDaniel
Copy link
Collaborator Author

@MasterDuke17 sounds awesome!

That said, I don't think it is good enough.

@MasterDuke17
Copy link
Contributor

sergot/openssl#38 reduces the memory used even further, down to 187Mb.

@jonathanstowe
Copy link
Collaborator

Leaving aside the SSL thing, there is a built-in growth in memory consumption that is by design (though arguably not entirely sensible design,) as it saves every response in order to attempt to detect redirect loops, This is appended to the attribute history on the UserAgent object, so you can reset that whenever you see fit.

Testing against a non-SSL URI with an additional $ua.history = [] in the loop, the memory usage stabilises and stays stable after the first couple of requests.

I am pretty certain that this isn't documented (I had actually forgotten that it did this until you raised it,) and I will fix that this afternoon. I also think that there probably should be some switch on the UserAgent to control the behaviour as it would all become a bit of a faff in long runing software that uses the same UA for a large number of requests.

@AlexDaniel
Copy link
Collaborator Author

@jonathanstowe ok, that makes sense!

However, I'm not sure how redirect loops are related to this “feature”? If I'm doing a new request, why would it need something from the history?

@jonathanstowe
Copy link
Collaborator

I'm not entirely convinced either but the code does :

when $.max-redirects < +@.history && all(@.history.reverse[0..$.max-redirects]>>.code)

Which basically checks whether the last max-redirects responses were re-directs. It's actually a bit tricky. I guess the best that you could do (and preserve this check,) is keep a count of the redirects and reset it when you get a non-redirect. Of course better suggestions would be appreciated (and being Saturday my brane is somewhat entertainment impaired :)

@moritz
Copy link
Collaborator

moritz commented Apr 1, 2017

Or for a start, only store the headers, not the bodies...

@jonathanstowe
Copy link
Collaborator

there's a long standing comment in there that say's just that :)

@AlexDaniel
Copy link
Collaborator Author

I'm still not sure why it has to store anything after .get(…) is finished.

@moritz
Copy link
Collaborator

moritz commented Apr 1, 2017

pull request forthcoming, as a basis for discussion.

moritz added a commit that referenced this issue Apr 1, 2017
See #169.
We used to store the full history of the response, just to count redirects.
This used up lots of RAM for very little gain.

This commit replaces it by a simple integer that stores the count of
redirects in a row.

Since @.history was a public attribute, this changes the public API,
and as such is subject to discussion.
@jonathanstowe
Copy link
Collaborator

It seems to pass the tests and I quite like it, but I was going to go for something slightly more conservative and use an :save-history on the request, which is only set by default on on the redirect (so that part can remain unchanged) and if it isn't set rest history at the end of request. That way the same strange behaviour can be retained but not by default.

@moritz
Copy link
Collaborator

moritz commented Apr 1, 2017

@jonathanstowe there are two things here I don't understand. First, who benefits from retaining the strange behavior, particularly if it's not always enabled? At least inside the public ecosystem, there's no usage:

moritz@hack:~/p6/perl6-all-modules$ git ls-files -z|xargs -0  grep -lF UserAgent|xargs grep -l --word history 2>/dev/null
grep: Tux/CSV/csv-rust-csvrdr: No such file or directory
grep: Tux/CSV/csv-rust-libcsv: No such file or directory
grep: Tux/CSV/csv-rust-qckrdr: No such file or directory
avuserow/perl6-webservice-lastfm/README.md
jonathanstowe/META6/t/data/projects.json
sergot/http-useragent/lib/HTTP/UserAgent.pm6
zoffixznet/perl6-RT-REST-Client/lib/RT/REST/Client.pm6

(The mention of history in zoffixznet/perl6-RT-REST-Client/lib/RT/REST/Client.pm6 is part of an URL, and unrelated to the .history method).

And second, how does that work? You don't know whether to save history until after the request, and IMHO that's too late to lead to predictable behavior.

@ugexe
Copy link
Collaborator

ugexe commented Apr 1, 2017

I think the original intention was eventually keep just the headers and allow some way for the Request to transparently store the content history to and from disk. But there have been lots of improvements to the architecture of HTTP::UserAgent since the redirect code was written, so module space could probably provide the content storage portion easily now (although I'm all for it being implemented directly if someone else is doing it).

As for why to keep the headers instead of just a redirect counter - more enhanced redirect logic (especially involving caching) requires more information than just "how many times have I been redirected?".

@jonathanstowe
Copy link
Collaborator

As I noted on the #170 I'd be inclined to move the redirect chain into the response object for those cases where something more sophisticated is required ( this could be as simple as adding a previous-response attribute to request and populating this in the next-request, however right now as we are simply doing a count on the previous redirects anyway @moritz PR seems to be a sound solution. the @.history is neither explicitly tested nor documented, so quite relaxed about dropping it myself.

@jonathanstowe
Copy link
Collaborator

I've merged the #170 I'm going to look at ssupplying the redirect history on a per-response basis under separate cover.

@MasterDuke17
Copy link
Contributor

sergot/openssl#38 was just merged.

@jonathanstowe
Copy link
Collaborator

Are we happy with this now so it can be closed?

@AlexDaniel
Copy link
Collaborator Author

No, I am not. It is still leaking memory. Just run the example and see yourself.
It may be due to RT #131879, I don't know.

@MasterDuke17
Copy link
Contributor

i'm only seeing a tiny leak. Memory usage goes up for a while, but then pretty much stabilizes.

@jonathanstowe
Copy link
Collaborator

Yeah I'm not seeing the memory use monotonously increasing, it goes up for a bit then stays flat as, I guess, the garbage collection starts to take back the unused objects.

@AlexDaniel
Copy link
Collaborator Author

Actually, yes! It turns out that I was using an old version. D'oh! Sorry!

IMO it's closable. I don't know if you want to have tests for this, so feel free to reopen.

@jonathanstowe
Copy link
Collaborator

Yeah I'd like to have a test but I can't see how to do it without shelling out to some highly platform dependent tool.

@AlexDaniel
Copy link
Collaborator Author

AlexDaniel commented Aug 12, 2017

Check the OS and maybe use Linux::Proc::Statm on linux. It's much better to have a test that only runs on Linux than to have no test at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants