Memory leaks from larger NZBs #1736
Comments
Additionally, if you keep adding this NZB, the memory leaks keep adding up.
The leaking seems to be a lot less when I don't have the interface open: only about 3-5 MB each time I try the NZB.
A few remarks
Oh ... my SAB is downloading nothing, but on each F5/refresh, RAM usage goes up. Not sure if the refresh causes that, or it happens in the background already and the refresh just reveals it. In 2 minutes and 20 refreshes, memory went from 52 MB to "V=1943M R=110M", so 110 MB RAM ...
I stripped out the starting of CherryPy and the leaks are the same, so it doesn't seem to be an interface thing.
On Windows, using the nzb above, I notice that the memory is not freed 30 seconds after the nzb is finished. When calling gc.collect() it instantly falls to a few hundred KB more than it was before importing it. I used this for measuring:
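A minimal sketch of that kind of measurement, assuming psutil is available (the helper name is just illustrative):

```python
import gc
import os

import psutil  # assumed to be available; only used for the measurement


def log_rss(label):
    """Print the resident set size of the current process in MB."""
    rss = psutil.Process(os.getpid()).memory_info().rss
    print("%s: %.1f MB" % (label, rss / 1024 / 1024))


log_rss("before gc.collect()")
gc.collect()
log_rss("after gc.collect()")
```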
Aah, I just used Task Manager. What do you see there?
I think the same as you. About 56 MB before importing, then it increases to about 90 MB, falls to 70-something when it finishes, and a bit more than 56 MB after I remove it from history and ...
Strange, I see this with ... Each time I let it download, it fails to download due to missing articles and then I remove it from history manually.
I use 6 servers, all with different priorities. Did you try importing gc and calling gc.collect()?
I use a separate thread doing this:
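Roughly along these lines, as a sketch (the interval and logging are illustrative, not the exact code):

```python
import gc
import threading
import time


def gc_worker(interval=300):
    """Force a full collection every few minutes and report what it found."""
    while True:
        time.sleep(interval)
        unreachable = gc.collect()
        print("gc.collect() found %d unreachable objects" % unreachable)


threading.Thread(target=gc_worker, daemon=True).start()
```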
I see similar behavior in 2.3.9, so I will have to investigate this more. Will continue making the 3.2.x release.
@puzzledsab, what if the 'Only Get Articles for Top of Queue' switch is enabled? (as this is what it was designed to prevent)
@thezoggy: it makes no difference. They are broken on all servers so the first server probably doesn't get far ahead and SAB quickly decides to give up.
I'm at a loss. During testing it sometimes just randomly keeps sitting at 250 MB with an empty queue, while now that I added a ...
Yes, worth a try. And logging that tells the memory usage before and after the gc.collect().
FWIW / without further inspection: currently the hottest thing in the Linux world is eBPF, and eBPF has a tool "memleak-bpfcc". So I ran that on the PID of SABnzbd and ran the 1 GB test download ... with the results below. No idea if it's useful.
After the download:
Maybe it's because gc.collect() forces it to garbage collect everything, including lost circular references, immediately, which is not the default: https://docs.python.org/3/library/gc.html Are all the references to articles, files and nzos cleared in the right order, so that the references from articles to files or from files to nzos don't cause problems?
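For illustration: reference counting frees acyclic objects as soon as the last reference goes away, but a cycle like the one below is only reclaimed when the collector runs (a toy example, not SABnzbd's actual classes):

```python
import gc


class Article:
    pass


class NzbFile:
    pass


# Build a reference cycle, similar in shape to article <-> file back-references.
article, nzf = Article(), NzbFile()
article.nzf = nzf
nzf.articles = [article]

del article, nzf     # refcounts never reach zero because of the cycle
print(gc.collect())  # nonzero: the cyclic garbage is only reclaimed here
```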
Maybe it's the connection handle to the usenet server that's not getting closed until the job is over.
I added the 5 minute ...
https://stackify.com/python-garbage-collection/ Have we checked what the reference count is on the various nzo vars, to see if one keeps getting reused, causing the gc not to run?
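A quick way to poke at that, assuming you can get hold of an nzo object somewhere (the helper is hypothetical, not an existing SABnzbd API):

```python
import gc
import sys


def inspect_refs(obj):
    """Print how many references point at obj and what is holding them."""
    # sys.getrefcount() reports one extra reference for its own argument.
    print("refcount:", sys.getrefcount(obj) - 1)
    for referrer in gc.get_referrers(obj):
        print("held by:", type(referrer), repr(referrer)[:80])
```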
With the ...
From the notes here: if you add the nzb and memory goes up, and doesn't go down when the job completes ... what if the user deletes the job, does memory go down? (And while I know the gc isn't aggressive, it should still run after some time rather than us having to force it, unless something is triggering it not to be freed.)
I read the ...
I loaded up a bunch of nzbs, dumped the data of the process from RAM to disk, ran it through strings and scrolled through the resulting file with less. I saw a whole bunch of data from unparsed nzb files, so I suspected the XML parser. To test it, I replaced the XML parser with regex parsing. The code is here: https://github.com/puzzledsab/sabnzbd/tree/feature/regex_parser

The current develop branch starts at 52.6 MB, increases to 152 MB when the NZBs are loaded, and after deleting them and waiting for GC to run, it falls back to 73.5 MB. The regex version starts at 53.1 MB, increases to just 70.6 MB when the nzbs are loaded, and falls to 66.6 MB when they are deleted and GC has run. Both were loaded in paused mode so there is no caching. It seems too good to be true, so it is not impossible that I have done something wrong. The nzbs I tried to download had no problems, though. I think it seems very likely that the XML parser is the biggest problem.
Hmmm, let's see if there's a way we can maybe get it to clear the data... The big benefit of this nzbparser is that it is more readable code. We used to have a regex-based version in the old SABnzbd and those just get ugly.
I think it's particularly strange that almost all of the memory is freed when the nzos are deleted. It seems like it is not actually leaked, but connected to the imported data somehow. Maybe it's an indication that there is a mistake in the regex parser.
I think you are on to something! 🚀

    # nzb_tree = xml.etree.ElementTree.fromstring(raw_data)
    nzb_tree = lxml.etree.fromstring(utob(raw_data))

With ...
I'd definitely think we should go with lxml. Back in SickBeard dev, ElementTree, while stable, was slow compared to just about anything else. lxml is significantly faster (partially written in C) and has a few more features (full XPath). Also, it's what BeautifulSoup uses, if we ever wanted to go that route for other parsing.
Nice. I saw it jump up to 177 MB in short bursts on the large files I had used for testing, but it does fall back to about 61 MB, so that's great. There are some huge nzbs that really push the RAM usage, though. It would probably help a lot if the parser could read the file from disk itself instead of as a string. I tried a 600 GB nzb, and it uses more than 1.2 GB of RAM while parsing. When it's done parsing it falls back to 330 MB.

I changed the regex parser to use StringIO instead of splitlines. Now it uses 58.5 MB after loading my set of test NZBs (1.1 TB). The 600 GB nzb jumps straight to 500 MB when it starts to parse and increases to 576 MB before falling back to 138 MB, then later to 122 MB. I agree that the XML parser is usually preferable, but SAB would be able to load just about anything on any setup if it's there as an option and it reads from a stream instead of loading it into RAM first.
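The idea, roughly (a sketch, not the actual branch code; the pattern is illustrative): splitlines() materialises every line of the NZB as a separate string in a list up front, while iterating an io.StringIO yields one line at a time:

```python
import io
import re

# Illustrative pattern; the real parser extracts much more than the message-id.
SEGMENT_RE = re.compile(r"<segment[^>]*>([^<]+)</segment>")


def iter_segments(raw_data):
    """Yield message-ids line by line instead of building a full list of lines."""
    for line in io.StringIO(raw_data):
        match = SEGMENT_RE.search(line)
        if match:
            yield match.group(1)
```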
Setting ... I retested it without ...
The reason it uses the string is due to the way file uploads work. I can always change that of course. |
How about if I try to write a very conservative regex parser that is optional and falls back to the XML parser if anything at all about the NZB is off? Almost all NZBs seem to follow a very similar pattern, so it shouldn't be impossible to make something that works in most cases and doesn't harm the rest. The resource usage is one of the issues that is most frequently mentioned in discussions, so I think it would make SAB a lot more attractive.
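As a rough sketch of that idea (names, patterns and the error handling are all illustrative, not the actual branch code): parse with simple regexes, and the moment anything looks off, raise and let the caller use the XML parser instead:

```python
import re
import xml.etree.ElementTree as ET

# Illustrative patterns; real NZBs also carry groups, poster, date, bytes, etc.
FILE_RE = re.compile(r'<file\b[^>]*\bsubject="([^"]*)"[^>]*>(.*?)</file>', re.S)
SEGMENT_RE = re.compile(r"<segment\b[^>]*>([^<]+)</segment>")
NZB_NS = "{http://www.newzbin.com/DTD/2003/nzb}"  # the usual NZB namespace


class NzbRegexError(Exception):
    """Raised whenever the NZB deviates from the expected pattern."""


def parse_nzb_regex(raw_data):
    """Return (subject, message-ids) per file, or bail out at the first oddity."""
    if "<nzb" not in raw_data:
        raise NzbRegexError("not an NZB document")
    files = []
    for subject, body in FILE_RE.findall(raw_data):
        msgids = SEGMENT_RE.findall(body)
        if not msgids:
            raise NzbRegexError("file element without segments")
        files.append((subject, msgids))
    if not files:
        raise NzbRegexError("no file elements matched")
    return files


def parse_nzb_xml(raw_data):
    """Stand-in for the existing ElementTree-based parser."""
    tree = ET.fromstring(raw_data)
    return [
        (f.get("subject", ""), [seg.text for seg in f.iter(NZB_NS + "segment")])
        for f in tree.iter(NZB_NS + "file")
    ]


def parse_nzb(raw_data):
    """Cheap regex path first; any doubt falls back to the XML parser."""
    try:
        return parse_nzb_regex(raw_data)
    except NzbRegexError:
        return parse_nzb_xml(raw_data)
```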
We could do that in the end indeed. Still investigating what causes the memory spike. |
Is it as easy as that? If so, would it be useful if I try that on my Linux, to see if it lowers the memory from R=400M to something more reasonable?
Yes please @sanderjo, but make sure you run ...
It was already there:
TBC |
Have you found anything regarding the XML parser? Those optimising tasks are like an itch; it's impossible not to scratch. I changed the regex parser a bit and made it fall back to XML if there is a problem: https://github.com/puzzledsab/sabnzbd/tree/feature/regex_parser I then filled the queue with more than 1300 NZBs for a total of 6.2 TB in paused mode, which pushed the memory usage to about 230 MB. One of the files could not be parsed by the regex version.
On Ubuntu Linux: I did the lxml change (see below). At start, SAB uses 73 MB of real memory. During a download it goes up to 350 MB. After the download, it's at 230 MB. So not back to 73 MB. I would like to know what @jcfp experiences as memory usage on his Linux.
More results with lxml:
Fresh start of SAB: 83 MB memory
...
So ... low memory usage. No full release.
Second test with lxml:
SABnzbd paused: ...
Unpausing SAB ... memory goes up to 374 MB and hovers between 292 MB and 414 MB, mostly around 380 MB. I'll report back after completing the download.
Oh ... memory usage is now 537 MB.
After some time: ...
When I pause, memory stays at 679 MB.
After you unpause it's probably mostly the article cache that increases the memory usage. I don't think it's flushed by pausing again.
With lxml: after downloading the total of 100 GB, and everything ready, memory usage is at 699 MB. I repeat what I said earlier: maybe on Linux it's just usage, and not a leak; the Linux OS itself uses memory if it's there. So I would therefore like to hear the experience and opinion of @jcfp.
Apparently you have to tell Python to release memory in a very stern tone on Linux. |
I will put this into SABnzbd.py right after the gc.collect()
EDIT: I did this:
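Presumably the usual ctypes call into glibc; a sketch of what that looks like (only meaningful on Linux with glibc):

```python
import ctypes
import gc

gc.collect()
try:
    # Ask glibc to hand freed heap pages back to the OS.
    ctypes.CDLL("libc.so.6").malloc_trim(0)
except (OSError, AttributeError):
    # Not on glibc (e.g. Windows, musl, macOS): just skip it.
    pass
```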
with nice results:
So memory is released with the malloc_trim(), and stays lower. As soon as the queue is empty, I expect the memory to go down to a nice low 100 MB. BRB. EDIT 2: Pity ... it stays at 290 MB ...
@Safihre Have you given any more thought to this? I think the alternatives are lxml, or regex with fallback to today's XML parser.
Closed by #1992. |
I am not sure what is going on, but somewhere memory is leaking.
Take for example this NZB Test123.nzb.gz (it is incomplete, so safe to share).
When I start SABnzbd it uses 55MB of memory, after this download fails it is stuck at 99MB of memory. This memory is never released.
With the gc_stats API call I am pretty sure I can verify that no TryList-related objects are left in memory. So what is left in memory? 😵