
Memory leaks from larger NZB's #1736

Closed
Safihre opened this issue Jan 3, 2021 · 47 comments

@Safihre
Member

Safihre commented Jan 3, 2021

I am not sure what is going on, but somewhere memory is leaking.

Take for example this NZB Test123.nzb.gz (it is incomplete, so safe to share).
When I start SABnzbd it uses 55 MB of memory; after this download fails it is stuck at 99 MB. This memory is never released.
With the gc_stats API call I am pretty sure I can verify that no TryList-related objects are left in memory.
So what is left in memory? 😵
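
For anyone who wants to reproduce that check without the API call, here is a minimal sketch (not the actual gc_stats handler) that counts live objects per class with the gc module, so you can see whether any TryList instances survive a download:

    # Hypothetical helper, not SABnzbd's actual gc_stats implementation.
    import gc
    from collections import Counter

    def count_live_objects(limit=20):
        gc.collect()  # clear unreachable cycles first, so only real survivors are counted
        counts = Counter(type(obj).__name__ for obj in gc.get_objects())
        return counts.most_common(limit)

    for name, count in count_live_objects():
        print(name, count)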

@Safihre
Member Author

Safihre commented Jan 3, 2021

Additionally, if you keep adding this NZB, the memory leaks keep adding up.

@Safihre Safihre added the Bug label Jan 3, 2021
@Safihre
Member Author

Safihre commented Jan 3, 2021

The leaking seems a lot less when I don't have the interface open. Only about 3-5MB for each time I try the NZB.
Could it be a Cheetah problem?
Plus calling gc_stats also lowers the memory usage because it triggers a garbage collection.

@sanderjo
Contributor

sanderjo commented Jan 3, 2021

A few remarks

  • When is something a memory leak? When it does not release memory spontaneously, or when it does not release memory when other apps ask the OS for memory?
  • My SABnzbd, having downloaded a few posts and now downloading nothing, says "V=2407M R=393M", so it's using 393 MB real memory. After a SAB restart: "V=1939M R=57M", so only 57 MB.

@sanderjo
Contributor

sanderjo commented Jan 3, 2021

The leaking seems a lot less when I don't have the interface open. Only about 3-5MB for each time I try the NZB.
Could it be a Cheetah problem?

Oh ... my SAB is downloading nothing, but on each F5/refresh, RAM usage goes up. Not sure if the refresh causes that, or if it happens in the background already and the refresh just reveals it.

In 2 minutes and 20 refreshes, memory went from 52 MB to "V=1943M R=110M", so 110 MB RAM ...

@Safihre
Member Author

Safihre commented Jan 3, 2021

I stripped out the CherryPy startup and the leaks are the same, so it doesn't seem to be an interface thing.

@puzzledsab
Contributor

puzzledsab commented Jan 3, 2021

On Windows, using the nzb above, I notice that the memory is not freed 30 seconds after the nzb is finished. When calling gc.collect() it instantly falls to a few hundred KB more than it was before importing it.

I used this for measuring:

import os, psutil

proc = psutil.Process(os.getpid())
print(proc.memory_info().rss)  # resident memory in bytes

@Safihre
Member Author

Safihre commented Jan 3, 2021

Aah, I just used Task Manager... What do you see there?

@puzzledsab
Contributor

I think the same as you. About 56 MB before importing, then it increases to about 90 MB, falls to 70-something when it finishes, and a bit more than 56 MB after I remove it from history and gc.collect() is called. The increase after every import and remove is maybe somewhere between 200 KB and 800 KB. It varies.

@Safihre
Member Author

Safihre commented Jan 3, 2021

Strange, I see this with psutil:
Startup: 75030528
After first time: 92147712
After second time: 98308096
After third time: 100237312

Each time I let it download, it fails to download due to missing articles and then I remove it from history manually.
I use 2 servers with same priority, you?

@puzzledsab
Contributor

I use 6 servers, all with different priorities. Did you try importing gc and calling gc.collect()?

@puzzledsab
Contributor

I use a separate thread doing this:

    # Requires: import gc, logging, os, time and psutil; this is the run() of a threading.Thread subclass.
    def run(self):
        proc = psutil.Process(os.getpid())
        before = 0
        after = 0
        while 1:
            before = proc.memory_info().rss
            logging.debug("Mem usage before gc: %d (%d)", before / 1024, (before - after) / 1024)
            gc.collect()
            after = proc.memory_info().rss
            logging.debug("Mem usage after gc: %d (%d)", after / 1024, (after - before) / 1024)
            time.sleep(30)

@Safihre
Member Author

Safihre commented Jan 3, 2021

I see similar behavior in 2.3.9, so I will have to investigate this more. I will continue making the 3.2.x release.

@thezoggy
Contributor

thezoggy commented Jan 3, 2021

@puzzledsab, what if the 'Only Get Articles for Top of Queue' switch is enabled? (as this is what it was designed to prevent)

Safihre added a commit that referenced this issue Jan 4, 2021
@puzzledsab
Contributor

@thezoggy: it makes no difference. The articles are broken on all servers, so the first server probably doesn't get far ahead and SAB quickly decides to give up.

@Safihre
Member Author

Safihre commented Jan 4, 2021

I'm at a loss. During testing it sometimes just randomly keeps sitting at 250 MB with an empty queue, while now that I added a gc.collect() to the 30-second polling loop it reduced the memory usage to 24 MB (while after startup it's 55 MB).
Strange... No clue what to make of it.
Should we add a regular gc.collect() every 5 (?) minutes?

@sanderjo
Contributor

sanderjo commented Jan 4, 2021

Should we add a regular gc.collect() every 5 (?) minutes?

Yes, worth a try. And add logging that reports the memory usage before and after the gc.collect().

@sanderjo
Contributor

sanderjo commented Jan 4, 2021

FWIW / without further inspection:

Currently the hottest thing in the Linux world is 'ebpf', and ebpf has a tool called "memleak-bpfcc". So I ran that on the PID of SABnzbd and ran the 1 GB test download ... with the results below.

No idea if it's useful.

sudo apt-get install bpfcc-tools
sudo memleak-bpfcc -p 194556
[23:09:50] Top 10 stacks with outstanding allocations:
	731300 bytes in 43220 allocations from stack
		ASN1_STRING_set+0x7a [libcrypto.so.1.1]
	832720 bytes in 5640 allocations from stack
		BUF_MEM_grow+0x70 [libcrypto.so.1.1]
	1050452 bytes in 18990 allocations from stack
		CRYPTO_zalloc+0xe [libcrypto.so.1.1]
	1411159 bytes in 259 allocations from stack
		_PyBytes_Resize+0x12c [python3.8]
	1920960 bytes in 45900 allocations from stack
		CRYPTO_zalloc+0xe [libcrypto.so.1.1]
	2148960 bytes in 2820 allocations from stack
		[unknown] [libcrypto.so.1.1]
	2191648 bytes in 43066 allocations from stack
		CRYPTO_zalloc+0xe [libcrypto.so.1.1]
		[unknown]
	2957312 bytes in 2 allocations from stack
		gdk_pixbuf_new+0x37 [libgdk_pixbuf-2.0.so.0.4000.0]
	3380304 bytes in 633 allocations from stack
		[unknown] [libsqlite3.so.0.8.6]
	21424424 bytes in 492 allocations from stack
		PyBytes_FromStringAndSize+0x231 [python3.8]
		[unknown]

[23:10:00] Top 10 stacks with outstanding allocations:
	731300 bytes in 43220 allocations from stack
		ASN1_STRING_set+0x7a [libcrypto.so.1.1]
	832720 bytes in 5640 allocations from stack
		BUF_MEM_grow+0x70 [libcrypto.so.1.1]
	1033984 bytes in 196 allocations from stack
		_PyBytes_Resize+0x12c [python3.8]
	1050452 bytes in 18990 allocations from stack
		CRYPTO_zalloc+0xe [libcrypto.so.1.1]
	1478656 bytes in 1 allocations from stack
		gdk_pixbuf_new+0x37 [libgdk_pixbuf-2.0.so.0.4000.0]
	1920960 bytes in 45900 allocations from stack
		CRYPTO_zalloc+0xe [libcrypto.so.1.1]
	2148960 bytes in 2820 allocations from stack
		[unknown] [libcrypto.so.1.1]
	2191840 bytes in 43070 allocations from stack
		CRYPTO_zalloc+0xe [libcrypto.so.1.1]
		[unknown]
	3921936 bytes in 757 allocations from stack
		[unknown] [libsqlite3.so.0.8.6]
	17321886 bytes in 500 allocations from stack
		PyBytes_FromStringAndSize+0x231 [python3.8]
		[unknown]

After the download:

$ sudo memleak-bpfcc -p 194556
Attaching to pid 194556, Ctrl+C to quit.
[23:15:52] Top 10 stacks with outstanding allocations:
	24 bytes in 1 allocations from stack
		[unknown] [libX11.so.6.3.0]
	24 bytes in 1 allocations from stack
		g_malloc0+0x21 [libglib-2.0.so.0.6400.3]
		[unknown]
		[unknown]
	32 bytes in 1 allocations from stack
		g_realloc+0x20 [libglib-2.0.so.0.6400.3]
		[unknown]
	32 bytes in 1 allocations from stack
		[unknown] [libcairo.so.2.11600.0]
		[unknown]
	192 bytes in 6 allocations from stack
		[unknown] [python3.8]
		[unknown]
	256 bytes in 1 allocations from stack
		[unknown] [libpixman-1.so.0.38.4]
	440 bytes in 1 allocations from stack
		[unknown] [libcairo.so.2.11600.0]
	711 bytes in 1 allocations from stack
		PyBytes_FromStringAndSize+0x231 [python3.8]
		[unknown]
	2208 bytes in 1 allocations from stack
		PyObject_Malloc+0xe4 [python3.8]
	1478656 bytes in 1 allocations from stack
		gdk_pixbuf_new+0x37 [libgdk_pixbuf-2.0.so.0.4000.0]

@puzzledsab
Contributor

Maybe it's because gc.collect() forces it to garbage collect everything, including lost circular references, immediately, which is not the default. https://docs.python.org/3/library/gc.html

Are all the references to articles, files and nzos cleared in the right order, so that the references from articles to files, or from files to nzos, don't cause problems?
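
To illustrate the circular-reference point, a small standalone sketch (nothing SABnzbd-specific): two objects that only reference each other are invisible to plain reference counting and are only reclaimed by the cyclic collector:

    import gc

    class Node:
        def __init__(self):
            self.other = None

    a, b = Node(), Node()
    a.other, b.other = b, a  # reference cycle: refcounts never reach zero
    del a, b                 # both Nodes are now unreachable, but still in memory

    print(gc.collect())      # the cyclic collector reclaims them; prints the number of objects found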

@thezoggy
Contributor

thezoggy commented Jan 5, 2021

Maybe it's the connection handle to the usenet server that's not getting closed until the job is over.

Safihre added a commit that referenced this issue Jan 6, 2021
@Safihre
Member Author

Safihre commented Jan 6, 2021

I added the 5-minute gc.collect() call and after a while I randomly see the memory usage drop. Sometimes it's after 5 minutes, sometimes only after an hour. But I guess it helps.

@thezoggy
Contributor

thezoggy commented Jan 6, 2021

https://stackify.com/python-garbage-collection/

Have we checked the reference counts on the various nzo variables, to see if one keeps getting reused and causes gc not to run?

@Safihre
Member Author

Safihre commented Jan 7, 2021

With the gc_stats call I already verified that the problem is specifically not any of our NzbObject, NzbFile or Article objects. I fixed that problem a while ago.
So it's something else...

@thezoggy
Contributor

thezoggy commented Jan 7, 2021

From the notes here: if you add the NZB and memory goes up, and doesn't go down when the job completes... what if the user deletes the job, does memory go down then? (And while I know the gc isn't aggressive, it should still run after some time rather than us having to force it, unless something is triggering the memory not to be freed.)

@Safihre
Member Author

Safihre commented Jan 7, 2021

I read the gc documentation a bit more, and it has 3 thresholds which trigger the collection (3 generations, i.e. different levels of "deepness" of the garbage to be cleared). Only after a threshold is reached does it trigger a collection, and only for that generation.
So eventually it will get called, it just might take forever until the threshold is met.
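
A quick way to see those numbers from a Python prompt (standard library only, nothing SABnzbd-specific):

    import gc

    print(gc.get_threshold())  # defaults to (700, 10, 10)
    print(gc.get_count())      # current allocation counters per generation

    # Generation 0 is scanned after ~700 net allocations, generation 1 only
    # after 10 generation-0 passes, and the expensive full generation-2 sweep
    # only after 10 generation-1 passes - which is why a full collection can
    # take a very long time to happen on its own.
    gc.collect()               # force a full (generation 2) collection now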

@puzzledsab
Contributor

puzzledsab commented Jan 24, 2021

I loaded up a bunch of nzbs, dumped the data of the process from RAM to disk, ran it through strings and scrolled through the resulting file with less. I saw a whole bunch of data from unparsed nzb files, so I suspected the XML parser.

To test it, I replaced the XML parser with regex parsing. The code is here: https://github.com/puzzledsab/sabnzbd/tree/feature/regex_parser

The current develop branch starts at 52.6 MB, increases to 152 MB when the NZBs are loaded, and after deleting them and waiting for GC to run, it falls back to 73.5 MB.

The regex version starts at 53.1 MB, increases to just 70.6 MB when the NZBs are loaded, and falls to 66.6 MB when they are deleted and GC has run.

Both were loaded in paused mode so there is no caching. It seems too good to be true, so it's not impossible that I have done something wrong. The NZBs I tried to download had no problems, though. I think it's very likely that the XML parser is the biggest problem.

@Safihre
Member Author

Safihre commented Jan 24, 2021

Hmmm, let's see if maybe there's a way we can get it to clear the data... The big benefit of this nzbparser is that it is more readable code. We used to have a regex-based version in the old SABnzbd and those just get ugly...

@puzzledsab
Contributor

I think it's particularly strange that almost all of the memory is freed when the nzos are deleted. It seems like it is not actually leaked, but connected to the imported data somehow. Maybe it's an indication that there is a mistake in the regex parser.

@Safihre
Member Author

Safihre commented Jan 24, 2021

I think you are on to something! 🚀
I saw that I also have lxml installed, so I simply did in nzbparser.py:

#nzb_tree = xml.etree.ElementTree.fromstring(raw_data)
nzb_tree = lxml.etree.fromstring(utob(raw_data))

With ElementTree I see the memory jump from 55 MB to 100 MB and stay there, but with lxml it jumps to 100 MB and then goes straight back to 60 MB.
I remember vaguely that lxml is a bit of a harder dependency because it is a compiled package, so I have to see if it's an easy replacement.


@thezoggy
Contributor

thezoggy commented Jan 24, 2021

I'd definitely think we should go with lxml.

Back in SickBeard dev, ElementTree, while stable, was slow compared to just about anything else. lxml is significantly faster (partially written in C) and has a few more features (full XPath). It's also what BeautifulSoup uses, if we ever wanted to go that route for other parsing.

@puzzledsab
Contributor

Nice. I saw it jump up to 177 MB in short bursts on the large files I had used for testing, but it does fall back to about 61 MB, so that's great. There are some huge NZBs that really push the RAM usage, though. It would probably help a lot if the parser could read the file from disk itself instead of getting it as a string. I tried a 600 GB NZB, and it uses more than 1.2 GB RAM while parsing. When it's done parsing it falls back to 330 MB.

I changed the regex parser to use StringIO instead of splitlines. Now it uses 58.5 MB after loading my set of test NZBs (1.1 TB). The 600 GB NZB jumps straight to 500 MB when it starts to parse and increases to 576 MB before falling back to 138 MB, then later to 122 MB. I agree that the XML parser is usually preferable, but SAB would be able to load just about anything on any setup if it's there as an option and it reads from a stream instead of loading everything into RAM first.
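
For reference, a rough sketch of what such a streaming parse could look like with lxml.etree.iterparse, reading straight from disk so the whole NZB never has to be held as one string (the function below is illustrative, not code from either branch):

    import lxml.etree

    NZB_NS = "{http://www.newzbin.com/DTD/2003/nzb}"  # standard NZB namespace

    def stream_parse_nzb(path):
        # Walk the <file> elements one at a time and discard each subtree
        # once it has been handed off, keeping peak memory low.
        for _, file_elem in lxml.etree.iterparse(path, tag=NZB_NS + "file"):
            subject = file_elem.get("subject")
            segments = [seg.text for seg in file_elem.iter(NZB_NS + "segment")]
            # ... create the queue entries from subject/segments here ...
            file_elem.clear()  # drop the children we just consumed
            while file_elem.getprevious() is not None:
                del file_elem.getparent()[0]  # drop already-processed siblings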

@puzzledsab
Contributor

Setting raw_data = None immediately after lxml.etree.fromstring may have helped a little bit with the maximum usage. It also quickly falls to 127.6 MB instead of 330 MB when it's done importing the 600 GB NZB.

I retested it without raw_data = None as well, and when I let it sit long enough in paused mode it falls to 125.7 MB. I guess setting it to None just gives the GC a head start.

@Safihre
Member Author

Safihre commented Jan 25, 2021

The reason it uses the string is due to the way file uploads work. I can always change that of course.
Unfortunately lxml is huge and it makes the resulting Windows/macOS binaries 4 MB larger 😢, that's a lot...

@puzzledsab
Contributor

How about if I try to write a very conservative regex parser that is optional and falls back to the XML parser if anything at all about the NZB is off? Almost all NZBs seem to follow a very similar pattern so it shouldn't be impossible to make something that works in most cases and doesn't harm the rest. The resource usage is one of the issues that is most frequently mentioned in discussions so I think it would make SAB a lot more attractive.
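
Structurally, that could be as simple as a wrapper that tries the cheap path first and falls back on any surprise. A sketch under stated assumptions: nzbfile_regex_parser is a placeholder name, while nzbfile_parser is the existing XML-based function in nzbparser.py:

    import logging

    def parse_nzb(raw_data, nzo):
        # Hypothetical wrapper: try the fast regex path, fall back to XML on any problem.
        try:
            return nzbfile_regex_parser(raw_data, nzo)  # placeholder for the new regex parser
        except Exception:
            logging.debug("Regex NZB parsing failed, falling back to the XML parser")
            return nzbfile_parser(raw_data, nzo)        # the existing XML-based parser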

@Safihre
Member Author

Safihre commented Jan 25, 2021

We could do that in the end indeed. Still investigating what causes the memory spike.

@sanderjo
Contributor

sanderjo commented Jan 25, 2021

#nzb_tree = xml.etree.ElementTree.fromstring(raw_data)
nzb_tree = lxml.etree.fromstring(utob(raw_data))

Is it as easy as that? If so, would it be useful if I tried that on my Linux box, to see if it lowers the memory from R=400M to something more reasonable?

@Safihre
Member Author

Safihre commented Jan 25, 2021

Yes please @sanderjo, but make sure you run pip install lxml first to be sure.

@sanderjo
Contributor

Yes please @sanderjo, but make sure you run pip install lxml first to be sure.

It was already there:

sander@brixit:~$ python3 -c "import lxml; print(lxml.__version__)" 
4.6.2

TBC

@puzzledsab
Contributor

Have you found anything regarding the XML parser? Those optimising tasks are like an itch, it's impossible not to scratch. I changed the regex parser a bit and made it fall back to XML if there is a problem: https://github.com/puzzledsab/sabnzbd/tree/feature/regex_parser

I then filled the queue with more than 1300 NZBs for a total of 6.2 TB in paused mode, which pushed the memory usage to about 230 MB. One of the files could not be parsed by the regex version.

@sanderjo
Contributor

On Ubuntu Linux: I did the lxml change (see below). At start, SAB uses 73 MB real memory. During a download it goes up to 350 MB. After the download, it's at 230 MB. So not back to 73 MB.

I would like to know what @jcfp experiences as memory usage on his Linux.

sander@brixit:~/git/lxml-sabnzbd$ git diff
diff --git a/sabnzbd/nzbparser.py b/sabnzbd/nzbparser.py
index 7e7a6472..7eab1412 100644
--- a/sabnzbd/nzbparser.py
+++ b/sabnzbd/nzbparser.py
@@ -23,8 +23,9 @@ import gzip
 import time
 import logging
 import hashlib
-import xml.etree.ElementTree
+#import xml.etree.ElementTree
 import datetime
+import lxml.etree # SJ
 
 import sabnzbd
 from sabnzbd import filesystem, nzbstuff
@@ -36,7 +37,8 @@ from sabnzbd.misc import name_to_cat
 def nzbfile_parser(raw_data, nzo):
     # Load data as file-object
     raw_data = raw_data.replace("http://www.newzbin.com/DTD/2003/nzb", "", 1)
-    nzb_tree = xml.etree.ElementTree.fromstring(raw_data)
+    #nzb_tree = xml.etree.ElementTree.fromstring(raw_data)
+    nzb_tree = lxml.etree.fromstring(utob(raw_data)) # SJ
 
     # Hash for dupe-checking
     md5sum = hashlib.md5()

@sanderjo
Contributor

More results with lxml:

Fresh start of SAB: 83 MB memory
SAB Paused
Adding the Test123.gz ... 92 MB memory
Unpausing SAB
Immediately aborted, 119 MB memory usage

So ... low memory usage. No full release.

@sanderjo
Contributor

sanderjo commented Jan 28, 2021

Second test with lxml:

SABnzbd paused
Add 13 NZBs, with a total of 109 GB to download
SAB still paused, memory usage: 153 MB.

Unpausing SAB ... memory goes up to 374 MB and hovers between 292 MB and 414 MB ... mostly around 380 MB.

I'll report back after completing the download.

Oh ... memory usage is now 537 MB (screenshot).

After some time: (screenshot)

When I pause, memory stays at 679 MB (screenshot).

@puzzledsab
Contributor

After you unpause it's probably mostly the article cache that increases the memory usage. I don't think it's flushed by pausing again.

@sanderjo
Contributor

With lxml: after downloading the total of 100 GB, and everything finished, memory usage is at 699 MB (screenshot).

I repeat what I said earlier: maybe on Linux it's just usage, and not a leak. The Linux OS itself uses memory if it's there. So I would like to hear the experience and opinion of @jcfp.

@puzzledsab
Contributor

Apparently you have to tell Python to release memory in a very stern tone on Linux.
https://www.softwareatscale.dev/p/run-python-servers-more-efficiently

@sanderjo
Contributor

sanderjo commented Jan 28, 2021

I will put this into SABnzbd.py right after the gc.collect()

  import ctypes

  libc = ctypes.CDLL("libc.so.6")
  libc.malloc_trim(0)

EDIT

I did this:

        # Requires: import ctypes, gc, os and psutil at the top of SABnzbd.py
        # 300 sec polling tasks
        if not timer % 100:
            if sabnzbd.LOG_ALL:
                logging.debug("Triggering Python garbage collection")

            process = psutil.Process(os.getpid())
            memusageMB1 = process.memory_info().rss / 1024 / 1024

            gc.collect()

            memusageMB2 = process.memory_info().rss / 1024 / 1024

            libc = ctypes.CDLL("libc.so.6")
            libc.malloc_trim(0)

            memusageMB3 = process.memory_info().rss / 1024 / 1024

            logging.debug("SJ, %s - %s - %s", memusageMB1, memusageMB2, memusageMB3)

            timer = 0

with nice results:

$ cat ~/.sabnzbd/logs/sabnzbd.log* | grep "SJ," | sort 
2021-01-28 21:48:41,058::DEBUG::[SABnzbd:1555] SJ, 348.39453125 - 348.39453125 - 195.078125
2021-01-28 21:53:41,240::DEBUG::[SABnzbd:1555] SJ, 348.68359375 - 348.68359375 - 147.8828125
2021-01-28 21:58:41,427::DEBUG::[SABnzbd:1555] SJ, 296.14453125 - 296.14453125 - 233.91015625
2021-01-28 22:03:41,615::DEBUG::[SABnzbd:1555] SJ, 378.03515625 - 378.03515625 - 256.27734375

So memory is released with the malloc_trim(), and stays lower.

As soon as the queue is empty, I expect the memory to go to a nice low 100 MB. BRB

EDIT 2:

Pity ... it stays at 290 MB ...

$ cat ~/.sabnzbd/logs/sabnzbd.log* | grep "SJ," | sort 
2021-01-28 21:48:41,058::DEBUG::[SABnzbd:1555] SJ, 348.39453125 - 348.39453125 - 195.078125
2021-01-28 21:53:41,240::DEBUG::[SABnzbd:1555] SJ, 348.68359375 - 348.68359375 - 147.8828125
2021-01-28 21:58:41,427::DEBUG::[SABnzbd:1555] SJ, 296.14453125 - 296.14453125 - 233.91015625
2021-01-28 22:03:41,615::DEBUG::[SABnzbd:1555] SJ, 378.03515625 - 378.03515625 - 256.27734375
2021-01-28 22:08:41,797::DEBUG::[SABnzbd:1555] SJ, 509.34375 - 509.34765625 - 356.5078125
2021-01-28 22:13:41,894::DEBUG::[SABnzbd:1555] SJ, 458.71875 - 458.46875 - 293.09765625
2021-01-28 22:18:41,959::DEBUG::[SABnzbd:1555] SJ, 291.06640625 - 291.06640625 - 290.2109375
2021-01-28 22:23:42,036::DEBUG::[SABnzbd:1555] SJ, 292.06640625 - 291.81640625 - 289.40234375
2021-01-28 22:28:42,113::DEBUG::[SABnzbd:1555] SJ, 290.80078125 - 290.80078125 - 290.10546875

@puzzledsab
Contributor

@Safihre Have you given any more thought to this? I think the alternatives are lxml, or regex with a fallback to today's XML parser.

@Safihre
Member Author

Safihre commented Dec 8, 2021

Closed by #1992.

@Safihre Safihre closed this as completed Dec 8, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023