
Leaving the program open for days--continues to consume memory to exhaustion. #40

Open · brizzbane opened this issue on Jun 26, 2018 · 6 comments
Labels: 1.0-deprecated, help wanted (Extra attention is needed)


@brizzbane

The program seems to consume memory continuously until it runs out. Ctrl+C and restarting clears it, and keeps in place the proxies already found...

@imWildCat added the "help wanted" (Extra attention is needed) label on Jun 29, 2018
@imWildCat
Owner

If you do not follow the template, this issue can hardly be acted on.
https://github.com/imWildCat/scylla/issues/new?template=bug_report.md

@brizzbane
Author

brizzbane commented Sep 1, 2018

Hey.

I apologize for not following the template.

Here it is:

Describe the bug

After running scylla uninterrupted for a few days (say 5 days, or 120 hours), system memory usage climbs to 99%.
Running 'top' shows that the process responsible is python (scylla).

To Reproduce

Open scylla, and allow it to run uninterrupted. I usually ran into issues after a couple of days.

Also, this may or may not be part of the issue: I would run a scraper that hit the scylla API every minute or so (roughly the pattern sketched below). I would also occasionally open the browser to view the scraped proxies/countries.
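
For reference, a minimal sketch of that polling pattern. The port and endpoint are assumptions based on scylla's documented defaults, and the shape of the JSON response is a guess, so adjust for your setup:

import time
import requests

# Hypothetical reproduction helper: poll the scylla API once a minute.
# Port 8899 and /api/v1/proxies are taken from scylla's README; the
# 'proxies' key in the response body is an assumption.
while True:
    resp = requests.get('http://localhost:8899/api/v1/proxies')
    proxies = resp.json().get('proxies', [])
    print(f'{len(proxies)} proxies available')
    time.sleep(60)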

Expected behavior
That the scylla process would not continue to consume memory to the point of exhaustion, rendering the OS unusable for both scylla and other processes.

Screenshots
N/A.

Desktop (please complete the following information):
- OS: Debian Stretch
- Browser: Chrome
- Version: 1.1.4

Smartphone (please complete the following information):
Doesn't apply.

Additional context
Really just curious if anyone does NOT experience this. If you have the program open for more than a week at a time, does memory usage stay at expected levels?

I found that I can Ctrl+C / close out the program and re-open it, and the memory completely clears up. The scraped working proxies are left in place, and scylla returns to scraping (and verifying that previously scraped proxies still work) as expected. Memory usage then stays reasonable until the program has again been running for many days.

@imWildCat I hope this is a little better. Again, my apologies.

@brizzbane reopened this on Sep 1, 2018
@shawwwn
Contributor

shawwwn commented Mar 18, 2019

I can confirm this issue. After leaving it to run for two days, the Linux OOM killer was triggered. There seems to be a memory leak somewhere, but I couldn't find it.

My syslog:

[56917.949294] Out of memory: Kill process 2344 (scylla) score 670 or sacrifice child
[56917.952297] Killed process 2461 (scylla) total-vm:281812kB, anon-rss:61084kB, file-rss:24kB

@shawwwn
Contributor

shawwwn commented Mar 22, 2019

Found the culprit!

from queue import Queue
from concurrent.futures import ThreadPoolExecutor

def validate_ips(validator_queue: Queue, validator_pool: ThreadPoolExecutor):
    while True:
        try:
            proxy: ProxyIP = validator_queue.get()  # block until a proxy arrives
            # submit() never blocks; jobs pile up in the executor's
            # unbounded internal work queue
            validator_pool.submit(validate_proxy_ip, p=proxy)
        except (KeyboardInterrupt, SystemExit):
            break

The OOM was not caused by a memory leak but by resource exhaustion.
The exhaustion happens outside of Python's VM, so it can be very hard to trace.
I narrowed it down to the code above, in which a ThreadPoolExecutor is used to assign jobs to worker threads.
Upon further investigation, ThreadPoolExecutor maintains an internal/native queue to store pending jobs.
If jobs are enqueued faster than they are dequeued, that internal queue grows indefinitely.
This tends to happen when machine memory is scarce (e.g. running on a Raspberry Pi), so the pool cannot spawn enough workers to keep up with IP validation.
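
A minimal sketch of the failure mode (note that _work_queue is a private CPython implementation detail, inspected here only for illustration):

import time
from concurrent.futures import ThreadPoolExecutor

def slow_job(n):
    time.sleep(1)  # stands in for a slow proxy validation
    return n

pool = ThreadPoolExecutor(max_workers=2)
for i in range(10_000):
    pool.submit(slow_job, i)  # never blocks, never rejects

# Nearly all 10,000 jobs are still pending, held in memory:
print(pool._work_queue.qsize())
pool.shutdown(wait=False, cancel_futures=True)  # cancel_futures needs Python 3.9+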

@shawwwn
Contributor

shawwwn commented Mar 22, 2019

Fixes I could think of are:

  1. Slow down validation job submission according to the number of worker threads.
  2. Implement a custom ThreadPoolExecutor with a maximum internal queue size, rejecting or blocking when the queue is full (see the sketch below).
  3. Give the user more control over when to fetch and validate IPs.

What do you guys think?
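
For reference, here is a minimal sketch of option 2 using a semaphore to bound pending work. BoundedExecutor is a hypothetical wrapper, not part of concurrent.futures:

from concurrent.futures import ThreadPoolExecutor
from threading import BoundedSemaphore

class BoundedExecutor:
    """Wrap ThreadPoolExecutor so submit() blocks once max_pending jobs wait."""

    def __init__(self, max_workers: int, max_pending: int):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # One slot per running or pending job; acquire() blocks when all are taken.
        self._slots = BoundedSemaphore(max_workers + max_pending)

    def submit(self, fn, *args, **kwargs):
        self._slots.acquire()  # back-pressure: block the producer instead of growing the queue
        try:
            future = self._pool.submit(fn, *args, **kwargs)
        except Exception:
            self._slots.release()
            raise
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def shutdown(self, wait=True):
        self._pool.shutdown(wait=wait)

Dropped into validate_ips, this would make validator_pool.submit() block whenever validation falls behind, keeping the backlog bounded.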

@iburakov

I'm observing similar behaviour on Ubuntu 18.04.1 LTS as well.

Processes sorted by VIRT in htop during the OOM situation, after running scylla for several days: [screenshot]

This happens regularly (the peaks correspond to OOM events; nothing heavy runs alongside scylla on the server): [screenshot]

If scylla is turned off, no such peaks appear. Restarting the process, as previous commenters mentioned, does indeed help.
