
Leaving the program open for days--continues to consume memory to exhaustion. #40

Open · brizzbane opened this issue on Jun 26, 2018 · 6 comments
Labels: 1.0-deprecated, help wanted (Extra attention is needed)


@brizzbane

The program seems to consume memory continuously until it runs out. Ctrl+C and restarting clears it, and keeps in place the proxies already found...

@imWildCat added the "help wanted" (Extra attention is needed) label on Jun 29, 2018
@imWildCat
Owner

If you do not follow the template, this issue can hardly be acted on.
https://github.com/imWildCat/scylla/issues/new?template=bug_report.md

@brizzbane
Author

brizzbane commented Sep 1, 2018

Hey.

I apologize for not following the template.

Here it is:

Describe the bug

After running scylla uninterrupted for a few days (say 5 days, or 120 hours), system memory usage climbs to 99%.
Running 'top' shows that the process responsible is python (scylla).

To Reproduce

Open scylla, and allow it to run uninterrupted. I usually ran into issues after a couple of days.

Also, this may or may not be part of the issue: I would run a scraper that hit the scylla API every minute or so (roughly the pattern sketched below). I would also occasionally open the browser to view the scraped proxies/countries.
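
For reference, a minimal sketch of that polling pattern. The port and endpoint are assumptions based on scylla's documented defaults, and the shape of the JSON response is a guess, so adjust for your setup:

import time
import requests

# Hypothetical reproduction helper: poll the scylla API once a minute.
# Port 8899 and /api/v1/proxies are taken from scylla's README; the
# 'proxies' key in the response body is an assumption.
while True:
    resp = requests.get('http://localhost:8899/api/v1/proxies')
    proxies = resp.json().get('proxies', [])
    print(f'{len(proxies)} proxies available')
    time.sleep(60)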

Expected behavior
That the scylla process would not continue to consume memory to the point of exhaustion, rendering the OS unusable for both scylla and other processes.

Screenshots
N/A.

Desktop (please complete the following information):
- OS: Debian Stretch
- Browser: Chrome
- Version: 1.1.4

Smartphone (please complete the following information):
Doesn't apply.

Additional context
Really just curious if anyone does NOT experience this. If you have the program open for more than a week at a time, does memory usage stay at expected levels?

I found that I can Ctrl+C / close out the program and re-open it, and the memory completely clears up. The scraped working proxies are left in place, and scylla returns to scraping (and verifying that previously scraped proxies still work) as expected. Memory usage then stays reasonable until the program has again been running for many days.

@imWildCat I hope this is a little better. Again, my apologies.

@brizzbane reopened this on Sep 1, 2018
@shawwwn
Contributor

shawwwn commented Mar 18, 2019

I can confirm this issue. After leaving it to run for two days, the Linux OOM killer was triggered. There seems to be a memory leak somewhere, but I couldn't find it.

My syslog:

[56917.949294] Out of memory: Kill process 2344 (scylla) score 670 or sacrifice child
[56917.952297] Killed process 2461 (scylla) total-vm:281812kB, anon-rss:61084kB, file-rss:24kB

@shawwwn
Contributor

shawwwn commented Mar 22, 2019

Found the culprit!

from queue import Queue
from concurrent.futures import ThreadPoolExecutor

def validate_ips(validator_queue: Queue, validator_pool: ThreadPoolExecutor):
    while True:
        try:
            proxy: ProxyIP = validator_queue.get()  # block until a proxy arrives
            # submit() never blocks; jobs pile up in the executor's
            # unbounded internal work queue
            validator_pool.submit(validate_proxy_ip, p=proxy)
        except (KeyboardInterrupt, SystemExit):
            break

The OOM was not caused by a memory leak but by resource exhaustion.
The exhaustion happens outside of Python's VM, so it can be very hard to trace.
I narrowed it down to the code above, in which a ThreadPoolExecutor is used to assign jobs to worker threads.
Upon further investigation, ThreadPoolExecutor maintains an internal/native queue to store pending jobs.
If jobs are enqueued faster than they are dequeued, that internal queue grows indefinitely.
This tends to happen when machine memory is scarce (e.g. running on a Raspberry Pi), so the pool cannot spawn enough workers to keep up with IP validation.
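
A minimal sketch of the failure mode (note that _work_queue is a private CPython implementation detail, inspected here only for illustration):

import time
from concurrent.futures import ThreadPoolExecutor

def slow_job(n):
    time.sleep(1)  # stands in for a slow proxy validation
    return n

pool = ThreadPoolExecutor(max_workers=2)
for i in range(10_000):
    pool.submit(slow_job, i)  # never blocks, never rejects

# Nearly all 10,000 jobs are still pending, held in memory:
print(pool._work_queue.qsize())
pool.shutdown(wait=False, cancel_futures=True)  # cancel_futures needs Python 3.9+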

@shawwwn
Contributor

shawwwn commented Mar 22, 2019

Fixes I could think of are:

  1. Slow down validation job submission according to the number of worker threads.
  2. Implement a custom ThreadPoolExecutor with a maximum internal queue size, rejecting or blocking when the queue is full (see the sketch below).
  3. Give the user more control over when to fetch and validate IPs.

What do you guys think?
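
For reference, here is a minimal sketch of option 2 using a semaphore to bound pending work. BoundedExecutor is a hypothetical wrapper, not part of concurrent.futures:

from concurrent.futures import ThreadPoolExecutor
from threading import BoundedSemaphore

class BoundedExecutor:
    """Wrap ThreadPoolExecutor so submit() blocks once max_pending jobs wait."""

    def __init__(self, max_workers: int, max_pending: int):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # One slot per running or pending job; acquire() blocks when all are taken.
        self._slots = BoundedSemaphore(max_workers + max_pending)

    def submit(self, fn, *args, **kwargs):
        self._slots.acquire()  # back-pressure: block the producer instead of growing the queue
        try:
            future = self._pool.submit(fn, *args, **kwargs)
        except Exception:
            self._slots.release()
            raise
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def shutdown(self, wait=True):
        self._pool.shutdown(wait=wait)

Dropped into validate_ips, this would make validator_pool.submit() block whenever validation falls behind, keeping the backlog bounded.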

@iburakov

I'm observing similar behaviour on Ubuntu 18.04.1 LTS as well.

Processes sorted by VIRT in htop during the OOM situation, after running scylla for several days: [screenshot]

This happens regularly (the peaks correspond to OOM events; nothing heavy runs alongside scylla on the server): [screenshot]

If scylla is turned off, no such peaks appear. Restarting the process, as previous commenters mentioned, does indeed help.
