Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Use psutil to check memory #271

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

[WIP] Use psutil to check memory #271

wants to merge 1 commit into from

Conversation

chekunkov
Copy link
Contributor

Splash uses resource.getrusage(resource.RUSAGE_SELF).ru_maxrss which have several downsides and I believe is not very suitable for this task:

  1. This was discussed in Use psutil if available to measure RSS on memusage extension聽scrapy/scrapy#1329. @dangra said:

This is particulary problematic when running spiders inside docker containers because docker daemon can take up to GBs of RAM and its max_rss is propagated to init process of the docker container.

Looks like it can be an issue for Splash instance as well

  1. Splash itself is a kind of application which allow peaks of memory usage, but after that peak memory usage goes down usually, but max_rss never goes down, so Splash instance will be shut down after one such memory usage peak despite current memory usage might be really low - which leads to server restart, makes service unavailable, clears all cache and forces all pages to be downloaded once again which is bad in terms of performance.

An article about difference between max_rss and psutils can be found [here].(http://fa.bianp.net/blog/2013/different-ways-to-get-memory-consumption-or-lessons-learned-from-memory_profiler/)

Splash already depends on psutil, so it's not a problem to use it for a memory check.

First commit is intended to start discussion, it's not the code I want to merge right away. First of all because this change is to some extent backwards incompatible, and probably we can have another Splash option to enable psutil memory check and keep maxrss option for compatibility, maybe remove it later. But there's one more thing about how psutil measures memory. In general it would allow spikes of memory usage and would work okay, util memory check will be launched during some memory intensive process. In this case psutil behaviour would be the same as max_rss, but this PR rather aims to find better solution for memory check than simply replace max_rss with psutil. So instead of simply measuring memory usage with psutil I wanted to propose using moving average as a metric for memory measurement. We can collect 3-5 points of memory usage and compare to the limit. This will allow to remove some random fluctuations and shut down instance only in case when memory usage in average is higher than some give watermark.

Disclaimer: proposal about moving average came to my mind after committing first commit which simply replaces max_rss with psutil, but I hope the idea is quite clear 馃槂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant