Add logging functionality to memusage extension #5717

Closed
GeorgeA92 opened this issue Nov 18, 2022 · 3 comments · Fixed by #5722

Comments

@GeorgeA92 (Contributor)

Summary

Add logging functionality to the memusage extension.

Motivation

Scrapy jobs with MEMUSAGE_ENABLED: True and a defined MEMUSAGE_LIMIT_MB (all jobs on Scrapy Cloud) can be stopped early for exceeding the RAM limit and finish with the memusage_exceeded outcome.

The first thing required to debug RAM memory leaks is to identify the pattern of RAM usage.
Does RAM usage increase continuously, at a high rate, throughout the run?
Or does it rapidly climb over the limit in the last several minutes, after hours or even days of stable runtime performance?
Each pattern requires a different approach to debugging the leak.

It would be much easier to debug this if the value of self.get_virtual_size() were added to the log in the _check_limit method of the memusage extension:

def _check_limit(self):
    if self.get_virtual_size() > self.limit:
        self.crawler.stats.set_value('memusage/limit_reached', 1)
        mem = self.limit / 1024 / 1024
        logger.error("Memory usage exceeded %(memusage)dM. Shutting down Scrapy...",
                     {'memusage': mem}, extra={'crawler': self.crawler})
        if self.notify_mails:
            subj = (
                f"{self.crawler.settings['BOT_NAME']} terminated: "
                f"memory usage exceeded {mem}M at {socket.gethostname()}"
            )
            self._send_report(self.notify_mails, subj)
            self.crawler.stats.set_value('memusage/limit_notified', 1)
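
For illustration, the check could log the current value on every run, something along these lines (log wording and level are illustrative only, not necessarily what was merged in #5722):

def _check_limit(self):
    current_mem = self.get_virtual_size()
    # Added (illustrative): log current usage on every periodic check
    # (interval controlled by MEMUSAGE_CHECK_INTERVAL_SECONDS), so the
    # memory growth pattern over the whole run is visible in the log.
    logger.info("Current memory usage: %(memusage)dM (limit: %(limit)dM)",
                {'memusage': current_mem / 1024 / 1024,
                 'limit': self.limit / 1024 / 1024},
                extra={'crawler': self.crawler})
    if current_mem > self.limit:
        ...  # limit handling unchanged from the snippet above

Since the check runs every MEMUSAGE_CHECK_INTERVAL_SECONDS (60 by default), this would produce roughly one usage line per minute, which is enough to tell a steady climb from a sudden spike.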

Describe alternatives you've considered

Setting MEMUSAGE_WARNING_MB to ~80-90% of MEMUSAGE_LIMIT_MB: the current implementation of the memusage extension warns only once, so it does not provide enough data for this.
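
For reference, this alternative amounts to a settings change along these lines (the numbers are illustrative only):

# settings.py -- warn at roughly 85% of a 3 GB limit
MEMUSAGE_ENABLED = True
MEMUSAGE_LIMIT_MB = 3072
MEMUSAGE_WARNING_MB = 2600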

Manually subclassing the memusage extension with similar changes (a sketch is shown below): like any other option, this requires rescheduling the job, which may not be suitable for jobs with a total runtime of several days or more. For that reason it is preferable for the change to land in Scrapy itself, with this logging enabled by default.
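
A rough sketch of that workaround, assuming a project module myproject/extensions.py (the module path and class name are hypothetical; scrapy.extensions.memusage.MemoryUsage is the built-in extension being subclassed):

# myproject/extensions.py
import logging

from scrapy.extensions.memusage import MemoryUsage

logger = logging.getLogger(__name__)


class LoggingMemoryUsage(MemoryUsage):
    def _check_limit(self):
        # Log current usage before running the regular limit check.
        logger.info("Current memory usage: %(memusage)dM",
                    {'memusage': self.get_virtual_size() / 1024 / 1024},
                    extra={'crawler': self.crawler})
        super()._check_limit()

# settings.py -- disable the built-in extension and enable the subclass
EXTENSIONS = {
    'scrapy.extensions.memusage.MemoryUsage': None,
    'myproject.extensions.LoggingMemoryUsage': 0,
}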

Additional context

Similar functionality was previously requested in #2173.

@JazzGlobal (Contributor)

I have a patch ready to address this but don't have permission to push my branch. Am I doing something wrong?

@elacuesta (Member)

@JazzGlobal I suppose you're trying to push to the main Scrapy repository, you should instead fork the repo and create a pull request from that fork.

@JazzGlobal (Contributor)

Ah, I sure was. Thank you for the prompt and helpful response!
