Monitor celery-worker memory usage in Prod #3840
Comments
Looks like the celery worker tried to load AO 2012-20 and then exited before the reload completed: https://logs.fr.cloud.gov/goto/4228f8f66a575f1bd3cc582360d11fb2
In the short term, I'm going to try to manually reload AO's in |
It looks like started getting this error (mostly in
https://logs.fr.cloud.gov/goto/52250b0d73a83b390c7ff04edde4ed4c This seems to indicate the reload task was terminated by another process. |
There are some recent ‘billiard’ bug fixes that might address this issue. I pushed a manual deploy of #3842 to dev to see how the reloads and downloads work. This might be hard to test because the issue isn’t easily replicated. We might need to update versions and check once the deploy goes live. |
Daily reload of AOs failed at the same records on Wednesday night as well
|
CPU and memory usage in celery-worker (we should increase the memory to 2G and re-run the AO refresh)
|
Can monitor memory usage by targeting |
name: celery-worker type: web
|
name: celery-worker
|
Here's a Kibana query that tracks celery-worker memory usage over time. We want to look at memory usage before and after 7/9/19. https://logs.fr.cloud.gov/goto/6429fcbc24388c72b6c4094ff5b2377d |
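To complement the Kibana view, the worker could also log its own peak memory around the reload. The following is only a minimal sketch and is not taken from this project's code: `peak_memory_mb`, `memory_report`, and `MEMORY_LIMIT_MB` are hypothetical names, the 2048 MB limit simply mirrors the 2G suggested above, and it relies on Python's standard `resource` module, which is only available on Unix-like systems.

```python
import resource
import sys

# Hypothetical limit, mirroring the 2G suggested earlier in this thread.
MEMORY_LIMIT_MB = 2048


def peak_memory_mb():
    """Return this process's peak resident set size in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def memory_report(limit_mb=MEMORY_LIMIT_MB):
    """Build a log line with peak memory and its share of the limit."""
    used = peak_memory_mb()
    return "peak memory: %.1f MB (%.0f%% of %d MB limit)" % (
        used, 100 * used / limit_mb, limit_mb)


if __name__ == "__main__":
    print(memory_report())
```

Logging `memory_report()` at the start and end of the reload task (an assumption about where it would be called) would show how close the process gets to its limit before it exits.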
We don't have any other symptoms of celery-worker running out of memory, so I'm closing this issue and we can investigate if the refreshes fail again. |
We have not received the Slack #bot message confirming a successful reload of all AOs from the production environment for the past two days (6/24 and 6/25/19). We are still receiving AO reload completion messages from the DEV and STAGE environments.
Looking at the Kibana logs, the AO reload jobs appear to start in the DEV, STAGE, and PROD environments, but for some reason the job does not complete in PROD.
Links to Kibana logs:
https://logs.fr.cloud.gov/goto/a0518e7dcc8367cc11cb23efcc255989
https://logs.fr.cloud.gov/goto/db1e2660c6fafdda0c0547b59bb35a99