Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix munin zeros #2103

Merged
merged 6 commits into from Apr 12, 2019

Conversation

@noirbizarre
Copy link
Member

commented Apr 11, 2019

This PR ensures that the udata worker status --munin always output tasks count even if count is zero.
Otherwise Munin sees it as "no data" (ie. vertical splitter in graph) and display count as -nan instead of 0. It was also making Munin diagnosis difficult when a queue is empty.

This also ensure only tasks for monitored queue are listed in udata worker status --munin-config reducing the number of tasks by graph (extras tasks were always displayed as -nan. As a side-effect, caching tasks is not required anymore as it does not hit all the Celery workers anymore to list them.

A udata worker tasks command has been added to display all tasks and their queue/priority.

Some test jobs have been renamed to ensure they all have the same test- prefix and they are filtered out of munin commands.

The documentation link to the munin plugin has been updated.

@noirbizarre noirbizarre added this to the 1.6.7 milestone Apr 11, 2019
@noirbizarre noirbizarre requested a review from opendatateam/etalab Apr 11, 2019
@noirbizarre noirbizarre force-pushed the noirbizarre:fix-munin-zeros branch 4 times, most recently from 438a0f8 to 5e2be2f Apr 11, 2019
noirbizarre added 5 commits Apr 11, 2019
@noirbizarre noirbizarre force-pushed the noirbizarre:fix-munin-zeros branch from 5e2be2f to 5b573d5 Apr 11, 2019
@abulte
abulte approved these changes Apr 12, 2019
Copy link
Member

left a comment

it does not hit all the Celery workers anymore to list them.

I'm not sure I understand why 🤔 and thus why we don't need caching anymore.

CHANGELOG.md Outdated Show resolved Hide resolved
@noirbizarre

This comment has been minimized.

Copy link
Member Author

commented Apr 12, 2019

As stated here, we were using cache because we were using Celery inspect() which is querying the state of every worker (the w here).
It was working with the following limitations:

  • the more we have workers, the more inspect() takes time (ie. query duration * number of workers)
  • it only works when workers are started and for every queues
  • it only works if the works if the workers are unfiltered (ie. when using one of --queues/-Q, --exclude-queues/-X, --include/-I)
  • it only works if workers are reachables

If has been replaced by celery.tasks which doesn't query anything, these are only the tasks registered to celery in the code and so it doesn't depend on the workers at all (no network roundtrip, not state based)

But it's how I understand this comment. If cache is still required, I can put it back.

@noirbizarre noirbizarre force-pushed the noirbizarre:fix-munin-zeros branch from 5b573d5 to 8b0c6b1 Apr 12, 2019
@abulte

This comment has been minimized.

Copy link
Member

commented Apr 12, 2019

@noirbizarre Ok thanks! Looks good.

@noirbizarre noirbizarre merged commit 9b54900 into opendatateam:master Apr 12, 2019
3 checks passed
3 checks passed
ci/circleci: assets Your tests passed on CircleCI!
Details
ci/circleci: dist Your tests passed on CircleCI!
Details
ci/circleci: python Your tests passed on CircleCI!
Details
@noirbizarre noirbizarre deleted the noirbizarre:fix-munin-zeros branch Apr 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.