Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix munin zeros #2103

Merged
merged 6 commits into from Apr 12, 2019
Merged

Fix munin zeros #2103

merged 6 commits into from Apr 12, 2019

Conversation

noirbizarre
Copy link
Contributor

This PR ensures that the udata worker status --munin always output tasks count even if count is zero.
Otherwise Munin sees it as "no data" (ie. vertical splitter in graph) and display count as -nan instead of 0. It was also making Munin diagnosis difficult when a queue is empty.

This also ensure only tasks for monitored queue are listed in udata worker status --munin-config reducing the number of tasks by graph (extras tasks were always displayed as -nan. As a side-effect, caching tasks is not required anymore as it does not hit all the Celery workers anymore to list them.

A udata worker tasks command has been added to display all tasks and their queue/priority.

Some test jobs have been renamed to ensure they all have the same test- prefix and they are filtered out of munin commands.

The documentation link to the munin plugin has been updated.

@noirbizarre noirbizarre added this to the 1.6.7 milestone Apr 11, 2019
@noirbizarre noirbizarre requested a review from a team April 11, 2019 09:45
Copy link
Contributor

@abulte abulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does not hit all the Celery workers anymore to list them.

I'm not sure I understand why 🤔 and thus why we don't need caching anymore.

CHANGELOG.md Outdated Show resolved Hide resolved
@noirbizarre
Copy link
Contributor Author

noirbizarre commented Apr 12, 2019

As stated here, we were using cache because we were using Celery inspect() which is querying the state of every worker (the w here).
It was working with the following limitations:

  • the more we have workers, the more inspect() takes time (ie. query duration * number of workers)
  • it only works when workers are started and for every queues
  • it only works if the works if the workers are unfiltered (ie. when using one of --queues/-Q, --exclude-queues/-X, --include/-I)
  • it only works if workers are reachables

If has been replaced by celery.tasks which doesn't query anything, these are only the tasks registered to celery in the code and so it doesn't depend on the workers at all (no network roundtrip, not state based)

But it's how I understand this comment. If cache is still required, I can put it back.

@abulte
Copy link
Contributor

abulte commented Apr 12, 2019

@noirbizarre Ok thanks! Looks good.

@noirbizarre noirbizarre merged commit 9b54900 into opendatateam:master Apr 12, 2019
@noirbizarre noirbizarre deleted the fix-munin-zeros branch April 12, 2019 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants