Adding health checks accessible via API #9180
Comments
to expand, these are the additional routes added:
and sample output from my dev instance looks like this:
note that the above is for the |
@luke-c-sargent Really like the idea of basing the schema off the IETF draft and great to see the PR with the initial implementation too. I think this issue is related to several others, so I'm hoping that we can get more input about this as well as make the design compatible with those other issues too. The issues are, namely.
Included in galaxyproject/galaxy-helm#127. Is there any plan to merge this back upstream into Galaxy? I suspect it would be good to have something like this in the code, and apologies if the PR was abandoned because of committer inaction 😢. I told myself several times I was going to review the PR and never did.
hello,
tl;dr: i want to add a health check API endpoint; what is the best way to determine if web/job handlers are performing as desired, and how verbose should this output be?
i would like to add health check functionality to Galaxy for web and job handlers, and was looking to implement something similar to what is described in this IETF proposal draft, whose schema could look like this:
roughly, we get an overall status, and further details are provided in the 'checks' field, which contains a list of responses from individual systems. these checks can be more (or less) granular as desired.
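To make the shape described above concrete, here is a minimal sketch of how an overall status could be derived from a list of individual check results. The status values ("pass", "warn", "fail") follow my reading of the IETF draft, and the field names other than `status` and `checks`, along with the helper names, are assumptions for illustration rather than the schema proposed in this issue.

```python
import json

# Assumed status vocabulary, ordered from best to worst.
SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


def overall_status(checks):
    """Return the worst status found among the individual check results."""
    worst = "pass"
    for check in checks:
        if SEVERITY[check["status"]] > SEVERITY[worst]:
            worst = check["status"]
    return worst


def build_health_response(checks):
    """Wrap the individual results with an aggregated top-level status."""
    return {"status": overall_status(checks), "checks": checks}


# Hypothetical check results; the names and the "output" field are illustrative.
example_checks = [
    {"name": "web_workers", "status": "pass"},
    {"name": "job_handlers", "status": "warn", "output": "one runner has no workers"},
]
response = build_health_response(example_checks)
print(json.dumps(response, indent=2))
```

Taking the worst individual status as the overall status is one simple policy; a probe that only cares about liveness could look at the top-level field and ignore the details.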
this effort was started for kubernetes reasons (readiness / liveness probes) but could be broadly useful to sysadmins or extra-nerdy users curious about server status.
my naive approach has been to check the `app` member of the `trans` object:
- `application_stack.workers()`, iterate through, accept idle or busy status.
- `job_manager.job_handler.dispatcher.job_runners`, ensure there are workers associated with it (`nworkers` > 0)

i am sure there are many caveats and oversights in the above (e.g., many runner/handler types), so i would love to hear from my learned colleagues. thanks for reading!
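The two checks described above can be sketched roughly as follows. This is not the code from the PR: the attribute paths are taken from the description, but the shape of each worker entry (a dict with a `status` key), the assumption that `job_runners` is a dict of objects exposing `nworkers`, and the `fake_app` stand-in are all hypothetical, used only so the example runs on its own.

```python
from types import SimpleNamespace

# Worker states treated as healthy, per the description above.
ACCEPTABLE_WORKER_STATES = {"idle", "busy"}


def check_web_workers(app):
    """Pass only if every worker reports an idle or busy status."""
    workers = app.application_stack.workers()
    unexpected = [w for w in workers if w.get("status") not in ACCEPTABLE_WORKER_STATES]
    return {"name": "web_workers", "status": "fail" if unexpected else "pass"}


def check_job_runners(app):
    """Pass only if every job runner has at least one worker (nworkers > 0)."""
    runners = app.job_manager.job_handler.dispatcher.job_runners
    without_workers = [name for name, runner in runners.items() if runner.nworkers <= 0]
    return {"name": "job_runners", "status": "fail" if without_workers else "pass"}


# Hypothetical stand-in for trans.app; runner names and counts are made up.
fake_app = SimpleNamespace(
    application_stack=SimpleNamespace(
        workers=lambda: [{"status": "idle"}, {"status": "busy"}]
    ),
    job_manager=SimpleNamespace(
        job_handler=SimpleNamespace(
            dispatcher=SimpleNamespace(
                job_runners={
                    "local": SimpleNamespace(nworkers=4),
                    "other": SimpleNamespace(nworkers=0),
                }
            )
        )
    ),
)

web_result = check_web_workers(fake_app)
runner_result = check_job_runners(fake_app)
print(web_result)
print(runner_result)
```

Whether a runner with zero workers should count as a failure or only a warning likely depends on the runner/handler type, which is one of the caveats raised above.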