CLI: ability to check health of all projects for support users #3725
Unfortunately this PR relies on the `projects.status` and `projects.status_message` fields to determine health, which is not really useful since these are only used for provisioning status, not reconcile status.
Reconcile status and errors are reported at the resource level, not the project level (e.g. one source may have ingested successfully while another failed), and need to be fetched from the runtime of each deployment separately by calling `ListResources`. See the implementation of `rill project status` here.
I think it would make more sense to implement this "health" check as follows:
- Do a search for projects using the same API as `rill sudo project search`. Maybe even make it part of that CLI command and toggle it with a `rill sudo project search --status` flag.
- Concurrently, request runtime details + JWT for each of the returned projects, and call `ListResources` for it.
- Output a table containing something like the following columns for each project:
  - Count of idle resources
  - Count of idle resources with errors
  - Count of pending/running resources
  - Count of parser errors
  - Bool value for whether the health check is responsive or not (depends on https://github.com/rilldata/rill-private-issues/issues/31)
  - Bool value for provisioning error (i.e. if the project's status is not `DEPLOYMENT_STATUS_OK`)
Regarding the initial issue for this PR, I would propose limiting the initial PR to the search capabilities of `rill sudo project search`, i.e. org and project search (not user, domain, or status type search – these are trickier to resolve correctly and potentially less useful).
I would also propose completing this issue first: #3648. Combined with the above suggestions, it would make it easy to get the status of all projects with an SLA attached.
Agreed with this. I think the main utility of this command will be for support members to quickly check whether a triggered alert is customer-impacting and in production, which warrants higher SLAs and elevated urgency / severity from our side to address. I'm not sure if it was implied in this response, but while I do agree it makes sense to limit scope on the search capabilities (especially if the expanded syntax proves tricky / non-trivial to implement), we should at least be able to pass in the |