Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Jupyter] Add UNKNOWN cluster status state #9719

Merged
merged 3 commits into from Feb 8, 2024
Merged

Conversation

bbonenfant
Copy link
Contributor

Adding this state makes the GET /config endpoint (at least our code) no longer able to error. This makes it a suitable replacement for, and improvement from, the GET /health endpoint.


try:
except tornado.web.HTTPError:
payload = {"cluster_status": self.CLUSTER_UNKNOWN, "pachd_address": ""}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this endpoint is going to function as a /health should we add a status or details section to the response so that we can provide additional context?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that would be a good idea for now. i apologize in advance for going back and forth on this but thinking about it some more i almost wonder if it was a mistake polling the config endpoint at all from the frontend. to me it seems as though in the mount-server model we had in practice 2 different health checks handled on two different endpoints i.e. the /health endpoint for monitoring mount-server status and /config for monitoring the connectivity to pachyderm. if i were to rewrite it from scratch, i think i would have combined the cluster statuses into the /health endpoint along with the mount-server status and had the actual config i.e. the configured address and ca certs served separately in /config.

there are two places where we use the /config endpoint in the front end. one is in pollMounts where we more or less use it as a health check and the other is in useConfig where we update the config. if i were to re-do the /config and /health endpoints today, i think i would segregate cluster_status from the rest of config. GET /health would respond with the cluster status and PUT /config would update the config and respond 200 on success, 400/500 if we encounter errors (in which case maybe we can revert to the old config without updating the client). GET /config would just respond with the configured address and ca_certs, we'd probably only need to use it in the update config view.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely agree that we should revisit the polling. Now that we've compartmentalized the app a bit better, I think we can make some simplifications. I'm hoping the process of cleaning up the Explore view will bring some of this to light. Do you want to hold off on this change for now and come back to it, or just move forward and do another pass later?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you can go ahead and merge this for now, it is an improvement over what we already have. i'll file a ticket for revisiting how we poll the backend. thanks!

@bbonenfant bbonenfant marked this pull request as ready for review February 8, 2024 20:40
Copy link
Contributor

@kevyang kevyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm for now, but i think the way we health check/config should be revamped in the long term. as it stands now, it's functional but kind of confusing


try:
except tornado.web.HTTPError:
payload = {"cluster_status": self.CLUSTER_UNKNOWN, "pachd_address": ""}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that would be a good idea for now. i apologize in advance for going back and forth on this but thinking about it some more i almost wonder if it was a mistake polling the config endpoint at all from the frontend. to me it seems as though in the mount-server model we had in practice 2 different health checks handled on two different endpoints i.e. the /health endpoint for monitoring mount-server status and /config for monitoring the connectivity to pachyderm. if i were to rewrite it from scratch, i think i would have combined the cluster statuses into the /health endpoint along with the mount-server status and had the actual config i.e. the configured address and ca certs served separately in /config.

there are two places where we use the /config endpoint in the front end. one is in pollMounts where we more or less use it as a health check and the other is in useConfig where we update the config. if i were to re-do the /config and /health endpoints today, i think i would segregate cluster_status from the rest of config. GET /health would respond with the cluster status and PUT /config would update the config and respond 200 on success, 400/500 if we encounter errors (in which case maybe we can revert to the old config without updating the client). GET /config would just respond with the configured address and ca_certs, we'd probably only need to use it in the update config view.

@bbonenfant bbonenfant merged commit 1e26724 into master Feb 8, 2024
12 checks passed
bbonenfant added a commit that referenced this pull request Feb 22, 2024
Adding this state makes the `GET /config` endpoint (at least our code)
no longer able to error. This makes it a suitable replacement for, and
improvement from, the `GET /health` endpoint.
bbonenfant added a commit that referenced this pull request Feb 23, 2024
- Lumino tabs (#9692)
- Uninstantiated python client bugfix (#9705)
- Prevent bad config from being stored (#9711)
- Remove dev_server (#9713)
- UNKNOWN cluster status state (#9719)
- Prevent error popups bugfix (#9721)
- Infinite scroll in file explorer (#9715)
- Fix local cypress tests (#9728)
- Infinite scroll w/ scrollbar (#9746)
- Clear mounted datums before new datums (#9716)
- Make input spec non-resizeable (#9720)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants