New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Jupyter] Add UNKNOWN cluster status state #9719
Conversation
|
||
try: | ||
except tornado.web.HTTPError: | ||
payload = {"cluster_status": self.CLUSTER_UNKNOWN, "pachd_address": ""} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that this endpoint is going to function as a /health
should we add a status
or details
section to the response so that we can provide additional context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think that would be a good idea for now. i apologize in advance for going back and forth on this but thinking about it some more i almost wonder if it was a mistake polling the config endpoint at all from the frontend. to me it seems as though in the mount-server model we had in practice 2 different health checks handled on two different endpoints i.e. the /health
endpoint for monitoring mount-server status and /config
for monitoring the connectivity to pachyderm. if i were to rewrite it from scratch, i think i would have combined the cluster statuses into the /health
endpoint along with the mount-server status and had the actual config i.e. the configured address and ca certs served separately in /config
.
there are two places where we use the /config
endpoint in the front end. one is in pollMounts where we more or less use it as a health check and the other is in useConfig where we update the config. if i were to re-do the /config
and /health
endpoints today, i think i would segregate cluster_status
from the rest of config
. GET /health
would respond with the cluster status and PUT /config
would update the config and respond 200 on success, 400/500 if we encounter errors (in which case maybe we can revert to the old config without updating the client). GET /config
would just respond with the configured address and ca_certs, we'd probably only need to use it in the update config view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely agree that we should revisit the polling. Now that we've compartmentalized the app a bit better, I think we can make some simplifications. I'm hoping the process of cleaning up the Explore view will bring some of this to light. Do you want to hold off on this change for now and come back to it, or just move forward and do another pass later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you can go ahead and merge this for now, it is an improvement over what we already have. i'll file a ticket for revisiting how we poll the backend. thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm for now, but i think the way we health check/config should be revamped in the long term. as it stands now, it's functional but kind of confusing
|
||
try: | ||
except tornado.web.HTTPError: | ||
payload = {"cluster_status": self.CLUSTER_UNKNOWN, "pachd_address": ""} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think that would be a good idea for now. i apologize in advance for going back and forth on this but thinking about it some more i almost wonder if it was a mistake polling the config endpoint at all from the frontend. to me it seems as though in the mount-server model we had in practice 2 different health checks handled on two different endpoints i.e. the /health
endpoint for monitoring mount-server status and /config
for monitoring the connectivity to pachyderm. if i were to rewrite it from scratch, i think i would have combined the cluster statuses into the /health
endpoint along with the mount-server status and had the actual config i.e. the configured address and ca certs served separately in /config
.
there are two places where we use the /config
endpoint in the front end. one is in pollMounts where we more or less use it as a health check and the other is in useConfig where we update the config. if i were to re-do the /config
and /health
endpoints today, i think i would segregate cluster_status
from the rest of config
. GET /health
would respond with the cluster status and PUT /config
would update the config and respond 200 on success, 400/500 if we encounter errors (in which case maybe we can revert to the old config without updating the client). GET /config
would just respond with the configured address and ca_certs, we'd probably only need to use it in the update config view.
Adding this state makes the `GET /config` endpoint (at least our code) no longer able to error. This makes it a suitable replacement for, and improvement from, the `GET /health` endpoint.
- Lumino tabs (#9692) - Uninstantiated python client bugfix (#9705) - Prevent bad config from being stored (#9711) - Remove dev_server (#9713) - UNKNOWN cluster status state (#9719) - Prevent error popups bugfix (#9721) - Infinite scroll in file explorer (#9715) - Fix local cypress tests (#9728) - Infinite scroll w/ scrollbar (#9746) - Clear mounted datums before new datums (#9716) - Make input spec non-resizeable (#9720)
Adding this state makes the
GET /config
endpoint (at least our code) no longer able to error. This makes it a suitable replacement for, and improvement from, theGET /health
endpoint.