[Jupyter] Add UNKNOWN cluster status state #9719

bbonenfant · 2024-02-08T20:07:12Z

Adding this state makes the GET /config endpoint (at least our code) no longer able to error. This makes it a suitable replacement for, and improvement from, the GET /health endpoint.

bbonenfant · 2024-02-08T20:08:37Z

jupyter-extension/jupyterlab_pachyderm/handlers.py

-
-        try:
+        except tornado.web.HTTPError:
+            payload = {"cluster_status": self.CLUSTER_UNKNOWN, "pachd_address": ""}


Now that this endpoint is going to function as a /health should we add a status or details section to the response so that we can provide additional context?

i think that would be a good idea for now. i apologize in advance for going back and forth on this but thinking about it some more i almost wonder if it was a mistake polling the config endpoint at all from the frontend. to me it seems as though in the mount-server model we had in practice 2 different health checks handled on two different endpoints i.e. the /health endpoint for monitoring mount-server status and /config for monitoring the connectivity to pachyderm. if i were to rewrite it from scratch, i think i would have combined the cluster statuses into the /health endpoint along with the mount-server status and had the actual config i.e. the configured address and ca certs served separately in /config.

there are two places where we use the /config endpoint in the front end. one is in pollMounts where we more or less use it as a health check and the other is in useConfig where we update the config. if i were to re-do the /config and /health endpoints today, i think i would segregate cluster_status from the rest of config. GET /health would respond with the cluster status and PUT /config would update the config and respond 200 on success, 400/500 if we encounter errors (in which case maybe we can revert to the old config without updating the client). GET /config would just respond with the configured address and ca_certs, we'd probably only need to use it in the update config view.

I definitely agree that we should revisit the polling. Now that we've compartmentalized the app a bit better, I think we can make some simplifications. I'm hoping the process of cleaning up the Explore view will bring some of this to light. Do you want to hold off on this change for now and come back to it, or just move forward and do another pass later?

i think you can go ahead and merge this for now, it is an improvement over what we already have. i'll file a ticket for revisiting how we poll the backend. thanks!

kevyang

lgtm for now, but i think the way we health check/config should be revamped in the long term. as it stands now, it's functional but kind of confusing

kevyang · 2024-02-08T21:20:07Z

jupyter-extension/jupyterlab_pachyderm/handlers.py

-
-        try:
+        except tornado.web.HTTPError:
+            payload = {"cluster_status": self.CLUSTER_UNKNOWN, "pachd_address": ""}


i think that would be a good idea for now. i apologize in advance for going back and forth on this but thinking about it some more i almost wonder if it was a mistake polling the config endpoint at all from the frontend. to me it seems as though in the mount-server model we had in practice 2 different health checks handled on two different endpoints i.e. the /health endpoint for monitoring mount-server status and /config for monitoring the connectivity to pachyderm. if i were to rewrite it from scratch, i think i would have combined the cluster statuses into the /health endpoint along with the mount-server status and had the actual config i.e. the configured address and ca certs served separately in /config.

there are two places where we use the /config endpoint in the front end. one is in pollMounts where we more or less use it as a health check and the other is in useConfig where we update the config. if i were to re-do the /config and /health endpoints today, i think i would segregate cluster_status from the rest of config. GET /health would respond with the cluster status and PUT /config would update the config and respond 200 on success, 400/500 if we encounter errors (in which case maybe we can revert to the old config without updating the client). GET /config would just respond with the configured address and ca_certs, we'd probably only need to use it in the update config view.

Adding this state makes the `GET /config` endpoint (at least our code) no longer able to error. This makes it a suitable replacement for, and improvement from, the `GET /health` endpoint.

- Lumino tabs (#9692) - Uninstantiated python client bugfix (#9705) - Prevent bad config from being stored (#9711) - Remove dev_server (#9713) - UNKNOWN cluster status state (#9719) - Prevent error popups bugfix (#9721) - Infinite scroll in file explorer (#9715) - Fix local cypress tests (#9728) - Infinite scroll w/ scrollbar (#9746) - Clear mounted datums before new datums (#9716) - Make input spec non-resizeable (#9720)

bbonenfant added 2 commits February 8, 2024 12:03

Add new UNKNOWN cluster status and plumb through to FE

1893081

remove health handler

59cd923

bbonenfant commented Feb 8, 2024

View reviewed changes

update config test

28ce755

bbonenfant requested a review from kevyang February 8, 2024 20:40

bbonenfant marked this pull request as ready for review February 8, 2024 20:40

kevyang approved these changes Feb 8, 2024

View reviewed changes

bbonenfant merged commit 1e26724 into master Feb 8, 2024
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Jupyter] Add UNKNOWN cluster status state #9719

[Jupyter] Add UNKNOWN cluster status state #9719

bbonenfant commented Feb 8, 2024

bbonenfant Feb 8, 2024

kevyang Feb 8, 2024

bbonenfant Feb 8, 2024

kevyang Feb 8, 2024

kevyang left a comment

kevyang Feb 8, 2024

[Jupyter] Add UNKNOWN cluster status state #9719

[Jupyter] Add UNKNOWN cluster status state #9719

Conversation

bbonenfant commented Feb 8, 2024

bbonenfant Feb 8, 2024

Choose a reason for hiding this comment

kevyang Feb 8, 2024

Choose a reason for hiding this comment

bbonenfant Feb 8, 2024

Choose a reason for hiding this comment

kevyang Feb 8, 2024

Choose a reason for hiding this comment

kevyang left a comment

Choose a reason for hiding this comment

kevyang Feb 8, 2024

Choose a reason for hiding this comment