chore(connections): disconnect when we encounter a non-retryable error code on an atlas connection CLOUDP-286331 #6598
packages/compass-connections/src/stores/connections-store-redux.ts
Is there another PR on the MMS side you're currently working on?
packages/compass-connections/src/stores/connections-store-redux.ts
I think you should definitely be able to add unit tests in compass-connections for this by emitting the event manually on a mocked dataService. We have all the tooling to set up a test like that, and it would at least cover the error-parsing logic. It should also be possible to add some e2e tests, if we want, by modifying our ws-proxy code; we can chat more about how to set that up.
No, this is the only branch I've been working on. Is there work you foresee us needing on the mms side? I see there's already support for the ping/pongs checking the roles and throwing these errors: https://github.com/10gen/mms/blob/de2a9c463cfe530efb8e2a0941033e8207b6cb11/server/src/main/com/xgen/cloud/services/clusterconnection/runtime/res/ClusterConnectionEndpoint.java#L521
One comment; everything else looks good.
Chatted a bit with Sergey in Slack; we're not going to add an e2e test for this, at least for now. I'm going to wait on merging until I can reliably get this working with our main branch of mms. It might take some backend changes: I'm having trouble triggering the expected heartbeat failures at the moment (I developed this by manually throwing them from ccs). The problem may go away once we merge https://github.com/10gen/mms/pull/116288
Tried this out with the PR Simon has open in mms, and it worked nicely.
CLOUDP-286331
When we're on cloud, we listen for non-retryable errors on failed server heartbeats. These can happen when:
When we encounter one, we disconnect. This avoids polluting logs and metrics, and avoids endlessly retrying a connection we know will fail.
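A minimal TypeScript sketch of the listen-then-disconnect flow described above, not the actual Compass implementation. It assumes only that the client is a Node.js `EventEmitter` emitting `serverHeartbeatFailed` events whose payload carries a `failure: Error` field, as the driver's `ServerHeartbeatFailedEvent` does. The helper `watchForNonRetryableHeartbeatFailures`, the `isNonRetryable` predicate, and the `disconnect` callback are hypothetical names. The usage at the bottom emits the event by hand on a plain emitter, which is roughly how a unit test with a mocked client could drive this logic.

```typescript
import { EventEmitter } from "node:events";

// Assumed payload shape: mirrors the `failure` field of the driver's
// ServerHeartbeatFailedEvent; other fields are ignored here.
interface HeartbeatFailedLike {
  failure: Error;
}

// Hypothetical helper: listens for `serverHeartbeatFailed` and calls
// `disconnect` at most once, the first time `isNonRetryable` returns true.
// Returns an unsubscribe function for normal teardown.
function watchForNonRetryableHeartbeatFailures(
  client: EventEmitter,
  isNonRetryable: (failure: Error) => boolean,
  disconnect: (failure: Error) => void
): () => void {
  let disconnected = false;

  const onHeartbeatFailed = (event: HeartbeatFailedLike): void => {
    // Retryable failures are left alone so the driver keeps retrying.
    if (disconnected || !isNonRetryable(event.failure)) {
      return;
    }
    disconnected = true;
    // Stop listening so later heartbeat failures don't trigger more work.
    client.off("serverHeartbeatFailed", onHeartbeatFailed);
    disconnect(event.failure);
  };

  client.on("serverHeartbeatFailed", onHeartbeatFailed);
  return () => {
    client.off("serverHeartbeatFailed", onHeartbeatFailed);
  };
}

// Usage: emit events by hand on a plain emitter standing in for the client.
const fakeClient = new EventEmitter();
const disconnectReasons: string[] = [];

watchForNonRetryableHeartbeatFailures(
  fakeClient,
  // Hypothetical predicate; a real one would inspect the close code.
  (failure) => failure.message.startsWith("non-retryable"),
  (failure) => {
    disconnectReasons.push(failure.message);
  }
);

fakeClient.emit("serverHeartbeatFailed", { failure: new Error("timed out") });
fakeClient.emit("serverHeartbeatFailed", { failure: new Error("non-retryable: first") });
fakeClient.emit("serverHeartbeatFailed", { failure: new Error("non-retryable: second") });
// disconnectReasons is now ["non-retryable: first"]
```

Disconnecting at most once and removing the listener keeps a stream of repeated heartbeat failures from producing duplicate disconnects or extra log noise. The caller would presumably only wire this up for Atlas connections.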
When a user runs a command after we've disconnected they end up with errors like this:
We surface these errors ourselves. While users do see a message in the toast, I think we probably want to give something less cryptic there. For a bit more context, we also discussed fully disconnecting from the connection, which would close all of the user's open tabs, but that could cause them to lose work, which would be a frustrating UX. Discussion: https://mongodb.slack.com/archives/C069YM25L8N/p1735662529296989
This isn't an easily testable flow, as these are mostly internals of the data service and the MongoDB client; I'm curious whether folks think it's worth adding some mocks to facilitate tests for it. We'd need to simulate the `serverHeartbeatFailed` events on a MongoClient and pass that custom client to the data service.

Custom close codes in mms: https://github.com/10gen/mms/blob/24f5af9c5318a5da746ad328547591f376449dc0/server/src/main/com/xgen/cloud/services/clusterconnection/runtime/res/CustomCloseCodes.java#L5

These are returned on pings, which form the `serverHeartbeatFailed` events: https://github.com/10gen/mms/blob/24f5af9c5318a5da746ad328547591f376449dc0/server/src/main/com/xgen/cloud/services/clusterconnection/runtime/res/ClusterConnectionEndpoint.java#L92
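A hedged sketch of how the close-code check on a heartbeat failure could look. ASSUMPTION: the failure's message contains the WebSocket close code in the form `code: NNNN`; the real message format produced by the proxy may differ. The codes below are placeholders, not the values defined in `CustomCloseCodes.java`, and `extractCloseCode` / `isNonRetryableFailure` are hypothetical names.

```typescript
// Hypothetical helper. ASSUMPTION: the close code appears in the failure
// message as "code: NNNN"; the real proxy message format may differ.
function extractCloseCode(failure: Error): number | undefined {
  const match = /code:\s*(\d{4})\b/.exec(failure.message);
  return match ? Number(match[1]) : undefined;
}

// True only when a close code is present and is in the non-retryable set.
// Failures without a recognizable code are treated as retryable.
function isNonRetryableFailure(
  failure: Error,
  nonRetryableCodes: ReadonlySet<number>
): boolean {
  const code = extractCloseCode(failure);
  return code !== undefined && nonRetryableCodes.has(code);
}

// PLACEHOLDER values from the RFC 6455 private-use range (4000-4999);
// NOT the actual codes defined in CustomCloseCodes.java.
const exampleNonRetryableCodes: ReadonlySet<number> = new Set([4001, 4002]);

const policyClose = isNonRetryableFailure(
  new Error("socket closed, code: 4001, reason: example"),
  exampleNonRetryableCodes
); // true
const abnormalClose = isNonRetryableFailure(
  new Error("socket closed, code: 1006"),
  exampleNonRetryableCodes
); // false
const noCode = isNonRetryableFailure(
  new Error("connection timed out"),
  exampleNonRetryableCodes
); // false
```

Treating "no recognizable code" as retryable is the conservative default: an unexpected message format leaves the driver's normal retry behavior in place instead of disconnecting the user.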