
DRIVERS-2035 Use minimum RTT for CSOT maxTimeMS calculation instead of 90th percentile #1350

Merged
6 commits merged on Feb 16, 2023

Conversation

ShaneHarvey
Member

@ShaneHarvey ShaneHarvey commented Nov 18, 2022

https://jira.mongodb.org/browse/DRIVERS-2035

Please complete the following before merging:

Contributor

@benjirewis benjirewis left a comment


Thanks for making these changes!

A previous version of this spec used the 90th percentile RTT to short
circuit operations that might otherwise fail with a socket timeout.
We decided to change this logic to avoid canceling operations that may
have a high chance of succeeding and also removes a dependency on t-digest.
Contributor


Suggested change
have a high chance of succeeding and also removes a dependency on t-digest.
have a high chance of succeeding; this change also removes a dependency on t-digest.

@@ -39,7 +39,7 @@ tests:
failCommands: ["hello", "isMaster"]
Contributor


I have yet to sync these tests in the Go driver as we do not use the short-circuiting logic until we have at least 10 RTT samples. The tests do not allow enough time (~100s) to reach this sample count.

@matthewdale do you know why we have this 10-sample minimum in the first place? Will minimum RTT be significantly different before the first 10 samples than after? I imagine the answer is highly dependent on the network context, but is the potential downside of not waiting for 10 samples worth the complexity of implementation and testing?

Member Author


10 samples gives a 100-second window by default, which seems fine. I also do not agree with using 0 when there are fewer than 10 samples; I think we should use the RTT even with only 1 sample. Otherwise, the RTT + maxTimeMS logic will be disabled for the first 100 seconds after a server is (re)discovered. That will be difficult to test and confusing to reason about (both internally and externally).

In the end any heuristic we choose here is somewhat arbitrary. We should avoid adding complexity to these heuristics and keep them as simple as possible. That's why I'm suggesting a fixed window that's not based on time.

Contributor


The 10 sample minimum is an attempt to reduce the probability that early RTT outliers cause unnecessary operation skipping. Keep in mind that the "min RTT" logic currently in the Go driver was not added as part of CSOT, but was an optimization added with Go driver connection pool improvements. I decided to be conservative with the implementation because it was a novel and untested concept at the time and impacts all Go driver users.

We haven't encountered any known issues with that "min RTT" implementation, so it makes sense to be less conservative with the opt-in CSOT behavior. However, trusting the first RTT sample on startup seems dangerous and may lead to increased operation errors immediately after connecting, until more RTT samples are collected. I recommend we collect at least 2 samples before assuming it's an accurate representation of the actual RTT.

I don't have any stats on the impact of requiring 1 vs 2 RTT samples, only the observation that networks are unreliable and that any single RTT sample may have a nontrivial probability of being bogus. That may be especially true while an application is starting up.

As far as using a fixed number of samples (e.g. 10), that seems totally reasonable.

Member Author


Updated the spec to use a min of 2 and max of 10 samples. I also updated the tests to wait (heuristically) for >=2 samples.
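The short-circuit behavior under discussion might look something like the following (a minimal sketch, not the spec's pseudocode; the function and exception names are invented for illustration, and a min RTT of 0 here represents the "fewer than 2 samples collected" case where the short circuit stays disabled):

```python
class CSOTTimeoutError(Exception):
    """Raised when the remaining client-side timeout cannot cover the min RTT."""


def compute_max_time_ms(remaining_timeout_ms, min_rtt_ms):
    """Sketch of the CSOT maxTimeMS calculation using minimum RTT.

    If the time remaining in the client-side timeout is not enough to
    cover even the minimum observed round trip, fail fast rather than
    send an operation that cannot succeed. A min_rtt_ms of 0 (fewer
    than 2 RTT samples collected) effectively disables the short circuit.
    """
    max_time_ms = remaining_timeout_ms - min_rtt_ms
    if max_time_ms <= 0:
        raise CSOTTimeoutError(
            "remaining timeout %sms <= min RTT %sms"
            % (remaining_timeout_ms, min_rtt_ms))
    return max_time_ms
```

For example, with 100ms remaining and a 10ms minimum RTT, the operation is sent with maxTimeMS of 90; with only 10ms remaining and a 15ms minimum RTT, the client raises a timeout error without sending the operation.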

@@ -812,24 +810,25 @@ on a dedicated connection, for example:
 helloOk = stableApi != Null
 lock = Mutex()
 movingAverage = MovingAverage()
-rttDigest = TDigest() # for 90th percentile RTT calculation
+rttMin = MinWindow(max_window_size=10) # for min RTT calculation
Contributor


As discussed offline, I think we should document what MinWindow does. Based on context, I'm assuming it takes the minimum RTT value of at most the past max_window_size samples. That seems to be an ok definition to me, but see my other comment about calculating min when we have less than 10 samples available.

Member Author


Fixed. The pseudo-Python is now explicit about the behavior, using a deque: https://docs.python.org/3/library/collections.html#deque-objects
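For reference, the deque-based behavior described in this thread might be sketched as follows (an illustration, not the spec's exact pseudocode; the min-of-window semantics and the 2-sample minimum follow the discussion above, and the `min_samples` parameter is this sketch's own addition):

```python
from collections import deque


class MinWindow:
    """Tracks the minimum RTT over the most recent samples.

    Keeps at most max_window_size samples (a deque with maxlen evicts
    the oldest sample automatically) and reports the minimum of those,
    or 0.0 until at least min_samples have been collected, which keeps
    the maxTimeMS short-circuit logic disabled early on.
    """

    def __init__(self, max_window_size=10, min_samples=2):
        self.samples = deque(maxlen=max_window_size)
        self.min_samples = min_samples

    def add_sample(self, rtt):
        self.samples.append(rtt)

    def get(self):
        if len(self.samples) < self.min_samples:
            return 0.0  # not enough data; disable the short circuit
        return min(self.samples)
```

Using `deque(maxlen=...)` makes the sliding-window eviction explicit, which is the documentation improvement requested in the comment above.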

@ShaneHarvey ShaneHarvey marked this pull request as ready for review February 9, 2023 21:24
@ShaneHarvey ShaneHarvey requested review from a team as code owners February 9, 2023 21:24
@ShaneHarvey ShaneHarvey requested review from nbbeeken and matthewdale and removed request for a team February 9, 2023 21:24
@ShaneHarvey
Member Author

@matthewdale this is ready for another look. I've updated the spec to require at least 2 RTT samples before enabling the short-circuit logic and updated the tests to account for this.

Contributor

@matthewdale matthewdale left a comment


One question, but otherwise looks good 👍

-clients MUST use average and 90th percentile round trip times from the RTT
-task.
+clients MUST set the average and minimum round trip times from the RTT task as the
+"roundTripTime" and "minRoundTripTime" fields, respectively.
Contributor


This requirement has actually led to significant confusion when implementing features in the Go driver that depend on up-to-date avg/min RTT values. Using values for average and minimum RTT pulled from server descriptions can lead to difficult-to-diagnose bugs unless devs realize that the RTT values on server descriptions are never updated after the handshake. The scope for Go Driver 2.0 currently includes removing all values from server descriptions that are not derived directly from the connection/handshake process. See GODRIVER-2691 for more details.

Should we remove this requirement and the corresponding roundTripTime and minRoundTripTime fields from a ServerDescription?

P.S. I realize this is only updating the existing requirement, but this could be a good time to reconsider the need for these fields. I created DRIVERS-2552 to continue this discussion if this isn't a good forum.

Member Author


I don't agree, because even fields that are reported by the server in the hello command response can become stale: for example, hosts, isWritablePrimary, $clusterTime, operationTime, etc. In fact, almost every single field can become stale right after the server sends it.
