Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPEC-1555 Consider connection pool health during server selection #876
SPEC-1555 Consider connection pool health during server selection #876
Changes from 6 commits
5c71cda
a0a2f8d
e8ef656
2a824d0
ecdd538
1c4c4ad
92e9f3d
488fa8b
4e6a6f3
6aa42d0
c833255
3236ee5
1945eb9
8df3608
0caf78a
74c6299
960131a
73be550
e88135f
59772b8
af823a6
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be
<=
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thank you for catching this! This would've been a pretty bad error to leave in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the "algorithm" is much simpler, is it still worth it to have all drivers implement these unit tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's still worthwhile. I very much prefer these tests to no tests. That said, do you have anything in mind for real end-to-end tests for this feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're limited a bit by the topology. We could include some of those experiments I did as prose tests--for example, describe a test against a two mongos sharded topology where one of the mongoses has a failpoint that makes every operation take 500ms, then do a ton of concurrent stuff and then assert that the non-failpoint node got picked a lot more. While this does verify the operationCount behavior, I'm not sure whether this is preferable to having a bunch of unit spec tests, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please open a jira ticket so we can further discuss adding that prose test. I'm in favor of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I just go ahead and add it now? I think it'll be most useful if drivers can have the test case when they're implementing this for the first time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let's just add it now. I doubt the unified test format is expressive enough for a test like this so it'll need to be a prose test like you said. You can use the blockConnection option for failCommand like this:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. When you run it, can you let me know what you get for % of operations routed to the slow node? Curious to see if theres any differences across drivers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented. I ran it a few times and see:
Exciting to see this feature in action!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get around 12-15% too, so it seems like our implementations are consistent, nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test should also configure the read preference used for server selection. I suggest adding a
read_preference
field to be consistent with the existing tests.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we discussed this in the meeting, but since this test is verifying what happens after the set of available servers is determined and the topologies will always be sharded, a read preference shouldn't be necessary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm maybe my test runner implementation is different. In mine the read preference does actually have an impact. Right now I'm arbitrarily using Nearest. My implementation is roughly:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, yeah that makes sense. In my implementation I skipped right to the in window portion, so I didn't need a TopologyDescription or a read preference. I imagine most drivers will probably implement your way though.
Rather than including a read preference in each spec test, I just updated the runner's description to say use a default or "primary" read preference since it shouldn't matter for sharded topologies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still feels a little odd that we have no explicit tests for replica sets. Perhaps the prose test can cover it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The server selection logic tests already ensure that drivers properly determine what is in the latency window across different topology types, and these tests ensure that a server is selected properly from what's within the latency window. Putting those together should provide us full coverage, so I don't think it's necessary to test all topologies in these tests.
Given that selecting from within the window is topology agnostic, the topology is provided in these tests purely as a convenience for implementing the test runners; it doesn't really have any bearing on the results of the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand It's not unreasonable for a driver to have two code paths for server selection and adding one test for a replica set is trivial, so why not add a single replica set test? The test format supports it as long as we say to use Nearest read preference instead of Primary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main hesitation for adding it was that if we're treating the provided topology as more than just a convenience (i.e. as part of the tests' coverage), then drivers would be required to use it to implement the tests, even though it isn't necessary since the feature we're testing should ideally be topology agnostic. That said, if we think there could be separate implementations for selection based on topology, then it's probably worth having full coverage of that. Updated to include a few replset tests.