-
Notifications
You must be signed in to change notification settings - Fork 883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GODRIVER-1898 SDAM error handling changes for LB mode #611
GODRIVER-1898 SDAM error handling changes for LB mode #611
Conversation
a2b058a
to
d7f0cb3
Compare
d7f0cb3
to
5018113
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Frankly there are some code changes here that really go over my head, but I think that's only due to my poor understanding of connection pooling. I'll probably ask some questions offline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done, the thoughtful comments and description were really helpful. I have a few questions to check my own understanding, but code LGTM.
// Ignore the error if the server is behind a load balancer but the server ID is unknown. This indicates that the | ||
// error happened when dialing the connection or during the MongoDB handshake, so we don't know the server ID to use | ||
// for clearing the pool. | ||
if err == nil || s.cfg.loadBalanced && serverID == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, it should be impossible for serverID
to be primitive.NilObjectID
when in load balanced mode. In load balanced mode, any handshake that omits a serverID
in the reply will result in an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there's a distinction between nil
and primitive.NilObjectID
. The former is a nil
pointer and the latter is a non-nil pointer that contains 12 0
bytes. In LB mode, the former is possible when we get here if we encountered an error before the initial hello
succeeded. The latter is never possible because 12 0 bytes is not a valid ObjectId (at least not right now because the timestamp field would be non-zero).
// setGenerationNumber sets the connection's generation number if a callback has been provided to do so in connection | ||
// configuration. | ||
func (c *connection) setGenerationNumber() { | ||
if c.config.getGenerationFn != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would getGenerationFn
ever be nil
? The serverID => (generation, count)
map is used to determine generation regardless of whether we are in load balanced mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For unit tests, it can be. I verified that removing the line in pool.go
that sets this function causes some of the new spec tests to fail though.
@@ -288,7 +294,7 @@ func (s *Server) ProcessHandshakeError(err error, startingGenerationNumber uint6 | |||
// the description.Server appropriately. The description should not have a TopologyVersion because the staleness | |||
// checking logic above has already determined that this description is not stale. | |||
s.updateDescription(description.NewServerFromError(s.address, wrappedConnErr, nil)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To check, would updating the server description to unknown make it ineligible for server selection (and similarly in ProcessError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec actually prohibits us from marking the server Unknown to ensure that the LB is always selectable. As part of a previous PR in this epic, I changed updateDescription to be a no-op when the server type is LoadBalancer
. I felt this was a cleaner approach than checking every invocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, that makes sense. Thanks for the explanations!
Co-authored-by: Kevin Albertson <kevin.albertson@10gen.com>
Summary of changes:
uint64
, use a map fromserverID
(which is an ObjectID) to a tuple of(generation number, connection count)
. In this new map type,primitive.NilObjectID
is used to store the generation and connection count for servers that do not return aserverId
field inismaster
responses (i.e. servers in non-load balanced deployments).ServerID
field toPoolCleared
events.