New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SERVER-30283 Update topology_coordinator's .h and .cpp files #1196
Conversation
PingStats, defined in topology_coordinator.cpp, is a data structure used to track statistics about replication heartbeat messages. It has a variable called _numFailuresSinceLastStart, which keeps track of failed heartbeat attempts. When a good heartbeat response comes back, this variable is set to std::numeric_limits<int>max(), presumably as an easy way for this value to always be larger than others in certain comparisons. In the case that we actually get back a good hearbeat response (PingStats::hit) and then a failed heartbeat response from the same node (PingStats::miss), we will actually overflow this _numFailuresSinceLastStart variable. The integer max function should not be used to determine if a good heartbeat response was received. To remediate the issue, a field named _goodHeartbeatReceived was added to PingStats to indicate whether a good heartbeat response (PingStats::hit) was received since the last start (PingStats::start). The assignment statement responsible for the potential overflow is then replaced by a new assignment statement that sets _goodHeartbeatReceived to true when a good heartbeat response comes back. Resolves: SERVER-30283
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As part of this patch, you should also consider adding a small C++ unit test in db/repl/topology_coordinator_test.cpp
to exercise the methods of PingStats
.
Thanks for taking on this ticket!
* heartbeat has been received since the last call to start(). | ||
*/ | ||
bool noMoreRetries() const { | ||
return _numFailuresSinceLastStart > 2 || _goodHeartbeatReceived == true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The max number of heartbeat failures should not be hard-coded. You should use the kMaxHeartbeatRetries
constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, to make sure there is no confusion, it might be prudent to surround each condition in parentheses.
* exceeded the max number of heartbeat retries or whether a good | ||
* heartbeat has been received since the last call to start(). | ||
*/ | ||
bool noMoreRetries() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name hasFailed
might be a clearer name for this method.
if (hbStats.getNumFailuresSinceLastStart() <= kMaxHeartbeatRetries && | ||
alreadyElapsed < _rsConfig.getHeartbeatTimeoutPeriod()) { | ||
if ((hbStats.getNumFailuresSinceLastStart() <= kMaxHeartbeatRetries) && | ||
(hbStats.getGoodHeartbeatReceived() == false) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we replace these two condition checks with one call to noMoreRetries
? If so, it looks like this would also make the getGoodHeartbeatReceived
method obsolete, since it is only used here.
if (!_rsConfig.isInitialized() || | ||
(hbStats.getNumFailuresSinceLastStart() > kMaxHeartbeatRetries) || | ||
(alreadyElapsed >= _rsConfig.getHeartbeatTimeoutPeriodMillis())) { | ||
if (!_rsConfig.isInitialized() || (hbStats.noMoreRetries() == true) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are checking if an expression X
is true, it is redundant to write if(X == true)
, you can just write if(X)
. You can apply this to the hbStats.noMoreRetries()
condition.
if (!_rsConfig.isInitialized() || | ||
(hbStats.getNumFailuresSinceLastStart() > kMaxHeartbeatRetries) || | ||
(alreadyElapsed >= _rsConfig.getHeartbeatTimeoutPeriodMillis())) { | ||
if (!_rsConfig.isInitialized() || (hbStats.noMoreRetries() == true) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same applies here as on line 911.
@@ -1093,8 +1096,9 @@ HeartbeatResponseAction TopologyCoordinator::processHeartbeatResponse( | |||
if (!hbResponse.isOK()) { | |||
if (isUnauthorized) { | |||
hbData.setAuthIssue(now); | |||
} else if (hbStats.getNumFailuresSinceLastStart() > kMaxHeartbeatRetries || | |||
alreadyElapsed >= _rsConfig.getHeartbeatTimeoutPeriod()) { | |||
} else if ((hbStats.noMoreRetries() == true) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same applies here as on line 911.
*/ | ||
int getNumFailuresSinceLastStart() const { | ||
return _numFailuresSinceLastStart; | ||
} | ||
|
||
/** | ||
* Gets a flag that indicates whether the number of failed heartbeats has |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When describing a function like this, it's a bit redundant to say "Gets a flag...". The fact that a function returns a boolean should be enough information about the nature of the return value. In a comment like this, it can be clearer to just concisely describe the case when the function returns true
, and when the function returns false
.
} | ||
|
||
/** | ||
* Gets a flag that indicates whether a good heartbeat was received |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same applies here as on line 1054.
Sounds good. Once you add the unit test feel free to push the new changes and I will take another look at them. The best guide for how to add a unit test is probably to look at an example i.e. |
Hi will62794, After building Typically, unit tests only test the public interface, so are we allowed to test private members of a class in this context? If not, then we can still have the peace of mind that there are multiple tests in I will push my other changes for now and await your decision on the unit test part. For documentation purposes, here's what my (unfinished) unit test would look like:
|
My new changes have now been pushed. |
Hi cramaechi, You are right, I overlooked the fact that PingStats is private to Also, it looks like your latest changes include many other unrelated changes, which makes it difficult to do another review pass. Did you perhaps rebase your branch against master and then push? That may be causing the noise. If you could try to clean that up and push your changes again, then I will take another look. Thanks! |
91af117
to
32e7618
Compare
Hi will62794, Sorry for the trouble. I believe I have resolved the conflict. A new pull request has been opened. |
} | ||
|
||
private: | ||
static constexpr Milliseconds UninitializedPing{-1}; | ||
|
||
// Maximum number of retries for a failed heartbeat. | ||
int _kMaxHeartbeatRetries = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a constant in the header file, instead of a member variable of PingStats
.
Change has been made. |
Hi cramaechi, This patch looks good, thanks! I have compiled it and run the appropriate tests and everything seems to look fine. I will let you know when I get this merged. Thank you for your work! Will |
Hi will62794, No problem. I look forward to contributing more! Best Regards, |
Your changes have been merged at 6e3b0d! |
Great. Thank you! |
Closes mongodb#1196 Signed-off-by: William Schultz <william.schultz@mongodb.com>
PingStats, defined in topology_coordinator.cpp, is a data structure used
to track statistics about replication heartbeat messages. It has a
variable called _numFailuresSinceLastStart, which keeps track of failed
heartbeat attempts. When a good heartbeat response comes back, this
variable is set to std::numeric_limitsmax(), presumably as an easy
way for this value to always be larger than others in certain
comparisons. In the case that we actually get back a good hearbeat
response (PingStats::hit) and then a failed heartbeat response from the
same node (PingStats::miss), we will actually overflow this
_numFailuresSinceLastStart variable. The integer max function should
not be used to determine if a good heartbeat response was received.
To remediate the issue, a field named _goodHeartbeatReceived was
added to PingStats to indicate whether a good heartbeat response
(PingStats::hit) was received since the last start
(PingStats::start). The assignment statement responsible for the
potential overflow is then replaced by a new assignment statement
that sets _goodHeartbeatReceived to true when a good heartbeat
response comes back.
Resolves: SERVER-30283