-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-19494 New IBYTI logic not working properly #11043
HPCC-19494 New IBYTI logic not working properly #11043
Conversation
https://track.hpccsystems.com/browse/HPCC-19494 |
04607fa
to
3aaa983
Compare
Code was assuming meaning in subChannels array that was not actually valid. Make the assumption valid by allocating subChannels in a more cyclic fashion. Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
a060db7
to
92e2189
Compare
roxie/ccd/ccd.hpp
Outdated
@@ -192,6 +193,12 @@ class RoxiePacketHeader | |||
return SUBCHANNEL_MASK << (SUBCHANNEL_BITS * subChannel); | |||
} | |||
|
|||
unsigned getRespondingSubChannel() const // NOTE - 0 based | |||
{ | |||
unsigned bitpos = ffs(retries); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to define ffs for windows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
roxie/ccd/ccdqueue.cpp
Outdated
* for any given channel. | ||
* | ||
* To determine which subchannel is the "primary" for a given query packet, a hash value of fields from the packet header | ||
* is used, modulo the number of subchannels for on this channel. The slave on this subchannel will respond immediately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo "for on"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
roxie/ccd/ccd.hpp
Outdated
@@ -96,6 +96,7 @@ extern unsigned myNodeIndex; | |||
|
|||
#define SUBCHANNEL_MASK 3 | |||
#define SUBCHANNEL_BITS 2 // allows for up to 7-way redundancy in a 16-bit short retries flag, high bits used for indicators/flags | |||
#define MAX_SUBCHANNEL 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment // (16-2) / SUBCHANNEL_BITS - or calculate
roxie/ccd/ccdqueue.cpp
Outdated
|
||
void init(unsigned channel) | ||
{ | ||
mySubChannel = subChannels[channel]-1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do with comments on the channels/subChannels variables to indicate exactly what they mean.
roxie/ccd/ccdqueue.cpp
Outdated
return (liveSubChannels[channel] & mask) == mask; | ||
} | ||
public: | ||
unsigned getIbytiDelay(unsigned subChannel) const // NOTE - zero-based |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clearer if the parameter was called primarySubChannel, same for the function below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
roxie/ccd/ccdqueue.cpp
Outdated
channels[channelCount++] = channel; | ||
} | ||
for (int i = 0; i < channelCount; i++) | ||
subChannelInfo[i].init(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subChannelInfo[channels[i]].init(channels[i]);
or merge into the previous loop
roxie/ccd/ccdqueue.cpp
Outdated
while (primarySubChannel != mySubChannel) | ||
{ | ||
unsigned channelMask = SUBCHANNEL_MASK << (SUBCHANNEL_BITS * primarySubChannel); | ||
if ((h.retries & ROXIE_RETRIES_MASK) == channelMask) | ||
if (primarySubChannel == theirSubChannel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still trying to understand this change...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was supposed to make this code clearer! Net effect will be the same.
Automated Smoketest: ✅ Unit tests result:
Regression test result:
HPCC Stop: OK
|
@mckellyln looks good to me and ready for testing. |
Reduce IBYTI delays more gradually - treating a single timed-out IBYTI as indicative of a dead node seems to cause far more failed IBYTIs than prior logic did. This version should be cleaner and thus easier to understand and fix in the future, should cope with more than 2 subchannels, and should behave identically to the pre-6.4.12 code when there are only 2 subchannels. Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
a82238e
to
c002832
Compare
Code was assuming meaning in subChannels array that was not actually valid.
Make the assumption valid by allocating subChannels in a more cyclic fashion.
Signed-off-by: Richard Chapman rchapman@hpccsystems.com
Type of change:
Checklist:
Testing:
Debugged through and observed values and code routes being more what I was expecting.
Note that I don't know if this is the ONLY issue with the new IBYTI code, nor does it explain the memory usage issue which suggests that a Roxie with a high rate of ineffective IBYTI is leaking - this needs to be investigated elsewhere.