HPCC-19494 New IBYTI logic not working properly #11043

Merged: 2 commits, Apr 19, 2018


richardkchapman
Member

Code was assuming meaning in subChannels array that was not actually valid.
Make the assumption valid by allocating subChannels in a more cyclic fashion.

Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Testing:

Debugged through and observed that the values and code paths were closer to what I was expecting.

Note that I don't know if this is the ONLY issue with the new IBYTI code, nor does it explain the memory usage issue which suggests that a Roxie with a high rate of ineffective IBYTI is leaking - this needs to be investigated elsewhere.


richardkchapman force-pushed the ibyti-fix branch 3 times, most recently from 04607fa to 3aaa983 (April 18, 2018 20:03)
Code was assuming meaning in subChannels array that was not actually valid.
Make the assumption valid by allocating subChannels in a more cyclic fashion.

Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
@@ -192,6 +193,12 @@ class RoxiePacketHeader
        return SUBCHANNEL_MASK << (SUBCHANNEL_BITS * subChannel);
    }

    unsigned getRespondingSubChannel() const // NOTE - 0 based
    {
        unsigned bitpos = ffs(retries);
Member

Need to define ffs for windows

Member Author

Changed
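
For readers following the `ffs` thread: `ffs` is a POSIX function and is not provided by MSVC, so a fallback is needed on Windows. The sketch below is one possible portable stand-in, not the definition that was actually added in this PR. The names `portableFfs` and `respondingSubChannel` are hypothetical, and the mapping from bit position to subchannel is an assumption based on the 2-bit-per-subchannel layout in the diff.

```cpp
#include <cstdint>

// Portable stand-in for POSIX ffs(): returns the 1-based position of the
// least significant set bit, or 0 when no bit is set.
// NOTE: portableFfs is a hypothetical name, not taken from the PR.
inline unsigned portableFfs(std::uint32_t value)
{
    if (value == 0)
        return 0;
    unsigned pos = 1;
    while ((value & 1u) == 0)
    {
        value >>= 1;
        ++pos;
    }
    return pos;
}

constexpr unsigned SUBCHANNEL_BITS = 2; // value from the diff

// ASSUMPTION: the lowest set bit of the retries field lies inside the
// 2-bit field of the responding subchannel, so dividing the 0-based bit
// index by SUBCHANNEL_BITS yields a 0-based subchannel number.
inline unsigned respondingSubChannel(std::uint16_t retries)
{
    unsigned bitpos = portableFfs(retries);
    return bitpos ? (bitpos - 1) / SUBCHANNEL_BITS : 0;
}
```

On MSVC the loop could be replaced by the `_BitScanForward` intrinsic; the loop form is used here only to keep the sketch compiler-independent.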

* for any given channel.
*
* To determine which subchannel is the "primary" for a given query packet, a hash value of fields from the packet header
* is used, modulo the number of subchannels for on this channel. The slave on this subchannel will respond immediately.
Member

typo "for on"

Member Author

fixed
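
The primary-selection rule described in the code comment above (hash of header fields, modulo the number of subchannels) can be sketched as follows. This is an illustration only: the `PacketKey` fields, the FNV-1a-style mixing, and the `cyclicDistance` helper are all assumptions, since the PR excerpt does not show which header fields are hashed or how non-primary slaves order themselves.

```cpp
#include <cstdint>

// HYPOTHETICAL subset of packet header fields; the real fields hashed by
// Roxie are not shown in this excerpt.
struct PacketKey
{
    std::uint32_t uid;
    std::uint32_t activityId;
};

// FNV-1a-style mixing over 32-bit words (an assumed hash, chosen only
// because it is simple and deterministic).
inline std::uint32_t hashKey(const PacketKey &key)
{
    std::uint32_t h = 2166136261u;
    h = (h ^ key.uid) * 16777619u;
    h = (h ^ key.activityId) * 16777619u;
    return h;
}

// 0-based subchannel that should respond immediately for this packet.
inline unsigned primarySubChannel(const PacketKey &key, unsigned numSubChannels)
{
    return hashKey(key) % numSubChannels;
}

// ASSUMPTION: other slaves rank themselves by cyclic distance from the
// primary, so distance 0 means "respond immediately".
inline unsigned cyclicDistance(unsigned mySubChannel, unsigned primary, unsigned numSubChannels)
{
    return (mySubChannel + numSubChannels - primary) % numSubChannels;
}
```

Because every slave on the channel computes the same hash from the same header, they all agree on the primary without exchanging any messages.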

@@ -96,6 +96,7 @@ extern unsigned myNodeIndex;

#define SUBCHANNEL_MASK 3
#define SUBCHANNEL_BITS 2 // allows for up to 7-way redundancy in a 16-bit short retries flag, high bits used for indicators/flags
#define MAX_SUBCHANNEL 7
Member

Add comment // (16-2) / SUBCHANNEL_BITS - or calculate
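
A sketch of the "or calculate" option suggested above, deriving the limit from the field widths instead of hard-coding 7. `RETRIES_FIELD_BITS` and `RETRIES_FLAG_BITS` are hypothetical names; the value 2 for the flag bits is inferred from the reviewer's "(16-2)" and is an assumption about the header layout.

```cpp
// Widths of the retries field. SUBCHANNEL_BITS comes from the diff; the
// other two names are HYPOTHETICAL and their values are inferred from
// the "(16-2) / SUBCHANNEL_BITS" review comment.
constexpr unsigned RETRIES_FIELD_BITS = 16; // 16-bit short retries flag
constexpr unsigned RETRIES_FLAG_BITS = 2;   // assumed high bits reserved for indicators/flags
constexpr unsigned SUBCHANNEL_BITS = 2;

// Derived values, so they cannot drift out of sync with the widths above.
constexpr unsigned SUBCHANNEL_MASK = (1u << SUBCHANNEL_BITS) - 1;
constexpr unsigned MAX_SUBCHANNEL = (RETRIES_FIELD_BITS - RETRIES_FLAG_BITS) / SUBCHANNEL_BITS;

static_assert(SUBCHANNEL_MASK == 3, "mask should cover SUBCHANNEL_BITS bits");
static_assert(MAX_SUBCHANNEL == 7, "14 usable bits / 2 bits per subchannel");
```

The `static_assert` lines document the expected values and fail the build if someone changes one width without revisiting the others.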


    void init(unsigned channel)
    {
        mySubChannel = subChannels[channel]-1;
Member

Could do with comments on the channels/subChannels variables to indicate exactly what they mean.

        return (liveSubChannels[channel] & mask) == mask;
    }
public:
    unsigned getIbytiDelay(unsigned subChannel) const // NOTE - zero-based
Member

Clearer if the parameter was called primarySubChannel, same for the function below.

Member Author

agree

        channels[channelCount++] = channel;
    }
    for (int i = 0; i < channelCount; i++)
        subChannelInfo[i].init(i);
Member

subChannelInfo[channels[i]].init(channels[i]);

or merge into the previous loop
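
To illustrate the point of this comment: when `channels[]` holds channel numbers rather than consecutive indices, looping with `i` initialises the wrong slots. A minimal sketch, assuming the per-channel table is indexed by channel number. The `SubChannelInfo` struct and `initByChannel` helper here are simplified, hypothetical stand-ins for the real classes.

```cpp
#include <vector>

// Simplified, HYPOTHETICAL stand-in for the per-channel state in the PR.
struct SubChannelInfo
{
    unsigned channel = 0;
    bool initialised = false;
    void init(unsigned c)
    {
        channel = c;
        initialised = true;
    }
};

// Initialise the entry belonging to each channel this node listens on.
// ASSUMPTION: info is indexed by channel number and is large enough to
// hold every channel listed in channels.
inline void initByChannel(const std::vector<unsigned> &channels,
                          std::vector<SubChannelInfo> &info)
{
    for (unsigned channel : channels)
        info[channel].init(channel); // index by channel number, not loop counter
}
```

With channels {2, 5}, a loop that called `info[i].init(i)` for i = 0, 1 would leave entries 2 and 5 untouched, which is the kind of mismatch the suggested change avoids.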

while (primarySubChannel != mySubChannel)
{
unsigned channelMask = SUBCHANNEL_MASK << (SUBCHANNEL_BITS * primarySubChannel);
if ((h.retries & ROXIE_RETRIES_MASK) == channelMask)
if (primarySubChannel == theirSubChannel)
Member

still trying to understand this change...

Member Author

It was supposed to make this code clearer! Net effect will be the same.

@HPCCSmoketest
Contributor

Automated Smoketest: ✅
OS: centos 7.2.1511 (Linux 3.10.0-327.28.3.el7.x86_64)
Sha: a82238e
Build: success
Install hpccsystems-platform-community_6.4.16-rc1.el7.x86_64.rpm
HPCC Start: OK

Unit tests result:

Test                   total  passed  failed  errors  timeout
unittest                  97      97       0       0        0
wutoolTest(Dali)          19      19       0       0        0
wutoolTest(Cassandra)     19      19       0       0        0

Regression test result:

phase          total  pass  fail
setup (hthor)     11    11     0
setup (thor)      11    11     0
setup (roxie)     11    11     0
test (hthor)     734   734     0
test (thor)      629   629     0
test (roxie)     762   762     0

HPCC Stop: OK
HPCC Uninstall: OK
Time stats:

Prep time:     5 sec (00:00:05)
Build time:    163 sec (00:02:43)
Package time:  37 sec (00:00:37)
Install time:  6 sec (00:00:06)
Start time:    36 sec (00:00:36)
Test time:     1070 sec (00:17:50)
Stop time:     33 sec (00:00:33)
Summary:       1350 sec (00:22:30)

@ghalliday
Member

@mckellyln looks good to me and ready for testing.

Reduce IBYTI delays more gradually - treating a single timed-out IBYTI as
indicative of a dead node seems to cause far more failed IBYTIs than prior
logic did.

This version should be cleaner and thus easier to understand and fix in the
future, should cope with more than 2 subchannels, and should behave
identically to the pre-6.4.12 code when there are only 2 subchannels.

Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
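
As a rough illustration of "reducing delays more gradually": instead of treating one timed-out IBYTI as proof that the peer is dead, the wait can be shortened step by step and restored once an IBYTI arrives. The sketch below halves the delay on each timeout down to a floor. The halving policy, the `IbytiDelay` name, and its members are assumptions for illustration, not the code from this commit.

```cpp
#include <algorithm>

// HYPOTHETICAL tracker for how long to wait for a peer's IBYTI before
// starting work ourselves. Units are arbitrary (e.g. milliseconds).
struct IbytiDelay
{
    unsigned initial;
    unsigned minimum;
    unsigned current;

    IbytiDelay(unsigned initialDelay, unsigned minimumDelay)
        : initial(initialDelay), minimum(minimumDelay), current(initialDelay) {}

    // ASSUMED policy: each timeout halves the delay, never below the floor,
    // so a single slow response costs little on later packets.
    void noteTimedOut() { current = std::max(minimum, current / 2); }

    // An IBYTI arrived in time: the peer is alive, restore the full delay.
    void noteReceived() { current = initial; }
};
```

For example, starting from 100 with a floor of 10, successive timeouts give 50, 25, 12 and then 10, and a received IBYTI resets the delay to 100.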
ghalliday merged commit 03daf41 into hpcc-systems:candidate-6.4.16 on Apr 19, 2018
richardkchapman deleted the ibyti-fix branch on December 18, 2018 08:57