New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-20864 Avoid race condition fetching loop input dataset #11883
Conversation
If a child query activity asked for the entire global loop input dataset, a race could happen where the result on some slaves was not prepared at the time the result was requested, causing an assert to be hit. Fix is to move the synchronization point, until after the input is prepared, to do this required moving some of the code out of a couple of 'execute' helper functions. Signed-off-by: Jake Smith <jake.smith@lexisnexisrisk.com>
https://track.hpccsystems.com/browse/HPCC-20864 |
@ghalliday - please review |
Automated Smoketest: ✅ Unit tests result:
Regression test result:
HPCC Stop: OK
|
Will this have a performance impact? |
I would not expect any difference, it is only moving the synchronization point slightly later. |
if (sync(loopCounter)) | ||
break; | ||
queryContainer().queryLoopGraph()->execute(*this, (helper->getFlags() & IHThorGraphLoopArg::GLFcounter)?loopCounter:0, loopResults.get(), extractBuilder.size(), extractBuilder.getbytes()); | ||
|
||
boundGraph->queryGraph()->executeChild(extractBuilder.size(), extractBuilder.getbytes(), results, loopResults); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this would have called graph->setLoopCounter(counter), now it is setting a counter result. Was it previously wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be same as before.
It set the loop counter result via prepareCounterResult() and still does (line 243 above).
It used to call setLoopCounter inside the now removed IThorBoundLoopGraph::execute() method, now it is called as part of prepareCounterResult()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith a couple of questions
unsigned condLoopCounter = (helper->getFlags() & IHThorGraphLoopArg::GLFcounter) ? loopCounter : 0; | ||
IThorBoundLoopGraph *boundGraph = queryContainer().queryLoopGraph(); | ||
if (condLoopCounter) | ||
boundGraph->prepareCounterResult(*this, results, condLoopCounter, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the last parameter be a 2 to match process()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is as it was before but it was a bit confusing.
It is 2 in the LOOP case, and 0 in the GRAPH case.
And I'm hoping that regression suite tests are validating it is correct.
Having had a quick look at roxie code, I think 0 for GRAPH and 2 for LOOP is correct or at least same in Roxie. Can you verify?
The code above is for GRAPH and before this change it used to call one of the 2nd form of IThorBoundLoopGraph::execute which used to set the counter result as result 0.
@ghalliday - can you look at my answers please? |
If a child query activity asked for the entire global loop
input dataset, a race could happen where the result on some
slaves was not prepared at the time the result was requested,
causing an assert to be hit.
Fix is to move the synchronization point, until after the
input is prepared, to do this required moving some of the
code out of a couple of 'execute' helper functions.
Signed-off-by: Jake Smith jake.smith@lexisnexisrisk.com
Type of change:
Checklist:
Testing: