New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-21091 Fix smart join race condition causing crash #11981
Conversation
https://track.hpccsystems.com/browse/HPCC-21091 |
0d11b38
to
e90c2fc
Compare
@ghalliday - please review |
Need to update the check boxes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith code looks generally good - but found one problem
@ghalliday - please see fix for nextRhsToSpill issue. |
@jakesmith looks good. Please squash and fill in the checkboxes. |
@ghalliday - squashed and filled checkboxes. |
@richardkchapman should this be considered for 7.0.6? |
@ghalliday Yes I should think so, if we are confident in the fix |
@jakesmith I have checked through again, and I think it would be worth targeting 7.0.6 |
1) Fix problems when row arrays spilt under memory pressure. Code was thread unsafe – rows could be added and row array expanded whilst saving, and subsequent flushes of any remaining rows was not using spillable array correctly and could hit null rows and cause the crash reported by this JIRA. NB: this should resolve an old issue (HPCC_17331) that has some debugging code (still left in for now). 2) Improve choice of row array to spill. Used to check if any had any rows to process and spill in order, which meant if rows dripping in from different sources and memory pressure was trying to flush, would spill lots of small sets. Change to round robin through row arrays. 3) Move some of the code that cleans, compacts and prepares the row arrays after oom event trying to prepare global lookup, to an earlier common point. And before compacting ensure row array is force flushed to ensure relocated, in the event it has spilt. 4) Add/change a few comments Signed-off-by: Jake Smith <jake.smith@lexisnexisrisk.com>
f05c065
to
ea5e17f
Compare
@ghalliday - retargeted to candidate-7.0.6 |
Automated Smoketest: ✅ Unit tests result:
Regression test result:
HPCC Stop: OK
|
Code was thread unsafe – rows could be added and row array expanded
whilst saving, and subsequent flushes of any remaining rows was not using
spillable array correctly and could hit null rows and cause the crash
reported by this JIRA.
NB: this should resolve an old issue (HPCC_17331) that has some debugging
code (still left in for now).
Improve choice of row array to spill. Used to check if any had any rows
to process and spill in order, which meant if rows dripping in from
different sources and memory pressure was trying to flush, would spill lots
of small sets. Change to round robin through row arrays.
Move some of the code that cleans, compacts and prepares the row arrays
after oom event trying to prepare global lookup, to an earlier common point.
And before compacting ensure row array is force flushed to ensure relocated,
in the event it has spilt.
Add/change a few comments
Signed-off-by: Jake Smith jake.smith@lexisnexisrisk.com
Type of change:
Checklist:
Testing: