New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-19906 Fix race during finalization of spilling container. #11320
HPCC-19906 Fix race during finalization of spilling container. #11320
Conversation
https://track.hpccsystems.com/browse/HPCC-19906 |
@ghalliday - the only functional change + the fix is in the last commit, the other commits are there to improve the tracing and to tidy up some virtuals + add overrides. Please review and let me know when to sqaush. |
@jakesmith the changes look sensible. Any idea why the thor loop tests are failing? |
No, I will investigate. |
@ghalliday - fixed the cause of the loop failures, the locking scope wasn't long enough in some cases (and there used to be a crit block in getStream). Please review |
thorlcr/thorutil/thmem.cpp
Outdated
totalRows += spillableRows.numCommitted(); | ||
if (iCompare) | ||
{ | ||
// Option(rcflag_noAllInMemSort) - avoid sorting allMemRows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not connected with this PR, but I don't think this option is ever used (I was trying to work out why it would be used...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I can see it's not. And although not directly related, I am going to remove associated code now.
@@ -165,6 +165,7 @@ class CSpillable : public CSimpleInterfaceOf<roxiemem::IBufferedRowCallback> | |||
unsigned spillPriority = SPILL_PRIORITY_DISABLE; | |||
IThorRowInterfaces *rowIf = nullptr; | |||
roxiemem::IRowManager *rowManager = nullptr; | |||
StringAttr tracingPrefix; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more efficient if this didn't involve a heap allocation. I suspect it is a tiny proportion of the time spent in these classes, but I am slightly concerned about the overhead for small numbers of rows in child queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree it's a concern, but it's more that if these objects in general are recreated on the heap repeatedly for small sets in child queries, then it is wasteful/inefficient.
I've tried in the past to avoid that, but there is likely plenty of scope to look for further optimizations to minimize reallocation of reusable utility objects like this.
i.e. I think the approach should be, if possible, create a e.g. CSpillableStream object once and reuse it on next iteration of the activity in a child query, in which case the overhead of this StringAttr allocation for example becomes a 1 time overhead and trivial.
@jakesmith I am not sure I would spot any big problems, but the logic looks sensible. A couple of minor comments, but could merge as-is. |
@ghalliday - added commit to remove RowCollectorOptionFlags / rcflag_noAllInMemSort Let me know and I'll squash the commits. |
@AttilaVamos - there are no changes to hthor or any base classes/code that hthor uses in this PR, so the failures would appear to be unrelated. Is there another merged PR implicated do you know? |
@ghalliday - anything recently merged that may be causing these mismatches? |
I don't know any. |
I run your PR on my Ubuntu based smoketest. Moreover and got and build latest 6.4.22 and execute regression test on it to check it is healthy. |
@ghalliday - smoketest issues were my fault, last commit caused it, now rectified. Please review/let me know when to squash. |
@jakesmith I think it looks ok - please squash |
Signed-off-by: Jake Smith <jake.smith@lexisnexisrisk.com>
@ghalliday - squashed. |
Automated Smoketest: ✅ Unit tests result:
Regression test result:
HPCC Stop: OK
|
Type of change:
Checklist:
Testing:
Manually reproduced race-condition type events using gdb to halt threads.