HPCC-18895 Roxie disk read to use new field filters #10741
https://track.hpccsystems.com/browse/HPCC-18895
@ghalliday Please review
Branch updated from d72e29d to 2ce77ae.
A few comments.
roxie/ccd/ccdcontext.cpp
if (!workUnit)
return factory->queryOnceResultStore();
// fall into...
if (!workUnit)
whitespace: strange indentation.
roxie/ccd/ccdkey.hpp
* IDirectReader (this remains TBD at this point)
*
*/
interface IDirectReaderEx : extends ISerialStream, extends ISimpleReadStream
This looks a bit ugly. Can we derive ISerialStream from ISimpleReadStream instead? It might simplify some other code, and should be trivial to implement.
Possibly, though I fear it would open other cans of worms, and it might be better addressed separately from this PR.
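A minimal sketch of the suggested single-chain hierarchy, for illustration only. The interfaces below are hypothetical, heavily simplified stand-ins: the members `read`, `tell` and `isKeyed`, the class `MemoryReader` and the helper `drain` are invented here, and the real jlib interfaces have different members. It only shows the shape: if `ISerialStream` derived from `ISimpleReadStream`, `IDirectReaderEx` would need one base instead of two, and code taking an `ISimpleReadStream &` would accept any reader in the chain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins; only the inheritance shape matters here.
struct ISimpleReadStream {
    virtual ~ISimpleReadStream() = default;
    // Copy up to max bytes into dst and return how many were copied (0 at end).
    virtual size_t read(size_t max, void *dst) = 0;
};

// Suggested shape: the serial stream is itself a simple read stream...
struct ISerialStream : ISimpleReadStream {
    virtual size_t tell() const = 0;  // hypothetical extra member
};

// ...so the reader interface needs only a single base.
struct IDirectReaderEx : ISerialStream {
    virtual bool isKeyed() const = 0;  // hypothetical extra member
};

// Trivial in-memory reader used to exercise the hierarchy.
class MemoryReader final : public IDirectReaderEx {
public:
    explicit MemoryReader(std::string data) : data_(std::move(data)) {}
    size_t read(size_t max, void *dst) override {
        size_t n = std::min(max, data_.size() - pos_);
        std::memcpy(dst, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }
    size_t tell() const override { return pos_; }
    bool isKeyed() const override { return false; }
private:
    std::string data_;
    size_t pos_ = 0;
};

// Code written against the simplest interface accepts any reader in the chain.
size_t drain(ISimpleReadStream &in) {
    char buf[4];
    size_t total = 0;
    size_t got;
    while ((got = in.read(sizeof(buf), buf)) != 0)
        total += got;
    return total;
}
```

Whether the real interfaces can be rearranged this way without the "cans of worms" mentioned above is exactly what this sketch cannot show.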
* matching rows. Translated rows are returned.
*
*/
interface IDirectReader : extends IThorDiskCallback
Strange that IDirectReaderEx isn't an extension of IDirectReader. That is confusing.
It should probably be renamed. IDirectReader is not a great name either...
@@ -575,7 +575,7 @@ size32_t RtlRecord::calculateOffset(const void *_row, unsigned field) const
 unsigned numOffsets = getNumVarFields() + 1;
 size_t * variableOffsets = (size_t *)alloca(numOffsets * sizeof(size_t));
 RtlRow sourceRow(*this, nullptr, numOffsets, variableOffsets);
-sourceRow.setRow(_row, field);
+sourceRow.setRow(_row, field+1);
Is this +1 required if only accessing the offset of the field?
It's required if calling getOffset (which will assert without it). But perhaps the assert is wrong?
(I added the +1 because I hit the assert - but I don't recall the exact details about whether I was trying to get the offset or the size at the time.)
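A minimal model of the off-by-one being discussed. It is a hypothetical class, not `RtlRow`, and it assumes that walking N fields yields the start offsets of fields 0 through N. Under that assumption the offset of field N needs only N fields walked, while its size also needs the start of field N+1, so N+1 fields. If the real `getOffset` asserts `field < numFieldsUsed`, that would be stricter than this model requires, which is one way the +1 could be needed for an offset-only lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model, not RtlRow: field sizes are only known by walking the row.
class RowOffsets {
public:
    explicit RowOffsets(std::vector<size_t> fieldSizes) : sizes_(std::move(fieldSizes)) {}

    // Walk the first numFieldsUsed fields. offsets_[k] is the start of field k,
    // so afterwards offsets_[0..numFieldsUsed] are all known.
    void setRow(unsigned numFieldsUsed) {
        assert(numFieldsUsed <= sizes_.size());
        offsets_.assign(numFieldsUsed + 1, 0);
        for (unsigned i = 0; i < numFieldsUsed; i++)
            offsets_[i + 1] = offsets_[i] + sizes_[i];
        numFieldsUsed_ = numFieldsUsed;
    }

    // The start of a field depends only on the fields before it.
    size_t getOffset(unsigned field) const {
        assert(field <= numFieldsUsed_);
        return offsets_[field];
    }

    // The size of a field also needs the start of the following field.
    size_t getSize(unsigned field) const {
        assert(field < numFieldsUsed_);
        return offsets_[field + 1] - offsets_[field];
    }

private:
    std::vector<size_t> sizes_;
    std::vector<size_t> offsets_{0};  // field 0 always starts at offset 0
    unsigned numFieldsUsed_ = 0;
};
```

For example, with field sizes `{4, 10, 2}`, `setRow(1)` is enough for `getOffset(1)`, but `getSize(1)` needs `setRow(2)`.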
assert(row);
if (_numFieldsUsed > numFieldsUsed)
{
info.calcRowOffsets(variableOffsets, row, _numFieldsUsed); // MORE - could be optimized t oonly calc ones not previously calculated
Typo in comment.
It would be worth discussing the idea of calculation on demand and/or incremental calculation. I suspect the cost is a bit too high, but it may solve some issues.
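A minimal sketch of incremental, on-demand offset calculation, as raised above. `LazyRowOffsets`, `ensure` and `fieldsWalked` are hypothetical names, not HPCC code. Offsets already calculated by an earlier request are kept, and a later request for more fields only walks the fields it has not yet seen.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch, not RtlRow: offsets are extended on demand and fields
// walked by an earlier call are never recalculated.
class LazyRowOffsets {
public:
    explicit LazyRowOffsets(std::vector<size_t> fieldSizes) : sizes_(std::move(fieldSizes)) {}

    // Make sure the starts of fields 0..numFieldsUsed are known, walking only
    // the fields that a previous call did not already cover.
    void ensure(unsigned numFieldsUsed) {
        assert(numFieldsUsed <= sizes_.size());
        while (calculated_ < numFieldsUsed) {
            // sizes_[calculated_] stands in for the expensive per-field size calculation
            offsets_.push_back(offsets_.back() + sizes_[calculated_]);
            ++calculated_;
        }
    }

    size_t getOffset(unsigned field) {
        ensure(field);
        return offsets_[field];
    }

    // Number of fields walked so far for this row.
    unsigned fieldsWalked() const { return calculated_; }

private:
    std::vector<size_t> sizes_;
    std::vector<size_t> offsets_{0};  // field 0 always starts at offset 0
    unsigned calculated_ = 0;
};
```

The cost the reviewer is worried about shows up here as the comparison in `ensure` on every access, plus the need to discard the cached offsets whenever the reader moves to a new row (not shown).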
roxie/ccd/ccdactivities.cpp
// MORE - this need refactoring - it is not threadsafe any more! We need to split the index list out of the index manager.
// Easy enough to make it threadsafe if we accept that indexes once created are not destroyed (until file is unloaded)
// There's a small potential flaw that I can end up with indexes from a query that was unloaded active on some nodes but not others
// which could mess up continuation. I think I don't care.
ForEach(*memKeyInfo)
Does this need addressing?
The second half of the comment is still valid (but I don't plan to address it). The first half of the comment has been addressed, and the comment should be deleted.
{
}

// This version is used for fixed size rows only - variable size rows use more derived class which overrides
virtual void doQuery(IMessagePacker *output, unsigned processed, unsigned __int64 rowLimit, unsigned __int64 stopAfter)
Future: There is potential for a more efficient version of this on keyed data with no post filter - by jumping to the following value, and subtracting the positions.
Yes. The old code did not do that either (but could have). Worth raising a Jira.
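A sketch of the counting idea above, on a plain sorted vector of keys rather than Roxie's keyed files; the function names are hypothetical. With no post filter, every row whose key falls in the range counts, so the count is the distance between the first row at or above the lower bound and the first row past the upper bound. Two binary searches replace a scan of every matching row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Count keys in [lo, hi] in a sorted sequence by locating the two ends of the
// range and subtracting positions. Only valid with no post filter, since every
// row inside the range is assumed to count.
size_t countInRange(const std::vector<int> &sortedKeys, int lo, int hi) {
    if (lo > hi)
        return 0;
    auto first = std::lower_bound(sortedKeys.begin(), sortedKeys.end(), lo);
    auto last = std::upper_bound(first, sortedKeys.end(), hi);
    return static_cast<size_t>(last - first);
}

// Baseline for comparison: visit every row and test it.
size_t countByScan(const std::vector<int> &keys, int lo, int hi) {
    return static_cast<size_t>(std::count_if(keys.begin(), keys.end(),
        [&](int k) { return k >= lo && k <= hi; }));
}
```

Once a post filter is present the shortcut no longer applies, because rows inside the range may still be rejected and each one has to be examined.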
Refactored significantly to remove unused code for dynamically selecting indexes to build, and to common up the interfaces used for keyed versus unkeyed access to disk files. Added translation support to keyed and unkeyed rdisk operations. Removed optimized fixed-size record processing.
Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
Branch updated from e90bcc7 to 23c889c.
_nextRow();
if (postFilter.matches(row)) // MORE - could filter before translation.
{
anyThisGroup = true;
I don't think finishedRow() is called for rows that match. Should finishRow() be called at the head of the loop instead?
The caller is responsible for calling finishRow on the ones that are returned.
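A sketch of the ownership rule described in this exchange, using a hypothetical `FilteringReader` over integers rather than the Roxie `IDirectReader`. Rows rejected by the post filter are released inside `nextRow()`, and every row that is returned must be released by the caller with `finishedRow()`. The `outstanding_` flag makes a missed call trip an assert.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical reader: rejected rows are finished internally; returned rows
// must be finished by the caller before the next call to nextRow().
class FilteringReader {
public:
    FilteringReader(std::vector<int> rows, std::function<bool(int)> filter)
        : rows_(std::move(rows)), filter_(std::move(filter)) {}

    const int *nextRow() {
        assert(!outstanding_);  // the previous returned row must have been finished
        while (pos_ < rows_.size()) {
            const int *row = &rows_[pos_++];
            outstanding_ = true;
            if (filter_(*row))
                return row;     // from here on, finishedRow() is the caller's job
            finishedRow();      // rejected row: the reader releases it itself
        }
        return nullptr;
    }

    void finishedRow() {
        assert(outstanding_);
        outstanding_ = false;
        ++finished_;
    }

    unsigned finishedCount() const { return finished_; }

private:
    std::vector<int> rows_;
    std::function<bool(int)> filter_;
    size_t pos_ = 0;
    bool outstanding_ = false;
    unsigned finished_ = 0;
};
```

Usage: a caller loops with `while (const int *row = r.nextRow()) { ...; r.finishedRow(); }`, and every row, matching or not, ends up finished exactly once.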
}
if (a<numPtrs)
else if (postFilter.matches(keySearcher->queryRow()))
Is this applying a post translation filter to a pre translation record?
Yes. Translation of filters is still WIP
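A toy illustration of why applying a post-translation filter to a pre-translation record is a problem. The two layouts are hypothetical: the file on disk stores `(nameLength, age)` while the query expects `(age, nameLength)`, so translation swaps the fields. A filter written against the expected layout gives the wrong answer on the raw disk row. Rewriting the filter against the disk layout is what would allow filtering before translation, as the `MORE` comment in the code suggests.

```cpp
#include <array>
#include <cassert>

// Hypothetical layouts: disk row is (nameLength, age); expected row is (age, nameLength).
using Row = std::array<int, 2>;

// Translation from the disk layout to the expected layout swaps the two fields.
Row translate(const Row &diskRow) { return {diskRow[1], diskRow[0]}; }

// Post filter written against the expected layout: field 0 is age.
bool postFilter(const Row &expectedRow) { return expectedRow[0] >= 18; }

// The same condition rewritten against the disk layout (field 1 is age),
// so it can be evaluated before translation.
bool translatedPostFilter(const Row &diskRow) { return diskRow[1] >= 18; }
```

For a disk row `{5, 30}`, `postFilter(translate(row))` and `translatedPostFilter(row)` both accept it, while `postFilter(row)` reads the name length as the age and rejects it.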
seg++;
if (seg==lim)
return a;
MemoryBufferBuilder aBuilder(buf, 0);
I think buf needs to have its length set back to 0 in finishedRow().
Probably
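A sketch of the buffer-reuse issue raised above, using a hypothetical `TranslatingReader` with a `std::string` in place of a `MemoryBuffer` and builder. The assumption, flagged here because it is not confirmed by the snippet, is that each translated row is written at the current end of the shared buffer. If `finishedRow()` does not reset the length, the next row lands after the previous one instead of at offset 0.

```cpp
#include <cassert>
#include <string>

// Hypothetical reader: each translated row is appended at the end of a reusable
// buffer, in the way a builder appends to a MemoryBuffer.
class TranslatingReader {
public:
    const std::string &nextRow(const std::string &source) {
        buf_.append("T:");    // stands in for writing the translated fields
        buf_.append(source);
        return buf_;
    }

    // Reset the length so the next row is built from offset 0.
    void finishedRow() { buf_.clear(); }

private:
    std::string buf_;
};
```

With the reset, consecutive rows `"a"` and `"b"` come back as `"T:a"` and `"T:b"`; without it, the second row comes back as `"T:aT:b"`.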
@@ -1997,113 +1210,51 @@ class InMemoryIndexCursor : implements IInMemoryIndexCursor, public CInterface

virtual void serializeCursorPos(MemoryBuffer &mb) const
{
mb.append(keySize);
mb.append(keySize, keyBuffer);
index->serializeCursorPos(mb);
Doesn't this need to do more?
Like what?
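Whatever the answer, one property any `serializeCursorPos` must keep is that the matching deserializer reads the same fields back in the same order. This round-trip sketch mirrors the order in the snippet (key size, then key bytes, then the nested index position). `ByteBuffer`, `CursorPos` and `indexPos` are hypothetical stand-ins, not `MemoryBuffer` or the Roxie cursor, and the sketch does not claim anything about what the real cursor is missing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for MemoryBuffer: bytes are appended, then read back in order.
struct ByteBuffer {
    std::vector<unsigned char> data;
    size_t readPos = 0;

    void append(size_t len, const void *src) {
        const unsigned char *p = static_cast<const unsigned char *>(src);
        data.insert(data.end(), p, p + len);
    }
    void read(size_t len, void *dst) {
        assert(readPos + len <= data.size());
        if (len)
            std::memcpy(dst, data.data() + readPos, len);
        readPos += len;
    }
};

// Hypothetical cursor state: the current key plus a position in some index.
struct CursorPos {
    std::vector<unsigned char> key;
    uint64_t indexPos = 0;

    void serialize(ByteBuffer &mb) const {
        uint32_t keySize = static_cast<uint32_t>(key.size());
        mb.append(sizeof(keySize), &keySize);    // length prefix first,
        mb.append(keySize, key.data());          // then the key bytes,
        mb.append(sizeof(indexPos), &indexPos);  // then the nested position
    }

    // Must read the fields back in exactly the order serialize() wrote them.
    void deserialize(ByteBuffer &mb) {
        uint32_t keySize = 0;
        mb.read(sizeof(keySize), &keySize);
        key.resize(keySize);
        mb.read(keySize, key.data());
        mb.read(sizeof(indexPos), &indexPos);
    }
};
```

A round-trip test that also checks the whole buffer was consumed is a cheap way to catch a serializer and deserializer drifting apart.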
Automated Smoketest: ✅
Unit tests result:
Regression test result:
HPCC Stop: OK
I have created jiras for the issues that I think are still outstanding. Will merge this as-is.
Refactored significantly to remove unused code for dynamically
selecting indexes to build, and to common up the interfaces used for keyed
versus unkeyed access to disk files. Added translation support to keyed and
unkeyed rdisk operations. Removed optimized fixed-size record processing.
Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
Type of change:
Checklist:
Testing:
Regression suite, unit tests.