Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-18895 Roxie disk read to use new field filters #10741

Merged
merged 1 commit into from
Dec 14, 2017

Conversation

richardkchapman
Copy link
Member

@richardkchapman richardkchapman commented Dec 14, 2017

Refactored significantly to remove unused code for dynamically
selectingindexes to build, and to common up the interfaces used for keyed
versus unkeyed access to disk files. Added translation support to keyed and
unkeyed rdisk operations. Removed optimized fixed-size record processing.

Signed-off-by: Richard Chapman rchapman@hpccsystems.com

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Testing:

Regression suite, unit tests.

@hpcc-jirabot
Copy link

@richardkchapman
Copy link
Member Author

@ghalliday Please review

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments.

if (!workUnit)
return factory->queryOnceResultStore();
// fall into...
if (!workUnit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace: strange indentation.

* IDirectReader (this remains TBD at this point)
*
*/
interface IDirectReaderEx : extends ISerialStream, extends ISimpleReadStream
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit ugly. Can we derived ISerialStream from ISimpleReadStream instead? It might simplify some other code, and should be trivial to implement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, though I fear it would open other cans of worms, and might be better addressed separately from this PR

* matching rows. Translated rows are returned.
*
*/
interface IDirectReader : extends IThorDiskCallback
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strange that IDirectReaderEx isn't an extension of IDirectReader. That is confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should probably be renamed. IDirectReader is not a great name either...

@@ -575,7 +575,7 @@ size32_t RtlRecord::calculateOffset(const void *_row, unsigned field) const
unsigned numOffsets = getNumVarFields() + 1;
size_t * variableOffsets = (size_t *)alloca(numOffsets * sizeof(size_t));
RtlRow sourceRow(*this, nullptr, numOffsets, variableOffsets);
sourceRow.setRow(_row, field);
sourceRow.setRow(_row, field+1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this +1 required if only accessing the offset of the field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's required if calling getOffset (which will asset without it). But perhaps the assert is wrong?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I added the +1 because I hit the assert - but I don't recall the exact details about whether I was trying to get the offset or the size at the time.)

assert(row);
if (_numFieldsUsed > numFieldsUsed)
{
info.calcRowOffsets(variableOffsets, row, _numFieldsUsed); // MORE - could be optimized t oonly calc ones not previously calculated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in comment.
Would be worth discussing the idea of calculation on demand and/or incremental calculation. I suspect the cost is a bit too high, but it may solve some issues.

// MORE - this need refactoring - it is not threadsafe any more! We need to split the index list out of the index manager.
// Easy enough to make it threadsafe if we accept that indexes once created are not destroyed (until file is unloaded)
// There's a small potential flaw that I can end up with indexes from a query that was unloaded active on some nodes but not others
// which could mess up continuation. I think I don't care.
ForEach(*memKeyInfo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need addressing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second half of the comment is still valid (but I don't plan to address it). The first half of the comment has been addressed, and the comment should be deleted.

{
}

// This version is used for fixed size rows only - variable size rows use more derived class which overrides
virtual void doQuery(IMessagePacker *output, unsigned processed, unsigned __int64 rowLimit, unsigned __int64 stopAfter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future: There is potential for a more efficient version of this on keyed data with no post filter - by jumping to the following value, and subtracting the positions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The old code did not do that either (but could have). Worth raising a Jira

Refactored significantly to remove unused code for dynamically
selectingindexes to build, and to common up the interfaces used for keyed
versus unkeyed access to disk files. Added translation support to keyed and
unkeyed rdisk operations. Removed optimized fixed-size record processing.

Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
_nextRow();
if (postFilter.matches(row)) // MORE - could filter before translation.
{
anyThisGroup = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think finishedRow() is called for rows that match - should finishRow() be called at the head of the loop instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller is responsible for calling finishRow on the ones that are returned

}
if (a<numPtrs)
else if (postFilter.matches(keySearcher->queryRow()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this applying a post translation filter to a pre translation record?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Translation of filters is still WIP

seg++;
if (seg==lim)
return a;
MemoryBufferBuilder aBuilder(buf, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think buf needs to have length set back to 0 in finishedRow()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably

@@ -1997,113 +1210,51 @@ class InMemoryIndexCursor : implements IInMemoryIndexCursor, public CInterface

virtual void serializeCursorPos(MemoryBuffer &mb) const
{
mb.append(keySize);
mb.append(keySize, keyBuffer);
index->serializeCursorPos(mb);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this need to do more?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like what?

@HPCCSmoketest
Copy link
Contributor

Automated Smoketest: ✅
Sha: 23c889c
Build: success
Install hpccsystems-platform-community_6.5.0-trunk0.el7.x86_64.rpm
HPCC Start: OK

Unit tests result:

Test total passed failed errors timeout
unittest 92 92 0 0 0
wutoolTest(Dali) 19 19 0 0 0
wutoolTest(Cassandra) 19 19 0 0 0

Regression test result:

phase total pass fail
setup (hthor) 11 11 0
setup (thor) 11 11 0
setup (roxie) 11 11 0
test (hthor) 734 734 0
test (thor) 647 647 0
test (roxie) 761 761 0

HPCC Stop: OK
HPCC Uninstall: OK

@ghalliday
Copy link
Member

I have created jiras for the issues that I think are still outstanding. Will merge this as-is.

@ghalliday ghalliday merged commit a4d8fa2 into hpcc-systems:master Dec 14, 2017
@richardkchapman richardkchapman deleted the roxie-fieldfilters branch February 12, 2018 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants