HPCC-18895 Roxie disk read to use new field filters #10741

richardkchapman · 2017-12-14T09:01:58Z

Refactored significantly to remove unused code for dynamically
selecting indexes to build, and to common up the interfaces used for keyed
versus unkeyed access to disk files. Added translation support to keyed and
unkeyed disk operations. Removed optimized fixed-size record processing.

Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>

Type of change:

This change is a bug fix (non-breaking change which fixes an issue).
This change is a new feature (non-breaking change which adds functionality).
This change improves the code (refactor or other change that does not change the functionality)
This change fixes warnings (the fix does not alter the functionality or the generated code)
This change is a breaking change (fix or feature that will cause existing behavior to change).
This change alters the query API (existing queries will have to be recompiled)

Checklist:

Testing:

Regression suite, unit tests.

hpcc-jirabot · 2017-12-14T09:02:08Z

https://track.hpccsystems.com/browse/HPCC-18895
Jira updated

richardkchapman · 2017-12-14T09:02:13Z

@ghalliday Please review

ghalliday

A few comments.

ghalliday · 2017-12-14T09:51:38Z

roxie/ccd/ccdcontext.cpp

-        	if (!workUnit)
-        		return factory->queryOnceResultStore();
-        	// fall into...
+        	    if (!workUnit)


whitespace: strange indentation.

ghalliday · 2017-12-14T10:37:24Z

roxie/ccd/ccdkey.hpp

+ * IDirectReader (this remains TBD at this point)
+ *
+ */
+interface IDirectReaderEx : extends ISerialStream, extends ISimpleReadStream


This looks a bit ugly. Can we derive ISerialStream from ISimpleReadStream instead? It might simplify some other code, and should be trivial to implement.

Possibly, though I fear it would open other cans of worms, and might be better addressed separately from this PR

ghalliday · 2017-12-14T10:38:45Z

roxie/ccd/ccdkey.hpp

+ * matching rows. Translated rows are returned.
+ *
+ */
+interface IDirectReader : extends IThorDiskCallback


strange that IDirectReaderEx isn't an extension of IDirectReader. That is confusing.

It should probably be renamed. IDirectReader is not a great name either...

ghalliday · 2017-12-14T10:57:19Z

rtl/eclrtl/rtlrecord.cpp

@@ -575,7 +575,7 @@ size32_t RtlRecord::calculateOffset(const void *_row, unsigned field) const
        unsigned numOffsets = getNumVarFields() + 1;
        size_t * variableOffsets = (size_t *)alloca(numOffsets * sizeof(size_t));
        RtlRow sourceRow(*this, nullptr, numOffsets, variableOffsets);
-        sourceRow.setRow(_row, field);
+        sourceRow.setRow(_row, field+1);


is this +1 required if only accessing the offset of the field?

It's required if calling getOffset (which will assert without it). But perhaps the assert is wrong?

(I added the +1 because I hit the assert - but I don't recall the exact details about whether I was trying to get the offset or the size at the time.)
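
To make the question concrete, here is a minimal sketch of why a field's offset and its size need different numbers of calculated fields. It is not the RtlRow implementation: `ToyRow`, its length-prefixed layout, and both asserts are hypothetical stand-ins. In this toy, the start of field N depends only on fields 0..N-1, while the size of field N also needs the start of field N+1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy record: every field is a 1-byte length prefix followed by
// that many payload bytes. This is NOT the real RtlRecord layout.
class ToyRow
{
public:
    explicit ToyRow(const std::vector<std::uint8_t> &data) : row(data), offsets(1, 0) {}

    // Recalculate from scratch so that numFields fields have been walked,
    // leaving numFields + 1 known offsets (the start of field 0 is always 0).
    void setRow(unsigned numFields)
    {
        offsets.assign(1, 0);
        for (unsigned i = 0; i < numFields; i++)
        {
            std::size_t start = offsets.back();
            offsets.push_back(start + 1 + row.at(start));
        }
    }

    unsigned numFieldsUsed() const { return static_cast<unsigned>(offsets.size()) - 1; }

    // The start of a field needs only the fields before it to have been walked.
    std::size_t getOffset(unsigned field) const
    {
        assert(field <= numFieldsUsed());
        return offsets[field];
    }

    // The size of a field also needs the start of the following field.
    std::size_t getSize(unsigned field) const
    {
        assert(field < numFieldsUsed());
        return offsets[field + 1] - offsets[field];
    }

private:
    std::vector<std::uint8_t> row;
    std::vector<std::size_t> offsets;
};
```

Under these assumptions, `setRow(field)` is enough for `getOffset(field)`, and `setRow(field + 1)` is only needed for `getSize(field)`. Whether the real assert should be relaxed in the same way is exactly the open question above.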

ghalliday · 2017-12-14T11:31:42Z

rtl/eclrtl/rtlrecord.cpp

+    assert(row);
+    if (_numFieldsUsed > numFieldsUsed)
+    {
+        info.calcRowOffsets(variableOffsets, row, _numFieldsUsed); // MORE - could be optimized t oonly calc ones not previously calculated


typo in comment.
Would be worth discussing the idea of calculation on demand and/or incremental calculation. I suspect the cost is a bit too high, but it may solve some issues.
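
As a starting point for that discussion, a rough sketch of what on-demand, incremental offset calculation could look like. Everything here is hypothetical (`LazyOffsets`, the length-prefixed toy layout, the `fieldsWalked` counter); it only illustrates the idea of walking the fields not previously visited, and says nothing about the real cost inside RtlRow.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy layout: each field is a 1-byte length followed by payload.
class LazyOffsets
{
public:
    explicit LazyOffsets(const std::vector<std::uint8_t> &data) : row(data), offsets(1, 0) {}

    // Return the start offset of `field`, extending the cached offsets only
    // as far as needed and never revisiting a field already walked.
    std::size_t getOffset(unsigned field)
    {
        while (offsets.size() <= field)
        {
            std::size_t start = offsets.back();
            offsets.push_back(start + 1 + row.at(start));
            ++fieldsWalked;
        }
        return offsets[field];
    }

    unsigned fieldsWalked = 0; // instrumentation only: how many fields were scanned

private:
    std::vector<std::uint8_t> row;
    std::vector<std::size_t> offsets;
};
```

The trade-off is a size check on every access in exchange for never walking a field twice; whether that check is cheap enough on a hot path would need measuring.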

ghalliday · 2017-12-14T11:59:18Z

roxie/ccd/ccdactivities.cpp

+                    // MORE - this need refactoring - it is not threadsafe any more! We need to split the index list out of the index manager.
+                    // Easy enough to make it threadsafe if we accept that indexes once created are not destroyed (until file is unloaded)
+                    // There's a small potential flaw that I can end up with indexes from a query that was unloaded active on some nodes but not others
+                    // which could mess up continuation. I think I don't care.
                    ForEach(*memKeyInfo)


Does this need addressing?

The second half of the comment is still valid (but I don't plan to address it). The first half of the comment has been addressed, and the comment should be deleted.

ghalliday · 2017-12-14T12:04:01Z

roxie/ccd/ccdactivities.cpp

-    {
-    }
-
-    // This version is used for fixed size rows only - variable size rows use more derived class which overrides
    virtual void doQuery(IMessagePacker *output, unsigned processed, unsigned __int64 rowLimit, unsigned __int64 stopAfter)


Future: There is potential for a more efficient version of this on keyed data with no post filter - by jumping to the following value, and subtracting the positions.

Yes. The old code did not do that either (but could have). Worth raising a Jira
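
For reference when raising that Jira, the idea can be sketched on a plain sorted array. This is an illustration only, assuming a sorted in-memory key column of ints; `countMatches` is a hypothetical helper and not part of the Roxie code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Count rows equal to `key` in a sorted key column without visiting each
// matching row: find the first match, jump to the first entry of the
// following value, and subtract the two positions.
inline std::size_t countMatches(const std::vector<int> &sortedKeys, int key)
{
    auto first = std::lower_bound(sortedKeys.begin(), sortedKeys.end(), key);
    auto next = std::upper_bound(first, sortedKeys.end(), key);
    return static_cast<std::size_t>(next - first);
}
```

This only works when there is no post filter, since a post filter could reject rows inside the matching range.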

ghalliday · 2017-12-14T14:39:16Z

roxie/ccd/ccdkey.cpp

+            _nextRow();
+            if (postFilter.matches(row))  // MORE - could filter before translation.
+            {
+                anyThisGroup = true;


I don't think finishedRow() is called for rows that match - should finishRow() be called at the head of the loop instead?

The caller is responsible for calling finishRow on the ones that are returned

ghalliday · 2017-12-14T14:50:25Z

roxie/ccd/ccdkey.cpp

            }
-            if (a<numPtrs)
+            else if (postFilter.matches(keySearcher->queryRow()))


Is this applying a post translation filter to a pre translation record?

Yes. Translation of filters is still WIP
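
A small sketch of why this matters until filter translation is done. The types below are hypothetical simplifications (a row is a vector of ints, a filter tests one field by position); they are not the real field filter or translator interfaces.

```cpp
#include <vector>

using Row = std::vector<int>; // hypothetical: a row is just a list of int fields

// Hypothetical filter that tests a single field, identified by its position
// in one particular record layout.
struct FieldFilter
{
    unsigned field;
    int wanted;
    bool matches(const Row &row) const { return row.at(field) == wanted; }
};

// Hypothetical translator: the on-disk layout is {id, age}, while the layout
// the query expects is {age, id}.
inline Row translate(const Row &disk)
{
    return Row{disk.at(1), disk.at(0)};
}
```

A filter written against the expected layout, such as field 0 (age) equal to 42, matches the translated row but tests the id column when applied to the untranslated one. It is only safe on the pre-translation record when the filtered fields sit at the same position in both layouts.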

ghalliday · 2017-12-14T14:50:50Z

roxie/ccd/ccdkey.cpp

-                    seg++;
-                    if (seg==lim)
-                        return a;
+                    MemoryBufferBuilder aBuilder(buf, 0);


I think buf needs to have length set back to 0 in finishedRow()
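
To illustrate the concern, a toy sketch of a reader that reuses one buffer across rows. It assumes the builder appends at the buffer's current length; whether MemoryBufferBuilder behaves that way is not shown in this thread, and `ToyReader` is purely hypothetical.

```cpp
#include <string>

// Hypothetical reader that builds each translated row into a shared buffer.
class ToyReader
{
public:
    // Appends the new row at the current end of the buffer (the assumption
    // being illustrated) and returns the buffer contents.
    const std::string &nextRow(const std::string &translated)
    {
        buf.append(translated);
        return buf;
    }

    // Resetting the length here keeps the next row from being appended to
    // the previous one.
    void finishedRow() { buf.clear(); }

private:
    std::string buf;
};
```

Without the reset, the second row returned would carry the first row's bytes in front of it.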

ghalliday · 2017-12-14T14:52:23Z

roxie/ccd/ccdkey.cpp

@@ -1997,113 +1210,51 @@ class InMemoryIndexCursor : implements IInMemoryIndexCursor, public CInterface

    virtual void serializeCursorPos(MemoryBuffer &mb) const 
    {
-        mb.append(keySize);
-        mb.append(keySize, keyBuffer);
        index->serializeCursorPos(mb);


doesn't this need to do more?

HPCCSmoketest · 2017-12-14T14:58:50Z

Automated Smoketest: ✅
Sha: 23c889c
Build: success
Install hpccsystems-platform-community_6.5.0-trunk0.el7.x86_64.rpm
HPCC Start: OK

Unit tests result:

Test	total	passed
unittest	92	92
wutoolTest(Dali)	19	19
wutoolTest(Cassandra)	19	19

Regression test result:

phase	total	pass
setup (hthor)	11	11
setup (thor)	11	11
setup (roxie)	11	11
test (hthor)	734	734
test (thor)	647	647
test (roxie)	761	761

HPCC Stop: OK
HPCC Uninstall: OK

ghalliday · 2017-12-14T15:05:21Z

I have created jiras for the issues that I think are still outstanding. Will merge this as-is.

richardkchapman force-pushed the roxie-fieldfilters branch from d72e29d to 2ce77ae Compare December 14, 2017 09:33

ghalliday reviewed Dec 14, 2017

richardkchapman force-pushed the roxie-fieldfilters branch from e90bcc7 to 23c889c Compare December 14, 2017 14:27

ghalliday reviewed Dec 14, 2017

ghalliday merged commit a4d8fa2 into hpcc-systems:master Dec 14, 2017

richardkchapman deleted the roxie-fieldfilters branch February 12, 2018 10:23
