Use DictionaryBlocks for Unnest Operator #901

phd3 · 2019-06-04T02:00:17Z

Following the discussion in #563 , this PR proposes a new design for Unnest Operator.

The key parts of the new design are as follows:

1. Avoid copying by using a DictionaryBlock, whenever possible.

The main drawback of current implementation is copying of data, both in replicate channels and unnest channels. The solution to this is using a DictionaryBlock for both the cases. Some form of the input block is used as the dictionary in output Dictionaryblock.
Output for replicate channels often contains the same elements repeated multiple times. Using a dictionary block for them can help avoid copying unnecessary data, especially for datatypes with large sizes. We can create dictionary blocks that point to elements (repetitively when required) in the input block to produce output for replicate channels. In this case, input block itself acts as a dictionary for output block.
Nested blocks (ArrayBlock OR MapBlock for unnest channels) are created by adding a layer of indexing on top of their underlying element blocks (a LongArrayBlock for example). We can say that for the most part, these underlying element blocks are what we need as an output of our unnest channels, modulo a few special cases. Therefore, we try to refer elements from the underlying element blocks while creating output block. In this case, underlying element block inside the input block acts as a dictionary for output block.

2. There exist cases where DictionaryBlock cannot be used, and have to copy.

The need for copying arises when a null element needs to be appended to the output. This happens when the cardinality of a nested element is smaller than any of the elements in other unnest channels. We call this mismatch of cardinalities misalignment.
This is because we cannot create a pointer to any element in the input block for representing null, if there is no null element in the input block. Currently, the DictionaryBlock implementation does not use negative indices to represent NULL elements. (When it does, we can get rid of copying altogether. @dain mentioned that it would be a big project in itself.)
If the input block indeed contains a NULL element, we can use the index of that element in output dictionary block.
So when we encounter a null-append operation on the output block (and absence of a null element in the input), we have to move away from DictionaryBlock. Elements are copied over to create a new output block (just like the current implementation). But this won’t happen as frequently.
For replicate output blocks, no copying is required whatsoever. This is because a null element is never going to be appended, unless there is a null in the input replicate block. So, the replicate output block will always be a dictionary block.
For unnest output blocks, copying is required only if there is misalignment. That will not happen if we are only unnesting one column (apart from a special case with row unnesting), which is usually the common case. Even when the misalignment occurs, copying won’t be required if there is a null entry in the underlying element block of the nested block.
BlockProducerFromSource object is given one of the input blocks as the source. It has methods to process elements in this block and create an output block. The output can be either (1) a dictionary block using the source as a dictionary or (2) a block created by copying elements from the source.

3. Unnesters maintain columnar structures to help avoid copying.

There are 3 different implementations of Unnester abstract class. MapUnnester, ArrayUnnester and ArrayOfRowsUnnester. An Unnester object processes input rows for a nested column and produces output blocks when asked for.
ColumnarMap, ColumnarArray and ColumnarRow structures help access underlying element blocks from the nested blocks (like MapBlock, ArrayBlock, RowBlock). Unnesters maintain some form of the Columnar Structures for processing unnest channel blocks. When new input blocks are assigned to an Unnester, corresponding Columnar structures are refreshed as well.
Every Unnester is supposed to output one or more output blocks, the number of output blocks is called channelCount. For example, channelCount for a MapUnnester is 2, because it produces blocks corresponding to keys and values. An Unnester contains an array of BlockProducerFromSource objects of length channelCount, to produce outputs.
The Unnester processes input nested blocks one row at a time. For every row, it translates the position in the nested block to positions in underlying element blocks and invokes BlockProducerFromSource methods for all output channels.

cla-bot · 2019-06-04T02:00:19Z

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@prestosql.io. For more information, see https://github.com/prestosql/cla.

sopel39 · 2019-06-04T11:49:28Z

Hi @phd3

Thanks for contribution! Do you have some benchmarking numbers?

wagnermarkd

Thanks Pratham! In addition to timing benchmarks, would you share some memory usage comparisons? That's what originally got us interested in this. I assume at least that the particularly bad query we had, which used to fail now passes. Can you get samples of the sizes of Pages returned from the UnnestOperator of that query before and after this change?

I haven't gone through everything yet, but my high level take away is that the interaction between Unnesters and the BlockProducerFromSource is complicated and can be simplified by separating concerns

presto-main/src/main/java/io/prestosql/operator/unnest/UnnestOperator.java

wagnermarkd · 2019-06-04T18:47:22Z

presto-main/src/main/java/io/prestosql/operator/unnest/UnnestOperator.java

+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package io.prestosql.operator.unnest;


Do you have the package move commit separate? If it's easy, it'd be nice to have this as two commits: 1 for the move and 1 for the logic changes. That will help git recognize that you've moved the file and it's not a brand new file, which in turn makes diff and blame output a bit more sensible.

presto-main/src/main/java/io/prestosql/operator/unnest/BlockProducerFromSource.java

phd3 · 2019-06-04T20:07:41Z

@sopel39 I saw that there are no benchmarks for UnnestOperator yet, I am looking into how we can add that.

martint · 2019-06-10T20:30:29Z

@cla-bot check

cla-bot · 2019-06-10T20:30:34Z

The cla-bot has been summoned, and re-checked this pull request!

phd3 · 2019-06-11T02:20:06Z

Some benchmark results using the code in #952 . @sopel39 @wagnermarkd This addresses a couple of your comments.

For all plots below, VARCHAR elements in all columns have lengths close to 50, and length of nested structures in every row (eg. array(VARCHAR)) have lengths distributed uniformly between 0 and 300. We compare cpu time per operation and gc.alloc.rate_norm from JMH.

Comparison 1. Performance comparison with a single unnest channel

This is the case with the maximum gain in terms of CPU. We get away without copying blocks at all. gc.alloc.rate.norm follows the same trend.

Comparison 2. Performance comparison with two unnest channels

We have to copy blocks in case of misalignments (possible due to two unnest channels). But if we have nulls in our element blocks, using DictionaryBlock is still possible saving us CPU time.

Note that the second unnest channel doesn't have any nulls, and will always end up copying for that channel in this example. The memory consumption would reduce if it did have some nulls.

When there is misalignment and there are no null elements, we end up copying blocks. In these cases, we may have to do more work because we first create dictionary ids, and then may have to switch to copied blocks.

Memory Allocations

Using a DictionaryBlock means that we don't have to allocate new memory for output blocks. Instead we just use pointers to the input data.

The newly allocated memory (that is referenced in output blocks) while processing one iteration in BenchmarkOperator#unnest is presented below with a comparison. Note that the tables below don't account for structures created and then thrown away in BlockProducerFromSource. For output dictionary blocks, this number essentially corresponds to size of ids.

The values shown here are just for getting a handle on the data scale.

Implementation	Fresh allocation for replicateChannel output block
Current	499.704K
Proposed	5.177K

Scenario	Implementation	Fresh allocation in unnestChannel output Block
No misalignment	Current	499.113K
No misalignment	Proposed	5.177K
Misalignment, no nulls	Current	307.508K
Misalignment, no nulls	Proposed	78.825K
Misalignment, 20% nulls	Current	264.638K
Misalignment, 20% nulls	Proposed	5.21K

dain

Looking good! Most of my inline comments are code style nits, or readability suggestions.

I do have one higher level concern about the size of created pages. It looks like the UnnestOperator is exclusively using PageBuilderStatus to limit page sizes. This limits page based on bytes, but since most of the time you will have dictionary blocks, the status is not used. When we need to transition from dictionary to copy, we need to copy all of the data to the block builder, and this could be very large. Limiting the number of rows in the output page is a good first step to limit this issue. If it becomes a further problem we could try to estimate the size of the copy (doing a perfect count is to expensive), or we could just figure out how to get a null into a dictionary block without copy.

presto-main/src/main/java/io/prestosql/operator/unnest/UnnestOperator.java

presto-main/src/main/java/io/prestosql/operator/unnest/BlockProducerFromSource.java

presto-main/src/test/java/io/prestosql/operator/unnest/TestBlockProducerFromSource.java

dain · 2019-06-11T01:54:12Z

presto-main/src/test/java/io/prestosql/operator/unnest/TestBlockProducerFromSource.java

+        producer.appendElementFromSource(0);
+        producer.appendNull();
+        block = producer.buildOutputAndFlush();
+        assertTrue((block instanceof DictionaryBlock) ^ (nullIndex == -1));


Use logical operators || instead of bit operators ^ for boolean expressions.

I tried looking into the documentation, and found that ^ is overloaded for boolean operands, and evaluates as a logical XOR: https://docs.oracle.com/javase/specs/jls/se12/html/jls-15.html#jls-15.22.2 Do you still recommend using || and && for readability?

Yes. The bitwise operators have unexpected behavior like they don't short circuit.

presto-main/src/test/java/io/prestosql/operator/unnest/TestUnnesterUtil.java

phd3 · 2019-06-17T21:40:09Z

@wagnermarkd The updated PR contains 3 commits: package move, implementation, addressing all other feedback. Splitting the BlockProducerFromSource... does simplify things.

@dain Thanks for the meticulous review and tips on better code styling and readability. The updated PR has your suggestions incorporated as extensively as I could. We discussed your concern regarding block sizes on slack, adding the gist here for reference: The while loop in UnnestOperator#getOutput has a check on output row count and pageBuilderStatus, which bounds length and size of the output. Checks are performed after processing every input row, so there is a possibility of overshooting the bounds. But these checks are enough for now. For better readability, moved the two output checks together.

dain

Looking good. Just a few final comments.

I think we should make make all of the classes in the new package "package protected" except for the operator, since they shouldn't be needed out side of this package.
Let's remove use usage of '^' in boolean expressions.
There is some minor comments on readability and formatting.

presto-main/src/main/java/io/prestosql/operator/unnest/ReplicatedBlockBuilder.java

dain · 2019-06-18T00:06:30Z

presto-main/src/main/java/io/prestosql/operator/unnest/ReplicatedBlockBuilder.java

+        checkState(source != null, "source is null");
+        checkElementIndex(index, source.getPositionCount());
+
+        for (int i = 0; i < count; i++) {


If you want (this is not necessary), you could simplify this code by growing the array before the loop, and then using Arrays.fill(ids, positionCount, positionCount + count, index)

I implemented this:

if (positionCount + count >= ids.length) { // Grow capacity int newSize = Math.max(calculateNewArraySize(ids.length), positionCount + count); ids = Arrays.copyOf(ids, newSize); }

There is an edge case if positionCount + count is greater than UnnestOperatorBlockUtil.MAX_ARRAY_SIZE (or overflows). I don't expect it to be common though, since it would require a trillion elements in the block, hence ~billions in a row (and will probably fail before it comes to this). Do you think we still need to sanity-check it?

In this case it would overflow to negative, and you would get only the result of calculateNewArraySize(ids.length). If that was not big enough, you would get an error when calling array fill. Anyway, I don' think this is worth changing here. In the future, you can use addExact instead of + to cause an exception to be thrown on overflow.

presto-main/src/main/java/io/prestosql/operator/unnest/UnnestBlockBuilder.java

presto-main/src/main/java/io/prestosql/operator/unnest/UnnestOperator.java

presto-main/src/test/java/io/prestosql/operator/unnest/TestArrayOfRowsUnnester.java

presto-main/src/test/java/io/prestosql/operator/unnest/TestArrayUnnester.java

presto-main/src/test/java/io/prestosql/operator/unnest/TestMapUnnester.java

dain · 2019-06-18T00:53:22Z

presto-main/src/test/java/io/prestosql/operator/unnest/TestBlockProducerFromSource.java

+        producer.appendElementFromSource(0);
+        producer.appendNull();
+        block = producer.buildOutputAndFlush();
+        assertTrue((block instanceof DictionaryBlock) ^ (nullIndex == -1));


Yes. The bitwise operators have unexpected behavior like they don't short circuit.

oneonestar · 2019-06-18T07:14:57Z

Update 2: See below

~~Update: The latest commit seems solved the issue. Let me test it again.~~

I got a position out of bounds error when trying this patch using our production test data. I don't know the exact data which caused this problem, but I got some debugging information. Contact me if you can't reproduce the problem.

oneonestar · 2019-06-18T07:38:29Z

I got the following errors using b6923ed49e. I don't know the exact data which caused this problem. Contact me if you can't reproduce the problem.

Issue 1:
This error happened when:

SELECT abc FROM table
    CROSS JOIN UNNEST(nested_field1, nested_field2) AS f
    WHERE top_column != ''             <<<< Without this condition, the query runs fine

Query 20190618_073546_00001_9qdzn failed: start index (5331) must not be greater than size (3516)
java.lang.IndexOutOfBoundsException: start index (5331) must not be greater than size (3516)
        at com.google.common.base.Preconditions.checkPositionIndexes(Preconditions.java:1404)
        at io.prestosql.operator.unnest.UnnestBlockBuilder.appendRange(UnnestBlockBuilder.java:142)
        at io.prestosql.operator.unnest.ArrayUnnester.processCurrentPosition(ArrayUnnester.java:74)
        at io.prestosql.operator.unnest.Unnester.processCurrentAndAdvance(Unnester.java:76)
        at io.prestosql.operator.unnest.UnnestOperator.lambda$processCurrentPosition$2(UnnestOperator.java:227)
        at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:407)
        at io.prestosql.operator.unnest.UnnestOperator.processCurrentPosition(UnnestOperator.java:227)
        at io.prestosql.operator.unnest.UnnestOperator.getOutput(UnnestOperator.java:200)
        at io.prestosql.operator.Driver.processInternal(Driver.java:379)
        at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
        at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
        at io.prestosql.operator.Driver.processFor(Driver.java:276)
        at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
        at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
        at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
        at io.prestosql.$gen.Presto_309_37_gb6923ed____20190618_073512_1.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

Issue 2:
This error happened when:

SELECT abc FROM table
    CROSS JOIN UNNEST(nested_field1) AS f      << This time no nested_field2
    WHERE top_column != ''             <<<< Without this condition, the query runs fine

Query 20190618_075434_00031_9qdzn failed: position is not valid
java.lang.IllegalArgumentException: position is not valid
        at io.prestosql.spi.block.AbstractRowBlock.checkReadablePosition(AbstractRowBlock.java:223)
        at io.prestosql.spi.block.AbstractRowBlock.isNull(AbstractRowBlock.java:215)
        at io.prestosql.spi.block.ColumnarRow.isNull(ColumnarRow.java:118)
        at io.prestosql.operator.unnest.ArrayOfRowsUnnester.processCurrentPosition(ArrayOfRowsUnnester.java:86)
        at io.prestosql.operator.unnest.Unnester.processCurrentAndAdvance(Unnester.java:76)
        at io.prestosql.operator.unnest.UnnestOperator.lambda$processCurrentPosition$2(UnnestOperator.java:227)
        at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:407)
        at io.prestosql.operator.unnest.UnnestOperator.processCurrentPosition(UnnestOperator.java:227)
        at io.prestosql.operator.unnest.UnnestOperator.getOutput(UnnestOperator.java:200)
        at io.prestosql.operator.Driver.processInternal(Driver.java:379)
        at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
        at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
        at io.prestosql.operator.Driver.processFor(Driver.java:276)
        at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
        at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
        at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
        at io.prestosql.$gen.Presto_309_37_gb6923ed____20190618_073511_1.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

phd3 · 2019-06-18T16:44:00Z

@oneonestar Thanks for letting me know, I will look into it and get back to you.

phd3 · 2019-06-20T00:29:28Z

With the help of debug information from @oneonestar, I was able to reproduce both the errors. The issue seems to be arising because of the offset calculation in Columnar classes. I've created #1027 for that.

I've included a commit in updated PR that contains tests for reproducing the issue (that can be dropped before the check-in). These tests succeed with the offset adjustment and fail otherwise. @oneonestar Thanks for bringing this up. It would be great if you can verify the changes on your test data.

@dain I don't know if there was a reason behind keeping the offsets referring to the original elements block. please let me know if you can think of any potential problems with this approach.

oneonestar · 2019-06-20T06:00:43Z

@phd3 Confirmed this PR plus #1027 passed with our test data with 2x~10x faster. Thanks =)

dain

Looks good. Can you rebase and squash the 3 followup commits?

Cherry-pick of trinodb/trino#901 (trinodb/trino#901) Co-authored-by: Ajay George <ajaygeorge@fb.com>

wagnermarkd suggested changes Jun 4, 2019

View reviewed changes

dain self-requested a review June 4, 2019 22:18

cla-bot bot added the cla-signed label Jun 10, 2019

dain requested changes Jun 11, 2019

View reviewed changes

phd3 force-pushed the issue#563 branch from 71295d4 to b6923ed Compare June 17, 2019 21:22

dain requested changes Jun 18, 2019

View reviewed changes

phd3 mentioned this pull request Jun 20, 2019

Adjust getOffset output in Columnar classes for trimmed element blocks #1027

Merged

dain self-requested a review June 22, 2019 20:32

dain approved these changes Jun 24, 2019

View reviewed changes

phd3 force-pushed the issue#563 branch from c082e2c to 87f2388 Compare June 24, 2019 22:26

phd3 added 2 commits June 24, 2019 15:47

Move classes for Unnest Operator to a separate package

fe9e7c0

Implement Unnest Operator with Dictionary Blocks

e05502d

phd3 force-pushed the issue#563 branch from 87f2388 to e05502d Compare June 24, 2019 22:51

dain merged commit f5b1534 into trinodb:master Jun 25, 2019

dain added this to the 316 milestone Jun 25, 2019

dain mentioned this pull request Jun 25, 2019

Release notes for 316 #1000

Closed

6 tasks

wenleix mentioned this pull request Jul 24, 2019

Backport Dictionary Block Optimization for Unnest Operator prestodb/presto#13121

Closed

phd3 mentioned this pull request Aug 23, 2019

Add a post about unnest operator optimization trinodb/trino.io#41

Merged

ajaygeorge mentioned this pull request Sep 3, 2019

Use DictionaryBlocks for Unnest Operator prestodb/presto#13319

Merged

ajaygeorge added a commit to ajaygeorge/presto that referenced this pull request Sep 9, 2019

Move classes for Unnest Operator to a separate package

66b173d

Cherry-pick of trinodb/trino#901 (trinodb/trino#901) Co-authored-by: Ajay George <ajaygeorge@fb.com>

ajaygeorge added a commit to ajaygeorge/presto that referenced this pull request Sep 9, 2019

Implement Unnest Operator with Dictionary Blocks

c903416

Cherry-pick of trinodb/trino#901 (trinodb/trino#901) Co-authored-by: Ajay George <ajaygeorge@fb.com>

wenleix pushed a commit to prestodb/presto that referenced this pull request Sep 9, 2019

Move classes for Unnest Operator to a separate package

1906c16

Cherry-pick of trinodb/trino#901 (trinodb/trino#901) Co-authored-by: Ajay George <ajaygeorge@fb.com>

wenleix pushed a commit to prestodb/presto that referenced this pull request Sep 9, 2019

Implement Unnest Operator with Dictionary Blocks

9232496

Cherry-pick of trinodb/trino#901 (trinodb/trino#901) Co-authored-by: Ajay George <ajaygeorge@fb.com>

wenleix mentioned this pull request Oct 29, 2019

Revert dictionary block optimization for unnest prestodb/presto#13619

Closed

kaikalur pushed a commit to kaikalur/presto that referenced this pull request Jan 22, 2020

Move classes for Unnest Operator to a separate package

e1404f8

Cherry-pick of trinodb/trino#901 (trinodb/trino#901) Co-authored-by: Ajay George <ajaygeorge@fb.com>

kaikalur pushed a commit to kaikalur/presto that referenced this pull request Jan 22, 2020

Implement Unnest Operator with Dictionary Blocks

b8b3bd9

Cherry-pick of trinodb/trino#901 (trinodb/trino#901) Co-authored-by: Ajay George <ajaygeorge@fb.com>

xerial mentioned this pull request Apr 21, 2020

Performance regression around MarkDistinct operator #3503

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use DictionaryBlocks for Unnest Operator #901

Use DictionaryBlocks for Unnest Operator #901

phd3 commented Jun 4, 2019 •

edited

Loading

cla-bot bot commented Jun 4, 2019

sopel39 commented Jun 4, 2019 •

edited

Loading

wagnermarkd left a comment

wagnermarkd Jun 4, 2019

phd3 commented Jun 4, 2019

martint commented Jun 10, 2019

cla-bot bot commented Jun 10, 2019

phd3 commented Jun 11, 2019

dain left a comment

dain Jun 11, 2019

phd3 Jun 17, 2019

dain Jun 18, 2019

phd3 commented Jun 17, 2019

dain left a comment

dain Jun 18, 2019

phd3 Jun 20, 2019

dain Jun 24, 2019

dain Jun 18, 2019

oneonestar commented Jun 18, 2019 •

edited

Loading

oneonestar commented Jun 18, 2019 •

edited

Loading

phd3 commented Jun 18, 2019

phd3 commented Jun 20, 2019 •

edited

Loading

oneonestar commented Jun 20, 2019

dain left a comment

Use DictionaryBlocks for Unnest Operator #901

Use DictionaryBlocks for Unnest Operator #901

Conversation

phd3 commented Jun 4, 2019 • edited Loading

cla-bot bot commented Jun 4, 2019

sopel39 commented Jun 4, 2019 • edited Loading

wagnermarkd left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

phd3 commented Jun 4, 2019

martint commented Jun 10, 2019

cla-bot bot commented Jun 10, 2019

phd3 commented Jun 11, 2019

Memory Allocations

dain left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

phd3 commented Jun 17, 2019

dain left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oneonestar commented Jun 18, 2019 • edited Loading

oneonestar commented Jun 18, 2019 • edited Loading

phd3 commented Jun 18, 2019

phd3 commented Jun 20, 2019 • edited Loading

oneonestar commented Jun 20, 2019

dain left a comment

Choose a reason for hiding this comment

phd3 commented Jun 4, 2019 •

edited

Loading

sopel39 commented Jun 4, 2019 •

edited

Loading

oneonestar commented Jun 18, 2019 •

edited

Loading

oneonestar commented Jun 18, 2019 •

edited

Loading

phd3 commented Jun 20, 2019 •

edited

Loading