Optimize PartitionedOutputOperator #13183
Conversation
Force-pushed from bc58acb to ba429f3
@yingsu00 Ying, here are some initial comments on the "Introduce UncheckedByteArrays" commit.
(Resolved review threads on presto-main/src/main/java/com/facebook/presto/operator/UncheckedByteArrays.java and presto-main/src/test/java/com/facebook/presto/operator/TestUncheckedByteArrays.java)
@yingsu00 Here are some partial comments for the "Optimize repartitioning for BIGINT, DOUBLE and SHORT_DECIMAL types" commit. I'll continue reviewing next week.
```java
{
    checkArgument(slice.isCompact(), "slice is not compact");

    int uncompressedSize = slice.length();
```
This method's logic appears to be an exact copy of serialize(Page page) except for the slice = Slices.copyOf(slice) line. Would it make sense to extract this logic into a helper method and make the copy conditional on slice.isCompact(), or have a boolean flag that tells whether the copy should occur?
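For illustration, a minimal sketch of the suggested refactoring (names and the byte[] stand-in for a Slice are hypothetical, not the actual PagesSerde code): both entry points delegate to one shared helper, and the copy happens only when the input is not compact.

```java
// Hypothetical sketch of the helper-extraction idea: the conditional copy replaces
// duplicating the whole method body across the two serialize() variants.
public class SerializeSketch
{
    // A byte[] stands in for the Slice; clone() stands in for Slices.copyOf(slice).
    static byte[] serialize(byte[] slice, boolean isCompact)
    {
        // Copy only when the input is not compact, then run the shared logic.
        byte[] data = isCompact ? slice : slice.clone();
        return doSerialize(data);
    }

    private static byte[] doSerialize(byte[] data)
    {
        return data; // the shared serialization logic would live here
    }
}
```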
They are not exactly the same. The new serialize(Slice slice, int positionCount) has additional optimizations: it uses byte[] instead of ByteBuffer as the compressionBuffer and makes it a class member to avoid allocating new memory each time. So, to compare the performance of the two operators, I'd like to keep them separate.
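The buffer-reuse part of that optimization can be sketched as follows (class and method names are hypothetical): keep one byte[] as a class member and grow it only when a larger capacity is needed, instead of allocating a fresh buffer on every call.

```java
// Minimal sketch of reusing a compression buffer across serialize() calls.
public class BufferReuseSketch
{
    private byte[] compressionBuffer = new byte[0];

    // Returns a buffer of at least the requested size, reallocating only on growth.
    byte[] ensureCapacity(int maxCompressedSize)
    {
        if (compressionBuffer.length < maxCompressedSize) {
            compressionBuffer = new byte[maxCompressedSize];
        }
        return compressionBuffer;
    }
}
```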
@yingsu00 Thanks for explaining. It looks like this is a small change that the existing operator can benefit from. Any reason not to apply this optimization and reap some benefits earlier? You can always put the old code into a benchmark to confirm the gains.
Sure, I can cut a separate PR for it if that's what you mean.
@yingsu00 That would be great!
I have exactly the same thought as @mbasmanova. With the optimization in #13232, does that mean we no longer need to replicate the code? If so, I think this method (SerializedPage serialize(Slice slice, int positionCount)) can be added in a separate commit.
Also note that this method, in theory, is not doing "serialize": the page is already serialized into slice. Would wrapSerializedSlice be a better name? Do you have any suggestions? @mbasmanova
> Also note this method, in theory, is not doing "serialize": the page is already serialized into slice. Would wrapSerializedSlice be a better name? Do you have any suggestion? @mbasmanova
I agree that the name could be improved. I don't have a good suggestion though. For now, I think it would be reasonable to use the same name as an existing method and unify their implementations. A rename of both methods can happen separately.
Added a new commit c8eb31102d "Extract wrapSlice method in PagesSerde" as @wenleix suggested.
(Resolved review threads on presto-main/src/main/java/com/facebook/presto/operator/BlockEncodingBuffers.java and presto-main/src/main/java/com/facebook/presto/operator/OptimizedPartitionedOutputOperator.java)
```java
pagesAdded.incrementAndGet();
rowsAdded.addAndGet(bufferedRowCount);

if (blockEncodingBuffers != null) {
```
This logic is inconsistent with the for loop above; change it to:
```java
for (int i = 0; i < channelCount; i++) {
    blockEncodingBuffers[i].resetBuffers();
}
```
Why is this comment marked as resolved? Since on line 477 blockEncodingBuffers is not checked for null, I assume we don't need the check here either?
Also, does it make sense to have one single serializeAndReset (a.k.a. flush) API? As far as I can see, a serializeTo call is always coupled with resetBuffers. -- We can do this refactor after this PR is merged, of course.
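To make the suggestion concrete, here is a hypothetical sketch (not the actual BlockEncodingBuffers API) of a flush() that pairs the serialize and reset steps so callers cannot forget one of them. A StringBuilder stands in for the real encoding buffers.

```java
// Sketch of the combined flush() API: serialize and reset always happen together.
public class EncodingBufferSketch
{
    private final StringBuilder buffer = new StringBuilder();

    void append(String value)
    {
        buffer.append(value);
    }

    // flush() = serializeTo() + resetBuffers(), so the two can never get out of sync
    String flush()
    {
        String serialized = buffer.toString();
        buffer.setLength(0); // the resetBuffers() step
        return serialized;
    }
}
```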
@yingsu00 Finished reading the OptimizedPartitionedOutputOperator. Just a few minor comments.
@yingsu00 Some initial comments on the "Introduce TestingBlockBuilders and TestingPageBuilders" commit.
(Resolved review threads on presto-main/src/test/java/com/facebook/presto/operator/TestingBlockBuilders.java and presto-main/src/test/java/com/facebook/presto/operator/TestingPageBuilders.java)
Force-pushed from 5f94f6d to 3990e92
"Copy JvmUtils from Airlift to presto-spi": Looks good.
```java
// We need to use getRawSlice() to get the raw slice whose address is not advanced by getSlice(). It's incorrect to call getSlice()
// because the returned slice's address may be advanced if it's based on a slice view.
Slice rawSlice = variableWidthBlock.getRawSlice(0);
byte[] sliceBase = (byte[]) rawSlice.getBase();
```
While today a VariableWidthBlock always holds a Slice backed by a byte array, I do feel this downcast is a bit hacky and breaks the abstraction. What do you think? @highker, @arhimondr
@wenleix The downcast from Object to byte[] was pre-existing in the code base. I saw examples in PagesSerde (the original serialize(Page) method), OrcOutputBuffer, OrcInputStream, and ParquetCompressionUtils.
Example from PagesSerde.serialize(Page page):
```java
int compressedSize = compressor.get().compress(
        (byte[]) slice.getBase(),
        (int) (slice.getAddress() - ARRAY_BYTE_BASE_OFFSET),
        uncompressedSize,
        compressionBuffer,
        0,
        maxCompressedSize);
```
From OrcOutputBuffer.writeBytes(Slice source, int sourceIndex, int length):
```java
writeDirectlyToOutputStream((byte[]) source.getBase(), sourceIndex + (int) (source.getAddress() - ARRAY_BYTE_BASE_OFFSET), length);
```
From OrcInputStream.advance():
```java
int uncompressedSize = decompressor.get().decompress((byte[]) chunk.getBase(), (int) (chunk.getAddress() - ARRAY_BYTE_BASE_OFFSET), chunk.length(), output);
```
"Fix "position is not valid" for Mapblock": LGTM.
"Fix "invalid position in block" error" LGTM.
HiveQueryRunner produced a lot of logs and caused Travis to fail with "The job exceeded the maximum log length" error. This commit increases the logging level for com.facebook.presto.event from INFO to WARN to reduce the amount of logs produced.
Operators that need to use BlockFlattener need to instantiate an ArrayAllocator. However, the ArrayAllocator implementations were package-private and could not be accessed by operators outside the operator package root. This commit makes them public.
The code is entirely type safe. The alternative abstraction is to have a method that copies bytes out of a block, like
```java
Block.copyBytes(int position, int start, int length, byte[] target, int targetOffset);
```
But this is more code and more overhead, and it misses the opportunity of decoding offsets outside of loops, etc. Presto needs a handle on CPU efficiency, and this is part of getting there.
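To illustrate the tradeoff being discussed (this is a standalone sketch, not Presto code): one style funnels every position through a copyBytes-style call, while the other grabs the raw array once and runs a tight loop with offsets decoded up front. The names and offset layout here are simplified assumptions.

```java
// Two ways to copy variable-width values out of a "block", simulated by a raw
// byte[] plus an offsets array (offsets[i]..offsets[i+1] bounds position i).
public class CopyBytesSketch
{
    // copyBytes-style abstraction: one call per position.
    static void copyBytes(byte[] raw, int[] offsets, int position, byte[] target, int targetOffset)
    {
        System.arraycopy(raw, offsets[position], target, targetOffset, offsets[position + 1] - offsets[position]);
    }

    // Raw-array style: decode offsets outside the loop and copy directly.
    static byte[] copyAll(byte[] raw, int[] offsets, int positionCount)
    {
        byte[] target = new byte[offsets[positionCount] - offsets[0]];
        int targetOffset = 0;
        for (int i = 0; i < positionCount; i++) {
            int length = offsets[i + 1] - offsets[i];
            System.arraycopy(raw, offsets[i], target, targetOffset, length);
            targetOffset += length;
        }
        return target;
    }
}
```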
The new OptimizedPartitionedOutputOperator achieves better performance by skipping block building. It appends data directly to output buffers, then wraps these buffers into SerializedPage. Includes support for types represented as long, e.g. BIGINT, DOUBLE and short DECIMAL.
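The skip-block-building idea can be sketched conceptually as follows (this is a toy illustration, not the operator's actual code): each row's bytes are appended straight into a growable per-partition buffer, and the finished buffer is then handed off as the serialized output.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DirectAppendSketch
{
    // Hash-partition a BIGINT column, appending each value's 8 bytes directly into
    // its destination partition's buffer; no intermediate Block is built.
    public static byte[][] partition(long[] values, int partitionCount)
            throws IOException
    {
        ByteArrayOutputStream[] buffers = new ByteArrayOutputStream[partitionCount];
        DataOutputStream[] outputs = new DataOutputStream[partitionCount];
        for (int i = 0; i < partitionCount; i++) {
            buffers[i] = new ByteArrayOutputStream();
            outputs[i] = new DataOutputStream(buffers[i]);
        }
        for (long value : values) {
            int partition = (int) Math.floorMod(value, partitionCount);
            outputs[partition].writeLong(value); // direct append
        }
        byte[][] serialized = new byte[partitionCount][];
        for (int i = 0; i < partitionCount; i++) {
            serialized[i] = buffers[i].toByteArray(); // wrap the buffer as the output
        }
        return serialized;
    }
}
```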
@wenleix Hi Wenlei, thank you for reviewing the PR again. I have squashed "Fix "invalid position in block" error" and "Fix "position is not valid" for Mapblock" into the previous commits.

Regarding the downcast in VariableWidthBlockEncodingBuffer, do you think it's blocking? Like I said, there are existing code examples that downcast from Slice.getBase() to byte[], and VariableWidthBlock is always backed by byte[], so I think it's pretty type safe. Actually, the downcast was already in the commit "834e12c8ea Optimize repartitioning for VARCHAR type"; the change in "ce775377a7 Fix INVALID_CAST_ARGUMENT error in VariableWidthBlockEncodingBuffers" was to change the … I understand your concern about exposing the byte[] from VariableWidthBlock, but as we talked about before, we have a TODO item to expose getRawSlice() and getPositionOffset() in ColumnarSlice, and we will discuss it more as a follow-up item. Also, as @oerling pointed out, an alternative would be to make Block support this.

How about we create a TODO item on this and I host a meeting to discuss it next week? If you really think this is a blocker, can we merge the types other than VariableWidthBlock first?
```
Type                              hasNull  channels  baseline  optimized  Gain (x)
BIGINT                            FALSE    1            2,677        882       3.0
BIGINT                            FALSE    2            3,961      1,084       3.7
BIGINT                            TRUE     1            3,348      1,245       2.7
BIGINT                            TRUE     2            4,375      1,557       2.8
BOOLEAN                           FALSE    1            2,915        873       3.3
BOOLEAN                           FALSE    2            3,818      1,140       3.3
BOOLEAN                           TRUE     1            3,003      1,112       2.7
BOOLEAN                           TRUE     2            4,084      1,553       2.6
INTEGER                           FALSE    1            2,804        954       2.9
INTEGER                           FALSE    2            3,921      1,101       3.6
INTEGER                           TRUE     1            3,143      1,181       2.7
INTEGER                           TRUE     2            4,442      1,542       2.9
LONG_DECIMAL                      FALSE    1            3,697      1,118       3.3
LONG_DECIMAL                      FALSE    2            5,550      1,476       3.8
LONG_DECIMAL                      TRUE     1            3,654      1,311       2.8
LONG_DECIMAL                      TRUE     2            6,089      1,821       3.3
SMALLINT                          FALSE    1            2,821      1,032       2.7
SMALLINT                          FALSE    2            4,007      1,239       3.2
SMALLINT                          TRUE     1            2,908      1,141       2.5
SMALLINT                          TRUE     2            4,244      1,580       2.7
VARCHAR                           FALSE    1            5,821      1,972       3.0
VARCHAR                           FALSE    2           10,021      3,117       3.2
VARCHAR                           TRUE     1            5,732      2,226       2.6
VARCHAR                           TRUE     2            9,484      3,436       2.8
DICTIONARY(BIGINT)                FALSE    1            1,694        628       2.7
DICTIONARY(BIGINT)                FALSE    2            2,450        822       3.0
DICTIONARY(BIGINT)                TRUE     1            1,940        823       2.4
DICTIONARY(BIGINT)                TRUE     2            2,960      1,096       2.7
RLE(BIGINT)                       FALSE    1            1,663        592       2.8
RLE(BIGINT)                       FALSE    2            2,404        753       3.2
RLE(BIGINT)                       TRUE     1            1,645        684       2.4
RLE(BIGINT)                       TRUE     2            2,429        851       2.9
ARRAY(BIGINT)                     FALSE    1            1,128        660       1.7
ARRAY(BIGINT)                     FALSE    2            1,924        958       2.0
ARRAY(BIGINT)                     TRUE     1            1,190        702       1.7
ARRAY(BIGINT)                     TRUE     2            2,002      1,160       1.7
ARRAY(VARCHAR)                    FALSE    1            2,049      1,204       1.7
ARRAY(VARCHAR)                    FALSE    2            3,937      2,162       1.8
ARRAY(VARCHAR)                    TRUE     1            1,913      1,216       1.6
ARRAY(VARCHAR)                    TRUE     2            3,482      2,177       1.6
ARRAY(ARRAY(BIGINT))              FALSE    1            2,148      1,290       1.7
ARRAY(ARRAY(BIGINT))              FALSE    2            4,271      2,430       1.8
ARRAY(ARRAY(BIGINT))              TRUE     1            2,122      1,441       1.5
ARRAY(ARRAY(BIGINT))              TRUE     2            3,937      2,495       1.6
MAP(BIGINT,BIGINT)                FALSE    1            2,266        838       2.7
MAP(BIGINT,BIGINT)                FALSE    2            4,468      1,441       3.1
MAP(BIGINT,BIGINT)                TRUE     1            2,137        936       2.3
MAP(BIGINT,BIGINT)                TRUE     2            4,165      1,644       2.5
MAP(BIGINT,MAP(BIGINT,BIGINT))    FALSE    1            5,941      2,154       2.8
MAP(BIGINT,MAP(BIGINT,BIGINT))    FALSE    2           11,707      4,564       2.6
MAP(BIGINT,MAP(BIGINT,BIGINT))    TRUE     1            5,301      2,060       2.6
MAP(BIGINT,MAP(BIGINT,BIGINT))    TRUE     2           10,225      4,169       2.5
ROW(BIGINT,BIGINT)                FALSE    1            1,239        575       2.2
ROW(BIGINT,BIGINT)                FALSE    2            2,289        933       2.5
ROW(BIGINT,BIGINT)                TRUE     1            1,354        721       1.9
ROW(BIGINT,BIGINT)                TRUE     2            2,380      1,121       2.1
ROW(ARRAY(BIGINT),ARRAY(BIGINT))  FALSE    1            2,349      1,238       1.9
ROW(ARRAY(BIGINT),ARRAY(BIGINT))  FALSE    2            4,647      2,476       1.9
ROW(ARRAY(BIGINT),ARRAY(BIGINT))  TRUE     1            2,412      1,568       1.5
ROW(ARRAY(BIGINT),ARRAY(BIGINT))  TRUE     2            4,698      3,093       1.5
```
@wenleix Just squashed all commits. Thank you very much!
Merged #13183. Thanks for the contribution!
Hi @yingsu00, we noticed that this feature is still experimental (FeaturesConfig 'experimental.optimized-repartitioning'). We want to enable it in our production environment, but we worry it may have some potential issues. Do you have any data indicating the performance and stability of this feature in a production environment (e.g., Facebook's production environment)? Thanks.
GitHub issue #13015
The new OptimizedPartitionedOutputOperator achieves better performance by
skipping block building. It appends data directly to output buffers, then wraps
these buffers into SerializedPage.
An internal shadow run shows a 2x improvement in the operator's CPU usage, and a 5% CPU improvement across all queries.
Benchmark results show 1.5x - 3.7x improvements on 5000 pages (unit ms/op):
Updates
Sept 30 2019
Sept 26 2019
Sept 23 2019
Sept 16 2019
Sept 13 2019
Moved the operator and BlockEncodingBuffers to operator/repartition folder
Sept 10 2019
Sept 09 2019
TODO
Previous WIP PR
#13032