Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ORC double and float performance with batch reads #465

Merged
merged 1 commit into from Mar 13, 2019

Conversation

2 participants
@dain
Copy link
Member

dain commented Mar 12, 2019

Speedup:

Benchmark                                  Speedup
BenchmarkStreamReaders.readDoubleNoNull      3.93x
BenchmarkStreamReaders.readDoubleWithNull    1.57x
BenchmarkStreamReaders.readFloatNoNull       5.17x
BenchmarkStreamReaders.readFloatWithNull     1.70x

Before:

Benchmark                                  Mode  Cnt  Score   Error  Units
BenchmarkStreamReaders.readDoubleNoNull    avgt   60  0.228 ± 0.004   s/op
BenchmarkStreamReaders.readDoubleWithNull  avgt   60  0.173 ± 0.002   s/op
BenchmarkStreamReaders.readFloatNoNull     avgt   60  0.212 ± 0.003   s/op
BenchmarkStreamReaders.readFloatWithNull   avgt   60  0.179 ± 0.003   s/op

After:

Benchmark                                  Mode  Cnt  Score   Error  Units
BenchmarkStreamReaders.readDoubleNoNull    avgt   60  0.058 ± 0.001   s/op
BenchmarkStreamReaders.readDoubleWithNull  avgt   60  0.110 ± 0.002   s/op
BenchmarkStreamReaders.readFloatNoNull     avgt   60  0.041 ± 0.001   s/op
BenchmarkStreamReaders.readFloatWithNull   avgt   60  0.105 ± 0.001   s/op

@cla-bot cla-bot bot added the cla-signed label Mar 12, 2019

@martint
Copy link
Member

martint left a comment

Just a couple of minor comments. I only commented on the double version, but they apply to both.

throws IOException
{
BlockBuilder blockBuilder = type.createBlockBuilder(null, items);
for (int batchBase = 0; batchBase < items; batchBase += 128) {

This comment has been minimized.

@martint

martint Mar 12, 2019

Member

Add constant for 128

int batchSize = min(items - batchBase, 128);

int nonNullCount = 0;
for (int i = 0; i < batchSize; i++) {

This comment has been minimized.

@martint

martint Mar 12, 2019

Member
for (int i = batchBase; i < batchBase + batchSize; i++)  {
  ... !isNull[i] ... 
}

@dain dain force-pushed the dain:orc-doubl-perf branch from b1a8298 to ba01116 Mar 12, 2019

@martint
Copy link
Member

martint left a comment

Looks good, but check travis

Improve ORC double and float performance with batch reads
Speedup:

Benchmark                                  Speedup
BenchmarkStreamReaders.readDoubleNoNull      3.93x
BenchmarkStreamReaders.readDoubleWithNull    1.57x
BenchmarkStreamReaders.readFloatNoNull       5.17x
BenchmarkStreamReaders.readFloatWithNull     1.70x

Before:

Benchmark                                  Mode  Cnt  Score   Error  Units
BenchmarkStreamReaders.readDoubleNoNull    avgt   60  0.228 ± 0.004   s/op
BenchmarkStreamReaders.readDoubleWithNull  avgt   60  0.173 ± 0.002   s/op
BenchmarkStreamReaders.readFloatNoNull     avgt   60  0.212 ± 0.003   s/op
BenchmarkStreamReaders.readFloatWithNull   avgt   60  0.179 ± 0.003   s/op

After:

Benchmark                                  Mode  Cnt  Score   Error  Units
BenchmarkStreamReaders.readDoubleNoNull    avgt   60  0.058 ± 0.001   s/op
BenchmarkStreamReaders.readDoubleWithNull  avgt   60  0.110 ± 0.002   s/op
BenchmarkStreamReaders.readFloatNoNull     avgt   60  0.041 ± 0.001   s/op
BenchmarkStreamReaders.readFloatWithNull   avgt   60  0.105 ± 0.001   s/op

@dain dain force-pushed the dain:orc-doubl-perf branch from ba01116 to b0cc150 Mar 13, 2019

@dain dain merged commit 63ec248 into prestosql:master Mar 13, 2019

1 check passed

verification/cla-signed
Details

@dain dain deleted the dain:orc-doubl-perf branch Mar 13, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.