New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Count max code point once per batch for slice readers #11836

Merged
merged 1 commit into from Nov 2, 2018

Conversation

Projects
None yet
5 participants
@highker
Contributor

highker commented Nov 1, 2018

SliceStreamReader::computeTruncatedLength computes truncated length for
a slice. Type length is calculated per function call, even it is
redundant for slice reader reading every row given the type is fixed.
This patch moves the type length calculation outside the batch loop and
only computes the truncated length within the loop.

Resolves #11694

@dain dain self-requested a review Nov 1, 2018

@@ -192,7 +197,10 @@ public Block readBlock(Type type)
dataStream.next(data, offset, offset + currentLength);
// adjust offsetVector with truncated length
int truncatedLength = computeTruncatedLength(slice, offset, currentLength, type);
int truncatedLength = currentLength;
if (maxCodePointCount >= 0) {

This comment has been minimized.

@dain

dain Nov 1, 2018

Contributor

Maybe make this maxCodePointCount >= 0 && currentLength > maxCodePointCount? I'm thinking it might help types that have a max length but the code never sees multibyte data.

This comment has been minimized.

@highker

highker Nov 1, 2018

Contributor

good point

VarcharType varcharType = (VarcharType) type;
return varcharType.isUnbounded() ? -1 : varcharType.getLengthSafe();
}
else if (isCharType(type)) {

This comment has been minimized.

@dain

dain Nov 1, 2018

Contributor

The else is redundant here (due to the return above)

@highker highker force-pushed the highker:truncate branch from 60362ba to 7a3a54f Nov 1, 2018

@highker highker changed the title from Count max code point once per batch for SliceDirectStreamReader to Count max code point once per batch for slice readers Nov 1, 2018

@highker highker force-pushed the highker:truncate branch from 7a3a54f to 7084ef0 Nov 1, 2018

// truncate the characters and then remove the trailing white spaces
truncatedLength = byteCountWithoutTrailingSpace(slice, offset, length, ((CharType) type).getLength());
truncatedLength = byteCountWithoutTrailingSpace(slice, offset, length, maxCodePointCount);

This comment has been minimized.

@dain

dain Nov 1, 2018

Contributor

You can just return directly from here... which would make the else redundant (and return directly from the next block).

Count max code point once per batch for slice readers
SliceStreamReader::computeTruncatedLength computes truncated length for
a slice. Type length is calculated per function call, even it is
redundant for slice reader reading every row given the type is fixed.
This patch moves the type length calculation outside the batch loop and
only computes the truncated length within the loop.

@highker highker force-pushed the highker:truncate branch from 7084ef0 to 5701199 Nov 1, 2018

@@ -83,7 +83,7 @@ public void testVarcharTypeWithoutNulls()
break;
}
Block block = reader.readBlock(BIGINT, 0);
Block block = reader.readBlock(VARCHAR, 0);

This comment has been minimized.

@wenleix

wenleix Nov 2, 2018

Contributor

Curious: Why this has to change?

This comment has been minimized.

@highker

highker Nov 2, 2018

Contributor

It was wrong before.

This comment has been minimized.

@yingsu00

yingsu00 Nov 2, 2018

Contributor

Ah. Thanks for the catch.

@dain

dain approved these changes Nov 2, 2018

Looks good

@dain dain merged commit 744be58 into prestodb:master Nov 2, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment