New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix char value padding with respect to UTF-16 surrogate pairs #195

Merged
merged 1 commit into from Feb 9, 2019

Conversation

4 participants
@findepi
Copy link
Member

findepi commented Feb 8, 2019

Previously, the code was using String#length to determine how many
space characters needs to be added at the end of the returned value.
However, this method calculates text length incorrectly when text
contains one or more supplementary characters, i.e. characters encoded
as surrogate pairs in UTF-16.

@findepi findepi requested a review from electrum Feb 8, 2019

@cla-bot cla-bot bot added the cla-signed label Feb 8, 2019

@martint

This comment has been minimized.

Copy link
Member

martint commented Feb 8, 2019

Can you add an explanation in the commit message describing what’s the issue?

@findepi findepi force-pushed the findepi:findepi/master/fix-char-value-padding-with-respect-to-utf-16-surrogate-pairs-e5db07 branch from 49ff9ec to e87eb01 Feb 8, 2019

@findepi

This comment has been minimized.

Copy link
Member Author

findepi commented Feb 8, 2019

@martint added this:

Previously, the code was using `String#length` to determine how many
space characters needs to be added at the end of the returned value.
However, this method calculates text length incorrectly when text
contains one or more supplementary characters, i.e. characters encoded
as surrogate pairs in UTF-16.
@martint

martint approved these changes Feb 8, 2019

@@ -78,9 +79,9 @@ public Object getObjectValue(ConnectorSession session, Block block, int position
}

StringBuilder builder = new StringBuilder(length);
String value = block.getSlice(position, 0, block.getSliceLength(position)).toStringUtf8();
builder.append(value);
for (int i = value.length(); i < length; i++) {

This comment has been minimized.

@dain

dain Feb 8, 2019

Member

Not sure it matters, but you can use value.codePointCount(0, value.size()).

This comment has been minimized.

@findepi

findepi Feb 9, 2019

Author Member

Nice! I didn't know.
I will stay with Slice-provided utility, as we use this in other places as well (here we are fortunate to have a slice at hand)

This comment has been minimized.

Fix char value padding with respect to UTF-16 surrogate pairs
Previously, the code was using `String#length` to determine how many
space characters needs to be added at the end of the returned value.
However, this method calculates text length incorrectly when text
contains one or more supplementary characters, i.e. characters encoded
as surrogate pairs in UTF-16.

@findepi findepi force-pushed the findepi:findepi/master/fix-char-value-padding-with-respect-to-utf-16-surrogate-pairs-e5db07 branch from e87eb01 to b7abfe5 Feb 9, 2019

@findepi findepi merged commit 04198c4 into prestosql:master Feb 9, 2019

1 check passed

verification/cla-signed
Details

@findepi findepi deleted the findepi:findepi/master/fix-char-value-padding-with-respect-to-utf-16-surrogate-pairs-e5db07 branch Feb 9, 2019

@findepi findepi referenced this pull request Feb 9, 2019

Closed

Release notes for 303 #181

5 of 6 tasks complete

@electrum electrum added this to the 303 milestone Feb 14, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment