Fix ByteBufUtil#writeUtf8 subsequence split surrogate edge-case bug #9437

njhill · 2019-08-08T02:30:24Z

Motivation

#9224 introduced overloads of ByteBufUtil#writeUtf8(...) and related methods that operate on a sub-CharSequence directly, avoiding the need to allocate substrings. It missed an edge case, however: the subsequence does not extend to the end of the CharSequence and the last char in the subsequence is a high surrogate.

Due to the catch-IndexOutOfBoundsException optimization that avoids an additional bounds check, it would be possible to read past the specified end char and successfully decode a surrogate pair that would otherwise result in a '?' byte being written.
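To make the edge case concrete, here is a minimal sketch of an input that triggers it. It does not call Netty at all; the class name, the sample string, and the `expected` helper are made up for illustration, and the reference result is simply what the JDK produces when the subsequence is copied out first and then encoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Netty.
final class SplitSurrogateInput {
    // U+1F600 is represented in UTF-16 as the surrogate pair D83D DE00.
    static final String TEXT = "a\uD83D\uDE00b";

    /** Reference result: copy chars [start, end) out first, then encode them. */
    static byte[] expected(CharSequence seq, int start, int end) {
        return seq.subSequence(start, end).toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // [0, 2) ends on the high surrogate; the low surrogate at index 2 is
        // outside the requested range, so the surrogate is unpaired -> '?'.
        System.out.println(Arrays.toString(expected(TEXT, 0, 2)));
        // [0, 4) contains the complete pair, so it encodes to four bytes.
        System.out.println(Arrays.toString(expected(TEXT, 0, 4)));
    }
}
```

An encoder that reads the char at index 2 while handling the range [0, 2) would emit the four-byte encoding of U+1F600 instead of a single '?', which is the behaviour described above.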

Modifications

- Check for end-of-subsequence before reading the next char after a high surrogate is encountered in the writeUtf8(AbstractByteBuf,int,CharSequence,int,int) and utf8BytesNonAscii methods
- Add unit test for this edge case
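The idea behind the check can be sketched roughly as follows. This is not the actual Netty code: it is a simplified, stand-alone encoder writing to a ByteArrayOutputStream, the class and method names are hypothetical, and its handling of a high surrogate followed by a non-low-surrogate char is a simplification that may differ from ByteBufUtil.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-alone sketch, not the ByteBufUtil implementation.
final class SubsequenceUtf8Sketch {
    /** Encodes chars [start, end) of seq as UTF-8, writing '?' for unpaired surrogates. */
    static byte[] encode(CharSequence seq, int start, int end) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = start; i < end; i++) {
            char c = seq.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else if (Character.isSurrogate(c)) {
                if (!Character.isHighSurrogate(c)) {
                    out.write('?'); // lone low surrogate
                    continue;
                }
                // The explicit check: compare against the subsequence end rather
                // than relying on charAt throwing at the end of the whole sequence.
                if (++i == end) {
                    out.write('?');
                    break;
                }
                char c2 = seq.charAt(i);
                if (!Character.isLowSurrogate(c2)) {
                    out.write('?');
                    i--; // simplification: let the loop re-examine c2 on its own
                    continue;
                }
                int cp = Character.toCodePoint(c, c2);
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
```

With this check in place, `encode("a\uD83D\uDE00b", 0, 2)` yields the bytes for "a" followed by '?', matching what encoding the copied-out substring would give.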

Result

Bug is fixed.

This removes the bounds-check-avoidance optimization but it does not appear to have a measurable impact on benchmark results, including when the char sequence contains many surrogate pairs (which should be rare in any case).

netty-bot · 2019-08-08T02:34:06Z

Can one of the admins verify this patch?

normanmaurer · 2019-08-08T05:50:37Z

@netty-bot test this please

normanmaurer · 2019-08-09T07:47:45Z

@netty-bot test this please

normanmaurer · 2019-08-10T18:54:13Z

@njhill thanks a lot!

njhill added the defect label Aug 8, 2019

Fix comment indentation

a6e33df

normanmaurer merged commit fedcc40 into netty:4.1 Aug 10, 2019

normanmaurer added this to the 4.1.39.Final milestone Aug 10, 2019

normanmaurer self-requested a review August 10, 2019 18:54

njhill deleted the utf8-subseq-fix branch August 11, 2019 18:22