Skip to content

Sanitize all multibyte chars in HpackHuffmanEncoder#13546

Merged
normanmaurer merged 5 commits intonetty:4.1from
Lincong:hpack_huffman_encoder_sanitization
Aug 17, 2023
Merged

Sanitize all multibyte chars in HpackHuffmanEncoder#13546
normanmaurer merged 5 commits intonetty:4.1from
Lincong:hpack_huffman_encoder_sanitization

Conversation

@Lincong
Copy link
Copy Markdown
Contributor

@Lincong Lincong commented Aug 13, 2023

Motivation:

To fix the following problem: during encoding, Huffman encoded headers are sanitized differently compared to non-Huffman encoded headers in HpackEncoder. As a result, characters with code point values higher than 0xFF which could be decoded to an unexpected control chars instead of '?'.

Modification:

Change how each character is sanitized in HpackHuffmanEncoder. Specifically, use the new approach [1] to replace the old approach [2].

[1] AsciiString.c2b(aChar) & 0xFF
[2] aChar & 0xFF

Expected output is 0 if aChar > 0xFF. But with the old approach, if aChar == 0x4E01, 0x4E01 & 0xFF == 1 which is incorrect.

Result:
All characters with code point values higher than 0xFF are decoded to ?s regardless of whether Huffman encoding was used during encoding.

Fixes #13540

@Lincong Lincong requested a review from bryce-anderson August 15, 2023 05:11
@bryce-anderson
Copy link
Copy Markdown
Contributor

Pr looks good to me although CodeQL is complaining, all but certainly about some extra whitespace. 😄

Can we tighten up the commit message a touch: I think the core problem was that headers that ended up Huffman encoded were sanitized differently, specifically chars with values higher than 0xFF which could result in unexpected control chars instead of the '?'. In both cases the headers are corrupted: one is just safer than the other.

As an aside and as your test demonstrates, we can still emit control chars we just have to be explicit about them.

Copy link
Copy Markdown
Contributor

@bryce-anderson bryce-anderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Lincong!

@Lincong
Copy link
Copy Markdown
Contributor Author

Lincong commented Aug 16, 2023

Thanks @bryce-anderson for the suggestion to improve the commit message. I have updated it and PTAL before we merge this PR!

Thanks @normanmaurer for fixing style violation (here). I have not completely set up my dev environment yet so that some style violation cannot be caught locally. I will make sure I am able to catch and fix such issues in my future PRs.

@Lincong Lincong requested a review from bryce-anderson August 16, 2023 17:56
@normanmaurer normanmaurer added this to the 4.1.97.Final milestone Aug 17, 2023
@normanmaurer normanmaurer merged commit aa07be4 into netty:4.1 Aug 17, 2023
normanmaurer added a commit that referenced this pull request Aug 17, 2023
Motivation:

To fix the following problem: during encoding, Huffman encoded headers
are sanitized differently compared to non-Huffman encoded headers in
`HpackEncoder`. As a result, characters with code point values higher
than 0xFF which could be decoded to an unexpected control chars instead
of `'?'`.

Modification:

Change how each character is sanitized in `HpackHuffmanEncoder`.
Specifically, use the new approach [1] to replace the old approach [2].

[1] `AsciiString.c2b(aChar) & 0xFF`
[2] `aChar & 0xFF`

Expected output is `0` if `aChar > 0xFF`. But with the old approach, if
`aChar == 0x4E01`, `0x4E01 & 0xFF == 1` which is incorrect.

Result:
All characters with code point values higher than 0xFF are decoded to
`?`s regardless of whether Huffman encoding was used during encoding.

Fixes #13540

---------

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
@Lincong Lincong deleted the hpack_huffman_encoder_sanitization branch August 17, 2023 04:32
@Lincong
Copy link
Copy Markdown
Contributor Author

Lincong commented Aug 17, 2023

Thanks @normanmaurer for merging this PR!

Do you know an ETA for 4.1.97.Final release (including this change)?

@normanmaurer
Copy link
Copy Markdown
Member

@Lincong I think sometime next week

@Lincong
Copy link
Copy Markdown
Contributor Author

Lincong commented Aug 17, 2023

@Lincong I think sometime next week

I am not sure if this is something appropriate to ask for, but it will be super nice if 4.1.97.Final release can land by the end of next Wed (8/23/2023). Thanks! @normanmaurer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected characters are synthesized by DefaultHttp2HeadersDecoder when Huffman encoding is enabled in HpackEncoder

4 participants