fix: Harden SIMD UTF-8 tail-copy bounds checks #26797
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Unfortunately we cannot look at PRs without the CLA signed, are you able to do so? Thanks!
I've signed the CLA.
Please fix failing tests prior to us reviewing this PR.
Hi, I've merged the latest main to fix the CI failures (the previous failures were caused by a stale |
tonyliaoss left a comment
The changes themselves look fine. However, the title of this PR seems weird. There is no invocation of exec() anywhere? This PR is more about implementing more bounds checks in our SIMD UTF-8 implementation.
Could you rephrase the title of this PR to something a bit more appropriate?
I've updated the title/description to appropriately reflect the changes.
Automated security fix generated by Orbis Security AI
Branch force-pushed from 0f8474b to 69f9c26.
Hello, why was this PR closed?
The PR was merged (with you as the credited author). Thanks for the contribution! The way the merge is presented is a bit confusing. The reason is that Protobuf's source of truth is not GitHub: all changes authored by Googlers go against our internal repo directly and are mirrored out to GitHub. When we accept a PR, the copybara bot imports it for the final internal review (and sometimes final edits, including resolving merge conflicts with the internal state, which has some additional tests, etc.). It is then committed there and mirrored back out, and the copybara bot closes the PR and links the commit that made the round trip. Unfortunately, GitHub doesn't have a way for us to present this as the PR itself being merged instead of closed, which makes it look odd.
Summary
This PR hardens the SIMD UTF-8 validation tail-copy paths in third_party/utf8_range. The AVX2 and SSE implementations copy the remaining input bytes into fixed-size stack buffers before processing the final partial block. This change derives the copy length explicitly from size_t arithmetic on the input length and clamps it to the destination buffer size before memcpy.
Changes
- third_party/utf8_range/lemire-avx2.c
- third_party/utf8_range/lemire-sse.c
- third_party/utf8_range/main.c
- len is converted to size_t after rejecting negative values.
- src_len is used consistently for loop and tail-length calculations.
Security impact
This should be treated as defensive hardening rather than a demonstrated critical vulnerability. The goal is to make the fixed-size tail-buffer invariant explicit and prevent future changes from accidentally turning the tail copy into an out-of-bounds write.
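For readers who want a concrete picture of the tail-copy invariant, here is a minimal sketch in C. It is not the code from this PR: the helper name copy_tail, the constant TAIL_BUF_SIZE, and the 32-byte block size are assumptions made for illustration. It shows a negative length being rejected before conversion to size_t, the tail length being computed in unsigned arithmetic, and the copy length being clamped to the buffer size before memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Assumed block size for illustration: 32 bytes (one AVX2 register).
 * An SSE variant would use 16. */
#define TAIL_BUF_SIZE 32

/* Hypothetical helper, not the actual utf8_range function.
 * Copies the final partial block of src into a zero-padded, fixed-size
 * buffer. Returns the number of bytes copied, or (size_t)-1 if len is
 * negative. */
static size_t copy_tail(const unsigned char *src, int len,
                        unsigned char tail[TAIL_BUF_SIZE]) {
    if (len < 0) {
        return (size_t)-1;            /* reject before converting to size_t */
    }
    size_t src_len = (size_t)len;     /* all later arithmetic is unsigned */
    size_t tail_len = src_len % TAIL_BUF_SIZE;
    size_t offset = src_len - tail_len;

    /* Defensive clamp: redundant with the modulo above, but it keeps the
     * "never write past the buffer" invariant explicit if the length
     * calculation is changed later. */
    if (tail_len > TAIL_BUF_SIZE) {
        tail_len = TAIL_BUF_SIZE;
    }

    memset(tail, 0, TAIL_BUF_SIZE);   /* zero padding for the partial block */
    if (tail_len > 0) {
        memcpy(tail, src + offset, tail_len);
    }
    return tail_len;
}
```

With a 40-byte input, this sketch would copy the last 8 bytes into the buffer and leave the remaining 24 bytes zeroed. The clamp never fires on the current arithmetic, which matches the description of the change as defensive hardening.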
Verification
Automated security fix by OrbisAI Security