
Conversation

@pytorchbot
Collaborator

Summary

We're seeing crashes on Android when running XNNPACK-delegated models. I tracked it down to a bug in the alignment calculation for weight cache memory. The calculation casts the void* to a (signed) intptr_t; when the address falls in the upper half of the address space, the value is negative, so the modulo returns a negative remainder and the pointer is advanced too far, leading to an out-of-bounds access.

// Original alignment code in XNNWeightsCache.cpp (see
// https://github.com/pytorch/executorch/blob/cc6cb837d6ac92f52a2d30a405900caf115f0556/backends/xnnpack/runtime/XNNWeightsCache.cpp#L166-L168).
// intptr_t is signed, so the modulo below can be negative for high addresses.
void* maybe_aligned_space = data_container.data();
void* aligned_space = (void*)((intptr_t)maybe_aligned_space + 64 -
    (intptr_t)maybe_aligned_space % 64);

Walking through the numbers I captured in #14831 (a small standalone snippet reproducing this arithmetic follows the list):

  • The raw (unaligned) address of the data buffer is 0xb40000763d4bfa90.
  • The target alignment is 64 bytes.
  • Casting the address to intptr_t gives -5476376639047992688.
    • Mod 64 is -48.
    • The total offset applied is 64 - (-48) = 112.
  • Since the allocation size is N + 64, increasing the start by 112 means the new region extends 48 bytes past the end of the allocation.
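
For illustration only, here is a standalone snippet (not part of the patch) that reproduces this arithmetic, assuming a 64-bit target where intptr_t is 64 bits wide:

#include <cstdint>
#include <cstdio>

int main() {
  // Address observed in #14831, treated as an integer (uint64_t/int64_t stand
  // in for uintptr_t/intptr_t on a 64-bit target).
  std::uint64_t raw = 0xb40000763d4bfa90ull;

  // Signed interpretation (what the buggy code does): the two's-complement
  // value is negative, and C++ integer division truncates toward zero, so
  // the remainder of % 64 is negative too.
  std::int64_t as_signed = static_cast<std::int64_t>(raw);
  std::printf("signed   %% 64 = %lld, offset = %lld\n",
              static_cast<long long>(as_signed % 64),
              static_cast<long long>(64 - as_signed % 64));
  // -> signed   % 64 = -48, offset = 112 (48 bytes past the +64 slack)

  // Unsigned interpretation: % 64 is the actual misalignment.
  std::printf("unsigned %% 64 = %llu, offset = %llu\n",
              static_cast<unsigned long long>(raw % 64),
              static_cast<unsigned long long>(64 - raw % 64));
  // -> unsigned % 64 = 16, offset = 48 (stays inside the +64 slack)
  return 0;
}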

To resolve this, I replaced the alignment code with a call to std::align. Casting to uintptr_t also resolves it, but using the standard library implementation seems less error-prone.
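
For reference, a minimal sketch of the std::align-based approach (illustrative names, not the exact code that landed in XNNWeightsCache.cpp), assuming the buffer is allocated with 64 bytes of slack as described above:

#include <cstddef>
#include <memory>  // std::align

// Illustrative sketch: return a 64-byte-aligned pointer into a buffer that
// holds n + 64 bytes, or nullptr if alignment fails (not expected here, since
// at most 63 bytes of adjustment are ever needed).
void* align64(void* maybe_aligned_space, std::size_t n) {
  void* aligned_space = maybe_aligned_space;
  std::size_t space = n + 64;  // total bytes available in the container
  // std::align bumps aligned_space forward to the next 64-byte boundary and
  // shrinks space accordingly; it never interprets the address as signed.
  return std::align(/*alignment=*/64, /*size=*/n, aligned_space, space);
}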

Test plan

I've validated that the repro in #14831 does not crash with this change.


(cherry picked from commit 7421646)

pytorch-bot bot commented Oct 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15090

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

⏳ No Failures, 4 Pending

As of commit 7b1103a with merge base e0dda90:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Oct 14, 2025
@mergennachin mergennachin self-requested a review October 14, 2025 02:27
@GregoryComer GregoryComer merged commit 2897bde into release/1.0 Oct 14, 2025
121 of 124 checks passed
@GregoryComer GregoryComer deleted the cherry-pick-15039-by-pytorch_bot_bot_ branch October 14, 2025 03:57