BUG: Fix unaligned memory access in `npy_memchr`. #21117

saran-t · 2022-02-24T21:52:09Z

seberg · 2022-02-24T22:52:49Z

Thanks! We could remove the alignment required flag if we do this change, I think? I have to admit, I am a bit curious if this is worthwhile and whether it might slow down things a bit. Or is this actually incorrect and not just the sanitizer being very strict?

If you are up for it, I would be interested whether the bench_io benchmarks, pick this up: python3 runtests.py --bench-compare main --bench bench_io should work. (I think they use this code for some of the copyto benchmarks.)

saran-t · 2022-02-24T23:23:20Z

I'm pretty busy right now but can probably pick this up sometime in the next couple of weeks. I wouldn't be surprised if this actually makes things a bit faster on largeish arrays. As thing stands if the array isn't uint-aligned then you would be repeatedly performing unaligned reads in the coarse-grained loop, and these may or may not come with a performance penalty (see e.g. https://stackoverflow.com/questions/70964604).

We can also replace the modulo calculation with switch sizeof(unsigned int) and bit shifts, assuming that sizeof(unsigned int) is a power of two (which is a safe assumption on any platform). Since the switch condition is a compile-time constant there shouldn't be any runtime branching. If we do that then I would not expect the change to have any noticeable negative performance impact (famous last words 🙂 ).

Edit: another thing to think about.

If the array size is < sizeof(unsigned int) then this change doesn't actually increase the number of comparisons at all.
If the array is already aligned then this change is a no-op, other than the modulo calculation upfront (which as mentioned, can be further optimised).
If the array is unaligned but its size isn't a multiple of sizeof(unsigned int), then half the time we also don't increase the number of comparisons. If the array is
If the array is large enough then the cost of the initial loop will be comparatively insubstantial.

seberg · 2022-02-25T00:19:34Z

repeatedly performing unaligned reads

Well, we are also interested in random arrays (i.e. we should aim to do well for short loops) – OTOH short loops have a lot of other overheads, so it may not matter there. The % will be optimized away, no problem.
I ran a quick test locally, and annoyingly it is the test that should not notice this change at all that slows down (i.e. where this loop just keeps going) – and the case that should bail out right away once (use the other branch mostly) is much faster...

Possibly just random, or the additional code makes gcc do some different optimization choices. The more "mixed" cases, seem to barely notice though.
May, need to time it more manual if I want results that can be trusted...

saran-t · 2022-02-25T00:26:00Z

Thanks! In the meantime, I found a reference for a real situation where this UB can result in a real crash: https://blog.quarkslab.com/unaligned-accesses-in-cc-what-why-and-solutions-to-do-it-properly.html

seberg · 2022-02-25T00:29:36Z

@saran-t we test on platforms that do not support unaligned access. This code path is only taken when that define evaluates to 0, which is only the case on selected CPUs/platforms. So we know that this is OK. Unless, the compiler goes one step further and optimizes this for SIMDs that have higher alignment requirements somehow.

saran-t · 2022-02-25T00:30:22Z

Ah, fair enough. I thought NPY_ALIGNMENT_REQUIRED was a property of the array but on second reading that's clearly a global macro.

seberg · 2022-03-07T21:17:38Z

Considering that my timings were a bit mixed, I am somewhat tempted to close this. At least unless we see it as a speed-enhancement on CPUs that do not support this unaligned access.

Or is that sanitizer warning problematic? (since I don't think the sanitizer is right to worry here.)

saran-t · 2022-03-07T21:40:20Z

The issue with the sanitizer firing is that it means that we cannot use UBsan to detect bugs in our own code that depends on NumPy. An alternative "fix" would be to add __attribute__((no_sanitize("alignment"))) to the function, guarded behind #ifdef __clang__.

charris · 2022-05-05T17:13:07Z

What is the status of this?

seberg · 2022-05-06T13:32:43Z

Let me make a PR adding the attribute on clang for now, so we can backport that.

I do think there may be value in this (or similar changes) in terms of optimizing this code. But, right now, as far as I understand, the code is correct; and clang is only half correct (in the sense that it affects some platforms, but is OK on the one where it is run).

And since I am not sure that this might not slow down certain common code paths, I think optimization would be nice with benchmarks and a bit more care.

Clangs sanitizer reports unaligned access here, which is correct but intentional. It may well be that the code would be better of trying to avoid this unaligned access (and rather vectorizing harder). But, this is a bit of a tricky choice, since we have to optimize for different use-cases (in particular very short scans may be interesting). So changing this would best be done together with some more careful benchmarks. See also numpygh-21117, which introduced manual loop unrolling to avoid the unaligned access. Closes numpygh-21116

BUG: Fix unaligned memory access in npy_memchr.

01f1ac9

Fixes numpy#21116

github-actions bot added the 00 - Bug label Feb 24, 2022

charris added the 09 - Backport-Candidate PRs tagged should be backported label Mar 3, 2022

charris closed this Apr 6, 2022

charris reopened this Apr 6, 2022

seberg closed this May 6, 2022

seberg mentioned this pull request May 6, 2022

MAINT: Mark npy_memchr with no_sanitize("alignment") on clang #21467

Merged

charris removed the 09 - Backport-Candidate PRs tagged should be backported label May 6, 2022

charris mentioned this pull request May 9, 2022

MAINT: Mark npy_memchr with no_sanitize("alignment") on clang #21475

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG: Fix unaligned memory access in `npy_memchr`. #21117

BUG: Fix unaligned memory access in `npy_memchr`. #21117

Uh oh!

saran-t commented Feb 24, 2022

Uh oh!

seberg commented Feb 24, 2022

Uh oh!

saran-t commented Feb 24, 2022 •

edited

Loading

Uh oh!

seberg commented Feb 25, 2022

Uh oh!

saran-t commented Feb 25, 2022

Uh oh!

seberg commented Feb 25, 2022

Uh oh!

saran-t commented Feb 25, 2022 •

edited

Loading

Uh oh!

seberg commented Mar 7, 2022

Uh oh!

saran-t commented Mar 7, 2022

Uh oh!

charris commented May 5, 2022

Uh oh!

seberg commented May 6, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

BUG: Fix unaligned memory access in npy_memchr. #21117

BUG: Fix unaligned memory access in npy_memchr. #21117

Uh oh!

Conversation

saran-t commented Feb 24, 2022

Uh oh!

seberg commented Feb 24, 2022

Uh oh!

saran-t commented Feb 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seberg commented Feb 25, 2022

Uh oh!

saran-t commented Feb 25, 2022

Uh oh!

seberg commented Feb 25, 2022

Uh oh!

saran-t commented Feb 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seberg commented Mar 7, 2022

Uh oh!

saran-t commented Mar 7, 2022

Uh oh!

charris commented May 5, 2022

Uh oh!

seberg commented May 6, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

BUG: Fix unaligned memory access in `npy_memchr`. #21117

BUG: Fix unaligned memory access in `npy_memchr`. #21117

saran-t commented Feb 24, 2022 •

edited

Loading

saran-t commented Feb 25, 2022 •

edited

Loading