BUG: Fix unaligned memory access in npy_memchr.
#21117
Thanks! We could remove the alignment-required flag if we make this change, I think? I have to admit, I am a bit curious whether this is worthwhile and whether it might slow things down a bit. Or is this actually incorrect, and not just the sanitizer being very strict? If you are up for it, I would be interested whether the
I'm pretty busy right now but can probably pick this up sometime in the next couple of weeks. I wouldn't be surprised if this actually makes things a bit faster on largish arrays. As things stand, if the array isn't uint-aligned, you would be repeatedly performing unaligned reads in the coarse-grained loop, and these may or may not come with a performance penalty (see e.g. https://stackoverflow.com/questions/70964604). We can also replace the modulo calculation with Edit: another thing to think about.
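To make the alignment point concrete, here is a minimal sketch of a zero-scan that first steps byte by byte to an `unsigned int` boundary, so every word read in the coarse-grained loop is aligned. This is not NumPy's actual `npy_memchr`; the function name `scan_first_nonzero` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not NumPy's npy_memchr: return the index of the
 * first non-zero byte in p[0..n), or n if every byte is zero. */
static size_t scan_first_nonzero(const unsigned char *p, size_t n)
{
    size_t i = 0;

    /* Head: advance byte by byte until p + i is aligned for unsigned int. */
    while (i < n && (uintptr_t)(p + i) % sizeof(unsigned int) != 0) {
        if (p[i] != 0) {
            return i;
        }
        i++;
    }

    /* Body: test one aligned word per iteration. memcpy keeps the load
     * free of strict-aliasing issues; compilers usually emit a single load. */
    while (n - i >= sizeof(unsigned int)) {
        unsigned int word;
        memcpy(&word, p + i, sizeof word);
        if (word != 0) {
            break; /* the tail loop pinpoints the non-zero byte */
        }
        i += sizeof word;
    }

    /* Tail: remaining bytes, or the bytes of the word that was non-zero. */
    while (i < n && p[i] == 0) {
        i++;
    }
    return i;
}
```

The head loop runs at most `sizeof(unsigned int) - 1` times per call; that fixed cost is what matters most for very short scans.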
Well, we are also interested in random arrays (i.e. we should aim to do well for short loops); on the other hand, short loops have a lot of other overheads, so it may not matter there. The % will be optimized away, no problem. Possibly just noise, or the additional code leads gcc to make different optimization choices. The more "mixed" cases seem to barely notice, though.
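On the % point: the divisor is a compile-time power of two and the operand is unsigned, so the remainder reduces to a bit mask. A small sketch (helper names hypothetical, and assuming `sizeof(unsigned int)` is a power of two, as on mainstream targets) showing that the two forms compute the same value:

```c
#include <stdint.h>

/* Assumes sizeof(unsigned int) is a power of two, as on mainstream targets. */
_Static_assert((sizeof(unsigned int) & (sizeof(unsigned int) - 1)) == 0,
               "sizeof(unsigned int) must be a power of two");

/* Hypothetical helpers: both give the offset of addr past the previous
 * unsigned-int boundary. With an unsigned operand and a constant
 * power-of-two divisor, optimizing compilers typically emit the same
 * single AND instruction for both. */
static uintptr_t misalignment_mod(uintptr_t addr)
{
    return addr % sizeof(unsigned int);
}

static uintptr_t misalignment_mask(uintptr_t addr)
{
    return addr & (sizeof(unsigned int) - 1);
}
```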
Thanks! In the meantime, I found a reference describing a real situation where this kind of UB results in an actual crash: https://blog.quarkslab.com/unaligned-accesses-in-cc-what-why-and-solutions-to-do-it-properly.html
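For reference, the usual well-defined way to read a possibly unaligned word in C is to memcpy it into a local, which compilers generally lower to a single load on targets that allow unaligned access. A minimal sketch (the helper name is hypothetical):

```c
#include <string.h>

/* Undefined behaviour when p is not suitably aligned (and a strict-aliasing
 * violation when p points into a char buffer):
 *
 *     unsigned int v = *(const unsigned int *)p;
 *
 * Well-defined alternative; the helper name is hypothetical. */
static unsigned int load_uint_unaligned(const void *p)
{
    unsigned int v;
    memcpy(&v, p, sizeof v); /* byte copy has no alignment requirement */
    return v;
}
```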
@saran-t we test on platforms that do not support unaligned access. This code path is only taken when that define evaluates to 0, which is only the case on selected CPUs/platforms. So we know that this is OK, unless the compiler goes one step further and somehow optimizes this for SIMD instructions that have stricter alignment requirements.
Ah, fair enough. I thought
Considering that my timings were a bit mixed, I am somewhat tempted to close this, at least unless we see it as a speed enhancement on CPUs that do not support unaligned access. Or is that sanitizer warning problematic? (I don't think the sanitizer is right to worry here.)
The issue with the sanitizer firing is that it means we cannot use UBSan to detect bugs in our own code that depends on NumPy. An alternative "fix" would be to add
What is the status of this?
Let me make a PR adding the attribute on clang for now, so we can backport that. I do think there may be value in this (or similar changes) for optimizing this code. But right now, as far as I understand, the code is correct, and clang is only half correct (in the sense that unaligned access is a problem on some platforms, but is fine on the one where this code path actually runs). And since I am not sure this would not slow down certain common code paths, I think any optimization should come with benchmarks and a bit more care.
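A sketch of what such an annotation could look like. The thread does not show the exact attribute, so Clang's no_sanitize("alignment") is an assumption here, as are the macro name NO_SANITIZE_ALIGNMENT and the function name. It only silences UBSan's alignment check for that one function; it does not make an unaligned access valid on strict-alignment targets.

```c
/* Assumption: Clang's no_sanitize("alignment") is the attribute meant; the
 * macro and function names are hypothetical. Other compilers get an empty
 * macro. */
#if defined(__clang__) && defined(__has_attribute)
#  if __has_attribute(no_sanitize)
#    define NO_SANITIZE_ALIGNMENT __attribute__((no_sanitize("alignment")))
#  endif
#endif
#ifndef NO_SANITIZE_ALIGNMENT
#  define NO_SANITIZE_ALIGNMENT
#endif

/* UBSan's -fsanitize=alignment check is disabled for this function only.
 * This does not make an unaligned read valid on strict-alignment targets;
 * callers must still only reach it where such reads are supported. */
NO_SANITIZE_ALIGNMENT
static int word_is_zero(const char *p)
{
    return *(const unsigned int *)p == 0;
}
```

The nested #if avoids evaluating __has_attribute on compilers that do not provide it.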
Clang's sanitizer reports an unaligned access here, which is correct but intentional. The code may well be better off avoiding this unaligned access (and vectorizing harder instead). But this is a tricky choice, since we have to optimize for different use cases (in particular, very short scans may matter). So changing this is best done together with more careful benchmarks. See also numpygh-21117, which introduced manual loop unrolling to avoid the unaligned access. Closes numpygh-21116
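A minimal sketch of the manual-unrolling idea, assuming it means replacing the word-sized load with several byte loads per iteration. This is not the code from numpygh-21117, and the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical sketch: index of the first non-zero byte in p[0..n), or n.
 * Only single-byte loads are used, so alignment never matters; unrolling
 * by four keeps the per-byte loop overhead low. */
static size_t first_nonzero_unrolled(const unsigned char *p, size_t n)
{
    size_t i = 0;

    /* Unrolled body: OR four bytes together and stop at the first block
     * that contains a non-zero byte. */
    for (; n - i >= 4; i += 4) {
        if ((p[i] | p[i + 1] | p[i + 2] | p[i + 3]) != 0) {
            break;
        }
    }

    /* Tail: the last n % 4 bytes, or the block that triggered the break. */
    for (; i < n; i++) {
        if (p[i] != 0) {
            return i;
        }
    }
    return n;
}
```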
Fixes #21116