-
Notifications
You must be signed in to change notification settings - Fork 12.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nest array initialization is not optimized #58625
Comments
Similarly, if I do |
No changes: https://godbolt.org/z/o1xxd1 Also for the uninitialized case: https://godbolt.org/z/EEh9ae This one is surprising to me. |
The uninitialized case seems to be due to a bug in the location-aware MSSA clobber walker, which currently bails on phi starting access. |
Upstream patch: https://reviews.llvm.org/D98557 |
Upstream fix: llvm/llvm-project@b2f933a Assuming no phase ordering issues, we should be generating optimal code for both the zero-initialization and the uninitialized case after this. |
After #87570 the uninitialized case results in:
The memcpy is eliminated, but we keep around a dead loop. The LoopDeletion pass runs before MemCpyOpt. And the zero-initialized results in:
The memcpy is gone here as well, but we still perform memsets in a loop, instead of one large memset. I believe LoopIdiomRecognize would handle this, but it also runs before MemCpyOpt. |
The uninit case looks good now: https://godbolt.org/z/vfo9PM1ca The init case still has memsets that could be combined: https://godbolt.org/z/eba6GT5xq |
See the following code:
Ideally they should generate the same code as
[[i32; 1000]; 1000]
is essentially just[i32; 1_000_000]
. However, the first function does onememset
to set a[i32; 1000]
on the stack, then usememcpy
to generate the big array, while the second just uses a singlememset
.The code generated from the first is way larger than the second, and I would also expect it to be much slower given its repeatedly invoking
memcpy
.The text was updated successfully, but these errors were encountered: