[AMDGPU] Use alias scope to relax waitcounts for LDS DMA #75974
LDS DMA loads increment VMCNT, and a subsequent load from the LDS location they store to must wait on this counter so that it reads memory only after it has been written. The wait count insertion pass does not track memory dependencies; it tracks register dependencies. To model the LDS dependency, a pseudo register is used in the scoreboard, acting as if the LDS DMA writes it and the LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias analysis. Fixes: SWDEV-433427
Changed getVmemWaitEventType() to use mayWriteLDSThroughDMA instead of isLDSDMA as this is more sound and added a comment.
LDS DMA loads increment VMCNT, and a subsequent load from the LDS location they store to must wait on this counter so that it reads memory only after it has been written. The wait count insertion pass does not track memory dependencies; it tracks register dependencies. To model the LDS dependency, a pseudo register is used in the scoreboard, acting as if the LDS DMA writes it and the LDS load reads it. This patch adds 8 more pseudo registers to use for independent LDS locations if we can prove they are disjoint using alias scope info. Fixes: SWDEV-433427
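For readers unfamiliar with the scoreboard idea, the description above can be illustrated with a small standalone C++17 model. This is only a sketch under stated assumptions, not the code in this patch: the class `LdsDmaScoreboard`, its methods, and the integer `ScopeId` are hypothetical names, the model counts only LDS DMA operations in VMCNT, and it assumes those operations complete in issue order. Slot 0 plays the role of the original catch-all pseudo register; slots 1 to 8 stand in for the 8 additional ones.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <optional>

// Toy model only; every name here is hypothetical and not taken from LLVM.
constexpr int kNumSlots = 9;  // slot 0 = catch-all, slots 1..8 = extra pseudo registers
using ScopeId = int;          // stands in for the identity of an alias scope

class LdsDmaScoreboard {
public:
  // Record an LDS DMA write; returns the slot that tracks it.
  // Slot 0 is always updated so that accesses with unknown scope stay safe.
  int recordDma(std::optional<ScopeId> scope) {
    ++issued_;
    pending_[0] = issued_;
    int slot = scope ? findOrAssignSlot(*scope) : 0;
    pending_[slot] = issued_;
    return slot;
  }

  // vmcnt operand an LDS read would need: how many later DMA operations may
  // still be outstanding. Returns nullopt when no wait is needed.
  std::optional<unsigned> requiredVmcnt(std::optional<ScopeId> scope) const {
    int slot = scope ? findSlot(*scope) : 0;  // untracked scope: be conservative
    unsigned lastWrite = pending_[slot];
    assert(lastWrite <= issued_);
    if (lastWrite <= completed_)
      return std::nullopt;
    return issued_ - lastWrite;
  }

  // Model the effect of waiting until at most `vmcnt` operations are outstanding.
  void applyWait(unsigned vmcnt) {
    if (issued_ > vmcnt)
      completed_ = std::max(completed_, issued_ - vmcnt);
  }

private:
  int findSlot(ScopeId scope) const {
    for (int i = 1; i < usedSlots_; ++i)
      if (scopes_[i] == scope)
        return i;
    return 0;  // not tracked: fall back to the catch-all slot
  }

  int findOrAssignSlot(ScopeId scope) {
    int slot = findSlot(scope);
    if (slot != 0 || usedSlots_ == kNumSlots)
      return slot;  // already tracked, or all extra slots are taken
    scopes_[usedSlots_] = scope;
    return usedSlots_++;
  }

  std::array<ScopeId, kNumSlots> scopes_{};
  std::array<unsigned, kNumSlots> pending_{};  // issue number of last DMA per slot
  int usedSlots_ = 1;                          // slot 0 is always in use
  unsigned issued_ = 0;
  unsigned completed_ = 0;
};
```

In this model, after one DMA into scope 1 followed by one DMA into scope 2, a read from scope 1 only needs to wait until one operation remains outstanding, whereas a read with unknown scope falls back to slot 0 and waits for everything. With a single pseudo register both reads would wait for everything, which is the over-conservative behavior the extra slots are meant to relax.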
✅ With the latest revision this PR passed the C/C++ code formatter.
One thing to note: I am creating this alias.scope myself in the module LDS lowering, so I know exactly what to expect. And since there is this module LDS lowering, even if any alias scope had been created before (which never happens, much less for an intrinsic call), it is already lost. It is lost along with the memory objects deleted by the lowering. That is the whole point of creating alias.scope metadata during the lowering: we are putting all module LDS into a single structure, so no AA will ever disambiguate it without alias scope info. In this situation I am the sole creator of the metadata, of the instructions carrying it, and of the memory object accessed, and I am also the sole consumer of this metadata. At -O0 there will be no LDS lowering, but there will be no AA either. I do not see how to exploit it in practice. One other thing to note here: there is also !noalias metadata generated in the very same place. I do not care about it because I am really searching for a store into this memory, which is a scope. When I was writing the code to generate this metadata, I had exactly this kind of scenario in mind.
This is the place I am creating it: https://reviews.llvm.org/D108315
Ping