-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[AMDGPU] Support true16 spill restore with sram-ecc #165320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1823,6 +1823,16 @@ void SIRegisterInfo::buildSpillLoadStore( | |
| } | ||
| } | ||
|
|
||
| Register FinalValueReg = ValueReg; | ||
| if (LoadStoreOp == AMDGPU::SCRATCH_LOAD_USHORT_SADDR) { | ||
| // If we are loading 16-bit value with SRAMECC endabled we need a temp | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Isn't this also an issue for the 8-bit cases?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We do not have V8 spills. |
||
| // 32-bit VGPR to load and extract 16-bits into the final register. | ||
| ValueReg = | ||
| RS->scavengeRegisterBackwards(AMDGPU::VGPR_32RegClass, MI, false, 0); | ||
| SubReg = ValueReg; | ||
| IsKill = false; | ||
| } | ||
|
|
||
| MachinePointerInfo PInfo = BasePtrInfo.getWithOffset(RegOffset); | ||
| MachineMemOperand *NewMMO = | ||
| MF->getMachineMemOperand(PInfo, MMO->getFlags(), RemEltSize, | ||
|
|
@@ -1863,6 +1873,17 @@ void SIRegisterInfo::buildSpillLoadStore( | |
| MIB.addImm(0); // swz | ||
| MIB.addMemOperand(NewMMO); | ||
|
|
||
| if (FinalValueReg != ValueReg) { | ||
| // Extract 16-bit from the loaded 32-bit value. | ||
| ValueReg = getSubReg(ValueReg, AMDGPU::lo16); | ||
|
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Technically I can even use the same SCRATCH_LOAD_SHORT_D16_SADDR_t16 as w/o sramecc, but I would need to chose lo16 or hi16 here. I do not think this is really needed.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed. It is simpler the way you have it now. |
||
| MIB = BuildMI(MBB, MI, DL, TII->get(AMDGPU::V_MOV_B16_t16_e64)) | ||
| .addReg(FinalValueReg, getDefRegState(true)) | ||
| .addImm(0) | ||
| .addReg(ValueReg, getKillRegState(true)) | ||
| .addImm(0); | ||
| ValueReg = FinalValueReg; | ||
| } | ||
|
|
||
| if (!IsAGPR && NeedSuperRegDef) | ||
| MIB.addReg(ValueReg, RegState::ImplicitDefine); | ||
|
|
||
|
|
@@ -2505,7 +2526,9 @@ bool SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI, | |
| unsigned Opc; | ||
| if (MI->getOpcode() == AMDGPU::SI_SPILL_V16_RESTORE) { | ||
| assert(ST.enableFlatScratch() && "Flat Scratch is not enabled!"); | ||
| Opc = AMDGPU::SCRATCH_LOAD_SHORT_D16_SADDR_t16; | ||
| Opc = ST.d16PreservesUnusedBits() | ||
| ? AMDGPU::SCRATCH_LOAD_SHORT_D16_SADDR_t16 | ||
| : AMDGPU::SCRATCH_LOAD_USHORT_SADDR; | ||
| } else { | ||
| Opc = MI->getOpcode() == AMDGPU::SI_BLOCK_SPILL_V1024_RESTORE | ||
| ? AMDGPU::SCRATCH_LOAD_BLOCK_SADDR | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no check for sramecc enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SCRATCH_LOAD_USHORT_SADDR is only used with sramecc, that was checked earlier.