Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SystemZ] Register clobber in L128 expansion #91437

Closed
nikic opened this issue May 8, 2024 · 7 comments · Fixed by #92105
Closed

[SystemZ] Register clobber in L128 expansion #91437

nikic opened this issue May 8, 2024 · 7 comments · Fixed by #92105

Comments

@nikic
Copy link
Contributor

nikic commented May 8, 2024

Running https://gist.github.com/nikic/974087a15f48f4495f25361bae3015d2 through llc -mtriple=s390x-- gives something like this:

  lg  %r4, 14920(%r4,%r15)            # 16-byte Folded Reload
  lg  %r5, 14928(%r4,%r15)            # 16-byte Folded Reload

Note that %r4 is used in the load offset calculation, but also one of the result registers.

Originally, this is a L128:

  renamable $r2q = L128 $r15d, 14920, killed $r2d :: (load (s128) from %stack.13, align 8)

That gets expanded into two LG, resulting in the register clobber:

  $r2d = LG $r15d, 14920, $r2d :: (load (s128) from %stack.13, align 8)
  $r3d = LG $r15d, 14928, killed $r2d :: (load (s128) from %stack.13, align 8)
@nikic
Copy link
Contributor Author

nikic commented May 9, 2024

It seems like L128 doesn't get produced for "normal" use of i128, but only for spills from the register allocator(?), so it's probably easier to consider this directly in MIR. The following test case is for -run-pass=postrapseudos:

--- |
  target triple = "s390x-unknown-unknown"

  define void @test() {
    ret void
  }

...
---
name:            'test'
body:             |
  bb.0:
    liveins: $r4d, $r15d
    $r4q = L128 $r15d, 14920, killed $r4d
    Return

...

Results ins:

body:             |
  bb.0:
    liveins: $r4d, $r15d
  
    $r4d = LG $r15d, 14920, $r4d
    $r5d = LG $r15d, 14928, killed $r4d
    Return

@nikic
Copy link
Contributor Author

nikic commented May 9, 2024

cc @JonPsson1 @uweigand

A possible fix would be to swap the two instructions in this case. Though one could still have a situation like

    $r4q = L128 $r5d, 14920, killed $r4d

where neither order would be correct. But I assume that can't happen in practice, at least as L128 is used currently.

@JonPsson1
Copy link
Contributor

cc @JonPsson1 @uweigand

A possible fix would be to swap the two instructions in this case. Though one could still have a situation like

    $r4q = L128 $r5d, 14920, killed $r4d

where neither order would be correct. But I assume that can't happen in practice, at least as L128 is used currently.

Looks to me like you meant to write L128 with $r15d (SP) and not $r5d..?

I agree that it seems correct to change the order of the loads, and also add an assert either that the base register is the SP, or that it is not a subreg of the destination quadword reg.

@JonPsson1 JonPsson1 self-assigned this May 13, 2024
@uweigand
Copy link
Member

Looks to me like you meant to write L128 with $r15d (SP) and not $r5d..?

I think @nikic did mean $r5d - if base and index happen to be the two registers that make up the GR128 (in this case $r4q made up of $r4d and $r5d), there is no way to split the load into component loads without a temporary register.

However, I believe there's no need to support this case, as this instruction is only ever used to access spill slots, so at least of of base and index (typically base, but we probably don't have to assume that) will be either the stack or the frame pointer, which cannot be part of any GR128 that is in use.

I think we should just check whether one of base or index overlap the output and adjust the sequence to accommodate. If both base or index overlap, this should just fail an assertion.

@JonPsson1
Copy link
Contributor

I think @nikic did mean $r5d - if base and index happen to be the two registers that make up the GR128 (in this case $r4q made up of $r4d and $r5d), there is no way to split the load into component loads without a temporary register.

Ah, I see the point now. Patch proposed (see above).

@nikic nikic added this to the LLVM 18.X Release milestone May 14, 2024
JonPsson1 added a commit that referenced this issue May 15, 2024
When expanding an L128 (which is used to reload i128) it is
possible that the quadword destination register clobbers an
address register. This patch adds an assertion against the case
where both of the expanded parts clobber the address, and in the
case where one of the expanded parts do so puts it last.

Fixes #91437
@nikic
Copy link
Contributor Author

nikic commented May 15, 2024

/cherry-pick d6ee7e8

llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue May 15, 2024
When expanding an L128 (which is used to reload i128) it is
possible that the quadword destination register clobbers an
address register. This patch adds an assertion against the case
where both of the expanded parts clobber the address, and in the
case where one of the expanded parts do so puts it last.

Fixes llvm#91437

(cherry picked from commit d6ee7e8)
@llvmbot
Copy link
Collaborator

llvmbot commented May 15, 2024

/pull-request #92221

tstellar pushed a commit to llvmbot/llvm-project that referenced this issue May 16, 2024
When expanding an L128 (which is used to reload i128) it is
possible that the quadword destination register clobbers an
address register. This patch adds an assertion against the case
where both of the expanded parts clobber the address, and in the
case where one of the expanded parts do so puts it last.

Fixes llvm#91437

(cherry picked from commit d6ee7e8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment