Spill/Reload of Constant across CPSCall #11
Looks like this will be tricky to fix. This is the MBB before expanding the CPSCall:
# Machine code for function foo: IsSSA, TracksLiveness
Function Live Ins: %RBP in %vreg1
BB#0: derived from LLVM BB %cazq
Live Ins: %RBP
%vreg1<def> = COPY %RBP; GR64:%vreg1
%vreg2<def> = MOV64rm %RIP, 1, %noreg, <ga:@ghczmprim_GHCziTypes_True_closure>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg2
%vreg3<def,tied1> = ADD64ri8 %vreg2<tied0>, 2, %EFLAGS<imp-def,dead>; GR64:%vreg3,%vreg2
ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%R13<def> = COPY %vreg3; GR64:%vreg3
%RBP<def> = COPY %vreg1; GR64:%vreg1
CPSCALLd64 <ga:@bar>, 1337, 2, 1, <regmask>, %RSP<imp-use>, %R13<imp-use>, %RBP<imp-use>, %RSP<imp-def>, %R13<imp-def>, %RBP<imp-def>
ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%vreg4<def> = COPY %R13; GR64:%vreg4
%vreg5<def> = COPY %RBP; GR64:%vreg5
%R13<def> = COPY %vreg3; GR64:%vreg3
%RBP<def> = COPY %vreg5; GR64:%vreg5
CPSRET 0, %R13, %RBP
Looks like it happens during initial IR -> DAG construction. What a weird time to do an optimization like this.
*** IR Dump After Module Verifier ***
; Function Attrs: naked
define ghccc { i64, i64* } @foo(i64 %__x, i64* %__y) #0 {
cazq:
%lnbeg = ptrtoint i8* @ghczmprim_GHCziTypes_True_closure to i64
%lnbeh = add i64 %lnbeg, 2
%retVals = call ghccc { i64, i64* } ({ i64, i64* } (i64, i64*)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_i64p0i64s.p0f_sl_i64p0i64si64p0i64f({ i64, i64* } (i64, i64*)* @bar, i64 1337, i32 2, i16 1, i64 %lnbeh, i64* %__y)
%lnbeg2 = ptrtoint i8* @ghczmprim_GHCziTypes_True_closure to i64
%lnbeh2 = add i64 %lnbeg2, 2
%updated = insertvalue { i64, i64* } %retVals, i64 %lnbeh2, 0
ret { i64, i64* } %updated
}
=== foo
Initial selection DAG: BB#0 'foo:cazq'
SelectionDAG has 30 nodes:
t0: ch = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
t7: i64 = add GlobalAddress:i64<i8* @ghczmprim_GHCziTypes_True_closure> 0, Constant:i64<2>
t8: i64 = GlobalAddress<{ i64, i64* } (i64, i64*)* @bar> 0
t13: ch,glue = callseq_start t0, TargetConstant:i64<0>, TargetConstant:i64<0>
t15: ch,glue = CopyToReg t13, Register:i64 %R13, t7
t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
t17: ch,glue = CopyToReg t15, Register:i64 %RBP, t4, t15:1
t20: ch,glue = X86ISD::CPS_CALL t17, TargetGlobalAddress:i64<{ i64, i64* } (i64, i64*)* @bar> 0, Constant:i64<1337>, Constant:i32<2>, Constant:i16<1>, Register:i64 %R13, Register:i64 %RBP, RegisterMask:Untyped, t17:1
t21: ch,glue = callseq_end t20, TargetConstant:i64<0>, TargetConstant:i64<0>, t20:1
t22: i64,ch,glue = CopyFromReg t21, Register:i64 %R13, t21:1
t23: i64,ch,glue = CopyFromReg t22:1, Register:i64 %RBP, t22:2
t24: i64,i64 = merge_values t22, t23
t25: i64,i64 = merge_values t7, t24:1
t27: ch,glue = CopyToReg t23:1, Register:i64 %R13, t25
t28: ch,glue = CopyToReg t27, Register:i64 %RBP, t25:1, t27:1
t29: ch = X86ISD::CPS_RET t28, TargetConstant:i32<0>, Register:i64 %R13, Register:i64 %RBP, t28:1
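In the dump above, a single node `t7` feeds both the `CopyToReg` before `CPS_CALL` (`t15`) and the `merge_values` after it (`t25`), even though the IR computes the value twice (`%lnbeh` and `%lnbeh2`). One way such sharing can arise is node uniquing (hash-consing) while the DAG is built. The following is a minimal sketch of that general technique in Python, not LLVM's actual data structure; `ToyDag` and `get_node` are hypothetical names used only for illustration.

```python
class ToyDag:
    """Toy DAG builder that uniques nodes by (opcode, operands)."""

    def __init__(self):
        self._ids = {}    # (opcode, operands) -> node id
        self.nodes = []   # node id -> (opcode, operands)

    def get_node(self, opcode, *operands):
        # Return the existing node if an identical one was already created.
        key = (opcode, operands)
        if key not in self._ids:
            self._ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self._ids[key]


dag = ToyDag()

# Before the call: %lnbeh = add (ptrtoint @True_closure), 2
before = dag.get_node("add",
                      dag.get_node("GlobalAddress", "True_closure"),
                      dag.get_node("Constant", 2))

# After the call: the IR recomputes the same expression as %lnbeh2.
after = dag.get_node("add",
                     dag.get_node("GlobalAddress", "True_closure"),
                     dag.get_node("Constant", 2))

print(before == after)   # True: both computations map to one node
print(len(dag.nodes))    # 3: GlobalAddress, Constant, add
```

Because the builder keys nodes only on opcode and operands, the position of the computation relative to the call is lost, which is consistent with the single `t7` seen in the dump.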
Turns out that it's due to
|
Attempts to clear the
One key assumption seems to be that all nodes created exist in one of these data structures, unless if
Current solution I'm thinking of is to run a PreISel IR->IR pass. It's the perfect time to do it since that happens after codegen prepare and immediately before isel starts. Also, there's already infrastructure for this, which can even be target specific:
|
CSE breaks things again:
|
The workaround for this is to pass
The best solution to this is to not mark a
It seems that the prep pass and disabling machine CSE do not completely solve the issue. Here's why:
_r7IJ_info$def:
## BB#0: ## %c7WR
subq $24, %rsp
movq %r14, %rax
leaq -48(%rbp), %rcx
cmpq %r15, %rcx
jb LBB2_26 # basic heap/stack test
## BB#1:
movq _ghczmprim_GHCziTypes_False_closure@GOTPCREL(%rip), %rcx
incq %rcx
movq %rcx, 16(%rsp) ## 8-byte Spill
movq _ghczmprim_GHCziTypes_True_closure@GOTPCREL(%rip), %rcx
addq $2, %rcx
movq %rcx, 8(%rsp) ## 8-byte Spill
# ... several CPS calls later within this function ...
LBB2_18: ## %c7Xj
movq 8(%rbp), %rdi
movq 32(%rbp), %rsi
andl $7, %ebx
addq $40, %rbp
cmpq $2, %rbx
jne LBB2_19
## BB#20: ## %c7Xv
movq 16(%rsp), %rax ## 8-byte Reload
# .... other stuff ....
LBB2_19: ## %c7Xr
movq 8(%rsp), %rax ## 8-byte Reload
The LLVM for this is quite different.
c7Xj:
; ...
switch i64 %ln829, label %c7Xr [i64 1, label %c7Xr
i64 2, label %c7Xv]
c7Xr:
; ...
%ln82g = ptrtoint i8* @ghczmprim_GHCziTypes_True_closure to i64
%ln82h = add i64 %ln82g, 2
c7Xv:
; ...
%ln82o = ptrtoint i8* @ghczmprim_GHCziTypes_False_closure to i64
%ln82p = add i64 %ln82o, 1
We only reference those globals once each within the function, and LLVM does something stupid by floating both of those global offset computations to the beginning of the function and "passing" the values via spill/reload from the stack. Why would anyone think this is a good optimization for LLVM to ever perform? This again goes back to the discussion in #19, where we just make LLVM not do this nonsensical thing.
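The effect described here can be modeled with a toy linear instruction stream. The sketch below is an illustration only, not LLVM code. It assumes every register is clobbered by a call, so any value defined before a call and used after it must be spilled and reloaded. `Inst`, `live_across_call`, and `sink_to_uses` are hypothetical names.

```python
from collections import namedtuple

# op is one of "const" (define a constant), "call", or "use" (consume a constant).
Inst = namedtuple("Inst", ["op", "name"])


def live_across_call(insts):
    """Return the constants whose most recent def precedes a call that precedes a use."""
    crossing = set()
    defined_at = {}   # name -> index of the most recent def
    last_call = -1    # index of the most recent call seen so far
    for i, inst in enumerate(insts):
        if inst.op == "const":
            defined_at[inst.name] = i
        elif inst.op == "call":
            last_call = i
        elif inst.op == "use" and defined_at[inst.name] < last_call:
            crossing.add(inst.name)
    return crossing


def sink_to_uses(insts):
    """Drop each constant def and re-emit it immediately before every use."""
    out = []
    for inst in insts:
        if inst.op == "const":
            continue
        if inst.op == "use":
            out.append(Inst("const", inst.name))
        out.append(inst)
    return out


# Shape of the generated code: both constants computed at entry, used after calls.
hoisted = [
    Inst("const", "False_closure+1"),
    Inst("const", "True_closure+2"),
    Inst("call", None),
    Inst("call", None),
    Inst("use", "False_closure+1"),
    Inst("use", "True_closure+2"),
]

print(sorted(live_across_call(hoisted)))        # both constants cross a call
print(live_across_call(sink_to_uses(hoisted)))  # set(): nothing crosses a call
```

In this model, recomputing each constant next to its single use (which is what the input IR already does) leaves nothing live across a call, so no spill slot is needed.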
It seems that this is caused by an optimization in |
These constants were hoisted by Machine LICM. The workaround for this case is to use
It really shouldn't happen at all. It makes the code worse to hoist this. The "loop" here is actually a GC loop, I think, which is not hot.
In the example, the constant global's address is spilled and reloaded from the C stack across the CPS call, which should not happen.