'-march=native' incorrectly emits VZEROUPPER with Knights Landing #35967

Closed
davezarzycki opened this issue Mar 6, 2018 · 8 comments
Labels
backend:X86 bugzilla Issues migrated from bugzilla

Comments

@davezarzycki
Member

Bugzilla Link 36619
Resolution FIXED
Resolved on Apr 06, 2018 17:59
Version trunk
OS All
Blocks #35997
CC @topperc,@davezarzycki,@RKSimon
Fixed by commit(s) r326840 r329473

Extended Description

The following code compiled on Knights Landing incorrectly generates VZEROUPPER with '-march=native', and correctly does NOT emit VZEROUPPER with '-march=knl':

class alignas(32) BigThing { int i[8]; };
void test(BigThing *in, BigThing *out) { *out = *in; }

@topperc
Collaborator

topperc commented Mar 6, 2018

Can you provide an IR file so we can see what CPU it was detected as?

Also, if you're on Linux, could you provide a copy of /proc/cpuinfo from the system?

@davezarzycki
Member Author

$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 87
model name : Intel(R) Xeon Phi(TM) CPU 7290 @ 1.50GHz
stepping : 1
microcode : 0x1ac
cpu MHz : 1003.135
cache size : 1024 KB
physical id : 0
siblings : 288
core id : 0
cpu cores : 72
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait cpuid_fault epb fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2
bogomips : 3000.03
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

... repeated 72 times because, well, that is how /proc/cpuinfo works ...

$ clang -O3 -march=native -S -o - -emit-llvm /tmp/phi_test.cpp
; ModuleID = '/tmp/phi_test.cpp'
source_filename = "/tmp/phi_test.cpp"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%class.BigThing = type { [8 x i32] }

; Function Attrs: nounwind uwtable
define dso_local void @_Z4testP8BigThingS0_(%class.BigThing* nocapture readonly, %class.BigThing* nocapture) local_unnamed_addr #0 {
%3 = bitcast %class.BigThing* %1 to i8*
%4 = bitcast %class.BigThing* %0 to i8*
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 32 %3, i8* align 32 %4, i64 32, i1 false), !tbaa.struct !2
ret void
}

; Function Attrs: argmemonly nounwind
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1

attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="bdver1" "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er,+avx512f,+avx512pf,+bmi,+bmi2,+cmov,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+prfchw,+rdrnd,+rdseed,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt,-avx512bitalg,-avx512bw,-avx512dq,-avx512ifma,-avx512vbmi,-avx512vbmi2,-avx512vl,-avx512vnni,-avx512vpopcntdq,-clflushopt,-clwb,-clzero,-fma4,-gfni,-ibt,-lwp,-mwaitx,-pku,-rdpid,-rtm,-sgx,-sha,-shstk,-sse4a,-tbm,-vaes,-vpclmulqdq,-xop,-xsavec,-xsaves" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { argmemonly nounwind }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 7.0.0 (https://git.llvm.org/git/clang.git 7ee88a970e15132e3fc04fe9f2fc1c422e046ea2) (https://git.llvm.org/git/llvm.git d45c0f1)"}
!2 = !{i64 0, i64 32, !3}
!3 = !{!4, !4, i64 0}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C++ TBAA"}
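
The value that answers "what CPU was it detected as" is the "target-cpu" string attribute on the function's attribute group, which reads "bdver1" in this dump. A minimal sketch of pulling that value out of an attribute line, assuming the quoted "key"="value" layout shown here; attributeValue is a hypothetical helper written for illustration and is not part of LLVM or clang:

```cpp
#include <string>

// Hypothetical helper: returns the value of a quoted "key"="value" string
// attribute found in an LLVM IR attribute line, or an empty string if the
// key is absent or the value is not terminated by a closing quote.
inline std::string attributeValue(const std::string &attrs,
                                  const std::string &key) {
  const std::string needle = "\"" + key + "\"=\"";
  const std::string::size_type start = attrs.find(needle);
  if (start == std::string::npos)
    return "";
  const std::string::size_type valueBegin = start + needle.size();
  const std::string::size_type valueEnd = attrs.find('"', valueBegin);
  if (valueEnd == std::string::npos)
    return "";
  return attrs.substr(valueBegin, valueEnd - valueBegin);
}
```

Calling attributeValue(line, "target-cpu") on the attribute line from this dump would give the detected CPU name, and attributeValue(line, "target-features") would give the comma-separated feature list for comparison.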

@davezarzycki
Member Author

PS – Thanks Craig for being so responsive to bug reports :-)

@topperc
Collaborator

topperc commented Mar 6, 2018

Well, this is a scary bug. KNL is being identified as "bdver1". Silvermont gets identified as "amdfam10h", I think.
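
For context, the "cpu family" and "model" fields in the /proc/cpuinfo output come from CPUID leaf 1. A minimal sketch of the standard decoding, assuming the usual Intel bit layout of EAX; the names (CpuSignature, decodeSignature, looksLikeKnightsLanding) are hypothetical, and this is not the detection code in LLVM. The sample value 0x50671 is constructed from the cpuinfo values reported in this thread (family 6, model 87, stepping 1) rather than read from hardware:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration type: the decoded CPUID leaf 1 signature.
struct CpuSignature {
  unsigned family;
  unsigned model;
  unsigned stepping;
};

// Base family is EAX bits 11:8.
constexpr unsigned baseFamily(std::uint32_t eax) { return (eax >> 8) & 0xF; }

// The extended family (bits 27:20) is added only when the base family is 0xF.
constexpr unsigned displayFamily(std::uint32_t eax) {
  return baseFamily(eax) == 0xF ? baseFamily(eax) + ((eax >> 20) & 0xFF)
                                : baseFamily(eax);
}

// The extended model (bits 19:16) supplies the high nibble of the model
// when the base family is 6 or 0xF; otherwise only bits 7:4 are used.
constexpr unsigned displayModel(std::uint32_t eax) {
  return (baseFamily(eax) == 6 || baseFamily(eax) == 0xF)
             ? ((((eax >> 16) & 0xF) << 4) | ((eax >> 4) & 0xF))
             : ((eax >> 4) & 0xF);
}

constexpr CpuSignature decodeSignature(std::uint32_t eax) {
  return CpuSignature{displayFamily(eax), displayModel(eax), eax & 0xF};
}

// Matches the family/model pair reported in this thread for the Xeon Phi 7290.
constexpr bool looksLikeKnightsLanding(CpuSignature s) {
  return s.family == 6 && s.model == 0x57;
}

// Runtime self-check using the sample value built from the cpuinfo fields.
inline void selfCheck() {
  const CpuSignature s = decodeSignature(0x50671u);
  assert(s.family == 6 && s.model == 87 && s.stepping == 1);
  assert(looksLikeKnightsLanding(s));
}
```

A family 6, model 87 (0x57) signature should never map to an AMD CPU name, which is why the "bdver1" result stands out.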

@topperc
Collaborator

topperc commented Mar 6, 2018

Fix committed in r326840.

@topperc
Collaborator

topperc commented Mar 6, 2018

Hopefully we can get this into the next 6.0 branch release

@tstellar
Collaborator

tstellar commented Apr 7, 2018

Merged: r329473

@zmodem
Collaborator

zmodem commented Nov 27, 2021

mentioned in issue #35997

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.