New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop compile-time CPU detection #370
Conversation
So that we can generate AVX binaries on non-AVX machines. See https://reproducible-builds.org/ for why this matters. Fixes gnuradio#369
|
LGTM. CI passes. Archs are still present in CI. So this seems to be a good change. Does that enable reproducable builds for VOLK? Or are there other issues pending? @bmwiedemann Do you have a CLA? |
|
I think for this small a change we don't actually need a CLA ... but, if you want to do more PRs then it would be wise to have one. |
|
Duplicating my comment on the issue: We added this configure execution check a long time ago because Volk didn't have the runtime capable of failing on a compiled kernel ... I mean, why enable a kernel if it won't execute on the target CPU? I agree that it would be better to allow any compilable kernel to be compiled, then do runtime detection and disabling of kernels based on the target CPU. I don't honestly know that Volk can do this yet. |
|
@jdemel it is the only reproducibility issue I found for openSUSE, but Debian people might find others later, because they vary more build environment parameters (build user, path etc) https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/volk.html mentions "records build flags" (CFLAGS). E.g. do one build with Edit: it also captures build path in html docs (<title> and somewhere else) |
|
For the CLA, I do not forsee doing more patches. I can state that I did all of these changes myself, my employer (SUSE) is totally fine with me contributing to open-source projects and therefore I declare my work CC0 licensed that allows FSF to claim copyright. |
|
We know that we can merge this PR. Further, CI passes. Also, it gives us reproducible builds which we want too. But before we can merge this PR, we need to figure out if this would break things for some users. VOLK should dynamically detect CPU capabilities. Also, only the AVX kernel is not compiled. Even newer ones, e.g. AVX2, are. @michaelld what are the exact circumstances under which VOLK fails? Why was this decision made? |
I honestly don't recall the -exact- circumstances, but here's the general memory: Thus, if the target CPU can't execute basic The above works really well when compiling from source for a specific target CPU. But if we don't take the CPU into account -- just what the compiler supports -- the actual execution check should be removed. Other code that should be removed is (at least) < Lines 179 to 181 in de93c27
I'm not against this change. But I do want to make sure the Volk runtime will properly handle kernel failures such as what this specific PR's current code targets (the |
A quick code review ... looks like this is the only other such CPU execution check. Thinking it would be wise to add removing it to this PR ... to be complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tested this patch successfully on an old server with a CPU that does not have AVX instructions (only SSE4.2). Executing test_xgetbv result in a SIGILL.
$gdb --ex run --args ./lib/test_xgetbv
Program received signal SIGILL, Illegal instruction.
0x0000000000401112 in _xgetbv (index=0) at /root/volk/build/lib/test_xgetbv.c:2
2 unsigned long long _xgetbv(unsigned int index) { unsigned int eax, edx; __VOLK_ASM __VOLK_VOLATILE("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index)); return ((unsigned long long)edx << 32) | eax; } int main (void) { (void) _xgetbv(0); return (0); }
(gdb) bt
#0 0x0000000000401112 in _xgetbv (index=0) at /root/volk/build/lib/test_xgetbv.c:2
#1 0x000000000040113b in main () at /root/volk/build/lib/test_xgetbv.c:2
(gdb) disassemble
Dump of assembler code for function _xgetbv:
0x0000000000401106 <+0>: push %rbp
0x0000000000401107 <+1>: mov %rsp,%rbp
0x000000000040110a <+4>: mov %edi,-0x14(%rbp)
0x000000000040110d <+7>: mov -0x14(%rbp),%eax
0x0000000000401110 <+10>: mov %eax,%ecx
=> 0x0000000000401112 <+12>: xgetbv
0x0000000000401115 <+15>: mov %eax,-0x4(%rbp)
The runtime detection is implemented here
Lines 47 to 55 in faf230e
| for(i=0; i<n_volk_machines; i++) { | |
| if(!(volk_machines[i]->caps & (~volk_get_lvarch()))) { | |
| if(volk_machines[i]->caps > max_score) { | |
| max_score = volk_machines[i]->caps; | |
| max_machine = volk_machines[i]; | |
| } | |
| } | |
| } | |
| machine = max_machine; |
and
get_machine() function is called in the various __init* function in build/volk/lib/volk.c.
(gdb) bt
#0 0x00007ffff74b665e in get_machine () at /home/vasko/gr38/src/gnuradio38/build/volk/lib/volk.c:7740
#1 __init_volk_32fc_s32fc_x2_rotator_32fc () at /home/vasko/gr38/src/gnuradio38/build/volk/lib/volk.c:7740
#2 0x00007ffff74b6919 in __volk_32fc_s32fc_x2_rotator_32fc (outVector=0x7ffff4eef010, inVector=0x7ffff4fb3010, phase_inc=<optimized out>,
phase=0x7fffffffb8c0, num_points=100000) at /home/vasko/gr38/src/gnuradio38/build/volk/lib/volk.c:7770
#3 0x0000000000402f4a in gr::blocks::rotator::rotateN (n=100000, in=0x7ffff4fb3010, out=0x7ffff4eef010, this=0x7fffffffb8c0)
at /home/vasko/gr38/src/gnuradio38/gr-blocks/lib/../include/gnuradio/blocks/rotator.h:63
after applying this patch volk-config-info gives the following results
$ ./apps/volk-config-info --all-machines
generic;sse2_64_mmx;sse3_64_mmx;ssse3_64_mmx;sse4_a_64_mmx;sse4_1_64_mmx;sse4_2_64_mmx;avx_64_mmx;avx2_64_mmx;avx512f_64_mmx;avx512cd_64_mmx
$ ./apps/volk-config-info --avail-machines
generic;sse2_64_mmx;sse3_64_mmx;ssse3_64_mmx;sse4_1_64_mmx;sse4_2_64_mmx;
$ ./apps/volk-config-info --machine
sse4_2_64_mmx
so all x86_64 kernels are built, the best one gets selected, all test pass and volk_profile does finish successfully.
And here is some more info about the CPU.
$cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
stepping : 2
microcode : 0x1f
cpu MHz : 2400.000
cache size : 12288 KB
physical id : 1
siblings : 8
core id : 0
cpu cores : 4
apicid : 32
initial apicid : 32
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat spec_ctrl intel_stibp flush_l1d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ; makes sense too!
|
@velichkov thanks for that info ... very informative! I need to go collect my very old, pre-AVX laptop ... would help in verifying changes such as this! |
|
@bmwiedemann do you want to add removing this to this PR too? It's similar to the current CPU test being removed already ... so thinking yes ... Lines 179 to 181 in de93c27
|
I added this removal in a 2nd commit |
even though this one did not affect x86_64 related to gnuradio#369 Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
|
PR still LGTM. Travis fails with |
|
I've restarted CI / Travis. Guessing most will pass, some might not but I don't think it's because of this PR. |
|
CI now passes. Multiple approvals and a good use case for why to remove this code. Merging. |
Drop compile-time CPU detection
So that we can generate AVX binaries on non-AVX machines.
See https://reproducible-builds.org/ for why this matters.
Fixes #369
This PR was done while working on reproducible builds for openSUSE.