Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop compile-time CPU detection #370

Merged
merged 2 commits into from Apr 1, 2020
Merged

Conversation

bmwiedemann
Copy link
Contributor

@bmwiedemann bmwiedemann commented Mar 29, 2020

Drop compile-time CPU detection
So that we can generate AVX binaries on non-AVX machines.

See https://reproducible-builds.org/ for why this matters.

Fixes #369

This PR was done while working on reproducible builds for openSUSE.

So that we can generate AVX binaries on non-AVX machines.

See https://reproducible-builds.org/ for why this matters.

Fixes gnuradio#369
@jdemel
Copy link
Contributor

jdemel commented Mar 29, 2020

LGTM. CI passes. Archs are still present in CI. So this seems to be a good change. Does that enable reproducable builds for VOLK? Or are there other issues pending?

@bmwiedemann Do you have a CLA?

@michaelld
Copy link
Contributor

I think for this small a change we don't actually need a CLA ... but, if you want to do more PRs then it would be wise to have one.

@michaelld
Copy link
Contributor

Duplicating my comment on the issue:

We added this configure execution check a long time ago because Volk didn't have the runtime capable of failing on a compiled kernel ... I mean, why enable a kernel if it won't execute on the target CPU?

I agree that it would be better to allow any compilable kernel to be compiled, then do runtime detection and disabling of kernels based on the target CPU. I don't honestly know that Volk can do this yet.

@bmwiedemann
Copy link
Contributor Author

bmwiedemann commented Mar 29, 2020

@jdemel it is the only reproducibility issue I found for openSUSE, but Debian people might find others later, because they vary more build environment parameters (build user, path etc)

https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/volk.html mentions "records build flags" (CFLAGS). E.g. do one build with -Werror=return-type -Wall and one without - should produce identical binaries if it would not include those strings in libvolk.so.2.x

Edit: it also captures build path in html docs (<title> and somewhere else)

@bmwiedemann
Copy link
Contributor Author

For the CLA, I do not forsee doing more patches.

I can state that I did all of these changes myself, my employer (SUSE) is totally fine with me contributing to open-source projects and therefore I declare my work CC0 licensed that allows FSF to claim copyright.

@jdemel
Copy link
Contributor

jdemel commented Mar 31, 2020

We know that we can merge this PR. Further, CI passes. Also, it gives us reproducible builds which we want too.

But before we can merge this PR, we need to figure out if this would break things for some users.

VOLK should dynamically detect CPU capabilities. Also, only the AVX kernel is not compiled. Even newer ones, e.g. AVX2, are.

@michaelld what are the exact circumstances under which VOLK fails? Why was this decision made?

@michaelld
Copy link
Contributor

@michaelld what are the exact circumstances under which VOLK fails? Why was this decision made?

I honestly don't recall the -exact- circumstances, but here's the general memory:
{{{
Some now-older CPUs don't properly implement the xgetbv instruction (so, partial AVX, but not full AVX), but the compiler doesn't know this. If the compiler supports AVX, then it will compile AVX code that includes the xgetbv instruction -- regardless of whether it will execute. Executing this code on a CPU with the buggy xgetbv results in some sort of runtime error.

Thus, if the target CPU can't execute basic xgetbv code then we know to eliminate AVX as a supported SIMD kernel.
}}}

The above works really well when compiling from source for a specific target CPU.

But if we don't take the CPU into account -- just what the compiler supports -- the actual execution check should be removed. Other code that should be removed is (at least) <

volk/lib/CMakeLists.txt

Lines 179 to 181 in de93c27

execute_process(COMMAND ${CMAKE_CURRENT_BINARY_DIR}/test_cvtpi32_ps
OUTPUT_QUIET ERROR_QUIET
RESULT_VARIABLE avx_exe_result)
> ... and maybe others ... anything that is actually executed by CMake to check for whether the target CPU can execute the target instruction.

I'm not against this change. But I do want to make sure the Volk runtime will properly handle kernel failures such as what this specific PR's current code targets (the xgetbv instruction). I don't know if it does.

@michaelld
Copy link
Contributor

Other code that should be removed is (at least) <

volk/lib/CMakeLists.txt

Lines 179 to 181 in de93c27

execute_process(COMMAND ${CMAKE_CURRENT_BINARY_DIR}/test_cvtpi32_ps
OUTPUT_QUIET ERROR_QUIET
RESULT_VARIABLE avx_exe_result)

... and maybe others ... anything that is actually executed by CMake to check for whether the target CPU can execute the target instruction.

A quick code review ... looks like this is the only other such CPU execution check. Thinking it would be wise to add removing it to this PR ... to be complete.

Copy link
Contributor

@velichkov velichkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested this patch successfully on an old server with a CPU that does not have AVX instructions (only SSE4.2). Executing test_xgetbv result in a SIGILL.

$gdb --ex run --args ./lib/test_xgetbv

Program received signal SIGILL, Illegal instruction.
0x0000000000401112 in _xgetbv (index=0) at /root/volk/build/lib/test_xgetbv.c:2
2	 unsigned long long _xgetbv(unsigned int index) { unsigned int eax, edx; __VOLK_ASM __VOLK_VOLATILE("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index)); return ((unsigned long long)edx << 32) | eax; } int main (void) { (void) _xgetbv(0); return (0); }
(gdb) bt
#0  0x0000000000401112 in _xgetbv (index=0) at /root/volk/build/lib/test_xgetbv.c:2
#1  0x000000000040113b in main () at /root/volk/build/lib/test_xgetbv.c:2
(gdb) disassemble 
Dump of assembler code for function _xgetbv:
   0x0000000000401106 <+0>:	push   %rbp
   0x0000000000401107 <+1>:	mov    %rsp,%rbp
   0x000000000040110a <+4>:	mov    %edi,-0x14(%rbp)
   0x000000000040110d <+7>:	mov    -0x14(%rbp),%eax
   0x0000000000401110 <+10>:	mov    %eax,%ecx
=> 0x0000000000401112 <+12>:	xgetbv 
   0x0000000000401115 <+15>:	mov    %eax,-0x4(%rbp)

The runtime detection is implemented here

volk/tmpl/volk.tmpl.c

Lines 47 to 55 in faf230e

for(i=0; i<n_volk_machines; i++) {
if(!(volk_machines[i]->caps & (~volk_get_lvarch()))) {
if(volk_machines[i]->caps > max_score) {
max_score = volk_machines[i]->caps;
max_machine = volk_machines[i];
}
}
}
machine = max_machine;

and get_machine() function is called in the various __init* function in build/volk/lib/volk.c.

(gdb) bt
#0  0x00007ffff74b665e in get_machine () at /home/vasko/gr38/src/gnuradio38/build/volk/lib/volk.c:7740
#1  __init_volk_32fc_s32fc_x2_rotator_32fc () at /home/vasko/gr38/src/gnuradio38/build/volk/lib/volk.c:7740
#2  0x00007ffff74b6919 in __volk_32fc_s32fc_x2_rotator_32fc (outVector=0x7ffff4eef010, inVector=0x7ffff4fb3010, phase_inc=<optimized out>, 
    phase=0x7fffffffb8c0, num_points=100000) at /home/vasko/gr38/src/gnuradio38/build/volk/lib/volk.c:7770
#3  0x0000000000402f4a in gr::blocks::rotator::rotateN (n=100000, in=0x7ffff4fb3010, out=0x7ffff4eef010, this=0x7fffffffb8c0)
    at /home/vasko/gr38/src/gnuradio38/gr-blocks/lib/../include/gnuradio/blocks/rotator.h:63

after applying this patch volk-config-info gives the following results

$ ./apps/volk-config-info --all-machines
generic;sse2_64_mmx;sse3_64_mmx;ssse3_64_mmx;sse4_a_64_mmx;sse4_1_64_mmx;sse4_2_64_mmx;avx_64_mmx;avx2_64_mmx;avx512f_64_mmx;avx512cd_64_mmx
$ ./apps/volk-config-info --avail-machines
generic;sse2_64_mmx;sse3_64_mmx;ssse3_64_mmx;sse4_1_64_mmx;sse4_2_64_mmx;
$ ./apps/volk-config-info --machine
sse4_2_64_mmx

so all x86_64 kernels are built, the best one gets selected, all test pass and volk_profile does finish successfully.

And here is some more info about the CPU.

$cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
stepping	: 2
microcode	: 0x1f
cpu MHz		: 2400.000
cache size	: 12288 KB
physical id	: 1
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 32
initial apicid	: 32
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat spec_ctrl intel_stibp flush_l1d

Copy link
Contributor

@michaelld michaelld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ; makes sense too!

@michaelld
Copy link
Contributor

@velichkov thanks for that info ... very informative! I need to go collect my very old, pre-AVX laptop ... would help in verifying changes such as this!

@michaelld
Copy link
Contributor

@bmwiedemann do you want to add removing this to this PR too? It's similar to the current CPU test being removed already ... so thinking yes ...

volk/lib/CMakeLists.txt

Lines 179 to 181 in de93c27

execute_process(COMMAND ${CMAKE_CURRENT_BINARY_DIR}/test_cvtpi32_ps
OUTPUT_QUIET ERROR_QUIET
RESULT_VARIABLE avx_exe_result)

@bmwiedemann
Copy link
Contributor Author

do you want to add removing this to this PR too? It's similar to the current CPU test being removed already ... so thinking yes ...

I added this removal in a 2nd commit

even though this one did not affect x86_64

related to gnuradio#369

Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
@jdemel
Copy link
Contributor

jdemel commented Apr 1, 2020

PR still LGTM. Travis fails with sum_of_poly on ARM. It'd be helpful to be able to rerun jobs in such situations.

@michaelld
Copy link
Contributor

I've restarted CI / Travis. Guessing most will pass, some might not but I don't think it's because of this PR.

@michaelld
Copy link
Contributor

CI now passes. Multiple approvals and a good use case for why to remove this code. Merging.

@michaelld michaelld merged commit fe717db into gnuradio:master Apr 1, 2020
@bmwiedemann bmwiedemann deleted the cpu branch April 1, 2020 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build results vary from build machine CPU
4 participants