Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing tests on s390c, ppc64le, and MIPS #442

Closed
yarda opened this issue Jan 20, 2021 · 23 comments · Fixed by #695 · May be fixed by #488
Closed

Failing tests on s390c, ppc64le, and MIPS #442

yarda opened this issue Jan 20, 2021 · 23 comments · Fixed by #695 · May be fixed by #488
Labels

Comments

@yarda
Copy link
Contributor

yarda commented Jan 20, 2021

We are trying to rebase gnuradio to 3.9.0.0 in Fedora, but we found out that the volk-2.4.1 testsuite doesn't pass on s390x and ppc64le. We build with standard distro flags plus -fno-strict-aliasing. The following tests are failing:

92 - qa_volk_32fc_s32fc_multiply_32fc (Failed)
93 - qa_volk_32fc_s32fc_rotatorpuppet_32fc (Failed)
102 - qa_volk_32fc_x2_s32fc_multiply_conjugate_add_32fc (Failed)

Build logs:
https://jskarvad.fedorapeople.org/volk/ppc64le_build_failure.txt
https://jskarvad.fedorapeople.org/volk/s390x_build_failure.txt

@jdemel
Copy link
Contributor

jdemel commented Jan 20, 2021

VOLK does not provide optimizations for these architectures. Thus, only the generic kernels are used. We ran tests on these archs with TravisCI but need to stop it because of TravisCI policy changes.

Can you run your build with ctest -V? That should tell us more about these failing tests. I can only assume that the error is the same we saw with TravisCI, i.e. a segfault in a generic kernel. I can't test on ppc64le and s390x. Can you analyze the issue and suggest a fix? Or share more insight, e.g. a gdb backtrace?

@yarda
Copy link
Contributor Author

yarda commented Jan 20, 2021

You are right, it's segfaulting:

92: Test command: /usr/bin/sh "/root/rpmbuild/BUILD/volk-2.4.1/ppc64le-redhat-linux-gnu/lib/volk_32fc_s32fc_multiply_32fc_test.sh" "/root/rpmbuild/BUILD/volk-2.4.1/ppc64le-redhat-linux-gnu/lib"
92: Test timeout computed to be: 10000000
92: RUN_VOLK_TESTS: volk_32fc_s32fc_multiply_32fc(131071,1)
92: /root/rpmbuild/BUILD/volk-2.4.1/ppc64le-redhat-linux-gnu/lib/volk_32fc_s32fc_multiply_32fc_test.sh: line 6: 20289 Segmentation fault      (core dumped) volk_test_all volk_32fc_s32fc_multiply_32fc
 92/134 Test  #92: qa_volk_32fc_s32fc_multiply_32fc .....................***Failed    0.14 sec
test 93
        Start  93: qa_volk_32fc_s32fc_rotatorpuppet_32fc

93: Test command: /usr/bin/sh "/root/rpmbuild/BUILD/volk-2.4.1/ppc64le-redhat-linux-gnu/lib/volk_32fc_s32fc_rotatorpuppet_32fc_test.sh" "/root/rpmbuild/BUILD/volk-2.4.1/ppc64le-redhat-linux-gnu/lib"
93: Test timeout computed to be: 10000000
93: RUN_VOLK_TESTS: volk_32fc_s32fc_rotatorpuppet_32fc(131071,1)
93: /root/rpmbuild/BUILD/volk-2.4.1/ppc64le-redhat-linux-gnu/lib/volk_32fc_s32fc_rotatorpuppet_32fc_test.sh: line 6: 20293 Segmentation fault      (core dumped) volk_test_all volk_32fc_s32fc_rotatorpuppet_32fc
 93/134 Test  #93: qa_volk_32fc_s32fc_rotatorpuppet_32fc ................***Failed    0.17 sec
test 94
        Start  94: qa_volk_32fc_s32fc_x2_rotator_32fc

102: Test command: /usr/bin/sh "/root/rpmbuild/BUILD/volk-2.4.1/ppc64le-redhat-linux-gnu/lib/volk_32fc_x2_s32fc_multiply_conjugate_add_32fc_test.sh" "/root/rpmbuild/BUILD/volk-2.4.1/ppc64le-redhat-linux-gnu/lib"
102: Test timeout computed to be: 10000000
102: RUN_VOLK_TESTS: volk_32fc_x2_s32fc_multiply_conjugate_add_32fc(131071,1)
102: Volk warning: no arch found, returning generic impl
102: /root/rpmbuild/BUILD/volk-2.4.1/ppc64le-redhat-linux-gnu/lib/volk_32fc_x2_s32fc_multiply_conjugate_add_32fc_test.sh: line 6: 20322 Segmentation fault      (core dumped) volk_test_all volk_32fc_x2_s32fc_multiply_conjugate_add_32fc
102/134 Test #102: qa_volk_32fc_x2_s32fc_multiply_conjugate_add_32fc ....***Failed    0.18 sec
test 103
        Start 103: qa_volk_32fc_x2_square_dist_32f

I will run it through the debugger.

@yarda
Copy link
Contributor Author

yarda commented Jan 20, 2021

bt for the volk_32fc_s32fc_multiply_32fc_test.sh, compiled with -g -O0 -fno-strict-aliasing:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  volk_32fc_s32fc_multiply_32fc_generic (cVector=0x7fff86b80010, aVector=0x7fff86a70010, scalar=327 + 0i, num_points=4049594328)
    at /root/rpmbuild/BUILD/volk-2.4.1/kernels/volk/volk_32fc_s32fc_multiply_32fc.h:235
235	        *cPtr++ = (*aPtr++) * scalar;
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.32-1.fc33.ppc64le libgcc-10.2.1-9.fc33.ppc64le libstdc++-10.2.1-9.fc33.ppc64le orc-0.4.31-3.fc33.ppc64le
(gdb) bt
#0  volk_32fc_s32fc_multiply_32fc_generic (cVector=0x7fff86b80010, aVector=0x7fff86a70010, scalar=327 + 0i, num_points=4049594328)
    at /root/rpmbuild/BUILD/volk-2.4.1/kernels/volk/volk_32fc_s32fc_multiply_32fc.h:235
#1  0x00007fff874cbb68 in volk_32fc_s32fc_multiply_32fc_manual (cVector=0x7fff86b80010, aVector=0x7fff86a70010, scalar=327 + 0i, 
    num_points=4049594328, impl_name=0x7ffff15fe7c8 "\330\347_\361\377\177")
    at /root/rpmbuild/BUILD/volk-2.4.1/ppc64le-redhat-linux-gnu/lib/volk.c:7784
#2  0x0000000138b611a0 in run_cast_test2_s32fc (func=0x7fff874cbaa4 <volk_32fc_s32fc_multiply_32fc_manual>, 
    buffs=std::vector of length 2, capacity 2 = {...}, scalar=..., vlen=131071, iter=0, arch="generic")
    at /root/rpmbuild/BUILD/volk-2.4.1/lib/qa_utils.cc:335
#3  0x0000000138b5d664 in run_volk_tests (desc=..., manual_func=0x7fff874cbaa4 <volk_32fc_s32fc_multiply_32fc_manual>, 
    name="volk_32fc_s32fc_multiply_32fc", tol=9.99999997e-07, scalar=..., vlen=131071, iter=1, results=0x7ffff15ff048, 
    puppet_master_name="NULL", absolute_mode=false, benchmark_mode=true) at /root/rpmbuild/BUILD/volk-2.4.1/lib/qa_utils.cc:644
#4  0x0000000138b5c73c in run_volk_tests (desc=..., manual_func=0x7fff874cbaa4 <volk_32fc_s32fc_multiply_32fc_manual>, 
    name="volk_32fc_s32fc_multiply_32fc", test_params=..., results=0x7ffff15ff048, puppet_master_name="NULL")
    at /root/rpmbuild/BUILD/volk-2.4.1/lib/qa_utils.cc:498
#5  0x0000000138b52cf0 in main (argc=2, argv=0x7ffff15ff568) at /root/rpmbuild/BUILD/volk-2.4.1/lib/testqa.cc:58

@yarda
Copy link
Contributor Author

yarda commented Jan 20, 2021

To be honest I have no idea why there are non sense values at #1. Valgrind showed that the values weren't crippled before. It was gcc 10.2.1, I will try with older gcc.

@yarda
Copy link
Contributor Author

yarda commented Jan 20, 2021

The same problem occurred with gcc 9.3.1.

@yarda
Copy link
Contributor Author

yarda commented Jan 20, 2021

The problem is caused by lv_32fc_t when linking C and C++ code, I am completing minimal reproducer. Currently I am not sure whether it's gcc bug.

@yarda
Copy link
Contributor Author

yarda commented Jan 20, 2021

Stripped down reproducer:

$ cat Makefile
test1: test1.o test11.o
	g++ -o test1 test11.o test1.o

test1.o: test1.c cpl.h
	gcc -c -o test1.o test1.c

test11.o: test1.cc cpl.h
	g++ -c -o test11.o test1.cc
$ cat cpl.h
ifdef __cplusplus

#include <complex>

typedef std::complex<float> lv_32fc_t;

#else

#include <complex.h>

typedef float complex lv_32fc_t;

#endif

typedef void (*volk_fn_2arg_s32fc)(lv_32fc_t, unsigned int, const char*);
$ cat test1.c
#include "cpl.h"
#include <stdio.h>

void manual(const lv_32fc_t scalar, unsigned int num_points, const char* impl_name)
{
    printf("%u, %f, %f, %s\n", num_points, __real__ scalar, __imag__ scalar, impl_name);
}
$ cat test1.cc
#include <stdio.h>
#include <string>
#include <complex>
#include "cpl.h"

extern "C" void manual(const lv_32fc_t  scalar, unsigned int  num_points, const char* impl_name);

unsigned int vlen = 131071;
lv_32fc_t scalar(327.0f, 2.0f);
std::string s("manual");
volk_fn_2arg_s32fc manual_func = (volk_fn_2arg_s32fc) manual;

int main(int argc, char *argv[])
{
  manual_func(scalar, vlen, s.c_str());

  return 0;
}

Both std::complex<float> and float complex seems to be 8 bytes. They are interchangeable on x86, but it seems on ppc64le and s390x they aren't, moreover they crippled all the following args. I don't know why, I will try opening gcc bug.

@yarda
Copy link
Contributor Author

yarda commented Jan 20, 2021

I created gcc bug and let's see :) https://bugzilla.redhat.com/show_bug.cgi?id=1918519

@marcusmueller
Copy link
Member

Thank you, @yarda!

@jwakely
Copy link

jwakely commented Jan 21, 2021

They are interchangeable on x86, but it seems on ppc64le and s390x they aren't, moreover they crippled all the following args.

The standard doesn't say they're interchangeable so you're relying on undefined behaviour. All the standard says is that they have the same memory layout, so you can access std::complex<float>* and float _Complex* like an array of two floats and access the real and imaginary parts that way. That doesn't mean that function arguments of type std::complex<float> and float _Complex are passed the same way. The calling conventions for a given architecture might pass a std::complex<float> in a single 64-bit register, but might pass float _Complex by address, the same way a float[2] would be passed (decaying to a pointer).

@jwakely
Copy link

jwakely commented Jan 21, 2021

It works if you just pass the complex number by address instead of by value, see https://bugzilla.redhat.com/show_bug.cgi?id=1918519#c3

@jdemel
Copy link
Contributor

jdemel commented Jan 21, 2021

@jwakely Thanks for the thorough explanation.

Now it becomes quite obvious why all those three _s32fc_ kernels fail. Also, we need to fix this. Obviously, we can't just change the function signature. I assume, we need to change the signature and add a shim that still exposes the old signature as well. After all I'm quite sure at least gnuradio relies on the old signature. And this might make gnuradio fail on ppc64le and s390x even with a fix.
Is there a way to fix this without breaking our API?

@marcusmueller
Copy link
Member

@jdemel, well, the only way for current/previously released C++ software that uses lv_fc32_t as std::complex<float> not to break API is to leave it be std::complex<float>, that's the API.

However, not all is bleak:

We can fix this for GR master, 3.9.0.1, 3.8.2.1 (or whatever the next 3.8 is) by actually not relying on it being std::complex.

For people with un-upgradeable software that links against VOLK 3, you could just add a #ifdef __cpluspus #ifdef VOLK_COMPAT_LV_32FCT_STD_COMPLEX… to restore the old API, and write into the release notes that if you want to use old C++ software with new VOLK, you'll a) won't be able on s390x and ppc64, and b) you'll need to -DVOLK_COMPAT….

@jwakely
Copy link

jwakely commented Jan 21, 2021

You could add overloads on the C++ side which take std::complex<float> by value, then obtain float* from it, call a C function that takes the float* and the creates a new float _Complex from the values in the float* and calls the original C function with that new float _Complex.

That would allow the C++ code to keep calling the function and passing complex<float> by value, but have it safely convert that to the right type for the underlying C API.

But I don't see how to fix existing C++ code that has already been compiled and passes a std::complex<float> by value to the symbol manual (or whatever the real C function is called). That code is already calling the underlying C function with the wrong arguments and can't be fixed without a recompile on the targets where the calling convention differs.

@marcusmueller
Copy link
Member

@jwakely exactly, all you can do is have a compatibility mode for this legacy software activated with a #define; afterwards, you can make the new API as sensible as you want.

creates a new float _Complex from the values

Application-wise, copying the data even once is not an option; VOLK is library for SIMD-accelerated direct operation on uniform vectors in memory, and copying that nullifies nearly every benefit you get from it. Therefore, you really need to cast the pointers passed around here, not the values, in C-land. While I'm certain we could trick enough compilers to at least accept that a const std::complex newone(old.real, old.imag) should be done without a copy, we cannot do the same for mutable objects – aliasing rules and everything.

So, yeah, I think we shouldn't ever had a C++ API that looks 90% like the C API but isn't. Makes little sense, usage-wise, anyway.

@jwakely
Copy link

jwakely commented Jan 21, 2021

copying that nullifies nearly every benefit you get from it.

Ah yes, that's not appropriate then. Even with LTO you're unlikely to get those extra copies and forwarding functions inlined away.

So, yeah, I think we shouldn't ever had a C++ API that looks 90% like the C API but isn't.

100% agreed.

@jdemel
Copy link
Contributor

jdemel commented Jan 22, 2021

The issue "we rely on undefined behavior" slipped through for a long time. Now, that we know about it, we need to come up with a clever solution.

The clean and long term goal should be a simplified interface. Given the current scope of VOLK

"VOLK is a C library that offers stateless math operations intended towards DSP in communications."

I'd argue, we should try to make sure that we only offer a C interface. Then we have tooling for hazzle free work with C++.

Essentially, I can't think of a fix for this issue that wouldn't break our API. Thus, a VOLK version that includes such a fix requires a major version bump. A good reason to move to VOLK 3. But we want to avoid breakage. So, if we would break our API, we should first gather all the necessary changes that should go into a new major version. We'd avoid to have a VOLK 4 release for a longer period.

If we can get away with something like "declare everything:

#ifdef __cplusplus
extern "C" {
#endif

Users would "only" need to recompile. But we'd need to confirm that we add this declaration in every place where it is needed.
Anyways, Marcus' proposal to have a cmake switch sounds intriguing but we'd still need to have a new major release. Otherwise it would be hard to tell is this v.2.x.x.x-compat or v2.y.y.y-fixed.
If we'd just fix the 3 failing kernels, we'd still break the API for a lot of users. And in that case one would need to adopt their code in every place they use one of these kernels.

I'd be really happy to discuss this situation. I'm sure we'll come up with a good solution.

@jwakely @yarda How urgent is your need for a fix? It'd be a bummer to not have VOLK in Fedora at all.

@yarda
Copy link
Contributor Author

yarda commented Jan 22, 2021

I think the same problem exists in older gnuradio where the volk is bundled and nobody has complained so far. This can mean that in Fedora it is not much used by ppc64le and s390x users. So I think I could go ahead now, bypass the tests and just document this problem. Another possible solution is to exclude gnuradio/volk from the ppc64le and s390x in Fedora, but I think it could be somehow usable even with this known problem.

In the meantime I hit another problem on s390x (not on ppc64le) which I haven't time to investigate, the /usr/bin/list_cpu_features binary is not built there, is it expected?

@jdemel
Copy link
Contributor

jdemel commented Jan 31, 2021

@yarda I don't know about list_cpu_features. It is part of cpu_features which we just use as a submodule. I suggest to ask this question to the cpu features people.

If a cpu_features package were widely available, I'd argue we'd remove the submodule and just depend on the package. But that doesn't seem to be the case. In the meantime, I think, we shouldn't install it at all. This might only cause issues if multiple projects do the same.

@yarda
Copy link
Contributor Author

yarda commented Feb 2, 2021

@yarda I don't know about list_cpu_features. It is part of cpu_features which we just use as a submodule. I suggest to ask this question to the cpu features people.

If a cpu_features package were widely available, I'd argue we'd remove the submodule and just depend on the package. But that doesn't seem to be the case. In the meantime, I think, we shouldn't install it at all. This might only cause issues if multiple projects do the same.

It's caused by the following volk CMakeLists.txt code:

if(CMAKE_SYSTEM_PROCESSOR MATCHES
    "(^mips)|(^arm)|(^aarch64)|(x86_64)|(AMD64|amd64)|(^i.86$)|(^powerpc)|(^ppc)")
  option(VOLK_CPU_FEATURES "Volk uses cpu_features" ON)
else()
  option(VOLK_CPU_FEATURES "Volk uses cpu_features" OFF)
endif()

I.e. on the s390x the cpu_features are disabled. I quickly checked the cpu_features code and it seems it doesn't support s390x. The VOLK_CPU_FEATURES=ON also builds the list_cpu_features binary, which doesn't seem to be used anywhere except some cpu_features demo script. So I will probably drop the list_cpu_features binary and will not package it to Fedora.

@jdemel
Copy link
Contributor

jdemel commented Feb 13, 2021

@yarda I suppose this is the correct way to go for VOLK as well. We shouldn't add tools that are not part of our core project.

@jdemel
Copy link
Contributor

jdemel commented Feb 13, 2021

VOLK does have an issue with its interface on s390x and ppc64le. I hope this won't be an issue for other architectures. Still, I think it is reasonable to find a solution for VOLK that does not rely on undefined behavior. Otherwise it'll bite us some day.

Here I'd like to outline a few thoughts I had to fix this issue. Comments and feedback are welcome.

Since VOLK is a C project, I'd argue that it is reasonable to stick with C. Thus, my idea for an improved interface would be to strictly follow C. Then we can build bindings around it. I don't want to remove our C++ convenience though.

Our C interface is supposed to look like this:

lv_32fc_t *result = (lv_32fc_t*) volk_malloc(sizeof(lv_32fc_t) * 16, volk_alignment() );
lv_32fc_t *source = (lv_32fc_t*) volk_malloc(sizeof(lv_32fc_t) * 16, volk_alignment() );
lv_32fc_t value = lv_32fc_t(1.0f, 1.0f);
volk_32fc_s32fc_operation_32fc(result, source, value, 16);

Vectors are passed around by pointer and scalars are passed by value. I'd like to have a wrapper for C++ that works smth like this:

volk::vector<std::complex<float> > result(16);
volk::vector<std::complex<float> > source(16);
std::complex<float> value(1.0f, 1.0f);
volk_32fc_s32fc_operation_32fc(result.data(), source.data(), value, result.size());

(I just wrote this code from the top of my head. It might have some issue.)
Basically, this is how we use VOLK now. I suppose we can discuss simplifications. Anyways, the C++ function call should just be an interface for the C function.

I want to avoid to write extra code for the C++ interface for VOLK developers. For VOLK users I want to avoid the need to do casts like (lv_32fc_t*) result.data(). If we could tell the compiler that these two floats in consecutive memory should just be interpreted like std::complex<float> in C++ and like lv_32fc_t in C that would be great.
I hope we don't break anything by requiring that complex values are interleaved IQ in consecutive memory.
VOLK function calls should have as little overhead as possible. Preferably none. We want to provide functions that are optimized for speed after all. Thus, if all the C++ to C and back interface can be dealt with at compile time that would be great. Here, the question would be: How do we verify that?

@marcusmueller marcusmueller changed the title Failing tests on s390x and ppc64le Failing tests on s390c, ppc64le, and MIPS Nov 6, 2023
@marcusmueller
Copy link
Member

As per maintainer request, re-titled the issue to include MIPS.

@marcusmueller marcusmueller removed Future Future Work Low Low Priority labels Nov 6, 2023
argilo added a commit to argilo/gnuradio that referenced this issue Nov 15, 2023
Passing lv_32fc_t arguments by value results in Undefined Behaviour, as
explained in gnuradio/volk#442. To avoid this,
we use VOLK 3.1.0's updated API, where lv_32fc_t arguments are passed in
using pointers.

Three of the affected volk_32fc_s32fc_multiply_32fc calls (all in
gr-dtv) did not actually use a complex scalar, so I switched those calls
to volk_32f_s32f_multiply_32f instead.

Signed-off-by: Clayton Smith <argilo@gmail.com>
argilo added a commit to argilo/gnuradio that referenced this issue Nov 15, 2023
Passing lv_32fc_t arguments by value results in Undefined Behaviour, as
explained in gnuradio/volk#442. To avoid this,
we use VOLK 3.1.0's updated API, where lv_32fc_t arguments are passed in
using pointers.

Three of the affected volk_32fc_s32fc_multiply_32fc calls (all in
gr-dtv) did not actually use a complex scalar, so I switched those calls
to volk_32f_s32f_multiply_32f instead.

Signed-off-by: Clayton Smith <argilo@gmail.com>
willcode pushed a commit to gnuradio/gnuradio that referenced this issue Dec 6, 2023
Passing lv_32fc_t arguments by value results in Undefined Behaviour, as
explained in gnuradio/volk#442. To avoid this,
we use VOLK 3.1.0's updated API, where lv_32fc_t arguments are passed in
using pointers.

Three of the affected volk_32fc_s32fc_multiply_32fc calls (all in
gr-dtv) did not actually use a complex scalar, so I switched those calls
to volk_32f_s32f_multiply_32f instead.

Signed-off-by: Clayton Smith <argilo@gmail.com>
willcode pushed a commit to willcode/gnuradio that referenced this issue Dec 9, 2023
Passing lv_32fc_t arguments by value results in Undefined Behaviour, as
explained in gnuradio/volk#442. To avoid this,
we use VOLK 3.1.0's updated API, where lv_32fc_t arguments are passed in
using pointers.

Three of the affected volk_32fc_s32fc_multiply_32fc calls (all in
gr-dtv) did not actually use a complex scalar, so I switched those calls
to volk_32f_s32f_multiply_32f instead.

Signed-off-by: Clayton Smith <argilo@gmail.com>
(cherry picked from commit bdaf5df)
Signed-off-by: Jeff Long <willcode4@gmail.com>
willcode pushed a commit to gnuradio/gnuradio that referenced this issue Dec 9, 2023
Passing lv_32fc_t arguments by value results in Undefined Behaviour, as
explained in gnuradio/volk#442. To avoid this,
we use VOLK 3.1.0's updated API, where lv_32fc_t arguments are passed in
using pointers.

Three of the affected volk_32fc_s32fc_multiply_32fc calls (all in
gr-dtv) did not actually use a complex scalar, so I switched those calls
to volk_32f_s32f_multiply_32f instead.

Signed-off-by: Clayton Smith <argilo@gmail.com>
(cherry picked from commit bdaf5df)
Signed-off-by: Jeff Long <willcode4@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants