Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests segfault on i386 #3637

Closed
nmeum opened this issue Jul 23, 2023 · 18 comments
Closed

Tests segfault on i386 #3637

nmeum opened this issue Jul 23, 2023 · 18 comments

Comments

@nmeum
Copy link

nmeum commented Jul 23, 2023

On Alpine Linux Edge, we noticed that the botan test suite segfaults in the whirlpool.cpp code on i386.

The relevant part of the backtrace looks as follows:

Threefish-512/OCB(32) ran 41 tests in 0.12 msec all ok
TripleDES/EAX ran 818 tests in 3.23 msec all ok
Twofish/EAX ran 1602 tests in 4.60 msec all ok

Thread 4 "botan-test" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1689]
0xf7a4b2b5 in Botan::rotr<32u, unsigned long long> (input=<optimized out>) at build/include/botan/internal/rotate.h:36
36	   return static_cast<T>((input >> ROT) | (input << (8 * sizeof(T) - ROT)));
(gdb) bt
#0  0xf7a4b2b5 in Botan::rotr<32u, unsigned long long> (input=<optimized out>) at build/include/botan/internal/rotate.h:36
#1  Botan::Whirlpool::compress_n (this=0xf72756c0, in=0xf7ec1080 "\200", blocks=1) at src/lib/hash/whirlpool/whirlpool.cpp:114
#2  0xf7a275da in Botan::MDx_HashFunction::final_result (this=0xf72756c0, output=0xf7ec1200 "")
    at src/lib/hash/mdx_hash/mdx_hash.cpp:102
#3  0x56755a64 in Botan::Buffered_Computation::final<std::vector<unsigned char, Botan::secure_allocator<unsigned char> > > (
    this=0xf72756c0) at build/include/botan/buf_comp.h:79
#4  Botan_Tests::(anonymous namespace)::Hash_Function_Tests::run_one_test (this=<optimized out>, algo=..., vars=...)
    at src/tests/test_hash.cpp:95

So it seems to segfault in Botan::rotr:

/**
* Bit rotation right by a compile-time constant amount
* @param input the input word
* @return input rotated right by ROT bits
*/
template <size_t ROT, typename T>
inline constexpr T rotr(T input)
requires(ROT > 0 && ROT < 8 * sizeof(T))
{
return static_cast<T>((input >> ROT) | (input << (8 * sizeof(T) - ROT)));
}

The invocation of Botan::rotr is rotr<32>(WHIRL_S[get_byte<4>(K4)]) from:

rotr<24>(WHIRL_S[get_byte<3>(K5)]) ^ rotr<32>(WHIRL_S[get_byte<4>(K4)]) ^

Dockerfile to reproduce the segfault:

FROM i386/alpine:edge

RUN apk update && apk add --no-cache build-base boost-dev bzip2-dev \
	sqlite-dev xz-dev zlib-dev python3 setarch git
RUN git clone --depth=1 https://github.com/randombit/botan.git /tmp/botan

RUN cd /tmp/botan && setarch i386 python3 ./configure.py
RUN cd /tmp/botan && make -j35 && env LD_LIBRARY_PATH="/tmp/botan" ./botan-test

Make sure to adjust the make jobs accordingly.

algitbot pushed a commit to alpinelinux/aports that referenced this issue Jul 23, 2023
See randombit/botan#3637

Note that this only seems to segfault on Edge and not on -stable.
@randombit
Copy link
Owner

This definitely looks like a miscompilation bug - get_byte returns a uint8_t and WHIRL_S is a table of 256 elements, so I really don't see what could be going wrong.

Thanks for the docker script, that certainly will help with reproducing. Presumably we can modify the code in some way to avoid the miscompilation similar to what was done in #3492

@randombit
Copy link
Owner

Unfortunately docker seems to be in some broken state on my machine, it may take some time before I can repro this.

@nekopsykose
Copy link

nekopsykose commented Jul 23, 2023

This definitely looks like a miscompilation bug

something to support this is the fact this doesn't crash built with clang. it also does not crash with -O0. it crashes with -O1 or anything higher (but not -Os).

@nekopsykose
Copy link

that said, "magic UB somewhere nobody saw" is also likely to crash with one and not another.. and sadly we don't have sanitizers on x86 to at least perform a few checks for that

@nekopsykose
Copy link

since it doesn't crash with -Os but does crash with -O1, that narrows the list of optimisations that can be looked at for which one breaks it..

@randombit
Copy link
Owner

OK well "good" news - I can reproduce this locally on an Arch Linux system with GCC 13.1.1 building for x86_32

Program received signal SIGSEGV, Segmentation fault.
0xf7f8d740 in Botan::Whirlpool::compress_n(unsigned char const*, unsigned int) () from ./libbotan-3.so.2
(gdb) bt
#0  0xf7f8d740 in Botan::Whirlpool::compress_n(unsigned char const*, unsigned int) () from ./libbotan-3.so.2
#1  0xf7f8cb08 in Botan::MDx_HashFunction::final_result(unsigned char*) () from ./libbotan-3.so.2
#2  0x5659255c in Botan_Tests::(anonymous namespace)::Hash_Function_Tests::run_one_test(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Botan_Tests::VarMap const&) ()
#3  0x565cf80c in Botan_Tests::Text_Based_Test::run() ()
#4  0x5657ddb6 in Botan_Tests::(anonymous namespace)::run_a_test(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#5  0x5657f263 in Botan_Tests::Test_Runner::run_tests(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) ()
#6  0x5657fb1a in Botan_Tests::Test_Runner::run(Botan_Tests::Test_Options const&) ()
#7  0x5656d9f6 in main ()

We have a CI build for x86_32 (https://github.com/randombit/botan/actions/runs/5636570107/job/15287907155) which isn't having problems, but that is on GCC 11.2

@randombit
Copy link
Owner

randombit commented Jul 24, 2023

Rebuilding with UbSan/Asan to see if that gives a clue

Update: with -O2 -fsanitize=address -fsanitize=undefined -fno-sanitize-recover=undefined the tests pass 😢

@randombit
Copy link
Owner

valgrind:

==760892== Invalid read of size 4
==760892==    at 0x4867740: rotr<32, long long unsigned int> (rotate.h:36)
==760892==    by 0x4867740: whirl (whirlpool.cpp:81)
==760892==    by 0x4867740: Botan::Whirlpool::compress_n(unsigned char const*, unsigned int) (whirlpool.cpp:124)
==760892==    by 0x4866B07: Botan::MDx_HashFunction::final_result(unsigned char*) (mdx_hash.cpp:102)
==760892==    by 0x14555B: final<> (buf_comp.h:79)
==760892==    by 0x14555B: Botan_Tests::(anonymous namespace)::Hash_Function_Tests::run_one_test(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Botan_Tests::VarMap const&) (test_hash.cpp:95)
==760892==    by 0x18280B: Botan_Tests::Text_Based_Test::run() (tests.cpp:1107)
==760892==    by 0x130DB5: Botan_Tests::(anonymous namespace)::run_a_test(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (test_runner.cpp:160)
==760892==    by 0x132262: Botan_Tests::Test_Runner::run_tests(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (test_runner.cpp:237)
==760892==    by 0x132B19: run_tests_multithreaded (test_runner.cpp:193)
==760892==    by 0x132B19: Botan_Tests::Test_Runner::run(Botan_Tests::Test_Options const&) (test_runner.cpp:137)
==760892==    by 0x1209F5: main (main.cpp:105)
==760892==  Address 0xc54bf6c0 is not stack'd, malloc'd or (recently) free'd

@nekopsykose
Copy link

nekopsykose commented Jul 24, 2023

since it doesn't crash with -Os

this was actually a random fluke, and.. now it does

(was passing it wrong, actually does pass with -Os)

@randombit
Copy link
Owner

I tried creating a minimal repro (https://gist.github.com/randombit/b0008fc5e1ee752b7fb62d6e0e78af12) which ... doesn't crash 😭

@randombit
Copy link
Owner

WTF - the repro does crash if I put the function whirl in an anonymous namespace

That to me puts this firmly into GCC bug territory.

@nekopsykose
Copy link

the current gist state then sounds small enough to be put in the gcc bugzilla. do you feel like it, or should we?

@randombit
Copy link
Owner

@nekopsykose
Copy link

thank you!

@randombit
Copy link
Owner

Even knowing what tickles the bug I cannot find a way of restructuring the library code in a way that prevents the miscompilation. I'd suggest in the meantime building with --disable-modules=whirlpool either specifically for x86-32 (if that's possible) or just universally. Whirlpool is pretty obscure so I don't think many people will miss it.

@nekopsykose
Copy link

since the Os does work, insofar as tests, (edited above) we set that for now. hopefully there's a response and it's not an ignored bug, in which case we get a real fix :)

@randombit
Copy link
Owner

GCC bug is at least confirmed now, hopefully someone will take a look at it

@randombit
Copy link
Owner

Bug is reportedly fixed in GCC master, unfortunately the patch did not make it in time to be included in GCC 13.2.

Since there is not (AFAICT) a way for us to work around this bug within our code I don't think there is any further action possible, so closing this.

Thanks again for reporting this, and for maintaining the Alpine packaging.

gentoo-bot pushed a commit to gentoo/gcc-patches that referenced this issue Aug 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants