Payment test failure on FreeBSD only (Version: 11.3) #2224

Closed
graydon opened this issue Aug 13, 2019 · 2 comments

graydon commented Aug 13, 2019

Strange bug found today when testing #2204 on FreeBSD: the payment transaction test fails with a slight numerical discrepancy in 7 cases. Does not fail on any other platform!

Example failure:

-------------------------------------------------------------------------------
payment
  protocol version 11
  send to self
  native
-------------------------------------------------------------------------------
transactions/test/PaymentTests.cpp:1534
...............................................................................

transactions/test/PaymentTests.cpp:1548: FAILED:
  REQUIRE( sendToSelf.getBalance() == minBalance2 - txfee )
with expansion:
  400001000 (0x17d787e8)
  ==
  400000900 (0x17d78784)

graydon added the bug label Aug 13, 2019
graydon self-assigned this Aug 13, 2019

graydon commented Aug 14, 2019

First day of hunting ruled out a surprising number of possible sources of variation!

  • Failed to fix on FreeBSD by upgrading catch.hpp to newest
  • Failed to find any problems with --enable-asan
  • Failed to reproduce on Linux by trying:
    • Exact version of clang (6.0.1) used on FreeBSD
    • Use of libc++ rather than libstdc++, as on FreeBSD
    • Use of libcxxrt rather than libc++abi for the C++ ABI layer, as on FreeBSD

The only thing we did discover came from a hunch @jonjove had: the failing asserts are all REQUIRE lines in the test suite that occur after the end of a SECTION. Moving one of those REQUIREs back inside the previous SECTION does fix the bug. So the hypothesis here is that there's something about how SECTION is implemented that's at fault (and hopefully: nothing actually wrong with the payments code).

Still a mystery though!


graydon commented Aug 14, 2019

Mystery solved: there was a bug in the C++ ABI runtime library that FreeBSD uses (the PathScale libcxxrt, rather than the libc++abi used on macOS and in most Linux libc++ deployments). The library provides a handful of low-level services, including exception handling routines, and it was miscounting the number of uncaught exceptions in some cases: https://github.com/pathscale/libcxxrt/issues/45

This in turn causes the Catch section-retrying machinery to fail, because it asks about the number of uncaught exceptions. The exact set of circumstances in Catch code that can cause this isn't entirely clear to me -- I tried reading Catch's retry logic but got pretty lost -- but the symptom is "skipping or re-running sections unexpectedly". There are a few reproductions in the Catch tracker, as well as a note in the known issues section:

https://github.com/catchorg/Catch2/blob/master/docs/limitations.md#clangg----skipping-leaf-sections-after-an-exception

catchorg/Catch2#1028
catchorg/Catch2#807
catchorg/Catch2#352
catchorg/Catch2#271
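
To make the symptom concrete, here is a toy model of why a REQUIRE placed after a SECTION is sensitive to sections being skipped. This is a sketch only and not Catch's implementation: `ToyRunner`, `enterSection`, `testBody`, `runAllPasses` and the balances are all hypothetical names and numbers invented for illustration. It assumes only the general behaviour of section-based frameworks, where the test body is re-run once per section and code after the sections runs on every pass.

```cpp
#include <vector>

// Toy stand-in for a section-based test runner. NOT Catch's implementation.
struct ToyRunner
{
    int target = 0;            // index of the one section to enter on this pass
    int seen = 0;              // sections encountered so far on this pass
    bool skipSections = false; // simulates the "sections skipped" symptom

    // Plays the role of SECTION(...): true only for the section chosen
    // for this pass, and never true when the skip symptom is simulated.
    bool
    enterSection()
    {
        bool chosen = (seen == target);
        ++seen;
        return chosen && !skipSections;
    }
};

// Hypothetical test body: each section charges a different amount. The
// returned value is what an assertion placed after the sections would see.
int
testBody(ToyRunner& r)
{
    int balance = 1000;
    if (r.enterSection())
    {
        balance -= 100; // section "a"
    }
    if (r.enterSection())
    {
        balance -= 200; // section "b"
    }
    return balance;
}

// Runs the body once per section and records the post-section balance.
std::vector<int>
runAllPasses(int sectionCount, bool skipSections)
{
    std::vector<int> observed;
    for (int pass = 0; pass < sectionCount; ++pass)
    {
        ToyRunner r;
        r.target = pass;
        r.skipSections = skipSections;
        observed.push_back(testBody(r));
    }
    return observed;
}
```

In this model `runAllPasses(2, false)` yields 900 and 800, while `runAllPasses(2, true)` yields 1000 on both passes: an assertion after the sections then sees a state in which the section's side effects never happened, whereas an assertion inside a section would simply not run at all.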

With this knowledge in hand, I rebuilt an LLVM toolchain for Linux atop libc++ and a libcxxrt from before its bugfix, and was able to reproduce the failure in stellar-core on Linux.
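
For checking a given toolchain independently of Catch, a small probe along the following lines may help. It is a sketch under an assumption: that the runtime bug is of the kind described in Catch's limitations note, where the uncaught-exception state gets stuck after a rethrow. The function name `uncaughtAfterRethrow` is hypothetical, and this exact sequence is not guaranteed to trigger the libcxxrt bug linked above.

```cpp
#include <exception>
#include <stdexcept>

// Throws, rethrows, and fully handles an exception, then reports how many
// uncaught exceptions the runtime believes are still in flight.
// A correct runtime reports 0 at that point.
int
uncaughtAfterRethrow()
{
    try
    {
        try
        {
            throw std::runtime_error("probe");
        }
        catch (...)
        {
            throw; // rethrow the exception currently being handled
        }
    }
    catch (std::exception const&)
    {
        // The exception is fully handled here.
    }
#if defined(__cpp_lib_uncaught_exceptions)
    return std::uncaught_exceptions(); // C++17 counting interface
#else
    return std::uncaught_exception() ? 1 : 0; // pre-C++17 boolean interface
#endif
}
```

A nonzero result would suggest that the runtime's bookkeeping is off and that section-based test frameworks relying on it may misbehave. A zero result does not prove the runtime is unaffected.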

The only suggested fix from Catch is to upgrade the libcxxrt in question. This is not especially easy to do on FreeBSD in-place without recompiling the libc++ as well; the few attempts I've made failed. Given that FreeBSD 12.1 will be out in November, they've not yet branched for that release, and the fix to libcxxrt in question has been merged in FreeBSD SVN, I think we can just close this with a "known issue" warning to anyone testing on FreeBSD until then. This bug isn't a logic problem in stellar-core.

graydon closed this as completed Aug 14, 2019