Figure out why test (256) keeps failing #717

Closed
Strilanc opened this issue Mar 15, 2024 · 5 comments

Comments

@Strilanc
Collaborator

Strilanc commented Mar 15, 2024

Sometimes (only sometimes!) the test gets stuck in an infinite loop of printing out AddressSanitizer errors. Very concerning, but so far it only seems to happen for the 256-avx case. That case is currently not allowed to go into the Python wheels due to a previous mysterious crash, so this isn't urgent, but it sure would be nice to know why these things are occurring.

@Strilanc
Collaborator Author

Strilanc commented Mar 15, 2024

Never mind, it has now also happened for test (64), so this is high priority.

@Strilanc
Collaborator Author

Strilanc commented Mar 15, 2024

Based on #718, this is a bug in gtest rather than a bug in stim. Reported it in google/googletest#4491.

@kimwalisch

kimwalisch commented Mar 17, 2024

I have the same bug (AddressSanitizer:DEADLYSIGNAL) in my primecount project, in the code below, which only uses math functions from the C++ standard library (my project does not use googletest):

  for (int i = 0; i < 100; i++)
  {
    T term = (Li(t) - x) * std::log(t);

    // Not converging anymore
    if (std::abs(term) >= std::abs(old_term))
      break;

    t -= term;
    old_term = term;
  }

My bug only occurs on Ubuntu 22.04 & 23.10 (x64) when running in a virtual machine with the GCC/Clang sanitizers enabled. When I switched my CI test to ubuntu-20.04, the bug disappeared. (When I tested Ubuntu 22.04 with the GCC sanitizers on a real server, with no VM or Docker container, it also worked without any issues.)

After more than 2 hours of debugging I couldn't figure out the exact cause, but it looks like the issue is caused by a bug in Ubuntu >= 22.04 or by a compiler/sanitizer bug.


UPDATE 18/03/2024: Today I also tested on a Fedora 36 x64 VM using GCC and the same compiler options, but I was not able to reproduce the issue. Hence the issue seems to occur only on Ubuntu x64 VMs (and possibly also on Debian VMs).

@Strilanc
Collaborator Author

@kimwalisch Thanks, it's very helpful to know that I can work around it by pinning the Ubuntu version used by CI.

@Strilanc
Collaborator Author

This seems to have been resolved externally. It hasn't happened in a PR for about a week now, whereas before it was happening multiple times per PR.
