cvise stops intermittently #113

Open
avikivity opened this issue May 25, 2023 · 7 comments

Comments

@avikivity
Contributor

Occasionally I see

[1]+  Stopped                 cvise --clang-delta-std=c++20 --print-diff ./check.sh sstable_datafile_test.cc

And I have to restart the job with fg. This is of course problematic for unattended runs.

The interestingness test runs gdb in batch mode (gdb also runs the program). Perhaps gdb's signal handling interferes with cvise?

@avikivity
Contributor Author

A workaround is to send SIGCONT in a loop from some shell script, but I'd like to understand and fix it.
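A minimal sketch of that SIGCONT workaround, assuming Linux (it reads the process state from /proc/<pid>/stat). The function names process_state, continue_if_stopped and watchdog are hypothetical and not part of cvise. It only resumes processes in job-control stop (state `T`), not ptrace stops (state `t`), so a program stopped under gdb is left alone.

```python
import os
import signal
import time


def process_state(pid):
    """Return the one-letter state field from /proc/<pid>/stat (Linux-only),
    or None if the process no longer exists."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except FileNotFoundError:
        return None
    # Field 2 is the command name in parentheses and may contain spaces,
    # so split after the last ')' to reach the state field.
    return stat.rsplit(")", 1)[1].split()[0]


def continue_if_stopped(pid):
    """Send SIGCONT if the process is in job-control stop ('T').
    Returns True if a signal was sent."""
    if process_state(pid) != "T":
        return False
    try:
        os.kill(pid, signal.SIGCONT)
    except ProcessLookupError:
        # The process exited between the state check and the kill.
        return False
    return True


def watchdog(pid, interval=5.0):
    """Poll until the process exits (or becomes a zombie), resuming it
    whenever it is found stopped."""
    while process_state(pid) not in (None, "Z"):
        if continue_if_stopped(pid):
            print(f"watchdog: resumed stopped process {pid}", flush=True)
        time.sleep(interval)
```

For example, call watchdog(<pid of the cvise master>) from a separate Python process. Note that it also undoes a deliberate Ctrl-Z, and it treats the symptom rather than the cause.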

@marxin
Copy link
Owner

marxin commented May 26, 2023

Hmm, I haven't seen this behavior in my own cvise use. So you're saying that the master process (cvise ...) got moved to the background and was therefore stopped? All the interestingness tests run in a separate subprocess (using the Pebble library) and should not interact with the master process at all.
Can you get a Python backtrace of the master process when it gets moved to the background? Does it really happen only when the interestingness test uses gdb?

@avikivity
Copy link
Contributor Author

I don't know if it was moved into the background, or if something else happened.

I don't use cvise very often (but when I do, it's for multi-day reductions), so I can't tell if it's related to gdb or not. It seems likely since gdb plays with signals.

I don't know how to generate a Python backtrace (and if the process is stopped, I'm sure I won't get one).
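Python's standard faulthandler module can print a backtrace on demand, even for a process that looks hung. A minimal sketch, assuming you can add one call near the start of the cvise master process; install_stack_dump_handler is a hypothetical helper, and as far as this thread shows cvise does not install such a handler itself.

```python
import faulthandler
import signal
import sys


def install_stack_dump_handler(stream=None):
    # Hypothetical helper: dump the Python stack of every thread to `stream`
    # (default: stderr) whenever this process receives SIGUSR1.
    # faulthandler writes from a C-level signal handler, so the dump does not
    # wait for the interpreter to reach a safe point.
    # Must be called inside the process you want to inspect.
    faulthandler.register(signal.SIGUSR1, file=stream or sys.stderr, all_threads=True)
```

Then run `kill -USR1 <pid>` from another terminal. If the process is currently stopped, the signal stays pending and the dump should appear once it is resumed with `kill -CONT <pid>`, pointing at the code that was running when it stopped. An external sampler such as py-spy (`py-spy dump --pid <pid>`) is an alternative that needs no code change.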

I guess a workaround is to package the interestingness test into a container; this should isolate it from any signal leakage. Still, it would be nice if cvise protected itself from this.
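Short of a container, a tool could try to protect itself along the following lines. This is a sketch under an unconfirmed assumption: that the stop is a job-control stop (SIGTTOU) triggered after a child such as gdb changes the terminal's foreground process group. Both helpers, ignore_sigttou and run_interestingness_test, are hypothetical and not cvise or Pebble APIs.

```python
import signal
import subprocess


def ignore_sigttou():
    # Ignore SIGTTOU in the calling (master) process. Per POSIX, a background
    # process that writes to its terminal (with TOSTOP set) or changes terminal
    # attributes is not stopped when SIGTTOU is ignored; the operation proceeds.
    signal.signal(signal.SIGTTOU, signal.SIG_IGN)


def run_interestingness_test(cmd, timeout=None):
    # start_new_session=True makes the child call setsid() before exec, so it
    # (and anything it spawns, e.g. gdb) has no controlling terminal and
    # cannot change the user's terminal foreground process group.
    # stdin from /dev/null keeps it from reading the terminal.
    result = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
        timeout=timeout,
    )
    return result.returncode == 0
```

Caveats: SIG_IGN for SIGTTOU is inherited across exec by children started afterwards; on timeout, subprocess.run() kills only the direct child and raises TimeoutExpired, so a real implementation would probably kill the whole process group with os.killpg(); and a test in its own session no longer receives Ctrl-C from the terminal.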

@marxin
Owner

marxin commented Jun 2, 2023

> I don't know if it was moved into the background, or if something else happened.

Well, the described behavior seems pretty unusual.

> I don't use cvise very often (but when I do, it's for multi-day reductions), so I can't tell if it's related to gdb or not. It seems likely since gdb plays with signals.

Anyway, can you please attach a reproducer so I can try to reproduce it locally?

> I guess a workaround is to package the interestingness test into a container; this should isolate it from any signal leakage. Still, it would be nice if cvise protected itself from this.

Well, that sounds like a solution, but C-Vise should not behave like you described ;)

@avikivity
Contributor Author

https://github.com/avikivity/scylladb/commits/bug-13730-investigation

Steps to reproduce:

  1. clone the repo into a Fedora 38 installation (or anything with clang 16 + all the dependencies)
  2. run ./cvise.sh
  3. wait for long, long hours

There's a container image with all the dependencies: docker.io/scylladb/scylla-toolchain:fedora-38-20230517
However, I did not try reproducing within the container, only on my Fedora 38 host. Note you'll need to run the container as --privileged since ptrace isn't available otherwise.

The problem reproduces rarely. I have a feeling it happens when the pass changes, but it hasn't happened enough times, and usually I wasn't looking when it did.

@marxin
Owner

marxin commented Jun 5, 2023

Thanks for the reproducer. Note that I'm changing jobs right now; I will get to it in about a month, when I'll have a reasonably powerful machine to reproduce it on. Hope that's fine.

@marxin
Owner

marxin commented Jul 7, 2023

All right, so I've changed jobs and got a reasonably fast desktop machine.

Looking at your reproducer: can you please create a container (the provided one, docker.io/scylladb/scylla-toolchain:fedora-38-20230517, seems to be unavailable), add your git branch to it, and send me a link? Thanks! Note that it seems one needs to have built things like /home/avi/scylla/build/release/seastar/libseastar.a (and probably others) in order to link the sstable_datafile_test_g binary.
