Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChildPidWatcher: fix a race condition when unregistering a pid #2814

Merged
merged 1 commit into from
Mar 29, 2023

Conversation

sporksmith
Copy link
Contributor

We were getting rare failures in CI unit tests, and consistent failures in large benchmarks. I was able to reproduce with:

for i in `seq 1000`; do
  echo trial $i
  if ! RUST_BACKTRACE=full cargo test --manifest-path=src/Cargo.toml -p shadow-rs childpid_watcher ; then
    break
  fi
done

I think what was happening is that we could get an epoll event for a dying process in the worker thread just as we were in the middle of unregistering it from another thread.

The simplest fix is to just let the worker thread handle unregistration so that it's serialized with other operations in that thread.

We were getting rare failures in CI unit tests, and consistent failures
in large benchmarks. I was able to reproduce with:

```
for i in `seq 1000`; do
  echo trial $i
  if ! RUST_BACKTRACE=full cargo test --manifest-path=src/Cargo.toml -p shadow-rs childpid_watcher ; then
    break
  fi
done
```

I think what was happening is that we could get an epoll event for a
dieing process in the worker thread just as we were in the middle of
unregistering it from another thread.

The simplest fix is to just let the worker thread handle unregistration
so that it's serialized with other operations in that thread.
@sporksmith sporksmith self-assigned this Mar 29, 2023
@github-actions github-actions bot added the Component: Main Composing the core Shadow executable label Mar 29, 2023
@sporksmith
Copy link
Contributor Author

Actually I'm going to take a look at a more thorough revision moving all the "work" into the worker thread ...

@sporksmith
Copy link
Contributor Author

Though maybe it makes sense to get this in now since it's causing the regular CI to be flaky in addition to breaking the runner

@sporksmith sporksmith merged commit b43386c into shadow:main Mar 29, 2023
@sporksmith sporksmith deleted the childpidwatcher-unreg2 branch March 29, 2023 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Main Composing the core Shadow executable
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants