Bug: Incoherent job statepoint access in subprocess #528
Comments
What commands do you use to start this script?
@csadorf Save the script as
I think I was a bit stumped by the whole
At least for the forked process that's not true, is it? We iterate over the job and then need to load the state point from disk, no?
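This sketch shows why loading a state point from disk inside worker threads needs care. It uses a hypothetical `LazyStatepoint` class and is not signac's actual implementation. It assumes the state point lives in a JSON file and uses `None` as the "not loaded yet" marker instead of `{}`, so a reader can never mistake an unloaded cache for an empty state point. The load itself runs under a lock with a re-check.

```python
import json
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class LazyStatepoint:
    """Hypothetical lazily loaded state point (not signac's actual class)."""

    def __init__(self, path):
        self._path = path
        self._data = None  # None = "not loaded yet"; {} would look like an empty state point
        self._lock = threading.Lock()

    def get(self):
        data = self._data  # fast path: a single reference read
        if data is not None:
            return data
        with self._lock:
            # Re-check: another thread may have finished loading while we waited.
            if self._data is None:
                with open(self._path) as f:
                    loaded = json.load(f)
                self._data = loaded  # publish only the fully loaded dict
            return self._data


def load_concurrently(sp, n_threads=8, n_calls=64):
    """Call sp.get() from many threads and collect every result."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda _: sp.get(), range(n_calls)))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "statepoint.json")  # hypothetical file name
    with open(path, "w") as f:
        json.dump({"a": 1}, f)
    results = load_concurrently(LazyStatepoint(path))

print(all(r == {"a": 1} for r in results))  # True
```

The lock-free fast path only reads a single reference, so a thread either sees `None` and takes the locked path, or sees the complete dict.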
@vyasr @csadorf I met with @joaander and we debugged this together. It turns out that the issue is indeed from my optimizations in #497 and can be explained like this:
I am not sure whether I'm pointing to the right spot for the actual data corruption, but I can positively identify that every thread is instantiating its own
Possible solutions:
I'd have to look at the code again, but from my recollection step 4 doesn't quite make sense to me. Locks are never stored at the per-instance level; precisely to avoid this problem, all locks are stored in a class-level dictionary mapping
You said that this only fails if you fork, right? If you run the ThreadPool on process 0 without forking, it works as expected? If so, the forking is somehow critical to triggering this failure. Assuming I'm remembering that correctly, perhaps the problem is related to the way that I'm creating the
A potentially useful basis for comparison would be the tests for multithreaded behavior that I previously wrote in
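The failure reportedly appears only after forking. One general hazard worth ruling out is that `os.fork()` copies class-level state into the child, including the current state of every lock, while only the forking thread survives. The sketch below assumes a hypothetical `LockRegistry` keyed by file path (the key type of the real dictionary is not stated here). It uses the standard `os.register_at_fork` hook (POSIX only, Python 3.7+) to give the child fresh locks. This is one possible mitigation, not a description of what signac does.

```python
import os
import threading


class LockRegistry:
    """Hypothetical class-level registry mapping a key (assumed: a file path) to an RLock."""

    _locks = {}
    _guard = threading.Lock()  # serializes creation of new registry entries

    @classmethod
    def get(cls, key):
        with cls._guard:
            lock = cls._locks.get(key)
            if lock is None:
                lock = cls._locks[key] = threading.RLock()
            return lock

    @classmethod
    def reset_after_fork(cls):
        # os.fork() copies the class dict, and the locked/unlocked state of every
        # lock in it, into the child, but only the forking thread keeps running.
        # A lock another parent thread held at fork time could never be released
        # in the child, so the child starts over with fresh locks.
        cls._guard = threading.Lock()
        cls._locks = {}


# os.register_at_fork exists only on POSIX platforms (Python 3.7+).
if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=LockRegistry.reset_after_fork)


# Simulate the hazard without forking: another thread holds the lock.
KEY = "job/statepoint.json"  # hypothetical key
held, release = threading.Event(), threading.Event()


def holder():
    with LockRegistry.get(KEY):
        held.set()
        release.wait()


t = threading.Thread(target=holder)
t.start()
held.wait()

before = LockRegistry.get(KEY).acquire(blocking=False)  # held by another thread
LockRegistry.reset_after_fork()  # what the after_in_child hook would run
fresh = LockRegistry.get(KEY)
after = fresh.acquire(blocking=False)  # a brand-new lock is free
if after:
    fresh.release()

release.set()
t.join()
print(before, after)  # False True
```

The demo calls `reset_after_fork()` directly, so it runs without actually forking. A real fix would also have to decide what to do with locks that the forking thread itself holds at fork time.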
Yes, that's what I meant in "3." above. Every thread shares the same
No, not separate instances. When locks are instantiated here, they are stored in a dictionary owned by the class. So for example if I do
I should see something like
Instances never own RLocks. What I'm asking is whether it is possible that either the
Locks should always be accessed via the property
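This minimal sketch shows the pattern these comments describe, under stated assumptions. The hypothetical `Resource` class keeps only a key on the instance (a filename, assumed here). Its `_lock` property always looks the RLock up in a class-level dictionary. Two separate instances for the same file therefore serialize on the same lock, and the instance's own `__dict__` never contains a lock.

```python
import threading


class Resource:
    """Hypothetical object whose lock is shared by every instance with the same filename."""

    _locks = {}                      # class-level: filename -> RLock
    _locks_guard = threading.Lock()  # serializes creation of new entries

    def __init__(self, filename):
        self._filename = filename    # the instance stores only the key, never a lock

    @property
    def _lock(self):
        # Always fetch the lock through this property instead of caching it on self.
        with Resource._locks_guard:
            lock = Resource._locks.get(self._filename)
            if lock is None:
                lock = Resource._locks[self._filename] = threading.RLock()
            return lock

    def increment(self, counter, n):
        # Read-modify-write on shared data, serialized by the shared lock.
        for _ in range(n):
            with self._lock:
                counter[self._filename] = counter.get(self._filename, 0) + 1


a, b = Resource("statepoint.json"), Resource("statepoint.json")
c = Resource("document.json")
print(a._lock is b._lock, a._lock is c._lock)  # True False
print(vars(a))  # {'_filename': 'statepoint.json'}

counter = {}
threads = [
    threading.Thread(target=r.increment, args=(counter, 1000))
    for r in (a, b)
    for _ in range(4)
]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(counter)  # {'statepoint.json': 8000}
```

Because `a` and `b` resolve to the same RLock, the eight threads never lose an increment even though they go through different instances.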
We confirmed via
Got it. In order for that to happen, one of the following must be true:
Perhaps this would be easiest to sort out in a call?
Just had a call with @bdice and @joaander. There are two classes of problems that need to be resolved.
Description
I was facing a strange bug in signac-flow's tests and have identified that it occurs because of an issue in signac.
I did a git bisect, and it looks like the issue was introduced in PR #497, but I can't tell why.
There's some kind of race condition that leads to `job.sp` returning `{}` even though there should be data in the state point. This happens both with and without buffering.
The job is opened by its state point, so there is no lazy state point access.
To reproduce
Here's a minimal failing example.
System configuration
Please complete the following information:
`master` (pointing to 786d75f)