Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a race condition in EventedFileUpdateCheckerTest #47748

Merged

Conversation

casperisfine
Copy link
Contributor

Fix: #47746

We need to wait for the listener threads to exit.

@zzak I believe this is the real root cause.

Fix: rails#47746

We need to wait for the listener threads to exit.
@zzak
Copy link
Member

zzak commented Mar 23, 2023

This does seem to fix test_can_be_garbage_collected, but I'm still getting an error when I run the whole file in test_notifies_forked_processes but NOT when I just run that single test. Similar to the output here:
https://buildkite.com/rails/rails/builds/84109#4a72a57f-20d3-44e1-b0a1-61c6a0baf3b3/1191-1201

Failure:
EventedFileUpdateCheckerTest#test_notifies_forked_processes [/Users/zzak/code/rails/activesupport/test/evented_file_update_checker_test.rb:95]:
Expected: ""
  Actual: "Minitest::Assertion"

I wonder it's because they're sharing the same files? (ref)

Also was looking at #45061 to see if it was related.

Not at all familiar with these tests, so appreciate any feedback and thank you for taking the time to look into this!

@zzak
Copy link
Member

zzak commented Mar 23, 2023

Ok, I literally commented out every single other test in that file.

When I just run the single test it passes:

bin/test test/evented_file_update_checker_test.rb -n test_notifies_forked_processes

Running 20 tests in a single process (parallelization threshold is 50)
Run options: -n test_notifies_forked_processes --seed 44627

# Running:

.

Finished in 2.025592s, 0.4937 runs/s, 1.9747 assertions/s.
1 runs, 4 assertions, 0 failures, 0 errors, 0 skips

But when running the whole file, I get that failure:

bin/test test/evented_file_update_checker_test.rb

Failure:
EventedFileUpdateCheckerTest#test_notifies_forked_processes [/Users/zzak/code/rails/activesupport/test/evented_file_update_checker_test.rb:95]:
Expected: ""
  Actual: "Minitest::Assertion"


bin/test test/evented_file_update_checker_test.rb:47

.....

Finished in 40.116586s, 0.4985 runs/s, 1.2464 assertions/s.
20 runs, 50 assertions, 1 failures, 0 errors, 0 skips

🤯

@casperisfine
Copy link
Contributor Author

Interesting. I'll have a look tomorrow (unless you find the issue first).

@zzak
Copy link
Member

zzak commented Mar 24, 2023

Ok, the other 19 runs come from the shared test file, but I realized that just re-running the single test a bunch of times reproduces it about every 3rd or 4th try -- without having to comment out the other tests.

for i in {1..10}; do bin/test test/evented_file_update_checker_test.rb -n test_notifies_forked_processes; done

So it seems this test is just flakey and there isn't any sort of race condition or leak afaict.

@casperisfine
Copy link
Contributor Author

So it seems this test is just flakey and there isn't any sort of race condition or leak afaict.

I can see the same result. Starting to dig into that one as well.

But I'm thinking let's merge this one fix, as it's already a win, and I'll open another PR if I find a fix for that other one. If I can't find a fix, we can open an issue, or just consider deleting the test.

@byroot byroot merged commit ca195a8 into rails:main Mar 24, 2023
7 checks passed
@casperisfine casperisfine deleted the fix-race-condition-in-evented-file-update-checker branch March 24, 2023 08:57
@byroot
Copy link
Member

byroot commented Mar 24, 2023

Ok, so I have figured why test_notifies_forked_processes is flaky.

When you fork, the inotify (or FSEvent on macOS) file descriptor is inherited by the child.

Both the parent and the child will read in the same queue, so the event can be seen by either process but not both.

I'm unfamiliar with this code, I'm not sure what the intent of this forking test was, so I'll have to dig a bit in the history to understand. But the fix is likely to re-open these descriptors after fork.

@jonathanhefner
Copy link
Member

It's interesting that this fixes the failure. Since #40133, the listener threads should not hold a reference to the EventedFileUpdateChecker instance, neither directly nor indirectly. They should hold a reference to the EventedFileUpdateChecker::Core instance instead, via a EventedFileUpdateChecker::Core#changed method instance.

The only thing I can think of is that maybe the finalizer registry is holding on to the EventedFileUpdateChecker instance (due to this ObjectSpace.define_finalizer call) even after GC.start returns? In which case, waiting for the listener threads to terminate is comparable to waiting for @core.finalizer to complete and for the finalizer registry to (hopefully) flush the reference. But that is just speculation. 🤷‍♂️

@casperisfine
Copy link
Contributor Author

@jonathanhefner it's not define_finalizer, it's after_fork I think. Still need to fix that one.

@jonathanhefner
Copy link
Member

it's not define_finalizer, it's after_fork I think. Still need to fix that one.

The fork tracker should not be holding a reference to the EventedFileUpdateChecker instance either. It should be holding a reference to the EventedFileUpdateChecker::Core instance instead. (The after_fork call is in EventedFileUpdateChecker::Core#initialize.)

Additionally, when @core.finalizer is called, it unregisters the callback:

def finalizer
proc do
stop
ActiveSupport::ForkTracker.unregister(@after_fork)
end
end

So the EventedFileUpdateChecker::Core instance should also be garbage collectable.

🤷‍♂️

@byroot
Copy link
Member

byroot commented Mar 31, 2023

Hum, that's weird.

If you really want to know, the way to be sure what's going on would be to dump the heap to find what's keeping a ref on it. github.com/csfrancis/harb is great for that.

zzak pushed a commit to zzak/rails that referenced this pull request Jun 9, 2023
…nted-file-update-checker

Fix a race condition in EventedFileUpdateCheckerTest
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants