Rewrite connection reaper test with timeout #51038

fatkodima · 2024-02-10T11:11:44Z

Trying to debug reaper_test.rb.

fatkodima · 2024-02-10T15:12:09Z

After the second attempt (#50037), I was finally able to debug that flaky test, which gets cancelled after the 30-minute timeout!

See https://buildkite.com/rails/rails/builds/104764#018d9373-57da-4bb8-a019-e1e2a9c12f41/1156-1587

We fork a process in

rails/activerecord/test/cases/reaper_test.rb

Line 131 in 0f9aaa5

pid = fork do

but it never gets run and the runtime is blocked waiting for its termination

rails/activerecord/test/cases/reaper_test.rb

Line 145 in 0f9aaa5

Process.waitpid(pid)

We can wrap that waitpid with Timeout.timeout, but it would be better to understand why the process did not run. Does Buildkite have a limit on the number of processes, and did we leak processes from somewhere else? Or what else could be the reason?

cc @byroot Do you have any ideas?

byroot · 2024-02-11T20:57:08Z

but it never gets run and the runtime is blocked waiting for its termination

If the parent process is blocked on wait, it means the child process is still alive. So in a way, it "gets run".

It likely ran into some sort of deadlock.
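
To illustrate the point about a blocked wait, here is a minimal sketch (not taken from the Rails test; the sleep duration is an arbitrary assumption). A non-blocking wait with `Process::WNOHANG` returns `nil` while the child is still alive, which is the same situation in which a plain `Process.waitpid(pid)` would block.

```ruby
# The child stands in for a process that is alive but not finished yet.
pid = fork { sleep 0.5 }

# With WNOHANG, waitpid returns nil instead of blocking while the child runs.
still_running = Process.waitpid(pid, Process::WNOHANG).nil?

# A blocking wait returns only once the child has terminated.
_, status = Process.wait2(pid)
exited_cleanly = status.success?
```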

fatkodima · 2024-02-11T20:59:21Z

But nothing is printed in the child from https://github.com/rails/rails/pull/51038/files#diff-db41b8a333b0b2a49f0b5cd5361a65426419bea8746069f52f07d0ab80f359f8R149. What can be the reason?

matthewd · 2024-02-12T04:17:07Z

$stdout.buffer?

byroot · 2024-02-12T09:08:51Z

What can be the reason?

This test is about restarting the reaper thread in the child, which we do from an after_fork callback that is executed before everything else. So it probably is deadlocked in there.

rails/activesupport/lib/active_support/fork_tracker.rb

Lines 19 to 25 in a8d6d47

    
      def after_fork_callback
        new_pid = Process.pid
        if @pid != new_pid
          @callbacks.each(&:call)
          @pid = new_pid
        end
      end
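
A simplified, self-contained sketch of the same PID-comparison idea follows. `MiniForkTracker` and `check!` are hypothetical names invented for illustration, and the check is called by hand in the child, whereas the real ForkTracker arranges for its callback to run automatically after a fork.

```ruby
# Hypothetical stand-in for the PID check quoted above.
class MiniForkTracker
  def initialize
    @pid = Process.pid
    @callbacks = []
  end

  def after_fork(&block)
    @callbacks << block
  end

  # Runs the callbacks only when the current PID differs from the recorded one.
  def check!
    new_pid = Process.pid
    if @pid != new_pid
      @callbacks.each(&:call)
      @pid = new_pid
    end
  end
end

tracker = MiniForkTracker.new
reader, writer = IO.pipe
tracker.after_fork { writer.puts "callback ran in child" }

tracker.check! # no-op in the parent: the PID has not changed

pid = fork do
  reader.close
  tracker.check! # the PID differs in the child, so the callback fires
  writer.close
end

writer.close
child_output = reader.read
reader.close
Process.wait(pid)
```

If one of the registered callbacks blocks, the child never gets past the check, which matches the kind of hang suspected in this thread.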

fatkodima · 2024-02-12T10:47:50Z

@byroot Do you have an idea on why that can be and how to solve it? This test is so annoying ... If this is not easily solvable/debuggable, maybe we can wrap it into some timeout to reduce the damage, until it gets solved?

byroot · 2024-02-12T12:37:23Z

Do you have an idea on why that can be and how to solve it?

No, I'd need to dig into it. I barely had a look.

maybe we can wrap it into some timeout to reduce the damage, until it gets solved?

It's probably a good idea in general when forking to then invoke code. Might be worth extracting some sort of test helper.

The proper way to time out a forked process is to use a pipe, which lets you call reader_pipe.wait_readable(timeout).
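
A minimal sketch of the pipe-based timeout described above. The helper name `fork_finished_within?` and the timeout values are assumptions made for illustration, not the helper that ended up in Rails.

```ruby
require "io/wait" # provides IO#wait_readable on older Rubies

# Hypothetical helper: runs the block in a forked child and reports whether
# the child signalled completion within `timeout` seconds.
def fork_finished_within?(timeout)
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    yield
    writer.write("done") # tell the parent we got this far
    writer.close
  end

  writer.close # the parent keeps only the read end

  # wait_readable returns nil on timeout. It also wakes up on EOF,
  # so a child that dies early does not stall the parent either.
  completed = !reader.wait_readable(timeout).nil?

  Process.kill("KILL", pid) unless completed
  Process.wait(pid) # always reap the child
  completed
ensure
  reader&.close
end

fast = fork_finished_within?(5) { }
slow = fork_finished_within?(0.2) { sleep 10 }
```

Unlike wrapping `Process.waitpid` in `Timeout.timeout`, this keeps the timeout in a plain IO wait and leaves the parent free to decide how to terminate the child.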

fatkodima · 2024-02-17T18:44:35Z

@byroot Rewrote that test with your suggestion. Please take a look.

CI is red, but seems unrelated.

byroot · 2024-02-17T18:46:02Z

activerecord/test/cases/reaper_test.rb

+            Process.waitpid(pid)
+            assert_predicate $?, :success?
+          else
+            Process.kill("KILL", pid)


You should call Process.wait even after a SIGKILL, otherwise the child will linger around as a zombie.

Also, maybe excessive, but rather than SIGKILL, you could send SIGABRT and cause the child process to print a Ruby crash report, which would include its main thread backtrace, giving more information on where it's stuck.
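
A small sketch of the reaping advice, assuming a child that simply sleeps. After the kill, `Process.wait2` both collects the child (so no zombie is left behind) and returns a status that records which signal ended it.

```ruby
# The child stands in for a stuck process.
pid = fork { sleep 60 }

Process.kill("KILL", pid)

# Reap the child even though it was killed; wait2 also returns its status.
_, status = Process.wait2(pid)

killed_by_signal = status.signaled?
signal_name = Signal.signame(status.termsig)
succeeded = status.success? ? true : false
```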

byroot · 2024-02-17T18:47:14Z

activerecord/test/cases/reaper_test.rb

+          if completed
+            Process.waitpid(pid)
+            assert_predicate $?, :success?
+          else
+            Process.kill("KILL", pid)
+            flunk("Process timed out")


Suggested change

-          if completed
-            Process.waitpid(pid)
-            assert_predicate $?, :success?
-          else
-            Process.kill("KILL", pid)
-            flunk("Process timed out")
+          unless completed
+            Process.kill("KILL", pid)
+          end
+          _, status = Process.wait2(pid)
+          assert_predicate status, :success?

fatkodima · 2024-02-17T18:57:23Z

Thanks! Applied all the suggestions.

rails-bot bot added actioncable actionmailbox actionmailer actionpack actiontext actionview activejob activemodel activerecord activestorage activesupport docs railties labels Feb 10, 2024

fatkodima force-pushed the debug-ci branch from 1a3504b to 71a6157 Compare February 10, 2024 14:25

fatkodima force-pushed the debug-ci branch from 4ab54ea to cd40fa0 Compare February 17, 2024 18:31

fatkodima marked this pull request as ready for review February 17, 2024 18:31

fatkodima removed actionmailer actionpack activemodel activesupport railties docs actionview labels Feb 17, 2024

fatkodima removed activejob actioncable activestorage actionmailbox actiontext labels Feb 17, 2024

fatkodima changed the title ~~Debug CI flaky test~~ Rewrite connection reaper test with timeout Feb 17, 2024

byroot reviewed Feb 17, 2024

Rewrite connection reaper test with timeout

4880ec6

fatkodima force-pushed the debug-ci branch from cd40fa0 to 4880ec6 Compare February 17, 2024 18:56

byroot merged commit 057563a into rails:main Feb 17, 2024
3 of 4 checks passed

fatkodima deleted the debug-ci branch February 17, 2024 20:44

yahonda mentioned this pull request Feb 20, 2024

SQLite3Adapter test using ruby master branch got Received cancellation signal after 30 min #49841

Open
