Flaky OpenSSL tests #8118
Comments
I've tested 9.4.6.0 and I still see the same errors. One thing I've noticed, however, is that the failing test case isn't always the same, e.g. I've seen
@jcharaoui That's good information. Some tests inherited from CRuby expect threads to not run in parallel, or to behave more predictably than parallel threads. I'll have a look at the test and see if I can suss out why it's flaky.
Interesting! So I've tried adding So I tried instead with both I had a look around the issue tracker and found #7611 which is quite oddly familiar because the
I'm wondering if the same bug could be also responsible for #8037.
I ran a long loop on my M1 and could not reproduce the error, but your evidence still tells me that there are lingering threads causing future tests to fail. I'll see if I can figure out a cleaner way to shut them down when each test completes.
The OpenURI tests only seem to use RubyVM in one test, but it is omitted if
@jcharaoui Since you seem to be able to reproduce more easily than I can, could you try this patch and see if it helps? The shutdown logic for the threads seems sound, but I noticed it does not attempt to wait for the server thread, which could easily interfere with future tests.
diff --git a/test/mri/open-uri/test_ssl.rb b/test/mri/open-uri/test_ssl.rb
index 3f94cab40f..b0f7231027 100644
--- a/test/mri/open-uri/test_ssl.rb
+++ b/test/mri/open-uri/test_ssl.rb
@@ -35,7 +35,7 @@ class TestOpenURISSL
:Port => 0})
_, port, _, host = srv.listeners[0].addr
threads = []
- server_thread = srv.start
+ threads << server_thread = srv.start
threads << Thread.new {
server_thread.join
if log_tester
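To illustrate the idea behind the patch (track the server thread alongside the others so teardown waits for it too), here is a minimal standalone sketch. It is not code from the JRuby or CRuby test suites: `with_tracked_server`, its `join_timeout` parameter, and the plain `TCPServer` standing in for WEBrick are all assumptions made for the example.

```ruby
require "socket"

# Hypothetical helper, not part of any real test suite.
# Starts a throwaway TCP server thread, yields its port to the block,
# then closes the listener and joins every thread that was started.
def with_tracked_server(join_timeout: 5)
  server = TCPServer.new("127.0.0.1", 0) # port 0 lets the OS pick a free port
  port = server.addr[1]
  threads = []

  # Track the server thread itself, in the spirit of
  # `threads << server_thread = srv.start` from the patch above.
  threads << Thread.new do
    loop do
      client = server.accept
      client.puts "hello"
      client.close
    end
  rescue IOError, SystemCallError
    # accept raises once the listening socket is closed; treat that as shutdown
  end

  yield port
ensure
  server&.close
  # Thread#join(timeout) returns nil when the thread did not finish in time
  leftover = (threads || []).reject { |t| t.join(join_timeout) }
  warn "#{leftover.size} thread(s) still alive after #{join_timeout}s" unless leftover.empty?
end
```

With this shape, a thread that refuses to exit shows up as a warning in the test that started it, rather than as a mysterious failure in a later test.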
Here it is, let me know if you need anything else to track this down.
Thank you! I've started a loop with this patch, and will let you know the results. One thing you may find interesting (before and after the patch) is that the majority of iterations complete in just under one second. Some iterations, seemingly at random, complete after just under 31 seconds, which suggests that the test sometimes hits a 30-second timeout of some sort. Furthermore, I had noticed that failed iterations complete after just over 60 seconds, which might tell us that a failure happens when the timeout gets hit twice. It might not mean anything, but I'm throwing it out there in case it's useful!
Here's a log of the test loop running 35 times: https://paste.lib3.net/lavamind/2024-02-26-2Pao29HezYUjLqAL8zrChgpq70KKaB0IFZiycDT4GRk/stdin.txt It failed 3 times. As I mentioned, each of these failures happened after 60 seconds of wall clock time.
Edit: this is with the patch at #8118 (comment)
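For anyone who wants to collect the same kind of timing data, a rough sketch of a loop that records wall-clock time and pass/fail per iteration might look like this. The helper names (`classify_duration`, `timed_run`, `tally_runs`) are made up for illustration, and the 30 s / 60 s bucket boundaries are assumptions based only on the timings reported above, not on anything in the test suite.

```ruby
# Hypothetical bucket boundaries, chosen from the ~1 s / ~31 s / ~60 s
# timings reported in this thread; they are not taken from the test suite.
def classify_duration(seconds)
  if seconds < 30
    :normal
  elsif seconds < 60
    :one_timeout
  else
    :two_timeouts
  end
end

# Runs cmd (an argv array) once and returns [success, elapsed_seconds].
def timed_run(cmd)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  ok = system(*cmd, out: File::NULL, err: File::NULL)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [ok == true, elapsed] # system returns nil if the command could not be run
end

# Runs cmd `attempts` times and tallies pass/fail counts per duration bucket.
def tally_runs(cmd, attempts)
  tally = Hash.new { |h, k| h[k] = { pass: 0, fail: 0 } }
  attempts.times do
    ok, elapsed = timed_run(cmd)
    tally[classify_duration(elapsed)][ok ? :pass : :fail] += 1
  end
  tally
end

# Example invocation (commented out so loading this file has no side effects):
# cmd = %w[jruby test/mri/runner.rb
#          --excludes=test/mri/excludes:test/mri/excludes_wip
#          test/mri/open-uri/test_ssl.rb]
# p tally_runs(cmd, 35)
```

If failures really do cluster in the `:two_timeouts` bucket while passes land in `:normal` and `:one_timeout`, that would support the theory that a failure corresponds to hitting the timeout twice.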
Some discoveries that might help:
This is not an issue with running tests in parallel; rather, it seems the thread in the failing test (test_ssl.rb, test_validation_noverify) is hanging until the timeout triggers, but we have a bug in our
I couldn't find For now I'm going to exclude these three tests from the autopkgtest in Debian, with a reference to this issue. Let me know if you think of anything else I should try.
I hit another seemingly related error in a run of
From: https://salsa.debian.org/lavamind/jruby/-/jobs/5380922#L7735 I was able to reproduce it locally using the same sort of test loop:
I'm not sure if
Hello,
I'm seeing intermittent mri-stdlib test failures with JRuby 9.4.5.0 on Debian testing/sid with OpenJDK 17.0.10.
The failures are related to one specific test: TestOpenURISSL#test_validation_noverify.
When I loop this command, usually after anywhere between one and forty runs, it always ends up failing:
n=1; while true; do echo "attempt #${n} ..."; jruby test/mri/runner.rb --excludes=test/mri/excludes:test/mri/excludes_wip test/mri/open-uri/test_ssl.rb; sleep 1; n=$(( n+1 )); done
When it fails, this is the stack trace:
I don't see any other failures like this, related to OpenSSL, in the full test suite run.