io.grpc.okhttp.TlsTest hostnameVerifierFails_fails is flaky #11012

Closed
ejona86 opened this issue Mar 13, 2024 · 3 comments · Fixed by #11266

Comments

@ejona86
Member

ejona86 commented Mar 13, 2024

We've seen lots of flakes. I want to say mostly in Kokoro, but this is the one I saw most recently.

https://github.com/grpc/grpc-java/actions/runs/8256184976/job/22584330763#step:7:1556

io.grpc.okhttp.TlsTest > hostnameVerifierFails_fails FAILED
    java.lang.AssertionError: Resources could not be released in time at the end of test: [ServerImpl{logId=280, transportServer=io.grpc.okhttp.OkHttpServer@9730004}]
        at io.grpc.testing.GrpcCleanupRule.after(GrpcCleanupRule.java:201)
        at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:59)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112)
        at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
        at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
        at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
        at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
        at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
        at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
        at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
        at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
        at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:113)
        at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65)
        at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
        at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)

CC @larry-safran

@ejona86 ejona86 added the highly flaky Issue is for a test that is crying wolf label Mar 13, 2024
@ejona86 ejona86 added this to the 1.63 milestone Mar 13, 2024
@larry-safran
Contributor

larry-safran commented Mar 14, 2024 via email

@ejona86
Member Author

ejona86 commented Mar 14, 2024

This reproduces at a rate of ~4% inside Google. Running just the one failing test has a 100% pass rate. Since the tests don't share state, it is either a race on a shared resource (like executors), or the race only becomes possible to hit in that environment once the JIT has kicked in.

Changing the timeout to 30s didn't change the failure rate, so something is definitely stuck.
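
To illustrate why an unchanged failure rate under a longer timeout points at something stuck rather than something slow, here is a minimal sketch using only `java.util.concurrent`. It is not the TlsTest or GrpcCleanupRule code; the class `StuckShutdownDemo` and its method are hypothetical names, and the blocked task merely stands in for whatever resource fails to release.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StuckShutdownDemo {
  /**
   * Starts a task that blocks forever, requests a graceful shutdown, and
   * reports whether the executor terminated within the given timeout.
   */
  public static boolean shutdownWithStuckTask(long timeoutMillis)
      throws InterruptedException {
    CountDownLatch neverReleased = new CountDownLatch(1); // hypothetical stuck resource
    CountDownLatch started = new CountDownLatch(1);
    ExecutorService executor = Executors.newSingleThreadExecutor();
    executor.execute(() -> {
      started.countDown();
      try {
        neverReleased.await(); // blocks until interrupted
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    started.await(); // make sure the task is actually running
    executor.shutdown(); // graceful: does not interrupt the running task
    try {
      return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    } finally {
      executor.shutdownNow(); // interrupt the task so the demo itself can exit
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // A stuck task fails the wait no matter how generous the timeout is.
    System.out.println(shutdownWithStuckTask(50));
    System.out.println(shutdownWithStuckTask(500));
  }
}
```

Both calls report that termination did not happen in time. A merely slow resource would instead start passing once the timeout exceeded its latency.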

@ejona86
Member Author

ejona86 commented Mar 22, 2024

@YifeiZhuang YifeiZhuang modified the milestones: 1.63, 1.64 Apr 4, 2024
@temawi temawi modified the milestones: 1.64, 1.65 May 2, 2024
ejona86 added a commit to ejona86/grpc-java that referenced this issue Jun 6, 2024
Using --runs_per_test=1000, this changes the flake rate of TlsTest from
2% to 0%.

While I believe it is possible to write a reliable test for this
(including noticing the SSLSocket behavior), it was becoming too
invasive so I gave up.

Fixes grpc#11012
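
As a rough sanity check on the "2% to 0%" figure, the sketch below computes how likely 1000 clean runs would be if the flake were still present at 2%. It assumes each run fails independently with the same probability, which the issue does not state; `FlakeOdds` and `probabilityAllPass` are hypothetical names.

```java
public class FlakeOdds {
  /**
   * Probability that every run passes, assuming each run fails
   * independently with probability failureRate.
   */
  public static double probabilityAllPass(double failureRate, int runs) {
    return Math.pow(1.0 - failureRate, runs);
  }

  public static void main(String[] args) {
    // 2% flake rate, 1000 runs, as in the commit message.
    System.out.printf("P(0 failures) = %.3e%n", probabilityAllPass(0.02, 1000));
  }
}
```

Under that independence assumption the result is on the order of 10^-9, so 1000 clean runs would be very unlikely if the underlying flake rate were unchanged.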
@ejona86 ejona86 self-assigned this Jun 6, 2024
ejona86 added a commit that referenced this issue Jun 6, 2024
larry-safran pushed a commit to larry-safran/grpc-java that referenced this issue Jun 25, 2024
larry-safran added a commit that referenced this issue Jun 25, 2024
Co-authored-by: Eric Anderson <ejona@google.com>
4 participants