Reduce flakiness and document reasons for flakiness #1348

Merged
merged 1 commit into from Dec 15, 2015

nfuller
Collaborator

@nfuller nfuller commented Jan 23, 2015

Android has been receiving reports of some tests being flaky
on what are probably lower-spec devices.

This introduces delays into tests where sockets are being
poisoned after the entire response body has been written to
them and where there are follow-up requests.

This change also improves the documentation for the problematic
SocketPolicy values.
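
For illustration only, here is a minimal sketch of the scenario described above, written against plain `java.net` sockets rather than MockWebServer. The class and method names (`DisconnectAtEndSketch`, `bodyThenClose`) are hypothetical and are not part of this change. The server writes the whole body, waits for a delay, then closes the socket. The client reads the whole body and then blocks until it sees end-of-stream. Note that this waits on the close itself instead of sleeping for a fixed time, which is a different technique from the delays this change adds; it is shown only because it makes "the socket is already closed before the follow-up request" easy to observe.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class DisconnectAtEndSketch {
  /**
   * Serves {@code body} in full, closes the socket after {@code closeDelayMillis},
   * and returns true once the client has read the whole body and then observed
   * end-of-stream. Hypothetical helper, not OkHttp or MockWebServer code.
   */
  static boolean bodyThenClose(byte[] body, long closeDelayMillis) {
    try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
      Thread serverThread = new Thread(() -> {
        try (Socket socket = server.accept()) {
          socket.getOutputStream().write(body);
          socket.getOutputStream().flush();
          Thread.sleep(closeDelayMillis); // Delay between the last byte and the disconnect.
        } catch (IOException | InterruptedException e) {
          throw new IllegalStateException(e);
        }
      });
      serverThread.start();

      try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
        InputStream in = client.getInputStream();
        byte[] buffer = new byte[body.length];
        int received = 0;
        while (received < body.length) {
          int n = in.read(buffer, received, body.length - received);
          if (n == -1) return false; // Closed before the full body arrived.
          received += n;
        }
        // Blocks until the server closes. A follow-up request issued only after
        // this point cannot race with the disconnect.
        boolean closed = in.read() == -1;
        serverThread.join();
        return closed;
      }
    } catch (IOException | InterruptedException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    byte[] body = "ABC".getBytes(StandardCharsets.UTF_8);
    System.out.println(bodyThenClose(body, 50));
  }
}
```

A fixed sleep in the test achieves a similar effect only when the sleep is longer than the server's delay, which is why slower devices are more likely to expose the race.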

@@ -28,6 +28,11 @@
/**
* Close the socket after the response. This is the default HTTP/1.0
* behavior.
*
* <p>Note: Be careful if this used in a test and it is not the final queued request. The client
Member

For what it's worth, triggering this flakiness is the entire motivation here.

Collaborator Author

The tests appear to be testing "what happens if a pooled connection has gone bad?", not "what happens if the server closes the connection at an arbitrary point during the request / response".

You agree it's non-deterministic? Just to be clear in case my comments didn't explain:

I've seen this policy used in two ways:

  1. Advertise a body as being N bytes, but disconnect before delivering the full N bytes.
  2. Advertise a body as being N bytes, deliver the full N bytes, then (after time T) disconnect.

(1) is fine, since the client is blocked waiting for N bytes.
(2) is problematic if the test is trying to test "what happens if a pooled connection is no longer valid": the client receives the N bytes, and the result then depends on how long T is. Usually the test expects the connection to be closed by the time a second request is issued. However, because T varies, the test can move on to make another request and have the connection closed while the client is issuing that request, or after it has issued it.

If the server closes the connection before request 2 has been made, the mock server needs to be set up to expect 2 requests. If it closes the connection after request 2 has been made and OkHttp retries, the mock server needs to be set up to expect 3 requests.
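
To make the counting concrete, here is a toy model of the reasoning above. It is not OkHttp or MockWebServer code; the class and method names are hypothetical, and both times are assumed to be measured in milliseconds from the moment the full body has been delivered. It also assumes the client retries exactly once on a fresh connection, as in the second case described above.

```java
final class PoisonedConnectionModel {
  /**
   * Returns how many requests the mock server would need to expect under the
   * simplified model described in this comment.
   *
   * @param closeDelayMillis time T between delivering the body and disconnecting
   * @param secondRequestAtMillis when the client issues request 2
   */
  static int requestsServerMustExpect(long closeDelayMillis, long secondRequestAtMillis) {
    if (closeDelayMillis < 0 || secondRequestAtMillis < 0) {
      throw new IllegalArgumentException("times must be non-negative");
    }
    // Closed first: request 1, then request 2 on a fresh connection.
    // Closed later: request 1, request 2 on the doomed connection, then the retry.
    boolean closedBeforeSecondRequest = closeDelayMillis <= secondRequestAtMillis;
    return closedBeforeSecondRequest ? 2 : 3;
  }

  public static void main(String[] args) {
    System.out.println(requestsServerMustExpect(10, 100)); // Socket closed before request 2.
    System.out.println(requestsServerMustExpect(100, 10)); // Request 2 sent before the close.
  }
}
```

Because the test does not control T, the same test can land in either branch from run to run, so a fixed expected request count is only right some of the time.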

@swankjesse
Member

You fixed the tests, but I think the right solution is fixing OkHttp itself.

@nfuller
Collaborator Author

nfuller commented Feb 11, 2015

Updated the documentation.

It's possible that OkHttp is also failing to retry these requests when it should, but I haven't investigated much beyond working out why the tests were flaky.

The failures show up when reading the response:

01-09 11:23:44 I/VGLV8S7999999999: com.squareup.okhttp.internal.http.URLConnectionTest#postFailsWithChunkedRequestForLargeRequest FAIL
java.net.SocketException: recvfrom failed: ECONNRESET (Connection reset by peer)
at libcore.io.IoBridge.maybeThrowAfterRecvfrom(IoBridge.java:592)
at libcore.io.IoBridge.recvfrom(IoBridge.java:556)
at java.net.PlainSocketImpl.read(PlainSocketImpl.java:485)
at java.net.PlainSocketImpl.access$000(PlainSocketImpl.java:37)
at java.net.PlainSocketImpl$PlainSocketInputStream.read(PlainSocketImpl.java:237)
at okio.Okio$2.read(Okio.java:113)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:147)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:94)
at com.squareup.okhttp.internal.http.HttpConnection.readResponse(HttpConnection.java:179)
at com.squareup.okhttp.internal.http.HttpTransport.readResponseHeaders(HttpTransport.java:101)
at com.squareup.okhttp.internal.http.HttpEngine.readResponse(HttpEngine.java:628)
at com.squareup.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:388)
at com.squareup.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:332)
at com.squareup.okhttp.internal.http.HttpURLConnectionImpl.getInputStream(HttpURLConnectionImpl.java:199)
at com.squareup.okhttp.internal.http.URLConnectionTest.assertContent(URLConnectionTest.java:3016)
at com.squareup.okhttp.internal.http.URLConnectionTest.assertContent(URLConnectionTest.java:3020)
at com.squareup.okhttp.internal.http.URLConnectionTest.reusedConnectionFailsWithPost(URLConnectionTest.java:2644)
at com.squareup.okhttp.internal.http.URLConnectionTest.postFailsWithChunkedRequestForLargeRequest(URLConnectionTest.java:2613)
at java.lang.reflect.Method.invoke(Native Method)
at java.lang.reflect.Method.invoke(Method.java:372)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:24)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
at org.junit.runner.JUnitCore.run(JUnitCore.java:136)
at android.support.test.runner.AndroidJUnitRunner.onStart(AndroidJUnitRunner.java:245)
at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1853)
Caused by: android.system.ErrnoException: recvfrom failed: ECONNRESET (Connection reset by peer)
at libcore.io.Posix.recvfromBytes(Native Method)
at libcore.io.Posix.recvfrom(Posix.java:185)
at libcore.io.BlockGuardOs.recvfrom(BlockGuardOs.java:250)
at libcore.io.IoBridge.recvfrom(IoBridge.java:553)

@swankjesse
Member

I still fear that we're fixing the tests when we should be fixing the implementation. The motivation for these wacky socket policies is to put the client in an awkward state, and to confirm that the client isn't flaky even when the server is flaky.

@swankjesse swankjesse merged commit 432ca1e into square:master Dec 15, 2015
@swankjesse
Member

Rebased & merged.

@swankjesse
Member

Epic delay I know.
