java.net.SocketTimeoutException from HTTP/2 connection leaves dead okhttp clients in pool #3146

Closed
kenyee opened this issue Jan 31, 2017 · 148 comments
Labels
bug Bug in existing code

Comments

@kenyee

kenyee commented Jan 31, 2017

Tried writing a unit test w/ TestButler on Android w/ no luck, so I'll write up the steps to reproduce this and include some sample code. This happens if you connect to an HTTP/2 server and your network goes down while the okhttp client is connected to it:

  1. create an okhttp client
  2. tell it to read from the HTTP/2 server
  3. bring the network down
  4. tell it to read from the HTTP/2 server (it'll get a SocketTimeoutException)
  5. bring the network back up
  6. tell it to read from the HTTP/2 server again (it'll be stuck w/ SocketTimeoutExceptions)
  7. if you create new http clients at this point, it'll work, but the dead http client will eventually come back in the pool and fail.

okhttp client should attempt to reopen the HTTP/2 connection instead of being stuck in this state

Code sample for Android (create a trivial view w/ a button and a textview):

public class MainActivity extends AppCompatActivity {
    OkHttpClient okhttpClient = new OkHttpClient();

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        Button loadButton = (Button) findViewById(R.id.loadButton);
        TextView outputView = (TextView) findViewById(R.id.outputView);

        loadButton.setOnClickListener(view -> Observable.fromCallable(() -> {
                    Request request = new Request.Builder()
                            .url("<INSERT URL TO YOUR HTTP/2 SERVER HERE>") // placeholder: replace with your server's URL
                            .build();

                    Response response = okhttpClient.newCall(request).execute();

                    return response.body().string();
                })
                .subscribeOn(Schedulers.io())
                .observeOn(AndroidSchedulers.mainThread())
                .subscribe(outputView::setText, t -> outputView.setText(t.toString()))
        );
    }
}

@kenyee
Author

kenyee commented Jan 31, 2017

FYI, we found a workaround: set connectionPool in the builder to a new connection pool that keeps zero idle connections, and also turn off HTTP/2 support by setting the builder's protocols list to only HTTP/1.1.
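
For reference, the workaround described above might look like the following sketch. It assumes OkHttp 3.x on the classpath; the class name WorkaroundClient is invented for illustration, and the one-nanosecond keep-alive is just a small positive value, since the pool keeps no idle connections anyway.

```java
import java.util.Collections;
import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;
import okhttp3.Protocol;

// Hypothetical helper class; only the builder calls matter.
final class WorkaroundClient {
    static OkHttpClient create() {
        return new OkHttpClient.Builder()
            // Keep zero idle connections, so a dead one cannot be reused.
            .connectionPool(new ConnectionPool(0, 1, TimeUnit.NANOSECONDS))
            // Offer only HTTP/1.1, so HTTP/2 is never negotiated.
            .protocols(Collections.singletonList(Protocol.HTTP_1_1))
            .build();
    }
}
```

Both settings trade performance for robustness: every request pays for a new connection, and HTTP/2 multiplexing is lost.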

@swankjesse
Member

You’re using 3.6.0?

@kenyee
Author

kenyee commented Jan 31, 2017

yep...3.6.0 unfortunately. Thought about rolling back to pre-http/2 support but that would mean 2.2 which is too far back because of all the okhttp3 dependencies :-(

@swankjesse
Member

Oh that's terrible. We've had problems with similar failures before but I thought we'd fixed ’em all. If you can make a test case that'd be handy, otherwise I'll try to look soon.

In the interim you can disable HTTP/2 with the protocols list in the OkHttpClient.Builder.

@dave-r12
Collaborator

dave-r12 commented Feb 1, 2017

Correct me if I'm wrong, but I think part of this is working as expected. HTTP/2 connections can carry N outstanding requests. If one of those requests times out and the HTTP/2 connection is closed, then the other N - 1 requests are also lost. I think the intent is that for HTTP/2 connections, a timeout does not necessarily mean the connection is bad.

Is it surprising to 'bring the network down' and not receive any sort of socket exception reading or writing?

@kenyee
Author

kenyee commented Feb 1, 2017

N-1 requests being lost is fine if the connection is down.
The issue is that it doesn't recover when you bring the network back up...i.e., the broken idle connection objects in the pool stay there, and when you try connecting again, you can't connect until the user kills your app to restart everything...

@kenyee
Author

kenyee commented Feb 1, 2017

@swankjesse : I couldn't figure out how to write a test for this because disconnecting all the sockets was happening at an OS level. Tried to write an Android Test Butler one (to flip the network switch on/off on an Android emulator) but the current version of that has issues and probably wouldn't work in this code base :-)

@swankjesse
Member

So our attempts to write to the socket are failing silently? Might need to steal the automatic pings that we added for web sockets.

@kenyee
Author

kenyee commented Feb 1, 2017

Essentially...not that they're failing silently, but they're dead sockets and they're stuck in the pool. We traced through a bit of the code and saw some code that was pulling a dead socket out of the pool each time it tried to use one, which should have cleared things up after 5 dead sockets were pulled out. But the network layer still appeared stuck unless we purged the pool w/ evictAll() or waited for the 5 min eviction timeout. It wasn't obvious what a proper fix was...
HTTP/2 essentially behaves like web sockets so you're probably on the right track...

@laurencedawson

Pretty sure this issue is another manifestation of this one:

#3118

@kenyee
Author

kenyee commented Feb 2, 2017

I'm sure it's not. We don't see any SSL Handshake exceptions.

This bug is actually probably two bugs because we had to disable the connection pool and the HTTP/2 support. #3118 might be affected by the connection pool bug (it doesn't clear the broken idle connection objects in the pool).

@laurencedawson

I've seen what you've described but then also the ssl exceptions. Same steps to reproduce as you outlined.

@aheliver

aheliver commented Feb 6, 2017

Any updates? I have the same issue.

@kenyee
Author

kenyee commented Feb 6, 2017

The workaround I described works in our QA testing so far :-)

@aheliver

aheliver commented Feb 6, 2017

@kenyee setting a new pool works, but I wonder when an update will arrive?

@servlette

servlette commented Mar 16, 2017

Is this issue resolved? I ran into the same issue using 3.5.0. I am using OkHttp to send push notifications to Apple over HTTP/2. Yesterday this issue resulted in almost 80k push messages not being delivered.

Caused by: java.net.SocketTimeoutException: timeout
	at okhttp3.internal.http2.Http2Stream$StreamTimeout.newTimeoutException(Http2Stream.java:587) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http2.Http2Stream$StreamTimeout.exitAndThrowIfTimedOut(Http2Stream.java:595) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http2.Http2Stream.getResponseHeaders(Http2Stream.java:140) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http2.Http2Codec.readResponseHeaders(Http2Codec.java:115) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:54) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[okhttp-3.5.0.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[okhttp-3.5.0.jar:?]
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:179) ~[okhttp-3.5.0.jar:?]
	at okhttp3.RealCall.execute(RealCall.java:63) ~[okhttp-3.5.0.jar:?]

After I got this error, none of my other requests succeeded.

Code:

// Load the client certificate (PKCS#12). Assumes it is read from a file;
// the path is a placeholder.
KeyStore ks = KeyStore.getInstance("PKCS12");
try (InputStream in = new FileInputStream("/foo/bar/mycert")) {
    ks.load(in, password.toCharArray());
}

KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
kmf.init(ks, password.toCharArray());
KeyManager[] keyManagers = kmf.getKeyManagers();

// Use the platform's default trust managers.
final TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
tmf.init((KeyStore) null);
TrustManager[] trustManagers = tmf.getTrustManagers();
if (trustManagers != null && (trustManagers.length != 1 || !(trustManagers[0] instanceof X509TrustManager))) {
    throw new IllegalStateException("Unexpected default trust managers:"
            + Arrays.toString(trustManagers));
}
final X509TrustManager trustManager = (X509TrustManager) trustManagers[0];

SSLContext sslContext = SSLContext.getInstance("TLS");
sslContext.init(keyManagers, trustManagers, null);
final SSLSocketFactory sslSocketFactory = sslContext.getSocketFactory();

OkHttpClient.Builder builder = new OkHttpClient.Builder();
builder.connectTimeout(5, TimeUnit.SECONDS)
        .writeTimeout(10, TimeUnit.SECONDS)
        .readTimeout(10, TimeUnit.SECONDS);
builder.connectionPool(new ConnectionPool(3, 10, TimeUnit.MINUTES));
builder.sslSocketFactory(sslSocketFactory, trustManager);

Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
builder.proxy(proxy);

OkHttpClient client = builder.build();

@servlette

servlette commented Mar 16, 2017

Since SocketTimeoutException is a subclass of IOException, I am not sure whether the following approach will work.
Can one of you please get back to me?

I am calling evictAll() in the catch block of IOException.

try {
    response = client.newCall(request).execute();
    statusCode = response.code();
    responseBody = response.body().string();
} catch (IOException ioe) {
    // Drop every pooled connection so the next call opens a fresh one.
    client.connectionPool().evictAll();
} finally {
    if (response != null) {
        response.body().close();
    }
}

Also, how do we check whether a connection is stale?

With Apache HttpClient, you can set a flag that enables a check for stale connections.
I am wondering how OkHttp3 checks for this internally before it uses a connection.

CloseableHttpClient client = HttpClients.custom().setDefaultRequestConfig(
    RequestConfig.custom().setStaleConnectionCheckEnabled(true).build()
).setConnectionManager(connManager).build();

@swankjesse swankjesse added this to the 3.7 milestone Mar 18, 2017
@swankjesse swankjesse added the bug Bug in existing code label Mar 18, 2017
@swankjesse swankjesse modified the milestones: 3.7, 3.8 Apr 28, 2017
@b95505017

b95505017 commented May 22, 2017

Any updates? I have the same issue too. :(

@teobaranga

Same issue here!

@ekstro

ekstro commented May 31, 2017

We are still experiencing the same issue :-(

@jpearl

jpearl commented Jun 26, 2017

I think I'm seeing another manifestation of this on 3.5.0, when the server forcibly closes the connection.

We try to establish both a h2 and http1.1 connection. The server responds with 200 to both:

06-26 15:07:55.286 22094 22380 I okhttp3.OkHttpClient: --> GET<url> http/1.1
06-26 15:07:55.524 22094 22380 I okhttp3.OkHttpClient: --> GET<url> h2

06-26 15:07:55.596 22094 22380 I okhttp3.OkHttpClient: <-- 200  <url> (71ms)
06-26 15:07:55.597 22094 22380 I okhttp3.OkHttpClient: <-- 200  <url> (303ms)

Then at some point we try to read from the HTTP/2 connection, which fails in checkNotClosed and throws a StreamResetException:

06-26 15:06:01.560 22094 22126 I MyProject: Caused by: okhttp3.internal.http2.StreamResetException: stream was reset: PROTOCOL_ERROR
06-26 15:06:01.560 22094 22126 I MyProject: 	at okhttp3.internal.http2.Http2Stream$FramedDataSource.checkNotClosed(Http2Stream.java:428)
06-26 15:06:01.560 22094 22126 I MyProject: 	at okhttp3.internal.http2.Http2Stream$FramedDataSource.read(Http2Stream.java:330)
06-26 15:06:01.560 22094 22126 I MyProject: 	at okio.ForwardingSource.read(ForwardingSource.java:35)
06-26 15:06:01.560 22094 22126 I MyProject: 	at okio.RealBufferedSource$1.read(RealBufferedSource.java:409)
06-26 15:06:01.560 22094 22126 I MyProject: 	at com.google.android.exoplayer.upstream.HttpDataSource.read(HttpDataSourceImpl.java:699)
06-26 15:06:01.560 22094 22126 I MyProject: 	at com.google.android.exoplayer.upstream.HttpDataSource.read(HttpDataSourceImpl.java:424)

Then, since this is media, we do something that causes a seek to 0 in the media, which needs to reopen the request from the beginning. At this point, we see the same exception as is posted above:

06-26 15:08:39.387 22094 22126 I MyProject: Caused by: java.net.SocketTimeoutException: timeout
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http2.Http2Stream$StreamTimeout.newTimeoutException(Http2Stream.java:587)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http2.Http2Stream$StreamTimeout.exitAndThrowIfTimedOut(Http2Stream.java:595)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http2.Http2Stream.getResponseHeaders(Http2Stream.java:140)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http2.Http2Codec.readResponseHeaders(Http2Codec.java:115)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:54)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:212)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:212)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:179)
06-26 15:08:39.387 22094 22126 I MyProject: 	at okhttp3.RealCall.execute(RealCall.java:63)

This seems very similar to the other cases here, which all seem to be related to an ungraceful shutdown of the connection that then remains pooled.

I've also confirmed that disabling the ConnectionPool "works around" this issue:

OkHttpClient.Builder clientBuilder = new OkHttpClient.Builder()
            .connectTimeout(connectTimeoutMillis, TimeUnit.MILLISECONDS)
            .retryOnConnectionFailure(true)
            .readTimeout(readTimeoutMillis, TimeUnit.MILLISECONDS)
            .connectionPool(new ConnectionPool(0, 1, TimeUnit.NANOSECONDS));

@Kiran89kumar

Is there any update on this issue?

@swankjesse swankjesse added this to the 3.10 milestone Aug 30, 2017
@yschimke
Collaborator

yschimke commented May 5, 2022

I was able to reproduce the problem reliably in one scenario related to cancellations (which become interrupts) while the stream is being created and direct IO is done against the connection socket. OkHttp advises against using interrupts, and there is a fair amount of work to eventually make this supported.

This case leads to OkHttp creating streams that it isn't tracking. The server sends data on those streams, which consumes the available flow-control window. Streams created afterwards then don't receive any data, and that's why they throw a SocketTimeoutException (a stream timeout, not one against an actual socket).
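
One way to picture the mechanism described above is a toy model of the connection-level window. This is a sketch under stated assumptions, not OkHttp's implementation: the class and method names are invented, and "returning credit" stands in for the client sending a WINDOW_UPDATE once a reader has consumed the data. The 65,535-byte figure is HTTP/2's default initial window size.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of an HTTP/2 connection-level receive window. Not OkHttp code. */
final class FlowControlModel {
    static final int INITIAL_WINDOW = 65_535; // HTTP/2 default initial window, in bytes

    private int connectionWindow = INITIAL_WINDOW;
    private final Set<Integer> trackedStreams = new HashSet<>();

    /** The client registers a stream it intends to read from. */
    void track(int streamId) {
        trackedStreams.add(streamId);
    }

    /**
     * The server tries to send {@code bytes} on {@code streamId}.
     * Returns how many bytes the shared window let through.
     */
    int serverSends(int streamId, int bytes) {
        int sent = Math.min(bytes, connectionWindow);
        connectionWindow -= sent;
        if (trackedStreams.contains(streamId)) {
            // A tracked stream has a reader, so the credit is returned
            // (the WINDOW_UPDATE step in the real protocol).
            connectionWindow += sent;
        }
        // An untracked stream has no reader: its credit is never returned.
        return sent;
    }

    int connectionWindow() {
        return connectionWindow;
    }

    public static void main(String[] args) {
        FlowControlModel conn = new FlowControlModel();
        conn.track(1);
        System.out.println("tracked stream 1 received " + conn.serverSends(1, 40_000));
        // Stream 3 exists on the wire, but the client lost track of it.
        System.out.println("leaked stream 3 absorbed " + conn.serverSends(3, 100_000));
        conn.track(5);
        System.out.println("new stream 5 received " + conn.serverSends(5, 1_000));
    }
}
```

In this model the leaked stream absorbs the whole window, so the stream opened afterwards receives nothing, which from the caller's side looks like a timeout on a healthy network.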

I have a fix hopefully going into Media3/ExoPlayer that should work around this (using enqueue instead of execute), so it would be good to test your reproduction against this fix. Not sure when that will be available unless you want to patch ExoPlayer yourself.

However, there are probably other paths that might be a problem. I'm pretty confident that this is the majority of these problems; it's far less problematic to cancel the stream after we have returned a Response.

@robertszuba

@yschimke We are NOT using ExoPlayer. We are using just OkHttp+Retrofit, like in the provided project.

@yschimke
Collaborator

yschimke commented May 5, 2022

Ahhhh, was mixing this up with comment by GouravSna.

Is there any chance you are interrupting threads doing OkHttp requests? That's still the only reproduction I have for this problem.

Interrupting could be a) Thread.interrupt, or b) future.cancel(true).
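
Case b) is easy to miss because no application code calls Thread.interrupt() directly. A small JDK-only sketch shows that Future.cancel(true) interrupts the worker thread, while cancel(false) does not. Thread.sleep stands in for a blocking request such as Call.execute(), and the class and method names are invented for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class CancelInterruptDemo {
    /** Returns true if cancelling the future interrupted the worker thread. */
    static boolean workerWasInterrupted(boolean mayInterruptIfRunning) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);
        try {
            Future<?> future = executor.submit(() -> {
                started.countDown();
                try {
                    Thread.sleep(500); // stand-in for a blocking network call
                } catch (InterruptedException e) {
                    interrupted.set(true);
                } finally {
                    finished.countDown();
                }
            });
            started.await();                      // make sure the task is running
            future.cancel(mayInterruptIfRunning); // true => interrupts the worker
            finished.await(5, TimeUnit.SECONDS);  // wait for the task to wind down
            return interrupted.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("cancel(true) interrupted worker: " + workerWasInterrupted(true));
        System.out.println("cancel(false) interrupted worker: " + workerWasInterrupted(false));
    }
}
```

Timeout wrappers and reactive libraries that dispose of a blocking task may take the cancel(true) path internally, so it is worth checking how they cancel.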

@robertszuba

robertszuba commented May 5, 2022

Nope, that's basically the whole logic related to Okhttp request:

  public void makeRequest() {
    textView.setText("...");
    Call<String> repos = service.helloWorld();
    repos.enqueue(new Callback<String>() {
      @Override
      public void onResponse(Call<String> call, Response<String> response) {
        Log.d("TEST", response.body());
        textView.setText(response.body());
      }

      @Override
      public void onFailure(Call<String> call, Throwable t) {
        t.printStackTrace();
        textView.setText(t.getMessage());
      }
    });
  }

@yschimke
Collaborator

yschimke commented May 5, 2022

@robertszuba I'll take a further look, since I was able to repro with ExoPlayer with the same symptom, I was focusing on that.

It's likely these are two separate bugs in that case.

Your repro seems quite simple, I'll try to reproduce with it on the weekend and get back to you.

@swankjesse
Member

I believe this is a bug in the emulator. In particular I think the emulator is breaking the TCP socket in a way OkHttp can't detect.

@robertszuba

The bug may indeed be related to specific versions of the emulator (it works fine on most of them). The problem is that some of our customers report it on specific real devices as well (even though I was not able to reproduce it on my test devices).

@swankjesse
Member

@robertszuba any specific devices you can name?

icbaker pushed a commit to androidx/media that referenced this issue May 9, 2022
Relates to square/okhttp#3146. This was from #71.

There is a draft PR https://github.com/square/okhttp/pull/7185/files which documents OkHttp's ideal handling of cancellation including interrupts.

But a few key points

1) This is a target state, and OkHttp does not currently handle interrupts correctly.  In the past this has been identified, and the advice is to avoid interrupts on Http threads, see discussion on square/okhttp#1902. Also an attempt at a fix here square/okhttp#7023 which wasn't in a form to land.

2) Even with this fixed, it is likely to never be optimal, because of OkHttp sharing a socket connection for multiple inflight requests.

From square/okhttp#7185

```
Thread.interrupt() is Clumsy
----------------------------

`Thread.interrupt()` is Java's built-in mechanism to cancel an in-flight `Thread`, regardless of
what work it's currently performing.

We recommend against using `Thread.interrupt()` with OkHttp because it may disrupt shared resources
including HTTP/2 connections and cache files. In particular, calling `Thread.interrupt()` may cause
unrelated threads' call to fail with an `IOException`.
```

This PR leaves the Loader/DataSource thread parked on a countdown latch. While this may seem wasteful because it adds a context switch, in practice the response isn't returned until the Http2Connection and Http2Reader have a response from the server, which means effectively parking in a `wait()` statement here https://github.com/square/okhttp/blob/9e039e94123defbfd5f11dc64ae146c46b7230eb/okhttp/src/jvmMain/kotlin/okhttp3/internal/http2/Http2Stream.kt#L140

PiperOrigin-RevId: 446652468
@robertszuba

Samsung Galaxy S21 Ultra with Android 12. But as mentioned, this comes from end users, I don't have S21 to test it myself and I wasn't able to reproduce it on other devices yet.

icbaker pushed a commit to google/ExoPlayer that referenced this issue May 9, 2022
Relates to square/okhttp#3146. This was from androidx/media#71.

@rolandsarosy-verycreatives

In the last month, since we had this issue crop up, we had 14 occurrences, across 5 OS versions, 6 manufacturers and 12 models.

OS Versions:
Android 12 - 5 instances
Android 10 - 4 instances
Android 11 - 2 instances
Android 8.1.0 - 2 instances
Android 9 - 1 instance

Models:
Archos Alba - 2 instances
Samsung Galaxy A52s 5G - 1 instance
Xiaomi 11T Pro - 1 instance
Xiaomi Poco X3 NFC - 1 instance
Google Pixel 4A - 1 instance
Samsung Galaxy A12 - 1 instance
Samsung Galaxy S20 FE - 1 instance
Samsung Galaxy S8 - 1 instance
Samsung Galaxy S9 - 1 instance
Samsung Galaxy S9+ - 1 instance
Sony Xperia 10 III - 1 instance
Motorola E7 Power - 1 instance

I've just pushed an update to our users, changing the connection pool and protocols, as per one of the first posts.

.connectionPool(ConnectionPool(0, OKHTTP_CONNECTION_KEEP_ALIVE_DURATION, TimeUnit.MINUTES))
.protocols(listOf(Protocol.HTTP_1_1))

I'm unable to provide any more info for the time being, we've mostly run into this issue when using our ForceUpdateInterceptor to, well, force our users to update their application. Here is the code snippet:

class ForceUpdateInterceptor : Interceptor {

    companion object {
        private const val FORCE_UPDATE_HTTP_CODE = 443
    }

    @Throws(IOException::class)
    override fun intercept(chain: Interceptor.Chain): Response = chain.run {
        val request = this.request()
        val response = chain.proceed(request)
        if (response.code == FORCE_UPDATE_HTTP_CODE) RxBus.publish(RxEvent.ForceUpdateEvent())
        response
    }
}

I'll report back on whether the issue still occurs with the aforementioned workaround.

This was all with OkHttp version 5.0.0-alpha.7 and previous alphas.

@ADopeReader

ADopeReader commented Aug 1, 2022

@rolandsarosy-verycreatives any news on whether the ConnectionPool workaround fixed the problem for you? Also, someone mentioned at https://stackoverflow.com/a/60949304/4514843 that setting a pingInterval would be preferable to disabling the ConnectionPool. Any thoughts on this, maybe @swankjesse?
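
For readers unfamiliar with the setting, a ping interval is configured on the client builder. This is a minimal sketch, assuming an OkHttp release in which pingInterval covers HTTP/2 connections as well as web sockets; the 10-second value and the class name are arbitrary choices for illustration.

```java
import java.util.concurrent.TimeUnit;
import okhttp3.OkHttpClient;

// Hypothetical helper class; only the builder call matters.
final class PingingClient {
    static OkHttpClient create() {
        return new OkHttpClient.Builder()
            // Send a ping every 10 seconds. A connection whose pong does not
            // arrive in time is treated as failed instead of being reused.
            .pingInterval(10, TimeUnit.SECONDS)
            .build();
    }
}
```

Unlike disabling the pool, this keeps connection reuse and HTTP/2 multiplexing, at the cost of a little periodic traffic.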

@saxal28

saxal28 commented Dec 20, 2022

@ADopeReader

setting the ping interval did not resolve the issue for me on 4.10.0

will try @jpearl 's solution and will report back
#3146 (comment)

@saxal28

saxal28 commented Dec 21, 2022

^^ above section didn't resolve the issue either :(

@fsomba

fsomba commented Feb 3, 2023

I am facing the java.net.SocketTimeoutException issue on okhttp client calls.

The timeout happens when connected on mobile data, but a Wi-Fi connection works fine. (I have verified that the mobile data connection is strong and can stream YouTube videos.)

The error occurs on the following Android devices and OS versions:

  • Samsung Galaxy A32 - Android 13 Version
  • Samsung Galaxy A20 - Android 11 Version

I have tested OkHttp versions 4.7.2, 4.10.0, and the latest 5.0.0-alpha.10.

The workarounds shared above (e.g. setting the connection pool and protocols, see below) have not solved it.

.connectionPool(new ConnectionPool(0, 1, TimeUnit.NANOSECONDS))
.protocols(listOf(Protocol.HTTP_1_1))

If anyone has a lead on how to fix the timeouts on mobile data for select devices, please share.

Hello @swankjesse, your knowledge on this would also be appreciated.

@fsomba

fsomba commented Feb 3, 2023

@saxal28 @rolandsarosy-verycreatives did you find a working fix for this that has lasted? If so, kindly share.

@mellowplace

mellowplace commented Mar 14, 2023

@yschimke We are using okhttp 4.9.3 in a Java backend app - wrapped with FailSafe, which ensures a timeout (so some kind of thread interrupt/cancel I guess?)

We get a lot of these timeouts and with call instrumentation we see the following

Failed call for transaction [redacted] timings in ms
0 call-start
0 connection-acquired
0 request-headers-start
0 request-headers-end
0 request-body-start
0 request-body-end
450 canceled
450 response-failed
450 connection-released
450 call-failed

Our 99th percentile response time for request-body-end is around 150 milliseconds. But with these kinds of failures we ALWAYS see 0 millis.

We have an overall 450 millisecond timeout set with FailSafe.

On the receiving side, we often see that they received the request and sent the response well within the time limit.

We're using http/2 connections.

Does this sound the same as the issue you reproduced?

@yschimke
Collaborator

It's not clear it's the same. Are you seeing any interrupt exceptions? The safe way to cancel a Call is Call.cancel(). If you are doing that it's not the same issue.

@mellowplace

@yschimke we do appear to see interrupted exceptions...

Caused by: java.io.InterruptedIOException: timeout
	at okhttp3.internal.connection.RealCall.timeoutExit(RealCall.kt:398)
	at okhttp3.internal.connection.RealCall.callDone(RealCall.kt:360)
	at okhttp3.internal.connection.RealCall.noMoreExchanges$okhttp(RealCall.kt:325)
	at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:209)
	at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154)

I think we possibly want the okhttp plugin for failsafe - https://failsafe.dev/okhttp/ - seems like it does a more graceful kill with call.cancel

We'll try that out and report back.

@yschimke
Collaborator

yschimke commented Mar 15, 2023

Yeah, since it's doing cancels without interrupts I think that looks better.

https://github.com/failsafe-lib/failsafe/blob/master/modules/okhttp/src/main/java/dev/failsafe/okhttp/FailsafeCall.java#L93

The problem I faced is that if using Call.execute() which is synchronous, the actual initial request processing including writing to the HTTP/2 socket is done on the calling thread, so an interrupt leaves the HTTP/2 connection in an inconsistent state.

The fix in media3, androidx/media@80928e7, was to always use enqueue, even to simulate an execute, so the thread writing to the underlying socket is never a user thread that can be interrupted.
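
The shape of that pattern can be sketched with JDK classes only. AsyncCall below is a hypothetical stand-in for the two Call methods involved (enqueue and cancel), and all other names are invented too; this is not the Media3 code. The caller still blocks, but the work runs on the callee's own thread, and an interrupted caller cancels the call instead of interrupting the I/O.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for the two Call methods this sketch relies on. */
interface AsyncCall<T> {
    void enqueue(Callback<T> callback);
    void cancel();

    interface Callback<R> {
        void onResponse(R response);
        void onFailure(IOException e);
    }
}

final class BlockingAdapter {
    /** Blocks the caller while the call itself runs on the callee's thread. */
    static <T> T executeViaEnqueue(AsyncCall<T> call) throws IOException {
        CompletableFuture<T> result = new CompletableFuture<>();
        call.enqueue(new AsyncCall.Callback<T>() {
            @Override public void onResponse(T response) { result.complete(response); }
            @Override public void onFailure(IOException e) { result.completeExceptionally(e); }
        });
        try {
            return result.get();
        } catch (InterruptedException e) {
            // Only the waiting caller was interrupted. Cancel cleanly rather
            // than interrupting the thread that is doing the I/O.
            call.cancel();
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while waiting");
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) throw (IOException) cause;
            throw new IOException(cause);
        }
    }

    /** Demo with a fake call that answers from a separate worker thread. */
    static String demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Thread caller = Thread.currentThread();
        AsyncCall<String> fake = new AsyncCall<String>() {
            @Override public void enqueue(AsyncCall.Callback<String> callback) {
                worker.execute(() -> callback.onResponse(
                    Thread.currentThread() == caller ? "ran on caller" : "ran on worker"));
            }
            @Override public void cancel() { }
        };
        try {
            return executeViaEnqueue(fake);
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With OkHttp, the same adapter would wrap a real Call and its Callback. The design point is that a user-thread interrupt can only reach the wait, never the socket write.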

lwoydziak added a commit to xenonview-com/view-java-sdk that referenced this issue Apr 27, 2023
- [x] close down all cached connections upon socket time out (see)[square/okhttp#3146]
- [x] units and impls
@yschimke
Collaborator

Closing as a dupe of #7841

This issue should be already fixed in Media3 by using call.enqueue and clean cancellation.
