Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spring 5 Webclient throws ReadTimeout Exception [SPR-17610] #22142

Closed
spring-issuemaster opened this issue Dec 18, 2018 · 13 comments
Closed

Spring 5 Webclient throws ReadTimeout Exception [SPR-17610] #22142

spring-issuemaster opened this issue Dec 18, 2018 · 13 comments
Assignees

Comments

@spring-issuemaster
Copy link
Collaborator

@spring-issuemaster spring-issuemaster commented Dec 18, 2018

Viswanathan Manickam opened SPR-17610 and commented

I am getting readtimeout exception when I try to load test 1000 requests per second for 30 mins. Also, when the server sits idle for sometime and when I send request,  same error occurs the very first time but further requests process fine.

The response time for some requests is around 5 sec but other requests are in  250ms.

Please find the configuration below: 

@Bean
public WebClient webClient() {
 TcpClient tcpClient = TcpClient.create()
 .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 30000)
 .option(ChannelOption.SO_KEEPALIVE, true)
 .doOnConnected { connection ->
 connection.addHandlerLast(new ReadTimeoutHandler(5))
 .addHandlerLast(new WriteTimeoutHandler(30))
 }

 ReactorClientHttpConnector httpConnector = new ReactorClientHttpConnector(HttpClient.from(tcpClient));

 return WebClient.builder()
 .clientConnector(httpConnector)
 .build();
}

 

public Mono<String> retryPost(String url, JSONObject body) {
 Mono<String> result = webClient.post().uri(url)
 .contentType(MediaType.APPLICATION_JSON)
 .accept(MediaType.APPLICATION_JSON_UTF8)
 .body(BodyInserters.fromObject(body))
 .exchange()
 .log()
 .flatMap { clientResponse ->
 return handleResponse(clientResponse)
 }
 return result;
}

 

private Mono<String> handleResponse(ClientResponse clientResponse) {
 if (clientResponse.statusCode().is4xxClientError() || clientResponse.statusCode().is5xxServerError()) {
 return clientResponse.bodyToMono(String.class)
 .flatMap { errorBody ->
 return Mono.error(new CustomException(errorBody, clientResponse.statusCode().value()))
 }
 } else {
 return clientResponse.bodyToMono(String.class).log();
 }
}

 Please find the log I received:

14:39:33.092 [http-nio-8080-exec-4] INFO  reactor.Mono.SwitchIfEmpty.11 - onSubscribe(FluxSwitchIfEmpty.SwitchIfEmptySubscriber)
14:39:33.093 [http-nio-8080-exec-4] INFO  reactor.Mono.SwitchIfEmpty.11 - request(unbounded)
14:39:38.128 [reactor-http-nio-4] ERROR r.n.http.client.HttpClientConnect - [id: 0x05f49b52, L::60850 - R:] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
14:39:38.131 [reactor-http-nio-4] ERROR reactor.Mono.SwitchIfEmpty.11 - onError(io.netty.handler.timeout.ReadTimeoutException)
14:39:38.131 [reactor-http-nio-4] ERROR reactor.Mono.SwitchIfEmpty.11 - 
io.netty.handler.timeout.ReadTimeoutException: null
14:39:38.179 [http-nio-8080-exec-5] ERROR c.f.l.a.GlobalControllerExceptionHandler - null
io.netty.handler.timeout.ReadTimeoutException: null

Let me know if I miss anything


Affects: 5.1.3

Reference URL: https://stackoverflow.com/questions/53786424/springboot-webclient-throws-readtimeout-errors

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Dec 19, 2018

Brian Clozel commented

Thanks for raising this issue - the logs you've shared are somewhat different, since now it seems the timeout doesn't happen right away but is triggered 5 seconds after the HTTP call. Let me track down where that timeout comes from and how we can override it.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Dec 19, 2018

Viswanathan Manickam commented

Brian Clozel As I mentioned, I have 2 issues:

  1. When I run load test 1000 request per second for 30 mins, I see the following read timeouts where I don't see that 5 seconds different
13:55:50.287 [reactor-http-nio-6] INFO reactor.Mono.MapFuseable.49127 - | request(unbounded)
13:55:50.288 [reactor-http-nio-6] INFO reactor.Mono.SwitchIfEmpty.48910 - onComplete()
13:55:50.310 [reactor-http-nio-6] ERROR r.n.r.PooledConnectionProvider - [id: 0xa1c67a8f, L:/10.253.8.138:55671 - REMOTEhOST:PORT] Pooled connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
13:55:50.312 [reactor-http-nio-6] ERROR reactor.Mono.SwitchIfEmpty.48759 - onError(io.netty.handler.timeout.ReadTimeoutException)
13:55:50.312 [reactor-http-nio-6] ERROR reactor.Mono.SwitchIfEmpty.48759 -
io.netty.handler.timeout.ReadTimeoutException: null
  1. When the server sits idle for sometime and when they process the first request, I get the read time out with 5 seconds difference
14:39:33.092 [http-nio-8080-exec-4] INFO  reactor.Mono.SwitchIfEmpty.11 - onSubscribe(FluxSwitchIfEmpty.SwitchIfEmptySubscriber)
14:39:33.093 [http-nio-8080-exec-4] INFO  reactor.Mono.SwitchIfEmpty.11 - request(unbounded)
14:39:38.128 [reactor-http-nio-4] ERROR r.n.http.client.HttpClientConnect - [id: 0x05f49b52, L::60850 - R:] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
14:39:38.131 [reactor-http-nio-4] ERROR reactor.Mono.SwitchIfEmpty.11 - onError(io.netty.handler.timeout.ReadTimeoutException)
14:39:38.131 [reactor-http-nio-4] ERROR reactor.Mono.SwitchIfEmpty.11 - 
io.netty.handler.timeout.ReadTimeoutException: null
14:39:38.179 [http-nio-8080-exec-5] ERROR c.f.l.a.GlobalControllerExceptionHandler - null
io.netty.handler.timeout.ReadTimeoutException: null
@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Dec 20, 2018

Brian Clozel commented

If I remember correctly, the suffix number you're seeing in the logs like reactor.Mono.SwitchIfEmpty.48759 is tied to the subscriber.

In your case, this means that 

13:55:50.287 [reactor-http-nio-6] INFO reactor.Mono.MapFuseable.49127 - | request(unbounded) 

and

13:55:50.312 [reactor-http-nio-6] ERROR reactor.Mono.SwitchIfEmpty.48759 - onError(

are not about the same HTTP requests. Reactor is scheduling tasks on worker threads, so when you're seeing things happening sequentially on the same thread, it doesn't mean those units of work are being done for the same subscriber.

If you follow the subscriber id in the logs, you should see that the amount of time between requesting the data and not getting is about 5 seconds.

Not on that last point, I've created a Reactor HTTP server that sends the response body after 10 seconds:

HttpServer.create().host("localhost").port(8080).handle((req, res) -> {
	return res.sendString(Flux.just("Hello").delayElements(Duration.ofSeconds(10)), StandardCharsets.UTF_8);
	}).bindNow().onDispose().block();

And a WebClient that fetches this data:

String response = WebClient.create().get().uri("http://localhost:8080/")
	.retrieve().bodyToMono(String.class).log().block();
System.out.println(response);

I'm not getting any read timeout. So something in your application must be configuring such a timeout.

I'm closing this issue for now, don't hesitate to reopen it if you manage to reproduce it and provide me with a sample project that I can run on my computer (something that I could clone from github, or even a zip file I could download).

Thanks!

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Dec 20, 2018

Viswanathan Manickam commented

Brian Clozel 

If you look at the second issue logs, the suffix number is 11 for all logs. 

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Dec 20, 2018

Brian Clozel commented

Yes, but in this case this is consistent with your configuration. You're configuring a read timeout of 5 seconds in your WebClient and the timeout is triggered after 5 seconds. Am i missing something here?

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Dec 20, 2018

Viswanathan Manickam commented

Server always return the response with in 500ms, why I am getting read timeouts? Is there any block? I am trying to understand where the issue occurs. 

Thanks for your time.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Dec 21, 2018

Brian Clozel commented

In the sample you're showing, this is not the case:

14:39:33.092 [http-nio-8080-exec-4] INFO  reactor.Mono.SwitchIfEmpty.11 - onSubscribe(FluxSwitchIfEmpty.SwitchIfEmptySubscriber)
14:39:33.093 [http-nio-8080-exec-4] INFO  reactor.Mono.SwitchIfEmpty.11 - request(unbounded)
14:39:38.128 [reactor-http-nio-4] ERROR r.n.http.client.HttpClientConnect - [id: 0x05f49b52, L::60850 - R:] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null

There are more than 5 seconds between 14:39:33.093 and 14:39:38.128.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Dec 21, 2018

Viswanathan Manickam commented

Thank you for your time Brian Clozel Really appreciate your help. 

Finally, What would you recommend to avoid read timeouts

  1. Increase read time out seconds?

  2. Retries?

 

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Dec 21, 2018

Brian Clozel commented

It really depends on your use case and constraints. I don't know if you're running a benchmark to optimize for throughput, if expecting this load on a regular basis, or if your app is doing some kind of batch processing, if the load is generated by hits on your own app, etc.

If fetching the responses in a timely manner is not really important and you're running some kind of batch processing, In this case, you probably have a Flux<Something> and you want to do HTTP calls for each element. In this case, you could use the limitRate operator and operator variants that accept a concurrency parameter. This will prevent your application from flooding the remote service and hitting those high latencies.

Adding retries will work for the odd case, but if your app is consistently flooding the remote service, then you're not really solving the issue. But in a regular case, retries are pretty useful.

Changing the timeout configuration at the HTTP level or with timeout operators can work, but you have to operate within your constraints. Your service should probably respond and not create too much latency for others.

In the end, you're also limited by the performance of the remote system you're querying.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

@spring-issuemaster spring-issuemaster commented Jan 4, 2019

Viswanathan Manickam commented

Brian Clozel Thank you for your comments. 

For the issue (When the server sits idle for sometime and when they process the first request, I get the read time out with 5 seconds difference), I did try increase the timeout up to 30 seconds and still getting read timeout. Is thread block on reading the response and throw readtimeout exception when timeout occurs? Can you suggest anything to resolve this? 

Also, I tried hitting our remote server using jmeter with same load and not getting any timeouts/exception. All requests processed successfully and return response. 

Please advice. 

I really appreciate your time.

@bijuvnair

This comment has been minimized.

Copy link

@bijuvnair bijuvnair commented Jun 10, 2019

I am getting same error . @Viswanathan Manickan : Is this issue resolved? If yes , how it got resolved? Appreciate any reference for the same

@phantomedc

This comment has been minimized.

Copy link

@phantomedc phantomedc commented Sep 5, 2019

I am getting same error .

2019-09-05 23:14:09.389 [http-nio-80-exec-2] DEBUG o.s.web.reactive.function.client.ExchangeFunctions - [e75bf24] HTTP GET http://***.com/api/v1/query?time=1567439463&query)
2019-09-05 23:14:09.389 [reactor-http-kqueue-4] DEBUG reactor.netty.resources.PooledConnectionProvider - [id: 0x5d539d8f, L:/10.49.3.9:54769 - R:*******/10.104.232.131:9801] Channel acquired, now 1 active connections and 0 inactive connections
2019-09-05 23:14:09.389 [reactor-http-kqueue-4] DEBUG reactor.netty.ReactorNetty - [id: 0x5d539d8f, L:/10.49.3.9:54769 - R:*******/10.104.232.131:9801] Added decoder [ReadTimeoutHandler] at the end of the user pipeline, full pipeline: [reactor.left.httpCodec, ReadTimeoutHandler, reactor.right.reactiveBridge, DefaultChannelPipeline$TailContext#0]
2019-09-05 23:14:09.389 [reactor-http-kqueue-4] DEBUG reactor.netty.http.client.HttpClientConnect - [id: 0x5d539d8f, L:/10.49.3.9:54769 - R:********/10.104.232.131:9801] Handler is being applied: {uri=http://******/api/v1/query), method=GET}
2019-09-05 23:14:09.390 [reactor-http-kqueue-4] DEBUG reactor.netty.resources.PooledConnectionProvider - [id: 0x5d539d8f, L:/10.49.3.9:54769 - R:*****.com/****:****] onStateChange(GET{uri=/api/v1/query?time=1567439463&query), connection=PooledConnection{channel=[id: 0x5d539d8f, L:/10.49.3.9:54769 - R:********/10.104.232.131:9801]}}, [request_sent])
2019-09-05 23:14:24.401 [reactor-http-kqueue-4] WARN  reactor.netty.http.client.HttpClientConnect - [id: 0x5d539d8f, L:/10.49.3.9:54769 - R:****.com/****:**] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
@phantomedc

This comment has been minimized.

Copy link

@phantomedc phantomedc commented Sep 5, 2019

@bclozel
i found this issue may relates to the Reactor Connection Pool, i reproduced this problem in my dev envrionment,I did 4 requests at 2019-09-06 00:10:14.739(t1)、2019-09-06 00:19:51.274(t2)、2019-09-06 00:30:37.992(t3)、2019-09-06 00:33:13.314(t4),
the request1(at t1), request2(at t2) webclient didn't idle too long time, the requests were processed fine, but request3(at t3), webclient was idle 10minutes, webclient got Read Timeout, then i did request4 , it was processed fine as well.
and i noticed that request1、request2、request3,they used same local port: 59838
but request4 used local port:60928

so i guess that: the connection(or channel in netty) will be cached in the connection pool, reactor netty will manage and reuse it, but when the connection idle too long, the connection may be closed or cut down by remote side(F5 loadbalance or Nginx?) or some other reason, it became disconnected on the link layer, but the connection pool did not realize what happened actually, the connection pool still thinks this connection is connected and provide it to webclient, and boom: read time out.

        HttpClient httpClient = HttpClient.create()
                .tcpConfiguration(tcpClient -> {
                    tcpClient = tcpClient.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5000);
                    tcpClient = tcpClient.doOnConnected(conn -> conn
                            .addHandlerLast(new ReadTimeoutHandler(15000, TimeUnit.MILLISECONDS)));
                    return tcpClient;
                });

request1:

2019-09-06 00:10:14.768 [reactor-http-kqueue-5] DEBUG reactor.netty.http.client.HttpClientOperations - [id: 0x0759376f, L:/10.49.3.9:59838 - R:****.com/****:****] Received response (auto-read:false) : [Server=nginx, Date=Thu, 05 Sep 2019 16:10:15 GMT, Content-Type=application/json, Content-Length=64, Connection=keep-alive, Access-Control-Allow-Headers=Accept, Accept-Encoding, Authorization, Content-Type, Origin, Access-Control-Allow-Methods=GET, OPTIONS, Access-Control-Expose-Headers=Date, Vary=Accept-Encoding, Access-Control-Allow-Credentials=true]
2019-09-06 00:10:14.768 [reactor-http-kqueue-5] DEBUG reactor.netty.resources.PooledConnectionProvider - [id: 0x0759376f, L:/10.49.3.9:59838 - R:****.com/****:****] onStateChange(GET{uri=/api/v1/query?time=1567439463&query****), connection=PooledConnection{channel=[id: 0x0759376f, L:/10.49.3.9:59838 - R:****/****:****]}}, [response_received])
2019-09-06 00:10:14.768 [reactor-http-kqueue-5] DEBUG o.s.web.reactive.function.client.ExchangeFunctions - [1807c047] Response 200 OK

request2:

2019-09-06 00:19:51.272 [reactor-http-kqueue-5] DEBUG reactor.netty.http.client.HttpClientOperations - [id: 0x0759376f, L:/10.49.3.9:59838 - R:****.com/****:****] Received response (auto-read:false) : [Server=nginx, Date=Thu, 05 Sep 2019 16:19:51 GMT, Content-Type=application/json, Content-Length=64, Connection=keep-alive, Access-Control-Allow-Headers=Accept, Accept-Encoding, Authorization, Content-Type, Origin, Access-Control-Allow-Methods=GET, OPTIONS, Access-Control-Expose-Headers=Date, Vary=Accept-Encoding, Access-Control-Allow-Credentials=true]
2019-09-06 00:19:51.272 [reactor-http-kqueue-5] DEBUG reactor.netty.resources.PooledConnectionProvider - [id: 0x0759376f, L:/10.49.3.9:59838 - R:****.com/****:****] onStateChange(GET{uri=/api/v1/query?time=1567439463&query=****), connection=PooledConnection{channel=[id: 0x0759376f, L:/10.49.3.9:59838 - R:****.com/***:***]}}, [response_received])
2019-09-06 00:19:51.272 [reactor-http-kqueue-5] DEBUG o.s.web.reactive.function.client.ExchangeFunctions - [1a576a1] Response 200 OK

request3:

2019-09-06 00:30:52.998 [reactor-http-kqueue-5] WARN  reactor.netty.http.client.HttpClientConnect - [id: 0x0759376f, L:/10.49.3.9:59838 - R:njyh.data.promes.cloudytrace.com/10.104.232.131:9801] The connection observed an error
io.netty.handler.timeout.ReadTimeoutException: null
	Suppressed: java.lang.Exception: #block terminated with an error

request4:

2019-09-06 00:33:13.338 [reactor-http-kqueue-3] DEBUG reactor.netty.http.client.HttpClientOperations - [id: 0x47c1f6c3, L:/10.49.3.9:60928 - R:****.com/****:****] Received response (auto-read:false) : [Server=nginx, Date=Thu, 05 Sep 2019 16:33:13 GMT, Content-Type=application/json, Content-Length=64, Connection=keep-alive, Access-Control-Allow-Headers=Accept, Accept-Encoding, Authorization, Content-Type, Origin, Access-Control-Allow-Methods=GET, OPTIONS, Access-Control-Expose-Headers=Date, Vary=Accept-Encoding, Access-Control-Allow-Credentials=true]
2019-09-06 00:33:13.338 [reactor-http-kqueue-3] DEBUG reactor.netty.resources.PooledConnectionProvider - [id: 0x47c1f6c3, L:/10.49.3.9:60928 - R:****.com/****:***] onStateChange(GET{uri=/api/v1/query?time=1567439463&query=****), connection=PooledConnection{channel=[id: 0x47c1f6c3, L:/10.49.3.9:60928 - R:****.com/***:****]}}, [response_received])
2019-09-06 00:33:13.338 [reactor-http-kqueue-3] DEBUG o.s.web.reactive.function.client.ExchangeFunctions - [11c0862d] Response 200 OK

sorry for my poor english

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.