Connection reset by peer exception #1774

Closed
RitikaDangal opened this issue Aug 17, 2021 · 12 comments
Labels
for/stackoverflow Questions are best asked on SO or Gitter

Comments

@RitikaDangal

We have a microservice-based Spring Boot architecture where we use Spring WebClient (which internally uses Reactor Netty) for internal communication between services.
The issue we faced in production was that we were getting random "connection reset by peer" exceptions in our services, and no logs for the affected requests could be found in the called service.
This is how we were initialising our WebClient earlier:

webClient = WebClient.builder().build();

To fix this, we disabled connection pooling and initialised our WebClient as below; after that, the exception no longer occurred.

webClient = WebClient.builder().clientConnector(new ReactorClientHttpConnector(HttpClient.newConnection())).build();

But how can we fix this with connection pooling enabled, since disabling connection pooling comes with its own disadvantages?

Reactor Netty version: 1.0.9
Spring Boot version: 2.5.3

Exception:

2021-08-16 12:20:24,095 WARN [reactor-http-epoll-1] reactor.util.Loggers$Slf4JLogger: [id:04a24430-45, L:/10.0.8.88:33848 - R:172.20.0.20/172.20.0.20:3148] The connection observed an error, the request cannot be retried as the headers/body were sent
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2021-08-16 12:20:24,100 ERROR [reactor-http-epoll-1] reactor.util.Loggers$Slf4JLogger: Operator called default onErrorDropped
reactor.core.Exceptions$ErrorCallbackNotImplemented: org.springframework.web.reactive.function.client.WebClientRequestException: readAddress(..) failed: Connection reset by peer; nested exception is io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
Caused by: org.springframework.web.reactive.function.client.WebClientRequestException: readAddress(..) failed: Connection reset by peer; nested exception is io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
at org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:141)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Request to GET http://172.20.0.20:3148/v1/users/referral/ec148ff3-5dd9-473f-a7f0-cb180a5e21f0 [DefaultWebClient]
Stack trace:
at org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:141)
at reactor.core.publisher.MonoErrorSupplied.subscribe(MonoErrorSupplied.java:55)
at reactor.core.publisher.Mono.subscribe(Mono.java:4338)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:103)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:93)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain.onError(MonoFlatMapMany.java:204)
at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.whenError(FluxRetryWhen.java:225)
at reactor.core.publisher.FluxRetryWhen$RetryWhenOtherSubscriber.onError(FluxRetryWhen.java:274)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:414)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onNext(FluxConcatMap.java:251)
at reactor.core.publisher.EmitterProcessor.drain(EmitterProcessor.java:491)
at reactor.core.publisher.EmitterProcessor.tryEmitNext(EmitterProcessor.java:299)
at reactor.core.publisher.SinkManySerialized.tryEmitNext(SinkManySerialized.java:100)
at reactor.core.publisher.InternalManySink.emitNext(InternalManySink.java:27)
at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onError(FluxRetryWhen.java:190)
at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:189)
at reactor.netty.http.client.HttpClientConnect$HttpObserver.onUncaughtException(HttpClientConnect.java:384)
at reactor.netty.ReactorNetty$CompositeConnectionObserver.onUncaughtException(ReactorNetty.java:647)
at reactor.netty.resources.DefaultPooledConnectionProvider$DisposableAcquire.onUncaughtException(DefaultPooledConnectionProvider.java:219)
at reactor.netty.resources.DefaultPooledConnectionProvider$PooledConnection.onUncaughtException(DefaultPooledConnectionProvider.java:467)

RitikaDangal added status/need-triage and type/bug labels Aug 17, 2021
@violetagg
Member

@RitikaDangal Please capture the traffic with Wireshark and share it. Is it possible that some network component (e.g. a firewall) closes the connection because of inactivity? If you configure maxIdleTime for the connection pool, do you still see the issue? (https://projectreactor.io/docs/netty/release/reference/index.html#connection-pool-timeout)
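
(For reference, a minimal sketch of what such a maxIdleTime configuration could look like; the pool name, the timeout values and the holder class are placeholders for illustration, not something prescribed in this thread:)

import java.time.Duration;

import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

class WebClientConfig {  // hypothetical holder class

	WebClient buildWebClient() {
		// Evict pooled connections that sit idle longer than the idle timeout of any
		// intermediary (firewall, load balancer, etc.), so a stale connection is never
		// picked up from the pool.
		ConnectionProvider provider = ConnectionProvider.builder("custom")
				.maxIdleTime(Duration.ofSeconds(30))        // placeholder value
				.evictInBackground(Duration.ofSeconds(60))  // optional background eviction
				.build();

		return WebClient.builder()
				.clientConnector(new ReactorClientHttpConnector(HttpClient.create(provider)))
				.build();
	}
}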

violetagg added for/user-attention and removed status/need-triage labels Aug 17, 2021
@RitikaDangal
Author

@violetagg I have tried capturing the traffic with Wireshark but did not see anything there; it all appeared to be at the network layer.
I will configure maxIdleTime and monitor for a day or two.
Thanks

@violetagg
Member

@RitikaDangal Were you able to verify the maxIdleTime configuration?

@RitikaDangal
Author

@violetagg Will get back to you with an update by the end of this week.

@supr015

supr015 commented Aug 25, 2021

Hi @RitikaDangal, @violetagg,

We were also facing a very similar issue with communication between Spring Boot based microservices deployed in Kubernetes.
Reactor Netty version: 1.0.10
Spring Boot version: 2.5.4

We were also using webClient = WebClient.builder().build();, but we observed that once a request completed, any subsequent request after about 20 minutes would throw the same connection reset by peer error you have mentioned. However, the request after that would go through, because the earlier disconnection caused a new channel to be created. Most likely Kubernetes was internally closing the connections on its end after 20 minutes.

We tried setting maxIdleTime with environment variables through reactor.netty.pool.maxIdleTime: 600000.
It didn't seem to update maxIdleTime though. We use spring-boot-starter-webflux.
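
(Side note, as an assumption not confirmed in this thread: reactor.netty.pool.maxIdleTime appears to be read as a JVM system property, in milliseconds, when the default connection pool is first created, so setting it as an environment variable or Spring property would likely have no effect. A rough sketch of what setting it might look like:)

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class MyApplication {  // hypothetical application class

	public static void main(String[] args) {
		// Assumption: must be set before the first HttpClient/WebClient is built,
		// e.g. here or via -Dreactor.netty.pool.maxIdleTime=600000 on the JVM command line.
		System.setProperty("reactor.netty.pool.maxIdleTime", "600000");  // milliseconds
		SpringApplication.run(MyApplication.class, args);
	}
}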

So we added a custom connector to the WebClient, like below.

var provider = ConnectionProvider.builder("custom-name")
		.maxConnections(500)
		.pendingAcquireTimeout(Duration.ofSeconds(45))
		.maxIdleTime(Duration.ofSeconds(600))  // evict connections idle for more than 10 minutes
		.build();
HttpClient client = HttpClient.create(provider).compress(true);
WebClient webClient = WebClient.builder()
		.clientConnector(new ReactorClientHttpConnector(client))
		.build();

After this, the connection reset by peer exceptions were fixed. Any request after 10 minutes of idle time would cause the existing channel to be disconnected and a new channel to be created.

@RitikaDangal
Author

@violetagg We used the following connection provider and the issue is now resolved.

	ConnectionProvider provider = ConnectionProvider.builder("fixed")
			.maxConnections(500)
			.maxIdleTime(Duration.ofSeconds(20))
			.maxLifeTime(Duration.ofSeconds(60))
			.pendingAcquireTimeout(Duration.ofSeconds(60))
			.evictInBackground(Duration.ofSeconds(120)).build();

	this.webClient = WebClient.builder()
			.clientConnector(new ReactorClientHttpConnector(HttpClient.create(provider)))
			.build();

Thanks

violetagg added for/stackoverflow and removed type/bug, for/user-attention labels Aug 30, 2021
@TDtianzhenjiu

TDtianzhenjiu commented Sep 24, 2022

Hello @violetagg
After configuring maxIdleTime it works.
But why?

Does it mean that a connection in the connection pool had already been closed by the remote peer,
but was still in the pool and still marked as available, so that acquiring that connection to read from or write to the remote peer causes this exception?

@violetagg
Member

@TDtianzhenjiu Take a look at https://projectreactor.io/docs/netty/release/reference/index.html#faq.connection-closed; it addresses something very similar to what you are asking.

@TDtianzhenjiu

Thanks @violetagg 🙏
In this case, however,
we could also retry on WebClientRequestException; that would also resolve this issue, am I right?

@violetagg
Member

Thanks @violetagg 🙏
In this case, however,
we could also retry on WebClientRequestException; that would also resolve this issue, am I right?

@TDtianzhenjiu You have to be careful with retrying requests (for example, if they are not idempotent: https://www.rfc-editor.org/rfc/rfc9110.html#section-9.2.2).
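
(For illustration, a hedged sketch, not from this thread, of restricting such a retry to an idempotent GET call; ReferralClient, the URI and the retry values are made-up placeholders:)

import java.time.Duration;

import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.reactive.function.client.WebClientRequestException;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;

class ReferralClient {  // hypothetical caller

	private final WebClient webClient;

	ReferralClient(WebClient webClient) {
		this.webClient = webClient;
	}

	Mono<String> fetchReferral(String id) {
		// GET is idempotent (RFC 9110 section 9.2.2), so retrying after a connection-level
		// failure such as "connection reset by peer" is generally safe for this call.
		return webClient.get()
				.uri("/v1/users/referral/{id}", id)
				.retrieve()
				.bodyToMono(String.class)
				.retryWhen(Retry.backoff(3, Duration.ofMillis(250))
						.filter(t -> t instanceof WebClientRequestException));
	}
}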

@dalvan-bevilaqua

@violetagg We used the following connection provider and the issue is now resolved.

	ConnectionProvider provider = ConnectionProvider.builder("fixed")
			.maxConnections(500)
			.maxIdleTime(Duration.ofSeconds(20))
			.maxLifeTime(Duration.ofSeconds(60))
			.pendingAcquireTimeout(Duration.ofSeconds(60))
			.evictInBackground(Duration.ofSeconds(120)).build();

	this.webClient = WebClient.builder()
			.clientConnector(new ReactorClientHttpConnector(HttpClient.create(provider)))
			.build();

Thanks

This solved it for me as well.

minsoub pushed a commit to minsoub/internal-gateway-api that referenced this issue Jan 25, 2023
ethanmills added a commit to govuk-one-login/life-events-platform that referenced this issue Jan 27, 2023
Aims to prevent connection closed by client errors
reactor/reactor-netty#1774
@jzpeepz

jzpeepz commented Feb 22, 2024

@violetagg We used the following connection provider and the issue is now resolved.

	ConnectionProvider provider = ConnectionProvider.builder("fixed")
			.maxConnections(500)
			.maxIdleTime(Duration.ofSeconds(20))
			.maxLifeTime(Duration.ofSeconds(60))
			.pendingAcquireTimeout(Duration.ofSeconds(60))
			.evictInBackground(Duration.ofSeconds(120)).build();

	this.webClient = WebClient.builder()
			.clientConnector(new ReactorClientHttpConnector(HttpClient.create(provider)))
			.build();

Thanks

This worked for me, BUT my requests now seem to take a LOT longer: up to 20+ seconds, compared to <= 5 seconds before. Has anyone else experienced this?
