
Server closes the connection with a FIN packet immediately after sending a GO_AWAY frame. All accepted streams up to the lastStreamId are not getting processed #2735

Closed
Sumi264 opened this issue Mar 20, 2023 · 32 comments · Fixed by #2736, #2758 or #2777
Assignees
Labels
type/bug A general bug
Milestone

Comments

@Sumi264

Sumi264 commented Mar 20, 2023

My application server, using the HTTP/2 reactive netty server (version 1.0.26), closes the connection immediately after sending a GO_AWAY frame, without processing outstanding requests up to the lastStreamId.

+--- org.springframework.boot:spring-boot-starter-webflux:2.7.7
|    +--- org.springframework.boot:spring-boot-starter:2.7.7 (*)
|    +--- org.springframework.boot:spring-boot-starter-json:2.7.7 (*)
|    +--- org.springframework.boot:spring-boot-starter-reactor-netty:2.7.7
|    |    \--- io.projectreactor.netty:reactor-netty-http:1.0.26

https://www.rfc-editor.org/rfc/rfc7540#section-6.8 states that "Activity on streams numbered lower or equal to the last stream identifier might still complete successfully. The sender of a GOAWAY frame might gracefully shut down a connection by sending a GOAWAY frame, maintaining the connection in an "open" state until all in-progress streams complete."

Does the reactive netty server support processing all accepted streams up to the lastStreamId sent in the GO_AWAY frame before terminating the connection? If yes, please advise whether some configuration is needed to make the reactive netty HTTP/2 server ensure the processing of all streams up to the lastStreamId quoted in the GO_AWAY frame.

@Sumi264 Sumi264 changed the title Server closes the connection with a FIN packet immediately after sending a GO_AWAY frame. All accepted streams up to the lastStreamId is not getting processed Server closes the connection with a FIN packet immediately after sending a GO_AWAY frame. All accepted streams up to the lastStreamId are not getting processed Mar 20, 2023
@violetagg violetagg self-assigned this Mar 20, 2023
@violetagg
Member

@Sumi264 Can you describe your scenario in more detail? Is this a graceful shutdown of the server or some other connection close scenario?

@violetagg violetagg added the for/user-attention This issue needs user attention (feedback, rework, etc...) label Mar 20, 2023
@Sumi264
Author

Sumi264 commented Mar 21, 2023

@violetagg Yes, graceful shutdown is configured for the application server. The application runs in a containerized environment in a Kubernetes pod. I'm deleting the pod to observe the server's behaviour in the graceful shutdown scenario.

factory.addServerCustomizers(httpServer -> httpServer.idleTimeout(Duration.ofMillis(idleTimeout)));
factory.addServerCustomizers(builder -> builder.protocol(HttpProtocol.H2C, HttpProtocol.HTTP11));
factory.setShutdown(Shutdown.GRACEFUL);
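For completeness, these customizers live in a WebServerFactoryCustomizer bean; a minimal sketch of the setup (the class name and the idle-timeout value here are illustrative, not the actual application code):

import java.time.Duration;

import org.springframework.boot.web.embedded.netty.NettyReactiveWebServerFactory;
import org.springframework.boot.web.server.Shutdown;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Configuration;

import reactor.netty.http.HttpProtocol;

@Configuration
public class NettyCustomizerConfig implements WebServerFactoryCustomizer<NettyReactiveWebServerFactory> {

    @Override
    public void customize(NettyReactiveWebServerFactory factory) {
        // close idle connections after the configured idle timeout
        factory.addServerCustomizers(httpServer -> httpServer.idleTimeout(Duration.ofMillis(30_000)));
        // serve HTTP/2 over cleartext (h2c) with an HTTP/1.1 fallback
        factory.addServerCustomizers(builder -> builder.protocol(HttpProtocol.H2C, HttpProtocol.HTTP11));
        // wait for in-flight requests to complete before shutting down
        factory.setShutdown(Shutdown.GRACEFUL);
    }
}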

@Sumi264
Author

Sumi264 commented Mar 21, 2023

@violetagg I'm pasting a log snippet where it's visible that the GO_AWAY frame has lastStreamId=2127; INBOUND DATA is seen for this streamId, but no OUTBOUND DATA. The connection gets closed as soon as the graceful shutdown is triggered with the GO_AWAY frame, without processing this last stream.

"reactor.netty.http.server.h2","message":"[id: 0xb7b87797, L:/192.168.219.77:8080 - R:/10.75.213.193:21174] INBOUND DATA: streamId=2123 padding=0 endStream=false length=1648 bytes=7b0a20202020226e66496e7374616e63654964223a202263623539633438632d613266612d313165392d613261332d326132616532646263636534222c0a2020..." "reactor.netty.http.server.h2","message":"[id: 0xb7b87797, L:/192.168.219.77:8080 - R:/10.75.213.193:21174] INBOUND DATA: streamId=2123 padding=0 endStream=true length=0 bytes=" "reactor.netty.http.server.h2","message":"[id: 0xb7b87797, L:/192.168.219.77:8080 - R:/10.75.213.193:21174] OUTBOUND DATA: streamId=2123 padding=0 endStream=true length=18 bytes=7b44656661756c7420526573706f6e73657d" "reactor.netty.http.server.h2","message":"[id: 0xb7b87797, L:/192.168.219.77:8080 - R:/10.75.213.193:21174] INBOUND DATA: streamId=2125 padding=0 endStream=false length=1648 bytes=7b0a20202020226e66496e7374616e63654964223a202263623539633438632d613266612d313165392d613261332d326132616532646263636534222c0a2020..." "reactor.netty.http.server.h2","message":"[id: 0xb7b87797, L:/192.168.219.77:8080 - R:/10.75.213.193:21174] INBOUND DATA: streamId=2125 padding=0 endStream=true length=0 bytes=" "reactor.netty.http.server.h2","message":"[id: 0xb7b87797, L:/192.168.219.77:8080 - R:/10.75.213.193:21174] OUTBOUND DATA: streamId=2125 padding=0 endStream=true length=18 bytes=7b44656661756c7420526573706f6e73657d" "reactor.netty.http.server.h2","message":"[id: 0xb7b87797, L:/192.168.219.77:8080 - R:/10.75.213.193:21174] INBOUND DATA: streamId=2127 padding=0 endStream=false length=1648 bytes=7b0a20202020226e66496e7374616e63654964223a202263623539633438632d613266612d313165392d613261332d326132616532646263636534222c0a2020..." "reactor.netty.http.server.h2","message":"[id: 0xb7b87797, L:/192.168.219.77:8080 - R:/10.75.213.193:21174] INBOUND DATA: streamId=2127 padding=0 endStream=true length=0 bytes=" "reactor.netty.http.server.h2","message":"[id: 0xb7b87797, L:/192 .168.219.77:8080 - R:/10.75.213.193:21174] OUTBOUND GO_AWAY: lastStreamId=2127 errorCode=0 length=0 bytes=" "reactor.netty.http.server.HttpServer","message":"[b7b87797, L:/192.168.219.77:8080 - R:/10.75.213.193:21174] CLOSE" "reactor.netty.http.server.HttpServer","message":"[b7b87797, L:/192.168.219.77:8080 ! R:/10.75.213.193:21174] INACTIVE" "reactor.netty.http.server.HttpServer","message":"[b7b87797, L:/192.168.219.77:8080 ! R:/10.75.213.193:21174] UNREGISTERED"

@violetagg violetagg added type/bug A general bug and removed for/user-attention This issue needs user attention (feedback, rework, etc...) labels Mar 21, 2023
@violetagg violetagg added this to the 1.0.31 milestone Mar 21, 2023
@Sumi264
Author

Sumi264 commented Mar 23, 2023

Hi @violetagg ,
Thanks for the fix.
I was trying to test it in my environment, but it seems the issue is not resolved.

I was going through the fix that was provided for this issue, and came across some comments saying '//"FutureReturnValueIgnored" this is deliberate'.

I want to give a brief description of my application server (built on reactive netty using spring-webflux), which exposes a rest api with the below signature.

@RequestMapping("/api/requests")
public CompletableFuture<ResponseEntity<?>> handleRequest(RequestEntity<byte[]> reqEntity,
			@RequestHeader HttpHeaders reqHeaders) {...}

This API is responsible for receiving the requests and handing them over to an executor thread pool, which does some processing.
Once the business logic processing is done by the executor threads, the CompletableFuture is marked as complete with ResponseEntity.
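A minimal sketch of this pattern (the pool size and the processing body are illustrative assumptions, not the actual application code):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.http.HttpHeaders;
import org.springframework.http.RequestEntity;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RequestController {

    private final ExecutorService executor = Executors.newFixedThreadPool(8);

    @RequestMapping("/api/requests")
    public CompletableFuture<ResponseEntity<?>> handleRequest(RequestEntity<byte[]> reqEntity,
            @RequestHeader HttpHeaders reqHeaders) {
        CompletableFuture<ResponseEntity<?>> result = new CompletableFuture<>();
        // hand the request over to the worker pool; the event loop is not blocked
        executor.execute(() -> {
            // ... business logic runs on the executor thread ...
            // completing the future triggers the outbound response
            result.complete(ResponseEntity.ok("{Default Response}"));
        });
        return result;
    }
}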

In cases where a CompletableFuture is used in the RestController, will the provided fix work?
Please advise.

@violetagg
Member

@Sumi264 Please either provide a reproducible example or at least some logs with wiretap enabled.

@violetagg violetagg reopened this Mar 23, 2023
@Sumi264
Author

Sumi264 commented Mar 24, 2023

@violetagg Please find the logs with wiretap enabled.
netty-streams-not-processed-goaway.log

Here, OUTBOUND GO_AWAY has lastStreamId=12837, but there are only inbound data logs for this streamId; no outbound data log is observed.

@violetagg
Member

violetagg commented Mar 24, 2023

I added one additional log so that you can check when the server is about to stop, and then you can check when the GO_AWAY happens: #2742
Can you check the latest SNAPSHOT?

@violetagg violetagg added the for/user-attention This issue needs user attention (feedback, rework, etc...) label Mar 27, 2023
@Sumi264
Author

Sumi264 commented Mar 28, 2023

@violetagg Sure, let me try it out and share the logs with you. Thanks!!

@Sumi264
Author

Sumi264 commented Mar 28, 2023

Hi @violetagg, I've captured the logs. Here, there is no outbound data for streamId=3767, and the connection is closed right after the GO_AWAY frame.
L:/192.168.219.118:8080 - R:/10.87.213.193:20457] OUTBOUND GO_AWAY: lastStreamId=3767
reactor_netty_filtered.txt

Please let me know if any more info is needed. Thanks!!

@violetagg violetagg removed the for/user-attention This issue needs user attention (feedback, rework, etc...) label Mar 29, 2023
@violetagg
Member

@Sumi264 Any chance for a reproducible example?

@violetagg violetagg added the for/user-attention This issue needs user attention (feedback, rework, etc...) label Mar 29, 2023
@Sumi264
Author

Sumi264 commented Mar 30, 2023

@violetagg I'll try to share a reproducible example soon.

@Sumi264
Author

Sumi264 commented Mar 30, 2023

@violetagg
I'm providing sample HTTP/2 server code with which the issue is reproducible.
I tested by sending requests using an HTTP/2 client. To trigger a graceful shutdown of the server application, I used "kill -15 <PID_of_application>" in a Linux environment.

In my sample application, each request processing takes 200ms.
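The handler in the sample is roughly of this shape (a sketch only; the class name and annotations are assumptions — the exact reproducer sources are in the attached zip below):

import java.time.Duration;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import reactor.core.publisher.Mono;

@RestController
public class StreamsController {

    // each request takes ~200ms before the "ok" (6f6b) response is written
    @GetMapping("/netty/streams")
    public Mono<String> streams() {
        return Mono.just("ok").delayElement(Duration.ofMillis(200));
    }
}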

From the captured logs it is observed that, after receiving the graceful shutdown signal, the server waits for some amount of time (~190ms) and processes OUTBOUND responses for some of the accepted streams; it then sends GO_AWAY and terminates the connection, even though there are unprocessed streams with streamId less than or equal to the lastStreamId quoted in the GO_AWAY frame.

2023-03-30 10:35:35.579  INFO 8279 --- [ionShutdownHook] o.s.b.w.embedded.netty.GracefulShutdown  : Commencing graceful shutdown. Waiting for active requests to complete
2023-03-30 10:35:35.580 DEBUG 8279 --- [ netty-shutdown] reactor.netty.transport.ServerTransport  : [876f8457, L:/[0:0:0:0:0:0:0:0%0]:8080] Server is about to be disposed with timeout: PT2562047H47M16.854775807S

...

2023-03-30 10:35:35.618 DEBUG 8279 --- [ntLoopGroup-2-2] reactor.netty.http.server.h2             : [id: 0xa88118ff, L:/10.75.213.193:8080 - R:/10.87.213.193:41778] OUTBOUND DATA: streamId=1551 padding=0 endStream=true length=2 bytes=6f6b
2023-03-30 10:35:35.669 DEBUG 8279 --- [ntLoopGroup-2-2] reactor.netty.http.server.h2             : [id: 0xa88118ff, L:/10.75.213.193:8080 - R:/10.87.213.193:41778] OUTBOUND DATA: streamId=1553 padding=0 endStream=true length=2 bytes=6f6b
2023-03-30 10:35:35.718 DEBUG 8279 --- [ntLoopGroup-2-2] reactor.netty.http.server.h2             : [id: 0xa88118ff, L:/10.75.213.193:8080 - R:/10.87.213.193:41778] OUTBOUND DATA: streamId=1555 padding=0 endStream=true length=2 bytes=6f6b
2023-03-30 10:35:35.768 DEBUG 8279 --- [ntLoopGroup-2-2] reactor.netty.http.server.h2             : [id: 0xa88118ff, L:/10.87.213.193:8080 - R:/10.87.213.193:41778] OUTBOUND DATA: streamId=1557 padding=0 endStream=true length=2 bytes=6f6b
....


2023-03-30 10:35:35.769  INFO 8279 --- [ netty-shutdown] o.s.b.w.embedded.netty.GracefulShutdown  : Graceful shutdown complete
2023-03-30 10:35:35.769 DEBUG 8279 --- [ionShutdownHook] reactor.netty.transport.ServerTransport  : [876f8457, L:/[0:0:0:0:0:0:0:0%0]:8080] Server is about to be disposed with timeout: PT3S
2023-03-30 10:35:35.770 DEBUG 8279 --- [ntLoopGroup-2-2] reactor.netty.http.server.h2             : [id: 0xa88118ff, L:/10.87.213.193:8080 - R:/10.87.213.193:41778] OUTBOUND GO_AWAY: lastStreamId=1565 errorCode=0 length=0 bytes=
2023-03-30 10:35:35.771 DEBUG 8279 --- [ntLoopGroup-2-2] reactor.netty.channel.ChannelOperations  : [a88118ff/778-1, L:/10.87.213.193:8080 ! R:/10.87.213.193:41778] [HttpServer] Channel inbound receiver cancelled (channel disconnected).
2023-03-30 10:35:35.773 DEBUG 8279 --- [ntLoopGroup-2-2] reactor.netty.http.server.HttpServer     : [a88118ff, L:/10.87.213.193:8080 - R:/10.87.213.193:41778] CLOSE
2023-03-30 10:35:35.773 DEBUG 8279 --- [ntLoopGroup-2-2] reactor.netty.http.server.HttpServer     : [a88118ff, L:/10.87.213.193:8080 ! R:/10.87.213.193:41778] INACTIVE

I was looking into the fix made in the disposeNow(Duration timeout) API.
I was wondering whether the API waits until this configured timeout and then closes the connection.
How does it take into account which stream was the last one acknowledged, so that the server waits until that streamId has been processed before shutting down?

The RFC https://www.rfc-editor.org/rfc/rfc7540#section-6.8 states that "Activity on streams numbered lower or equal to the last stream identifier might still complete successfully. The sender of a GOAWAY frame might gracefully shut down a connection by sending a GOAWAY frame, maintaining the connection in an "open" state until all in-progress streams complete."

On receiving the graceful shutdown signal, the server should have sent a GO_AWAY and waited until the last stream accounted for in the GO_AWAY was processed. But in the logs, after receiving the graceful shutdown signal at "2023-03-30 10:35:35.579", the server processes requests only up to streamId 1557, until "2023-03-30 10:35:35.768".
The GO_AWAY is then sent at "2023-03-30 10:35:35.770" accounting for lastStreamId=1565.
For requests after streamId 1557, no outbound response is sent, and the server closes the connection right after sending the GO_AWAY.

nettytest.zip
This zip contains

  • source files
  • build.gradle
  • Executable Jar for running server application
  • Logs for graceful shutdown scenario

@violetagg violetagg removed the for/user-attention This issue needs user attention (feedback, rework, etc...) label Mar 31, 2023
@violetagg
Member

@Sumi264 You are right: we need to send GO_AWAY to notify the client to stop opening streams, and then we need to keep the connection "open" until we process all streams up to the last stream id sent with GO_AWAY. Thanks a lot for the reproducible example.
Will you be able to try this PR? #2758

@Sumi264
Author

Sumi264 commented Apr 3, 2023

Sure, let me try it out @violetagg. Thanks!!

@Sumi264
Author

Sumi264 commented Apr 5, 2023

Hi @violetagg ,
I have done some initial testing and it seems to be working.
I'm further planning to test with high traffic.
I have a query: is there a time limit for which Netty will wait for the lastStreamId to be processed, or is the wait purely based on the last streamId processed?

Thanks!!

@violetagg
Member

Hi @violetagg, I have done some initial testing and it seems to be working. I'm further planning to test with high traffic.

That's very good. Thanks!

I have a query: is there a time limit for which Netty will wait for the lastStreamId to be processed, or is the wait purely based on the last streamId processed?

The timeout that is used for disposeNow is the limit. Once the timeout is reached, there will be a force close.
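In plain Reactor Netty terms, the timeout in question is the one below; a standalone sketch (the port, handler, and 20s value are illustrative):

import java.time.Duration;

import reactor.core.publisher.Mono;
import reactor.netty.DisposableServer;
import reactor.netty.http.HttpProtocol;
import reactor.netty.http.server.HttpServer;

public class GracefulShutdownExample {

    public static void main(String[] args) {
        DisposableServer server = HttpServer.create()
                .port(8080)
                .protocol(HttpProtocol.H2C, HttpProtocol.HTTP11)
                .handle((request, response) -> response.sendString(Mono.just("ok")))
                .bindNow();

        // blocks until the server has shut down; once the 20s limit is
        // reached, remaining connections are force-closed
        server.disposeNow(Duration.ofSeconds(20));
    }
}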

@Sumi264
Author

Sumi264 commented Apr 6, 2023

Hi @violetagg,
Is the timeout value passed to the disposeNow(timeout) API for graceful termination configurable?
It would be helpful if it were configurable, since in different applications this graceful termination period may need to be set according to the application environment's needs.

I'm running this application in a k8s pod, built with Spring Boot (as shown in the earlier attached reproducible example).
Would it be possible to provide an API in "WebServerFactoryCustomizer" to set this graceful termination period?
For example, this class has an API to set the shutdown to graceful, as shown below.
factory.setShutdown(Shutdown.GRACEFUL);

It would be nice to have an API like the above where the user can pass the graceful termination period in seconds.

@violetagg
Member

@Sumi264 For questions related to Spring you should contact the Spring maintainers. A quick search shows that there is already such a configuration: https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#web.graceful-shutdown
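For reference, that configuration is set with Spring Boot properties like these (the 20s value is illustrative):

# application.properties
server.shutdown=graceful
spring.lifecycle.timeout-per-shutdown-phase=20s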

@Sumi264
Author

Sumi264 commented Apr 6, 2023

@violetagg
My question is whether the timeout value passed to the disposeNow(timeout) API is user-configurable.
What is the timeout value passed to the disposeNow(timeout) API?

I'm already using "spring.lifecycle.timeout-per-shutdown-phase=20s" in my application; does that mean the disposeNow(timeout) API will consider the timeout to be 20s?

@violetagg
Member

I don't know the integration in Spring Boot.
https://github.com/spring-projects/spring-boot/search?q=disposeNow
Here it seems they call disposeNow with Duration.ofNanos(Long.MAX_VALUE)

@Sumi264
Author

Sumi264 commented Apr 18, 2023

Hi @violetagg

While testing the above fix with high traffic, I observed that some connections are closing without a GO_AWAY.
Sometimes the connection is closed with a FIN, sometimes with an RST, and sometimes with a FIN followed by an RST.
I'm attaching the logs for your reference, to help me understand the reason behind this behavior.

Test 1: 20 connections and 200 MPS traffic
One connection was closed by a FIN without a GO_AWAY; attaching logs for reference.
Log file: direct_FIN_reference.log

Test 2: 100 connections and 1000 MPS traffic with a 150ms delay in my application built with reactive netty (with spring webflux)
Observed connections closed by RST and FIN without a GO_AWAY; attaching logs for reference.
Log files: RST_close.log, FIN_close.log

@pderop
Member

pderop commented Apr 18, 2023

Hi @Sumi264 ,

I don't see any CLOSE in the log files. Regarding the RST or FIN, is it sent by the client or by the server ?

Can you tell how to reproduce this ? is it with your reproducer nettyyest ? should I run a load using the "netty/test/goaway" URI or using "netty/streams" ?

thanks

@Sumi264
Author

Sumi264 commented Apr 19, 2023

Hi @pderop,
I'm attaching the tcpdump for the connection getting terminated without GO_AWAY, with either an RST or FIN, where it is visible that RST, or FIN is sent by server.
reactor-netty-RST_FIN.zip
It contains below three files.

  1. FIN_CLOSE.pcap
    Server (sample application server built with reactive netty embedded in spring-webflux) => 192.168.8.14, Port:8080
    Client => 10.75.212.241, Port:52279
  2. direct_FIN_reference.pcap
    Server => 10.75.213.193, Port:8080
    Client => 10.75.213.193, Port:52646
  3. RST_Close.pcap
    Server => 192.168.8.14, Port:8080
    Client => 10.75.212.241, Port:56548

For reproducing this issue, the NettyServer sample app can be used, and load can be run against the "netty/streams" API.

Please let me know if any further info is needed.

@pderop
Member

pderop commented Apr 19, 2023

Thanks a lot @Sumi264;

FYI, I've been investigating since yesterday and will try to get back to you soon.

@pderop
Member

pderop commented Apr 19, 2023

I have reproduced it, thanks @Sumi264.

We might have a problem when the server is configured with H2C and HTTP/1.1.
I'm working on a candidate PR; I will probably create it tomorrow.
Thanks.

@pderop
Member

pderop commented Apr 20, 2023

I have attempted a PR (see #2777); will you be able to test it?
I can't merge this PR this week, because it must be discussed with the team next week, but it would be nice if you could build it and confirm whether it works.

thanks @Sumi264 !

@Sumi264
Author

Sumi264 commented Apr 20, 2023

Sure @pderop, let me try it out. Thanks !!

@Sumi264
Author

Sumi264 commented Apr 21, 2023

Hi @pderop, I tried testing it with 2 MPS and 4 MPS traffic. It seems to be working fine now.
Thanks !!

@Sumi264
Author

Sumi264 commented May 8, 2023

Hi @violetagg and @pderop,
Could you please tell me in which release of reactor-netty the complete fix for this issue will be available?

Thanks in advance!

@violetagg
Member

@Sumi264 The release is planned for tomorrow.

@violetagg
Member

@move02 The mentioned PR is related to HTTP/2 only.

@move02

move02 commented Jun 30, 2023

Sorry for deleting my comment. I had asked whether this PR also applies to the HTTP/1.1 server, and it doesn't.

Thanks for replying.
