Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpMetricsHandlerTests testServerConnectionsMicrometer still unstable #2421

Conversation

pderop
Copy link
Contributor

@pderop pderop commented Aug 8, 2022

The HttpMetricsHandlerTests testServerConnectionsMicrometer is still unstable.
Sometimes, the following assertion fails because the SERVER_CONNECTIONS_TOTAL is 1, see this failing assertion

Reproduced the problem in github using netty5 branch while validating the multipart porting draft PR.

What I can imagine is the following probable scenario:

  1. the testServerConnectionsMicrometer is finishing an HTTP2 test, so the tearDown is in progress, and the client connection pool is disposing, but the server has not yet seen the close, so the SERVER_CONNECTIONS_TOTAL meter is still set to 1
  2. now, the testServerConnectionsMicrometer starts a next test using HTTP1.1, but when it waits for the H1 connection to be closed here, then the SERVER_CONNECTIONS_TOTAL is still set to 2, because the connection close done in previous test in not yet seen by the server.
  3. So, when the HTTP1.1 socket is closed and see by the server, the latch is counted down, but the SERVER_CONNECTIONS_TOTAL is now set to 1, because the client socket close from previous test is still in progress, and not yet seen by the server.

So, in testServerConnectionsMicrometer, this problem seems to be fully fixed when we dispose the client connection pool in case protocol is HTT2 and when we also wait for the server to see the H2 connection close.

Fixes #2368

@pderop pderop added type/bug A general bug type/enhancement A general enhancement and removed type/bug A general bug labels Aug 8, 2022
@pderop pderop added this to the 1.0.22 milestone Aug 8, 2022
@violetagg
Copy link
Member

violetagg commented Aug 8, 2022

When we dispose a server we also dispose the connection pool. It happens here https://github.com/reactor/reactor-netty/blob/netty5/reactor-netty5-core/src/main/java/reactor/netty5/transport/ServerTransport.java#L528-L543.

Do you think that the new test starts before server dispose? Or is there an issue with server dispose?

@pderop pderop added type/bug A general bug and removed type/enhancement A general enhancement labels Aug 8, 2022
@pderop
Copy link
Contributor Author

pderop commented Aug 8, 2022

When ServerTransport.dispose method call is done, the server connection is indeed closed (the server does not listen anymore on the server port), that's ok.

But when the dispose method returns, I think that the server has not yet seen the client connection close, it will eventually see it, but not immediately. That's what I remember having observed with the debugger last week.

@pderop
Copy link
Contributor Author

pderop commented Aug 8, 2022

(the failure CI test on windows is unrelated)

@violetagg violetagg modified the milestones: 1.0.22, 1.0.23 Aug 9, 2022
@pderop pderop marked this pull request as draft August 10, 2022 07:17
…metrics like SERVER_CONNECTIONS_TOTAL. Do not dispose the server multiple times. The connection pool provider was disposed by some tests, that's not necessary.
@pderop
Copy link
Contributor Author

pderop commented Aug 17, 2022

@violetagg ,

I have tried to clean up the test, and I have validated it using the multipart PR which consistently reproduces the issue, can you please review it ?

So, this PR does the following:

  • some tests were disposing the server uselessly, now, it's not the case anymore
  • same for the connection pool provider: some tests were disposing the connection pool multiple times, it's not the case anymore
  • the testServerConnectionsMicrometer problem was the following:
  1. testServerConnectionsMicrometer is finishing an HTTP2 test, the tearDown method is closing the provider, but the server has not yet seen the connection close. Even if the server is disposed and does not listen anymore on its server port, it still see the old H2 connection
  2. the next testServerConnectionsMicrometer is starting with HTTP1, but the SERVER_CONNECTIONS_TOTAL meter is still set to 1 (because the old server has not yet seen the connection close), so the test fails, see here

a first solution was to use a channelGroup, because when a server configured with a channelGroup it does a graceful shutdown at dispose time. But currently, pending client sockets are closed asynchronously, that's why the SERVER_CONNECTIONS_TOTAL meter may remain positive for a short amount of time.
So, in addition , the ServerCloseHandler from the test is still used to ensure all sockets are closed on the server (see tearDown method).

@pderop pderop marked this pull request as ready for review August 17, 2022 14:56
@pderop pderop requested a review from violetagg August 17, 2022 14:57
Copy link
Member

@violetagg violetagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering - do we need something like this:

private static final EventExecutor executor = new DefaultEventExecutor();

@AfterClass
public static void afterClass() throws Exception {
	executor.shutdownGracefully()
	        .get(5, TimeUnit.SECONDS);
}

@pderop
Copy link
Contributor Author

pderop commented Aug 18, 2022

Indeed, I have updated the PR like in ITTracingHttpServerDecoratorTest
I also have closed the group, like it is done in ITTracingHttpServerDecoratorTest.close (isn't it also necessary ? else I can revert and just shutdown the executor ?)

PS: in HttpServerTests, the executor and the group don't seem to be closed, should something be fixed in HttpServerTests.java (with another PR) ?

@violetagg
Copy link
Member

yes go ahead and fix them all

@pderop
Copy link
Contributor Author

pderop commented Aug 18, 2022

The failed test on windows is not relevant to this PR.

@pderop pderop merged commit a8b7a7f into reactor:1.0.x Aug 18, 2022
@pderop pderop deleted the test-fix-httpmetricshandlertests-testserverconnectionsmicrometer branch August 18, 2022 12:00
@pderop
Copy link
Contributor Author

pderop commented Aug 18, 2022

@violetagg ,

thanks for the review !!

pderop added a commit that referenced this pull request Aug 18, 2022
pderop added a commit that referenced this pull request Aug 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug A general bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants