
Close TcpTransport on RST in some Spots to Prevent Leaking TIME_WAIT Sockets #26764

Merged: 3 commits merged into elastic:master on Sep 26, 2017

Conversation

@original-brownbear (Member) commented Sep 23, 2017

I think I tracked down #26701 here:

The problem is that we're using quite a few connections/sockets here, so with some (relatively low) probability this line:

 socket.connect(address, Math.toIntExact(connectTimeout.millis()))

will throw an exception because the machine has run out of local ephemeral ports for the connecting sockets.

[Screenshot: screen shot 2017-09-23 at 23 56 45, showing the exception thrown on connect]

Eventually, this exception is silently swallowed, being only logged at trace level in org.elasticsearch.common.util.concurrent.AbstractRunnable#onFailure:

 public void onFailure(Exception e) {
     if (e instanceof ConnectTransportException || e instanceof AlreadyClosedException) {
         // can't connect to the node - this is more common path!
         logger.trace(
             (Supplier<?>) () -> new ParameterizedMessage(
                 "[{}] failed to ping {}", pingingRound.id(), node), e);
https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/discovery/zen/UnicastZenPing.java#L520

As a result, pings silently fail to happen and the counts are off.
I ran the test in a loop to reproduce this and raised the log level for that statement; missed pings like those in the issue always coincided with the exception from the screenshot.

I fixed this by addressing two kinds of spots where we were leaking local ports/sockets (in the form of TIME_WAIT); see the sketch after the list:

  • In one spot, we were setting TCP_REUSE_ADDRESS after the bind call, which does not work on all systems.
  • In the other changed spots, we were closing connections with a FIN instead of an RST when shutting down TcpTransport, even after the other side was known to be gone for sure. This leaked so many TIME_WAIT sockets that we eventually ran out of local ports and saw the above exception in tests.
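In plain java.net.Socket terms, the two fixes boil down to roughly the following (a standalone sketch under stated assumptions, not the actual TcpTransport/MockTcpTransport code):

import java.net.InetSocketAddress;
import java.net.Socket;

public class SocketFixSketch {

    // Fix 1: SO_REUSEADDR only takes effect reliably when it is set
    // before bind(), so the option must precede the bind call.
    static Socket bindWithReuse(InetSocketAddress address) throws Exception {
        Socket socket = new Socket();
        socket.setReuseAddress(true); // must come before bind()
        socket.bind(address);
        return socket;
    }

    // Fix 2: SO_LINGER with a timeout of 0 makes close() send an RST
    // instead of a FIN, so the connection skips TIME_WAIT entirely.
    static void closeWithRst(Socket socket) throws Exception {
        socket.setSoLinger(true, 0);
        socket.close();
    }
}

The RST close is only safe where we know the other side is gone for sure or we are shutting down; otherwise TIME_WAIT is the correct, standard behavior.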

Closes #26701

@jasontedor (Member) commented:
I don't think we should make this change in the mock TCP transport. It's not realistic that we retry connects like that.

@original-brownbear (Member Author) commented:

@jasontedor fair point, but in production code this kind of thing would just get retried further up the stack, right?

My reasoning was this:

  • I assume we don't want to add retries like this to the production code.
  • We cannot avoid this bind exception; it's just something that will happen from time to time.
  • Putting retries in the test makes it a lot harder to maintain and less realistic, unless we refactor the test enough that we can retry only on this kind of ConnectException.
  • Making MockSocket retry here makes it less realistic under non-ideal conditions where network resources are under pressure, but more realistic in the sense of reproducing the ideal "network works fine" case.

@DaveCTurner (Contributor) commented:
If you're not binding the outgoing socket, is a port collision possible? I didn't think you got port collisions in the kernel without a bind(). Willing to be corrected, and not sure what's going on above the kernel level here; just curious.

@s1monw (Contributor) commented Sep 24, 2017

I also don't think we are having a port collision here. This is the code that connects to a server, i.e. client code; @DaveCTurner, that's why we don't bind there. What likely happens is that we are exhausting our resources and don't have any ports left. There might be tons of connections in TIME_WAIT state, which in itself is fine. I am just wondering why we create so many connections. It's also independent of the transport implementation, I think, so if we fix it we also need to fix it in other places, or in TcpTransport.

That said, I don't think we should retry in such a way. I am more curious about why we get into this state.
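(For context on the bind question: even without an explicit bind(), the kernel assigns an ephemeral local port as part of connect(), and a connection closed first from our side parks that port in TIME_WAIT for a while. A minimal illustration, assuming a hypothetical endpoint on localhost:9300:)

import java.net.InetSocketAddress;
import java.net.Socket;

public class EphemeralPortDemo {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket();
        // No explicit bind(): the kernel picks an ephemeral local port
        // during connect(). Under heavy connection churn, closed sockets
        // accumulate in TIME_WAIT and the ephemeral range can run dry,
        // which then surfaces as an exception on connect().
        socket.connect(new InetSocketAddress("localhost", 9300), 1000); // hypothetical endpoint
        System.out.println("kernel-assigned local port: " + socket.getLocalPort());
        socket.close();
    }
}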

@original-brownbear (Member Author) commented:
@s1monw I tracked down where we leak a lot of TIME_WAIT sockets: it's in org.elasticsearch.transport.MockTcpTransport.MockChannel#accept.

Activating linger with timeout 0 (as now done in the PR) obviously fixes the situation; do you see any downside to doing that in MockTcpTransport?
Also, in the other spot I changed, we were potentially (not on all systems) wasting a port because we were binding before setting SO_REUSEADDR, so I changed it to bind after setting the option.

@@ -318,6 +303,7 @@ public void accept(Executor executor) throws IOException {
     MockChannel incomingChannel = null;
     try {
         configureSocket(incomingSocket);
+        incomingSocket.setSoLinger(true, 0);
@s1monw (Contributor) commented on this diff:

I am hesitant to set a 0 timeout for SO_LINGER. The reason is that the abnormal termination of a socket that moves it into TIME_WAIT can be caused by an exception, e.g. during request parsing on the server; the server then closes the connection, and in that case we really should stay in TIME_WAIT. The other case where the server closes the connection instead of the client is when we shut down, and then we can set our socket to SO_LINGER=0, since it's the right thing to do in that situation. I actually think we should do this in a broader scope. There is, for instance, a method TcpTransport#closeChannels(List<Channel> channels, boolean blocking); I wonder if we can piggyback on this boolean and set SO_LINGER to (true, 0) when we close our transport. It might even be a better idea to add another boolean, closingTransport; that way we can do the right thing in other transport implementations as well.
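A rough sketch of what that could look like, with plain java.net.Socket standing in for the transport's channel type (hypothetical code, not the actual Elasticsearch API):

import java.io.IOException;
import java.net.Socket;
import java.util.List;

class CloseChannelsSketch {
    // Hypothetical: closingTransport == true means the whole transport is
    // shutting down, so an abortive RST close is acceptable; on ordinary
    // failure paths we keep the normal FIN close and accept TIME_WAIT.
    // ('blocking' is elided in this sketch.)
    static void closeChannels(List<Socket> channels, boolean blocking, boolean closingTransport) throws IOException {
        for (Socket socket : channels) {
            if (closingTransport) {
                socket.setSoLinger(true, 0); // close() will now send an RST
            }
            socket.close();
        }
    }
}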

@original-brownbear (Member Author) replied:

@s1monw

It might even be a better idea to add another boolean, closingTransport; that way we can do the right thing in other transport implementations as well.

Adding the additional boolean makes sense to me. My reasoning is that the blocking parameter currently seems to be more about the way we handle Netty's internal threading, not about any low-level socket behavior/setting.

@original-brownbear (Member Author) commented:

@s1monw done; I added another parameter and only made it RST in org.elasticsearch.transport.TcpTransport#doStop and in the actual close call of TcpTransport, leaving the failure handlers and such untouched.
This still resolves the issue just fine (after removing the SO_LINGER setting I had added in accept); the tests still don't leak any TIME_WAIT sockets.
Hope this is what you had in mind :)
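A simplified sketch of what that stop path could look like, reusing the closeChannels sketch above (hypothetical names; the real TcpTransport#doStop does considerably more):

import java.io.IOException;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

class TransportStopSketch {
    private final List<Socket> openChannels = new ArrayList<>();

    // Hypothetical stop path: this is the only place that passes
    // closingTransport = true, so only a full shutdown closes with an
    // RST; failure handlers keep the normal FIN close.
    void doStop() throws IOException {
        CloseChannelsSketch.closeChannels(openChannels, /* blocking */ true, /* closingTransport */ true);
    }
}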

// SO_REUSEADDR must be configured before the bind call below,
// otherwise the option has no effect on some systems.
socket.setReuseAddress(TCP_REUSE_ADDRESS.get(settings));
ByteSizeValue tcpReceiveBufferSize = TCP_RECEIVE_BUFFER_SIZE.get(settings);
if (tcpReceiveBufferSize.getBytes() > 0) {
    socket.setReceiveBufferSize(tcpReceiveBufferSize.bytesAsInt());
}
// bind last, after all socket options have been applied
socket.bind(address);
A Contributor commented on this diff:
makes sense ++

@s1monw (Contributor) left a review:

This LGTM. @tbrooks8, please also take a look and check whether this makes sense to you as well. @original-brownbear, can you please update the description for this PR? Thanks for working on this.

@original-brownbear (Member Author) commented:
@s1monw description updated :)

@original-brownbear changed the title from "#26701 retry in org.elasticsearch.transport.MockTcpTransport#connect…" to "#26701 Close TcpTransport on RST in some Spots to Prevent Leaking TIME_WAIT Sockets" on Sep 26, 2017
@Tim-Brooks self-requested a review September 26, 2017 14:53
@Tim-Brooks (Contributor) left a review:

LGTM

@original-brownbear (Member Author) commented:
@tbrooks8 thanks, should I squash this?

@Tim-Brooks (Contributor) commented Sep 26, 2017

You can if you want, but we often just use the GitHub "Squash and merge" button.

@original-brownbear (Member Author) commented:
@tbrooks8 ah ok, doing that then. Thanks!

@original-brownbear merged commit af06231 into elastic:master on Sep 26, 2017
s1monw pushed a commit that referenced this pull request Sep 27, 2017
…ME_WAIT Sockets (#26764)

 #26701 Added option to RST instead of FIN to TcpTransport#closeChannels
@s1monw added the >bug, :Distributed/Network (Http and internode communication implementations), v6.1.0, and v7.0.0 labels on Sep 27, 2017
@s1monw (Contributor) commented Sep 27, 2017

@original-brownbear I backported it to 6.x

@clintongormley changed the title from "#26701 Close TcpTransport on RST in some Spots to Prevent Leaking TIME_WAIT Sockets" to "Close TcpTransport on RST in some Spots to Prevent Leaking TIME_WAIT Sockets" on Oct 2, 2017
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Oct 3, 2017
This is a follow-up to elastic#26764. That commit set SO_LINGER to 0 in order to fix a scenario where we were running out of resources during CI. We are primarily interested in setting this to 0 when stopping the transport. Allowing TIME_WAIT is standard for other failure scenarios during normal operation.

Unfortunately, that commit set SO_LINGER to 0 every time we close NodeChannels. NodeChannels can be closed in case of an exception or other failures (such as parsing a response). We want to disable linger only when actually shutting down.
Tim-Brooks added two commits that referenced this pull request Oct 4, 2017, with the same message as above.
Labels: >bug, :Distributed/Network (Http and internode communication implementations), v6.1.0, v7.0.0-beta1

Successfully merging this pull request may close these issues: org.elasticsearch.discovery.zen.UnicastZenPingTests testSimplePings failure

6 participants