-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing the way Selector handles SSL connection handshakes #310
Conversation
if (sslTransmission != null && sslTransmission.ready()) { | ||
connected.add(sslTransmission.getConnectionId()); | ||
this.metrics.selectorConnectionCreated.inc(); | ||
Long handshakeStartTime = sslHandshakeTimer.remove(sslTransmission.getConnectionId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A connectionId will only be removed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update the patch with a minor fix. Someone can close it explicitly too by calling close(key)
a314af0
to
f22676e
Compare
@@ -72,6 +72,8 @@ | |||
private final List<NetworkReceive> completedReceives; | |||
private final List<String> disconnected; | |||
private final List<String> connected; | |||
private final List<SSLTransmission> pendingSslHandshakes; | |||
private final Map<String, Long> sslHandshakeTimer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not avoid the map completely? Just use a list of objects of a private class:
private class PendingHandshakeTransmission {
SSLTransmission transmission;
long pendingSinceMs;
}
Done with a pass. Let me know when you update the PR. |
ebcabcc
to
789341e
Compare
@@ -455,6 +481,9 @@ private void close(SelectionKey key) { | |||
activeConnections.set(this.keyMap.size()); | |||
try { | |||
transmission.close(); | |||
if (pendingHandshakeTransmissions.get(transmission.getConnectionId()) != null) { | |||
pendingHandshakeTransmissions.remove(transmission.getConnectionId()); | |||
} | |||
} catch (IOException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If a key can be closed in a different thread, would this part be thread safe?
- Not related to this PR.
Transmission.close()
will NOT throw anIOException
if taking a look at its implementation in bothPlainTextTransmission
andSSLTransmission
though the abstract method defines so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, looks like we might have an issue. even keyMap and activeConnections looks to be affected. Lets discuss in person.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the documentation it looks like the Selector
isn't meant to be thread safe. The documentation of the class on line 64 says so. There are a lot of parts in this class that work on the fact that this class was not meant to be thread safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I cannot think of why we would want to use this Selector
across threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
selector should never be used across threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my bad. I thought the processNewResponses() in SocketServer() was a daemon thread. Hence, the close(SelectionKey) could be called simultaneously when selector poll() is happening. Looks like its called sequentially and selector.poll() follows it. So, I don't think we have a problem of multiple threads accessing it.
Left a few more minor comments. The rest LGTM. |
LGTM. |
this.metrics.selectorIORate.inc(); | ||
} | ||
long endIo = time.milliseconds(); | ||
this.metrics.selectorIOTime.update(endIo - endSelect); | ||
} | ||
|
||
/** | ||
* Add those Ssl connections to connected list on handshake completion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think doc can be improved. What are "those"?
* @param numConnectionsPendingHandhsake Count of connections that are awaiting handshake completion | ||
*/ | ||
public void initializeSelectorMetricsIfRequired(final AtomicLong activeConnections, | ||
final AtomicLong numConnectionsPendingHandhsake) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same typo
@nsivabalan I don't think you have updated the patch. |
@Override | ||
public Long getValue() { | ||
return activeConnections.get(); | ||
} | ||
}; | ||
this.numConnectionsPendingHandhsake = new Gauge<Long>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like you named the other arg as activeConnections
so that the this
is avoided. Could you do the same here? (or name the other also as numActiveConnections
). This is minor but I would like some consistency.
@pnarayanan just told me that this requires some more work from @nsivabalan. Holding off reviewing until that is resolved. |
@pnarayanan : can you tell me what are the comments yet to be addressed? |
1c95e63
to
ad1f146
Compare
"... I have removed it. Updated the patch" - what have you removed? |
@@ -72,10 +72,12 @@ | |||
private final List<NetworkReceive> completedReceives; | |||
private final List<String> disconnected; | |||
private final List<String> connected; | |||
private final List<String> connectionPendingHandshakes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be connectionsPendingHandshake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
@pnarayanan I think @nsivabalan meant the duplicate metric for ssl handshaking time is removed. |
@pnarayanan yes, I meant the same. That I have removed the duplicate metric |
@xiahome @pnarayanan @vgkholla : I have updated the patch. |
a32cf78
to
23b0454
Compare
@@ -44,7 +44,8 @@ | |||
public final Counter selectorKeyOperationErrorCount; | |||
public final Counter selectorCloseKeyErrorCount; | |||
public final Counter selectorCloseSocketErrorCount; | |||
public Gauge<Long> selectorActiveConnections; | |||
public Gauge<Long> numActiveConnections; | |||
public Gauge<Long> numConnectionsPendingHandshake; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be a selector
metric as this is not "general". This should be moved below under SSL Metrics and SSLTransmission
should be the one updating it.
That said, I would suggest getting rid of this metric completely. This is a Gauge
and number of connections pending handshake at a particular instant is not useful as these will be very low in number and they would either error out or get established. Gauge
would be sort of useful for things that stay for a while like numActiveConnections
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just to track the pool (pending connections) size. This shouldn't be linearly growing. I expect the number to be stable as each connections take 5 to 6 poll() calls to complete its handshake. If the pending connections count is growing (which is not feasible to track if not for this metric), for some reason, we are doing something wrong. But that being said, I am not very pertinent in having this metric. @vgkholla @xiahome whats your take on this.
Once again, Your changes therefore should not make references to "handshakes" in the p.s: The current selector code has references to handshakes, but only in comments (and that is wrong). |
@@ -72,10 +72,12 @@ | |||
private final List<NetworkReceive> completedReceives; | |||
private final List<String> disconnected; | |||
private final List<String> connected; | |||
private final List<String> connectionsPendingHandshake; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be made a set?
271e4ea
to
0b59f29
Compare
@@ -72,10 +73,11 @@ | |||
private final List<NetworkReceive> completedReceives; | |||
private final List<String> disconnected; | |||
private final List<String> connected; | |||
private final Set<String> unreadycConnections; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
LGTM |
private final Time time; | ||
private final NetworkMetrics metrics; | ||
private final AtomicLong IdGenerator; | ||
private AtomicLong activeConnections; | ||
private AtomicLong numActiveConnections; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
?
Two very minor comments. LGTM otherwise. It looks like the tests cover the new code but could you add coverage details in the description and confirm that the new code is fully covered (including |
* @param activeConnections count of current active connections | ||
*/ | ||
void initializeSelectorMetrics(final AtomicLong activeConnections) { | ||
numActiveConnections = new Gauge<Long>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't looked at the rest, but this Gauge
is never registered as far as I can see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to your change though, but we should get this fixed.
Currently, for connections over SSL, the ambry selector returns an established connection after a poll even before the handshake is completed. This is not very useful as the caller will then have to repeatedly check outside of the selector poll on whether the handshake is complete, which is not ideal.
This patch fixes this issue.
The selector will only return a connection after the connection is established and the handshake is complete. If the handshake does not complete for any reason, then it should be treated as a connection establishment failure. This way the callers of the selector can be agnostic to SSL, and can interact with the selector in a completely non-blocking way.
Coverage:
Selector 100% (1/ 1) 96.4% (27/ 28) 84.5% (191/ 226)
All the new lines are covered
SLA: 10 mins
Reviewers: Priyesh, Ming
Addresses issue #297