Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing the way Selector handles SSL connection handshakes #310

Merged
merged 15 commits into from
May 24, 2016

Conversation

nsivabalan
Copy link
Contributor

@nsivabalan nsivabalan commented May 16, 2016

Currently, for connections over SSL, the ambry selector returns an established connection after a poll even before the handshake is completed. This is not very useful as the caller will then have to repeatedly check outside of the selector poll on whether the handshake is complete, which is not ideal.

This patch fixes this issue.
The selector will only return a connection after the connection is established and the handshake is complete. If the handshake does not complete for any reason, then it should be treated as a connection establishment failure. This way the callers of the selector can be agnostic to SSL, and can interact with the selector in a completely non-blocking way.

Coverage:
Selector 100% (1/ 1) 96.4% (27/ 28) 84.5% (191/ 226)
All the new lines are covered

SLA: 10 mins
Reviewers: Priyesh, Ming

Addresses issue #297

@nsivabalan nsivabalan changed the title Fixing the way Selector handles SSL connections wrt handshakes Fixing the way Selector handles SSL connection handshakes May 16, 2016
if (sslTransmission != null && sslTransmission.ready()) {
connected.add(sslTransmission.getConnectionId());
this.metrics.selectorConnectionCreated.inc();
Long handshakeStartTime = sslHandshakeTimer.remove(sslTransmission.getConnectionId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A connectionId will only be removed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the patch with a minor fix. Someone can close it explicitly too by calling close(key)

@@ -72,6 +72,8 @@
private final List<NetworkReceive> completedReceives;
private final List<String> disconnected;
private final List<String> connected;
private final List<SSLTransmission> pendingSslHandshakes;
private final Map<String, Long> sslHandshakeTimer;
Copy link
Contributor

@pnarayanan pnarayanan May 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not avoid the map completely? Just use a list of objects of a private class:

private class PendingHandshakeTransmission {
  SSLTransmission transmission;
  long pendingSinceMs;
}

@pnarayanan
Copy link
Contributor

Done with a pass. Let me know when you update the PR.

@@ -455,6 +481,9 @@ private void close(SelectionKey key) {
activeConnections.set(this.keyMap.size());
try {
transmission.close();
if (pendingHandshakeTransmissions.get(transmission.getConnectionId()) != null) {
pendingHandshakeTransmissions.remove(transmission.getConnectionId());
}
} catch (IOException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If a key can be closed in a different thread, would this part be thread safe?
  2. Not related to this PR.
    Transmission.close() will NOT throw an IOException if taking a look at its implementation in both PlainTextTransmission and SSLTransmission though the abstract method defines so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, looks like we might have an issue. even keyMap and activeConnections looks to be affected. Lets discuss in person.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the documentation it looks like the Selector isn't meant to be thread safe. The documentation of the class on line 64 says so. There are a lot of parts in this class that work on the fact that this class was not meant to be thread safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I cannot think of why we would want to use this Selector across threads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

selector should never be used across threads

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my bad. I thought the processNewResponses() in SocketServer() was a daemon thread. Hence, the close(SelectionKey) could be called simultaneously when selector poll() is happening. Looks like its called sequentially and selector.poll() follows it. So, I don't think we have a problem of multiple threads accessing it.

@xiahome
Copy link
Contributor

xiahome commented May 17, 2016

Left a few more minor comments. The rest LGTM.

@xiahome
Copy link
Contributor

xiahome commented May 17, 2016

LGTM.

this.metrics.selectorIORate.inc();
}
long endIo = time.milliseconds();
this.metrics.selectorIOTime.update(endIo - endSelect);
}

/**
* Add those Ssl connections to connected list on handshake completion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think doc can be improved. What are "those"?

* @param numConnectionsPendingHandhsake Count of connections that are awaiting handshake completion
*/
public void initializeSelectorMetricsIfRequired(final AtomicLong activeConnections,
final AtomicLong numConnectionsPendingHandhsake) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same typo

@pnarayanan
Copy link
Contributor

@nsivabalan I don't think you have updated the patch.

@Override
public Long getValue() {
return activeConnections.get();
}
};
this.numConnectionsPendingHandhsake = new Gauge<Long>() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you named the other arg as activeConnections so that the this is avoided. Could you do the same here? (or name the other also as numActiveConnections). This is minor but I would like some consistency.

@vgkholla
Copy link
Contributor

@pnarayanan just told me that this requires some more work from @nsivabalan. Holding off reviewing until that is resolved.

@nsivabalan
Copy link
Contributor Author

@pnarayanan : can you tell me what are the comments yet to be addressed?

@pnarayanan
Copy link
Contributor

"... I have removed it. Updated the patch" - what have you removed?

@@ -72,10 +72,12 @@
private final List<NetworkReceive> completedReceives;
private final List<String> disconnected;
private final List<String> connected;
private final List<String> connectionPendingHandshakes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be connectionsPendingHandshake.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@xiahome
Copy link
Contributor

xiahome commented May 21, 2016

@pnarayanan I think @nsivabalan meant the duplicate metric for ssl handshaking time is removed.

@nsivabalan
Copy link
Contributor Author

@pnarayanan yes, I meant the same. That I have removed the duplicate metric

@nsivabalan
Copy link
Contributor Author

@xiahome @pnarayanan @vgkholla : I have updated the patch.

@@ -44,7 +44,8 @@
public final Counter selectorKeyOperationErrorCount;
public final Counter selectorCloseKeyErrorCount;
public final Counter selectorCloseSocketErrorCount;
public Gauge<Long> selectorActiveConnections;
public Gauge<Long> numActiveConnections;
public Gauge<Long> numConnectionsPendingHandshake;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be a selector metric as this is not "general". This should be moved below under SSL Metrics and SSLTransmission should be the one updating it.

That said, I would suggest getting rid of this metric completely. This is a Gauge and number of connections pending handshake at a particular instant is not useful as these will be very low in number and they would either error out or get established. Gauge would be sort of useful for things that stay for a while like numActiveConnections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to track the pool (pending connections) size. This shouldn't be linearly growing. I expect the number to be stable as each connections take 5 to 6 poll() calls to complete its handshake. If the pending connections count is growing (which is not feasible to track if not for this metric), for some reason, we are doing something wrong. But that being said, I am not very pertinent in having this metric. @vgkholla @xiahome whats your take on this.

@pnarayanan
Copy link
Contributor

pnarayanan commented May 21, 2016

Once again, selector should not refer to handshakes. Handshakes are internal to SSLTransmission. As far as the Selector is concerned, connections are ready or unready.

Your changes therefore should not make references to "handshakes" in the Selector code. And like I said in the other comment, the other metric does not seem useful, so I would think you wouldn't have to refer to "handshakes" anywhere in the patch.

p.s: The current selector code has references to handshakes, but only in comments (and that is wrong).

@@ -72,10 +72,12 @@
private final List<NetworkReceive> completedReceives;
private final List<String> disconnected;
private final List<String> connected;
private final List<String> connectionsPendingHandshake;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be made a set?

@@ -72,10 +73,11 @@
private final List<NetworkReceive> completedReceives;
private final List<String> disconnected;
private final List<String> connected;
private final Set<String> unreadycConnections;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@pnarayanan
Copy link
Contributor

LGTM

private final Time time;
private final NetworkMetrics metrics;
private final AtomicLong IdGenerator;
private AtomicLong activeConnections;
private AtomicLong numActiveConnections;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final?

@vgkholla
Copy link
Contributor

Two very minor comments. LGTM otherwise.

It looks like the tests cover the new code but could you add coverage details in the description and confirm that the new code is fully covered (including true and false for any conditionals that exist)?

@vgkholla vgkholla merged commit 62fe306 into linkedin:master May 24, 2016
* @param activeConnections count of current active connections
*/
void initializeSelectorMetrics(final AtomicLong activeConnections) {
numActiveConnections = new Gauge<Long>() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked at the rest, but this Gauge is never registered as far as I can see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to your change though, but we should get this fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants