Fixing the way Selector handles SSL connection handshakes #310

Merged
merged 15 commits into from
May 24, 2016
@@ -38,13 +38,15 @@ public class NetworkMetrics {
public final Histogram selectorSelectTime;
public final Counter selectorIORate;
public final Histogram selectorIOTime;
public final Histogram selectorPerceivedSslHandshakeTime;
public final Counter selectorNioCloseErrorCount;
public final Counter selectorDisconnectedErrorCount;
public final Counter selectorIOErrorCount;
public final Counter selectorKeyOperationErrorCount;
public final Counter selectorCloseKeyErrorCount;
public final Counter selectorCloseSocketErrorCount;
public Gauge<Long> selectorActiveConnections;
public Gauge<Integer> selectorPendingHandshakes;
public final Map<String, SelectorNodeMetric> selectorNodeMetricMap;

// Plaintext metrics
@@ -87,6 +89,8 @@ public NetworkMetrics(MetricRegistry registry) {
selectorIORate = registry.counter(MetricRegistry.name(Selector.class, "SelectorIORate"));
selectorSelectTime = registry.histogram(MetricRegistry.name(Selector.class, "SelectorSelectTime"));
selectorIOTime = registry.histogram(MetricRegistry.name(Selector.class, "SelectorIOTime"));
selectorPerceivedSslHandshakeTime =
registry.histogram(MetricRegistry.name(Selector.class, "SelectorSslHandshakeTime"));
selectorNioCloseErrorCount = registry.counter(MetricRegistry.name(Selector.class, "SelectorNioCloseErrorCount"));
selectorDisconnectedErrorCount =
registry.counter(MetricRegistry.name(Selector.class, "SelectorDisconnectedErrorCount"));
@@ -121,13 +125,25 @@ public NetworkMetrics(MetricRegistry registry) {
selectorNodeMetricMap = new HashMap<String, SelectorNodeMetric>();
}

public void initializeSelectorMetricsIfRequired(final AtomicLong activeConnections) {
/**
* Initializes few network metrics for the selector
Contributor

"a few"

* @param activeConnections count of current active connections
* @param pendingSslHandshakes List of {@link SSLTransmission}s that are awaiting for handshake completion
*/
public void initializeSelectorMetricsIfRequired(final AtomicLong activeConnections,
Contributor

Why is this method named ifRequired? It seems that it always gets executed.

Contributor Author

fixed

final List<SSLTransmission> pendingSslHandshakes) {
selectorActiveConnections = new Gauge<Long>() {
@Override
public Long getValue() {
return activeConnections.get();
}
};
selectorPendingHandshakes = new Gauge<Integer>() {
Contributor

selectorConnectionsPendingHandshake

@Override
public Integer getValue() {
return pendingSslHandshakes.size();
}
};
}

public void initializeSelectorNodeMetricIfRequired(String hostname, int port) {
49 changes: 37 additions & 12 deletions ambry-network/src/main/java/com.github.ambry.network/Selector.java
@@ -72,6 +72,8 @@ public class Selector implements Selectable {
private final List<NetworkReceive> completedReceives;
private final List<String> disconnected;
private final List<String> connected;
private final List<SSLTransmission> pendingSslHandshakes;
private final Map<String, Long> sslHandshakeTimer;
Contributor @pnarayanan, May 16, 2016

Why not avoid the map completely? Just use a list of objects of a private class:

private class PendingHandshakeTransmission {
  SSLTransmission transmission;
  long pendingSinceMs;
}
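
A minimal sketch of what this single-structure suggestion could look like. Everything here is an assumption made for illustration: `PendingHandshakes`, `Transmission`, `FakeTransmission`, `drainReady` and `removeById` are hypothetical names, and `Transmission` models only the `getConnectionId()` and `ready()` calls that appear in this diff. It is not the code that was eventually merged.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch: one list of entries instead of a list plus a connectionId-to-start-time map. */
class PendingHandshakes {
  /** Hypothetical stand-in for SSLTransmission; only two calls are modeled. */
  interface Transmission {
    String getConnectionId();

    boolean ready();
  }

  /** Hand-driven fake, used only for this demonstration. */
  static final class FakeTransmission implements Transmission {
    private final String connectionId;
    boolean ready = false;

    FakeTransmission(String connectionId) {
      this.connectionId = connectionId;
    }

    @Override
    public String getConnectionId() {
      return connectionId;
    }

    @Override
    public boolean ready() {
      return ready;
    }
  }

  /** Pairs a transmission with the time at which it started waiting. */
  private static final class Entry {
    final Transmission transmission;
    final long pendingSinceMs;

    Entry(Transmission transmission, long pendingSinceMs) {
      this.transmission = transmission;
      this.pendingSinceMs = pendingSinceMs;
    }
  }

  private final List<Entry> pending = new ArrayList<>();

  void add(Transmission transmission, long nowMs) {
    pending.add(new Entry(transmission, nowMs));
  }

  /**
   * Removes every entry whose transmission is ready, appends its perceived
   * handshake time to handshakeTimesMs, and returns the ids that became connected.
   */
  List<String> drainReady(long nowMs, List<Long> handshakeTimesMs) {
    List<String> connected = new ArrayList<>();
    Iterator<Entry> iter = pending.iterator();
    while (iter.hasNext()) {
      Entry entry = iter.next();
      if (entry.transmission.ready()) {
        connected.add(entry.transmission.getConnectionId());
        handshakeTimesMs.add(nowMs - entry.pendingSinceMs);
        iter.remove();
      }
    }
    return connected;
  }

  /** Cleanup path for a connection that is closed before its handshake finishes. */
  boolean removeById(String connectionId) {
    return pending.removeIf(e -> e.transmission.getConnectionId().equals(connectionId));
  }

  int size() {
    return pending.size();
  }

  public static void main(String[] args) {
    PendingHandshakes handshakes = new PendingHandshakes();
    FakeTransmission transmission = new FakeTransmission("conn-1");
    handshakes.add(transmission, 100L);
    transmission.ready = true;
    List<Long> times = new ArrayList<>();
    System.out.println(handshakes.drainReady(130L, times) + " " + times); // [conn-1] [30]
  }
}
```

With the start time stored next to the transmission, the separate timer map and its null check on lookup are no longer needed, and the pending-handshake gauge can still report `size()`.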

private final Time time;
private final NetworkMetrics metrics;
private final AtomicLong IdGenerator;
@@ -90,10 +92,12 @@ public Selector(NetworkMetrics metrics, Time time, SSLFactory sslFactory)
this.completedReceives = new ArrayList<NetworkReceive>();
this.connected = new ArrayList<String>();
this.disconnected = new ArrayList<String>();
this.pendingSslHandshakes = new ArrayList<>();
this.sslHandshakeTimer = new HashMap<String, Long>();
this.metrics = metrics;
this.IdGenerator = new AtomicLong(0);
this.activeConnections = new AtomicLong(0);
this.metrics.initializeSelectorMetricsIfRequired(activeConnections);
this.metrics.initializeSelectorMetricsIfRequired(activeConnections, pendingSslHandshakes);
this.sslFactory = sslFactory;
}

@@ -315,7 +319,14 @@ public void poll(long timeoutMs, List<NetworkSend> sends)
Transmission transmission = getTransmission(key);
try {
if (key.isConnectable()) {
handleConnect(key, transmission);
transmission.finishConnect();
if (transmission.ready()) {
connected.add(transmission.getConnectionId());
metrics.selectorConnectionCreated.inc();
} else {
pendingSslHandshakes.add((SSLTransmission) transmission);
sslHandshakeTimer.put(transmission.getConnectionId(), System.currentTimeMillis());
}
}

/* if channel is not ready, finish prepare */
@@ -347,12 +358,32 @@ public void poll(long timeoutMs, List<NetworkSend> sends)
close(key);
}
}
completeSslHandshakes();
Contributor

this call should really happen before line 321, shouldn't it? why bother checking those connections that were just added to the list?

Contributor

+1.

Contributor Author @nsivabalan, May 17, 2016

Let me try to explain why I have it here. The SSL handshake completes in either a write() or a read() call for keys returned via nioSelector.selectedKeys(). This means that at the end of the while loop (lines 313-358), some keys will have completed their handshakes. If we wait until the next poll() call, which the caller has to make, we add latency and also need one more poll() call, because the connections that have completed their handshakes have not yet been added to the connected list. The network client therefore still thinks that the connection is not ready until the next poll(). For these reasons, I want to have it here instead of above the while loop.
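
The latency argument above can be illustrated with a toy model. This is a sketch under loud assumptions: `HandshakeCheckPlacement` and `pollsUntilReported` are hypothetical names, and the model assumes that each poll() handles exactly one read or write event for the connection, which the real Selector does not guarantee.

```java
/** Toy model of where the pending-handshake check sits relative to the I/O loop in poll(). */
class HandshakeCheckPlacement {
  /**
   * Returns the 1-based poll() call in which a connection that needs stepsNeeded
   * I/O events to finish its handshake is first reported as connected.
   * Assumption: every poll() handles exactly one I/O event for that connection.
   */
  static int pollsUntilReported(int stepsNeeded, boolean checkAfterIo) {
    int stepsDone = 0;
    int polls = 0;
    while (true) {
      polls++;
      // Check placed before the key-processing loop.
      if (!checkAfterIo && stepsDone >= stepsNeeded) {
        return polls;
      }
      // Key-processing loop: one read or write advances the handshake.
      if (stepsDone < stepsNeeded) {
        stepsDone++;
      }
      // Check placed after the key-processing loop.
      if (checkAfterIo && stepsDone >= stepsNeeded) {
        return polls;
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(pollsUntilReported(6, true));  // 6
    System.out.println(pollsUntilReported(6, false)); // 7
  }
}
```

In this model, checking after the loop reports the connection in the same poll() that finishes the handshake, while checking before the loop defers it to the following call.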

this.metrics.selectorIORate.inc();
}
long endIo = time.milliseconds();
this.metrics.selectorIOTime.update(endIo - endSelect);
}

/**
* Add those Ssl connections to connected list on handshake completion
Contributor

I think the doc can be improved. What are "those"?

*/
private void completeSslHandshakes() {
Contributor

This method does not do what its name implies, could you rename it to checkHandshakeStatus() or something?

Iterator<SSLTransmission> sslTransIter = pendingSslHandshakes.iterator();
while (sslTransIter.hasNext()) {
Transmission sslTransmission = sslTransIter.next();
if (sslTransmission != null && sslTransmission.ready()) {
Contributor

the null check is unnecessary and misleading.

connected.add(sslTransmission.getConnectionId());
metrics.selectorConnectionCreated.inc();
Long handshakeStartTime = sslHandshakeTimer.remove(sslTransmission.getConnectionId());
if (handshakeStartTime != null) {
Contributor

again, this is unnecessary, but I think you should simply use the other suggestion of using a single data structure.

metrics.selectorPerceivedSslHandshakeTime.update(System.currentTimeMillis() - handshakeStartTime.longValue());
}
sslTransIter.remove();
}
}
}

/**
* Generate the description for a SocketChannel
*/
@@ -455,6 +486,10 @@ private void close(SelectionKey key) {
activeConnections.set(this.keyMap.size());
try {
transmission.close();
if (pendingSslHandshakes.contains(transmission.getConnectionId())) {
Contributor

pendingSslHandshakes is a list of SSLTransmission, so the argument to the contains() call is wrong. It is surprising that Lists don't complain about this.
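
The reason the compiler stays silent is that `Collection.contains(Object)` and `Collection.remove(Object)` accept any `Object`, regardless of the collection's element type. The sketch below shows the effect with a hypothetical `Conn` class standing in for `SSLTransmission`; `containsById` and `removeById` are illustrative helpers, not part of this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Demonstrates that List.contains/remove compile for a mismatched argument type. */
class ContainsTypeDemo {
  /** Hypothetical element type standing in for SSLTransmission. */
  static final class Conn {
    final String id;

    Conn(String id) {
      this.id = id;
    }
  }

  /** Looks up an element by its connection id instead of passing the id to contains(). */
  static boolean containsById(List<Conn> conns, String id) {
    for (Conn conn : conns) {
      if (conn.id.equals(id)) {
        return true;
      }
    }
    return false;
  }

  /** Removes every element with the given connection id; returns true if any was removed. */
  static boolean removeById(List<Conn> conns, String id) {
    return conns.removeIf(conn -> conn.id.equals(id));
  }

  public static void main(String[] args) {
    List<Conn> pending = new ArrayList<>();
    pending.add(new Conn("conn-1"));

    // Both calls compile, but a String is never equal to a Conn.
    System.out.println(pending.contains("conn-1")); // false
    System.out.println(pending.remove("conn-1"));   // false
    System.out.println(pending.size());             // 1

    System.out.println(containsById(pending, "conn-1")); // true
    System.out.println(removeById(pending, "conn-1"));   // true
    System.out.println(pending.size());                  // 0
  }
}
```

Applied to the hunk under review, this means the `contains` check would always be false, so neither the list entry nor the timer entry would be cleaned up on close.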

pendingSslHandshakes.remove(transmission.getConnectionId());
sslHandshakeTimer.remove(transmission.getConnectionId());
}
} catch (IOException e) {
Contributor

  1. If a key can be closed in a different thread, would this part be thread safe?
  2. Not related to this PR: Transmission.close() does NOT throw an IOException in either PlainTextTransmission or SSLTransmission, although the abstract method declares that it may.

Contributor Author

Yes, it looks like we might have an issue. Even keyMap and activeConnections look to be affected. Let's discuss in person.

Contributor

From the documentation, it looks like the Selector isn't meant to be thread safe. The class documentation on line 64 says so. Many parts of this class rely on the fact that it was not meant to be thread safe.

Contributor

Also, I cannot think of why we would want to use this Selector across threads.

Contributor

selector should never be used across threads

Contributor Author

Sorry, my bad. I thought processNewResponses() in SocketServer ran in a daemon thread, so close(SelectionKey) could be called while selector.poll() is in progress. It looks like it's called sequentially, and selector.poll() follows it. So I don't think we have a problem with multiple threads accessing it.

logger.error("IOException thrown during closing of transmission with connectionId {} :",
transmission.getConnectionId(), e);
@@ -483,16 +518,6 @@ private SelectionKey keyForId(String id) {
return this.keyMap.get(id);
}

/**
* Process connections that have finished their handshake
*/
private void handleConnect(SelectionKey key, Transmission transmission)
throws IOException {
transmission.finishConnect();
this.connected.add(transmission.getConnectionId());
this.metrics.selectorConnectionCreated.inc();
}

/**
* Process reads from ready sockets
*/
@@ -22,6 +22,7 @@
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import junit.framework.Assert;
Contributor

What is this for? Don't we usually use org.junit.Assert?

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
@@ -49,9 +50,7 @@ public void setup()
SSLFactory clientSSLFactory = new SSLFactory(clientSSLConfig);
this.server = new EchoServer(serverSSLFactory, 18383);
this.server.start();
this.selector =
new Selector(new NetworkMetrics(new MetricRegistry()), SystemTime.getInstance(),
clientSSLFactory);
this.selector = new Selector(new NetworkMetrics(new MetricRegistry()), SystemTime.getInstance(), clientSSLFactory);
Contributor

again, why the this prefix in this method?

}

@After
@@ -203,6 +202,19 @@ public void testEmptyRequest()
assertEquals("", blockingRequest(connectionId, ""));
}

@Test
public void testSSLConnect()
throws IOException {
String connectionId =
selector.connect(new InetSocketAddress("localhost", server.port), BUFFER_SIZE, BUFFER_SIZE, PortType.SSL);
Assert.assertFalse("Channel should not be ready by now (until handshake completes)",
Contributor

Is it possible that a handshake completes very fast and becomes ready before the connectionId is added to connected?

Contributor Author

That is not possible, as the handshake involves 6 interactions (reads and writes).

Contributor

But is this all async? How can we rely on the fact that it won't complete?

Contributor

I think this assumption might create a race condition. Also, is it really relevant to us that the channel is "not ready"? I think this test can be skipped. All you need to check is that it is ready when it is added to connected.

Contributor

Yeah, this test does not seem relevant.

Contributor @pnarayanan, May 18, 2016

I meant this check. The rest of it (line 212 and after) is a valid check.

selector.isChannelReady(connectionId));
while (!selector.connected().contains(connectionId)) {
Contributor

I don't think you have resolved Gopal's comment. Could you talk to him in person if you are not convinced about his suggestion?

selector.poll(10000L);
}
Assert.assertTrue("Channel should have been ready by now ", selector.isChannelReady(connectionId));
}

private String blockingRequest(String connectionId, String s)
Contributor

Is it possible to add a test case for the handshake failure and cleanup paths? Basically, to cover all the new code.

throws Exception {
selector.poll(1000L, asList(SelectorTest.createSend(connectionId, s)));
@@ -224,10 +236,6 @@ private String blockingSSLConnect()
while (!selector.connected().contains(connectionId)) {
selector.poll(10000L);
}
//finish the handshake as well
while (!selector.isChannelReady(connectionId)) {
selector.poll(10000L);
}
return connectionId;
}
}