Additional performance debugging in the broker: per node timing

1 parent dd0112e · commit a1549ecc0806ece733245ef9ed1f0076100dce47 · santip committed Dec 5, 2012
@@ -18,17 +18,11 @@
*/
package com.senseidb.search.node;
-import com.linkedin.norbert.javacompat.network.RequestBuilder;
-import com.linkedin.norbert.network.ResponseIterator;
-import com.linkedin.norbert.network.common.ExceptionIterator;
-import com.linkedin.norbert.network.common.PartialIterator;
-import com.linkedin.norbert.network.common.TimeoutIterator;
-import com.senseidb.metrics.MetricFactory;
-import com.senseidb.search.req.SenseiRequest;
import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
import it.unimi.dsi.fastutil.ints.IntSet;
import java.util.ArrayList;
+import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
@@ -41,12 +35,19 @@
import com.linkedin.norbert.NorbertException;
import com.linkedin.norbert.javacompat.cluster.Node;
import com.linkedin.norbert.javacompat.network.PartitionedNetworkClient;
+import com.linkedin.norbert.javacompat.network.RequestBuilder;
+import com.linkedin.norbert.network.ResponseIterator;
import com.linkedin.norbert.network.Serializer;
+import com.linkedin.norbert.network.common.ExceptionIterator;
+import com.linkedin.norbert.network.common.PartialIterator;
+import com.linkedin.norbert.network.common.TimeoutIterator;
+import com.senseidb.metrics.MetricFactory;
import com.senseidb.metrics.MetricsConstants;
import com.senseidb.search.req.AbstractSenseiRequest;
import com.senseidb.search.req.AbstractSenseiResult;
import com.senseidb.search.req.ErrorType;
import com.senseidb.search.req.SenseiError;
+import com.senseidb.search.req.SenseiRequest;
import com.senseidb.svc.api.SenseiException;
import com.yammer.metrics.core.Meter;
import com.yammer.metrics.core.MetricName;
@@ -238,16 +239,23 @@ public RESULT call() throws Exception {
protected List<RESULT> doCall(final REQUEST req) throws ExecutionException {
List<RESULT> resultList = new ArrayList<RESULT>();
+
+ // only instantiate if debug logging is enabled
+ final List<StringBuilder> timingLogLines = logger.isDebugEnabled() ? new LinkedList<StringBuilder>() : null;
+
ResponseIterator<RESULT> responseIterator =
buildIterator(_networkClient.sendRequestToOneReplica(getRouteParam(req), new RequestBuilder<Integer, REQUEST>() {
- private int count = 0;
@Override
public REQUEST apply(Node node, Set<Integer> nodePartitions) {
// TODO: Cloning is yucky per http://www.artima.com/intv/bloch13.html
REQUEST clone = (REQUEST) (((SenseiRequest) req).clone());
clone.setPartitions(nodePartitions);
-
+ if (timingLogLines != null) {
+ // this means debug logging was enabled, produce first portion of log lines
+ timingLogLines.add(buildLogLineForRequest(node, clone));
vzhabiuk Dec 7, 2012

Contributor

We might have a concurrency issue here. Is the apply method executed in a single thread? LinkedList is not thread-safe.

santip Dec 7, 2012

Contributor

It looks like it's a simple foreach that builds each request and adds the response future to the queue:
https://github.com/linkedin/norbert/blob/master/network/src/main/scala/com/linkedin/norbert/network/partitioned/PartitionedNetworkClient.scala#L207

However, I'm not familiar with that code, so I may be mistaken and the foreach may be executed in parallel.

@jhartman could you take a look at this change?
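If the foreach did turn out to run in parallel, swapping the LinkedList for a thread-safe queue would be a minimal fix. A self-contained sketch of that idea (illustrative only, not part of this commit; note that ConcurrentLinkedQueue keeps each producer's insertions in order but gives no global ordering, so the request/response pairing further down would still rely on a single-threaded builder):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TimingLinesSketch {
  public static void main(String[] args) throws InterruptedException {
    // Thread-safe stand-in for timingLogLines; safe even if the
    // request-builder callback were invoked from several threads.
    final Queue<StringBuilder> timingLogLines = new ConcurrentLinkedQueue<StringBuilder>();

    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int node = 0; node < 8; node++) {
      final int nodeId = node;
      pool.execute(new Runnable() {
        public void run() {
          // analogous to buildLogLineForRequest(node, clone)
          timingLogLines.add(new StringBuilder("Request to individual node - id:").append(nodeId));
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);

    for (StringBuilder line : timingLogLines) {
      System.out.println(line);
    }
  }
}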

+ }
+
REQUEST customizedRequest = customizeRequest(clone);
return customizedRequest;
}
@@ -257,10 +265,35 @@ public REQUEST apply(Node node, Set<Integer> nodePartitions) {
resultList.add(responseIterator.next());
}
- logger.debug(String.format("There are %d responses", resultList.size()));
+ if (timingLogLines != null) {
+ // this means debug logging was enabled, complete the timing log lines and log them
+ int i = 0;
+ for (StringBuilder logLine : timingLogLines) {
+ // we are assuming the request builder gets called in the same order as the response
+ // iterator is built, otherwise the loglines would be out of sync between req & res
+ if (i < resultList.size()) {
+ logger.debug(buildLogLineForResult(logLine, resultList.get(i++)));
+ }
+ }
+ logger.debug(String.format("There are %d responses", resultList.size()));
+ }
return resultList;
}
+
+ protected StringBuilder buildLogLineForRequest(Node node, REQUEST clone) {
+ return new StringBuilder()
+ .append("Request to individual node - id:")
+ .append(node.getId())
+ .append(" - url:")
+ .append(node.getUrl())
+ .append(" - partitions:")
+ .append(node.getPartitionIds());
+ }
+
+ protected StringBuilder buildLogLineForResult(StringBuilder logLine, RESULT result) {
+ return logLine.append(" - took ").append(result.getTime()).append("ms.");
+ }
protected ResponseIterator<RESULT> buildIterator(ResponseIterator<RESULT> responseIterator) {
TimeoutIterator<RESULT> timeoutIterator = new TimeoutIterator<RESULT>(responseIterator, _timeout);
@@ -180,7 +180,19 @@ public SenseiRequest customizeRequest(SenseiRequest request)
return request;
}
+ @Override
+ protected StringBuilder buildLogLineForRequest(Node node, SenseiRequest req) {
+ return super.buildLogLineForRequest(node, req).append(" - routeParam: ").append(req.getRouteParam());
+ }
+ @Override
+ protected StringBuilder buildLogLineForResult(StringBuilder logLine, SenseiResult result) {
+ return super.buildLogLineForResult(logLine, result)
+ .append(" - hits: ")
+ .append(result.getNumHits())
+ .append("/")
+ .append(result.getTotalDocs());
+ }
public void handleClusterConnected(Set<Node> nodes)
{

3 comments on commit a1549ec

Contributor

vzhabiuk replied Dec 7, 2012

Hi Santiago,
Thanks for your answer.
Two more questions:

  1. In Sensei we were leveraging the com.yammer.metrics.core.Timer class to get statistics about broker/node timing (a usage sketch follows this comment). It's exposed via JMX and can easily be converted to autometrics. Those debug statements bring a little more info, but since the broker might serve hundreds of QPS, the request and response logs would definitely be out of sync. While working on the homepage we had to measure latency for different types of queries, and the best place to do so was in the client that sends the queries to Sensei.
  2. You could also place the debugging logic in AbstractSenseiCoreService to expose the timing for a single node. This would be easier.

With many thanks,
Volodymyr
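For concreteness, here is a minimal sketch of the Timer-based approach mentioned in point 1, using the yammer metrics 2.x API that this code already imports. The class name, metric name, and wrapping method are illustrative assumptions, not SenseiDB's actual wiring (which goes through MetricFactory):

import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Timer;
import com.yammer.metrics.core.TimerContext;

public class PerNodeTimerSketch {
  // One Timer per node; the default metrics registry exposes it via JMX.
  private final Timer nodeTimer;

  public PerNodeTimerSketch(int nodeId) {
    this.nodeTimer = Metrics.newTimer(PerNodeTimerSketch.class,
        "node-" + nodeId + "-latency", TimeUnit.MILLISECONDS, TimeUnit.SECONDS);
  }

  // Times a single per-node call, e.g. the broker-side request to one node.
  public <T> T timed(Callable<T> call) throws Exception {
    TimerContext ctx = nodeTimer.time();
    try {
      return call.call();
    } finally {
      ctx.stop();
    }
  }
}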

Contributor

santip replied Dec 7, 2012

Hi Volodymyr,

The main intent of this change is to be able to diagnose performance issues in large clusters. Currently the nodes are already logging the individual times, but this change attempts to aggregate all the individual times of the scatter-gather in the broker log so that misbehaving nodes are easier to identify. If I can leverage yammer but still expose the data through the logs, that would be even better.

Regarding them being out of sync: the request and response are merged into a single line precisely to avoid that, and everything is turned off unless DEBUG logging is enabled. Notice that apply stores the lines in order in the list and then completes them (also in order) when the responses are received.

Is there a way to get the individual node times from the client?
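One way to remove the ordering assumption entirely would be to key the request-side log lines by node id and pair them with responses as they arrive. A self-contained sketch, using a hypothetical result type (the real SenseiResult may not expose the originating node id, so that field is an assumption):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OrderIndependentTimingSketch {
  // Hypothetical result carrying the id of the node that produced it.
  static class NodeResult {
    final int nodeId;
    final long tookMs;
    NodeResult(int nodeId, long tookMs) { this.nodeId = nodeId; this.tookMs = tookMs; }
  }

  public static void main(String[] args) {
    // Request-side halves of the log lines, keyed by node id instead of by position.
    Map<Integer, StringBuilder> pending = new ConcurrentHashMap<Integer, StringBuilder>();
    pending.put(1, new StringBuilder("Request to individual node - id:1"));
    pending.put(2, new StringBuilder("Request to individual node - id:2"));

    // Responses may arrive in any order; the key makes the pairing order-independent.
    NodeResult[] responses = { new NodeResult(2, 12), new NodeResult(1, 48) };
    for (NodeResult r : responses) {
      StringBuilder line = pending.remove(r.nodeId);
      if (line != null) {
        System.out.println(line.append(" - took ").append(r.tookMs).append("ms."));
      }
    }
  }
}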

Contributor

vzhabiuk replied Dec 7, 2012
