Vizceral is a streaming aggregation tool similar to the service dependency graph we have now (but prettier and more powerful).
A few people have looked into this. I played around with it a bit, toying with a custom version of our dependency graph linker or using jq to translate stored traces. I also thought maybe this could be done online with a custom Kafka or Elasticsearch 5 pipeline. Or maybe something in between, like a 1 minute interval hack of our Spark job. There are also rumors of a Spark alternative to the normal zipkin collector (ahem @mansu ahem :) )
Here is a summary of notes from the last time this came up, when I chatted with @tramchamploo:
So, just to play with things, you could use a dependency linker like the zipkin-dependencies job does, except windowed into minutes, not days. Use the zipkin api and GET /api/v1/traces with a timestamp and lookback of your choosing (e.g. 1 minute). With a custom linker, you can emit vizceral data directly or into a new index for the experiment, like zipkin-vizceral-yyyy-MM-dd-HHmm. In other words, it is like the existing Spark job, but writing vizceral format and running much more frequently.
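A minimal sketch of the two strings involved, assuming the v1 query parameters endTs and lookback (both in epoch milliseconds) plus limit. The class and method names are made up for illustration, and naming the index after the window start in UTC is my assumption, not something decided above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: not part of zipkin, names are illustrative only.
final class VizceralWindow {
  // Pattern taken from the index name suggested above; UTC is an assumption.
  static final DateTimeFormatter INDEX_SUFFIX =
      DateTimeFormatter.ofPattern("yyyy-MM-dd-HHmm").withZone(ZoneOffset.UTC);

  /** Path and query string for traces in the window (endTs - lookback, endTs]. */
  static String tracesQuery(long endTsMillis, long lookbackMillis, int limit) {
    return "/api/v1/traces?endTs=" + endTsMillis
        + "&lookback=" + lookbackMillis
        + "&limit=" + limit;
  }

  /** Experimental index name for the minute that starts at windowStartMillis. */
  static String indexName(long windowStartMillis) {
    return "zipkin-vizceral-" + INDEX_SUFFIX.format(Instant.ofEpochMilli(windowStartMillis));
  }

  public static void main(String[] args) {
    long endTs = System.currentTimeMillis();
    long lookback = 60_000L; // one minute
    System.out.println(tracesQuery(endTs, lookback, 1000));
    System.out.println(indexName(endTs - lookback));
  }
}
```

The limit of 1000 in main is arbitrary; pick whatever your storage backend tolerates.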
To dig deeper, you'd want some kind of "partition" or "grouping" command, like a groupBy, in order to group the traces into minutes. It would go before this flatMap here: https://github.com/openzipkin/zipkin-dependencies/blob/master/elasticsearch/src/main/java/zipkin/dependencies/elasticsearch/ElasticsearchDependenciesJob.java#L116 This would be the thing that buckets traces into epoch minutes.
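The bucketing itself is just integer division on the timestamp. Here is a rough sketch using plain Java streams rather than Spark, assuming zipkin span timestamps are epoch microseconds. The Trace class is a hypothetical stand-in that only carries the root span's timestamp; in the Spark job the same key function would feed a groupBy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: buckets traces by the epoch minute of their root span.
final class MinuteBuckets {
  /** Stand-in for a trace; only the root span's timestamp matters here. */
  static final class Trace {
    final String traceId;
    final long rootTimestampMicros;

    Trace(String traceId, long rootTimestampMicros) {
      this.traceId = traceId;
      this.rootTimestampMicros = rootTimestampMicros;
    }
  }

  /** Converts epoch microseconds into a whole number of minutes since the epoch. */
  static long epochMinute(long timestampMicros) {
    return timestampMicros / 60_000_000L;
  }

  /** Groups traces by epoch minute, with keys sorted in ascending order. */
  static Map<Long, List<Trace>> bucketByMinute(List<Trace> traces) {
    return traces.stream().collect(Collectors.groupingBy(
        t -> epochMinute(t.rootTimestampMicros), TreeMap::new, Collectors.toList()));
  }

  public static void main(String[] args) {
    List<Trace> traces = Arrays.asList(
        new Trace("a", 0L), new Trace("b", 59_999_999L), new Trace("c", 60_000_000L));
    bucketByMinute(traces).forEach(
        (minute, bucket) -> System.out.println(minute + " has " + bucket.size() + " trace(s)"));
  }
}
```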
In order to get the service relationships, you need to walk the trace tree. To generate the tree you need to merge the multiple documents which constitute a trace, so you can tell which pieces are a client or a server call. This is what the DependencyLinker does.
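To give a feel for the tree walk, here is a heavily simplified sketch. It is not the real DependencyLinker: it assumes the documents for each span have already been merged and that every span is attributed to exactly one service, so the client/server merging step is skipped. The Span class and the "parent->child" key format are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified illustration only; zipkin's DependencyLinker handles far more cases.
final class SimpleLinker {
  /** Hypothetical merged span: one service per span, parentId null for the root. */
  static final class Span {
    final String id;
    final String parentId;
    final String serviceName;

    Span(String id, String parentId, String serviceName) {
      this.id = id;
      this.parentId = parentId;
      this.serviceName = serviceName;
    }
  }

  /** Counts calls between distinct services in one trace, keyed "parent->child". */
  static Map<String, Long> link(List<Span> trace) {
    Map<String, Span> byId = new HashMap<>();
    for (Span span : trace) byId.put(span.id, span);

    Map<String, Long> calls = new TreeMap<>();
    for (Span span : trace) {
      if (span.parentId == null) continue; // root span has no caller
      Span parent = byId.get(span.parentId);
      if (parent == null) continue; // parent was not reported, skip
      if (parent.serviceName.equals(span.serviceName)) continue; // local call
      calls.merge(parent.serviceName + "->" + span.serviceName, 1L, Long::sum);
    }
    return calls;
  }

  public static void main(String[] args) {
    List<Span> trace = Arrays.asList(
        new Span("1", null, "web"),
        new Span("2", "1", "api"),
        new Span("3", "2", "db"),
        new Span("4", "2", "db"));
    System.out.println(link(trace));
  }
}
```

Running link over every trace in a minute bucket and summing the counts would give the per-minute edges a visualization needs.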
So basically, by bucketing offline data into 1 minute intervals (based on the root span's timestamp), you can get pretty good feedback. It will be mostly correct, as most traces last less than a minute. By using the api and a variation of our linker, you'd get a good head start which can of course be refactored later if/when a real-time ingestion pipeline exists.
@adriancole We used the open source Spark job to get a dependency graph and visualized it with vizceral. It was a very good visualization, but the data quality can be improved. We will try to open source this as well.
I'm very interested in this. I was hoping to be able to do some of this using SQS/SNS/Kinesis for reading the data in real time instead of doing windows from the database.
Working on integration between vizceral and zipkin. I've added a new api like /vizceral which simply queries all traces in the last few seconds and uses DependencyLinker to link them. I tried to set the limit to Integer.MAX_VALUE in order to retrieve as many traces as possible and get a full dependency graph, but ES won't allow that, throwing:
org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: too_many_clauses: maxClauseCount is set to 1024
at org.apache.lucene.search.BooleanQuery$Builder.add(BooleanQuery.java:137) ~[lucene-core-5.5.2.jar!/:5.5.2 8e5d40b22a3968df065dfc078ef81cbb031f0e4a - sarowe - 2016-06-21 11:38:23]
at org.elasticsearch.index.query.TermsQueryParser.parse(TermsQueryParser.java:200) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:250) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.index.query.IndexQueryParserService.innerParse(IndexQueryParserService.java:320) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:223) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:218) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.search.query.QueryParseElement.parse(QueryParseElement.java:33) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.search.SearchService.parseSource(SearchService.java:856) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.search.SearchService.createContext(SearchService.java:667) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:633) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:377) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:293) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-2.4.1.jar!/:2.4.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.8.0_91]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Anyone have any ideas?
The code @tramchamploo mentions, which limits the result count in Elasticsearch to 1024, is here: https://github.com/openzipkin/zipkin/blob/master/zipkin-storage/elasticsearch/src/main/java/zipkin/storage/elasticsearch/ElasticsearchSpanStore.java#L140
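One possible direction, assuming the failure comes from a single terms query carrying more than maxClauseCount values (the stack trace goes through TermsQueryParser into BooleanQuery$Builder.add): split the values into batches of at most 1024 and issue one query per batch. This is only a sketch of the batching step, the class name is hypothetical, and it does not touch the actual span store code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list so that no batch exceeds a clause limit.
final class TermBatches {
  // Matches the maxClauseCount reported in the error above.
  static final int MAX_CLAUSES = 1024;

  /** Splits items into consecutive batches of at most maxSize elements each. */
  static <T> List<List<T>> partition(List<T> items, int maxSize) {
    if (maxSize <= 0) throw new IllegalArgumentException("maxSize must be positive");
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < items.size(); i += maxSize) {
      int end = Math.min(i + maxSize, items.size());
      batches.add(new ArrayList<>(items.subList(i, end))); // copy, not a view
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> traceIds = new ArrayList<>();
    for (int i = 0; i < 2500; i++) traceIds.add(Integer.toHexString(i));
    for (List<String> batch : partition(traceIds, MAX_CLAUSES)) {
      System.out.println("would query " + batch.size() + " trace ids");
    }
  }
}
```

The results of the per-batch queries would then need to be concatenated before handing them to the linker.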