
Streaming dependency items #63

Open · albertteoh opened this issue Jun 16, 2021 · 2 comments
@albertteoh commented Jun 16, 2021

Requirement - what kind of business use case are you trying to solve?

Create a production-grade implementation of the System Architecture feature that builds dependency items from a continuous stream of spans consumed from Kafka.

Problem - what in Jaeger blocks you from solving the requirement?

It is worth recognizing that there is an existing means of computing dependency items from spans: spark-dependencies. That solution runs a single query over the spans in a backing store, builds the dependency items, and bulk-loads them back into the backing store.

However, to maintain an accurate count of edges (call counts) between services and an up-to-date topology, a streaming solution is more suitable; it also removes the need to manage cron jobs and date boundaries.

Proposal - what do you suggest to solve the problem or improve the existing situation?

We currently have a streaming solution for the System Architecture feature running in our (Logz.io) production environment. It is based on the Kafka Streams library, which could be integrated into the existing Jaeger architecture with Kafka as an intermediate buffer.

The following illustrates how we have currently implemented the System Architecture feature, courtesy of the engineer behind this solution, @PropAnt:

[Screenshot: diagram of the Logz.io streaming System Architecture implementation]

Dependency items are written back to Kafka so that the Kafka Streams application can write processed data efficiently, without being limited by back-pressure from the dependency items backing store.
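
For illustration, the decoupled write path could be as simple as a plain consumer that bulk-loads links at the store's own pace. Everything below (the topic name, the DependencyStore type, writeBatch) is a hypothetical sketch, not our actual sink:

```java
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class DependencyLinkSink {
    // DependencyLink, DependencyStore, and the topic name are assumptions for
    // this sketch; a real implementation may batch and retry differently.
    public static void run(Consumer<String, DependencyLink> consumer, DependencyStore store) {
        consumer.subscribe(List.of("jaeger-dependency-links"));
        while (true) {
            ConsumerRecords<String, DependencyLink> records = consumer.poll(Duration.ofSeconds(1));
            List<DependencyLink> batch = new ArrayList<>();
            records.forEach(r -> batch.add(r.value()));
            if (!batch.isEmpty()) {
                store.writeBatch(batch); // bulk write at the store's own pace
                consumer.commitSync();   // commit offsets only after a successful write
            }
        }
    }
}
```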

High-level description of the Kafka Streams topology:

[Screenshot: high-level Kafka Streams topology]
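
For concreteness, a minimal Kafka Streams sketch of such a topology might look like the following. This is illustrative only, not the Logz.io code: the topic names, serdes, the 30-second inactivity gap, and the Span/Trace/DependencyLink helper methods (add and merge returning the aggregate, toDependencyLinks returning an Iterable of KeyValue pairs) are all assumptions:

```java
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.SessionWindows;

import java.time.Duration;

public class DependencyTopology {
    public static StreamsBuilder build(Serde<Span> spanSerde,
                                       Serde<Trace> traceSerde,
                                       Serde<DependencyLink> linkSerde) {
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("jaeger-spans", Consumed.with(Serdes.String(), spanSerde))
            // Re-key spans by trace ID so all spans of a trace land on one task.
            .selectKey((key, span) -> span.getTraceId())
            .groupByKey(Grouped.with(Serdes.String(), spanSerde))
            // Treat a trace as complete after 30s without new spans (assumed heuristic).
            .windowedBy(SessionWindows.with(Duration.ofSeconds(30)))
            .aggregate(Trace::new,
                       (traceId, span, trace) -> trace.add(span),
                       (traceId, left, right) -> left.merge(right),
                       Materialized.with(Serdes.String(), traceSerde))
            .toStream()
            .filter((windowedId, trace) -> trace != null) // drop session-merge tombstones
            // Walk parent->child span pairs and emit one DependencyLink per service edge.
            .flatMap((windowedId, trace) -> trace.toDependencyLinks())
            .to("jaeger-dependency-links", Produced.with(Serdes.String(), linkSerde));

        return builder;
    }
}
```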

We propose to adopt and open-source our implementation, which is already working in production.

Any open questions to address

There are a few ways to design the solution:

  • A single module with a Kafka Streams application can be added to calculate dependency items and stream them to a separate Kafka topic for further ingestion. As above, this approach is tried and tested in our production environment, and the code is available to open-source (more or less) as-is.
  • Another approach is to structure the code to encapsulate the business logic and data models in separate modules. The advantage of this approach is that Jaeger would gain a streaming-framework-agnostic implementation of the System Architecture feature (a possible interface shape is sketched after this list). The trade-off is the additional effort required to re-architect the existing code to be agnostic to the streaming framework.
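
To make the second option concrete, the framework-agnostic core could be as small as a single seam between trace assembly and link derivation. The names below are illustrative only, not a committed API:

```java
import java.util.Collection;

// Business logic lives behind this interface; Kafka Streams (or any other
// streaming framework) becomes a thin adapter that feeds it assembled traces.
public interface DependencyLinkDeriver {
    // Derive service-to-service edges, with call counts, from one complete trace.
    Collection<DependencyLink> derive(Trace trace);
}
```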
@pavolloffay (Member)

I am glad to see this effort 👍🏼

I do have one question about how the results (dependency links) are stored and queried from storage. The current Spark batch implementation creates one record per time interval (e.g. 24h). I assume the streaming approach creates multiple records for the same interval. Did you have to change the query, or perhaps the store implementation, to deal with this?

@PropAnt commented Jun 17, 2021

@pavolloffay no, we didn't. Jaeger queries for a certain period and gets a set of dependency links (items) back. Depending on the time interval for the dependencies, the query returns a bigger or smaller set. We didn't change the data models or the queries. The time interval for dependency links can be set through properties.
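
For illustration, this per-interval bucketing maps naturally onto a windowed aggregation. The property name, topic name, and DependencyLink.merge helper below are made up for this sketch and do not reflect the actual code:

```java
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

import java.time.Duration;
import java.util.Properties;

public class LinkBucketing {
    // Bucket per-edge links into fixed windows so the store holds one record per
    // edge per interval; a time-range query then just returns more or fewer buckets.
    static void bucket(KStream<String, DependencyLink> links,
                       Serde<DependencyLink> linkSerde,
                       Properties props) {
        Duration interval = Duration.ofMinutes(Long.parseLong(
                props.getProperty("dependency.links.interval.minutes", "15")));

        links.groupByKey(Grouped.with(Serdes.String(), linkSerde))
             .windowedBy(TimeWindows.of(interval))
             .reduce(DependencyLink::merge) // assumed helper summing call counts
             .toStream()
             // Flatten the windowed key to "edge@windowStart" for a plain-keyed topic.
             .map((wk, link) -> KeyValue.pair(wk.key() + "@" + wk.window().start(), link))
             .to("jaeger-dependency-links", Produced.with(Serdes.String(), linkSerde));
    }
}
```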
