Toward Magma Distributed Tracing #10492
tl;dr
Distributed tracing equips developers with enhanced debugging tools and gives operators radically improved visibility into their deployments. Imagine real-time, dev-or-prod, passively-generated request tracing — across Orc8r and gateways.
If you zoom in on the picture below, you’ll see a cross-service, end-to-end understanding of a request’s path through an example application. This includes services visited, latencies, annotations, and more — both for application-level code, as well as for DB operations.
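To make the picture concrete, here is a minimal, library-free sketch of the data a trace captures: spans sharing a trace ID, parented on each other, carrying timing and annotations. The field names and the `orc8r`/`postgres` span names are illustrative assumptions, not the OpenTelemetry data model verbatim.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """Minimal span record: one timed operation within a trace (sketch only)."""
    trace_id: str             # shared by every span in one request's trace
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    start: float = 0.0
    end: float = 0.0
    annotations: dict = field(default_factory=dict)

    @property
    def latency_ms(self) -> float:
        return (self.end - self.start) * 1000

def new_span(trace_id: str, name: str, parent: Optional[Span] = None) -> Span:
    return Span(trace_id=trace_id,
                span_id=uuid.uuid4().hex[:16],
                parent_id=parent.span_id if parent else None,
                name=name,
                start=time.time())

# One request crossing two layers yields parented spans under one trace ID.
trace_id = uuid.uuid4().hex
root = new_span(trace_id, "orc8r:/magma/v1/networks")        # hypothetical names
child = new_span(trace_id, "postgres:SELECT networks", parent=root)
child.annotations["db.system"] = "postgresql"
child.end = time.time()
root.end = time.time()
```

The end-to-end view in the picture is exactly this parent/child structure rendered over time, with per-span latencies and annotations attached.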
Intro
This document aims to provide a guiding outline for how to progressively outfit the Magma project with distributed tracing functionalities.
In particular, we propose
Context
This section provides context on the distributed tracing landscape, for the chosen set of technologies. Note that we’re proposing to use both Jaeger and OpenTelemetry together as complementary technologies.
For an intro to distributed tracing, see the short tracing intro or comprehensive Jaeger intro.
Why distributed tracing
The Magma project has grown large enough that debugging, especially root-causing performance bottlenecks, has become unwieldy. Distributed tracing provides a mechanism for an online, real-time understanding of how requests pass through a series of services. This will equip developers with enhanced debugging tools and give operators radically improved visibility into their deployments.
Jaeger
CNCF graduated project. End-to-end solution for outfitting a project with distributed tracing. Compatible with OpenTracing (and largely, soon fully, compatible with OpenTelemetry). Supports multiple storage backends, defaulting to Elasticsearch. Also supports tunable sampling rates and patterns, to limit network and storage pressure.
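The tunable sampling mentioned above is typically head-based: the keep/drop decision is derived deterministically from the trace ID, so every service reaches the same decision without coordination. The sketch below illustrates the idea (in the spirit of a trace-ID-ratio sampler); it is a conceptual stand-in, not Jaeger's actual sampler code.

```python
# Head-based sampling sketch: decide once per trace from the trace ID.
# All services hashing the same trace ID reach the same keep/drop decision,
# which is how sampling limits network/storage pressure without coordination.
def ratio_sampled(trace_id_hex: str, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, deterministically per trace ID."""
    bound = int(ratio * (1 << 64))
    # Compare the top 64 bits of the trace ID against the ratio threshold.
    return int(trace_id_hex[:16], 16) < bound
```

With a ratio of 0.01, roughly 1% of traces are kept, and a trace sampled at the edge stays sampled across every downstream hop.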
Includes the following components
Each component can be deployed as an individual container, or the full solution can be managed by the Jaeger K8s operator.
Additional reading
OpenTelemetry
CNCF sandbox project (OpenTracing, its predecessor, was a CNCF incubating project). Open specification of how to represent, propagate, and store spans. Also includes alpha specifications for metrics and logs formats — may be of interest to us in the future, especially logs, but for now we can focus on tracing. OpenTelemetry is the new, backwards-compatible incarnation of OpenTracing.
Also includes language-specific (e.g. Go) and framework-specific (e.g. gRPC) libraries to generate, propagate, and report spans. Supported languages include
Also supports reporting spans for application code requests into storage backends (e.g. reporting on Postgres lock contention), as shims in the caller’s language
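Cross-service propagation works by serializing the active span's context into request metadata (W3C Trace Context is OpenTelemetry's default wire format). The sketch below shows the shape of that handshake with plain dicts; real code would use the OpenTelemetry propagator API and gRPC metadata rather than these hypothetical `inject`/`extract` helpers.

```python
# Sketch of W3C Trace Context propagation: the caller writes a `traceparent`
# header; the callee reads it back and parents its own span on the remote one.
def inject(headers: dict, trace_id: str, span_id: str, sampled: bool) -> None:
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"

def extract(headers: dict):
    version, trace_id, span_id, flags = headers["traceparent"].split("-")
    return trace_id, span_id, flags == "01"

# Caller side: attach context to the outgoing request's metadata.
hdrs = {}
inject(hdrs, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", True)
# Callee side: recover the remote context and continue the same trace.
remote_trace, remote_span, remote_sampled = extract(hdrs)
```

The framework-specific libraries (e.g. the gRPC instrumentation) do exactly this injection and extraction automatically on every call, which is what makes traces span Orc8r and gateways without per-handler plumbing.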
Additional reading
Architecture
This section describes the proposed architecture for outfitting the Magma project with distributed tracing.
Desiderata include
With these desiderata in mind, we present the following architecture
Description
jaeger-* components are deployments of existing Jaeger components -- no custom code
Affordances
Appendix
Option: use fluentd to aggregate spans
It’s possible to use a FluentBit exporter for OpenTelemetry. This would allow our data pipeline, on the AGW side, to remain unchanged. This is something we will want to look into after the POC tasks, specifically to answer the questions