Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to separate context propagation from observability #42

Closed
wants to merge 36 commits into from
Closed
Show file tree
Hide file tree
Changes from 25 commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
dff8df9
Proposal to separate context propagation from observability
tedsuo Sep 8, 2019
5ad7d1c
cleanup description for Extract
tedsuo Sep 10, 2019
1dc3c7b
commas
tedsuo Sep 10, 2019
58248e6
Update text/0000-separate-context-propagation.md
tedsuo Sep 10, 2019
68cb0ba
RFC proposal: A layered approach to data formats
tedsuo Aug 13, 2019
3dc6a76
whitespace
tedsuo Aug 22, 2019
459435e
Capitalization
tedsuo Aug 22, 2019
c9c64f4
whitespace
tedsuo Aug 22, 2019
c3c7c24
CleanBaggage -> ClearBaggage
tedsuo Sep 10, 2019
4588096
move function descriptions to new line
tedsuo Sep 10, 2019
2d80dae
Add Optional subheader
tedsuo Sep 10, 2019
7a73210
cleanup rough edits
tedsuo Sep 10, 2019
0d8e41b
clean up advice on pre-existing context implementations
tedsuo Sep 10, 2019
aad5605
Better context descriptions
tedsuo Sep 10, 2019
4a930eb
remove data format file
tedsuo Sep 11, 2019
e1ef61f
remove git diff message
tedsuo Sep 11, 2019
f949435
improved code sytnax
tedsuo Sep 11, 2019
1cb155e
stop stuttering
tedsuo Sep 11, 2019
7b9e861
Update text/0000-separate-context-propagation.md
tedsuo Sep 11, 2019
07eb397
spacing
tedsuo Sep 11, 2019
0ebeb6c
Refine propagation
tedsuo Sep 25, 2019
147d6b0
Add RFC ID number from PR
tedsuo Oct 1, 2019
72d4651
remove RFC status line
tedsuo Oct 1, 2019
1472197
slight calrification for GetHTTPExtractor
tedsuo Oct 1, 2019
18a37d4
add global propagators
tedsuo Oct 1, 2019
7ea1834
Clean up motivation
tedsuo Oct 15, 2019
7317747
Clean up explanbation intro
tedsuo Oct 15, 2019
43ba8fd
Clarify context types
tedsuo Oct 15, 2019
d7d6f1c
Fix ChainHTTPInjector and ChainHTTPExtractor
tedsuo Oct 15, 2019
3a817a2
typo
tedsuo Oct 15, 2019
3381e0f
Reference Trace-Context, not just traceparent
tedsuo Oct 15, 2019
c15a107
Bagge context cleanup
tedsuo Oct 15, 2019
310e8d5
stronger language around context access
tedsuo Oct 15, 2019
f59fc27
Update text/0042-separate-context-propagation.md
tedsuo Oct 15, 2019
153b9aa
clean up tradeoffs
tedsuo Oct 15, 2019
f70855a
Update text/0042-separate-context-propagation.md
tedsuo Oct 15, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
213 changes: 213 additions & 0 deletions text/0042-separate-context-propagation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,213 @@
# Proposal: Separate Layer for Context Propagation

Design OpenTelemetry as a set of separate applications which operate on a shared context propagation mechanism.


## Motivation

Based on prior art, we know that fusing the observability system and the context propagation system together creates issues. Observability systems have special rules for propagating information, such as sampling, and may have different requirements from other systems which require non-local information to be sent downstream.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Reading this sentence as "Observability systems [...] may have different requirements from other systems which require non-local information to be sent downstream." makes me wonder how I should interpret the last part. Obviously, observability systems also need non-local information to be sent downstream so that is no requirement difference. Maybe reformulate as "Observability systems have special rules for propagating information, such as sampling. This may result in different requirements from other systems regarding which information should be sent downstream." "non-local" seems redundant with "sent downstream" BTW.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I prefer your wording as well! Adding it in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You use "downstream" quite often in this document, but I don't know if it is completely clear to all readers what is meant by this. Maybe add some "definitions" section or explain it in parentheses at the first occurrence. But maybe it is clear enough as-is, I'm not a native speaker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that is good feedback. We use the term "downstream" in a number of places in OTel, maybe we should add a glossary? https://github.com/search?q=org%3Aopen-telemetry+downstream&type=Code

* Separation of concerns
* Remove the Tracer dependency from context propagation mechanisms.
* Separate distributed context into Baggage and Correlations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Separate distributed context into Baggage and Correlations
* Separate DistributedContext into Baggage and Correlations

* Extensibility
* Allow users to create new applications for context propagation.
* For example: A/B testing, encrypted or authenticated data, and new, experimental forms of observability.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though you have two second-level points under "Extensibility", the second one is just examples for the first one. Maybe redistribute these e.g. as "Extensibility: Allow users ..., for example: * A/B testing * encrypted or ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks.


## Explanation

# OpenTelemetry Layered Architecture

![drawing](img/context_propagation_explanation.png)

OpenTelemetry is a distributed program, which requires non-local, transaction-level context in order to execute correctly. Transaction-level context can also be used to build other distributed programs, such as security, versioning, and network switching programs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OpenTelemetry is a distributed program, which requires non-local, transaction-level context in order to execute correctly. Transaction-level context can also be used to build other distributed programs, such as security, versioning, and network switching programs.
Telemetry is a cross-cutting concern, which requires non-local, transaction-level context in order to execute correctly. Transaction-level context can also be used to build other cross-cutting concerns, such as security, fault injection, A/B testing, network routing, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OpenTelemetry is a distributed program, which requires non-local, transaction-level context in order to execute correctly. Transaction-level context can also be used to build other distributed programs, such as security, versioning, and network switching programs.
OpenTelemetry is a distributed program, which requires non-local, transaction-level context to execute. Transaction-level context can also be used to build other distributed programs, e.g., for security, versioning, and network switching.


To allow for this extensibility, OpenTelemetry is separated into an **application layer** and a **context propagation layer**. In this architecture, multiple distributed applications - such as the observability and baggage systems provided by OpenTelemetry - simultaneously share the same underlying context propagation system in order to execute their programs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "this extensibility" refer to here? Maybe better: "the required extensibility [for the aforementioned applications/"programs"]".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To allow for this extensibility, OpenTelemetry is separated into an **application layer** and a **context propagation layer**. In this architecture, multiple distributed applications - such as the observability and baggage systems provided by OpenTelemetry - simultaneously share the same underlying context propagation system in order to execute their programs.
To allow for this extensibility, OpenTelemetry is separated into an **application layer** and a **context propagation layer**. In this architecture, multiple distributed applications - such as the observability and baggage systems provided by OpenTelemetry - share the same underlying context propagation system in order to execute their programs.

"simultaneously" to what? This is redundant to "share", I guess.



# Application Layer

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This RFC doesn't really explain how different applications would work, rather it goes into the implementation details of metrics and tracing "observability systems". It would be nice to have a better definition of what an application is in the application layer. Are all applications required to share a common interface?


## Observability API

OpenTelemetry currently contains two observability systems - Tracing and Metrics – and may be extended over time. These separate systems are bound into a unified Observability API through sharing labels – a mechanism for correlating independent observations – and through sharing propagators.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the propagators themselves really shared though? I think it is merely the API that signals propagation points that is shared. That might be more hair-splitting than is good for easy understandability though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking that Observability system does not automatically have a separate propagators for tracing and for metrics, but I see your point that Trace-Context and Correlation-Context are separate, and right now it's not clear if Tracing would use Correlation-Context.


**Observe(context, labels…, observations...) -> context**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some introducing sentence like "The following general forms of APIs exist:" would be good here. Although then the HTTP API's don't seem to fit, which really makes me wonder: What does this list of APIs actually enumerate? Or would it be better to reformat this like "The general form for all observability APIs is a function with the following signature: Observe(...). That is, it takes a Context, label keys, ...".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I should definitely remove the "all" here.

In this RFC, I'm trying to define the Context and Propagation APIs. I want it to be clear enough about how we intend to leverage the context and propagation APIs that we can catch any design errors in those APIs. If I enumerate all of the details of Tracing and Metrics, I've hauled in too much. If I define nothing but GetHTTPExtractor and GetHTTPInjector in the Observability API, that will be insufficient for review.

I am trying to show how DistribtuedContext can be cleanly split into Baggage and Correlations. So I feel like Correlate should be present, along with Observe. But, again, this is a general form. I will try to be clearer that these two definitions - Observe and Correlate - are more abstract than the rest. I would still like feedback to the tune of "even as a general form, these functions are not correct."

I suggest that the details about how exactly these changes will affect our Tracing and Metrics APIs should be worked out in specification PRs, not in an RFC.

The general form for all observability APIs is a function which takes a Context, label keys, and observations as input, and returns an updated Context.

**Correlate(context, label, value, hoplimit) -> context**
To set the label values used by all observations in the current transaction, the Observability API provides a function which takes a context, a label key, a value, and a hoplimit, and returns an updated context. If the hoplimit is set to NO_PROPAGATION, the label will only be available to observability functions in the same process. If the hoplimit is set to UNLIMITED_PROPAGATION, it will be available to all downstream services.

**GetHTTPExtractor() -> extractor**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure of the value of calling out HTTP explicitly. There are plenty of practical examples where people use non-HTTP transports

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the "HTTP"-propagators can really be used to propagate over any protocol that supports ASCII key-value pairs as metadata/headers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent is just to show that I am not proposing a "generic" propagator: HTTP, BINARY, etc, are each handled by a separate function. I didn't call this out before and it confused some people into thinking we were going back to the OpenTracing way of doing things.

We could call it TEXTMAP instead of HTTP, or HTTPText. Don't really have an opinion on that front. HTTP is the shortest though. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notion of extractor/injector is irrelevant to the API for the application (we didn't have them in OpenTracing). Injector/extractor are only needed when registering them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these have the same name as their "Baggage" counterparts, it seems they'll have to be namespaced in some way. here is no explicit observability concept today tough, only meters and tracers. Should there be a separate propagator API for each of these? If not, why does Baggage have a separate one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Oberon00 I actually assumed that we would not have separate propagators for Metrics and Tracing as something in the API layer. So I was namespacing by splitting things into an Observability API and a Baggage API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there can be no separation in the propagation API for any propagators, not just metrics/tracing, but any custom/baggage propagators. The only thing the application needs to know is "I want to inject/extract context in this format".

To deserialize the state of the system sent from the prior upstream process, the Observability API provides a function which returns a HTTPExtract function.

**GetHTTPInjector() -> injector**
To serialize the the current state of the observability system and send it to the next downstream process, the Observability API provides a function which returns a HTTPInject function.


## Baggage API

In addition to observability, OpenTelemetry provides a simple mechanism for propagating arbitrary data, called Baggage. This allows new distributed applications to be implemented without having to create new propagators.

To manage the state of a distributed application, the Baggage API provides a set of functions which read, write, and remove data.

**SetBaggage(context, key, value) -> context**
To record the distributed state of an application, the Baggage API provides a function which takes a context, a key, and a value as input, and returns an updated context which contains the new value.

**GetBaggage(context, key) -> value**
To access the distributed state of an application, the Baggage API provides a function which takes a context and a key as input, and returns a value.

**RemoveBaggage(context, key) -> context**
To delete distributed state from an application, the Baggage API provides a function which takes a context, a key, and a value as input, and returns an updated context which contains the new value.

**ClearBaggage(context) -> context**
To avoid sending baggage to an untrusted downstream process, the Baggage API provides a function remove all baggage from a context,

**GetHTTPExtractor() -> extractor**
To deserialize the state of the system sent from the the prior upstream process, the Baggage API provides a function which returns a HTTPExtract function.

**GetHTTPInjector() -> injector**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's misleading to combine injectors/extractors with the API for manipulating the baggage. I am not even sure there should be "get" methods for those - who would call that? The inter-process propagation layer (see my diagram above) sits below specific contexts, so they can register with it, but it shouldn't need to "call up"

To serialize the the current state of the system and send it to the next downstream process, the Baggage API provides a function which returns a HTTPInject function.


## Additional APIs

Because the application and context propagation layers are separated, it is possible to create new distributed applications which do not depend on either the Observability or Baggage APIs.

**GetHTTPExtractor() -> extractor**
To deserialize the state of the system in the prior upstream process, all additional APIs provide a function which returns a HTTPExtract function.

**GetHTTPInjector() -> injector**
To serialize the the current state of the system and send it to the next downstream process, all additional APIs provide a function which returns a HTTPInject function.


# Context Propagation Layer

## Context API

Distributed applications access data in-process using a shared context object. Each distributed application sets a single key in the context, containing all of the data for that system.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e., value can be an arbitrary object, instead of just a scalar value.


**SetValue(context, key, value) -> context**
To record the local state of an application, the Context API provides a function which takes a context, a key, and a value as input, and returns an updated context which contains the new value.

**GetValue(context, key) -> value**
To access the local state of an application, the Context API provides a function which takes a context and a key as input, and returns a value.

### Optional: Automated Context Management
When possible, context should automatically be associated with program execution . Note that some languages do not provide any facility for setting and getting a current context. In these cases, the user is responsible for managing the current context.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When possible, context should automatically be associated with program execution . Note that some languages do not provide any facility for setting and getting a current context. In these cases, the user is responsible for managing the current context.
When possible, the OpenTelemetry context should automatically be associated with the program execution context. Note that some languages do not provide any facility for setting and getting a current context. In these cases, the user is responsible for managing the current context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch.


**SetCurrent(context)**
To associate a context with program execution, the Context API provides a function which takes a Context.

**GetCurrent() -> context**
To access the context associated with program execution, the Context API provides a function which takes no arguments and returns a Context.


## Propagation API

Distributed applications propagate their state by data to downstream processes via injectors, functions which read and write application context into RPC requests. Each distributed application creates a set of propagators for every type of supported medium - currently only HTTP requests.

**HTTPInject(context, request)**
To send the data for all distributed applications downstream to the next process, the Propagation API provides a function which takes a context and an HTTP request, and mutates the HTTP request to include an HTTP Header representation of the context.

**HTTPExtract(context, request) -> context**
To receive data injected by prior upstream processes, the Propagation API provides a function which takes a context and an HTTP request, and returns context which represents the state of the upstream system.

**ChainHTTPInjector(injector, injector) injector**
In order for the application layer to function correctly, Propagation choices must be syncronized between all processes in the distributed system, and multiple applications must be able to inject and extract their context into the same request. To meet these requirements, the Propagation API provides a function which registers a set of propagators, which will all be executed in order when the future calls to inject and extract are made. A canonical propagator consists of an inject and an extract function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"applications" is a bit of a dangerous word choice here as it means not another application program but a application of the distributed context in the same program.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really struggle what to call these. I added a definition for distributed application at the beginning, to help. It feels natural to refer to this architecture as an "application layer" and a "context propagation layer" but it is easy to see how the term "application" could be confusing. Not sure what is better though - I see Yuri using Aspect Oriented Programming terms like "Cross-cutting concern" but people who know what that is might think we are being literal about leveraging AOP.

Ideas?

  • Cross-cutting concerns
  • Distributed subsystems
  • Some made up word, idk

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Leave out "the future".


**ChainHTTPExtractor(extractor, extractor) extractor**
In order for the application layer to function correctly, Propagation choices must be syncronized between all processes in the distributed system, and multiple applications must be able to inject and extract their context into the same request. To meet these requirements, the Propagation API provides a function which registers a set of propagators, which will all be executed in order when the future calls to inject and extract are made. A canonical propagator consists of an inject and an extract function.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description is identical to ChainHTTPInjector. Perhaps combine them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, copypasta error, one moment please...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any news on this? It seems the two signatures should belong under this same description instead of vice versa.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


### Optional: Global Propagators
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid global state. It can always be added later, but cannot be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess most languages already crossed the Rubicon here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make it optional, but I'm now realizing this has to rethought, given named tracers.

It is often convenient to create a chain of propagators during program initialization, and then access these combined propagators later in the program. To facilitate this, global injectors and extractors are optionally available. However, there is no requirement to use this feature.

**SetHTTPInjector(injector)**
To update the global injector, the Propagation API provides a function which takes an injector.

**GetHTTPInjector() -> injector**
To access the global injector, the Propagation API provides a function which returns an injector.

**SetHTTPExtractor(extractor)**
To update the global extractor, the Propagation API provides a function which takes an injector.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/takes an injector/takes an extractor/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'm getting buried by my own boilerplate here. :)


**GetHTTPExtractor() -> extractor**
To access the global extractor, the Propagation API provides a function which returns an extractor.

# Internal details

![drawing](img/context_propagation_details.png)

## Context details
OpenTelemetry currently implements three context types of context propagation.

**Span Context -** The serializable portion of a span, which is injected and extracted. The readable attributes are defined to match those found in the W3C **traceparent** header.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make positively sure I understand this -- when we do sample rate propagation, will it be classified as part of the span context for purposes of injection/extraction, but be output to a hidden Correlation context field? Or will we make it a part of the correlation context, even though it's not a label/value per se, only a value for a predefined key? And how do we make sure that nothing else clobbers our key if we share with correlation context?

Copy link
Member

@lizthegrey lizthegrey Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lizthegrey sorry for the slow update on this! I believe that sampling is intended to be propagated as part of SpanContext. In the Trace-Context spec, sampling would be included as a "vendor-specifc detail" (with OpenTelemetry SDK counting as a vendor in this case) inside of the tracestate field. https://w3c.github.io/trace-context/#tracestate-header

There is also a sampled flag in the traceparent field which should be set: https://w3c.github.io/trace-context/#trace-flags

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**Span Context -** The serializable portion of a span, which is injected and extracted. The readable attributes are defined to match those found in the W3C **traceparent** header.
**Span Context -** The serializable portion of a span, which is injected and extracted. The readable attributes are defined to match those found in the [W3C Trace Context specification](https://www.w3.org/TR/trace-context/).

traceparent is only one of two headers in the W3C spec (the other being tracestate).


**Correlation Context -** Correlation Context contains a map of labels and values, to be shared between metrics and traces. This allows observability data to be indexed and dimensionalized in a variety of ways. Note that correlations can quickly add overhead when propagated in-band. But because this data is write-only, it may be possible to optimize how it is transmitted.

**Baggage Context -** Transaction-level application data, meant to be shared with downstream components. This data is readable, and must be propagated in-band. Because of this, Baggage should be used sparingly, to avoid ballooning the size of RPC requests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**Baggage Context -** Transaction-level application data, meant to be shared with downstream components. This data is readable, and must be propagated in-band. Because of this, Baggage should be used sparingly, to avoid ballooning the size of RPC requests.
**Baggage -** Transaction-level application data, meant to be shared with downstream components. This data is readable, and must be propagated in-band. Because of this, Baggage should be used sparingly, to avoid ballooning the size of all downstream requests.


Note that when possible, OpenTelemetry APIs calls are given access to the entire context object, and not a specific context type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas of scenarios where this might not be possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I updated the language to be stronger here.



## Context Management and in-process propagation

In order for Context to function, it must always remain bound to the execution of code it represents. By default, this means that the programmer must pass a Context down the call stack as a function parameter. However, many languages provide automated context management facilities, such as thread locals. OpenTelemetry should leverage these facilities when available, in order to provide automatic context management.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In order for Context to function, it must always remain bound to the execution of code it represents. By default, this means that the programmer must pass a Context down the call stack as a function parameter. However, many languages provide automated context management facilities, such as thread locals. OpenTelemetry should leverage these facilities when available, in order to provide automatic context management.
For Context to function, it must always remain bound to the execution of code it represents. By default, this means that the programmer must pass a Context down the call stack as a function parameter. However, many languages provide automated context management facilities, such as thread locals. OpenTelemetry should leverage these facilities when available, in order to provide automatic context management.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ This seems to indicate that we must change the way we think about "Span" activation: We should really activate a whole context not a Span.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for setting the active span in the current context, but when moving work from one thread to another, or context switching in async/nonblocking runtimes, this needs to be addressed.

In general, I'm concerned that there will be important details which present themselves better when we attempt to implement this. We've gotten a bit of a pass so far because we did so much work in java at the very beginning of this process, before RFCs, etc. FWIW, in OpenTracing, we required a working implementation before adding an committing a change to the spec.


## Pre-existing Context implementations

In some languages, a single, widely used Context implementation exists. In other languages, there many be too many implementations, or none at all. For example, Go has a the context.Context object, and widespread conventions for how to pass it down the call stack.
tedsuo marked this conversation as resolved.
Show resolved Hide resolved

In the cases where an extremely clear, pre-existing option is not available, OpenTelemetry should provide its own Context implementation.

## Default Propagators

When available, OpenTelemetry defaults to propagating via HTTP header definitions which have been standardized by the W3C.


# Trade-offs and mitigations

## Why separate Baggage from Correlations?

Since Baggage Context and Correlation Context appear very similar, why have two?

First and foremost, the intended uses for Baggage and Correlations are completely different. Secondly, the propagation requirements diverge significantly.

Correlation values are solely to be used as labels for metrics and traces. By making Correlation data write-only, how and when it is transmitted remains undefined. This leaves the door open to optimizations, such as propagating some data out-of-band, and situations where sampling decisions may cease the need to propagate correlation context any further.

Baggage values, on the other hand, are explicitly added in order to be accessed by downstream by other application code. Therefore, Baggage Context must be readable, and reliably propagated in-band in order to accomplish this goal.

There may be cases where a key-value pair is propagated as TagMap for observability and as a Baggage for application specific use. AB testing is one example of such use case. There is potential duplication here at call site where a pair is created and also at propagation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only mention of TagMap in this document . Is this intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope!

tedsuo marked this conversation as resolved.
Show resolved Hide resolved

Solving this issue is not worth having semantic confusion with dual purpose. However, because all observability functions take the complete context as input, it may still be possible to use baggage values as labels.
Copy link
Member

@Oberon00 Oberon00 Oct 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"it may still be possible to use baggage values as labels": That seems to contradict the separation. Should this sentence be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, that is true, it is a contradiction. I'm pointing out that it would be technically possible for the SDK to still do this. Maybe that is not a good idea. :)



## What about complex propagation behavior?

Some OpenTelemetry proposals have called for more complex propagation behavior. For example, falling back to extracting B3 headers if W3C Trace-Context headers are not found. Chained propagators and other complex behavior can be modeled as implementation details behind the Propagator interface. Therefore, the propagation system itself does not need to provide chained propagators or other additional facilities.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this, and therefore do not understand why earlier section explicitly mentions Chained API functions. Chaining can be achieved through composition of SDK objects without cluttering the API that the end-user is exposed to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no solution for this at the API-level (or at least SDK-level), then there must be only one dedicated component (e.g. "the application") that sets all propagators as there is no shared protocol for how to coordinate on setting propagators.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this language should be updated, now that the API has simple chaining. But, at the API level, we are only chaining together propagators for different applications. Complex details within one application – such as checking for Trace-Context and falling back to B3 if it is not present – does not need to be something handled at the API level. I was getting comments and questions about that kind of behavior, so I felt compelled to add this...

But, with the addition of named tracers, how a single inject/extract call can leverage multiple independent propagators needs to be thought about a bit more, since propagation behavior now depends on which tracer instance is used - or at least, I think that is what named tracers implies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on chaining, I think it doesn't belong here (see earlier comment in the API section)

on named tracers - I would propose to ignore those. Propagators, like exporters, are 1-1 with TracerFactory, not with Tracer, i.e. all differently named tracers share the propagator. And the application that ultimately invokes the propagation API does it at the generic propagation layer, not via tracers.



## Did you add a context parameter to every API call because Go has infected your brain?

No. The concept of an explicit context is fundamental to a model where independent distributed applications share the same context propagation layer. How this context appears or is expressed is language specific, but it must be present in some form.


# Prior art and alternatives

Prior art:
* OpenTelemetry distributed context
* OpenCensus propagators
* OpenTracing spans
* gRPC context
Comment on lines +200 to +203
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make these links.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do!


# Open questions

Related work on HTTP propagators has not been completed yet.
tedsuo marked this conversation as resolved.
Show resolved Hide resolved

* [W3C Trace-Context](https://www.w3.org/TR/trace-context/) candidate is not yet accepted
* Work on [W3C Correlation-Context](https://w3c.github.io/correlation-context/) has begun, but was halted to focus on Trace-Context.
* No work has begun on a theoretical W3C Baggage-Context.

Given that we must ship with working propagators, and the W3C specifications are not yet complete, how should we move forwards with implementing context propagation?

# Future possibilities

Cleanly splitting OpenTelemetry into an Application and Context Propagation layer may allow us to move the Context Propagation layer into its own, stand-alone project. This may facilitate adoption, by allowing us to share Context Propagation with gRPC and other projects.
Binary file added text/img/context_propagation_details.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added text/img/context_propagation_explanation.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.