open-telemetry · bogdandrutu · Aug 26, 2019 · Aug 16, 2019 · Aug 17, 2019 · Aug 17, 2019
diff --git a/text/0006-sampling.md b/text/0006-sampling.md
@@ -0,0 +1,337 @@
+# Sampling API
+
+*Status: proposed*
+
+## TL;DR
+This section tries to summarize all the changes proposed in this RFC:
+ 1. Move the `Sampler` interface from the API to SDK package. Apply some minor changes to the
+ `Sampler` API.
+ 1. Add a new `SamplerHint` concept to the API package.
+ 1. Add capability to record `Attributes` that can be used for sampling decision during the `Span`
+ creation time.
+ 1. Remove `addLink` APIs from the `Span` interface, and allow recording links only during the span
+ construction time.
+
+## Motivation
+
+Different users of OpenTelemetry, ranging from library developers, packaged infrastructure binary
+developers, application developers, operators, and telemetry system owners, have separate use cases
+for OpenTelemetry that have gotten muddled in the design of the original Sampling API. Thus, we need
+to clarify what APIs each should be able to depend upon, and how they will configure sampling and
+OpenTelemetry according to their needs.
+
+```
+
+                    +----------+           +-----------+
+           grpc     |  Library |           |           |
+           Django   |  Devs    +---------->| OTel API  |
+           Express  |          |   +------>|           |
+                    +----------+   |  +--->+-----------+                  +---------+
+                                   |  |          ^                        | OTel    |
+                                   |  |          |                     +->| Proxy   +---+
+                                   |  |          |                     |  |         |   |
+                    +----------+   |  |    +-----+-----+------------+  |  +---------+   |
+                    |          |   |  |    |           | OTel Wire  |  |                |
+           Hbase    |  Infra   |   |  |    |           | Export     |+-+                v
+           Envoy    |  Binary  +---+  |    |  OTel     |            |  |           +----v-----+
+                    |  Devs    |      |    |  SDK      +------------+  |           |          |
+                    +----------+---------->|           |            |  +---------->|  Backend |
+                                   +------>|           | Custom     |  +---------->|          |
+                                   |  |    |           | Export     |  |           +----------+
+                    +----------+   |  |    |           |            |+-+             ^
+                    |          +---+  |    +-----------+------------+                |
+                    |  App     +------+       ^              ^                       |
+                    |  Devs    +              |              |          +------------+-+
+                    |          |              |              |          |              |
+                    +----------+          +---+----+         +----------+   Telemetry  |
+                                          |  SRE   |                    |   Owner      |
+                                          |        |                    |              |
+                                          +--------+                    +--------------+
+                                                                          Lightstep
+                                                                          Honeycomb
+
+```
+## Explanation
+
+We outline five different use cases (who may be overlapping sets of people), and how they should
+interact with OpenTelemetry:
+
+### Library developer
+Examples: gRPC, Express, Django developers.
+
+ * They must only depend upon the OpenTelemetry API and not upon the SDK.
+   * For testing only they may depend on the SDK with InMemoryExporter.
+ * They are shipping source code that will be linked into others' applications.
+ * They have no explicit runtime control over the application.
+ * They know some signal about what traces may be interesting (e.g. unusual control plane requests)
+ or uninteresting (e.g. health-checks), but have to write fully generically.
+
+**Solution:**
+
+ * On the start Span operation, the OpenTelemetry API will allow marking a span with one of three
+ choices for the [SamplingHint](#samplinghint).
+
+### Infrastructure package/binary developer
+Examples: HBase, Envoy developers.
+
+ * They are shipping self-contained binaries that may accept YAML or similar run-time configuration,
+ but are not expected to support extensibility/plugins beyond the default OpenTelemetry SDK, 
+ OpenTelemetry SDKTracer, and OpenTelemetry wire format exporter.
+ * They may have their own recommendations for sampling rates, but don't run the binaries in
+ production, only provide packaged binaries. So their sampling rate configs, and sampling strategies
+ need to a finite "built in" set from OpenTelemetry's SDK.
+ * They need to deal with upstream sampling decisions made by services calling them.
+
+**Solution:**
+ * Allow different sampling strategies by default in OpenTelemetry SDK, all configurable easily via
+ YAML or feature flags. See [default samplers](#default-samplers).
+
+### Application developer
+These are the folks we've been thinking the most about for OpenTelemetry in general.
+
+ * They have full control over the OpenTelemetry implementation or SDK configuration. When using the
+ SDK they can configure custom exporters, custom code/samplers, etc.
+ * They can choose to implement runtime configuration via a variety of means (e.g. baking in feature
+ flags, reading YAML files, etc.), or even configure the library in code.
+ * They make heavy usage of OpenTelemetry for instrumenting application-specific behavior, beyond
+ what may be provided by the libraries they use such as gRPC, Django, etc.
+
+**Solution:**
+ * Allow application developers to link in custom samplers or write their own when using the
+ official SDK.
+   * These might include dynamic per-field sampling to achieve a target rate
+   (e.g. https://github.com/honeycombio/dynsampler-go)
+ * Sampling decisions are made within the start Span operation, after attributes relevant to the
+ span have been added to the Span start operation but before a concrete Span object exists (so that
+ either a NoOpSpan can be made, or an actual Span instance can be produced depending upon the
+ sampler's decision).
+ * Span.IsRecording() needs to be present to allow costly span attribute/log computation to be
+ skipped if the span is a NoOp span.
+
+### Application operator
+Often the same people as the application developers, but not necessarily
+
+ * They care about adjusting sampling rates and strategies to meet operational needs, debugging,
+ and cost.
+
+**Solution:**
+ * Use config files or feature flags written by the application developers to control the
+ application sampling logic.
+ * Use the config files to configure libraries and infrastructure package behavior.
+
+### Telemetry infrastructure owner
+They are the people who provide an implementation for the OpenTelemetry API by using the SDK with
+custom `Exporter`s, `Sampler`s, hooks, etc. or by writing a custom implementation, as well as 
+running the infrastructure for collecting exported traces.
+
+ * They care about a variety of things, including efficiency, cost effectiveness, and being able to
+ gather spans in a way that makes sense for them.
+
+**Solution:**
+ * Infrastructure owners receive information attached to the span, after sampling hooks have already
+ been run.
+
+## Internal details
+In Dapper based systems (or systems without a deferred sampling decision) all exported spans are 
+stored to the backend, thus some of these systems usually don't scale to a high volume of traces,
+or the cost to store all the Spans may be too high. In order to support this use-case and to 
+ensure the quality of the data we send, OpenTelemetry needs to natively support sampling with some
+requirements:
+ * Send as many complete traces as possible. Sending just a subset of the spans from a trace is 
+ less useful because in this case the interaction between the spans may miss.
+ * Allow application operator to configure the sampling frequency.
+
+For new modern systems that need to collect all the Spans and later may or may not do a deferred 
+sampling decision, OpenTelemetry needs to natively support a way to configure the library to 
+collect and export all the Spans. This is possible even though OpenTelemetry supports sampling by
+setting a default config to always collect all the spans.
+
+### Sampling flags
+OpenTelemetry API has two flags/properties:
+ * `RecordEvents`
+   * This property is exposed in the `Span` interface (e.g. `Span.isRecordingEvents()`).
+   * If `true` the current `Span` records tracing events (attributes, events, status, etc.), 
+   otherwise all tracing events are dropped.
+   * Users can use this property to determine if expensive trace events can be avoided.
+ * `SampledFlag`
+   * This flag is propagated via the `TraceOptions` to the child Spans (e.g.
+   `TraceOptions.isSampled()`). For more details see the w3c definition [here][trace-flags].
+   * In Dapper based systems this is equivalent to `Span` being `sampled` and exported.
+
+The flag combination `SampledFlag == false` and `RecordEvents == true` means that the current `Span`
+does record tracing events, but most likely the child `Span` will not. This combination is 
+necessary because:
+ * Allow users to control recording for individual Spans.
+ * OpenCensus has this to support z-pages, so we need to keep backwards compatibility.
+
+The flag combination `SampledFlag == true` and `RecordEvents == false` can cause gaps in the 
+distributed trace, and because of this OpenTelemetry API should NOT allow this combination.
+
+It is safe to assume that users of the API should only access the `RecordEvents` property when 
+instrumenting code and never access `SampledFlag` unless used in context propagators.
+
+### SamplingHint
+This is a new concept added in the OpenTelemetry API that allows to suggest sampling hints to the
+implementation of the API:
+ * `NOT_RECORD`
+   * Suggest to not `RecordEvents = false` and not propagate `SampledFlag = false`.
+ * `RECORD`
+   * Suggest `RecordEvents = true` and `SampledFlag = false`.
+ * `RECORD_AND_PROPAGATE`
+   * Suggest to `RecordEvents = true` and propagate `SampledFlag = true`.
+
+The default option for the span creation is to not have any suggestion (or suggestion is not
+specified). This can be implemented by using `null` as the default option or any language specific
+mechanism to achieve the same result.
+
+### Sampler interface
+The interface for the Sampler class that is available only in the OpenTelemetry SDK:
+ * `TraceID`
+ * `SpanID`
+ * Parent `SpanContext` if any
+ * `SamplerHint`
+ * `Links`
+ * Span name
+ * `SpanKind`
+ * Initial set of `Attributes` for the `Span` being constructed
+
+It produces as an output called `SamplingResult`:
+ * A `SamplingDecision` enum [`NOT_RECORD`, `RECORD`, `RECORD_AND_PROPAGATE`].
+ * A set of span Attributes that will also be added to the `Span`.
+   * These attributes will be added after the initial set of `Attributes`.
+ * (under discussion in separate RFC) the SamplingRate float.
+
+### Default Samplers
+These are the default samplers implemented in the OpenTelemetry SDK:
+ * ALWAYS_ON
+   * Ignores all values in SamplingHint.
+ * ALWAYS_OFF
+   * Ignores all values in SamplingHint.
+ * ALWAYS_PARENT
+   * Ignores all values in SamplingHint.
+   * Trust parent sampling decision (trusting and propagating parent `SampledFlag`).
+   * For root Spans (no parent available) returns `NOT_RECORD`.
+ * Probability
+   * Allows users to configure to ignore or not the SamplingHint for every value different than 
+   `UNSPECIFIED`. 
+     * Default is to NOT ignore `NOT_RECORD` and `RECORD_AND_PROPAGATE` but ignores `RECORD`.
+   * Allows users to configure to ignore the parent `SampledFlag`.
+   * Allows users to configure if probability applies only for "root spans", "root spans and remote 
+   parent", or "all spans".
+     * Default is to apply only for "root spans and remote parent".
+     * Remote parent property should be added to the SpanContext see specs [PR/216][specs-pr-216]
+   * Sample with 1/N probability
+
+**Root Span Decision:**
+
+|Sampler|RecordEvents|SampledFlag|
+|---|---|---|
+|ALWAYS_ON|`True`|`True`|
+|ALWAYS_OFF|`False`|`False`|
+|ALWAYS_PARENT|`False`|`False`|
+|Probability|`SamplingHint==RECORD OR SampledFlag`|`SamplingHint==RECORD_AND_PROPAGATE OR Probability`|
+
+**Child Span Decision:**
+
+|Sampler|RecordEvents|SampledFlag|
+|---|---|---|
+|ALWAYS_ON|`True`|`True`|
+|ALWAYS_OFF|`False`|`False`|
+|ALWAYS_PARENT|`ParentSampledFlag`|`ParentSampledFlag`|
+|Probability|`SamplingHint==RECORD OR SampledFlag`|`ParentSampledFlag OR SamplingHint==RECORD_AND_PROPAGATE OR Probability`|
+
+### Links
+This RFC proposes that Links will be recorded only during the start `Span` operation, because:
+* Link's `SampledFlag` can be used in the sampling decision.
+* OpenTracing supports adding references only during the `Span` creation.
+* OpenCensus supports adding links at any moment, but this was mostly used to record child Links 
+which are not supported in OpenTelemetry.
+* Allowing links to be recorded after the sampling decision is made will cause samplers to not 
+work correctly and unexpected behaviors for sampling.
+
+### When does sampling happen?
+The sampling decision will happen before a real `Span` object is returned to the user, because:
+ * If child spans are created they need to know the 'SampledFlag'.
+ * If `SpanContext` is propagated on the wire the 'SampledFlag' needs to be set.
+ * If user records any tracing event the `Span` object needs to know if the data are kept or not.
+ It may be possible to always collect all the events until the sampling decision is made but this is
+ an important optimization.
+
+There are two important use-cases to be considered:
+ * All information that may be used for sampling decisions are available at the moment when the 
+ logical `Span` operation should start. This is the most common case.
+ * Some information that may be used for sampling decision are NOT available at the moment when the
+ logical `Span` operation should start (e.g. `http.route` may be determine later).
+
+The current [span creation logic][span-creation] facilitates very well the first use-case, but
+the second use-case requires users to record the logical `start_time` and collect all the
+information necessarily to start the `Span` in custom objects, then when all the properties are
+available call the span creation API.
+
+The RFC proposes that we keep the current [span creation logic][span-creation] as it is and we will
+address the delayed sampling in a different RFC when that becomes a high priority.
+
+The SDK must call the `Sampler` every time a `Span` is created during the start span operation.
+
+**Alternatives considerations:**
+ * We considered, to offer a delayed span construction mechanism:
+    * For languages where a `Builder` pattern is used to construct a `Span`, to allow users to
+    create a `Builder` where the start time of the Span is considered when the `Builder` is created.
+    * For languages where no intermediate object is used to construct a `Span`, to allow users maybe
+     via a `StartSpanOption` object to start a `Span`. The `StartSpanOption` allows users to set all
+     the start `Span` properties.
+    * Pros:
+      * Would resolve the second use-case posted above.
+    * Cons:
+      * We could not identify too many real case examples for the second use-case and decided to
+      postpone the decision to avoid premature decisions.
+ * We considered, instead of requiring that sampling decision happens before the `Span` is 
+ created to add an explicit `MakeSamplingDecision(SamplingHint)` on the `Span`. Attempts to create
+ a child `Span`, or to access the `SpanContext` would fail if `MakeSamplingDecision()` had not yet
+ been run.
+   * Pros:
+     * Simplifies the case when all the attributes that may be used for sampling are not available
+     when the logical `Span` operation should start.
+   * Cons:
+     * The most common case would have required an extra API call.
+     * Error prone, users may forget to call the extra API.
+     * Unexpected and hard to find errors if user tries to create a child `Span` before calling
+     MakeSamplingDecision().
+ * We considered allowing the sampling decision to be arbitrarily delayed, but guaranteed before 
+ any child `Span` is created, or `SpanContext` is accessed, or before `Span.end()` finished.
+   * Pros:
+     * Similar and smaller API that supports both use-cases defined ahead.
+   * Cons:
+     * If `SamplingHint` needs to also be delayed recorded then an extra API on Span is required 
+     to set this.
+     * Does not allow optimization to not record tracing events, all tracing events MUST be 
+     recorded before the sampling decision is made.
+
+## Prior art and alternatives
+Prior art for Zipkin, and other Dapper based systems: all client-side sampling decisions are made at
+head. Thus, we need to retain compatibility with this.
+
+## Open questions
+This RFC does not necessarily resolve the question of how to propagate sampling rate values between
+different spans and processes. A separate RFC will be opened to cover this case.
+
+## Future possibilities
+In the future, we propose that library developers may be able to defer the decision on whether to
+recommend the trace be sampled or not sampled until mid-way through execution;
+
+## Related Issues
+ * [opentelemetry-specification/189](https://github.com/open-telemetry/opentelemetry-specification/issues/189)
+ * [opentelemetry-specification/187](https://github.com/open-telemetry/opentelemetry-specification/issues/187)
+ * [opentelemetry-specification/164](https://github.com/open-telemetry/opentelemetry-specification/issues/164)
+ * [opentelemetry-specification/125](https://github.com/open-telemetry/opentelemetry-specification/issues/125)
+ * [opentelemetry-specification/87](https://github.com/open-telemetry/opentelemetry-specification/issues/87)
+ * [opentelemetry-specification/66](https://github.com/open-telemetry/opentelemetry-specification/issues/66)
+ * [opentelemetry-specification/65](https://github.com/open-telemetry/opentelemetry-specification/issues/65)
+ * [opentelemetry-specification/53](https://github.com/open-telemetry/opentelemetry-specification/issues/53)
+ * [opentelemetry-specification/33](https://github.com/open-telemetry/opentelemetry-specification/issues/33)
+ * [opentelemetry-specification/32](https://github.com/open-telemetry/opentelemetry-specification/issues/32)
+ * [opentelemetry-specification/31](https://github.com/open-telemetry/opentelemetry-specification/issues/31)
+
+[trace-flags]: https://github.com/w3c/trace-context/blob/master/spec/20-http_header_format.md#trace-flags
+[specs-pr-216]: https://github.com/open-telemetry/opentelemetry-specification/pull/216
+[span-creation]: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-tracing.md#span-creation