Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add jaeger remote sampler to SDK configuration #1791

Merged
merged 4 commits into from
Jul 28, 2021

Conversation

pavolloffay
Copy link
Member

Signed-off-by: Pavol Loffay p.loffay@gmail.com

Related to open-telemetry/opentelemetry-java#3368

This PR adds SDK configuration for jaeger_remote_sampler. The goal is to make the jaeger remote sampler available and configurable in auto-instrumentations.

cc) @tedsuo @anuraaga

@pavolloffay pavolloffay requested review from a team as code owners July 5, 2021 09:40

Depending on the value of `OTEL_TRACES_SAMPLER`, `OTEL_TRACES_SAMPLER_ARG` may be set as follows:

- For `traceidratio` and `parentbased_traceidratio` samplers: Sampling probability, a number in the [0..1] range, e.g. "0.25". Default is 1.0 if unset.
- For `jaeger_remote_sampler`: The value is a comma separated list of: pooling interval `poolingIntervalMs` in milliseconds and initial sampling rate `initialSamplingRate` in the [0..1] range. For example: `poolingIntervalMs=100,initialSamplingRate=0.25`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apart from this the remote sampler needs 2 configuration options:

  • service name - it will (re)use otel.service.name
  • collector endpoint - it will (re)use otel.exporter.jaeger.endpoint

specification/sdk-environment-variables.md Outdated Show resolved Hide resolved

Depending on the value of `OTEL_TRACES_SAMPLER`, `OTEL_TRACES_SAMPLER_ARG` may be set as follows:

- For `traceidratio` and `parentbased_traceidratio` samplers: Sampling probability, a number in the [0..1] range, e.g. "0.25". Default is 1.0 if unset.
- For `jaeger_remote_sampler`: The value is a comma separated list of: pooling interval `poolingIntervalMs` in milliseconds and initial sampling rate `initialSamplingRate` in the [0..1] range. For example: `poolingIntervalMs=100,initialSamplingRate=0.25`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use initial_sampler and initial_sampler_arg instead of (or possibly in addition to) just a rate. The intent is to use a different Otel Sampler as initial, not to restrict it I think. This is reflected in how the code API accepts Sampler as the argument

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the API in OTEL might support this but let me see if it is supported in Jaeger. The main use-case is to provide a migration path for Jaeger users.

In the long term there will be a native OTEL remote sampling implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Jaeger Java SDK the default sampler type in environment config is always percentage-based https://github.com/jaegertracing/jaeger-client-java/blob/568ab68ec2ac5e9e811b1d0d76f752f3242d45d3/jaeger-core/src/main/java/io/jaegertracing/Configuration.java#L394

The programmatic API allows to supply any sampler like in OTEL, but not via env variables.

@yurishkuro please loop in here. Do we need to set the initial sampler to rate limiting type? Was this something users were using? We didn't define environment configuration for it.


Depending on the value of `OTEL_TRACES_SAMPLER`, `OTEL_TRACES_SAMPLER_ARG` may be set as follows:

- For `traceidratio` and `parentbased_traceidratio` samplers: Sampling probability, a number in the [0..1] range, e.g. "0.25". Default is 1.0 if unset.
- For `jaeger_remote`: The value is a comma separated list of: pooling interval `poolingIntervalMs` in milliseconds and initial sampling rate `initialSamplingRate` in the [0..1] range. For example: `poolingIntervalMs=100,initialSamplingRate=0.25`.
Copy link
Member

@yurishkuro yurishkuro Jul 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will (re)use otel.exporter.jaeger.endpoint

Why not have a separate setting like server=? Reusing otel.exporter.jaeger.endpoint is both confusing (what does exporter have to do with sampler?) and inflexible, as it would couple exporter and sampler, even though someone might want to have them at different addresses.

Also, is this a gRPC or HTTP address?

poolingIntervalMs=100

It's polling, not pooling.

I would use a different, more realistic example value than 100ms, like 10000. Does this even need to be in ms (is that the convention for other vars?), or can it be in seconds?

`initialSamplingRate=0.25

I would rename initialSamplingRate to defaultSamplingRate.

And I would add clear documentation of the sub-properties:

For jaeger_remote: The value is a comma separated list of:

  • pollingIntervalMs in milliseconds indicating how often the sampler will poll the backend for updates to sampling strategy
  • defaultSamplingRate in the [0..1] range, which is used as the sampling probability when the backend cannot be reached to retrieve a sampling strategy. This value stops having an effect once a sampling strategy is retrieved successfully, as the remote strategy will be used until a new update is retrieved.
  • server (name tbd) - the address of gRPC (or http?) server that will serve the sampling strategy for the service (link to schema)
  • Example: pollingIntervalMs=10000,defaultSamplingRate=0.25,server=...

Copy link
Member Author

@pavolloffay pavolloffay Jul 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about adding the endpoint/address to the arguments. I think it makes sense. 99.99999% users will use the same address as for the exporter, but having it in the argument makes it easy understandable what needs to be configured.

I don't like server. I will use endpoint to be consistent with other names in the SDK. The address could be perhaps both HTTP and grpc (like in case of OTLP), this offers flexibility for the SDKs to implement or or the other. I will check if the OTELcol exposes both protocols.

@pavolloffay
Copy link
Member Author

PR updated

@carlosalberto carlosalberto added area:sdk Related to the SDK spec:trace Related to the specification/trace directory labels Jul 6, 2021
specification/sdk-environment-variables.md Outdated Show resolved Hide resolved

## Batch Span Processor
- For `jaeger_remote`: The value is a comma separated list:
- `endpoint`: the endpoint of gRPC server that serves the sampling strategy for the service ([sampling.proto](https://github.com/jaegertracing/jaeger-idl/blob/master/proto/api_v2/sampling.proto)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the syntax, just host:port? Is TLS supported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below is an example of the endpoint. I will add it here to be more obvious...

TLS could be supported by providing a scheme in the endpoint like it is done for OTLP e.g. https://github.com/open-telemetry/opentelemetry-java/blob/43f1ecf7ede49104c2aac1c9e5448b7cf7859ce7/exporters/otlp/trace/src/main/java/io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporterBuilder.java#L131. We could add another option with path to the cert.

@@ -56,12 +56,16 @@ Known values for `OTEL_TRACES_SAMPLER` are:
- `"parentbased_always_on"`: `ParentBased(root=AlwaysOnSampler)`
- `"parentbased_always_off"`: `ParentBased(root=AlwaysOffSampler)`
- `"parentbased_traceidratio"`: `ParentBased(root=TraceIdRatioBased)`
- `"jaeger_remote"`: `JaegerRemoteSampler`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to give one more push for just jaeger. Doesn't the syntax make most cognitive sense looking something like OTEL_EXPORTER=jaeger OTEL_PROPAGATORS=jaeger OTEL_SAMPLER=jaeger?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for me, I would still advocate for jaeger_remote. Just jaeger seems ambiguous to me.

Copy link

@MSNev MSNev Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that rather than calling out a named "jaeger_remote" this should be perhaps "custom" or something similar. The resulting value can still be "JeagerRemoteSampler"

And then the configuration below would be specific to the instance of the custom sampler and would be decoded by the sampler.

So the OTEL_TRACES_SAMPLER_ARG field would then be <CustomSampler>;<CustomSamplerSpecificArgs>

So an example: JaegerRemoteSampler;endpoint=localhost:14250,pollingIntervalMs=5000,initialSamplingRate=0.25

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MSNev this complicates the parsing and still requires the spec to call out valid custom samplers & schemes. I don't see a lot of benefits over the current proposed format.

Copy link

@MSNev MSNev Jul 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means we don't have to spec every sampler and the sdk developers don't have to make any code changes to support new samplers (beyond creating the sampler) so in the spec we just say use the custom sampler config.

still requires the spec to call out valid custom samplers & schemes

Only if we want them to be officially provided, ie. we don't have to call out "valid custom samplers" we just say you can add a customer sampler using this format with the JaegerRemote Sampler in this case providing an example. This could include a simple section on available/provided customer samplers as I can envisage that all environments won't necessarily want or need all samplers

So another example could just be

OTEL_TRACES_SAMPLER=custom
OTEL_TRACES_SAMPLER_ARG=MyCustomSampler;<MyCustomSamplerSpecificArgs>

So the value after the ';' is simple "passed" to the custom sampler itself to process rather than enforcing every customer sampler to conform to something it doesn't need. eg. There could be a sampler that doesn't take any args.

Available custom samplers

Jaeger Remote

Name: JaegerRemoteSampler
Args: ... as defined ...

Example Usage:

OTEL_TRACES_SAMPLER=custom
OTEL_TRACES_SAMPLER_ARG=JaegerRemoteSampler;endpoint=localhost:14250,pollingIntervalMs=5000,initialSamplingRate=0.25

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still requires the spec to call out valid custom samplers & schemes
Only if we want them to be officially provided, ie. we don't have to call out "valid custom samplers"

Well, the whole point of this PR is to have Jaeger's remote sampler officially supported, so how does it benefit from being classified as "custom" yet still being fully spec-ed?

And even for unofficial custom samplers, I still don't see the value in the indirection. If someone wants to use a truly custom sampler that is not in the spec, they can still read its name from OTEL_TRACES_SAMPLER, and the SDK needs to have a configurable factory that could recognize the name and invoke the appropriate builder.

Copy link
Member

@yurishkuro yurishkuro Jul 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, #1829 could be a good candidate for OTEL_TRACES_SAMPLER=custom

Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
@pavolloffay
Copy link
Member Author

@tedsuo could you please review as well?

Here is OTEPS for OTEL remote sampler open-telemetry/oteps#167. I would like to get this PR merged to have a clear migration path for Jaeger users. The native OTEL based approach will take a longer time to get it to fully working solution.

@tedsuo
Copy link
Contributor

tedsuo commented Jul 22, 2021

Before merging this work, I have a couple of requests:

First, can we get a specification for the current Jaeger remote sampling format and protocol OpenTelemetry is expected to support via a sampling plugin? Either written here as a spec, or linked to in the Jaeger documentation?

Second, is it possible to get a commitment from the Jaeger team that future work on remote sampling will be moved over to the OpenTelemetry Sampling SIG? In other words, we'd like to add these sampling mechanisms in order to provide legacy support for existing Jaeger backends, which will allow the Jaeger community to switch over to using the OTel clients. But there are some concerns about having two independent groups working on sampling mechanisms in an uncoordinated fashion. @pavolloffay @yurishkuro @jpkrohling for future sampling work, is the Jaeger community comfortable with specifying those mechanisms as part of an OpenTelemetry SIG?

The main concern is that plugin-based sampling affects data collection in a transparent fashion, and could lead to users creating an incoherent configuration which results in data loss or inaccurate analysis. So we want to ensure that all of the sampling features available in OpenTelemetry are designed to be coherent for the broader set of features OpenTelemetry provides, such as metrics and exemplars. Jaeger is the only backend that OpenTelemetry is committed to supporting which is doing work in this field, so I just want to confirm that you all are fine with coordinating on this work going forwards via an OpenTelemetry SIG.

@yurishkuro
Copy link
Member

@tedsuo sgtm

@jpkrohling
Copy link
Member

Green light from me as well.

@pavolloffay
Copy link
Member Author

@pavolloffay @yurishkuro @jpkrohling for future sampling work, is the Jaeger community comfortable with specifying those mechanisms as part of an OpenTelemetry SIG?

In Jaeger community we don't have any plans on enhancing the remote sampler, in fact we want to sunset our clients/SDKs in favor of OTEL.

@pavolloffay
Copy link
Member Author

@tedsuo

First, can we get a specification for the current Jaeger remote sampling format and protocol OpenTelemetry is expected to support via a sampling plugin? Either written here as a spec, or linked to in the Jaeger documentation?

The text already has a link to jaeger sampling spec (proto file). Is that enough?

Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
@pavolloffay
Copy link
Member Author

What is the process to get this merged? It has been approved by a couple of people. I would like to start working on the implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants