
Add "baggage" policy to tailsamplingprocessor #18462

Closed
imomaliev opened this issue Feb 8, 2023 · 16 comments
Labels
enhancement New feature or request processor/tailsampling Tail sampling processor

Comments

@imomaliev
Contributor

imomaliev commented Feb 8, 2023

Component(s)

processor/tailsampling

Is your feature request related to a problem? Please describe.

As far as I know, there is no existing mechanism to "force" span creation (sampling) on the initial request, before a trace context is created, for example by setting a header or a cookie in the browser before making a request to the page.

I've spent a couple of days exploring which parts of the OpenTelemetry specification and the W3C specs (i.e., https://www.w3.org/TR/trace-context/ and https://www.w3.org/TR/baggage/) I could reuse to get the desired functionality. My current idea is to set a baggage: force_sample=1 header on the initial request and configure tailsamplingprocessor to sample based on this parameter. But because tailsamplingprocessor has no baggage policy, I would have to set a custom span attribute via the SDK whenever the baggage: force_sample=1 header is present, and then enable the numeric_attribute policy in the collector's tailsamplingprocessor.
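For reference, the baggage header format defined by https://www.w3.org/TR/baggage/ is a comma-separated list of key=value members, each optionally followed by ;-delimited properties. A minimal stdlib sketch of the check a baggage policy would have to perform (helper names are hypothetical, and percent-decoding of values is omitted for brevity):

```python
def parse_baggage(header: str) -> dict:
    """Parse a W3C 'baggage' header into a key -> value dict.

    Simplified: ignores ;-delimited member properties and does not
    percent-decode values.
    """
    entries = {}
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        # drop optional properties such as ";metadata" after the value
        kv = member.split(";", 1)[0]
        if "=" not in kv:
            continue
        key, value = kv.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def should_force_sample(header: str) -> bool:
    """The hypothetical policy check: sample if force_sample=1 is present."""
    return parse_baggage(header).get("force_sample") == "1"
```

For example, should_force_sample("userId=alice,force_sample=1;meta=x") returns True, while a header without the entry (or with force_sample=0) returns False.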

Maybe the way to "force" sampling on the initial request should be discussed separately, but nonetheless I think a baggage policy could be useful in tailsamplingprocessor.
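The attribute-based workaround described above could look roughly like this in the collector config, assuming the SDK copies the baggage value into a numeric span attribute named force_sample (the policy name is made up):

```yaml
processors:
  tail_sampling:
    policies:
      - name: force-sample-workaround
        type: numeric_attribute
        numeric_attribute:
          key: force_sample
          min_value: 1
          max_value: 1
```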

Describe the solution you'd like

I think having a baggage policy in tailsamplingprocessor would be useful for many use cases.

Describe alternatives you've considered

No response

Additional context

open-telemetry/opentelemetry-go#2651

@imomaliev imomaliev added enhancement New feature or request needs triage New item requiring triage labels Feb 8, 2023
@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Feb 8, 2023
@github-actions
Contributor

github-actions bot commented Feb 8, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jpkrohling jpkrohling removed the needs triage New item requiring triage label Feb 8, 2023
@imomaliev imomaliev changed the title Add "baggage" policies to tailsamplingprocessor Add "baggage" policy to tailsamplingprocessor Feb 9, 2023
@imomaliev
Contributor Author

imomaliev commented Feb 9, 2023

@jpkrohling Hi! What would be the best place to start a discussion about the way to "force" sampling on the initial request? I am not sure we will be able to come up with a standard, but at least maybe we could provide a recommended approach.

An alternative way to achieve it that I considered is:

  1. Set baggage: force_sample=1 header on the initial request
  2. Add internal=fs:1 to tracestate header via SDK when baggage: force_sample=1 is present (where fs stands for force_sample)
  3. Configure tailsamplingprocessor with tracestate policy to check for internal=fs:1

This approach has an advantage over the "passing via attributes" one I described above, because it can be used in a head-sampling configuration as well. The only thing missing to make this approach work for head sampling is a custom sampler that checks for the internal=fs:1 value in the tracestate header. Because the tracestate specification and its usage in OTel are still experimental, I decided against it in our current implementation.

@jpkrohling
Member

Do you think this discussion can wait a bit? I'll be AFK for a couple of weeks starting tomorrow, and unfortunately, I cannot give this one here a proper thought.

@imomaliev
Contributor Author

Do you think this discussion can wait a bit? I'll be AFK for a couple of weeks starting tomorrow, and unfortunately, I cannot give this one here a proper thought.

Sure, no problem👍

@imomaliev
Contributor Author

imomaliev commented Feb 17, 2023

After further testing, I've decided to use the second approach. It is much more robust and ensures that the force_sample flag is propagated to the very end of the pipeline. To achieve this I implemented a custom sampler. It is meant to wrap your existing sampler, and it ensures that we force sampling in both head-sampling and tail-sampling.

from __future__ import annotations

from typing import Sequence

from opentelemetry import baggage
from opentelemetry.context import Context
from opentelemetry.sdk.trace import sampling
from opentelemetry.trace import Link, SpanKind, TraceState
from opentelemetry.util.types import Attributes


class ForceSampler(sampling.Sampler):
    """
    Sampler that runs base_sampler and checks the 'baggage' header for 'force_sample=1'.

    If force_sample=1 is present, adds 'internal=fs:1' to the 'tracestate' header and
    sets SamplingResult.decision to RECORD_AND_SAMPLE.
    Otherwise it samples like base_sampler.

    This way we ensure that the approach works for both head-sampling and
    tail-sampling. To check for 'tracestate' in a tail-sampling configuration, use
    the 'trace_state' policy in tailsamplingprocessor:

    {
        name: force-sample-policy,
        type: trace_state,
        trace_state: { key: internal, values: ["fs:1"] },
    },
    """

    def __init__(self, base_sampler: sampling.Sampler = None):
        if base_sampler is None:
            base_sampler = sampling._get_from_env_or_default()
        self.base_sampler = base_sampler
        super().__init__()

    def should_sample(
        self,
        parent_context: Context | None,
        trace_id: int,
        name: str,
        kind: SpanKind = None,
        attributes: Attributes = None,
        links: Sequence[Link] = None,
        trace_state: TraceState = None,
    ) -> sampling.SamplingResult:
        base_sampling_result = self.base_sampler.should_sample(
            parent_context=parent_context,
            trace_id=trace_id,
            name=name,
            kind=kind,
            attributes=attributes,
            links=links,
        )

        decision = base_sampling_result.decision
        trace_state = base_sampling_result.trace_state
        # apply sampling based on baggage
        force_sample = baggage.get_baggage("force_sample", parent_context)
        if force_sample == "1":
            if trace_state is None:
                trace_state = TraceState()
            trace_state = trace_state.update(key="internal", value="fs:1")
            # override the base sampler's decision so the trace is
            # sampled at the head
            decision = sampling.Decision.RECORD_AND_SAMPLE

        # return a result that is sampled in the head-sampling case and has
        # 'internal=fs:1' added to 'tracestate' for the tail-sampling check
        return sampling.SamplingResult(
            decision=decision,
            attributes=base_sampling_result.attributes,
            trace_state=trace_state,
        )

    def get_description(self) -> str:
        return "ForceSampler"

Usage:

tracer_provider = TracerProvider(resource=resource, sampler=ForceSampler())

Make a request by passing the baggage: force_sample=1 header.
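For example, with Python's stdlib (the URL is a placeholder):

```python
import urllib.request

# Build a request that carries the baggage header; the instrumented
# backend's propagator extracts it, and ForceSampler then picks up
# force_sample=1 from the parent context.
req = urllib.request.Request(
    "https://example.com/page",
    headers={"baggage": "force_sample=1"},
)

# urllib normalizes header names with str.capitalize()
assert req.get_header("Baggage") == "force_sample=1"
```

Calling urllib.request.urlopen(req) would then send the header with the request.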

@jpkrohling
Member

jpkrohling commented Mar 2, 2023

Alright, if we need to come back to this, let me know and I'll reopen this.

@imomaliev
Contributor Author

imomaliev commented Mar 3, 2023

@jpkrohling Hi. I think there was some confusion because the feature request and the reason for this request were not quite aligned. I still think having a "baggage" policy in tailsamplingprocessor would be beneficial. This is not related to my questions about the ForceSample feature.

Can we keep this issue open? If my questions about ForceSample are distracting from the "baggage" policy feature request, I would be happy to clean them up.

Also, I wanted to start a separate discussion about the ForceSample feature; could you please tell me what would be the best place for this?

@jpkrohling jpkrohling reopened this Mar 3, 2023
@jpkrohling
Member

The reason I closed this is that you seem to have been able to accomplish what you need without us implementing a new feature. Unless there's something you couldn't otherwise do, I would not implement a new sampling policy.

About forcing a trace to be sampled: I know we had a debug flag in Jaeger (https://github.com/jaegertracing/jaeger-client-java#debug-traces-forced-sampling), and I would have thought that the trace parent's sampled flag would be equivalent: https://www.w3.org/TR/trace-context/#sampled-flag . Did you try that and confirm that it doesn't work? If it doesn't, perhaps you could check first with the language-specific SDK you are using.

@imomaliev
Contributor Author

imomaliev commented Mar 6, 2023

Unless there's something you couldn't otherwise do, I would not implement a new sampling policy.

If we add a "baggage" policy to the tailsamplingprocessor, I can achieve the ForceSample functionality without a custom Sampler that sets tracestate. I would simply need to pass the baggage: force_sample=1 header on the initial request and filter based on that value in the tailsamplingprocessor.

I know we had a debug flag in Jaeger (https://github.com/jaegertracing/jaeger-client-java#debug-traces-forced-sampling)

Thanks, I will check this out.

I would have thought that the trace parent's sampled flag would be equivalent: https://www.w3.org/TR/trace-context/#sampled-flag

I also hoped at the beginning of my research that the sampled-flag was what I needed. But the sampled-flag has to be sent as part of a valid traceparent header, which means you have to create a trace on the initial request on the client, because you need a trace-id and a parent-id that are associated with a trace and a parent span with Sampled = 1 and Recording = 1.

Let's say my backend is instrumented and my frontend is not. If I wanted to rely on the sampled-flag, I would have to instrument my frontend as well, and also use some mechanism to circumvent any sampling configuration in the frontend instrumentation code.
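To illustrate what relying on the sampled-flag would require, here is a sketch of building a valid version-00 traceparent header client-side per https://www.w3.org/TR/trace-context/, with randomly generated IDs (the function name is made up):

```python
import secrets


def make_traceparent(sampled: bool = True) -> str:
    """Build a version-00 traceparent: version-traceid-parentid-flags.

    Note: the spec forbids all-zero trace-id/parent-id; the chance of
    secrets.token_hex producing one is negligible and ignored here.
    """
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 lowercase hex chars
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 lowercase hex chars
    flags = "01" if sampled else "00"  # bit 0 of trace-flags is the sampled flag
    return f"00-{trace_id}-{parent_id}-{flags}"
```

A client would have to send this header (and keep the IDs consistent with any spans it records), which is effectively instrumenting the frontend.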

@jpkrohling
Member

Not sure I'm following it entirely, but if you are sending a custom HTTP header with the sampling flag, surely you can also generate a UUID as the trace ID and send it along? You don't have to instrument your frontend for that.

@imomaliev
Contributor Author

imomaliev commented Mar 28, 2023

surely you can also generate a UUID as the trace ID and send it along

We could, but it would require ForceSample users to generate a UUID manually before setting the header. Also, it is quite cumbersome to set a self-generated trace_id when using SDKs for web frameworks like Django.

@jpkrohling
Member

Got it. I'm still not sure the proper solution involves baggage items, but perhaps you could start a thread on the #otel-collector Slack channel or join one of our SIG Collector meetings, so that we can have a wider discussion about the possible solutions?

@imomaliev
Contributor Author

imomaliev commented Mar 31, 2023

perhaps you could start a thread on the #otel-collector Slack channel or join one of our SIG Collector meetings

Sure, I will create a discussion in the Slack channel.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label May 31, 2023
@imomaliev
Contributor Author

@github-actions github-actions bot removed the Stale label Jun 1, 2023
@imomaliev
Contributor Author

Closing this issue for now. I hope the discussion in Slack will help us find a general solution for the ForceSample feature.
