Load Balancing by Attribute #33660

Open
danielbanks opened this issue Jun 19, 2024 · 4 comments
Labels
enhancement New feature or request exporter/loadbalancing needs triage New item requiring triage

Comments

@danielbanks

Component(s)

exporter/loadbalancing

Is your feature request related to a problem? Please describe.

In our project we want to set up load-balanced sampling based on a session ID attribute.

In our project, we care about RUM and we have the concept of a user session with a session ID set as an attribute on our traces. We sample based on the session ID rather than the trace ID. This way we preserve telemetry of the whole user session.

The sampling is working fine but it is client-side head sampling. As we look to scale our solution we want to move this to the collector and introduce load balancing.

How do we achieve session sampling?

Right now we use the same ID generator for session IDs as for trace IDs, and we set the session ID as an attribute. We then have a head sampling strategy that uses the same logic as the probabilistic head sampler, but rather than applying the decision to the trace ID we apply it to the session ID. This ensures we make the same sampling decision for the whole user session.
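To illustrate the idea, here is a minimal sketch of a probabilistic decision keyed on the session ID instead of the trace ID. The function name `sampleSession`, the FNV-1a hash, and the 10,000-bucket scheme are assumptions made for this example only; they are not our production code and not the algorithm used by any OpenTelemetry sampler.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// sampleSession reports whether a session should be kept at the given
// percentage (0-100). The decision depends only on the session ID, so every
// span that carries the same ID gets the same answer.
// NOTE: FNV-1a and the bucket count are illustrative assumptions.
func sampleSession(sessionID string, percent float64) bool {
	h := fnv.New32a()
	h.Write([]byte(sessionID)) // Write on a hash.Hash never returns an error
	bucket := h.Sum32() % 10000 // bucket in [0, 9999]
	return float64(bucket) < percent*100
}

func main() {
	// Hypothetical session IDs, sampled at 25%.
	for _, id := range []string{"session-a", "session-b", "session-c"} {
		fmt.Printf("%s sampled=%v\n", id, sampleSession(id, 25))
	}
}
```

Because the decision is a pure function of the session ID, any process that evaluates it reaches the same result for a given session.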

What we would like to do is move these sampling decisions off the client and into the collector so that we have more flexibility. Our client is an Android application and making these decisions client side is not a long-term solution because we have to deal with application updates etc.

Following the recommended practice, we would like to have a two-layer collector setup, with the first layer load balancing across the second. The issue is that the load balancer only supports routing decisions based on trace ID or service name.

Given that we want to sample based on session ID (an attribute), then making load balancing decisions on trace ID alone is not enough. We need to load balance telemetry with the same session ID to the same collector instance so that consistent sampling decisions can be made.
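As a rough sketch of the behaviour we are after, the snippet below maps a session ID to one backend so that all telemetry for a session lands on the same instance. The function `pickBackend`, the backend names, and the use of simple modulo hashing are assumptions for illustration; the snippet does not reflect how the load balancing exporter is implemented.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickBackend maps a routing key (here a session ID) to one of the backends.
// The same key always maps to the same backend while the list is unchanged.
// It returns "" when no backends are configured.
// NOTE: plain modulo hashing is an illustrative assumption.
func pickBackend(routingKey string, backends []string) string {
	if len(backends) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(routingKey))
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	// Hypothetical second-layer collector endpoints.
	backends := []string{"collector-0:4317", "collector-1:4317", "collector-2:4317"}
	for _, session := range []string{"session-a", "session-b", "session-a"} {
		fmt.Printf("%s -> %s\n", session, pickBackend(session, backends))
	}
}
```

With modulo hashing, adding or removing a backend remaps most keys, which is one reason a real balancer would likely prefer a consistent-hashing scheme.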

It doesn't look like the load balancer currently supports balancing based on an attribute. This is a friendly request to add it!

Describe the solution you'd like

The ability to route telemetry based on attributes, in addition to service name and trace ID.

Describe alternatives you've considered

No response

Additional context

No response

@danielbanks danielbanks added enhancement New feature or request needs triage New item requiring triage labels Jun 19, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jpkrohling
Member

I believe there are a couple of comments to this:

  1. balancing based on an arbitrary attribute is doable, and we are doing that already for the service name. It should be easy to extend this function here to do that:
    func routingIdentifiersFromTraces(td ptrace.Traces, key routingKey) (map[string]bool, error) {
        ids := make(map[string]bool)
        rs := td.ResourceSpans()
        if rs.Len() == 0 {
            return nil, errors.New("empty resource spans")
        }
        ils := rs.At(0).ScopeSpans()
        if ils.Len() == 0 {
            return nil, errors.New("empty scope spans")
        }
        spans := ils.At(0).Spans()
        if spans.Len() == 0 {
            return nil, errors.New("empty spans")
        }
        if key == svcRouting {
            for i := 0; i < rs.Len(); i++ {
                svc, ok := rs.At(i).Resource().Attributes().Get("service.name")
                if !ok {
                    return nil, errors.New("unable to get service name")
                }
                ids[svc.Str()] = true
            }
            return ids, nil
        }
        tid := spans.At(0).TraceID()
        ids[string(tid[:])] = true
        return ids, nil
    }
  2. I'm not quite sure you need two layers: if you are doing probabilistic sampling based on the session ID, it's pretty much the same idea we have for the probabilistic sampling at the collector, which means that it can be consistent across collector instances without the need to centralize all session IDs on the same decision instances. So, you might not need the balancer to know about session IDs at all
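To make the first point more concrete, here is a self-contained sketch of how the service-name branch could generalize to an arbitrary attribute key. It deliberately uses a plain map type instead of the collector's pdata types so it runs on its own; the `resource` type, the function name `routingIdentifiersFromAttribute`, and the `session.id` key are all hypothetical. It also assumes the attribute lives on the resource, which may not match where the session ID is actually set.

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a hypothetical stand-in for a resource's attributes.
type resource map[string]string

// routingIdentifiersFromAttribute collects the distinct values of attrKey
// across all resources, mirroring the shape of the service-name branch.
func routingIdentifiersFromAttribute(resources []resource, attrKey string) (map[string]bool, error) {
	if len(resources) == 0 {
		return nil, errors.New("empty resources")
	}
	ids := make(map[string]bool)
	for _, r := range resources {
		v, ok := r[attrKey]
		if !ok {
			return nil, fmt.Errorf("unable to get attribute %q", attrKey)
		}
		ids[v] = true
	}
	return ids, nil
}

func main() {
	batch := []resource{
		{"service.name": "android-app", "session.id": "s-1"},
		{"service.name": "android-app", "session.id": "s-2"},
	}
	ids, err := routingIdentifiersFromAttribute(batch, "session.id")
	fmt.Println(len(ids), err)
}
```

A real change would also need to decide what to do when the attribute is missing, for example falling back to trace ID routing rather than returning an error.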

@danielbanks
Author

danielbanks commented Jul 2, 2024

Thanks for the reply @jpkrohling. That's useful insight.

I'd like to move our probabilistic sampling of sessions into the collector rather than doing it client-side. However, the sampler configuration only allows custom attributes to be specified for logs, not for traces. Our target solution is load-balanced telemetry across logs and traces, sampled by complete session. We want to observe users' sessions so that we can understand the full journey.

Do you have any recommendations for how this can be achieved with the current tooling?

@jpkrohling
Member

Take a look at the code for the probabilistic sampling processor at contrib. It could be changed to use specific attributes instead of trace ID, which would be sufficient for your use case, if I'm understanding it correctly.


2 participants