Queued Retry Unusable without Batch Processor #1813

Closed
chris-smith-zocdoc opened this issue Sep 19, 2020 · 6 comments · Fixed by #1930
Labels
bug Something isn't working

Comments

@chris-smith-zocdoc
Contributor

Describe the bug
Queued Retry captures the incoming context when enqueuing an item. If that context completes before the item is dequeued, the spans are never delivered to the next processor/exporter. When queued_retry is early in the pipeline and receives its context from the receiver, it can never successfully dequeue items, because the receiver's context completes as soon as the HTTP response is sent to the client.

Steps to reproduce
Run 2 instances of the collector, one for receiving the spans via zipkin (http) and another for receiving via OTLP from the first instance.

I'm not sure if the type of receiver on instance 1 matters; the exporter type does, though. To be affected by the bug, the exporter must check the context to see whether it is cancelled/done. exporterhelper/common.go is affected.
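For illustration, here is a minimal, self-contained Go sketch of the failure mode (not collector code; queueItem and the channel-based queue are hypothetical stand-ins): the producer captures the request-scoped context at enqueue time, the request completes and cancels that context, and the consumer that dequeues the item later finds the context already done, so the send is rejected.

package main

import (
	"context"
	"fmt"
	"time"
)

// queueItem is a hypothetical stand-in for the queued_retry work item.
type queueItem struct {
	ctx   context.Context // captured at enqueue time
	spans string
}

func main() {
	queue := make(chan queueItem, 1)

	// "Receiver" side: enqueue using the request-scoped context, then
	// cancel it once the HTTP response has been written.
	reqCtx, cancel := context.WithCancel(context.Background())
	queue <- queueItem{ctx: reqCtx, spans: "batch-1"}
	cancel() // the zipkin request completes before the worker runs

	// "Queue consumer" side: dequeue later and try to export.
	item := <-queue
	time.Sleep(10 * time.Millisecond)
	if err := item.ctx.Err(); err != nil {
		// This is the branch a context-aware exporter hits, so the send
		// fails and is retried indefinitely.
		fmt.Println("export skipped:", err) // prints "export skipped: context canceled"
		return
	}
	fmt.Println("exported", item.spans)
}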

curl --location --request POST 'http://localhost:9411/api/v2/spans' \
--header 'Content-Type: application/json' \
--data-raw '[
    {
        "traceId": "55013b49f5b6b1cc",
        "parentId": "95f02a8fbed7421e",
        "id": "bb858095c38ea60c",
        "name": "/6",
        "timestamp": 1600448564521000,
        "duration": 1,
        "localEndpoint": {
            "serviceName": "spanfiller6",
            "ipv4": "192.168.2.6"
        },
        "tags": {
            "http.method": "GET",
            "http.url": "http://spanFiller6/6",
            "load_generator.seq_num": "36",
            "region": "us-east-1",
            "version": "v37"
        }
    }
]'

What did you expect to see?
The span should be sent to the OTLP receiver.

What did you see instead?
Spans are never delivered to the OTLP receiver; they are stuck in the queued_retry processor forever.

The individual errors look like this:

rpc error: code = Canceled desc = context canceled

The queued_retry processor logs this:

 WARN    queuedprocessor/queued_processor.go:335 Backing off before next attempt {"component_kind": "processor", "component_type": "queued_retry", "component_name": "queued_retry", "backoff_delay": "5s"}
go.opentelemetry.io/collector/processor/queuedprocessor.(*queuedProcessor).processItemFromQueue
        /Users/chris.smith/go/pkg/mod/go.opentelemetry.io/collector@v0.9.1-0.20200911135115-e886a01ebe2e/processor/queuedprocessor/queued_processor.go:335
go.opentelemetry.io/collector/processor/queuedprocessor.(*queuedProcessor).Start.func1
        /Users/chris.smith/go/pkg/mod/go.opentelemetry.io/collector@v0.9.1-0.20200911135115-e886a01ebe2e/processor/queuedprocessor/queued_processor.go:218
github.com/jaegertracing/jaeger/pkg/queue.(*BoundedQueue).StartConsumers.func1
        /Users/chris.smith/go/pkg/mod/github.com/jaegertracing/jaeger@v1.19.2/pkg/queue/bounded_queue.go:77

What version did you use?
go.opentelemetry.io/collector v0.9.1-0.20200911135115-e886a01ebe2e

What config did you use?

Zipkin Receiver (clients send here)

receivers:
  zipkin:
    endpoint: "0.0.0.0:9411"

exporters:
  otlp:
    endpoint: "$OTLP_HOSTNAME"
    insecure: true

processors:
  queued_retry:
    backoff_delay: 5s

service:
  pipelines:
    traces:
      receivers: [zipkin]
      processors: [queued_retry]
      exporters: [otlp]

OTLP receiver (first otel collector sends here)

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:55680

exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging]

Environment
Reproduced on my laptop, but encountered the issue originally on Amazon Linux

OS: OSX 10.14
Compiler: go version go1.14.3 darwin/amd64

@chris-smith-zocdoc chris-smith-zocdoc added the bug Something isn't working label Sep 19, 2020
@chris-smith-zocdoc chris-smith-zocdoc changed the title Queued Retry Unusable in without Batch Processor Queued Retry Unusable without Batch Processor Sep 19, 2020
@chris-smith-zocdoc
Contributor Author

Forgot to mention: the way the Batch processor alleviates this issue is by ignoring the incoming context and using context.Background() for the next processor.
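For reference, a minimal sketch of that pattern (hypothetical names, not the actual collector interfaces): the processor drops the caller's context and hands the next consumer a fresh context.Background(), so cancelling the original request no longer aborts the export.

package main

import (
	"context"
	"fmt"
)

// traces is a hypothetical stand-in for the collector's trace payload type.
type traces struct{ spanCount int }

// consumer mirrors the "next consumer" in the pipeline.
type consumer interface {
	ConsumeTraces(ctx context.Context, td traces) error
}

// decouplingProcessor ignores the incoming (request-scoped) context and
// passes context.Background() downstream, so queued work outlives the request.
type decouplingProcessor struct{ next consumer }

func (p *decouplingProcessor) ConsumeTraces(_ context.Context, td traces) error {
	return p.next.ConsumeTraces(context.Background(), td)
}

// printConsumer stands in for the downstream exporter.
type printConsumer struct{}

func (printConsumer) ConsumeTraces(ctx context.Context, td traces) error {
	if err := ctx.Err(); err != nil {
		return err // a cancelled context would abort the export here
	}
	fmt.Println("exported spans:", td.spanCount)
	return nil
}

func main() {
	reqCtx, cancel := context.WithCancel(context.Background())
	cancel() // simulate the receiver's request already being finished

	p := &decouplingProcessor{next: printConsumer{}}
	// The export still succeeds even though reqCtx is cancelled, because the
	// processor substituted context.Background().
	_ = p.ConsumeTraces(reqCtx, traces{spanCount: 1})
}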

@bogdandrutu
Member

@chris-smith-zocdoc do you want to give the per-exporter queued retry config a try instead (see #1821)? Let me know if that shows the same problem, and I will investigate.

@chris-smith-zocdoc
Contributor Author

@bogdandrutu yes, this affects the per-exporter queued retry config as well.

Example config:

receivers:
  zipkin:
    endpoint: "0.0.0.0:9411"

exporters:
  otlp:
    endpoint: "0.0.0.0:55680"
    insecure: true
    sending_queue:
      enabled: true

service:
  pipelines:
    traces:
      receivers: [zipkin]
      exporters: [otlp]

@bogdandrutu
Member

@chris-smith-zocdoc I was finally able to find the cause.

@chris-smith-zocdoc
Contributor Author

Thanks @bogdandrutu!

@bogdandrutu
Member

Let me know if that fixed your case; I assume it does, but I want to double-check.
