OTLP trace receiver not working #1977

Closed
giuliohome opened this issue Jul 30, 2022 · 20 comments
Labels
frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed.

Comments

giuliohome commented Jul 30, 2022

My code to send an OTLP trace is on GitHub.

I tried to feed the Agent and send to Tempo, either local or in the Cloud, but it doesn't work (I also wrote about this on Slack).

I resolved the issue by using a standard OTEL collector instead of the Agent, as per the very helpful @mdisibio blog post on Grafana Labs. I'm happy with this solution. If this is by design, please close this issue, but note and consider that the Agent documentation (see here and this blog) says it can receive OTLP traces using the grpc or http protocol.

giuliohome commented Jul 31, 2022

I don't know why @cfstras closed his similar issue, but my understanding is that Grafana Agent only accepts Prometheus-emitted histograms with exemplars:

Your applications need to include those trace IDs in their calls to emit metrics

Also, in the traces config documentation, they write:

In order to use the remote_write exporter, you have to configure a Prometheus instance in the Agent and pass its name to the metrics_instance field.

The point is that my simple app only emits traces and does not include the trace id in any metrics; I guess this is why the Agent doesn't receive such traces, even though they are standard OpenTelemetry traces. Indeed, they can be received (and sent anywhere, including Grafana Cloud) thanks to an OpenTelemetry collector instead of the Agent, as already explained in the above-mentioned @mdisibio blog post.
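Just to make the contrast concrete, this is roughly what "including trace IDs in the calls that emit metrics" would look like - a hypothetical sketch using prometheus_client and the OpenTelemetry API; nothing like this exists in my app, which only emits spans:

from opentelemetry import trace
from prometheus_client import Histogram

# Hypothetical metric carrying the current trace id as an exemplar.
# Exemplars are only exposed when Prometheus scrapes the OpenMetrics format.
REQUEST_LATENCY = Histogram("request_latency_seconds", "Request latency")

def record_latency(seconds):
    ctx = trace.get_current_span().get_span_context()
    if ctx.is_valid:
        # Attach the current trace id so the metric can be linked back to the trace.
        REQUEST_LATENCY.observe(seconds, exemplar={"trace_id": format(ctx.trace_id, "032x")})
    else:
        REQUEST_LATENCY.observe(seconds)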

cfstras commented Aug 1, 2022

My main error was trying to test the gRPC endpoint with curl -- which won't work. So as long as your OTLP receiver config has the gRPC port enabled and the insecure flag active, it should work without certificates.
Exemplars should not be necessary; my apps certainly don't generate any.
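Instead of curl, a few lines with the OpenTelemetry SDK make a usable smoke test for the gRPC port (a rough sketch, assuming the Python SDK and the grpc OTLP exporter are installed):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Point the exporter at the agent's OTLP/gRPC receiver, plain-text (no TLS).
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

with trace.get_tracer("smoke-test").start_as_current_span("ping"):
    pass

# Export errors are logged here if the receiver is unreachable or rejects the connection.
provider.force_flush()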

It might be helpful if you post the config you use for grafana agent, and the exact URL you set your trace exporter to.

giuliohome commented Aug 1, 2022

Hello sir, thanks for your help. I'm surprised you say the Agent works without exemplars (well, could the explanation be a missing insecure flag in the receiver part? I'm going to test it, more about it below: nope, there is no insecure flag there). I resorted to the standard OTEL collector and I'm happy with it, but I surely want to learn what I did wrong, and I really appreciate your help. I published all my Kubernetes files (well, I started with docker-compose, then I moved to Kubernetes to double-check; it's more or less the same thing, and it's easy to switch from one to the other. The only thing I obscured is my Grafana key... lol, actually I published it, but then I rewrote the git history in the last commit). They are in the grafana-cloud branch of my last public repo. In particular, you ask about the OTLP receiver port (I assume you are speaking about the Grafana Agent, correct? Because with the pure OTel collector I already said it's ok), so the agent config in k8s is this agent configmap.

apiVersion: v1
data:
          agent.yaml: |-
            server:
              log_level: debug
            traces:
                configs:
                - name: default
                  automatic_logging:
                    backend: stdout
                    roots: true
                    spans: true
                  remote_write:
                    - endpoint: otel-collector:4317
                      insecure: true
                  receivers:
                    otlp:
                      protocols:
                        grpc:
                            endpoint: 0.0.0.0:4317
                        http:
                            endpoint: 0.0.0.0:4318
kind: ConfigMap
metadata:
          creationTimestamp: null
          name: agent-yaml

Notice I can write otel-collector:4317 or tempo:4317 (depending on what I have in my Kubernetes cluster or in my docker-compose) or - without insecure - the Grafana Cloud url:

    remote_write:
      - endpoint: tempo-eu-west-0.grafana.net:443
        basic_auth:
          username: <MyUser>
          password: <MyKey>

Ehm, actually there is an insecure flag in the remote write part, but there is no insecure flag active in the receiver part. Now, to save money, I have destroyed the Kubernetes cluster, so I will copy the docker agent config from your issue and proceed with docker-compose. I will add that insecure flag there (indeed, you didn't include it in your initial comment). I will comment again when done. Hopefully this is the solution; in any case, thank you very much again for your help!!! Very much appreciated!!!

Edit - Please note
Nope, the insecure part is only in the remote write, and it is not needed there because I'm going to Grafana Cloud over TLS, while I can't see an insecure flag in the receiver part, so at the moment I confirm the Agent is not working for me in my context.

giuliohome commented Aug 1, 2022

Ehm, I am running your exact command:

docker run -it --rm -p 4317:4317 -p 4318:4318 -p 12345:12345 -v $(pwd):/cfg:ro grafana/agent:v0.24.1 -config.file=/cfg/config.yaml

with the only change to your config.yaml being the addition of an insecure flag:

      receivers:
        otlp:
          protocols:
            grpc:
              endpoint: 0.0.0.0:4317
              insecure: true
            http:
              endpoint: 0.0.0.0:4318

but the agent complains:

ts=2022-08-01T09:37:19.394972941Z caller=main.go:57 level=error msg="error creating the agent server entrypoint" err="failed to create tracing instance integrations: failed to create pipeline: failed to load otelConfig from agent traces config: failed to load OTel config: error reading receivers configuration for \"otlp\": 1 error(s) decoding:\n\n* 'protocols.grpc' has invalid keys: insecure"
giuliohome@DESKTOP-5Q65LFI:~/grafana/

Notice also that in my working otel collector, there is no insecure flag activated in the receiver part. AFAICS, the insecure part is only in the remote write, and it is not needed there because I'm going to secure Grafana Cloud over TLS.

As far as I can see, the issue is still open on my side.

giuliohome (Author) commented

It might be helpful if you post the config you use for grafana agent, and the exact URL you set your trace exporter to.

It would be the same one I'm using in the otel collector, so I would use the following in the agent to send to Grafana Cloud:

traces:
  configs:
  - name: default
    remote_write:
      - endpoint: tempo-eu-west-0.grafana.net:443
        basic_auth:
          username: <MyUser>
          password: <MyKey>

or I would set insecure for an internal destination, but neither is working for me.

giuliohome (Author) commented

In conclusion - AFAICS - I would still claim that the Agent remote write does not work without Prometheus metric exemplars, as opposed to the standard OpenTelemetry collector.

rfratto (Member) commented Aug 1, 2022

Could you paste your entire agent config which doesn't work, scrubbing out credentials?

The tracing parts of Grafana Agent use OpenTelemetry Collector internally, and it should work for forwarding OTLP trace data to an OTLP endpoint.

giuliohome commented Aug 1, 2022

Hello @rfratto, very pleased to meet you!
(Minor note: I understand that Grafana Agent uses the OTel Collector internally, but in the pure OTel Collector I have the exporters section, while in Grafana Agent I have remote_write; they seem to do different things... anyway, I'll reply to your question.)

Assuming a docker-compose situation, the agent.yaml (or config.yaml) is this:

server:
  http_listen_port: 12345
  log_level: debug
traces:
  configs:
    - name: integrations
      receivers:
        otlp:
          protocols:
            grpc:
              endpoint: 0.0.0.0:4317
            http:
              endpoint: 0.0.0.0:4318
      remote_write:
        - endpoint: tempo-eu-west-0.grafana.net:443
          basic_auth:
            username: 201581
            password: <my API key>

then I run

docker run -it --rm -p 4317:4317 -p 4318:4318 -p 12345:12345 -v $(pwd):/cfg:ro grafana/agent:v0.24.1 -config.file=/cfg/config.yaml

in an Ubuntu terminal window, and in another window I call my myinstr.py Python script against localhost like this (a simplified sketch of the relevant part of the script follows the list):

  • for OTLP gRPC - with insecure=True in my Python OTLPSpanExporter:
python3 myinstr.py  localhost:4317 go-cloud
  • or - in the case of OTLP HTTP:
python3 myinstr.py  http://localhost:4318/v1/traces go-cloud
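Roughly, the endpoint argument is handed to one of the two OTLP exporters; a simplified sketch of that part of the script (not the full thing, and the alias names are only for illustration):

import sys

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter as GrpcExporter
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter as HttpExporter

endpoint = sys.argv[1]
if endpoint.startswith("http"):
    # OTLP over HTTP: full URL including the /v1/traces path (port 4318).
    exporter = HttpExporter(endpoint=endpoint)
else:
    # OTLP over gRPC: host:port only (port 4317), plain-text, hence insecure=True.
    exporter = GrpcExporter(endpoint=endpoint, insecure=True)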

I have also tested grafana/agent:latest by commenting out the http_listen_port: 12345 line in config.yaml.

Full output below (until I break with Ctrl-C):

$ docker run -it --rm -p 4317:4317 -p 4318:4318 -p 12345:12345 -v $(pwd):/cfg:ro grafana/agent:latest -config.file=/cfg/config.yaml
ts=2022-08-01T14:03:48.517549661Z caller=server.go:191 level=info msg="server listening on addresses" http=127.0.0.1:12345 grpc=127.0.0.1:12346 http_tls_enabled=false grpc_tls_enabled=false
ts=2022-08-01T14:03:48.519388796Z caller=node.go:85 level=info agent=prometheus component=cluster msg="applying config"
ts=2022-08-01T14:03:48.523680234Z caller=remote.go:180 level=info agent=prometheus component=cluster msg="not watching the KV, none set"
ts=2022-08-01T14:03:48Z level=info caller=traces/traces.go:143 msg="Traces Logger Initialized" component=traces
ts=2022-08-01T14:03:48Z level=info caller=traces/instance.go:141 msg="shutting down receiver" component=traces traces_config=integrations
ts=2022-08-01T14:03:48Z level=info caller=traces/instance.go:141 msg="shutting down processors" component=traces traces_config=integrations
ts=2022-08-01T14:03:48Z level=info caller=traces/instance.go:141 msg="shutting down exporters" component=traces traces_config=integrations
ts=2022-08-01T14:03:48Z level=info caller=traces/instance.go:141 msg="shutting down extensions" component=traces traces_config=integrations
ts=2022-08-01T14:03:48Z level=info caller=builder/exporters_builder.go:255 msg="Exporter was built." component=traces traces_config=integrations kind=exporter name=otlp/0
ts=2022-08-01T14:03:48Z level=info caller=builder/exporters_builder.go:40 msg="Exporter is starting..." component=traces traces_config=integrations kind=exporter name=otlp/0
ts=2022-08-01T14:03:48Z level=info caller=builder/exporters_builder.go:48 msg="Exporter started." component=traces traces_config=integrations kind=exporter name=otlp/0
ts=2022-08-01T14:03:48Z level=info caller=builder/pipelines_builder.go:223 msg="Pipeline was built." component=traces traces_config=integrations name=pipeline name=traces
ts=2022-08-01T14:03:48Z level=info caller=builder/pipelines_builder.go:54 msg="Pipeline is starting..." component=traces traces_config=integrations name=pipeline name=traces
ts=2022-08-01T14:03:48Z level=info caller=builder/pipelines_builder.go:65 msg="Pipeline is started." component=traces traces_config=integrations name=pipeline name=traces
ts=2022-08-01T14:03:48Z level=info caller=builder/receivers_builder.go:226 msg="Receiver was built." component=traces traces_config=integrations kind=receiver name=otlp datatype=traces
ts=2022-08-01T14:03:48Z level=info caller=builder/receivers_builder.go:226 msg="Receiver was built." component=traces traces_config=integrations kind=receiver name=push_receiver datatype=traces
ts=2022-08-01T14:03:48Z level=info caller=builder/receivers_builder.go:68 msg="Receiver is starting..." component=traces traces_config=integrations kind=receiver name=otlp
ts=2022-08-01T14:03:48Z level=info caller=otlpreceiver/otlp.go:69 msg="Starting GRPC server on endpoint 0.0.0.0:4317" component=traces traces_config=integrations kind=receiver name=otlp
ts=2022-08-01T14:03:48Z level=info caller=otlpreceiver/otlp.go:87 msg="Starting HTTP server on endpoint 0.0.0.0:4318" component=traces traces_config=integrations kind=receiver name=otlp
ts=2022-08-01T14:03:48Z level=info caller=builder/receivers_builder.go:73 msg="Receiver started." component=traces traces_config=integrations kind=receiver name=otlp
ts=2022-08-01T14:03:48Z level=info caller=builder/receivers_builder.go:68 msg="Receiver is starting..." component=traces traces_config=integrations kind=receiver name=push_receiver
ts=2022-08-01T14:03:48Z level=info caller=builder/receivers_builder.go:73 msg="Receiver started." component=traces traces_config=integrations kind=receiver name=push_receiver
ts=2022-08-01T14:03:48.553535127Z caller=manager.go:224 level=debug msg="Applying integrations config changes"
ts=2022-08-01T14:03:48.559845072Z caller=manager.go:221 level=debug msg="Integrations config is unchanged skipping apply"
ts=2022-08-01T14:03:48.561716027Z caller=reporter.go:107 level=info msg="running usage stats reporter"
ts=2022-08-01T14:04:03.524881331Z caller=config_watcher.go:139 level=debug agent=prometheus component=cluster msg="waiting for next reshard interval" last_reshard=2022-08-01T14:04:03.524762866Z next_reshard=2022-08-01T14:05:03.524762866Z remaining=59.999990963s
^Cts=2022-08-01T14:04:52.998750687Z caller=gokit.go:72 level=info msg="=== received SIGINT/SIGTERM ===\n*** exiting"
ts=2022-08-01T14:04:53.000592908Z caller=node.go:319 level=info agent=prometheus component=cluster msg="shutting down node"
ts=2022-08-01T14:04:53.001082601Z caller=node.go:345 level=info agent=prometheus component=cluster msg="node shut down"
ts=2022-08-01T14:04:53.001188893Z caller=node.go:169 level=info agent=prometheus component=cluster msg="node run loop exiting"
ts=2022-08-01T14:04:53.001604375Z caller=config_watcher.go:104 level=info agent=prometheus component=cluster msg="config watcher run loop exiting"
ts=2022-08-01T14:04:53Z level=info caller=traces/instance.go:141 msg="shutting down receiver" component=traces traces_config=integrations
ts=2022-08-01T14:04:53.001852658Z caller=cleaner.go:234 level=debug agent=prometheus component=cleaner msg="stopping cleaner..."
ts=2022-08-01T14:04:53Z level=info caller=traces/instance.go:141 msg="shutting down processors" component=traces traces_config=integrations
ts=2022-08-01T14:04:53Z level=info caller=builder/pipelines_builder.go:73 msg="Pipeline is shutting down..." component=traces traces_config=integrations name=pipeline name=traces
ts=2022-08-01T14:04:53Z level=info caller=builder/pipelines_builder.go:77 msg="Pipeline is shutdown." component=traces traces_config=integrations name=pipeline name=traces
ts=2022-08-01T14:04:53Z level=info caller=traces/instance.go:141 msg="shutting down exporters" component=traces traces_config=integrations
ts=2022-08-01T14:04:53Z level=info caller=traces/instance.go:141 msg="shutting down extensions" component=traces traces_config=integrations
ts=2022-08-01T14:04:53.004370868Z caller=main.go:67 level=info msg="agent exiting"

and here is the other window in the meantime - in the case of OTLP gRPC (not HTTP):

$ python3 myinstr.py  localhost:4317 go-cloud
trace id: e8439efa768b28d67a47e163e56977d4
{
    "name": "test_step_123_go-cloud",
    "context": {
        "trace_id": "0xe8439efa768b28d67a47e163e56977d4",
        "span_id": "0x49873061f67f1169",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0x8ee88051deb833eb",
    "start_time": "2022-08-01T14:04:03.853418Z",
    "end_time": "2022-08-01T14:04:03.853799Z",
    "status": {
        "status_code": "OK"
    },
    "attributes": {},
    "events": [
        {
            "name": "log",
            "timestamp": "2022-08-01T14:04:03.853501Z",
            "attributes": {
                "roll.sides": 10,
                "roll.result": 6
            }
        },
        {
            "name": "log",
            "timestamp": "2022-08-01T14:04:03.853688Z",
            "attributes": {
                "roll.sides": 10,
                "roll.result": 8
            }
        }
    ],
    "links": [],
    "resource": {
        "service.name": "service-pyshell"
    }
}
{
    "name": "test_case_ABC_go-cloud",
    "context": {
        "trace_id": "0xe8439efa768b28d67a47e163e56977d4",
        "span_id": "0x8ee88051deb833eb",
        "trace_state": "[]"
    },
    "kind": "SpanKind.SERVER",
    "parent_id": null,
    "start_time": "2022-08-01T14:04:03.853254Z",
    "end_time": "2022-08-01T14:04:03.853833Z",
    "status": {
        "status_code": "OK"
    },
    "attributes": {
        "url": "myurl1"
    },
    "events": [],
    "links": [],
    "resource": {
        "service.name": "service-pyshell"
    }
}

rfratto commented Aug 1, 2022

Thanks! Can you also post the OpenTelemetry Collector config which did work for you?

but in the pure Otel Collector I have the exporters section, in Grafana Agent I have the remote_write, they seem to do different things... anyway, I reply to your question

That's true; at the time we called the section remote_write because we wanted to be consistent with how metrics are configured, but the remote_write section for traces gets transformed into a config for an OTLP exporter.

giuliohome commented Aug 1, 2022

Here is the OpenTelemetry Collector otel-config.yaml which did work for me:

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  logging:
    loglevel: debug
  otlp:
    endpoint: tempo-eu-west-0.grafana.net:443
    headers:
      authorization: Basic MyToken

processors:
  batch:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, otlp]

MyToken is obtained by following the instructions in the mentioned blog:

$ echo -n "<your user id>:<your api key>" | base64
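Equivalently, in Python (just a convenience sketch; the placeholders are the same as in the shell command above):

import base64

# Build the Basic auth token from "<your user id>:<your api key>".
token = base64.b64encode(b"<your user id>:<your api key>").decode()
print(f"authorization: Basic {token}")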

The command to run docker

docker run -it --rm -p 4317:4317 -p 4318:4318 -p 12345:12345 -v $(pwd):/cfg:ro otel/opentelemetry-collector-contrib-dev:latest --config=/cfg/otel-config.yaml

The command to run myinstr.py is the same as above.

Full output of docker until I break it

docker run -it --rm -p 4317:4317 -p 4318:4318 -p 12345:12345 -v $(pwd):/cfg:ro otel/opentelemetry-collector-contrib-dev:latest --config=/cfg/otel-config.yaml
2022/08/01 16:23:16 proto: duplicate proto type registered: jaeger.api_v2.PostSpansRequest
2022/08/01 16:23:16 proto: duplicate proto type registered: jaeger.api_v2.PostSpansResponse
2022-08-01T16:23:16.125Z        info    builder/exporters_builder.go:255        Exporter was built.     {"kind": "exporter", "name": "otlp"}
2022-08-01T16:23:16.125Z        info    builder/exporters_builder.go:255        Exporter was built.     {"kind": "exporter", "name": "logging"}
2022-08-01T16:23:16.126Z        info    builder/pipelines_builder.go:224        Pipeline was built.     {"kind": "pipeline", "name": "traces"}
2022-08-01T16:23:16.126Z        info    builder/receivers_builder.go:226        Receiver was built.     {"kind": "receiver", "name": "otlp", "datatype": "traces"}
2022-08-01T16:23:16.126Z        info    service/telemetry.go:109        Setting up own telemetry...
2022-08-01T16:23:16.127Z        info    service/telemetry.go:129        Serving Prometheus metrics      {"address": ":8888", "level": "basic", "service.instance.id": "4fb5c53c-db2d-4a77-9868-825863a6c190", "service.version": "latest"}
2022-08-01T16:23:16.127Z        info    service/service.go:76   Starting extensions...
2022-08-01T16:23:16.127Z        info    service/service.go:81   Starting exporters...
2022-08-01T16:23:16.127Z        info    builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "otlp"}
2022-08-01T16:23:16.128Z        info    builder/exporters_builder.go:48 Exporter started.       {"kind": "exporter", "name": "otlp"}
2022-08-01T16:23:16.128Z        info    builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "logging"}
2022-08-01T16:23:16.128Z        info    builder/exporters_builder.go:48 Exporter started.       {"kind": "exporter", "name": "logging"}
2022-08-01T16:23:16.128Z        info    service/service.go:86   Starting processors...
2022-08-01T16:23:16.128Z        info    builder/pipelines_builder.go:54 Pipeline is starting... {"kind": "pipeline", "name": "traces"}
2022-08-01T16:23:16.128Z        info    builder/pipelines_builder.go:65 Pipeline is started.    {"kind": "pipeline", "name": "traces"}
2022-08-01T16:23:16.128Z        info    service/service.go:91   Starting receivers...
2022-08-01T16:23:16.129Z        info    builder/receivers_builder.go:68 Receiver is starting... {"kind": "receiver", "name": "otlp"}
2022-08-01T16:23:16.129Z        info    otlpreceiver/otlp.go:70 Starting GRPC server on endpoint 0.0.0.0:4317   {"kind": "receiver", "name": "otlp"}
2022-08-01T16:23:16.129Z        info    builder/receivers_builder.go:73 Receiver started.       {"kind": "receiver", "name": "otlp"}
2022-08-01T16:23:16.129Z        info    service/collector.go:251        Starting otelcontribcol...      {"Version": "40d055d", "NumCPU": 2}
2022-08-01T16:23:16.129Z        info    service/collector.go:146        Everything is ready. Begin running and processing data.
2022-08-01T16:24:39.099Z        INFO    loggingexporter/logging_exporter.go:42  TracesExporter  {"#spans": 2}
2022-08-01T16:24:39.099Z        DEBUG   loggingexporter/logging_exporter.go:51  ResourceSpans #0
Resource SchemaURL:
Resource labels:
     -> service.name: STRING(service-pyshell)
ScopeSpans #0
ScopeSpans SchemaURL:
InstrumentationScope pyshell
Span #0
    Trace ID       : d6c53e27f97f5c4faa9670601cc509ac
    Parent ID      : efd153c34f6f8da9
    ID             : 70ca7bf3e7252d57
    Name           : test_step_123_go-cloud
    Kind           : SPAN_KIND_INTERNAL
    Start time     : 2022-08-01 16:24:38.957808982 +0000 UTC
    End time       : 2022-08-01 16:24:38.957878534 +0000 UTC
    Status code    : STATUS_CODE_OK
    Status message :
Events:
SpanEvent #0
     -> Name: log
     -> Timestamp: 2022-08-01 16:24:38.957846674 +0000 UTC
     -> DroppedAttributesCount: 0
     -> Attributes:
         -> roll.sides: INT(10)
         -> roll.result: INT(8)
SpanEvent #1
     -> Name: log
     -> Timestamp: 2022-08-01 16:24:38.957861762 +0000 UTC
     -> DroppedAttributesCount: 0
     -> Attributes:
         -> roll.sides: INT(10)
         -> roll.result: INT(2)
Span #1
    Trace ID       : d6c53e27f97f5c4faa9670601cc509ac
    Parent ID      :
    ID             : efd153c34f6f8da9
    Name           : test_case_ABC_go-cloud
    Kind           : SPAN_KIND_SERVER
    Start time     : 2022-08-01 16:24:38.957572621 +0000 UTC
    End time       : 2022-08-01 16:24:38.95790237 +0000 UTC
    Status code    : STATUS_CODE_OK
    Status message :
Attributes:
     -> url: STRING(myurl1)


^C2022-08-01T16:26:40.016Z      info    service/collector.go:177        Received signal from OS {"signal": "interrupt"}
2022-08-01T16:26:40.016Z        info    service/collector.go:267        Starting shutdown...
2022-08-01T16:26:40.016Z        info    service/service.go:111  Stopping receivers...
2022-08-01T16:26:40.017Z        info    service/service.go:116  Stopping processors...
2022-08-01T16:26:40.017Z        info    builder/pipelines_builder.go:73 Pipeline is shutting down...    {"kind": "pipeline", "name": "traces"}
2022-08-01T16:26:40.017Z        info    builder/pipelines_builder.go:77 Pipeline is shutdown.   {"kind": "pipeline", "name": "traces"}
2022-08-01T16:26:40.017Z        info    service/service.go:121  Stopping exporters...
2022-08-01T16:26:40.018Z        info    service/service.go:126  Stopping extensions...
2022-08-01T16:26:40.018Z        info    service/collector.go:281        Shutdown complete.

And finally my awesome trace on Grafana Cloud! 😄 🎆

(screenshot: the trace shown in Grafana Cloud)

rfratto commented Aug 1, 2022

Thanks!

That's interesting, I wonder if it's because you don't have the batch processor defined on the agent equivalent, though I don't understand why that would matter.

server:
  http_listen_port: 12345
  log_level: debug
traces:
  configs:
    - name: integrations
      # Configure batch processor 
      batch: 
      receivers:
        otlp:
          protocols:
            grpc:
              endpoint: 0.0.0.0:4317
            http:
              endpoint: 0.0.0.0:4318
      remote_write:
        - endpoint: tempo-eu-west-0.grafana.net:443
          basic_auth:
            username: 201581
            password: <my API key>

@mapno would you expect traces to not show up in tempo if the batch processor isn't declared?

rfratto commented Aug 1, 2022

FWIW, if you're happy with using OpenTelemetry Collector, you should by all means continue using it :) But I appreciate you taking the time to help us get to the bottom of why the Grafana Agent config wasn't working for you, so we can help other people that might run into the same issue.

giuliohome (Author) commented

My wild guess at this point is that this was a bug in an older release of the OpenTelemetry collector that has been fixed in a release newer than the one used to produce the Agent image. But I don't know if this really makes sense.

Yes, as you said, I'm happy with the Collector, but I think it would also be great to have the Agent working in this situation. Not strictly needed though. Yes, I confirm that on my side this is resolved with the Collector.
Thanks for your help again.

giuliohome commented Aug 1, 2022

That's interesting, I wonder if it's because you don't have the batch processor defined on the agent equivalent, though I don't understand why that would matter.

I commented that part out and indeed it doesn't matter; sorry for including it in the minimal repro.

My wild guess at this point is that this was a bug in an older release of the opentelemetry collector that has been fixed in a release maybe newer than the one used to produce the Agent image.

Well, you recently upgraded from v0.46.0 to v0.55.0, and the former version of the collector (namely otel/opentelemetry-collector-contrib:0.46.0) does not work.
So I think we have got to the bottom of this; let me know if you agree and feel free to close this as you prefer.

Edit

What's even more important is that otel/opentelemetry-collector-contrib:0.55.0 and otel/opentelemetry-collector-contrib:latest - i.e. from version 55 of the contrib collector onwards - are working OK (for my use case), but otel/opentelemetry-collector:latest - i.e. even the latest version of the core collector - is not working.
I'm a bit perplexed about this... but hopefully that's still fine for me...

Looking at your go.mod I see that you do use the contrib version for e.g. the jaeger receiver and exporter, but oddly not in general, hence not for the otlp receiver/exporter. I imagine you could "fix" this by using the contrib version also for the otlp part (or something like that...).

giuliohome commented Aug 2, 2022

Wait a moment, why is the latest version of the core collector not working?
Because the docker latest tag is not updated!
Actually otel/opentelemetry-collector:0.56.0 does work! (Minor note: after re-pulling latest it now works too; they finally updated it 6 days ago to version 56.)
Version 55 also works, but 46 does not - this is confirmed (after the latest tag introduced some confusion) and it is maybe the only valid explanation of why the Agent is not working (and will work soon, I think).

giuliohome commented Aug 2, 2022

Yeah, grafana/agent:main-53e8cf3 is working!!!

And so is grafana/agent:main:

docker run -it --rm -p 4317:4317 -p 4318:4318 -p 12345:12345 -v $(pwd):/cfg:ro grafana/agent:main -config.file=/cfg/config.yaml

giuliohome added a commit to giuliohome/pytest-otel-jaeger-kubernetes that referenced this issue Aug 2, 2022

giuliohome commented Aug 3, 2022

Reference to otel collector core issue and changelog

Docker tags

NOTICE that your latest tag on Docker Hub for grafana/agent is "Last pushed 9 days ago", with agent release v0.26 pointing to collector v0.46; hence it is still not working at the time of writing, whilst the main tag is already OK.

giuliohome pushed a commit to giuliohome/pytest-otel-jaeger-kubernetes that referenced this issue Aug 7, 2022
daper (Contributor) commented Aug 12, 2022

Hello, I also faced the same issue here, and it looks like it's fixed on main. Could you make an agent release? Not feeling very comfortable running main 😅 . Thanks!

giuliohome commented Aug 25, 2022

I agree with @daper: better to consider the issue "open" until it's fixed on latest.

BTW, there is also another subtle aspect that is not very clear to me.
In principle, I should be able to send a trace to Grafana Cloud (Tempo SaaS) without the collector (even if it is better to use the agent or the collector).

But code like this

no_otel_collector_url = "https://tempo-eu-west-0.grafana.net"
otlp_exporter = OTLPSpanExporter(endpoint=no_otel_collector_url, headers={"authorization": "Basic myencodedtoken"})

while correctly passing authentication, still results in no trace shown in the Grafana Cloud Explore search.
Ok, this aspect is not strictly related to the agent itself but more to Grafana Cloud (Tempo Traces)... that having been said, I wonder why a trace sent straight to Grafana Cloud without the collector is silently dropped...

giuliohome (Author) commented

@daper I've tested that now - with the new version 1.12 of the Python OpenTelemetry SDK - even the latest tag of grafana agent (though not yet updated) is working!

Same solution as for this tempo straight ingestion issue.

@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Feb 22, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 22, 2024