Envoy dropping connections to in_opentelemetry #8742

Open
edsiper opened this issue Apr 19, 2024 · 3 comments

Comments


edsiper commented Apr 19, 2024

Bug Report

In a local test between Envoy and a Fluent Bit upstream, Envoy cannot complete the gRPC session and reports grpc-status 14 (UNAVAILABLE).

The following is the Envoy configuration being used:

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: AUTO
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                direct_response:
                  status: 200
                  body:
                    inline_string: "Hello, World!"
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          tracing:
            provider:
              name: envoy.tracers.opentelemetry
              typed_config:
                "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
                grpc_service:
                  envoy_grpc:
                    cluster_name: otel-collector
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog

  clusters:
  - name: otel-collector
    connect_timeout: 1s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          validation_context:
            # This configures the context to not verify the peer certificate.
            trust_chain_verification: ACCEPT_UNTRUSTED
    http2_protocol_options: {}
    load_assignment:
      cluster_name: otel-collector
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 4317
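Note that in recent Envoy releases the cluster-level `http2_protocol_options` field used above is deprecated in favor of `typed_extension_protocol_options`. A sketch of the equivalent setting (assuming a current Envoy version; adjust to the version actually in use):

```yaml
# Equivalent HTTP/2 upstream setting in the newer form, placed at the
# cluster level in place of the deprecated http2_protocol_options field.
typed_extension_protocol_options:
  envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
    "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
    explicit_http_config:
      http2_protocol_options: {}
```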

Envoy usage

Run it with the following command:

envoy -c envoy-traces.yaml --log-level debug

Fluent Bit

Run it locally with (inside fluent-bit/build):

bin/fluent-bit -i opentelemetry \
    -p port=4317 \
    -p tls=on \
    -p tls.verify=off \
    -p tls.crt_file=../tests/runtime_shell/tls/certificate.pem \
    -p tls.key_file=../tests/runtime_shell/tls/private_key.pem \
    -p tls.debug=5 \
    -o stdout -vv
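For reference, the same pipeline can be expressed as a classic-mode Fluent Bit configuration file (a sketch built from the command-line flags above; the file name `otel.conf` is illustrative):

```
[INPUT]
    name          opentelemetry
    port          4317
    tls           on
    tls.verify    off
    tls.crt_file  ../tests/runtime_shell/tls/certificate.pem
    tls.key_file  ../tests/runtime_shell/tls/private_key.pem
    tls.debug     5

[OUTPUT]
    name   stdout
    match  *
```

Run it with `bin/fluent-bit -c otel.conf -vv`.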

Envoy fails with the following information:

[2024-04-19 15:19:55.367][25505973][debug][router] [source/common/router/router.cc:1332] [Tags: "ConnectionId":"0","StreamId":"16569461505730323531"] upstream reset: reset reason: connection timeout, transport failure reason:
[2024-04-19 15:19:55.367][25505973][debug][http] [source/common/http/async_client_impl.cc:106] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: connection timeout'
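The `grpc-status` trailer value 14 maps to UNAVAILABLE in the gRPC status-code table, i.e. a transient transport failure: Envoy's internal gRPC client could not establish or keep the connection to the `otel-collector` cluster before receiving headers, consistent with a TLS handshake problem between Envoy and the Fluent Bit listener. A minimal sketch for decoding the numeric code (the `status_name` helper is hypothetical, but the code-to-name mapping follows the gRPC specification):

```python
# Map the numeric grpc-status trailer seen in the Envoy log to its
# symbolic name, per the gRPC status-code table (partial listing).
GRPC_STATUS = {
    0: "OK",
    1: "CANCELLED",
    2: "UNKNOWN",
    4: "DEADLINE_EXCEEDED",
    13: "INTERNAL",
    14: "UNAVAILABLE",  # connect error / reset before headers, as in this issue
}

def status_name(code: int) -> str:
    """Return the symbolic gRPC status name for a numeric code."""
    return GRPC_STATUS.get(code, f"UNRECOGNIZED({code})")

print(status_name(14))  # -> UNAVAILABLE
```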

edsiper commented Apr 24, 2024

A fix for this bug is in progress.


edsiper commented Apr 24, 2024

CTraces fix: fluent/ctraces#53


edsiper commented Apr 26, 2024

Merging fix through #8768
