
No observability into failures at listener_filter level #12809

Closed
howardjohn opened this issue Aug 25, 2020 · 0 comments · Fixed by #12935

@howardjohn commented:

I am trying to get some sort of observability into failures at the listener_filter level. Specifically, we commonly see issues where users run server-first protocols, causing the inspectors to time out (if a timeout is set) or hang indefinitely (if none is set).
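This is not Envoy code, but a minimal Python sketch of the failure mode: a sniffing listener filter (like `tls_inspector` or `http_inspector`) peeks at the first client bytes to classify the connection, while a client of a server-first protocol (SMTP, MySQL, ...) waits for the server banner and sends nothing on its own. All names here (`sniff_first_bytes`, etc.) are hypothetical illustrations, not Envoy APIs:

```python
import socket

def sniff_first_bytes(conn, timeout_s):
    """Toy stand-in for a sniffing listener filter: peek at the first
    client bytes to classify the protocol, without consuming them."""
    conn.settimeout(timeout_s)
    try:
        return conn.recv(16, socket.MSG_PEEK)  # peek, don't consume
    except socket.timeout:
        return None  # client never spoke first

# A toy "proxy" listener and a server-first client that sends nothing.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

client = socket.socket()
client.connect(("127.0.0.1", port))
conn, _ = listener.accept()

# With a finite timeout the inspector gives up; with the timeout
# disabled (listener_filters_timeout: 0s below) it would block forever.
print(sniff_first_bytes(conn, timeout_s=0.2))  # -> None

client.close(); conn.close(); listener.close()
```

The point of the sketch: the inspector's wait is invisible to the rest of the proxy, so whether it times out or hangs, nothing downstream (filter chain, access logs) ever runs.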

Config:

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      protocol: TCP
      address: 127.0.0.1
      port_value: 9901
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 10000
    listener_filters:
    - name: envoy.filters.listener.tls_inspector
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
    - name: envoy.filters.listener.http_inspector
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.http_inspector.v3.HttpInspector
    listener_filters_timeout: 0s
    access_log:
    - name: envoy.access_loggers.file
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
        path: "/dev/stdout"
        format: "[%START_TIME%] listener %RESPONSE_CODE% %RESPONSE_FLAGS% %DOWNSTREAM_LOCAL_ADDRESS%
          %DOWNSTREAM_REMOTE_ADDRESS%\n"
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  host_rewrite: www.google.com
                  cluster: service_google
                typed_per_filter_config:
                  envoy.fault:
                    '@type': type.googleapis.com/envoy.config.filter.http.fault.v2.HTTPFault
                    abort:
                      http_status: 409
                      percentage:
                        numerator: 100
                        denominator: HUNDRED
          http_filters:
          - name: envoy.router
  clusters:
  - name: service_google
    connect_timeout: 0.25s
    type: LOGICAL_DNS
    dns_lookup_family: V4_ONLY
    dns_refresh_rate: 10000s
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_google
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: www.google.com
                port_value: 443
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext
        sni: www.google.com
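For reference, `listener_filters_timeout: 0s` in the config above disables the timeout entirely, which is the "hang" case. A hedged sketch of bounding the wait instead (values are illustrative; `continue_on_listener_filters_timeout` is an existing listener option that hands the connection to the filter chain even when the inspectors never finished):

```yaml
    # Variant of the listener above: bound the inspector wait.
    listener_filters_timeout: 5s
    # On timeout, proceed to the filter chain anyway instead of closing,
    # which can help server-first protocols make progress.
    continue_on_listener_filters_timeout: true
```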

Ran with:

docker run -p 10000:10000 -v $HOME/kube/local:/config --entrypoint envoy envoyproxy/envoy-dev -c /config/envoy.yaml -l trace --log-format-prefix-with-location 0

then connected with nc localhost 10000.

Trace logs, with listener filter timeout:

[debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:78] tls inspector: new connection accepted
[trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:172] tls inspector: recv: -1
[debug][conn_handler] [source/server/connection_handler_impl.cc:227] listener filter times out after 1000 ms

Trace logs, with timeout on client side (closes connection):

[debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:78] tls inspector: new connection accepted
[trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:172] tls inspector: recv: -1
[trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:193] tls inspector: done: false
[trace][main] [source/common/event/dispatcher_impl.cc:170] item added to deferred deletion list (size=1)
[trace][main] [source/common/event/dispatcher_impl.cc:78] clearing deferred deletion list (size=1)

In both cases, we can see that no access logs are emitted.

cc @kyessenov

ggreenway pushed a commit that referenced this issue Sep 3, 2020
Emit logs when listener filters reject or time out incoming connections. Moved the stream info ownership to the active TCP socket. There are three cases that have to be handled separately: ownership transfer to the TCP connection (how it works today), deferred deletion of the TCP socket (via unlinking), and direct deletion of the TCP socket.

Fixes: #12809

Signed-off-by: Kuat Yessenov <kuat@google.com>