External Service monitoring via istio_* metrics and Kiali? #33336

Closed
diranged opened this issue Jun 8, 2021 · 16 comments
Labels
area/extensions and telemetry, kind/docs, lifecycle/automatically-closed, lifecycle/stale

Comments

@diranged
Contributor

diranged commented Jun 8, 2021

Bug description

This may be a question, or it may be a bug report; I am not sure. We're hoping to use Istio not only to manage Envoy for an internal service mesh within our clusters, but also to manage Envoy configurations that help our applications connect to externally hosted services (AWS ElastiCache, RDS, etc).

Our existing non-Kubernetes applications use Envoy with the RedisProxy filter to hide from our applications the fact that they are really connecting to large Redis clusters. We typically set enable_redirection: true and read_policy: ANY on our listeners. This allows a dumb client (one that thinks it's really just talking to a single Redis endpoint) to interact with a much larger cluster, without any of the overhead of connection management or the failure semantics of downed nodes in the ElastiCache clusters. The configuration of our existing application Envoy listeners looks like this:

  listeners:
    - name: redis_listener
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 6302
      filter_chains:
        - filters:
          - name: envoy.redis_proxy
            typed_config:
              "@type": type.googleapis.com/envoy.config.filter.network.redis_proxy.v2.RedisProxy
              stat_prefix: default_cluster
              settings:
                op_timeout: 0.5s
                enable_redirection: true
                read_policy: ANY
              prefix_routes:
                catch_all_route:
                  cluster: default_cluster
...
  clusters:
    - name: default_cluster
      connect_timeout: 0.5s
      dns_lookup_family: V4_ONLY
      respect_dns_ttl: true
      hosts:
        - socket_address: { address: default_cluster,  port_value: 6379 }
      lb_policy: cluster_provided
      cluster_type:
        name: envoy.clusters.redis
        typed_config:
          "@type": type.googleapis.com/google.protobuf.Struct
          value:
            cluster_refresh_rate: 5s
            cluster_refresh_timeout: 0.5s
      health_checks:
        - custom_health_check:
            name: envoy.health_checkers.redis
            typed_config:
              "@type": type.googleapis.com/envoy.config.health_checker.redis.v2.Redis
          timeout: 0.5s
          interval: 2s
          unhealthy_threshold: 2
          healthy_threshold: 1

In our Kubernetes environment with Istio, we're trying to come up with a similar setup, and we have been able to build a nearly identical configuration with the following Kubernetes resources:

---
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: test
spec:
  workloadSelector:
    labels:
      run: debug-diranged
  outboundTrafficPolicy:
    mode: REGISTRY_ONLY
  egress:
    - bind: 127.0.0.1
      captureMode: NONE
      port:
        number: 6379
        protocol: REDIS
        name: redis 
      hosts:
        - ./*
---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: redis-proxy
spec:
  resolution: DNS
  hosts:
    - benchmark.....clustercfg.usw2.cache.amazonaws.com
  endpoints:
    - address: benchmark.....clustercfg.usw2.cache.amazonaws.com
      ports:
        redis: 6379
  exportTo: [.]
  location: MESH_EXTERNAL
  ports:
    - number: 6379
      name: redis
      protocol: REDIS
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: redis-proxy
spec:
  host: benchmark.....clustercfg.usw2.cache.amazonaws.com
  exportTo: [.]
  trafficPolicy:
    connectionPool:
      tcp:
        connectTimeout: 500ms
        tcpKeepalive:
          time: 300s
          interval: 60s
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: redis-proxy
spec:
  workloadSelector:
    labels:
      run: debug-diranged
  configPatches:
    - applyTo: NETWORK_FILTER
      match:
        listener:
          name: 127.0.0.1_6379
          filterChain:
            filter:
              name: envoy.filters.network.redis_proxy
      patch:
        operation: REPLACE
        value:
          name: envoy.filters.network.redis_proxy
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
            latencyInMicros: true
            stat_prefix: outbound|6379||benchmark.....clustercfg.usw2.cache.amazonaws.com
            prefix_routes:
              catch_all_route:
                cluster: outbound|6379||benchmark.....clustercfg.usw2.cache.amazonaws.com
            settings:
              op_timeout: 0.3s
              enable_redirection: true
              enable_command_stats: true
              read_policy: ANY

    - applyTo: CLUSTER
      match:
        context: SIDECAR_OUTBOUND
        cluster:
          port: 6379
          service: benchmark.....clustercfg.usw2.cache.amazonaws.com
      patch:
        operation: MERGE
        value:
          health_checks:
            - custom_health_check:
                name: envoy.health_checkers.redis
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.health_checkers.redis.v3.Redis
              timeout: 0.5s
              interval: 2s
              unhealthy_threshold: 2
              healthy_threshold: 1

With the above config, we're able to connect to 127.0.0.1:6379 within our Pod, and run redis-benchmark across the nodes without it being aware that there is a cluster under the hood:

root@debug-diranged:/data# redis-benchmark -t get,set   
ERROR: unsupported command 'CONFIG'
ERROR: failed to fetch CONFIG from 127.0.0.1:6379
WARN: could not fetch server CONFIG
====== SET ======                                                   
  100000 requests completed in 2.01 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
  multi-thread: no

Latency by percentile distribution:
0.000% <= 0.447 milliseconds (cumulative count 2)
50.000% <= 0.919 milliseconds (cumulative count 50771)
75.000% <= 1.039 milliseconds (cumulative count 75379)
87.500% <= 1.127 milliseconds (cumulative count 87934)
93.750% <= 1.215 milliseconds (cumulative count 93900)
96.875% <= 1.319 milliseconds (cumulative count 96914)
98.438% <= 1.463 milliseconds (cumulative count 98438)
99.219% <= 1.695 milliseconds (cumulative count 99223)
99.609% <= 2.119 milliseconds (cumulative count 99610)
99.805% <= 6.111 milliseconds (cumulative count 99805)
99.902% <= 8.095 milliseconds (cumulative count 99903)
99.951% <= 9.119 milliseconds (cumulative count 99952)
99.976% <= 9.671 milliseconds (cumulative count 99976)
99.988% <= 10.071 milliseconds (cumulative count 99988)
99.994% <= 10.295 milliseconds (cumulative count 99994)
99.997% <= 10.367 milliseconds (cumulative count 99997)
99.998% <= 10.391 milliseconds (cumulative count 99999)
99.999% <= 10.399 milliseconds (cumulative count 100000)
100.000% <= 10.399 milliseconds (cumulative count 100000)

Cumulative distribution of latencies:
0.000% <= 0.103 milliseconds (cumulative count 0)
0.065% <= 0.503 milliseconds (cumulative count 65)
2.814% <= 0.607 milliseconds (cumulative count 2814)
10.565% <= 0.703 milliseconds (cumulative count 10565)
26.371% <= 0.807 milliseconds (cumulative count 26371)
47.283% <= 0.903 milliseconds (cumulative count 47283)
69.471% <= 1.007 milliseconds (cumulative count 69471)
85.286% <= 1.103 milliseconds (cumulative count 85286)
93.552% <= 1.207 milliseconds (cumulative count 93552)
96.625% <= 1.303 milliseconds (cumulative count 96625)
97.992% <= 1.407 milliseconds (cumulative count 97992)
98.633% <= 1.503 milliseconds (cumulative count 98633)
99.014% <= 1.607 milliseconds (cumulative count 99014)
99.237% <= 1.703 milliseconds (cumulative count 99237)
99.404% <= 1.807 milliseconds (cumulative count 99404)
99.520% <= 1.903 milliseconds (cumulative count 99520)
99.576% <= 2.007 milliseconds (cumulative count 99576)
99.608% <= 2.103 milliseconds (cumulative count 99608)
99.783% <= 3.103 milliseconds (cumulative count 99783)
99.800% <= 4.103 milliseconds (cumulative count 99800)
99.804% <= 6.103 milliseconds (cumulative count 99804)
99.852% <= 7.103 milliseconds (cumulative count 99852)
99.906% <= 8.103 milliseconds (cumulative count 99906)
99.951% <= 9.103 milliseconds (cumulative count 99951)
99.989% <= 10.103 milliseconds (cumulative count 99989)
100.000% <= 11.103 milliseconds (cumulative count 100000)

Summary:
  throughput summary: 49776.01 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.944     0.440     0.919     1.247     1.607    10.399
====== GET ======                                                   
  100000 requests completed in 2.01 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
  multi-thread: no

Latency by percentile distribution:
0.000% <= 0.399 milliseconds (cumulative count 1)
50.000% <= 0.991 milliseconds (cumulative count 50533)
75.000% <= 1.127 milliseconds (cumulative count 75053)
87.500% <= 1.223 milliseconds (cumulative count 88256)
93.750% <= 1.311 milliseconds (cumulative count 93946)
96.875% <= 1.407 milliseconds (cumulative count 96902)
98.438% <= 1.511 milliseconds (cumulative count 98448)
99.219% <= 1.607 milliseconds (cumulative count 99230)
99.609% <= 1.703 milliseconds (cumulative count 99629)
99.805% <= 1.799 milliseconds (cumulative count 99815)
99.902% <= 1.879 milliseconds (cumulative count 99905)
99.951% <= 1.959 milliseconds (cumulative count 99953)
99.976% <= 2.047 milliseconds (cumulative count 99977)
99.988% <= 2.191 milliseconds (cumulative count 99988)
99.994% <= 2.247 milliseconds (cumulative count 99994)
99.997% <= 2.303 milliseconds (cumulative count 99997)
99.998% <= 2.327 milliseconds (cumulative count 99999)
99.999% <= 2.343 milliseconds (cumulative count 100000)
100.000% <= 2.343 milliseconds (cumulative count 100000)

Cumulative distribution of latencies:
0.000% <= 0.103 milliseconds (cumulative count 0)
0.005% <= 0.407 milliseconds (cumulative count 5)
1.614% <= 0.503 milliseconds (cumulative count 1614)
14.217% <= 0.607 milliseconds (cumulative count 14217)
24.580% <= 0.703 milliseconds (cumulative count 24580)
32.086% <= 0.807 milliseconds (cumulative count 32086)
39.619% <= 0.903 milliseconds (cumulative count 39619)
53.024% <= 1.007 milliseconds (cumulative count 53024)
70.785% <= 1.103 milliseconds (cumulative count 70785)
86.583% <= 1.207 milliseconds (cumulative count 86583)
93.613% <= 1.303 milliseconds (cumulative count 93613)
96.902% <= 1.407 milliseconds (cumulative count 96902)
98.360% <= 1.503 milliseconds (cumulative count 98360)
99.230% <= 1.607 milliseconds (cumulative count 99230)
99.629% <= 1.703 milliseconds (cumulative count 99629)
99.823% <= 1.807 milliseconds (cumulative count 99823)
99.919% <= 1.903 milliseconds (cumulative count 99919)
99.968% <= 2.007 milliseconds (cumulative count 99968)
99.982% <= 2.103 milliseconds (cumulative count 99982)
100.000% <= 3.103 milliseconds (cumulative count 100000)

Summary:
  throughput summary: 49776.01 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.946     0.392     0.991     1.343     1.575     2.343

root@debug-diranged:/data# 

The problem is that we have no visibility into the data flow at all. When we run the benchmark for a while, we don't really see any istio_tcp.* metrics coming from the Pod, and we don't see anything in Kiali that shows traffic going to our ServiceEntry.

We aren't looking for deep protocol-level data from Istio here, and I know that we can expose the Envoy-native Redis metrics if we need to. I just want to understand whether we are missing something, or whether we're really not going to get any visibility into TCP traffic to an external service like this.

[x] Docs
[ ] Installation
[ ] Networking
[ ] Performance and Scalability
[x] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
[ ] Upgrade

Expected behavior

I had expected that we would see a graph entry in Kiali for our ServiceEntry, and that we would see some level of data around bytes flowing into that service. I would expect to see some traffic reported through the stats metrics:

$ curl -s http://localhost:15000/stats/prometheus | egrep -i 'redis|istio|amazon' 
# TYPE istio_build gauge
istio_build{component="proxy",tag="1.10.0"} 1
$ 

Steps to reproduce the bug

Launch an ElastiCache cluster in AWS with multiple Node Groups and Replicas per Node. Then use the configuration I have provided above to try to create a route to it in Istio via the Sidecar.

Version (include the output of istioctl version --remote and kubectl version --short and helm version --short if you used Helm)

$ istioctl version --remote
client version: 1.10.0
control plane version: 1.10.0
data plane version: 1.10.0 (10 proxies), 1.9.0 (2 proxies)
$ kubectl version --short
Client Version: v1.21.1
Server Version: v1.20.4-eks-6b7464

How was Istio installed?

Via the Istio Operator

Environment where the bug was observed (cloud vendor, OS, etc)

AWS on EKS... also, I have captured a bug report that I can share privately with the developers, but do not want to publish publicly.

@bianpengyuan
Contributor

You should be able to see the istio_tcp metrics. Do you see the istio.stats filter at the 127.0.0.1_6379 listener?

@diranged
Contributor Author

diranged commented Jun 9, 2021

I do not:

$ istioctl pc listeners debug-diranged --address 127.0.0.1 --port 6379 -o yaml
- accessLog:
  - filter:
      responseFlagFilter:
        flags:
        - NR
    name: envoy.access_loggers.file
    typedConfig:
      '@type': type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      logFormat:
        jsonFormat:
          authority: '%REQ(:AUTHORITY)%'
          bytes_received: '%BYTES_RECEIVED%'
          bytes_sent: '%BYTES_SENT%'
          connection_termination_details: '%CONNECTION_TERMINATION_DETAILS%'
          downstream_local_address: '%DOWNSTREAM_LOCAL_ADDRESS%'
          downstream_remote_address: '%DOWNSTREAM_REMOTE_ADDRESS%'
          duration: '%DURATION%'
          method: '%REQ(:METHOD)%'
          path: '%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%'
          protocol: '%PROTOCOL%'
          request_id: '%REQ(X-REQUEST-ID)%'
          requested_server_name: '%REQUESTED_SERVER_NAME%'
          response_code: '%RESPONSE_CODE%'
          response_code_details: '%RESPONSE_CODE_DETAILS%'
          response_flags: '%RESPONSE_FLAGS%'
          route_name: '%ROUTE_NAME%'
          start_time: '%START_TIME%'
          upstream_cluster: '%UPSTREAM_CLUSTER%'
          upstream_host: '%UPSTREAM_HOST%'
          upstream_local_address: '%UPSTREAM_LOCAL_ADDRESS%'
          upstream_service_time: '%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%'
          upstream_transport_failure_reason: '%UPSTREAM_TRANSPORT_FAILURE_REASON%'
          user_agent: '%REQ(USER-AGENT)%'
          x_forwarded_for: '%REQ(X-FORWARDED-FOR)%'
      path: /dev/stdout
  address:
    socketAddress:
      address: 127.0.0.1
      portValue: 6379
  filterChains:
  - filters:
    - name: envoy.filters.network.redis_proxy
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
        latencyInMicros: true
        prefixRoutes:
          catchAllRoute:
            cluster: outbound|6379|| benchmark.....clustercfg.usw2.cache.amazonaws.com
        settings:
          enableCommandStats: true
          enableRedirection: true
          opTimeout: 0.300s
          readPolicy: ANY
        statPrefix: outbound|6379|| benchmark.....clustercfg.usw2.cache.amazonaws.com
  name: 127.0.0.1_6379
  trafficDirection: OUTBOUND

@diranged
Contributor Author

diranged commented Jun 9, 2021

Just to make sure it wasn't my EnvoyFilter messing things up, I deleted the filter and checked again:

$ istioctl pc listeners debug-diranged --address 127.0.0.1 --port 6379 -o yaml
- accessLog:
  - filter:
      responseFlagFilter:
        flags:
        - NR
    name: envoy.access_loggers.file
    typedConfig:
      '@type': type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      logFormat:
        jsonFormat:
          authority: '%REQ(:AUTHORITY)%'
          bytes_received: '%BYTES_RECEIVED%'
          bytes_sent: '%BYTES_SENT%'
          connection_termination_details: '%CONNECTION_TERMINATION_DETAILS%'
          downstream_local_address: '%DOWNSTREAM_LOCAL_ADDRESS%'
          downstream_remote_address: '%DOWNSTREAM_REMOTE_ADDRESS%'
          duration: '%DURATION%'
          method: '%REQ(:METHOD)%'
          path: '%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%'
          protocol: '%PROTOCOL%'
          request_id: '%REQ(X-REQUEST-ID)%'
          requested_server_name: '%REQUESTED_SERVER_NAME%'
          response_code: '%RESPONSE_CODE%'
          response_code_details: '%RESPONSE_CODE_DETAILS%'
          response_flags: '%RESPONSE_FLAGS%'
          route_name: '%ROUTE_NAME%'
          start_time: '%START_TIME%'
          upstream_cluster: '%UPSTREAM_CLUSTER%'
          upstream_host: '%UPSTREAM_HOST%'
          upstream_local_address: '%UPSTREAM_LOCAL_ADDRESS%'
          upstream_service_time: '%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%'
          upstream_transport_failure_reason: '%UPSTREAM_TRANSPORT_FAILURE_REASON%'
          user_agent: '%REQ(USER-AGENT)%'
          x_forwarded_for: '%REQ(X-FORWARDED-FOR)%'
      path: /dev/stdout
  address:
    socketAddress:
      address: 127.0.0.1
      portValue: 6379
  filterChains:
  - filters:
    - name: envoy.filters.network.redis_proxy
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
        latencyInMicros: true
        prefixRoutes:
          catchAllRoute:
            cluster: outbound|6379|| benchmark.....clustercfg.usw2.cache.amazonaws.com
        settings:
          opTimeout: 5s
        statPrefix: outbound|6379|| benchmark.....clustercfg.usw2.cache.amazonaws.com
  name: 127.0.0.1_6379
  trafficDirection: OUTBOUND

@bianpengyuan
Contributor

OK, the TCP stats filter is injected based on the presence of tcp_proxy. You can list the EnvoyFilters in the istio-system namespace and check the tcp-stats ones. You will probably need to define another EnvoyFilter to accommodate the redis_proxy case.
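
For reference, a quick way to find those default filters and use one as a starting point (a sketch; the tcp-stats-filter-1.10 name is an assumption and will vary with the Istio version and revision installed):

$ kubectl get envoyfilter -n istio-system
$ kubectl get envoyfilter tcp-stats-filter-1.10 -n istio-system -o yaml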

@diranged
Contributor Author

diranged commented Jun 9, 2021

@bianpengyuan Thanks for that - I was able to make something work, and it's not terrible:

Screen Shot 2021-06-09 at 11 07 05 AM

Screen Shot 2021-06-09 at 11 06 50 AM

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: redis-proxy-stats-filter
spec:
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.redis_proxy
      proxy:
        proxyVersion: ^1\.10.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.network.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "debug": "false",
                    "stat_prefix": "istio"
                  }
              root_id: stats_outbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: tcp_stats_outbound

Do you know if there is anything we can do now to improve the visibility of the target destination, so that instead of showing up as PassthroughFilter it shows the redis endpoint?

@diranged
Contributor Author

diranged commented Jun 9, 2021

@bianpengyuan,
I was able to modify the metrics list slightly so that it includes destination_port properly, but is there some way for me to get the ServiceEntry FQDN, the Service resource name, the listener port/IP, or something else here to help add some context to the data?

                    "metrics": [
                      {
                        "dimensions": {
                          "destination_port": "string(destination.port)"
                        }
                      }

@bianpengyuan
Contributor

You should be able to overwrite existing labels like destination_service with metric customization.

@diranged
Contributor Author

@bianpengyuan,
I think the question is: what metadata is available to us during a pure TCP session that is going through a local listener port (127.0.0.1_6379)? I've tried a ton of different settings, and I keep finding empty metadata. I tried using options from https://istio.io/latest/docs/tasks/observability/metrics/customize-metrics/#use-expressions-for-values, https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/advanced/attributes#connection-attributes, and https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/advanced/attributes#upstream-attributes.

I did find that destination.port works (it returns 6379), as does source.port (which is a bad idea :P), and destination.address returned 127.0.0.1. So I can cobble together a "destination_service" of 127.0.0.1:6379, but a better value would be something from the ServiceEntry (like its host name) or its endpoint, or even the individual downstream IP addresses of the redis cluster.
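
(For illustration, that cobbled-together version would look something like this in the stats filter "dimensions" block, using the attribute expressions from the Envoy docs above; it mirrors what shows up in the listener dump further down:)

                    "metrics": [
                      {
                        "dimensions": {
                          "destination_service": "destination.address",
                          "destination_port": "string(destination.port)"
                        }
                      }
                    ]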

@bianpengyuan
Contributor

bianpengyuan commented Jun 10, 2021

I think you can hard-code something in the attributes, like destination_service: const("redis_benchmark"), or use the cluster name attribute "cluster_name". I am not sure about the syntax for a const string expression, though. @kyessenov would know better.

@kyessenov
Contributor

kyessenov commented Jun 10, 2021

String syntax is "some_value" or 'some_value'. Note the double quotation.
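
(Applying that to the dimensions snippet above, a hard-coded label value ends up doubly quoted, with the string literal nested inside the JSON string; the "redis-benchmark" value here is just a hypothetical placeholder:)

                    "metrics": [
                      {
                        "dimensions": {
                          "destination_service": "'redis-benchmark'"
                        }
                      }
                    ]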

@diranged
Contributor Author

So - I figured out how to hard-code a specific string-value in there. However, that means we have to create a unique EnvoyFilter for every single instance where we're doing this, and use narrowly scoped workloadSelectors. I had hoped that something like cluster_name would work, but that just returns unknown. Here's the example listener:

- accessLog:
  - filter:
      responseFlagFilter:
        flags:
        - NR
    name: envoy.access_loggers.file
    typedConfig:
      '@type': type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      logFormat:
        jsonFormat:
          authority: '%REQ(:AUTHORITY)%'
          bytes_received: '%BYTES_RECEIVED%'
          bytes_sent: '%BYTES_SENT%'
          connection_termination_details: '%CONNECTION_TERMINATION_DETAILS%'
          downstream_local_address: '%DOWNSTREAM_LOCAL_ADDRESS%'
          downstream_remote_address: '%DOWNSTREAM_REMOTE_ADDRESS%'
          duration: '%DURATION%'
          method: '%REQ(:METHOD)%'
          path: '%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%'
          protocol: '%PROTOCOL%'
          request_id: '%REQ(X-REQUEST-ID)%'
          requested_server_name: '%REQUESTED_SERVER_NAME%'
          response_code: '%RESPONSE_CODE%'
          response_code_details: '%RESPONSE_CODE_DETAILS%'
          response_flags: '%RESPONSE_FLAGS%'
          route_name: '%ROUTE_NAME%'
          start_time: '%START_TIME%'
          upstream_cluster: '%UPSTREAM_CLUSTER%'
          upstream_host: '%UPSTREAM_HOST%'
          upstream_local_address: '%UPSTREAM_LOCAL_ADDRESS%'
          upstream_service_time: '%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%'
          upstream_transport_failure_reason: '%UPSTREAM_TRANSPORT_FAILURE_REASON%'
          user_agent: '%REQ(USER-AGENT)%'
          x_forwarded_for: '%REQ(X-FORWARDED-FOR)%'
      path: /dev/stdout
  address:
    socketAddress:
      address: 127.0.0.1
      portValue: 6379
  filterChains:
  - filters:
    - name: istio.stats
      typedConfig:
        '@type': type.googleapis.com/udpa.type.v1.TypedStruct
        typeUrl: type.googleapis.com/envoy.extensions.filters.network.wasm.v3.Wasm
        value:
          config:
            configuration:
              '@type': type.googleapis.com/google.protobuf.StringValue
              value: |
                {
                  "debug": "false",
                  "stat_prefix": "istio",
                  "metrics": [
                    {
                      "dimensions": {
                        "destination_service_name": "cluster_name",
                        "destination_service": "destination.address",
                        "destination_port": "string(destination.port)"
                      }
                    }
                  ]
                }
            root_id: stats_outbound
            vm_config:
              code:
                local:
                  inline_string: envoy.wasm.stats
              runtime: envoy.wasm.runtime.null
              vm_id: tcp_stats_outbound
    - name: envoy.filters.network.redis_proxy
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
        latencyInMicros: true
        prefixRoutes:
          catchAllRoute:
            cluster: outbound|6379|| benchmark.....clustercfg.usw2.cache.amazonaws.com
        settings:
          enableCommandStats: true
          enableRedirection: true
          opTimeout: 0.300s
          readPolicy: ANY
        statPrefix: outbound|6379|| benchmark.....clustercfg.usw2.cache.amazonaws.com
  name: 127.0.0.1_6379
  trafficDirection: OUTBOUND

and here are the stats we see:

# TYPE istio_tcp_connections_closed_total counter
istio_tcp_connections_closed_total{reporter="source",source_workload="debug-diranged",source_workload_namespace="test",source_principal="unknown",source_app="unknown",source_version="unknown",source_cluster="Kubernetes",destination_workload="unknown",destination_workload_namespace="unknown",destination_principal="unknown",destination_app="unknown",destination_version="unknown",destination_service="127.0.0.1:6379",destination_service_name="unknown",destination_service_namespace="unknown",destination_port="6379",destination_cluster="unknown",request_protocol="tcp",response_flags="-",connection_security_policy="unknown",source_canonical_service="debug-diranged",destination_canonical_service="unknown",source_canonical_revision="latest",destination_canonical_revision="latest"} 301

# TYPE istio_tcp_connections_opened_total counter
istio_tcp_connections_opened_total{reporter="source",source_workload="debug-diranged",source_workload_namespace="test",source_principal="unknown",source_app="unknown",source_version="unknown",source_cluster="Kubernetes",destination_workload="unknown",destination_workload_namespace="unknown",destination_principal="unknown",destination_app="unknown",destination_version="unknown",destination_service="127.0.0.1:6379",destination_service_name="unknown",destination_service_namespace="unknown",destination_port="6379",destination_cluster="unknown",request_protocol="tcp",response_flags="-",connection_security_policy="unknown",source_canonical_service="debug-diranged",destination_canonical_service="unknown",source_canonical_revision="latest",destination_canonical_revision="latest"} 301

# TYPE istio_tcp_received_bytes_total counter
istio_tcp_received_bytes_total{reporter="source",source_workload="debug-diranged",source_workload_namespace="test",source_principal="unknown",source_app="unknown",source_version="unknown",source_cluster="Kubernetes",destination_workload="unknown",destination_workload_namespace="unknown",destination_principal="unknown",destination_app="unknown",destination_version="unknown",destination_service="127.0.0.1:6379",destination_service_name="unknown",destination_service_namespace="unknown",destination_port="6379",destination_cluster="unknown",request_protocol="tcp",response_flags="-",connection_security_policy="unknown",source_canonical_service="debug-diranged",destination_canonical_service="unknown",source_canonical_revision="latest",destination_canonical_revision="latest"} 24300077

# TYPE istio_tcp_sent_bytes_total counter
istio_tcp_sent_bytes_total{reporter="source",source_workload="debug-diranged",source_workload_namespace="test",source_principal="unknown",source_app="unknown",source_version="unknown",source_cluster="Kubernetes",destination_workload="unknown",destination_workload_namespace="unknown",destination_principal="unknown",destination_app="unknown",destination_version="unknown",destination_service="127.0.0.1:6379",destination_service_name="unknown",destination_service_namespace="unknown",destination_port="6379",destination_cluster="unknown",request_protocol="tcp",response_flags="-",connection_security_policy="unknown",source_canonical_service="debug-diranged",destination_canonical_service="unknown",source_canonical_revision="latest",destination_canonical_revision="latest"} 4200062

@kyessenov
Contributor

I think cluster_name is not working because there is a bug where redis_proxy does not populate the stream metadata. I need to confirm, but that's my guess as to why it doesn't work for you.

@diranged
Contributor Author

Ok, thanks. If you could confirm that, it would help me feel less crazy. :)

@diranged
Contributor Author

@kyessenov Do you know where the code is that shows precisely which "metadata" is populated by the filter?

@kyessenov
Contributor

The accessor is here: https://github.com/envoyproxy/envoy/blob/0fae6970ddaf93f024908ba304bbd2b34e997a51/source/extensions/common/wasm/context.cc#L527. I suspect the upstream information in StreamInfo is not properly populated by the redis filter.

@istio-policy-bot added the lifecycle/stale label on Sep 13, 2021
@istio-policy-bot

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2021-06-14. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

@istio-policy-bot added the lifecycle/automatically-closed label on Sep 28, 2021