External Service monitoring via istio_* metrics and Kiali? #33336
Comments
You should be able to see istio_tcp metrics. Do you see them?
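One quick way to check is to dump the sidecar's Prometheus stats via the Envoy admin endpoint (a sketch; `debug-diranged` is the pod name from your istioctl output):

```
$ kubectl exec debug-diranged -c istio-proxy -- \
    pilot-agent request GET stats/prometheus | grep istio_tcp
```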
I do not:

```
$ istioctl pc listeners debug-diranged --address 127.0.0.1 --port 6379 -o yaml
- accessLog:
  - filter:
      responseFlagFilter:
        flags:
        - NR
    name: envoy.access_loggers.file
    typedConfig:
      '@type': type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      logFormat:
        jsonFormat:
          authority: '%REQ(:AUTHORITY)%'
          bytes_received: '%BYTES_RECEIVED%'
          bytes_sent: '%BYTES_SENT%'
          connection_termination_details: '%CONNECTION_TERMINATION_DETAILS%'
          downstream_local_address: '%DOWNSTREAM_LOCAL_ADDRESS%'
          downstream_remote_address: '%DOWNSTREAM_REMOTE_ADDRESS%'
          duration: '%DURATION%'
          method: '%REQ(:METHOD)%'
          path: '%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%'
          protocol: '%PROTOCOL%'
          request_id: '%REQ(X-REQUEST-ID)%'
          requested_server_name: '%REQUESTED_SERVER_NAME%'
          response_code: '%RESPONSE_CODE%'
          response_code_details: '%RESPONSE_CODE_DETAILS%'
          response_flags: '%RESPONSE_FLAGS%'
          route_name: '%ROUTE_NAME%'
          start_time: '%START_TIME%'
          upstream_cluster: '%UPSTREAM_CLUSTER%'
          upstream_host: '%UPSTREAM_HOST%'
          upstream_local_address: '%UPSTREAM_LOCAL_ADDRESS%'
          upstream_service_time: '%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%'
          upstream_transport_failure_reason: '%UPSTREAM_TRANSPORT_FAILURE_REASON%'
          user_agent: '%REQ(USER-AGENT)%'
          x_forwarded_for: '%REQ(X-FORWARDED-FOR)%'
      path: /dev/stdout
  address:
    socketAddress:
      address: 127.0.0.1
      portValue: 6379
  filterChains:
  - filters:
    - name: envoy.filters.network.redis_proxy
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
        latencyInMicros: true
        prefixRoutes:
          catchAllRoute:
            cluster: outbound|6379||benchmark.....clustercfg.usw2.cache.amazonaws.com
        settings:
          enableCommandStats: true
          enableRedirection: true
          opTimeout: 0.300s
          readPolicy: ANY
        statPrefix: outbound|6379||benchmark.....clustercfg.usw2.cache.amazonaws.com
  name: 127.0.0.1_6379
  trafficDirection: OUTBOUND
```
Just to make sure it wasn't my EnvoyFilter messing things up, I deleted the filter and checked again:

```
$ istioctl pc listeners debug-diranged --address 127.0.0.1 --port 6379 -o yaml
- accessLog:
  - filter:
      responseFlagFilter:
        flags:
        - NR
    name: envoy.access_loggers.file
    typedConfig:
      '@type': type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      logFormat:
        jsonFormat:
          authority: '%REQ(:AUTHORITY)%'
          bytes_received: '%BYTES_RECEIVED%'
          bytes_sent: '%BYTES_SENT%'
          connection_termination_details: '%CONNECTION_TERMINATION_DETAILS%'
          downstream_local_address: '%DOWNSTREAM_LOCAL_ADDRESS%'
          downstream_remote_address: '%DOWNSTREAM_REMOTE_ADDRESS%'
          duration: '%DURATION%'
          method: '%REQ(:METHOD)%'
          path: '%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%'
          protocol: '%PROTOCOL%'
          request_id: '%REQ(X-REQUEST-ID)%'
          requested_server_name: '%REQUESTED_SERVER_NAME%'
          response_code: '%RESPONSE_CODE%'
          response_code_details: '%RESPONSE_CODE_DETAILS%'
          response_flags: '%RESPONSE_FLAGS%'
          route_name: '%ROUTE_NAME%'
          start_time: '%START_TIME%'
          upstream_cluster: '%UPSTREAM_CLUSTER%'
          upstream_host: '%UPSTREAM_HOST%'
          upstream_local_address: '%UPSTREAM_LOCAL_ADDRESS%'
          upstream_service_time: '%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%'
          upstream_transport_failure_reason: '%UPSTREAM_TRANSPORT_FAILURE_REASON%'
          user_agent: '%REQ(USER-AGENT)%'
          x_forwarded_for: '%REQ(X-FORWARDED-FOR)%'
      path: /dev/stdout
  address:
    socketAddress:
      address: 127.0.0.1
      portValue: 6379
  filterChains:
  - filters:
    - name: envoy.filters.network.redis_proxy
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
        latencyInMicros: true
        prefixRoutes:
          catchAllRoute:
            cluster: outbound|6379||benchmark.....clustercfg.usw2.cache.amazonaws.com
        settings:
          opTimeout: 5s
        statPrefix: outbound|6379||benchmark.....clustercfg.usw2.cache.amazonaws.com
  name: 127.0.0.1_6379
  trafficDirection: OUTBOUND
```
Ok, the TCP stats filter is injected based on the presence of tcp_proxy. You can get the EnvoyFilter in the istio-system namespace and check.
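Something like this should show the injected stats config (the `tcp-stats-filter-1.10` name is my assumption for a 1.10 mesh; check what the first command actually lists):

```
$ kubectl -n istio-system get envoyfilters.networking.istio.io
$ kubectl -n istio-system get envoyfilter tcp-stats-filter-1.10 -o yaml
```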
@bianpengyuan Thanks for that - I was able to make something work, and it's not terrible:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: redis-proxy-stats-filter
spec:
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.redis_proxy
      proxy:
        proxyVersion: ^1\.10.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.network.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "debug": "false",
                    "stat_prefix": "istio"
                  }
              root_id: stats_outbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: tcp_stats_outbound
```

Do you know if there is now anything we can do to improve the visibility of the target destination - so that instead of it being …
@bianpengyuan, "metrics": [
{
"dimensions": {
"destination_port": "string(destination.port)"
}
} |
You should be able to overwrite existing labels like destination_service with metric customization.
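A sketch of what that could look like through the IstioOperator telemetry values (assuming the 1.10 `telemetry.v2.prometheus.configOverride` layout; the dimension expression here is illustrative):

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            outboundSidecar:
              metrics:
              - dimensions:
                  destination_service: destination.address
```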
@bianpengyuan, I did find that …
I think you can hard-code something in the attributes, like …

String syntax is …
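If I remember the expression syntax correctly, a literal string is single-quoted inside the JSON config, e.g. (the value here is a made-up placeholder):

```json
"metrics": [
  {
    "dimensions": {
      "destination_service_name": "'elasticache-benchmark'"
    }
  }
]
```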
So - I figured out how to hard-code a specific string-value in there. However, that means we have to create a unique …

```
- accessLog:
  - filter:
      responseFlagFilter:
        flags:
        - NR
    name: envoy.access_loggers.file
    typedConfig:
      '@type': type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      logFormat:
        jsonFormat:
          authority: '%REQ(:AUTHORITY)%'
          bytes_received: '%BYTES_RECEIVED%'
          bytes_sent: '%BYTES_SENT%'
          connection_termination_details: '%CONNECTION_TERMINATION_DETAILS%'
          downstream_local_address: '%DOWNSTREAM_LOCAL_ADDRESS%'
          downstream_remote_address: '%DOWNSTREAM_REMOTE_ADDRESS%'
          duration: '%DURATION%'
          method: '%REQ(:METHOD)%'
          path: '%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%'
          protocol: '%PROTOCOL%'
          request_id: '%REQ(X-REQUEST-ID)%'
          requested_server_name: '%REQUESTED_SERVER_NAME%'
          response_code: '%RESPONSE_CODE%'
          response_code_details: '%RESPONSE_CODE_DETAILS%'
          response_flags: '%RESPONSE_FLAGS%'
          route_name: '%ROUTE_NAME%'
          start_time: '%START_TIME%'
          upstream_cluster: '%UPSTREAM_CLUSTER%'
          upstream_host: '%UPSTREAM_HOST%'
          upstream_local_address: '%UPSTREAM_LOCAL_ADDRESS%'
          upstream_service_time: '%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%'
          upstream_transport_failure_reason: '%UPSTREAM_TRANSPORT_FAILURE_REASON%'
          user_agent: '%REQ(USER-AGENT)%'
          x_forwarded_for: '%REQ(X-FORWARDED-FOR)%'
      path: /dev/stdout
  address:
    socketAddress:
      address: 127.0.0.1
      portValue: 6379
  filterChains:
  - filters:
    - name: istio.stats
      typedConfig:
        '@type': type.googleapis.com/udpa.type.v1.TypedStruct
        typeUrl: type.googleapis.com/envoy.extensions.filters.network.wasm.v3.Wasm
        value:
          config:
            configuration:
              '@type': type.googleapis.com/google.protobuf.StringValue
              value: |
                {
                  "debug": "false",
                  "stat_prefix": "istio",
                  "metrics": [
                    {
                      "dimensions": {
                        "destination_service_name": "cluster_name",
                        "destination_service": "destination.address",
                        "destination_port": "string(destination.port)"
                      }
                    }
                  ]
                }
            root_id: stats_outbound
            vm_config:
              code:
                local:
                  inline_string: envoy.wasm.stats
              runtime: envoy.wasm.runtime.null
              vm_id: tcp_stats_outbound
    - name: envoy.filters.network.redis_proxy
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
        latencyInMicros: true
        prefixRoutes:
          catchAllRoute:
            cluster: outbound|6379||benchmark.....clustercfg.usw2.cache.amazonaws.com
        settings:
          enableCommandStats: true
          enableRedirection: true
          opTimeout: 0.300s
          readPolicy: ANY
        statPrefix: outbound|6379||benchmark.....clustercfg.usw2.cache.amazonaws.com
  name: 127.0.0.1_6379
  trafficDirection: OUTBOUND
```

and here are the stats we see:
I think …
Ok, thanks. If you could confirm that, it would help me feel less crazy. :)
@kyessenov Do you know where the code is that shows us precisely which "metadata" is populated by the filter?
The accessor is here: https://github.com/envoyproxy/envoy/blob/0fae6970ddaf93f024908ba304bbd2b34e997a51/source/extensions/common/wasm/context.cc#L527. I suspect the upstream fields in StreamInfo are not properly populated by the redis filter.
🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2021-06-14. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions. Created by the issue and PR lifecycle manager. |
Bug description
This may be a question - or may be a bug report, I am not sure. We're hoping to use Istio not only to manage Envoy for the purpose of an internal service mesh within our clusters, but also to manage Envoy configurations that help our applications connect to externally hosted services (AWS ElastiCache, RDS, etc).
Our existing non-Kubernetes applications use Envoy with the RedisProxy filter to hide from our applications the fact that they are really connecting to large Redis clusters. We typically set `enable_redirection: true` and `read_policy: ANY` on our listeners. This allows a dumb client (one that thinks it is really just talking to a single Redis endpoint) to interact with a much larger cluster, without any of the overhead of connection management or the failure semantics of downed nodes in the ElastiCache clusters. The configuration of our existing application Envoy listeners looks like this:

In our Kubernetes environment with Istio, we're trying to come up with a similar configuration. We have been able to make a nearly identical configuration with the following Kubernetes resources:
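A minimal sketch of the ServiceEntry half of that setup (the hostname here is a placeholder, not our real configuration endpoint):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: elasticache-redis
spec:
  hosts:
  - benchmark.example.clustercfg.usw2.cache.amazonaws.com  # placeholder
  location: MESH_EXTERNAL
  ports:
  - name: tcp-redis
    number: 6379
    protocol: TCP
  resolution: DNS
```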
With the above config, we're able to connect to `127.0.0.1:6379` within our Pod and run `redis-benchmark` across the nodes without it being aware that there is a cluster under the hood:

The problem is that we have no visibility into the data flow at all. When we run the benchmark for a while, we don't really see any `istio_tcp.*` metrics coming from the Pod, and we don't see anything in Kiali that shows any traffic going to our ServiceEntry.

We aren't looking for deep protocol-level data from Istio here, and I know that we can expose the Envoy-native Redis metrics if we need to. I just want to understand whether we are missing something, or whether we're really not going to get any visibility into TCP traffic to an external service like this?
- [x] Docs
- [ ] Installation
- [ ] Networking
- [ ] Performance and Scalability
- [x] Extensions and Telemetry
- [ ] Security
- [ ] Test and Release
- [ ] User Experience
- [ ] Developer Infrastructure
- [ ] Upgrade
Expected behavior
I had expected that we would see a graph entry in Kiali for our ServiceEntry, and that we would see some level of data around bytes flowing into that service. I would expect to see some traffic reported through the stats metrics:
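Specifically, I would have expected the standard Istio TCP metric series to show up for this traffic, something like (names from the standard metric set; labels omitted):

```
istio_tcp_connections_opened_total
istio_tcp_connections_closed_total
istio_tcp_sent_bytes_total
istio_tcp_received_bytes_total
```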
Steps to reproduce the bug
Launch an ElastiCache cluster in AWS with multiple Node Groups and Replicas per Node. Then use the configuration I have provided above to try to create a route to it in Istio via the Sidecar.
Version (include the output of `istioctl version --remote` and `kubectl version --short` and `helm version --short` if you used Helm)

How was Istio installed?
Via the Istio Operator
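(Roughly the standard operator flow, sketched from memory; the CR filename is a placeholder:)

```
$ istioctl operator init
$ kubectl apply -f my-istio-operator-cr.yaml
```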
Environment where the bug was observed (cloud vendor, OS, etc)
AWS on EKS... also, I have captured a bug report that I can share privately with the developers, but do not want to publish publicly.