telemetry: adding custom tags generates duplicated metrics #39772
Comments
@sschepens can you share the listener config dump of the pod?
IstioOperator YAML:

# export mesh_id cluster_id in metrics
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: dup-metrics
  namespace: istio-system
spec:
  values:
    global:
      meshID: mesh1
  meshConfig:
    defaultConfig:
      extraStatTags:
      - destination_x
      - source_x

Telemetry:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: custom-tags
  namespace: istio-system
spec:
  metrics:
  - overrides:
    - match:
        metric: ALL_METRICS
        mode: CLIENT
      tagOverrides:
        destination_x:
          value: upstream_peer.labels['app'].value
    - match:
        metric: ALL_METRICS
        mode: SERVER
      tagOverrides:
        source_x:
          value: downstream_peer.labels['app'].value
    providers:
    - name: prometheus

Prometheus output:

istio_requests_total{response_code="200",reporter="source",source_workload="sleep",source_workload_namespace="default",source_principal="spiffe://cluster.local/ns/default/sa/sleep",source_app="sleep",source_version="unknown",source_cluster="Kubernetes",destination_workload="httpbin",destination_workload_namespace="default",destination_principal="spiffe://cluster.local/ns/default/sa/httpbin",destination_app="httpbin",destination_version="v1",destination_service="httpbin.default.svc.cluster.local",destination_service_name="httpbin",destination_service_namespace="default",destination_cluster="Kubernetes",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="unknown",source_canonical_service="sleep",destination_canonical_service="httpbin",source_canonical_revision="latest",destination_canonical_revision="v1",destination_x="httpbin"} 1
istio_requests_total{response_code="200",reporter="source",source_workload="sleep",source_workload_namespace="default",source_principal="spiffe://cluster.local/ns/default/sa/sleep",source_app="sleep",source_version="unknown",source_cluster="Kubernetes",destination_workload="httpbin",destination_workload_namespace="default",destination_principal="spiffe://cluster.local/ns/default/sa/httpbin",destination_app="httpbin",destination_version="v1",destination_service="httpbin.default.svc.cluster.local",destination_service_name="httpbin",destination_service_namespace="default",destination_cluster="Kubernetes",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="unknown",source_canonical_service="sleep",destination_canonical_service="httpbin",source_canonical_revision="latest",destination_canonical_revision="v1"} 1

Listener dump:

- address:
    socketAddress:
      address: 0.0.0.0
      portValue: 8000
  bindToPort: false
  continueOnListenerFiltersTimeout: true
  defaultFilterChain:
    filterChainMatch: {}
    filters:
    - name: istio.stats
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.wasm.v3.Wasm
        config:
          configuration:
            '@type': type.googleapis.com/google.protobuf.StringValue
            value: '{"metrics":[{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"request_messages_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"response_messages_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"requests_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"request_duration_milliseconds"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"request_bytes"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"response_bytes"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_connections_closed_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_connections_opened_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_received_bytes_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_sent_bytes_total"}]}'
          rootId: stats_outbound
          vmConfig:
            allowPrecompiled: true
            code:
              local:
                filename: /etc/istio/extensions/stats-filter.compiled.wasm
            runtime: envoy.wasm.runtime.v8
            vmId: tcp_stats_outbound
    - name: istio.stats
      typedConfig:
        '@type': type.googleapis.com/udpa.type.v1.TypedStruct
        typeUrl: type.googleapis.com/envoy.extensions.filters.network.wasm.v3.Wasm
        value:
          config:
            configuration:
              '@type': type.googleapis.com/google.protobuf.StringValue
              value: |
                {
                  "debug": "false",
                  "stat_prefix": "istio"
                }
            root_id: stats_outbound
            vm_config:
              allow_precompiled: true
              code:
                local:
                  filename: /etc/istio/extensions/stats-filter.compiled.wasm
              runtime: envoy.wasm.runtime.v8
              vm_id: tcp_stats_outbound
    - name: envoy.filters.network.tcp_proxy
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        cluster: PassthroughCluster
        idleTimeout: 0s
        statPrefix: PassthroughCluster
    name: PassthroughFilterChain
  filterChains:
  - filterChainMatch:
      applicationProtocols:
      - http/1.1
      - h2c
      transportProtocol: raw_buffer
    filters:
    - name: envoy.filters.network.http_connection_manager
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
        httpFilters:
        - name: istio.metadata_exchange
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
            config:
              configuration:
                '@type': type.googleapis.com/envoy.tcp.metadataexchange.config.MetadataExchange
              vmConfig:
                allowPrecompiled: true
                code:
                  local:
                    filename: /etc/istio/extensions/metadata-exchange-filter.compiled.wasm
                runtime: envoy.wasm.runtime.v8
        - name: istio.alpn
          typedConfig:
            '@type': type.googleapis.com/istio.envoy.config.filter.http.alpn.v2alpha1.FilterConfig
            alpnOverride:
            - alpnOverride:
              - istio-http/1.0
              - istio
              - http/1.0
            - alpnOverride:
              - istio-http/1.1
              - istio
              - http/1.1
              upstreamProtocol: HTTP11
            - alpnOverride:
              - istio-h2
              - istio
              - h2
              upstreamProtocol: HTTP2
        - name: envoy.filters.http.fault
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
        - name: envoy.filters.http.cors
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
        - name: istio.stats
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: '{"metrics":[{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"request_messages_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"response_messages_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"requests_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"request_duration_milliseconds"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"request_bytes"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"response_bytes"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_connections_closed_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_connections_opened_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_received_bytes_total"},{"dimensions":{"destination_x":"upstream_peer.labels[''app''].value"},"name":"tcp_sent_bytes_total"}]}'
              rootId: stats_outbound
              vmConfig:
                allowPrecompiled: true
                code:
                  local:
                    filename: /etc/istio/extensions/stats-filter.compiled.wasm
                runtime: envoy.wasm.runtime.v8
                vmId: stats_outbound
        - name: istio.stats
          typedConfig:
            '@type': type.googleapis.com/udpa.type.v1.TypedStruct
            typeUrl: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
            value:
              config:
                configuration:
                  '@type': type.googleapis.com/google.protobuf.StringValue
                  value: |
                    {
                      "debug": "false",
                      "stat_prefix": "istio"
                    }
                root_id: stats_outbound
                vm_config:
                  allow_precompiled: true
                  code:
                    local:
                      filename: /etc/istio/extensions/stats-filter.compiled.wasm
                  runtime: envoy.wasm.runtime.v8
                  vm_id: stats_outbound
        - name: envoy.filters.http.router
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
        normalizePath: true
        pathWithEscapedSlashesAction: KEEP_UNCHANGED
        rds:
          configSource:
            ads: {}
            initialFetchTimeout: 0s
            resourceApiVersion: V3
          routeConfigName: "8000"
        requestIdExtension:
          typedConfig:
            '@type': type.googleapis.com/envoy.extensions.request_id.uuid.v3.UuidRequestIdConfig
            useRequestIdForTraceSampling: true
        statPrefix: outbound_0.0.0.0_8000
        streamIdleTimeout: 0s
        tracing:
          clientSampling:
            value: 100
          customTags:
          - metadata:
              kind:
                request: {}
              metadataKey:
                key: envoy.filters.http.rbac
                path:
                - key: istio_dry_run_allow_shadow_effective_policy_id
            tag: istio.authorization.dry_run.allow_policy.name
          - metadata:
              kind:
                request: {}
              metadataKey:
                key: envoy.filters.http.rbac
                path:
                - key: istio_dry_run_allow_shadow_engine_result
            tag: istio.authorization.dry_run.allow_policy.result
          - metadata:
              kind:
                request: {}
              metadataKey:
                key: envoy.filters.http.rbac
                path:
                - key: istio_dry_run_deny_shadow_effective_policy_id
            tag: istio.authorization.dry_run.deny_policy.name
          - metadata:
              kind:
                request: {}
              metadataKey:
                key: envoy.filters.http.rbac
                path:
                - key: istio_dry_run_deny_shadow_engine_result
            tag: istio.authorization.dry_run.deny_policy.result
          - literal:
              value: latest
            tag: istio.canonical_revision
          - literal:
              value: sleep
            tag: istio.canonical_service
          - literal:
              value: mesh1
            tag: istio.mesh_id
          - literal:
              value: default
            tag: istio.namespace
          overallSampling:
            value: 100
          randomSampling:
            value: 1
        upgradeConfigs:
        - upgradeType: websocket
        useRemoteAddress: false
  listenerFilters:
  - name: envoy.filters.listener.tls_inspector
    typedConfig:
      '@type': type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
  - name: envoy.filters.listener.http_inspector
    typedConfig:
      '@type': type.googleapis.com/envoy.extensions.filters.listener.http_inspector.v3.HttpInspector
  listenerFiltersTimeout: 0s
  name: 0.0.0.0_8000
  trafficDirection: OUTBOUND
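One way to confirm the duplication in a config dump like this one is to count the `istio.stats` HTTP filters on the outbound listener. In the dump, the HTTP filter chain carries two `istio.stats` filters with the same `stats_outbound` VM id: one whose configuration adds the `destination_x` dimension, and one with only the default `stat_prefix`. That lines up with the two series in the Prometheus output, one with `destination_x` and one without. The Python sketch below assumes the listener has been exported as JSON (for example with `istioctl proxy-config listener <pod> --port 8000 -o json`, then picking the listener out of the returned list); the helper names are hypothetical, and only the HTTP filters are inspected, not the TCP `istio.stats` filters in the passthrough chain.

```python
from collections import Counter
from typing import Any, Dict, List

HCM = "envoy.filters.network.http_connection_manager"


def stats_filter_vm_ids(listener: Dict[str, Any]) -> List[str]:
    """Collect the VM id of every istio.stats HTTP filter on one listener.

    Hypothetical helper: walks filterChains -> filters -> HCM -> httpFilters,
    mirroring the layout of the listener dump.
    """
    vm_ids = []
    for chain in listener.get("filterChains", []):
        for net_filter in chain.get("filters", []):
            if net_filter.get("name") != HCM:
                continue
            http_filters = net_filter.get("typedConfig", {}).get("httpFilters", [])
            for http_filter in http_filters:
                if http_filter.get("name") != "istio.stats":
                    continue
                typed = http_filter.get("typedConfig", {})
                if "config" in typed:
                    # Plain Wasm form: config.vmConfig.vmId (camelCase keys).
                    vm_ids.append(typed["config"]["vmConfig"]["vmId"])
                else:
                    # TypedStruct form: value.config.vm_config.vm_id (snake_case keys).
                    vm_ids.append(typed["value"]["config"]["vm_config"]["vm_id"])
    return vm_ids


def duplicated_stats_filters(listener: Dict[str, Any]) -> Dict[str, int]:
    """Map each stats VM id that appears more than once to its count."""
    counts = Counter(stats_filter_vm_ids(listener))
    return {vm_id: n for vm_id, n in counts.items() if n > 1}


# Trimmed copy of the 0.0.0.0_8000 listener (only the fields the helper reads).
listener = {
    "name": "0.0.0.0_8000",
    "filterChains": [{
        "filters": [{
            "name": HCM,
            "typedConfig": {"httpFilters": [
                {"name": "istio.metadata_exchange", "typedConfig": {}},
                {"name": "istio.stats",
                 "typedConfig": {"config": {"vmConfig": {"vmId": "stats_outbound"}}}},
                {"name": "istio.stats",
                 "typedConfig": {"value": {"config": {"vm_config": {"vm_id": "stats_outbound"}}}}},
                {"name": "envoy.filters.http.router", "typedConfig": {}},
            ]},
        }],
    }],
}

print(duplicated_stats_filters(listener))  # {'stats_outbound': 2}
```

An empty result means each stats VM id appears once, which is what you would expect once only one mechanism (Telemetry API or EnvoyFilter) is configuring the stats plugin.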
cc @douglas-reid looks like we need to remove the duplicate.
@zirain The Telemetry API is fundamentally incompatible with the telemetry-focused EnvoyFilters that get applied. If one wants to use the Telemetry API, the first step is really to delete those filters. IIRC, this used to be documented somewhere, but I can't find it now. I think for the transition period we need to map the default providers appropriately based on flags, and properly handle upgrades with existing filters.
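Deleting those filters means first identifying them. The Python sketch below picks the stats EnvoyFilters out of the output of `kubectl get envoyfilter -n istio-system -o name` and prints, without running, one `kubectl delete envoyfilter` command per match. The naming pattern `stats-filter-<version>` / `tcp-stats-filter-<version>` (optionally with a revision suffix) is an assumption, as are the sample names; check the real names in your cluster before deleting anything. The metadata exchange filters are deliberately left alone here, since the listener dump shows only one `istio.metadata_exchange` filter, so only the stats filters appear duplicated.

```python
import re
from typing import Iterable, List

# Assumed naming convention (verify in your cluster): stats-filter-<major>.<minor>
# and tcp-stats-filter-<major>.<minor>, possibly followed by a revision suffix.
STATS_FILTER_RE = re.compile(r"(tcp-)?stats-filter-\d+\.\d+")


def stats_envoyfilters(names: Iterable[str]) -> List[str]:
    """Return the stats EnvoyFilter names from `kubectl get envoyfilter -o name` output."""
    picked = []
    for name in names:
        short = name.split("/", 1)[-1]  # drop the "envoyfilter.networking.istio.io/" prefix
        if STATS_FILTER_RE.match(short):  # re.match anchors at the start of the name only
            picked.append(short)
    return picked


def delete_commands(names: Iterable[str], namespace: str = "istio-system") -> List[str]:
    """Build (but do not run) one kubectl delete command per stats EnvoyFilter."""
    return [f"kubectl delete envoyfilter -n {namespace} {n}" for n in stats_envoyfilters(names)]


# Hypothetical listing; the real names depend on the installed Istio version and revision.
listed = [
    "envoyfilter.networking.istio.io/metadata-exchange-1.13",
    "envoyfilter.networking.istio.io/stats-filter-1.13",
    "envoyfilter.networking.istio.io/tcp-metadata-exchange-1.13",
    "envoyfilter.networking.istio.io/tcp-stats-filter-1.13",
]
for cmd in delete_commands(listed):
    print(cmd)
```

Printing the commands instead of executing them keeps a human in the loop, which matters because an istioctl upgrade or reinstall may recreate these filters.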
I'm thinking we should do something to prevent this; most users will not read the docs before hitting errors.
@zirain can we unstale this? Are there any plans to deprecate the old EnvoyFilter-based telemetry? Or is there a way to disable those filters when installing Istio?
Yeah, you can disable the EnvoyFilters by setting
The question is when we make that the default; that's probably 1.17.
I have the same problem (#39932 (comment)).
And one more side effect:
I deleted all
Istio version:
There's no magic behind it; you can generate the manifests by running
@zirain can this issue be closed?
Will review again after the Telemetry API is promoted to beta.
I am running into one or more issues here, and I get the sense they are the same as what's going on in this bug. I want to confirm that, and to ask a few more questions to make sure I understand the issue correctly and what I can do or expect in our current and future versions. I want to use the "Telemetry API", which as I understand it means doing things like creating a custom resource like this to expose more Envoy properties in my Prometheus stats for TCP traffic (in my specific case, istio_tcp_sent_bytes_total):
My specific goal is to find out the remote address of a TCP connection from the source (information I can see when access logs are dumped for these same connections, so I assume it is "knowable" by the proxy). The problem I am currently seeing is that certain fields are not consistently populated for certain metrics, and I don't really understand why. This article may provide some insight into the types of traffic we have that may not work: https://techblog.cisco.com/istio-mixerless-telemetry#istio-telemetry-v2
Some of our traffic is out of mesh and some is in mesh (mTLS), and in either case I intermittently see the same labels sometimes populated with values and sometimes set to "unknown" (even for mTLS traffic between the same two services, or the same processes!). E.g. sometimes these 4 are populated
Sometimes everything except destination_canonical_service is "unknown". One other observation: it seems that any metric series created with values that aren't "unknown" also has no non-zero values for the metric I'm looking at (TCP sent bytes). Maybe a coincidence. Here's the relevant part of my operator config:
I understand that this is experimental, but what I'm trying to find out is
Control plane version: 1.14.3
I'm seeing a similar issue, but in my case the metric is completely duplicated, with the custom tag but different values. We have a
And of course the Istiod config has the old EnvoyFilters disabled:
We end up with these duplicate metrics, which causes issues: the metrics collector sometimes picks one and sometimes the other. Maybe we could find a fix on that side, but the ideal solution would be to not have duplicate metrics in the first place. We don't see the issue if the
This is on Istio
@SamuelRosenqvist IMO, 1.19.7 can't automatically remove the stats EnvoyFilter; you can manually verify that the EnvoyFilter has been deleted.
@nicole-lihui Thank you for the reply, I've removed the
Edit:
I noticed that the CEL expression fails to evaluate if the header is missing a value, e.g.
Bug Description
When adding custom tags to Istio standard metrics, I get duplicated metrics.
Telemetry config:
The output of /stats/prometheus:
As you can see, there are two separate metrics with the exact same value; one has the custom tags and the other does not.
Is this intended behaviour? I was expecting to have only one metric with the custom tags.
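The duplication can be detected mechanically: two series are duplicates in this sense when they have the same metric name and the same labels once the custom tags are removed. The Python sketch below parses Prometheus text-format sample lines with a simplified regex (it handles the `name{labels} value` form shown in the reproduction earlier in the thread, not timestamps or every edge case of the exposition format) and groups series on that key. The helper names are hypothetical, the custom tag names `destination_x` and `source_x` come from the reproduction config, and the two sample lines are trimmed versions of the output shown earlier.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Simplified Prometheus text-format sample: name{labels} value (no timestamp).
LINE_RE = re.compile(r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse(line: str) -> Tuple[str, Dict[str, str], float]:
    """Split one sample line into (metric name, labels, value)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a labelled sample: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels")))
    return m.group("name"), labels, float(m.group("value"))


def find_duplicates(lines: Iterable[str], custom_tags: Set[str]) -> List[List[Dict[str, str]]]:
    """Group series that become identical once the custom tags are dropped."""
    groups = defaultdict(list)
    for line in lines:
        name, labels, _value = parse(line)
        base = frozenset((k, v) for k, v in labels.items() if k not in custom_tags)
        groups[(name, base)].append(labels)
    return [series for series in groups.values() if len(series) > 1]


# Trimmed from the /stats/prometheus output in the reproduction.
output = [
    'istio_requests_total{response_code="200",reporter="source",source_workload="sleep",'
    'destination_workload="httpbin",grpc_response_status="",destination_x="httpbin"} 1',
    'istio_requests_total{response_code="200",reporter="source",source_workload="sleep",'
    'destination_workload="httpbin",grpc_response_status=""} 1',
]
dups = find_duplicates(output, custom_tags={"destination_x", "source_x"})
for group in dups:
    print([sorted(set(s) & {"destination_x", "source_x"}) for s in group])
```

Each printed group lists, per duplicated series, which custom tags it carries; for the reproduction, one series has `destination_x` and the other has none, which is the symptom described above.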
Version
Additional Information
No response