Proper aggregation of istio metrics using prometheus federation #9700

Merged
5 changes: 3 additions & 2 deletions docs/monitoring/05-03-prometheus.md
Original file line number Diff line number Diff line change
@@ -15,5 +15,6 @@ This table lists the configurable parameters, their descriptions, and default va

| Parameter | Description | Default value |
|-----------|-------------|---------------|
-| **retention** | Specifies a period for which Prometheus stores the metrics in-memory. This retention time applies to in-memory storage only. Prometheus stores the recent data in-memory for the specified amount of time to avoid reading the entire data from disk.| `2h` |
-| **storageSpec.volumeClaimTemplate.spec.resources.requests.storage** | Specifies the size of a Persistent Volume Claim (PVC). | `4Gi` |
+| **prometheusSpec.retention** | Specifies a period for which Prometheus stores the metrics.| `1d` |
+| **prometheusSpec.retentionSize** | Specifies the maximum number of bytes that storage blocks can use. The oldest data will be removed first.| `2GB` |
+| **prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage** | Specifies the size of a PersistentVolumeClaim (PVC). | `10Gi` |
17 changes: 8 additions & 9 deletions docs/monitoring/05-04-production-profile.md
@@ -27,15 +27,15 @@ The table shows the parameters of each profile and their values:

Parameter | Description | Default profile| Production profile | Local profile|
|-----------|-------------|----------------|--------------------|--------------|
-| **retentionSize** | Maximum number of bytes that storage blocks can use. The oldest data will be removed first. | `2GB` | `15GB` | `256MB` |
-| **retention** | Time period for which Prometheus stores metrics in an in-memory database. Prometheus stores the recent data for the specified amount of time to avoid reading all data from the disk. This parameter only applies to in-memory storage.|`1d`| `30d` | `2h`|
-| **prometheusSpec.volumeClaimTemplate.spec.resources.requests.storage** | Amount of storage requested by the Prometheus Pod. |`10Gi`| `20Gi` | `1Gi` |
-| **prometheusSpec.resources.limits.cpu** | Maximum number of CPUs available for the Prometheus Pod to use. | `600m`| `1` | `150m`|
-| **prometheusSpec.resources.limits.memory** | Maximum amount of memory available for the Prometheus Pod to use. |`2000Mi` | `3Gi` |`800Mi`|
-| **prometheusSpec.resources.requests.cpu** | Number of CPUs requested by the Prometheus Pod to operate.| `300m`| `300m` | `100m` |
-| **prometheusSpec.resources.requests.memory** | Amount of memory requested by the Prometheus Pod to operate. | `1000Mi`| `1Gi` | `200Mi` |
+| **prometheus.prometheusSpec.retentionSize** | Maximum number of bytes that storage blocks can use. The oldest data will be removed first. | `2GB` | `15GB` | `256MB` |
+| **prometheus.prometheusSpec.retention** | Time period for which Prometheus stores the metrics. |`1d`| `30d` | `2h`|
+| **prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage** | Amount of storage requested by the Prometheus Pod. |`10Gi`| `20Gi` | `1Gi` |
+| **prometheus.prometheusSpec.resources.limits.cpu** | Maximum number of CPUs available for the Prometheus Pod to use. | `600m`| `1` | `150m`|
+| **prometheus.prometheusSpec.resources.limits.memory** | Maximum amount of memory available for the Prometheus Pod to use. |`2Gi` | `3Gi` |`800Mi`|
+| **prometheus.prometheusSpec.resources.requests.cpu** | Number of CPUs requested by the Prometheus Pod to operate.| `200m`| `300m` | `100m` |
+| **prometheus.prometheusSpec.resources.requests.memory** | Amount of memory requested by the Prometheus Pod to operate. | `600Mi`| `1Gi` | `200Mi` |
| **alertmanager.alertmanagerSpec.retention** | Time period for which Alertmanager retains data.| `120h` | `240h` | `1h` |
-| **grafana.persistence.enabled**| Storing grafana database on a PersistentVolume?|`true`|`true`|`false`|
+| **grafana.persistence.enabled**| Parameter that enables storing Grafana database on a PersistentVolume |`true`|`true`|`false`|
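
For orientation, the production-profile column of the table can be read as the nested values fragment sketched below. This is only an illustration: the nesting is inferred from the dotted parameter names in the table, and how such a fragment is supplied to the chart is not shown in this diff.

```yaml
# Hypothetical values fragment; nesting inferred from the dotted parameter names.
# All values are copied from the "Production profile" column of the table.
prometheus:
  prometheusSpec:
    retention: 30d
    retentionSize: 15GB
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 20Gi
    resources:
      limits:
        cpu: "1"
        memory: 3Gi
      requests:
        cpu: 300m
        memory: 1Gi
alertmanager:
  alertmanagerSpec:
    retention: 240h
grafana:
  persistence:
    enabled: true
```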

## Use profiles

@@ -113,4 +113,3 @@ You can deploy a Kyma cluster with Monitoring configured to use the production p
</details>
</div>


Original file line number Diff line number Diff line change
@@ -69,6 +69,7 @@ data:
prometheus.prometheusSpec.resources.limits.memory: "3Gi"
prometheus.prometheusSpec.resources.requests.cpu: "300m"
prometheus.prometheusSpec.resources.requests.memory: "1Gi"
prometheusIstio.server.resources.limits.memory: "3Gi"
alertmanager.alertmanagerSpec.retention: "240h"

---
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if (and .Values.monitoring.enabled .Values.monitoring.dashboards.enabled) }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -1236,3 +1237,4 @@ data:
"uid": "G8wLrJIZk",
"version": 5
}
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if (and .Values.monitoring.enabled .Values.monitoring.dashboards.enabled) }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -1833,3 +1834,4 @@ data:
"uid": "vu8e0VWZk",
"version": 22
}
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if (and .Values.monitoring.enabled .Values.monitoring.dashboards.enabled) }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -1601,4 +1602,5 @@ data:
"title": "Istio / Pilot",
"uid": "3--MLVZZk",
"version": 11
-}
+}
+{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if (and .Values.monitoring.enabled .Values.monitoring.dashboards.enabled) }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -2612,3 +2613,4 @@ data:
"uid": "LJ_uJAvmk",
"version": 1
}
{{- end }}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if (and .Values.monitoring.enabled .Values.monitoring.dashboards.enabled) }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -2314,3 +2315,4 @@ data:
"uid": "UbsSZTDik",
"version": 1
}
{{- end }}

This file was deleted.

Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
{{- if (and .Values.monitoring.enabled .Values.monitoring.istioServiceMonitor.enabled) }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
@@ -20,7 +21,9 @@ spec:
- istio-system
endpoints:
- path: /metrics
-interval: 30s
+{{- if .Values.monitoring.istioServiceMonitor.scrapeInterval }}
+interval: {{ .Values.monitoring.istioServiceMonitor.scrapeInterval }}
+{{- end }}
relabelings:
- sourceLabels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
separator: ;
@@ -31,4 +34,4 @@ spec:
- sourceLabels: [ __name__ ]
regex: ^(envoy_cluster_upstream_cx_active|envoy_cluster_upstream_cx_connect_fail|envoy_cluster_upstream_cx_rx_bytes_total|envoy_cluster_upstream_cx_total|envoy_cluster_upstream_cx_tx_bytes_total|envoy_server_hot_restart_epoch|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|grpc_io_server_completed_rpcs|grpc_io_server_server_latency_bucket|istio_build|istio_mcp_request_acks_total|istio_mcp_request_nacks_total|mixer_runtime_dispatch_duration_seconds_bucket|mixer_runtime_dispatches_total|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|istio_build|istio_mcp_request_acks_total|pilot_conflict_inbound_listener|pilot_conflict_outbound_listener_http_over_current_tcp|pilot_conflict_outbound_listener_tcp_over_current_http|pilot_conflict_outbound_listener_tcp_over_current_tcp|pilot_proxy_convergence_time_bucket|pilot_services|pilot_virt_services|pilot_xds_push_context_errors|pilot_total_xds_rejects|pilot_total_xds_internal_errors|pilot_xds_write_timeout|pilot_xds_lds_reject|pilot_xds_rds_reject|pilot_xds_push_timeout_failures|pilot_xds_eds_instances|pilot_xds_eds_reject|pilot_xds|pilot_xds_push_timeout|pilot_xds_push_errors|pilot_xds_cds_reject|pilot_xds_pushes|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|galley_istio_authentication_meshpolicies|galley_istio_networking_destinationrules|galley_istio_networking_gateways|galley_istio_networking_virtualservices|galley_runtime_processor_events_processed_total|galley_runtime_processor_snapshot_events_total_bucket|galley_runtime_processor_snapshots_published_total|galley_runtime_state_type_instances_total|galley_runtime_strategy_on_change_total|galley_runtime_strategy_timer_max_time_reached_total|galley_runtime_strategy_timer_quiesce_reached_total|galley_runtime_strategy_timer_resets_total|galley_source_kube_dynamic_converter_failure_total|galley_source_kube_dynamic_converter_success_total|galley_source_kube_event_error_total|galley_source_kube_event_success_total|galley_validation_http_error|galley_validation_cert_key_update_errors|galley_validation_cert_key_updates|galley_validation_passed|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|istio_build|istio_mcp_clients_total|istio_mcp_request_acks_total|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|citadel_secret_controller_secret_deleted_cert_count|citadel_secret_controller_svc_acc_deleted_cert_count|citadel_secret_controller_svc_acc_created_cert_count|citadel_server_authentication_failure_count|citadel_server_csr_count|citadel_secret_controller_csr_err_count|citadel_server_csr_parsing_err_count|citadel_server_success_cert_issuance_count|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|grpc_server_handled_total|grpc_server_handling_seconds_bucket|grpc_server_started_total|istio_build|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|secret_deleted_cert_count|svc_acc_created_cert_count|svc_acc_deleted_cert_count|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|grpc_io_server_completed_rpcs|grpc_io_server_server_latency_bucket|istio_build|istio_mcp_request_acks_total|mixer_runtime_dispatch_duration_seconds_bucket|mixer_runtime_dispatches_total|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|istio_request_bytes_bucket|istio_request_bytes_sum|istio_request_duration_milliseconds_bucket|istio_requests_total|istio_response_bytes_bucket|istio_response_bytes_sum|istio_tcp_received_bytes_total|istio_tcp_sent_bytes_total)$
action: keep
{{- end }}
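
As a rough illustration of the conditional `interval` in this template: if `monitoring.istioServiceMonitor.scrapeInterval` were set to a non-empty value such as `15s` (an example value, not one taken from this PR), the endpoint might render as sketched below. The indentation is assumed, since the diff view does not preserve it.

```yaml
# Sketch of the rendered endpoint, assuming scrapeInterval: "15s" (example value).
endpoints:
  - path: /metrics
    interval: 15s
# With the default scrapeInterval: "" the Helm `if` evaluates to false and the
# `interval` field is omitted, so the ServiceMonitor endpoint falls back to the
# global scrape interval configured for Prometheus.
```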
8 changes: 8 additions & 0 deletions resources/istio/values.yaml
@@ -17,3 +17,11 @@ istio:
installer:
image: eu.gcr.io/kyma-project/istio-installer
tag: 19e240cd

monitoring:
  enabled: true
  dashboards:
    enabled: true
  istioServiceMonitor:
    enabled: true
    scrapeInterval: ""
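
A sketch of how these new values might be overridden, for example to keep the Istio ServiceMonitor but skip the dashboard ConfigMaps. The chosen values are illustrative only and do not come from this PR.

```yaml
# Hypothetical override of the new `monitoring` block in resources/istio/values.yaml.
monitoring:
  enabled: true
  dashboards:
    enabled: false         # templates guarded by monitoring.dashboards.enabled are skipped
  istioServiceMonitor:
    enabled: true
    scrapeInterval: "60s"  # illustrative; an empty string omits the interval field
```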
5 changes: 5 additions & 0 deletions resources/monitoring/charts/prometheus-istio/Chart.yaml
@@ -0,0 +1,5 @@
apiVersion: v1
name: prometheus-istio
version: 11.16.2
appVersion: 2.21.0
description: Prometheus is a monitoring system and time series database.