Description
Currently, any change to the running OpenTelemetry collector configuration causes the collector to stop and start again. In dynamic environments like Kubernetes, where inputs can be created and stopped rapidly, this leads to significant disruption of data collection. Eliminate or reduce the need to stop and start the collector as a result of dynamic provider events, especially on Kubernetes.
There are two places where this problem needs to be addressed:
Dynamic Providers
- Relates https://www.elastic.co/docs/reference/fleet/kubernetes-provider
- Relates https://www.elastic.co/docs/reference/fleet/kubernetes_secrets-provider
- Relates https://www.elastic.co/docs/reference/fleet/kubernetes_leaderelection-provider
The upstream solution to the problem of being able to dynamically start and stop receivers is the receivercreator. One option for Beats receivers is to translate inputs templated with dynamic providers into receiver creator expressions.
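As a rough illustration of what such a translation could involve, the sketch below converts one narrow form of provider condition, `${kubernetes.labels.<name>} == '<value>'`, into a receiver creator rule string. It is written in Python for brevity and is not how Elastic Agent implements anything. The function name `condition_to_rule` and the supported condition grammar are hypothetical, and the rule field names (`labels` on `pod` endpoints, `pod.labels` on `port` endpoints) are assumed from the upstream endpoint schema and should be checked against the receiver creator documentation.

```python
import re

# Hypothetical grammar: only conditions of the exact form
#   ${kubernetes.labels.<name>} == '<value>'
# are recognized. Anything else is rejected rather than guessed at.
_LABEL_EQ = re.compile(
    r"""^\$\{kubernetes\.labels\.(?P<label>[\w./-]+)\}\s*==\s*'(?P<value>[^'"]*)'$"""
)


def condition_to_rule(condition: str, endpoint_type: str = "pod") -> str:
    """Translate a label-equality condition into a receiver creator rule string."""
    if endpoint_type not in ("pod", "port"):
        raise ValueError(f"unsupported endpoint type: {endpoint_type!r}")
    match = _LABEL_EQ.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    label, value = match.group("label"), match.group("value")
    # Assumption: pod endpoints expose labels directly, while port endpoints
    # expose the owning pod's labels under `pod`.
    prefix = "labels" if endpoint_type == "pod" else "pod.labels"
    return f'type == "{endpoint_type}" && {prefix}["{label}"] == "{value}"'


print(condition_to_rule("${kubernetes.labels.app} == 'nginx'"))
```

A real translation would also have to map the templated variables used elsewhere in the input (for example `${kubernetes.pod.ip}`) onto the backtick expressions that receiver creator expands, which this sketch does not attempt.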
For example, an nginx/metrics input that uses the Kubernetes provider would typically look like:
inputs:
  - id: kubernetes-nginx-${kubernetes.pod.name}-${kubernetes.container.id}
    type: nginx/metrics
    use_output: default
    data_stream:
      type: metrics
      dataset: nginx.stubstatus
      namespace: default
    metricsets:
      - stubstatus
    hosts: ["${kubernetes.pod.ip}:6379"]
    server_status_path: "nginx_status"
    period: 15s
    condition: ${kubernetes.labels.app} == 'nginx'

A version of this written using receivercreator to create receivers to watch nginx pods would be:
receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    receivers:
      nginx:
        rule: type == "port" && port == 80 && pod.name matches "(?i)nginx"
        config:
          endpoint: 'http://`endpoint`/nginx_status'
          collection_interval: '15s'

Self-Monitoring
Parallel to, but separate from, the problem of dynamic providers adding inputs to the configuration: each time we start or stop a Beat receiver, the self-monitoring configuration needs to be updated to scrape its metrics from its newly started metrics socket.
This could perhaps be addressed with receivercreator as well, but could also be addressed by completely changing how self-monitoring is done to eliminate this problem. The beat receivers will be inside the collector process so allocating a new unix socket/named pipe for each one is fundamentally unnecessary.
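To make the second option concrete, here is a minimal sketch of in-process self-monitoring, assuming each receiver can expose its metrics through a callback. Receivers register with a shared registry when they start and unregister when they stop, and self-monitoring reads the registry directly, so its own configuration never has to change. It is written in Python for brevity. The class `InProcessMetricsRegistry` and its methods are hypothetical names and do not correspond to any existing collector or Beats API.

```python
import threading
from typing import Callable, Dict

# A metrics source is any zero-argument callable returning a flat snapshot.
MetricsFn = Callable[[], Dict[str, float]]


class InProcessMetricsRegistry:
    """Hypothetical shared registry that replaces per-receiver metrics sockets."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sources: Dict[str, MetricsFn] = {}

    def register(self, receiver_id: str, snapshot: MetricsFn) -> None:
        # Called when a receiver starts; no socket or pipe is allocated.
        with self._lock:
            self._sources[receiver_id] = snapshot

    def unregister(self, receiver_id: str) -> None:
        # Called when a receiver stops; unknown ids are ignored.
        with self._lock:
            self._sources.pop(receiver_id, None)

    def collect(self) -> Dict[str, Dict[str, float]]:
        # Copy the source map under the lock, then call the callbacks
        # outside it so a slow receiver cannot block registration.
        with self._lock:
            sources = dict(self._sources)
        return {rid: fn() for rid, fn in sources.items()}


registry = InProcessMetricsRegistry()
registry.register("filebeat-1", lambda: {"events_published": 3.0})
print(registry.collect())
```

The design point is that starting or stopping a receiver only mutates the registry, so the component that collects self-monitoring metrics can stay running with a fixed configuration.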