Managed PrometheusRule uses wrong or non-existent metrics #2130

@sebhoss

Describe the bug:

The operator managed PrometheusRule resource for syslog-ng uses metrics that do not exist (anymore).

Expected behaviour:

The managed PrometheusRule resource for syslog-ng should use metrics that are actually exported by the axosyslog-metrics-exporter.

Steps to reproduce the bug:

  • Deploy a SyslogNGConfig with both .spec.metrics.enabled and .spec.metrics.serviceMonitor set to true.
  • Call kubectl --namespace YOUR_LOGGING_NAMESPACE_HERE port-forward svc/default-syslog-ng-metrics :http-metrics
  • Call curl localhost:PORT to see all available metrics

This is the result I'm getting for the curl call at the end:

# TYPE syslogng_parsed_events_total counter
syslogng_parsed_events_total{id="#anon-parser0",result="processed"} 4.054305e+06
syslogng_parsed_events_total{id="#anon-parser0",result="discarded"} 0
# TYPE syslogng_scratch_buffers_bytes gauge
syslogng_scratch_buffers_bytes 57344
# TYPE syslogng_stats_level gauge
syslogng_stats_level 2
# TYPE syslogng_output_event_delay_sample_seconds gauge
syslogng_output_event_delay_sample_seconds{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="0"} 0.001 1761812249809
syslogng_output_event_delay_sample_seconds{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="1"} 0 1761812249809
syslogng_output_event_delay_sample_seconds{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="2"} 0 1761812249809
syslogng_output_event_delay_sample_seconds{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="3"} 0.001 1761812249809
syslogng_output_event_delay_sample_seconds{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="4"} 0.001 1761812249809
# TYPE syslogng_memory_queue_events gauge
syslogng_memory_queue_events{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="0"} 0
syslogng_memory_queue_events{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="1"} 0
syslogng_memory_queue_events{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="2"} 0
syslogng_memory_queue_events{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="3"} 0
syslogng_memory_queue_events{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="4"} 0
# TYPE syslogng_memory_queue_memory_usage_bytes gauge
syslogng_memory_queue_memory_usage_bytes{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="0"} 0
syslogng_memory_queue_memory_usage_bytes{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="1"} 0
syslogng_memory_queue_memory_usage_bytes{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="2"} 0
syslogng_memory_queue_memory_usage_bytes{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="3"} 0
syslogng_memory_queue_memory_usage_bytes{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="4"} 0
# TYPE syslogng_filtered_events_total counter
syslogng_filtered_events_total{id="clusterflow_logging_default_match",result="matched"} 3.054606e+06
syslogng_filtered_events_total{id="clusterflow_logging_default_match",result="not_matched"} 999699
# TYPE syslogng_input_event_bytes_total gauge
syslogng_input_event_bytes_total{transport="tcp",address="10.25.253.173",driver="afsocket",id="main_input#0"} 1.27631084e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.203.214",driver="afsocket",id="main_input#0"} 1.86197676e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.150.153",driver="afsocket",id="main_input#0"} 7.8527713e+07
syslogng_input_event_bytes_total{transport="tcp",address="10.25.152.158",driver="afsocket",id="main_input#0"} 5.5628234e+07
syslogng_input_event_bytes_total{transport="tcp",address="10.25.253.57",driver="afsocket",id="main_input#0"} 2.14349791e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.142.56",driver="afsocket",id="main_input#0"} 2.22488125e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.196.236",driver="afsocket",id="main_input#0"} 2.21020977e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.140.158",driver="afsocket",id="main_input#0"} 1.77989365e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.199.27",driver="afsocket",id="main_input#0"} 2.53170068e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.191.125",driver="afsocket",id="main_input#0"} 1.91468636e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.250.174",driver="afsocket",id="main_input#0"} 1.47182883e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.234.31",driver="afsocket",id="main_input#0"} 1.56667392e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.199.228",driver="afsocket",id="main_input#0"} 7.3285488e+07
syslogng_input_event_bytes_total{transport="tcp",address="10.25.213.3",driver="afsocket",id="main_input#0"} 3.24753371e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.137.217",driver="afsocket",id="main_input#0"} 2.30593026e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.244.122",driver="afsocket",id="main_input#0"} 3.8462941e+07
syslogng_input_event_bytes_total{transport="tcp",address="10.25.198.20",driver="afsocket",id="main_input#0"} 1.37774763e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.221.166",driver="afsocket",id="main_input#0"} 2.3355932e+07
syslogng_input_event_bytes_total{transport="tcp",address="10.25.211.244",driver="afsocket",id="main_input#0"} 8.7825187e+07
syslogng_input_event_bytes_total{transport="tcp",address="10.25.132.253",driver="afsocket",id="main_input#0"} 9.0062224e+07
syslogng_input_event_bytes_total{transport="tcp",address="10.25.179.6",driver="afsocket",id="main_input#0"} 259756
syslogng_input_event_bytes_total{transport="tcp",address="10.25.248.183",driver="afsocket",id="main_input#0"} 2.86153457e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.175.229",driver="afsocket",id="main_input#0"} 2.18298175e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.209.71",driver="afsocket",id="main_input#0"} 8.290648e+06
syslogng_input_event_bytes_total{transport="tcp",address="10.25.147.41",driver="afsocket",id="main_input#0"} 2.59650597e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.193.101",driver="afsocket",id="main_input#0"} 7.533695e+06
syslogng_input_event_bytes_total{transport="tcp",address="10.25.154.165",driver="afsocket",id="main_input#0"} 2.4203097e+07
syslogng_input_event_bytes_total{transport="tcp",address="10.25.227.148",driver="afsocket",id="main_input#0"} 2.10312243e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.231.22",driver="afsocket",id="main_input#0"} 3.82997e+06
syslogng_input_event_bytes_total{transport="tcp",address="10.25.223.239",driver="afsocket",id="main_input#0"} 2.53065505e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.211.183",driver="afsocket",id="main_input#0"} 1.06518665e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.132.123",driver="afsocket",id="main_input#0"} 2.61430632e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.215.228",driver="afsocket",id="main_input#0"} 1.58948831e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.240.129",driver="afsocket",id="main_input#0"} 4.39607e+06
syslogng_input_event_bytes_total{transport="tcp",address="10.25.244.189",driver="afsocket",id="main_input#0"} 2.08212488e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.218.139",driver="afsocket",id="main_input#0"} 1.07616052e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.225.251",driver="afsocket",id="main_input#0"} 3.8988e+06
syslogng_input_event_bytes_total{transport="tcp",address="10.25.156.158",driver="afsocket",id="main_input#0"} 1.63028137e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.142.244",driver="afsocket",id="main_input#0"} 406387
syslogng_input_event_bytes_total{transport="tcp",address="10.25.181.212",driver="afsocket",id="main_input#0"} 3.873191e+06
syslogng_input_event_bytes_total{transport="tcp",address="10.25.206.137",driver="afsocket",id="main_input#0"} 2.10358592e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.140.192",driver="afsocket",id="main_input#0"} 2.02512537e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.235.178",driver="afsocket",id="main_input#0"} 3.05600015e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.177.237",driver="afsocket",id="main_input#0"} 8.9469598e+07
syslogng_input_event_bytes_total{transport="tcp",address="10.25.195.46",driver="afsocket",id="main_input#0"} 2.86378511e+08
syslogng_input_event_bytes_total{transport="tcp",address="10.25.194.132",driver="afsocket",id="main_input#0"} 6.8562327e+07
# TYPE syslogng_memory_queue_capacity gauge
syslogng_memory_queue_capacity{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="0"} 10000
syslogng_memory_queue_capacity{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="1"} 10000
syslogng_memory_queue_capacity{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="2"} 10000
syslogng_memory_queue_capacity{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="3"} 10000
syslogng_memory_queue_capacity{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="4"} 10000
# TYPE syslogng_scratch_buffers_count gauge
syslogng_scratch_buffers_count 221
# TYPE syslogng_output_grpc_requests_total gauge
syslogng_output_grpc_requests_total{driver="loki",url="loki-write.loki.svc.cluster.local:9095",response_code="ok"} 483179
# TYPE syslogng_input_events_total counter
syslogng_input_events_total{transport="tcp",address="10.25.199.27",driver="afsocket",id="main_input#0"} 160531
syslogng_input_events_total{transport="tcp",address="10.25.140.158",driver="afsocket",id="main_input#0"} 113072
syslogng_input_events_total{transport="tcp",address="10.25.181.212",driver="afsocket",id="main_input#0"} 2052
syslogng_input_events_total{transport="tcp",address="10.25.209.71",driver="afsocket",id="main_input#0"} 4299
syslogng_input_events_total{transport="tcp",address="10.25.177.237",driver="afsocket",id="main_input#0"} 57026
syslogng_input_events_total{transport="tcp",address="10.25.198.20",driver="afsocket",id="main_input#0"} 87226
syslogng_input_events_total{transport="tcp",address="10.25.231.22",driver="afsocket",id="main_input#0"} 2062
syslogng_input_events_total{transport="tcp",address="10.25.137.217",driver="afsocket",id="main_input#0"} 145935
syslogng_input_events_total{transport="tcp",address="10.25.211.244",driver="afsocket",id="main_input#0"} 55324
syslogng_input_events_total{transport="tcp",address="10.25.253.173",driver="afsocket",id="main_input#0"} 80527
syslogng_input_events_total{transport="tcp",address="10.25.132.253",driver="afsocket",id="main_input#0"} 57523
syslogng_input_events_total{transport="tcp",address="10.25.221.166",driver="afsocket",id="main_input#0"} 15593
syslogng_input_events_total{transport="tcp",address="10.25.196.236",driver="afsocket",id="main_input#0"} 140442
syslogng_input_events_total{transport="tcp",address="10.25.191.125",driver="afsocket",id="main_input#0"} 111494
syslogng_input_events_total{transport="tcp",address="10.25.147.41",driver="afsocket",id="main_input#0"} 165099
syslogng_input_events_total{transport="tcp",address="10.25.250.174",driver="afsocket",id="main_input#0"} 92782
syslogng_input_events_total{transport="tcp",address="10.25.199.228",driver="afsocket",id="main_input#0"} 45651
syslogng_input_events_total{transport="tcp",address="10.25.154.165",driver="afsocket",id="main_input#0"} 16449
syslogng_input_events_total{transport="tcp",address="10.25.227.148",driver="afsocket",id="main_input#0"} 132495
syslogng_input_events_total{transport="tcp",address="10.25.211.183",driver="afsocket",id="main_input#0"} 64253
syslogng_input_events_total{transport="tcp",address="10.25.213.3",driver="afsocket",id="main_input#0"} 192572
syslogng_input_events_total{transport="tcp",address="10.25.244.122",driver="afsocket",id="main_input#0"} 24031
syslogng_input_events_total{transport="tcp",address="10.25.223.239",driver="afsocket",id="main_input#0"} 159577
syslogng_input_events_total{transport="tcp",address="10.25.132.123",driver="afsocket",id="main_input#0"} 162904
syslogng_input_events_total{transport="tcp",address="10.25.248.183",driver="afsocket",id="main_input#0"} 178681
syslogng_input_events_total{transport="tcp",address="10.25.175.229",driver="afsocket",id="main_input#0"} 137041
syslogng_input_events_total{transport="tcp",address="10.25.240.129",driver="afsocket",id="main_input#0"} 2394
syslogng_input_events_total{transport="tcp",address="10.25.142.244",driver="afsocket",id="main_input#0"} 311
syslogng_input_events_total{transport="tcp",address="10.25.253.57",driver="afsocket",id="main_input#0"} 135969
syslogng_input_events_total{transport="tcp",address="10.25.215.228",driver="afsocket",id="main_input#0"} 100471
syslogng_input_events_total{transport="tcp",address="10.25.193.101",driver="afsocket",id="main_input#0"} 4608
syslogng_input_events_total{transport="tcp",address="10.25.179.6",driver="afsocket",id="main_input#0"} 214
syslogng_input_events_total{transport="tcp",address="10.25.225.251",driver="afsocket",id="main_input#0"} 2110
syslogng_input_events_total{transport="tcp",address="10.25.218.139",driver="afsocket",id="main_input#0"} 64261
syslogng_input_events_total{transport="tcp",address="10.25.140.192",driver="afsocket",id="main_input#0"} 127284
syslogng_input_events_total{transport="tcp",address="10.25.194.132",driver="afsocket",id="main_input#0"} 42039
syslogng_input_events_total{transport="tcp",address="10.25.234.31",driver="afsocket",id="main_input#0"} 96753
syslogng_input_events_total{transport="tcp",address="10.25.235.178",driver="afsocket",id="main_input#0"} 191039
syslogng_input_events_total{transport="tcp",address="10.25.156.158",driver="afsocket",id="main_input#0"} 103201
syslogng_input_events_total{transport="tcp",address="10.25.195.46",driver="afsocket",id="main_input#0"} 179306
syslogng_input_events_total{transport="tcp",address="10.25.203.214",driver="afsocket",id="main_input#0"} 118315
syslogng_input_events_total{transport="tcp",address="10.25.152.158",driver="afsocket",id="main_input#0"} 32909
syslogng_input_events_total{transport="tcp",address="10.25.142.56",driver="afsocket",id="main_input#0"} 140222
syslogng_input_events_total{transport="tcp",address="10.25.244.189",driver="afsocket",id="main_input#0"} 129311
syslogng_input_events_total{transport="tcp",address="10.25.150.153",driver="afsocket",id="main_input#0"} 44663
syslogng_input_events_total{transport="tcp",address="10.25.206.137",driver="afsocket",id="main_input#0"} 134284
# TYPE syslogng_socket_max_connections gauge
syslogng_socket_max_connections{id="main_input#0",driver="afsocket",transport="stream",address="AF_INET(0.0.0.0:601)",direction="input"} 490
# TYPE syslogng_socket_rejected_connections_total gauge
syslogng_socket_rejected_connections_total{id="main_input#0",driver="afsocket",transport="stream",address="AF_INET(0.0.0.0:601)",direction="input"} 0
# TYPE syslogng_output_event_bytes_total gauge
syslogng_output_event_bytes_total{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0"} 4.85795123e+09
# TYPE syslogng_output_events_total counter
syslogng_output_events_total{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",result="dropped"} 24836
syslogng_output_events_total{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",result="queued"} 0
syslogng_output_events_total{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",result="delivered"} 3.029769e+06
# TYPE syslogng_last_config_file_modification_timestamp_seconds gauge
syslogng_last_config_file_modification_timestamp_seconds 1.761809037e+09
# TYPE syslogng_output_unreachable gauge
syslogng_output_unreachable{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="0"} 0
syslogng_output_unreachable{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="1"} 0
syslogng_output_unreachable{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="2"} 0
syslogng_output_unreachable{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="3"} 0
syslogng_output_unreachable{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="4"} 0
# TYPE syslogng_output_event_delay_sample_age_seconds gauge
syslogng_output_event_delay_sample_age_seconds{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="0"} 0
syslogng_output_event_delay_sample_age_seconds{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="1"} 0
syslogng_output_event_delay_sample_age_seconds{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="2"} 0
syslogng_output_event_delay_sample_age_seconds{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="3"} 0
syslogng_output_event_delay_sample_age_seconds{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0",worker="4"} 0
# TYPE syslogng_mainloop_io_worker_roundtrip_latency_seconds gauge
syslogng_mainloop_io_worker_roundtrip_latency_seconds 0.000119815
# TYPE syslogng_last_config_reload_timestamp_seconds gauge
syslogng_last_config_reload_timestamp_seconds 1.761809039e+09
# TYPE syslogng_output_event_retries_total gauge
syslogng_output_event_retries_total{driver="loki",url="loki-write.loki.svc.cluster.local:9095",id="clusteroutput_logging_loki#0"} 0
# TYPE syslogng_io_worker_latency_seconds gauge
syslogng_io_worker_latency_seconds 9.5954e-05
# TYPE syslogng_events_allocated_bytes gauge
syslogng_events_allocated_bytes 0
# TYPE syslogng_last_successful_config_reload_timestamp_seconds gauge
syslogng_last_successful_config_reload_timestamp_seconds 1.761809039e+09
# TYPE syslogng_socket_connections gauge
syslogng_socket_connections{id="main_input#0",driver="afsocket",transport="stream",address="AF_INET(0.0.0.0:601)",direction="input"} 259

The metrics referenced in the managed PrometheusRule resource all start with syslog_ng, whereas the exported metrics seem to start with syslogng. Even ignoring the prefix, none of the metrics referenced in the managed PrometheusRule resource exist (anymore). Maybe a newer metrics exporter changed the metric names?
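
The comparison described above could be automated roughly as follows. This is only a sketch: it parses Prometheus text-format output (such as the curl result) and reports which rule-referenced metric names are not exported. The function names and the rule metric name `syslog_ng_example_total` are hypothetical placeholders, not names taken from the operator's PrometheusRule.

```python
import re

# Matches the metric name at the start of a sample line in the
# Prometheus text exposition format.
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def exported_metric_names(exposition_text):
    """Collect the distinct metric names found in text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# TYPE ...".
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_metrics(rule_metrics, exposition_text):
    """Return the rule-referenced metric names that are not exported."""
    exported = exported_metric_names(exposition_text)
    return sorted(set(rule_metrics) - exported)


if __name__ == "__main__":
    # Abridged sample in the same format as the curl output above.
    sample = """\
# TYPE syslogng_output_events_total counter
syslogng_output_events_total{driver="loki",result="dropped"} 24836
# TYPE syslogng_stats_level gauge
syslogng_stats_level 2
"""
    # Hypothetical rule metric names, for illustration only.
    rule_metrics = ["syslog_ng_example_total", "syslogng_output_events_total"]
    print(missing_metrics(rule_metrics, sample))
```

In practice, the list of rule metric names would have to be extracted from the expressions in the managed PrometheusRule resource, which this sketch does not attempt.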

Environment details:

  • Kubernetes version (e.g. v1.15.2): 1.33
  • Cloud-provider/provisioner (e.g. AKS, GKE, EKS, PKE etc): bare metal
  • logging-operator version (e.g. 2.1.1): 6.1.0
  • Install method (e.g. helm or static manifests): helm
  • Logs from the misbehaving component (and any other relevant logs):
  • Resource definition (possibly in YAML format) that caused the issue, without sensitive data:

/kind bug
