Kafka Consumer offset lag metrics #609

Closed
pari205 opened this issue Jun 8, 2021 · 9 comments

Comments

@pari205

pari205 commented Jun 8, 2021

Hi All,

Consumer offset lag metrics are not being exported through the JMX exporter. I have the following rule in my config file, but it does not fetch the required details.

  - pattern: kafka.consumer<type=(.+), client-id=(.+)><>(records-lag-max)
    name: kafka_$1_$3
    labels:
      client-id: $2

Version details:
Kafka version: kafka_2.13-2.7.1
jmx exporter: https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.15.0/jmx_prometheus_javaagent-0.15.0.jar

Please let me know if any other details are required.

Thanks

@fstab
Member

fstab commented Jun 9, 2021

I found a similar example in kafka-connect.yml:

#kafka.producer:type=producer-topic-metrics,client-id="{clientid}",topic="{topic}"", partition="{partition}"
#kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}",topic="{topic}"", partition="{partition}"
- pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+), partition=(.+)><>(.+-total|compression-rate|.+-avg|.+-replica|.+-lag|.+-lead)
  name: kafka_$2_$6
  labels:
    clientId: "$3"
    topic: "$4"
    partition: "$5"
  help: "Kafka $1 JMX metric type $2"
  type: GAUGE

Maybe you can start with this and adapt it? If that doesn't work, please let us know how exactly the JMX bean and attributes are named, for example by attaching jconsole to the process and taking a screenshot of the MBean.

@conradkleinespel

@fstab This seems correct, thanks! 🙏

However, for some reason, I'm seeing this kind of output from jmx-exporter: duplicate metrics, one with a value and the other with NaN.

kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} NaN

I can probably ignore the NaN values in PromQL, but I wonder whether this is expected behavior. What do you think?

@conradkleinespel

If I set my configuration like this to fetch only a single attribute, records-lag, for a specific topic and partition:

rules:
  - pattern: kafka.consumer<type=consumer-fetch-manager-metrics, client-id=(.+), topic=(.+), partition=(.+)><>records-lag
    name: kafka_connect_consumer_fetch_records_lag
    labels:
      clientId: "$1"
      topic: "$2"
      partition: "$3"
    help: "Kafka Connect JMX metric type consumer-fetch-manager"
    type: GAUGE

Then, initially, I see the metric 3 times in jmx-exporter:

kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0

After a few minutes of waiting, I see the same metrics but 2 values are NaN:

kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} NaN
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} NaN
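
A possible explanation for the three identical series, sketched under assumptions: the jmx_exporter README states that the pattern is not anchored and that it is matched against input of the form `domain<properties><keys>attrName: value`. If that holds, a rule ending in `records-lag` would also match `records-lag-avg` and `records-lag-max`, and the fixed `name` would put all three under one metric. The Python snippet below only imitates unanchored matching with `re.search`; `BEAN`, `matching_attributes`, and the attribute values are made up for illustration and are not exporter code.

```python
import re

# Rule pattern copied from the configuration above.
PATTERN = (r"kafka.consumer<type=consumer-fetch-manager-metrics, "
           r"client-id=(.+), topic=(.+), partition=(.+)><>records-lag")

# Hypothetical MBean prefix, written in the input format the README describes.
BEAN = ("kafka.consumer<type=consumer-fetch-manager-metrics, "
        "client-id=connector-consumer-foobar-0, topic=foobartopic, partition=4><>")


def matching_attributes(pattern, attributes):
    """Return the attribute names whose 'attrName: value' line matches the rule."""
    matched = []
    for attr, value in attributes.items():
        line = f"{BEAN}{attr}: {value}"
        # re.search is unanchored: text after the match is simply ignored.
        if re.search(pattern, line):
            matched.append(attr)
    return matched


# Illustrative values only.
attributes = {
    "records-lag": "0.0",
    "records-lag-avg": "NaN",
    "records-lag-max": "NaN",
    "records-lead": "10.0",
}

matched = matching_attributes(PATTERN, attributes)
print(matched)  # ['records-lag', 'records-lag-avg', 'records-lag-max']
```

Under this reading, the single 0.0 series would be `records-lag` itself and the two NaN series would be `records-lag-avg` and `records-lag-max`.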

@tadam313

I have exactly the same issue using this configuration. @conradkleinespel were you able to resolve it?

@conradkleinespel

@tadam313 Unfortunately, no.

@superfav

superfav commented Feb 9, 2022

This configuration works as expected:

  - pattern: kafka.consumer<type=consumer-fetch-manager-metrics, client-id=(.+), topic=(.+), partition=(.+)><>(records-lag[a-zA-Z-]+|records-lag)
    name: kafka_connect_consumer_fetch_$4
    labels:
      clientId: "$1"
      topic: "$2"
      partition: "$3"
    help: "Kafka Connect JMX metric type consumer-fetch-manager"
    type: GAUGE

# HELP kafka_connect_consumer_fetch_records_lag Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag gauge
kafka_connect_consumer_fetch_records_lag{clientId="foo",partition="0",topic="bar",} 0.0
# HELP kafka_connect_consumer_fetch_records_lag_avg Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_avg gauge
kafka_connect_consumer_fetch_records_lag_avg{clientId="foo",partition="0",topic="bar",} NaN
# HELP kafka_connect_consumer_fetch_records_lag_max Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_max gauge
kafka_connect_consumer_fetch_records_lag_max{clientId="foo",partition="0",topic="bar",} NaN

The order of the alternatives matters: it looks like records-lag needs to come last. I couldn't make records-lag$ work.
This one creates a mess:

  - pattern: kafka.consumer<type=consumer-fetch-manager-metrics, client-id=(.+), topic=(.+), partition=(.+)><>(records-lag.*)

# HELP kafka_connect_consumer_fetch_records_lag_avg:_205_4 Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_avg:_205_4 gauge
kafka_connect_consumer_fetch_records_lag_avg:_205_4{clientId="foo",partition="0",topic="bar",} 205.4
# HELP kafka_connect_consumer_fetch_records_lag_max:_479_0 Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_max:_479_0 gauge
kafka_connect_consumer_fetch_records_lag_max:_479_0{clientId="foo",partition="0",topic="bar",} 479.0

I hope this helps.
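
A rough way to see why the order matters and why `.*` produces names such as `records_lag_avg:_205_4`, assuming the exporter matches against `attrName: value` as its README describes. The snippet is a Python simulation, not exporter code: `metric_name` and the one-line sanitizer are hypothetical simplifications, and `BEAN` is an invented bean string.

```python
import re

PREFIX = (r"kafka.consumer<type=consumer-fetch-manager-metrics, "
          r"client-id=(.+), topic=(.+), partition=(.+)><>")
# Hypothetical bean string in the documented input format.
BEAN = ("kafka.consumer<type=consumer-fetch-manager-metrics, "
        "client-id=foo, topic=bar, partition=0><>")


def metric_name(attr_regex, attr, value):
    """Simulate 'name: kafka_connect_consumer_fetch_$4' for one attribute."""
    m = re.search(PREFIX + attr_regex, f"{BEAN}{attr}: {value}")
    if m is None:
        return None
    raw = "kafka_connect_consumer_fetch_" + m.group(4)
    # Simplified sanitizer: replace characters that are invalid in metric names.
    return re.sub(r"[^a-zA-Z0-9:_]", "_", raw)


working = r"(records-lag[a-zA-Z-]+|records-lag)"   # longer alternative first
swapped = r"(records-lag|records-lag[a-zA-Z-]+)"   # shorter alternative first
greedy = r"(records-lag.*)"                        # swallows ': value' as well

print(metric_name(working, "records-lag-avg", "205.4"))
print(metric_name(swapped, "records-lag-avg", "205.4"))
print(metric_name(greedy, "records-lag-avg", "205.4"))
```

In this simulation the first call keeps the `_avg` suffix, the second collapses to the plain `records_lag` name because the shorter alternative wins, and the third carries the value into the metric name.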

@conradkleinespel

@superfav Thanks for your help, this fixes the issue on my side too! I had a quick look at the JMX exporter docs, which say the pattern is not anchored. From what I understand, that means ^ and $ are not supported in patterns.
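
Another reading of why `records-lag$` fails, offered as an assumption rather than a confirmed answer: the README gives the matched input as `attrName: value`, so the string would end with the value, not the attribute name. If so, `$` can never follow `records-lag` directly, while a literal `:` after the attribute name might pin it down. The Python sketch below simulates this with `re.search`; `rule_matches` and `BEAN` are hypothetical, and the colon variant has not been tested against the exporter here.

```python
import re

PREFIX = (r"kafka.consumer<type=consumer-fetch-manager-metrics, "
          r"client-id=(.+), topic=(.+), partition=(.+)><>")
# Hypothetical bean string in the documented input format.
BEAN = ("kafka.consumer<type=consumer-fetch-manager-metrics, "
        "client-id=foo, topic=bar, partition=0><>")


def rule_matches(attr_regex, attr, value):
    """True if the simulated rule matches the 'attrName: value' line."""
    return re.search(PREFIX + attr_regex, f"{BEAN}{attr}: {value}") is not None


# '$' requires end of input, but the line continues with ': 0.0'.
dollar = rule_matches(r"records-lag$", "records-lag", "0.0")

# A literal colon matches only when the attribute name ends right there.
colon_plain = rule_matches(r"records-lag:", "records-lag", "0.0")
colon_avg = rule_matches(r"records-lag:", "records-lag-avg", "NaN")

print(dollar, colon_plain, colon_avg)  # False True False
```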

superfav added a commit to superfav/jmx_exporter that referenced this issue Feb 14, 2022
Including `records-lag-avg` and `records-lag-max` explicitly avoids `kafka_connect_consumer_fetch_records_lag` mixing its values as seen here:
```
kafka_connect_consumer_fetch_records_lag{clientId="foo",partition="0",topic="bar",} 0.0
kafka_connect_consumer_fetch_records_lag{clientId="foo",partition="0",topic="bar",} NaN
kafka_connect_consumer_fetch_records_lag{clientId="foo",partition="0",topic="bar",} NaN
```
After applying this change:
```
# HELP kafka_connect_consumer_fetch_records_lag Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag gauge
kafka_connect_consumer_fetch_records_lag{clientId="foo",partition="0",topic="bar",} 0.0
# HELP kafka_connect_consumer_fetch_records_lag_avg Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_avg gauge
kafka_connect_consumer_fetch_records_lag_avg{clientId="foo",partition="0",topic="bar",} NaN
# HELP kafka_connect_consumer_fetch_records_lag_max Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_max gauge
kafka_connect_consumer_fetch_records_lag_max{clientId="foo",partition="0",topic="bar",} NaN
```
@dhoard
Collaborator

dhoard commented Jun 24, 2023

Closing as resolved.

@JoelRodrigues58

I have instantiated jmx-exporter-prometheus containers in many Kafka services. Some metrics are being exported in duplicate with one of the values being NaN.

For example, in the ksql service:

curl http://[TARGET_IP]:5556/metrics | grep kafka_consumer_consumer_fetch_manager_metrics_records_lag

kafka_consumer_consumer_fetch_manager_metrics_records_lag{clientId="client-12",partition="0",topic="topic-123",} NaN
kafka_consumer_consumer_fetch_manager_metrics_records_lag{clientId="client-12",partition="0",topic="topic-123",} 15.2


My jmx-exporter has the following configuration:

jmxUrl: service:jmx:rmi:///jndi/rmi://localhost:5555/jmxrmi
lowercaseOutputName: true
rules:
  # kafka.streams:type=stream-thread-metrics,thread-id="{threadId}"
  - pattern: 'kafka.streams<type=stream-thread-metrics, thread-id=(.+)><>(.+-total|.+-rate|.+-avg)'
    name: kafka_streams_stream_thread_metrics_$2
    labels:
      threadId: "$1"
    help: "Kafka Streams JMX metric $2"
    type: GAUGE

  # kafka.streams:type=stream-task-metrics,thread-id="{threadId}",task-id="{taskId}"
  - pattern: 'kafka.streams<type=stream-task-metrics, thread-id=(.+), task-id=(.+)><>(.+-total|.+-rate|.+-ratio|.+-avg)'
    name: kafka_streams_stream_task_metrics_$3
    labels:
      threadId: "$1"
      taskId: "$2"
    help: "Kafka Streams JMX metric $3"
    type: GAUGE

  #kafka.producer:type=producer-topic-metrics,client-id="{clientid}",topic="{topic}",partition="{partition}"
  #kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}",topic="{topic}",partition="{partition}"
  - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+), partition=(.+)><>(.+-total|.+-rate|.+-avg|.+-lag)
    name: kafka_$1_$2_metrics_$6
    labels:
      clientId: "$3"
      topic: "$4"
      partition: "$5"
    help: "Kafka $1 JMX metric type $2"
    type: GAUGE

  #kafka.producer:type=producer-topic-metrics,client-id="{clientid}",topic="{topic}"
  #kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}",topic="{topic}""
  - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+)><>(.+-total|.+-rate|.+-avg)
    name: kafka_$1_$2_metrics_$5
    labels:
      clientId: "$3"
      topic: "$4"
    help: "Kafka $1 JMX metric type $2"
    type: GAUGE

  #kafka.streams:type=streams-node-metrics,client-id="{clientid}",node-id="{nodeid}"
  #kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id="{nodeid}"
  - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), node-id=(.+)><>(.+-total|.+-avg)
    name: kafka_$1_$2_metrics_$5
    labels:
      clientId: "$3"
      nodeId: "$4"
    help: "Kafka $1 JMX metric type $2"
    type: UNTYPED

  #kafka.streams:type=kafka-metrics-theads,client-id="{clientid}"
  #kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}"
  #kafka.consumer:type=consumer-coordinator-metrics,client-id="{clientid}"
  #kafka.consumer:type=consumer-metrics,client-id="{clientid}"
  #kafka.producer:type=producer-metrics,client-id="{clientid}"
  - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.*)><>(.+-total|.+-avg|.+-bytes|.+-count|.+-ratio|.+-rate|.+-age|.+-flight|.+-threads|.+-connectors|.+-tasks|.+-ago)
    name: kafka_$1_$2_metrics_$4
    labels:
      clientId: "$3"
    help: "Kafka $1 JMX metric type $2"
    type: GAUGE

  #io.confluent.ksql.metrics:type=ksql-engine-query-stats:{serviceId}
  - pattern: 'io.confluent.ksql.metrics<type=ksql-engine-query-stats><>(_confluent-ksql-default_)(.+-total|.+-rate|.+-avg|.+-per-sec|num.+queries)'
    name: ksql_metrics_ksql_engine_query_stats_$2
    labels:
      serviceId: "$1"
    help: "ksql JMX metric $2"
    type: GAUGE
  - pattern: 'io.confluent.ksql.metrics<type=ksql-engine-query-stats><>(_confluent-ksql-default_ksql-engine-query-stats-)(.+-queries)'
    name: ksql_metrics_ksql_engine_query_stats_$2
    labels:
      serviceId: "$1"
    help: "ksql JMX metric $2"
    type: GAUGE

  #io.confluent.ksql.metrics:id={id},key={key},type=producer-metrics
  #io.confluent.ksql.metrics:id={id},key={key},type=consumer-metrics
  - pattern: 'io.confluent.ksql.metrics<id=(.*),key=(.*),type=(.+)-metrics><>(.+-per-sec|.+-messages|.+-bytes)'
    name: ksql_metrics_$3_metrics_$4
    labels:
      id: "$1"
      key: "$2"
    help: "ksql JMX m

Can anyone help, please?
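
One way to check which attributes end up sharing a name under the topic/partition rule quoted above, assuming unanchored matching against `attrName: value` as described in the exporter README. This Python sketch only simulates the rule: `exported_names`, `BEAN`, and the attribute list are hypothetical, and the variant that adds `.+-max` before `.+-lag` is an untested adaptation of superfav's ordering idea, not a confirmed fix.

```python
import re
from collections import defaultdict

PREFIX = (r"kafka.(.+)<type=(.+)-metrics, client-id=(.+), "
          r"topic=(.+), partition=(.+)><>")
# Hypothetical bean string built from the labels in the curl output.
BEAN = ("kafka.consumer<type=consumer-fetch-manager-metrics, "
        "client-id=client-12, topic=topic-123, partition=0><>")


def exported_names(attr_regex, attributes):
    """Group attribute names by the metric name the simulated rule yields."""
    by_name = defaultdict(list)
    for attr in attributes:
        m = re.search(PREFIX + attr_regex, f"{BEAN}{attr}: 0.0")
        if m:
            # Mirrors 'name: kafka_$1_$2_metrics_$6' plus lowercaseOutputName.
            raw = f"kafka_{m.group(1)}_{m.group(2)}_metrics_{m.group(6)}"
            name = re.sub(r"[^a-zA-Z0-9:_]", "_", raw).lower()
            by_name[name].append(attr)
    return dict(by_name)


ATTRS = ["records-lag", "records-lag-avg", "records-lag-max"]

original = exported_names(r"(.+-total|.+-rate|.+-avg|.+-lag)", ATTRS)
variant = exported_names(r"(.+-total|.+-rate|.+-avg|.+-max|.+-lag)", ATTRS)

for name, attrs in original.items():
    print(name, attrs)
```

In the simulation, `records-lag` and `records-lag-max` both land on `kafka_consumer_consumer_fetch_manager_metrics_records_lag` with the original alternation, which would be consistent with one real value and one NaN; with the variant each attribute gets its own name.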
