SSL configuration does not enable metrics being exposed through HTTPS #688

Closed
csrrmrvll opened this issue Feb 25, 2022 · 34 comments

@csrrmrvll

csrrmrvll commented Feb 25, 2022

I am trying to expose metrics over HTTPS so that a central Prometheus can scrape them, because firewall rules in our network block unencrypted connections. I have configured jmx_exporter as documented, but I can only access the metrics over HTTP. This is the config:

version

jmx_prometheus_javaagent-0.16.1


config.yaml:

ssl: true
lowercaseOutputLabelNames: false
lowercaseOutputName: false
startDelaySeconds: 0


JAVA_OPTS

-javaagent:/opt/jmx-exporter/jmx_prometheus_javaagent-0.16.1=10001:/etc/jmx-exporter/jmx_prometheus_javaagent.yaml -Djavax.net.ssl.keyStore=/etc/jmx-exporter/keystore -Djavax.net.ssl.keyStorePassword=changeit -Djavax.net.ssl.trustStore=/etc/jmx-exporter/truststore -Djavax.net.ssl.trustStorePassword=changeit


What am I missing?

@dhoard
Collaborator

dhoard commented Feb 25, 2022

jmx_exporter currently does not support HTTPS for exposing metrics.

@csrrmrvll
Author

Thanks @dhoard.
Do you know if there are plans to provide this support in the short term?

@dhoard
Collaborator

dhoard commented Mar 1, 2022

@csrrmrvll All the underlying pieces are in place. @fstab is working on configuration changes that would allow jmx_exporter to support both SSL/TLS termination and basic authentication.

@csrrmrvll
Author

Great news, I will be waiting for this feature to be released. Thanks a lot

@srinirama

@fstab - is the SSL configuration that would allow jmx_exporter to support both SSL/TLS termination and basic authentication being scoped? We are keeping our fingers crossed for this feature.

@palashbhowmick

We are also facing the same problem. We can't use JMX Exporter because it doesn't support SSL yet.
Any ETA? @dhoard @fstab
Any workaround?

@fstab
Member

fstab commented Jun 19, 2022

Here's a status update:
✔️ first implementation of a new config file format is done, it's in the new-config branch. This format will allow configuring SSL for scraping via JMX as well as for exporting via HTTPS.
✔️ first integration test for scraping with SSL via JMX is implemented and pushed to the new-config branch.
🔲 The new config parser should be backwards compatible, we need tests to make sure current config files still work.
🔲 We need more integration tests for different SSL scenarios.

Here's a bit of brainstorming on which tests are missing:

// General
// * configuration via config.yaml vs. configuration via command line parameters
// * all tests with and without client auth
// * test passwordfile and accessfile
//
// JMX exporter HTTP Server:
// * scrape JMX using SSL but export using plain HTTP
// * scrape JMX in plain text but export metrics via HTTPS
// * use SSL for both, but different certificates for scraping and exporting
//
// Java agent
// * export metrics with SSL using the default keystore from the application
// * export metrics with SSL but use a dedicated keystore for the agent
// * attach the agent to an application that uses SSL and make sure the agent's SSL does not interfere with the application's SSL

At this point coding is not the bottleneck, but researching how to configure different SSL scenarios can be surprisingly time consuming. It took me several hours to get the initial test running because I was fighting a "Remote host terminated the handshake" Exception.

If anyone wants to help out: Please set up some of the scenarios above using the JAR files from the new-config branch and let me know:

  • The commands to create the keystore and truststore files for your scenario
  • The command to start the example application
  • If the test is about jmx_prometheus_httpserver: The command to start the HTTP server
  • The command to run the client that scrapes metrics (when testing client authentication)
  • The config file for the jmx_exporter.

@dassan1505

dassan1505 commented Jul 8, 2022

I ran the JMX exporter as a Java agent, as described in the README, using code from the 'new-config' branch:
java -javaagent:./jmx_prometheus_javaagent-0.17.0.jar=12345:config.yaml -jar yourJar.jar

with config:

ssl: true

Getting error

Caused by: io.prometheus.jmx.Config$ConfigException: Configuration error in collector: Cannot set ssl=true without specifying hostPort or jmxUrl
at io.prometheus.jmx.Config.loadCollectorConfig(Config.java:388)
at io.prometheus.jmx.Config.load(Config.java:357)
at io.prometheus.jmx.DynamicConfigReloader.<init>(DynamicConfigReloader.java:34)
at io.prometheus.jmx.JmxCollector.<init>(JmxCollector.java:49)
at io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:29)
... 6 more
FATAL ERROR in native method: processing of -javaagent failed
Aborted

I also tried the config below and got an error.

with config:

ssl: true
hostPort: 10.0.5.218:9991

Error:

Jul 08, 2022 8:58:15 AM io.prometheus.jmx.JmxCollector exitOnConfigError
SEVERE: Configuration error: When running jmx_exporter as a Java agent, you must not configure 'jmxUrl' or 'hostPort' because you don't want to monitor a remote JVM.

What am I missing? I am trying to export metrics with SSL enabled.
Has the config file structure changed in the new branch?

@dhoard
Collaborator

dhoard commented Sep 27, 2022

@fstab / anyone working on the tests for jmx_prometheus_httpserver here is some initial configuration information.

Keytool / keystore instructions

Java < 11 requires a JKS keystore. For Java 11+, a JKS or PKCS12 keystore can be used (PKCS12 is the recommendation for Oracle Java 11+)

Create a self-signed JKS keystore, validity = 3652 days (10 years), password = changeit, certificate alias = jmx

keytool -genkey -keyalg RSA -alias jmx -keystore keystore.jks -storepass changeit -validity 3652 -keysize 2048 -dname "CN=jmx, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown"

Convert a JKS keystore to PKCS12 keystore (recommended for Java 11+)

keytool -importkeystore -srckeystore keystore.jks -destkeystore keystore.pkcs12 -srcstoretype JKS -deststoretype PKCS12 -srcstorepass changeit -deststorepass changeit

Test application command line configuration for full JMX security (SSL, authentication, authorization)

Create a JMX access file jmxremote.access

<username> <permissions>

Example content:

jmx readonly

Create a JMX password file jmxremote.password

<username> <clear-text password>

Example content:

jmx secret

Note:

Both jmxremote.password and jmxremote.access require permissions 600
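
For anyone scripting this step, here is a minimal Python sketch that writes the two files with permissions 600. The helper names (write_private_file, create_jmx_auth_files) are made up for illustration and are not part of jmx_exporter or the JDK; the username, password and "readonly" permission are the example values from this comment.

```python
import os
from pathlib import Path
from typing import Tuple


def write_private_file(path: Path, content: str) -> None:
    """Hypothetical helper: write a file readable and writable by its owner only."""
    # Passing 0o600 to os.open applies the restrictive mode when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    # chmod again in case the file already existed with looser permissions.
    os.chmod(path, 0o600)


def create_jmx_auth_files(directory: Path, username: str, password: str,
                          permission: str = "readonly") -> Tuple[Path, Path]:
    """Hypothetical helper: create jmxremote.access and jmxremote.password."""
    access_file = directory / "jmxremote.access"
    password_file = directory / "jmxremote.password"
    # One "<username> <permissions>" line and one "<username> <clear-text password>" line.
    write_private_file(access_file, f"{username} {permission}\n")
    write_private_file(password_file, f"{username} {password}\n")
    return access_file, password_file
```

Calling create_jmx_auth_files(Path("."), "jmx", "secret") would produce the example content shown above in the current directory.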


Start the test application with full JMX security (SSL, authentication, authorization)

java \
 -Dcom.sun.management.jmxremote=true \
 -Dcom.sun.management.jmxremote.access.file=jmxremote.access \
 -Dcom.sun.management.jmxremote.authenticate=true \
 -Dcom.sun.management.jmxremote.local.only=false \
 -Dcom.sun.management.jmxremote.password.file=jmxremote.password \
 -Dcom.sun.management.jmxremote.port=1234 \
 -Dcom.sun.management.jmxremote.registry.ssl=true \
 -Dcom.sun.management.jmxremote.rmi.port=1234 \
 -Dcom.sun.management.jmxremote.ssl.need.client.auth=true \
 -Dcom.sun.management.jmxremote.ssl=true \
 -Djavax.net.ssl.keyStore=keystore.jks \
 -Djavax.net.ssl.keyStorePassword=changeit \
 -Djavax.net.ssl.trustStore=keystore.jks \
 -Djavax.net.ssl.trustStorePassword=changeit \
 -jar test-application.jar

Start the jmx_prometheus_httpserver on port 12345

java \
 -Djavax.net.ssl.keyStore=localhost.jks \
 -Djavax.net.ssl.keyStorePassword=changeit \
 -Djavax.net.ssl.trustStore=localhost.jks \
 -Djavax.net.ssl.trustStorePassword=changeit \
 -jar jmx_prometheus_httpserver-0.17.2.jar \
 12345 exporter.yml

Note:

The values above and exporter.yml values need to match.

In the examples above, I have attempted to match these with collector/src/test/resources/test-config-new.yaml, but there may be errors.

Changing the test application arguments allows you to disable JMX SSL, JMX authentication.

@dhoard
Collaborator

dhoard commented Oct 1, 2022

@dassan1505 ...

ssl: true
hostPort: 10.0.5.218:9991

... appears to be the old configuration format used to configure JMX SSL. When running as a Java agent, you don't need them.

@dhoard
Collaborator

dhoard commented Jan 8, 2023

@csrrmrvll @srinirama @palashbhowmick I have been working on the initial code to support HTTPS and HTTP Basic authentication.

A potential pre-release version of the code can be found at...

https://github.com/dhoard/jmx_exporter/releases/tag/jmx_exporter_enhanced-snapshot.

Due to the vast number of changes, it will take time to be merged and released.

@dhoard
Collaborator

dhoard commented Jan 26, 2023

Attached is a zip of potential pre-release version based on jmx_exporter 0.17.3-SNAPSHOT

(REMOVED - proper basic authentication and SSL support is in progress)

Additional configuration (optional)

httpServer:
  authentication:
    enabled: true
    algorithm: Basic
    username: prometheus
    password: secret
  ssl:
    enabled: true
    certificateAlias: localhost
  threads:
    minimum: 2
    maximum: 6
    keepAlive: 600000 # milliseconds

@euthuppan

@dhoard appreciate your work! any ideas when it'll be merged? Just eagerly waiting to make use of tls and basic auth :)

@dhoard
Collaborator

dhoard commented Feb 15, 2023

@euthuppan

The feature branch has SIGNIFICANT project/code changes to review.

I'm not a maintainer so can't speak to when these features would be released.

I would suggest forking my repository/branch, build it, and test it out in your environment.

Early feedback for such significant changes is much appreciated!

I implemented the features along with an extensive integration test suite (~19k total test method executions over 43 Docker containers/Java versions)... because I required the functionality.

@euthuppan

euthuppan commented Feb 20, 2023

@dhoard I just got around to trying the jar you linked a month ago (the 0.17.3-SNAPSHOT jar).

For context, I'm using the jmx exporter as a Java agent to export JMX metrics from Kafka. This is how it's normally set up for us: we export this environment variable before Kafka starts so that the exporter runs as a Java agent.

KAFKA_OPTS=-javaagent:/kfk/libs/jmx_prometheus_javaagent-0.17.2.jar=9002:/kfk/config/config.yaml

Our config.yaml file is usually just empty and works fine, exposing metrics on that port.

However, I just replaced it with yours like so

KAFKA_OPTS=-javaagent:/kfk/libs/jmx_prometheus_javaagent-0.17.3-SNAPSHOT.jar=9002:/kfk/config/config.yaml

And Kafka refuses to start. This is the error I was able to capture regarding this.

2023-02-20T17:58:58.949946709Z 2023-02-20 17:58:58.941 [main] INFO  io.prometheus.jmx.JavaAgent - Starting...
2023-02-20T17:58:58.969009229Z java.lang.NoSuchMethodError: 'void io.prometheus.jmx.exporter.Exporter.<init>(io.prometheus.jmx.common.logger.Logger, io.prometheus.jmx.exporter.Exporter$Mode, io.prometheus.jmx.shaded.io.prometheus.client.CollectorRegistry, java.lang.String, int, java.io.File)'
2023-02-20T17:58:58.969593208Z  at io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:44)
2023-02-20T17:58:58.969638582Z  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2023-02-20T17:58:58.969726253Z  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
2023-02-20T17:58:58.969822380Z  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2023-02-20T17:58:58.969892160Z  at java.base/java.lang.reflect.Method.invoke(Method.java:568)
2023-02-20T17:58:58.970013075Z  at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:491)
2023-02-20T17:58:58.970091093Z  at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:503)

This happens whether or not I have anything in my config.yaml, and also when I try to enable basic auth.

The strange thing is, when i run it standalone like so

java -javaagent:./jmx_prometheus_javaagent-0.17.3-SNAPSHOT.jar=9002:/kfk/config/config.yaml

It runs just fine, and even basic auth seems to work great! It just doesn't capture our Kafka JMX metrics, unfortunately. So I guess something weird happens when running the Java agent with that jar through the KAFKA_OPTS environment variable. Does that error give you any idea what it could be?

Java Version we're running:

$ java --version
openjdk 17.0.6 2023-01-17 LTS
OpenJDK Runtime Environment Zulu17.40+19-CA (build 17.0.6+10-LTS)
OpenJDK 64-Bit Server VM Zulu17.40+19-CA (build 17.0.6+10-LTS, mixed mode, sharing)

@euthuppan

euthuppan commented Feb 20, 2023

Actually, never mind @dhoard: I got past that error after building from your latest code, and Kafka now starts without it.

Unfortunately, fetching the metrics on that port now takes much longer, with or without basic auth in config.yaml. When basic auth is in the config, the page does eventually load, but I am never prompted for a username or password.

Config.yaml

httpServer:
  authentication:
    enabled: true
    algorithm: Basic
    username: 
    password: 
  ssl:
    enabled: false

@dhoard
Collaborator

dhoard commented Feb 20, 2023

@euthuppan thanks for testing!

Can you share your Kafka version and exact exporter configuration YAML?

Typically, in Kafka monitoring scenarios, exporter configuration key startDelaySeconds is required. I suspect this is related to slow initial metrics collection.

Regarding BASIC auth, I'll review the code. All integration tests pass as expected. Does the agent output show that authentication is enabled?

@euthuppan

euthuppan commented Feb 20, 2023

No problem @dhoard , thank you for having the initiative to make these features supported! And sure, I'm running Kafka version 3.2.0.

config.yaml i'm using:

httpServer:
  authentication:
    enabled: true
    algorithm: Basic
    username: promtheus
    password: random_pass
  ssl:
    enabled: false

I tried checking for agent output in our logs, but I guess either it isn't enabled or I'm just not sure where to look. I looked through our existing logs and don't see anything related to the agent, unfortunately. 😕

@dhoard
Collaborator

dhoard commented Feb 23, 2023

@euthuppan if you are running the application via systemd, the systemd journal should contain some standard information...

2023-02-23 14:06:26.177 [main] INFO  io.prometheus.jmx.JavaAgent - Creating HTTP server...
2023-02-23 14:06:26.177 [main] INFO  io.prometheus.jmx.JavaAgent - Address [0.0.0.0]
2023-02-23 14:06:26.177 [main] INFO  io.prometheus.jmx.JavaAgent - Port [8888]
2023-02-23 14:06:26.179 [main] INFO  io.prometheus.jmx.JavaAgent - Minimum threads [1]
2023-02-23 14:06:26.179 [main] INFO  io.prometheus.jmx.JavaAgent - Maximum threads [5]
2023-02-23 14:06:26.179 [main] INFO  io.prometheus.jmx.JavaAgent - Thread keep alive [0] ms
2023-02-23 14:06:26.180 [main] INFO  io.prometheus.jmx.JavaAgent - Configuring HTTP server Basic authentication (Basic)...
2023-02-23 14:06:26.180 [main] INFO  io.prometheus.jmx.JavaAgent - Basic authentication username [prometheus]
2023-02-23 14:06:26.180 [main] INFO  io.prometheus.jmx.JavaAgent - Basic authentication password [*] (masked)
2023-02-23 14:06:26.180 [main] INFO  io.prometheus.jmx.JavaAgent - Configuring HTTP server SSL...
2023-02-23 14:06:26.180 [main] INFO  io.prometheus.jmx.JavaAgent - SSL certificate alias [localhost]
2023-02-23 14:06:26.181 [main] INFO  io.prometheus.jmx.JavaAgent - HTTP server starting...
2023-02-23 14:06:26.192 [main] INFO  io.prometheus.jmx.JavaAgent - HTTP server running
2023-02-23 14:06:26.193 [main] INFO  io.prometheus.jmx.JavaAgent - Running

I wrote some more integration tests, which appear to work correctly.

Given

startDelaySeconds: 5
httpServer:
  authentication:
    enabled: true
    algorithm: Basic
    username: prometheus
    password: secret
rules:
  - pattern: ".*"

If the endpoint is called without authentication/invalid authentication credentials, then an HTTP 401 is returned (regardless of startDelaySeconds)

If the endpoint is called with valid authentication credentials and you are within the startDelaySeconds period, an empty response is returned (no HTTP response headers, output, etc...)

If the endpoint is called with valid authentication credentials and you are past the startDelaySeconds period, metrics are returned.
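
To check these three cases by hand, a minimal Python sketch using only the standard library could look like the following. The function names (basic_auth_header, build_scrape_request, probe) and the returned labels are hypothetical, chosen for illustration. How the "empty response" case surfaces on the client side is an assumption here: the sketch treats an empty body or a dropped connection as "no metrics yet".

```python
import base64
import http.client
import urllib.error
import urllib.request
from typing import Optional


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_scrape_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Hypothetical helper: a GET request carrying Basic credentials."""
    return urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )


def probe(url: str, username: Optional[str] = None, password: str = "",
          timeout: float = 90.0) -> str:
    """Hypothetical helper: label how the exporter endpoint answered."""
    if username is None:
        request = urllib.request.Request(url)
    else:
        request = build_scrape_request(url, username, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # An empty body is assumed to mean we are still inside startDelaySeconds.
            return "metrics" if response.read() else "no metrics yet"
    except urllib.error.HTTPError as error:
        # HTTPError must be caught before the more general URLError.
        return "unauthorized" if error.code == 401 else f"http {error.code}"
    except (urllib.error.URLError, http.client.HTTPException, ConnectionError):
        return "no metrics yet"
```

With the configuration above, probe("http://localhost:9002/metrics") would be expected to report "unauthorized", and the same call with username "prometheus" and password "secret" should report "metrics" once the start delay has passed. The host, port and path are placeholders.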

My time has been constrained but will try to test Kafka specifically as soon as possible.

@dhoard
Collaborator

dhoard commented Feb 24, 2023

@euthuppan I performed actual testing against a Kafka 3.2.x broker and it's working as expected (described above.)

Configuration YAML:

https://github.com/confluentinc/jmx-monitoring-stacks/blob/7.2-post/shared-assets/jmx-exporter/kafka_broker.yml

Testing time via curl...

Official 0.17.2 release:

     time_namelookup:  0.004298s
        time_connect:  0.004690s
     time_appconnect:  0.000000s
    time_pretransfer:  0.004768s
       time_redirect:  0.000000s
  time_starttransfer:  26.947832s
                     ----------
          time_total:  26.968411s

Build from my branch (SSL and BASIC authentication disabled):

     time_namelookup:  0.001845s
        time_connect:  0.002324s
     time_appconnect:  0.000000s
    time_pretransfer:  0.002401s
       time_redirect:  0.000000s
  time_starttransfer:  26.271366s
                     ----------
          time_total:  26.294840s

Build from my branch (with SSL and BASIC authentication enabled):

     time_namelookup:  0.004212s
        time_connect:  0.004593s
     time_appconnect:  0.058974s
    time_pretransfer:  0.059085s
       time_redirect:  0.000000s
  time_starttransfer:  27.566052s
                     ----------
          time_total:  27.618721s
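
A small Python sketch for reading these curl timing blocks, in case it helps with comparing runs. The function names are hypothetical. It relies on curl's documented meaning of the fields: time_appconnect marks the end of the TLS handshake (0 for plain HTTP), and time_starttransfer marks the first response byte.

```python
import re
from typing import Dict

# Matches lines such as "     time_appconnect:  0.058974s".
_TIMING_LINE = re.compile(r"^\s*(time_[a-z]+):\s*([0-9]+(?:\.[0-9]+)?)s\s*$")


def parse_curl_timings(text: str) -> Dict[str, float]:
    """Hypothetical helper: turn a curl timing block into {name: seconds}."""
    timings = {}
    for line in text.splitlines():
        match = _TIMING_LINE.match(line)
        if match:
            timings[match.group(1)] = float(match.group(2))
    return timings


def tls_handshake_seconds(timings: Dict[str, float]) -> float:
    """Time spent on the TLS handshake, or 0.0 when no handshake took place."""
    appconnect = timings.get("time_appconnect", 0.0)
    if appconnect == 0.0:
        return 0.0
    return appconnect - timings["time_connect"]


def server_wait_seconds(timings: Dict[str, float]) -> float:
    """Time between the request being ready to send and the first response byte."""
    return timings["time_starttransfer"] - timings["time_pretransfer"]
```

Applied to the three blocks above, the TLS handshake accounts for roughly 0.05 s of the SSL-enabled run, while the wait for the first byte is 26 to 27.5 s in all three, so nearly all of the time is spent before the exporter starts responding rather than in SSL or authentication.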

@euthuppan

euthuppan commented Feb 24, 2023

@dhoard thanks a lot for setting up the test environment to test Kafka!

I made my config.yaml now look just like this:

startDelaySeconds: 5
httpServer:
  authentication:
    enabled: true
    algorithm: Basic
    username: prometheus
    password: secret
rules:
  - pattern: ".*"

And I still get similar behavior on my side. Are you setting up the JMX exporter by adding the -javaagent flag and jar to an environment variable that the Kafka application picks up, like this?

KAFKA_OPTS=-javaagent:/kfk/libs/jmx_prometheus_javaagent-0.17.3-SNAPSHOT.jar=9002:/kfk/config/config.yaml

If not, how are you setting it up to scrape Kafka metrics? If there's an alternative that works for you, I'd be happy to try it. When I use this jar and specify basic auth in config.yaml as shown above, hitting the endpoint still does not prompt me for credentials. Instead, after about a minute, it returns a 200 OK and the JMX metrics without any credentials being supplied.

$ curl -k -i -v kafkahostname:9002/ > output.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying hostip:9002...
* Connected to kafkahostname (hostip) port 9002 (#0)
> GET / HTTP/1.1
> Host: kafkahostname:9002
> User-Agent: curl/7.86.0
> Accept: */*
>
  0     0    0     0    0     0      0      0 --:--:--  0:01:12 --:--:--     0* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Fri, 24 Feb 2023 15:33:31 GMT
< Content-type: text/plain; version=0.0.4; charset=utf-8
< Content-length: 2239858
<
{ [1268 bytes data]
100 2187k  100 2187k    0     0  30452      0  0:01:13  0:01:13 --:--:--  536k

I'm guessing that if you pass the -javaagent flag when starting Kafka, you might see behavior similar to mine, unless there's something dumb I'm overlooking?

@euthuppan

euthuppan commented Feb 24, 2023

Here's some extra info: when I check the process running the Kafka application, these are all the args I pass in. Is there anything here I might be missing?

$ ps aux | grep java
mc-svc       1  233  6.2 30681388 1018568 ?    Ssl  16:07   1:00 java -Xmx6g -Xms6g -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xlog:gc*:file=/data/logs/gc.log:time,tags:filecount=5,filesize=5M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9998 -Dkafka.logs.dir=/data/logs -Dlog4j.configuration=file:/kfk/config/log4j.properties -cp /kfk/bin/../libs/activation-1.1.1.jar:/kfk/bin/../libs/aopalliance-repackaged-2.6.1.jar:/kfk/bin/../libs/argparse4j-0.7.0.jar:/kfk/bin/../libs/audience-annotations-0.5.0.jar:/kfk/bin/../libs/commons-cli-1.4.jar:/kfk/bin/../libs/commons-lang3-3.8.1.jar:/kfk/bin/../libs/connect-api-3.2.0.jar:/kfk/bin/../libs/connect-basic-auth-extension-3.2.0.jar:/kfk/bin/../libs/connect-json-3.2.0.jar:/kfk/bin/../libs/connect-mirror-3.2.0.jar:/kfk/bin/../libs/connect-mirror-client-3.2.0.jar:/kfk/bin/../libs/connect-runtime-3.2.0.jar:/kfk/bin/../libs/connect-transforms-3.2.0.jar:/kfk/bin/../libs/cruise-control-metrics-reporter-2.5.92.jar:/kfk/bin/../libs/hk2-api-2.6.1.jar:/kfk/bin/../libs/hk2-locator-2.6.1.jar:/kfk/bin/../libs/hk2-utils-2.6.1.jar:/kfk/bin/../libs/jackson-annotations-2.12.6.jar:/kfk/bin/../libs/jackson-core-2.12.6.jar:/kfk/bin/../libs/jackson-databind-2.12.6.1.jar:/kfk/bin/../libs/jackson-dataformat-csv-2.12.6.jar:/kfk/bin/../libs/jackson-datatype-jdk8-2.12.6.jar:/kfk/bin/../libs/jackson-jaxrs-base-2.12.6.jar:/kfk/bin/../libs/jackson-jaxrs-json-provider-2.12.6.jar:/kfk/bin/../libs/jackson-module-jaxb-annotations-2.12.6.jar:/kfk/bin/../libs/jackson-module-scala_2.13-2.12.6.jar:/kfk/bin/../libs/jakarta.activation-api-1.2.1.jar:/kfk/bin/../libs/jakarta.annotation-api-1.3.5.jar:/kfk/bin/../libs/jakarta.inject-2.6.1.jar:/kfk/bin/../libs/jakarta.validation-api-2.0.2.jar:/kfk/bin/../libs/jakarta.ws.rs-api-2.1.6.jar:/kfk/b
in/../libs/jakarta.xml.bind-api-2.3.2.jar:/kfk/bin/../libs/javassist-3.27.0-GA.jar:/kfk/bin/../libs/javax.servlet-api-3.1.0.jar:/kfk/bin/../libs/javax.ws.rs-api-2.1.1.jar:/kfk/bin/../libs/jaxb-api-2.3.0.jar:/kfk/bin/../libs/jersey-client-2.34.jar:/kfk/bin/../libs/jersey-common-2.34.jar:/kfk/bin/../libs/jersey-container-servlet-2.34.jar:/kfk/bin/../libs/jersey-container-servlet-core-2.34.jar:/kfk/bin/../libs/jersey-hk2-2.34.jar:/kfk/bin/../libs/jersey-server-2.34.jar:/kfk/bin/../libs/jetty-client-9.4.44.v20210927.jar:/kfk/bin/../libs/jetty-continuation-9.4.44.v20210927.jar:/kfk/bin/../libs/jetty-http-9.4.44.v20210927.jar:/kfk/bin/../libs/jetty-io-9.4.44.v20210927.jar:/kfk/bin/../libs/jetty-security-9.4.44.v20210927.jar:/kfk/bin/../libs/jetty-server-9.4.44.v20210927.jar:/kfk/bin/../libs/jetty-servlet-9.4.44.v20210927.jar:/kfk/bin/../libs/jetty-servlets-9.4.44.v20210927.jar:/kfk/bin/../libs/jetty-util-9.4.44.v20210927.jar:/kfk/bin/../libs/jetty-util-ajax-9.4.44.v20210927.jar:/kfk/bin/../libs/jline-3.21.0.jar:/kfk/bin/../libs/jmx_prometheus_httpserver-0.17.3-SNAPSHOT.jar:/kfk/bin/../libs/jmx_prometheus_javaagent-0.16.1.jar:/kfk/bin/../libs/jmx_prometheus_javaagent-0.17.2.jar:/kfk/bin/../libs/jmx_prometheus_javaagent-0.17.3-SNAPSHOT.jar:/kfk/bin/../libs/jopt-simple-5.0.4.jar:/kfk/bin/../libs/jose4j-0.7.9.jar:/kfk/bin/../libs/kafka-clients-3.2.0.jar:/kfk/bin/../libs/kafka-log4j-appender-3.2.0.jar:/kfk/bin/../libs/kafka-metadata-3.2.0.jar:/kfk/bin/../libs/kafka-raft-3.2.0.jar:/kfk/bin/../libs/kafka-server-common-3.2.0.jar:/kfk/bin/../libs/kafka-shell-3.2.0.jar:/kfk/bin/../libs/kafka-storage-3.2.0.jar:/kfk/bin/../libs/kafka-storage-api-3.2.0.jar:/kfk/bin/../libs/kafka-streams-3.2.0.jar:/kfk/bin/../libs/kafka-streams-examples-3.2.0.jar:/kfk/bin/../libs/kafka-streams-scala_2.13-3.2.0.jar:/kfk/bin/../libs/kafka-streams-test-utils-3.2.0.jar:/kfk/bin/../libs/kafka-tools-3.2.0.jar:/kfk/bin/../libs/kafka_2.13-3.2.0.jar:/kfk/bin/../libs/lz4-java-1.8.0.jar:/kfk/bin/../libs/maven-art
ifact-3.8.4.jar:/kfk/bin/../libs/metrics-core-2.2.0.jar:/kfk/bin/../libs/metrics-core-4.1.12.1.jar:/kfk/bin/../libs/netty-buffer-4.1.73.Final.jar:/kfk/bin/../libs/netty-codec-4.1.73.Final.jar:/kfk/bin/../libs/netty-common-4.1.73.Final.jar:/kfk/bin/../libs/netty-handler-4.1.73.Final.jar:/kfk/bin/../libs/netty-resolver-4.1.73.Final.jar:/kfk/bin/../libs/netty-tcnative-classes-2.0.46.Final.jar:/kfk/bin/../libs/netty-transport-4.1.73.Final.jar:/kfk/bin/../libs/netty-transport-classes-epoll-4.1.73.Final.jar:/kfk/bin/../libs/netty-transport-native-epoll-4.1.73.Final.jar:/kfk/bin/../libs/netty-transport-native-unix-common-4.1.73.Final.jar:/kfk/bin/../libs/osgi-resource-locator-1.0.3.jar:/kfk/bin/../libs/paranamer-2.8.jar:/kfk/bin/../libs/plexus-utils-3.3.0.jar:/kfk/bin/../libs/reflections-0.9.12.jar:/kfk/bin/../libs/reload4j-1.2.19.jar:/kfk/bin/../libs/rocksdbjni-6.29.4.1.jar:/kfk/bin/../libs/scala-collection-compat_2.13-2.6.0.jar:/kfk/bin/../libs/scala-java8-compat_2.13-1.0.2.jar:/kfk/bin/../libs/scala-library-2.13.8.jar:/kfk/bin/../libs/scala-logging_2.13-3.9.4.jar:/kfk/bin/../libs/scala-reflect-2.13.8.jar:/kfk/bin/../libs/slf4j-api-1.7.36.jar:/kfk/bin/../libs/slf4j-reload4j-1.7.36.jar:/kfk/bin/../libs/snappy-java-1.1.8.4.jar:/kfk/bin/../libs/trogdor-3.2.0.jar:/kfk/bin/../libs/zookeeper-3.6.3.jar:/kfk/bin/../libs/zookeeper-jute-3.6.3.jar:/kfk/bin/../libs/zstd-jni-1.5.2-1.jar -Djava.io.tmpdir=/data/tmp -Dkafka.logs.dir=/data/logs -javaagent:/kfk/libs/jmx_prometheus_javaagent-0.17.3-SNAPSHOT.jar=9002:/kfk/config/config.yaml -Djava.security.auth.login.config=/kfk/config/jaas.conf -Djava.security.krb5.conf=/kfk/config/krb5.conf -Dzookeeper.sasl.client.username=zookeeper -Djdk.tls.disabledAlgorithms=SSLv3,TLSv1,TLSv1.1,DES,MD5withRSA kafka.Kafka /kfk/config/server.properties
mc-svc     461  0.0  0.0  16364  1012 pts/0    R+   16:08   0:00 grep --color=auto java

@dhoard
Collaborator

dhoard commented Feb 24, 2023

Looking at your classpath, you have multiple instances of the exporter in your classpath, which seems incorrect to me.

kfk/bin/../libs/jmx_prometheus_httpserver-0.17.3-SNAPSHOT.jar:/kfk/bin/../libs/jmx_prometheus_javaagent-0.16.1.jar:/kfk/bin/../libs/jmx_prometheus_javaagent-0.17.2.jar:/kfk/bin/../libs/jmx_prometheus_javaagent-0.17.3-SNAPSHOT.jar

... along with the -javaagent argument with parameters.
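
A quick way to spot this situation is to scan the classpath for exporter jars. The Python sketch below is only an illustration: the function name find_exporter_jars is hypothetical, and the file-name pattern is inferred from the jar names that appear in this thread.

```python
import re
from pathlib import PurePosixPath
from typing import List, Tuple

# Jar names as they appear in this thread, e.g. jmx_prometheus_javaagent-0.17.2.jar.
_EXPORTER_JAR = re.compile(r"^jmx_prometheus_(javaagent|httpserver)-(.+)\.jar$")


def find_exporter_jars(classpath: str, separator: str = ":") -> List[Tuple[str, str]]:
    """Hypothetical helper: list (kind, version) for every exporter jar on a classpath."""
    found = []
    for entry in classpath.split(separator):
        # Only the file name matters, not the directory the jar lives in.
        match = _EXPORTER_JAR.match(PurePosixPath(entry).name)
        if match:
            found.append((match.group(1), match.group(2)))
    return found
```

More than one result, as with the four jars listed above, suggests that leftover exporter jars should be moved out of the directory that Kafka adds to its classpath, so that only the jar named in -javaagent is loaded.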

@euthuppan

euthuppan commented Feb 24, 2023

Looking at your classpath, you have multiple instances of the exporter in your classpath, which seems incorrect to me.


kfk/bin/../libs/jmx_prometheus_httpserver-0.17.3-SNAPSHOT.jar:/kfk/bin/../libs/jmx_prometheus_javaagent-0.16.1.jar:/kfk/bin/../libs/jmx_prometheus_javaagent-0.17.2.jar:/kfk/bin/../libs/jmx_prometheus_javaagent-0.17.3-SNAPSHOT.jar

... along with the -javaagent argument with parameters.

True, it's a little confusing because that section just lists all the jar files in that directory, but you'll notice that only the 0.17.3 one is passed to the -javaagent flag. I've tested swapping in the 0.17.2 and 0.16.1 exporter jars there, and the behavior changes as expected.

@dhoard
Collaborator

dhoard commented Feb 24, 2023

I'm running my server via systemd

Environment="KAFKA_HEAP_OPTS=-Xms6g -Xmx6g -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 -javaagent:/opt/jmx-exporter/jmx_prometheus_javaagent.jar=9002:/opt/jmx-exporter/kafka_broker.yaml -Djavax.net.ssl.keyStore=/opt/jmx-exporter/keystore.pkcs12 -Djavax.net.ssl.keyStorePassword=changeit"

Are you running via systemd? If so can you run...

journalctl -f -u <service name>

before starting the service.

If you're not running via systemd, then the output (System.out) would be in whatever file you capture the output.

Example output:

Feb 24 21:20:19 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:19.022 [main] INFO  io.prometheus.jmx.JavaAgent - Starting...
Feb 24 21:20:19 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:19.087 [main] INFO  io.prometheus.jmx.JavaAgent - Loading configuration...
Feb 24 21:20:19 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:19.514 [main] INFO  io.prometheus.jmx.JavaAgent - Registering build info collector...
Feb 24 21:20:19 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:19.527 [main] INFO  io.prometheus.jmx.JavaAgent - Registering default exports...
Feb 24 21:20:19 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:19.646 [main] INFO  io.prometheus.jmx.JavaAgent - Registering JMX collector...
Feb 24 21:20:20 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:20.133 [main] INFO  io.prometheus.jmx.JavaAgent - Creating HTTP server...
Feb 24 21:20:20 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:20.136 [main] INFO  io.prometheus.jmx.JavaAgent - Address [0.0.0.0]
Feb 24 21:20:20 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:20.137 [main] INFO  io.prometheus.jmx.JavaAgent - Port [9002]
Feb 24 21:20:20 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:20.144 [main] INFO  io.prometheus.jmx.JavaAgent - Minimum threads [1]
Feb 24 21:20:20 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:20.145 [main] INFO  io.prometheus.jmx.JavaAgent - Maximum threads [5]
Feb 24 21:20:20 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:20.145 [main] INFO  io.prometheus.jmx.JavaAgent - Thread keep alive [0] ms
Feb 24 21:20:20 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:20.154 [main] INFO  io.prometheus.jmx.JavaAgent - Configuring HTTP server Basic authentication (Basic)...
Feb 24 21:20:20 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:20.154 [main] INFO  io.prometheus.jmx.JavaAgent - Basic authentication username [prometheus]
Feb 24 21:20:20 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:20.155 [main] INFO  io.prometheus.jmx.JavaAgent - Basic authentication password [*] (masked)
Feb 24 21:20:20 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:20.156 [main] INFO  io.prometheus.jmx.JavaAgent - HTTP server starting...
Feb 24 21:20:20 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:20.216 [main] INFO  io.prometheus.jmx.JavaAgent - HTTP server running
Feb 24 21:20:20 <machinename.domain> kafka-server-start[18036]: 2023-02-24 15:20:20.220 [main] INFO  io.prometheus.jmx.JavaAgent - Running

@euthuppan

euthuppan commented Feb 24, 2023

@dhoard Great news! I finally figured it out. Your original hunch was spot on: it was the extra jmx exporter lib versions lying around in the classpath that seemed to make the difference. After removing those older jars, everything plays nice! Maybe they caused the jmx exporter logic to get mixed up with the other libs; not sure. I assumed that just passing the new jar in the -javaagent flag would be sufficient, but that seems to have been the wrong assumption.

Now I finally get prompted for Basic Auth when I hit the endpoint in my browser! 🙌
Next I'll try to get TLS working and I should be set. Thanks a ton for taking some of your time and energy to look at this with me to deduce the problem. :)

By the way, I'm running it in a container and controlling the container's start and stop through systemd, so journalctl only shows the container state. But I was able to see the JavaAgent output using podman logs --follow --timestamps --tail -1 kfk. Weird container situation, I guess 🤷‍♂️

@dhoard
Collaborator

dhoard commented Feb 25, 2023

@euthuppan great news!

@fstab
Member

fstab commented Mar 6, 2023

Thanks a lot for working on SSL support. Just a quick status update from a maintainer: My current priority is on a new data model for client_java which supports Prometheus native histograms. Once this is done, I am available to spend more time on jmx_exporter again. However, this might take a couple of months. So either we find volunteers who will maintain SSL support on jmx_exporter, or it will take a while until this makes it into the official release.

@euthuppan

euthuppan commented May 1, 2023

@dhoard I've been using your version of the jmx_exporter from back in February (only in a couple QA kafka clusters) and it works pretty great for TLS and basic auth support. It's fantastic!

I noticed a couple of things while using this on a real QA Kafka cluster over the last couple of months.

  • Since switching to it, Prometheus takes about 10-20 seconds to scrape all the Kafka metrics, whereas before it took milliseconds. I had to add a whitelist, which reduced it to about 1 second, and that works fine for us. Gist Link to mine. I'm not necessarily asking for help with this unless you know of a way to make it better; I'm just linking it here in case anyone else runs into something similar with Kafka.
  • About 3 times so far, the jmx exporter has stopped responding to Prometheus, and it times out in the browser as well. It usually happens to about 1 node in the cluster at a time, and the only fix I can think of is to restart the service, after which it works fine. I imagine there's a cleaner way than restarting the whole service. Nothing was wrong with the Kafka service itself, which was functioning fine, and this was never an issue with the jmx_exporter before enabling auth and TLS. This is the only real doubt I have before expanding this to more clusters, as we wouldn't want alarms to go off because the exporter alone stopped responding. (I don't plan to take this to production until the official release anyway.) I'm not sure how many changes you have introduced since late February that may or may not have already resolved this, but I'm curious whether you have noticed this on long-running jmx exporters at all?

@dhoard
Collaborator

dhoard commented May 3, 2023

@euthuppan I'm not aware of any changes that would cause the slower scrape or the unresponsiveness (It might have been something in 0.17.3 SNAPSHOT codebase I used - I no longer have the branch.)

I'm working on proper basic authentication functionality. Once completed, I'll start working on proper SSL configuration.

Again, I would advise you to NOT use the code in production.

@euthuppan

Gotcha @dhoard, so you've essentially started rewriting it? And yep, note taken.

@dhoard
Collaborator

dhoard commented May 4, 2023

@euthuppan correct. It will be a proper feature.

@dhoard
Collaborator

dhoard commented May 7, 2023

@euthuppan This is targeted for the next (post 0.18.0) release

@dhoard
Collaborator

dhoard commented Jun 30, 2023

Resolved in release 0.19.0.

dhoard closed this as completed on Jun 30, 2023