Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add http active requests gauge metric #29242

Merged

Conversation

pcasaes
Copy link
Contributor

@pcasaes pcasaes commented Nov 13, 2022

The original Vertx micrometer metric include the gauge vertx_http_server_active_requests which is pretty handy. I was hoping we could add a similar gauge here.

@quarkus-bot
Copy link

quarkus-bot bot commented Nov 13, 2022

/cc @ebullient

@pcasaes pcasaes force-pushed the pc/add-http-active-requests-metric branch from bd33397 to a6f42c6 Compare November 13, 2022 22:58
@quarkus-bot

This comment has been minimized.

@pcasaes
Copy link
Contributor Author

pcasaes commented Nov 14, 2022

Thanks @cescoffier . Not sure why these tests failed.

2022-11-14T08:46:25.3565860Z 08:46:25.345 Keycloak:08:46:25,341 WARN  [org.keycloak.keys.loader.ClientPublicKeyLoader] (default task-7) Unable to retrieve publicKey for verify signature of client 'quarkus-app-jwt' . Error details: Client doesn't have certificate or publicKey configured: org.keycloak.models.ModelException: Client doesn't have certificate or publicKey configured
2022-11-14T08:46:25.3598252Z 08:46:25.345 Keycloak:	at org.keycloak.keycloak-services@19.0.3//org.keycloak.keys.loader.ClientPublicKeyLoader.getSignatureValidationKey(ClientPublicKeyLoader.java:100)

@ebullient
Copy link
Contributor

On the fence about this as a gauge or as a long task timer (which includes a count of active requests): https://github.com/micrometer-metrics/micrometer-docs/blob/main/src/docs/concepts/long-task-timers.adoc

@pcasaes
Copy link
Contributor Author

pcasaes commented Nov 14, 2022

A long task time wouldn't work for my use case.

Long Task Timer aren't really Open Metrics compatible. It's why the vertx binder works well, it's a simple gauge.

@pcasaes
Copy link
Contributor Author

pcasaes commented Nov 14, 2022

I was originally replacing this binder with the original one but this PR broke that on 4.3.4
vert-x3/vertx-micrometer-metrics#160

I'd rather not keep hacking Quarkus to get those metrics.

Also, it's my understanding that the current metrics are timers already. I'm not sure what making active requests as timers would add to what we have now.

The question I'd like to answer is: How many active requests are open right now?

@ebullient
Copy link
Contributor

ebullient commented Nov 14, 2022

I was originally replacing this binder with the original one but this PR broke that on 4.3.4 vert-x3/vertx-micrometer-metrics#160

I'd rather not keep hacking Quarkus to get those metrics.

Quarkus uses Vert.x, but is not Vert.x. For a lot of reasons, Quarkus does not use the vertx micrometer instrumentation (from Vert.x 3.0). So "hacking Quarkus to get those metrics" is what you need to do if you want those metrics from Quarkus. They are not the same thing.

Also, it's my understanding that the current metrics are timers already. I'm not sure what making active requests as timers would add to what we have now.

The question I'd like to answer is: How many active requests are open right now?

The current metrics (http.server.requests.* and http.client.requests.*) are timers. A long task timer is something else, and it does answer your question, but provides some additional information as well. I'm not sure if that extra information is also interesting.

@pcasaes
Copy link
Contributor Author

pcasaes commented Nov 14, 2022

To expand on the sticking point for me. The long task timer's aren't openmetrics compliant. One of my providers is strict on openmetrics. It ignores *_active_count.

@pcasaes
Copy link
Contributor Author

pcasaes commented Nov 14, 2022

What could be interesting is adding tags for url and method.

What do you think?

@ebullient
Copy link
Contributor

ebullient commented Nov 14, 2022

Is it because it ends with _active_count? A simple meter filter could rename the metric to follow guidelines if that is the constraint. That metric is not computed as a LongAdder, but it is, functionally, a gauge.

If we use a long task timer, we would use a second sample, and thread the needle identically to the Timer Sample. Note that the exact URI of the request (with templates replaced) is not known until the request is complete outside of use of a regex (deferred URI template resolution by REST containers). Existing HTTP configuration contains URI regex for constraining cardinality.. so this isn't a problem, just explaining a complication to using that information for requests in flight, rather than requests that have been completed.

@pcasaes
Copy link
Contributor Author

pcasaes commented Nov 14, 2022

I don't believe MeterFilter allows us to change the names of samples, only tags and ids. Not sure how I would use it to change the name of _active_count. Internally the prometheus registry creates a sample for that.

I'm still unsure what the benefit for long task timer would be here given that the current timer metric already publish timing information. I do understand the a long timer is something else, but I'm not sure what the benefit would be here.

If we were to thread the needle as you say wouldn't it end up timing the same thing that the current timer times?

@ebullient
Copy link
Contributor

ebullient commented Nov 14, 2022

The hooks are roughly in the same place.. but what happens at start/stop is different. Long task timer is active requests, duration of longest running active request.

You're right about _active_count naming convention sorry about that.

Splitting the gauge (active requests) by URI and Method requires storing a separate LongAdder for each (I know this is what 3.x Vert.x does.. but it does it for known routes with known methods). The metric hook (request begin) is called before the request route has been completely resolved from a Resteasy perspective. While we may know the method, we won't know the templated URI w/o involvement of regex (as mentioned above)

@pcasaes
Copy link
Contributor Author

pcasaes commented Nov 14, 2022

I see what your saying. We wouldn't be able to get path information during the requestBegin.

Looking at the API for LongTimerTask we would have the same problem. There's no static start() method on LongTimerTask as there is in Timer. The LongTimerTask would have to be created in the requestBegin.

@ebullient
Copy link
Contributor

ebullient commented Nov 14, 2022

Looking at the API for LongTimerTask we would have the same problem. There's no static start() method on LongTimerTask as there is in Timer. The LongTimerTask would have to be created in the requestBegin.

Correct. The LongAddr and the LongTaskTimer have the same constraint with regard to tags. As a funny aside.. you can get a LongTaskTimer with what you want by adding a @Timer annotation to requests (with longTask=true and tags with the templated URI and method) .. kind of funny, but you have the advantage of knowing what the values should be because you know what method will be called. The scope for the gauge will be a little different, however, as it will be around the method invocation, rather than around vert.x receive/response.

@pcasaes
Copy link
Contributor Author

pcasaes commented Nov 14, 2022

How do we proceed from here?

@@ -96,6 +103,7 @@ public void requestRouted(HttpRequestMetric requestMetric, String route) {
public HttpRequestMetric requestBegin(Map<String, Object> socketMetric, HttpRequest request) {
HttpRequestMetric requestMetric = new HttpRequestMetric(request);
requestMetric.setSample(Timer.start(registry));
requestMetric.requestStarted(activeRequests);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

activeRequests is a member variable. Can be passed to HttpRequestMetric ctor (and never be null).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind making it null is to make sure we never decrement more than once. But if we're willing to trust the framework, that's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebullient added a commit adding this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're worried about this case.. a simple boolean in the HttpRequest can make sure requestEnd isn't called twice.

@pcasaes pcasaes force-pushed the pc/add-http-active-requests-metric branch 2 times, most recently from 4723ce1 to c779ca7 Compare November 14, 2022 23:23
@ebullient
Copy link
Contributor

Will let CI run .. can you squash down to one commit before merge?

@pcasaes pcasaes force-pushed the pc/add-http-active-requests-metric branch from c779ca7 to 030702b Compare November 15, 2022 02:30
@pcasaes
Copy link
Contributor Author

pcasaes commented Nov 15, 2022

Will let CI run .. can you squash down to one commit before merge?

done

@quarkus-bot

This comment has been minimized.

@quarkus-bot
Copy link

quarkus-bot bot commented Nov 15, 2022

Failing Jobs - Building 030702b

Status Name Step Failures Logs Raw logs
JVM Tests - JDK 11 Build Failures Logs Raw logs
JVM Tests - JDK 17 Build Failures Logs Raw logs
✔️ JVM Tests - JDK 17 MacOS M1
JVM Tests - JDK 18 Build Failures Logs Raw logs
Native Tests - Security2 Build Failures Logs Raw logs
Native Tests - gRPC Build ⚠️ Check → Logs Raw logs

Full information is available in the Build summary check run.

Failures

⚙️ JVM Tests - JDK 11 #

- Failing: integration-tests/oidc-code-flow 

📦 integration-tests/oidc-code-flow

io.quarkus.it.keycloak.CodeFlowTest.testRPInitiatedLogout line 416 - More details - Source on GitHub

com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 401 Unauthorized for http://localhost:8081/tenant-logout?state=e75708ab-4ff2-444a-a239-55847fb5a84d&session_state=049ec941-84b5-44a7-bd07-76e6982f243a&code=02cd9de8-3d7d-45c6-9c11-76d71405ae16.049ec941-84b5-44a7-bd07-76e6982f243a.fe0f9c9f-c781-421e-b1eb-ec728abc73bf
	at com.gargoylesoftware.htmlunit.WebClient.throwFailingHttpStatusCodeExceptionIfNecessary(WebClient.java:701)
	at com.gargoylesoftware.htmlunit.WebClient.loadDownloadedResponses(WebClient.java:2452)

⚙️ JVM Tests - JDK 17 #

- Failing: integration-tests/oidc-code-flow 

📦 integration-tests/oidc-code-flow

io.quarkus.it.keycloak.CodeFlowTest.testRPInitiatedLogout line 416 - More details - Source on GitHub

com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 401 Unauthorized for http://localhost:8081/tenant-logout?state=22418085-51c4-4b24-84ac-114ecc1b0625&session_state=95b8c375-8e46-41e0-a464-c75267f428fa&code=2e738fbf-6289-4db1-8eae-6dc2b47dbf6f.95b8c375-8e46-41e0-a464-c75267f428fa.563a37f8-7a55-4753-bae4-ee73ef649484
	at com.gargoylesoftware.htmlunit.WebClient.throwFailingHttpStatusCodeExceptionIfNecessary(WebClient.java:701)
	at com.gargoylesoftware.htmlunit.WebClient.loadDownloadedResponses(WebClient.java:2452)

⚙️ JVM Tests - JDK 18 #

- Failing: integration-tests/oidc-code-flow 

📦 integration-tests/oidc-code-flow

io.quarkus.it.keycloak.CodeFlowTest.testRPInitiatedLogout line 416 - More details - Source on GitHub

com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 401 Unauthorized for http://localhost:8081/tenant-logout?state=67196960-11c7-41bd-9a0b-8c70bde47d62&session_state=b0faf167-ae6f-402e-98e7-9a65af6c254e&code=c8912021-17a2-41d9-9d33-6dc2e2507f86.b0faf167-ae6f-402e-98e7-9a65af6c254e.46716126-216f-4c6f-81bc-5ae1966c0462
	at com.gargoylesoftware.htmlunit.WebClient.throwFailingHttpStatusCodeExceptionIfNecessary(WebClient.java:701)
	at com.gargoylesoftware.htmlunit.WebClient.loadDownloadedResponses(WebClient.java:2452)

⚙️ Native Tests - Security2 #

- Failing: integration-tests/oidc-code-flow 

📦 integration-tests/oidc-code-flow

io.quarkus.it.keycloak.CodeFlowInGraalITCase.testRPInitiatedLogout - More details - Source on GitHub

com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 401 Unauthorized for http://localhost:8081/tenant-logout?state=d203d20b-6eab-42ca-b29f-1a2157562f26&session_state=e828683f-482a-43cf-a821-e089434c685d&code=c5b2435c-5d59-4bd8-be42-bd8b446de3f8.e828683f-482a-43cf-a821-e089434c685d.041638bb-5aa5-4026-b41f-506aee57fe7b
	at com.gargoylesoftware.htmlunit.WebClient.throwFailingHttpStatusCodeExceptionIfNecessary(WebClient.java:701)
	at com.gargoylesoftware.htmlunit.WebClient.loadDownloadedResponses(WebClient.java:2452)

@cescoffier cescoffier merged commit 921193c into quarkusio:main Nov 16, 2022
@quarkus-bot quarkus-bot bot added this to the 2.15 - main milestone Nov 16, 2022
@cescoffier
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants