MP Metrics 5.0 support for 4.x #7139

tjquinno · 2023-06-30T15:14:13Z

Resolves #5816

👀	It's a long list of reviewers! Changes in the metrics API affected many parts of Helidon that use metrics, including several integrations. I've added people who IIRC have worked on such integrations so you can take a look.

Overview

Note: Look for 🚧 User action flags below for what developers need to do to adapt to the changes. The MP spec section on migration hints has information also.

The latest MP Metrics release is 5.0.1.

This PR contains all the code changes that I know of to implement MP Metrics 5.0. As noted in separate meetings, the TCK is not yet running because the way we set up the Arquillian environment leads to many test failures. That remains a work in progress.

To detect other problems, I have commented out the invocation of the metrics TCK in this PR.

What this PR does not include:

doc changes
use of the new Helidon injection/builder approaches
reactive removal
removal of nima directories

MP Metrics 5.0

There were breaking changes in the MP metrics libraries--to help simplify the API and bring MP metrics in closer alignment with Micrometer and OpenTelemetry metrics. Notable changes:

metric types removed

🚧 User action - use an alternative metric type and annotation

Metric type/annotation removed	Alternative
`ConcurrentGauge` `@ConcurrentGauge`	`Gauge` `@Gauge`
`Meter` `@Metered`	`Counter` `@Counted` - back-ends can track the changes in the counter value over time and create time series analysis from that (meter used to maintain historical data itself in the server)
`SimpleTimer` `@SimplyTimed`	`Timer` `@Timed`

metadata removals
- type
- display name
🚧 User action - Remove code that sets the type or display name in the metadata programmatically and remove settings in the @Metric annotation.
Other
- The notion of a registry "type" with fixed values of base, vendor, and application has given way to the idea of a "scope" instead. There are three built-in scopes--base, vendor, and application--but users can create their own scopes.
  - MetricRegistry.Type was an enum covering base, vendor, and application. It's gone because users can create their own so an enum doesn't really work.
  - @RegistryType is deprecated, intended to be replaced by @RegistryScope. Unfortunately, @RegistryScope is not (yet) marked as a CDI qualifier (that'll change in an upcoming release of MP Metrics), so for now our CDI code still has producers for the three registry types using @RegistryType as well as a producer for injection sites that use @RegistryScope rather than @RegistryType.
  🚧 User action
  - Change uses of MetricRegistry.Type to a String with the corresponding name. The MetricRegistry interface declares String constants for the predefined scopes.
  - Over time, change uses of @RegistryType (deprecated but not yet removed) to @RegistryScope.
- All metrics that share the same name must:
  - Have the same metadata (this is not new)
  - Have the same tag name set (this is new). Apparently the Prometheus exporter (not ours) needs this restriction for some reason in order to produce correct output; I am not at all familiar with that code but the restriction is written into the spec and checked by the TCK so our code enforces this.
  🚧 User action - Make sure all metrics that share the same name use the same tag names.
- The metric registry API no longer permits you to create your own instance of one of the supported metric types (such as your own Counter) and register it. This is because Micrometer and OpenTel are totally in control of manufacturing the metric instances.
  
  Some of our code (integrations, for example) used to do this in order to wrap other technology's measurements with custom implementations of the MP metrics (typically a counter). For the most part where I had to change our code to adjust, I changed such places to use gauges instead. In line with what Micrometer supports, The Registry interface usable from SE apps includes a way to register a read-only Counter by passing a ToDoubleFunction and an object to which to apply that function. But this is not visible through the MP MetricRegistry interface. In some places I used that approach.
  
  🚧 User action - Use a Gauge metric type to expose a value that is updated outside of the metrics API.
- Change in Gauge to Gauge<? extends Number> - The Gauge metric was always intended to expose numeric data. The updated API now enforces this.
  
  🚧 User action - Declare each of your Gauge metrics so it reflects the specific numeric type the gauge exposes.

Metric types remaining from earlier releases are Counter, Gauge<? extends Number>, Histogram, and Timer.

Why do so many files have to change?

Our SE metrics implementation has always dependend--and, for the moment, still does depend--on the MP metrics types. We plan for that to change by introducing a neutral Helidon metrics API which will eventually have implementations which delegate to Micrometer and OpenTelemetry metrics.
Then our MP metrics implementation will invoke the neutral Helidon API, as will customers who develop SE apps.
People will be able to choose whether to use Micrometer or OTel underneat for either SE or MP apps.
But we have not yet defined or implemented the neutral API yet. (I have thought a lot about what that API should look like but the code is not written yet.)

So, the MP metrics changes rippled through our SE metrics API and implementation, and into other code that uses metrics, and into examples, and into tests.

Other notes

Although this PR does not introduce the neutral Helidon metrics API, there are some beginning steps in that direction.
For example, the new MetricFactory SPI interface encapsulates the details of creating new metrics. There are two implementations so far: one that delegates to Micrometer and one that provides no-op implementations of the various metrics.

Other changes in behavior

Prometheus output

In addition to the API changes, the Prometheus output now is slightly different. Each line showing a metric value (that is, not the # TYPE or # HELP lines) has an additional tag to identify the scope where that metric was registered. In MP, the tag name is mp_scope per the spec; in SE it (for the moment) is h_scope.

This is because our code now stores metrics in the Micrometer meter registry...a single meter registry for all scopes we and our developers might create. That's the way Micrometer works.

The Micrometer world does not use different meter registries for scoping but rather for dealing with different types of backends/consumers of metrics output. Typical applications use the Micrometer global registry for adding, removing, and querying meters.
That global registry is a composite registry, meaning you can add technology-specific meter registries to it.
There is a Prometheus Micrometer meter registry Helidon creates and adds (if absent) to the global registry.
Applications create and add other types of meter registries to the global one to support other back-ends.

Further, the Prometheus Micrometer meter registry knows how to format its output so we will be able to remove our own code that does that.

JSON output

It's still there in Helidon, but the MP metrics spec no longer requires it. At this moment, I have not added back the JSON OPTIONS support for metadata.
I might do that if I have time.

We might want to remove JSON support at some point--perhaps deprecate it in 4.0--but there was some Helidon code that used it for testing and there might be user code out there that relies on it.
Users will have plenty to adapt to in 4.0 already, so it might be worth keeping JSON format output for now to ease that migration.

I already changed some of the JSON-dependent testing code to use Prometheus output instead; it is possible to do that everywhere if we need to.
Because there will be a cost to us in keeping it around--even if deprecated--until 5.0, we might want to remove it now.
To my knowledge, MP metrics never marked the JSON output as deprecated during the MP metrics 4.x lifespan, for what that's worth.

Current state

Work done

AFAIK, all our MP and SE code and other files (excluding doc) are now consistent with the new MP Metrics spec and changes in the API.
I've updated the MP fault tolerance dependency from 4.0 to 4.0.2 so it depends on the updated MP metrics library instead of the old one as 4.0 did.

Work left to do for full MP metrics support in M1

Adapt to upcoming big merges as needed.
The MP spec contains a section with suggestions (not requirements) for vendors who choose to use Micrometer for the implementation, mostly how to plug in and configure Micrometer support for other back-ends. This is nice to have but is certainly not required for M1.

tomas-langer · 2023-07-03T14:53:17Z

There seems to be a problem with the calls to registry. Calls to /observe/metrics fail with 404. But if I call /observer/metrics/base, it succeeds, and then all calls to /observer/metrics succeed as well.
Investigating

Problem is in RegistryFactory.getInstance().scopes() that returns an empty set, as the registries field is empty.
Investigating

So the registries field is lazyly initialized as the various scopes are requested.
Possible fix:

    @Override
    public Iterable<String> scopes() {
        Set<String> result = new HashSet<>(registries.keySet());
        // always add base and vendor, as these are known to exist (even if not yet initialized)
        result.add(Registry.BASE_SCOPE);
        result.add(Registry.VENDOR_SCOPE);
        return result;
    }

tjquinno · 2023-07-03T19:05:15Z

Exactly. I also found this independently. Pushed a fix to the PR branch.

tjquinno · 2023-07-03T19:07:48Z

We definitely need to do this for base (but only if it is enabled; remember metrics can be disabled by scope).

But you think also for vendor? Base is the only registry lazily instantiated upon first read.

tjquinno · 2023-07-03T20:34:38Z

Still investigating, but the actual underlying cause might be different.

With the change to the scopes method to always return base I am seeing failures in the bookstore functional test because the request.count metric from the key performance indicator metrics is not present in the output.

It seems that maybe the metrics feature post-setup code is not invoked when expected. That code should be fetching the base metrics registry to trigger its creation and also set up the KPI metrics...and neither seems to be happening and in the debugger I don't see the metrics feature postSetup method invoked.

tjquinno · 2023-07-12T13:11:52Z

BTW, the latest pushes resolve the problem Tomas and I discovered with the base metrics.

klustria

As it relates to changes in integrations/oci/metrics, it LGTM, hence giving approval from that perspective.

…n extension

…r test

…rs; remove scope tag from JSON output

…ugh)

…signature

…less sensitive to ordering of TEST and HELP lines in Prometheus output

…es (instead of empty output)

…ially no-ops during normal usage)

…d created yet, when Registry is asked for the list of scopes; also make sure metrics is properly declared as a feature

…r clean-up

…shutdown, now that the clean-up is in the RegistryFactory stop rather than start method

trentjeff · 2023-07-13T16:18:33Z

examples/metrics/exemplar/src/test/java/io/helidon/examples/metrics/exemplar/MainTest.java

    private static String valueMatcher(String statName) {
        // application_timerForGets_mean_seconds 0.010275403147594316 # {trace_id="cfd13196e6a9fb0c"} 0.002189822 1617799841.963000
        return "application_" + GreetService.TIMER_FOR_GETS
-                + "_" + statName + "_seconds [\\d\\.]+ # \\{trace_id=\"[^\"]+\"\\} [\\d\\.]+ [\\d\\.]+";
+                // TODO Following is the original, exemplar-matching pattern. Suppressed temporarily while we migrate


can you place the issue number here as part of the follow-up

Will update in this PR if there are other, functional, changes as well.

trentjeff · 2023-07-13T16:23:00Z

.../java/io/helidon/integrations/datasource/hikaricp/cdi/MicroProfileMetricsTrackerFactory.java

@@ -34,6 +34,7 @@ class MicroProfileMetricsTrackerFactory implements MetricsTrackerFactory {
        this.registry = null;
    }

+    // TODO change to RegistryScope once MP makes it a qualifier


tracking issue inlined

Will update in this PR if there are other, functional, changes as well.

trentjeff · 2023-07-13T16:53:35Z

metrics/api/src/main/java/io/helidon/metrics/api/MetricsProgrammaticSettings.java

+        private Instance() {
+        }
+
+        private static final LazyValue<MetricsProgrammaticSettings> INSTANCE = LazyValue.create(() ->


Why aren't we using the @Contract and pico style approach here? Holistically, I think we need to stop using this pattern - @tomas-langer should comment.

Definitely needs to happen. Much was changing in a short time as this was coming together and I chose to defer those changes until we had MP metrics operating.

This is the correct approach for now. We may want to use contract based lookup, but only when running within injection service registry. It still must work outside of it

trentjeff · 2023-07-13T16:55:38Z

metrics/api/src/main/java/io/helidon/metrics/api/RegistryFactory.java

+     * @throws java.lang.IllegalArgumentException if the implementation cannot handle the requested media type
+     * @throws java.lang.UnsupportedOperationException if the implementation cannot expose its metrics
+     */
+    Optional<?> scrape(MediaType mediaType, Iterable<String> scopeSelection, Iterable<String> meterNameSelection);


Returning Optional<?> seems wrong.

We could return null instead if the provided selections result in no output, but it's against our guidelines to use null as part of our APIs.

its not the optional i am commenting on - it is the use of the wildcard generic

Thanks for clarifying.

If you look at the two formatters that can be called, the Prometheus one returns an Optional<String> and the JSON one returns an Optional<JsonObject>.

in that case why not return a Builder with two Optional<> attributes? I think that would be better than using generics.

As is, the value (if present inside the Optional) is sent directly as the entity in the response. The code in MetricsFeature has no need to know the specific type of the returned value. The media support handles whatever the actual type is in sending the response. It's very flexible if we ever need to add another formatter that produces another type.

Adding another builder would (IMO) unnecessarily complicate things now, and if we were to add other format types in the future then we'd have to update the builder as well. As is, a hypothetical new formatter would just return its type and we'd be done.

btw Optional<Object> would work the same and it is less unusual on the API

Added to follow-up tasks issue #7193

trentjeff · 2023-07-13T16:59:13Z

metrics/api/src/test/java/io/helidon/metrics/api/TestDeletions.java

+    void addCounterWithTag() {
+        Counter counter = reg.counter(COMMON_COUNTER_NAME, new Tag("myTag", "a"));
+        assertThat("New counter value", counter.getCount(), is(0L));
+        counter.inc();


what is the point of this increment call if its not getting checked later on?

The @BeforeEach exercises the metric deletion feature. The increment of the counters in each test verifies that the counter was, in fact, deleted correctly by that code. If the deletion does not work correctly, then the counter value in the other test could be non-zero as a result of the increment at the end of the first test.

trentjeff · 2023-07-13T17:00:43Z

metrics/metrics/src/main/java/io/helidon/metrics/HelidonFunctionalCounter.java

+     * @param <T> type of the origin
+     */
+    static <T> HelidonFunctionalCounter<T> create(MeterRegistry meterRegistry,
+                                               String scope,


nit: alignment

Will change in this PR if there are also other, functional, changes needed.

tomas-langer

The use of MIcrometer as part of Metrics API was unexpected for me, and I would like to understand this.
I thought the MP was kept independent yet aligned with Micrometer, but now we have a hard dependency on it.

tomas-langer · 2023-07-14T10:36:00Z

metrics/api/pom.xml

@@ -50,6 +50,14 @@
            <groupId>org.eclipse.microprofile.metrics</groupId>
            <artifactId>microprofile-metrics-api</artifactId>
        </dependency>
+        <dependency>


Why is Micrometer an explicit dependency here? I thought the APIs should be MP and not Micrometere?

Yes, not ideal, but we plan for Helidon to introduce its own neutral metrics API and, once that is in place, we will change our MP metrics implementation to depend on that rather than directly on Micrometer.

But due to scheduling complications earlier in the spring and given our goal of having MP metrics support in M1 we decided to go ahead with a first MP implementation without waiting for the neutral API, knowing we would change the MP implementation once the neutral API is in place.

The neutral API will likely be very close to the Micrometer API, partly for the convenience of developers and partly because of Micrometer's maturity in this space. In that case, the changes we will need to make in our MP implementation, although not zero, will be relatively small. Much of the code already uses a MetricFactory SPI abstraction to help minimize the changes required later.

We would definitely have preferred to have the neutral API in place first, but I just couldn't make that happen in time for MP metrics in M1.

We should try to fix this for M2. It doesn't look very nice. I should be able to help Tim.

Yes, we should implement our neutral API and adapt SE and MP to it soon.

tomas-langer · 2023-07-14T10:40:13Z

metrics/api/src/main/java/io/helidon/metrics/api/MetricsProgrammaticSettings.java

+        private Instance() {
+        }
+
+        private static final LazyValue<MetricsProgrammaticSettings> INSTANCE = LazyValue.create(() ->


This is the correct approach for now. We may want to use contract based lookup, but only when running within injection service registry. It still must work outside of it

tomas-langer · 2023-07-14T10:41:09Z

metrics/api/src/main/java/io/helidon/metrics/api/MetricsProgrammaticSettings.java

+ *     customization.
+ * </p>
+ */
+public interface MetricsProgrammaticSettings {


Please add a follow up for naming here - we use Config as the suffix for configuring components in other modules, I think it should be used for this class as well

tomas-langer · 2023-07-14T10:44:54Z

metrics/api/src/main/java/io/helidon/metrics/api/RegistryFactory.java

+     * @throws java.lang.IllegalArgumentException if the implementation cannot handle the requested media type
+     * @throws java.lang.UnsupportedOperationException if the implementation cannot expose its metrics
+     */
+    Optional<?> scrape(MediaType mediaType, Iterable<String> scopeSelection, Iterable<String> meterNameSelection);


btw Optional<Object> would work the same and it is less unusual on the API

tomas-langer · 2023-07-14T10:50:49Z

metrics/metrics/src/main/java/io/helidon/metrics/HelidonFunctionalCounter.java

+ *
+ * @param <T> type of the object providing the value
+ */
+class HelidonFunctionalCounter<T> extends MetricImpl implements Counter {


Do I understand it correctly that this is basically a Gauge? Why do we still have it? Or is it represented differently by the format?

Basically, this allows a Counter-style "wrapper" around an externally-updated value instead of a Gauge-style wrapper.

Earlier releases of MP metrics allowed developers to create their own implementations of metrics and register them. Helidon used this to register custom counters, for example.

MP Metrics 5.0 no longer permits this, but the ability to have a Counter-like wrapper is useful (and present in Micrometer and OTel metrics) so will likely be in the Helidon neutral API as well.

tomas-langer · 2023-07-14T10:53:24Z

metrics/metrics/src/main/java/io/helidon/metrics/HelidonGauge.java


-    private HelidonGauge(String registryType, Metadata metadata, Gauge<T> metric) {
-        super(registryType, metadata);
+    private final io.micrometer.core.instrument.Gauge delegate;


I thought we should have support for Micrometer and possibly other metrics implementation, but not hardcoded into our impl, but through provider modules.
Why is there a hard dependency on Micrometer?

See earlier note about the neutral API and timing.

I did not realize from the description that we must use Micrometer to be able to generate prometheus output (as we used to generate it ourself).
I have approved the PR, we need to fix this eventually when we switch to Helidon Metrics API

Given that our neutral API will layer on one or more actual implementations (Micrometer, OTel metrics), we should delegate to the underlying implementation to publish, whether Prometheus or other (push or pull) models, rather than duplicate that functionality in our own code.

tomas-langer

There are things to follow up on, let's merge so we have MP 6 complete

tjquinno added this to the 4.0.0-M1 milestone Jun 30, 2023

tjquinno requested review from ljnelson, danielkec, dalexandrov, spericas, barchetta, tomas-langer, arjav-desai and klustria June 30, 2023 15:14

tjquinno self-assigned this Jun 30, 2023

oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jun 30, 2023

tjquinno linked an issue Jun 30, 2023 that may be closed by this pull request

4.x: Upgrade MP Metrics to 5.0 #5816

Closed

tjquinno force-pushed the mp-metrics-4.x-mm branch from bc72cfb to 05447e9 Compare July 1, 2023 23:36

barchetta added metrics MP dependencies Pull requests that update a dependency file labels Jul 5, 2023

klustria approved these changes Jul 12, 2023

View reviewed changes

tjquinno added 8 commits July 12, 2023 23:30

Initial commit of MP based on Micrometer

b3d1193

More tests; start feature work; start CDI work

88e75c1

More tests; start feature work; start CDI work

a0a8324

Added basics for timers, gauges

9d1ce27

Improve formatter API; fix and expand tests; plug feature in; start o…

b19be5a

…n extension

Finish missing method impls in metric registry impl; improve formatte…

97dc30d

…r test

Fix some checkstyle, JavaDoc comments, etc.

e4a9a67

Further style, etc. fixes

4693d83

tjquinno added 24 commits July 12, 2023 23:30

More fixes; change native test to check timers instead of simple time…

b326082

…rs; remove scope tag from JSON output

Need to use MP FT 4.0.2 to depend on correct MP metrics release

b39de03

Adjust to removal of scope tag from JSON output

b6f0eea

Fix bug in deletion; add test (not the cause of the TCK problems, tho…

f368262

…ugh)

Fix up JavaDoc link to match changes I made to the referenced method …

08f08a4

…signature

Minor fix to JSON formatting and to bookstore functional test (to be …

4a0f8fc

…less sensitive to ordering of TEST and HELP lines in Prometheus output

Handle per-metric enable/disable in formatting

21da420

Make sure to return 404 if selective metrics retrieval finds no match…

2f2a429

…es (instead of empty output)

Add back EOF filtering from Prom. output; fix Optional issue

22a26a6

Remove obsolete metrics usage

0151c96

Remove refc to now-removed min value for timer

d67a736

Fix bad change for registry type qualification on injection

f5de4d3

Various clean-up changes; pruning unneeded files and directories

fc0e002

Style

d187627

Handle injections modified with @RegistryScope as well as @RegistryType

8bf9659

Bug fixes

add80e3

Further fine-tune regex to exclude TCK jars from Weld scanning

7916312

Refine the Arq config

d706ce9

Add logic to clear data structures between same-JVM TCK tests (essent…

60785d4

…ially no-ops during normal usage)

Restore metrics TCK runs

26c4cc1

Report 'base' as a scope, even if that registry has not been on-deman…

21ae6ab

…d created yet, when Registry is asked for the list of scopes; also make sure metrics is properly declared as a feature

Move clean-out of registries from afterStart to beforeStop; some othe…

9438aee

…r clean-up

Move per-Arquillian cleanup from before bean discovery to before CDI …

ec5fd30

…shutdown, now that the clean-up is in the RegistryFactory stop rather than start method

Fix copyright date

e138b14

tjquinno force-pushed the mp-metrics-4.x-mm branch from 207ccfc to e138b14 Compare July 13, 2023 04:32

tjquinno mentioned this pull request Jul 13, 2023

Restore JSON support for metadata to metrics in 4.x #7182

Closed

trentjeff reviewed Jul 13, 2023

View reviewed changes

tomas-langer requested changes Jul 14, 2023

View reviewed changes

tomas-langer approved these changes Jul 14, 2023

View reviewed changes

tjquinno merged commit 9327495 into helidon-io:main Jul 14, 2023
12 checks passed

MP Metrics 5.0 support for 4.x #7139

MP Metrics 5.0 support for 4.x #7139

Conversation

tjquinno commented Jun 30, 2023 • edited

Overview

MP Metrics 5.0

Why do so many files have to change?

Other notes

Other changes in behavior

Prometheus output

JSON output

Current state

Work done

Work left to do for full MP metrics support in M1

tomas-langer commented Jul 3, 2023 • edited

tjquinno commented Jul 3, 2023

tjquinno commented Jul 3, 2023

tjquinno commented Jul 3, 2023

tjquinno commented Jul 12, 2023

klustria left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tomas-langer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tomas-langer left a comment

Choose a reason for hiding this comment

tjquinno commented Jun 30, 2023 •

edited

tomas-langer commented Jul 3, 2023 •

edited