Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a config property to disable exemplars for Metrics of counter type #39759

Closed
xak2000 opened this issue Feb 26, 2024 · 7 comments
Closed
Labels
for: external-project For an external project and not something we can fix status: invalid An issue that we don't feel is valid

Comments

@xak2000
Copy link
Contributor

xak2000 commented Feb 26, 2024

Spring Boot 3.2 introduced broaden the exemplar support that requires Prometheus 2.43 or later.

This change raised problems for people who use an older version of Prometheus for one reason or another.

The problem is not just some features stoped to work. The metrics stopped to be scrapped completely as Prometheus older than 2.43 just ignores any counters that contain exemplars.

While older Prometheus versions are not officically supported by the Prometheus team, sometimes users are out of control of the Prometheus version they use. In such cases, you can't just ask users to update their Prometheus.

E.g. Managed Prometheus instance on GKE still have version 2.41. While they already started to work on the version upgrade, due to very inert nature of GKE updates, the "fix" will not be available in production earlier than 6 months from now. So, any company that wants to use Spring Boot 3.2/Micrometer 1.12 on GKE with the Managed Prometheus instance is forced to implement some workaround or metrics will be completely absent.

I think it's not very tolerant to force users to use older Spring Boot or Micrometer version if it is simple enough to give users a choice.

So, I propose to introduce a config property that will disable adding exemplars to counter metrics, so the older versions of Prometheus will be able to scrape them.

Known worarounds are described here. They imply either introducing a new actuator endpoint with text/plain content type (that doesn't support exemplars at all, so it "fixes" the problem) or overriding some beans with an implementation, that disables exemplars for all or only counter metrics.

An example of implementation of ExemplarSampler bean that disables exemplars only for counter metrics (they will still be enabled for histograms):

/**
 * A {@link BeanPostProcessor} that wraps any {@link ExemplarSampler} bean with an implementation
 * that delegates all method calls to the wrapped bean, except
 * {@link io.prometheus.client.exemplars.CounterExemplarSampler#sample(double, Exemplar)}.
 * This have an effect of disabled Exemplars for counter metrics, introduced in Prometheus 2.43.
 * Any counter metrics with exemplars are ignored by any older Prometheus version on scrapping,
 * so it is important for counter metrics to <b>not</b> have exemplars, or they will be ignored,
 * if scrapped by Prometheus version less than 2.43.
 *
 * @see <a href="https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-3.2-Release-Notes#broader-exemplar-support-in-micrometer-112">Spring Boot 3.2 Release Notes</a>
 * @see <a href="https://github.com/GoogleCloudPlatform/prometheus-engine/issues/812">Managed Prometheus on GCP Issue</a>
 * @see <a href="https://stackoverflow.com/questions/77586575/disable-exemplars-support-in-spring-boot-3-2-to-avoid-prometheus-problem-with-sc">Stackoverflow discussion</a>
 * @see <a href="https://github.com/micrometer-metrics/micrometer/pull/3996">Micrometer Metrics PR</a>
 */
@Component
public class ExemplarSamplerBeanPostProcessor implements BeanPostProcessor {

	@Override
	public Object postProcessAfterInitialization(Object bean, String beanName) throws BeansException {
		if (bean instanceof ExemplarSampler delegate) {
			return new IgnoreCountersExemplarSampler(delegate);
		}
		return bean;
	}

	@RequiredArgsConstructor
	static class IgnoreCountersExemplarSampler implements ExemplarSampler {

		private final ExemplarSampler delegate;

		@Override
		public Exemplar sample(double increment, Exemplar previous) {
			// Do not return exemplar for counter metrics to allow them to be scrapped by Prometheus 2.42 and below
			return null;
		}

		@Override
		public Exemplar sample(double value, double bucketFrom, double bucketTo, Exemplar previous) {
			return delegate.sample(value, bucketFrom, bucketTo, previous);
		}

	}

}

I implemented it as a BeanPostProcessor as it is a workaround for now, but it could be implemented as part of

@Bean
@ConditionalOnMissingBean(ExemplarSampler.class)
@ConditionalOnBean(SpanContextSupplier.class)
public DefaultExemplarSampler exemplarSampler(SpanContextSupplier spanContextSupplier) {
return new DefaultExemplarSampler(spanContextSupplier);
}

Maybe there is a better way to conditionally disable adding of exemplars to the /actuator/prometheus response. Pinging @jonatan-ivanov as it is he, who suggested to create this feature request in the first place. I think he could have a very useful input on the topic.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Feb 26, 2024
@jonatan-ivanov
Copy link
Member

jonatan-ivanov commented Feb 26, 2024

I'm a bit puzzled here since exemplars for Counter was supported by Prometheus since the introduction of that feature. On the other hand exemplars on _count for Summary and Histogram was not, please see the description of the issue you linked:

Prometheus did not support exemplars for Summaries and it did not support exemplars for anything else than buckets on Histograms. Since Prometheus now supports exemplars on every time series, we can add exemplars to Summary/Histogram _count, see: prometheus/prometheus#11982

So you should not have any problems with Counters whatsoever, but you can run into issues with _count on Summary and Histogram.

Because of @ConditionalOnMissingBean(ExemplarSampler.class) if you create any ExemplarSampler @Bean Boot will back off and not create one so you don't need the BeanPostProcessor:

@Bean
public IgnoreCountersExemplarSampler exemplarSampler(SpanContextSupplier spanContextSupplier) { 
 	return new IgnoreCountersExemplarSampler(new DefaultExemplarSampler(spanContextSupplier)); 
 }

or if IgnoreCountersExemplarSampler would extend DefaultExemplarSampler, you would only need to override one method and create the instance like this:

new IgnoreCountersExemplarSampler(spanContextSupplier);

If you want to disable all exemplars, you can create a SpanContextSupplier that always says isSampled false. Also, your solution not just disables _count but it also disables exemplars on Prometheus Counters. Could you please help us with your request a bit?

  • Is it ok for you to disable exemplars for everything?
  • Or you want to disable exemplars only for _count (the one that prevents you to upgrade) and use the other Exemplars?
  • Disabling exemplars on _count and Counter does not make a lot of sense to me but let us know if this is what you want.

Has anyone asked GKE, why they are using a Prometheus version this old? 2.50.1 is the current Prometheus Server version.

@mhalbritter mhalbritter added the status: waiting-for-feedback We need additional information before we can continue label Feb 27, 2024
@mhalbritter
Copy link
Contributor

Thanks for your input, @jonatan-ivanov .

@xak2000
Copy link
Contributor Author

xak2000 commented Feb 27, 2024

I'm a bit puzzled here since exemplars for Counter was supported by Prometheus since the introduction of that feature. On the other hand exemplars on _count for Summary and Histogram was not

Please forgive me my ignorance, I didn't know that Counter and _count for Summary and Histogram are different things. To be honest, the terminology of metric types and other stuff in Prometheus is very confusing to me and despite the fact I use Prometheus several years, I still newbie in it.

Because of @ConditionalOnMissingBean(ExemplarSampler.class) if you create any ExemplarSampler @Bean Boot will back off and not create one so you don't need the BeanPostProcessor:

Yes, I know, please see my answer in the SO question. I just think that BeanPostProcessor is a better solution for a workaround, as it is not relying on the DefaultExemplarSampler class. If Spring Boot will change the default implementation class in the AutoConfiguration, the BeanPostProcessor will still wrap this new implementation, while custom bean will use the hardcoded DefaultExemplarSampler. But, anyway, it is just a matter of preference. Both solutions work identically.

If you want to disable all exemplars, you can create a SpanContextSupplier that always says isSampled false.

Yes, I know. You mentioned it in the SO answer. But this will disable exemplars for all metrics, right? I prefer to keep them for metrics where they were supported in Prometheus 2.41. A quote from GCP docs:

Google Cloud Managed Service for Prometheus can only ingest exemplars attached to histogram metrics. Exemplars attached to counter metrics can't be ingested. Prometheus histogram metrics are converted to the analogous Cloud Monitoring Distribution type, which supports exemplars. Non-distribution metrics in Cloud Monitoring do not support exemplars.

So, I would like to keep supported stuff in-place. Also, returning false from isSampled span that is really sampled looks like a hack to me. The method will lie just to fool the caller. What if isSampled will be used (or maybe it already is) by other callers that will use this information for some other means (not just to decide if exemplar should be added or not)...

Also, your solution not just disables _count but it also disables exemplars on Prometheus Counters.

I didn't know that. :(

Could you please help us with your request a bit?

To be honest, what I really want is to make sure that apps on Spring Boot 3.2+ will continue to be monitored on GCP. I already solved this problem for myself (see the workaround above). Probably not ideally, as you mentioned that my solution will disable exemplars on Counters that is unnecessary as it is supported by Prometheus 2.41, so by GCP. But at least my workaround allows our apps to be monitored again.

So, I wanted to create a solution that allows to solve this problem for anyone (not just me, and not only for GCP) without such workarounds. I think the problem of non-functional monitoring is important enough to justify the config property.

But, as you mentioned that this solution will not just disable exemplars for _count rows in the output, but for Counters too, then I have no idea how to disable exemplars just for _count rows...

Maybe it's not worth to do if it will require too many internal changes in Micrometer or Spring Boot. When these changes will finally be available in the next Spring Boot version, GCP (and I hope most other platforms) will already be upgraded to Prometheus 2.43+.

Has anyone asked GKE, why they are using a Prometheus version this old? 2.50.1 is the current Prometheus Server version.

This question is flying in the air, but I do not know anyone from GKE team to ask such technical question. Maybe it's just a principle: "If it ain’t broke, don’t fix it." :)

And thank you for you thoughtful input, @jonatan-ivanov!

P.S. As you mentioned that Counter and _count are different things, can you rename please the title of this issue? I don't know how to propertly call this _count thing. It's still a counter, but not Counter metric type.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Feb 27, 2024
@jonatan-ivanov
Copy link
Member

jonatan-ivanov commented Feb 27, 2024

Please forgive me my ignorance [...]

I wanted to make sure that we are on the same page and I'm not missing something.

Google Cloud Managed Service for Prometheus can only ingest exemplars attached to histogram metrics. Exemplars attached to counter metrics can't be ingested.

GCP ignoring exemplars on Counters surprises me a bit but I'm not sure I understand why is it a problem for GCP that other time series can also contain exemplars in this case. GCP could ignore those too. (Maybe they remove the exemplar after Prometheus parsed the data and the older version of Prometheus they use cannot parse _count?)

So, I would like to keep supported stuff in-place.

This complicates things since I feel in order to solve this in Micrometer/Boot, Micrometer also needs to be changed since it should make _count configurable. This can be a rabbit hole if we want to add extra exemplars later (we have/had requests for more, like _max).

Also, your solution not just disables _count but it also disables exemplars on Prometheus Counters.

I didn't know that. :(

If you only use it for GCP which seems to be ignoring exemplars on Counter, disabling it does not make a difference for you, so your workaround is ok for GCP.

Maybe it's not worth to do if it will require too many internal changes in Micrometer or Spring Boot. When these changes will finally be available in the next Spring Boot version, GCP (and I hope most other platforms) will already be upgraded to Prometheus 2.43+.

This is definitely something we should consider. I think a proper solution for this would be through content negotiation: releasing OpenMetrics 1.1 with the extra exemplars feature that Prometheus already supports, plus making the Prometheus Java Client return the extra exemplars if OpenMetrics 1.1 is requested and do not return them if OpenMetrics 1.0 is asked for.

@xak2000
Copy link
Contributor Author

xak2000 commented Feb 28, 2024

I think you are right and content negotiation is the right way to do it.

Looking back, I think that when the ability to parse _count metrics with exemplars was implemented in Prometheus 2.43, a new content type version should have been created, so if a scrapper from Prometheus 2.41 uses an older version in the Accept header, the output should just not include exemplars in _count. This way such change would not be a breaking change.

But such solution will probably require many ifs in the implementation of metrics data serializer..

Or, if older Prometheus would just ignore unknown parts of line (e.g. anything after # that could not be parsed is considered as a comment), then adding exemplars to _count would also not be a breaking change. But I don't know the format well, so it's a speculation from my side. Maybe there are good reasons why Prometheus doesn't do this.

@mhalbritter
Copy link
Contributor

There's nothing Spring Boot can here, right? I'm closing this issue.

@mhalbritter mhalbritter closed this as not planned Won't fix, can't repro, duplicate, stale Feb 28, 2024
@mhalbritter mhalbritter added status: invalid An issue that we don't feel is valid for: external-project For an external project and not something we can fix and removed status: waiting-for-triage An issue we've not yet triaged status: feedback-provided Feedback has been provided labels Feb 28, 2024
@philwebb philwebb changed the title Introduce a config property to disable exemplars for Metrics of counter type Introduce a config property to disable exemplars for Metrics of counter type Feb 28, 2024
@jonatan-ivanov
Copy link
Member

@xak2000

I think that when the ability to parse _count metrics with exemplars was implemented in Prometheus 2.43, a new content type version should have been created [...]

There is no such content type, Open Metrics 1.1 does not exist (this should change in the future though).

if older Prometheus would just ignore unknown parts [...]

Yeah, I don't know why Prometheus drops the whole scrape response instead of ignoring what is not supported.

@mhalbritter I think in this particular case nothing Spring Boot can do here since @xak2000 needs existing exemplars and only wants to ignore the new ones (this also needs a change in Micrometer). Though there might be users in the future who will ask an enable flag for the whole exemplars functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for: external-project For an external project and not something we can fix status: invalid An issue that we don't feel is valid
Projects
None yet
Development

No branches or pull requests

4 participants