[receiver/prometheus] Receiver incorrectly sets StartTimestamp #22810
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
The issue may be with the second scrape. Here, the
Maybe this is correct according to spec (??). But the interaction of this with
Can you provide more detail regarding why this is not an option? This is the mechanism that the prometheus receiver exposes to do what you are trying to do. Does the receiver creator need to provide more flexibility?
We are using:

```yaml
receivers:
  receiver_creator/prometheus:
    watch_observers:
      - k8s_observer
    receivers:
      prometheus:
        rule: type == "pod" && annotations["prometheus.io/scrape"] == "true"
        config:
          config:
            scrape_configs:
              - job_name: 'prometheus:`name`'
                scrape_interval: 60s
                metrics_path: '`"prometheus.io/path" in annotations ? annotations["prometheus.io/path"] : "/metrics"`'
                static_configs:
                  - targets: [ '`endpoint`:`"prometheus.io/port" in annotations ? annotations["prometheus.io/port"] : 9090`' ]
```

I suppose we could devise some custom Kubernetes annotation so that apps can configure their start_time regex, and then propagate that down to the config. However, that would be a non-standard annotation, and it is also a bit impractical to expect all the apps we support to modify their k8s manifests to include it. It's a worthy goal, but I'd also expect reasonable fallback behavior out of the box.
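For what it's worth, a minimal sketch of how such an annotation could be propagated down to the receiver's `start_time_metric_regex` option, assuming a hypothetical (non-standard) `prometheus.io/start-time-regex` annotation and a made-up fallback regex:

```yaml
receivers:
  receiver_creator/prometheus:
    watch_observers:
      - k8s_observer
    receivers:
      prometheus:
        rule: type == "pod" && annotations["prometheus.io/scrape"] == "true"
        config:
          # Hypothetical annotation; the fallback regex is only an example that
          # also matches prefixed variants of process_start_time_seconds.
          start_time_metric_regex: '`"prometheus.io/start-time-regex" in annotations ? annotations["prometheus.io/start-time-regex"] : "^(.+_)?process_start_time_seconds$"`'
          config:
            scrape_configs:
              # scrape_configs as in the config above
              - job_name: 'prometheus:`name`'
                scrape_interval: 60s
                static_configs:
                  - targets: [ '`endpoint`:9090' ]
```

This would keep the regex per application without hard-coding it in the collector config, but it still requires every app's manifest to carry the annotation, which is the impracticality noted above.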
I'm actually not able to reproduce this locally.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'istio'
          scrape_interval: 30s
          static_configs:
            - targets: ['0.0.0.0:15020']

processors:
  cumulativetodelta:
  filter:
    metrics:
      metric:
        - 'name != "istio_requests"'
      datapoint:
        - 'attributes["response_code"] != "200"'

exporters:
  logging:
    verbosity: detailed

connectors:
  forward:

service:
  telemetry:
    logs:
      level: DEBUG
      development: true
      encoding: console
  pipelines:
    metrics/1:
      receivers:
        - prometheus
      processors:
        - filter
      exporters:
        - forward
        - logging
    metrics/2:
      receivers:
        - forward
      processors:
        - cumulativetodelta
      exporters:
        - logging
```

I see the following datapoints output, which looks correct.

But we are definitely seeing this in our production environment after upgrading to
Helpful gist from @seankhliao
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping code owners.

Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
I think this is expected behavior based on the specification. See https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/blob/main/exporter/collector/internal/normalization/standard_normalizer.go#L264 for what my (GCP) exporter does to handle these points. We drop points where the start time isn't after the end time, and use that to "normalize" subsequent points by subtracting the initial value from them.

I'm personally not a fan of the behavior, as the point implies that a non-zero number of events occurred in zero time. I would expect most backends to fail to handle these points if they aren't filtered out. A nil start time seems like a better representation of what we actually know, and I would rather have a processor that can deal with the lack of start time than have the points. We might need to change the spec first, though.
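For illustration, here is a minimal Go sketch of that normalization idea. It uses simplified stand-in types (`DataPoint`, `Normalizer`, and the `series` key are invented for this example); the real logic lives in the linked standard_normalizer.go and operates on pdata types:

```go
package normalization

import "time"

// DataPoint is a simplified stand-in for a cumulative number data point.
type DataPoint struct {
	Start time.Time
	End   time.Time
	Value float64
}

// Normalizer remembers the first "reset" point seen for each series and uses
// it as a baseline for later points.
type Normalizer struct {
	baseline map[string]DataPoint // keyed by some series identity
}

func NewNormalizer() *Normalizer {
	return &Normalizer{baseline: make(map[string]DataPoint)}
}

// Normalize returns the adjusted point and true, or false if the point should
// be dropped.
func (n *Normalizer) Normalize(series string, dp DataPoint) (DataPoint, bool) {
	// A point whose start is not strictly before its end (e.g. start == end,
	// as in this issue) carries no usable interval: remember it as the
	// baseline for the series and drop it.
	if !dp.Start.Before(dp.End) {
		n.baseline[series] = dp
		return DataPoint{}, false
	}
	// Subsequent points are normalized against the baseline: subtract its
	// value and anchor the start time at the baseline's timestamp.
	if base, ok := n.baseline[series]; ok {
		dp.Value -= base.Value
		dp.Start = base.End
	}
	return dp, true
}
```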
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping code owners.

Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been closed as inactive because it has been stale for 120 days with no activity.
Component(s)
receiver/prometheus
Description
The `prometheus` receiver is setting `StartTimestamp` on cumulative metrics to the observed `Timestamp` instead of `nil` (or a "zero value") when `process_start_time_seconds` is not matched.

This is undesired behavior in general, but causes severe issues when used in combination with `cumulativetodelta` configured with `initial_value: auto` (the default).

A similar bug report was made in #17190, though that seems to point to the `cumulativetodelta` processor as the culprit. I believe this bug is in `prometheusreceiver`.

Steps to Reproduce

Start an OTel Collector with a `prometheus` receiver.

Expected Result

`StartTimestamp` on cumulative metrics should either:

- match the `process_start_time` metric, or
- be `null` or a "zero value"

Actual Result

`StartTimestamp` on cumulative metrics is equal to the observed `Timestamp` on that metric.

Collector version

0.78.0

Environment information

No response

OpenTelemetry Collector configuration

Log output

Additional context

The scraped endpoint does return a `process_start_time_seconds` metric, but it has a prefix.

I'm aware of the `start_time_metric_regex` configuration option in the receiver, but that's not usable for us since we are using `receivercreator` here to do autodiscovery and the regex is application-specific.
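For reference, this is roughly what that option looks like on a plain `prometheus` receiver; the job name, target, and regex below are made up to illustrate matching a prefixed start-time metric, and this per-application regex is exactly what receivercreator-based autodiscovery makes impractical:

```yaml
receivers:
  prometheus:
    # Example only: also match vendor-prefixed variants such as
    # "myapp_process_start_time_seconds".
    start_time_metric_regex: '^(.+_)?process_start_time_seconds$'
    config:
      scrape_configs:
        - job_name: 'example-app'          # hypothetical job
          scrape_interval: 60s
          static_configs:
            - targets: ['localhost:9090']  # hypothetical target
```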