[[outputs.http]] prometheus remote write metrics are not receiving on Thanos "unsupported value type" #12830

Closed
Ishmeet opened this issue Mar 10, 2023 · 5 comments
Labels: bug (unexpected problem or unintended behavior), waiting for response (waiting for response from contributor)

Ishmeet commented Mar 10, 2023

Relevant telegraf.conf

[[outputs.http]]
      url = "http://thanos-receive.thanos2:19291/api/v1/receive"
      timeout = "60s"
      method = "POST"
      data_format = "prometheusremotewrite"
      #data_format = "prometheus"
      insecure_skip_verify = true
      use_batch_format = true
      content_encoding = "snappy"
      non_retryable_statuscodes = [409, 413]   # note: this does not have any effect
      [outputs.http.headers]
        cluster_name = "telegraf"
        Content-Type = "application/x-protobuf"
        Content-Encoding = "snappy"
        X-Prometheus-Remote-Write-Version = "0.1.2"

[[inputs.prometheus]]
      urls = [
        "http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics"
      ]
      data_format = "prometheusremotewrite"
      [[inputs.prometheus.tags]]
        cluster = "telegraf"

Logs from Telegraf

[centos@k8node01 ~]$ kubectl logs my-release-telegraf-595c4bc7cf-5rnjm -n default
2023-03-10T03:34:23Z I! Using config file: /etc/telegraf/telegraf.conf
2023-03-10T03:34:23Z I! Starting Telegraf 1.25.3
2023-03-10T03:34:23Z I! Available plugins: 228 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
2023-03-10T03:34:23Z I! Loaded inputs: prometheus
2023-03-10T03:34:23Z I! Loaded aggregators: 
2023-03-10T03:34:23Z I! Loaded processors: enum
2023-03-10T03:34:23Z I! Loaded secretstores: 
2023-03-10T03:34:23Z I! Loaded outputs: http
2023-03-10T03:34:23Z I! Tags enabled: host=telegraf-polling-service
2023-03-10T03:34:23Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"telegraf-polling-service", Flush Interval:10s
2023-03-10T03:34:23Z D! [agent] Initializing plugins
2023-03-10T03:34:23Z D! [agent] Connecting outputs
2023-03-10T03:34:23Z D! [agent] Attempting connection to [outputs.http]
2023-03-10T03:34:23Z D! [agent] Successfully connected to outputs.http
2023-03-10T03:34:23Z D! [agent] Starting service inputs
2023-03-10T03:34:30Z D! [outputs.http] Wrote batch of 1000 metrics in 35.28967ms
2023-03-10T03:34:30Z D! [outputs.http] Buffer fullness: 6719 / 10000 metrics
2023-03-10T03:34:30Z D! [outputs.http] Wrote batch of 1000 metrics in 12.434578ms
2023-03-10T03:34:30Z D! [outputs.http] Buffer fullness: 6188 / 10000 metrics
2023-03-10T03:34:30Z D! [outputs.http] Wrote batch of 1000 metrics in 7.152857ms
2023-03-10T03:34:30Z D! [outputs.http] Buffer fullness: 5188 / 10000 metrics
2023-03-10T03:34:33Z D! [outputs.http] Wrote batch of 1000 metrics in 12.945771ms
2023-03-10T03:34:33Z D! [outputs.http] Wrote batch of 1000 metrics in 7.292723ms
2023-03-10T03:34:33Z D! [outputs.http] Wrote batch of 1000 metrics in 7.379668ms
2023-03-10T03:34:33Z D! [outputs.http] Wrote batch of 1000 metrics in 7.396781ms
2023-03-10T03:34:33Z D! [outputs.http] Wrote batch of 1000 metrics in 7.302257ms
2023-03-10T03:34:33Z D! [outputs.http] Wrote batch of 188 metrics in 2.570767ms
2023-03-10T03:34:33Z D! [outputs.http] Buffer fullness: 0 / 10000 metrics
2023-03-10T03:34:40Z D! [outputs.http] Wrote batch of 1000 metrics in 12.181248ms
2023-03-10T03:34:40Z D! [outputs.http] Buffer fullness: 3408 / 10000 metrics
2023-03-10T03:34:40Z D! [outputs.http] Wrote batch of 1000 metrics in 9.080564ms
2023-03-10T03:34:40Z D! [outputs.http] Buffer fullness: 3476 / 10000 metrics
2023-03-10T03:34:40Z D! [outputs.http] Wrote batch of 1000 metrics in 21.454808ms
2023-03-10T03:34:40Z D! [outputs.http] Buffer fullness: 5189 / 10000 metrics
2023-03-10T03:34:40Z D! [outputs.http] Wrote batch of 1000 metrics in 9.879934ms
2023-03-10T03:34:40Z D! [outputs.http] Buffer fullness: 4189 / 10000 metrics
2023-03-10T03:34:43Z D! [outputs.http] Wrote batch of 1000 metrics in 9.991816ms
2023-03-10T03:34:43Z D! [outputs.http] Wrote batch of 1000 metrics in 9.819117ms
2023-03-10T03:34:43Z D! [outputs.http] Wrote batch of 1000 metrics in 10.283911ms
2023-03-10T03:34:43Z D! [outputs.http] Wrote batch of 1000 metrics in 15.871428ms
2023-03-10T03:34:43Z D! [outputs.http] Wrote batch of 189 metrics in 2.509724ms
2023-03-10T03:34:43Z D! [outputs.http] Buffer fullness: 0 / 10000 metrics

System info

telegraf:1.25-alpine

Docker

No response

Steps to reproduce

  1. Configure inputs.prometheus to read from kube-state-metrics.
  2. Configure outputs.http with data_format = "prometheusremotewrite" and the Thanos receive URL.
  3. Check on the Thanos receiver.
    ...

Expected behavior

Metrics should have been received by Thanos, or Telegraf should have reported an error or warning.

Actual behavior

No errors on Thanos or Telegraf, but the metrics are still not received by Thanos.

Additional info

Error logs on Thanos

level=warn ts=2023-03-10T07:27:14.137749483Z caller=writer.go:131 component=receive component=receive-writer msg="Error on ingesting out-of-order samples" numDropped=206
level=debug ts=2023-03-10T07:33:34.170158185Z caller=writer.go:88 component=receive component=receive-writer msg="Out of order sample" lset="{__name__=\"kube_deployment_status_condition_gauge\", condition=\"Progressing\", deployment=\"my-release-telegraf\", host=\"telegraf-polling-service\", namespace=\"default\", status=\"false\", url=\"http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics\"}" sample="unsupported value type"
Ishmeet added the bug label on Mar 10, 2023
powersj (Contributor) commented Mar 10, 2023

> No errors on Thanos or Telegraf, but the metrics are still not received by Thanos.

Why do you think this is an issue in Telegraf?

Telegraf is clearly getting a valid 2xx return code back from your HTTP endpoint; otherwise it would error out instead of claiming successful writes.

If you can provide additional logs or information that point to an issue in Telegraf, we would be very happy to help resolve any issues. However, without additional information (e.g. debug logs from Thanos showing issues with the request), it is not clear where to take this report.
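
For reference, a minimal sketch of a side output that shows what Telegraf is serializing, using the stock file output plugin (the settings here are illustrative only, not part of the reported setup):

    # Write the same metrics to stdout in plain Prometheus text format so that
    # duplicate series or unexpected timestamps can be inspected directly.
    [[outputs.file]]
      files = ["stdout"]
      data_format = "prometheus"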

powersj added the waiting for response label on Mar 10, 2023
Ishmeet (Author) commented Mar 10, 2023

> No errors on Thanos or Telegraf, but the metrics are still not received by Thanos.
>
> Why do you think this is an issue in Telegraf?
>
> Telegraf is clearly getting a valid 2xx return code back from your HTTP endpoint; otherwise it would error out instead of claiming successful writes.
>
> If you can provide additional logs or information that point to an issue in Telegraf, we would be very happy to help resolve any issues. However, without additional information (e.g. debug logs from Thanos showing issues with the request), it is not clear where to take this report.

level=warn ts=2023-03-10T07:27:14.137749483Z caller=writer.go:131 component=receive component=receive-writer msg="Error on ingesting out-of-order samples" numDropped=206

More logs

level=debug ts=2023-03-10T07:33:34.170158185Z caller=writer.go:88 component=receive component=receive-writer msg="Out of order sample" lset="{__name__=\"kube_deployment_status_condition_gauge\", condition=\"Progressing\", deployment=\"my-release-telegraf\", host=\"telegraf-polling-service\", namespace=\"default\", status=\"false\", url=\"http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics\"}" sample="unsupported value type"

Input plugin:

    [[inputs.prometheus]]
      urls = [
        "http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics"
      ]
      data_format = "prometheusremotewrite"
      [[inputs.prometheus.tags]]
        cluster = "telegraf"

telegraf-tiger bot removed the waiting for response label on Mar 10, 2023
Ishmeet changed the title: [[outputs.http]] prometheus remote write metrics are not receiving on Thanos → [[outputs.http]] prometheus remote write metrics are not receiving on Thanos "unsupported value type" (Mar 10, 2023)
powersj (Contributor) commented Mar 10, 2023

> "Error on ingesting out-of-order samples" numDropped=206

This has come up before with this serializer. In general metrics are not ordered in Telegraf. Let me chat with @srebhan and get back to you.

powersj (Contributor) commented Mar 13, 2023

Hi,

We chatted about this issue a bit more today. While we could possibly order individual batches, we ultimately cannot order all of the metrics you might send. Depending on the situation:

  1. You might have metrics in the buffer that get split up across batches with different times.
  2. You could push data from different inputs with newer timestamps and run into this as well.
  3. Your inputs could simply have incorrect timestamps set; see thanos-io/thanos#4831 for a longer discussion.

Reading through https://thanos.io/tip/operating/troubleshooting.md/#out-of-order-samples-error, the key things seem to be making sure the set of labels used on the metrics is unique for each deployment and avoiding duplicate metrics.
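
As an illustration of the label-uniqueness point, a minimal sketch of a per-deployment tag (the tag name and value are assumptions made for the example, not something required by Telegraf or Thanos):

    # Give each Telegraf deployment its own identifying tag so the series it
    # writes to Thanos receive do not collide with series from another replica.
    [global_tags]
      telegraf_replica = "cluster-a"   # hypothetical value, unique per deployment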

It is not clear from this what Telegraf could further do to aid in resolving this issue, nor does there seem to be a single change or fix for it.

powersj added the waiting for response label on Mar 13, 2023
telegraf-tiger bot commented

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!
