Prometheus index out of range during delete_series #5037

Closed
MaxDiOrio opened this issue Dec 24, 2018 · 4 comments

MaxDiOrio commented Dec 24, 2018

Bug Report

What did you do?

For some reason, I am having an issue with one series.

12/24/2018 11:22:54 AM level=error ts=2018-12-24T16:22:54.269059847Z caller=compact.go:403 component=tsdb msg="removed tmp folder after failed compaction" err="remove /data/01CZGGDC0C36S569AC669SR7A0.tmp/chunks: directory not empty"
12/24/2018 11:22:54 AM level=error ts=2018-12-24T16:22:54.26911238Z caller=db.go:281 component=tsdb msg="compaction failed" err="persist head block: write compaction: add series: out-of-order series added with label set "{__name__=\"container_cpu_load_average_10s\",beta_kubernetes_io_arch=\"amd64\",beta_kubernetes_io_os=\"linux\",id=\"/docker/7be99dc8541e3425db0204e7c2013c5ec6877ca80113b141051ebf13bb4d6114\",image=\"rancher/rke-tools:v0.1.13\",instance=\"la-1pk8s-w2\",job=\"kubernetes-nodes-cadvisor\",kubernetes_io_hostname=\"la-1pk8s-w2\",name=\"service-sidekick\",node_role_kubernetes_io_worker=\"true\"}""
12/24/2018 11:23:26 AM level=info ts=2018-12-24T16:23:26.313990352Z caller=compact.go:393 component=tsdb msg="compact blocks" count=1 mint=1542902400000 maxt=1542909600000
12/24/2018 11:23:29 AM level=error ts=2018-12-24T16:23:29.188778779Z caller=compact.go:403 component=tsdb msg="removed tmp folder after failed compaction" err="remove /data/01CZGGEEB94FFX5YN2217FY4WG.tmp/chunks: directory not empty"
12/24/2018 11:23:29 AM level=error ts=2018-12-24T16:23:29.188866667Z caller=db.go:281 component=tsdb msg="compaction failed" err="persist head block: write compaction: add series: out-of-order series added with label set "{__name__=\"container_cpu_load_average_10s\",beta_kubernetes_io_arch=\"amd64\",beta_kubernetes_io_os=\"linux\",id=\"/docker/7be99dc8541e3425db0204e7c2013c5ec6877ca80113b141051ebf13bb4d6114\",image=\"rancher/rke-tools:v0.1.13\",instance=\"la-1pk8s-w2\",job=\"kubernetes-nodes-cadvisor\",kubernetes_io_hostname=\"la-1pk8s-w2\",name=\"service-sidekick\",node_role_kubernetes_io_worker=\"true\"}""

So I figured I'd delete the series and let it be re-created:

Calling

curl -XPOST -d@prom_delete.json -H"Content-Type: application/json" http://prometheus.prod.cluster/api/v2/admin/tsdb/delete_series

with

{
"matchers": [{
"type": "EQ",
"name": "job",
"value": "kubernetes-nodes-cadvisor"
}]
}

Yields:

12/24/2018 11:23:33 AM panic: runtime error: index out of range
12/24/2018 11:23:33 AM
12/24/2018 11:23:33 AM goroutine 3354 [running]:
12/24/2018 11:23:33 AM github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*memSeries).minTime(...)
12/24/2018 11:23:33 AM /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:1109
12/24/2018 11:23:33 AM github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).Delete(0xc420283540, 0x80003883122cdf10, 0x7fffc77cedd324d7, 0xc60b769dd0, 0x1, 0x1, 0xc4abc52788, 0xc4abc52790)
12/24/2018 11:23:33 AM /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:591 +0x50d
12/24/2018 11:23:33 AM github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*DB).Delete.func2(0xc400000008, 0x1b6e848)
12/24/2018 11:23:33 AM /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/db.go:726 +0x5e
12/24/2018 11:23:33 AM github.com/prometheus/prometheus/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1(0xc5ffa6ff00, 0xc5fcb182c0)
12/24/2018 11:23:33 AM /go/src/github.com/prometheus/prometheus/vendor/golang.org/x/sync/errgroup/errgroup.go:58 +0x57
12/24/2018 11:23:33 AM created by github.com/prometheus/prometheus/vendor/golang.org/x/sync/errgroup.(*Group).Go
12/24/2018 11:23:33 AM /go/src/github.com/prometheus/prometheus/vendor/golang.org/x/sync/errgroup/errgroup.go:55 +0x66

Environment

Running in K8S

Prometheus version=2.2.1, branch=HEAD, revision=bc6058c81272a8d938c05e75607371284236aadc

juliusv (Member) commented Dec 24, 2018

Thanks! Any chance you could try this with 2.6.0? 2.2.1 is quite an old Prometheus version and there have been a myriad of fixes to the TSDB code since then.

In particular, in the current master of Prometheus I see an array length check that prevents this specific index out of range access: https://github.com/prometheus/prometheus/blob/v2.6.0/vendor/github.com/prometheus/tsdb/head.go#L1397

MaxDiOrio (Author) commented Dec 25, 2018

I'll give it a try. I'm using the Rancher Helm chart, which seems to be severely outdated.

MaxDiOrio (Author) commented Dec 26, 2018

Thank you! This seems to have resolved the issue. Prometheus also starts up significantly faster than it did before.

Now all I have to figure out is why my InfluxDB queue is filling up, causing Prometheus to dump metrics.

simonpasquier (Member) commented Jan 2, 2019

Thanks for the heads-up @MaxDiOrio! I'm closing the issue for now.
