delete_series does not remove from typeahead #3488

Open
zemek opened this Issue Nov 17, 2017 · 13 comments

Contributor

zemek commented Nov 17, 2017

We had a lot of bad metric names that we wanted to delete so they wouldn't show up in the typeahead on /graph.

When deleting a metric using

curl -XPOST -g 'http://localhost/api/v2/admin/tsdb/delete_series' -d '{"matchers": [{"name": "__name__", "value": "metric_name"}]}'

It deletes the timeseries data, but doesn't remove it from the typeahead results.
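
For anyone who needs to issue the same call for many metric names, here is a minimal Python sketch that builds the request from the curl command above. The helper names `build_delete_request` and `build_delete_requests` are hypothetical; the endpoint path and payload shape are copied from the command as posted. The sketch only constructs the requests and does not send them.

```python
import json
import urllib.request


def build_delete_request(base_url, metric_name):
    """Build (but do not send) the delete_series request shown in the curl command."""
    # Payload shape copied from the curl command: one matcher on __name__.
    payload = {"matchers": [{"name": "__name__", "value": metric_name}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v2/admin/tsdb/delete_series",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )


def build_delete_requests(base_url, metric_names):
    """One request per metric name (hypothetical convenience wrapper)."""
    return [build_delete_request(base_url, name) for name in metric_names]
```

Each request can then be sent with `urllib.request.urlopen(req)` against a server that accepts this endpoint.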

Member

gouthamve commented Nov 17, 2017

Deletion works via tombstones. We write a tombstone saying all data for the series has been deleted from t=0 to t=Inf but we don't actually delete the data until the next compaction. This means the index still holds the references to the series and they end up in the typeahead of /graph. The series would be removed after the relevant blocks are compacted away.

This is not actually semantically wrong and I am not sure we need to support this request. Is there any strong issue with the series appearing in the typeahead?
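
The tombstone behavior described above can be illustrated with a toy model. This is only a sketch, not the actual TSDB code: the `Block` class and its methods are hypothetical, and real blocks store an on-disk index and chunks rather than Python dicts. It shows why a deleted series disappears from query results right away but stays in the name listing until the block is rewritten.

```python
import math


class Block:
    """Toy model of a storage block: indexed series plus tombstones."""

    def __init__(self):
        self.samples = {}     # series name -> list of (timestamp, value)
        self.tombstones = {}  # series name -> list of (start, end) deleted intervals

    def append(self, name, t, v):
        self.samples.setdefault(name, []).append((t, v))

    def delete_series(self, name, start=0, end=math.inf):
        # Only record the deleted interval; samples and index entry stay in place.
        if name in self.samples:
            self.tombstones.setdefault(name, []).append((start, end))

    def query(self, name):
        # Reads filter out any sample covered by a tombstone.
        dead = self.tombstones.get(name, [])
        return [
            (t, v)
            for t, v in self.samples.get(name, [])
            if not any(s <= t <= e for s, e in dead)
        ]

    def metric_names(self):
        # What a typeahead would list: every name still present in the index.
        return sorted(self.samples)

    def compact(self):
        # Rewrite the block, dropping tombstoned samples and now-empty series.
        rewritten = {}
        for name in self.samples:
            kept = self.query(name)
            if kept:
                rewritten[name] = kept
        self.samples = rewritten
        self.tombstones = {}
```

In this model, `delete_series` followed by `metric_names` still returns the deleted name; only `compact` removes it.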

Contributor Author

zemek commented Nov 17, 2017

@gouthamve The main reason is that we had consul statsd metrics emitting hostnames inside the metric names, which blew up the number of metrics by tens of thousands. This makes the typeahead really slow so I was trying to delete the bad metrics after we fixed the names.

When you say they'll be removed after the relevant blocks are compacted away, if the blocks are already at the maximum size, does that mean they'll never get removed? (until they go past the retention period?)

Member

gouthamve commented Nov 17, 2017

Yes, if they are already large enough, you need to wait for them to go past the retention period. I might write a tool to process blocks with tombstones soon.

Member

brian-brazil commented Nov 17, 2017

This sounds more like #2119

Member

grobie commented Nov 17, 2017

This is unrelated to rendering query results @brian-brazil. We suggest metric names which have existed at any time in /graph. The label values API doesn't support range parameters at this moment and always queries from absolute minimum to absolute maximum. It'd make sense to only query for the metrics available in the selected time range.
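
A rough sketch of the proposed behavior, assuming the server knew the first and last sample time of each metric: the function name `metric_names_in_range` and the `series_ranges` mapping are hypothetical and only illustrate the overlap check, not how the label values API is implemented.

```python
def metric_names_in_range(series_ranges, start, end):
    """Return metric names with at least one sample inside [start, end].

    series_ranges maps a metric name to (min_time, max_time) of its samples
    (a hypothetical structure used only for this illustration).
    """
    return sorted(
        name
        for name, (lo, hi) in series_ranges.items()
        # Two closed intervals overlap when each starts before the other ends.
        if lo <= end and hi >= start
    )
```

With this check, a metric that was renamed weeks ago would drop out of the suggestions for a 1h window but reappear once the selected range is widened to cover it.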

grobie reopened this Nov 17, 2017

Member

brian-brazil commented Nov 17, 2017

That doesn't work UX wise as the metric name is the first thing entered, and the console view (which is the default) doesn't have a time parameter. The series endpoint does already support a time range.

The user's core complaint appears to be the slowdown caused by this situation, which is what #2119 covers.
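
For reference, a minimal sketch of how a client could use the time range support of the series endpoint to list only the metric names seen in a window. The helper names `series_url` and `metric_names` are hypothetical. The `match[]`, `start` and `end` query parameters and the `data` list of label sets follow the documented `/api/v1/series` HTTP API; the host and port in any call are assumptions. Nothing here is sent over the network.

```python
from urllib.parse import urlencode


def series_url(base_url, selector, start, end):
    """Build a /api/v1/series URL restricted to [start, end]."""
    query = urlencode([("match[]", selector), ("start", start), ("end", end)])
    return f"{base_url.rstrip('/')}/api/v1/series?{query}"


def metric_names(series_response):
    """Extract the distinct metric names from a decoded series response."""
    return sorted(
        {labels["__name__"] for labels in series_response["data"] if "__name__" in labels}
    )
```

Note that this fetches whole label sets just to read `__name__`, so it may be expensive on servers with many series.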

Member

gouthamve commented Nov 17, 2017

Yeah, I'd rather see all the metrics, even not active in the time-range. Sometimes, I would change the range to see where the metric existed.

This issue is about slowness of the dropdown which lists metric names, I believe. Anyways, a tool which can process the blocks would be an acceptable stop-gap, I presume. Removing the series from the index would be an expensive operation where the block needs to be re-written.

Member

grobie commented Nov 17, 2017

> That doesn't work UX wise as the metric name is the first thing entered, and the console view (which is the default) doesn't have a time parameter.

Take the 1h default from the graph view. Showing metrics older than that in the console view, where they won't return any results anyway, is the exact problem here.

On increasing the time range in graph, the list can be reloaded.

> Yeah, I'd rather see all the metrics, even not active in the time-range.

We repeatedly have issues with that behavior during outages. Metrics which existed weeks ago but were renamed make it hard to see the metrics actually available and to investigate the situation.

> Sometimes, I would change the range to see where the metric existed.

If you want to see any metrics which have ever existed in the time range, you'll still be able to do so, as the list will be reloaded after you increase the time range.

Contributor Author

zemek commented Nov 17, 2017

> Metrics which have existed weeks ago but were renamed make it hard to see actual metrics available and investigating the situation.

We have this issue too: since we standardized everything to lowercase, both the old and new metric names show up, and we have to explain to people why the metric names are duplicated and which one to pick.

matthias-kloeckner commented Jan 9, 2018

We have the same problem. Deleting metrics in Prometheus 1.x removed them from the typeahead results almost instantly. With 2.0 they are still listed. Our retention period is set to 365 days and we want to keep it that way.

Some exporters have a weird behaviour and create metrics on the fly - e.g. jenkins exporter is adding 2 new series for every agent. As we are creating new docker agents for every build, we ended up with 10k unneeded metrics before we found this unwanted behaviour. To fix this we added a drop action in the config.
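
To illustrate what such a drop action does, here is a small Python simulation. It is not the Prometheus relabeling code: the function `apply_drop`, the sample layout and the example pattern are hypothetical, and Python's `re` syntax differs in places from the RE2 syntax Prometheus uses. The one behavior it mirrors is that the regex must match the whole metric name, which is why `re.fullmatch` is used.

```python
import re


def apply_drop(samples, pattern):
    """Keep only samples whose metric name does NOT fully match pattern.

    samples is a list of label dicts, each with a "__name__" key
    (a simplified stand-in for scraped samples).
    """
    regex = re.compile(pattern)
    return [s for s in samples if not regex.fullmatch(s["__name__"])]
```

A drop rule like this only affects newly scraped samples; series that were already stored still have to be deleted separately.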

So I think completely removing metrics, including from the typeahead results in the GUI, is a valid and not uncommon use case. And as @gouthamve suggested, a CLI tool would be fine for that.

Contributor Author

zemek commented Jan 18, 2018

@gouthamve Is this resolved in #3523? Does cleaning up the tombstones remove the metrics from the typeahead too?

matthias-kloeckner commented Feb 1, 2018

@zemek Updating to Prometheus 2.1 and cleaning the tombstones via the API worked for us. Thanks!
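
For readers looking for the call itself, a minimal sketch follows. The exact request is not quoted in this thread; the path `/api/v1/admin/tsdb/clean_tombstones` is taken from the Prometheus TSDB admin API documentation, which also notes that the admin API has to be enabled with `--web.enable-admin-api`. The helper name `build_clean_tombstones_request` is hypothetical, and the sketch only builds the request without sending it.

```python
import urllib.request


def build_clean_tombstones_request(base_url):
    """Build (but do not send) a POST to the clean_tombstones admin endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/admin/tsdb/clean_tombstones",
        data=b"",  # the endpoint takes no request body
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` after a delete asks the server to rewrite the affected blocks on disk, which can take a while on large blocks.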

Member

codesome commented Feb 1, 2018

@matthias-kloeckner Yes, it will work if none of the tombstones for that data are in the in-memory block (which was the situation in your case). But if any of them are in the in-memory block, it still won't work.

Currently, cleaning tombstones doesn't clean them from the in-memory block (a similar issue is #3728).

@zemek I am on it, this will be fixed soon.
