
Prune index in Cassandra #1069

Closed
replay opened this issue Sep 26, 2018 · 20 comments · Fixed by #1231

Comments

@replay
Contributor

replay commented Sep 26, 2018

We currently keep adding entries to the index in Cassandra and never prune them. At startup MT needs to load all of that data and filter it by the LastUpdated property to ignore the entries that have not been updated for a certain amount of time, but this makes the startup slower and slower because there is ever more data to filter.
We should delete index entries from Cassandra once they have reached a certain age. That pruning age should probably be higher than the age at which we prune entries from the memory index, because we want to keep the ability to simply adjust the memory pruning settings and restart MT to restore index entries that have already been pruned from memory.
If a user decides to send a metric again, and hence "activates" it again in the cassandra/memory indices, the historic data will still be available just like it is now.

The simplest solution would probably be a goroutine that occasionally loads all the data from the cassandra index and deletes the entries that haven't been updated for a certain time.
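Roughly something like this (untested sketch using gocql; the metric_idx columns/primary key are assumed and paging, batching and retries are glossed over):

```go
package idxprune

import (
	"log"
	"time"

	"github.com/gocql/gocql"
)

// pruneCassandraIdx is a rough sketch of the proposed prune goroutine: it
// periodically scans metric_idx and deletes rows whose lastupdate is older
// than maxStale. Assumes a table keyed by (partition, id) with a lastupdate
// column; batching, paging and retry handling are left out.
func pruneCassandraIdx(session *gocql.Session, maxStale, checkInterval time.Duration) {
	ticker := time.NewTicker(checkInterval)
	for range ticker.C {
		cutoff := time.Now().Add(-maxStale).Unix()

		iter := session.Query(`SELECT id, partition, lastupdate FROM metric_idx`).Iter()
		var id string
		var partition int
		var lastUpdate int64
		for iter.Scan(&id, &partition, &lastUpdate) {
			if lastUpdate >= cutoff {
				continue // still fresh, keep it
			}
			err := session.Query(
				`DELETE FROM metric_idx WHERE partition=? AND id=?`,
				partition, id,
			).Exec()
			if err != nil {
				log.Printf("cassandra-idx prune: failed to delete %s: %s", id, err)
			}
		}
		if err := iter.Close(); err != nil {
			log.Printf("cassandra-idx prune: failed to scan metric_idx: %s", err)
		}
	}
}
```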

@deniszh

deniszh commented Oct 1, 2018

I'm very 👍 for this ticket. Old metrics bloat memory usage on cluster restarts and are potentially hazardous to overall cluster stability (e.g. after a long enough period the cluster will go OOM during a restart).

@replay
Contributor Author

replay commented Oct 1, 2018

@deniszh have you observed a case where MT OOMed before it became ready, because it was loading so much index data? AFAIK that's a scenario that we have not seen ourselves (yet). I think it might be relatively easy to solve this specific problem by simply filtering the data from the index while it is being loaded, instead of first loading and then filtering it.

@shanson7
Collaborator

shanson7 commented Oct 1, 2018

@replay - As another datapoint, we see a decent amount of memory used just loading in the cassandra data (even before indexing), but much less than when we actually start consuming the backlog.

I think it might be relatively easy to solve this specific problem by simply filtering the data from the index while it is being loaded, instead of first loading and then filtering it.

This might not be as easy as you think, since the reason that the data is all loaded in first is to join series that may differ only by an interval change.

@deniszh

deniszh commented Oct 1, 2018

@replay

have you observed a case where MT OOMed before it became ready, because it was loading so much index data?

Not yet, but I definitely notice increased memory consumption right after a cluster restart, and, if metrics are never cleaned up, what will stop the index from growing indefinitely and consuming more and more memory? After 1 month we already have 500K metrics in tank and 2M in the index (per instance, those are real numbers), so in 2 months it will be 4M, in 3 months 6M, etc.

I think it might be relatively easy to solve this specific problem by simply filtering the data from the index while it is being loaded, instead of first loading and then filtering it.

Yes, I was also thinking of a similar solution. Not sure what @shanson7 means by "series that may differ only by an interval change", though...

@replay
Contributor Author

replay commented Oct 1, 2018

I think what he means is that if a metric's interval changes while its name remains the same, the updated metric definition gets added to the index as a new, separate metric. The old one (with the old interval) then no longer receives updates and all new data points go to the new metric. In such a case we don't want the old metric to get pruned, because otherwise the data from before the interval change would look like it just disappeared.
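To illustrate the mechanics (toy code only; the exact id derivation below is an assumption for illustration, not metrictank's real schema code): the metric id is derived from the definition's fields, including the interval, so an interval change produces a new id and therefore a separate index entry.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// toyMetricID mimics the idea of folding the interval into the metric id.
// The exact fields and format here are made up for this example.
func toyMetricID(orgID int, name, unit, mtype string, interval int) string {
	sum := md5.Sum([]byte(fmt.Sprintf("%s:%s:%s:%d", name, unit, mtype, interval)))
	return fmt.Sprintf("%d.%x", orgID, sum)
}

func main() {
	// Same name, different interval -> two distinct ids and index entries.
	// Pruning the old entry would make the pre-change history appear gone.
	fmt.Println(toyMetricID(1, "some.metric", "unknown", "gauge", 10))
	fmt.Println(toyMetricID(1, "some.metric", "unknown", "gauge", 60))
}
```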

@shanson7
Collaborator

shanson7 commented Oct 1, 2018

@replay - exactly what I meant.

@deniszh

what will stop the index from growing indefinitely and consuming more and more memory?

While all of the metric definitions for a given set of partitions are pulled in on startup, they will not all be indexed. That is controlled by the MT_CASSANDRA_IDX_MAX_STALE setting (ours is set to 720h). That means it really is just a startup cost. Most of our instances are at around 4M entries in the index, and use about a 10GB heap to load in all of those metric definitions.

One thing we did do, however, is set a TTL on our metric_idx table. I think this is the cheapest solution to the "forever growing" problem.
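For reference, that is a one-off schema change, something along these lines (the keyspace name and the 90-day value are placeholders; pick a TTL at least as long as your data retention). Assuming MT keeps re-writing index rows for active series, as I understand it does, only rows of stale series eventually expire.

```go
package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "metrictank" // adjust to your index keyspace
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Rows written after this point expire 90 days (7776000s) after their
	// last (re)write; the same statement can be run directly in cqlsh.
	err = session.Query(`ALTER TABLE metric_idx WITH default_time_to_live = 7776000`).Exec()
	if err != nil {
		log.Fatal(err)
	}
}
```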

@replay
Contributor Author

replay commented Oct 1, 2018

@shanson7 if you use a TTL to expire entries from the metric_idx table, wouldn't that also lead to old entries expiring when a metric's interval has changed? Or do you just set the TTL so high that it doesn't matter?

@shanson7
Collaborator

shanson7 commented Oct 1, 2018

I have it set to the lifetime of the time-series data, but I expect I could be more lenient.

@deniszh

deniszh commented Oct 1, 2018

Still not getting how MT_CASSANDRA_IDX_MAX_STALE will help. I also have it set to 30d, but if I'm reading https://github.com/grafana/metrictank/blob/master/idx/cassandra/cassandra.go right, it only prunes old metrics from memory (https://github.com/grafana/metrictank/blob/master/idx/cassandra/cassandra.go#L541) and records will stay in the metric_idx table forever, right? As far as I understand, this issue is exactly about that.

So, during a restart, all index entries (for the specific partitions) will be loaded from Cassandra and will stay in memory until the next prune period happens (3 hours). If I have 100M stale metrics in the index, all 100M will be loaded and will probably kill MT in the process.

@shanson7
Collaborator

shanson7 commented Oct 1, 2018

So, during a restart, all index entries (for the specific partitions) will be loaded from Cassandra and will stay in memory until the next prune period happens (3 hours)

No, it is also used when loading the initial index.

It will read in all of the old data, but will not index anything older than maxStale (except in the interval change scenario). It is still quite expensive on startup, of course.
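Roughly, the load-time filtering works like this (simplified sketch with a made-up Def type, not the actual metrictank load code):

```go
package idxload

import "time"

// Def is a stand-in for a metric definition as loaded from metric_idx.
type Def struct {
	ID         string
	Name       string
	Interval   int
	LastUpdate int64
}

// filterForIndex keeps definitions updated within maxStale, plus stale ones
// whose name also has a fresh definition (the interval-change case), and
// drops the rest before they are handed to the memory index.
func filterForIndex(defs []Def, maxStale time.Duration) []Def {
	cutoff := time.Now().Add(-maxStale).Unix()

	// names that have at least one fresh definition
	fresh := make(map[string]bool)
	for _, d := range defs {
		if d.LastUpdate >= cutoff {
			fresh[d.Name] = true
		}
	}

	var keep []Def
	for _, d := range defs {
		if d.LastUpdate >= cutoff || fresh[d.Name] {
			keep = append(keep, d)
		}
	}
	return keep
}
```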

@deniszh

deniszh commented Oct 1, 2018

Ah, I totally misread the whole issue then, sorry. 🤦‍♂️ Indeed, it filters while loading the index, so it's just a matter of startup time.
Well, a separate "disk prune" parameter for a separate prune routine would be enough IMO.

@deniszh

deniszh commented Oct 1, 2018

I'm wondering now whether I have a working prune setting at all... idx.cassandra.prune.values.count32 is still 0 in my case...

@shanson7
Collaborator

shanson7 commented Oct 1, 2018

You might be hitting the same issue I did. 30d didn't work for us; we needed 720h.

See #944

@deniszh

deniszh commented Oct 1, 2018

Wow, that's interesting! Thanks, @shanson7, will check.

@replay
Contributor Author

replay commented Oct 1, 2018

Unfortunately that variable appears to rely on ParseDuration, and d is not a valid unit: https://golang.org/pkg/time/#ParseDuration
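Quick demonstration:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// time.ParseDuration only knows ns, us/µs, ms, s, m and h,
	// so "30d" is rejected while "720h" parses fine.
	if _, err := time.ParseDuration("30d"); err != nil {
		fmt.Println(err) // unknown unit "d"
	}
	d, _ := time.ParseDuration("720h")
	fmt.Println(d) // 720h0m0s
}
```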

@deniszh

deniszh commented Oct 1, 2018

Wow, that's very sneaky indeed! Thanks for the help!

@deniszh

deniszh commented Oct 1, 2018

As I'm not the only one who was hit by that pesky duration format, it's probably better to at least mention it in the example config, or even log a warning or something. Will create a separate issue for that.

@deniszh

deniszh commented Oct 1, 2018

Ah, missed #944

@shanson7
Collaborator

shanson7 commented Oct 4, 2018

This issue inspired me to do a little research into our cluster. It takes our instances 15-20 minutes (!!) to load in all the metadata from cassandra. I wrote a tool that mimics the cassandra idx and applies the same logic of keeping series that only differ from a live series by an interval change.

The result is that 60% of our data is staler than 90d, and we only use 30d for the memory idx stale time. So pruning Cassandra could drastically speed up our startup (not to mention reduce the memory cost).

@Dieterbe
Contributor

Dieterbe commented Jan 14, 2019

It takes our instances 15-20 minutes (!!)

I'm seeing the same with our largest cloud instances.
Obviously this is something we will address, but it will take at least another few months, I think. Bigger fish to fry...
