
Prune index in Cassandra #1069

Closed
replay opened this issue Sep 26, 2018 · 20 comments · Fixed by #1231

Comments

@replay
Contributor

replay commented Sep 26, 2018

We currently keep adding entries to the index in Cassandra and never prune them. At startup MT needs to load all of that data and filter it by the LastUpdated property to ignore the entries that have not been updated for a certain amount of time, but this makes the startup slower and slower because there is ever more data to filter.
We should delete index entries from Cassandra once they have reached a certain age. That pruning age should probably be higher than the age at which we prune entries from the memory index, because we want to keep the ability to simply adjust the memory pruning settings and restart MT to restore index entries that have already been pruned from memory.
If a user decides to send a metric again, and hence "activates" it again in the cassandra/memory indices, the historic data will still be available just like it is now.

The simplest solution would probably be a goroutine that occasionally loads all the data from the cassandra index and deletes the entries that haven't been updated for a certain time.
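Roughly something like this (untested sketch using gocql; the metric_idx columns/primary key are assumed and paging, batching and retries are glossed over):

```go
package idxprune

import (
	"log"
	"time"

	"github.com/gocql/gocql"
)

// pruneCassandraIdx is a rough sketch of the proposed prune goroutine: it
// periodically scans metric_idx and deletes rows whose lastupdate is older
// than maxStale. Assumes a table keyed by (partition, id) with a lastupdate
// column; batching, paging and retry handling are left out.
func pruneCassandraIdx(session *gocql.Session, maxStale, checkInterval time.Duration) {
	ticker := time.NewTicker(checkInterval)
	for range ticker.C {
		cutoff := time.Now().Add(-maxStale).Unix()

		iter := session.Query(`SELECT id, partition, lastupdate FROM metric_idx`).Iter()
		var id string
		var partition int
		var lastUpdate int64
		for iter.Scan(&id, &partition, &lastUpdate) {
			if lastUpdate >= cutoff {
				continue // still fresh, keep it
			}
			err := session.Query(
				`DELETE FROM metric_idx WHERE partition=? AND id=?`,
				partition, id,
			).Exec()
			if err != nil {
				log.Printf("cassandra-idx prune: failed to delete %s: %s", id, err)
			}
		}
		if err := iter.Close(); err != nil {
			log.Printf("cassandra-idx prune: failed to scan metric_idx: %s", err)
		}
	}
}
```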

@deniszh

deniszh commented Oct 1, 2018

I'm very 👍 for this ticket. Old metrics bloat memory usage on cluster restarts and are potentially hazardous to overall cluster stability (e.g. after a long enough period the cluster will go OOM during a restart).

@replay
Contributor Author

replay commented Oct 1, 2018

@deniszh have you observed a case where MT OOMed before it became ready, because it was loading so much index data? AFAIK that's a scenario that we have not seen ourselves (yet). I think it might be relatively easy to solve this specific problem by simply filtering the data from the index while it is being loaded, instead of first loading and then filtering it.

@shanson7
Collaborator

shanson7 commented Oct 1, 2018

@replay - As another datapoint, we see a decent amount of memory used just loading in the cassandra data (even before indexing), but much less than when we actually start consuming the backlog.

I think it might be relatively easy to solve this specific problem by simply filtering the data from the index while it is being loaded, instead of first loading and then filtering it.

This might not be as easy as you think, since the reason that the data is all loaded in first is to join series that may differ only by an interval change.

@deniszh

deniszh commented Oct 1, 2018

@replay

have you observed a case where MT OOMed before it became ready, because it was loading so much index data?

Not yet, but I definitely notice increased memory consumption right after a cluster restart, and, if metrics are never cleaned up, what will stop the index from growing indefinitely and consuming more and more memory? After 1 month we already have 500K metrics in tank and 2M in the index (per instance, those are real numbers), so in 2 months it will be 4M, in 3 months 6M, etc.

I think it might be relatively easy to solve this specific problem by simply filtering the data from the index while it is being loaded, instead of first loading and then filtering it.

Yes, I was also thinking of a similar solution. Not sure what @shanson7 means by "series that may differ only by an interval change", though...

@replay
Contributor Author

replay commented Oct 1, 2018

I think what he means is that if a metric's interval changes while its name remains the same, the updated metric definition gets added to the index as a new, separate metric. The old one (with the old interval) then no longer receives updates and all new data points go to the new metric. In such a case we don't want the old metric to get pruned, because otherwise the data from before the interval change would look like it just disappeared.
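To illustrate the mechanics (toy code only; the exact id derivation below is an assumption for illustration, not metrictank's real schema code): the metric id is derived from the definition's fields, including the interval, so an interval change produces a new id and therefore a separate index entry.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// toyMetricID mimics the idea of folding the interval into the metric id.
// The exact fields and format here are made up for this example.
func toyMetricID(orgID int, name, unit, mtype string, interval int) string {
	sum := md5.Sum([]byte(fmt.Sprintf("%s:%s:%s:%d", name, unit, mtype, interval)))
	return fmt.Sprintf("%d.%x", orgID, sum)
}

func main() {
	// Same name, different interval -> two distinct ids and index entries.
	// Pruning the old entry would make the pre-change history appear gone.
	fmt.Println(toyMetricID(1, "some.metric", "unknown", "gauge", 10))
	fmt.Println(toyMetricID(1, "some.metric", "unknown", "gauge", 60))
}
```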

@shanson7
Collaborator

shanson7 commented Oct 1, 2018

@replay - exactly what I meant.

@deniszh

what will stop the index from growing indefinitely and consuming more and more memory?

While all of the metric definitions for a given set of partitions are pulled in on startup, they will not all be indexed. That is controlled by the MT_CASSANDRA_IDX_MAX_STALE setting (ours is set to 720h). That means it really is just a startup cost. Most of our instances are at around 4M entries in the index, and use about a 10GB heap to load in all of those metric definitions.

One thing we did do, however, is set a TTL on our metric_idx table. I think this is the cheapest solution to the "forever growing" problem.
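For reference, that is a one-off schema change, something along these lines (the keyspace name and the 90-day value are placeholders; pick a TTL at least as long as your data retention). Assuming MT keeps re-writing index rows for active series, as I understand it does, only rows of stale series eventually expire.

```go
package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "metrictank" // adjust to your index keyspace
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Rows written after this point expire 90 days (7776000s) after their
	// last (re)write; the same statement can be run directly in cqlsh.
	err = session.Query(`ALTER TABLE metric_idx WITH default_time_to_live = 7776000`).Exec()
	if err != nil {
		log.Fatal(err)
	}
}
```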

@replay
Contributor Author

replay commented Oct 1, 2018

@shanson7 if you use a TTL to expire entries from the metric_idx table, wouldn't that also lead to old entries expiring when a metric's interval has changed? Or do you just set the TTL so high that it doesn't matter?

@shanson7
Collaborator

shanson7 commented Oct 1, 2018

I have it set to the lifetime of the time-series data, but I expect I could be more lenient.

@deniszh

deniszh commented Oct 1, 2018

Still not getting how MT_CASSANDRA_IDX_MAX_STALE will help. I also have it set to 30d, but if I'm reading https://github.com/grafana/metrictank/blob/master/idx/cassandra/cassandra.go right, it only prunes old metrics from memory (https://github.com/grafana/metrictank/blob/master/idx/cassandra/cassandra.go#L541) and records will stay in the metric_idx table forever, right? As far as I understand, this issue is exactly about that.

So, during a restart, all index entries (for the specific partitions) will be loaded from Cassandra and will stay in memory until the next prune period happens (3 hours). If I have 100M stale metrics in the index, all 100M will be loaded and will probably kill MT in the process.

@shanson7
Collaborator

shanson7 commented Oct 1, 2018

So, during a restart, all index entries (for the specific partitions) will be loaded from Cassandra and will stay in memory until the next prune period happens (3 hours)

No, it is also used when loading the initial index.

It will read in all of the old data, but will not index anything older than maxStale (except in the interval change scenario). It is still quite expensive on startup, of course.
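Roughly, the load-time filtering works like this (simplified sketch with a made-up Def type, not the actual metrictank load code):

```go
package idxload

import "time"

// Def is a stand-in for a metric definition as loaded from metric_idx.
type Def struct {
	ID         string
	Name       string
	Interval   int
	LastUpdate int64
}

// filterForIndex keeps definitions updated within maxStale, plus stale ones
// whose name also has a fresh definition (the interval-change case), and
// drops the rest before they are handed to the memory index.
func filterForIndex(defs []Def, maxStale time.Duration) []Def {
	cutoff := time.Now().Add(-maxStale).Unix()

	// names that have at least one fresh definition
	fresh := make(map[string]bool)
	for _, d := range defs {
		if d.LastUpdate >= cutoff {
			fresh[d.Name] = true
		}
	}

	var keep []Def
	for _, d := range defs {
		if d.LastUpdate >= cutoff || fresh[d.Name] {
			keep = append(keep, d)
		}
	}
	return keep
}
```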

@deniszh

deniszh commented Oct 1, 2018

Ah, I totally misread the whole issue then, sorry. 🤦‍♂️ Indeed, it filters while loading the index, so it's just a matter of startup time.
Well, a separate "disk prune" parameter for a separate prune routine would be enough IMO.

@deniszh

deniszh commented Oct 1, 2018

I'm wondering now whether I have a working prune setting at all... idx.cassandra.prune.values.count32 is still 0 in my case...

@shanson7
Collaborator

shanson7 commented Oct 1, 2018

You might be hitting the same issue I did. 30d didn't work for us; we needed 720h.

See #944

@deniszh

deniszh commented Oct 1, 2018

Wow, that's interesting! Thanks, @shanson7, will check.

@replay
Contributor Author

replay commented Oct 1, 2018

Unfortunately that variable appears to rely on ParseDuration, and d is not a valid unit: https://golang.org/pkg/time/#ParseDuration
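Quick demonstration:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// time.ParseDuration only knows ns, us/µs, ms, s, m and h,
	// so "30d" is rejected while "720h" parses fine.
	if _, err := time.ParseDuration("30d"); err != nil {
		fmt.Println(err) // unknown unit "d"
	}
	d, _ := time.ParseDuration("720h")
	fmt.Println(d) // 720h0m0s
}
```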

@deniszh

deniszh commented Oct 1, 2018

Wow, that's very sneaky indeed! Thanks for the help!

@deniszh

deniszh commented Oct 1, 2018

As I'm not the only one who was hit by that pesky duration format, it's probably better to at least mention it in the example config, or even log a warning or something. Will create a separate issue for that.

@deniszh

deniszh commented Oct 1, 2018

Ah, missed #944

@shanson7
Collaborator

shanson7 commented Oct 4, 2018

This issue inspired me to do a little research into our cluster. It takes our instances 15-20 minutes (!!) to load in all the metadata from cassandra. I wrote a tool that mimics the cassandra idx and applies the same logic of keeping series that only differ from a live series by an interval change.

The result is that 60% of our data is staler than 90d, and we only use 30d for the memory idx stale time. So pruning Cassandra could drastically speed up our startup (not to mention reduce the memory cost).

@Dieterbe
Contributor

Dieterbe commented Jan 14, 2019

It takes our instances 15-20 minutes (!!)

I'm seeing the same with our largest cloud instances.
Obviously this is something we will address, but it will take at least another few months, I think. Bigger fish to fry...
