This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Chunkspan wrapping seconds should be shifted across metrics #450

Closed
replay opened this issue Jan 6, 2017 · 5 comments

replay (Contributor) commented Jan 6, 2017

With the merge of the chunk cache we can hopefully remove a lot of requests from Cassandra by serving them out of memory.
Currently, the chunkspans of all metrics are the same, which means that all metrics persist their current chunk and allocate a new one at the same second. Assuming a query pattern where most queries can be served out of the chunk cache, this means that once per chunkspan there is a second at which all queries hit Cassandra to request the newly persisted chunks and add them to the cache. This will likely lead to a spike in the query rate on Cassandra and, in the worst case, even to timeouts.
I don't think it would be a good idea to push chunks into the cache at the time they get persisted to Cassandra, because this would likely add pressure to evict hot data from the cache in exchange for something that might never get queried.
If we could shift the second at which chunks get wrapped and distribute that wrapping second of the different metrics across time, then we could get rid of the load spike on Cassandra. For example, we could hash the metric name into a number and use hash % chunkspan as a per-metric offset of the wrapping second.
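
For illustration, here is a minimal Go sketch of that shifting idea, using hypothetical names rather than Metrictank's actual chunk code: the metric key is hashed into a stable offset within the chunkspan, so different metrics wrap their chunks at different seconds.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// wrapOffset derives a deterministic offset in [0, chunkspan) from the metric key,
// so the set of metrics spreads its chunk boundaries across the whole chunkspan.
func wrapOffset(metricKey string, chunkspan uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(metricKey))
	return h.Sum32() % chunkspan
}

// chunkStart returns the start of the chunk that ts falls into, with the wrap
// boundary shifted by the metric's offset instead of aligned to 0.
// Assumes ts >= offset, which holds for any realistic unix timestamp.
func chunkStart(ts, chunkspan, offset uint32) uint32 {
	return (ts-offset)/chunkspan*chunkspan + offset
}

func main() {
	chunkspan := uint32(600) // e.g. 10-minute chunks
	for _, key := range []string{"some.metric.a", "some.metric.b"} {
		off := wrapOffset(key, chunkspan)
		fmt.Printf("%s: offset=%d chunkStart(1000000)=%d\n",
			key, off, chunkStart(1000000, chunkspan, off))
	}
}
```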

woodsaj (Member) commented Jan 6, 2017

When persisting a chunk to Cassandra, can we just check if the last chunk is in the cache, and if so add this chunk as well? Essentially pre-filling the cache with data we expect to be queried.

So if a request for the last 1 hour of data is being executed periodically, we can just continually feed new chunks into the cache.

replay (Contributor, Author) commented Jan 6, 2017

Ok, that's possible as well. Then I'd add a method to the cache interface, AddIfHot(*chunk) or something like that, which takes a chunk and only adds it to the cache if the previous chunk is there; otherwise it drops it.
Then I'll just pass every chunk into it at the time it gets evicted from the ring buffer.
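
For illustration, a rough sketch of what such an AddIfHot could look like, with hypothetical types and method names rather than the actual Metrictank cache interface: a chunk is only admitted if its predecessor for the same metric is already cached, i.e. the series is currently being queried.

```go
package cache

// Chunk is a simplified stand-in for a persisted chunk (hypothetical type).
type Chunk struct {
	Metric string // metric key
	T0     uint32 // chunk start timestamp
	Span   uint32 // chunkspan in seconds
	Data   []byte // encoded chunk data
}

// CachedChunks holds cached chunks per metric, keyed by their start timestamp.
type CachedChunks struct {
	chunks map[string]map[uint32]*Chunk
}

func NewCachedChunks() *CachedChunks {
	return &CachedChunks{chunks: make(map[string]map[uint32]*Chunk)}
}

// Has reports whether the chunk of the given metric starting at t0 is cached.
func (c *CachedChunks) Has(metric string, t0 uint32) bool {
	_, ok := c.chunks[metric][t0]
	return ok
}

// Add unconditionally caches the chunk.
func (c *CachedChunks) Add(ch *Chunk) {
	if c.chunks[ch.Metric] == nil {
		c.chunks[ch.Metric] = make(map[uint32]*Chunk)
	}
	c.chunks[ch.Metric][ch.T0] = ch
}

// AddIfHot caches the chunk only if its predecessor (T0 - Span) is already
// cached, i.e. the metric is "hot"; otherwise the chunk is dropped.
func (c *CachedChunks) AddIfHot(ch *Chunk) {
	if c.Has(ch.Metric, ch.T0-ch.Span) {
		c.Add(ch)
	}
}
```

The persist/eviction path would then call AddIfHot for every chunk it writes out, which gives the pre-filling behaviour described above without pushing cold data into the cache.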

Dieterbe (Contributor) commented Jan 9, 2017

When persisting a chunk to Cassandra, can we just check if the last chunk is in the cache, and if so add this chunk as well? Essentially pre-filling the cache with data we expect to be queried.

+1 I also like this approach; it would solve this problem nicely. Let's first confirm that we actually see the problem manifest in deployments, and then implement cache.AddIfHot().

However, there might be other merits to the OP idea. It would address #357 (comment) (though we're still not sure if this is a problem that needs solving), and #99.
My main concern about the shifting is that it makes the system harder to reason about and understand. Personally I find it hard enough already to reason about what happens to chunks, chunk saves, synchronisation between nodes, etc. That's why I would hold off, but we can revisit this once we iron out some other things, especially things that simplify operation.

replay (Contributor, Author) commented Jan 9, 2017

I'll create a new ticket about adding new chunks to the cache if the metric is hot, and give that higher priority than this one.

stale bot commented Apr 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Apr 4, 2020
stale bot closed this as completed on Apr 11, 2020