
[FR] [STREAMS] Control Stream Size via Config or Command #10270

Open
bubbajoe opened this issue Feb 9, 2022 · 19 comments

Comments

@bubbajoe commented Feb 9, 2022

The problem/use-case that the feature addresses

We use Redis for streaming and caching, but once you have millions of events going through Redis and the memory limit is reached, Redis starts deleting keys from memory. This behavior is unwanted; what if we could control the size (the sum, in bytes) of all streams?

Description of the feature

How about adding this to the config?

stream-maxmemory 1gb - the combined size of all streams is capped at 1gb
stream-maxmemory -1 - no limit (the current behavior)

stream-maxmemory-policy delete-keys - delete whole keys when the memory limit is hit (the current behavior)
stream-maxmemory-policy delete-messages - delete messages from the tail of streams when the memory limit is hit
stream-maxmemory-policy restrict - XADD is disabled when the memory limit is hit

Alternatives you've considered

This could also be a command applied to specific streams, for example:

XSIZE MAXSIZE 500mb POLICY (restrict|messages|keys) STREAMS stream1 stream2

Also, we have spun up a second Redis instance to avoid this issue, but I think this approach is better. Potentially it could replace XTRIM for most use cases.

@bubbajoe bubbajoe changed the title [FR] [STREAMS] Control Stream Size [FR] [STREAMS] Control Stream Size via Config or Command Feb 9, 2022
@oranagra (Member) commented Feb 9, 2022

A few random notes:

  1. I think we want to avoid per-key configs (i.e. XSIZE), which is why XADD has the trim feature.
  2. We do have some type-specific configs (like listpack encoding thresholds), but it doesn't make sense to add a global config that trims data or limits records in arbitrary keys.
  3. Is the complaint that XADD trimming is based on record count rather than bytes? Would adding a trimming policy based on bytes to XADD and XTRIM help?
  4. Or do you want it to kick in only when the server is over its memory threshold? In that case it would be complicated to decide which stream key to trim.
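
For reference, the existing trimming options mentioned in point 1 look like this today (stream and field names are just examples; MINID requires Redis 6.2 or later):

XADD mystream MAXLEN ~ 100000 * field value - trim to roughly 100K entries while adding
XTRIM mystream MAXLEN 100000 - trim explicitly by entry count
XTRIM mystream MINID 1657000000000-0 - trim entries with IDs below a threshold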

@bubbajoe (Author) commented Feb 9, 2022

@oranagra Thanks for the reply.

  1. I see, this makes sense.

  2. Concerning global configs that trim the data, I think this makes sense: as Redis Streams integrations become more popular across different services, more of them will push data to Redis via streams. That means having to call XTRIM manually on every stream, which may not be optimal. The idea is: if you control the Redis cluster, you should control the size/length of all streams, so a global config for stream sizes could help.

Actually, this is exactly my use case: we are using Debezium Server to push data into Redis, with thousands of events of all different sizes coming in every hour. So having this global config would be really nice (even if it sacrifices some performance, that's better than going OOM).

  3. The size of a message matters more than the entry count, IMO. But when using XTRIM you have to keep calling it, right? The idea is to keep one or more streams at a certain size (or length) without having to keep calling XTRIM.
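
(For reference, length-based capping doesn't require repeated XTRIM calls, since XADD can trim inline on every insert; the ~ modifier makes the trim approximate but cheap:)

XADD mystream MAXLEN ~ 10000 * field value - keeps the stream near 10K entries on every add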

@zalmane commented Feb 9, 2022

Are you using the Redis Debezium sink to push the events into Redis? We just pushed a PR for the Debezium Redis sink that pauses writing to Redis if memory is full and resumes once memory is available; see debezium/debezium#3185. Would that address your specific use case?

@bubbajoe (Author) commented Feb 10, 2022

@zalmane No. Basically, I think there should be a feature that automatically controls the size/length of streams. For example, you set something like maxstreamsize=100mb (each stream can only be 100mb) and maxglobalstreamsize=1gb (all streams combined must stay under this); then, if either threshold is hit, Redis starts removing items from the back.

So it's more about avoiding the memory limit in the first place, through configuration, rather than deciding what to do once the limit is hit.

@bubbajoe (Author)

To be clear, I think this should be added:

XADD mystream MAXSIZE ~ 10mb ...
XTRIM mystream MAXSIZE ~ 10mb

In addition, I also think these should be added:

CONFIG SET maxstreamsize 100mb
CONFIG SET maxglobalstreamsize 1gb

Or something similar. @oranagra

Please let me know if there should be a different feature request for these.

@oranagra (Member)

A MAXSIZE feature to XADD and XTRIM sounds reasonable to me (it's basically an additional trimming threshold).

A global config that controls the maximum size of each stream is just the same as the above, i.e. XADD can implicitly use that value when its MAXSIZE argument is missing (assuming no one expects us to scan all streams if/when the config changes retroactively).
But anyway, considering such a feature is limited (you have one setting that affects all keys, which is not very flexible), I think the one discussed above is better.
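
(A rough sketch of that implicit default, with a hypothetical config name:)

CONFIG SET stream-default-maxsize 100mb - hypothetical global default
XADD mystream * field value - no explicit trim argument, so the default MAXSIZE ~ 100mb would apply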

Regarding a global setting that affects the total size of all streams, that's far more problematic, specifically if we also expect "eviction" to be able to find some stream (based on some criteria) and trim it.

@itamarhaber (Member) commented Feb 10, 2022

I fully agree that a top-level config isn't aligned with the current design.
I don't have a strong objection to MAXSIZE trimming, but perhaps a couple of weak ones :)

Assuming that the stream's entries are (more or less) equal in size, this is just syntactic sugar for MAXLEN with the right factor.

If the entries' size variance is significant, a MAXSIZE could have negative impacts.
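
(For example, with entries averaging ~2 KB, a hypothetical MAXSIZE of 100mb would correspond roughly to MAXLEN 51200, since 100 MB / 2 KB = 51,200 entries:)

XADD mystream MAXSIZE ~ 100mb * field value - hypothetical
XADD mystream MAXLEN ~ 51200 * field value - roughly equivalent today, assuming ~2 KB entries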

/cc @guybe7

@bubbajoe (Author) commented Feb 11, 2022

Most use cases of trimming are about limiting the amount of memory used (I assume), so wouldn't using memory as the factor make more sense than trying to guess/calculate the appropriate length?

Can you elaborate on the negative effects? I don't quite understand.

@itamarhaber (Member)

Given a stream capped at 100 memory units, where entries' sizes vary between 1 and 100 units, a new entry will require evicting an unknown number of entries.
This is likely to affect performance, resulting in "jagged" latency graphs. More importantly, it introduces a big unknown regarding the actual number of entries in the stream. For example, in pathological cases, the stream will always have only one entry in it:

  1. Add 1u entry => xlen = 1
  2. Add 100u entry -> evict => xlen = 1
  3. Add 1u entry -> evict => xlen = 1
  4. Goto 2

@manhdaovan commented Feb 18, 2022

@bubbajoe

I don't think limiting stream size can avoid hitting OOM or evicting keys on the Redis side.

  • Let's say we can add a limit for each stream (via CONFIG SET maxstreamsize 100mb, as you mentioned above); what if we then add a new stream or a lot of keys? Do we need to re-configure the size for each stream?
  • Or, let's say we can add a limit for all streams (via CONFIG SET maxglobalstreamsize 1gb, as you mentioned above); what if all current streams reach the limit (1gb) and a new stream is added? It could not be added, if I understand correctly.

So, IMHO, limiting a stream by size does not have a significant benefit compared to limiting it by the number of entries (and of course, limiting by entry count cannot avoid OOM either).

And all OOM-related issues should be handled at the infrastructure layer, not in Redis itself (for example, with early warnings about memory usage).

@bubbajoe (Author) commented Feb 18, 2022

@manhdaovan

Let's say we can add a limit for each stream (via CONFIG SET maxstreamsize 100mb as you mentioned above), so what if we add a new stream? Do we need to re-config size for each stream?

Why would we need to re-configure the size when we add a new stream? This would be a global setting, so it's not stream-specific.

let's say we can add a limit for all streams (via CONFIG SET maxglobalstreamsize 1gb as you mentioned above), so what if all current streams reach the limit (1gb) and a new stream would be added? It could not be added if I understand correctly.

This should never happen. Whenever you add new messages to a stream, I'd expect some (async or sync) job to check the total stream size and evict from the back of the largest stream. It depends on how it's implemented, but that situation should never arise.

@itamarhaber @yossigo @oranagra

My primary complaint is that Redis Streams should have a better way to control the size of streams internally. Redis stores all data in memory; if you have millions of events per hour flowing through Redis Streams (as in most enterprise use cases), there should be a way to keep events flowing by deleting old events based on the size of the stream, not its length. Currently I use a script that reads memory usage and calls XTRIM based on it (I've also heard RedisGears can do this), but this definitely shouldn't be done at the application level: the application doesn't know (and shouldn't know) how much memory Redis has. And suppose I want to increase/decrease Redis memory and/or stream memory; that change would have to be made at the application layer, which is not good.
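
(For reference, a minimal sketch of that kind of workaround using only existing commands; the stream name and the 50000 target are arbitrary examples:)

MEMORY USAGE mystream SAMPLES 0 - estimate the stream's memory footprint in bytes
XTRIM mystream MAXLEN ~ 50000 - trim once the byte estimate exceeds the desired budget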

@bubbajoe (Author) commented Mar 4, 2022

@itamarhaber @yossigo @oranagra

Any updates on this? I want to know whether you understand my point about why I think Redis Streams is missing a really useful feature.

@yossigo (Member) commented Mar 6, 2022

@bubbajoe
I think it's not a good idea to use the same Redis instance both as a cache and as a stream if you reasonably expect the memory limit to be reached, because Redis only provides limited control over how keys are evicted.

If we assume the stream gets its own instance, then a possible solution could be an XADD trim option that enables in-stream eviction of old entries when OOM is reached. It would still suffer from the MAXSIZE shortcomings, but it does satisfy the need to separate this logic from the application layer.
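
(A hypothetical shape for such an option, purely illustrative:)

XADD mystream TRIMONOOM * field value - hypothetical flag: evict the stream's oldest entries only when Redis is over maxmemory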

I'm not sure how common this use case is, though.

@madolson (Contributor) commented Mar 8, 2022

We might want to consider supporting more advanced types of "eviction" policies in future versions. I've actually heard a number of customers from AWS grumble that a global LRU/LFU is limiting. I've heard examples like they want to evict all items starting with X first, and once that is done consider the remaining items in an LRU fashion. This will require more thought for how to make it efficient.

I'm not sure I like the idea of an XADD variant that isn't marked with use-memory but will trim the stream to make memory available; it also doesn't seem to match the outlined use case very well. I actually would prefer we figure out MAXSIZE on XADD. Reminds me of #10152.

@bubbajoe (Author) commented Mar 9, 2022

@bubbajoe

I think it's not a good idea to use the same Redis instance both as a cache and as a stream if you reasonably expect the memory limit to be reached, because Redis only provides limited control over how keys are evicted.

If we assume the stream gets its own instance, then a possible solution could be an XADD trim option that enables in-stream eviction of old entries when OOM is reached. It would still suffer from the MAXSIZE shortcomings, but it does satisfy the need to separate this logic from the application layer.

I'm not sure how common this use case is, though.

@yossigo

This is exactly my point: once Redis hits its maximum memory (whether you're using it for caching, streaming, or both, this will happen at some point, whether you have 4 GB or 32 GB), how can you free up that memory? By XTRIM or XADD with trimming, right? There should be another way to do this that doesn't require clients to implement this confusing logic in their producers.

In a perfect world, XADD with trimming would be enough. But take Kafka, for example: there are many integrations written by many people, and you can't always expect them to expose XADD trim options. Kafka offers many configuration options, including retention policies, which is especially important for Redis, considering all the data needs to fit in memory.

As I mentioned before, having a client do this is not good, especially when horizontally scaling Redis across internal services. Someone also mentioned that this is possible with RedisGears, but I believe this functionality is a missing feature in Redis Streams.

@bubbajoe (Author) commented Mar 9, 2022

We might want to consider supporting more advanced types of "eviction" policies in future versions. I've actually heard a number of customers from AWS grumble that a global LRU/LFU is limiting. I've heard examples like they want to evict all items starting with X first, and once that is done consider the remaining items in an LRU fashion. This will require more thought for how to make it efficient.

I'm not sure I like the idea of an XADD variant that isn't marked with use-memory but will trim the stream to make memory available; it also doesn't seem to match the outlined use case very well. I actually would prefer we figure out MAXSIZE on XADD. Reminds me of #10152.

I don't know what the best implementation would be; I just think there should be a feature that automatically removes messages from a stream to free up memory. And maybe separate max-memory settings for each stream and for all streams?

@oranagra oranagra added this to the Next minor backlog milestone Mar 9, 2022
@oranagra oranagra added this to backlog in 7.2 <obsolete> via automation Mar 9, 2022
@oranagra oranagra removed this from backlog in 7.2 <obsolete> Jul 6, 2022
@waveozhangli

I think I have the same requirement: when maxmemory is reached, instead of removing a whole stream (eviction removes whole keys), just remove the earliest messages from streams.

The use case is:

  1. For example, every stream we use has a maxlen of 10K (we pass MAXLEN 10000 when calling XADD).
  2. The number of streams changes; it might increase or decrease. So even though we set a maxlen, it's still possible for the total size to exceed maxmemory, causing OOM problems.
  3. We cannot reduce the MAXLEN for each stream because messages come in batches; if we set it too small, messages will be flushed away before they are read.
  4. If we could remove the earliest messages across streams, we could cap memory usage dynamically: when messages flow into stream B, remove the earliest messages from stream A; later, when messages flow into stream A, remove the earliest messages from stream B (see the sketch after this list).
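
(A sketch of that desired behavior, with a hypothetical config name:)

CONFIG SET stream-total-maxmemory 1gb - hypothetical: a shared byte budget for all streams
XADD streamB * field value - if the budget is exceeded, the earliest entries of some other stream (e.g. streamA) would be trimmed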

@bubbajoe (Author)

I ended up writing a script for this, but I really wish the Redis project would recognize that stream configuration like trimming shouldn't be handled by a consumer (or even a producer, IMHO). It's just weird.

A consumer/producer shouldn't need to care about stuff like that; it should be able to blindly push/pull data, with the stream's configuration set elsewhere.

It's worth looking at how Kafka does this, while also keeping in mind the limitations of Redis for streaming.

@oranagra (Member)

We understand that, for a database mainly composed of (large) streams, it makes sense to ask to limit memory by trimming the streams rather than evicting a complete key.
However, this doesn't fit the design of eviction in Redis, so it's not that easy to handle.
We'll try to think of a way to achieve that in 8.0, but meanwhile I think you have to stick to the existing stream trimming options.
