
Redis evicts almost all the keys when reaching maxmemory #4496

Open
jianqingdu opened this issue Nov 30, 2017 · 13 comments

@jianqingdu
Contributor

Redis version: 2.8.22
maxmemory-policy: allkeys-lru
Problem Description:
After Redis reached maxmemory at 40G (the server has 128G of memory), it became very slow to respond and evicted almost all keys. There is no info in the log file, and we cannot reproduce the problem in the test environment.
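
For context, a minimal redis.conf sketch of the setup described above (the 40G figure comes from this report; the directive names and syntax are standard):

    maxmemory 40gb                 # hard cap; eviction starts once used memory reaches this
    maxmemory-policy allkeys-lru   # any key may be evicted, approximately least-recently-used first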

@trevor211
Collaborator

I think it's already fixed by #7653. @oranagra

@oranagra
Member

@trevor211 there's not enough info here to really know what happened, but i don't think the incremental eviction has anything to do with it. the old code would have stopped evicting keys as soon as it went below the limit.
i think this is likely to be caused by client output buffers consuming a lot of memory and causing mass eviction. the only way the incremental eviction would assist here is by delaying eviction and possibly increasing the chance that client buffers will be drained (not a high chance IMHO).

i think this will be solved by the "client eviction" mechanism that we need to design (see the other ticket i linked above).

@jianqingdu it's a little bit late, but maybe you can add more info on the traffic / load on the server at that time? any chance there were a lot of clients doing large MGET or GET in pipeline?
or alternatively the client output buffer limit was set too high?
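
(For reference, a quick way to check what an instance is actually configured with, a sketch assuming redis-cli access; the values shown are the long-standing defaults, where the normal client class is effectively unlimited:)

    redis-cli CONFIG GET client-output-buffer-limit
    # typical default reply (sizes in bytes):
    #   normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60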

@ShooterIT
Collaborator

A guess FYI @oranagra
There may be some terrible things happening when Redis starts to evict keys. Before #5126 (incorrect not-counted memory) and #7202 (continuing to add reply bulks to the reply buffer of a close-asap client), there could be a big gap of not-counted memory (got by freeMemoryGetNotCountedMemory) if the master has many slaves, and it may continue to worsen as the master evicts more keys and feeds more replies (flushSlavesOutputBuffers may not write all replies to slaves, especially over a poor network). In our company we also saw a master evict almost all the keys a few times, and I noticed Redis used 5-6x maxmemory and it had 5 slaves at that moment.
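
A rough way to observe this on a live master (a sketch, not part of the original report) is to look at the output-buffer memory of the replica connections; omem is the per-client output buffer size in bytes, and it is roughly the memory that freeMemoryGetNotCountedMemory subtracts before comparing used memory against maxmemory:

    # each slave/replica connection is listed with flags=S; watch the omem=... field grow during eviction
    redis-cli CLIENT LIST | grep flags=S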

Currently, I think some of the fixes above may lessen the risk of evicting almost all keys when a limited output buffer size is configured. But there are still bad cases; your issue #7676 may describe a main one, right?

@oranagra
Member

@ShooterIT do you mean to say that maybe this incident is already resolved by some fix that's already merged?

IIRC, the fix in #5126 was for a problem that was created in redis 4.0, so not applicable for this specific report (v2.8).
And i don't consider the fix in #7202 related: IIRC it would have stopped appending to the output buffer of that client, but instead of releasing that memory immediately it would keep running other commands before releasing it in beforeSleep. So these other commands could indeed cause some eviction, but are unlikely to evict the entire keyspace (unless many clients hit that on the same event loop cycle).

What's more likely and i think happens a lot is that each client eats just a little bit of memory (not reaching the output buffer limit), but together (when there's a spike of traffic) all the clients consume enough output buffers to induce eviction of all keys (not in a single event loop cycle).
that's what #7676 should resolve.

@ShooterIT
Collaborator

Yes. Even though I don't really know what happened here (Redis used 40G, there is a small possibility it was rehashing, and I also notice that maxmemory-policy is allkeys-lru), I truly think #5126 makes great sense and some problems mix with it. An incorrect not-counted memory computation is like a blasting fuse that makes things terrible.

I also want to share my thoughts with you. For #7202, I think it is more likely that many clients hit this in the same event loop cycle and one or two client output buffers reach the configured limit when Redis has heavy write traffic. #7202 doesn't release memory, it only stops using more memory. Maybe we should release async-freed clients' memory before evicting keys?

For most clients, I think they don't use much memory (10k extra clients will use 160M in total if each one uses 16k). Furthermore, we always count all client buffers against maxmemory, even before evicting.

After some fixes, including #7653 (which won't always continue to evict keys in one function) and especially #5126, I think there is little possibility of evicting too many keys. I notice the issues you mentioned in #7676 are old.

together (when there's a spike of traffic) all the clients consume enough output buffers to induce eviction of all keys

I truly agree it is a bad case, but i think it is rare.

@oranagra
Member

@ShooterIT i've seen this many times, including recently.
a burst of MGETs/HGETALLs from multiple clients, populating output buffers faster than clients read it, leading to eviction of all the keys, and no client disconnection.
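
One way to catch this pattern in the act (a sketch, assuming redis-cli and standard shell tools on the server) is to watch the largest per-client output buffers during a traffic spike:

    # print the five largest client output buffers (omem = bytes queued for a client in CLIENT LIST)
    redis-cli CLIENT LIST | grep -o 'omem=[0-9]*' | sort -t= -k2 -rn | head -5

if these values climb across many connections while used_memory is near maxmemory, mass key eviction is the likely outcome even though no single client hits its output buffer limit.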

i don't see a reason to think it is rarer than the case of multiple clients reaching the output buffer limit in the same event loop cycle (i.e. the latter case requires bigger MGETs and for the limit to be reached in the same event loop cycle, whereas the first can happen on its own, is less timing specific and involves smaller values / commands).

as i said, this specific incident can't be due to slave buffers mis-count, since IIRC that problem was introduced in 4.0 (changed the way output buffers are kept).

but anyway, we fixed what we fixed, and what's left is to fix the remaining issue (working on it slowly in the background), which will drop clients when their combined memory usage grows, and do that before evicting keys. this will probably also implicitly drop clients that reached the output buffer limit earlier (before they cause mass key eviction).

meanwhile, if you want to make a PR that will drop these sooner go ahead.
i think it is not the solution for eliminating eviction of the entire keyspace (since it doesn't solve the problem of the combined usage of all clients, each using less than the output buffer limit), but it can prevent eviction of some keys, or maybe full keyspace eviction in some use-cases.

@ShooterIT
Collaborator

i've seen this many times, including recently.

@oranagra Oh, thanks for correcting me.

Maybe the problems we have are different. For the normal class in client-output-buffer-limit, there is no limit by default. That is convenient for users to get everything they want, but we lose control of used memory. Actually, inside our company we may limit users' behavior to keep the service stable, and can even set a small size for the normal client-output-buffer-limit; in our practice, I find only a few requests abuse Redis commands. But for cloud users we can't, since users have the right to do whatever they want.

If we set normal to 0 0 0, #7202 doesn't mitigate key eviction; dropping clients is more effective and reasonable for avoiding eviction.
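
As an illustration of the kind of internal limit described above (the numbers are arbitrary examples, not recommendations):

    # redis.conf: disconnect a normal client whose output buffer exceeds 64mb,
    # or stays above 16mb for 10 seconds (the shipped default for this class is 0 0 0, i.e. unlimited)
    client-output-buffer-limit normal 64mb 16mb 10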

@Hornswoggles

@oranagra, we have several customers (3 in the last 6 months) that have run into this issue and wanted to check in on the status of the remaining work. You mentioned that you were working on it slowly in the background. Is that still the case? Your proposed solution of dropping clients would be acceptable for our case.

Thanks

@oranagra
Member

@Hornswoggles please describe your case, and mention which version you're using.
Note that redis 7.0 has something that can maybe help this problem: #8687
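
(For anyone reading later: the 7.0 mechanism referred to here is configured via maxmemory-clients; a hedged sketch, and note it is off by default as mentioned further down:)

    # redis.conf, Redis 7.0+: cap the total memory used by client connections (output and query buffers)
    # 0 disables the cap (the default); a size or a percentage of maxmemory can be given
    maxmemory-clients 5%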

@Hornswoggles

Redis version: 2.8.23
maxmemory-policy: allkeys-lru

@oranagra The issue is exactly as you've described here

a burst of MGETs/HGETALLs from multiple clients, populating output buffers faster than clients read it, leading to eviction of all the keys, and no client disconnection.

@oranagra
Member

@Hornswoggles why are you using software that's 8 years old?

@Hornswoggles

Hornswoggles commented Jul 26, 2023

@oranagra We run a platform. 2.8 was the initial version provided 10 years ago and not every customer has been in a position to upgrade. Version 6.2.8 is the latest version we make available. Would upgrading to 6.2 resolve this specific issue? If so we can request that our customers upgrade to resolve their issue.

@oranagra
Member

@Hornswoggles i'm not certain it'll solve the problem. maybe it'll improve things. a lot has changed since then and it's hard to keep track of it all.
the key issue, of MGET flooding the output buffers and causing eviction, hasn't been solved.
we did introduce client-eviction in 7.0, but that's not enabled by default and isn't a perfect solution.
i did also start working on a back pressure mechanism, but that didn't mature.

one other thing that comes to mind is that in version 3.2 the reply list of the output buffer was changed from robj (the same object that is used to keep values of keys) to a plain buffer, which means that MGET copies data to the output buffers rather than just incrementing a reference count, hence making your situation worse. 🤷
