[QUESTION] LFU Morris Counter #7943
@minilek thanks for reaching out. Salvatore is no longer involved, so i'm not sure he'll comment, but i'd say for him that he usually had a way of looking around for ideas and then inventing something else that better suits Redis's practical use cases, so it seems he was aware of how the Morris counter works and i'm guessing that for some reason he decided to go a different way. I don't currently have the bandwidth, but i invite you to try to compare the performance of his formula to the other one and post your conclusions. It would also be nice to see an actual benchmark using Redis (hit/miss ratio and throughput of a cache use case); maybe the plots will show one thing but the practical use case will show another.
I see. I'm teaching a course this semester covering sketching methods (including Morris Counters) and there's a research-oriented final project. I also do not have the bandwidth to investigate myself, but I will suggest looking into this further to students as a potential final project.
It appears that something like f(x) = (a/2)*x^2 would be an unbiased estimator. This agrees with the table provided at https://redis.io/topics/lru-cache. Analysis similar to @minilek's analysis of the Morris counter found here shows that this estimate is highly concentrated, with error on the order of eps*n^0.75 (which is empirically supported). This is much better error than the eps*n of the Morris counter. On the flip side, the memory usage is roughly log(f^-1(n)) = log(n)/2 + O(1), which asymptotically saves only a factor of two in the number of bits compared to the deterministic counter. (Morris reduces the number of bits needed to log log n.) So a true Morris counter asymptotically uses far fewer bits, but it's possible this implementation does better when we limit the counter to 8 bits of memory, as Redis does.

I'll investigate this further, and also investigate what effect the type of counter used has on the optimal protocol for decrementing the counters over time.

It also appears that the documentation for how the counters are decremented differs from the actual implementation. It claims here that the counters are either halved or decremented by one every minute, depending on their value. But the function "LFUDecrAndReturn" found here seems only to handle the case of decrementing by one.
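The claim that f(x) = (a/2)*x^2 approximates the true count is easy to sanity-check in simulation. The sketch below is my own toy code, not from Redis (it ignores the LFU_INIT_VAL offset): it measures the average number of increments needed to push a counter with increment probability 1/(1 + a*X) up to a target value x, and compares that against f(x).

```python
import random

def redis_increments_to_reach(x_target, a=10.0):
    """One random trial: count how many increment() calls it takes a
    Redis-style counter (increment probability 1/(1 + a*X)) to reach
    the value x_target."""
    x, n = 0, 0
    while x < x_target:
        n += 1
        if random.random() < 1.0 / (1.0 + a * x):
            x += 1
    return n

random.seed(0)
a, x_target, trials = 10.0, 50, 500
mean_n = sum(redis_increments_to_reach(x_target, a) for _ in range(trials)) / trials
predicted = (a / 2) * x_target ** 2  # f(x) = (a/2) * x^2
print(mean_n, predicted)  # the two agree to within a few percent
```

The exact expectation is x + a*x*(x-1)/2, so f(x) is only the leading-order term, but for x in the dozens the discrepancy is already under 2%.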
To chime in, I created some visualizations contrasting the Redis version of the Morris counter with what is found in the literature. I looked just at insertions and used different values of server.lfu_log_factor, based on lines 1851 to 1865 at commit 8c291b9.
So for 1,000 and 10,000 insertions both counters look nearly identical, with the value of a for the literature Morris counter chosen empirically. At 1,000,000 insertions the Redis counter does seem to do better.
@alexkassil thank you for this detailed analysis!!
I would say switching from the current implementation to a true Morris counter is a good idea. Just to clarify the terminology I use: the probabilistic counter implemented by Redis is not a true Morris counter; I refer to it as the "Redis counter" or "Redis implementation of a probabilistic counter". A probabilistic counter is considered a Morris counter only if the increment probabilities are exponential.

I ran 10,000 simulations of counting to about 4,000,000 for different 8-bit probabilistic counters, comparing the Redis implementation for 9 different values of the (misnamed) lfu_log_factor parameter. The results match the experiments of @alexkassil, but also show two big advantages of Morris counters: they maintain low error for longer, and they also produce low error for counts under 10k or so. The default parameter used in the Redis implementation allows it to count only to 200k or so, and the largest parameter specified here produces huge error for counts under 10k. Each Redis counter is beaten by some Morris counter (in terms of accuracy) for most count values.

Another important consideration is the "decay" schedule, which decrements the counter by some amount every minute. The theoretical analysis here suggests that using the Morris counter in this setting produces desirable behavior: the counts will quickly converge to the rate of requests (see section 5.2).

Ultimately, the best choice of counter depends on the range of values it must store and how much error is tolerable at different values. The Morris counter provides good constant relative error over a huge range of values. It can be improved upon (see Theorem 4 here) if one can tolerate larger error for some ranges of values, but that improvement will still likely differ significantly from the Redis implementation.
Checking in on this @oranagra. I can submit a PR, but need some input on how to handle existing user settings. There's a trade-off between capacity (the max value the counter can reach before "saturating") and error; the user can control which to prioritize via the lfu-log-factor setting. Morris can do better with respect to both objectives, but is there one I should prioritize improving? i.e., if the user has set a particular lfu-log-factor, should the replacement preserve its capacity or its error? Overall the error gains are small, so my own recommendation would be to prioritize capacity; this would look something like setting the Morris parameter to match the capacity of the configured counter.
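To make the capacity side of that trade-off concrete, here is a rough back-of-the-envelope sketch (my own code and parameter choices, not from any PR). Both functions compute the same quantity for their respective counters: the expected number of increments needed to saturate an 8-bit counter at 255.

```python
def redis_capacity(a=10, max_counter=255, init=5):
    """Expected increments to saturate the Redis counter: moving from
    value k to k+1 takes 1 + a*(k - init) increments in expectation
    (increment probability 1/(a*baseval + 1), baseval = k - init)."""
    return sum(1 + a * (k - init) for k in range(init, max_counter))

def morris_capacity(a=0.1, max_counter=255):
    """Expected increments to saturate a Morris counter with base (1+a):
    moving from k to k+1 takes (1+a)^k increments in expectation, and
    the geometric sum telescopes to ((1+a)^255 - 1)/a, which is also
    exactly the value of the unbiased estimator at X = 255."""
    return ((1 + a) ** max_counter - 1) / a

print(redis_capacity())     # ~3.1e5 with the default lfu-log-factor of 10
print(morris_capacity())    # orders of magnitude larger for the same 8 bits
```

The choice a = 0.1 for the Morris counter is illustrative only; a smaller a trades capacity back for lower error, which is exactly the knob the PR would have to map onto the existing lfu-log-factor setting.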
@RDShah i'm sorry, i don't currently have the bandwidth to dive into it. |
The documentation page https://redis.io/topics/lru-cache states that LFU uses the Morris Counter, but it seemed to not be the case when I tried looking at the source code (line 315 of https://github.com/redis/redis/blob/unstable/src/evict.c). I am raising this issue since I'm not sure whether it's intentional or a bug.
Before I continue, I should give some background since not everyone reading this might be familiar. Think of the following data structural problem: you want to maintain a counter N subject to three operations: (1) init() sets N to 0, (2) increment() changes N to N+1, and (3) query() reports the value of N. This is solved easily with a counter, which uses O(log N) bits of memory. Morris figured out that if you only want a randomized algorithm that approximates N with good probability, you can do better: more like O(log log N) bits. The absolute most basic idea is: what if instead of storing N, we store a different counter X. init() sets X to 0. During increment(), we increment X randomly: with 50% probability we increment it, else we do nothing! Then E[X] = N/2, so we can estimate N by 2*X during a query(). The benefit is that since X is typically only half as big as N, we save a bit by storing X instead.
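In code, this most basic version is just a coin flip per increment. The sketch below is a toy illustration of the idea (my own code, unrelated to Redis):

```python
import random

def basic_probabilistic_count(n, trials=2000):
    """Increment X with probability 1/2 and estimate N as 2*X.
    Since E[X] = N/2, the estimator 2*X is unbiased: E[2*X] = N.
    Averaging over many trials shows this empirically."""
    total = 0
    for _ in range(trials):
        x = sum(1 for _ in range(n) if random.random() < 0.5)
        total += 2 * x
    return total / trials

random.seed(1)
est = basic_probabilistic_count(1000)
print(est)  # averages out very close to 1000
```

As the text says, a single run has large variance (X is Binomial(N, 1/2)), and storing X instead of N saves only about one bit.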
OK, the above idea is fine, but it's only saving at most 1 bit, and has large probability of large error... not that great. How did Morris fix it? The first idea is to increment even less often. We increment X with probability 1/2^X. One can then show that the estimator N~ = (2^X) - 1 is an unbiased estimator of N, i.e. E[N~] = N. Now X is typically roughly log N, which means we can store it in O(log log N) bits. The variance isn't great though. This is fixed by making the following observation: if we increment with probability 1.0^X, then the memory sucks (log N bits), but the variance is great (it's zero; this is a deterministic counter). On the other hand, we know that if we increment with probability 0.5^X, then the memory is great (loglog N bits), but the variance sucks. A natural question, which Morris asked and answered, is what happens if we increment with probability 0.99^X? Or more generally, probability 1/(1+a)^X for some small a close to 0? It turns out that the memory is concentrated around O(loglog N + log(1/a)) bits, and the variance goes to 0 as a goes to 0. Specifically, by setting "a" appropriately small it's possible to show that the estimator N^ = (1/a)((1+a)^X - 1) satisfies Pr(|N^ - N| > epsN) < delta, and the space usage is O(loglog N + log(1/eps) + loglog(1/delta)) bits (see https://arxiv.org/abs/2010.02116).
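A runnable sketch of the general Morris counter just described (my own toy code; a = 0.1 is an arbitrary illustrative choice, not a Redis parameter):

```python
import random

def morris_estimate(n, a=0.1):
    """One Morris-counter run over n increments: X is incremented with
    probability 1/(1+a)^X, and n is estimated by the unbiased
    estimator ((1+a)^X - 1) / a."""
    x, p = 0, 1.0  # p = (1+a)^(-x), updated incrementally for speed
    for _ in range(n):
        if random.random() < p:
            x += 1
            p /= (1 + a)
    return ((1 + a) ** x - 1) / a

random.seed(2)
n = 50_000
mean_est = sum(morris_estimate(n) for _ in range(200)) / 200
print(mean_est)  # unbiased: averages out near n
```

Note the economy: for n = 50,000 the counter X ends up around 90, so it fits comfortably in 8 bits, while a deterministic counter would need 16.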
Now back to Redis: Redis neither increments X w.p. 1/2, nor 1/2^X, nor 1/(1+a)^X. Rather, it seems it increments X with probability 1/(1 + a*X) (where 'a' is server.lfu_log_factor, set to 10 by default but changeable). It only does such probabilistic increments once X > LFU_INIT_VAL (which is 5 by default); this is a good decision: even for the Morris counter it can be shown that if you want good estimates for small N, you should use a deterministic counter at first and only switch to probabilistic updates as N grows. What concerns me though is the update probability being ~ 1/X instead of 1/(1+a)^X. There isn't any analysis I'm aware of which shows that this has good behavior, either analytically or empirically. Is there a reason this update rule was chosen? I also spent a few minutes trying to figure out what the estimator should be to get an unbiased estimate of N, but that wasn't clear to me either.
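For reference, here is a Python transcription of that increment rule as it appears in LFULogIncr in src/evict.c (the driver loop and seed below are mine); note that the C code applies the factor to baseval = counter - LFU_INIT_VAL rather than to the raw counter:

```python
import random

LFU_INIT_VAL = 5  # new keys start at this value in Redis

def lfu_log_incr(counter, lfu_log_factor=10):
    """Python transcription of LFULogIncr (src/evict.c): saturate at
    255; otherwise increment with probability
    1 / (baseval * lfu_log_factor + 1), baseval = counter - LFU_INIT_VAL
    clamped at 0 (so the first few increments are deterministic)."""
    if counter == 255:
        return 255
    baseval = max(counter - LFU_INIT_VAL, 0)
    p = 1.0 / (baseval * lfu_log_factor + 1)
    if random.random() < p:
        counter += 1
    return counter

random.seed(3)
c = LFU_INIT_VAL
for _ in range(1_000_000):
    c = lfu_log_incr(c)
print(c)  # 1M hits at the default factor saturates the 8-bit counter at 255
```

The expected number of hits to saturate is about 3e5 at the default factor, which is why a million hits reliably pins the counter at 255, consistent with the table in the Redis LRU/LFU documentation.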
P.S. I'm attaching here a plot of running the actual Morris Counter parametrized to use 8 bits of memory whp. I did the following 5000 times using code Huacheng and I wrote when we were working on our previously linked paper: pick a random number N between 50k and 100k (so about 16-17 bits) then increment() the Morris Counter N times. As you can see, even though we only used 8 bits to count this bigger number, the median error was about 10% and the error was never more than 50%. Note: a point at (x,y) in this plot means x% of the time the relative error was y% or less.
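The shape of this experiment is easy to reproduce. The sketch below is my own re-implementation, not the code from the paper: a = 0.05 is a hypothetical choice made so that the counter value stays under 256 (i.e. 8 bits) for counts up to 100k.

```python
import random

def morris_estimate(n, a=0.05):
    """One Morris-counter run: increment X w.p. 1/(1+a)^X and
    estimate n by ((1+a)^X - 1)/a."""
    x, p = 0, 1.0  # p = (1+a)^(-x)
    for _ in range(n):
        if random.random() < p:
            x += 1
            p /= (1 + a)
    return ((1 + a) ** x - 1) / a

# Pick a random count N in [50k, 100k] and measure the relative error
# of the 8-bit Morris estimate, repeated over many trials.
random.seed(4)
errors = []
for _ in range(200):
    n = random.randint(50_000, 100_000)
    errors.append(abs(morris_estimate(n) - n) / n)
errors.sort()
print(errors[len(errors) // 2])  # median relative error, roughly 10-15%
```

Sorting the errors and plotting rank percentile against error magnitude reproduces the kind of plot described above (a point at (x, y) meaning x% of trials had relative error at most y%).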