Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: use LFU to find hotkeys #4392

Closed
wants to merge 5 commits into from

Conversation

soloestoy
Copy link
Collaborator

  1. We have added a new LFU mode for maxmemory-policy, but forgot config get and config rewrite.

    And I think int is enough for the LFU factors.

  2. Another commit about LFU: do some changes about LFU to find hotkeys

    Firstly, use access time to replace the decreas time of LFU.
    For function LFUDecrAndReturn,
    it should only try to get decremented counter,
    not update the LFU fields, we will update it in an explicit way.
    And we will times halve the counter according to the times of
    elapsed time than server.lfu_decay_time.
    Everytime a key is accessed, we should update the LFU
    including update access time, and increment the counter after
    call function LFUDecrAndReturn.
    If a key is overwritten, the LFU should be also updated.
    Then we can use OBJECT freq command to get a key's frequence,
    and LFUDecrAndReturn should be called in OBJECT freq command
    in case of the key has not been accessed for a long time,
    because we update the access time only when the key is read or
    overwritten.

Firstly, use access time to replace the decreas time of LFU.
For function LFUDecrAndReturn,
it should only try to get decremented counter,
not update LFU fields, we will update it in an explicit way.
And we will times halve the counter according to the times of
elapsed time than server.lfu_decay_time.
Everytime a key is accessed, we should update the LFU
including update access time, and increment the counter after
call function LFUDecrAndReturn.
If a key is overwritten, the LFU should be also updated.
Then we can use `OBJECT freq` command to get a key's frequence,
and LFUDecrAndReturn should be called in `OBJECT freq` command
in case of the key has not been accessed for a long time,
because we update the access time only when the key is read or
overwritten.
@antirez
Copy link
Contributor

antirez commented Nov 27, 2017

I'm reviewing this one @soloestoy, news soon.

@antirez
Copy link
Contributor

antirez commented Nov 27, 2017

There is something I do not understand about the following function changes:

unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    long halve_times = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
    if (halve_times > 0 && counter) {
        if (halve_times == 1) {
            if (counter > LFU_INIT_VAL*2) {
                counter /= 2;
                if (counter < LFU_INIT_VAL*2) counter = LFU_INIT_VAL*2;
            } else {
                counter--;
            }
        } else {
            counter = counter >> halve_times;
        }
    }
    return counter;
}

Why if half_times is greater than 1 we no longer care to decrement based or split in half based on the current value of the counter? It looks like the function should be modified into:

unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    long halve_times = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
    while (halve_times > 0 && counter) {
        if (counter > LFU_INIT_VAL*2) {
            counter /= 2;
            if (counter < LFU_INIT_VAL*2) counter = LFU_INIT_VAL*2;
        } else {
            counter--;
        }
        halve_times--;
    }
    return counter;
}

Makes sense? Thanks.

@antirez
Copy link
Contributor

antirez commented Nov 27, 2017

There is another potential problem:

  1. After your change, LFUDecrAndReturn() no longer udpates the time at which it needs to decrement the counter. So calling LFUDecrAndReturn() multiple times against the same object, may result in the counter to be decremented more and more.
  2. The function gets called by evictionPoolPopulate(). The keys we select there, are random, thus what I described in the previous point can actually happen. If there are few keys, and we are out of memory, the frequency counter of the few objects that we have will easily drop to zero.

Please could you state exactly the problems that your PR tried to address, that is, the bugs that we had in the original implementation? We already discussed this in Hangzhou face2face but very briefly. If you could re-state what you were tying to fix, maybe we can find an alternative solution without the above problem. Thanks!

@soloestoy
Copy link
Collaborator Author

Hi, @antirez

LFUDecrAndReturn() no longer udpates the time at which it needs to decrement the counter. So calling LFUDecrAndReturn() multiple times against the same object, may result in the counter to be decremented more and more.

I also noticed that you mentioned, but it won't happen in this PR, because I didn't update time nor counter, so the counter won't be decremented more and more.

The access time and counter will be updated in a explicit way, when the key is really read or overwritten.

/* Update LFU when an object is accessed.
 * Firstly, decrement the counter if the decrement time is reached.
 * Then logarithmically increment the counter, and update the access time. */
void updateLFU(robj *val) {
    unsigned long counter = LFUDecrAndReturn(val);
    counter = LFULogIncr(counter);
    val->lru = (LFUGetTimeInMinutes()<<8) | counter;
}

P.S. Maybe we can rename the function LFUDecrAndReturn as LFUGetRealCounter, that can make it more clear.

I have an issue #4473 about the problem I try to address, it is that:

But there is a problem, if a key old has been touched frequently long ago, and then never be touched, its counter never be decremented until it is selected into eviction pool. So, if a key new is touched frequently now but the counter is smaller than old, the key new will be deleted, because the first 16 bits of LFU is Last decr time not Last access time. But I think delete old is better, because key new has more chance to be touched.

@soloestoy
Copy link
Collaborator Author

Why if half_times is greater than 1 we no longer care to decrement based or split in half based on the current value of the counter?

About this question, because the access time's precision is in minute, so when the halve_times is 1, we cannot ensure if the key really has not been accessed out of 1 minute or just the time change from x:00 to x:01.

So, when the halve_times is 1, we can keep the previous code. If the halve_times is greater than 1, just times halve the counter even halve to 0, ignore the minimal value LFU_INIT_VAL*2. And I think the >> operator is faster than a while loop.

@antirez
Copy link
Contributor

antirez commented Nov 28, 2017

Oh ok! So the function changed semantically and no longer changes the object. It's only used to retrieve the "logical" counter at this point. Got it. I'll review again based on this observation. Yep btw a name change would be a good idea, but I can do it myself after merging your PR. I think I'll have more questions today, thanks for your replies and help!

@antirez
Copy link
Contributor

antirez commented Nov 28, 2017

P.S. I just noticed that the top comment now made it clear that the function no longer updates anything.

@antirez
Copy link
Contributor

antirez commented Nov 28, 2017

@soloestoy Ok I re-read the patch based on the new things, I'll surely rename a few things and update a few comments, but now it makes sense in the fundamental mechanism. However there is something that I believe is broken, not in your PR specifically, but in the way it was originally coded by me: I think we should always just decrement the value of counter, and never split it in half, because the counter is logarithmic, so actually decrementing one in a logarithmic counter, is equivalent to splitting the popularity by half. So we could use instead:

unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
    if (num_periods) {
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    }
    return counter;
}

@antirez
Copy link
Contributor

antirez commented Nov 28, 2017

And... there is another interesting thing we could do after updating the above code, that is, OBJECT FREQ could be modified in order to return a linear frequency in terms of accesses per unit of time. I can work at the math myself, no need to update this PR from this point of view, just to talk about other improvements...

@soloestoy
Copy link
Collaborator Author

I think we should always just decrement the value of counter, and never split it in half, because the counter is logarithmic, so actually decrementing one in a logarithmic counter, is equivalent to splitting the popularity by half.

Agree!

@antirez
Copy link
Contributor

antirez commented Nov 28, 2017

I verified the patch experimentally (after applying the fix of just decrementing), and the hit/miss ratio is a bit better compared to the vanilla unstable branch. So everything looks good... Probably we are losing less information thanks to the simple decrement compared to splitting in half.

@antirez
Copy link
Contributor

antirez commented Nov 29, 2017

I merged everything here into unstable. Thanks. Not clear if I'll merge those into 4.0.3 or 4.0.4 to back port the things, I'll check better tomorrow. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants