Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dictScan(): It can't scan all buckets when dict is shrinking. #4907

Merged
merged 1 commit into from
Jun 1, 2018
Merged

Fix dictScan(): It can't scan all buckets when dict is shrinking. #4907

merged 1 commit into from
Jun 1, 2018

Conversation

youjiali1995
Copy link
Contributor

@youjiali1995 youjiali1995 commented May 8, 2018

The only difference is when iterating over larger table, cursor is increased in reverse:

        /* Iterate over indices in larger table that are the expansion
         * of the index pointed to by the cursor in the smaller table */
        do {
            /* Emit entries at cursor */
            if (bucketfn) bucketfn(privdata, &t1->table[v & m1]);
            de = t1->table[v & m1];
            while (de) {
                next = de->next;
                fn(privdata, de);
                de = next;
            }

            /* Increment the reverse cursor not covered by the smaller mask.*/
            v |= ~m1;
            v = rev(v);
            v++;
            v = rev(v);

            /* Continue while bits covered by mask difference is non-zero */
        } while (v & (m0 ^ m1));

When jumping out of the loop, the highest bit of (v & m0) is increased, so move following code to if block:

        /* Set unmasked bits so incrementing the reversed cursor
         * operates on the masked bits */
        v |= ~m0;

        /* Increment the reverse cursor */
        v = rev(v);
        v++;
        v = rev(v);

@antirez
Copy link
Contributor

antirez commented May 15, 2018

Thanks for this submission. Before looking at this patch I invoke the spirit of @pietern that wrote the original code, because I will be more comfortable changing this code (that so much helped Redis) after his nod :-D

@antirez
Copy link
Contributor

antirez commented Jun 1, 2018

Thanks @youjiali1995, I've checked the problem, verified that your solution is sounding, and created a simulation that allows us to study SCAN better in case of new problems (https://github.com/antirez/dict-scan-fuzz-tester). I'm merging your Pull Request in all the branches.

Please note that, fortunately, only SCAN is affected. HSCAN, ZSCAN and SSCAN should be immune to the problem because their dictionaries always expand from a smaller to a bigger one. The issue you found only affects dictScan() when going to smaller hash tables. Thank you for your contribution!

@antirez antirez merged commit 86de089 into redis:unstable Jun 1, 2018
@antirez
Copy link
Contributor

antirez commented Jun 1, 2018

Just one final thing @youjiali1995, I would love to know how you have found this bug.

@youjiali1995
Copy link
Contributor Author

youjiali1995 commented Jun 1, 2018

Because we use scan to clear data. But sometimes there are some keys left, and we can reproduce this problem. So I read the code and find the bug. @antirez

@antirez
Copy link
Contributor

antirez commented Jun 1, 2018

Thanks, awesome process to find the bug. I appreciate that instead of just posting an issue you went the extra mile and read the code to understand what was happening.

@youjiali1995 youjiali1995 deleted the fix-dictScan branch June 1, 2018 15:45
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Dec 17, 2018
Fix dictScan(): It can't scan all buckets when dict is shrinking.
@beijingzhangwei
Copy link

@youjiali1995 咨询你个问题:字典扫描都是从小表开始,游标从零开始,缩容的时候,参考你们博客https://tech.meituan.com/2018/07/27/redis-rehash-practice-optimization.html 游标返回的范围不应该是小表的范围吗? 看博客是示例应该0-7,为什么会出现返回20为游标的情况呢? (我知道如果人工指定游标参数的话会出现漏key情况)

@sundb
Copy link
Collaborator

sundb commented Nov 17, 2021

@beijingzhangwei You should preferably use English to describe the problem.
Can you provide more information? How much does tablesize shrink from?
Are you sure it was still rehashing when you call scan?

@beijingzhangwei
Copy link

@youjiali1995 Refer to your company blog https://tech.meituan.com/2018/07/27/redis-rehash-practice-optimization.html
Shouldn't the range returned by the cursor be the range of the small table? Looking at the blog, the example should be 0-7. Why does it return 20 as the cursor? (I know that if the cursor parameters are manually specified, there will be key leakage)

@beijingzhangwei
Copy link

@beijingzhangwei You should preferably use English to describe the problem. Can you provide more information? How much does tablesize shrink from? Are you sure it was still rehashing when you call scan?

ok,i use english. My question come from @youjiali1995 company blog, which describe the bug.

@sundb
Copy link
Collaborator

sundb commented Nov 17, 2021

@beijingzhangwei Not sure I fully understand what you meant.
Cursor 20 was fetched before rehashing, so returning 20 is possible.
After that, the cursor will be 0-7.

@beijingzhangwei
Copy link

beijingzhangwei commented Nov 17, 2021 via email

pulllock pushed a commit to pulllock/redis that referenced this pull request Jun 28, 2023
Fix dictScan(): It can't scan all buckets when dict is shrinking.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants