Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quicklist (linked list + ziplist) #2143

Merged
merged 29 commits into from Jan 8, 2015
Merged

Quicklist (linked list + ziplist) #2143

merged 29 commits into from Jan 8, 2015

Conversation

@mattsta
Copy link
Contributor

@mattsta mattsta commented Nov 18, 2014

What

My implementation of what Twitter has previously described as ziplists in a linked list for better memory efficiency.

Longer writeup with improvement graphs at https://matt.sh/redis-quicklist

How

Redis currently uses two list encodings. This replace both. The code in this PR/branch removes the two current list encodings of REDIS_ENCODING_ZIPLIST and REDIS_ENCODING_LINKEDLIST and replaces them with REDIS_ENCODING_QUICKLIST.

The only parameter for the new list type is the existing list-max-ziplist-entries field which tells Redis how many elements to allow per ziplist node before creating a new node. The other param of list-max-ziplist-value is now deleted.

The RDB format remains the same. A quicklist is written to the RDB as a linked list, so the RDB is still usable by prior Redis versions. The existing RDB format for saved ziplists and linkedlists both get converted to quicklists when the RDB is loaded. Quicklists are saved as RDB linkedlist types so on reload, Redis will re-create a new compact quicklist with full interior ziplists even if the saved quicklist had not-full interior ziplists.

Tests

All test pass for me. You should try to break it too. There is one spurious timing error in Redis tests (waiting too long for RDB loading), but I don't think it's new. (?)

The quicklist code also includes ~1,000 lines of internal tests.

I added a new compile parameter of "REDIS_TEST" — if you compile with -DREDIS_TEST you can run Redis code tests with redis-server test <module>. So, to test quicklist with code-level tests, compile with that define and run redis-server test quicklist. Also supported: ziplist, util, intset, and more.

Extras

Also includes DEBUG JEMALLOC INFO to get internal jemalloc stats.

Figuring out what any of that means is left as an exercise for the reader.

Future Work

This approach could be applied to some other areas too. Redis currently converts hashes from zipmaps to full hash tables when they get too big, but we could make a hash table of zipmaps to save a lot of space too.

@mattsta mattsta force-pushed the mattsta:quicklist branch 3 times, most recently from 4cda935 to 99bab4a Nov 18, 2014
@badboy
Copy link
Member

@badboy badboy commented Nov 18, 2014

Sir, this is one hell of a Pull Request. I will definitely read the code and test it tomorrow while flying 35000ft above the ground.

respect

D("Requested merge (a,b) (%u, %u) and picked target %u", a->count, b->count,
target->count);

int where;

This comment has been minimized.

@badboy

badboy Nov 20, 2014
Member

The compiler warns that this variable may be used unintialized. Of course it isn't, as target will always be either a or b, but we could surpress this warning by just initalizing where anyway.

This comment has been minimized.

@mattsta

mattsta Nov 20, 2014
Author Contributor

The compiler

Which compiler/version? It didn't complain under my menagerie of compilers yet.

this variable

Maybe I don't know how to use GitHub, but I see a 26 line diff attached to this comment (and all the comments actually), so it's difficult to tell what "this variable" is. :-\ (obviously next I see you mean where, but the github code reply doesn't make it clear. boo github. Though, on second look, it seems the last line in the diff is the comment target? I'll assume that going forward. still, boo github.)

surpress this warning by just initalizing where anyway.

Good point! I'll add it.

This comment has been minimized.

@badboy

badboy Nov 20, 2014
Member

Ah yes. Comments are always made below the line I meant. GitHub could really show this better.

This comment has been minimized.

@badboy

badboy Nov 20, 2014
Member

[~]% gcc --version
gcc (GCC) 4.9.2

This comment has been minimized.

@mattsta

mattsta Nov 20, 2014
Author Contributor

Ah, that's the difference. My Linux testing box was 4.9.0. I've got a 4.9.2 on OS X and I see your extra warnings now. Fixing.

* complex than using the existing ziplist API to read/push as below. */
while (ziplistGet(p, &val, &sz, &longval)) {
if (!val) {
sz = snprintf(lv, sizeof(lv), "%lld", longval);

This comment has been minimized.

@badboy

badboy Nov 20, 2014
Member

I don't know how critical this call is. We have sdsll2str in the code base, which should do effectively the same thing. Would be interesting to see if it has an impact to use it here instead.

This comment has been minimized.

@mattsta

mattsta Nov 20, 2014
Author Contributor

Good catch. That has been on my list of things to replace, but I kept forgetting when sitting in front of a keyboard. I'll get it added.

quicklistNode *n;
unsigned long long accum = 0;
unsigned long long index;
int forward = idx < 0 ? 0 : 1; /* < 0 -> reverse, positive -> forward */

This comment has been minimized.

@badboy

badboy Nov 20, 2014
Member

I think the comment here is not totally correct (the code is though).
Isn't it: negative -> backward, non-negative -> forward?

initEntry(entry);
entry->quicklist = quicklist;

if (!forward) {

This comment has been minimized.

@badboy

badboy Nov 20, 2014
Member

Why negate here? For consistency (as I think it's easier to read and done further down anyway) I would swap the if/else conditions.

if (forward) { 
        index = idx;
        n = quicklist->head;
} else {
        index = (-idx) - 1;
        n = quicklist->tail;
}

This comment has been minimized.

@mattsta

mattsta Nov 20, 2014
Author Contributor

I think it was inspired by https://github.com/antirez/redis/blob/unstable/src/ziplist.c#L680-L681

For readability, showing the first special case of "negate index, subtract one" helps the reader more quickly see something strange is happening since it's the first case.

if ((accum + n->count) > index) {
break;
} else {
D("Skipping over (%p) %u at accum %lld", n, n->count, accum);

This comment has been minimized.

@badboy

badboy Nov 20, 2014
Member

When compiled with D enabled, the compiler warns that n has the wrong type. For cosmetic reasons we could cast it to "void*" here.

This comment has been minimized.

@mattsta

mattsta Nov 20, 2014
Author Contributor

crazy gcc. will fix.

if (!n)
return 0;

D("Found node: %p at accum %llu, idx %llu, sub+ %llu, sub- %llu", n, accum,

This comment has been minimized.

@badboy

badboy Nov 20, 2014
Member

Same as above. (void*)

}

void quicklistInsertAfter(quicklist *quicklist, const size_t fill,
quicklistEntry *entry, void *value, const size_t sz) {

This comment has been minimized.

@badboy

badboy Nov 20, 2014
Member

Another smallish cosmetic thing here: Inserting a newline after void *value will keep the line under 80 chars just as the function above.

This comment has been minimized.

@mattsta

mattsta Nov 20, 2014
Author Contributor

It's actually exactly 80 characters, so it's still okay.

The quicklist.{c,h} was was formatted with clang-format -style="{BasedOnStyle: llvm, IndentWidth: 4}" which gives reasonable output except for a few small cases here and there (small functions end up on a single line, and in-place struct initializers get too many spaces added around them).

This comment has been minimized.

@badboy

badboy Nov 20, 2014
Member

I see. I just rechecked my config and it turns out for some reason it complained on column 79 instead of 80 -_-

@badboy
Copy link
Member

@badboy badboy commented Nov 20, 2014

Impressive work. I read the code all in once and it was quite clear and concise. I saw no immediate bugs. Also great you have extensive test code in there (nearly half the new code is tests).

I'm sure we can get this into unstable soon. ✈️

@mattsta mattsta force-pushed the mattsta:quicklist branch 3 times, most recently from 38f2a2b to c36c9d5 Nov 20, 2014
@mattsta
Copy link
Contributor Author

@mattsta mattsta commented Nov 20, 2014

Thanks for the review! I incorporated your fixes (plus a few more cleanups) into the existing commit. The diff is below since it's difficult to see changes between rebased forced pushes.

I don't see any warnings against clang or gcc-4.9.2 anymore, but feel free to try more compilers and tests. The more problems we find now the fewer cleanup commits we have to apply later. :)

matt@ununoctium:~/repos/redis/src% git diff
diff --git a/src/quicklist.c b/src/quicklist.c
index bb29121..081296d 100644
--- a/src/quicklist.c
+++ b/src/quicklist.c
@@ -28,12 +28,15 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */

-#include <stdlib.h>
-#include <stdio.h>  /* for snprintf */
 #include <string.h> /* for memcpy */
 #include "quicklist.h"
 #include "zmalloc.h"
 #include "ziplist.h"
+#include "util.h" /* for ll2string */
+
+#if defined(REDIS_TEST) || defined(REDIS_TEST_VERBOSE)
+#include <stdio.h> /* for printf (debug printing), snprintf (genstr) */
+#endif

 /* If not verbose testing, remove all debug printing. */
 #ifndef REDIS_TEST_VERBOSE
@@ -323,15 +326,14 @@ static quicklistNode *_quicklistZiplistMerge(quicklist *quicklist,
       target->count);

     int where;
-
     unsigned char *p = NULL;
     if (target == a) {
         /* If target is node a, we append node b to node a, in-order */
         where = ZIPLIST_TAIL;
         p = ziplistIndex(b->zl, 0);
         D("WILL TRAVERSE B WITH LENGTH: %u, %u", b->count, ziplistLen(b->zl));
-    } else if (target == b) {
-        /* If target b, we pre-pend node a to node b, in reverse order of a */
+    } else {
+        /* If target b, we prepend node a to node b, in reverse order of a */
         where = ZIPLIST_HEAD;
         p = ziplistIndex(a->zl, -1);
         D("WILL TRAVERSE A WITH LENGTH: %u, %u", a->count, ziplistLen(a->zl));
@@ -347,7 +349,7 @@ static quicklistNode *_quicklistZiplistMerge(quicklist *quicklist,
      * complex than using the existing ziplist API to read/push as below. */
     while (ziplistGet(p, &val, &sz, &longval)) {
         if (!val) {
-            sz = snprintf(lv, sizeof(lv), "%lld", longval);
+            sz = ll2string(lv, sizeof(lv), longval);
             val = (unsigned char *)lv;
         }
         target->zl = ziplistPush(target->zl, val, sz, where);
@@ -747,7 +749,7 @@ int quicklistNext(quicklistIter *iter, quicklistEntry *entry) {
         return 0;
     }

-    unsigned char *(*nextFn)(unsigned char *, unsigned char*) = NULL;
+    unsigned char *(*nextFn)(unsigned char *, unsigned char *) = NULL;
     int offset_update = 0;

     if (!iter->zi) {
@@ -842,7 +844,7 @@ int quicklistIndex(const quicklist *quicklist, const long long idx,
     quicklistNode *n;
     unsigned long long accum = 0;
     unsigned long long index;
-    int forward = idx < 0 ? 0 : 1; /* < 0 -> reverse, positive -> forward */
+    int forward = idx < 0 ? 0 : 1; /* < 0 -> reverse, 0+ -> forward */

     initEntry(entry);
     entry->quicklist = quicklist;
@@ -862,7 +864,8 @@ int quicklistIndex(const quicklist *quicklist, const long long idx,
         if ((accum + n->count) > index) {
             break;
         } else {
-            D("Skipping over (%p) %u at accum %lld", n, n->count, accum);
+            D("Skipping over (%p) %u at accum %lld", (void *)n, n->count,
+              accum);
             accum += n->count;
             n = forward ? n->next : n->prev;
         }
@@ -871,8 +874,8 @@ int quicklistIndex(const quicklist *quicklist, const long long idx,
     if (!n)
         return 0;

-    D("Found node: %p at accum %llu, idx %llu, sub+ %llu, sub- %llu", n, accum,
-      index, index - accum, (-index) - 1 + accum);
+    D("Found node: %p at accum %llu, idx %llu, sub+ %llu, sub- %llu", (void *)n,
+      accum, index, index - accum, (-index) - 1 + accum);

     entry->node = n;
     if (forward) {
@@ -906,7 +909,7 @@ void quicklistRotate(quicklist *quicklist, const size_t fill) {
     /* If value found is NULL, then ziplistGet populated longval instead */
     if (!value) {
         /* Write the longval as a string so we can re-add it */
-        int wrote = snprintf(longstr, sizeof(longstr), "%lld", longval);
+        int wrote = ll2string(longstr, sizeof(longstr), longval);
         value = (unsigned char *)longval;
         sz = wrote;
     }
@@ -1017,15 +1020,16 @@ void quicklistPush(quicklist *quicklist, const size_t fill, void *value,

 /* The rest of this file is test cases and test helpers. */
 #ifdef REDIS_TEST
-#include <stdio.h>
 #include <stdint.h>

 #define assert(_e)                                                             \
-    ((_e) ? (void)0 : (_assert(#_e, __FILE__, __LINE__), exit(1)))
-static void _assert(char *estr, char *file, int line) {
-    printf("\n\n=== ASSERTION FAILED ===\n");
-    printf("==> %s:%d '%s' is not true\n", file, line, estr);
-}
+    do {                                                                       \
+        if (!(_e)) {                                                           \
+            printf("\n\n=== ASSERTION FAILED ===\n");                          \
+            printf("==> %s:%d '%s' is not true\n", __FILE__, __LINE__, #_e);   \
+            err++;                                                             \
+        }                                                                      \
+    } while (0)

 #define yell(str, ...) printf("ERROR! " str "\n\n", __VA_ARGS__)

@@ -1718,7 +1722,7 @@ int quicklistTest(int argc, char *argv[]) {
         long long nums[5000];
         for (int i = 0; i < 5000; i++) {
             nums[i] = -5157318210846258176 + i;
-            int sz = snprintf(num, sizeof(num), "%lld", nums[i]);
+            int sz = ll2string(num, sizeof(num), nums[i]);
             quicklistPushTail(ql, F, num, sz);
         }
         quicklistPushTail(ql, F, "xxxxxxxxxxxxxxxxxxxx", 20);
@@ -1876,7 +1880,7 @@ int quicklistTest(int argc, char *argv[]) {
             long long nums[5000];
             for (int i = 0; i < 760; i++) {
                 nums[i] = -5157318210846258176 + i;
-                int sz = snprintf(num, sizeof(num), "%lld", nums[i]);
+                int sz = ll2string(num, sizeof(num), nums[i]);
                 quicklistPushTail(ql, f, num, sz);
             }

@@ -1901,7 +1905,7 @@ int quicklistTest(int argc, char *argv[]) {
             long long nums[5000];
             for (int i = 0; i < 32; i++) {
                 nums[i] = -5157318210846258176 + i;
-                int sz = snprintf(num, sizeof(num), "%lld", nums[i]);
+                int sz = ll2string(num, sizeof(num), nums[i]);
                 quicklistPushTail(ql, f, num, sz);
             }
             if (f == 32)
@@ -1929,7 +1933,7 @@ int quicklistTest(int argc, char *argv[]) {
             long long nums[5000];
             for (int i = 0; i < 33; i++) {
                 nums[i] = i;
-                int sz = snprintf(num, sizeof(num), "%lld", nums[i]);
+                int sz = ll2string(num, sizeof(num), nums[i]);
                 quicklistPushTail(ql, f, num, sz);
             }
             if (f == 32)
@@ -1973,7 +1977,7 @@ int quicklistTest(int argc, char *argv[]) {
             long long nums[5000];
             for (int i = 0; i < 33; i++) {
                 nums[i] = -5157318210846258176 + i;
-                int sz = snprintf(num, sizeof(num), "%lld", nums[i]);
+                int sz = ll2string(num, sizeof(num), nums[i]);
                 quicklistPushTail(ql, f, num, sz);
             }
             if (f == 32)
@@ -1999,7 +2003,7 @@ int quicklistTest(int argc, char *argv[]) {
             long long nums[5000];
             for (int i = 0; i < 33; i++) {
                 nums[i] = -5157318210846258176 + i;
-                int sz = snprintf(num, sizeof(num), "%lld", nums[i]);
+                int sz = ll2string(num, sizeof(num), nums[i]);
                 quicklistPushTail(ql, f, num, sz);
             }
             if (f == 32)
@mattsta mattsta force-pushed the mattsta:quicklist branch from c36c9d5 to 5769575 Nov 20, 2014
/* NOTE: We could potentially create a built-in ziplist operation
* allowing direct merging of two ziplists. It would be more memory
* efficient (one big realloc instead of incremental), but it's more
* complex than using the existing ziplist API to read/push as below. */

This comment has been minimized.

@sunheehnus

sunheehnus Nov 21, 2014
Contributor

I tried to create the new operation with __ziplistCascadeUpdate().
It may be useful to rewrite this part. :)
commit

@mattsta mattsta force-pushed the mattsta:quicklist branch from 5769575 to 3bd37fe Nov 22, 2014
}

/* Use result of center merge to try and merge result with next node. */
if (target && target->next &&

This comment has been minimized.

@sunheehnus

sunheehnus Nov 22, 2014
Contributor

If center and center->prev doesn't merge, center and center->next will definitely not merge?

This comment has been minimized.

@mattsta

mattsta Nov 22, 2014
Author Contributor

Let's draw it! The numbers at the top boxes are the count for that node.

  +------------+  +-----------+  +------------+  +-------------+  +------------+
  |     1      |  |     2     |  |      3     |  |      4      |  |     5      |
  |            |  |           |  |            |  |             |  |            |
  | prev->prev |  |  prev     |  |  center    |  | next        |  | next->next |
  |            |  |           |  |            |  |             |  |            |
  |            |  |           |  |            |  |             |  |            |
  +------------+  +-----------+  +------------+  +-------------+  +------------+

        +------------------+                          +--------------------+
        |                  |                          |                    |
        |                  |                          |                    |
        |                  |                          |                    |
        |(prev->prev)+prev |                          | next+(next->next)  |
        |                  |                          |                    |
        |    (1 + 2) = 3   |                          |     (4 + 5) = 9    |
        |                  |                          |                    |
        +------------------+                          +--------------------+
        (this is new 'prev')                           (this is new 'next')

                       +-------------------+
                       |                   |
                       |                   |
                       |    prev+center    |
                       |                   |
                       |    (3 + 3) = 6    |
                       |                   |
                       +-------------------+
                       (this now 'target',
                        since it potentially
                        invalidated 'center'
                        during the merge)

                                       +------------------------+
                                       |                        |
                                       |                        |
                                       |      target+next       |
                                       |                        |
                                       |    (6 + 9) = 15        |
                                       |                        |
                                       |                        |
                                       +------------------------+
                                       now all five original nodes
                                       are merged into one node of
                                       length 15

So, the case you noted is:

If center and center->prev doesn't merge, center and center->next will definitely not merge?

That's a good point since target is only created if prev is merged. We need to set target = center if the (center+center->prev) doesn't happen. I'll add it.

Thanks!

@mattsta mattsta force-pushed the mattsta:quicklist branch from 3bd37fe to c2914dc Nov 22, 2014
@mattsta
Copy link
Contributor Author

@mattsta mattsta commented Nov 22, 2014

Updated with below diff thanks to sunheehnus at #2143 (comment)!

The fix also caught an invalid check in a test case. Now that we're merging quicklist nodes more properly, we use fewer nodes in a test (26 -> 25) and that's a good thing.

diff --git a/src/quicklist.c b/src/quicklist.c
index dbb2dae..c27827b 100644
--- a/src/quicklist.c
+++ b/src/quicklist.c
@@ -380,6 +380,9 @@ static void _quicklistMergeNodes(quicklist *quicklist, const size_t fill,
     if (center->prev && (center->count + center->prev->count) <= fill) {
         target = _quicklistZiplistMerge(quicklist, center->prev, center);
         center = NULL; /* center could have been deleted, invalidate it. */
+    } else {
+        /* If can't merge here, target still needs to be valid for below. */
+        target = center;
     }

     /* Use result of center merge to try and merge result with next node. */
@@ -1467,7 +1470,7 @@ int quicklistTest(int argc, char *argv[]) {
                 quicklistInsertBefore(ql, f, &entry, genstr("abc", i), 32);
             }
             if (f == 32)
-                ql_verify(ql, 26, 750, 32, 20);
+                ql_verify(ql, 25, 750, 32, 20);
             quicklistRelease(ql);
         }
     }
@mattsta mattsta force-pushed the mattsta:quicklist branch from c2914dc to cdcef25 Nov 22, 2014
@antirez
Copy link
Contributor

@antirez antirez commented Nov 23, 2014

Love the feature and that there are even reviews here. Thank you. Tomorrow I'll be back in Sicily from Bracelona, and Monday I'll review the code as well.


new_node->zl = zmalloc(zl_sz);
if (!new_node->zl)
return NULL;

This comment has been minimized.

@sunheehnus

sunheehnus Nov 23, 2014
Contributor

Forget to free new_node :)

new_node = _quicklistSplitNode(node, entry->offset, after);
new_node->zl = ziplistPush(new_node->zl, value, sz,
after ? ZIPLIST_HEAD : ZIPLIST_TAIL);
new_node->count++;

This comment has been minimized.

@sunheehnus

sunheehnus Nov 23, 2014
Contributor

As _quicklistSplitNode puts it this way,

If after==1, then the returned node has elements [OFFSET, END].
Otherwise if after==0, then the new node has [0, OFFSET)

Suppose the former quicklistNode is [0, 1, 2, 3, 4, 5] & entry->offset is 3.
And it will split into two quicklistNode [0, 1, 2] [3, 4, 5]

If after==1, then the new_node is [3, 4, 5]
Otherwise if after==0, then the new_node is [0, 1, 2]

Then,

If after==1, we push the value at HEAD and the new_node is [value, 3, 4, 5]
Otherwise if after==0, we push the value at TAIL and the new_node is [0, 1, 2, value]

What's more it forget to deal with the NULL result _quicklistSplitNode returns.

This comment has been minimized.

@sunheehnus

sunheehnus Nov 23, 2014
Contributor

I saw your trick to deal with this,
int orig_start = after ? offset + 1 : 0;
int orig_extent = after ? -1 : offset;
int new_start = after ? 0 : offset;
int new_extent = after ? offset + 1 : -1;
with offset + 1 :)
But in that way, the comment of _quicklistSplitNode is not accurate any more. :)

This comment has been minimized.

@mattsta

mattsta Nov 23, 2014
Author Contributor

What's more it forget to deal with the NULL result _quicklistSplitNode returns.

Dealing with dead memory allocations here is tricky. The Redis allocator actually just kills the entire process on OOM, so we don't need extensive malloc checking (and there isn't anything sensible we can do if allocating a new node fails anyway).

But in that way, the comment of _quicklistSplitNode is not accurate any more. :)

Oh, good point. I'll update it and note the changes below.

The comment does have a reinforcing statement of The input node keeps all elements not taken by the returned node just in case any of the examples given were actually wrong. :)

new_node->zl = ziplistPush(new_node->zl, value, sz, ZIPLIST_TAIL);
new_node->count++;
} else if (full && ((at_tail && node->next && full_next && after) ||
(at_head && node->prev && full_prev && !after))) {

This comment has been minimized.

@sunheehnus

sunheehnus Nov 23, 2014
Contributor

Is it okay that we just judge else if (full && (at_tail || at_head))?
This will involve some cases next else if owns, and make the cases more clear. :)

This comment has been minimized.

@mattsta

mattsta Nov 23, 2014
Author Contributor

MORE DRAWINGS!

Let's say maximum fill is 6. Nodes here can't have more than 6 entries.

If we want to insert when two neighboring nodes are full, we have:

         +-------------------+      +---------------------+
         |  Node A           |      |   Node B            |
         |                   |      |                     |
         |                   |      |                     |
         |  1,2,3,4,5,6      |      |   7,8,9,10,11,12    |
         |                   |      |                     |
         |                   |      |                     |
         |                   |      |                     |
         |                   |      |                     |
         +-------------------+      +---------------------+


                 Requested:
                 Insert new element before position 7


                    So, we want to insert:
                       - BEFORE 7
                       - so as of 7, we are AT HEAD for NODE B
                       - We are AT HEAD and Node B is full
                       - Also, Node A is full
                    Now we have: full && at_head && full_prev && !after

                    So, the only way forward is to allocate an entirely
                    new node and create a ziplist of 1 element.

Actually, as you (i think) noted, the next "splitting node" case can match for this case too, but splitting requires extra copies and math.

Right here we know we can create an entirely new node, so it's quicker/easier/cleaner to just do it directly.

Also, seeing as we are already tracking the combinational state of like 25 things in that if/else tree, more explicit details everywhere make it more clear about what is happening and why (plus optional verbose debugging and comments).

This comment has been minimized.

@sunheehnus

sunheehnus Nov 23, 2014
Contributor

Yap, you make it more clear about what is happening and why.
The case you gave above matches else if (full && (at_tail || at_head)) with above 4 conditions.
I just think a special case (nodeA [1, 2, 3, 4, 5, 6]) -> NULL
We append an element to nodeA, and it will split into two nodes(nodeA and an empty node).
Of course I didn't find any bug here, I just thought the special case should be operated by the 5th branch.

And the suggestion make no progress and mass up existing code. My bad. :)

@mattsta mattsta force-pushed the mattsta:quicklist branch from cdcef25 to 69383be Nov 23, 2014
@mattsta
Copy link
Contributor Author

@mattsta mattsta commented Nov 23, 2014

Updated:

diff --git a/src/quicklist.c b/src/quicklist.c
index 664a6c6..7a2490c 100644
--- a/src/quicklist.c
+++ b/src/quicklist.c
@@ -393,10 +393,22 @@ static void _quicklistMergeNodes(quicklist *quicklist, const size_t fill,
     }
 }

-/* Split 'node' at 'offset' into two parts.
+/* Split 'node' into two parts, parameterized by 'offset' and 'after'.
+ *
+ * The 'after' argument controls which quicklistNode gets returned.
+ * If 'after'==1, returned node has elements after 'offset'.
+ *                input node keeps elements up to 'offset', including 'offset'.
+ * If 'after'==0, returned node has elements up to 'offset', including 'offset'.
+ *                input node keeps elements after 'offset'.
+ *
+ * If 'after'==1, returned node will have elements _after_ 'offset'.
+ *                The returned node will have elements [OFFSET+1, END].
+ *                The input node keeps elements [0, OFFSET].
+ *
+ * If 'after'==0, returned node will keep elements up to and including 'offset'.
+ *                The returned node will have elements [0, OFFSET].
+ *                The input node keeps elements [OFFSET+1, END].
  *
- * If after==1, then the returned node has elements [OFFSET, END].
- * Otherwise if after==0, then the new node has [0, OFFSET)
  * The input node keeps all elements not taken by the returned node.
  *
  * Returns newly created node or NULL if split not possible. */
@@ -409,8 +421,10 @@ static quicklistNode *_quicklistSplitNode(quicklistNode *node, int offset,
         return NULL;

     new_node->zl = zmalloc(zl_sz);
-    if (!new_node->zl)
+    if (!new_node->zl) {
+        zfree(new_node);
         return NULL;
+    }

     /* Copy original ziplist so we can split it */
     memcpy(new_node->zl, node->zl, zl_sz);

Thanks for the ongoing reviews!

/* Create new node consisting of an entire pre-formed ziplist.
* Used for loading old RDBs where entire ziplists have been stored
* to be restored later. */
void quicklistPushTailZiplist(quicklist *quicklist, unsigned char *zl) {

This comment has been minimized.

@sunheehnus

sunheehnus Nov 24, 2014
Contributor

Without fill limitation?

This comment has been minimized.

@mattsta

mattsta Nov 24, 2014
Author Contributor

That's a great observation!

We need to rename quicklistPushTailZiplist(quicklist, ziplist) to something like quicklistAppendValuesFromZiplist(quicklist, fill, ziplist).

Then we just:

  • iterate over the existing ziplist
    • insert each element from the ziplist into tail of the quicklist (quicklistPushTail())
  • then free the original ziplist

Adding each value from the old ziplist individually will create nodes with the proper fill for this instance. (this is only used when users restore an old RDB with a native ziplist encoding. it'll help them rebuild their single (maybe large 512+) entry ziplist into a quicklist with a smaller fill factor.)

I'll add it to the todo list.

@mattsta mattsta force-pushed the mattsta:quicklist branch 2 times, most recently from 109145d to 7afc858 Dec 30, 2014
mattsta added 18 commits Nov 13, 2014
This replaces individual ziplist vs. linkedlist representations
for Redis list operations.

Big thanks for all the reviews and feedback from everybody in
#2143
This started out as #2158 by sunheehnus, but I kept rewriting it
until I could understand things more easily and get a few more
correctness guarantees out of the readability flow.

The original commit created and returned a new ziplist with the contents of
both input ziplists, but I prefer to grow one of the input ziplists
and destroy the other one.

So, instead of malloc+copy as in #2158, the merge now reallocs one of
the existing ziplists and copies the other ziplist into the new space.

Also added merge test cases to ziplistTest()
Freeing our test lists helps keep valgrind output clean
Fill factor now has two options:
  - negative (1-5) for size-based ziplist filling
  - positive for length-based ziplist filling with implicit size cap.

Negative offsets define ziplist size limits of:
  -1: 4k
  -2: 8k
  -3: 16k
  -4: 32k
  -5: 64k

Positive offsets now automatically limit their max size to 8k.  Any
elements larger than 8k will be in individual nodes.

Positive ziplist fill factors will keep adding elements
to a ziplist until one of:
  - ziplist has FILL number of elements
    - or -
  - ziplist grows above our ziplist max size (currently 8k)

When using positive fill factors, if you insert a large
element (over 8k), that element will automatically allocate
an individual quicklist node with one element and no other elements will be
in the same ziplist inside that quicklist node.

When using negative fill factors, elements up to the size
limit can be added to one quicklist node.  If an element
is added larger than the max ziplist size, that element
will be allocated an individual ziplist in a new quicklist node.

Tests also updated to start testing at fill factor -5.
Use the existing memory space for an SDS to convert it to a regular
character buffer so we don't need to allocate duplicate space just
to extract a usable buffer for native operations.
This saves us an unnecessary zmalloc, memcpy, and two frees.
Turns out it's a huge improvement during save/reload/migrate/restore
because, with compression enabled, we're compressing 4k or 8k
chunks of data consisting of multiple elements in one ziplist
instead of compressing series of smaller individual elements.
Previously, the old test ran 5,000 loops and used about 500k.

With quicklist, storing those same 5,000 loops takes up 24k, so the
"large value check" failed!

This increases the test to 20,000 loops which makes the object dump 96k.
We trust zmalloc to kill the whole process on memory failure
Added field 'ql_nodes' and 'ql_avg_per_node'.

ql_nodes is the number of quicklist nodes in the quicklist.
ql_avg_node is the average fill level in each quicklist node. (LLEN / QL_NODES)

Sample output:
127.0.0.1:6379> DEBUG object b
Value at:0x7fa42bf2fed0 refcount:1 encoding:quicklist serializedlength:18489 lru:8983768 lru_seconds_idle:3 ql_nodes:430 ql_avg_per_node:511.73
127.0.0.1:6379> llen b
(integer) 220044
Let user set how many nodes to *not* compress.

We can specify a compression "depth" of how many nodes
to leave uncompressed on each end of the quicklist.

Depth 0 = disable compression.
Depth 1 = only leave head/tail uncompressed.
  - (read as: "skip 1 node on each end of the list before compressing")
Depth 2 = leave head, head->next, tail->prev, tail uncompressed.
  - ("skip 2 nodes on each end of the list before compressing")
Depth 3 = Depth 2 + head->next->next + tail->prev->prev
  - ("skip 3 nodes...")
etc.

This also:
  - updates RDB storage to use native quicklist compression (if node is
    already compressed) instead of uncompressing, generating the RDB string,
    then re-compressing the quicklist node.
  - internalizes the "fill" parameter for the quicklist so we don't
    need to pass it to _every_ function.  Now it's just a property of
    the list.
  - allows a runtime-configurable compression option, so we can
    expose a compresion parameter in the configuration file if people
    want to trade slight request-per-second performance for up to 90%+
    memory savings in some situations.
  - updates the quicklist tests to do multiple passes: 200k+ tests now.
Small fixes due to a new version of clang-format (it's less
crazy than the older version).
Actually makes a noticeable difference.

Branch hints were selected based on profiler hotspots.
This removes:
  - list-max-ziplist-entries
  - list-max-ziplist-value

This adds:
  - list-max-ziplist-size
  - list-compress-depth

Also updates config file with new sections and updates
tests to use quicklist settings instead of old list settings.
Adds: ql_compressed (boolean, 1 if compression enabled for list, 0
otherwise)
Adds: ql_uncompressed_size (actual uncompressed size of all quicklistNodes)
Adds: ql_ziplist_max (quicklist max ziplist fill factor)

Compression ratio of the list is then ql_uncompressed_size / serializedlength

We report ql_uncompressed_size for all quicklists because serializedlength
is a _compressed_ representation anyway.

Sample output from a large list:
127.0.0.1:6379> llen abc
(integer) 38370061
127.0.0.1:6379> debug object abc
Value at:0x7ff97b51d140 refcount:1 encoding:quicklist serializedlength:19878335 lru:9718164 lru_seconds_idle:5 ql_nodes:21945 ql_avg_node:1748.46 ql_ziplist_max:-2 ql_compressed:0 ql_uncompressed_size:1643187761
(1.36s)

The 1.36s result time is because rdbSavedObjectLen() is serializing the
object, not because of any new stats reporting.

If we run DEBUG OBJECT on a compressed list, DEBUG OBJECT takes almost *zero*
time because rdbSavedObjectLen() reuses already-compressed ziplists:
127.0.0.1:6379> debug object abc
Value at:0x7fe5c5800040 refcount:1 encoding:quicklist serializedlength:19878335 lru:9718109 lru_seconds_idle:5 ql_nodes:21945 ql_avg_node:1748.46 ql_ziplist_max:-2 ql_compressed:1 ql_uncompressed_size:1643187761
This also defines REDIS_STATIC='' for building everything
inside src/ and everything inside deps/lua/.
@mattsta mattsta force-pushed the mattsta:quicklist branch from 7afc858 to 5870e22 Jan 2, 2015
@antirez
Copy link
Contributor

@antirez antirez commented Jan 7, 2015

Hello Matt, in order to merge I'm doing my changes to RDB here:

https://github.com/antirez/redis/commits/rdbchanges

I see that we diverged a bit, you mostly changed spacing and did a force update: trivial to rebase my changes upon yours, but please for future change add commits instead of force-updating so that it will be simpler to merge. Thx!

antirez added a commit that referenced this pull request Jan 8, 2015
Quicklist (linked list + ziplist)
@antirez antirez merged commit 05ba119 into redis:unstable Jan 8, 2015
@antirez
Copy link
Contributor

@antirez antirez commented Jan 8, 2015

Merged! Applying my commits from the other branch to master as well, and RDB v7 is done...

@mattsta
Copy link
Contributor Author

@mattsta mattsta commented Jan 9, 2015

I see that we diverged a bit,

Sorry for the confusion! Thanks for getting this added (and thanks for the new K/V RDB format)!

trivial to rebase my changes upon yours, but please for future change add commits instead of force-updating so that it will be simpler to merge.

That's the problem with distributed revision control (and github PRs)... the code lives in the author's repo and can be updated at any time.

If I did only add new commits after this PR was first created, it would have 100 cleanup commits sitting here. :)

The real problem is git has a bad default interface. The default should be git pull -r to automatically fix any upstream changes. Without git pull -r, git thinks everything changed on top of existing changes, and it can't deal with it.

So, if git pull freaks out, run git merge --abort then git pull -r.

Or, worst case, git checkout unstable; git branch -D [messed up branch]; git checkout [clean branch] (assuming the branch is only the PR branch without any local changes on it).

:shipit:

@hey-jude
Copy link

@hey-jude hey-jude commented Dec 11, 2015

👍
Which version can I use this feature?

@badboy
Copy link
Member

@badboy badboy commented Dec 11, 2015

It's currently in the testing branch and will thus end up in the upcoming 3.2 release.

JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Aug 29, 2016
This replaces individual ziplist vs. linkedlist representations
for Redis list operations.

Big thanks for all the reviews and feedback from everybody in
redis#2143
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants