[NEW] listpack migration - replace all usage of ziplist with listpack #8702

Closed
sundb opened this issue Mar 26, 2021 · 8 comments · Fixed by #9740

@sundb
Collaborator

sundb commented Mar 26, 2021

The problem

When elements are inserted or deleted in the middle of a ziplist, the ziplist may suffer a cascade of updates.
listpack is already used in streams; t_zset and quicklist should also use listpack instead of ziplist.
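For context on the cascade: each ziplist entry stores the length of the *previous* entry in its header, using 1 byte when that length is below 254 and 5 bytes otherwise, so growing one entry can force the next entry's header to grow, which can push that entry over the threshold in turn, and so on. A toy model of the effect (illustrative only, not the actual ziplist code):

```c
#include <stddef.h>
#include <stdio.h>

/* A ziplist entry encodes the previous entry's length in 1 byte when it is
 * below 254, and in 5 bytes otherwise. */
static size_t prevlen_size(size_t prev_entry_len) {
    return prev_entry_len < 254 ? 1 : 5;
}

int main(void) {
    /* Toy chain of entries: each holds 252 payload bytes plus a 1-byte
     * prevlen header, i.e. just below the 254 threshold. */
    size_t entry_len[8];
    for (int i = 0; i < 8; i++) entry_len[i] = 1 + 252;

    /* Grow the first entry (say, an insert made it larger). Every following
     * entry must now store a bigger prevlen, which pushes its own size over
     * the threshold, and the update cascades down the list. */
    entry_len[0] = 300;
    int cascaded = 0;
    for (int i = 1; i < 8; i++) {
        size_t new_len = prevlen_size(entry_len[i - 1]) + 252;
        if (new_len == entry_len[i]) break;  /* growth stopped propagating */
        entry_len[i] = new_len;
        cascaded++;
    }
    printf("cascading updates touched %d entries\n", cascaded);
    return 0;
}
```

In this worst case every entry after the insertion point has to be rewritten, which is exactly the O(n) behavior listpack avoids by storing each element's own length at its end rather than the previous element's length.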

Description of the feature

Changes:

  1. Keep quicklist compatible with the old ziplist storage format.
  2. Use listpack instead of ziplist in t_zset.c.
  3. Use listpack instead of ziplist in quicklist.
  4. Use listpack instead of ziplist in hash.
  5. RDB format handling (see the sketch below):
    • If deep sanitization is disabled, do not convert while reading the RDB, because converting from ziplist to listpack would slow down RDB loading; instead, convert when the quicklistNode is accessed.
    • If deep sanitization is enabled, convert from ziplist to listpack directly.
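A rough sketch of that load-time decision, using a toy node struct and hypothetical helpers (ziplist_to_listpack, deep_sanitize) rather than the real Redis API; note that the PRs eventually merged below ended up converting at RDB load time instead:

```c
#include <stdlib.h>

/* Toy model, not the real quicklistNode. */
typedef struct toy_node {
    unsigned char *payload;   /* either a ziplist or a listpack blob */
    int is_ziplist;           /* 1 while still in the old ziplist format */
} toy_node;

/* Hypothetical O(n) converter (stubbed out here). */
static unsigned char *ziplist_to_listpack(unsigned char *zl) { return zl; }

static int deep_sanitize = 0;   /* stand-in for the sanitize-dump-payload setting */

/* While loading the RDB: only pay the O(n) conversion up front when deep
 * sanitization is enabled, since that pass walks every entry anyway. */
static void load_node(toy_node *node) {
    if (deep_sanitize && node->is_ziplist) {
        node->payload = ziplist_to_listpack(node->payload);
        node->is_ziplist = 0;
    }
}

/* On access: convert lazily the first time the node is actually touched,
 * so plain RDB loading stays fast. */
static void before_access(toy_node *node) {
    if (node->is_ziplist) {
        node->payload = ziplist_to_listpack(node->payload);
        node->is_ziplist = 0;
    }
}
```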
@sundb sundb self-assigned this Mar 26, 2021
@oranagra oranagra added this to To do in 7.0 via automation Mar 26, 2021
@zuiderkwast
Contributor

zuiderkwast commented Mar 29, 2021

If we're going to take on this effort, we might as well go a step further and be even more efficient than listpack.

I figure we can save even more space by using the suffix not only to store a reverse-encoded length, but also to tag the value itself and hold payload bits. I have draft specs (and an incomplete proof of concept) here:

@sundb
Collaborator Author

sundb commented Mar 30, 2021

@zuiderkwast The reason for using listpack instead of ziplist is that ziplist may cause cascading updates on inserts and deletes in the middle, which is the biggest problem.
If the capacity is incremented by 1, 2 or 4, that can lead to a lot of wasted memory; ziplist and listpack are designed to save memory, so is that a loss or a gain?

@zuiderkwast
Contributor

Thanks for reading it. :-) I know cascading updates are the main potential problem, that there was a bug which was fixed, and that this is why listpack was designed. It doesn't seem to be a problem in practice anymore, though.

Listpack also stores numbers up to 127 in a slightly more compact way. The compactness can be explored further. That's my point. My "bidipack" is not a complete spec yet.

I'll clarify the capacity idea. It's not incremented by 1, 2 or 4 times (that was badly explained by me). It is the input to a capacity formula which, as you increment it by 1, yields the underlying allocation sizes used by jemalloc (8, 16, 32, 48, ..., 8 KiB, 10 KiB, 12 KiB, ..., 160 KiB, 192 KiB, 224 KiB, 256 KiB, ...), as per http://jemalloc.net/jemalloc.3.html#size_classes, i.e. four sizes before the allocation is doubled. There is no point in allocating anything in between these sizes, since the extra space would be unused anyway. These are the sizes returned by malloc_size() for any allocation.
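To make the size-class point concrete, here is a rough approximation of that rounding; the exact small-size classes differ a bit from real jemalloc, but it reproduces the "four classes per doubling" shape and the sample sizes listed above:

```c
#include <stddef.h>
#include <stdio.h>

/* Round a request up to a jemalloc-like size class: 16-byte spacing up to
 * 128 bytes, then each power-of-two range split into four classes
 * (160, 192, 224, 256, then 320, 384, 448, 512, ... up to 8 KiB, 10 KiB,
 * 12 KiB, ...). Illustration only, not jemalloc's actual table. */
static size_t size_class_roundup(size_t sz) {
    if (sz <= 8) return 8;
    size_t step = 16, limit = 128;
    while (sz > limit) { step *= 2; limit *= 2; }
    return (sz + step - 1) / step * step;
}

int main(void) {
    size_t samples[] = {5, 100, 130, 300, 8193, 200000};
    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        printf("%zu -> %zu\n", samples[i], size_class_roundup(samples[i]));
    return 0;   /* 5->8, 100->112, 130->160, 300->320, 8193->10240, 200000->229376 */
}
```

Growing an allocation to any size between two such classes buys nothing, since malloc_size() reports the larger class anyway.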

The problem this allocation scheme is designed to solve is the extra memory moves caused by a realloc followed by a memmove on every insertion. It's a problem that has been mentioned, e.g. by @oranagra in this comment. This is not the main idea of bidipack, though, and it can be skipped completely.

@oranagra
Member

oranagra commented Mar 30, 2021

@zuiderkwast I guess you meant this comment

I didn't have time to read the entire thing yet, but from the portion I read I think that:

  1. The idea of doing pre-reallocation is a nice speedup, but it obviously comes at the expense of memory. For quicklist and stream it may be OK, since we'll trim that before creating a new node and the excess may be insignificant, but for hash and zset it's probably not a good idea.
  2. The idea of encoding the growth strategy inside the listpack is probably not useful; even if we add some global config for that strategy (unlikely), we're very unlikely to add a per-key config.
  3. Encoding the capacity may not be needed, since we can call malloc_size, or even just do a realloc and count on it being very fast when it's a NOP (see the sketch after this list). That's what I meant in that other comment, where IIRC I misunderstood that PR at the time (i.e. that we should benchmark and see if there's a reason to avoid these NOP calls).
  4. One of the reasons behind listpack was not only to avoid the cascading updates of ziplist, but also to simplify the code and make it less bug prone (we suspect there's still a bug in ziplist causing corruption, but we can't find it).
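On point 3, a minimal sketch of leaning on the allocator instead of storing a capacity field, using malloc_usable_size() (which glibc and jemalloc both provide; Redis wraps the same idea as zmalloc_size()). This is a plain-libc illustration, not the Redis code, and error handling is omitted:

```c
#include <malloc.h>   /* malloc_usable_size() on glibc/jemalloc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append to a heap buffer without tracking capacity ourselves: ask the
 * allocator how large the block really is, and realloc only when that
 * usable size is exceeded. A realloc that stays within the same size
 * class is effectively a NOP. */
static unsigned char *buf_append(unsigned char *buf, size_t *len,
                                 const void *data, size_t n) {
    if (*len + n > malloc_usable_size(buf))
        buf = realloc(buf, *len + n);   /* error handling omitted */
    memcpy(buf + *len, data, n);
    *len += n;
    return buf;
}

int main(void) {
    size_t len = 0;
    unsigned char *buf = malloc(1);
    for (int i = 0; i < 1000; i++)
        buf = buf_append(buf, &len, "hello", 5);
    printf("len=%zu usable=%zu\n", len, malloc_usable_size(buf));
    free(buf);
    return 0;
}
```

Whether the extra malloc_usable_size()/realloc calls are worth avoiding is exactly the benchmarking question raised in point 3.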

@zuiderkwast
Contributor

Thanks for finding the relevant comment. I suppose many of the things in my draft are not needed, e.g. the alloc strategy (unless we want to optimize for append-only use, i.e. streams), but I wrote that description as a stand-alone idea, and malloc_size isn't widely used. Avoiding calls to malloc_size could be a small speedup, and avoiding a realloc when it does move data is good because it avoids moving the data twice, but all of that can be skipped for simplicity.

@huangzhw
Collaborator

@zuiderkwast Very interesting idea. Maybe a little complicated.

@zuiderkwast
Contributor

Thanks @huangzhw, I think we can pick only a small part of it, to make it simple. Or just use listpack since it's already battle-tested in streams.

It could be useful to store a version tag somewhere to be able to update stuff in the future and to support multiple encodings.
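A minimal sketch of what such a version/encoding tag could look like, as a single byte prefixed to the serialized blob (purely hypothetical layout, not an existing Redis format):

```c
#include <stdint.h>

/* Hypothetical one-byte header prepended to a packed blob, so future code
 * can tell which encoding follows and upgrade it if needed. */
enum blob_encoding {
    BLOB_ENC_LISTPACK = 1,
    BLOB_ENC_BIDIPACK = 2,   /* some future encoding */
};

static uint8_t blob_encoding_of(const unsigned char *blob) {
    return blob[0];          /* first byte is the version/encoding tag */
}

static const unsigned char *blob_payload(const unsigned char *blob) {
    return blob + 1;         /* actual packed data starts after the tag */
}
```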

@oranagra
Member

Obviously you can't change the encoding of listpacks in streams without breaking compatibility with old RDB files.

What we can do (either now or in the future) is use different RDB opcodes when we store the new format, e.g. create RDB_TYPE_LIST_QUICKLIST_LISTPACK and RDB_TYPE_STREAM_BIDIPACK (same as the RDB_TYPE_ZSET_2 we have now).

Then we can either convert all payloads at RDB load time (if it's just about adding a version header), or support both side by side (i.e. support both listpack and bidipack, or a listpack with a version header field, like we're going to do for supporting ziplist side by side with listpack).

Some formats, like hash, can simply have another (3rd or 4th) encoding; others, like quicklist, have an "encoding" (container) field per node (see QUICKLIST_NODE_CONTAINER_ZIPLIST), so we can support both listpack and bidipack.
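As a rough illustration of the side-by-side option keyed on that per-node container field (toy struct and stub helpers, not the actual quicklist.h; the real constants are the QUICKLIST_NODE_CONTAINER_* defines):

```c
#include <stddef.h>

/* Toy container tags; the listpack value is hypothetical, added only to show
 * how two packed encodings could coexist per node. */
enum {
    CONTAINER_PLAIN    = 1,   /* node holds one large, unencoded element */
    CONTAINER_ZIPLIST  = 2,   /* old packed format */
    CONTAINER_LISTPACK = 3,   /* new packed format (hypothetical tag) */
};

typedef struct toy_node { unsigned char *entry; int container; } toy_node;

/* Stub per-format helpers, standing in for ziplistLen()/lpLength(). */
static size_t ziplist_count(unsigned char *zl)  { (void)zl; return 0; }
static size_t listpack_count(unsigned char *lp) { (void)lp; return 0; }

/* Every reader branches on the container tag, so old and new payloads can
 * coexist until (or unless) they are converted. */
static size_t node_count(toy_node *n) {
    switch (n->container) {
    case CONTAINER_ZIPLIST:  return ziplist_count(n->entry);
    case CONTAINER_LISTPACK: return listpack_count(n->entry);
    default:                 return 1;   /* plain node: a single element */
    }
}
```

The other option, converting everything once at RDB load time under a new opcode, is what the merged PRs below ended up doing for hashes, zsets and quicklists.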

@oranagra oranagra removed this from Backlog in 7.0 Apr 22, 2021
oranagra added a commit that referenced this issue Aug 10, 2021
Part one of implementing #8702 (taking hashes first before other types)

## Description of the feature
1. Change ziplist encoded hash objects to listpack encoding.
2. Convert existing ziplists at RDB loading time, an O(n) operation (sketched after this commit message).

## Rdb format changes
1. Add RDB_TYPE_HASH_LISTPACK rdb type.
2. Bump RDB_VERSION to 10

## Interface changes
1. New `hash-max-listpack-entries` config is an alias for `hash-max-ziplist-entries` (same with `hash-max-listpack-value`)
2. OBJECT ENCODING will return `listpack` instead of `ziplist`

## Listpack improvements:
1. Support direct insert, replace integer element (rather than convert back and forth from string)
3. Add more listpack capabilities to match the ziplist ones (like `lpFind`, `lpRandomPairs` and such)
4. Optimize element length fetching, avoid multiple calculations
5. Use inline to avoid function call overhead.

## Tests
1. Add a new test for the RDB load-time conversion
2. Add listpack unit tests (based on the one in ziplist.c)
3. Add a few "corrupt payload: fuzzer findings" tests, and slightly modify existing ones.

Co-authored-by: Oran Agra <oran@redislabs.com>
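For reference, the O(n) load-time conversion in point 2 of the feature description amounts to walking the old ziplist and re-appending every element into a fresh listpack. A simplified sketch, assuming the Redis-internal ziplist/listpack APIs (ziplistIndex/ziplistGet/ziplistNext, lpNew/lpAppend, ll2string); exact signatures may differ between versions, and per the listpack improvements above the real code appends integers directly instead of round-tripping through a string:

```c
#include "ziplist.h"
#include "listpack.h"
#include "util.h"      /* ll2string() */
#include "zmalloc.h"   /* zfree() */

/* Convert one ziplist blob into a listpack blob; O(n) in the number of
 * entries. Sketch only -- not the code from the PR. */
unsigned char *ziplistToListpackSketch(unsigned char *zl) {
    unsigned char *lp = lpNew(0);
    unsigned char *p = ziplistIndex(zl, 0);
    while (p) {
        unsigned char *str;
        unsigned int slen;
        long long lval;
        ziplistGet(p, &str, &slen, &lval);
        if (str) {
            lp = lpAppend(lp, str, slen);                  /* string element */
        } else {
            char buf[32];                                  /* integer element */
            int blen = ll2string(buf, sizeof(buf), lval);
            lp = lpAppend(lp, (unsigned char *)buf, blen);
        }
        p = ziplistNext(zl, p);
    }
    zfree(zl);
    return lp;
}
```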
JackieXie168 pushed a commit to JackieXie168/redis that referenced this issue Sep 8, 2021
oranagra pushed a commit that referenced this issue Sep 9, 2021
Part two of implementing #8702 (zset), after #8887.

## Description of the feature
Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to improve performance.

## Rdb format changes
New `RDB_TYPE_ZSET_LISTPACK` rdb type.

## Rdb loading improvements:
1) Pre-expand the dict when validating listpack and ziplist data for duplicates.
2) Simplify the release of empty key objects during RDB loading.
3) Unify the ziplist and listpack data-verification methods for zset and hash, and move the code to rdb.c.

## Interface changes
1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`).
2) OBJECT ENCODING will return listpack instead of ziplist.

## Listpack improvements:
1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack.
2) Improve the performance of `lpCompare`; converting from string to integer is faster than converting from integer to string (sketched after this commit message).
3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`.

## Zset improvements:
1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop.
2) Use `lpDeleteRangeWithEntry` instead of calling `lpDelete` twice to delete an element of a zset.

## Tests
1) Add some unit tests for the `lpDeleteRange` and `lpDeleteRangeWithEntry` functions.
2) Add a zset RDB loading test.
3) Add a benchmark test for `lpCompare` and `ziplistCompare`.
4) Add empty listpack zset corrupt dump test.
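The `lpCompare` point works because a listpack element may already be stored as an integer: parsing the external string once with `string2ll` and comparing two long longs is cheaper than rendering the stored integer back into a string and memcmp-ing. A hedged sketch of that comparison direction (not the actual listpack.c code; `string2ll` is the Redis util.c helper, forward-declared here):

```c
#include <stddef.h>
#include <string.h>

/* From the Redis tree (util.c): parse a string into a long long, returning
 * 1 on success and 0 if it is not a valid integer. */
int string2ll(const char *s, size_t slen, long long *value);

/* Compare one element against an external string. The element is either
 * stored as raw bytes (str != NULL) or as an already-decoded integer. */
static int elem_equals(const unsigned char *str, unsigned int slen, long long ival,
                       const unsigned char *s, unsigned int len) {
    if (str)   /* stored as a string: plain byte comparison */
        return slen == len && memcmp(str, s, len) == 0;

    /* Stored as an integer: convert the *input* string to an integer once,
     * instead of converting the stored integer to a string every time. */
    long long sval;
    if (!string2ll((const char *)s, len, &sval)) return 0;   /* not a number */
    return sval == ival;
}
```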
@oranagra oranagra added this to Backlog in 7.0 via automation Nov 1, 2021
@oranagra oranagra moved this from Backlog to To Do in 7.0 Nov 1, 2021
@oranagra oranagra linked a pull request Nov 7, 2021 that will close this issue
@oranagra oranagra removed this from To Do in 7.0 Nov 14, 2021
oranagra added a commit that referenced this issue Nov 24, 2021
Part three of implementing #8702, following #8887 and #9366 .

## Description of the feature
1. Replace the ziplist container of quicklist with listpack.
2. Convert existing quicklist ziplists at RDB loading time, an O(n) operation.

## Interface changes
1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`.
2. Replace `debug ziplist` command with `debug listpack`.

## Internal changes
1. Add `lpMerge` to merge two listpacks (same as `ziplistMerge`).
2. Add `lpRepr` to print listpack info; it is used in debugCommand and `quicklistRepr` (same as `ziplistRepr`).
3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED` (following #9357).
    It represents that a quicklistNode is a packed node, as opposed to a plain node.
4. Remove the `createZiplistObject` method, which is never used.
5. Calculate the listpack entry size using an overhead overestimation in `quicklistAllowInsert` (sketched after this commit message).
    We prefer an overestimation, which would at worst lead to a few bytes below the lowest limit of 4k.

## Improvements
1. Call `lpShrinkToFit` after converting a ziplist to a listpack, which was missed in #9366.
2. Optimize `quicklistAppendPlainNode` to avoid copying data with memcpy.

## Bugfix
1. Fix a crash in `quicklistRepr` when the ziplist is compressed, introduced in #9366.

## Test
1. Add a unit test for `lpMerge`.
2. Modify the old quicklist ziplist corrupt dump test.

Co-authored-by: Oran Agra <oran@redislabs.com>
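On point 5 of the internal changes above, the overestimation works because a listpack entry costs at most a few encoding-header bytes plus a few back-length bytes on top of its payload, so a small constant keeps the node-size check cheap and only ever errs toward splitting slightly early, i.e. a few bytes under the 4 KiB lower limit. A hedged sketch of such a check (constant and names are illustrative, not quicklist.c):

```c
#include <stddef.h>

/* Overestimated per-entry overhead: up to ~5 bytes of encoding header plus
 * up to ~5 bytes of reverse-encoded length, rounded up a little. */
#define ENTRY_OVERHEAD_ESTIMATE 11

/* Would adding an element with `sz` payload bytes keep this node within the
 * configured byte limit (e.g. 4096 when list-max-listpack-size is -1)? */
static int node_allow_insert(size_t node_bytes, size_t sz, size_t limit_bytes) {
    size_t estimated = node_bytes + sz + ENTRY_OVERHEAD_ESTIMATE;
    /* Overestimating never lets a node exceed the limit; at worst it starts
     * a new node a few bytes before strictly necessary. */
    return estimated <= limit_bytes;
}
```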
hwware pushed a commit to hwware/redis that referenced this issue Dec 20, 2021