Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace all usage of ziplist with listpack for t_hash #8887

Merged
merged 78 commits into from Aug 10, 2021

Conversation

sundb
Copy link
Collaborator

@sundb sundb commented Apr 29, 2021

Part one of implementing #8702 (taking hashes first before other types)

Description of the feature

  1. Change ziplist encoded hash objects to listpack encoding.
  2. Convert existing ziplists on RDB loading time. an O(n) operation.

Rdb format changes

  1. Add RDB_TYPE_HASH_LISTPACK rdb type.
  2. Bump RDB_VERSION to 10

Interface changes

  1. New hash-max-listpack-entries config is an alias for hash-max-ziplist-entries (same with hash-max-listpack-value)
  2. OBJECT ENCODING will return listpack instead of ziplist

Listpack improvements:

  1. Support direct insert, replace integer element (rather than convert back and forth from string)
  2. Add more listpack capabilities to match the ziplist ones (like lpFind, lpRandomPairs and such)
  3. Optimize element length fetching, avoid multiple calculations
  4. Use inline to avoid function call overhead.

Tests

  1. Add a new test to the RDB load time conversion
  2. Adding the listpack unit tests. (based on the one in ziplist.c)
  3. Add a few "corrupt payload: fuzzer findings" tests, and slightly modify existing ones.

@oranagra oranagra added this to Backlog in 7.0 via automation May 6, 2021
@oranagra oranagra moved this from Backlog to In Review in 7.0 May 6, 2021
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sundb thank you for that effort.

i've added a few comments inside the code, but here are some general ones:

  1. i don't like the term "list container". in "ziplist" and "listpack", we mention it is zipped, or packed, i.e. encoded. this container can be more easily mistaken to be something related to list data type rather than an encoding type.
    maybe a term like "encoded list" or "packed list" (i.e. something that is generic, and both ziplist and listpack comply to, i.e. both ziplist and listpack are encoded and packed).
  2. currently we only create new hashes with listpack, and when modifying an existing hashes we'll always keep them as ziplists.
    we need to see if we can come up with a plan to gradually convert them.
    i.e. i don't want an O(N) operation at load time or any other time, but eventually i want them all gone.
  3. we have a problem with testing. the majority of the tests work by creating a new db from scratch, so the ziplist code is now mostly unreachable.
    maybe we need some DEBUG sub-command that will tell redis to default to creating ziplists, and then run the entire testsuite in that mode (in the daily CI)
    i'm specifically worried about the corrupt-dump-fuzzer test. we either need to save an old rdb file in the assets folder, or let it run twice with that DEBUG tweak i suggested.

src/server.h Outdated Show resolved Hide resolved
src/server.h Outdated Show resolved Hide resolved
src/db.c Outdated Show resolved Hide resolved
src/listpack.c Outdated Show resolved Hide resolved
src/listpack.c Show resolved Hide resolved
src/t_hash.c Outdated Show resolved Hide resolved
src/t_hash.c Outdated Show resolved Hide resolved
tests/unit/keyspace.tcl Outdated Show resolved Hide resolved
@oranagra
Copy link
Member

oranagra commented May 6, 2021

One more thing, we need an option to convert the encoding at rdb loading time.
either when doing full sanitization, since in that case we already do O(N) operation anyway, but also maybe some people will want to do that conversion at upgrade time (slave will take longer to go online), maybe we'll even make it the default one day.

i think it would also be a good idea to benchmark such a conversion, and get a feeling of how much longer it takes to load an rdb file if we do a conversion during loading vs one that just loads the ziplists as they are. can you try doing such a benchmark?

@sundb
Copy link
Collaborator Author

sundb commented May 7, 2021

  1. i don't like the term "list container". in "ziplist" and "listpack", we mention it is zipped, or packed, i.e. encoded. this container can be more easily mistaken to be something related to list data type rather than an encoding type.
    maybe a term like "encoded list" or "packed list" (i.e. something that is generic, and both ziplist and listpack comply to, i.e. both ziplist and listpack are encoded and packed).
I also do not like "list container", it is too difficult to come up with a good name.
  1. currently we only create new hashes with listpack, and when modifying an existing hashes we'll always keep them as ziplists.
    we need to see if we can come up with a plan to gradually convert them.
    i.e. i don't want an O(N) operation at load time or any other time, but eventually i want them all gone.
Maybe we can convert ziplist to listpack when call hashTypeInitIterator.
  1. we have a problem with testing. the majority of the tests work by creating a new db from scratch, so the ziplist code is now mostly unreachable.
    maybe we need some DEBUG sub-command that will tell redis to default to creating ziplists, and then run the entire testsuite in that mode (in the daily CI)
    i'm specifically worried about the corrupt-dump-fuzzer test. we either need to save an old rdb file in the assets folder, or let it run twice with that DEBUG tweak i suggested.
I have write corrupt-dump-fuzzer test in #8761, I will complete these tests.

@sundb
Copy link
Collaborator Author

sundb commented May 8, 2021

@oranagra I tested the speed of rdb loading, before(keep ziplsit) and after(convert ziplist) using the same dump.rdb, creating 100000 keys per test and filling value with random strings.
It looks like the speed becomes 3 times of the original.

entries num of one ziplist max value size rdb loading time without convert rdb loading time with convert
256 64bytes 6.888s 18.078s
256 32bytes 6.849s 17.869s
256 16bytes 6.991s 18.0842s
128 64bytes 3.553s 8.983s
128 32bytes 3.521s 9.0222s
128 16bytes 3.558s 8.982s

1) Fix CR

2) In resolving conflict, I reverted all corrupt-test tests, as some
were fixed in redis#9302, and Re-add the new issue caused this pr.

3) ziplistValidateIntegrity fails to validate an empty ziplsit because
hash ziplist->listpack conversion is always deep santization, this
should be fixed in redis#9297 but was missed.
tests/integration/corrupt-dump.tcl Outdated Show resolved Hide resolved
tests/integration/corrupt-dump.tcl Outdated Show resolved Hide resolved
src/ziplist.c Outdated Show resolved Hide resolved
src/listpack.c Show resolved Hide resolved
@oranagra oranagra added release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus labels Aug 10, 2021
@oranagra oranagra merged commit 02fd76b into redis:unstable Aug 10, 2021
@oranagra oranagra moved this from In Review to Done in 7.0 Aug 11, 2021
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
Part one of implementing redis#8702 (taking hashes first before other types)

## Description of the feature
1. Change ziplist encoded hash objects to listpack encoding.
2. Convert existing ziplists on RDB loading time. an O(n) operation.

## Rdb format changes
1. Add RDB_TYPE_HASH_LISTPACK rdb type.
2. Bump RDB_VERSION to 10

## Interface changes
1. New `hash-max-listpack-entries` config is an alias for `hash-max-ziplist-entries` (same with `hash-max-listpack-value`)
2. OBJECT ENCODING will return `listpack` instead of `ziplist`

## Listpack improvements:
1. Support direct insert, replace integer element (rather than convert back and forth from string)
3. Add more listpack capabilities to match the ziplist ones (like `lpFind`, `lpRandomPairs` and such)
4. Optimize element length fetching, avoid multiple calculations
5. Use inline to avoid function call overhead.

## Tests
1. Add a new test to the RDB load time conversion
2. Adding the listpack unit tests. (based on the one in ziplist.c)
3. Add a few "corrupt payload: fuzzer findings" tests, and slightly modify existing ones.

Co-authored-by: Oran Agra <oran@redislabs.com>
oranagra pushed a commit that referenced this pull request Sep 9, 2021
Part two of implementing #8702 (zset), after #8887.

## Description of the feature
Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to optimize performance.

## Rdb format changes
New `RDB_TYPE_ZSET_LISTPACK` rdb type.

## Rdb loading improvements:
1) Pre-expansion of dict for validation of duplicate data for listpack and ziplist.
2) Simplifying the release of empty key objects when RDB loading.
3) Unify ziplist and listpack data verify methods for zset and hash, and move code to rdb.c.

## Interface changes
1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`).
2) OBJECT ENCODING will return listpack instead of ziplist.

## Listpack improvements:
1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack.
2) Improve the performance of `lpCompare`, converting from string to integer is faster than converting from integer to string.
3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`.

## Zset improvements:
1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop.
2) Use `lpDeleteRangeWithEntry` instead of `lpDelete` twice to delete a element of zset.

## Tests
1) Add some unittests for `lpDeleteRange` and `lpDeleteRangeWithEntry` function.
2) Add zset RDB loading test.
3) Add benchmark test for `lpCompare` and `ziplsitCompare`.
4) Add empty listpack zset corrupt dump test.
@sundb sundb deleted the listpack-migration-hash branch September 16, 2021 06:36
oranagra added a commit that referenced this pull request Nov 24, 2021
Part three of implementing #8702, following #8887 and #9366 .

## Description of the feature
1. Replace the ziplist container of quicklist with listpack.
2. Convert existing quicklist ziplists on RDB loading time. an O(n) operation.

## Interface changes
1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`.
2. Replace `debug ziplist` command with `debug listpack`.

## Internal changes
1. Add `lpMerge` to merge two listpacks . (same as `ziplistMerge`)
2. Add `lpRepr` to print info of listpack which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`)
3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following #9357 ).
    It represent that a quicklistNode is a packed node, as opposed to a plain node.
4. Remove `createZiplistObject` method, which is never used.
5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`.
    We prefer an overestimation, which would at worse lead to a few bytes below the lowest limit of 4k.

## Improvements
1. Calling `lpShrinkToFit` after converting Ziplist to listpack, which was missed at #9366.
2. Optimize `quicklistAppendPlainNode` to avoid memcpy data.

## Bugfix
1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from #9366.

## Test
1. Add unittest for `lpMerge`.
2. Modify the old quicklist ziplist corrupt dump test.

Co-authored-by: Oran Agra <oran@redislabs.com>
hwware pushed a commit to hwware/redis that referenced this pull request Dec 20, 2021
Part three of implementing redis#8702, following redis#8887 and redis#9366 .

## Description of the feature
1. Replace the ziplist container of quicklist with listpack.
2. Convert existing quicklist ziplists on RDB loading time. an O(n) operation.

## Interface changes
1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`.
2. Replace `debug ziplist` command with `debug listpack`.

## Internal changes
1. Add `lpMerge` to merge two listpacks . (same as `ziplistMerge`)
2. Add `lpRepr` to print info of listpack which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`)
3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following redis#9357 ).
    It represent that a quicklistNode is a packed node, as opposed to a plain node.
4. Remove `createZiplistObject` method, which is never used.
5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`.
    We prefer an overestimation, which would at worse lead to a few bytes below the lowest limit of 4k.

## Improvements
1. Calling `lpShrinkToFit` after converting Ziplist to listpack, which was missed at redis#9366.
2. Optimize `quicklistAppendPlainNode` to avoid memcpy data.

## Bugfix
1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from redis#9366.

## Test
1. Add unittest for `lpMerge`.
2. Modify the old quicklist ziplist corrupt dump test.

Co-authored-by: Oran Agra <oran@redislabs.com>
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request May 19, 2022
Remove some dead code in object.c, ziplist is no longer used in 7.0

Some backgrounds:
zipmap - hash: replaced by ziplist in redis#285
ziplist - hash: replaced by listpack in redis#8887
ziplist - zset: replaced by listpack in redis#9366
ziplist - list: replaced by quicklist (listpack) in redis#2143 / redis#9740
@dev-lemontree
Copy link

@sundb @oranagra I have a question. It can still support old RDB Version?
If we try to migrate Redis 6 to Redis 7 with replication, Maybe Redis 6 creates RDB with old version(using ziplist), Does Redis 7 load it(Redis 6 RDB)?
It is very common pattern to upgrade redis server.

@sundb
Copy link
Collaborator Author

sundb commented May 19, 2022

@dev-lemontree Indeed, It is backward compatible.

oranagra pushed a commit that referenced this pull request May 22, 2022
Remove some dead code in object.c, ziplist is no longer used in 7.0

Some backgrounds:
zipmap - hash: replaced by ziplist in #285
ziplist - hash: replaced by listpack in #8887
ziplist - zset: replaced by listpack in #9366
ziplist - list: replaced by quicklist (listpack) in #2143 / #9740

Moved the location of ziplist.h in the server.c
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
Remove some dead code in object.c, ziplist is no longer used in 7.0

Some backgrounds:
zipmap - hash: replaced by ziplist in redis#285
ziplist - hash: replaced by listpack in redis#8887
ziplist - zset: replaced by listpack in redis#9366
ziplist - list: replaced by quicklist (listpack) in redis#2143 / redis#9740

Moved the location of ziplist.h in the server.c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus
Projects
Archived in project
7.0
Done
Development

Successfully merging this pull request may close these issues.

None yet

6 participants