Release 6.0.15 #9266

oranagra · 2021-07-21T15:03:36Z

Upgrade urgency: SECURITY, contains fixes to security issues that affect
authenticated client connections on 32-bit versions. MODERATE otherwise.

Fix integer overflow in BITFIELD on 32-bit versions (CVE-2021-32761).
An integer overflow bug in Redis version 2.2 or newer can be exploited using the
BITFIELD command to corrupt the heap and potentially result in remote code
execution.

Bug fixes that involve behavior changes:

Change reply type for ZPOPMAX/MIN with count in RESP3 to nested array (#8981).
It was using a flat array, as in RESP2, instead of a nested array like ZRANGE does.

Bug fixes:

Fail EXEC command in case a watched key is expired (#9194)
Fix SMOVE not to invalidate dest key (WATCH and tracking) when member already exists (#9244)
Fix SINTERSTORE not to delete dest key when getting a wrong type error (#9032)
Fix overflows on 32-bit versions in GETBIT, SETBIT, BITCOUNT, BITPOS, and BITFIELD (#9191)
Set TCP keepalive on inbound cluster bus connections (#9230)
Fix ziplist length updates on big-endian platforms (#2080)
Fix diskless replica loading to recover from RDB short read on module AUX data (#9199)
Fix race in client side tracking (#9116)
If diskless repl child is killed, make sure to reap the child pid (#7742)
Add a timeout mechanism for replicas stuck in fullsync (#8762)

CLI tools:

redis-cli cluster import: support source and target nodes that require auth (#7994)
redis-cli cluster import: fix issuing a wrong MIGRATE command, sending COPY instead of REPLACE (#8945)
redis-cli: support the RESP3 set type in CSV and RAW output (#7338)

Starting with Redis 6.0, and the changes we made to the diskless master to make it suitable for TLS, I made the master avoid reaping (wait3) the pid of the child until we know all replicas are done reading their RDB. I did that to avoid a state where rdb_child_pid is -1 but we don't yet want to start another fork (we're still busy serving that data to replicas). It turns out that this solution was problematic when the fork child was killed (e.g. by the kernel OOM killer): there's a chance that at that moment we had disabled the read event on the RDB pipe, since we were waiting for a replica to become writable again. In that scenario the master would never realize the child exited, and the replica would remain hung too. Note that there's no mechanism to detect a hung replica while it's in the RDB transfer state. The solution here is to add another pipe, which the parent uses to tell the child it is safe to exit. This means that when the child exits, for whatever reason, it is safe to reap it. Besides that, I'm re-introducing an adjustment to REPLCONF ACK which was part of redis#6271 (Accelerate diskless master connections) but was dropped when that PR was rebased after the TLS fork/pipe changes (5a47794). Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK has a chance to detect that the child exited, it should be the one to call it so that we don't have to wait for cron (server.hz) to do that. (cherry picked from commit 573246f)

Starting with Redis 6.0 (part of the TLS feature), the diskless master uses a pipe from the fork child so that the parent is the one sending data to the replicas. This mechanism has an issue: a hung replica will cause the master to wait forever for it to read the data sent to it, preventing the fork child from terminating and preventing the creation of any other forks. This PR adds a timeout mechanism, much like the ACK-based timeout: we disconnect replicas that aren't reading the RDB file fast enough. (cherry picked from commit d63d026)

Fixes redis#8797 (cherry picked from commit a60016e)

…rSocket (redis#8991) In diskless replication, we create a read pipe for the RDB between the child and the parent. When we close this pipe (fd), the read handler also needs to be removed from the event loop (if it is still registered). Otherwise, the next time we use the same fd, the registration will fail (panic), because we will use EPOLL_CTL_MOD (the fd is still registered in the event loop) on an fd that was already removed from epoll. (cherry picked from commit 501d775)

…7338) Fixes redis#6792. Added support for REDIS_REPLY_SET in the raw and CSV output of `./redis-cli`. To test, run:
./redis-cli -3 --csv COMMAND
./redis-cli -3 --raw COMMAND
They now return results; before the change they failed with "Unknown reply type: 10". (cherry picked from commit 96bb078)

The `Tracking gets notification of expired keys` test in tracking.tcl used to hang in valgrind CI quite a lot. It turns out the reason is that with valgrind and a busy machine, the server cron active expire cycle could easily run in the same event loop as the command that created `mykey`, so when the key got expired there were two change events to broadcast: one that set the key and one that expired it. But since we used raxTryInsert, the client associated with the "last" change was the one that created the key, so NOLOOP filtered that event. This commit adds a test that reproduces the problem by using lazy expire in a MULTI/EXEC, which makes sure the key expires in the same event loop as the one that added it. (cherry picked from commit 9b564b5)

…d on module AUX data (redis#9199) Currently a replica is able to recover from a short read (when diskless loading is enabled) and avoid crashing/exiting: it replies to the master, and the master can then send the RDB again for another load attempt. A few scenarios did not behave the same way, such as when there is no end-of-file marker, or when module AUX data failed to load; both can be caused by a short read, so recovery should be allowed there too. (cherry picked from commit f06d782)

… stop sending diff to child in aof rewrite. (redis#8767) In AOF rewrite, when the parent stops sending data to the child, if there is new rewrite data, the aofChildWriteDiffData write event will be installed. This event then fires and deletes the file event without doing anything. This happens over and over again until the AOF rewrite finishes. The bug wasted a few system calls (epoll_ctl and epoll_wait) per excessive wake-up, each wake-up triggered by receiving a write command from a client. (cherry picked from commit cb961d8)

There are two issues fixed in this commit: 1. We want to fail the EXEC command in case there is a watched key that's logically expired but not yet deleted by active expire or lazy expire. 2. We saw that the cached time is currently updated in every `call()` (including nested calls); this time is also used for the isKeyExpired comparison, and we want to update the cached time only in the first call (execCommand). Co-authored-by: Oran Agra <oran@redislabs.com> (cherry picked from commit ac8b1df)

Adds call to intrev16ifbe to ensure ZIPLIST_LENGTH is compared correctly (cherry picked from commit eaa5271) (cherry picked from commit 4c18123)

When using RESP3, ZPOPMAX/ZPOPMIN should return nested arrays for consistency with other commands (e.g. ZRANGE). We do that only when the COUNT argument is present (similarly to how LPOP behaves). For reasoning, see redis#8824 (comment). This is a breaking change only when RESP3 is used and the COUNT argument is present! (cherry picked from commit 7f34202) (cherry picked from commit caaad2d)

…e error. (redis#9032) SINTERSTORE would have deleted the dest key right away, even when it was later bound to fail with a (WRONGTYPE) error. With this change, it first looks up all the input keys, and only later deletes the dest key if one of them is empty.
Also adds more tests for some commands, mainly focusing on:
- `wrong type error`: expand the test case (based on the sinter bug) in the non-store variant, and add tests for the store variant (although it exists in the non-store variant, I think it is better to have the same tests)
- the dstkey result when we meet a `non-exist key (empty set)` in *store
sdiff:
- improve the test case about wrong type error (the one we found in sinter, although it is safe in sdiff)
- add a test about using a non-existent key (treat it like an empty set)
sdiffstore:
- following the sdiff test cases, also add some tests about `wrong type error` and `non-exist key`
- the difference is that in sdiffstore, we also check the `dstkey` result
sunion/sunionstore: add more tests (same as above)
sinter/sinterstore: same as above
... (cherry picked from commit b8a5da8) (cherry picked from commit f4702b8)

…dis#8945) In clusterManagerCommandImport, strcat was used to concatenate COPY and REPLACE, and the buffer space may not be enough. If we used --cluster-replace but not --cluster-copy, the MIGRATE command contained COPY instead of REPLACE. (cherry picked from commit a049f62) (cherry picked from commit d4771a9)

This makes it possible to distinguish between a null response and an empty array (currently the test infrastructure translates both to an empty string/list). (cherry picked from commit 7103367) (cherry picked from commit e04bce2)

…9235)
- promote the code in DEBUG PROTOCOL to addReplyBigNum
- DEBUG PROTOCOL ATTRIB skips the attribute when the client is RESP2
- networking.c addReply for push and attributes triggers an assertion when called on a RESP2 client; anything else would produce a broken protocol that clients can't handle.
(cherry picked from commit 6a5bac3) (cherry picked from commit 7f38aa8)

…dis#7994) Make it possible for redis-cli cluster import to work with source and target nodes that require AUTH. Adds three flags for source node authentication: --cluster-from-user, --cluster-from-pass and --cluster-from-askpass. For target authentication, the existing --user and --pass flags are used. Examples:
./redis-cli --cluster import 127.0.0.1:7000 --cluster-from 127.0.0.1:6379 --pass 1234 --user default --cluster-from-user default --cluster-from-pass 123456
./redis-cli --cluster import 127.0.0.1:7000 --cluster-from 127.0.0.1:6379 --askpass --cluster-from-user default --cluster-from-askpass
(cherry picked from commit 639b73c)

Set TCP keepalive on inbound clusterbus connections to prevent memory leak (cherry picked from commit f03af47)

In case the dest key already contains the member, the dest key isn't modified, so the command shouldn't invalidate WATCH. (cherry picked from commit 11dc4e5)

…edis#9241) Fix genModulesInfoStringRenderModulesList in module info lacking a separator when there's more than one module in the list. Co-authored-by: Oran Agra <oran@redislabs.com> (cherry picked from commit 1895e13)

…NT,BITPOS may overflow (see CVE-2021-32761) (redis#9191) GETBIT and SETBIT may access the wrong address because of wrap-around. BITCOUNT and BITPOS may return wrapped results. BITFIELD may access the wrong address, but also allocate insufficient memory and segfault (see CVE-2021-32761). This commit uses `uint64_t` or `long long` instead of `size_t`. Related: redis#8096
On a 32-bit platform:
> setbit bit 4294967295 1
(integer) 0
> config set proto-max-bulk-len 536870913
OK
> append bit "\xFF"
(integer) 536870913
> getbit bit 4294967296
(integer) 0
When the bit index is larger than 4294967295, size_t can't hold it. In the past, `proto-max-bulk-len` was limited to 536870912, so there was no problem. After this commit, the bit position is stored in `uint64_t` or `long long`, so when `proto-max-bulk-len > 536870912`, 32-bit platforms still behave correctly.
On 64-bit platforms this problem still exists in principle: the bit position is 8 times the byte position, so when proto-max-bulk-len is very large, the bit position may overflow. But on 64-bit platforms we don't have strings that long, so this bug may never happen.
Additionally, this commit adds a test that uses `512MB` of memory, tagged as `large-memory`; the FreeBSD CI and valgrind CI are made to ignore this test.
* This test is disabled in this version since bitops doesn't rely on proto-max-bulk-len. Some of the overflows can still occur, so we do want the fixes. (cherry picked from commit 71d4528)

oranagra and others added 21 commits July 15, 2021 14:07

ReplicationCron: Prevent invalid access to freed pointer (redis#8799)

7b19e2e

Fix ziplist length updates on bigendian platforms (redis#2080)

a327d76

Set TCP keepalive on inbound clusterbus connections (redis#9230)

f427c91

SMOVE only notify dstset when the addition is successful. (redis#9244)

8f8b7fd

Redis 6.0.15

8bfdc69

oranagra requested a review from a team July 21, 2021 15:12

yossigo approved these changes Jul 21, 2021

oranagra merged commit e0cf85b into redis:6.0 Jul 21, 2021

oranagra deleted the release-6.0.15 branch July 21, 2021 18:07
