Redis 6.2.7 #10653

oranagra · 2022-04-27T10:20:43Z

Upgrade urgency: SECURITY, contains fixes to security issues.

Security Fixes:

(CVE-2022-24736) An attacker attempting to load a specially crafted Lua script
can cause a NULL pointer dereference, which will result in a crash of the
redis-server process. This issue affects all versions of Redis.
[reported by Aviv Yahav].
(CVE-2022-24735) By exploiting weaknesses in the Lua script execution
environment, an attacker with access to Redis can inject Lua code that will
execute with the (potentially higher) privileges of another Redis user.
[reported by Aviv Yahav].

Potentially Breaking Fixes

LPOP/RPOP with count against non-existing list return null array (LPOP/RPOP with count against non existing list return null array #10095)
LPOP/RPOP used to produce wrong replies when count is 0 (Fixes LPOP/RPOP wrong replies when count is 0 #9692)

Performance and resource utilization improvements

Speed optimization in command execution pipeline (Optimization: Use either monotonic or wall-clock to measure command execution time, to regain up to 4% execution time #10502)
Fix regression in Z[REV]RANGE commands (by-rank) introduced in Redis 6.2 (Optimization: Avoid deferred array reply on ZRANGE commands BYRANK #10337)

Platform / toolchain support related improvements

Fix RSS metrics on NetBSD and OpenBSD (fix zmalloc_get_rss on NetBSD #10116, zmalloc_get_rss openbsd implementation #10149)
Fix OpenSSL 3.0.x related issues (Fix OpenSSL 3.0.x related issues. #10291)

Bug Fixes

Lua: Add checks for min-slave-* configs when evaluating Lua scripts (Add check min-slave-* feature when evaluating Lua scripts and Functions #10160)
Lua: fix crash on a script call with many arguments, a regression in v6.2.6 (Clean Lua stack before parsing call reply to avoid crash on a call with many arguments #9809)
Tracking: Make invalidation messages always after command's reply (Make tracking invalidation messages always after command's reply #9422)
Fix excessive stream trimming due to an overflow (lpGetInteger returns int64_t, avoid overflow #10068)
Add missed error counting for INFO errorstats (add missed error counting #9646)
Fix geo search bounding box check causing missing results (Fix geo search bounding box check causing missing results #10018)
Improve EXPIRE TTL overflow detection (Add tests to cover EXPIRE overflow detection fix #9839)
Modules: Fix thread safety violation when a module thread adds an error reply, broken in 6.2 (Fix and improve module error reply statistics #10278)
Modules: Fix missing and duplicate error stats (Fix and improve module error reply statistics #10278)
Module APIs: release clients blocked on module commands in cluster resharding and down state (Release clients blocked on module commands in cluster resharding and down state #9483)
Sentinel: Fix memory leak with TLS (Sentinel tls memory leak #9753)
Sentinel: Fix issues with hostname support (Fixed Sentinel support for hostnames and dynamic IP addresses #10146)
Sentinel: Fix election failures on certain container environments (Add entropy to srand (and fix Sentinel failures) #10197)

The new feature can be turned off and on using the new `lua_enablereadonlytable` Lua API. (cherry picked from commit 92b5098b87e2d0880a530899119524bf1dfbc332)

Today, Redis wraps the user Lua code with a Lua function. For example, assuming the user code is:

```
return redis.call('ping')
```

The actual code that would have been sent to the Lua interpreter was:

```
function f_b3a02c833904802db9c34a3cf1292eee3246df3c() return redis.call('ping') end
```

The wrapped code would have been saved in the global dictionary under the following name: `f_<script sha>` (in our example `f_b3a02c833904802db9c34a3cf1292eee3246df3c`). This approach allows one user to easily override the implementation of another user's code, for example:

```
f_b3a02c833904802db9c34a3cf1292eee3246df3c = function() return 'hacked' end
```

Running the above code will cause `evalsha b3a02c833904802db9c34a3cf1292eee3246df3c 0` to return `hacked` although it should have returned `pong`.

Another disadvantage is that Redis basically runs code in the loading (compiling) phase without being aware of it. A user can do code injection like this:

```
return 1 end <run code on compiling phase> function() return 1
```

The wrapped code will look like this, and the entire `<run code on compiling phase>` block will run outside of the eval or evalsha context:

```
function f_<sha>() return 1 end <run code on compiling phase> function() return 1 end
```
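The injection described above follows directly from pasting the user body into a function definition as plain text. Here is a minimal sketch in C, assuming the wrapper has the form `function f_<sha>() <body> end`; `wrap_script` is a hypothetical helper written for illustration, not a Redis function.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch of the old wrapping scheme (not Redis source):
 * the user body is pasted between "function f_<sha>() " and "\nend"
 * without any validation, so a body that contains its own "end" closes
 * the wrapper early and everything after it becomes top-level code.
 * Returns the number of characters snprintf would have written. */
int wrap_script(char *out, size_t outlen, const char *sha, const char *body) {
    return snprintf(out, outlen, "function f_%s() %s\nend", sha, body);
}
```

Passing the body `return 1 end <injected> function() return 1` yields a chunk in which `<injected>` sits between two function definitions, that is, outside any function the interpreter would only run on `eval` or `evalsha`.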

Use the new `lua_enablereadonlytable` Lua API to protect the global tables of eval scripts. The implementation is easy: we simply call `lua_enablereadonlytable` on the global table to turn it into a readonly table.

…llow list. The allow list is implemented by setting a metatable on the global table before initializing any library. The metatable sets the `__newindex` field to a function that checks the allow list before adding the field to the table. Fields that are not on the allow list are simply ignored. After the initialization phase is done, we protect the global table and each table that might be reachable from the global table. For each table we also protect the table's metatable, if it exists.

* Add keyname tags to avoid CROSSSLOT errors in external server CI * Use new wait_for_blocked_clients_count in pause.tcl (cherry picked from commit 5dddf49)

…is#9422) Tracking invalidation messages were sometimes sent in inconsistent order, before the command's reply rather than after. In addition, they were sometimes embedded inside other commands' responses, like MULTI-EXEC and MGET. (cherry picked from commit fd135f3)

Cherry pick a more complete fix to 0215324 that also doesn't leak memory from latest hiredis. (cherry picked from commit 922ef86)

* add: add missed error counting in sentinel.c and cluster.c (cherry picked from commit aa6deff)

…down state (redis#9483)

Prevent clients from being blocked forever in cluster when they block with their own module command and the hash slot is migrated to another master at the same time. These will get a redirection message when unblocked. Also, release clients blocked on module commands when the cluster is down (same as other blocked clients).

This commit adds basic tests for the main (non-cluster) redis test infra that test the cluster. This was done because the cluster test infra can't handle some common test features, but most importantly we only build the test modules with the non-cluster test suite.

Note that rather than really supporting cluster operations by the test infra, it was added (as dup code) in two files, one for module tests and one for non-module tests; maybe in the future we'll refactor that.

Co-authored-by: Oran Agra <oran@redislabs.com>

(cherry picked from commit 4962c55)

Following redis#9483 the daily CI exposed a few problems.

* The cluster creation code (uses redis-cli) is complicated to test with TLS enabled. For now I'm just skipping them, since the tests we run there don't really need that kind of coverage.
* Cluster port binding failures. Note that `find_available_port` already looks for a free cluster port, but the code in `wait_server_started` couldn't detect the failure of binding (the text it greps for wasn't found in the log).

(cherry picked from commit 7d6744c)

redis#9705) (cherry picked from commit 734cde7)

Introduced in redis#8179, this fixes the command's replies in the 0 count edge case. [BREAKING] changes the reply type when count is 0 to an empty array (instead of nil) Moves LPOP ... 0 fast exit path after type check to reply with WRONGTYPE (cherry picked from commit 06dd202)

There was a memory leak when tls is used in Sentinels. The memory leak is noticed when some of the replicas are offline. (cherry picked from commit 2ce29e0)

If the last bytes in ziplist are corrupt and we decode from tail to head, we may reach slightly outside the ziplist. (cherry picked from commit a3a0142)

In redis#8287, some overflow checks were added. But when `when *= 1000` overflows, the result becomes a positive number, and the check is not able to catch it. The key is then added with a short expiration time and deleted a few seconds later. redis#9601 checks for the overflow after `*=` and returns an error first, avoiding this situation. This commit adds some tests to cover those code paths. Found in redis#9825, which this closes. (cherry picked from commit 9273d09)
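The overflow described above can be illustrated with a small C sketch. It is not the Redis implementation: `seconds_to_ms_checked` is a hypothetical helper, and unlike the fix referenced in the text it tests the bound before multiplying, which is one common way to keep the signed multiplication from ever overflowing.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not a Redis function): convert a TTL given in
 * seconds to milliseconds, refusing values whose product would not fit
 * in a long long. The bound is tested before multiplying, because
 * signed overflow is undefined behavior in C. */
bool seconds_to_ms_checked(long long seconds, long long *ms_out) {
    if (seconds > LLONG_MAX / 1000 || seconds < LLONG_MIN / 1000) {
        return false; /* would overflow: the caller should reply with an error */
    }
    *ms_out = seconds * 1000;
    return true;
}
```

A caller would reject the command when the helper returns false, instead of storing a wrapped-around expiration time.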

…th many arguments (redis#9809)

Commit 0f8b634 (CVE-2021-32626, released in 6.2.6, 6.0.16, 5.0.14) fixed an invalid memory write issue by using the `lua_checkstack` API to make sure the Lua stack does not overflow. This fix was added in 3 places:

1. `luaReplyToRedisReply`
2. `ldbRedis`
3. `redisProtocolToLuaType`

In the first 2 functions, a `lua_checkstack` failure is handled gracefully, while in the last it is handled with an assert and a statement that this situation cannot happen (only with a misbehaving module):

> the Redis reply might be deep enough to explode the LUA stack (notice that currently there is no such command in Redis that returns such a nested reply, but modules might do it)

The issue that was discovered is that user arguments are also considered part of the stack, so the following script (for example) makes the assertion reachable:

```
local a = {}
for i=1,7999 do
    a[i] = 1
end
return redis.call("lpush", "l", unpack(a))
```

This is a regression because such a script would have worked before and now it crashes Redis. The solution is to clear the function arguments from the Lua stack, which makes the original assumption true and the assertion unreachable.

(cherry picked from commit 6b0b04f)

a rare case of short read that can happen when breaking the master-replica connection on diskless load mode, (cherry picked from commit 9f9c785)

…s#9889) When an invalid listpack entry starts with EOF, we will skip it when we verify it in the loop. (cherry picked from commit 1808618)


…d indefinitely (redis#10032) Now if redis is still loading when we receive sigterm, we will wait for the loading to reach the event loop (once in 2mb) before actually shutting down. See redis#10003. This change caused valgrind CI to fail. See https://github.com/redis/redis/runs/4662901673?check_suite_focus=true This pr is mainly to solve the problem that redis process cannot be exited normally. When the master is disconnected, if repl is processing diskless loading and using `connRead` to read data from master, it may enter an infinite retry state, which does not handle `connRead` returning 0(master connection disconnected). (cherry picked from commit 73951ab)

Older versions of GNU Make (<4.3) required quoting of number signs (#) to avoid them being treated as a comment. Newer versions treat this quote as a literal. This issue and a proposed solution are discussed here: https://lists.gnu.org/archive/html/info-gnu/2020-01/msg00004.html Co-authored-by: Yossi Gottlieb <yossigo@gmail.com> (cherry picked from commit 747b08b)

Fix redis#9410 Crucial for the ms and sequence deltas, but I changed all calls, just in case (e.g. "flags") Before this commit: `ms_delta` and `seq_delta` could have overflown, causing `currid` to be wrong, which in turn would cause `streamTrim` to trim the entire rax node (see new test) (cherry picked from commit 7cd6a64)

…is#10095)

It used to return `$-1` in RESP2; now it returns `*-1`. This is a bug in Redis 6.2, where the `COUNT` option was introduced (redis#8179). Fix redis#10089.

The documentation of [LPOP](https://redis.io/commands/lpop) says:

```
When called without the count argument:
Bulk string reply: the value of the first element, or nil when key does not exist.

When called with the count argument:
Array reply: list of popped elements, or nil when key does not exist.
```

(cherry picked from commit 39feee8)
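The reply change can be summarized with a tiny C sketch. `pop_missing_key_reply_resp2` is a hypothetical name used only for illustration; it returns the raw RESP2 bytes a client would see for a key that does not exist, under the behavior this commit describes.

```c
#include <stdbool.h>

/* Hypothetical helper (not Redis code): the RESP2 reply for LPOP/RPOP on
 * a key that does not exist. Without a count the reply is a null bulk
 * string ("$-1"); with a count it is a null array ("*-1"), matching the
 * documented "Array reply". */
const char *pop_missing_key_reply_resp2(bool has_count) {
    return has_count ? "*-1\r\n" : "$-1\r\n";
}
```

Clients that distinguish the two null types in RESP2 are the ones that may notice the difference, which is why the release notes list this under potentially breaking fixes.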

It seems the previous implementation was broken (always returning 0). Since kinfo_proc2 is used, the KERN_PROC2 sysctl oid is more appropriate; also, the query's length was not necessarily accurate (6 here). (cherry picked from commit 50fa627)

Sentinel tries to resolve an instance's hostname to an IP only during registration. The instance might be unavailable at that time, for example when the leader crashed and a failover took place. Yet, the promoted replica must support:

- Registering the leader, even if it fails to resolve its hostname during failover
- Trying later to resolve it, if the instance is disconnected.

Note that this condition also supports an IP change of an instance. (cherry picked from commit 79f089b)

As Sentinel relies upon a consensus algorithm, all Sentinel instances randomize a time to initiate their next attempt to become the leader of the group. But time after time, they all raffled the same value. The problem is in the line `srand(time(NULL)^getpid())`: all spun-up containers get the same time (in seconds) and the same pid, which is always 1. Added `tv_usec` as seed material and verified that even consecutive calls bring different values and make the difference. (cherry picked from commit 52b2fbe)
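The seeding problem above can be sketched in C. This is an illustration under assumptions rather than the Sentinel source: `mix_seed` and `make_seed` are hypothetical names, and XOR-ing `tv_usec` into the seed is one plausible way to add the extra material the text mentions.

```c
#include <stddef.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: mix seconds, microseconds and the pid into a seed.
 * Kept as a pure function so the effect of the microseconds is easy to
 * see: with equal seconds and pid, different usec values give different
 * seeds. */
unsigned int mix_seed(time_t sec, suseconds_t usec, pid_t pid) {
    return (unsigned int)sec ^ (unsigned int)usec ^ (unsigned int)pid;
}

/* Build a seed from the current time of day and the process id. */
unsigned int make_seed(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return mix_seed(tv.tv_sec, tv.tv_usec, getpid());
}
```

A process would call `srand(make_seed())` once at startup. Two containers started within the same second, both running as pid 1, would then only collide if they also sampled the same microsecond.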

`PSYNC replicationid str_offset` will crash the server. The reason is that in `masterTryPartialResynchronization` we call `getLongLongFromObjectOrReply` to check the offset. With a wrong offset, it adds a reply and then triggers a full SYNC, and the client becomes a replica. So it crashes in `c->bufpos == 0 && listLength(c->reply) == 0`. In this commit, we check the psync_offset before entering the function `masterTryPartialResynchronization`, and return. Regardless of that crash, accepting the sync but also replying with an error would have corrupted the replication stream. (cherry picked from commit 344e41c)

The protocol error was caused by the buggy `writeHandler` in `redis-benchmark.c`, which didn't handle one of the cases, thereby repeating data, leading to protocol errors when the values being sent are very long. This PR fixes redis#10233, issue introduced by redis#7959 (cherry picked from commit bb87560)

The theory is that a replica gets disconnected from within REPLCONF ACK, so when we go up the stack, we'll crash when attempting to access c->cmd->flags (cherry picked from commit aa9beac)

This PR handles several aspects:

1. Calls to RM_ReplyWithError from thread safe contexts don't violate thread safety.
2. Errors returning from RM_Call to the module aren't counted in the statistics (they might be handled silently by the module).
3. When a module propagates a reply it got from RM_Call to its client, the error statistics are counted.

This is done by:

1. When appending an error reply to the output buffer, we avoid updating the global error statistics; instead we cache that error in a deferred list in the client struct.
2. When creating a RedisModuleCallReply object, the deferred error list is moved from the client into that object.
3. When a module calls RM_ReplyWithCallReply, we copy the deferred replies to the dest client (if that's a real client, then that's when the error statistics are updated to the server).

Note about RM_ReplyWithCallReply: if the original reply had an array with errors, and the module replied with just a portion of the original reply and not the entire reply, the errors are currently not propagated and the error stats will not get propagated.

Fix redis#10180

(cherry picked from commit b099889)

* Drop obsolete initialization calls. * Use decoder API for DH parameters. * Enable auto DH parameters if not explicitly used, which should be the preferred configuration going forward. (cherry picked from commit 3881f78)

Consider the following example:

1. geoadd k1 -0.15307903289794921875 85 n1 0.3515625 85.00019260486917005437 n2
2. geodist k1 n1 n2 returns "4891.9380"
3. but GEORADIUSBYMEMBER k1 n1 4891.94 m only returns n1.

n2 is in the bounding box but out of the search area. So we let the search area contain the bounding box to get n2.

Co-authored-by: Binbin <binloveplay1314@qq.com> (cherry picked from commit b2d393b)

…edis#10334) Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can use shared objects. In some pipelined workloads this achieves about 10% improvement. Co-authored-by: Oran Agra <oran@redislabs.com> (cherry picked from commit b857928)

…edis#10337) Avoid deferred array reply on genericZrangebyrankCommand() when consumer type is client, i.e. any ZRANGE / ZREVRANGE (when rank is used). This was a performance regression introduced in redis#7844 (v 6.2), mainly affecting pipelined workloads. Co-authored-by: Oran Agra <oran@redislabs.com> (cherry picked from commit 1dc89e2)

…edis#10353) * Fix memory leak in RM_StreamIteratorStop * Fix memory leak in moduleFreeKeyIterator (cherry picked from commit dff153f)

When vlen = sizeof(buf), the statement buf[vlen] = '\0' accessing the buffer buf is an off by one error. (cherry picked from commit 08aed7e)
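The off-by-one condition above comes down to which lengths leave room for the terminator. A minimal sketch in C follows; `copy_terminated` is a hypothetical helper, not the actual `zzlStrtod` code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the actual zzlStrtod code): copy vlen bytes
 * into buf and NUL-terminate the result. buf[vlen] is a valid index only
 * when vlen < bufsize, so vlen == bufsize must be rejected as well.
 * Returns 0 on success, -1 when the value does not fit. */
int copy_terminated(char *buf, size_t bufsize, const char *v, size_t vlen) {
    if (vlen >= bufsize) return -1; /* no room left for the terminator */
    memcpy(buf, v, vlen);
    buf[vlen] = '\0';
    return 0;
}
```

With a 4-byte buffer, a 3-byte value fits and a 4-byte value is rejected; a check written as `vlen > bufsize` would accept the 4-byte value and write one byte past the end.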

Avoid printing "Killed by PID" when si_code != SI_USER. Apparently SI_USER isn't always set to 0. e.g. on Mac it's 0x10001 and the check that did <= was wrong. (cherry picked from commit 6761d10)

Partial cherry pick from redis#9601 in order for the tests in redis#9601 to pass (cherry picked from commit b91d8b2)

…leapi test (redis#9499) Before redis#9497, before redis-server was shut down, we did not manually shut down all the clients, which would have prevented valgrind from detecting a memory leak in the client's argc. (cherry picked from commit 1376d83)


…xecution time, to regain up to 4% execution time (redis#10502) In redis#7491 (part of redis 6.2), we started using the monotonic timer instead of mstime to measure command execution time for stats. Apparently this meant sampling the clock 3 times per command rather than two (since we also need the wall-clock time). In some cases this causes a significant overhead. This PR fixes that by avoiding the use of the monotonic timer, except for the cases where we know it should be extremely fast. This PR also adds a new INFO field called `monotonic_clock` that shows which clock redis is using. Co-authored-by: Oran Agra <oran@redislabs.com> (cherry picked from commit 3cd8baf)

oranagra · 2022-04-27T10:23:40Z

closing to trigger GH Action again

MeirShpilraien and others added 30 commits April 11, 2022 13:15

Added support for Lua readonly tables.

eeaf692

The new feature can be turned off and on using the new `lua_enablereadonlytable` Lua API. (cherry picked from commit 92b5098b87e2d0880a530899119524bf1dfbc332)

Protect globals of evals scripts.

c67431e

Use the new `lua_enablereadonlytable` Lua API to protect the global tables of eval scripts. The implementation is easy: we simply call `lua_enablereadonlytable` on the global table to turn it into a readonly table.

Add missing pause tcl test to test_helper.tcl (redis#9158)

08eff24

* Add keyname tags to avoid CROSSSLOT errors in external server CI * Use new wait_for_blocked_clients_count in pause.tcl (cherry picked from commit 5dddf49)

hiredis: improve calloc() overflow fix. (redis#9630)

4a3ac0a

Cherry pick a more complete fix to 0215324 that also doesn't leak memory from latest hiredis. (cherry picked from commit 922ef86)

add missed error counting (redis#9646)

5295671

* add: add missed error counting in sentinel.c and cluster.c (cherry picked from commit aa6deff)

Fix memory leak when there's a read error of module aux data from rdb. (

9ad45ba

redis#9705) (cherry picked from commit 734cde7)

Sentinel tls memory leak (redis#9753)

d8da38c

There was a memory leak when tls is used in Sentinels. The memory leak is noticed when some of the replicas are offline. (cherry picked from commit 2ce29e0)

fix invalid read on corrupt ziplist (redis#9831)

95346cd

If the last bytes in ziplist are corrupt and we decode from tail to head, we may reach slightly outside the ziplist. (cherry picked from commit a3a0142)

fixed mem leak on rdb load error (redis#9860)

30ddd58

a rare case of short read that can happen when breaking the master-replica connection on diskless load mode, (cherry picked from commit 9f9c785)

Sanitize dump payload: fix invalid listpack entry start with EOF (redi…

1bad37f

…s#9889) When an invalid listpack entry starts with EOF, we will skip it when we verify it in the loop. (cherry picked from commit 1808618)

Redact ACL SETUSER arguments if the user has spaces (redis#9935)

3bff6f2

(cherry picked from commit c40d23b)

Attempt to fix a rare crash in cluster tests. (redis#10265)

a61fdb6

The theory is that a replica gets disconnected from within REPLCONF ACK, so when we go up the stack, we'll crash when attempting to access c->cmd->flags (cherry picked from commit aa9beac)

yossigo and others added 15 commits April 12, 2022 15:48

Fix OpenSSL 3.0.x related issues. (redis#10291)

1a70c3f

* Drop obsolete initialization calls. * Use decoder API for DH parameters. * Enable auto DH parameters if not explicitly used, which should be the preferred configuration going forward. (cherry picked from commit 3881f78)

Fix memory leak in RM_StreamIteratorStop and moduleFreeKeyIterator (r…

820bd5a

…edis#10353) * Fix memory leak in RM_StreamIteratorStop * Fix memory leak in moduleFreeKeyIterator (cherry picked from commit dff153f)

Fix an off by one error in zzlStrtod (redis#10465)

04cae6a

When vlen = sizeof(buf), the statement buf[vlen] = '\0' accessing the buffer buf is an off by one error. (cherry picked from commit 08aed7e)

crash log, print killer pid only when si_code is SI_USER (redis#10454)

0ecc01b

Avoid printing "Killed by PID" when si_code != SI_USER. Apparently SI_USER isn't always set to 0. e.g. on Mac it's 0x10001 and the check that did <= was wrong. (cherry picked from commit 6761d10)

test suite improvements pulled back from 7.0 for cherry picked commits

142e464

Fix overflow check in expireGenericCommand

ffcfd5c

Partial cherry pick from redis#9601 in order for the tests in redis#9601 to pass (cherry picked from commit b91d8b2)

Clean unused var compiler warning in module test. (redis#9289)

1aa4507

(cherry picked from commit 8bf433d)

Bring some CI adjustments from 7.0 into 6.2

10d4047

Whitelist Lua print function to avoid breaking change in old releases

e9b05d8

Redis 6.2.7

3b45e05

oranagra changed the base branch from unstable to 6.2 April 27, 2022 10:20

oranagra closed this Apr 27, 2022
