Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redis 6.2.7 #10653

Closed
wants to merge 45 commits into from
Closed

Redis 6.2.7 #10653

wants to merge 45 commits into from

Conversation

oranagra
Copy link
Member

Upgrade urgency: SECURITY, contains fixes to security issues.

Security Fixes:

  • (CVE-2022-24736) An attacker attempting to load a specially crafted Lua script
    can cause NULL pointer dereference which will result with a crash of the
    redis-server process. This issue affects all versions of Redis.
    [reported by Aviv Yahav].
  • (CVE-2022-24735) By exploiting weaknesses in the Lua script execution
    environment, an attacker with access to Redis can inject Lua code that will
    execute with the (potentially higher) privileges of another Redis user.
    [reported by Aviv Yahav].

Potentially Breaking Fixes

Performance and resource utilization improvements

Platform / toolchain support related improvements

Bug Fixes

MeirShpilraien and others added 30 commits April 11, 2022 13:15
The new feature can be turned off and on using the new `lua_enablereadonlytable` Lua API.

(cherry picked from commit 92b5098b87e2d0880a530899119524bf1dfbc332)
Today, Redis wrap the user Lua code with a Lua function.
For example, assuming the user code is:

```
return redis.call('ping')
```

The actual code that would have sent to the Lua interpreter was:

```
f_b3a02c833904802db9c34a3cf1292eee3246df3c() return redis.call('ping') end
```

The wraped code would have been saved on the global dictionary with the
following name: `f_<script sha>` (in our example `f_b3a02c833904802db9c34a3cf1292eee3246df3c`).

This approach allows one user to easily override the implementation a another user code, example:

```
f_b3a02c833904802db9c34a3cf1292eee3246df3c = function() return 'hacked' end
```

Running the above code will cause `evalsha b3a02c833904802db9c34a3cf1292eee3246df3c 0` to return
hacked although it should have returned `pong`.

Another disadventage is that Redis basically runs code on the loading (compiling) phase without been
aware of it. User can do code injection like this:

```
return 1 end <run code on compling phase> function() return 1
```

The wraped code will look like this and the entire `<run code on compling phase>` block will run outside
of eval or evalsha context:

```
f_<sha>() return 1 end <run code on compling phase> function() return 1 end
```
Use the new `lua_enablereadonlytable` Lua API to protect the global tables of
evals scripts. The implemetation is easy, we simply call `lua_enablereadonlytable`
on the global table to turn it into a readonly table.
…llow list.

The allow list is done by setting a metatable on the global table before initializing
any library. The metatable set the `__newindex` field to a function that check
the allow list before adding the field to the table. Fields which is not on the
allow list are simply ignored.

After initialization phase is done we protect the global table and each table
that might be reachable from the global table. For each table we also protect
the table metatable if exists.
* Add keyname tags to avoid CROSSSLOT errors in external server CI
* Use new wait_for_blocked_clients_count in pause.tcl

(cherry picked from commit 5dddf49)
…is#9422)

Tracking invalidation messages were sometimes sent in inconsistent order,
before the command's reply rather than after.
In addition to that, they were sometimes embedded inside other commands
responses, like MULTI-EXEC and MGET.

(cherry picked from commit fd135f3)
Cherry pick a more complete fix to 0215324 that also doesn't leak
memory from latest hiredis.

(cherry picked from commit 922ef86)
* add: add missed error counting in sentinel.c and cluster.c

(cherry picked from commit aa6deff)
…down state (redis#9483)

Prevent clients from being blocked forever in cluster when they block with their own module command
and the hash slot is migrated to another master at the same time.
These will get a redirection message when unblocked.
Also, release clients blocked on module commands when cluster is down (same as other blocked clients)

This commit adds basic tests for the main (non-cluster) redis test infra that test the cluster.
This was done because the cluster test infra can't handle some common test features,
but most importantly we only build the test modules with the non-cluster test suite.

note that rather than really supporting cluster operations by the test infra, it was added (as dup code)
in two files, one for module tests and one for non-modules tests, maybe in the future we'll refactor that.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 4962c55)
Following redis#9483 the daily CI exposed a few problems.

* The cluster creation code (uses redis-cli) is complicated to test with TLS enabled.
  for now i'm just skipping them since the tests we run there don't really need that kind of coverage
* cluster port binding failures
  note that `find_available_port` already looks for a free cluster port
  but the code in `wait_server_started` couldn't detect the failure of binding
  (the text it greps for wasn't found in the log)

(cherry picked from commit 7d6744c)
Introduced in redis#8179, this fixes the command's replies in the 0 count edge case.
[BREAKING] changes the reply type when count is 0 to an empty array (instead of nil)
Moves LPOP ... 0 fast exit path after type check to reply with WRONGTYPE

(cherry picked from commit 06dd202)
There was a memory leak when tls is used in Sentinels.
The memory leak is noticed when some of the replicas are offline.

(cherry picked from commit 2ce29e0)
If the last bytes in ziplist are corrupt and we decode from tail to head,
we may reach slightly outside the ziplist.

(cherry picked from commit a3a0142)
In redis#8287, some overflow checks have been added. But when
`when *= 1000` overflows, it will become a positive number.
And the check not able to catch it. The key will be added with
a short expiration time and will deleted a few seconds later.

In redis#9601, will check the overflow after `*=` and return an
error first, and avoiding this situation.

In this commit, added some tests to cover those code paths.
Found it in redis#9825, and close it.

(cherry picked from commit 9273d09)
…th many arguments (redis#9809)

This commit 0f8b634 (CVE-2021-32626 released in 6.2.6, 6.0.16, 5.0.14)
fixes an invalid memory write issue by using `lua_checkstack` API to make
sure the Lua stack is not overflow. This fix was added on 3 places:
1. `luaReplyToRedisReply`
2. `ldbRedis`
3. `redisProtocolToLuaType`

On the first 2 functions, `lua_checkstack` is handled gracefully while the
last is handled with an assert and a statement that this situation can
not happened (only with misbehave module):

> the Redis reply might be deep enough to explode the LUA stack (notice
that currently there is no such command in Redis that returns such a nested
reply, but modules might do it)

The issue that was discovered is that user arguments is also considered part
of the stack, and so the following script (for example) make the assertion reachable:
```
local a = {}
for i=1,7999 do
    a[i] = 1
end
return redis.call("lpush", "l", unpack(a))
```

This is a regression because such a script would have worked before and now
its crashing Redis. The solution is to clear the function arguments from the Lua
stack which makes the original assumption true and the assertion unreachable.

(cherry picked from commit 6b0b04f)
a rare case of short read that can happen when breaking the master-replica
connection on diskless load mode,

(cherry picked from commit 9f9c785)
…s#9889)

When an invalid listpack entry starts with EOF, we will skip it when we verify it in the loop.

(cherry picked from commit 1808618)
…d indefinitely (redis#10032)

Now if redis is still loading when we receive sigterm, we will wait for the loading to reach the event
loop (once in 2mb) before actually shutting down. See redis#10003.

This change caused valgrind CI to fail.
See https://github.com/redis/redis/runs/4662901673?check_suite_focus=true

This pr is mainly to solve the problem that redis process cannot be exited normally.
When the master is disconnected, if repl is processing diskless loading and using `connRead` to read data from master,
it may enter an infinite retry state, which does not handle `connRead` returning 0(master connection disconnected).

(cherry picked from commit 73951ab)
Older version of GNU Make (<4.3) required quoting of number signs (#) to
avoid them being treated as a comment. Newer versions will treat this
quote as a literal.

This issue and a proposed solution is discussed here:
https://lists.gnu.org/archive/html/info-gnu/2020-01/msg00004.html

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
(cherry picked from commit 747b08b)
Fix redis#9410

Crucial for the ms and sequence deltas, but I changed all
calls, just in case (e.g. "flags")

Before this commit:
`ms_delta` and `seq_delta` could have overflown, causing `currid` to be wrong,
which in turn would cause `streamTrim` to trim the entire rax node (see new test)

(cherry picked from commit 7cd6a64)
…is#10095)

It used to return `$-1` in RESP2, now we will return `*-1`.
This is a bug in redis 6.2 when COUNT was added, the `COUNT`
option was introduced in redis#8179. Fix redis#10089.

the documentation of [LPOP](https://redis.io/commands/lpop) says
```
When called without the count argument:
Bulk string reply: the value of the first element, or nil when key does not exist.

When called with the count argument:
Array reply: list of popped elements, or nil when key does not exist.
```

(cherry picked from commit 39feee8)
Seems like the previous implementation was broken (always returning 0)

since kinfo_proc2 is used the KERN_PROC2 sysctl oid is more appropriate
and also the query's length was not necessarily accurate (6 here).

(cherry picked from commit 50fa627)
Sentinel tries to resolve instances hostname to IP only during registration.
It might be that the instance is unavailable during that time, such as
leader crashed and failover took place. Yet, promoted replica must support:

 - Register leader, even if it fails to resolve its hostname during failover
 - Try later to resolve it, if instance is disconnected. Note that
   this condition also support ip-change of an instance.

(cherry picked from commit 79f089b)
As Sentinel relies upon consensus algorithm, all sentinel instances,
randomize a time to initiate their next attempt to become the
leader of the group. But time after time, all raffled the same value.

The problem is in the line `srand(time(NULL)^getpid())` such that
all spinned up containers get same time (in seconds) and same pid
which is always 1. Added material `tv_usec` and verify that even
consecutive calls brings different values and makes the difference.

(cherry picked from commit 52b2fbe)
`PSYNC replicationid str_offset` will crash the server.

The reason is in `masterTryPartialResynchronization`,
we will call `getLongLongFromObjectOrReply` check the
offset. With a wrong offset, it will add a reply and
then trigger a full SYNC and the client become a replica.

So crash in `c->bufpos == 0 && listLength(c->reply) == 0`.
In this commit, we check the psync_offset before entering
function `masterTryPartialResynchronization`, and return.

Regardless of that crash, accepting the sync, but also replying
with an error would have corrupt the replication stream.

(cherry picked from commit 344e41c)
The protocol error was caused by the buggy `writeHandler` in `redis-benchmark.c`,
which didn't handle one of the cases, thereby repeating data, leading to protocol errors
when the values being sent are very long.

This PR fixes redis#10233, issue introduced by redis#7959

(cherry picked from commit bb87560)
The theory is that a replica gets disconnected from within REPLCONF ACK,
so when we go up the stack, we'll crash when attempting to access
c->cmd->flags

(cherry picked from commit aa9beac)
This PR handles several aspects
1. Calls to RM_ReplyWithError from thread safe contexts don't violate thread safety.
2. Errors returning from RM_Call to the module aren't counted in the statistics (they
  might be handled silently by the module)
3. When a module propagates a reply it got from RM_Call to it's client, then the error
  statistics are counted.

This is done by:
1. When appending an error reply to the output buffer, we avoid updating the global
  error statistics, instead we cache that error in a deferred list in the client struct.
2. When creating a RedisModuleCallReply object, the deferred error list is moved from
  the client into that object.
3. when a module calls RM_ReplyWithCallReply we copy the deferred replies to the dest
  client (if that's a real client, then that's when the error statistics are updated to the server)

Note about RM_ReplyWithCallReply: if the original reply had an array with errors, and the module
replied with just a portion of the original reply, and not the entire reply, the errors are currently not
propagated and the errors stats will not get propagated.

Fix redis#10180

(cherry picked from commit b099889)
yossigo and others added 15 commits April 12, 2022 15:48
* Drop obsolete initialization calls.
* Use decoder API for DH parameters.
* Enable auto DH parameters if not explicitly used, which should be the
  preferred configuration going forward.

(cherry picked from commit 3881f78)
Consider the following example:
1. geoadd k1 -0.15307903289794921875 85 n1 0.3515625 85.00019260486917005437 n2.
2. geodist k1 n1 n2 returns  "4891.9380"
3. but GEORADIUSBYMEMBER k1 n1 4891.94 m only returns n1.
n2 is in the  boundingbox but out of search areas.So we let  search areas contain boundingbox to get n2.

Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit b2d393b)
…edis#10334)

Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects.
In some pipelined workloads this achieves about 10% improvement.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit b857928)
…edis#10337)

Avoid deferred array reply on genericZrangebyrankCommand() when consumer type is client.
I.e. any ZRANGE / ZREVRNGE (when tank is used).
This was a performance regression introduced in redis#7844 (v 6.2) mainly affecting pipelined workloads.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 1dc89e2)
…edis#10353)

* Fix memory leak in RM_StreamIteratorStop
* Fix memory leak in moduleFreeKeyIterator

(cherry picked from commit dff153f)
When vlen = sizeof(buf), the statement buf[vlen] = '\0' accessing the buffer buf is an off by one error.

(cherry picked from commit 08aed7e)
Avoid printing "Killed by PID" when si_code != SI_USER.
Apparently SI_USER isn't always set to 0. e.g. on Mac it's 0x10001 and the check that did <= was wrong.

(cherry picked from commit 6761d10)
Partial cherry pick from redis#9601 in order for the tests in redis#9601 to pass

(cherry picked from commit b91d8b2)
…leapi test (redis#9499)

Before redis#9497, before redis-server was shut down, we did not manually shut down all the clients,
which would have prevented valgrind from detecting a memory leak in the client's argc.

(cherry picked from commit 1376d83)
…xecution time, to regain up to 4% execution time (redis#10502)

In redis#7491 (part of redis 6.2), we started using the monotonic timer instead of mstime to measure
command execution time for stats, apparently this meant sampling the clock 3 times per command
rather than two (wince we also need the wall-clock time).
In some cases this causes a significant overhead.

This PR fixes that by avoiding the use of monotonic timer, except for the cases were we know it
should be extremely fast.
This PR also adds a new INFO field called `monotonic_clock` that shows which clock redis is using.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 3cd8baf)
@oranagra oranagra changed the base branch from unstable to 6.2 April 27, 2022 10:20
@oranagra
Copy link
Member Author

closing to trigger GH Action again

@oranagra oranagra closed this Apr 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet