2. The pitfalls we hit with the Redis eval command #7

nethibernate · 2019-04-07T10:37:10Z

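Once business logic gets complex enough, running a Lua script with eval can be more efficient. Most of what we found online about running Lua scripts in Redis says the same things, though: either introductory basics or how to implement a lock. Neither helped much with real use in our project. So this post records the pitfalls we stumbled into while running Lua scripts with eval.
(The Redis used below is Redis 4.x, installed with apt on Ubuntu.)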

Pitfall: checking the result of get against nil

This was the first pitfall we hit. Below is a Lua script we ran, simplified to the part that best shows the problem, together with its result on Redis:

eval "local a = redis.call('get', 'test'); if a == nil then return 0; else return 1; end" 0
(integer) 1

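We swear the Redis instance was empty at this point, so the key test did not exist. get should therefore return nil, and the script should have returned 0. Instead, Redis returned a baffling 1.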

After some googling, we found a Stack Overflow question and the official documentation page it points to. Together they revealed this "unwritten rule":

Redis Nil bulk reply and Nil multi bulk reply -> Lua false boolean type

In other words, on the Lua side Redis converts a nil reply into Lua's boolean false. With that in mind, we changed the script as follows:

eval "local a = redis.call('get', 'test'); if a == false then return 0; else return 1; end" 0
(integer) 0

This time we got the expected 0.

Takeaway: to check inside a Lua script whether a key exists, compare the result with false. Do not compare it with nil!

Type conversion from Redis to Lua

This one cost us dearly. Consider the following example:

127.0.0.1:6379> set test 2
OK
127.0.0.1:6379> object encoding test
"int"

We set the key test to the integer 2, and OBJECT ENCODING reports int as well. So when we get test into a variable inside a Lua script, you would expect a number. Reality is harsher:

127.0.0.1:6379> eval "local a = redis.call('get', 'test'); return type(a);" 0
"string"

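We got a string. Why?
We googled this for a long time and then spent a long while reading the source code. In the end we found that the official documentation already explains it, just not prominently. The official documentation describes how Redis maps its types to Lua: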

Redis to Lua conversion table.

Redis integer reply -> Lua number

Redis bulk reply -> Lua string

Redis multi bulk reply -> Lua table (may have other Redis data types nested)

Redis status reply -> Lua table with a single ok field containing the status

Redis error reply -> Lua table with a single err field containing the error

Redis Nil bulk reply and Nil multi bulk reply -> Lua false boolean type

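Don't be fooled by how simple this list looks. We misread it and spent a long time digging through the source code.
The key point of this table is that Redis converts values according to the reply type of each command, not according to what the key itself holds. That is why every entry is phrased as an "XXX reply".
Every Redis command's documentation page lists that command's reply type. For example, here is what the docs say about get, the command that confused us above: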

Return value

Bulk string reply: the value of key, or nil when key does not exist.

See? get always returns a Bulk string reply, which maps to a Lua string.

Finally, here is an excerpt from the Redis source that backs up this logic. scripting.c contains the following conversion function:

/* Take a Redis reply in the Redis protocol format and convert it into a
* Lua type. Thanks to this function, and the introduction of not connected
* clients, it is trivial to implement the redis() lua function.
* ...
*/
char *redisProtocolToLuaType(lua_State *lua, char* reply) {
    char *p = reply;
    switch(*p) {
    case ':': p = redisProtocolToLuaType_Int(lua,reply); break;
    case '$': p = redisProtocolToLuaType_Bulk(lua,reply); break;
    case '+': p = redisProtocolToLuaType_Status(lua,reply); break;
    case '-': p = redisProtocolToLuaType_Error(lua,reply); break;
    case '*': p = redisProtocolToLuaType_MultiBulk(lua,reply); break;
    }
    return p;
}

The conversion is driven by the protocol format of the Redis reply. The official documentation explains that protocol (RESP) as follows:

In RESP, the type of some data depends on the first byte:

For Simple Strings the first byte of the reply is "+"

For Errors the first byte of the reply is "-"

For Integers the first byte of the reply is ":"

For Bulk Strings the first byte of the reply is "$"

For Arrays the first byte of the reply is "*"
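The Go sketch below makes the first-byte rule concrete. It is not Redis's actual code; it mimics the dispatch in redisProtocolToLuaType and reports which Lua type a raw RESP reply would become. The helper name luaTypeOf and its returned labels are hypothetical. Only the first-byte rules and the nil-to-false mapping come from the documentation quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// luaTypeOf is a hypothetical helper: like redisProtocolToLuaType, it
// switches on the first byte of a raw RESP reply and returns the name of
// the Lua type that reply would be converted to.
func luaTypeOf(reply string) string {
	if reply == "" {
		return "unknown"
	}
	switch reply[0] {
	case ':': // integer reply -> Lua number
		return "number"
	case '$': // bulk string reply -> Lua string, nil bulk ($-1) -> Lua false
		if strings.HasPrefix(reply, "$-1") {
			return "boolean"
		}
		return "string"
	case '+': // status reply -> table with a single ok field
		return "table(ok)"
	case '-': // error reply -> table with a single err field
		return "table(err)"
	case '*': // multi bulk reply -> table, nil multi bulk (*-1) -> Lua false
		if strings.HasPrefix(reply, "*-1") {
			return "boolean"
		}
		return "table"
	}
	return "unknown"
}

func main() {
	// GET on a key whose value is int-encoded still answers with a bulk string.
	fmt.Println(luaTypeOf("$1\r\n2\r\n")) // string
	// GET on a missing key answers with a nil bulk string, seen in Lua as false.
	fmt.Println(luaTypeOf("$-1\r\n")) // boolean
	// A command with an integer reply, such as INCR, yields a Lua number.
	fmt.Println(luaTypeOf(":3\r\n")) // number
}
```

This is why `OBJECT ENCODING` says int while Lua sees a string: the encoding is never consulted, only the reply's first byte.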

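Takeaway: the type of Redis data you get inside Lua depends on the command's reply type, not on the key's type. One more thing: Redis treats every value in the ARGV array passed to eval as a string!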

Which node runs a Lua script in a cluster

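While we were figuring out how to use Lua, we did all the design and coding in our own development environment. Cluster issues never crossed our minds, and as soon as the code went live it blew up.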

In a Redis Cluster, keys are stored by hash slot, and different slots live on different machines. So when a Lua script touches several keys, which machine actually runs it?

This pitfall involves a few loosely related issues, so we will take them one at a time.

First, how should keys be passed? The eval command takes a list of keys, but nothing stops us from writing keys directly into the Lua script. Which should we choose in practice?

The following passage is from the official Redis documentation for eval:

All Redis commands must be analyzed before execution to determine which keys the command will operate on. In order for this to be true for EVAL, keys must be passed explicitly. This is useful in many ways, but especially to make sure Redis Cluster can forward your request to the appropriate cluster node.

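The point of this passage is that Redis can only determine which keys a script operates on if those keys are passed explicitly. So we must pass every key we operate on through eval's key arguments (KEYS), rather than hard-coding them in the Lua script.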
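The hypothetical helper below sketches what "passing keys explicitly" looks like on the wire. It builds the argument list of an EVAL command: the script, numkeys, the keys, then the remaining arguments. Because numkeys says which arguments are keys, Redis (and Redis Cluster) can inspect them before the script runs. Every argument is sent as a string, which is also why ARGV values always arrive in Lua as strings. This is an illustration, not any client library's API.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildEvalArgs is a hypothetical helper that lays out
// EVAL script numkeys key [key ...] arg [arg ...]
// numkeys marks which of the following arguments are keys.
func buildEvalArgs(script string, keys []string, args ...interface{}) []string {
	cmd := []string{"EVAL", script, strconv.Itoa(len(keys))}
	cmd = append(cmd, keys...)
	for _, a := range args {
		// Every argument is sent as a bulk string, so Lua sees ARGV[i] as a string.
		cmd = append(cmd, fmt.Sprint(a))
	}
	return cmd
}

func main() {
	script := "return redis.call('set', KEYS[1], ARGV[1])"
	fmt.Printf("%q\n", buildEvalArgs(script, []string{"user:42:name"}, 7))
}
```

Even the integer 7 becomes the string "7" before it reaches the script.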

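The next question: if there are several keys, which machine does Redis run the Lua script on?
You can try this yourself. Set up a cluster and run a Lua script that touches several keys, and you will very likely get this error: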

(error) CROSSSLOT Keys in request don't hash to the same slot

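In a cluster, Redis requires that all keys involved in an eval call belong to the same slot; otherwise you get the error above.
This ties back to the first issue. Suppose we hard-code a key in the Lua script. Even if the script gets routed to some node and starts running, that node will very likely not own the hard-coded key, and you will get this error: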

Lua script attempted to access a non local key in a cluster node

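The fix is to use hash tags (see the official documentation on hash tags). Wrap part of the key in {}, and Redis computes the slot from only the text between the braces. Keys such as key1{mykey} and key2{mykey} therefore land in the same slot. This has to be planned when designing the business logic; retrofitting it after going live is painful.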
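The slot rule can be checked offline. Below is a minimal Go sketch of the key-to-slot mapping from the Redis Cluster specification: CRC16 (XMODEM variant) modulo 16384. When a non-empty {...} is present, only the hash-tag content is hashed. The function names are ours. sameSlot is a hypothetical pre-check you could run before eval to catch a CROSSSLOT error early; it is not a Redis or client API.

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 implements CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0,
// the checksum Redis Cluster uses to hash keys.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// keyHashSlot returns a key's cluster slot. If the key has a '{' followed
// later by a '}' with at least one character in between, only the text
// between the first '{' and the first following '}' is hashed.
func keyHashSlot(key string) uint16 {
	if start := strings.IndexByte(key, '{'); start >= 0 {
		if end := strings.IndexByte(key[start+1:], '}'); end > 0 {
			key = key[start+1 : start+1+end]
		}
	}
	return crc16([]byte(key)) % 16384
}

// sameSlot reports whether all keys map to a single slot, i.e. whether
// passing them together as KEYS to eval avoids the CROSSSLOT error.
func sameSlot(keys ...string) bool {
	for i := 1; i < len(keys); i++ {
		if keyHashSlot(keys[i]) != keyHashSlot(keys[0]) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(keyHashSlot("key1{mykey}"), keyHashSlot("key2{mykey}"))
	fmt.Println(sameSlot("key1{mykey}", "key2{mykey}")) // true
	fmt.Println(sameSlot("key1", "key2"))
}
```

An empty tag such as "foo{}{bar}" does not count, so the whole key is hashed. This is why the tag must contain at least one character.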

One downside of hash tags is that they can concentrate load on a single node in the cluster. This is a trade-off you have to make.

I also checked the official documentation of codis and twemproxy; both support hash tags by default. twemproxy is the odd one out: if no hash tag is specified, it routes based on the first key only.

Takeaway: when using eval in a cluster, pay close attention to the slots of your keys, and preferably use hash tags. Also weigh the extra load hash tags may put on a single node!

Using eval and evalsha in application code

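When I first wrote Lua scripts, it bothered me that every execution sends the whole long script string to the server, which wastes network bandwidth. We were using spring-data-redis at the time, so I looked at how it handles this under the hood and found the following code: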

Object result;
try {
    // Optimistically run the script by its SHA1 digest.
    result = connection.evalSha(script.getSha1(), returnType, numKeys, keysAndArgs);
} catch (Exception e) {
    // Rethrow anything that is not a NOSCRIPT error.
    if (!ScriptUtils.exceptionContainsNoScriptError(e)) {
        throw e instanceof RuntimeException ? (RuntimeException) e
                : new RedisSystemException(e.getMessage(), e);
    }
    // NOSCRIPT: send the full script body with EVAL instead.
    result = connection.eval(scriptBytes(script), returnType, numKeys, keysAndArgs);
}

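When I first read this code, my understanding was: try evalsha first, and if the Redis server has not cached the script, send the whole script with eval. That kept bugging me. If the server never has the script cached, won't this logic always end up in the catch block? Performance wouldn't improve, and the extra evalsha would cost one more network round trip. Isn't that broken? I even came up with my own "answer": maybe you are supposed to load the script onto the Redis server with SCRIPT LOAD beforehand, and only then use it.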

Not ready to give up, I happened to be reading the go-redis code, so I checked how it solves this problem and found the following:

// Run optimistically uses EVALSHA to run the script. If script does not exist
// it is retried using EVAL.
func (s *Script) Run(c scripter, keys []string, args ...interface{}) *Cmd {
	r := s.EvalSha(c, keys, args...)
	if err := r.Err(); err != nil && strings.HasPrefix(err.Error(), "NOSCRIPT ") {
		return s.Eval(c, keys, args...)
	}
	return r
}

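The logic is exactly the same as in spring-data-redis, and the comment describes exactly what I had understood. Had the authors of all these open-source libraries lost their minds?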

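Then one day some website's source code happened to leak onto GitHub, also written in Go. What luck: this was logic proven in production, a good chance to see how they handled it. Here is what I found: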

// Do evaluates the script. Under the covers, Do optimistically evaluates the
// script using the EVALSHA command. If the command fails because the script is
// not loaded, then Do evaluates the script using the EVAL command (thus
// causing the script to load).
func (s *Script) Do(c Conn, keysAndArgs ...interface{}) (interface{}, error) {
	v, err := c.Do("EVALSHA", s.args(s.hash, keysAndArgs)...)
	if e, ok := err.(Error); ok && strings.HasPrefix(string(e), "NOSCRIPT ") {
		v, err = c.Do("EVAL", s.args(s.src, keysAndArgs)...)
	}
	return v, err
}

Well, the code is the same as in go-redis, but the comment is excellent: "thus causing the script to load."
What? eval also caches the script? I rushed to the official docs:

Executed scripts are guaranteed to be in the script cache of a given execution of a Redis instance forever. This means that if an EVAL is performed against a Redis instance all the subsequent EVALSHA calls will succeed.

That cleared things up. The authors of spring-data-redis and go-redis hadn't lost their minds at all; I was the one who had it wrong from start to finish.

The official Redis documentation explains all of this clearly. Here are the points I consider most important:

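After a script is executed with eval, it stays cached on the Redis server until the server restarts (the script cache is not persisted) or SCRIPT FLUSH is executed.
As long as the program's connection to the Redis server has not been broken, we can assume the cached script is still there.
In a pipeline, you can issue SCRIPT LOAD first and then immediately use evalsha without checking for any errors.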

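Takeaway: when using Lua scripts in a program, prefer evalsha to save network bandwidth, and fall back to eval if the script does not exist. eval also makes the Redis server cache the script until the server stops or SCRIPT FLUSH is executed.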
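The behaviour that cleared up the confusion can be simulated without a server. The Go sketch below uses a hypothetical in-memory fakeRedis, not a real client. Its Eval caches the script body under its SHA1, as the documentation quoted above describes. The runScript helper follows the same optimistic EVALSHA-then-EVAL pattern as the go-redis and spring-data-redis snippets. Lua execution is stubbed out; only the script cache and the NOSCRIPT fallback are modelled.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// fakeRedis is a hypothetical stand-in for a Redis server that models only
// the script cache; it does not run Lua.
type fakeRedis struct {
	cache     map[string]string // SHA1 hex -> script body
	evalCalls int               // how many times the full script was sent
}

func sha1Hex(script string) string {
	sum := sha1.Sum([]byte(script))
	return hex.EncodeToString(sum[:])
}

// Eval "runs" the script and, like real EVAL, adds it to the script cache.
func (r *fakeRedis) Eval(script string) string {
	r.evalCalls++
	r.cache[sha1Hex(script)] = script
	return "ran: " + script
}

// EvalSha succeeds only if the script is cached; otherwise it returns an
// error whose message starts with NOSCRIPT, as Redis does.
func (r *fakeRedis) EvalSha(sha string) (string, error) {
	script, ok := r.cache[sha]
	if !ok {
		return "", errors.New("NOSCRIPT No matching script. Please use EVAL.")
	}
	return "ran: " + script, nil
}

// runScript tries EVALSHA first and falls back to EVAL on NOSCRIPT.
func runScript(r *fakeRedis, script string) (string, error) {
	res, err := r.EvalSha(sha1Hex(script))
	if err != nil && strings.HasPrefix(err.Error(), "NOSCRIPT ") {
		return r.Eval(script), nil
	}
	return res, err
}

func main() {
	r := &fakeRedis{cache: map[string]string{}}
	for i := 0; i < 3; i++ {
		runScript(r, "return 1")
	}
	// Only the first call had to send the whole script.
	fmt.Println(r.evalCalls) // 1
}
```

The catch branch runs once per script per server lifetime, not on every call. That is why the pattern saves bandwidth without a separate SCRIPT LOAD.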


qyvlik · 2019-04-29T01:44:20Z

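Redis scripts do not have all-or-nothing semantics: if a command fails partway through a script, the writes already made are not rolled back. CRUD programmers may trip over this out of habit.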
https://blog.csdn.net/qyvlik/article/details/89668611

qloog · 2020-06-03T08:47:32Z

Then one day some website's source code happened to leak onto GitHub, also written in Go. What luck 👍

I went and checked the code too, and it really is there 😄

HQidea · 2024-04-12T06:41:31Z

Executed scripts are guaranteed to be in the script cache of a given execution of a Redis instance forever. This means that if an EVAL is performed against a Redis instance all the subsequent EVALSHA calls will succeed.

The question is: how do you get the SHA?

nethibernate · 2024-06-06T07:25:05Z

Executed scripts are guaranteed to be in the script cache of a given execution of a Redis instance forever. This means that if an EVAL is performed against a Redis instance all the subsequent EVALSHA calls will succeed.

The question is: how do you get the SHA?

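It is a SHA1 computed over the Lua script text. The Redis server uses the same algorithm as the client, so as long as the script doesn't change, the SHA is the same on both sides. To see exactly how it is computed, look at the source of any Redis client library (they all support it), or at the Redis source itself.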
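For example, with Go's standard library the value EVALSHA expects is the lowercase hex SHA1 digest of the exact script text. Any change, even extra whitespace, gives a different digest. The helper name scriptSHA is ours. For the same script, SCRIPT LOAD on the server should return the same 40-character string.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// scriptSHA returns the lowercase hex SHA1 of a Lua script body, the form
// passed as the first argument of EVALSHA.
func scriptSHA(script string) string {
	sum := sha1.Sum([]byte(script))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(scriptSHA("return 1"))
}
```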

nethibernate added 2019 redis labels Apr 7, 2019

nethibernate changed the title ~~The pitfalls we hit with the Redis eval command~~ 2. The pitfalls we hit with the Redis eval command Jul 4, 2019

nethibernate closed this as completed Jan 14, 2020

hackfengJam mentioned this issue Apr 25, 2021

The pitfalls we hit with the Redis eval command hackfengJam/blog#21
