
Redis 4.0.8 cluster, slave crash during bgsave #5287

Open
qingyuan18 opened this issue Aug 28, 2018 · 2 comments

Comments

@qingyuan18

We have a Redis 4.0.8 cluster with 3 masters and 3 slaves. One or two of the slave nodes always crash during bgsave; the crash report is below:

Suspect RAM error? Use redis-server --test-memory to verify it.

17841:S 28 Aug 02:19:51.394 # Background saving terminated by signal 11
17841:S 28 Aug 02:19:52.021 * 10000 changes in 120 seconds. Saving...
17841:S 28 Aug 02:20:01.355 * Background saving started by pid 21707

=== REDIS BUG REPORT START: Cut & paste starting from here ===
21707:C 28 Aug 02:25:41.946 # ------------------------------------------------
21707:C 28 Aug 02:25:41.947 # !!! Software Failure. Press left mouse button to continue
21707:C 28 Aug 02:25:41.947 # Guru Meditation: Unknown object type #rdb.c:628
21707:C 28 Aug 02:25:41.947 # (forcing SIGSEGV in order to print the stack trace)
21707:C 28 Aug 02:25:41.947 # ------------------------------------------------
21707:C 28 Aug 02:25:41.947 # Redis 4.0.8 crashed by signal: 11
21707:C 28 Aug 02:25:41.947 # Crashed running the instruction at: 0x466fd3
21707:C 28 Aug 02:25:41.947 # Accessing address: 0xffffffffffffffff
21707:C 28 Aug 02:25:41.947 # Failed assertion: (:0)

------ STACK TRACE ------
EIP:
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x466fd3]

Backtrace:
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x466dfc]
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x468033]
/lib64/libpthread.so.0[0x30a3e0f790]
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x466fd3]
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x446ed7]
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x4488d7]
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x448bc6]
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x448d95]
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x449060]
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x42d4d0]
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x424d7d]
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x424f2b]
redis-rdb-bgsave 10.146.14.17:8003 cluster[0x42db42]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x30a3a1ed5d]
redis-rdb-bgsave 10.146.14.17:8003 [cluster][0x422319]

The code at rdb.c:628 shows the panic is the serverPanic("Unknown object type") in the default branch, reached only when o->type matches none of the known type constants:

case OBJ_ZSET:
    if (o->encoding == OBJ_ENCODING_ZIPLIST)
        return rdbSaveType(rdb,RDB_TYPE_ZSET_ZIPLIST);
    else if (o->encoding == OBJ_ENCODING_SKIPLIST)
        return rdbSaveType(rdb,RDB_TYPE_ZSET_2);
    else
        serverPanic("Unknown sorted set encoding");
case OBJ_HASH:
    if (o->encoding == OBJ_ENCODING_ZIPLIST)
        return rdbSaveType(rdb,RDB_TYPE_HASH_ZIPLIST);
    else if (o->encoding == OBJ_ENCODING_HT)
        return rdbSaveType(rdb,RDB_TYPE_HASH);
    else
        serverPanic("Unknown hash encoding");
case OBJ_MODULE:
    return rdbSaveType(rdb,RDB_TYPE_MODULE_2);
default:
    serverPanic("Unknown object type");
}
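
Since that default branch is reachable only when o->type holds a value outside the known constants, a corrupted object header in the forked bgsave child (or faulty RAM) is a plausible cause, which is why the log suggests the built-in memory test. A minimal invocation as a sketch (the megabyte count is just an example; size it to the host's memory):

    redis-server --test-memory 4096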

Our data consists of zsets whose members are 18-character strings, plus some plain String keys:

[00.00%] Biggest string found so far 'music_user:a2d4ee50-c0e5-4712-b059-8d79c4e387f8' with 11 bytes
[00.00%] Biggest zset found so far 'video:long:15252799960' with 100 members
[00.00%] Biggest zset found so far 'music:fm:18701531951' with 279 members
[00.00%] Biggest zset found so far 'music:fm:355981050728643' with 289 members
[00.00%] Biggest zset found so far 'music:fm:867483029348786' with 290 members

10.146.14.15:7001> zrange music:fm:867483029348786 0 -1

  1. "600902000007331809"
  2. "600907000007852796"
  3. "600907000004182649"
  4. "600902000009205261"
  5. "600907000008439207"

Can anyone help locate the issue and how to fix it? So far I haven't found any error in our client code. Do I need to upgrade Redis to 4.0.9?

Any suggestion would be appreciated! Thanks

@qingyuan18
Author

Any suggestions?

BTW: we found that master-slave synchronization transfers 40 GB+ every time:

[screenshot _20180829103834: replication transfer volume]

It seems the master performs a full resynchronization of the whole dataset each time, not an incremental (partial) resync.
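
One way to check whether these full transfers come from an exhausted replication backlog is to compare the configured backlog size with the slaves' replication offsets (a sketch against the master; field names as printed by Redis 4.0):

    10.146.14.15:7001> INFO replication
    10.146.14.15:7001> CONFIG GET repl-backlog-size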

@itamarhaber
Member

Hello @qingyuan18

Upgrading to the latest version is generally recommended. Also, please include the full crash report in the issue.

BTW: the following is from Redis' documentation

However, if there is not enough backlog in the master buffers, or if the slave is referring to a history (replication ID) which is no longer known, then a full resynchronization happens: in this case the slave will get a full copy of the dataset, from scratch.
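
If the backlog is indeed too small for the write volume a slave misses while disconnected, enlarging it reduces the chance of falling back to a full sync. A sketch with illustrative values only (512 MB backlog, kept for 2 hours); the same settings should also go into redis.conf to survive restarts:

    10.146.14.15:7001> CONFIG SET repl-backlog-size 536870912
    10.146.14.15:7001> CONFIG SET repl-backlog-ttl 7200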
