Redis 7.0.12 memory crash during use #617

@Kevin-zhao326

Description

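Redis is deployed in Sentinel mode on NeoKylin (中标麒麟) V10-SP3 (2212), and the Redis version is 7.0.12. At one point the Redis master node suddenly crashed with a memory error and the redis process was killed by the system. Server memory, CPU, and I/O were checked and none had reached a bottleneck. An excerpt of the Redis error log follows: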

=== REDIS BUG REPORT START: Cut & paste starting from here ===
141955:M 28 Aug 2024 09:55:01.401 # Redis 7.0.12 crashed by signal: 11, si_code: 1
141955:M 28 Aug 2024 09:55:01.401 # Accessing address: 0x10
141955:M 28 Aug 2024 09:55:01.401 # Crashed running the instruction at: 0x4622dc

------ STACK TRACE ------
EIP:
./redis-server *:36379[0x4622dc]

Backtrace:
/usr/lib64/libpthread.so.0(+0x134c0)[0x7fec81ebd4c0]
./redis-server *:36379[0x4622dc]
./redis-server *:36379(_writeToClient+0xab)[0x46400b]
./redis-server *:36379(writeToClient+0x45)[0x464195]
./redis-server *:36379(handleClientsWithPendingWrites+0x5c)[0x4643bc]
./redis-server *:36379(handleClientsWithPendingWritesUsingThreads+0x1f5)[0x46ac35]
./redis-server *:36379(beforeSleep+0x13c)[0x4481bc]
./redis-server *:36379(aeProcessEvents+0x82)[0x444162]
./redis-server *:36379(aeMain+0x1d)[0x44466d]
./redis-server *:36379(main+0x32b)[0x44043b]
/usr/lib64/libc.so.6(__libc_start_main+0xe7)[0x7fec81d17b27]
./redis-server *:36379(_start+0x2a)[0x440b0a]

------ REGISTERS ------
141955:M 28 Aug 2024 09:55:01.402 #
RAX:0000000000000000 RBX:00007fec817fb000
RCX:0000000000000006 RDX:ffffffffffffe768
RDI:00007ffda3dc1940 RSI:00007fec81a5c8d8
RBP:00000000000084db RSP:00007ffda3dc1940
R8 :00007fec81800900 R9 :00007fec816012f8
R10:0000000000000002 R11:0000000000000002
R12:00007ffda3dc59b8 R13:0000000000000002
R14:0000000000000000 R15:0000000000000001
RIP:00000000004622dc EFL:0000000000010246
CSGSFS:002b000000000033
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc194f) -> 0000000000005977
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc194e) -> 00007fec6b30fed0
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc194d) -> 0000000000018127
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc194c) -> 00007fec6b2db750
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc194b) -> 00000000000544b7
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc194a) -> 00007fec6b3696d0
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1949) -> 0000000000000ada
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1948) -> 00007fec6fd677d0
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1947) -> 0000000000006e09
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1946) -> 00007fec704bfe90
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1945) -> 00000000000084db
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1944) -> 00007fec70636ed0
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1943) -> 0000000000000800
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1942) -> 00007fec817cc000
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1941) -> 0000000000000000
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1940) -> 0000000000000000
Investigating with GDB showed the following crash point:
(gdb)
#0 0x00007fec81d2b757 in ?? ()
#1 0x00000000004a574c in sdslen (s=0x1 <Address 0x1 out of bounds>) at sds.h:89
#2 logCurrentClient () at debug.c:1818
#3 0x0000000000000000 in ?? ()
(gdb) info frame 0
Stack frame at 0x7ffda3dc12c0:
rip = 0x7fec81d2b757; saved rip 0x4a574c
called by frame at 0x7ffda3dc1300
Arglist at 0x7ffda3dc12b0, args:
Locals at 0x7ffda3dc12b0, Previous frame's sp is 0x7ffda3dc12c0
Saved registers:
rip at 0x7ffda3dc12b8
(gdb) list
11 * notice, this list of conditions and the following disclaimer in the
12 * documentation and/or other materials provided with the distribution.
13 * * Neither the name of Redis nor the names of its contributors may be used
14 * to endorse or promote products derived from this software without
15 * specific prior written permission.
16 *
17 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
18 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
19 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
20 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
(gdb) info frame 1
Stack frame at 0x7ffda3dc1300:
rip = 0x4a574c in sdslen (sds.h:89); saved rip 0x0
inlined into frame 2, caller of frame at 0x7ffda3dc12c0
source language c.
Arglist at unknown address.
Locals at unknown address, Previous frame's sp is 0x7ffda3dc12c0
Saved registers:
rip at 0x7ffda3dc12b8
(gdb) list
21 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
22 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
23 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
24 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
25 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
26 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
27 * POSSIBILITY OF SUCH DAMAGE.
28 */
29
30 #include "server.h"
(gdb) info frame 2
Stack frame at 0x7ffda3dc1300:
rip = 0x4a574c in logCurrentClient (debug.c:1818); saved rip 0x0
called by frame at 0x7ffda3dc1308, caller of frame at 0x7ffda3dc1300
source language c.
Arglist at 0x7ffda3dc12b8, args:
Locals at 0x7ffda3dc12b8, Previous frame's sp is 0x7ffda3dc1300
Saved registers:
rbx at 0x7ffda3dc12c8, rbp at 0x7ffda3dc12d0, r12 at 0x7ffda3dc12d8, r13 at 0x7ffda3dc12e0, r14 at 0x7ffda3dc12e8, r15 at 0x7ffda3dc12f0, rip at 0x7ffda3dc12f8
(gdb) list
31 #include "monotonic.h"
32 #include "cluster.h"
33 #include "slowlog.h"
34 #include "bio.h"
35 #include "latency.h"
36 #include "atomicvar.h"
37 #include "mt19937-64.h"
38 #include "functions.h"
39 #include "syscheck.h"
40
(gdb) info frame 3
Stack frame at 0x7ffda3dc1308:
rip = 0x0; saved rip 0x0
caller of frame at 0x7ffda3dc1300
Arglist at 0x7ffda3dc12f8, args:
Locals at 0x7ffda3dc12f8, Previous frame's sp is 0x7ffda3dc1308
Saved registers:
rip at 0x7ffda3dc1300
(gdb) list
41 #include <time.h>
42 #include <signal.h>
43 #include <sys/wait.h>
44 #include <errno.h>
45 #include <assert.h>
46 #include <ctype.h>
47 #include <stdarg.h>
48 #include <arpa/inet.h>
49 #include <sys/stat.h>
50 #include <fcntl.h>
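Debugging with breakpoints still has not revealed why the process was killed. Has anyone run into a similar problem? Any help and a suggested fix would be appreciated. Is this a bug in Redis 7.0.12?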
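
For reading the first lines of the crash report, here is a minimal sketch that decodes the signal number, the si_code, and the accessed address. The helper name `decode_crash_header` and the `SEGV_CODES` table are hypothetical names used for illustration and are not part of Redis. The si_code values are the Linux ones for SIGSEGV (1 = SEGV_MAPERR, 2 = SEGV_ACCERR). For this log it reports SIGSEGV with SEGV_MAPERR at address 0x10, which suggests an access to an unmapped address close to NULL rather than an out-of-memory kill, though that reading is only a starting point for further debugging.

```python
import re
import signal

# Linux si_code values for SIGSEGV; this table is an illustrative helper,
# not something Redis provides.
SEGV_CODES = {
    1: "SEGV_MAPERR (address not mapped to object)",
    2: "SEGV_ACCERR (invalid permissions for mapped object)",
}

HEADER_RE = re.compile(r"crashed by signal: (\d+), si_code: (-?\d+)")
ADDR_RE = re.compile(r"Accessing address: (0x[0-9a-fA-F]+)")


def decode_crash_header(log_text):
    """Return (signal_name, si_code_meaning, accessed_address) for a crash log."""
    header = HEADER_RE.search(log_text)
    if header is None:
        raise ValueError("no 'crashed by signal' line found")
    signo = int(header.group(1))
    si_code = int(header.group(2))
    sig_name = signal.Signals(signo).name
    # The si_code table above only applies to SIGSEGV.
    if sig_name == "SIGSEGV":
        meaning = SEGV_CODES.get(si_code, "unknown si_code")
    else:
        meaning = "not decoded for this signal"
    addr = ADDR_RE.search(log_text)
    address = int(addr.group(1), 16) if addr else None
    return sig_name, meaning, address


# The two header lines copied from the log in this issue.
log = """141955:M 28 Aug 2024 09:55:01.401 # Redis 7.0.12 crashed by signal: 11, si_code: 1
141955:M 28 Aug 2024 09:55:01.401 # Accessing address: 0x10
"""

sig_name, meaning, address = decode_crash_header(log)
print(sig_name, "|", meaning, "|", hex(address))
```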
