Description
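Redis is deployed in Sentinel mode on NeoKylin (中标麒麟) V10 SP3 (2212), and the Redis version is 7.0.12. At some point the Redis master node suddenly crashed with a memory error and the redis process was killed by the system. We checked the server's memory, CPU, and I/O, and none of them had reached a bottleneck. An excerpt of the Redis crash log follows: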
=== REDIS BUG REPORT START: Cut & paste starting from here ===
141955:M 28 Aug 2024 09:55:01.401 # Redis 7.0.12 crashed by signal: 11, si_code: 1
141955:M 28 Aug 2024 09:55:01.401 # Accessing address: 0x10
141955:M 28 Aug 2024 09:55:01.401 # Crashed running the instruction at: 0x4622dc
------ STACK TRACE ------
EIP:
./redis-server *:36379[0x4622dc]
Backtrace:
/usr/lib64/libpthread.so.0(+0x134c0)[0x7fec81ebd4c0]
./redis-server *:36379[0x4622dc]
./redis-server *:36379(_writeToClient+0xab)[0x46400b]
./redis-server *:36379(writeToClient+0x45)[0x464195]
./redis-server *:36379(handleClientsWithPendingWrites+0x5c)[0x4643bc]
./redis-server *:36379(handleClientsWithPendingWritesUsingThreads+0x1f5)[0x46ac35]
./redis-server *:36379(beforeSleep+0x13c)[0x4481bc]
./redis-server *:36379(aeProcessEvents+0x82)[0x444162]
./redis-server *:36379(aeMain+0x1d)[0x44466d]
./redis-server *:36379(main+0x32b)[0x44043b]
/usr/lib64/libc.so.6(__libc_start_main+0xe7)[0x7fec81d17b27]
./redis-server *:36379(_start+0x2a)[0x440b0a]
------ REGISTERS ------
141955:M 28 Aug 2024 09:55:01.402 #
RAX:0000000000000000 RBX:00007fec817fb000
RCX:0000000000000006 RDX:ffffffffffffe768
RDI:00007ffda3dc1940 RSI:00007fec81a5c8d8
RBP:00000000000084db RSP:00007ffda3dc1940
R8 :00007fec81800900 R9 :00007fec816012f8
R10:0000000000000002 R11:0000000000000002
R12:00007ffda3dc59b8 R13:0000000000000002
R14:0000000000000000 R15:0000000000000001
RIP:00000000004622dc EFL:0000000000010246
CSGSFS:002b000000000033
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc194f) -> 0000000000005977
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc194e) -> 00007fec6b30fed0
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc194d) -> 0000000000018127
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc194c) -> 00007fec6b2db750
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc194b) -> 00000000000544b7
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc194a) -> 00007fec6b3696d0
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1949) -> 0000000000000ada
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1948) -> 00007fec6fd677d0
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1947) -> 0000000000006e09
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1946) -> 00007fec704bfe90
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1945) -> 00000000000084db
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1944) -> 00007fec70636ed0
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1943) -> 0000000000000800
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1942) -> 00007fec817cc000
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1941) -> 0000000000000000
141955:M 28 Aug 2024 09:55:01.402 # (00007ffda3dc1940) -> 0000000000000000
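As a reading aid for the header lines of the report above, here is a minimal Python sketch that decodes the signal number, `si_code`, and faulting address. The helper name `decode_crash_header`, the small `si_code` table (Linux values for SIGSEGV), and the 4096-byte "near NULL" threshold are assumptions made for illustration and are not part of Redis. On this log it reports `SIGSEGV` with `SEGV_MAPERR` at address `0x10`. That pattern often means a field was read through a NULL pointer at a small offset, though the log alone does not prove it.

```python
import re
import signal

# Linux si_code values for SIGSEGV (assumption: Linux; other systems may differ).
SEGV_CODES = {1: "SEGV_MAPERR", 2: "SEGV_ACCERR"}

HEADER_RE = re.compile(r"crashed by signal: (\d+), si_code: (\d+)")
ADDR_RE = re.compile(r"Accessing address: (0x[0-9a-fA-F]+)")

# Hypothetical threshold: treat addresses inside the first 4 KiB page as "near NULL".
NEAR_NULL_LIMIT = 4096


def decode_crash_header(log_text):
    """Extract signal name, si_code name, and faulting address from a Redis crash log."""
    sig_match = HEADER_RE.search(log_text)
    if sig_match is None:
        raise ValueError("no 'crashed by signal' line found")
    signum = int(sig_match.group(1))
    si_code = int(sig_match.group(2))

    addr_match = ADDR_RE.search(log_text)
    address = int(addr_match.group(1), 16) if addr_match else None

    # Only SIGSEGV codes are named here; anything else is reported as a plain number.
    if signum == signal.SIGSEGV:
        code_name = SEGV_CODES.get(si_code, str(si_code))
    else:
        code_name = str(si_code)

    return {
        "signal": signal.Signals(signum).name,
        "si_code": code_name,
        "address": address,
        "near_null": address is not None and address < NEAR_NULL_LIMIT,
    }


if __name__ == "__main__":
    sample = (
        "141955:M 28 Aug 2024 09:55:01.401 # Redis 7.0.12 crashed by signal: 11, si_code: 1\n"
        "141955:M 28 Aug 2024 09:55:01.401 # Accessing address: 0x10\n"
    )
    print(decode_crash_header(sample))
```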
Investigating with GDB shows the following crash point:
(gdb)
#0 0x00007fec81d2b757 in ?? ()
#1 0x00000000004a574c in sdslen (s=0x1 <Address 0x1 out of bounds>) at sds.h:89
#2 logCurrentClient () at debug.c:1818
#3 0x0000000000000000 in ?? ()
(gdb) info frame 0
Stack frame at 0x7ffda3dc12c0:
rip = 0x7fec81d2b757; saved rip 0x4a574c
called by frame at 0x7ffda3dc1300
Arglist at 0x7ffda3dc12b0, args:
Locals at 0x7ffda3dc12b0, Previous frame's sp is 0x7ffda3dc12c0
Saved registers:
rip at 0x7ffda3dc12b8
(gdb) list
11 * notice, this list of conditions and the following disclaimer in the
12 * documentation and/or other materials provided with the distribution.
13 * * Neither the name of Redis nor the names of its contributors may be used
14 * to endorse or promote products derived from this software without
15 * specific prior written permission.
16 *
17 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
18 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
19 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
20 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
(gdb) info frame 1
Stack frame at 0x7ffda3dc1300:
rip = 0x4a574c in sdslen (sds.h:89); saved rip 0x0
inlined into frame 2, caller of frame at 0x7ffda3dc12c0
source language c.
Arglist at unknown address.
Locals at unknown address, Previous frame's sp is 0x7ffda3dc12c0
Saved registers:
rip at 0x7ffda3dc12b8
(gdb) list
21 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
22 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
23 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
24 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
25 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
26 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
27 * POSSIBILITY OF SUCH DAMAGE.
28 */
29
30 #include "server.h"
(gdb) info frame 2
Stack frame at 0x7ffda3dc1300:
rip = 0x4a574c in logCurrentClient (debug.c:1818); saved rip 0x0
called by frame at 0x7ffda3dc1308, caller of frame at 0x7ffda3dc1300
source language c.
Arglist at 0x7ffda3dc12b8, args:
Locals at 0x7ffda3dc12b8, Previous frame's sp is 0x7ffda3dc1300
Saved registers:
rbx at 0x7ffda3dc12c8, rbp at 0x7ffda3dc12d0, r12 at 0x7ffda3dc12d8, r13 at 0x7ffda3dc12e0, r14 at 0x7ffda3dc12e8, r15 at 0x7ffda3dc12f0, rip at 0x7ffda3dc12f8
(gdb) list
31 #include "monotonic.h"
32 #include "cluster.h"
33 #include "slowlog.h"
34 #include "bio.h"
35 #include "latency.h"
36 #include "atomicvar.h"
37 #include "mt19937-64.h"
38 #include "functions.h"
39 #include "syscheck.h"
40
(gdb) info frame 3
Stack frame at 0x7ffda3dc1308:
rip = 0x0; saved rip 0x0
caller of frame at 0x7ffda3dc1300
Arglist at 0x7ffda3dc12f8, args:
Locals at 0x7ffda3dc12f8, Previous frame's sp is 0x7ffda3dc1308
Saved registers:
rip at 0x7ffda3dc1300
(gdb) list
41 #include <time.h>
42 #include <signal.h>
43 #include <sys/wait.h>
44 #include <errno.h>
45 #include <assert.h>
46 #include <ctype.h>
47 #include <stdarg.h>
48 #include <arpa/inet.h>
49 #include <sys/stat.h>
50 #include <fcntl.h>
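Debugging with breakpoints still did not reveal why the process was killed. Has anyone run into a similar problem? Any help or a suggested fix would be appreciated. Is this a bug in Redis 7.0.12?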
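For anyone triaging the `Backtrace:` section of the log, the sketch below splits each frame into module, symbol name, and address, and lists the frames that carry no symbol name (here the faulting frame at `0x4622dc`). The names `parse_frames` and `unresolved` are hypothetical helpers, and the regular expression only covers the frame shapes that appear in this report. An unresolved address could then be looked up with a tool such as `addr2line -f -e ./redis-server 0x4622dc`, assuming the same binary is available and still contains symbol information.

```python
import re

# Matches frames such as:
#   ./redis-server *:36379(_writeToClient+0xab)[0x46400b]
#   ./redis-server *:36379[0x4622dc]
#   /usr/lib64/libpthread.so.0(+0x134c0)[0x7fec81ebd4c0]
FRAME_RE = re.compile(
    r"^(?P<module>.+?)(?:\((?P<symbol>[^()]*)\))?\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_frames(lines):
    """Return (module, symbol_name_or_None, address) for every backtrace frame line."""
    frames = []
    for line in lines:
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a frame line, e.g. "Backtrace:" or "EIP:"
        symbol = match.group("symbol")
        name = None
        # "(+0x134c0)" is only an offset into the module, not a symbol name.
        if symbol and not symbol.startswith("+"):
            name = symbol.split("+", 1)[0]
        frames.append((match.group("module"), name, int(match.group("addr"), 16)))
    return frames


def unresolved(frames):
    """Frames without a symbol name, as (module, address) pairs."""
    return [(module, addr) for module, name, addr in frames if name is None]


if __name__ == "__main__":
    sample = [
        "Backtrace:",
        "/usr/lib64/libpthread.so.0(+0x134c0)[0x7fec81ebd4c0]",
        "./redis-server *:36379[0x4622dc]",
        "./redis-server *:36379(_writeToClient+0xab)[0x46400b]",
    ]
    for module, addr in unresolved(parse_frames(sample)):
        print(f"{module}: {addr:#x}")
```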