redis rewriting AOF failed and crash #868

ghost opened this issue Jan 9, 2013 · 5 comments
@ghost

ghost commented Jan 9, 2013

...
[9561] 09 Jan 17:10:08.374 * Starting automatic rewriting of AOF on 100% growth
[9561] 09 Jan 17:10:08.552 * Background append only file rewriting started by pid 29812
[9561] 09 Jan 17:10:31.007 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
[9561] 09 Jan 17:10:35.006 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
[9561] 09 Jan 17:10:39.006 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
[9561] 09 Jan 17:10:43.002 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
[9561] 09 Jan 17:10:46.007 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
[29812] 09 Jan 17:10:47.220 #


[29812] 09 Jan 17:10:47.537 # === ASSERTION FAILED ===
[29812] 09 Jan 17:10:47.539 # ==> t_hash.c:325 'vptr != NULL' is not true
[29812] 09 Jan 17:10:47.539 # (forcing SIGSEGV to print the bug report.)
[29812] 09 Jan 17:10:47.539 #     Redis 2.6.6 crashed by signal: 11
[29812] 09 Jan 17:10:47.539 #     Failed assertion: vptr != NULL (t_hash.c:325)
[29812] 09 Jan 17:10:47.539 # --- STACK TRACE
/usr/local/redis/bin/redis-server(logStackTrace+0x75)[0x440fb5]
/usr/local/redis/bin/redis-server(_redisAssert+0x6f)[0x440e2f]
/lib/libpthread.so.0(+0xf8f0)[0x7f0cdee938f0]
/usr/local/redis/bin/redis-server(_redisAssert+0x6f)[0x440e2f]
/usr/local/redis/bin/redis-server(hashTypeNext+0xe6)[0x438996]
/usr/local/redis/bin/redis-server(rewriteHashObject+0xd2)[0x43d772]
/usr/local/redis/bin/redis-server(rewriteAppendOnlyFile+0x1f4)[0x43f424]
/usr/local/redis/bin/redis-server(rewriteAppendOnlyFileBackground+0xc4)[0x43f7e4]
/usr/local/redis/bin/redis-server(serverCron+0x443)[0x41dce3]
/usr/local/redis/bin/redis-server(aeProcessEvents+0x202)[0x417802]
/usr/local/redis/bin/redis-server(aeMain+0x2b)[0x4179fb]
/usr/local/redis/bin/redis-server(main+0x23b)[0x41e6ab]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f0cdeb1fc4d]
/usr/local/redis/bin/redis-server[0x416dc9]
[29812] 09 Jan 17:10:47.810 # --- INFO OUTPUT
[29812] 09 Jan 17:10:47.810 # # Server
redis_version:2.6.6
redis_git_sha1:00000000
redis_git_dirty:0
redis_mode:standalone
os:Linux 2.6.32-21-server x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.3
process_id:29812
run_id:321b8046be44120fd80746fddf64199ab0f13892
tcp_port:6386
uptime_in_seconds:786487
uptime_in_days:9
lru_clock:1554532

# Clients
connected_clients:38
client_longest_output_list:0
blocked_clients:0

# Memory
used_memory:5507789808
used_memory_human:5.13G
used_memory_rss:5633081344
used_memory_peak:5513067952
used_memory_peak_human:5.13G
used_memory_lua:31744
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.2.0

# Persistence
loading:0
rdb_changes_since_last_save:20450345
rdb_bgsave_in_progress:0
rdb_last_save_time:1357660856
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:55
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:68
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_current_size:11727211392
aof_base_size:5863605642
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:38

# Stats
total_connections_received:78996577
total_commands_processed:8553080268
instantaneous_ops_per_sec:14392
rejected_connections:0
expired_keys:0
evicted_keys:0

keyspace_hits:8137002949
keyspace_misses:22946621
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:219996

# Replication
role:master
connected_slaves:0

# CPU
used_cpu_sys:5.52
used_cpu_user:33.16
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Commandstats
cmdstat_del:calls=2535263,usec=17922460,usec_per_call=7.07
cmdstat_lpush:calls=2648761,usec=11724381,usec_per_call=4.43
cmdstat_rpop:calls=3164942,usec=16855835,usec_per_call=5.33
cmdstat_llen:calls=91,usec=229,usec_per_call=2.52
cmdstat_hset:calls=384768500,usec=2597644930,usec_per_call=6.75
cmdstat_hget:calls=5979730156,usec=26494193464,usec_per_call=4.43
cmdstat_hgetall:calls=2180219323,usec=47638987054,usec_per_call=21.85
cmdstat_bgsave:calls=9,usec=1751970,usec_per_call=194663.33
cmdstat_info:calls=13223,usec=4085738,usec_per_call=308.99

# Keyspace
db0:keys=6377758,expires=0
hash_init_value: 1357679091

[29812] 09 Jan 17:10:47.810 # --- CLIENT LIST OUTPUT

...

[29812] 09 Jan 17:10:47.811 # --- REGISTERS
[29812] 09 Jan 17:10:47.811 #
RAX:0000000000000000 RBX:0000000000000145
RCX:0000000000016e40 RDX:00007f0cdee7fe98
RDI:0000000000000000 RSI:0000000000000000
RBP:00000000004991a0 RSP:00007fff0413a3c0
R8 :0000000001cb8e90 R9 :00000000ffffffff
R10:7562206568742074 R11:0000000000000206
R12:00000000004991ba R13:ffffffffffffffda
R14:0000000000000000 R15:0000000000000040
RIP:0000000000440e2f EFL:0000000000010202
CSGSFS:0000000000000033
[29812] 09 Jan 17:10:47.811 # (00007fff0413a438) -> 00007fff0413a4b0
[29812] 09 Jan 17:10:47.811 # (00007fff0413a430) -> ffffffffffffffff
[29812] 09 Jan 17:10:47.811 # (00007fff0413a428) -> 00007fff0413a470
[29812] 09 Jan 17:10:47.811 # (00007fff0413a420) -> 00007f0cde188300
[29812] 09 Jan 17:10:47.811 # (00007fff0413a418) -> 00007f0c77d0bf60
[29812] 09 Jan 17:10:47.811 # (00007fff0413a410) -> 00007f0bb1216e80
[29812] 09 Jan 17:10:47.811 # (00007fff0413a408) -> 00007fff0413a4b0
[29812] 09 Jan 17:10:47.811 # (00007fff0413a400) -> 00007f0bf1f29088
[29812] 09 Jan 17:10:47.811 # (00007fff0413a3f8) -> 000000000043d772
[29812] 09 Jan 17:10:47.811 # (00007fff0413a3f0) -> 00007f0b98870580
[29812] 09 Jan 17:10:47.811 # (00007fff0413a3e8) -> 00007fff0413a470
[29812] 09 Jan 17:10:47.811 # (00007fff0413a3e0) -> 0000000000000020
[29812] 09 Jan 17:10:47.812 # (00007fff0413a3d8) -> 0000000000438996
[29812] 09 Jan 17:10:47.812 # (00007fff0413a3d0) -> 00007f0c6abe5800
[29812] 09 Jan 17:10:47.812 # (00007fff0413a3c8) -> 00007f0c6abf0b27
[29812] 09 Jan 17:10:47.812 # (00007fff0413a3c0) -> 00007f0b98870580
[29812] 09 Jan 17:10:47.812 #
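
For readers following the log: the first line, "Starting automatic rewriting of AOF on 100% growth", is consistent with the aof_current_size and aof_base_size values in the INFO output above. Here is a minimal Python sketch of that growth check, assuming integer arithmetic and a 100% threshold. The function names are hypothetical and are not Redis identifiers.

```python
def aof_growth_percent(current_size: int, base_size: int) -> int:
    """Percentage by which the AOF has grown since the last rewrite.

    Uses integer division; a base size of 0 is treated as 1 to avoid
    dividing by zero (an assumption of this sketch).
    """
    base = base_size if base_size else 1
    return current_size * 100 // base - 100


def should_rewrite(current_size: int, base_size: int, threshold_percent: int = 100) -> bool:
    """True when the growth reaches the configured threshold (assumed 100 here)."""
    return aof_growth_percent(current_size, base_size) >= threshold_percent


# Values copied from the "# Persistence" section of the report.
growth = aof_growth_percent(current_size=11727211392, base_size=5863605642)
print(growth)  # 100
print(should_rewrite(11727211392, 5863605642))  # True
```

With these numbers the AOF had just doubled relative to its size after the previous rewrite, which would explain why the rewrite child was forked at 17:10:08.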

@antirez
Contributor

antirez commented Jan 9, 2013

From the bug report, it is either an issue with the hardware (like a memory error), or a subtle bug in the ziplist implementation.

A few things:

  • Please could you test the system memory and report back?
  • In general the system appears to work well? It is the first time you get a crash?

If the system is OK, it's worth investigating the register and stack dump to see whether there is some clue about the values that triggered this issue.

Thanks for reporting
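
As a starting point for reading the register and stack dump, here is a small Python sketch that does two things: it decodes a register value as little-endian ASCII, and it matches stack words against the return addresses printed in the stack trace. The helper names are hypothetical, and the stack words are a subset copied from the report.

```python
def register_as_ascii(hex_value: str) -> str:
    """Interpret a 64-bit register value as 8 little-endian bytes of ASCII.

    Non-printable bytes are shown as '.'.
    """
    raw = int(hex_value, 16).to_bytes(8, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Return addresses as printed in the "--- STACK TRACE" section.
TRACE_ADDRESSES = {
    0x43D772: "rewriteHashObject+0xd2",
    0x438996: "hashTypeNext+0xe6",
}


def annotate_stack(words):
    """Return (address, symbol) pairs for stack words that match the trace."""
    return [(hex(w), TRACE_ADDRESSES[w]) for w in words if w in TRACE_ADDRESSES]


# A few words copied from the stack dump in the report.
stack_words = [0x7FFF0413A4B0, 0x43D772, 0x7F0B98870580, 0x438996]

print(register_as_ascii("7562206568742074"))  # R10 -> "t the bu"
print(annotate_stack(stack_words))
```

R10 decodes to "t the bu", which looks like a fragment of the "(forcing SIGSEGV to print the bug report.)" message, so that register probably holds leftover string data from logging rather than anything about the hash being rewritten. The two matching stack words confirm that the dump was taken inside the hashTypeNext and rewriteHashObject frames.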

@antirez
Contributor

antirez commented Jan 9, 2013

Sorry, another question.

From your bug report it seems like only the child crashed, so is the instance actually still running?

In that case, does BGSAVE successfully produce an RDB file, or the child crashes again? Thanks.

@ghost
Author

ghost commented Jan 16, 2013

Please could you test the system memory and report back?

It's a production box, so I can't stop the running Redis and the other key-value services for testing.
However, memcached and the older Redis version (v2.4.x) have been stable on that box for a couple of years.

In general the system appears to work well? It is the first time you get a crash?

I never saw this in v2.4.x; the first crash happened after I upgraded to 2.6.x.

does BGSAVE successfully produce an RDB file, or the child crashes again?

Our engineers wrote some scripts to monitor and automatically restart the Redis service.
One script starts the Redis daemon again after it crashes. I noticed that Redis then automatically rewrites the AOF,
but the rewrite failed, and it tried again and again.

Finally, I killed it with SIGKILL, removed all the temporary auto-rewrite files, ran redis-check-aof on foo.aof,
and started it manually; it works again.
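
The cleanup step can be sketched in Python. This is only an illustration: the glob pattern `temp-rewriteaof-*.aof` is an assumption about how the temporary rewrite files are named, the function names are hypothetical, and the default is a dry run so nothing is deleted by accident.

```python
from pathlib import Path

# Assumed naming of the temporary files left behind by a failed AOF rewrite.
TEMP_REWRITE_PATTERN = "temp-rewriteaof-*.aof"


def find_rewrite_temp_files(data_dir):
    """List leftover temporary rewrite files in the Redis data directory."""
    return sorted(p for p in Path(data_dir).glob(TEMP_REWRITE_PATTERN) if p.is_file())


def remove_rewrite_temp_files(data_dir, dry_run=True):
    """Delete leftover temporary rewrite files; with dry_run, only report them."""
    files = find_rewrite_temp_files(data_dir)
    if not dry_run:
        for path in files:
            path.unlink()
    return files
```

Calling `remove_rewrite_temp_files(data_dir)` first with the default `dry_run=True` shows what would be removed. The main AOF (foo.aof in this report) does not match the pattern, so it is left alone for redis-check-aof.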

@antirez
Contributor

antirez commented Jan 21, 2013

Please could you report if this happened again after the first time? Thanks.

@antirez
Contributor

antirez commented Jul 17, 2013

Can't act for lack of info, closing. Thanks for reporting, please reopen if it happens with a recent Redis 2.6.x release.

@antirez antirez closed this as completed Jul 17, 2013
yossigo added a commit to yossigo/redis that referenced this issue Feb 14, 2022