Master-slave synchronization problems on big dbs #957
Comments
Hello, thanks for reporting, please provide INFO output of the master instance. Cheers. |
http://pastebin.com/XLE9e954 - it's from the test server where the failure was reproduced |
Hey, thanks for the additional info. I see you are using FreeBSD, so this is very likely a FreeBSD-specific issue, and I will probably need some help from you to understand what the problem is. To start, it would be very helpful if you could do the following: in the file syncio.c, in syncRead():
Change it into
After this change try again and send me the output. Thanks! |
[83823] 25 Feb 15:11:54.431 # Server started, Redis version 2.6.10 |
http://pastebin.com/UQhVvANn here is a more detailed log. SYNCREAD appears at the same time as "Master replied to PING, replication can continue..." |
I think it is because of this code in readSyncBulkPayload() in replication.c:
nread = read(fd,buf,readlen);
if (nread <= 0) {
    redisLog(REDIS_WARNING,"I/O error trying to sync with MASTER: %s",
        (nread == -1) ? strerror(errno) : "connection lost");
    replicationAbortSyncTransfer();
    return;
} |
@bakwc could you test this code? |
@charsyam it is not possible that the problem is what you stated, so @bakwc does not need to test this code. If you look at the logs, you'll read:
"connection lost" is only printed when the return value is 0, which is not the case your additional commit handles. |
@bakwc before continuing: would it be realistic for you to give me shell access to the test environment? If that is not possible, I'll create a branch that makes Redis print much more debugging information. What is strange is that the change that was supposed to shed some light on the issue turned out to be of little use: the error now shows up in a different part of the code, and apparently the connection is lost while the data transfer is in progress, and I have no idea why that happens. So either I need to work on the code directly on the system where this happens, or I need to provide a much more debug-intensive version of Redis that you can run to see what happens. Thanks! |
@antirez Yes, you're right, but it can also cause another problem when EAGAIN happens. |
@charsyam in theory this should never happen: the signal handler called us because there is more data to read on the socket, so we should be able to read at least a single byte. |
Sorry, the event handler, not the signal handler. |
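The outcomes discussed in the comments above (a read that returns 0, a read that fails with EAGAIN, and a read after a readiness event) can be illustrated outside Redis. This is a minimal sketch in Python, not the Redis C code: `classify_read` and `wait_readable` are hypothetical helpers, and a local socket pair stands in for the master/slave link.

```python
import selectors
import socket


def classify_read(sock, bufsize=4096):
    """Read once from a non-blocking socket and name the outcome.

    Hypothetical helper for illustration; it is not part of Redis.
    """
    try:
        data = sock.recv(bufsize)
    except BlockingIOError:
        # Equivalent of read() returning -1 with errno == EAGAIN.
        return "eagain", b""
    except OSError:
        # Any other failure: read() returning -1 with a different errno.
        return "error", b""
    if not data:
        # Equivalent of read() returning 0: the peer closed the connection.
        return "connection lost", b""
    return "data", data


def wait_readable(sock, timeout=1.0):
    """Return True once the event loop would report the socket readable."""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        return bool(sel.select(timeout=timeout))


slave, master = socket.socketpair()
slave.setblocking(False)

# No readiness event has fired, so a read attempt reports EAGAIN.
before_event = classify_read(slave)

# After a readiness event, at least one byte can be read.
master.sendall(b"PING")
after_event = classify_read(slave) if wait_readable(slave) else ("no event", b"")

# A closed peer also makes the socket readable, but the read returns 0 bytes.
master.close()
after_close = classify_read(slave) if wait_readable(slave) else ("no event", b"")
slave.close()

print(before_event[0], after_event[0], after_close[0])
```

The last case is the one that maps to the "connection lost" branch in the snippet quoted earlier: the handler is woken up, yet the read yields zero bytes because the other side has closed the socket.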
@antirez Thank you for the explanation. I misunderstood the source. |
No problem at all. However, it is possible that we'll find an error in the code, as a FreeBSD-specific issue is unlikely in such well-understood networking code and APIs. I am probably mishandling something. |
@antirez sorry, I can't provide access. You could try to reproduce it by installing FreeBSD somewhere and running a sync, but I'm not sure it reproduces under low load / memory usage. |
So, could you make a debug branch please? |
The problem was that the client output buffer limit for slaves was too small. Full replication takes ~10 minutes; new data in the slave buffer exceeded 64 MB after ~3 minutes, and the connection was dropped by the soft (or hard) limit. |
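To make the timing in this explanation concrete, here is a rough model of when the master would drop the slave. It is a sketch under stated assumptions: the buffer grows linearly from empty, and the limits are 256 MB hard, 64 MB soft, 60 seconds (commonly the stock values for the slave class; the thread does not show the actual configuration). `seconds_until_disconnect` is a hypothetical helper, not a Redis function.

```python
import math

MB = 1024 * 1024


def seconds_until_disconnect(rate_bytes_per_sec, hard_limit, soft_limit, soft_seconds):
    """Estimate when a linearly growing output buffer trips a limit.

    The hard limit trips as soon as it is reached; the soft limit trips
    once the buffer has stayed above it for soft_seconds. A limit of 0
    means "disabled". Returns math.inf if nothing would ever trip.
    """
    if rate_bytes_per_sec <= 0:
        return math.inf
    hard_at = hard_limit / rate_bytes_per_sec if hard_limit > 0 else math.inf
    soft_at = (
        soft_limit / rate_bytes_per_sec + soft_seconds
        if soft_limit > 0
        else math.inf
    )
    return min(hard_at, soft_at)


# Figures from the comment: 64 MB buffered after ~3 minutes, sync takes ~10 minutes.
rate = 64 * MB / 180
full_sync_seconds = 10 * 60

# Assumed limits: 256 MB hard, 64 MB soft, 60 seconds.
drop_at = seconds_until_disconnect(rate, 256 * MB, 64 * MB, 60)
print(f"dropped after ~{drop_at:.0f}s, sync needs {full_sync_seconds}s")
```

Under these assumptions the soft limit trips around the four-minute mark, well before the ten-minute transfer completes, which matches the behaviour described. The limits are set with the `client-output-buffer-limit slave <hard> <soft> <seconds>` directive in redis.conf, so raising them until the estimate exceeds the sync time is one way to size the buffer.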
We have some issues with Redis:
1467:S 22 Feb 11:12:43.771 * Master replied to PING, replication can continue...
1467:S 22 Feb 11:12:43.775 * Partial resynchronization not possible (no cached master)
1467:S 22 Feb 11:12:46.184 * Full resync from master: 9304d314d029c32e6fe268e9f35c13ec937b0b3e:9077166633026
1467:S 22 Feb 11:13:47.902 # Timeout receiving bulk data from MASTER... If the problem persists try to set the 'repl-timeout' parameter in redis.conf to a larger value.
1467:S 22 Feb 11:13:47.902 * Connecting to MASTER prod-redis-backend01.r4e:9696
1467:S 22 Feb 11:13:47.906 * MASTER <-> SLAVE sync started
1467:S 22 Feb 11:13:47.907 * Non blocking connect for SYNC fired the event.
1467:S 22 Feb 11:13:47.908 * Master replied to PING, replication can continue...
1467:S 22 Feb 11:13:47.910 * Partial resynchronization not possible (no cached master)
1467:S 22 Feb 11:13:51.322 * Full resync from master: 9304d314d029c32e6fe268e9f35c13ec937b0b3e:9077213295690
1467:S 22 Feb 11:14:49.534 # I/O error reading bulk count from MASTER: Resource temporarily unavailable
1467:S 22 Feb 11:14:50.040 * Connecting to MASTER prod-redis-backend01.r4e:9696
Sometimes the slave reports a timeout and sometimes it fails with an I/O error. We updated repl-timeout in redis.conf, but still no luck. Could you suggest a fix? |
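One thing that can be read from the log above is how long each attempt lasted between "Full resync from master" and the failure. The sketch below computes those gaps from four of the quoted lines; the function names are hypothetical, and it assumes all timestamps fall on the same day.

```python
LOG_LINES = [
    "1467:S 22 Feb 11:12:46.184 * Full resync from master: 9304d314d029c32e6fe268e9f35c13ec937b0b3e:9077166633026",
    "1467:S 22 Feb 11:13:47.902 # Timeout receiving bulk data from MASTER... If the problem persists try to set the 'repl-timeout' parameter in redis.conf to a larger value.",
    "1467:S 22 Feb 11:13:51.322 * Full resync from master: 9304d314d029c32e6fe268e9f35c13ec937b0b3e:9077213295690",
    "1467:S 22 Feb 11:14:49.534 # I/O error reading bulk count from MASTER: Resource temporarily unavailable",
]


def seconds_of_day(clock):
    """Convert "HH:MM:SS.mmm" into seconds since midnight."""
    hours, minutes, seconds = clock.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def resync_failure_gaps(lines):
    """Seconds between each full resync start and the next warning line."""
    gaps = []
    started = None
    for line in lines:
        # Layout: "<pid>:<role> <day> <month> <time> <level> <message>"
        _pid_role, _day, _month, clock, level, message = line.split(" ", 5)
        if message.startswith("Full resync from master"):
            started = seconds_of_day(clock)
        elif level == "#" and started is not None:
            gaps.append(round(seconds_of_day(clock) - started, 3))
            started = None
    return gaps


gaps = resync_failure_gaps(LOG_LINES)
print(gaps)
```

Both attempts fail roughly a minute after the resync starts. The first gap is consistent with a timeout of about 60 seconds still being in effect on the instance that logged it, so it may be worth checking that the raised repl-timeout was actually loaded there. This is an inference from the timestamps only, not a diagnosis.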
Same problem on 4.0.2. 2 of 68 slaves fail with:
Slave:
Master
|
While synchronizing with the master db I got errors: http://pastebin.com/vwb5aTS1
Reproduced when syncing with a master db under load (Lua scripts and ~100 queries per second).