Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fluctuation of one of the slave_offset #5249

Open
jaki2012 opened this issue Aug 15, 2018 · 3 comments
Open

fluctuation of one of the slave_offset #5249

jaki2012 opened this issue Aug 15, 2018 · 3 comments

Comments

@jaki2012
Copy link

Some times, the info replication command behaves normally, but
frequently one of the slave(i.e., slave3) will return offset=0,
soon later it will become normally again, how could this happen?

Replication

role:master
connected_slaves:4
min_slaves_good_slaves:4
slave0:ip=10.200.20.37,port=16379,state=online,offset=12293126789,lag=1
slave1:ip=10.200.20.36,port=16379,state=online,offset=12293126789,lag=1
slave2:ip=10.200.20.58,port=46379,state=online,offset=12293126789,lag=1
slave3:ip=10.200.20.58,port=16379,state=online,offset=0,lag=0

@jaki2012
Copy link
Author

Part of the log from master-instance:
As you can see the connection is lost every 6-8 seconds.
I've checked the setting of repl_timeout, and output buffer but both are correct

33295:M 15 Aug 22:56:16.748 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:56:21.242 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:56:23.808 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:56:28.517 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:56:30.179 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:56:35.094 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:56:37.922 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:56:42.639 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:56:44.954 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:56:49.024 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:56:51.459 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:56:56.875 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:56:59.062 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:57:03.439 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:57:05.252 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:57:11.004 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:57:12.726 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:57:17.079 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:57:20.146 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:57:24.535 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:57:27.087 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:57:31.951 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:57:33.762 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:57:38.968 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:57:40.594 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:57:45.838 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:57:47.644 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:57:52.487 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:57:54.430 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:57:59.724 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:58:02.168 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:58:06.577 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:58:08.491 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:58:13.928 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:58:15.869 # Connection with slave 10.200.20.58:16379 lost.
33295:M 15 Aug 22:58:21.044 # Connection with slave 10.200.20.58:46379 lost.
33295:M 15 Aug 22:58:22.513 # Connection with slave 10.200.20.58:16379 lost.

@jaki2012
Copy link
Author

More detailedly, the complete log is attached
39693:M 15 Aug 23:34:45.342 - Client closed connection
39693:M 15 Aug 23:34:45.342 # Connection with slave 10.200.20.58:46379 lost.
39693:M 15 Aug 23:34:45.918 - Accepted 10.200.20.27:44148
39693:M 15 Aug 23:34:45.921 * Slave 10.200.20.58:46379 asks for synchronization
39693:M 15 Aug 23:34:45.921 . [PSYNC] Slave request offset: 41369
39693:M 15 Aug 23:34:45.921 . [PSYNC] Backlog size: 314572800
39693:M 15 Aug 23:34:45.921 . [PSYNC] First byte: 1
39693:M 15 Aug 23:34:45.921 . [PSYNC] History len: 41368
39693:M 15 Aug 23:34:45.921 . [PSYNC] Current index: 41368
39693:M 15 Aug 23:34:45.921 . [PSYNC] Skipping: 41368
39693:M 15 Aug 23:34:45.921 . [PSYNC] Index of first byte: 0
39693:M 15 Aug 23:34:45.921 . [PSYNC] Reply total length: 0
39693:M 15 Aug 23:34:45.921 * Partial resynchronization request from 10.200.20.58:46379 accepted. Sending 0 bytes of backlog starting from offset 41369.
39693:M 15 Aug 23:34:46.391 - Client closed connection
39693:M 15 Aug 23:34:46.391 # Connection with slave 10.200.20.58:16379 lost.
39693:M 15 Aug 23:34:46.692 - Client closed connection
39693:M 15 Aug 23:34:46.732 - Accepted 10.200.20.27:44162
39693:M 15 Aug 23:34:46.735 * Slave 10.200.20.58:16379 asks for synchronization
39693:M 15 Aug 23:34:46.735 . [PSYNC] Slave request offset: 41651
39693:M 15 Aug 23:34:46.735 . [PSYNC] Backlog size: 314572800
39693:M 15 Aug 23:34:46.735 . [PSYNC] First byte: 1
39693:M 15 Aug 23:34:46.735 . [PSYNC] History len: 41650
39693:M 15 Aug 23:34:46.735 . [PSYNC] Current index: 41650
39693:M 15 Aug 23:34:46.735 . [PSYNC] Skipping: 41650
39693:M 15 Aug 23:34:46.735 . [PSYNC] Index of first byte: 0
39693:M 15 Aug 23:34:46.735 . [PSYNC] Reply total length: 0
39693:M 15 Aug 23:34:46.735 * Partial resynchronization request from 10.200.20.58:16379 accepted. Sending 0 bytes of backlog starting from offset 41651.
39693:M 15 Aug 23:34:46.789 - Accepted 10.200.20.29:26684
39693:M 15 Aug 23:34:47.687 - Client closed connection
39693:M 15 Aug 23:34:47.697 - Accepted 10.200.20.29:26751
39693:M 15 Aug 23:34:48.783 - DB 0: 12431 keys (0 volatile) in 16384 slots HT.
39693:M 15 Aug 23:34:48.783 - 11 clients connected (4 slaves), 340013608 bytes in use
39693:M 15 Aug 23:34:48.794 - Client closed connection
39693:M 15 Aug 23:34:48.842 - Accepted 10.200.20.27:52216
39693:M 15 Aug 23:34:48.935 - Accepted 172.21.8.62:58777
39693:M 15 Aug 23:34:49.065 - Client closed connection
39693:M 15 Aug 23:34:50.095 - Client closed connection
39693:M 15 Aug 23:34:50.104 - Accepted 10.200.20.27:58797
39693:M 15 Aug 23:34:50.997 - Client closed connection
39693:M 15 Aug 23:34:51.055 - Accepted 10.200.20.29:29931
39693:M 15 Aug 23:34:51.196 - Client closed connection
39693:M 15 Aug 23:34:51.222 - Accepted 10.200.20.27:10001
39693:M 15 Aug 23:34:52.124 - Client closed connection
39693:M 15 Aug 23:34:52.124 # Connection with slave 10.200.20.58:46379 lost.
39693:M 15 Aug 23:34:52.896 - Client closed connection
39693:M 15 Aug 23:34:52.933 - Accepted 10.200.20.29:44125
39693:M 15 Aug 23:34:52.936 * Slave 10.200.20.58:46379 asks for synchronization
39693:M 15 Aug 23:34:52.936 . [PSYNC] Slave request offset: 43484
39693:M 15 Aug 23:34:52.936 . [PSYNC] Backlog size: 314572800
39693:M 15 Aug 23:34:52.936 . [PSYNC] First byte: 1
39693:M 15 Aug 23:34:52.936 . [PSYNC] History len: 43765
39693:M 15 Aug 23:34:52.936 . [PSYNC] Current index: 43765
39693:M 15 Aug 23:34:52.936 . [PSYNC] Skipping: 43483
39693:M 15 Aug 23:34:52.936 . [PSYNC] Index of first byte: 0
39693:M 15 Aug 23:34:52.936 . [PSYNC] Reply total length: 282
39693:M 15 Aug 23:34:52.936 . [PSYNC] addReply() length: 282
39693:M 15 Aug 23:34:52.937 * Partial resynchronization request from 10.200.20.58:46379 accepted. Sending 282 bytes of backlog starting from offset 43484.
39693:M 15 Aug 23:34:52.983 - Accepted 10.200.20.27:10023
39693:M 15 Aug 23:34:53.689 - Client closed connection
39693:M 15 Aug 23:34:53.689 # Connection with slave 10.200.20.58:16379 lost.
39693:M 15 Aug 23:34:53.746 - Accepted 10.200.20.29:44172
39693:M 15 Aug 23:34:53.749 * Slave 10.200.20.58:16379 asks for synchronization
39693:M 15 Aug 23:34:53.750 . [PSYNC] Slave request offset: 44189
39693:M 15 Aug 23:34:53.750 . [PSYNC] Backlog size: 314572800
39693:M 15 Aug 23:34:53.750 . [PSYNC] First byte: 1
39693:M 15 Aug 23:34:53.750 . [PSYNC] History len: 44188
39693:M 15 Aug 23:34:53.750 . [PSYNC] Current index: 44188
39693:M 15 Aug 23:34:53.750 . [PSYNC] Skipping: 44188
39693:M 15 Aug 23:34:53.750 . [PSYNC] Index of first byte: 0
39693:M 15 Aug 23:34:53.750 . [PSYNC] Reply total length: 0
39693:M 15 Aug 23:34:53.750 * Partial resynchronization request from 10.200.20.58:16379 accepted. Sending 0 bytes of backlog starting from offset 44189.
39693:M 15 Aug 23:34:53.790 - DB 0: 12431 keys (0 volatile) in 16384 slots HT.
39693:M 15 Aug 23:34:53.790 - 11 clients connected (4 slaves), 339890784 bytes in use
39693:M 15 Aug 23:34:53.937 - Accepted 172.21.8.62:58785
39693:M 15 Aug 23:34:54.071 - Client closed connection
39693:M 15 Aug 23:34:54.076 - Client closed connection
39693:M 15 Aug 23:34:54.118 - Accepted 10.200.20.27:10031
39693:M 15 Aug 23:34:55.514 - Client closed connection
39693:M 15 Aug 23:34:55.552 - Accepted 10.200.20.29:55026
39693:M 15 Aug 23:34:56.492 - Client closed connection
39693:M 15 Aug 23:34:56.591 - Accepted 10.200.20.27:14332
39693:M 15 Aug 23:34:57.182 - Client closed connection
39693:M 15 Aug 23:34:57.228 - Accepted 10.200.20.29:58604
39693:M 15 Aug 23:34:57.541 - Client closed connection
39693:M 15 Aug 23:34:57.600 - Accepted 10.200.20.27:14337
39693:M 15 Aug 23:34:58.796 - DB 0: 12431 keys (0 volatile) in 16384 slots HT.
39693:M 15 Aug 23:34:58.796 - 11 clients connected (4 slaves), 339931728 bytes in use
39693:M 15 Aug 23:34:58.980 - Accepted 172.21.8.62:58791
39693:M 15 Aug 23:34:59.126 - Client closed connection
39693:M 15 Aug 23:34:59.242 - Client closed connection
39693:M 15 Aug 23:34:59.311 - Accepted 10.200.20.27:14350
39693:M 15 Aug 23:34:59.459 - Client closed connection
39693:M 15 Aug 23:34:59.459 # Connection with slave 10.200.20.58:46379 lost.
39693:M 15 Aug 23:34:59.947 - Accepted 10.200.20.29:10007
39693:M 15 Aug 23:34:59.951 * Slave 10.200.20.58:46379 asks for synchronization
39693:M 15 Aug 23:34:59.951 . [PSYNC] Slave request offset: 46456
39693:M 15 Aug 23:34:59.951 . [PSYNC] Backlog size: 314572800
39693:M 15 Aug 23:34:59.951 . [PSYNC] First byte: 1
39693:M 15 Aug 23:34:59.951 . [PSYNC] History len: 46596
39693:M 15 Aug 23:34:59.951 . [PSYNC] Current index: 46596
39693:M 15 Aug 23:34:59.951 . [PSYNC] Skipping: 46455
39693:M 15 Aug 23:34:59.951 . [PSYNC] Index of first byte: 0
39693:M 15 Aug 23:34:59.951 . [PSYNC] Reply total length: 141
39693:M 15 Aug 23:34:59.951 . [PSYNC] addReply() length: 141
39693:M 15 Aug 23:34:59.951 * Partial resynchronization request from 10.200.20.58:46379 accepted. Sending 141 bytes of backlog starting from offset 46456.
39693:M 15 Aug 23:35:00.041 - Client closed connection
39693:M 15 Aug 23:35:00.041 # Connection with slave 10.200.20.58:16379 lost.
39693:M 15 Aug 23:35:00.757 - Accepted 10.200.20.29:10013
39693:M 15 Aug 23:35:00.765 * Slave 10.200.20.58:16379 asks for synchronization
39693:M 15 Aug 23:35:00.766 . [PSYNC] Slave request offset: 46597
39693:M 15 Aug 23:35:00.766 . [PSYNC] Backlog size: 314572800
39693:M 15 Aug 23:35:00.766 . [PSYNC] First byte: 1
39693:M 15 Aug 23:35:00.766 . [PSYNC] History len: 46878
39693:M 15 Aug 23:35:00.766 . [PSYNC] Current index: 46878
39693:M 15 Aug 23:35:00.766 . [PSYNC] Skipping: 46596
39693:M 15 Aug 23:35:00.766 . [PSYNC] Index of first byte: 0
39693:M 15 Aug 23:35:00.766 . [PSYNC] Reply total length: 282
39693:M 15 Aug 23:35:00.766 . [PSYNC] addReply() length: 282
39693:M 15 Aug 23:35:00.766 * Partial resynchronization request from 10.200.20.58:16379 accepted. Sending 282 bytes of backlog starting from offset 46597.
39693:M 15 Aug 23:35:00.847 - Client closed connection
39693:M 15 Aug 23:35:00.887 - Accepted 10.200.20.27:14376
39693:M 15 Aug 23:35:02.047 - Client closed connection
39693:M 15 Aug 23:35:02.096 - Accepted 10.200.20.29:10025

@shenlongxing
Copy link
Contributor

It seems the slave closes the connection periodically. Can you post the redis log of slave side?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants