Minor refactoring for rioConnRead and adding errno #9280
LGTM, but I don't have a lot of context here.
@Ewg-c the fix looks good to me, and it also simplifies the code.
Indeed, there's no reason to check for this case inside the loop: none of the conditions leading to the error change in the loop, since we only compare the input arguments against the state at the start of the call.
What I don't yet understand is:
1. What was the problem with the original code? It seems that it would function correctly as well (and you should have gotten your desired errno).
2. This code is supposed to be dead code. The only scenario in which read_limit is used is when the master is diskless and the replica is disk-based, and even then an attempt to read outside of that range should never happen, since rdb.c knows how much it should read for each type it reads.
Thank you for the correction, @oranagra. Regarding 2., I did not get the impression that this is "dead" code. I might be missing something, so please let me know. We have seen it with full sync and diskless on both master and replica. I found this adjusted stack trace in my notes:
I'll also add that this is AWS, where by default we use diskless sync on the primary and disk-based loading on the replica.
@madolson @Ewg-c the configuration is perfectly valid, and the test suite uses it too, but I still don't understand what problem this PR comes to fix (other than a cleanup). Redis 6 was indeed buggy in that line, and that was fixed in #7557 (with an additional fix in #7564), so as far as I can tell, the current code that this PR comes to change was OK. The other thing that bothers me is that this condition is supposed to be dead code, so I still don't understand how you ran into it (consistently). rdb.c knows how many bytes to read for each type, and it should never ask to read more than there is in the RDB file. When it sees the EOF byte, it stops.
On a second look, maybe the PR title and top comment do make it clear that this is just a refactoring. But I'm still curious how you came to get this error. Maybe some modification in your fork?
Oh, yes, this is just a refactoring to help identify an issue we saw internally, which we needed gdb to track down. AFAIK the issue was fixed in 6.2 (we maintain too many old versions), and we saw it on a 6.0 version. @Ewg-c can fill in the details if you are really curious.
@oranagra, sorry if the information on the ticket misled you.
I posted the stack trace earlier, and I can add that the culprit is the toread assignment. It does not take EOF into account and would attempt to read past the limit. This is where the condition in question is triggered. Personally, I believe that everyone running Redis 6.0 could be affected: although it requires an unlucky combination of values, it should still happen regularly. I also think it was a TLS-enabled cluster.
OK, so it's not that rdb.c attempted to read beyond the limit, but that rioConnRead messed up and attempted to buffer more than it should have.
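To make the failure mode concrete, here is a minimal sketch of clamping the size of the next socket read to the bytes still remaining under the limit. The helper name clamp_toread and its parameters are hypothetical: it only assumes, as discussed above, that the reader tracks how much it has already consumed (read_so_far) and an optional payload size (read_limit, with 0 meaning "no limit"). It is not the actual rioConnRead code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the actual Redis code.
 * avail:       free space in the read buffer
 * read_so_far: bytes already consumed from the connection
 * read_limit:  payload size announced by the master, 0 = unlimited
 * Returns how many bytes the next socket read may request. */
static size_t clamp_toread(size_t avail, size_t read_so_far, size_t read_limit) {
    size_t toread = avail;
    if (read_limit != 0) {
        /* Callers must never have consumed more than the limit. */
        assert(read_so_far <= read_limit);
        size_t remaining = read_limit - read_so_far;
        /* Never pull bytes beyond the end of the payload off the socket. */
        if (toread > remaining) toread = remaining;
    }
    return toread;
}
```

Without the `remaining` clamp, a refill sized only by free buffer space could pull bytes that lie beyond the RDB payload off the socket, which is the over-buffering described above.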
@oranagra
Ohh, right... I implemented it in Redis 2.8, but couldn't get Salvatore to merge my PR until recently.
minor refactoring for rioConnRead and adding errno
@Ewg-c I've edited the top comment so it can be used for the release notes; please review / fix it (specifically, what were the implications of the bug).
@oranagra thank you. I updated the top comment; I believe it should be good now.
minor refactoring for rioConnRead and adding errno (cherry picked from commit a403816)
Redis 6.0 contained a bug that occurs when the master uses disk-based replication (repl-diskless-sync is no) and the replica uses diskless loading (repl-diskless-load is set to a non-default value).
The bug could cause the rdb loading code in the replica to buffer too much data from the socket and fail the replication.
Since da840e9, the error condition no longer depends on the while loop that reads from the socket. This change cleans up the code and moves the condition outside the loop.
The change also adds errno to the "Failed trying to load the MASTER synchronization DB" error message in readSyncBulkPayload() to make debugging similar problems easier in the future.
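A minimal sketch of the refactoring, assuming a toy reader in which an in-memory array stands in for the socket. toy_conn, toy_conn_read, and the use of EOVERFLOW and EIO are hypothetical illustrations, not the Redis code. The point is only that the limit check depends on the arguments and the starting state, so it can run once before the loop instead of on every iteration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a connection-backed reader: 'src' plays the
 * role of the socket. This is not the actual rio struct. */
typedef struct {
    const char *src;
    size_t src_len;
    size_t read_so_far; /* bytes consumed from src */
    size_t read_limit;  /* 0 = no limit */
    size_t chunk;       /* max bytes per simulated read() call */
} toy_conn;

/* Returns 1 on success, 0 on failure with errno set. */
static int toy_conn_read(toy_conn *c, void *buf, size_t len) {
    assert(c->chunk > 0);
    /* len, read_so_far and read_limit are all known before the loop and the
     * check only compares them, so it is evaluated once, up front. */
    if (c->read_limit != 0 && c->read_so_far + len > c->read_limit) {
        errno = EOVERFLOW; /* illustrative choice of errno */
        return 0;
    }
    char *out = buf;
    while (len > 0) {
        size_t n = len < c->chunk ? len : c->chunk;
        if (c->read_so_far + n > c->src_len) {
            errno = EIO; /* stand-in for the peer closing the connection */
            return 0;
        }
        memcpy(out, c->src + c->read_so_far, n);
        c->read_so_far += n;
        out += n;
        len -= n;
    }
    return 1;
}
```

A rejected request leaves read_so_far untouched, so a caller can still issue a smaller read that fits within the limit.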
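A sketch of appending errno to the failure message, assuming a hypothetical helper, format_load_failure, that formats into a caller-provided buffer instead of calling the server's logger. The message prefix is taken from the description above; the exact wording and the "(errno=%d)" suffix are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the failure log line with the error code
 * appended. A real implementation would hand this text to the logger. */
static void format_load_failure(char *out, size_t outlen, int err) {
    assert(out != NULL && outlen > 0);
    snprintf(out, outlen,
             "Failed trying to load the MASTER synchronization DB: %s (errno=%d)",
             strerror(err), err);
}
```

errno should be copied into a local variable immediately after the failing call, because intervening library calls may overwrite it: `int saved_errno = errno;` and then `format_load_failure(line, sizeof line, saved_errno);`.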