bugfix:del keys in slot replicate to replica, and trigger other invalidations #11084
Close #10967 |
I have one concern about this change: the replica may incorrectly report an empty dataset, since it has received the deletes over replication but hasn't yet processed the cluster bus information about the slot ownership. On the master these two events happen at the same time, so we don't need to worry about it. This is related to my position that we should be replicating slot ownership transitions so they are properly sequenced with the replicated data. I think a better fix would be to replicate some type of "slot flush + DEL" command.
It's also worth noting that this is really only possible during misuse or failure modes. The reproduction in the issue was forcing a cluster to take ownership of a slot instead of migrating it using the importing/migrating functionality.
So the master in this case will replicate a single besides that, all the other missing DEL related actions are still needed (on both the master and replica). |
thanks for your advice @judeng @madolson @oranagra @MeirShpilraien @oranagra clusterCommand is a “CMD_ADMIN” command, so i think it may be better not to replicate "CLUSTER SLOTFLUSH" to the replica |
@weim0000 i think we're still missing notifyKeyspaceEvent. Regarding CLUSTER being an admin command, i don't see a problem with that. But more importantly, i agree with @madolson that it's better for the slot ownership changes to be properly sequenced on the data stream rather than relying on the cluster bus alone. |
@oranagra There is a scenario where the master node of the cluster has a replica in standalone mode, and it is strange to execute the |
I doubt if there really is a need to notify the keyspace events to users here, these keys already belong to dirty data that the user cannot access |
i don't think it's a viable configuration for a cluster node to have a replica in standalone mode.
I agree, sending keyspace notification about deletions will cause a similar damage as i mentioned above (the keys were not really deleted from the database, just moved), but i do think modules need to know these keys are no longer locally available, so i think we should send some keyspace notification just to modules. |
Yes, I don't think we should address the wider issue here. There is another issue, #10517, which aims to address a bunch of these items, and I would prefer that we address it there. Any type of complex change would also be unwise to backport. Therefore, I think the current outlined approach is acceptable for now, since it's solving the data divergence issue. We can safely backport it to 7.0 and earlier, and we can continue on the long term approach. |
thanks for your advice @judeng @madolson @oranagra
i agree |
@weim0000 Cool, change looks gtm. Would you be comfortable adding a test around this case? Want to avoid future regressions here, since we clearly weren't testing for it. |
thanks @madolson i have added a test for this case, and updated cluster.c with 'call propagatePendingCommands();'. |
thanks @enjoy-binbin i have already updated the test |
sorry for the delay (been busy).
generally LGTM with few minor comments resulting from another PR that just got merged.
didn't really review the test code in detail.
2. use replica instead of slave
thanks @oranagra |
@weim0000 I moved the test to the new framework, so now the tests will run as part of CI. Also kicked off a full test run here: https://github.com/redis/redis/actions/runs/2897153567. |
@weim0000 thank you. |
@oranagra OK, thanks. |
As discussed on redis#11084 (comment), `propagatePendingCommands` should happen after the del notification is fired so that the notification effect and the `del` will be replicated inside MULTI EXEC. Test was added to verify the fix.
…idations (redis#11084)
Bugfix: if a slot is forcibly assigned to another master, the old master loses ownership of the slot and calls delKeysInSlot() to delete all keys in that slot. These deletions should be replicated to the replicas to avoid data divergence between the master and its replicas.
Additionally, in this case, we now call:
* signalModifiedKey (to invalidate WATCH)
* moduleNotifyKeyspaceEvent (key space notification for modules)
* dirty++ (to signal that the persistence file may be outdated)
Co-authored-by: weimeng <weimeng@didiglobal.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
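The side effects listed in this commit message can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: the `toy_*` functions, the `toy_key` struct, and the global counters are hypothetical stand-ins invented for illustration, not the actual Redis implementation of delKeysInSlot(). It shows that each key removed from a lost slot also queues a `DEL` for replicas, invalidates watchers, notifies modules, and bumps the dirty counter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_KEYS 8
#define MAX_LOG  16

/* Toy stand-ins for server state; none of these are Redis's real structures. */
typedef struct { const char *name; int slot; int present; } toy_key;

static toy_key keyspace[MAX_KEYS] = {
    {"a", 1, 1}, {"b", 2, 1}, {"c", 1, 1},
};
static int  key_count = 3;
static char repl_log[MAX_LOG][32];   /* commands queued for the replicas */
static int  repl_len = 0;
static long dirty = 0;               /* changes since the last save */
static int  watch_invalidations = 0; /* stands in for signalModifiedKey */
static int  module_notifications = 0;/* stands in for moduleNotifyKeyspaceEvent */

/* Hypothetical helper: queue "DEL <key>" for replication. */
static void toy_propagate_del(const char *key) {
    assert(repl_len < MAX_LOG);
    snprintf(repl_log[repl_len++], sizeof repl_log[0], "DEL %s", key);
}

/* Hypothetical helpers: count the invalidations instead of acting on them. */
static void toy_signal_modified_key(const char *key) { (void)key; watch_invalidations++; }
static void toy_module_notify(const char *key)       { (void)key; module_notifications++; }

/* Delete every key in `slot` and perform the same bookkeeping an
 * ordinary DEL would. Returns the number of keys deleted. */
static int toy_del_keys_in_slot(int slot) {
    int deleted = 0;
    for (int i = 0; i < key_count; i++) {
        toy_key *k = &keyspace[i];
        if (!k->present || k->slot != slot) continue;
        toy_propagate_del(k->name);        /* replicas must see the deletion */
        k->present = 0;
        toy_signal_modified_key(k->name);  /* invalidate WATCH */
        toy_module_notify(k->name);        /* tell modules the key is gone */
        dirty++;                           /* persistence file may be outdated */
        deleted++;
    }
    return deleted;
}
```

Calling `toy_del_keys_in_slot(1)` on the sample keyspace removes `a` and `c`, leaves `b` untouched, and queues two `DEL` entries in `repl_log`.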
…agation (redis#11377) As discussed on redis#11084, `propagatePendingCommands` should happen after the del notification is fired so that the notification effect and the `del` will be replicated inside MULTI EXEC. Test was added to verify the fix.
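Why the ordering matters can be sketched with a toy propagation buffer. The code below is an assumption-laden illustration, not Redis's real code: `toy_also_propagate`, `toy_propagate_pending`, `toy_module_on_del`, and the `flush_after_notify` flag are all hypothetical names, and the rule "wrap in MULTI/EXEC when more than one command is pending" is a simplification of the behaviour the commit message describes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CMDS 8

static char pending[MAX_CMDS][32];    /* commands accumulated for one unit of work */
static int  pending_len = 0;
static char stream[MAX_CMDS * 2][32]; /* what the replicas would receive */
static int  stream_len = 0;

/* Hypothetical helper: queue a command for later propagation. */
static void toy_also_propagate(const char *cmd) {
    assert(pending_len < MAX_CMDS);
    snprintf(pending[pending_len++], sizeof pending[0], "%s", cmd);
}

static void toy_emit(const char *cmd) {
    assert(stream_len < MAX_CMDS * 2);
    snprintf(stream[stream_len++], sizeof stream[0], "%s", cmd);
}

/* Flush the pending commands; wrap them in MULTI/EXEC when there is
 * more than one, so the replica applies them atomically. */
static void toy_propagate_pending(void) {
    if (pending_len == 0) return;
    int wrap = pending_len > 1;
    if (wrap) toy_emit("MULTI");
    for (int i = 0; i < pending_len; i++) toy_emit(pending[i]);
    if (wrap) toy_emit("EXEC");
    pending_len = 0;
}

/* Hypothetical module hook: reacts to a deletion by writing another key. */
static void toy_module_on_del(const char *key) {
    char cmd[32];
    snprintf(cmd, sizeof cmd, "INCR deleted:%s", key);
    toy_also_propagate(cmd);
}

/* Delete one key. With flush_after_notify == 0 the flush happens before
 * the notification, so the module's write is sent separately. */
static void toy_delete_key(const char *key, int flush_after_notify) {
    char cmd[32];
    snprintf(cmd, sizeof cmd, "DEL %s", key);
    toy_also_propagate(cmd);
    if (!flush_after_notify) toy_propagate_pending(); /* too early */
    toy_module_on_del(key);                           /* notification fires */
    toy_propagate_pending();                          /* correct place */
}
```

With `flush_after_notify` set to 1, the replica stream contains `MULTI`, `DEL k`, `INCR deleted:k`, `EXEC`. With 0, the `DEL` and the module's write arrive as two unrelated commands, which is the situation the fix avoids.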
Fix some outdated comments and add comment for moduleNotifyKeyspaceEvent we added in #11084 since it seems a bit implicit. --------- Co-authored-by: Oran Agra <oran@redislabs.com>
Bugfix:
If a slot is forcibly assigned to another master, the old master loses ownership of the slot and calls delKeysInSlot() to delete all keys in that slot. These deletions should be replicated to the replicas to avoid data divergence between the master and its replicas.
Additionally, in this case, we now call:
* signalModifiedKey (to invalidate WATCH)
* moduleNotifyKeyspaceEvent (key space notification for modules)
* dirty++ (to signal that the persistence file may be outdated)
Fixes #10967