[REA-2664] Sequence of ALTER and DROP on a table causes us to get stuck in a resnapshotting loop #106
Comments
@gvsg-rs yes, this is that ticket. I created a fix, but after additional testing I found that there are other similar issues that my fix didn’t cover. Current status: I would like to try to find a better fix, but I am not actively working on it.
@nickelization, is this the ticket that you wanted to analyze further to find a better solution? Either way, can you update the status of this ticket?
<~accountid:609236b4b9ac3a007151a40b> yeah that seems like an okay solution I guess. I don’t love it because it seems like it increases the odds of the https://readysettech.atlassian.net/browse/ENG-2785 fix failing to do its job, but I guess if that fix isn’t guaranteed to avoid us getting stuck anyway, then updating the replication offset when tables have been deleted doesn’t really make the situation all that much worse. Certainly having a small chance of getting temporarily stuck in the catch-up phase is better than definitely getting stuck in a snapshot loop forever.
Maybe “update the schema replication offset if any tables no longer exist” is the better solution here?
This was a tricky one, but I’m pretty sure I’ve figured out what’s going on. Looks like this is a relatively recent regression caused by https://gerrit.readyset.name/c/readyset//4637. (Though, since the vertical DDL tests caught this, the fact that it slipped past us is at least partly my fault for not integrating the vertical DDL tests into CI yet.) The trouble is that when we drop the So then we try to re-apply the same DDL event again, and end up returning Not sure what the right fix is here, and my brain is pretty fried from debugging this, so I’m going to step away for a while. But <accountid:609236b4b9ac3a007151a40b> <accountid:609236f3d3538000687db999> if either of you have any suggestions on how to approach this I would love to hear any ideas you might have 🙂
This was fixed by reverting a prior commit that was found to have introduced this as a regression - see a0fe5ae |
This was fixed some time ago when we fixed #106 by reverting the following commit:

670413429 (replicators: Skip catch up if ReadySet did not snapshot any table., 2023-03-30)

in this revert commit here:

0e8746ac3 (Revert "replicators: Skip catch up if ReadySet did not snapshot any table.", 2023-07-10)

But I neglected to also enable the test after the revert was merged. Since the test now passes, I'm removing the ignore.

Refs: #106
Refs: REA-2664
Refs: REA-2437
Change-Id: I2ff061f1deeaab65c862c5140f497c60e02fb9b2
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/5881
Tested-by: Buildkite CI
Reviewed-by: Ethan Donowitz <ethan@readyset.io>
The vertical DDL test found a failure, with a minimal failing case that looks like this:
From SyncLinear.com | REA-2664