Fix eventual consistency bug #21
Merged
nerdsane merged 1 commit into nerdsane:main on Apr 2, 2026
nerdsane (Owner) reviewed on Apr 2, 2026
Thanks for this catch, @carlsverre! The bug is confirmed — apply_remote_delta_impl uses the incoming delta value instead of the CRDT-merged result from replica_state, causing reads to return stale data when a lower-timestamp delta arrives.
I'll address the remaining items and merge this myself:
- Fix all branches (scalar, hash fields + tombstones, expiry) to read from merged state
- Fix the same pattern in multi_node.rs simulation code
- Add TigerStyle postcondition (executor-replica_state consistency assertion)
- Add regression test for stale delta scenario
- Rebuild and re-run Maelstrom linearizability tests
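
For illustration only, the postcondition and the stale-delta regression case from the list above might look roughly like the sketch below. It is not the repository's code: `Entry`, `ReplicaState`, `Executor`, `is_consistent`, and this free-function `apply_remote_delta` are hypothetical names, a tombstone is modeled as `value: None`, and ties on timestamp simply keep the current entry.

```rust
use std::collections::HashMap;

/// Hypothetical merged entry; `value == None` models a tombstone.
#[derive(Clone, Debug, PartialEq)]
struct Entry {
    value: Option<String>,
    timestamp: u64,
}

// Hypothetical stand-ins for the CRDT state and the read-serving executor.
type ReplicaState = HashMap<String, Entry>;
type Executor = HashMap<String, String>;

/// Postcondition: the executor holds exactly the live merged value for `key`,
/// or nothing if the merged entry is missing or a tombstone.
fn is_consistent(replica_state: &ReplicaState, executor: &Executor, key: &str) -> bool {
    let merged = replica_state.get(key).and_then(|e| e.value.as_ref());
    executor.get(key) == merged
}

/// Merge the delta (last writer wins), then project the merged entry,
/// never the raw delta payload, into the executor.
fn apply_remote_delta(
    replica_state: &mut ReplicaState,
    executor: &mut Executor,
    key: &str,
    delta: Entry,
) {
    // A delta that is not strictly newer loses the merge (simplified tie rule).
    let stale = replica_state
        .get(key)
        .map_or(false, |cur| cur.timestamp >= delta.timestamp);
    if !stale {
        replica_state.insert(key.to_string(), delta);
    }

    // Reload the merged entry and mirror it: live values are written,
    // tombstones are removed from the executor.
    match replica_state.get(key).and_then(|e| e.value.clone()) {
        Some(v) => {
            executor.insert(key.to_string(), v);
        }
        None => {
            executor.remove(key);
        }
    }

    // Assertion in the spirit of the TigerStyle postcondition mentioned above.
    assert!(
        is_consistent(replica_state, executor, key),
        "executor diverged from replica_state for key {}",
        key
    );
}
```

In this sketch, a stale tombstone (timestamp 1) arriving after a live write (timestamp 2) leaves the executor's value in place, while a newer tombstone (timestamp 3) removes it.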
Thanks for reading the blog, and doing a deep-dive. The self-heal loop continues. 🔧
This is an AI-generated change audited by me (a human). The bug causes a replica to permanently fall out of sync and not converge. I found this while playing with this code after reading this fascinating blog post: https://www.datadoghq.com/blog/ai/harness-first-agents/
AI bug description below:
When ReplicatedShardActor applied a remote delta, it first merged the delta into replica_state and then updated the executor from the incoming delta payload. That meant a stale delta could lose the merge in replica_state but still overwrite the executor, so subsequent reads returned the wrong value.
Fix this by reloading the merged value from replica_state after apply_remote_delta and projecting that merged value into the executor. This preserves the existing CRDT and LWW merge semantics, keeps read behavior aligned with replicated state, and is safe because it only changes which already-computed value is mirrored into the executor; it does not change message ordering, conflict resolution, or local write generation.
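
A minimal sketch of the before and after, assuming a simplified last-writer-wins register: `Lww`, `Shard`, `merge`, and the two `apply_remote_delta_*` methods are hypothetical stand-ins, not the real ReplicatedShardActor code, and the tie rule (equal timestamps keep the current value) is a simplification.

```rust
use std::collections::HashMap;

/// Hypothetical last-writer-wins register: the higher timestamp wins.
#[derive(Clone, Debug, PartialEq)]
struct Lww {
    value: String,
    timestamp: u64,
}

/// Hypothetical stand-in for the shard: CRDT state plus the executor that serves reads.
#[derive(Default)]
struct Shard {
    replica_state: HashMap<String, Lww>,
    executor: HashMap<String, String>,
}

impl Shard {
    /// LWW merge into replica_state; a delta that is not strictly newer is ignored.
    fn merge(&mut self, key: &str, delta: &Lww) {
        let stale = self
            .replica_state
            .get(key)
            .map_or(false, |cur| cur.timestamp >= delta.timestamp);
        if !stale {
            self.replica_state.insert(key.to_string(), delta.clone());
        }
    }

    /// Buggy shape: the executor is updated from the incoming payload,
    /// so a stale delta overwrites reads even though it lost the merge.
    fn apply_remote_delta_buggy(&mut self, key: &str, delta: Lww) {
        self.merge(key, &delta);
        self.executor.insert(key.to_string(), delta.value);
    }

    /// Fixed shape: reload the merged value from replica_state and mirror that.
    fn apply_remote_delta_fixed(&mut self, key: &str, delta: Lww) {
        self.merge(key, &delta);
        if let Some(merged) = self.replica_state.get(key) {
            self.executor.insert(key.to_string(), merged.value.clone());
        }
    }
}
```

Applying a delta with timestamp 2 followed by one with timestamp 1 leaves `replica_state` holding the newer value in both variants; only the buggy variant lets the executor return the older one.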