Fix production panic regressions #943

Merged
zxch3n merged 16 commits into loro-dev:main from zxch3n:codex/march-2026-panic-hardening
Apr 21, 2026


zxch3n commented Apr 2, 2026

Summary

  • fix stale DAG causal iteration after lazy node splits so export/import no longer drop same-peer ranges
  • fix richtext IdToCursor fragment-tail updates and zero-width update handling
  • add a list import_batch() regression test that exercises the same split-tracking path seen in production
  • recover document-owned locks after poisoning instead of cascading PoisonError panics
  • add unpoisoned lock helpers across core internals so ordinary Mutex users follow the same recovery rule
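A minimal sketch of the unpoison rule described above, using only `std::sync`. The trait and method name `LockUnpoisoned::lock_unpoisoned` are hypothetical and are not the helper names used in this PR; the point is only that a poisoned `Mutex` hands back its guard through `PoisonError::into_inner` instead of turning one panic into a cascade of `PoisonError` panics.

```rust
use std::sync::{Mutex, MutexGuard, PoisonError};

/// Hypothetical extension trait (name assumed, not taken from the PR):
/// lock a mutex and keep going with the inner data even if a previous
/// holder panicked while holding the lock.
pub trait LockUnpoisoned<T> {
    fn lock_unpoisoned(&self) -> MutexGuard<'_, T>;
}

impl<T> LockUnpoisoned<T> for Mutex<T> {
    fn lock_unpoisoned(&self) -> MutexGuard<'_, T> {
        // `lock()` returns Err(PoisonError<guard>) when poisoned;
        // `into_inner` extracts the guard so the caller can continue.
        self.lock().unwrap_or_else(PoisonError::into_inner)
    }
}
```

Recovery does not clear the poison flag, so `is_poisoned()` keeps reporting the earlier panic; `Mutex::clear_poison` (Rust 1.77+) can reset it if a caller decides the protected state is known to be consistent.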

Verification

  • cargo test -p loro-internal stale_iterator_state_repairs_missing_same_peer_continuation -- --nocapture
  • cargo test -p loro-internal larger_node_only_advances_consumed_slice -- --nocapture
  • cargo test -p loro-internal repeated_tail_splits_keep_id_to_cursor_consistent -- --nocapture
  • cargo test -p loro-internal zero_width_small_update_keeps_insert_set_non_empty -- --nocapture
  • cargo test -p loro-internal large_update_can_replace_the_tail_range -- --nocapture
  • cargo test -p loro-internal list_import_batch_stays_consistent_after_repeated_tail_splits -- --nocapture
  • cargo test -p loro-internal poison -- --nocapture
  • cargo check -p loro-internal

Notes

  • the active ImVersionVector export panic was still not reproducible in this checkout, so no speculative fix was included for that path
  • the FFI-side unpoison work lives in /Users/zxch3n/Code/loro-ffi on branch codex/unpoison-locks and is not part of this PR


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a28c207df2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread on crates/loro-internal/src/configure.rs (outdated)
zxch3n and others added 7 commits April 2, 2026 16:32
…c-hardening

# Conflicts:
#	crates/loro-internal/src/handler.rs
Three production blobs in loro-debug reliably panic inside doc.import on
loro-crdt@1.10.6:

- DagCausalIter assertion at dag/iter.rs: when a peer has multiple node
  segments in the target span and the later segment's deps fall outside
  the span, both segments reach the initial stack with zero in-degree and
  LIFO pops the higher counter first. Fix: in DagCausalIter::new, after
  out_degrees and succ are built, synthesize per-peer ordering edges so
  the lower counter must drain before the higher one is released.

- OnceCell::set(..).unwrap() double-set at oplog/loro_dag.rs:917 during
  ensure_vv_for on a diamond dep (loro-dev#929): a shared ancestor
  gets pushed onto the iterative-DFS stack by multiple paths and the
  second pop tries to initialize an already-filled cell. Fix: skip nodes
  whose vv is already Some at the top of the loop and swallow the Err
  from the final set as a defensive measure.

- list_state Index-out-of-range panic for the mads-bootstrap fixture:
  the ListDiffCalculator cold-starts its RichtextTracker via
  new_with_unknown() and never learns about the snapshot's real list
  content. During replay the tracker's per-change checkouts temporarily
  retreat some snapshot ops, so a new op's fugue anchor lands inside the
  unknown prefix. CrdtRope::get_diff() then emits Retain(N) where N is
  larger than ListState.len(). Fix: change ContainerState::apply_diff
  (and DocState::apply_diff, init_with_states_and_version) to return
  LoroResult<()>; ListState::apply_diff pre-validates the delta against
  current length and returns LoroError::internal(..) on mismatch so
  doc.import surfaces the error instead of panicking. Root cause in the
  tracker cold-start path still needs follow-up.

Regression tests for all three live in crates/loro/tests/march_2026_panics.rs
with the captured production blobs in fixtures_march_2026/. Additional
targeted unit tests in dag/iter.rs and oplog/loro_dag.rs cover the
smallest synthetic DAG shapes that triggered each bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
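The DagCausalIter fix (synthesized per-peer ordering edges) can be illustrated on a toy model. This is a sketch under stated assumptions: `Segment` and `causal_order` are hypothetical stand-ins, not Loro's real types, and deps that fall outside the span are simply omitted. It runs a Kahn-style topological walk with a LIFO stack; without the extra edges, two same-peer segments with zero in-degree pop in the wrong order.

```rust
use std::collections::HashMap;

/// Hypothetical node segment: ops from one peer starting at `counter`,
/// with in-span deps given as indices into the same slice.
#[derive(Debug, Clone)]
pub struct Segment {
    pub peer: u64,
    pub counter: i32,
    pub deps: Vec<usize>,
}

/// Kahn-style causal iteration using a LIFO stack. With `add_peer_edges`,
/// each same-peer segment gets an edge to the next higher-counter segment,
/// so the lower counter must drain before the higher one is released.
pub fn causal_order(nodes: &[Segment], add_peer_edges: bool) -> Vec<usize> {
    let n = nodes.len();
    let mut in_degree = vec![0usize; n];
    let mut succ: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (i, node) in nodes.iter().enumerate() {
        for &d in &node.deps {
            succ[d].push(i);
            in_degree[i] += 1;
        }
    }
    if add_peer_edges {
        // Group segments by peer, sort by counter, chain consecutive ones.
        let mut by_peer: HashMap<u64, Vec<usize>> = HashMap::new();
        for (i, node) in nodes.iter().enumerate() {
            by_peer.entry(node.peer).or_default().push(i);
        }
        for idxs in by_peer.values_mut() {
            idxs.sort_by_key(|&i| nodes[i].counter);
            for w in idxs.windows(2) {
                succ[w[0]].push(w[1]);
                in_degree[w[1]] += 1;
            }
        }
    }
    // Every zero in-degree node starts on the stack; LIFO pops the last one first.
    let mut stack: Vec<usize> = (0..n).filter(|&i| in_degree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(i) = stack.pop() {
        order.push(i);
        for &s in &succ[i] {
            in_degree[s] -= 1;
            if in_degree[s] == 0 {
                stack.push(s);
            }
        }
    }
    order
}
```

With two segments of peer 1 at counters 0 and 10 and no in-span deps, the plain walk yields the higher counter first; the synthesized edge restores counter order.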
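The OnceCell double-set fix can be sketched the same way. `ensure_vv_for` mirrors the name in the commit message, but `Node`, `VersionVector`, and the body below are simplified, hypothetical stand-ins for the real oplog code. The iterative DFS can push a shared ancestor more than once on a diamond; skipping nodes whose cell is already filled at the top of the loop, and ignoring the `Err` from the final `set`, avoids the `unwrap()` panic.

```rust
use std::cell::OnceCell;
use std::collections::BTreeMap;

/// Simplified version vector: peer -> exclusive end counter (assumption).
pub type VersionVector = BTreeMap<u64, u32>;

/// Hypothetical DAG node with a lazily computed version vector.
pub struct Node {
    pub peer: u64,
    pub end_counter: u32,
    pub deps: Vec<usize>,
    pub vv: OnceCell<VersionVector>,
}

pub fn ensure_vv_for(nodes: &[Node], target: usize) -> &VersionVector {
    // (node index, whether its deps have already been pushed)
    let mut stack = vec![(target, false)];
    while let Some((i, expanded)) = stack.pop() {
        let node = &nodes[i];
        // A shared ancestor may sit on the stack several times (diamond
        // deps); skip it if an earlier pop already filled its cell.
        if node.vv.get().is_some() {
            continue;
        }
        if !expanded {
            stack.push((i, true));
            for &d in &node.deps {
                if nodes[d].vv.get().is_none() {
                    stack.push((d, false));
                }
            }
            continue;
        }
        // Everything pushed after (i, true) has been drained, so deps are filled.
        let mut vv = VersionVector::new();
        for &d in &node.deps {
            for (&peer, &end) in nodes[d].vv.get().expect("dep computed first") {
                let e = vv.entry(peer).or_insert(0);
                *e = (*e).max(end);
            }
        }
        let e = vv.entry(node.peer).or_insert(0);
        *e = (*e).max(node.end_counter);
        // Defensive: ignore Err if the cell was somehow filled already.
        let _ = node.vv.set(vv);
    }
    nodes[target].vv.get().expect("target computed")
}
```

In the test shape, the target depends on both the root and a middle node that also depends on the root, so the root is pushed twice and the second pop is skipped instead of hitting a double `set`.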
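For the list_state case, the commit describes pre-validating the delta against the current length and returning an error instead of panicking. A minimal sketch of that idea follows; `DeltaItem`, `InternalError`, and `apply_list_diff` are hypothetical stand-ins, whereas the actual change returns `LoroResult<()>` with `LoroError::internal(..)`. It only guards against the out-of-range `Retain(N)`; it does not address the tracker cold-start root cause.

```rust
/// Hypothetical list delta item (stand-in for the real diff type).
#[derive(Debug, Clone, PartialEq)]
pub enum DeltaItem<T> {
    Retain(usize),
    Insert(Vec<T>),
    Delete(usize),
}

/// Hypothetical error type standing in for LoroError::internal(..).
#[derive(Debug, PartialEq)]
pub struct InternalError(pub String);

pub fn apply_list_diff<T: Clone>(
    list: &mut Vec<T>,
    delta: &[DeltaItem<T>],
) -> Result<(), InternalError> {
    // Pre-validate: Retain and Delete consume existing elements, so their
    // total must not exceed the current length.
    let mut consumed = 0usize;
    for item in delta {
        match item {
            DeltaItem::Retain(n) | DeltaItem::Delete(n) => consumed += *n,
            DeltaItem::Insert(_) => {}
        }
    }
    if consumed > list.len() {
        return Err(InternalError(format!(
            "list diff consumes {consumed} elements but list has only {}",
            list.len()
        )));
    }
    // Validation passed, so no range below can go out of bounds.
    let mut pos = 0usize;
    for item in delta {
        match item {
            DeltaItem::Retain(n) => pos += *n,
            DeltaItem::Insert(values) => {
                list.splice(pos..pos, values.iter().cloned());
                pos += values.len();
            }
            DeltaItem::Delete(n) => {
                list.drain(pos..pos + *n);
            }
        }
    }
    Ok(())
}
```

Validating before mutating means a rejected delta leaves the list untouched, which is what lets the caller surface the error from `doc.import` without leaving half-applied state behind.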
zxch3n changed the title from "Fix March 2026 production panic regressions" to "Fix production panic regressions" on Apr 21, 2026
zxch3n merged commit 6f5b7a9 into loro-dev:main on Apr 21, 2026
1 check passed
zxch3n deleted the codex/march-2026-panic-hardening branch on April 21, 2026 12:57