Roll back config handler on persistence failure (#469.2)#495
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughRefactors /server/config/set to record pre-mutation value, attempt disk persistence, roll back the in-memory value on ChangesConfig set handler persistence with rollback
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGELOG.md`:
- Around line 3-8: Add a new top-level versioned release entry in CHANGELOG.md
above the existing [Unreleased] section using semantic versioning (e.g., X.Y.Z)
and a YYYY-MM-DD date, and move the current Unreleased bullet ("Config handler
persistence atomicity — /server/config/set ...") into the Fixed section of that
new release; ensure the new release includes the standard sections: Changed,
Added, Fixed, Removed, Security (even if some are empty), and leave the existing
[Unreleased] header in place for future work.
In `@src/python/web/handler/config.py`:
- Around line 81-91: The rollback currently always restores old_value on
OSError, which can overwrite a newer concurrent update; change the except
OSError handler so it first reads the current value (e.g., current =
getattr(inner_config, key)) and only call inner_config.set_property(key,
old_value) if current == value (i.e., the in-memory value still equals the
failed-to-persist value); otherwise skip the rollback, log that a newer value is
present, and still return the 500 response. Reference inner_config.set_property,
getattr(inner_config, key), self.__config.to_file, self.__config_path and
self.__logger when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b8f92149-8808-485a-88c8-c3a123b352f3
📒 Files selected for processing (3)
CHANGELOG.mdsrc/python/tests/integration/test_web/test_handler/test_config.pysrc/python/web/handler/config.py
CodeRabbit follow-up on #495. The unconditional rollback could clobber a newer value if a concurrent request set_property'd between our own set_property and the OSError. The except branch now reads the current in-memory value, compares it to the native form of what we set, and only restores old_value if they still match. Otherwise it logs that a newer value is present and still returns 500 — the persistence failure is real, but the rollback is unsafe. Adds test_set_persistence_failure_skips_rollback_when_concurrent_update: to_file's side_effect simulates a concurrent write by mutating the field to "WARNING" before raising OSError; the test asserts the rollback didn't revert it to "INFO". Still TOCTOU between the equality check and the restore set_property, but strictly better than unconditional rollback. A proper fix needs a handler-level lock around set+persist, which is out of scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
set_property previously mutated the in-memory Config before attempting to_file, so a disk failure left the runtime ahead of the on-disk state and could fire the LFTP hot-reload callback on a value that never persisted. Now: capture the old native value, mutate, attempt the write, and on OSError restore the old value and return a structured HTTP 500. The LFTP callback only fires after a successful write. Capturing-and-rolling-back is preferred over copy-then-swap because the Config instance is shared with the Controller via Context — a swap on the handler's reference wouldn't propagate to the rest of the app. Adds three integration tests covering the failure rollback, the no-callback guarantee on a hot-reload key, and a positive baseline that the callback still fires on successful writes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit follow-up on #495. The unconditional rollback could clobber a newer value if a concurrent request set_property'd between our own set_property and the OSError. The except branch now reads the current in-memory value, compares it to the native form of what we set, and only restores old_value if they still match. Otherwise it logs that a newer value is present and still returns 500 — the persistence failure is real, but the rollback is unsafe. Adds test_set_persistence_failure_skips_rollback_when_concurrent_update: to_file's side_effect simulates a concurrent write by mutating the field to "WARNING" before raising OSError; the test asserts the rollback didn't revert it to "INFO". Still TOCTOU between the equality check and the restore set_property, but strictly better than unconditional rollback. A proper fix needs a handler-level lock around set+persist, which is out of scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
efd0876 to
8d107e8
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/python/web/handler/config.py (1)
81-105:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftSerialize all config writes; this key-local rollback is still unsafe.
Because
self.__config.to_file(...)persists the entire shared config, the guard at Lines 95-97 only covers “the same key changed.” A second request can successfully persist some other key while this request’s value is still present in memory, which also writes this request’sset_nativevalue to disk; if thisexceptblock then restoresold_value, memory falls behind disk again. There is also still a read-check-write race between Lines 95 and 97. This flow needs a lock shared by every writer of the sameConfigaround mutate →to_file→ rollback/callback, not another point-in-time equality check.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/python/web/handler/config.py` around lines 81 - 105, The current rollback is unsafe because self.__config.to_file persists the whole shared config and the point-in-time equality check can miss concurrent writer races; fix by serializing writers with a shared lock around the entire mutate→persist→rollback sequence: acquire a per-Config writer lock before calling inner_config.set_property(key, value), keep the lock while calling self.__config.to_file(self.__config_path), and only perform the rollback (inner_config.set_property(key, old_value)) while still holding that same lock; add or reuse a lock attribute (e.g., self.__config_lock or attach a lock to self.__config) and use it in the try/except that currently surrounds set_property and to_file, and log via self.__logger as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/python/web/handler/config.py`:
- Around line 81-105: The current rollback is unsafe because
self.__config.to_file persists the whole shared config and the point-in-time
equality check can miss concurrent writer races; fix by serializing writers with
a shared lock around the entire mutate→persist→rollback sequence: acquire a
per-Config writer lock before calling inner_config.set_property(key, value),
keep the lock while calling self.__config.to_file(self.__config_path), and only
perform the rollback (inner_config.set_property(key, old_value)) while still
holding that same lock; add or reuse a lock attribute (e.g., self.__config_lock
or attach a lock to self.__config) and use it in the try/except that currently
surrounds set_property and to_file, and log via self.__logger as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4634718e-c3fa-4c1b-8d11-b4cafcdb0f4d
📒 Files selected for processing (2)
src/python/tests/integration/test_web/test_handler/test_config.pysrc/python/web/handler/config.py
ruff 0.15.13 collapses the multi-line .get() call back onto one line (112 chars, under the 120 limit). Fix the format-check CI failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit follow-up on #495. The point-in-time equality check from the previous round was racy on two axes: 1. TOCTOU between the post-failure getattr and the rollback set_property. 2. self.__config.to_file persists the whole shared config, so a concurrent writer on a different section could have half-applied state captured into the file regardless of the equality check. Replace the check with a threading.Lock acquired before set_property and released after rollback (or success). Inside the lock no other writer can interleave, so the rollback is unconditional again. Drops the simulated-concurrent-write test (the scenario can't happen with the lock held) and adds test_set_serializes_concurrent_writers, which spawns two real threads: writer A blocks inside to_file on an Event, writer B issues its set request and must wait for A's rollback to complete before its own mutate→persist runs. The final state matches B's value, proving the lock fully serialized them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…4)
Both handlers previously let OSError from to_file bubble up as an
unhandled HTTP 500 with a stack trace. They now catch OSError, log via
logger.exception, and return a controlled 500 with a structured body.
Auto-queue endpoints return text/plain "Failed to persist auto-queue".
Path-pairs endpoints return JSON {"error": "failed to persist path
pairs"}; the persistence logic is extracted into a small helper so the
three CRUD handlers share the same error path.
Deliberately does not roll back the in-memory mutation. The issue
explicitly accepts the in-memory/disk divergence on these endpoints —
the goal is to convert a stack trace into a recoverable client error
the UI can react to. (Contrast with #495, where rollback was needed
because the alternative was the LFTP runtime hot-reloading to a value
that never made it to disk.) For auto-queue specifically, rollback is
also undesirable because add_pattern fires a listener side-effect that
a remove_pattern rollback wouldn't undo.
Adds five integration tests: two for auto-queue add/remove, three for
path-pairs create/update/delete. Each patches to_file to raise OSError
and asserts the 500 response with the expected body.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Addresses item #2 in #469.
`/server/config/set` previously mutated the in-memory `Config` before attempting `to_file`. If the write failed (disk full, permissions, …), the in-memory state ended up ahead of the on-disk state, and the LFTP hot-reload callback could fire on a value that never persisted — so after a restart the runtime would silently revert to the on-disk value, contradicting what the UI showed.
The handler now:
Deviation from the issue's suggested approach
The original write-up suggested a copy-then-write-then-swap pattern. That doesn't actually work in this codebase because the `Config` instance is shared with the `Controller` via `Context` — swapping the handler's reference wouldn't propagate to the rest of the app, and the LFTP runtime would still read the old config. Capture-and-rollback is the equivalent semantically and works with the existing aliasing. Called out in the commit message.
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit