
Improve ops orchestrator shutdown responsiveness, diagnostics, and tests#252

Merged
noog6 merged 1 commit into main from
codex/refactor-shutdown-process-in-ops_orchestrator
Feb 19, 2026


@noog6 noog6 commented Feb 19, 2026

Motivation

  • Make orchestrator shutdown more responsive under blocked probes or lock contention by waking the loop immediately after requesting stop.
  • Reduce contention during probe/tick so shutdown metadata capture can proceed without being blocked by long critical sections.
  • Improve observability of shutdown progress so callers can tell whether the loop thread is still alive, and emit one final heartbeat when the loop observes the stop request.
  • Add tests to cover retry-after-wake join behavior and final-heartbeat emission.

Description

  • Added a lightweight loop wake event (_loop_wake_event) and a shutdown_requested_at marker to OpsOrchestrator, and updated start_loop/stop_loop to set the marker, set stop_event, and explicitly wake the loop before joining.
  • Implemented a retry join: after the first bounded join times out, the code explicitly wakes the loop, yields (time.sleep(0)), and then attempts a grace join.
  • Reduced lock hold time in _tick by constructing the HealthSnapshot while holding the lock and emitting it only after the lock is released, which avoids long critical sections.
  • Added _wake_loop and _emit_shutdown_heartbeat helpers so the loop emits one final heartbeat/log when it observes stop_event.
  • Enhanced main.py shutdown handling to include whether the loop thread is still alive (is_loop_alive()) in the shutdown warning and to emit explicit follow-up guidance when stop_loop returns timed_out.
  • Extended and updated the unit tests in tests/ to cover timeout/warning behavior, lock contention handling, explicit wake + retry behavior, and final heartbeat emission, and adjusted tests/test_main_ops_shutdown_warning.py to assert the new logging behavior.
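
For readers without the diff open, the shutdown flow described above can be sketched roughly as follows. This is an illustrative stand-in, not the code merged in this PR: the class name SketchOrchestrator, the emit callback, the HealthSnapshot fields, the timeout parameters, and the string return values are all assumptions. Only the names _loop_wake_event, shutdown_requested_at, _wake_loop, _emit_shutdown_heartbeat, _tick, start_loop, stop_loop, and is_loop_alive come from the description.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class HealthSnapshot:
    # Hypothetical fields; the real HealthSnapshot is not shown in this PR text.
    tick_count: int
    shutdown_requested_at: Optional[float]


class SketchOrchestrator:
    """Illustrative stand-in for OpsOrchestrator (assumed shape, not the real class)."""

    def __init__(self, emit: Callable[[str, HealthSnapshot], None],
                 interval_s: float = 30.0) -> None:
        self._emit = emit
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._stop_event = threading.Event()
        self._loop_wake_event = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self._tick_count = 0
        self.shutdown_requested_at: Optional[float] = None

    def start_loop(self) -> None:
        self._stop_event.clear()
        self._loop_wake_event.clear()
        self.shutdown_requested_at = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def is_loop_alive(self) -> bool:
        return self._thread is not None and self._thread.is_alive()

    def _wake_loop(self) -> None:
        # Interrupts the interval wait in _run so the loop re-checks stop_event.
        self._loop_wake_event.set()

    def _tick(self) -> None:
        # Build the snapshot under the lock, emit it after the lock is released.
        with self._lock:
            self._tick_count += 1
            snapshot = HealthSnapshot(self._tick_count, self.shutdown_requested_at)
        self._emit("heartbeat", snapshot)

    def _emit_shutdown_heartbeat(self) -> None:
        with self._lock:
            snapshot = HealthSnapshot(self._tick_count, self.shutdown_requested_at)
        self._emit("shutdown_heartbeat", snapshot)

    def _run(self) -> None:
        while not self._stop_event.is_set():
            self._tick()
            # Sleep for the interval, or return early when woken.
            self._loop_wake_event.wait(self._interval_s)
            self._loop_wake_event.clear()
        # The loop has observed the stop request: emit exactly one final heartbeat.
        self._emit_shutdown_heartbeat()

    def stop_loop(self, timeout_s: float = 1.0, grace_s: float = 0.5) -> str:
        # The marker is written without taking the lock so a stop request is
        # never delayed by a long tick (a simplification made for this sketch).
        self.shutdown_requested_at = time.monotonic()
        self._stop_event.set()
        self._wake_loop()
        thread = self._thread
        if thread is None:
            return "stopped"
        thread.join(timeout_s)       # first bounded join
        if thread.is_alive():
            self._wake_loop()        # explicit wake before retrying
            time.sleep(0)            # yield so the loop thread can run
            thread.join(grace_s)     # grace join
        return "timed_out" if thread.is_alive() else "stopped"
```

In this sketch a caller such as main.py would check the return value of stop_loop and, on "timed_out", log is_loop_alive() alongside the warning. Setting stop_event before the wake event matters: whichever way the loop's wait and clear interleave with the stop request, the next loop condition check sees the stop flag.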

Testing

  • Ran the environment setup command python -m pip install pyyaml -q, which completed successfully.
  • Ran targeted unit tests with pytest -q tests/test_ops_orchestrator_shutdown.py tests/test_main_ops_shutdown_warning.py, and observed 12 passed in 0.59s.
  • All modified and related shutdown tests passed locally.


@noog6 noog6 merged commit 761a7a2 into main Feb 19, 2026
@noog6 noog6 deleted the codex/refactor-shutdown-process-in-ops_orchestrator branch February 19, 2026 02:31