Conversation
|
Important Review skippedBot user detected. To trigger a single review, invoke the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| await lease_scope.before_lease_hook.wait() | ||
| # Safety timeout: prevent permanent deadlock if before_lease_hook | ||
| # was never set due to a race (e.g. conn_tg cancelled early). | ||
| with move_on_after(30) as timeout_scope: |
There was a problem hiding this comment.
I think a problem could arise here if 30 < the configure before_lease hook timeout in the exporter
There was a problem hiding this comment.
Good catch. You're right that the hardcoded 30-second timeout could conflict with the configured hook timeout (which defaults to 120s and is user-configurable).
That said, looking at the code flow more carefully, this timeout should never actually fire in practice with the primary fix in place:
run_before_lease_hook(which runs the actual hook) always callsbefore_lease_hook.set()in its ownfinallyblock (hooks.py line 623)- When there's no hook, the new
finallyblock at line 754-755 inhandle_lease()sets it before_cleanup_after_leaseis even called
So by the time _cleanup_after_lease reaches the wait(), the event should already be set. The 30-second timeout is purely defense-in-depth for unforeseen edge cases.
Still, I agree it would be better to either:
- Use the configured hook timeout (e.g.
self.hook_executor.config.before_lease.timeout) plus a margin, or - Use a longer default (e.g. 300s) since this is a last-resort safety net and not a normal code path
I'll update the timeout to be derived from the hook's configured timeout when a hook executor is present, falling back to a generous default otherwise.
Fix planAddressing @mangelajo's feedback and CI failures: 1. Dynamic safety timeout (addresses review feedback)The hardcoded 30-second safety timeout in
2. Add test coverage for new code paths (fixes CI coverage failure)The pytest-matrix checks are failing because the new lines (602-605, 608, 754-755) have 0% diff coverage. Will add tests covering:
3. type-check-python failureThis failure is in |
| # was never set due to a race (e.g. conn_tg cancelled early). | ||
| # Use the configured hook timeout (+ margin) when available so we | ||
| # never interrupt a legitimately-running beforeLease hook. | ||
| safety_timeout = 300 # generous default for no-hook / unknown cases |
There was a problem hiding this comment.
| safety_timeout = 300 # generous default for no-hook / unknown cases | |
| safety_timeout = 15 # generous default for no-hook / unknown cases |
For non hook I would even be more agressive, to match the server/client 30 sec connection timeouts.
There was a problem hiding this comment.
Applied -- changed the default to 15s. Good call on aligning it with the server/client connection timeouts. Also rebased onto main to pick up the type-check fix from #568.
|
This is failing coverage checks, and could benefit from a rebase (that fixes type checking linter) |
Fix plan (addressing latest feedback)1. Apply suggestion: lower default safety timeout to 15s@mangelajo's suggestion makes sense — the 300s default is only used when there's no hook executor configured. In the no-hook case, 2. Rebase onto mainThis will pick up the type-check fix for 3. CoverageThe existing tests cover the timeout logic but the coverage tool doesn't pick them up because they patch |
For exporters without a hook_executor, the before_lease_hook event is only set inside the conn_tg task group. When a lease ends quickly (before conn_tg reaches the set() call), wait_for_lease_end() cancels conn_tg, and before_lease_hook is never set. Then _cleanup_after_lease() blocks forever on before_lease_hook.wait() inside a shielded scope, which also prevents after_lease_hook_done from being set, causing serve() to hang permanently. Two fixes applied: 1. Primary: Set before_lease_hook unconditionally in the finally block of handle_lease(), before calling _cleanup_after_lease. 2. Defense-in-depth: Add a 30-second safety timeout with move_on_after in _cleanup_after_lease when waiting for before_lease_hook. Fixes #567 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address review feedback from @mangelajo: the hardcoded 30s safety timeout in _cleanup_after_lease could conflict with the user-configured hook timeout (default 120s). Now uses hook_timeout + 30s margin when a before_lease hook is configured, falling back to 300s otherwise. Add tests covering the safety timeout and finally-block code paths to fix the diff-coverage CI failure (lines 602-608, 754-755). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
540dafd to
01f109c
Compare
Add [tool.coverage.run] source = ["."] to the jumpstarter package's pyproject.toml so that coverage.xml records the correct source directory path. Without this, the <source> element in coverage.xml was empty, causing diff-cover to fail to match coverage data against git diff paths -- resulting in 0% reported coverage for all changed lines despite tests actually covering them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Coverage fix pushedThe Root cause: When Fix: Added With this fix, local diff-cover reports 80% coverage on Note: The CI workflow ( |
Summary
conn_tggets cancelled beforebefore_lease_hook.set()is reached for no-hook exporters, causing_cleanup_after_lease()to deadlock forever onbefore_lease_hook.wait()inside a shielded scopebefore_lease_hook.set()inhandle_lease()'sfinallyblock as primary fix_cleanup_after_lease()as defense-in-depth against future similar races — uses the configured hook timeout + 30s margin when a before_lease hook is present, or 15s default otherwise (addresses review feedback from @mangelajo)[tool.coverage.run] source = ["."]to the jumpstarter package'spyproject.tomlso thatcoverage.xmlrecords the correct source directory, enablingdiff-coverto match coverage data against git diff pathsFixes #567
Test plan
exporter_test.py,exporter_retry_test.py,lease_context_test.py,hooks_test.py)🤖 Generated with Claude Code