fix(dispatcher): gate scheduler by execution_mode (no PSTN dial for Daily leads) #773
Daily-mode leads (DAILY / DAILY_TEST / DAILY_STREAM) were being placed on
the dispatch ZSET and picked up by workers, which then dialled via the
telephony provider (Plivo/Twilio) — producing phantom PSTN calls for
what should have been web-mode Daily room joins.
Root cause: three ingest paths only gated `schedule_lead` on
`next_attempt_at is not None`. The reconciler SQL query already filtered
to `('TELEPHONY', 'TELEPHONY_TEST')` — the ingest paths didn't match it.
Fix:
- New `DISPATCHABLE_EXECUTION_MODES` + `is_dispatchable()` helper in
dispatch/queue.py, mirroring the reconciler's WHERE clause.
- Gate the three ingest sites: lead-create handler, /dispatch-now
endpoint (400 if non-dispatchable), retry path in handle_call_completion.
- Defensive backstop in Worker._dispatch: drop with error log if a
non-dispatchable mode somehow reaches the worker.
All 5 schedule layers now match the same filter. Adding a new
dispatchable mode in future = change one constant.
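The ingest-side gate described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the project's code: `ExecutionMode` is a stand-in enum listing only the modes named in this message, the `schedule` dict stands in for the dispatch ZSET, and `maybe_schedule`, `dispatch_now`, and `NotDispatchableError` are hypothetical names (the real endpoint returns HTTP 400 instead of raising).

```python
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ExecutionMode(str, Enum):
    # Stand-in enum; only the modes named in this PR are listed.
    TELEPHONY = "TELEPHONY"
    TELEPHONY_TEST = "TELEPHONY_TEST"
    DAILY = "DAILY"
    DAILY_TEST = "DAILY_TEST"
    DAILY_STREAM = "DAILY_STREAM"


DISPATCHABLE_EXECUTION_MODES = frozenset(
    {ExecutionMode.TELEPHONY, ExecutionMode.TELEPHONY_TEST}
)


def is_dispatchable(execution_mode: ExecutionMode) -> bool:
    return execution_mode in DISPATCHABLE_EXECUTION_MODES


class NotDispatchableError(Exception):
    """Hypothetical stand-in for the HTTP 400 returned by /dispatch-now."""


# Hypothetical in-memory stand-in for the dispatch ZSET: lead id -> score.
schedule: dict = {}


def maybe_schedule(
    lead_id: str, mode: ExecutionMode, next_attempt_at: Optional[datetime]
) -> bool:
    """Lead-create / retry path: quietly skip non-dispatchable leads."""
    if next_attempt_at is None or not is_dispatchable(mode):
        return False
    schedule[lead_id] = next_attempt_at.timestamp()
    return True


def dispatch_now(lead_id: str, mode: ExecutionMode) -> None:
    """Explicit request path: reject loudly instead of skipping."""
    if not is_dispatchable(mode):
        raise NotDispatchableError(f"execution_mode {mode.value} is not dispatchable")
    schedule[lead_id] = datetime.now(timezone.utc).timestamp()
```

The two paths differ on purpose in this sketch: background scheduling skips silently because a Daily lead with `next_attempt_at` set is not an error, while an explicit dispatch request for such a lead is a caller mistake and is reported.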
Tests:
- test_is_dispatchable_telephony_modes_only: pins which modes pass.
- test_daily_lead_reaching_worker_is_dropped: regression for the
reported bug — confirms a Daily lead on the ready list does not
trigger make_call, channel acquire, or lock acquire.
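For readers without the test files open, the shape of the worker regression test is roughly as below. This is a sketch under stated assumptions, not the actual test: `FakeWorker` and `Lead` are hypothetical, the real `_dispatch` is likely asynchronous and talks to Redis and the telephony provider, and the order of the lock and channel steps is assumed.

```python
import logging
from dataclasses import dataclass, field
from enum import Enum

logger = logging.getLogger("dispatch.worker")


class ExecutionMode(str, Enum):
    # Stand-in enum; the real one has more members.
    TELEPHONY = "TELEPHONY"
    TELEPHONY_TEST = "TELEPHONY_TEST"
    DAILY = "DAILY"


DISPATCHABLE_EXECUTION_MODES = frozenset(
    {ExecutionMode.TELEPHONY, ExecutionMode.TELEPHONY_TEST}
)


@dataclass
class Lead:
    id: str
    execution_mode: ExecutionMode


@dataclass
class FakeWorker:
    """Hypothetical worker that records side effects instead of performing them."""

    calls: list = field(default_factory=list)

    def _dispatch(self, lead: Lead) -> bool:
        # Backstop: refuse before any lock, channel, or dial side effect.
        if lead.execution_mode not in DISPATCHABLE_EXECUTION_MODES:
            logger.error(
                "dropping non-dispatchable lead %s (%s)",
                lead.id,
                lead.execution_mode.value,
            )
            return False
        # Assumed ordering of the side effects the real worker performs.
        self.calls.append(("acquire_lock", lead.id))
        self.calls.append(("acquire_channel", lead.id))
        self.calls.append(("make_call", lead.id))
        return True


def test_daily_lead_reaching_worker_is_dropped() -> None:
    worker = FakeWorker()
    dispatched = worker._dispatch(Lead("lead-1", ExecutionMode.DAILY))
    assert dispatched is False
    assert worker.calls == []  # no lock, no channel, no make_call
```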
94 passed, 1 xfailed; pyrefly 0 errors; black/isort clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walkthrough
This PR implements execution-mode-based dispatch filtering for Breeze Buddy's Plane 2 telephony system.
Pull request overview
This PR fixes a dispatcher bug where Daily/web-mode leads could be enqueued into the Redis dispatch pipeline and ultimately dialed via PSTN, by introducing a single execution-mode gate and applying it across ingress and worker layers.
Changes:
- Adds `is_dispatchable()` + `DISPATCHABLE_EXECUTION_MODES` (TELEPHONY + TELEPHONY_TEST only) as a shared source of truth.
- Gates lead scheduling in the lead-create handler, `/dispatch-now`, and retry scheduling; adds a worker-side defensive drop.
- Adds regression tests to ensure non-telephony execution modes never reach the PSTN `make_call()` path.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/breeze_buddy/dispatch/test_queue.py | Adds unit test for is_dispatchable() filtering to telephony modes only. |
| tests/breeze_buddy/dispatch/test_end_to_end.py | Adds regression test ensuring a DAILY lead on the ready list is dropped before channel acquisition / dialing. |
| app/api/routers/breeze_buddy/leads/handlers.py | Gates initial scheduling and /dispatch-now to dispatchable execution modes only. |
| app/ai/voice/agents/breeze_buddy/managers/calls.py | Prevents retry ZADDs for non-dispatchable execution modes. |
| app/ai/voice/agents/breeze_buddy/dispatch/worker.py | Adds worker backstop to drop non-dispatchable leads before dialing. |
| app/ai/voice/agents/breeze_buddy/dispatch/queue.py | Introduces DISPATCHABLE_EXECUTION_MODES and is_dispatchable() helper. |
| app/ai/voice/agents/breeze_buddy/dispatch/__init__.py | Re-exports the new helper/constant for package consumers. |

    if not is_dispatchable(lead.execution_mode):
        # Defensive backstop. The ingest paths (handler, retry,
        # dispatch-now) and the reconciler query all filter non-
        # dispatchable modes out, so this should never fire. If it
        # does, drop the lead with a loud log rather than dialling

    lead_execution_mode = req.execution_mode or ExecutionMode.TELEPHONY
    if next_attempt_at is not None and is_dispatchable(lead_execution_mode):
        await schedule_lead(lead_id=uuid, next_attempt_at=next_attempt_at)

Defence-in-depth follow-up to PR #773.
If a worker crashes between RPUSH-processing and LREM-processing while holding a DAILY (or any non-dispatchable mode) lead, the reaper currently re-ZADDs it onto SCHEDULE_ZSET unconditionally, creating a wasted promote -> worker-drop cycle once per reaper tick.
The worker's defensive backstop in _dispatch already drops such leads without dialling (PR #773), so there's no phantom PSTN call. But the reaper shouldn't keep putting them back on the schedule.
This change mirrors the worker's gate inside reap_stuck_processing_lists: clean the tracking entry (LREM) regardless, but only re-ZADD when the lead is still dispatchable.
Frequency analysis: expected near-zero occurrence (requires a Daily lead to bypass all 5 ingest gates AND a worker crash inside the ~50ms pre-check window). Fix is for invariant consistency and future-regression hardening, not for an observed incident.
Tests:
- test_reaper_does_not_reschedule_non_dispatchable_lead: pins the new behaviour, in which a stranded DAILY lead is LREM'd but NOT re-ZADD'd.
- test_reaper_still_reschedules_dispatchable_lead: regression guard ensuring TELEPHONY crash-recovery still works.
96 passed, 1 xfailed; pyrefly 0 errors; black/isort clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
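The reaper rule in this follow-up ("LREM regardless, re-ZADD only when dispatchable") can be illustrated with plain Python containers. This is a hedged sketch, not `reap_stuck_processing_lists` itself: the list, the dicts, and the `reap_stuck` name are hypothetical stand-ins for the Redis structures, the way the real reaper looks up a lead's execution mode is assumed, and the staleness check that decides which entries count as stuck is omitted.

```python
from enum import Enum


class ExecutionMode(str, Enum):
    # Stand-in enum; the real one has more members.
    TELEPHONY = "TELEPHONY"
    TELEPHONY_TEST = "TELEPHONY_TEST"
    DAILY = "DAILY"


DISPATCHABLE_EXECUTION_MODES = frozenset(
    {ExecutionMode.TELEPHONY, ExecutionMode.TELEPHONY_TEST}
)


def reap_stuck(processing: list, schedule: dict, modes: dict, now: float) -> int:
    """Clear every stranded lead; put only dispatchable ones back on the schedule.

    processing: stand-in for one worker's processing list (lead ids)
    schedule:   stand-in for SCHEDULE_ZSET (lead id -> score)
    modes:      lead id -> ExecutionMode (assumed lookup; the real source may differ)
    Returns the number of leads rescheduled.
    """
    rescheduled = 0
    for lead_id in list(processing):  # iterate over a copy while mutating
        processing.remove(lead_id)  # LREM equivalent: always clean the entry
        if modes[lead_id] in DISPATCHABLE_EXECUTION_MODES:
            schedule[lead_id] = now  # ZADD equivalent: retry immediately
            rescheduled += 1
    return rescheduled
```

With one stranded DAILY lead and one stranded TELEPHONY lead, both leave the processing list but only the TELEPHONY lead reappears on the schedule, which is the pair of behaviours the two new tests pin.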
Summary
Follow-up to PR #770. Fixes a real bug observed in testing: Daily-mode leads were getting scheduled by the dispatcher and dialled out via the telephony provider (Plivo / Twilio) — producing phantom PSTN calls for what should have been web-mode Daily room joins.
Root cause
PR #770's reconciler SQL already filtered to TELEPHONY only:
```sql
AND "execution_mode" IN ('TELEPHONY', 'TELEPHONY_TEST')
```
But the three ingest sites that put leads onto the schedule didn't have the same filter: they only checked `next_attempt_at is not None`. So a Daily lead with `next_attempt_at` set entered the dispatch ZSET, got promoted by the promoter, picked up by a worker, and dialled via `make_call()` over PSTN.
Fix: single source of truth
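New helper in `dispatch/queue.py`:
```python
# Modes the dispatcher may dial over PSTN; mirrors the reconciler's SQL filter.
DISPATCHABLE_EXECUTION_MODES = frozenset(
    {ExecutionMode.TELEPHONY, ExecutionMode.TELEPHONY_TEST}
)


def is_dispatchable(execution_mode: ExecutionMode) -> bool:
    """Return True only for modes that may be placed on the dispatch schedule."""
    return execution_mode in DISPATCHABLE_EXECUTION_MODES
```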
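Because the constant is meant to mirror the reconciler's SQL filter, one possible guard against the two drifting apart is a small test that parses the literal list out of the SQL and compares it with the constant. This is only a suggestion sketched under assumptions: `ExecutionMode` here is a stand-in enum, and `RECONCILER_FILTER`, `modes_in_sql_filter`, and the test name are hypothetical; the real query may not be available as a module-level string.

```python
import re
from enum import Enum


class ExecutionMode(str, Enum):
    # Stand-in enum; the real one has more members.
    TELEPHONY = "TELEPHONY"
    TELEPHONY_TEST = "TELEPHONY_TEST"
    DAILY = "DAILY"


DISPATCHABLE_EXECUTION_MODES = frozenset(
    {ExecutionMode.TELEPHONY, ExecutionMode.TELEPHONY_TEST}
)

# The reconciler's filter as quoted in this PR description.
RECONCILER_FILTER = "AND \"execution_mode\" IN ('TELEPHONY', 'TELEPHONY_TEST')"


def modes_in_sql_filter(sql: str) -> frozenset:
    """Extract the single-quoted literals from the first IN (...) list."""
    match = re.search(r"IN\s*\(([^)]*)\)", sql)
    if match is None:
        raise ValueError("no IN (...) list found")
    return frozenset(re.findall(r"'([^']*)'", match.group(1)))


def test_constant_matches_reconciler_filter() -> None:
    expected = frozenset(mode.value for mode in DISPATCHABLE_EXECUTION_MODES)
    assert modes_in_sql_filter(RECONCILER_FILTER) == expected
```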
Now used by all 5 layers, matching the reconciler's filter exactly:
- `handlers.py:354`: skip schedule if non-dispatchable
- `handlers.py:780`: return 400 if non-dispatchable
- `calls.py:398`: skip schedule if parent lead non-dispatchable
- `worker.py:_dispatch`: drop with error log
Why a follow-up PR
PR #770 had already been merged when this bug was reported. The Daily-mode fix went onto the now-stale branch via force-push, so it never reached `release`. This PR cherry-picks just the Daily-fix delta (7 files, +128/-4 lines) onto a fresh branch off `release`.
Tests
- `test_is_dispatchable_telephony_modes_only`
- `test_daily_lead_reaching_worker_is_dropped`
Test plan
- […] `next_attempt_at` set; verify it does NOT appear in `bb:schedule:leads` (ZCARD doesn't increment)
- […] `next_attempt_at`: verify it dispatches as before
- `POST /leads/{daily_lead_id}/dispatch-now`: verify 400 response with clear error
- `POST /leads/{telephony_lead_id}/dispatch-now`: verify 200 as before
Quality gates
🤖 Generated with Claude Code