Skip to content

fix(dispatcher): gate scheduler by execution_mode (no PSTN dial for Daily leads)#773

Merged
swaroopvarma1 merged 1 commit into
releasefrom
fix/dispatcher-daily-mode-gate
May 21, 2026
Merged

fix(dispatcher): gate scheduler by execution_mode (no PSTN dial for Daily leads)#773
swaroopvarma1 merged 1 commit into
releasefrom
fix/dispatcher-daily-mode-gate

Conversation

@swaroopvarma1
Copy link
Copy Markdown
Collaborator

@swaroopvarma1 swaroopvarma1 commented May 20, 2026

Summary

Follow-up to PR #770. Fixes a real bug observed in testing: Daily-mode leads were getting scheduled by the dispatcher and dialled out via the telephony provider (Plivo / Twilio) — producing phantom PSTN calls for what should have been web-mode Daily room joins.

Root cause

PR #770's reconciler SQL already filtered to TELEPHONY only:
```sql
AND "execution_mode" IN ('TELEPHONY', 'TELEPHONY_TEST')
```

But the three ingest sites that put leads onto the schedule didn't have the same filter — they only checked next_attempt_at is not None. So a Daily lead with next_attempt_at set entered the dispatch ZSET, got promoted by the promoter, picked up by a worker, and dialled via make_call() over PSTN.

Fix — single source of truth

New helper in dispatch/queue.py:
```python
DISPATCHABLE_EXECUTION_MODES = frozenset(
{ExecutionMode.TELEPHONY, ExecutionMode.TELEPHONY_TEST}
)

def is_dispatchable(execution_mode: ExecutionMode) -> bool:
return execution_mode in DISPATCHABLE_EXECUTION_MODES
```

Now used by all 5 layers, matching the reconciler's filter exactly:

Layer Site
Lead-create handler handlers.py:354 — skip schedule if non-dispatchable
/dispatch-now endpoint handlers.py:780 — return 400 if non-dispatchable
Retry creation calls.py:398 — skip schedule if parent lead non-dispatchable
Worker defensive backstop worker.py:_dispatch — drop with error log
Reconciler SQL unchanged (already correct)

Why a follow-up PR

PR #770 had already been merged when this bug was reported. The Daily-mode fix went onto the now-stale branch via force-push, so it never reached release. This PR cherry-picks just the Daily-fix delta (7 files, +128/-4 lines) onto a fresh branch off release.

Tests

Test Pins
test_is_dispatchable_telephony_modes_only TELEPHONY / TELEPHONY_TEST pass; DAILY / DAILY_TEST / DAILY_STREAM / HOLD_TRANSFER fail
test_daily_lead_reaching_worker_is_dropped Regression test — confirms a Daily lead placed directly on the ready list exits the worker before make_call / channel acquire / lock acquire

Test plan

  • CI green: black --check, isort --check, autoflake --check, pyrefly check, 1-commit count
  • Apply on staging
  • Create a DAILY lead with next_attempt_at set; verify it does NOT appear in bb:schedule:leads (ZCARD doesn't increment)
  • Verify no Plivo/Twilio dial happens for the Daily lead
  • Verify the existing Daily-mode flow (room creation + customer notification) still works
  • Create a TELEPHONY lead with next_attempt_at — verify it dispatches as before
  • Hit POST /leads/{daily_lead_id}/dispatch-now — verify 400 response with clear error
  • Hit POST /leads/{telephony_lead_id}/dispatch-now — verify 200 as before

Quality gates

Check Result
pytest 94 passed, 1 xfailed
pyrefly 0 errors
black / isort clean
Commit count 1

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Added validation to prevent non-telephony execution modes from being incorrectly routed through the dispatch system.
    • Improved error handling to drop ineligible leads before processing.
  • Tests

    • Added test coverage for execution mode filtering and dispatch validation logic.

Review Change Stack

…mode

Daily-mode leads (DAILY / DAILY_TEST / DAILY_STREAM) were being placed on
the dispatch ZSET and picked up by workers, which then dialled via the
telephony provider (Plivo/Twilio) — producing phantom PSTN calls for
what should have been web-mode Daily room joins.

Root cause: three ingest paths only gated `schedule_lead` on
`next_attempt_at is not None`. The reconciler SQL query already filtered
to `('TELEPHONY', 'TELEPHONY_TEST')` — the ingest paths didn't match it.

Fix:
- New `DISPATCHABLE_EXECUTION_MODES` + `is_dispatchable()` helper in
  dispatch/queue.py, mirroring the reconciler's WHERE clause.
- Gate the three ingest sites: lead-create handler, /dispatch-now
  endpoint (400 if non-dispatchable), retry path in handle_call_completion.
- Defensive backstop in Worker._dispatch: drop with error log if a
  non-dispatchable mode somehow reaches the worker.

All 5 schedule layers now match the same filter. Adding a new
dispatchable mode in future = change one constant.

Tests:
- test_is_dispatchable_telephony_modes_only: pins which modes pass.
- test_daily_lead_reaching_worker_is_dropped: regression for the
  reported bug — confirms a Daily lead on the ready list does not
  trigger make_call, channel acquire, or lock acquire.

94 passed, 1 xfailed; pyrefly 0 errors; black/isort clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 20, 2026 19:22
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b97d6c1c-dc79-4b95-82a2-b45dd6135b93

📥 Commits

Reviewing files that changed from the base of the PR and between a8de871 and 9d729a2.

📒 Files selected for processing (7)
  • app/ai/voice/agents/breeze_buddy/dispatch/__init__.py
  • app/ai/voice/agents/breeze_buddy/dispatch/queue.py
  • app/ai/voice/agents/breeze_buddy/dispatch/worker.py
  • app/ai/voice/agents/breeze_buddy/managers/calls.py
  • app/api/routers/breeze_buddy/leads/handlers.py
  • tests/breeze_buddy/dispatch/test_end_to_end.py
  • tests/breeze_buddy/dispatch/test_queue.py

Walkthrough

This PR implements execution-mode-based dispatch filtering for Breeze Buddy's Plane 2 telephony system. A new dispatch-eligibility contract (DISPATCHABLE_EXECUTION_MODES and is_dispatchable()) gates lead scheduling at all ingress points—worker, call manager, and API handlers—preventing non-telephony execution modes from entering the queue. Comprehensive unit and end-to-end tests validate the contract behavior and worker safety.

Changes

Dispatch Eligibility by Execution Mode

Layer / File(s) Summary
Dispatch eligibility contract and public exports
app/ai/voice/agents/breeze_buddy/dispatch/queue.py, app/ai/voice/agents/breeze_buddy/dispatch/__init__.py
DISPATCHABLE_EXECUTION_MODES frozenset defines telephony-eligible modes (TELEPHONY, TELEPHONY_TEST). is_dispatchable() returns whether an execution mode is eligible. Both are exported from the dispatch package.
Worker dispatch safety gate
app/ai/voice/agents/breeze_buddy/dispatch/worker.py
Worker imports is_dispatchable and adds defensive guard in _dispatch: if lead's execution mode is not dispatchable, log error and return early, preventing lock, defer, or telephony dial.
Call manager retry dispatch guard
app/ai/voice/agents/breeze_buddy/managers/calls.py
Call manager imports is_dispatchable and gates _retry_call to only schedule retry when is_dispatchable(lead.execution_mode) is true, skipping non-eligible modes.
API handlers execution-mode dispatch gates
app/api/routers/breeze_buddy/leads/handlers.py
push_lead_handler adds dispatchability check before scheduling; dispatch_now_lead_handler rejects non-dispatchable leads with HTTP 400 (telephony-only), both using is_dispatchable guard.
Test coverage for contract and integration
tests/breeze_buddy/dispatch/test_queue.py, tests/breeze_buddy/dispatch/test_end_to_end.py
Unit test validates is_dispatchable gates only telephony modes. End-to-end test verifies worker drops DAILY leads before channel token acquisition and telephony calls.

🎯 2 (Simple) | ⏱️ ~12 minutes

🐰 A guard at every gate,
No non-telephony in sight—
Dispatchable leads thrive,
While others sit tight!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding execution_mode gating to the dispatcher to prevent Daily-mode leads from being scheduled for PSTN dialing.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/dispatcher-daily-mode-gate

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a dispatcher bug where Daily/web-mode leads could be enqueued into the Redis dispatch pipeline and ultimately dialed via PSTN, by introducing a single execution-mode gate and applying it across ingress and worker layers.

Changes:

  • Adds is_dispatchable() + DISPATCHABLE_EXECUTION_MODES (TELEPHONY + TELEPHONY_TEST only) as a shared source of truth.
  • Gates lead scheduling in the lead-create handler, /dispatch-now, and retry scheduling; adds a worker-side defensive drop.
  • Adds regression tests to ensure non-telephony execution modes never reach the PSTN make_call() path.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/breeze_buddy/dispatch/test_queue.py Adds unit test for is_dispatchable() filtering to telephony modes only.
tests/breeze_buddy/dispatch/test_end_to_end.py Adds regression test ensuring a DAILY lead on the ready list is dropped before channel acquisition / dialing.
app/api/routers/breeze_buddy/leads/handlers.py Gates initial scheduling and /dispatch-now to dispatchable execution modes only.
app/ai/voice/agents/breeze_buddy/managers/calls.py Prevents retry ZADDs for non-dispatchable execution modes.
app/ai/voice/agents/breeze_buddy/dispatch/worker.py Adds worker backstop to drop non-dispatchable leads before dialing.
app/ai/voice/agents/breeze_buddy/dispatch/queue.py Introduces DISPATCHABLE_EXECUTION_MODES and is_dispatchable() helper.
app/ai/voice/agents/breeze_buddy/dispatch/init.py Re-exports the new helper/constant for package consumers.

Comment on lines +257 to +261
if not is_dispatchable(lead.execution_mode):
# Defensive backstop. The ingest paths (handler, retry,
# dispatch-now) and the reconciler query all filter non-
# dispatchable modes out, so this should never fire. If it
# does, drop the lead with a loud log rather than dialling
Comment on lines +359 to 361
lead_execution_mode = req.execution_mode or ExecutionMode.TELEPHONY
if next_attempt_at is not None and is_dispatchable(lead_execution_mode):
await schedule_lead(lead_id=uuid, next_attempt_at=next_attempt_at)
@swaroopvarma1 swaroopvarma1 merged commit f77fe65 into release May 21, 2026
11 checks passed
swaroopvarma1 pushed a commit that referenced this pull request May 21, 2026
Defence-in-depth follow-up to PR #773. If a worker crashes between
RPUSH-processing and LREM-processing while holding a DAILY (or any
non-dispatchable mode) lead, the reaper currently re-ZADDs it onto
SCHEDULE_ZSET unconditionally — creating a wasted promote -> worker-
drop cycle once per reaper tick.

The worker's defensive backstop in _dispatch already drops such leads
without dialling (PR #773), so there's no phantom PSTN call. But the
reaper shouldn't keep putting them back on the schedule. This change
mirrors the worker's gate inside reap_stuck_processing_lists: clean
the tracking entry (LREM) regardless, but only re-ZADD when the lead
is still dispatchable.

Frequency analysis: expected near-zero occurrence (requires a Daily
lead to bypass all 5 ingest gates AND a worker crash inside the ~50ms
pre-check window). Fix is for invariant consistency and future-
regression hardening, not for an observed incident.

Tests:
- test_reaper_does_not_reschedule_non_dispatchable_lead: pins the new
  behaviour — a stranded DAILY lead is LREM'd but NOT re-ZADD'd.
- test_reaper_still_reschedules_dispatchable_lead: regression guard
  ensuring TELEPHONY crash-recovery still works.

96 passed, 1 xfailed; pyrefly 0 errors; black/isort clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
swaroopvarma1 pushed a commit that referenced this pull request May 21, 2026
Defence-in-depth follow-up to PR #773. If a worker crashes between
RPUSH-processing and LREM-processing while holding a DAILY (or any
non-dispatchable mode) lead, the reaper currently re-ZADDs it onto
SCHEDULE_ZSET unconditionally — creating a wasted promote -> worker-
drop cycle once per reaper tick.

The worker's defensive backstop in _dispatch already drops such leads
without dialling (PR #773), so there's no phantom PSTN call. But the
reaper shouldn't keep putting them back on the schedule. This change
mirrors the worker's gate inside reap_stuck_processing_lists: clean
the tracking entry (LREM) regardless, but only re-ZADD when the lead
is still dispatchable.

Frequency analysis: expected near-zero occurrence (requires a Daily
lead to bypass all 5 ingest gates AND a worker crash inside the ~50ms
pre-check window). Fix is for invariant consistency and future-
regression hardening, not for an observed incident.

Tests:
- test_reaper_does_not_reschedule_non_dispatchable_lead: pins the new
  behaviour — a stranded DAILY lead is LREM'd but NOT re-ZADD'd.
- test_reaper_still_reschedules_dispatchable_lead: regression guard
  ensuring TELEPHONY crash-recovery still works.

96 passed, 1 xfailed; pyrefly 0 errors; black/isort clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
swaroopvarma1 pushed a commit that referenced this pull request May 21, 2026
Defence-in-depth follow-up to PR #773. If a worker crashes between
RPUSH-processing and LREM-processing while holding a DAILY (or any
non-dispatchable mode) lead, the reaper currently re-ZADDs it onto
SCHEDULE_ZSET unconditionally — creating a wasted promote -> worker-
drop cycle once per reaper tick.

The worker's defensive backstop in _dispatch already drops such leads
without dialling (PR #773), so there's no phantom PSTN call. But the
reaper shouldn't keep putting them back on the schedule. This change
mirrors the worker's gate inside reap_stuck_processing_lists: clean
the tracking entry (LREM) regardless, but only re-ZADD when the lead
is still dispatchable.

Frequency analysis: expected near-zero occurrence (requires a Daily
lead to bypass all 5 ingest gates AND a worker crash inside the ~50ms
pre-check window). Fix is for invariant consistency and future-
regression hardening, not for an observed incident.

Tests:
- test_reaper_does_not_reschedule_non_dispatchable_lead: pins the new
  behaviour — a stranded DAILY lead is LREM'd but NOT re-ZADD'd.
- test_reaper_still_reschedules_dispatchable_lead: regression guard
  ensuring TELEPHONY crash-recovery still works.

96 passed, 1 xfailed; pyrefly 0 errors; black/isort clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants