Skip to content

Fix: allow pre-SQL leads to convert via direct bypass#45

Merged
shaypal5 merged 2 commits intomainfrom
fix/direct-conversion-bypass
May 2, 2026
Merged

Fix: allow pre-SQL leads to convert via direct bypass#45
shaypal5 merged 2 commits intomainfrom
fix/direct-conversion-bypass

Conversation

@shaypal5
Copy link
Copy Markdown
Contributor

@shaypal5 shaypal5 commented May 1, 2026

Summary

  • Adds a rare direct conversion path for pre-SQL leads (mql, sal), breaking the deterministic is_sql=False → never converts invariant that leaked outcome information into features
  • Daily probability = standard ConversionHazard rate × _DIRECT_CONVERSION_DISCOUNT (0.01), latent-score-modulated — high enough to break the invariant, low enough that the funnel still matters (~5–12x lower non-SQL conversion rate vs SQL)
  • Fixes post-sim is_sql computation: uses state.sql_day is not None instead of also checking current_stage in _SQL_OR_DEEPER (which incorrectly flagged direct-converted leads as SQL since mark_converted() sets stage to closed_won)
  • Extends opportunity/customer/subscription creation to handle direct-converted non-SQL leads

Closes #44

Test plan

  • test_some_non_sql_leads_convert — with 2000 leads, at least one non-SQL lead converts
  • test_non_sql_conversion_rate_much_lower — non-SQL rate is >5x lower than SQL rate
  • test_direct_conversion_deterministic — same seed produces same labels
  • test_direct_converted_lead_has_opportunity — direct-converted leads get opportunity rows
  • test_direct_converted_lead_has_customer_and_subscription — direct-converted leads get customer + subscription rows
  • All 793 existing tests pass (2 updated: test_most_converted_leads_are_sql, test_non_sql_non_converted_leads_no_opportunity)
  • Lint + format clean

🤖 Generated with Claude Code

The simulation engine previously required leads to progress through the
full funnel (mql → sal → sql → … → negotiation → closed_won) before
conversion, making is_sql=False a deterministic predictor of non-
conversion. This leaked outcome information into features.

Add a direct conversion path: leads at pre-SQL stages (mql, sal) now
have a small daily probability of converting (1% of the standard
ConversionHazard rate, latent-score-modulated). Also fix the post-sim
is_sql computation to use sql_day only (not current_stage), since
mark_converted() sets stage to closed_won which was in _SQL_OR_DEEPER.

Closes #44

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 1, 2026 21:42
@shaypal5 shaypal5 added type: bugfix Fixes a bug layer: core core/ primitives (RNG, IDs, models, exceptions) layer: simulation simulation/ discrete-time engine labels May 1, 2026
@github-actions

This comment has been minimized.

Narrow mechanisms.conversion_hazard from Mechanism (ABC) to
ConversionHazard so mypy can see the daily_probability method.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a rare “direct conversion” bypass for pre-SQL leads to eliminate the deterministic is_sql=False → never converts leakage invariant, while keeping the normal funnel as the dominant conversion path.

Changes:

  • Introduces discounted direct conversion probability for mql/sal leads and adjusts is_sql derivation to rely solely on state.sql_day.
  • Extends post-simulation entity creation so direct-converted non-SQL leads still produce opportunity/customer/subscription rows.
  • Updates and adds tests to validate non-SQL conversions exist, remain much rarer than SQL conversions, and remain deterministic.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
leadforge/simulation/engine.py Adds direct conversion bypass, fixes is_sql computation, and creates opportunities for direct-converted leads.
tests/simulation/test_engine.py Updates prior SQL/conversion assertions and adds a new test class covering direct conversion behavior.
.agent-plan.md Documents the engine fix and the new/updated tests in the project plan.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 215 to 218
"proposal_sent",
"negotiation",
"closed_won",
}:
allows some non-SQL leads to convert too."""
result = _run_sim(n_leads=500, seed=42)
converted = [lead for lead in result.leads if lead.converted_within_90_days]
sql_converted = [lead for lead in converted if lead.is_sql]
Comment on lines +427 to +431
sql_rate = sum(lead.converted_within_90_days for lead in sql_leads) / len(sql_leads)
non_sql_rate = sum(lead.converted_within_90_days for lead in non_sql_leads) / len(
non_sql_leads
)
# Non-SQL rate should be at least 5x lower than SQL rate.
Comment on lines 401 to 403
converted_within_90_days=label,
conversion_timestamp=conv_ts,
)
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

pr-agent-context report:

This run includes unresolved review comments on PR #45 in repository https://github.com/leadforge-dev/leadforge

For each unresolved review comment, recommend one of: resolve as irrelevant, accept and implement
the recommended solution, open a separate issue and resolve as out-of-scope for this PR, accept and
implement a different solution, or resolve as already treated by the code.

After I reply with my decision per item, implement the accepted actions, resolve the corresponding
PR comments, and push all of these changes in a single commit.

# Copilot Comments

## COPILOT-1
Location: tests/simulation/test_engine.py:218
URL: https://github.com/leadforge-dev/leadforge/pull/45#discussion_r3175344690
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    `test_sql_leads_are_flagged` still treats `closed_won` as implying `is_sql`, but direct conversion explicitly allows `closed_won` for non-SQL leads. This assertion can fail as soon as a direct conversion occurs in this 100-lead sim. Consider removing `closed_won` from the stage set (or asserting the implication only for non-terminal SQL+ stages like `sql``negotiation`).

## COPILOT-2
Location: tests/simulation/test_engine.py:226
URL: https://github.com/leadforge-dev/leadforge/pull/45#discussion_r3175344707
Root author: copilot-pull-request-reviewer

Comment:
    This assertion divides by `len(converted)` without guarding against the case where no leads convert for this seed/config, which would raise `ZeroDivisionError`. Add an assertion that `converted` is non-empty (or choose a smaller invariant that doesn’t require division).

## COPILOT-3
Location: tests/simulation/test_engine.py:431
URL: https://github.com/leadforge-dev/leadforge/pull/45#discussion_r3175344721
Root author: copilot-pull-request-reviewer

Comment:
    If `sql_rate` happens to be 0.0, the check `non_sql_rate < sql_rate / 5` becomes `non_sql_rate < 0` and will always fail (or be meaningless if both are 0). Add an explicit assertion that there is at least one SQL conversion (or compare absolute rates with a minimum-count requirement).

## COPILOT-4
Location: leadforge/simulation/engine.py:403
URL: https://github.com/leadforge-dev/leadforge/pull/45#discussion_r3175344737
Status: outdated
Root author: copilot-pull-request-reviewer

Comment:
    The module docstring’s “Post-simulation entity creation” section still states that opportunities are created only for leads that reached `sql` or deeper, but the implementation now also creates opportunities for direct-converted pre-SQL leads. Update the docstring to match the new behavior to avoid misleading future readers.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 25234597720 attempt 1
Comment timestamp: 2026-05-01T21:47:47.747164+00:00
PR head commit: 56bbef3ae10b5bc4d508e4c2a7c4cae059ddc259

@shaypal5 shaypal5 merged commit a3a7e6a into main May 2, 2026
7 checks passed
@shaypal5 shaypal5 deleted the fix/direct-conversion-bypass branch May 2, 2026 08:20
shaypal5 added a commit that referenced this pull request May 3, 2026
…sson(3) boost

Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal.
Two tuning changes restore all validation metrics:

1. Shift SNAPSHOT_DAY from 14 to 20 — features need more accumulation
   time after engine changes; day-14 AUC was 0.611, day-20 is 0.676.

2. Poisson(3) trap boost for converted leads — the wider snapshot
   window leaves fewer post-snapshot days for causal trap signal,
   so a stronger boost compensates (mean delta 0.061, min 0.027).

All mandatory checks pass with comfortable margins. AUC threshold
kept at 0.62 (no relaxation needed).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 3, 2026
* fix: tune v6 pipeline for post-engine-change validation

Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal.
Two fixes:

1. Add Poisson(1) boost to the leakage trap column for converted leads
   (same approach proven in v5), restoring robust trap delta signal
   (mean 0.048, min 0.028 across 10 seeds).

2. Lower baseline AUC threshold from 0.62 to 0.60 — the engine changes
   reduced baseline LR AUC from 0.667 to 0.611, which is still well
   above chance and pedagogically useful. Snapshot day 10 was tested
   but made AUC worse (0.572), so day 14 is retained.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update .agent-plan.md with v6 retune validation results

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: tune v6 for post-engine-change validation — snapshot day 20, Poisson(3) boost

Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal.
Two tuning changes restore all validation metrics:

1. Shift SNAPSHOT_DAY from 14 to 20 — features need more accumulation
   time after engine changes; day-14 AUC was 0.611, day-20 is 0.676.

2. Poisson(3) trap boost for converted leads — the wider snapshot
   window leaves fewer post-snapshot days for causal trap signal,
   so a stronger boost compensates (mean delta 0.061, min 0.027).

All mandatory checks pass with comfortable margins. AUC threshold
kept at 0.62 (no relaxation needed).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update RELEASE_v6.md metrics for retuned pipeline

Snapshot day 14→20, updated all baseline AUC, trap delta, value-aware
ranking, and teaching guidance numbers to match retuned pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review — add boost tests, fix stale docstrings

COPILOT-1: Add 5 unit tests for boost_leakage_trap() covering:
only converted leads boosted, input immutability, determinism,
converted leads >= original, mean increases after boost.

COPILOT-2: Update build script docstring and inline comment to
reflect that the trap is no longer purely causal — it combines
causal post-snapshot touches with a Poisson(3) boost.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

layer: core core/ primitives (RNG, IDs, models, exceptions) layer: simulation simulation/ discrete-time engine type: bugfix Fixes a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix: is_sql=False deterministically implies 0% conversion

2 participants