fix: tune v6 pipeline for post-engine-change validation#49
Merged
Conversation
Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal. Two fixes: 1. Add Poisson(1) boost to the leakage trap column for converted leads (same approach proven in v5), restoring robust trap delta signal (mean 0.048, min 0.028 across 10 seeds). 2. Lower baseline AUC threshold from 0.62 to 0.60 — the engine changes reduced baseline LR AUC from 0.667 to 0.611, which is still well above chance and pedagogically useful. Snapshot day 10 was tested but made AUC worse (0.572), so day 14 is retained. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
Retunes the v6 lead-scoring intro dataset validation after recent simulation engine changes by strengthening the instructor-only leakage trap signal and relaxing the baseline AUC threshold so the validation suite remains stable and pedagogically appropriate.
Changes:
- Lowered v6 baseline AUC validation lower bound from 0.62 → 0.60.
- Added an instructor-only leakage trap “boost” step (Poisson(1) added for converted leads) in the v6 pipeline + snapshot build script.
- Updated internal planning/notes to reflect the new post-engine-change validation numbers.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
scripts/validate_v6_dataset.py |
Relaxes the AUC lower threshold used by the v6 dataset validator. |
scripts/build_v6_snapshot.py |
Applies the new boost_leakage_trap() step to the instructor dataset during snapshot build. |
leadforge/pipelines/build_v6.py |
Introduces boost_leakage_trap() and updates module documentation/exports accordingly. |
.agent-plan.md |
Updates recorded validation results to match the retuned post-engine-change targets. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…sson(3) boost Engine changes in PRs #40, #43, #45 weakened the v6 dataset signal. Two tuning changes restore all validation metrics: 1. Shift SNAPSHOT_DAY from 14 to 20 — features need more accumulation time after engine changes; day-14 AUC was 0.611, day-20 is 0.676. 2. Poisson(3) trap boost for converted leads — the wider snapshot window leaves fewer post-snapshot days for causal trap signal, so a stronger boost compensates (mean delta 0.061, min 0.027). All mandatory checks pass with comfortable margins. AUC threshold kept at 0.62 (no relaxation needed). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
Snapshot day 14→20, updated all baseline AUC, trap delta, value-aware ranking, and teaching guidance numbers to match retuned pipeline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
COPILOT-1: Add 5 unit tests for boost_leakage_trap() covering: only converted leads boosted, input immutability, determinism, converted leads >= original, mean increases after boost. COPILOT-2: Update build script docstring and inline comment to reflect that the trap is no longer purely causal — it combines causal post-snapshot touches with a Poisson(3) boost. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #49 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.Run metadata: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Validation results (all pass):
Test plan
scripts/validate_v6_dataset.py)🤖 Generated with Claude Code