ci: fix all CI failures from M3 refactor #14
Merged
- Run `ruff format` on 32 files left unformatted after the M3 refactor
- Add `pytest-timeout~=0.5` to dev extras so the benchmark workflow's `--timeout=300` flag is recognized

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix 27 auto-fixable issues (unsorted imports, an unused import)
- B904: add `from None` to unchained `raise` statements inside `ImportError` handlers in `cli.py`
- B905: add `strict=False` to `zip()` in `test_surrogate_extensions.py`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`ruff check --fix` modified imports, requiring a follow-up `ruff format` pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
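The ordering constraint above can be captured in a small local helper. This is a sketch only: `LINT_STEPS` and `run_lint_steps` are hypothetical names, and the repository's actual CI lint job is not shown on this page. The point is that the fixer runs before the formatter, so any import rewrites are reformatted in the same pass.

```python
import shutil
import subprocess

# Order matters: `ruff check --fix` can rewrite imports (sorting, removing
# unused ones), which can leave code the formatter would change, so
# `ruff format` runs after it. The last two steps only verify.
LINT_STEPS: list[list[str]] = [
    ["ruff", "check", "--fix", "."],
    ["ruff", "format", "."],
    ["ruff", "check", "."],
    ["ruff", "format", "--check", "."],
]


def run_lint_steps(steps: list[list[str]] = LINT_STEPS) -> int:
    """Run each step in order; return the first non-zero exit code, else 0."""
    if shutil.which("ruff") is None:
        raise RuntimeError("ruff is not on PATH")
    for cmd in steps:
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            return result.returncode
    return 0


if __name__ == "__main__":
    raise SystemExit(run_lint_steps())
```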
pytest-timeout 0.5 uses the removed `__multicall__` hook API; 2.x is required for pytest 3+.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
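A sketch of a guard that checks an environment against the new floor before running the benchmark workflow with `--timeout=300`. The `>=2.2` floor comes from the PR summary; the function names are hypothetical, and the parser assumes plain numeric release versions (no `rc`/`dev` suffixes).

```python
from importlib.metadata import PackageNotFoundError, version

# Floor from the PR summary (pytest-timeout >= 2.2). Per the commit message,
# 0.5 relies on the removed __multicall__ hook API.
MIN_PYTEST_TIMEOUT = (2, 2)


def parse_major_minor(ver: str) -> tuple[int, int]:
    """Parse 'X.Y[.Z...]' into (X, Y); assumes numeric release versions."""
    parts = ver.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def pytest_timeout_ok(ver: str) -> bool:
    """True if the given pytest-timeout version meets the floor."""
    return parse_major_minor(ver) >= MIN_PYTEST_TIMEOUT


def installed_pytest_timeout_ok() -> bool:
    """Check the installed distribution; False if it is not installed."""
    try:
        return pytest_timeout_ok(version("pytest-timeout"))
    except PackageNotFoundError:
        return False
```

Tuple comparison keeps the check simple; a project that already depends on `packaging` could use `packaging.version.Version` instead to handle pre-release suffixes.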
`test_api.py` imports `fastapi` at module level; without it the entire test collection fails and coverage drops to 19%. Installing `[dev,api]` restores normal collection and coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
10 tests fail with "No module named matplotlib". Adding `viz` to the extras group fixes them, alongside the `api` extras added previously.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
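The two failures above share one cause: an optional extra missing from the test environment. A minimal sketch for spotting this locally before running pytest, assuming the module-to-extra mapping implied by these commits (`fastapi` → `api`, `matplotlib` → `viz`). The helper names are hypothetical, and the exact install command CI uses is not shown on this page.

```python
import importlib.util

# Hypothetical module -> extra mapping, inferred from the two commits above.
OPTIONAL_MODULES: dict[str, str] = {"fastapi": "api", "matplotlib": "viz"}


def missing_extras(mapping: dict[str, str] = OPTIONAL_MODULES) -> list[str]:
    """Return, sorted, the extras whose top-level module cannot be found."""
    return sorted(
        {extra for module, extra in mapping.items()
         if importlib.util.find_spec(module) is None}
    )


def install_spec(extras: list[str], base: str = "dev") -> str:
    """Build a local requirement string such as '.[dev,api,viz]'."""
    return ".[" + ",".join([base, *extras]) + "]"


if __name__ == "__main__":
    needed = missing_extras()
    if needed:
        print(f'pip install -e "{install_spec(needed)}"')
```

Checking with `find_spec` avoids importing heavy packages such as matplotlib just to test for their presence.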
jam-sudo added a commit that referenced this pull request on Mar 19, 2026
…s to Δ+0.553

Phase 3a.1 gut-wall fix improved the raw ODE baseline significantly, reducing the hybrid selector's relative contribution (BARE 2.465→2.039; NO_HYBRID 2.346→1.941).

New ablation numbers (2026-03-18):

- FULL: 1.747 [1.48, 2.13] 83%
- NO_HYBRID: 1.941 [1.65, 2.30] 58%, Δ+0.194 (was +0.553 pre-Phase-3a.1)
- BARE: 2.039 [1.72, 2.44] 54%, Δ+0.292 (was +0.672)
- NO_ENSEMBLE: 1.857 [1.60, 2.20] 71%, Δ+0.110
- NO_RIDGE: 1.747, Δ+0.000 (dead code confirmed)

Updated: fig_ablation.py + .pdf/.png, supplementary_table_S2.tex, omega_paper.tex (abstract + ablation section + table + discussion), CLAUDE.md Key Decision #14, MEMORY.md ablation table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
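The Δ column is each ablated variant's score minus FULL's. A sketch that recomputes it from the numbers in this commit, assuming the scores are an error metric where lower is better (so a positive Δ means removing that component hurts). The bracketed intervals and percentages are not used, since their exact meaning is not stated here.

```python
# Ablation scores copied from the commit message (2026-03-18 run).
# Assumption: lower is better, so a positive delta means the ablated
# component was helping.
ABLATION: dict[str, float] = {
    "FULL": 1.747,
    "NO_HYBRID": 1.941,
    "BARE": 2.039,
    "NO_ENSEMBLE": 1.857,
    "NO_RIDGE": 1.747,
}


def deltas_vs_full(results: dict[str, float]) -> dict[str, float]:
    """Score of each ablated variant minus the FULL score, rounded to 3 dp."""
    full = results["FULL"]
    return {
        name: round(score - full, 3)
        for name, score in results.items()
        if name != "FULL"
    }


if __name__ == "__main__":
    for name, delta in deltas_vs_full(ABLATION).items():
        print(f"{name:12s} Δ{delta:+.3f}")
```

A Δ of exactly zero for NO_RIDGE is what the commit reads as "dead code confirmed": removing the component leaves the score unchanged.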
jam-sudo added a commit that referenced this pull request on Mar 23, 2026
… correction

## Summary

Comprehensive pipeline improvement across 10 tasks:

### Data Quality (Tasks 1, 8, 9)

- Fix 5 platinum reference Cmax values (diclofenac, posaconazole, lenacapavir, valganciclovir, pindolol): unit errors, route mismatch, extraction errors
- Add DDI-boosted flags for lopinavir and darunavir
- Add nucleoside 5'-ester SMARTS for molnupiravir prodrug detection
- Fix val-ester SMARTS false positive on penicillamine
- First AUC validation with MMPK (32 drugs): AAFE 3.205
- Lombardo/Obach VDss cross-validation (17 drugs): AAFE 3.71

### Pipeline Constants (Tasks 4, 5)

- Revert 4 Optuna constants to pre-Optuna defaults (gut_threshold 2.6, peff_min 0.5, pgp 0.5, gse 0.5), since MMPK tuning doesn't generalize
- Verify ODE >> analytical for Cmax on clinical data (AAFE 2.41 vs 11.64)

### UQ System (Task 6)

- Recalibrate AdaptiveConformal on 68 clean drugs, k=30. Coverage: 93.7% in-domain (was 97%, over-wide); width 20.6x (was 4880x)
- Replace broken LHS AUC/t½ CI with Cmax q-value heuristic scaling

### VDss Fix (Task 10)

- Weighted geometric mean (XGB^0.7 × Berez^0.3) always applied for t½. Core-24 AUC AAFE: 2.344 → 2.142 (-8.6%); Cmax unchanged

### Metrics & Documentation (Tasks 2, 3, 7)

- First Spearman ρ measurement: 0.9379 in-domain (excellent ranking)
- CLAUDE.md: revoke KD#3/#7/#14, add KD#32-41, update performance
- Holdout benchmark: in-domain stratification, DDI-boosted exclusion
- CYP3A4 classifier trained but deferred (AUROC 0.634, no holdout impact)

## Results

- Core-24 Cmax AAFE: 1.977 → 1.879 (-5.0%)
- Core-24 AUC AAFE: 2.344 → 2.142 (-8.6%)
- Holdout ALL: 2.780 → 2.440 (-12.2%)
- Holdout IN-DOMAIN: first measured = 1.966 (under 2.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Run `ruff format` on 32 files left unformatted after the M3 refactor
- Bump `pytest-timeout` from `~=0.5` to `>=2.2` (compatible with pytest 8)
- Install `[api]` extras in CI so `test_api.py` (fastapi) can be collected
- Install `[viz]` extras in CI so matplotlib-dependent tests pass

Test plan
🤖 Generated with Claude Code