
ci: fix all CI failures from M3 refactor #14

Merged
jam-sudo merged 6 commits into main from feature/ml-first-refactor-m0-m3
Feb 28, 2026

@jam-sudo
Owner

Summary

  • Run ruff format on 32 files left unformatted after the M3 refactor
  • Fix 30 ruff lint errors (unsorted imports, unused import, B904, B905)
  • Bump pytest-timeout from ~=0.5 to >=2.2 (compatible with pytest 8)
  • Install [api] extras in CI so test_api.py (fastapi) can be collected
  • Install [viz] extras in CI so matplotlib-dependent tests pass

Test plan

  • All 5 CI jobs green on this branch (Quality, Test 3.10/3.11/3.12, Smoke)
  • 818 tests pass, 8 skipped/deselected

🤖 Generated with Claude Code

Omega Dev and others added 6 commits February 28, 2026 03:36
- Run ruff format on 32 files left unformatted after M3 refactor
- Add pytest-timeout~=0.5 to dev extras so the benchmark workflow's
  --timeout=300 flag is recognized

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix 27 auto-fixable issues (unsorted imports, unused import)
- B904: add `from None` to the `raise` statements inside `except ImportError` handlers in cli.py
- B905: add strict=False to zip() in test_surrogate_extensions.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ruff check --fix modified imports, requiring a follow-up ruff format pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pytest-timeout 0.5 uses the removed __multicall__ hook API; a 2.x release (>=2.2) is required for the pytest 8 used in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
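Why the old pin could never pick up a fixed release: under PEP 440, `~=0.5` means `>=0.5, ==0.*`, so no 2.x version satisfies it. The sketch below is a simplified checker for purely numeric versions only (no pre-releases, epochs, or local versions); real tooling should use a full PEP 440 implementation.

```python
def _release(v: str) -> tuple[int, ...]:
    """Parse a purely numeric release like '2.2.0' into (2, 2, 0)."""
    return tuple(int(part) for part in v.split("."))


def _key(parts: tuple[int, ...]) -> tuple[int, ...]:
    # For ordering, trailing zeros are insignificant: 2.2 == 2.2.0.
    trimmed = list(parts)
    while len(trimmed) > 1 and trimmed[-1] == 0:
        trimmed.pop()
    return tuple(trimmed)


def satisfies_ge(version: str, minimum: str) -> bool:
    """Simplified `>=minimum` check."""
    return _key(_release(version)) >= _key(_release(minimum))


def satisfies_compatible(version: str, spec: str) -> bool:
    """Simplified `~=spec`: `>=spec` plus a prefix match that drops spec's last component."""
    base = _release(spec)
    if len(base) < 2:
        raise ValueError("~= requires at least two release components")
    prefix = base[:-1]
    ver = _release(version)
    ver = ver + (0,) * (len(prefix) - len(ver))  # pad short versions for the prefix match
    return satisfies_ge(version, spec) and ver[: len(prefix)] == prefix
```

With these rules, 0.5.1 or 0.9 satisfies `~=0.5`, but 2.2.0 only satisfies the new `>=2.2` floor.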
test_api.py imports fastapi at module level; without it the entire
test collection fails and coverage drops to 19%. Installing [dev,api]
restores normal collection and coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
10 tests fail with 'No module named matplotlib'. Installing the [viz] extra
fixes them, alongside the [api] extra added in the previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
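Both failures above come from optional extras missing at import time. A fail-fast check could report them before pytest starts collecting; this is a minimal sketch, and the extra-to-module mapping is an assumption based on the modules named in these commits (fastapi for [api], matplotlib for [viz]).

```python
import importlib.util

# Assumed mapping from extras group to the top-level modules the tests import.
EXTRA_MODULES = {"api": ["fastapi"], "viz": ["matplotlib"]}


def missing_extras(extra_modules=EXTRA_MODULES):
    """Return {extra: [missing modules]} for every extra with an unimportable module."""
    missing = {}
    for extra, modules in extra_modules.items():
        # find_spec returns None for a top-level module that is not installed,
        # without actually importing it.
        absent = [m for m in modules if importlib.util.find_spec(m) is None]
        if absent:
            missing[extra] = absent
    return missing
```

An empty result means collection will not fail on these imports; otherwise the keys name the extras the CI install step is missing.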
jam-sudo merged commit b8e54ba into main on Feb 28, 2026
6 checks passed
jam-sudo added a commit that referenced this pull request Mar 19, 2026
…s to Δ+0.553

Phase 3a.1 gut-wall fix improved the raw ODE baseline significantly, reducing
the hybrid selector's relative contribution (BARE 2.465→2.039; NO_HYBRID 2.346→1.941).

New ablation numbers (2026-03-18):
  FULL:         1.747 [1.48, 2.13]  83%
  NO_HYBRID:    1.941 [1.65, 2.30]  58%  Δ+0.194 (was +0.553 pre-Phase-3a.1)
  BARE:         2.039 [1.72, 2.44]  54%  Δ+0.292 (was +0.672)
  NO_ENSEMBLE:  1.857 [1.60, 2.20]  71%  Δ+0.110
  NO_RIDGE:     1.747              +0.000 — dead code confirmed
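Each Δ in the table is the variant's score minus FULL's. The snippet below only re-derives the column from the point estimates above; what the bracketed intervals and percentage column measure is not stated here and is not used.

```python
# Point estimates copied from the 2026-03-18 ablation table; FULL is the reference.
SCORES = {
    "FULL": 1.747,
    "NO_HYBRID": 1.941,
    "BARE": 2.039,
    "NO_ENSEMBLE": 1.857,
    "NO_RIDGE": 1.747,
}


def ablation_deltas(scores, reference="FULL"):
    """Δ = variant score - reference score, rounded to the table's 3 decimals."""
    base = scores[reference]
    return {name: round(v - base, 3) for name, v in scores.items() if name != reference}
```

`ablation_deltas(SCORES)` gives 0.194, 0.292, 0.110 and 0.000, matching the table. The pre-Phase-3a.1 figures are consistent too: 2.346 - 0.553 and 2.465 - 0.672 both imply an earlier FULL score of 1.793.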

Updated: fig_ablation.py + .pdf/.png, supplementary_table_S2.tex,
omega_paper.tex (abstract + ablation section + table + discussion),
CLAUDE.md Key Decision #14, MEMORY.md ablation table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jam-sudo added a commit that referenced this pull request Mar 23, 2026
… correction

## Summary
Comprehensive pipeline improvement across 10 tasks:

### Data Quality (Tasks 1, 8, 9)
- Fix 5 platinum reference Cmax values (diclofenac, posaconazole, lenacapavir,
  valganciclovir, pindolol) — unit errors, route mismatch, extraction errors
- Add DDI-boosted flags for lopinavir and darunavir
- Add nucleoside 5'-ester SMARTS for molnupiravir prodrug detection
- Fix val-ester SMARTS false positive on penicillamine
- First AUC validation with MMPK (32 drugs): AAFE 3.205
- Lombardo/Obach VDss cross-validation (17 drugs): AAFE 3.71

### Pipeline Constants (Tasks 4, 5)
- Revert 4 Optuna constants to pre-Optuna defaults (gut_threshold 2.6,
  peff_min 0.5, pgp 0.5, gse 0.5) — MMPK tuning doesn't generalize
- Confirm the ODE model substantially outperforms the analytical solution for Cmax on clinical data (AAFE 2.41 vs 11.64)

### UQ System (Task 6)
- Recalibrate AdaptiveConformal: 68 clean drugs, k=30
  Coverage: 93.7% in-domain (was 97% over-wide), width 20.6x (was 4880x)
- Replace broken LHS AUC/t½ CI with Cmax q-value heuristic scaling
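One common way to build a locally adaptive conformal interval is to take the nonconformity scores of the k nearest calibration drugs and use their finite-sample (1 - α) quantile as a fold-error bound. The project's AdaptiveConformal class may work differently: reading k=30 as a neighborhood size, using |log10(pred/obs)| as the score, and the function name below are all assumptions.

```python
import math


def local_conformal_interval(pred, query_x, calib, k=30, alpha=0.1):
    """Hypothetical fold-error interval from the k nearest calibration drugs.

    calib: list of (features, score) pairs, where features is a tuple of floats
    and score = |log10(pred / obs)| for a held-out calibration drug.
    Returns (lower, upper, fold).
    """
    # Pick the k calibration drugs closest to the query in feature space.
    nearest = sorted(calib, key=lambda c: math.dist(query_x, c[0]))[:k]
    scores = sorted(score for _, score in nearest)
    n = len(scores)
    # Split-conformal rank with finite-sample correction: ceil((n + 1)(1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few neighbors for this alpha: the interval is unbounded.
        return 0.0, math.inf, math.inf
    fold = 10 ** scores[rank - 1]
    # Symmetric on the log scale: divide and multiply the prediction by the same fold.
    return pred / fold, pred * fold, fold
```

Restricting calibration to neighbors makes the width adapt to the query, at the cost that the exact coverage guarantee of plain split conformal no longer strictly holds.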

### VDss Fix (Task 10)
- Weighted geometric mean (XGB^0.7 × Berez^0.3) always applied for t½
  Core-24 AUC AAFE: 2.344 → 2.142 (-8.6%), Cmax unchanged
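The blend above is a weighted geometric mean, which is easiest to compute in log space. In this sketch the inputs are assumed to be the XGB-predicted and Berezhkovskiy-style VDss estimates in the same units, and `half_life_h` uses the textbook approximation t½ = ln 2 · V / CL as a stand-in for however the pipeline actually derives t½.

```python
import math


def blended_vdss(vdss_xgb: float, vdss_berez: float, w_xgb: float = 0.7) -> float:
    """Weighted geometric mean XGB^w * Berez^(1 - w), computed in log space."""
    if vdss_xgb <= 0 or vdss_berez <= 0:
        raise ValueError("volumes must be positive")
    return math.exp(w_xgb * math.log(vdss_xgb) + (1 - w_xgb) * math.log(vdss_berez))


def half_life_h(vdss: float, clearance: float) -> float:
    # Assumed approximation: t1/2 = ln 2 * V / CL, with V and CL in consistent units.
    return math.log(2) * vdss / clearance
```

A geometric blend never leaves the range spanned by the two estimates and dampens an outlier on either side more gently than an arithmetic mean of very different volumes would.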

### Metrics & Documentation (Tasks 2, 3, 7)
- First Spearman ρ measurement: 0.9379 in-domain (excellent ranking)
- CLAUDE.md: revoke KD#3/#7/#14, add KD#32-41, update performance
- Holdout benchmark: in-domain stratification, DDI-boosted exclusion
- CYP3A4 classifier trained but deferred (AUROC 0.634, no holdout impact)

## Results
- Core-24 Cmax AAFE: 1.977 → 1.879 (-5.0%)
- Core-24 AUC AAFE:  2.344 → 2.142 (-8.6%)
- Holdout ALL:       2.780 → 2.440 (-12.2%)
- Holdout IN-DOMAIN: 1.966 (first measurement; below 2.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>