
[codex] Add flank-aware protein scanning #207

Merged
iskandr merged 2 commits into master from fix-flank-aware-predict-proteins on May 6, 2026


@iskandr iskandr commented May 6, 2026

Summary

Fixes #206.

This PR makes protein scanning preserve and forward N-/C-terminal peptide flanks for predictors that use flanking sequence context.

Changes:

  • Add a shared peptide-context helper and an explicit BasePredictor.predict_with_flanks(...) extension point.
  • Keep non-flank-aware predictors on the existing deduplicated peptide path.
  • Route flank-aware predictors through occurrence-level prediction so duplicate peptide strings with different source contexts remain distinct.
  • Forward flanks from MHCflurry.predict_proteins(...) into Class1PresentationPredictor.predict(...), using the flank lengths supported by the loaded MHCflurry processing model when available.
  • Preserve flank fields on emitted Prediction records.
  • Reuse shared flank input validation and context generation in processing predictors.
  • Bump version to 3.13.6.
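
To illustrate why flank-aware predictors need occurrence-level prediction, here is a minimal sketch of peptide-context generation. It is not the helper added in this PR: `PeptideContext`, `iter_peptide_contexts`, and the `flank_length` parameter are hypothetical names chosen for the example, and the protein sequences are made up. The same peptide string appears in two proteins with different flanks, so deduplicating on the peptide alone would discard one of the contexts.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class PeptideContext:
    """One occurrence of a peptide in a source protein (hypothetical type)."""

    protein_id: str
    offset: int
    peptide: str
    n_flank: str
    c_flank: str


def iter_peptide_contexts(
    protein_id: str, sequence: str, peptide_length: int, flank_length: int
) -> Iterator[PeptideContext]:
    """Yield every peptide window together with its N-/C-terminal flanks.

    Flanks are truncated at the protein ends, so they may be shorter
    than flank_length (or empty).
    """
    for start in range(len(sequence) - peptide_length + 1):
        end = start + peptide_length
        yield PeptideContext(
            protein_id=protein_id,
            offset=start,
            peptide=sequence[start:end],
            n_flank=sequence[max(0, start - flank_length):start],
            c_flank=sequence[end:end + flank_length],
        )


# Made-up sequences that share the 6-mer "TAYIAK" in different contexts.
proteins = {"protA": "MKTAYIAKQR", "protB": "GGTAYIAKWW"}
contexts = [
    ctx
    for pid, seq in proteins.items()
    for ctx in iter_peptide_contexts(pid, seq, peptide_length=6, flank_length=2)
]

# Deduplicated path: enough for predictors that ignore flanks.
unique_peptides = sorted({ctx.peptide for ctx in contexts})

# Occurrence-level path: both "TAYIAK" contexts survive with their own flanks.
shared = [ctx for ctx in contexts if ctx.peptide == "TAYIAK"]
```

In this sketch the deduplicated path sees 9 distinct peptides, while the occurrence-level path keeps all 10 contexts, including `TAYIAK` with flanks `MK`/`QR` in one protein and `GG`/`WW` in the other.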

Validation

  • ./lint.sh
  • ./test.sh -> 409 passed, 9 skipped, 2 xfailed

The MHCflurry regression test compares predict_proteins(...) output against a direct Class1PresentationPredictor.predict(..., n_flanks=..., c_flanks=...) call for the same peptide context.
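
The shape of that comparison can be sketched without loading MHCflurry models. The example below is an assumption-laden stand-in, not the test from this PR: `ToyFlankAwarePredictor` and `scan_protein` are hypothetical, and the toy score only exists so that dropping flanks changes the result.

```python
class ToyFlankAwarePredictor:
    """Stand-in for a flank-aware model; this is not the MHCflurry API."""

    def predict(self, peptides, n_flanks, c_flanks):
        # Toy score that depends on the flanks, so a scan that drops
        # them produces a different number than a direct call.
        return [
            len(p) + 0.1 * len(n) + 0.01 * len(c)
            for p, n, c in zip(peptides, n_flanks, c_flanks)
        ]


def scan_protein(predictor, sequence, peptide_length, flank_length):
    """Hypothetical scan that forwards each occurrence's flanks."""
    peptides, n_flanks, c_flanks = [], [], []
    for start in range(len(sequence) - peptide_length + 1):
        end = start + peptide_length
        peptides.append(sequence[start:end])
        n_flanks.append(sequence[max(0, start - flank_length):start])
        c_flanks.append(sequence[end:end + flank_length])
    scores = predictor.predict(peptides, n_flanks, c_flanks)
    return list(zip(peptides, n_flanks, c_flanks, scores))


def test_scan_matches_direct_call():
    predictor = ToyFlankAwarePredictor()
    rows = scan_protein(predictor, "MKTAYIAKQR", peptide_length=6, flank_length=3)

    # Pick one occurrence and re-score it directly with the same context.
    peptide, n_flank, c_flank, scanned = rows[2]
    direct = predictor.predict([peptide], [n_flank], [c_flank])[0]
    no_flanks = predictor.predict([peptide], [""], [""])[0]

    assert (peptide, n_flank, c_flank) == ("TAYIAK", "MK", "QR")
    assert scanned == direct      # the scan forwarded the flanks unchanged
    assert scanned != no_flanks   # dropping flanks would have changed the score


test_scan_matches_direct_call()
```

The second inequality is the part that guards against the regression described in #206: a scan that silently drops flanks would still return a score, just not the one a direct flank-aware call produces.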

@iskandr iskandr marked this pull request as ready for review May 6, 2026 17:59
@iskandr iskandr merged commit efb63a2 into master May 6, 2026
4 checks passed
@iskandr iskandr deleted the fix-flank-aware-predict-proteins branch May 6, 2026 17:59

Development

Successfully merging this pull request may close these issues.

BasePredictor.predict_proteins drops n_flank/c_flank — mhcflurry presentation falls back to a less accurate model
