Add LENS report loader + logistic_normalized — v5.1.0 by iskandr · Pull Request #117 · openvax/topiary

iskandr · 2026-04-13T06:39:49Z

Summary

Two features bundled into v5.1.0:

LENS loader (#110)

topiary.read_lens(path) loads LENS reports (v1.4, v1.5.1, v1.9-dev) into Topiary's wide-form schema as a TopiaryResult. Column-based version detection, per-model binding remap, mhcgnomes for allele normalization (Class I, II, mouse), HGVS-driven effect_type synthesis, flank derivation from pep_context for SNV / SPLICE / FUSION only, fusion-tpm composite-string handling.
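
A minimal sketch of what column-based version detection could look like, using the marker columns named in this PR's commit message. The function name `detect_lens_version`, the `VERSION_MARKERS` table, and the newest-first check order are assumptions for illustration, not Topiary's actual code.

```python
import fnmatch

# Marker columns quoted from the PR description. Checking the newest schema
# first is an assumption, in case later reports also carry older columns.
VERSION_MARKERS = [
    ("v1.9", "lohhla_allele_loss_pval"),
    ("v1.5.1", "snaf_exp"),
    ("v1.4", "netmhcstabpan_1.0.*"),
]


def detect_lens_version(columns):
    """Return the LENS schema version implied by the report's column names."""
    for version, pattern in VERSION_MARKERS:
        if any(fnmatch.fnmatchcase(col, pattern) for col in columns):
            return version
    raise ValueError("no known LENS marker column found")
```

For example, `detect_lens_version(["peptide", "snaf_exp"])` returns `"v1.5.1"`, and a header with none of the markers raises `ValueError`.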

`logistic_normalized` (#116)

DSLNode.logistic_normalized(m, w) — logistic rescaled to approach 1 as x → -∞. Separate AST node (LogisticNormalizedExpr) so it round-trips through parse() independently of raw .logistic().
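
The arithmetic can be sketched in plain Python. This assumes the raw transform is the decreasing sigmoid 1/(1+exp((x-m)/w)) and that normalization divides by its value at x = 0, which is how the downstream commit message quoted later on this page describes the manual divisor. The function names are illustrative and are not Topiary's API.

```python
import math


def logistic(x, midpoint, width):
    """Decreasing sigmoid: lower x (e.g. affinity in nM) scores higher."""
    return 1.0 / (1.0 + math.exp((x - midpoint) / width))


def logistic_normalized(x, midpoint, width):
    """Raw sigmoid divided by its value at x = 0 (assumed normalizer)."""
    return logistic(x, midpoint, width) / logistic(0.0, midpoint, width)


raw_at_zero = logistic(0.0, 350, 150)              # roughly 0.912
norm_at_zero = logistic_normalized(0.0, 350, 150)  # exactly 1.0
```

Under that assumption the normalized score is 1 at x = 0, the lower bound for a non-negative affinity, so scores over x ≥ 0 fall in (0, 1].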

Honest schema gaps on LENS load

peptide_offset set to 0 (LENS doesn't record it)
contains_mutant_residues, mutation_start_in_peptide left NaN (mut_aa_pos semantics ambiguous across antigen sources)
n_flank / c_flank derived only for SNV / SPLICE / FUSION — ERV / INDEL / CTA pep_context is the full ORF or ambiguous
Agretopicity / priority scores pass through as Column(...) annotations but have no sibling in fresh predictions
b2m_* / tap*_* / hla_allele_* are per-sample constants repeated per row (future PR could promote to Metadata.extra)
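
The flank rule in the list above (derive flanks only when the peptide occurs exactly once in pep_context) might look like the following sketch. `derive_flanks` is a hypothetical helper written for illustration; the antigen_source gating (SNV / SPLICE / FUSION only) is left to the caller.

```python
def derive_flanks(peptide, pep_context):
    """Return (n_flank, c_flank), or None when the match is missing or ambiguous."""
    if not peptide:
        return None
    start = pep_context.find(peptide)
    # Reject if absent, or if a second (possibly overlapping) match exists.
    if start == -1 or pep_context.find(peptide, start + 1) != -1:
        return None
    end = start + len(peptide)
    return pep_context[:start], pep_context[end:]
```

`derive_flanks("SIINFEKL", "AAASIINFEKLGG")` gives `("AAA", "GG")`, while a peptide that appears twice in the context yields `None` rather than a guess.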

Fixtures

tests/data/lens/sample_v{1_4, 1_5_1, 1_9}.tsv — 25-30 rows each, drawn from real LENS reports with ~5 rows per antigen_source (SNV, INDEL, SPLICE, FUSION, ERV, CTA/SELF).

Test plan

./test.sh — 876 passed, 1 skipped (+39 new: 34 LENS + 5 logistic_normalized)
./lint.sh — clean
CI green on this PR

topiary.read_lens(path): load LENS (Landscape of Effective Neoantigens Software) reports in three schema variants (v1.4, v1.5.1, v1.9-dev) into Topiary's wide-form schema (#110).

- Column-based version detection (netmhcstabpan_1.0.* → v1.4, snaf_exp → v1.5.1, lohhla_allele_loss_pval → v1.9).
- Binding column remap (netmhcpan_4.1b.aff_nm → netmhcpan_affinity_value, etc.) with per-model versions captured in Metadata.models.
- Allele normalization via mhcgnomes (handles Class I, II, mouse; removes the manual "insert a star" hack).
- effect_type synthesized from HGVS variant_effect when present (p.Ala290Val → Substitution, p.Thr259fs → FrameShift, etc.); falls back to antigen_source mapping for v1.4, which lacks variant_effect.
- Flanks derived from pep_context only for SNV/SPLICE/FUSION where the peptide occurs exactly once; ERV/INDEL/CTA skipped because pep_context is the full ORF (up to ~550 residues) or ambiguous.
- tpm handled carefully: fusion rows have composite "ENST1:tpm1-ENST2:tpm2" strings, so numeric gene_tpm is NaN with the raw string preserved in gene_tpm_raw; when every row is cleanly numeric the raw column is dropped.
- LENS-specific annotations (erv_*, priority_score_*, b2m_*, hla_*, lohhla_*, normal-tissue TPMs, agretopicity, etc.) pass through verbatim and are accessible via Column("…") in the DSL.
- Honest about losses: peptide_offset set to 0 (not in LENS); contains_mutant_residues / mutation_start_in_peptide left NaN (mut_aa_pos semantics ambiguous across antigen_source types).

DSLNode.logistic_normalized(midpoint, width): logistic sigmoid rescaled to approach 1 as x → -∞ rather than capping at 1/(1+exp(-m/w)) (~0.912 at (350, 150)) (#116). Separate LogisticNormalizedExpr AST node so to_expr_string() round-trips cleanly through parse(); registered in the parser's transform whitelist. The raw .logistic(...) is unchanged.
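
The tpm handling described in the list above can be illustrated with a small pure-Python sketch. `split_tpm` is a hypothetical helper, and treating any non-numeric string as "raw" is an assumption; the real loader works on report columns and may apply a stricter check.

```python
import math


def split_tpm(values):
    """Split tpm strings into a numeric list and an optional raw list.

    Cleanly numeric entries become floats. Anything else (for example a
    fusion composite such as "ENST1:3.0-ENST2:4.5") becomes NaN, with the
    original string kept alongside it. The raw list is returned as None
    when every entry parsed cleanly.
    """
    numeric, raw = [], []
    for value in values:
        try:
            numeric.append(float(value))
            raw.append(None)
        except (TypeError, ValueError):
            numeric.append(math.nan)
            raw.append(value)
    keep_raw = any(r is not None for r in raw)
    return numeric, (raw if keep_raw else None)
```

Keeping the composite string next to a NaN avoids inventing a single expression value for a fusion that spans two transcripts.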
Fixtures: tests/data/lens/sample_v{1_4, 1_5_1, 1_9}.tsv — 25-30 rows each, drawn from real LENS reports with ~5 rows per antigen_source. 34 new tests in tests/test_io_lens.py: version detection, binding remap, allele normalization, effect_type derivation from HGVS, flank derivation, tpm fusion-composite handling, NA normalization, annotation pass-through, DSL round-trip on loaded data, multi-model ambiguity behavior. 5 new tests in tests/test_dsl_roundtrip.py covering logistic_normalized. 876 tests pass total.
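
As an illustration of one behaviour those tests cover, effect_type derivation from HGVS, here is a hedged sketch covering only the two mappings quoted in the commit message (p.Ala290Val → Substitution, p.Thr259fs → FrameShift). The function name and regular expressions are assumptions; other HGVS forms simply return None here, whereas the real loader falls back to an antigen_source mapping.

```python
import re

# Three-letter amino-acid substitution, e.g. p.Ala290Val (stop gain "Ter" excluded).
_SUBSTITUTION = re.compile(r"^p\.[A-Z][a-z]{2}\d+(?!Ter)[A-Z][a-z]{2}$")
# Frameshift, e.g. p.Thr259fs or p.Thr259Alafs.
_FRAMESHIFT = re.compile(r"^p\.[A-Z][a-z]{2}\d+(?:[A-Z][a-z]{2})?fs")


def effect_type_from_hgvs(variant_effect):
    """Map a protein-level HGVS string to an effect_type label, or None."""
    if not isinstance(variant_effect, str):
        return None
    if _FRAMESHIFT.match(variant_effect):
        return "FrameShift"
    if _SUBSTITUTION.match(variant_effect):
        return "Substitution"
    return None
```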

coveralls · 2026-04-13T06:48:30Z

coverage: 87.281% (+0.2%) from 87.113% — lens-loader into master

Topiary v5.1.0 (openvax/topiary#117) shipped the logistic_normalized DSL node we asked for in openvax/topiary#116. Switching to it replaces the per-config math.exp(-m/w) computation with a first-class AST node that round-trips through parse() and is more honest about what the default score represents ([0, 1] binder-quality).

- requirements.txt: topiary>=5.1.0,<6.0.0
- epitope_dsl: _default_score_node uses Affinity.logistic_normalized(m, w); drop the _logistic_normalizer helper and the `math` import.
- README: update the score_expr example to logistic_normalized and drop the un-normalized-sigmoid caveat (the workaround is now a built-in).
- tests: remove the formula test for the deleted helper; parity tests still pass byte-for-byte against legacy (LogisticNormalizedExpr does the exact same math).

All 251 non-weasyprint tests pass.

* Integrate topiary 5.0 DSL for epitope filtering and ranking (#216)

Replaces the per-row EpitopePrediction.logistic_epitope_score call in predict_epitopes with a topiary DSLNode pipeline: optional apply_filter up front, then a single score_node.eval pass indexed by (source_sequence_name, peptide, peptide_offset, allele). Users can now configure filtering/ranking via ``filter_expr`` / ``score_expr`` on EpitopeConfig; when unset, nodes are synthesized from the scalar fields.

The synthesized affinity-mode score node divides by 1/(1+exp(-mid/width)) so its output matches the legacy scorer byte-for-byte; topiary's LogisticExpr does not apply that normalizer. Default path has no filter_node so legacy "min_epitope_score=0 keeps everything" semantics are preserved; the score node's (Affinity < cutoff) mask + min-score gate reproduces the old behavior.

Other logistic_epitope_score callers (VaccinePeptide, report, epitope_io) are unchanged and will migrate in a follow-up.

Deps bumped: topiary>=5.0.0, varcode>=2.1.0, mhctools>=3.7.0 (topiary 5.0 requires >=3.7.0).

* Review follow-ups for topiary 5.0 DSL integration

- Eager-validate EpitopeConfig.filter_expr / score_expr in __post_init__ so malformed YAML DSL strings fail at config load, not mid-pipeline.
- Drop the dead 0.0 fallback in score_series.get(); reindex+fillna upstream already guarantee every group tuple is present.
- Add a multi-allele parity test over distinct (peptide, allele) groups.
- Document the multi-method ambiguity behavior (default affinity raises when multiple prediction_method_name values are present; qualify via affinity['modelname'] to disambiguate) with two new tests.
- README: add a dedicated section on filter_expr / score_expr with a working YAML example, plus rows for both fields in the EpitopeConfig reference table. Use column(ident), the form topiary's parser accepts, rather than column('str').
- Fix unknown-column test to use the correct column(IDENT) syntax.

All 252 non-weasyprint tests pass.

* Use topiary 5.1 logistic_normalized instead of manual divisor

Topiary v5.1.0 (openvax/topiary#117) shipped the logistic_normalized DSL node we asked for in openvax/topiary#116. Switching to it replaces the per-config math.exp(-m/w) computation with a first-class AST node that round-trips through parse() and is more honest about what the default score represents ([0, 1] binder-quality).

- requirements.txt: topiary>=5.1.0,<6.0.0
- epitope_dsl: _default_score_node uses Affinity.logistic_normalized(m, w); drop the _logistic_normalizer helper and the `math` import.
- README: update the score_expr example to logistic_normalized and drop the un-normalized-sigmoid caveat (the workaround is now a built-in).
- tests: remove the formula test for the deleted helper; parity tests still pass byte-for-byte against legacy (LogisticNormalizedExpr does the exact same math).

All 251 non-weasyprint tests pass.

iskandr merged commit b5f507f into master Apr 13, 2026
8 checks passed

iskandr deleted the lens-loader branch April 13, 2026 06:50

This was referenced Apr 13, 2026

Load LENS reports into Topiary wide form #110

Closed

LogisticExpr is unnormalized — add sibling .logistic_normalized(...) transform #116

Closed

iskandr merged 1 commit into master from lens-loader
