
Add LENS report loader + logistic_normalized — v5.1.0 #117

Merged
iskandr merged 1 commit into master from lens-loader
Apr 13, 2026

iskandr commented Apr 13, 2026

Summary

Two features bundled into v5.1.0:

LENS loader (#110)

topiary.read_lens(path) loads LENS reports (v1.4, v1.5.1, v1.9-dev) into Topiary's wide-form schema as a TopiaryResult. It covers column-based version detection, per-model binding column remap, allele normalization via mhcgnomes (Class I, Class II, mouse), HGVS-driven effect_type synthesis, flank derivation from pep_context (SNV / SPLICE / FUSION only), and handling of composite fusion tpm strings.
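
As a rough illustration of column-based version detection, the sketch below maps the marker columns named in the commit message (netmhcstabpan_1.0.*, snaf_exp, lohhla_allele_loss_pval) to a version label. The function names and the newest-first check order are assumptions made for this example, not Topiary's actual code.

```python
import csv
import io

# Marker columns come from the commit message; checking newest-first is an
# assumption of this sketch, not a documented Topiary behavior.
_MARKERS = [
    ("v1.9", lambda col: col == "lohhla_allele_loss_pval"),
    ("v1.5.1", lambda col: col == "snaf_exp"),
    ("v1.4", lambda col: col.startswith("netmhcstabpan_1.0.")),
]


def detect_lens_version(columns):
    """Return the first version label whose marker column is present."""
    for version, matches in _MARKERS:
        if any(matches(col) for col in columns):
            return version
    raise ValueError("unrecognized LENS schema: no marker column found")


def header_columns(tsv_text):
    """Read only the header row of a tab-separated report."""
    return next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
```

Detecting the schema from the header alone means no version field has to exist in the report itself.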

logistic_normalized (#116)

DSLNode.logistic_normalized(m, w): logistic rescaled to reach 1 at x = 0, where the raw logistic only reaches 1/(1+exp(-m/w)). Separate AST node (LogisticNormalizedExpr) so it round-trips through parse() independently of raw .logistic().

Honest schema gaps on LENS load

  • peptide_offset set to 0 (LENS doesn't record it)
  • contains_mutant_residues, mutation_start_in_peptide left NaN (mut_aa_pos semantics ambiguous across antigen sources)
  • n_flank / c_flank derived only for SNV / SPLICE / FUSION — ERV / INDEL / CTA pep_context is the full ORF or ambiguous
  • Agretopicity / priority scores pass through as Column(...) annotations but have no sibling in fresh predictions
  • b2m_* / tap*_* / hla_allele_* are per-sample constants repeated per row (future PR could promote to Metadata.extra)
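
The flank rule above (derive flanks only when the peptide occurs exactly once in pep_context) could look roughly like this. The function name and the use of None for "not derived" are assumptions for illustration; the loader itself may represent missing flanks differently.

```python
def derive_flanks(peptide, pep_context):
    """Return (n_flank, c_flank), or (None, None) when the peptide is
    missing from pep_context or occurs more than once."""
    if not peptide or not pep_context:
        return None, None
    first = pep_context.find(peptide)
    if first == -1:
        return None, None
    # Search again from first + 1 so overlapping repeats also count as ambiguous.
    if pep_context.find(peptide, first + 1) != -1:
        return None, None
    return pep_context[:first], pep_context[first + len(peptide):]
```

For example, `derive_flanks("SIINFEKL", "AAASIINFEKLGGG")` yields `("AAA", "GGG")`, while a peptide that appears twice in its context yields no flanks at all.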

Fixtures

tests/data/lens/sample_v{1_4, 1_5_1, 1_9}.tsv — 25-30 rows each, drawn from real LENS reports with ~5 rows per antigen_source (SNV, INDEL, SPLICE, FUSION, ERV, CTA/SELF).

Test plan

  • ./test.sh — 876 passed, 1 skipped (+39 new: 34 LENS + 5 logistic_normalized)
  • ./lint.sh — clean
  • CI green on this PR

topiary.read_lens(path) — load LENS (Landscape of Effective Neoantigens
Software) reports in three schema variants (v1.4, v1.5.1, v1.9-dev)
into Topiary's wide-form schema (#110).

- Column-based version detection (netmhcstabpan_1.0.* → v1.4,
  snaf_exp → v1.5.1, lohhla_allele_loss_pval → v1.9).
- Binding column remap (netmhcpan_4.1b.aff_nm → netmhcpan_affinity_value,
  etc.) with per-model versions captured in Metadata.models.
- Allele normalization via mhcgnomes (handles Class I, II, mouse —
  removes the manual "insert a star" hack).
- effect_type synthesized from HGVS variant_effect when present
  (p.Ala290Val → Substitution, p.Thr259fs → FrameShift, etc.); falls
  back to antigen_source mapping for v1.4 which lacks variant_effect.
- Flanks derived from pep_context only for SNV/SPLICE/FUSION where the
  peptide occurs exactly once; ERV/INDEL/CTA skipped because pep_context
  is the full ORF (up to ~550 residues) or ambiguous.
- tpm handled carefully: fusion rows have composite "ENST1:tpm1-ENST2:tpm2"
  strings — numeric gene_tpm is NaN with the raw string preserved in
  gene_tpm_raw; when every row is cleanly numeric the raw column is
  dropped.
- LENS-specific annotations (erv_*, priority_score_*, b2m_*, hla_*,
  lohhla_*, normal-tissue TPMs, agretopicity, etc.) pass through
  verbatim and are accessible via Column("…") in the DSL.
- Honest about losses: peptide_offset set to 0 (not in LENS);
  contains_mutant_residues / mutation_start_in_peptide left NaN
  (mut_aa_pos semantics ambiguous across antigen_source types).
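
A minimal sketch of the tpm handling described in the list above: plain numbers parse to floats, composite fusion strings become NaN with the raw text kept, and the raw column is dropped when every value is numeric. The helper names are hypothetical, and how the real loader treats missing values is not covered here.

```python
import math


def parse_tpm(raw):
    """Return (numeric_tpm, raw_text); numeric is NaN when raw is not a plain number."""
    text = str(raw).strip()
    try:
        return float(text), text
    except ValueError:
        # Composite fusion strings such as "ENST1:tpm1-ENST2:tpm2" land here.
        return math.nan, text


def split_tpm_column(values):
    """Return (gene_tpm, gene_tpm_raw); gene_tpm_raw is None when every value parsed."""
    parsed = [parse_tpm(v) for v in values]
    gene_tpm = [num for num, _ in parsed]
    if all(not math.isnan(num) for num in gene_tpm):
        return gene_tpm, None
    return gene_tpm, [raw for _, raw in parsed]
```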

DSLNode.logistic_normalized(midpoint, width): logistic sigmoid
rescaled to reach 1 at x = 0 rather than capping at
1/(1+exp(-m/w)) (~0.912 at (350, 150)) (#116). Separate
LogisticNormalizedExpr AST node so to_expr_string() round-trips
cleanly through parse(); registered in the parser's transform
whitelist. The raw .logistic(...) is unchanged.
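
To make the arithmetic concrete, here is a small standalone sketch. It assumes the raw logistic is the decreasing form 1/(1+exp((x-m)/w)), which is the form consistent with the quoted value 1/(1+exp(-m/w)) at x = 0, and that the normalized variant divides by that value. These are plain functions written for illustration, not Topiary's DSL nodes.

```python
import math


def logistic(x, midpoint, width):
    """Decreasing logistic: 0.5 at x == midpoint, smaller for larger x."""
    return 1.0 / (1.0 + math.exp((x - midpoint) / width))


def logistic_normalized(x, midpoint, width):
    """Raw logistic divided by its value at x = 0, so the score is 1 at x = 0."""
    return logistic(x, midpoint, width) / logistic(0.0, midpoint, width)
```

With midpoint 350 and width 150, `logistic(0, 350, 150)` is about 0.912 and `logistic_normalized(0, 350, 150)` is exactly 1.0.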

Fixtures: tests/data/lens/sample_v{1_4, 1_5_1, 1_9}.tsv — 25-30 rows
each, drawn from real LENS reports with ~5 rows per antigen_source.

34 new tests in tests/test_io_lens.py: version detection, binding
remap, allele normalization, effect_type derivation from HGVS, flank
derivation, tpm fusion-composite handling, NA normalization,
annotation pass-through, DSL round-trip on loaded data, multi-model
ambiguity behavior. 5 new tests in tests/test_dsl_roundtrip.py
covering logistic_normalized. 876 tests pass total.
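
One of the tested behaviors, effect_type derivation from HGVS, can be sketched with two regular expressions covering the examples given in the commit message (p.Ala290Val and p.Thr259fs). The patterns and function name are assumptions for illustration, and they handle only these two effect types; anything else returns None here.

```python
import re

# Three-letter protein HGVS, e.g. p.Ala290Val; "Ter" (stop) is excluded on purpose.
_SUBSTITUTION = re.compile(r"^p\.[A-Z][a-z]{2}\d+(?!Ter)[A-Z][a-z]{2}$")
# Frameshifts, e.g. p.Thr259fs or p.Thr259AlafsTer12.
_FRAMESHIFT = re.compile(r"^p\.[A-Z][a-z]{2}\d+(?:[A-Z][a-z]{2})?fs")


def effect_type_from_hgvs(variant_effect):
    """Map a protein-level HGVS string to an effect_type label, or None."""
    if not variant_effect:
        return None
    if _FRAMESHIFT.match(variant_effect):
        return "FrameShift"
    if _SUBSTITUTION.match(variant_effect):
        return "Substitution"
    return None
```

A None result is where a fallback such as the antigen_source mapping used for v1.4 would apply.
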
@coveralls

Coverage Status

coverage: 87.281% (+0.2%) from 87.113% — lens-loader into master

iskandr merged commit b5f507f into master Apr 13, 2026
8 checks passed
iskandr deleted the lens-loader branch April 13, 2026 06:50
iskandr added a commit to openvax/vaxrank that referenced this pull request Apr 13, 2026
Topiary v5.1.0 (openvax/topiary#117) shipped the logistic_normalized DSL
node we asked for in openvax/topiary#116. Switching to it replaces the
per-config math.exp(-m/w) computation with a first-class AST node that
round-trips through parse() and is more honest about what the default
score represents ([0, 1] binder-quality).

- requirements.txt: topiary>=5.1.0,<6.0.0
- epitope_dsl: _default_score_node uses Affinity.logistic_normalized(m, w);
  drop the _logistic_normalizer helper and the `math` import.
- README: update the score_expr example to logistic_normalized and drop
  the un-normalized-sigmoid caveat (the workaround is now a built-in).
- tests: remove the formula test for the deleted helper; parity tests
  still pass byte-for-byte against legacy (LogisticNormalizedExpr does
  the exact same math).

All 251 non-weasyprint tests pass.
iskandr added a commit to openvax/vaxrank that referenced this pull request Apr 13, 2026
* Integrate topiary 5.0 DSL for epitope filtering and ranking (#216)

Replaces the per-row EpitopePrediction.logistic_epitope_score call in
predict_epitopes with a topiary DSLNode pipeline: optional apply_filter
up front, then a single score_node.eval pass indexed by
(source_sequence_name, peptide, peptide_offset, allele). Users can now
configure filtering/ranking via ``filter_expr`` / ``score_expr`` on
EpitopeConfig; when unset, nodes are synthesized from the scalar fields.

The synthesized affinity-mode score node divides by 1/(1+exp(-mid/width))
so its output matches the legacy scorer byte-for-byte — topiary's
LogisticExpr does not apply that normalizer. Default path has no
filter_node so legacy "min_epitope_score=0 keeps everything" semantics
are preserved; the score node's (Affinity < cutoff) mask + min-score
gate reproduces the old behavior.

Other logistic_epitope_score callers (VaccinePeptide, report, epitope_io)
are unchanged and will migrate in a follow-up.

Deps bumped: topiary>=5.0.0, varcode>=2.1.0, mhctools>=3.7.0 (topiary 5.0
requires >=3.7.0).

* Review follow-ups for topiary 5.0 DSL integration

- Eager-validate EpitopeConfig.filter_expr / score_expr in __post_init__
  so malformed YAML DSL strings fail at config load, not mid-pipeline.
- Drop the dead 0.0 fallback in score_series.get(); reindex+fillna
  upstream already guarantee every group tuple is present.
- Add a multi-allele parity test over distinct (peptide, allele) groups.
- Document the multi-method ambiguity behavior (default affinity raises
  when multiple prediction_method_name values are present; qualify via
  affinity['modelname'] to disambiguate) with two new tests.
- README: add a dedicated section on filter_expr / score_expr with a
  working YAML example, plus rows for both fields in the EpitopeConfig
  reference table. Use column(ident) — the form topiary's parser
  accepts — rather than column('str').
- Fix unknown-column test to use the correct column(IDENT) syntax.
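
The eager-validation idea in the first bullet can be sketched with a dataclass whose __post_init__ parses both expression fields. Everything named here is hypothetical: EpitopeConfigSketch stands in for the real config class, and parse_expr is a toy validator (non-empty, balanced parentheses) used in place of topiary's parser.

```python
from dataclasses import dataclass
from typing import Optional


def parse_expr(text: str) -> str:
    """Toy stand-in for a DSL parser: rejects blank or unbalanced input."""
    if not text.strip():
        raise ValueError("empty expression")
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')'")
    if depth != 0:
        raise ValueError("unmatched '('")
    return text


@dataclass
class EpitopeConfigSketch:
    filter_expr: Optional[str] = None
    score_expr: Optional[str] = None

    def __post_init__(self):
        # Validate at construction so a bad string fails at config load.
        for name in ("filter_expr", "score_expr"):
            value = getattr(self, name)
            if value is not None:
                try:
                    parse_expr(value)
                except ValueError as err:
                    raise ValueError(f"invalid {name}: {err}") from err
```

Constructing the config with an unbalanced expression raises immediately, naming the offending field, rather than failing partway through a pipeline run.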

All 252 non-weasyprint tests pass.

* Use topiary 5.1 logistic_normalized instead of manual divisor

Topiary v5.1.0 (openvax/topiary#117) shipped the logistic_normalized DSL
node we asked for in openvax/topiary#116. Switching to it replaces the
per-config math.exp(-m/w) computation with a first-class AST node that
round-trips through parse() and is more honest about what the default
score represents ([0, 1] binder-quality).

- requirements.txt: topiary>=5.1.0,<6.0.0
- epitope_dsl: _default_score_node uses Affinity.logistic_normalized(m, w);
  drop the _logistic_normalizer helper and the `math` import.
- README: update the score_expr example to logistic_normalized and drop
  the un-normalized-sigmoid caveat (the workaround is now a built-in).
- tests: remove the formula test for the deleted helper; parity tests
  still pass byte-for-byte against legacy (LogisticNormalizedExpr does
  the exact same math).

All 251 non-weasyprint tests pass.