Add LENS report loader + logistic_normalized — v5.1.0#117
Merged
topiary.read_lens(path): load LENS (Landscape of Effective Neoantigens Software) reports in three schema variants (v1.4, v1.5.1, v1.9-dev) into Topiary's wide-form schema (#110).

- Column-based version detection (netmhcstabpan_1.0.* → v1.4, snaf_exp → v1.5.1, lohhla_allele_loss_pval → v1.9).
- Binding column remap (netmhcpan_4.1b.aff_nm → netmhcpan_affinity_value, etc.) with per-model versions captured in Metadata.models.
- Allele normalization via mhcgnomes (handles Class I, II, mouse; removes the manual "insert a star" hack).
- effect_type synthesized from HGVS variant_effect when present (p.Ala290Val → Substitution, p.Thr259fs → FrameShift, etc.); falls back to antigen_source mapping for v1.4, which lacks variant_effect.
- Flanks derived from pep_context only for SNV/SPLICE/FUSION where the peptide occurs exactly once; ERV/INDEL/CTA skipped because pep_context is the full ORF (up to ~550 residues) or ambiguous.
- tpm handled carefully: fusion rows have composite "ENST1:tpm1-ENST2:tpm2" strings, for which numeric gene_tpm is NaN with the raw string preserved in gene_tpm_raw; when every row is cleanly numeric the raw column is dropped.
- LENS-specific annotations (erv_*, priority_score_*, b2m_*, hla_*, lohhla_*, normal-tissue TPMs, agretopicity, etc.) pass through verbatim and are accessible via Column("…") in the DSL.
- Honest about losses: peptide_offset set to 0 (not in LENS); contains_mutant_residues / mutation_start_in_peptide left NaN (mut_aa_pos semantics ambiguous across antigen_source types).

DSLNode.logistic_normalized(midpoint, width): logistic sigmoid rescaled to approach 1 as x → -∞ rather than capping at 1/(1+exp(-m/w)) (~0.912 at (350, 150)) (#116). Separate LogisticNormalizedExpr AST node so to_expr_string() round-trips cleanly through parse(); registered in the parser's transform whitelist. The raw .logistic(...) is unchanged.
Fixtures: tests/data/lens/sample_v{1_4, 1_5_1, 1_9}.tsv — 25-30 rows each, drawn from real LENS reports with ~5 rows per antigen_source. 34 new tests in tests/test_io_lens.py: version detection, binding remap, allele normalization, effect_type derivation from HGVS, flank derivation, tpm fusion-composite handling, NA normalization, annotation pass-through, DSL round-trip on loaded data, multi-model ambiguity behavior. 5 new tests in tests/test_dsl_roundtrip.py covering logistic_normalized. 876 tests pass total.
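The column-based version detection listed above can be illustrated with a minimal sketch. This is not Topiary's actual code: the function name `detect_lens_version`, the newest-first lookup order, and treating `netmhcstabpan_1.0.*` as a column-name prefix are all assumptions made for illustration. Only the marker columns themselves come from the commit message.

```python
# Marker columns are from the commit message; the lookup order (newest first)
# and the prefix interpretation of "netmhcstabpan_1.0.*" are assumptions.
VERSION_MARKERS = [
    ("v1.9", lambda col: col == "lohhla_allele_loss_pval"),
    ("v1.5.1", lambda col: col == "snaf_exp"),
    ("v1.4", lambda col: col.startswith("netmhcstabpan_1.0.")),
]


def detect_lens_version(columns):
    """Return the first schema version whose marker column is present.

    Hypothetical helper: raises ValueError when no marker is found.
    """
    for version, is_marker in VERSION_MARKERS:
        if any(is_marker(col) for col in columns):
            return version
    raise ValueError("no LENS version marker column found")
```

Detecting the version from columns rather than from a file name or header field means a renamed or concatenated report is still classified by what it actually contains.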
This was referenced Apr 13, 2026
iskandr added a commit to openvax/vaxrank that referenced this pull request (Apr 13, 2026)
Topiary v5.1.0 (openvax/topiary#117) shipped the logistic_normalized DSL node we asked for in openvax/topiary#116. Switching to it replaces the per-config math.exp(-m/w) computation with a first-class AST node that round-trips through parse() and is more honest about what the default score represents ([0, 1] binder-quality).

- requirements.txt: topiary>=5.1.0,<6.0.0
- epitope_dsl: _default_score_node uses Affinity.logistic_normalized(m, w); drop the _logistic_normalizer helper and the `math` import.
- README: update the score_expr example to logistic_normalized and drop the un-normalized-sigmoid caveat (the workaround is now a built-in).
- tests: remove the formula test for the deleted helper; parity tests still pass byte-for-byte against legacy (LogisticNormalizedExpr does the exact same math).

All 251 non-weasyprint tests pass.
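The normalization that the removed helper performed can be reproduced in a few lines of plain Python. This is a sketch of the arithmetic only, not Topiary's or vaxrank's implementation. It assumes a decreasing sigmoid (low x scores high, as with affinities in nM) and the divisor 1/(1+exp(-m/w)) mentioned in the vaxrank commit messages; the function names are chosen here for illustration.

```python
import math


def logistic(x, midpoint, width):
    """Decreasing sigmoid (assumed direction): low x gives a score near 1."""
    return 1.0 / (1.0 + math.exp((x - midpoint) / width))


def logistic_normalized(x, midpoint, width):
    """Divide by the raw sigmoid's value at x = 0 so the score there is 1."""
    normalizer = 1.0 / (1.0 + math.exp(-midpoint / width))
    return logistic(x, midpoint, width) / normalizer


# At (midpoint, width) = (350, 150) the raw sigmoid tops out near 0.912
# for non-negative x, while the normalized form reaches 1 at x = 0.
raw_at_zero = logistic(0.0, 350.0, 150.0)
normalized_at_zero = logistic_normalized(0.0, 350.0, 150.0)
```

With these definitions `raw_at_zero` is about 0.912, matching the figure quoted in the PR, and the normalized score at the midpoint sits slightly above 0.5 because every value is scaled up by the same factor.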
iskandr added a commit to openvax/vaxrank that referenced this pull request (Apr 13, 2026)
* Integrate topiary 5.0 DSL for epitope filtering and ranking (#216)

Replaces the per-row EpitopePrediction.logistic_epitope_score call in predict_epitopes with a topiary DSLNode pipeline: optional apply_filter up front, then a single score_node.eval pass indexed by (source_sequence_name, peptide, peptide_offset, allele). Users can now configure filtering/ranking via ``filter_expr`` / ``score_expr`` on EpitopeConfig; when unset, nodes are synthesized from the scalar fields.

The synthesized affinity-mode score node divides by 1/(1+exp(-mid/width)) so its output matches the legacy scorer byte-for-byte; topiary's LogisticExpr does not apply that normalizer. Default path has no filter_node so legacy "min_epitope_score=0 keeps everything" semantics are preserved; the score node's (Affinity < cutoff) mask + min-score gate reproduces the old behavior.

Other logistic_epitope_score callers (VaccinePeptide, report, epitope_io) are unchanged and will migrate in a follow-up.

Deps bumped: topiary>=5.0.0, varcode>=2.1.0, mhctools>=3.7.0 (topiary 5.0 requires >=3.7.0).

* Review follow-ups for topiary 5.0 DSL integration

- Eager-validate EpitopeConfig.filter_expr / score_expr in __post_init__ so malformed YAML DSL strings fail at config load, not mid-pipeline.
- Drop the dead 0.0 fallback in score_series.get(); reindex+fillna upstream already guarantee every group tuple is present.
- Add a multi-allele parity test over distinct (peptide, allele) groups.
- Document the multi-method ambiguity behavior (default affinity raises when multiple prediction_method_name values are present; qualify via affinity['modelname'] to disambiguate) with two new tests.
- README: add a dedicated section on filter_expr / score_expr with a working YAML example, plus rows for both fields in the EpitopeConfig reference table. Use column(ident), the form topiary's parser accepts, rather than column('str').
- Fix unknown-column test to use the correct column(IDENT) syntax.

All 252 non-weasyprint tests pass.

* Use topiary 5.1 logistic_normalized instead of manual divisor

Topiary v5.1.0 (openvax/topiary#117) shipped the logistic_normalized DSL node we asked for in openvax/topiary#116. Switching to it replaces the per-config math.exp(-m/w) computation with a first-class AST node that round-trips through parse() and is more honest about what the default score represents ([0, 1] binder-quality).

- requirements.txt: topiary>=5.1.0,<6.0.0
- epitope_dsl: _default_score_node uses Affinity.logistic_normalized(m, w); drop the _logistic_normalizer helper and the `math` import.
- README: update the score_expr example to logistic_normalized and drop the un-normalized-sigmoid caveat (the workaround is now a built-in).
- tests: remove the formula test for the deleted helper; parity tests still pass byte-for-byte against legacy (LogisticNormalizedExpr does the exact same math).

All 251 non-weasyprint tests pass.
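The eager validation mentioned in the review follow-ups (failing at config load rather than mid-pipeline) follows a common dataclass pattern, sketched below. Everything here is hypothetical: `EpitopeConfigSketch` is not vaxrank's EpitopeConfig, and `parse_dsl` is a stand-in that only checks parenthesis balance, where the real code would call topiary's parser.

```python
from dataclasses import dataclass
from typing import Optional


def parse_dsl(expr: str) -> str:
    """Hypothetical stand-in for a DSL parser: checks parenthesis balance only."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth < 0:
            raise ValueError(f"unbalanced ')' in expression: {expr!r}")
    if depth != 0:
        raise ValueError(f"unclosed '(' in expression: {expr!r}")
    return expr


@dataclass
class EpitopeConfigSketch:
    """Illustrative config with two optional DSL expression strings."""

    filter_expr: Optional[str] = None
    score_expr: Optional[str] = None

    def __post_init__(self):
        # Parse eagerly so a malformed string fails when the config is built,
        # not later when the pipeline first evaluates the expression.
        for name in ("filter_expr", "score_expr"):
            value = getattr(self, name)
            if value is not None:
                parse_dsl(value)
```

Constructing the config with a malformed string raises `ValueError` immediately, so a bad YAML entry surfaces before any predictions are computed.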
Summary
Two features bundled into v5.1.0:
LENS loader (#110)
topiary.read_lens(path) loads LENS reports (v1.4, v1.5.1, v1.9-dev) into Topiary's wide-form schema as a TopiaryResult. Column-based version detection, per-model binding remap, mhcgnomes for allele normalization (Class I, II, mouse), HGVS-driven effect_type synthesis, flank derivation from pep_context for SNV / SPLICE / FUSION only, fusion-tpm composite-string handling.
logistic_normalized (#116)
DSLNode.logistic_normalized(m, w): logistic rescaled to approach 1 as x → -∞. Separate AST node (LogisticNormalizedExpr) so it round-trips through parse() independently of raw .logistic().
Honest schema gaps on LENS load
- peptide_offset set to 0 (LENS doesn't record it)
- contains_mutant_residues, mutation_start_in_peptide left NaN (mut_aa_pos semantics ambiguous across antigen sources)
- n_flank / c_flank derived only for SNV / SPLICE / FUSION; for ERV / INDEL / CTA, pep_context is the full ORF or ambiguous
- LENS-specific columns pass through as Column(...) annotations but have no sibling in fresh predictions
- b2m_* / tap*_* / hla_allele_* are per-sample constants repeated per row (future PR could promote to Metadata.extra)
Fixtures
tests/data/lens/sample_v{1_4, 1_5_1, 1_9}.tsv: 25-30 rows each, drawn from real LENS reports with ~5 rows per antigen_source (SNV, INDEL, SPLICE, FUSION, ERV, CTA/SELF).
Test plan
- ./test.sh: 876 passed, 1 skipped (+39 new: 34 LENS + 5 logistic_normalized)
- ./lint.sh: clean
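The fusion-tpm composite-string handling named in the summary can be sketched without any dependencies. This is an illustration of the described behavior, not the loader's code: `split_tpm` is a hypothetical helper, and it assumes the tpm values arrive as strings. The real loader works on DataFrame columns (gene_tpm and gene_tpm_raw).

```python
import math


def split_tpm(values):
    """Split tpm strings into numeric values plus an optional raw copy.

    Hypothetical helper. Values that do not parse as a float (such as the
    composite "ENST1:tpm1-ENST2:tpm2" strings on fusion rows) become NaN.
    The raw list is returned only if at least one value failed to parse;
    otherwise it is None, mirroring the dropped raw column.
    """
    numeric = []
    any_unparsed = False
    for value in values:
        try:
            numeric.append(float(value))
        except ValueError:
            numeric.append(math.nan)
            any_unparsed = True
    raw = list(values) if any_unparsed else None
    return numeric, raw
```

Keeping the raw strings only when something failed to parse avoids carrying a redundant column for reports that contain no fusion rows.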