DSL extensions: logistic, Column, WT, peptide properties by iskandr · Pull Request #98 · openvax/topiary

iskandr · 2026-04-09T10:59:07Z

Intent

Extend the ranking/filtering DSL to support the signals needed for Vaxrank-style composite scoring and general immunogenicity/manufacturability analysis. After this PR, users can express:

score = (
    0.4 * Affinity["netmhcpan"].logistic(350, 150)
    + 0.3 * Presentation["mhcflurry"].score
    - 0.1 * Column("cysteine_count")
    - 0.1 * abs(Column("charge"))
    + 0.1 * (Affinity.score - WT(Affinity).score)  # differential binding
)

Changes

1. `.logistic(midpoint, width)` on Expr

Applies a logistic sigmoid transform, 1 / (1 + exp((x - midpoint) / width)), which decreases as x grows, so a lower IC50 yields a higher score. Vaxrank uses midpoint=350, width=150 for IC50→score conversion. Follows the same pattern as the existing .norm().
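As a quick sanity check of the formula, here is a minimal standalone sketch in plain Python. The function name `logistic` and its scalar signature are illustrative only; this is not the `Expr.logistic` implementation from the PR.

```python
import math


def logistic(x, midpoint, width):
    """Decreasing logistic transform: 1 / (1 + exp((x - midpoint) / width))."""
    return 1.0 / (1.0 + math.exp((x - midpoint) / width))


# With the Vaxrank parameters quoted above, IC50 values map into (0, 1):
# strong binders (low IC50) score near 1, weak binders score near 0,
# and an IC50 equal to the midpoint scores exactly 0.5.
for ic50 in (50.0, 350.0, 5000.0):
    print(ic50, logistic(ic50, midpoint=350.0, width=150.0))
```

Because the exponent is (x - midpoint) / width rather than its negation, the curve falls as IC50 rises, which is the direction wanted for a binding score.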

2. `Column("name")` — arbitrary DataFrame column access

New Expr subclass that reads any column from the group DataFrame. This is the key unlock: external signals (read counts, expression, peptide properties, custom annotations) become first-class ranking signals without special-casing each one.

Column("hydrophobicity") >= -0.5
Column("n_alt_reads").sqrt()

CLI: use column(name) in filter/ranking strings. If the column is missing, the error lists the available columns.
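The missing-column behavior can be illustrated with a small sketch. `ColumnSketch` is a hypothetical stand-in that reads from a plain dict rather than a group DataFrame, and the choice of `ValueError` and the exact message wording are assumptions, not topiary's actual code.

```python
import difflib


class ColumnSketch:
    """Hypothetical stand-in for a column-reading expression (not topiary's Column)."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, row):
        # `row` is a plain dict standing in for one row of the group DataFrame.
        if self.name not in row:
            available = sorted(row)
            msg = f"Column {self.name!r} not found. Available columns: {available}."
            # Suggest the closest existing column name, if any is similar enough.
            hint = difflib.get_close_matches(self.name, available, n=1)
            if hint:
                msg += f" Did you mean {hint[0]!r}?"
            raise ValueError(msg)  # assumption: the real exception type may differ
        return float(row[self.name])


row = {"charge": -1.0, "hydrophobicity": 0.3}
print(ColumnSketch("charge").evaluate(row))
try:
    ColumnSketch("hydrophobicty").evaluate(row)  # deliberate typo
except ValueError as err:
    print(err)
```

Listing the available columns in the error keeps typos cheap to diagnose when the DataFrame carries many externally supplied annotations.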

3. `WT(accessor)` — wildtype comparison wrapper

Wraps a KindAccessor to read wildtype prediction columns (wt_value, wt_score, wt_percentile_rank) instead of the mutant columns. Avoids duplicating every field on KindAccessor.

WT(Affinity).value                        # WT IC50
WT(Affinity["netmhcpan"]).score           # qualified WT
Affinity.score - WT(Affinity).score       # differential

Works with method qualification — same prediction_method_name filtering as the mutant side. When WT columns don't exist (non-variant inputs), evaluates to NaN.
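The NaN fallback can be sketched as follows. The helper names `read_field` and `differential` are hypothetical, rows are plain dicts, and method qualification is ignored; only the `wt_` column naming comes from the description above.

```python
import math

WT_PREFIX = "wt_"  # matches the wt_value / wt_score / wt_percentile_rank naming


def read_field(row, field, wildtype=False):
    """Read a prediction field from a dict-like row; NaN if the WT column is absent."""
    if wildtype:
        return row.get(WT_PREFIX + field, math.nan)
    return row[field]


def differential(row, field="score"):
    """Mutant minus wildtype; NaN propagates when WT data is missing."""
    return read_field(row, field) - read_field(row, field, wildtype=True)


variant_row = {"score": 0.9, "wt_score": 0.4}
non_variant_row = {"score": 0.9}  # no wt_* columns, e.g. a non-variant input

print(differential(variant_row))
print(differential(non_variant_row))  # nan
```

Returning NaN instead of raising lets one ranking expression run over mixed inputs; how NaN terms are then handled in a composite score is up to the evaluator.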

4. `topiary.properties` — peptide property columns

New module computing amino acid properties directly on the peptide column using vectorized pandas string operations (fast enough for proteome-scale):

from topiary.properties import add_peptide_properties

df = add_peptide_properties(df, groups=["manufacturability"])

Named groups:

- "core": charge, hydrophobicity, aromaticity, molecular_weight
- "manufacturability": core + cysteine_count, instability_index, max_7mer_hydrophobicity, cterm_7mer_hydrophobicity, difficult_nterm, difficult_cterm, asp_pro_bonds
- "immunogenicity": core + tcr_charge, tcr_aromaticity, tcr_hydrophobicity

The computation uses pandas str.count(), str[n], and apply() with lookup dicts, so it needs no additional external dependencies.

Properties accessible in DSL via Column("charge"), Column("cysteine_count"), etc.
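To give a feel for the vectorized string approach, here is a minimal sketch using `Series.str.count`. The helper `add_basic_properties` is hypothetical, and the net-charge formula (K + R minus D + E) is a simplifying assumption rather than the definition used in `topiary.properties`.

```python
import pandas as pd


def add_basic_properties(df, peptide_col="peptide"):
    """Hypothetical helper: add a few count-based columns with vectorized str ops."""
    peptides = df[peptide_col].str.upper()
    out = df.copy()
    out["cysteine_count"] = peptides.str.count("C")
    # Asp-Pro ("DP") dipeptides; the name mirrors the PR's asp_pro_bonds column.
    out["asp_pro_bonds"] = peptides.str.count("DP")
    # Simplified net charge: (K + R) - (D + E). Assumption, not topiary's formula.
    out["charge"] = peptides.str.count("[KR]") - peptides.str.count("[DE]")
    return out


df = add_basic_properties(pd.DataFrame({"peptide": ["SIINFEKL", "CDPKRC"]}))
print(df)
```

`str.count` takes a regular expression, so a character class such as `[KR]` counts several residues in a single pass over the column.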

Not in this PR

- add_wildtype_predictions() function (the step that actually runs WT predictions and populates the wt_* columns): separate PR, depends on predictor refactoring
- Evaluator performance refactor (vectorized filter/sort): separate PR

Placeholder commit — implementation follows.

coveralls · 2026-04-09T11:07:50Z

coverage: 85.885% (+0.4%) from 85.441% — dsl-extensions into master

Expr.logistic(midpoint, width): logistic sigmoid transform for Vaxrank-compatible IC50 scoring.

Column("name"): reference any DataFrame column in expressions. Enables peptide properties, read counts, and custom annotations as first-class ranking signals. Errors with "Did you mean" on typos. CLI: column(name) <= threshold.

WT(accessor): wildtype comparison wrapper. Reads wt_value, wt_score, wt_percentile_rank columns alongside mutant values. Works with method qualification: WT(Affinity["netmhcpan"]).score. Returns NaN when WT columns don't exist (non-variant inputs). WT fields are for ranking expressions only, not filters.

topiary.properties module: compute amino acid properties on the peptide column using vectorized pandas operations. Named groups:
- "core": charge, hydrophobicity, aromaticity, molecular_weight
- "manufacturability": core + cysteine_count, instability_index, max_7mer_hydrophobicity, difficult_nterm/cterm, asp_pro_bonds
- "immunogenicity": core + tcr_charge/aromaticity/hydrophobicity

Includes dipeptide-based instability index (Guruprasad et al. 1990). Supports prefix parameter for WT peptide properties.

52 new tests covering all features, edge cases, and error paths.

…rrors
- Fix WT docstring showing filter usage that actually raises TypeError
- Validate column() names in CLI parser: reject empty names and nested parens
- Column.evaluate() gives clear TypeError for non-numeric values
- 3 new tests for the above edge cases
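The column() name validation mentioned in this commit message could look roughly like the sketch below. The regular expression, the function name `parse_column_name`, and the use of `ValueError` are all assumptions for illustration; the actual CLI parser is not shown on this page.

```python
import re

# Hypothetical pattern: column(<name>) where <name> contains no parentheses.
_COLUMN_CALL = re.compile(r"column\(([^()]*)\)")


def parse_column_name(token):
    """Return the name inside a `column(name)` token, or raise ValueError."""
    match = _COLUMN_CALL.fullmatch(token.strip())
    if match is None:
        # Covers nested parentheses such as column(f(x)) and other malformed input.
        raise ValueError(f"Malformed column() reference: {token!r}")
    name = match.group(1).strip()
    if not name:
        raise ValueError("column() requires a non-empty name")
    return name


print(parse_column_name("column(n_alt_reads)"))
for bad in ("column()", "column(f(x))"):
    try:
        parse_column_name(bad)
    except ValueError as err:
        print(err)
```

Rejecting these cases at parse time means a malformed filter string fails before any DataFrame is touched.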

DSL extensions: logistic, Column(), WT(), peptide properties

fde8acf


iskandr added 3 commits April 9, 2026 07:08

Remove unused numpy/pandas imports to fix ruff lint

e5a0ac0

iskandr merged commit 125c4b2 into master Apr 9, 2026
8 checks passed

iskandr mentioned this pull request Apr 9, 2026

Integrate Topiary 5.0.0 DSL for epitope filtering and ranking openvax/vaxrank#216

Closed

6 tasks

iskandr merged 4 commits into master from dsl-extensions
