Expression DSL v2: scoped contexts, len, count, repr round-trips by iskandr · Pull Request #103 · openvax/topiary

iskandr · 2026-04-10T05:40:10Z

Design

Every prediction row is implicitly scoped to its current peptide. A row can also carry inline references to alternate peptides (wildtype, shuffled, self-proteome) via column prefixes (wt_, shuffled_, self_).

The old WT(Affinity).score wrapper looked like a value transformation but was actually a scope change — it read wt_score instead of score. This PR replaces it with an explicit scope prefix system where wt, shuffled, and self are reserved keywords in the expression DSL.
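To make the "scope change, not value transformation" distinction concrete, here is a minimal sketch of prefix-based column lookup. It uses a plain dict as a stand-in for one DataFrame row; `SCOPE_PREFIXES` and `read_scoped` are hypothetical names for illustration and are not part of topiary.

```python
# Hypothetical illustration: a dict stands in for one prediction row.
SCOPE_PREFIXES = {"": "", "wt": "wt_", "shuffled": "shuffled_", "self": "self_"}


def read_scoped(row, column, scope=""):
    """Return row[prefix + column], where the prefix is chosen by the scope name."""
    prefix = SCOPE_PREFIXES[scope]
    return row[prefix + column]


row = {"score": 0.9, "wt_score": 0.4, "shuffled_score": 0.1}

mutant = read_scoped(row, "score")           # reads the "score" column
wildtype = read_scoped(row, "score", "wt")   # reads the "wt_score" column
differential = mutant - wildtype             # analogous to affinity.score - wt.affinity.score
```

The scope never alters the value that was read; it only decides which column the read targets.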

New syntax

# Scope prefixes — select alternate peptide context
wt.affinity.score                          # wildtype affinity score
wt.affinity["netmhcpan"].descending_cdf(500, 200)
affinity.score - wt.affinity.score         # differential binding
shuffled.affinity.score                    # shuffled decoy

# Peptide length (precomputed column)
len                                        # peptide_length
wt.len                                     # wt_peptide_length

# Amino acid count (dynamic from peptide string)
count('C')                                 # cysteines in peptide
count('KR')                                # basic residues (K + R)
wt.count('C')                              # cysteines in wt_peptide
count('C') - wt.count('C')                # gained/lost cysteines

# All existing syntax still works
affinity.descending_cdf(500, 200)
0.5 * ba.score + 0.5 * el.score
affinity["netmhcpan"].value.clip(1, 50000).log()
mean(affinity.score, presentation.score)
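The `count(...)` semantics shown above (a multi-character argument counts any of the listed residues) could be sketched roughly as follows. This is an illustration only: `count_residues` is a hypothetical helper, the `ValueError` is an assumed error type, and the real implementation operates on a DataFrame column rather than a single string.

```python
def count_residues(peptide, chars):
    """Count residues in `peptide` that match any amino acid letter in `chars`."""
    if not chars:
        # Assumption: an empty character set is rejected rather than counted as 0.
        raise ValueError("chars must be non-empty")
    wanted = set(chars.upper())  # lowercase arguments are normalized to uppercase
    return sum(1 for aa in peptide if aa in wanted)


row = {"peptide": "SIINFCKRL", "wt_peptide": "SIINFEKKL"}

cysteines = count_residues(row["peptide"], "C")        # like count('C')
basic = count_residues(row["peptide"], "KR")           # like count('KR')
wt_cysteines = count_residues(row["wt_peptide"], "C")  # like wt.count('C')
gained = cysteines - wt_cysteines                      # like count('C') - wt.count('C')
```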

Python API

from topiary import wt, Affinity, Presentation

wt.Affinity.score                          # Field with scope="wt_"
wt.Affinity["netmhcpan"].descending_cdf(500, 200)
wt.len                                     # Len(scope="wt_")
wt.count("C")                              # Count("C", scope="wt_")
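One way such a scope object could be structured is sketched below: attribute access is intercepted with `__getattr__` and the prefix is propagated into whatever gets built. All classes here are simplified stand-ins written for this example (with a hypothetical `_KINDS` table), not the classes in `ranking.py`.

```python
def _scope_dot(scope):
    """Turn a column prefix such as "wt_" into a DSL prefix such as "wt."."""
    return scope.rstrip("_") + "." if scope else ""


class Field:
    def __init__(self, kind, field, scope=""):
        self.kind, self.field, self.scope = kind, field, scope

    def __repr__(self):
        return f"{_scope_dot(self.scope)}{self.kind}.{self.field}"


class Len:
    def __init__(self, scope=""):
        self.scope = scope

    def __repr__(self):
        return f"{_scope_dot(self.scope)}len"


class Count:
    def __init__(self, chars, scope=""):
        self.chars, self.scope = chars, scope

    def __repr__(self):
        return f"{_scope_dot(self.scope)}count({self.chars!r})"


class KindAccessor:
    def __init__(self, kind, scope=""):
        self.kind, self.scope = kind, scope

    @property
    def score(self):
        # The accessor's scope is passed on to the Field it creates.
        return Field(self.kind, "score", self.scope)


class Scope:
    # Hypothetical mapping from Python-API names to DSL kind names.
    _KINDS = {"Affinity": "affinity", "Presentation": "presentation"}

    def __init__(self, name):
        self.prefix = name + "_"

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails.
        if attr not in Scope._KINDS:
            raise AttributeError(attr)
        return KindAccessor(Scope._KINDS[attr], self.prefix)

    @property
    def len(self):
        return Len(self.prefix)

    def count(self, chars):
        return Count(chars, self.prefix)


wt = Scope("wt")
```

With these stand-ins, `repr(wt.Affinity.score)` gives `'wt.affinity.score'` and `repr(wt.count("C"))` gives `"wt.count('C')"`.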

Repr round-tripping

All expression objects now have __repr__ that produces valid DSL strings:

>>> expr = parse_expr("0.5 * affinity.score + 0.5 * wt.affinity.score")
>>> repr(expr)
'0.5 * affinity.score + 0.5 * wt.affinity.score'
>>> parse_expr(repr(expr)).evaluate(df).equals(expr.evaluate(df))
True
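For a round trip to be safe, `__repr__` has to re-insert parentheses wherever operator precedence would otherwise change the meaning. The sketch below shows one common way to do that on a toy AST; `Const`, `Name`, `BinOp`, and `PRECEDENCE` are illustrative names and only loosely mirror the `_Const` / `_BinOp` classes mentioned in this PR.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


class Const:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # Clean float formatting: 2.0 prints as "2", 0.5 stays "0.5".
        v = float(self.value)
        return str(int(v)) if v.is_integer() else repr(v)


class Name:
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return self.text


class BinOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def __repr__(self):
        mine = PRECEDENCE[self.op]
        left, right = repr(self.left), repr(self.right)
        # Left operand needs parentheses only if it binds more loosely.
        if isinstance(self.left, BinOp) and PRECEDENCE[self.left.op] < mine:
            left = f"({left})"
        # Right operand also needs them at equal precedence, because
        # "-" and "/" are left-associative: a - (b - c) != a - b - c.
        if isinstance(self.right, BinOp) and PRECEDENCE[self.right.op] <= mine:
            right = f"({right})"
        return f"{left} {self.op} {right}"


expr = BinOp(
    "+",
    BinOp("*", Const(0.5), Name("affinity.score")),
    BinOp("*", Const(0.5), Name("wt.affinity.score")),
)
scaled_diff = BinOp("*", Const(2.0), BinOp("-", Name("a"), Name("b")))
nested_sub = BinOp("-", Name("a"), BinOp("-", Name("b"), Name("c")))
```

Here `repr(expr)` yields `'0.5 * affinity.score + 0.5 * wt.affinity.score'`, `repr(scaled_diff)` yields `'2 * (a - b)'`, and `repr(nested_sub)` yields `'a - (b - c)'`.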

Implementation

Core ranking.py changes

Field: New scope slot ("", "wt_", "shuffled_", "self_"). evaluate() reads self.scope + self.field from the DataFrame. Scoped fields raise TypeError on filter comparisons (<=, >=).
KindAccessor: New scope slot, propagated to Field in .value/.score/.rank and all delegating methods.
Scope: New class with __getattr__ for kind lookup, .count() method, .len property. Module-level wt = Scope("wt"), shuffled = Scope("shuffled").
Len(Expr): Reads {scope}peptide_length column. Repr: "len" / "wt.len".
Count(Expr): Dynamically counts amino acid chars from {scope}peptide column at eval time. Repr: "count('C')" / "wt.count('KR')".
__repr__ on all 10 Expr subclasses: _Const, _BinOp, _NormExpr, _SurvivalExpr, _LogisticExpr, _UnaryOp, _ClipExpr, _AggExpr, Column, Field. Includes operator precedence parenthesization and clean float formatting.
WT class: Removed entirely.

Parser changes

wt, shuffled, self are reserved context keywords (_CONTEXT_KEYWORDS)
_parse_atom detects CONTEXT '.' and delegates to _parse_scoped_atom which handles len, count('X'), and kind references within the scope
_parse_kind_accessor accepts scope parameter, rejects reserved keywords as kind names
len keyword and count('X') function supported in both default and scoped contexts
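The scoped-atom handling described above can be approximated by the following simplified sketch. It covers only single atoms (`len`, `count('X')`, `kind.field`, each optionally preceded by a context keyword) and returns plain tuples; the tokenizer, the tuple shapes, and the use of `SyntaxError` are assumptions made for this example, not the actual parser.

```python
import re

CONTEXT_KEYWORDS = {"wt", "shuffled", "self"}
TOKEN_RE = re.compile(
    r"(?P<NAME>[A-Za-z_]\w*)|'(?P<STR>[^']*)'|(?P<PUNCT>[.()])|(?P<WS>\s+)"
)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        if m.lastgroup != "WS":  # whitespace is skipped
            tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse_atom(text):
    tokens = tokenize(text)
    scope = ""
    # CONTEXT '.' switches the scope, then the rest is parsed within it.
    if (len(tokens) >= 2 and tokens[0][0] == "NAME"
            and tokens[0][1] in CONTEXT_KEYWORDS and tokens[1] == ("PUNCT", ".")):
        scope = tokens[0][1] + "_"
        tokens = tokens[2:]
    return parse_scoped_atom(tokens, scope)


def parse_scoped_atom(tokens, scope):
    if tokens == [("NAME", "len")]:
        return ("len", scope)
    if (len(tokens) == 4 and tokens[0] == ("NAME", "count")
            and tokens[1] == ("PUNCT", "(") and tokens[2][0] == "STR"
            and tokens[3] == ("PUNCT", ")")):
        chars = tokens[2][1]
        if not chars:
            raise SyntaxError("count() needs at least one amino acid")
        return ("count", scope, chars.upper())
    if (len(tokens) == 3 and tokens[0][0] == "NAME"
            and tokens[1] == ("PUNCT", ".") and tokens[2][0] == "NAME"):
        kind, field = tokens[0][1], tokens[2][1]
        if kind in CONTEXT_KEYWORDS:
            # Also rejects nested scopes such as wt.shuffled.score.
            raise SyntaxError(f"{kind!r} is a reserved context keyword")
        return ("field", scope, kind, field)
    raise SyntaxError("unsupported atom")
```

For example, `parse_atom("wt.affinity.score")` returns `("field", "wt_", "affinity", "score")`, while `parse_atom("wt.shuffled.score")` raises because `shuffled` cannot be used as a kind name.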

CLI args.py fixes

Removed unused mhc_binding_predictor_from_args import
Simplified _build_ranking_strategy: anything with . or ( routes through parse_expr (fixes affinity.score via --rank-by)
Removed duplicated transform name set (was hardcoded, now uses _TRANSFORM_NAMES from ranking.py — then simplified further)
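The simplified routing rule for `--rank-by` amounts to a small dispatch, sketched here with stand-in callables. `build_ranking_strategy`, `parse` and `lookup_kind` are hypothetical names for this example; the real `_build_ranking_strategy` has a different signature and does more work.

```python
def looks_like_expression(rank_by):
    """Anything containing '.' or '(' is treated as a DSL expression."""
    return "." in rank_by or "(" in rank_by


def build_ranking_strategy(rank_by, parse, lookup_kind):
    # `parse` stands in for parse_expr; `lookup_kind` stands in for
    # whatever resolves a bare kind name such as "affinity".
    if looks_like_expression(rank_by):
        return parse(rank_by)
    return lookup_kind(rank_by)


# Stand-in callables that just tag which path was taken.
def tag_expr(text):
    return ("expr", text)


def tag_kind(name):
    return ("kind", name)


dotted = build_ranking_strategy("affinity.score", tag_expr, tag_kind)
called = build_ranking_strategy("mean(ba.score, el.score)", tag_expr, tag_kind)
simple = build_ranking_strategy("affinity", tag_expr, tag_kind)
```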

Depends on

Add all predictors to CLI, add --predictors flag mhctools#186 (mhctools >= 3.4.0, now published)

Tests

524 total tests (was 397), all passing:

| Category | Count | Description |
|---|---|---|
| CLI integration | 15 | `_build_ranking_strategy` with expression strings, combined with filters |
| AST↔string equivalence | 28 | Python DSL vs `parse_expr` produce identical evaluation |
| Round-trip | 25 | `parse_expr(repr(parse_expr(text)))` evaluates identically |
| Multi-dot / chain | 6 | Triple/quad chains, CDF after clip, bracket+chain |
| Scope prefix | 12 | `wt.affinity.score`, `wt.ba`, underscore-qualified, in arithmetic |
| Len / Count | 18 | Default/scoped, arithmetic, missing column NaN, repr round-trips |
| Scope Python API | 10 | `wt.Affinity.score`, `wt.len`, `wt.count()`, reserved keyword errors |
| Existing (updated) | ~410 | All WT() tests converted to wt. prefix syntax |

Design doc

docs/expression-semantics.md — full specification of the scope/prefix model, grammar, internal representation, and future context management (predictor.add_context()).

…ctors

Add parse_expr(): full recursive descent parser for ranking expressions from CLI strings. Supports all transforms (ascending_cdf, descending_cdf, logistic, clip, hinge, log, log2, log10, log1p, exp, sqrt), arithmetic, aggregations (mean, geomean, minimum, maximum, median), column() refs, wt() wrapper, and method qualification via brackets.

Extend --rank-by to accept expression strings alongside simple kind names. Use mhctools predictors_from_args() so --predictors flows through from mhctools. Require mhctools>=3.4.0. Bump version to 4.7.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace WT() wrapper with scope prefix system (wt., shuffled., self.) for alternate peptide contexts. The scope is a reserved keyword that modifies which DataFrame column a Field reads from.

Core changes:
- Add scope parameter to Field and KindAccessor
- Add Scope class with wt, shuffled module-level instances
- Add Len(Expr) for peptide length from precomputed column
- Add Count(Expr) for dynamic amino acid counting from peptide string
- Add __repr__ to all Expr subclasses enabling parse→repr→parse round-trips
- Remove WT class entirely (replaced by wt.Affinity.score syntax)

Parser:
- wt, shuffled, self are reserved context keywords
- Support wt.affinity.score, wt.len, wt.count('C'), count('KR'), len
- Fix _build_ranking_strategy to route dotted exprs through parse_expr
- Fix broken test_parse_expr_abs, add parse_expr to __all__
- Remove unused mhc_binding_predictor_from_args import
- Deduplicate transform name set in CLI args

Tests:
- 15 CLI integration tests (test_cli_ranking.py)
- 28 AST↔string equivalence tests
- 25 round-trip tests (parse→repr→parse→evaluate)
- 20 WT→scope, bracket+chain, multi-dot, underscore-qualified tests
- 30 len/count/scope Python API/parsing/round-trip tests
- Update all existing WT() tests to wt. prefix syntax

coveralls · 2026-04-10T05:49:43Z

coverage: 88.779% (+2.3%) from 86.507% — expression-dsl-v2 into master

- Remove dead kind_map block in Scope.__getattr__ (unreachable code)
- Validate Count chars non-empty (reject count(''))
- Export self_scope from __init__.py
- Update README.md, docs/ranking.md, docs/api.md, docs/index.md: replace all WT() references with wt. scope syntax, add len/count docs
- Add 16 tests: shuffled scope (parse, eval, differential, Python API, len, count, repr roundtrip), self scope (parse, eval, differential, Python API, roundtrip), edge cases (nested scope rejected, empty count rejected, lowercase count normalized, shuffled reserved)

iskandr and others added 2 commits April 9, 2026 20:29

iskandr merged commit 5cd2d6b into master Apr 10, 2026
8 checks passed

iskandr deleted the expression-dsl-v2 branch April 10, 2026 06:30

iskandr mentioned this pull request Apr 10, 2026

Release 4.8.0: expression DSL v2, scoped contexts, docs overhaul #104

Merged

2 tasks

iskandr merged 3 commits into master from expression-dsl-v2
