
Expression DSL v2: scoped contexts, len, count, repr round-trips#103

Merged
iskandr merged 3 commits into master from expression-dsl-v2 on Apr 10, 2026

iskandr commented Apr 10, 2026

Design

Every prediction row is implicitly scoped to its current peptide. A row can also carry inline references to alternate peptides (wildtype, shuffled, self-proteome) via column prefixes (wt_, shuffled_, self_).

The old WT(Affinity).score wrapper looked like a value transformation but was actually a scope change — it read wt_score instead of score. This PR replaces it with an explicit scope prefix system where wt, shuffled, and self are reserved keywords in the expression DSL.
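The scope-as-column-prefix idea can be sketched in a few lines. This is a simplified illustration, not topiary's code: rows are plain dicts instead of DataFrame rows, and `resolve_column` and `ScopedField` are hypothetical names. It shows the point made above, that a scope changes which column is read rather than how the value is transformed.

```python
# Hypothetical sketch of scope-prefixed column lookup; not topiary's implementation.
from dataclasses import dataclass

# Scope prefixes named in this PR; "" is the implicit current-peptide scope.
SCOPE_PREFIXES = {"": "", "wt": "wt_", "shuffled": "shuffled_", "self": "self_"}


def resolve_column(scope: str, field: str) -> str:
    """Map (scope, field) to the column it reads: ("wt", "score") -> "wt_score"."""
    if scope not in SCOPE_PREFIXES:
        raise ValueError(f"unknown scope: {scope!r}")
    return SCOPE_PREFIXES[scope] + field


@dataclass(frozen=True)
class ScopedField:
    """Hypothetical stand-in for a scoped Field: reads one column from a row dict."""

    field: str
    scope: str = ""

    def evaluate(self, row: dict) -> float:
        # Same field, different scope -> different column; the value itself is untouched.
        return row[resolve_column(self.scope, self.field)]


row = {"score": 0.91, "wt_score": 0.35, "shuffled_score": 0.12}
# Differential binding, analogous to `affinity.score - wt.affinity.score`.
delta = ScopedField("score").evaluate(row) - ScopedField("score", scope="wt").evaluate(row)
```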

New syntax

```
# Scope prefixes: select alternate peptide context
wt.affinity.score                          # wildtype affinity score
wt.affinity["netmhcpan"].descending_cdf(500, 200)
affinity.score - wt.affinity.score         # differential binding
shuffled.affinity.score                    # shuffled decoy

# Peptide length (precomputed column)
len                                        # peptide_length
wt.len                                     # wt_peptide_length

# Amino acid count (dynamic from peptide string)
count('C')                                 # cysteines in peptide
count('KR')                                # basic residues (K + R)
wt.count('C')                              # cysteines in wt_peptide
count('C') - wt.count('C')                 # gained/lost cysteines

# All existing syntax still works
affinity.descending_cdf(500, 200)
0.5 * ba.score + 0.5 * el.score
affinity["netmhcpan"].value.clip(1, 50000).log()
mean(affinity.score, presentation.score)
```

Python API

```python
from topiary import wt, Affinity, Presentation

wt.Affinity.score                          # Field with scope="wt_"
wt.Affinity["netmhcpan"].descending_cdf(500, 200)
wt.len                                     # Len(scope="wt_")
wt.count("C")                              # Count("C", scope="wt_")
```

Repr round-tripping

All expression objects now have __repr__ that produces valid DSL strings:

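```python
>>> from topiary import parse_expr
>>> expr = parse_expr("0.5 * affinity.score + 0.5 * wt.affinity.score")
>>> repr(expr)
'0.5 * affinity.score + 0.5 * wt.affinity.score'
>>> # df is a predictions DataFrame; .equals() compares the whole result, NaN-aware
>>> parse_expr(repr(expr)).evaluate(df).equals(expr.evaluate(df))
True
```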
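A minimal sketch of precedence-aware `__repr__`, assuming standard arithmetic precedence and a left-associative parser: a child is parenthesized only when it binds more loosely than its parent, or equally loosely on the right-hand side. `Const`, `Ref`, `BinOp` and `_wrap` are hypothetical toy classes, not topiary's `_Const`/`_BinOp`.

```python
# Hypothetical toy expression tree with minimal-parentheses repr; not topiary's classes.
from dataclasses import dataclass
from typing import Union

# Assumed binding strengths (standard arithmetic); higher binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
ATOM = 3  # constants and field references never need parentheses


@dataclass(frozen=True)
class Const:
    value: float

    def __repr__(self) -> str:
        v = float(self.value)
        # "Clean" formatting: 2.0 -> "2"; repr() gives the shortest string that round-trips.
        return str(int(v)) if v.is_integer() else repr(v)


@dataclass(frozen=True)
class Ref:
    name: str  # e.g. "affinity.score" or "wt.affinity.score"

    def __repr__(self) -> str:
        return self.name


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

    def __repr__(self) -> str:
        prec = PRECEDENCE[self.op]
        left = _wrap(self.left, prec, allow_equal=True)
        # A left-associative parser reads `a - b - c` as (a - b) - c,
        # so an equal-precedence right operand keeps its parentheses.
        right = _wrap(self.right, prec, allow_equal=False)
        return f"{left} {self.op} {right}"


Expr = Union[Const, Ref, BinOp]


def _precedence(expr: Expr) -> int:
    return PRECEDENCE[expr.op] if isinstance(expr, BinOp) else ATOM


def _wrap(child: Expr, parent_prec: int, allow_equal: bool) -> str:
    child_prec = _precedence(child)
    bare = child_prec > parent_prec or (allow_equal and child_prec == parent_prec)
    return repr(child) if bare else f"({child!r})"


blend = BinOp("+", BinOp("*", Const(0.5), Ref("affinity.score")),
              BinOp("*", Const(0.5), Ref("wt.affinity.score")))
print(repr(blend))  # 0.5 * affinity.score + 0.5 * wt.affinity.score
```

Keeping parentheses on every equal-precedence right operand also preserves `a * (b * c)`, which matters if a round-tripped expression must reproduce the original floating-point results exactly.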

Implementation

Core ranking.py changes

  • Field: New scope slot ("", "wt_", "shuffled_", "self_"). evaluate() reads self.scope + self.field from the DataFrame. Scoped fields raise TypeError on filter comparisons (<=, >=).
  • KindAccessor: New scope slot, propagated to Field in .value/.score/.rank and all delegating methods.
  • Scope: New class with __getattr__ for kind lookup, .count() method, .len property. Module-level wt = Scope("wt"), shuffled = Scope("shuffled").
  • Len(Expr): Reads {scope}peptide_length column. Repr: "len" / "wt.len".
  • Count(Expr): Dynamically counts amino acid chars from {scope}peptide column at eval time. Repr: "count('C')" / "wt.count('KR')".
  • __repr__ on all 10 Expr subclasses: _Const, _BinOp, _NormExpr, _SurvivalExpr, _LogisticExpr, _UnaryOp, _ClipExpr, _AggExpr, Column, Field. Includes operator precedence parenthesization and clean float formatting.
  • WT class: Removed entirely.
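The `Scope` / `KindAccessor` / `Field` relationship described above can be approximated as follows. This is a hypothetical sketch, assuming that a kind name such as `Affinity` maps to the lowercase DSL name `affinity` and that an unscoped filter comparison builds some filter object (`Filter` here is a placeholder); topiary's actual classes carry more state and evaluate against a DataFrame.

```python
# Hypothetical sketch of scoped accessors; names mirror the PR but this is not topiary's code.
from dataclasses import dataclass


def _qualify(scope: str, text: str) -> str:
    # "wt_" + "len" -> "wt.len"; the empty (current-peptide) scope adds no prefix.
    return f"{scope[:-1]}.{text}" if scope else text


@dataclass(frozen=True)
class Filter:  # placeholder for whatever a filter comparison builds
    column: str
    op: str
    threshold: float


@dataclass(frozen=True)
class Field:
    kind: str
    field: str
    scope: str = ""

    @property
    def column(self) -> str:
        return self.scope + self.field  # the scope only changes which column is read

    def _compare(self, op: str, threshold: float) -> Filter:
        if self.scope:
            raise TypeError(f"scoped field {self!r} cannot be used as a filter")
        return Filter(self.column, op, threshold)

    def __le__(self, threshold: float) -> Filter:
        return self._compare("<=", threshold)

    def __ge__(self, threshold: float) -> Filter:
        return self._compare(">=", threshold)

    def __repr__(self) -> str:
        return _qualify(self.scope, f"{self.kind}.{self.field}")


@dataclass(frozen=True)
class KindAccessor:
    kind: str
    scope: str = ""

    # Every Field created here inherits the accessor's scope.
    @property
    def value(self) -> Field:
        return Field(self.kind, "value", self.scope)

    @property
    def score(self) -> Field:
        return Field(self.kind, "score", self.scope)

    @property
    def rank(self) -> Field:
        return Field(self.kind, "rank", self.scope)


@dataclass(frozen=True)
class Len:
    scope: str = ""

    def __repr__(self) -> str:
        return _qualify(self.scope, "len")


@dataclass(frozen=True)
class Count:
    chars: str
    scope: str = ""

    def __repr__(self) -> str:
        return _qualify(self.scope, f"count({self.chars!r})")


class Scope:
    def __init__(self, name: str):
        self.prefix = name + "_"

    def __getattr__(self, name: str) -> KindAccessor:
        # Only reached for names not found normally, e.g. wt.Affinity.
        if name.startswith("_"):
            raise AttributeError(name)
        return KindAccessor(name.lower(), self.prefix)

    @property
    def len(self) -> Len:
        return Len(self.prefix)

    def count(self, chars: str) -> Count:
        return Count(chars, self.prefix)


wt = Scope("wt")
shuffled = Scope("shuffled")
```

Because `__getattr__` only fires for names that normal lookup misses, `.len`, `.count` and `.prefix` never collide with kind names in this sketch.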

Parser changes

  • wt, shuffled, self are reserved context keywords (_CONTEXT_KEYWORDS)
  • _parse_atom detects CONTEXT '.' and delegates to _parse_scoped_atom which handles len, count('X'), and kind references within the scope
  • _parse_kind_accessor accepts scope parameter, rejects reserved keywords as kind names
  • len keyword and count('X') function supported in both default and scoped contexts
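A toy recursive-descent fragment covering just the atom rules above (`len`, `count('X')`, `kind.attr`, each optionally behind a context keyword) might look like this. `tokenize`, `AtomParser` and the tuple node shapes are assumptions for illustration; topiary's `_parse_atom` builds real `Expr` objects and handles the full grammar.

```python
# Hypothetical toy parser for scoped atoms; not topiary's _parse_atom.
import re
from typing import Optional

CONTEXT_KEYWORDS = {"wt", "shuffled", "self"}
# An identifier, a single-quoted string, or any other single non-space character.
TOKEN = re.compile(r"\s*(?:([A-Za-z_][A-Za-z0-9_]*)|('[^']*')|(\S))")


def tokenize(text: str) -> list[str]:
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        tokens.append(match.group(match.lastindex))
        pos = match.end()
    return tokens


class AtomParser:
    """Parses one atom: len, count('X'), kind.attr, optionally scope-prefixed."""

    def __init__(self, text: str):
        self.tokens = tokenize(text)
        self.i = 0

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.tokens[self.i] if self.i < len(self.tokens) else None
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def parse(self) -> tuple:
        node = self.parse_atom()
        if self.i != len(self.tokens):
            raise SyntaxError(f"unexpected token {self.tokens[self.i]!r}")
        return node

    def parse_atom(self) -> tuple:
        tok = self.take()
        if tok in CONTEXT_KEYWORDS:  # CONTEXT '.' -> scoped atom
            self.take(".")
            return self.parse_scoped_atom(tok + "_")
        return self.parse_body(tok, scope="")

    def parse_scoped_atom(self, scope: str) -> tuple:
        tok = self.take()
        if tok in CONTEXT_KEYWORDS:  # reserved keywords cannot be kind names
            raise SyntaxError("nested scopes such as wt.shuffled are not allowed")
        return self.parse_body(tok, scope)

    def parse_body(self, tok: str, scope: str) -> tuple:
        if tok == "len":
            return ("len", scope)
        if tok == "count":
            self.take("(")
            arg = self.take()
            if len(arg) < 2 or arg[0] != "'" or arg[-1] != "'":
                raise SyntaxError("count() expects a quoted string such as 'C'")
            self.take(")")
            return ("count", arg[1:-1], scope)
        self.take(".")  # otherwise a kind reference: kind.attr
        return ("field", tok, self.take(), scope)
```

In this sketch, rejecting a context keyword right after `wt.` is what turns a nested scope such as `wt.shuffled.len` into a syntax error instead of a silently different column.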

CLI args.py fixes

  • Removed unused mhc_binding_predictor_from_args import
  • Simplified _build_ranking_strategy: anything with . or ( routes through parse_expr (fixes affinity.score via --rank-by)
  • Removed duplicated transform name set (previously hardcoded; it first switched to _TRANSFORM_NAMES from ranking.py and was then simplified further)
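The `--rank-by` routing rule is small enough to state as code. In this sketch the two constructors are passed in as callables, so it makes no assumption about topiary's real signatures; `build_ranking_strategy` and `from_kind_name` are hypothetical names.

```python
# Hypothetical sketch of the --rank-by routing rule; not topiary's _build_ranking_strategy.
from typing import Callable, TypeVar

Strategy = TypeVar("Strategy")


def build_ranking_strategy(
    rank_by: str,
    parse_expr: Callable[[str], Strategy],
    from_kind_name: Callable[[str], Strategy],
) -> Strategy:
    """Anything containing '.' or '(' is an expression; otherwise it is a bare kind name."""
    text = rank_by.strip()
    if "." in text or "(" in text:
        return parse_expr(text)
    return from_kind_name(text)
```

With this rule, `affinity.score`, `wt.affinity.score` and `mean(affinity.score, presentation.score)` all take the expression path, while `affinity` stays a simple kind name.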

Depends on

Tests

524 total tests (was 397), all passing:

| Category | Count | Description |
|---|---|---|
| CLI integration | 15 | _build_ranking_strategy with expression strings, combined with filters |
| AST↔string equivalence | 28 | Python DSL vs parse_expr produce identical evaluation |
| Round-trip | 25 | parse_expr(repr(parse_expr(text))) evaluates identically |
| Multi-dot / chain | 6 | Triple/quad chains, CDF after clip, bracket+chain |
| Scope prefix | 12 | wt.affinity.score, wt.ba, underscore-qualified, in arithmetic |
| Len / Count | 18 | Default/scoped, arithmetic, missing column NaN, repr round-trips |
| Scope Python API | 10 | wt.Affinity.score, wt.len, wt.count(), reserved keyword errors |
| Existing (updated) | ~410 | All WT() tests converted to wt. prefix syntax |

Design doc

docs/expression-semantics.md — full specification of the scope/prefix model, grammar, internal representation, and future context management (predictor.add_context()).

iskandr and others added 2 commits April 9, 2026 20:29
…ctors

Add parse_expr() — full recursive descent parser for ranking expressions
from CLI strings. Supports all transforms (ascending_cdf, descending_cdf,
logistic, clip, hinge, log, log2, log10, log1p, exp, sqrt), arithmetic,
aggregations (mean, geomean, minimum, maximum, median), column() refs,
wt() wrapper, and method qualification via brackets.

Extend --rank-by to accept expression strings alongside simple kind names.
Use mhctools predictors_from_args() so --predictors flows through from
mhctools. Require mhctools>=3.4.0. Bump version to 4.7.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace WT() wrapper with scope prefix system (wt., shuffled., self.)
for alternate peptide contexts. The scope is a reserved keyword that
modifies which DataFrame column a Field reads from.

Core changes:
- Add scope parameter to Field and KindAccessor
- Add Scope class with wt, shuffled module-level instances
- Add Len(Expr) for peptide length from precomputed column
- Add Count(Expr) for dynamic amino acid counting from peptide string
- Add __repr__ to all Expr subclasses enabling parse→repr→parse round-trips
- Remove WT class entirely (replaced by wt.Affinity.score syntax)

Parser:
- wt, shuffled, self are reserved context keywords
- Support wt.affinity.score, wt.len, wt.count('C'), count('KR'), len
- Fix _build_ranking_strategy to route dotted exprs through parse_expr
- Fix broken test_parse_expr_abs, add parse_expr to __all__
- Remove unused mhc_binding_predictor_from_args import
- Deduplicate transform name set in CLI args

Tests:
- 15 CLI integration tests (test_cli_ranking.py)
- 28 AST↔string equivalence tests
- 25 round-trip tests (parse→repr→parse→evaluate)
- 20 WT→scope, bracket+chain, multi-dot, underscore-qualified tests
- 30 len/count/scope Python API/parsing/round-trip tests
- Update all existing WT() tests to wt. prefix syntax

coveralls commented Apr 10, 2026

Coverage Status

coverage: 88.779% (+2.3%) from 86.507% — expression-dsl-v2 into master

- Remove dead kind_map block in Scope.__getattr__ (unreachable code)
- Validate Count chars non-empty (reject count(''))
- Export self_scope from __init__.py
- Update README.md, docs/ranking.md, docs/api.md, docs/index.md:
  replace all WT() references with wt. scope syntax, add len/count docs
- Add 16 tests: shuffled scope (parse, eval, differential, Python API,
  len, count, repr roundtrip), self scope (parse, eval, differential,
  Python API, roundtrip), edge cases (nested scope rejected,
  empty count rejected, lowercase count normalized, shuffled reserved)
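The `Count` validation added in this commit (non-empty characters, lowercase normalized) and the NaN-for-missing-column behavior listed in the test table can be sketched with rows as plain dicts. This is a hypothetical illustration, not topiary's `Count`/`Len`: counting a residue when it matches any listed character follows the `count('KR')` = K + R description, while the `__post_init__` normalization and the `Len` fallback are assumptions about shape.

```python
# Hypothetical sketch of Count/Len evaluation on dict rows; not topiary's implementation.
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Count:
    chars: str
    scope: str = ""

    def __post_init__(self):
        if not self.chars:
            raise ValueError("count('') is rejected: give at least one residue")
        # Lowercase input is normalized, so count('c') means count('C').
        object.__setattr__(self, "chars", self.chars.upper())

    def evaluate(self, row: dict) -> float:
        peptide = row.get(self.scope + "peptide")
        if not isinstance(peptide, str):
            return math.nan  # missing column or value
        # A residue counts if it matches any listed character: 'KR' = #K + #R.
        return float(sum(1 for residue in peptide if residue in self.chars))


@dataclass(frozen=True)
class Len:
    scope: str = ""

    def evaluate(self, row: dict) -> float:
        # Reads the precomputed {scope}peptide_length column; NaN when it is absent.
        value = row.get(self.scope + "peptide_length")
        return math.nan if value is None else float(value)


row = {"peptide": "SIINFEKL", "peptide_length": 8, "wt_peptide": "SCINFEKC"}
# Analogous to `count('C') - wt.count('C')`: 0 - 2 here.
gained = Count("C").evaluate(row) - Count("C", "wt_").evaluate(row)
```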
@iskandr iskandr merged commit 5cd2d6b into master Apr 10, 2026
8 checks passed
@iskandr iskandr deleted the expression-dsl-v2 branch April 10, 2026 06:30