Expression DSL v2: scoped contexts, len, count, repr round-trips #103
Merged
…ctors
Add parse_expr(): a full recursive descent parser for ranking expressions from CLI strings. Supports all transforms (ascending_cdf, descending_cdf, logistic, clip, hinge, log, log2, log10, log1p, exp, sqrt), arithmetic, aggregations (mean, geomean, minimum, maximum, median), column() refs, wt() wrapper, and method qualification via brackets.
Extend --rank-by to accept expression strings alongside simple kind names.
Use mhctools predictors_from_args() so --predictors flows through from mhctools.
Require mhctools>=3.4.0. Bump version to 4.7.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace WT() wrapper with scope prefix system (wt., shuffled., self.)
for alternate peptide contexts. The scope is a reserved keyword that
modifies which DataFrame column a Field reads from.
Core changes:
- Add scope parameter to Field and KindAccessor
- Add Scope class with wt, shuffled module-level instances
- Add Len(Expr) for peptide length from precomputed column
- Add Count(Expr) for dynamic amino acid counting from peptide string
- Add __repr__ to all Expr subclasses enabling parse→repr→parse round-trips
- Remove WT class entirely (replaced by wt.Affinity.score syntax)
Parser:
- wt, shuffled, self are reserved context keywords
- Support wt.affinity.score, wt.len, wt.count('C'), count('KR'), len
- Fix _build_ranking_strategy to route dotted exprs through parse_expr
- Fix broken test_parse_expr_abs, add parse_expr to __all__
- Remove unused mhc_binding_predictor_from_args import
- Deduplicate transform name set in CLI args
Tests:
- 15 CLI integration tests (test_cli_ranking.py)
- 28 AST↔string equivalence tests
- 25 round-trip tests (parse→repr→parse→evaluate)
- 20 WT→scope, bracket+chain, multi-dot, underscore-qualified tests
- 30 len/count/scope Python API/parsing/round-trip tests
- Update all existing WT() tests to wt. prefix syntax
- Remove dead kind_map block in Scope.__getattr__ (unreachable code)
- Validate Count chars non-empty (reject count(''))
- Export self_scope from __init__.py
- Update README.md, docs/ranking.md, docs/api.md, docs/index.md:
replace all WT() references with wt. scope syntax, add len/count docs
- Add 16 tests: shuffled scope (parse, eval, differential, Python API,
len, count, repr roundtrip), self scope (parse, eval, differential,
Python API, roundtrip), edge cases (nested scope rejected,
empty count rejected, lowercase count normalized, shuffled reserved)
Design
Every prediction row is implicitly scoped to its current peptide. A row can also carry inline references to alternate peptides (wildtype, shuffled, self-proteome) via column prefixes (wt_, shuffled_, self_).
The old WT(Affinity).score wrapper looked like a value transformation but was actually a scope change: it read wt_score instead of score. This PR replaces it with an explicit scope prefix system where wt, shuffled, and self are reserved keywords in the expression DSL.
New syntax
Python API
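The scope objects described in this PR can be pictured with a small self-contained toy. This is only a sketch, not the library's code: it evaluates against a plain dict instead of a DataFrame, the lowercase kind name in the repr is an assumption, and the helper _dot is hypothetical. Only the names Field, KindAccessor, Scope, Len, Count, wt, and shuffled come from the PR text.

```python
def _dot(scope):
    # Hypothetical helper: "wt_" -> "wt.", "" -> "".
    return scope[:-1] + "." if scope else ""


class Field:
    """Reads one column; scope is a column prefix such as "wt_"."""
    __slots__ = ("kind", "field", "scope")

    def __init__(self, kind, field, scope=""):
        self.kind, self.field, self.scope = kind, field, scope

    def evaluate(self, row):
        # The scope only changes which column is read.
        return row[self.scope + self.field]

    def __repr__(self):
        return f"{_dot(self.scope)}{self.kind}.{self.field}"


class Len:
    """Peptide length from a precomputed column."""
    def __init__(self, scope=""):
        self.scope = scope

    def evaluate(self, row):
        return row[self.scope + "peptide_length"]

    def __repr__(self):
        return f"{_dot(self.scope)}len"


class Count:
    """Counts amino acid characters in the peptide string at eval time."""
    def __init__(self, chars, scope=""):
        if not chars:
            raise ValueError("count() needs at least one character")
        self.chars, self.scope = chars.upper(), scope

    def evaluate(self, row):
        peptide = row[self.scope + "peptide"]
        return sum(peptide.count(c) for c in self.chars)

    def __repr__(self):
        return f"{_dot(self.scope)}count('{self.chars}')"


class KindAccessor:
    """Carries the scope through to the Field it creates."""
    def __init__(self, kind, scope=""):
        self.kind, self.scope = kind, scope

    @property
    def score(self):
        return Field(self.kind, "score", self.scope)


class Scope:
    def __init__(self, name):
        self._prefix = name + "_"

    def __getattr__(self, kind):
        # Only reached for names not found normally, e.g. wt.Affinity.
        if kind.startswith("_"):
            raise AttributeError(kind)
        return KindAccessor(kind.lower(), self._prefix)

    @property
    def len(self):
        return Len(self._prefix)

    def count(self, chars):
        return Count(chars, self._prefix)


wt = Scope("wt")
shuffled = Scope("shuffled")

# Toy row standing in for one DataFrame row.
row = {"score": 0.9, "wt_score": 0.2,
       "peptide": "KRCCA", "wt_peptide": "KACCA",
       "peptide_length": 5, "wt_peptide_length": 5}
wt_score = wt.Affinity.score.evaluate(row)   # reads the "wt_score" column
wt_kr = wt.count("kr").evaluate(row)         # counts K and R in "wt_peptide"
```

In this toy, wt.Affinity.score and Field("affinity", "score") differ only in the prefix they prepend to the column name, which is the point the design section makes about WT() having been a scope change rather than a value transformation.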
Repr round-tripping
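The round-trip property, parse(repr(parse(text))) evaluating the same as parse(text), can be sketched on a toy arithmetic subset. Everything here is illustrative: the classes Const, Name, and BinOp and the function toy_parse are hypothetical, and Python's own eval with operator overloading stands in for the PR's recursive descent parser (it should not be used on untrusted input). The sketch shows precedence-aware parenthesization and integer-valued floats printed without a trailing ".0".

```python
import operator

# Operator -> (precedence, implementation).
_OPS = {"+": (1, operator.add), "-": (1, operator.sub),
        "*": (2, operator.mul), "/": (2, operator.truediv)}


class Expr:
    def __add__(self, other): return BinOp("+", self, _wrap(other))
    def __radd__(self, other): return BinOp("+", _wrap(other), self)
    def __sub__(self, other): return BinOp("-", self, _wrap(other))
    def __rsub__(self, other): return BinOp("-", _wrap(other), self)
    def __mul__(self, other): return BinOp("*", self, _wrap(other))
    def __rmul__(self, other): return BinOp("*", _wrap(other), self)
    def __truediv__(self, other): return BinOp("/", self, _wrap(other))
    def __rtruediv__(self, other): return BinOp("/", _wrap(other), self)


class Const(Expr):
    def __init__(self, value):
        self.value = float(value)

    def evaluate(self, row):
        return self.value

    def __repr__(self):
        # Clean float formatting: 2.0 prints as "2", 2.5 as "2.5".
        v = self.value
        return str(int(v)) if v.is_integer() else repr(v)


class Name(Expr):
    def __init__(self, name):
        self.name = name

    def evaluate(self, row):
        return row[self.name]

    def __repr__(self):
        return self.name


class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, row):
        fn = _OPS[self.op][1]
        return fn(self.left.evaluate(row), self.right.evaluate(row))

    def __repr__(self):
        prec = _OPS[self.op][0]

        def side(child, is_right):
            text = repr(child)
            if isinstance(child, BinOp):
                child_prec = _OPS[child.op][0]
                # Parenthesize looser-binding children, and equal-precedence
                # children on the right, so the tree shape is preserved.
                if child_prec < prec or (is_right and child_prec == prec):
                    return f"({text})"
            return text

        return f"{side(self.left, False)} {self.op} {side(self.right, True)}"


def _wrap(value):
    return value if isinstance(value, Expr) else Const(value)


def toy_parse(text, names=("score", "wt_score")):
    # Stand-in parser: bind column names to Name nodes and let Python's
    # grammar build the tree through the overloaded operators.
    env = {name: Name(name) for name in names}
    return _wrap(eval(text, {"__builtins__": {}}, env))


text = "score - (wt_score - 1) * 2"
first = toy_parse(text)
second = toy_parse(repr(first))          # parse -> repr -> parse
row = {"score": 0.9, "wt_score": 0.5}
same = first.evaluate(row) == second.evaluate(row)
```

Keeping the parentheses on equal-precedence right operands is slightly conservative for + and *, but it guarantees the reparsed tree has the same shape, so floating-point results match exactly.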
All expression objects now have __repr__ that produces valid DSL strings.
Implementation
Core ranking.py changes
- Field: New scope slot ("", "wt_", "shuffled_", "self_"). evaluate() reads self.scope + self.field from the DataFrame. Scoped fields raise TypeError on filter comparisons (<=, >=).
- KindAccessor: New scope slot, propagated to Field in .value/.score/.rank and all delegating methods.
- Scope: New class with __getattr__ for kind lookup, .count() method, .len property. Module-level wt = Scope("wt"), shuffled = Scope("shuffled").
- Len(Expr): Reads the {scope}peptide_length column. Repr: "len" / "wt.len".
- Count(Expr): Dynamically counts amino acid chars from the {scope}peptide column at eval time. Repr: "count('C')" / "wt.count('KR')".
- __repr__ on all 10 Expr subclasses: _Const, _BinOp, _NormExpr, _SurvivalExpr, _LogisticExpr, _UnaryOp, _ClipExpr, _AggExpr, Column, Field. Includes operator precedence parenthesization and clean float formatting.
- WT class: Removed entirely.
Parser changes
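As a rough illustration of how a scope prefix can be split off before the rest of an atom is parsed, the following toy works on whole strings and returns small tuples. It is not the PR's parser: the real one is a recursive descent parser over tokens, and the function names parse_atom and parse_scoped_atom, the tuple shapes, and the None field for a bare kind are assumptions made for this sketch.

```python
import re

# Reserved context keywords named in the PR.
CONTEXT_KEYWORDS = {"wt", "shuffled", "self"}

# Matches count('KR'); the character class is an assumption of this sketch.
_COUNT_RE = re.compile(r"count\('([A-Za-z]*)'\)$")


def parse_atom(text):
    """Split off an optional CONTEXT '.' prefix, then parse the rest."""
    head, dot, rest = text.partition(".")
    if dot and head in CONTEXT_KEYWORDS:
        return parse_scoped_atom(rest, scope=head + "_")
    return parse_scoped_atom(text, scope="")


def parse_scoped_atom(text, scope):
    """Handle len, count('X'), and kind references within one scope."""
    if text == "len":
        return ("len", scope)
    match = _COUNT_RE.match(text)
    if match:
        chars = match.group(1).upper()   # lowercase input is normalized
        if not chars:
            raise ValueError("count() needs at least one character")
        return ("count", scope, chars)
    kind, dot, field = text.partition(".")
    if kind in CONTEXT_KEYWORDS:
        # Covers nested scopes such as wt.wt.score as well.
        raise ValueError(f"{kind!r} is a reserved context keyword")
    return ("field", scope, kind, field if dot else None)


examples = [parse_atom(s) for s in
            ("wt.affinity.score", "wt.len", "wt.count('C')",
             "count('KR')", "len")]
```

Because the scope is consumed once at the front, a second keyword in the kind position is simply treated as an invalid kind name, which gives nested-scope rejection without any extra rule.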
- wt, shuffled, self are reserved context keywords (_CONTEXT_KEYWORDS)
- _parse_atom detects CONTEXT '.' and delegates to _parse_scoped_atom, which handles len, count('X'), and kind references within the scope
- _parse_kind_accessor accepts a scope parameter and rejects reserved keywords as kind names
- The len keyword and count('X') function are supported in both default and scoped contexts
CLI args.py fixes
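The routing rule for --rank-by values can be sketched in a few lines. The function below is a hypothetical stand-in, not the real _build_ranking_strategy: its signature, and the parse_expr and kind_lookup callables passed in, are assumptions so that the example runs without the library.

```python
def route_rank_by(rank_by, parse_expr, kind_lookup):
    """Send anything that looks like an expression to the parser.

    A value containing "." or "(" is treated as a DSL expression;
    everything else is treated as a simple kind name.
    """
    if "." in rank_by or "(" in rank_by:
        return parse_expr(rank_by)
    return kind_lookup(rank_by)


# Stub callables that just record which path was taken.
def fake_parse(text):
    return ("expr", text)


def fake_kind(name):
    return ("kind", name)


dotted = route_rank_by("affinity.score", fake_parse, fake_kind)
called = route_rank_by("count('KR')", fake_parse, fake_kind)
simple = route_rank_by("affinity", fake_parse, fake_kind)
```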
- Remove unused mhc_binding_predictor_from_args import
- _build_ranking_strategy: anything with . or ( routes through parse_expr (fixes affinity.score via --rank-by)
- Deduplicate the transform name set (_TRANSFORM_NAMES from ranking.py, then simplified further)
Depends on
Tests
524 total tests (was 397), all passing:
- CLI integration: _build_ranking_strategy with expression strings, combined with filters
- AST↔string equivalence: expressions built as ASTs and via parse_expr produce identical evaluation
- Round-trip: parse_expr(repr(parse_expr(text))) evaluates identically
- WT→scope: wt.affinity.score, wt.ba, underscore-qualified, in arithmetic
- len/count/scope: wt.Affinity.score, wt.len, wt.count(), reserved keyword errors
Design doc
docs/expression-semantics.md: full specification of the scope/prefix model, grammar, internal representation, and future context management (predictor.add_context()).