refactor(features): T-CONTEXTS partial — KnnEnrichmentContext for enrich_v6_features by frapercan · Pull Request #73 · frapercan/PROTEA

frapercan · 2026-05-08T15:29:55Z

Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

First incremental Parameter Object slice: introduces a local KnnEnrichmentContext dataclass to drop enrich_v6_features below flake8-bugbear's parameter ceiling, without yet bumping protea-contracts to v0.2.0.

Changes

protea/core/feature_enricher.py:
- New @dataclass(frozen=True) KnnEnrichmentContext bundling 6 per-call KNN-side inputs (valid_accessions, query_embeddings, neighbors_by_aspect, go_map_by_aspect, pair_features, pca_state).
- enrich_v6_features signature collapses 9 keyword-only args → 4 (predictions, session, ctx, compute_taxonomy).
- Body reads ctx.<field> instead of locals; library forwarding into protea_method.feature_enricher is unchanged.
protea/core/operations/predict_go_terms.py: single call-site in _run_aspect_separated_knn now constructs the context inline.

Smell budget

76 → 74 offenders. params>6 baseline 24 → 20 (enrich_v6_features retired plus 3 knock-on improvements as the dataclass centralises the type annotations).

Test plan

poetry run ruff check protea scripts
poetry run flake8 protea/
poetry run pytest tests/ --ignore=tests/test_jobs_pg.py (1162 passed, 11 skipped)
poetry run python scripts/check_smells.py (74 known, none new)

Follow-up

Next T-CONTEXTS slice can attack the next top offender (export_reranker_parquets at 16 args, _knn_transfer_and_label at 16 args) once the training_dump_helpers refactor space cools down.

…ich_v6_features Introduces a frozen ``KnnEnrichmentContext`` dataclass in ``feature_enricher.py`` that bundles the six per-call KNN-side inputs (``valid_accessions``, ``query_embeddings``, ``neighbors_by_aspect``, ``go_map_by_aspect``, ``pair_features``, ``pca_state``) so ``enrich_v6_features`` no longer trips flake8-bugbear's parameter ceiling. Signature collapses from 9 keyword-only args to 4 (``predictions``, ``session``, ``ctx``, ``compute_taxonomy``). The single call-site in ``predict_go_terms.PredictGOTermsBatchOperation._run_aspect_separated_knn`` now constructs the context inline; library forwarding into ``protea_method.feature_enricher`` is unchanged. Sizes: - feature_enricher.py: +20 LOC (dataclass + 1 import); enrich body drops 9 explicit args to a `ctx.<field>` access pattern - Smell baseline: 79 -> 74 (params>6 baseline 24 -> 20: enrich offender retired + 3 knock-on improvements as the dataclass centralises the type annotations) This is the first incremental T-CONTEXTS slice — Parameter Object pattern applied locally rather than waiting on the cross-repo ``protea-contracts`` v0.2.0 bump. Future slices can attack the next top offender (``export_reranker_parquets`` at 16 args) once the training_dump_helpers refactor cools down. Local-first 5 verde (ruff + flake8 + pytest 1162 + check_smells).

Introduces ``CafaEvalRunContext`` frozen dataclass in ``_run_cafa_eval_driver.py`` to bundle the 16 per-call inputs that the per-setting NK/LK/PK loop consumes (artifact paths, reranker bundles, delta cohort, scoring snapshot). ``evaluate_all_settings`` signature collapses 18 args → 3 (``session``, ``ctx``, ``emit``). The orchestrator builds the context inline in ``RunCafaEvaluationOperation.execute``; otherwise the loop body reads ``ctx.<field>`` instead of locals. Sizes: - _run_cafa_eval_driver.py: +20 LOC (dataclass) - run_cafa_evaluation.py: -2 LOC (one less call-site arg list) - Smell baseline: 79 -> 78 (`evaluate_all_settings` retired from params>6 list; offset by methods+1 for the dataclass __init__ count; net params>6 24 → 23) Combined with PR #73 (in flight, KnnEnrichmentContext) the params>6 ratchet should land at ~22 once both merge. Local-first 5 verde (ruff + flake8 + pytest 1162 + check_smells).

## Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial) Second incremental Parameter Object slice (after PR #73's `KnnEnrichmentContext`): tackles the top params>6 offender, ``evaluate_all_settings``, which I introduced in PR #72 (round 2d). ## Changes - `protea/core/operations/_run_cafa_eval_driver.py`: - New ``@dataclass(frozen=True) CafaEvalRunContext`` bundling 16 per-call inputs (artifact paths, reranker bundles, delta cohort, scoring snapshot). - ``evaluate_all_settings`` signature collapses 18 args → 3 (``session``, ``ctx``, ``emit``). - Loop body reads ``ctx.<field>`` instead of locals. - `protea/core/operations/run_cafa_evaluation.py`: - Single call-site builds the context inline. - Slight LOC reduction. ## Smell budget 79 → 78 offenders. **`evaluate_all_settings` retired from params>6** (24 → 23). Combined with PR #73 (KnnEnrichmentContext, in flight) the ratchet should land at ~22 once both merge. ## Test plan - [x] `poetry run ruff check protea scripts` - [x] `poetry run flake8 protea/` - [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1162 passed, 11 skipped) - [x] `poetry run python scripts/check_smells.py` (78 known, none new) ## Coordination Disjoint from PR #73 (different files). Both are local Parameter Object slices that don't yet require the cross-repo `protea-contracts` v0.2.0 bump.

…#77) ## Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial) Third incremental Parameter Object slice (after #73 KnnEnrichmentContext, #75 CafaEvalRunContext). ## Changes - New private ``_RerankerRegistration`` frozen dataclass bundling the nine non-session inputs both register endpoints (multipart + by-reference) feed into ``_register_model``. - ``_register_model`` signature collapses 10 args → 2 (session, reg). - Both call-sites build the registration inline. - Helper is private — only the two router endpoints in this file use it. Tests touch the endpoints by HTTP, never the helper directly, so no test churn. ## Smell budget 78 → 77 offenders. `_register_model` retired from params>6 (23 → 22). ## Test plan - [x] `poetry run ruff check protea scripts` - [x] `poetry run flake8 protea/` - [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163 passed, 11 skipped) - [x] `poetry run python scripts/check_smells.py` (77 known, none new)

## Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial) Fourth incremental Parameter Object slice. Tackles the last remaining 16-arg offender in core (after #73, #75, #77). ## Changes - ``protea/core/parquet_export.py``: - New ``@dataclass(frozen=True) ParquetExportContext`` bundling 15 per-call inputs grouped into 3 sections (source shards, dataset identity, publishing). - ``export_reranker_parquets`` signature collapses 16 args → 1 (``ctx``). - Body destructures the context once at the top so the rest of the implementation stays diff-minimal. - ``protea/core/training_dump_helpers.py``: production caller updated. - ``tests/test_parquet_export_boundary.py``: only test that invokes it updated. ## Smell budget 77 → 75 offenders. **params>6: 22 → 20** (`export_reranker_parquets` retired plus 1 knock-on improvement as the dataclass centralises type annotations). ## Test plan - [x] `poetry run ruff check protea scripts` - [x] `poetry run flake8 protea/` - [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163 passed, 11 skipped) - [x] `poetry run python scripts/check_smells.py` (75 known, none new) ## T-CONTEXTS progress (4 slices total) - #73: KnnEnrichmentContext — enrich_v6_features 9 → 4 args - #75: CafaEvalRunContext — evaluate_all_settings 18 → 3 args - #77: _RerankerRegistration — _register_model 10 → 2 args - this PR: ParquetExportContext — export_reranker_parquets 16 → 1 arg Combined: 53 → 10 args across the four entry points; baseline params>6: 24 → 20 (cumulative drop -4).

## Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial) Fifth incremental Parameter Object slice. Tackles the remaining 16-arg offender in core training pipeline. ## Changes - `protea/core/training_dump_helpers.py`: - New ``KnnTransferContext`` frozen dataclass bundling the 12 per-call data inputs (queries, references, ontology maps, optional enrichment helpers). - ``_knn_transfer_and_label`` signature collapses 16 args → 5 (``session``, ``p``, ``ctx``, plus existing ``sequence_context`` / ``stream_output``). - Two production call sites updated (train branch + test-stream branch). - `tests/test_knn_streaming_smoke.py`: shared test runner updated. ## Smell budget 77 → 75 offenders. **params>6: 22 → 20** (`_knn_transfer_and_label` retired plus 1 knock-on). ## Test plan - [x] `poetry run ruff check protea scripts` - [x] `poetry run flake8 protea/` - [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163 passed, 11 skipped) - [x] `poetry run python scripts/check_smells.py` (75 known, none new) ## T-CONTEXTS progress (5 slices total, 4 in flight pending CI) - #73 KnnEnrichmentContext — enrich_v6_features 9 → 4 args - #75 CafaEvalRunContext — evaluate_all_settings 18 → 3 args - #77 _RerankerRegistration — _register_model 10 → 2 args - #78 ParquetExportContext — export_reranker_parquets 16 → 1 arg - this PR KnnTransferContext — _knn_transfer_and_label 16 → 5 args Combined: 69 args → 15 across the 5 entry points.

frapercan enabled auto-merge (squash) May 8, 2026 15:29

frapercan force-pushed the feat/t-contexts-parquet-export-ctx branch from c2414cc to 204885b Compare May 8, 2026 15:31

frapercan merged commit 5074f5b into develop May 8, 2026
13 checks passed

frapercan mentioned this pull request May 8, 2026

refactor(cafa-eval): T-CONTEXTS partial — CafaEvalRunContext #75

Merged

4 tasks

frapercan mentioned this pull request May 8, 2026

refactor(reranker-models): T-CONTEXTS partial — _RerankerRegistration #77

Merged

4 tasks

frapercan mentioned this pull request May 8, 2026

refactor(parquet): T-CONTEXTS partial — ParquetExportContext #78

Merged

4 tasks

frapercan mentioned this pull request May 8, 2026

refactor(training): T-CONTEXTS partial — KnnTransferContext #80

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(features): T-CONTEXTS partial — KnnEnrichmentContext for enrich_v6_features#73

refactor(features): T-CONTEXTS partial — KnnEnrichmentContext for enrich_v6_features#73
frapercan merged 1 commit intodevelopfrom
feat/t-contexts-parquet-export-ctx

frapercan commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

frapercan commented May 8, 2026

Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

Changes

Smell budget

Test plan

Follow-up

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant