Skip to content

refactor(features): T-CONTEXTS partial — KnnEnrichmentContext for enrich_v6_features#73

Merged
frapercan merged 1 commit intodevelopfrom
feat/t-contexts-parquet-export-ctx
May 8, 2026
Merged

refactor(features): T-CONTEXTS partial — KnnEnrichmentContext for enrich_v6_features#73
frapercan merged 1 commit intodevelopfrom
feat/t-contexts-parquet-export-ctx

Conversation

@frapercan
Copy link
Copy Markdown
Owner

Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

First incremental Parameter Object slice: introduces a local KnnEnrichmentContext dataclass to drop enrich_v6_features below flake8-bugbear's parameter ceiling, without yet bumping protea-contracts to v0.2.0.

Changes

  • protea/core/feature_enricher.py:
    • New @dataclass(frozen=True) KnnEnrichmentContext bundling 6 per-call KNN-side inputs (valid_accessions, query_embeddings, neighbors_by_aspect, go_map_by_aspect, pair_features, pca_state).
    • enrich_v6_features signature collapses 9 keyword-only args → 4 (predictions, session, ctx, compute_taxonomy).
    • Body reads ctx.<field> instead of locals; library forwarding into protea_method.feature_enricher is unchanged.
  • protea/core/operations/predict_go_terms.py: single call-site in _run_aspect_separated_knn now constructs the context inline.

Smell budget

76 → 74 offenders. params>6 baseline 24 → 20 (enrich_v6_features retired plus 3 knock-on improvements as the dataclass centralises the type annotations).

Test plan

  • poetry run ruff check protea scripts
  • poetry run flake8 protea/
  • poetry run pytest tests/ --ignore=tests/test_jobs_pg.py (1162 passed, 11 skipped)
  • poetry run python scripts/check_smells.py (74 known, none new)

Follow-up

Next T-CONTEXTS slice can attack the next top offender (export_reranker_parquets at 16 args, _knn_transfer_and_label at 16 args) once the training_dump_helpers refactor space cools down.

@frapercan frapercan enabled auto-merge (squash) May 8, 2026 15:29
…ich_v6_features

Introduces a frozen ``KnnEnrichmentContext`` dataclass in
``feature_enricher.py`` that bundles the six per-call KNN-side
inputs (``valid_accessions``, ``query_embeddings``,
``neighbors_by_aspect``, ``go_map_by_aspect``, ``pair_features``,
``pca_state``) so ``enrich_v6_features`` no longer trips
flake8-bugbear's parameter ceiling.

Signature collapses from 9 keyword-only args to 4
(``predictions``, ``session``, ``ctx``, ``compute_taxonomy``).
The single call-site in
``predict_go_terms.PredictGOTermsBatchOperation._run_aspect_separated_knn``
now constructs the context inline; library forwarding into
``protea_method.feature_enricher`` is unchanged.

Sizes:
- feature_enricher.py: +20 LOC (dataclass + 1 import); enrich body
  drops 9 explicit args to a `ctx.<field>` access pattern
- Smell baseline: 79 -> 74 (params>6 baseline 24 -> 20: enrich
  offender retired + 3 knock-on improvements as the dataclass
  centralises the type annotations)

This is the first incremental T-CONTEXTS slice — Parameter Object
pattern applied locally rather than waiting on the cross-repo
``protea-contracts`` v0.2.0 bump. Future slices can attack the next
top offender (``export_reranker_parquets`` at 16 args) once the
training_dump_helpers refactor cools down.

Local-first 5 verde (ruff + flake8 + pytest 1162 + check_smells).
@frapercan frapercan force-pushed the feat/t-contexts-parquet-export-ctx branch from c2414cc to 204885b Compare May 8, 2026 15:31
@frapercan frapercan merged commit 5074f5b into develop May 8, 2026
13 checks passed
frapercan added a commit that referenced this pull request May 8, 2026
Introduces ``CafaEvalRunContext`` frozen dataclass in
``_run_cafa_eval_driver.py`` to bundle the 16 per-call inputs that
the per-setting NK/LK/PK loop consumes (artifact paths, reranker
bundles, delta cohort, scoring snapshot).

``evaluate_all_settings`` signature collapses 18 args → 3
(``session``, ``ctx``, ``emit``). The orchestrator builds the
context inline in ``RunCafaEvaluationOperation.execute``; otherwise
the loop body reads ``ctx.<field>`` instead of locals.

Sizes:
- _run_cafa_eval_driver.py: +20 LOC (dataclass)
- run_cafa_evaluation.py: -2 LOC (one less call-site arg list)
- Smell baseline: 79 -> 78 (`evaluate_all_settings` retired from
  params>6 list; offset by methods+1 for the dataclass __init__
  count; net params>6 24 → 23)

Combined with PR #73 (in flight, KnnEnrichmentContext) the params>6
ratchet should land at ~22 once both merge.

Local-first 5 verde (ruff + flake8 + pytest 1162 + check_smells).
frapercan added a commit that referenced this pull request May 8, 2026
## Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

Second incremental Parameter Object slice (after PR #73's
`KnnEnrichmentContext`): tackles the top params>6 offender,
``evaluate_all_settings``, which I introduced in PR #72 (round 2d).

## Changes

- `protea/core/operations/_run_cafa_eval_driver.py`:
- New ``@dataclass(frozen=True) CafaEvalRunContext`` bundling 16
per-call inputs (artifact paths, reranker bundles, delta cohort, scoring
snapshot).
- ``evaluate_all_settings`` signature collapses 18 args → 3
(``session``, ``ctx``, ``emit``).
  - Loop body reads ``ctx.<field>`` instead of locals.
- `protea/core/operations/run_cafa_evaluation.py`:
  - Single call-site builds the context inline.
  - Slight LOC reduction.

## Smell budget

79 → 78 offenders. **`evaluate_all_settings` retired from params>6** (24
→ 23). Combined with PR #73 (KnnEnrichmentContext, in flight) the
ratchet should land at ~22 once both merge.

## Test plan

- [x] `poetry run ruff check protea scripts`
- [x] `poetry run flake8 protea/`
- [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1162
passed, 11 skipped)
- [x] `poetry run python scripts/check_smells.py` (78 known, none new)

## Coordination

Disjoint from PR #73 (different files). Both are local Parameter Object
slices that don't yet require the cross-repo `protea-contracts` v0.2.0
bump.
frapercan added a commit that referenced this pull request May 8, 2026
…#77)

## Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

Third incremental Parameter Object slice (after #73
KnnEnrichmentContext, #75 CafaEvalRunContext).

## Changes

- New private ``_RerankerRegistration`` frozen dataclass bundling the
nine non-session inputs both register endpoints (multipart +
by-reference) feed into ``_register_model``.
- ``_register_model`` signature collapses 10 args → 2 (session, reg).
- Both call-sites build the registration inline.
- Helper is private — only the two router endpoints in this file use it.
Tests touch the endpoints by HTTP, never the helper directly, so no test
churn.

## Smell budget

78 → 77 offenders. `_register_model` retired from params>6 (23 → 22).

## Test plan

- [x] `poetry run ruff check protea scripts`
- [x] `poetry run flake8 protea/`
- [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163
passed, 11 skipped)
- [x] `poetry run python scripts/check_smells.py` (77 known, none new)
frapercan added a commit that referenced this pull request May 8, 2026
## Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

Fourth incremental Parameter Object slice. Tackles the last remaining
16-arg offender in core (after #73, #75, #77).

## Changes

- ``protea/core/parquet_export.py``:
- New ``@dataclass(frozen=True) ParquetExportContext`` bundling 15
per-call inputs grouped into 3 sections (source shards, dataset
identity, publishing).
- ``export_reranker_parquets`` signature collapses 16 args → 1
(``ctx``).
- Body destructures the context once at the top so the rest of the
implementation stays diff-minimal.
- ``protea/core/training_dump_helpers.py``: production caller updated.
- ``tests/test_parquet_export_boundary.py``: only test that invokes it
updated.

## Smell budget

77 → 75 offenders. **params>6: 22 → 20** (`export_reranker_parquets`
retired plus 1 knock-on improvement as the dataclass centralises type
annotations).

## Test plan

- [x] `poetry run ruff check protea scripts`
- [x] `poetry run flake8 protea/`
- [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163
passed, 11 skipped)
- [x] `poetry run python scripts/check_smells.py` (75 known, none new)

## T-CONTEXTS progress (4 slices total)

- #73: KnnEnrichmentContext — enrich_v6_features 9 → 4 args
- #75: CafaEvalRunContext — evaluate_all_settings 18 → 3 args
- #77: _RerankerRegistration — _register_model 10 → 2 args
- this PR: ParquetExportContext — export_reranker_parquets 16 → 1 arg

Combined: 53 → 10 args across the four entry points; baseline params>6:
24 → 20 (cumulative drop -4).
frapercan added a commit that referenced this pull request May 8, 2026
## Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

Fifth incremental Parameter Object slice. Tackles the remaining 16-arg
offender in core training pipeline.

## Changes

- `protea/core/training_dump_helpers.py`:
- New ``KnnTransferContext`` frozen dataclass bundling the 12 per-call
data inputs (queries, references, ontology maps, optional enrichment
helpers).
- ``_knn_transfer_and_label`` signature collapses 16 args → 5
(``session``, ``p``, ``ctx``, plus existing ``sequence_context`` /
``stream_output``).
- Two production call sites updated (train branch + test-stream branch).
- `tests/test_knn_streaming_smoke.py`: shared test runner updated.

## Smell budget

77 → 75 offenders. **params>6: 22 → 20** (`_knn_transfer_and_label`
retired plus 1 knock-on).

## Test plan

- [x] `poetry run ruff check protea scripts`
- [x] `poetry run flake8 protea/`
- [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163
passed, 11 skipped)
- [x] `poetry run python scripts/check_smells.py` (75 known, none new)

## T-CONTEXTS progress (5 slices total, 4 in flight pending CI)

- #73 KnnEnrichmentContext — enrich_v6_features 9 → 4 args
- #75 CafaEvalRunContext — evaluate_all_settings 18 → 3 args
- #77 _RerankerRegistration — _register_model 10 → 2 args
- #78 ParquetExportContext — export_reranker_parquets 16 → 1 arg
- this PR KnnTransferContext — _knn_transfer_and_label 16 → 5 args

Combined: 69 args → 15 across the 5 entry points.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant