refactor(training): T-CONTEXTS partial — _dump_frozen_dataset takes ctx by frapercan · Pull Request #83 · frapercan/PROTEA

frapercan · 2026-05-08T16:21:25Z

Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

Sixth incremental Parameter Object slice. _dump_frozen_dataset was already a thin wrapper over export_reranker_parquets (which now takes ParquetExportContext); this slice removes the redundant repackaging and drops 11 keyword args to 1.

Changes

protea/core/training_dump_helpers.py:
- _dump_frozen_dataset signature collapses 11 args → 1 (ctx: ParquetExportContext).
- Body uses ctx.stage_dir to preserve the legacy dump_dir return-key contract.
- Single call-site in TrainRerankerAutoOperation.execute (dump_helper branch) builds the context inline (store=None + producer fields filled where the wrapper used to fill them internally).
- Forward-reference "ParquetExportContext" in the signature (TYPE_CHECKING import) avoids a circular dep with parquet_export.

Smell budget

75 → 74 offenders. params>6: 20 → 19 (_dump_frozen_dataset retired).

Test plan

poetry run ruff check protea scripts
poetry run flake8 protea/
poetry run pytest tests/ --ignore=tests/test_jobs_pg.py (1163 passed, 11 skipped)
poetry run python scripts/check_smells.py (74 known, none new)

Branch naming

Pushed as feat/t-contexts-dump-frozen-mine because another agent had already taken the more natural feat/t-contexts-dump-frozen for an unrelated PR (#81, web benchmark heatmap).

Collapses ``_dump_frozen_dataset`` from 11 keyword-only args to 1 ``ParquetExportContext`` argument. The helper was already a thin wrapper around ``export_reranker_parquets`` (which now takes the same context); this slice removes the redundant repackaging. The single call-site in ``TrainRerankerAutoOperation.execute`` (``dump_helper`` branch) is updated to build the context inline, filling ``store=None`` + ``producer_version`` + ``producer_git_sha`` the wrapper used to fill internally. Body uses ``ctx.stage_dir`` for the legacy ``dump_dir`` return-key preservation. Forward-reference ``"ParquetExportContext"`` in the signature avoids a circular import (parquet_export pulls heavy deps). Sizes: - training_dump_helpers.py: -25 LOC (wrapper body shrinks; caller picks up 7 LOC for the inline context construction) - Smell baseline: 75 -> 74 (params>6 20 → 19; ``_dump_frozen_dataset`` retired) Local-first 5 verde (ruff + flake8 + pytest 1163 + check_smells).

…baseline post-rebase ruff UP037 flagged the forward-ref quotes; redundant under `from __future__ import annotations`. Smell baseline regenerated to match line offsets after rebasing onto develop.

## Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial) Seventh incremental Parameter Object slice. Tackles the 11-arg offender ``_predict_batch`` in ``PredictGOTermsBatchOperation`` — the last named target in master plan v3.2 §24 #6 alongside ``_knn_transfer_and_label`` (#80), ``export_reranker_parquets`` (#83), and ``_dump_frozen_dataset`` (#83). ## Changes - `protea/core/operations/predict_go_terms.py`: - New ``BatchPredictContext`` frozen dataclass bundling the 5 required + 5 optional per-call inputs (queries, references, prediction-set handle, payload, optional enrichment maps). - ``_predict_batch`` signature collapses 11 args to 1 (``ctx``). - Single production call site (``execute`` branch) builds the context inline; ``field(default_factory=dict)`` removes the manual ``ref_sequences = ref_sequences or {}`` reset block. - `tests/test_predict_go_terms.py`: 6 call sites in ``TestPredictBatch`` and ``TestPredictBatchRerankerFeatures`` retargeted to pass a ``BatchPredictContext`` instance. ## Smell budget 74 -> 73 offenders. **params>6: 19 -> 18** (``_predict_batch`` retired). File LOC ticks up 1921 -> 1952 (+31 dataclass) and ``execute`` method 336 -> 338 (+2 multi-line ctor); both pre-existing offenders, baseline refreshed via ``--write-baseline``. ## Test plan - [x] `poetry run ruff check protea scripts` - [x] `poetry run flake8 protea/` - [x] `poetry run mypy protea/core/operations/predict_go_terms.py` (Success: no issues found) - [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163 passed, 11 skipped) - [x] `poetry run python scripts/check_smells.py` (73 known, none new)

frapercan enabled auto-merge (squash) May 8, 2026 16:21

frapercan mentioned this pull request May 8, 2026

refactor(predict): T-CONTEXTS partial — _predict_batch takes ctx #87

Merged

5 tasks

fix(lint): drop UP037 quotes on ParquetExportContext + refresh smell …

29d40b9

…baseline post-rebase ruff UP037 flagged the forward-ref quotes; redundant under `from __future__ import annotations`. Smell baseline regenerated to match line offsets after rebasing onto develop.

frapercan force-pushed the feat/t-contexts-dump-frozen-mine branch from 866fb61 to 29d40b9 Compare May 8, 2026 20:12

Merge branch 'develop' into feat/t-contexts-dump-frozen-mine

38d9309

Merge branch 'develop' into feat/t-contexts-dump-frozen-mine

f50b216

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(training): T-CONTEXTS partial — _dump_frozen_dataset takes ctx#83

refactor(training): T-CONTEXTS partial — _dump_frozen_dataset takes ctx#83
frapercan wants to merge 4 commits intodevelopfrom
feat/t-contexts-dump-frozen-mine

frapercan commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

frapercan commented May 8, 2026

Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

Changes

Smell budget

Test plan

Branch naming

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant