refactor(training): T-CONTEXTS partial — _dump_frozen_dataset takes ctx#83
Open
refactor(training): T-CONTEXTS partial — _dump_frozen_dataset takes ctx#83
Conversation
Collapses ``_dump_frozen_dataset`` from 11 keyword-only args to 1 ``ParquetExportContext`` argument. The helper was already a thin wrapper around ``export_reranker_parquets`` (which now takes the same context); this slice removes the redundant repackaging. The single call-site in ``TrainRerankerAutoOperation.execute`` (``dump_helper`` branch) is updated to build the context inline, filling ``store=None`` + ``producer_version`` + ``producer_git_sha`` the wrapper used to fill internally. Body uses ``ctx.stage_dir`` for the legacy ``dump_dir`` return-key preservation. Forward-reference ``"ParquetExportContext"`` in the signature avoids a circular import (parquet_export pulls heavy deps). Sizes: - training_dump_helpers.py: -25 LOC (wrapper body shrinks; caller picks up 7 LOC for the inline context construction) - Smell baseline: 75 -> 74 (params>6 20 → 19; ``_dump_frozen_dataset`` retired) Local-first 5 verde (ruff + flake8 + pytest 1163 + check_smells).
5 tasks
…baseline post-rebase ruff UP037 flagged the forward-ref quotes; redundant under `from __future__ import annotations`. Smell baseline regenerated to match line offsets after rebasing onto develop.
866fb61 to
29d40b9
Compare
frapercan
added a commit
that referenced
this pull request
May 8, 2026
## Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial) Seventh incremental Parameter Object slice. Tackles the 11-arg offender ``_predict_batch`` in ``PredictGOTermsBatchOperation`` — the last named target in master plan v3.2 §24 #6 alongside ``_knn_transfer_and_label`` (#80), ``export_reranker_parquets`` (#83), and ``_dump_frozen_dataset`` (#83). ## Changes - `protea/core/operations/predict_go_terms.py`: - New ``BatchPredictContext`` frozen dataclass bundling the 5 required + 5 optional per-call inputs (queries, references, prediction-set handle, payload, optional enrichment maps). - ``_predict_batch`` signature collapses 11 args to 1 (``ctx``). - Single production call site (``execute`` branch) builds the context inline; ``field(default_factory=dict)`` removes the manual ``ref_sequences = ref_sequences or {}`` reset block. - `tests/test_predict_go_terms.py`: 6 call sites in ``TestPredictBatch`` and ``TestPredictBatchRerankerFeatures`` retargeted to pass a ``BatchPredictContext`` instance. ## Smell budget 74 -> 73 offenders. **params>6: 19 -> 18** (``_predict_batch`` retired). File LOC ticks up 1921 -> 1952 (+31 dataclass) and ``execute`` method 336 -> 338 (+2 multi-line ctor); both pre-existing offenders, baseline refreshed via ``--write-baseline``. ## Test plan - [x] `poetry run ruff check protea scripts` - [x] `poetry run flake8 protea/` - [x] `poetry run mypy protea/core/operations/predict_go_terms.py` (Success: no issues found) - [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163 passed, 11 skipped) - [x] `poetry run python scripts/check_smells.py` (73 known, none new)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)
Sixth incremental Parameter Object slice.
_dump_frozen_datasetwas already a thin wrapper overexport_reranker_parquets(which now takesParquetExportContext); this slice removes the redundant repackaging and drops 11 keyword args to 1.Changes
protea/core/training_dump_helpers.py:_dump_frozen_datasetsignature collapses 11 args → 1 (ctx: ParquetExportContext).ctx.stage_dirto preserve the legacydump_dirreturn-key contract.TrainRerankerAutoOperation.execute(dump_helper branch) builds the context inline (store=None+ producer fields filled where the wrapper used to fill them internally)."ParquetExportContext"in the signature (TYPE_CHECKING import) avoids a circular dep withparquet_export.Smell budget
75 → 74 offenders. params>6: 20 → 19 (
_dump_frozen_datasetretired).Test plan
poetry run ruff check protea scriptspoetry run flake8 protea/poetry run pytest tests/ --ignore=tests/test_jobs_pg.py(1163 passed, 11 skipped)poetry run python scripts/check_smells.py(74 known, none new)Branch naming
Pushed as
feat/t-contexts-dump-frozen-minebecause another agent had already taken the more naturalfeat/t-contexts-dump-frozenfor an unrelated PR (#81, web benchmark heatmap).