Skip to content

refactor(training): T-CONTEXTS partial — _dump_frozen_dataset takes ctx#83

Open
frapercan wants to merge 4 commits intodevelopfrom
feat/t-contexts-dump-frozen-mine
Open

refactor(training): T-CONTEXTS partial — _dump_frozen_dataset takes ctx#83
frapercan wants to merge 4 commits intodevelopfrom
feat/t-contexts-dump-frozen-mine

Conversation

@frapercan
Copy link
Copy Markdown
Owner

Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

Sixth incremental Parameter Object slice. _dump_frozen_dataset was already a thin wrapper over export_reranker_parquets (which now takes ParquetExportContext); this slice removes the redundant repackaging and drops 11 keyword args to 1.

Changes

  • protea/core/training_dump_helpers.py:
    • _dump_frozen_dataset signature collapses 11 args → 1 (ctx: ParquetExportContext).
    • Body uses ctx.stage_dir to preserve the legacy dump_dir return-key contract.
    • Single call-site in TrainRerankerAutoOperation.execute (dump_helper branch) builds the context inline (store=None + producer fields filled where the wrapper used to fill them internally).
    • Forward-reference "ParquetExportContext" in the signature (TYPE_CHECKING import) avoids a circular dep with parquet_export.

Smell budget

75 → 74 offenders. params>6: 20 → 19 (_dump_frozen_dataset retired).

Test plan

  • poetry run ruff check protea scripts
  • poetry run flake8 protea/
  • poetry run pytest tests/ --ignore=tests/test_jobs_pg.py (1163 passed, 11 skipped)
  • poetry run python scripts/check_smells.py (74 known, none new)

Branch naming

Pushed as feat/t-contexts-dump-frozen-mine because another agent had already taken the more natural feat/t-contexts-dump-frozen for an unrelated PR (#81, web benchmark heatmap).

@frapercan frapercan enabled auto-merge (squash) May 8, 2026 16:21
Collapses ``_dump_frozen_dataset`` from 11 keyword-only args to 1
``ParquetExportContext`` argument. The helper was already a thin
wrapper around ``export_reranker_parquets`` (which now takes the
same context); this slice removes the redundant repackaging.

The single call-site in ``TrainRerankerAutoOperation.execute``
(``dump_helper`` branch) is updated to build the context inline,
filling ``store=None`` + ``producer_version`` + ``producer_git_sha``
the wrapper used to fill internally.

Body uses ``ctx.stage_dir`` for the legacy ``dump_dir`` return-key
preservation. Forward-reference ``"ParquetExportContext"`` in the
signature avoids a circular import (parquet_export pulls heavy deps).

Sizes:
- training_dump_helpers.py: -25 LOC (wrapper body shrinks; caller
  picks up 7 LOC for the inline context construction)
- Smell baseline: 75 -> 74 (params>6 20 → 19;
  ``_dump_frozen_dataset`` retired)

Local-first 5 verde (ruff + flake8 + pytest 1163 + check_smells).
…baseline post-rebase

ruff UP037 flagged the forward-ref quotes; redundant under
`from __future__ import annotations`. Smell baseline regenerated to
match line offsets after rebasing onto develop.
@frapercan frapercan force-pushed the feat/t-contexts-dump-frozen-mine branch from 866fb61 to 29d40b9 Compare May 8, 2026 20:12
frapercan added a commit that referenced this pull request May 8, 2026
## Acceptance criteria (master plan §24 Fase 1 — T-CONTEXTS partial)

Seventh incremental Parameter Object slice. Tackles the 11-arg offender
``_predict_batch`` in ``PredictGOTermsBatchOperation`` — the last named
target in master plan v3.2 §24 #6 alongside ``_knn_transfer_and_label``
(#80), ``export_reranker_parquets`` (#83), and ``_dump_frozen_dataset``
(#83).

## Changes

- `protea/core/operations/predict_go_terms.py`:
- New ``BatchPredictContext`` frozen dataclass bundling the 5 required +
5 optional per-call inputs (queries, references, prediction-set handle,
payload, optional enrichment maps).
  - ``_predict_batch`` signature collapses 11 args to 1 (``ctx``).
- Single production call site (``execute`` branch) builds the context
inline; ``field(default_factory=dict)`` removes the manual
``ref_sequences = ref_sequences or {}`` reset block.
- `tests/test_predict_go_terms.py`: 6 call sites in ``TestPredictBatch``
and ``TestPredictBatchRerankerFeatures`` retargeted to pass a
``BatchPredictContext`` instance.

## Smell budget

74 -> 73 offenders. **params>6: 19 -> 18** (``_predict_batch`` retired).
File LOC ticks up 1921 -> 1952 (+31 dataclass) and ``execute`` method
336 -> 338 (+2 multi-line ctor); both pre-existing offenders, baseline
refreshed via ``--write-baseline``.

## Test plan

- [x] `poetry run ruff check protea scripts`
- [x] `poetry run flake8 protea/`
- [x] `poetry run mypy protea/core/operations/predict_go_terms.py`
(Success: no issues found)
- [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163
passed, 11 skipped)
- [x] `poetry run python scripts/check_smells.py` (73 known, none new)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant