
refactor(predict): T-CONTEXTS partial — _predict_batch takes ctx #87

Merged
frapercan merged 2 commits into develop from exec/feat/t-contexts-predict-batch on May 8, 2026


Conversation

@frapercan
Owner

Acceptance criteria (master plan §24 Phase 1, T-CONTEXTS partial)

Seventh incremental Parameter Object slice. Tackles the 11-arg offender _predict_batch in PredictGOTermsBatchOperation — the last named target in master plan v3.2 §24 #6 alongside _knn_transfer_and_label (#80), export_reranker_parquets (#83), and _dump_frozen_dataset (#83).

Changes

  • protea/core/operations/predict_go_terms.py:
    • New BatchPredictContext frozen dataclass bundling the 5 required + 5 optional per-call inputs (queries, references, prediction-set handle, payload, optional enrichment maps).
    • _predict_batch signature collapses 11 args to 1 (ctx).
    • Single production call site (execute branch) builds the context inline; field(default_factory=dict) removes the manual ref_sequences = ref_sequences or {} reset block.
  • tests/test_predict_go_terms.py: 6 call sites in TestPredictBatch and TestPredictBatchRerankerFeatures retargeted to pass a BatchPredictContext instance.
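The Parameter Object change described above can be sketched roughly as follows. This is a minimal illustration, not the code in `predict_go_terms.py`: only `BatchPredictContext` and `ref_sequences` are names taken from the description, while the other field names, the types, and the `predict_batch_sketch` function are assumptions made up for the example (the real dataclass carries 5 required and 5 optional fields).

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class BatchPredictContext:
    # Required per-call inputs. These field names and types are illustrative
    # guesses, not the real definition.
    queries: list[str]
    references: dict[str, Any]
    prediction_set_id: int
    payload: dict[str, Any]
    # Optional enrichment map. default_factory gives every instance its own
    # empty dict, so the callee needs no `ref_sequences = ref_sequences or {}`.
    ref_sequences: dict[str, str] = field(default_factory=dict)


def predict_batch_sketch(ctx: BatchPredictContext) -> dict[str, Any]:
    """Hypothetical stand-in for _predict_batch: one ctx argument."""
    return {
        "n_queries": len(ctx.queries),
        "has_ref_sequences": bool(ctx.ref_sequences),
    }


# The call site builds the context inline and passes it as a single argument.
ctx = BatchPredictContext(
    queries=["P12345"],
    references={},
    prediction_set_id=1,
    payload={},
)
summary = predict_batch_sketch(ctx)
```

Because the dataclass is frozen, the callee cannot rebind fields on the context, which is one reason the optional dicts are defaulted at construction time rather than patched on entry.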

Smell budget

74 -> 73 offenders. params>6: 19 -> 18 (_predict_batch retired). File LOC ticks up 1921 -> 1952 (+31 dataclass) and execute method 336 -> 338 (+2 multi-line ctor); both pre-existing offenders, baseline refreshed via --write-baseline.

Test plan

  • poetry run ruff check protea scripts
  • poetry run flake8 protea/
  • poetry run mypy protea/core/operations/predict_go_terms.py (Success: no issues found)
  • poetry run pytest tests/ --ignore=tests/test_jobs_pg.py (1163 passed, 11 skipped)
  • poetry run python scripts/check_smells.py (73 known, none new)

Seventh incremental Parameter Object slice. Tackles the 11-arg offender
in PredictGOTermsBatchOperation, the last named target in master plan
v3.2 §24 #6.

Changes:
- protea/core/operations/predict_go_terms.py:
  - New BatchPredictContext frozen dataclass bundling the 5 required +
    5 optional per-call inputs (queries, references, prediction-set
    handle, payload, optional enrichment maps).
  - _predict_batch signature collapses 11 args to 1 (ctx).
  - Single production call site (execute branch) builds the context
    inline. dataclass field defaults remove the manual None-guard
    block that used to reset optional dicts on entry.
- tests/test_predict_go_terms.py: 6 call sites in TestPredictBatch and
  TestPredictBatchRerankerFeatures updated to pass BatchPredictContext.

Smell budget: 74 -> 73 offenders. params>6: 19 -> 18 (_predict_batch
retired). File LOC and execute method LOC tick up by 31 / 2 respectively
(dataclass + multi-line ctor); both pre-existing offenders, baseline
refreshed via --write-baseline.

Local-first 5 green: ruff, flake8, mypy (predict_go_terms clean; 13
pre-existing test errors on develop unchanged), pytest tests/ (1163
passed, 11 skipped), check_smells.py.
@frapercan added the loop:executor label (PRs from the executor loop, master plan §24) on May 8, 2026
@frapercan enabled auto-merge (squash) on May 8, 2026 at 20:11
@frapercan merged commit 7944c56 into develop on May 8, 2026
13 checks passed
frapercan added a commit that referenced this pull request May 8, 2026
…#94)

## Acceptance criteria (master plan §24 Phase 1, T-CONTEXTS partial)

Tenth incremental Parameter Object slice. Tackles the 8-arg KNN helper
for the aspect-separated path in ``PredictGOTermsBatchOperation``, the
largest remaining unnamed params offender in ``predict_go_terms`` after
``_predict_batch`` (#87). Satisfies the AC "no production signature >6
args".

## Changes

- `protea/core/operations/predict_go_terms.py`:
  - New ``AspectSeparatedKnnContext`` frozen dataclass alongside
    ``BatchPredictContext`` (6 fields: ``valid_accessions``,
    ``query_embeddings``, ``ref_data_by_aspect``, ``annotation_set_id``,
    ``prediction_set_id``, ``payload``).
  - ``_run_aspect_separated_knn`` signature collapses 8 args to 3
    (``self``, ``session``, ``ctx``). Body unpacks ctx fields onto locals
    at entry to keep the ~200-LOC function body diff-free below the
    signature change.
  - Single production call site (``execute`` branch) builds the context
    inline. No tests reference ``_run_aspect_separated_knn`` directly (the
    path is exercised via the ``execute()`` integration suite).
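The unpack-at-entry pattern described in the list above might look roughly like this. It is a sketch under stated assumptions: the six field names come from the commit message, but their types, the `SketchOperation` class, the aspect keys, and the placeholder body are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AspectSeparatedKnnContext:
    # Field names are the six listed above; the types are assumptions.
    valid_accessions: list[str]
    query_embeddings: Any
    ref_data_by_aspect: dict[str, Any]
    annotation_set_id: int
    prediction_set_id: int
    payload: dict[str, Any]


class SketchOperation:
    """Hypothetical stand-in for PredictGOTermsBatchOperation."""

    def _run_aspect_separated_knn(
        self, session: Any, ctx: AspectSeparatedKnnContext
    ) -> dict[str, int]:
        # Unpack ctx onto locals at entry so a pre-existing body can keep
        # using the old parameter names without further edits.
        valid_accessions = ctx.valid_accessions
        ref_data_by_aspect = ctx.ref_data_by_aspect
        prediction_set_id = ctx.prediction_set_id

        # Placeholder body; the real function is described as ~200 LOC.
        return {
            "prediction_set_id": prediction_set_id,
            "n_queries": len(valid_accessions),
            "n_aspects": len(ref_data_by_aspect),
        }


knn_ctx = AspectSeparatedKnnContext(
    valid_accessions=["P12345", "Q67890"],
    query_embeddings=[[0.1, 0.2], [0.3, 0.4]],
    ref_data_by_aspect={"BPO": {}, "MFO": {}, "CCO": {}},  # illustrative keys
    annotation_set_id=7,
    prediction_set_id=42,
    payload={"k": 5},
)
knn_summary = SketchOperation()._run_aspect_separated_knn(session=None, ctx=knn_ctx)
```

Unpacking onto locals trades a few lines at the top of the function for an unchanged body, which keeps the review diff confined to the signature and the call site.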

## Smell budget

**params>6: 16 -> 15** (``_run_aspect_separated_knn`` retired). File LOC
ticks up 1952 -> 1978 (+26 dataclass) and ``execute`` method 338 -> 340
(+2 multi-line ctor); both pre-existing offenders, baseline refreshed
via ``--write-baseline``.

## Test plan

- [x] `poetry run ruff check protea scripts`
- [x] `poetry run flake8 protea/`
- [x] `poetry run mypy protea/core/operations/predict_go_terms.py` (Success: no issues found)
- [x] `poetry run pytest tests/ --ignore=tests/test_jobs_pg.py` (1163 passed, 11 skipped)
- [x] `poetry run python scripts/check_smells.py`