Hoist Rosace preprocessing out of the per-replicate loop by odcambc · Pull Request #35 · odcambc/dumpling

odcambc · 2026-05-25T03:44:53Z

build_rosace_object called FilterData / ImputeData / NormalizeData / IntegrateData inside the inner replicate loop. For a condition with N replicates, each function ran N times — once per replicate addition — against progressively more populated state.

Verified against pimentellab/rosace@main (R/preprocessing.R):

FilterData.AssayGrowth writes back to @CountS in-place (no separate filtered slot). Idempotent on a second back-to-back call: the surviving rows already satisfy the predicates, so re-filtering is a no-op. Also clears @norm.counts as a side effect.
ImputeData.AssayGrowth writes imputed values back to @CountS. impute::impute.knn returns the input unchanged when there are no NAs, so re-imputation on already-imputed data is numerically a no-op.
NormalizeData.AssayGrowth writes to a separate @norm.counts slot and recomputes from raw @CountS on each call. Idempotent. The WT baseline is intra-assay (per-replicate), not pooled across replicates — so once-per-rep vs once-per-key produces identical per-assay norm.counts.
IntegrateData.Rosace rebuilds the AssaySet from scratch on each call (R/preprocessing.R:217-238, assayset <- AssaySet()), and AddAssaySetData overwrites by name. Repeated calls just rebuild and replace.

So the per-replicate loop produced the same final scores as the per-condition pattern, with O(N²) wasted preprocessing work. The intermediate AssaySets between loop iterations were also visible to any downstream consumer reading @assay.sets, even though the loop overwrote them seconds later.

Restructure as two passes per condition:
Pass 1 — add every replicate's assay via AddAssayData.
Pass 2 — run Filter / Impute / Normalize / Integrate once on the
consolidated key, matching the vignette pattern in
vignettes/intro_rosace.Rmd lines 65-115.

Also guard against the case where every replicate for a condition has empty counts. The old loop silently dodged this because the per-replicate F/I/N/I never ran. With F/I/N/I now hoisted, the guard is explicit: skip with a warning rather than calling IntegrateData on a key with no assays.

@CountS

build_rosace_object called FilterData / ImputeData / NormalizeData / IntegrateData inside the inner replicate loop. For a condition with N replicates, each function ran N times — once per replicate addition — against progressively more populated state. Verified against pimentellab/rosace@main (R/preprocessing.R): - FilterData.AssayGrowth writes back to @CountS in-place (no separate filtered slot). Idempotent on a second back-to-back call: the surviving rows already satisfy the predicates, so re-filtering is a no-op. Also clears @norm.counts as a side effect. - ImputeData.AssayGrowth writes imputed values back to @CountS. impute::impute.knn returns the input unchanged when there are no NAs, so re-imputation on already-imputed data is numerically a no-op. - NormalizeData.AssayGrowth writes to a separate @norm.counts slot and recomputes from raw @CountS on each call. Idempotent. The WT baseline is intra-assay (per-replicate), not pooled across replicates — so once-per-rep vs once-per-key produces identical per-assay norm.counts. - IntegrateData.Rosace rebuilds the AssaySet from scratch on each call (R/preprocessing.R:217-238, `assayset <- AssaySet()`), and AddAssaySetData overwrites by name. Repeated calls just rebuild and replace. So the per-replicate loop produced the same final scores as the per-condition pattern, with O(N²) wasted preprocessing work. The intermediate AssaySets between loop iterations were also visible to any downstream consumer reading @assay.sets, even though the loop overwrote them seconds later. Restructure as two passes per condition: Pass 1 — add every replicate's assay via AddAssayData. Pass 2 — run Filter / Impute / Normalize / Integrate once on the consolidated key, matching the vignette pattern in vignettes/intro_rosace.Rmd lines 65-115. Also guard against the case where every replicate for a condition has empty counts. The old loop silently dodged this because the per-replicate F/I/N/I never ran. With F/I/N/I now hoisted, the guard is explicit: skip with a warning rather than calling IntegrateData on a key with no assays. No behavior change: same scores, less CPU. The "different / more correct scores" framing from the original commit message was wrong — re-scoring historical datasets is not required. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

odcambc merged commit e83d00d into main May 25, 2026

odcambc deleted the rosace-integration branch May 25, 2026 03:45

odcambc restored the rosace-integration branch May 25, 2026 03:49

odcambc deleted the rosace-integration branch May 25, 2026 17:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hoist Rosace preprocessing out of the per-replicate loop#35

Hoist Rosace preprocessing out of the per-replicate loop#35
odcambc merged 1 commit into
mainfrom
rosace-integration

odcambc commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

odcambc commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant