Enable prefetch stage for StagedTrainPipeline #2239

sarckk · 2024-07-22T17:03:35Z

Summary:
Add ability to run prefetch as a stage in StagedTrainPipeline

Recommended usage for a 3-stage pipeline with data copy, sparse data dist, and prefetch stages (required changes are marked with arrows):

sdd = SparseDataDistUtil(
    model=self._model,
    data_dist_stream=torch.cuda.Stream(),
    prefetch_stream=torch.cuda.Stream(),  # <--- define prefetch stream
)

pipeline = [
    PipelineStage(
        name="data_copy",
        runnable=lambda batch, context: batch.to(
            self._device, non_blocking=True
        ),
        stream=torch.cuda.Stream(),
    ),
    PipelineStage(
        name="start_sparse_data_dist",
        runnable=sdd.start_sparse_data_dist,
        stream=sdd.data_dist_stream,
        fill_callback=sdd.wait_sparse_data_dist,
    ),
    PipelineStage(
        name="prefetch",
        runnable=sdd.prefetch,  # <--- add stage with runnable=sdd.prefetch
        stream=sdd.prefetch_stream,
        fill_callback=sdd.load_prefetch,  # <--- fill_callback of sdd.load_prefetch
    ),
]

return StagedTrainPipeline(pipeline_stages=pipeline)

Order of execution for above pipeline:

Iteration #1:

_fill_pipeline():
batch 0: memcpy, start_sdd, wait_sdd (callback), prefetch, load_prefetch (callback)
batch 1: memcpy, start_sdd, wait_sdd (callback)
batch 2: memcpy

progress():
batch 3: memcpy
batch 2: start_sdd
batch 1: prefetch

after pipeline progress():
model(batch 0)
load_prefetch (prepares for model fwd on batch 1)
wait_sdd (prepares for batch 2 prefetch)

Iteration #2:
progress():
batch 4: memcpy
batch 3: start_sdd
batch 2: prefetch

after pipeline progress():
model(batch 1)
load_prefetch (prepares for model fwd on batch 2)
wait_sdd (prepares for batch 3 prefetch)
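
The ordering above can be reproduced with a small toy simulation. This is only a sketch that mirrors the listing. It is not torchrec's implementation and it does not model CUDA streams or synchronization. The names `STAGES`, `fill_pipeline`, and `progress` below are hypothetical helpers introduced for illustration.

```python
from collections import deque

# (stage name, fill callback name or None), in pipeline order.
# Hypothetical stand-ins for the three PipelineStage entries above.
STAGES = [
    ("memcpy", None),
    ("start_sdd", "wait_sdd"),
    ("prefetch", "load_prefetch"),
]


def fill_pipeline(log):
    """Prime the pipeline: batch i runs the first len(STAGES) - i stages."""
    in_flight = deque()
    n = len(STAGES)
    for batch in range(n):
        done = n - batch
        for name, callback in STAGES[:done]:
            log.append(f"batch {batch}: {name}")
            if callback is not None:
                # During fill, the callback runs right after its stage.
                log.append(f"batch {batch}: {callback} (callback)")
        in_flight.append([batch, done])  # [batch index, stages completed]
    return in_flight


def progress(in_flight, next_batch, log):
    """Advance every in-flight batch by one stage and return the ready batch."""
    ready, _ = in_flight.popleft()  # oldest batch has finished all stages
    in_flight.append([next_batch, 0])
    # Newest batch first, matching the listing: memcpy, start_sdd, prefetch.
    for entry in reversed(in_flight):
        batch, done = entry
        log.append(f"batch {batch}: {STAGES[done][0]}")
        entry[1] = done + 1
    # In steady state the callbacks appear after the model forward pass.
    log.append(f"model(batch {ready})")
    for _, callback in reversed(STAGES):
        if callback is not None:
            log.append(callback)
    return ready


log = []
in_flight = fill_pipeline(log)
progress(in_flight, next_batch=3, log=log)  # Iteration #1
progress(in_flight, next_batch=4, log=log)  # Iteration #2
print("\n".join(log))
```

Printing the log gives the fill phase followed by two iterations, in the same order as the listing above.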

Reviewed By: zzzwen

Differential Revision: D59786807


facebook-github-bot · 2024-07-22T17:04:04Z

This pull request was exported from Phabricator. Differential Revision: D59786807

facebook-github-bot added the CLA Signed label Jul 22, 2024

facebook-github-bot added the fb-exported label Jul 22, 2024

facebook-github-bot closed this in 9264186 Jul 23, 2024
