Description
I'm using torch.profiler to profile model inference. When I first started using the profiler, I was very confused to see the overall inference time come out ~20x higher than it should be. At the time, I was manually warming up my model by running it before profiling. After looking into the source code of torch.profiler, I discovered that the large initial latency was the result of not warming up through the profiler itself: _default_schedule_fn causes torch.profiler to record immediately, without a warmup phase.
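For reference, this is roughly what the default schedule boils down to (paraphrased from torch/profiler/profiler.py; exact code may differ by version):

from torch.profiler import ProfilerAction

def _default_schedule_fn(_: int) -> ProfilerAction:
    # Record from the very first step: there is no WARMUP phase,
    # so all of the startup cost lands in the recorded timings.
    return ProfilerAction.RECORD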
This doesn't seem obvious, considering that the documentation and tutorial demonstrate code snippets that do not warm up. For example:
import torch
from torch.profiler import profile, record_function, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)
The only mention of the need for warming up is step 8 of the tutorial, which covers scheduling across iterations. For code snippets like the one above, it's not clear how to warm up.
My hacky workarounds look like this:
with torch.profiler.profile(
    activities=[
        ProfilerActivity.CPU,
        ProfilerActivity.CUDA,
    ],
    # Force every step into WARMUP so this pass records nothing
    schedule=lambda step: torch.profiler.ProfilerAction.WARMUP,
) as prof:
    ...  # code to profile

# Then profile for real, now that everything is warm
with torch.profiler.profile(
    activities=[
        ProfilerActivity.CPU,
        ProfilerActivity.CUDA,
    ],
) as prof:
    ...  # code to profile
# Or: use the schedule helper with one warmup step and one active step
with torch.profiler.profile(
    activities=[
        ProfilerActivity.CPU,
        ProfilerActivity.CUDA,
    ],
    schedule=torch.profiler.schedule(
        wait=0,
        warmup=1,
        active=1,
    ),
) as prof:
    for _ in range(2):  # one warmup iteration + one recorded iteration
        ...  # code to profile
        prof.step()
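In each case I then inspect the results with the usual key_averages() call (the sort key here is just an example):

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))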
Is there a better way to warm up when profiling a single code block? And could the need for warming up be made more explicit in the documentation and tutorial?
Sorry if this issue would be more appropriate in the PyTorch repository; it seemed like it would take longer to get a response there, considering the 5k+ open issues.