🚀 The feature, motivation and pitch
Some `torch.compile` backends accept optional keyword arguments, such as the tvm backend:
pytorch/torch/_dynamo/backends/tvm.py, line 17 in c68a94c:

```python
def tvm(gm, example_inputs, *, scheduler=None, trials=20000):
```
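For context, this is roughly how that backend is selected today; because only the registered name is passed, there is no way to override `scheduler` or `trials` from the `torch.compile` call site:

```python
import torch

model = torch.nn.Linear(8, 8)

# Selecting the backend by its registered name uses the defaults above;
# scheduler= and trials= cannot be changed from here.
compiled_model = torch.compile(model, backend="tvm")
```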
There is already an `options` field in `torch.compile`, but it only applies to TorchInductor (lines 1533 to 1566 in c68a94c):
```python
def compile(model: Optional[Callable] = None, *,
            fullgraph: builtins.bool = False,
            dynamic: builtins.bool = False,
            backend: Union[str, Callable] = "inductor",
            mode: Union[str, None] = None,
            options: Optional[Dict[str, Union[str, builtins.int, builtins.bool]]] = None,
            disable: builtins.bool = False) -> Callable:
    """
    Optimizes given model/function using TorchDynamo and specified backend.

    Args:
       model (Callable): Module/function to optimize
       fullgraph (bool): Whether it is ok to break model into several subgraphs
       dynamic (bool): Use dynamic shape tracing
       backend (str or Callable): backend to be used
           - "inductor" is the default backend, which is a good balance between performance and overhead
           - Non experimental in-tree backends can be seen with `torch._dynamo.list_backends()`
           - Experimental or debug in-tree backends can be seen with `torch._dynamo.list_backends(None)`
           - To register an out-of-tree custom backend: https://pytorch.org/docs/master/dynamo/custom-backends.html
       mode (str): Can be either "default", "reduce-overhead" or "max-autotune"
           - "default" is the default mode, which is a good balance between performance and overhead
           - "reduce-overhead" is a mode that reduces the overhead of python with CUDA graphs, useful for small batches
           - "max-autotune" is a mode that leverages Triton based matrix multiplications and convolutions
           - To see the exact configs that each mode sets you can call `torch._inductor.list_mode_options()`
       options (dict): A dictionary of options to pass to the backend. Some notable ones to try out are
           - `epilogue_fusion` which fuses pointwise ops into templates. Requires `max_autotune` to also be set
           - `max_autotune` which will profile to pick the best matmul configuration
           - `fallback_random` which is useful when debugging accuracy issues
           - `shape_padding` which pads matrix shapes to better align loads on GPUs especially for tensor cores
           - `triton.cudagraphs` which will reduce the overhead of python with CUDA graphs
           - `trace.enabled` which is the most useful debugging flag to turn on
           - `trace.graph_diagram` which will show you a picture of your graph after fusion
           - For inductor you can see the full list of configs that it supports by calling `torch._inductor.list_options()`
       disable (bool): Turn torch.compile() into a no-op for testing
```
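For comparison, a small example of how `options` is consumed with the default inductor backend today (keys taken from the docstring above); other backends never see these values:

```python
import torch

model = torch.nn.Linear(8, 8)

# These keys configure TorchInductor only; they are not forwarded to
# non-inductor backends such as tvm.
compiled_model = torch.compile(
    model,
    options={"max_autotune": True, "triton.cudagraphs": True},
)
```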
The `options` argument could be expanded to allow providing custom arguments to any backend. Alternatively, a `**kwargs` parameter could be added to the function signature, whose keyword arguments, if specified, would be passed directly to the backend.
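A minimal sketch of how that forwarding could work internally, assuming (hypothetically) that backend-specific options are simply bound onto the resolved backend callable before compilation; the helper name and structure below are illustrative, not actual PyTorch internals:

```python
import functools
from typing import Callable, Dict, Optional

def _bind_backend_options(backend_fn: Callable,
                          backend_options: Optional[Dict] = None) -> Callable:
    # Illustrative only: a resolved backend callable of the form
    # (gm, example_inputs) -> callable gets the user-supplied options bound
    # as keyword arguments, so e.g. the tvm backend above would receive
    # scheduler= / trials= directly.
    if not backend_options:
        return backend_fn
    return functools.partial(backend_fn, **backend_options)
```

With something along these lines, `torch.compile(model, backend="tvm", options={"trials": 1000})` could reach the tvm backend without any user-side wrapper.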
Alternatives
One alternative to providing optional arguments to `torch.compile` is to use `functools.partial` or `lambda` expressions to pre-populate keyword arguments in backends, but this leads to function implementations that are neither clean nor one-liners, and it requires building a custom backend for each set of desired keyword arguments. It can also require additional infrastructure on the backend side, which unnecessarily increases the size of backend code. For example, to specify keyword arguments currently, one would need to do:
```python
custom_backend = lambda gm, inputs: my_backend(gm, inputs, a=a, b=b, c=c)
torch.compile(model, backend=custom_backend)
```
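The `functools.partial` form of the same workaround (using the same placeholder `my_backend`, `a`, `b`, and `c` as above) is equivalent:

```python
import functools

custom_backend = functools.partial(my_backend, a=a, b=b, c=c)
torch.compile(model, backend=custom_backend)
```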
As opposed to the much simpler:
```python
torch.compile(model, backend=my_backend, a=a, b=b, c=c)
# OR
torch.compile(model, backend=my_backend, options={"a": a, "b": b, "c": c})
```
Additional context
No response