Simple PyTorch cases for understanding parallelism in AI training and inference.
Unless otherwise specified, all code is run on Linux with a DGX A100-40GB and the nvcr.io/nvidia/pytorch:23.04-py3 container (PyTorch 2.0).
Please refer to the corresponding installation tutorial for the above environment configuration.
Unless otherwise specified, all code is written by shh2000@github; no code is copied from other repositories.
Some simple cases in train_basic_model have an xx_forward.py file, which contains only the forward pass (no training) for easier understanding.
Cases:
| category | task | case | parallel type | api | manual with readme |
|---|---|---|---|---|---|
| train | simple | matmul | None | / | see code |
| train | simple | matmul | data | torch.DDP() | see code |
| train | simple | matmul | 1D Tensor | / | see code |
| train | simple | matmul | Pipeline | torch Pipe() | / |
| train | simple | C=A*B | 2D-Tensor | / | see code |
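
To make the "1D Tensor" row concrete, here is a minimal sketch of the idea behind a column-split matmul. It is not the code from this repo: plain Python lists stand in for tensors so it runs without GPUs, a loop over shards stands in for the ranks, and the helper names (`matmul`, `split_columns`, `column_parallel_matmul`) are hypothetical. In a real multi-GPU run, the final concatenation would typically be a collective such as all-gather.

```python
def matmul(a, b):
    """Plain-Python matrix product of two row-major nested lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def split_columns(m, parts):
    """Split matrix m into `parts` equal column blocks, one per simulated rank."""
    assert len(m[0]) % parts == 0, "column count must be divisible by parts"
    width = len(m[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in m] for p in range(parts)]


def column_parallel_matmul(a, b, world_size):
    """Compute A @ B with B split by columns across `world_size` simulated ranks."""
    shards = split_columns(b, world_size)
    # Each simulated rank holds all of A and one column block of B.
    partials = [matmul(a, shard) for shard in shards]
    # Stand-in for all-gather: concatenate the column blocks row by row.
    return [sum((p[i] for p in partials), []) for i in range(len(a))]


A = [[1, 2], [3, 4]]
B = [[5, 6, 7, 8], [9, 10, 11, 12]]

C_ref = matmul(A, B)                                # single-device reference
C_par = column_parallel_matmul(A, B, world_size=2)  # simulated 2-way split
print(C_par == C_ref)
```

Splitting B by columns means each rank produces a disjoint slice of C, so no summation across ranks is needed, only concatenation. Splitting along the inner dimension instead would give partial sums that have to be added together (an all-reduce).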