🐛 Describe the bug
I tried to use model parallelism with PyTorch.

First, I put both Linears on one CUDA device:

```python
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(1000, 1000)
        self.linear2 = nn.Linear(1000, 1000)

net = Net()
net.cuda(6)
```

Then I observe that PyTorch occupies about 1.4 GB (1421 MiB) on device 6:
```
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... Off | 00000000:DB:00.0 Off | 0 |
| N/A 33C P0 54W / 300W | 1421MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... Off | 00000000:DC:00.0 Off | 0 |
| N/A 32C P0 56W / 300W | 0MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
```
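For reference, a rough back-of-the-envelope estimate (assuming default float32 parameters) suggests the two Linear layers' weights and biases only need a few megabytes, far below the 1421 MiB that nvidia-smi reports:

```python
# Rough estimate of the memory the two nn.Linear(1000, 1000) layers need.
# Assumes default float32 parameters (4 bytes per element).
params_per_linear = 1000 * 1000 + 1000   # weight matrix + bias vector
total_params = 2 * params_per_linear     # two Linear layers
total_bytes = total_params * 4           # float32 = 4 bytes
print(total_bytes, total_bytes / 2**20)  # 8008000 bytes, ~7.6 MiB
```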
Second, I put the Linears on different devices:

```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(1000, 1000)
        self.linear2 = nn.Linear(1000, 1000)

net = Net()
net.linear1.cuda(6)
net.linear2.cuda(7)
```

Now I observe that PyTorch occupies 1421 MiB on each GPU. Why doesn't model parallelism save GPU memory? Ideally each GPU should use only about half of that (roughly 710 MiB):
```
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... Off | 00000000:DB:00.0 Off | 0 |
| N/A 33C P0 55W / 300W | 1421MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... Off | 00000000:DC:00.0 Off | 0 |
| N/A 32C P0 56W / 300W | 1421MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
```
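One way to see how much of that footprint is actually tensor data is to compare the parameter sizes against `torch.cuda.memory_allocated`, which reports only the bytes held by tensors on a device; nvidia-smi additionally counts per-device runtime overhead (CUDA context, library handles, allocator reserve). A minimal sketch (device index 0 is used here just for illustration, in place of devices 6/7):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(1000, 1000)
        self.linear2 = nn.Linear(1000, 1000)

net = Net()

# Bytes actually held by the parameter tensors (device-independent).
param_bytes = sum(p.numel() * p.element_size() for p in net.parameters())
print(param_bytes)  # 8008000 bytes, i.e. ~7.6 MiB

# On a CUDA machine, memory_allocated shows bytes held by tensors on the
# device; the gap to the nvidia-smi figure is runtime overhead that is
# paid once per device, regardless of how the model is split.
if torch.cuda.is_available():
    net.linear1.cuda(0)
    print(torch.cuda.memory_allocated(0))
```

If the tensor allocations come out in the megabyte range while nvidia-smi shows ~1.4 GB, that would suggest the per-device footprint is dominated by fixed overhead rather than model weights.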
Versions
```
Collecting environment information...
PyTorch version: 1.10.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Alibaba Group Enterprise Linux Server 7.2 (Paladin) (x86_64)
GCC version: (GCC) 5.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.17
Python version: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-3.10.0-327.ali2018.alios7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB
GPU 4: Tesla V100-SXM2-16GB
GPU 5: Tesla V100-SXM2-16GB
GPU 6: Tesla V100-SXM2-16GB
GPU 7: Tesla V100-SXM2-16GB
Nvidia driver version: 440.64.00
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.19.5
[pip3] torch==1.10.0+cu102
[pip3] torchvision==0.11.1+cu102
[conda] numpy 1.19.5 <pip>
[conda] torch 1.10.0+cu102 <pip>
[conda] torchvision 0.11.1+cu102 <pip>
```