Putting tensors on different devices does not reduce GPU memory use #86780

@serend1p1ty

Description

🐛 Describe the bug

I tried to use model parallelism with PyTorch.

First, I put all the Linear layers on one CUDA device.

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(1000, 1000)
        self.linear2 = nn.Linear(1000, 1000)

net = Net()
net.cuda(6)  # move every parameter to GPU 6

Then I observe that PyTorch occupies 1421MiB of memory on device 6.

+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  Off  | 00000000:DB:00.0 Off |                    0 |
| N/A   33C    P0    54W / 300W |   1421MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  Off  | 00000000:DC:00.0 Off |                    0 |
| N/A   32C    P0    56W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
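As a sanity check (a minimal sketch, assuming the model above is already on GPU 6), the caching allocator's own statistics can be queried. Two Linear(1000, 1000) layers hold only about 8MB of parameters, so most of the 1421MiB that nvidia-smi reports is not the model itself:

import torch

# Bytes handed out for live tensors on device 6. Two Linear(1000, 1000)
# layers hold ~2 * (1000*1000 + 1000) * 4 bytes ≈ 8 MB of parameters.
print(torch.cuda.memory_allocated(6) / 1024**2, "MiB allocated")
# Total memory the caching allocator has reserved from the driver.
print(torch.cuda.memory_reserved(6) / 1024**2, "MiB reserved")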

Second, I put the two Linear layers on different devices.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(1000, 1000)
        self.linear2 = nn.Linear(1000, 1000)

net = Net()
net.linear1.cuda(6)  # first layer on GPU 6
net.linear2.cuda(7)  # second layer on GPU 7

Then I observe that PyTorch still occupies 1421MiB on each GPU. Why does model parallelism not save GPU memory? Ideally, each GPU would use only about half of the single-device footprint, roughly 710MiB.

+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  Off  | 00000000:DB:00.0 Off |                    0 |
| N/A   33C    P0    55W / 300W |   1421MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  Off  | 00000000:DC:00.0 Off |                    0 |
| N/A   32C    P0    56W / 300W |   1421MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
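Running the same check on both devices should confirm that the parameters really were split. Most of the 1421MiB per device is likely the per-device CUDA context plus the allocator's cache, which a process pays once on every GPU it touches, rather than the tensors themselves. A minimal sketch, assuming the split model from above:

import torch

# Compare allocator statistics on the two devices (assumes net.linear1
# is on cuda:6 and net.linear2 is on cuda:7, as in the snippet above).
for dev in (6, 7):
    alloc = torch.cuda.memory_allocated(dev) / 1024**2
    reserved = torch.cuda.memory_reserved(dev) / 1024**2
    # Each device holds roughly one Linear(1000, 1000), i.e. ~4 MB of
    # parameters; the remainder of the nvidia-smi figure is context and
    # allocator overhead.
    print(f"cuda:{dev}: {alloc:.1f} MiB allocated, {reserved:.1f} MiB reserved")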

Versions

Collecting environment information...
PyTorch version: 1.10.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Alibaba Group Enterprise Linux Server 7.2 (Paladin) (x86_64)
GCC version: (GCC) 5.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.17

Python version: 3.8.13 (default, Mar 28 2022, 11:38:47)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-3.10.0-327.ali2018.alios7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB
GPU 4: Tesla V100-SXM2-16GB
GPU 5: Tesla V100-SXM2-16GB
GPU 6: Tesla V100-SXM2-16GB
GPU 7: Tesla V100-SXM2-16GB

Nvidia driver version: 440.64.00
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.19.5
[pip3] torch==1.10.0+cu102
[pip3] torchvision==0.11.1+cu102
[conda] numpy                     1.19.5                    <pip>
[conda] torch                     1.10.0+cu102              <pip>
[conda] torchvision               0.11.1+cu102              <pip>
