# Part I. Information about and operations on `cuda` and `cpu`

# 1. Check the PyTorch version

## 1.1. macOS

In [2]:
import torch

In [3]:
print(torch.__version__)

1.8.1


## 1.2. MCloud2

### Step 1. Slurm submission script -- `deeplearning.slurm`
```bash
#!/bin/sh
#SBATCH --partition=3080ti
#SBATCH --job-name=deeplearning
#SBATCH --nodes=1		# Number of nodes
#SBATCH --ntasks-per-node=1	# Processes per node (e.g. 4). Note: with --gpus-per-task=1, set this equal to N in --gres=gpu:N
#SBATCH --gres=gpu:1		# GPUs per node (e.g. 4)
#SBATCH --gpus-per-task=1	# GPUs per process

module load intel/2020

# Suboption 1. My conda: fails to call conda
#  source /data/home/liuhanyu/anaconda3/etc/profile.d/conda.sh
#  module load cuda/11.6
#  conda activate workdir

# Suboption 2. /share/app/anaconda3: 
source /share/app/anaconda3/etc/profile.d/conda.sh
module load cuda/11.3
conda activate mlff

mpirun -np $SLURM_NPROCS python test.py > output 
```

### Step 2. Python script to run -- `test.py`
```python
import torch

print(torch.cuda.is_available())
print(torch.__version__)
```

### Step 3. Submit the job and view the result
```bash
$ sbatch deeplearning.slurm
$ cat output
True
1.11.0
```
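
`test.py` prints only the PyTorch version. As a minimal sketch, the CUDA version that the installed PyTorch build was compiled against can be reported alongside it through `torch.version.cuda`. The helper name `describe_torch_build` is hypothetical and not part of the original script.

```python
import torch


def describe_torch_build():
    """Collect version information about the installed PyTorch build."""
    return {
        "torch": torch.__version__,        # PyTorch version string
        "cuda_build": torch.version.cuda,  # CUDA version of the build, or None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

Printing both values makes it easier to spot a mismatch between the loaded `cuda` module and the PyTorch build in the conda environment.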

# 2. Check the number of available CPUs

## 2.1. macOS

In [4]:
import os
print("Available cpu devices: {0}".format(os.cpu_count()))

Available cpu devices: 28


## 2.2. MCloud2

### Step 1. Python script
```python
import os

import torch

print("Cuda available: ", torch.cuda.is_available())
print("PyTorch version: ", torch.__version__)  # the PyTorch version, not the CUDA version
print("Available CPU devices: ", os.cpu_count())
```

### Step 2. Submit the job and view the result
```bash
$ sbatch deeplearning.slurm
$ cat output
Cuda available:  True
PyTorch version:  1.11.0
Available CPU devices:  28
```
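
The count printed above is the number of logical CPUs on the node, which is not necessarily the number the job is allowed to use. The following sketch compares the two using only the standard library. The helper name `cpu_counts` is hypothetical, and `os.sched_getaffinity` exists only on some platforms (Linux, but not macOS), so the code checks for it first.

```python
import os


def cpu_counts():
    """Return (logical CPUs on the machine, CPUs this process may run on)."""
    total = os.cpu_count()
    # os.sched_getaffinity is platform dependent, so guard the call
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = total
    return total, usable


if __name__ == "__main__":
    total, usable = cpu_counts()
    print("CPUs on this machine:", total)
    print("CPUs usable by this process:", usable)
```

If the scheduler restricts the job to a subset of cores, the second number may be smaller than the first.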

# 3. Check the number of available GPUs

## 3.1. macOS

In [5]:
print("Available GPU devices: ", torch.cuda.device_count())

Available GPU devices:  0


## 3.2. MCloud2

### Step 1. Python script
```python
import os

import torch


print("Cuda available: ", torch.cuda.is_available())
print("PyTorch version: ", torch.__version__)  # the PyTorch version, not the CUDA version
print("Available CPU devices: ", os.cpu_count())
print("Available GPU devices: ", torch.cuda.device_count())
```


### Step 2. Submit the job and view the result
```bash
# Case 1.
#    #SBATCH --ntasks-per-node=1
#    #SBATCH --gres=gpu:1
$ sbatch deeplearning.slurm
$ cat output
Cuda available:  True
PyTorch version:  1.11.0
Available CPU devices:  28
Available GPU devices:  1

# Case 2. (four processes, so the output is printed four times)
#    #SBATCH --ntasks-per-node=4
#    #SBATCH --gres=gpu:4
$ sbatch deeplearning.slurm
$ cat output
Cuda available:  True
PyTorch version:  1.11.0
Available CPU devices:  28
Available GPU devices:  4
Cuda available:  True
PyTorch version:  1.11.0
Available CPU devices:  28
Available GPU devices:  4
Cuda available:  True
PyTorch version:  1.11.0
Available CPU devices:  28
Available GPU devices:  4
Cuda available:  True
PyTorch version:  1.11.0
Available CPU devices:  28
Available GPU devices:  4
```
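
In Case 2 each of the four processes reports 4 visible GPUs, so every process still has to decide which GPU to use. One common approach, sketched here, is to pick the GPU by the node-local rank of the process. The helper `pick_gpu_index` is hypothetical, and the environment variable names are an assumption about the launcher (Slurm, Open MPI, Intel MPI). Check which one is actually set on the cluster before relying on it.

```python
import os

# Environment variables that commonly hold the per-node rank of a process.
# Which one is set depends on the launcher; this list is an assumption, not exhaustive.
LOCAL_RANK_VARS = ("SLURM_LOCALID", "OMPI_COMM_WORLD_LOCAL_RANK", "MPI_LOCALRANKID")


def pick_gpu_index(n_gpus, env=None):
    """Map this process to one GPU index by its local rank (round-robin)."""
    env = os.environ if env is None else env
    if n_gpus <= 0:
        return None  # no GPU visible: the caller should fall back to the CPU
    for name in LOCAL_RANK_VARS:
        if name in env:
            return int(env[name]) % n_gpus
    return 0  # no launcher variable found: assume a single process


if __name__ == "__main__":
    # 4 matches the `--gres=gpu:4` case; a real script would pass torch.cuda.device_count()
    index = pick_gpu_index(4)
    print("This process would use:", "cpu" if index is None else f"cuda:{index}")
```

The returned index can then be passed to `torch.device("cuda", index)`.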

# 4. Get the name of a GPU device (e.g. `"cuda:0"`)

## 4.1. MCloud2

### Step 1. Python script
```python
import os

import torch


print("Cuda available: ", torch.cuda.is_available())
print("PyTorch version: ", torch.__version__)  # the PyTorch version, not the CUDA version
print("Available CPU devices: ", os.cpu_count())
print("Available GPU devices: ", torch.cuda.device_count())
print("GPU device(cuda:0) name: ", torch.cuda.get_device_name("cuda:0"))
```

### Step 2. Submit the job and view the result
```bash
$ sbatch deeplearning.slurm
$ cat output
Cuda available:  True
PyTorch version:  1.11.0
Available CPU devices:  28
Available GPU devices:  1
GPU device(cuda:0) name:  NVIDIA GeForce RTX 3080 Ti
```
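
The script above queries only `"cuda:0"`. When a job is given several GPUs, the same call can be looped over every visible device. This is a minimal sketch; the helper name `list_gpus` is hypothetical. It returns an empty list on a machine without a GPU, so it does not fail there.

```python
import torch


def list_gpus():
    """Return a list of (device string, device name) for every visible GPU."""
    gpus = []
    for index in range(torch.cuda.device_count()):
        gpus.append((f"cuda:{index}", torch.cuda.get_device_name(index)))
    return gpus


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No GPU visible to this process")
    for device, name in gpus:
        print(f"{device}: {name}")
```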

# 5. Specify a device with `torch.device()` (on MCloud2)

## 5.1. Specify the `"cpu:0"` device
### Step 1. Python script
```python
import torch 

cpu_1 = torch.device("cpu:0")
print("CPU device: {0}: {1}".format(cpu_1.type, cpu_1.index))
```

### Step 2. Submit the job and view the result
```bash
$ sbatch deeplearning.slurm
$ cat output
CPU device: cpu: 0
```

## 5.2. Specify the `"cuda:0"` device
### Step 1. Python script
```python
import torch 

gpu_1 = torch.device("cuda:0")
print("GPU device: {0}: {1}".format(gpu_1.type, gpu_1.index))
```

### Step 2. Submit the job and view the result
```bash
$ sbatch deeplearning.slurm
$ cat output
GPU device: cuda: 0
```
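
Both examples build the device from a single string. As a small sketch of the other forms `torch.device` accepts, the snippet below constructs a device from a (type, index) pair and shows what `index` holds when none is given. A `torch.device` object only describes a target, so constructing one does not require the hardware to be present.

```python
import torch

# A device can be built from a single string or from a (type, index) pair
gpu_a = torch.device("cuda:0")
gpu_b = torch.device("cuda", 0)
print(gpu_a == gpu_b)

# Without an explicit index, `index` is None
cpu = torch.device("cpu")
print(cpu.type, cpu.index)

# Constructing a torch.device does not touch the hardware; it only describes a target
gpu_far = torch.device("cuda", 7)
print(gpu_far.type, gpu_far.index)
```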

# Part II. `Tensor`s on `cpu` and `cuda` devices

<font color="coral" size="4">

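1. By default, a `tensor` is created on the CPU. It can be moved between devices with the following methods:
    - `cpu_tensor.to(<gpu_device>)`: returns a copy of `cpu_tensor` on the GPU
    - `cpu_tensor.cuda(<gpu_device>)`: returns a copy of `cpu_tensor` on the GPU
    - `cpu_tensor.copy_(<gpu_tensor>)`: copies the data of `gpu_tensor` (on the GPU) into `cpu_tensor` (on the CPU), in place
2. The difference between `torch.tensor` and `torch.Tensor`:
    - `torch.tensor` accepts a `device` argument, so it can create a tensor directly on a GPU
    - `torch.Tensor` can only create a tensor on the CPU; specifying a GPU device raises an error.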

</font>
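
Besides device handling, `torch.tensor` and `torch.Tensor` also differ in how they choose the dtype. This is a general PyTorch fact rather than something stated in the notes above. The sketch below shows both differences and falls back to the CPU when no GPU is present, so it runs on either kind of machine.

```python
import torch

data = [1, 2, 3]

inferred = torch.tensor(data)  # factory function: dtype inferred from the data
legacy = torch.Tensor(data)    # legacy constructor: default floating-point dtype

print(inferred.dtype, inferred.device)
print(legacy.dtype, legacy.device)

# torch.tensor accepts `device`; fall back to the CPU when no GPU is present
target = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
on_target = torch.tensor(data, device=target)
print(on_target.device)
```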

# 1. Copy a `Tensor` from CPU to GPU

### Step 1. Python script
```python
import torch 


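# This example needs a GPU; fail early instead of hitting an undefined `device` later
if not torch.cuda.is_available():
    raise RuntimeError("CUDA is not available")
device = torch.device("cuda:0")

# A tensor is created on the CPU by default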
cpu_tensor = torch.Tensor([
                    [1, 4, 7],
                    [3, 6, 9],
                    [2, 5, 8],
                    ])
print(cpu_tensor.device)

# Way 1
gpu_tensor_1 = cpu_tensor.to(device=device)
print("gpu_tensor_1.device: ", gpu_tensor_1.device)

# Way 2
gpu_tensor_2 = cpu_tensor.cuda(device=device)
print("gpu_tensor_2.device: ", gpu_tensor_2.device)

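# Way 3: copy_ goes the other way (GPU -> CPU). It fills cpu_tensor in place
# and returns it, so gpu_tensor_3 is the same CPU tensor despite its name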
gpu_tensor_3 = cpu_tensor.copy_(gpu_tensor_2)
print("gpu_tensor_3.device: ", gpu_tensor_3.device)
print(gpu_tensor_3)
```

### Step 2. Submit the job and view the result
```bash
$ sbatch deeplearning.slurm
$ cat output
cpu                             # Generate tensor in default way.
gpu_tensor_1.device:  cuda:0    # Way 1 
gpu_tensor_2.device:  cuda:0    # Way 2
gpu_tensor_3.device:  cpu       # Way 3
tensor([[1., 4., 7.],
        [3., 6., 9.],
        [2., 5., 8.]])
```
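
The output for Way 3 shows `cpu` because `copy_` writes into an existing tensor instead of creating one on another device. The sketch below isolates that behaviour. It is an illustration, not part of the original script, and it falls back to the CPU when no GPU is available so that it runs anywhere.

```python
import torch

# Fall back to the CPU so the sketch also runs on a machine without a GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

src = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# .to() returns a tensor on the target device and leaves `src` untouched
moved = src.to(device)

# copy_() fills an existing tensor in place and returns that same tensor,
# so the destination keeps its own device
dst = torch.zeros(2, 3)  # lives on the CPU
result = dst.copy_(moved)

print("moved.device:", moved.device)
print("result is dst:", result is dst)
print("dst.device:", dst.device)
```

To move data onto the GPU, use `.to()` or `.cuda()`. Use `copy_` when the destination tensor already exists.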

# 2. Create a `Tensor` directly on the GPU
### Step 1. Python script
```python
import torch 


### Case 1. 
gpu_tensor_1 = torch.tensor([
                    [2, 5, 8],
                    [1, 4, 7],
                    [3, 6, 9],
                    ],
                    device=torch.device("cuda:0"))
print(gpu_tensor_1.device)

### Case 2.
gpu_tensor_2 = torch.rand(size=(3, 4), device=torch.device("cuda:0"))
print(gpu_tensor_2.device)

### Case 3.
gpu_tensor_3 = torch.ones(size=(3, 4), device=torch.device("cuda:0"))
print(gpu_tensor_3.device)

```

### Step 2. Submit the job and view the result
```bash
$ sbatch deeplearning.slurm
$ cat output
cuda:0
cuda:0
cuda:0
```
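
The cases above hard-code `"cuda:0"`, which fails on a machine without a GPU (such as the macOS setup in Part I). A device-agnostic variant might look like the following sketch. The helper name `default_device` is hypothetical.

```python
import torch


def default_device():
    """Return cuda:0 when a GPU is available, otherwise the CPU."""
    return torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


device = default_device()

# Created directly on `device`, so no separate CPU-to-GPU copy step is needed
base = torch.ones(size=(3, 4), device=device)

# *_like factories inherit dtype and device from their argument
same_place = torch.zeros_like(base)

print(base.device, same_place.device)
```

Choosing the device once and passing it to every factory call keeps the same script usable both locally and on the cluster.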