# PyTorch Basics

## Installation

* [Official site](https://pytorch.org/get-started/previous-versions/)
* [torchaudio](): speech/audio processing
* [torchtext](): natural language processing
* [torchvision](): computer vision
* [skorch](): scikit-learn + PyTorch

## Some useful GitHub links
* [Huggingface Transformers](https://github.com/huggingface/transformers): transformer models: BERT, GPT, ...
* [Fairseq](https://github.com/pytorch/fairseq): sequence modeling for NLP&Speech
* [ESPnet](https://github.com/espnet/espnet): speech recognition, translation, synthesis, ...

### Verify the installation

```python
import torch
x = torch.rand(5, 3)
print(x)

# Check whether a GPU can be used
print(torch.cuda.is_available())
```

## TA tutorials

* [p1](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/hw/Pytorch/Pytorch_Tutorial_1.pdf)
* [p2](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/hw/Pytorch/Pytorch_Tutorial_2.pdf)

## Key concepts

### PyTorch and TensorFlow

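1. PyTorch
    * Developer: Facebook AI
    * Core languages: Python & C++
    * Debugging: easy
    * Typical use: research
2. TensorFlow
    * Developer: Google Brain
    * Core languages: Python, C++, JavaScript, Swift
    * Debugging: hard (easier from version 2.0 onward)
    * Typical use: industry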

### Data Type

|data type|dtype|tensor|
| :---- | :---- | :---- |
|32-bit floating point|torch.float|torch.FloatTensor|
|64-bit integer (signed)|torch.long|torch.LongTensor|
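
As a quick check of the table above, the following sketch inspects the dtypes PyTorch infers from Python literals and converts between the two types. The variable names are illustrative, and the printed tensor types assume CPU tensors with PyTorch's default settings.

```python
import torch

# Python floats are stored as 32-bit floats, Python ints as 64-bit integers.
f = torch.tensor([1.0, 2.0])
i = torch.tensor([1, 2])
print(f.dtype, f.type())  # torch.float32 torch.FloatTensor
print(i.dtype, i.type())  # torch.int64 torch.LongTensor

# torch.float and torch.long are aliases of torch.float32 and torch.int64.
print(torch.float == torch.float32, torch.long == torch.int64)  # True True

# Convert between the two types.
f_as_long = f.long()
i_as_float = i.float()
print(f_as_long.dtype, i_as_float.dtype)  # torch.int64 torch.float32
```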

### Constructing tensors

1. from list or numpy array
2. zeros tensor
3. unit tensor (all elements equal to 1)

In [1]:
# from list or numpy array
import torch
import numpy as np
l1 = [[1, -1], [-1, 1]]
x = torch.tensor(l1)
y = torch.from_numpy(np.array(l1))
print('list: \n', l1)
print('from list: \n', x)
print('from numpy array: \n', y)

list: 
 [[1, -1], [-1, 1]]
from list: 
 tensor([[ 1, -1],
        [-1,  1]])
from numpy array: 
 tensor([[ 1, -1],
        [-1,  1]])


In [2]:
# zero tensor and unit tensor
x = torch.zeros((2,2))
y = torch.ones((1,2,5))
print('zero tensor: \n', x)
print('unit tensor: \n', y)

zero tensor: 
 tensor([[0., 0.],
        [0., 0.]])
unit tensor: 
 tensor([[[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]]])


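### Tensor operations

1. squeeze: remove the specified dimension if its length is 1
2. unsqueeze: insert a new dimension of length 1
3. transpose: swap two specified dimensions
4. cat: concatenate multiple tensors along an existing dimension
5. addition, subtraction, power, summation, mean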

In [3]:
# squeeze
x = torch.zeros((1,2,3))
x.shape

torch.Size([1, 2, 3])

In [4]:
x = x.squeeze(0)  # remove dim=0, which is possible because its length is 1
x.shape

torch.Size([2, 3])

In [5]:
# unsqueeze
x.shape

torch.Size([2, 3])

In [6]:
x = x.unsqueeze(1)  # insert a new dimension of length 1 at dim=1 (dim=0 or dim=2 would also work)
x.shape

torch.Size([2, 1, 3])

In [7]:
# transpose
x.shape

torch.Size([2, 1, 3])

In [8]:
x = x.transpose(0, 1)  # swap dim=0 and dim=1
x.shape

torch.Size([1, 2, 3])

In [9]:
# cat
x = torch.zeros((2,1,3))
y = torch.zeros((2,3,3))
z = torch.zeros((2,2,3))
w = torch.cat([x,y,z], dim=1)
w.shape

torch.Size([2, 6, 3])

In [10]:
# add, sub, power, sum, mean
z = x + y
z = x - y
y = x.pow(2)
y = x.sum()
y = x.mean()
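
The cell above produces no visible output, so here is a small sketch of the same operations with concrete values. The tensors are made up for illustration; `sum(dim=0)` is included to show a reduction along a single dimension.

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]])
y = torch.ones(2, 2)

print(x + y)          # element-wise addition
print(x - y)          # element-wise subtraction
print(x.pow(2))       # element-wise square
print(x.sum())        # tensor(10.)
print(x.mean())       # tensor(2.5000)
print(x.sum(dim=0))   # tensor([4., 6.]), summing over rows
```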

### PyTorch tensor vs NumPy

|pytorch|numpy|
| :---- | :---- |
|x.shape|x.shape|
|x.dtype|x.dtype|
|x.reshape/x.view|x.reshape|
|x.squeeze()|x.squeeze()|
|x.unsqueeze(1)|np.expand_dims(x, 1)|
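
The correspondence in the table can be checked side by side. This is a minimal sketch with an arbitrary array shape; it also shows conversion in both directions, which for CPU tensors shares memory rather than copying.

```python
import numpy as np
import torch

a = np.zeros((1, 2, 3), dtype=np.float32)
t = torch.from_numpy(a)  # shares memory with `a`

# shape and dtype are attributes in both libraries.
print(t.shape, a.shape)  # torch.Size([1, 2, 3]) (1, 2, 3)
print(t.dtype, a.dtype)  # torch.float32 float32

# squeeze() removes all length-1 dimensions in both libraries.
print(t.squeeze().shape, a.squeeze().shape)  # torch.Size([2, 3]) (2, 3)

# unsqueeze(1) corresponds to np.expand_dims(x, 1).
print(t.unsqueeze(1).shape, np.expand_dims(a, 1).shape)  # torch.Size([1, 1, 2, 3]) (1, 1, 2, 3)

# Convert back to NumPy (works for CPU tensors).
b = t.numpy()
```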

### Device cpu and gpu

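* Default: tensors and modules are placed on the CPU
* Use the CPU: x = x.to('cpu')
* Use the GPU: x = x.to('cuda')
* Check whether an NVIDIA GPU is available: torch.cuda.is_available()
* With multiple GPUs: specify 'cuda:0', 'cuda:1', ...
* Why use a GPU: parallel computation, among other reasons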
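
A common way to combine these calls is to pick the device once and reuse it, so the same script runs with or without a GPU. This is a sketch of that pattern rather than something prescribed by the notes; the layer and tensor sizes are arbitrary.

```python
import torch

# Fall back to the CPU when no CUDA device is available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

x = torch.zeros(2, 3).to(device)          # move a tensor
layer = torch.nn.Linear(3, 1).to(device)  # move a module (its parameters)

out = layer(x)  # works because input and parameters are on the same device
print(out.shape, out.device)
```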

In [11]:
torch.cuda.is_available()

True

In [12]:
x = x.to('cpu')
x

tensor([[[0., 0., 0.]],

        [[0., 0., 0.]]])

In [13]:
x = x.to('cuda')
x

tensor([[[0., 0., 0.]],

        [[0., 0., 0.]]], device='cuda:0')

### How to compute gradients

In [14]:
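x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)  # x = [[1., 0.], [-1., 1.]]
x = x.to('cuda')  # move to the GPU; x now refers to a new, non-leaf tensor
z = x.pow(2).sum()  # z = \sum_i \sum_j x_{ij}^2
z.backward()  # compute the gradient: \partial z / \partial x_{ij} = 2 x_{ij}
x.grad  # read the gradient: None here, with a warning about accessing .grad of a non-leaf tensor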


In [15]:
x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True).to('cuda')  # x = [[1., 0.], [-1., 1.]]
z = x.pow(2).sum()  # z = \sum_i \sum_j x_{ij}^2
z.backward()  # compute the gradient: \partial z / \partial x_{ij} = 2 x_{ij}
x.grad  # read the gradient: None again, for the same reason as above


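* The likely cause of the problem above: `x` is created on the CPU and then moved to CUDA. The move returns a new tensor, so `x` no longer refers to the original tensor that receives the gradient, and `x.grad` is None.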
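
The same effect can be reproduced without a GPU. In the sketch below a dtype conversion stands in for the device move (an assumption made so the example runs anywhere): both return a new tensor derived from the original, and autograd stores gradients only on the original leaf tensor.

```python
import torch

x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)  # leaf tensor
y = x.to(torch.float64)  # new tensor derived from x, not a leaf

print(x.is_leaf, y.is_leaf)  # True False

z = y.pow(2).sum()  # z = sum of squares of all entries
z.backward()

# The gradient 2 * x is accumulated on the leaf tensor x.
print(x.grad)
# tensor([[ 2.,  0.],
#         [-2.,  2.]])
```

Keeping a separate name for the converted tensor, as done here with `y`, is one way to retain access to the gradient; creating the tensor directly on the target device is another.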

In [16]:
x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True, device='cuda')  # create x directly on the GPU
z = x.pow(2).sum()  # z = \sum_i \sum_j x_{ij}^2
z.backward()  # compute the gradient: \partial z / \partial x_{ij} = 2 x_{ij}
x.grad  # read the gradient

tensor([[ 2.,  0.],
        [-2.,  2.]], device='cuda:0')

### DNN Training Procedure

![dnn_procedure.jpg](img/dnn_procedure.jpg)

### Dataset & DataLoader

```python
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, file):
        """
        read data & preprocess
        """
        self.data = file
    
    def __getitem__(self, index):
        """
        returns one sample at a time
        """
        return self.data[index]
    
    def __len__(self):
        """
        returns the size of the dataset
        """
        return len(self.data)
```

![dataset_dataloader.jpg](img/dataset_dataloader.jpg)
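
To see how a `Dataset` and a `DataLoader` work together, here is a minimal sketch with synthetic data. `ToyDataset` and its contents are hypothetical and exist only for illustration; a real dataset would read and preprocess a file in `__init__`.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical dataset: features 0..n-1 and targets 2 * feature."""

    def __init__(self, n_samples=10):
        self.x = torch.arange(n_samples, dtype=torch.float32).unsqueeze(1)  # shape (n, 1)
        self.y = 2 * self.x

    def __getitem__(self, index):
        # Return one (feature, target) pair.
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)

dataset = ToyDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# The DataLoader stacks samples into batches; the last batch may be smaller.
batch_shapes = [tuple(x.shape) for x, y in loader]
print(batch_shapes)  # [(4, 1), (4, 1), (2, 1)]
```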

### Layers

![layers.jpg](img/layers.jpg)

![layers1.jpg](img/layers1.jpg)

![layers2.jpg](img/layers2.jpg)

```python
layer = torch.nn.Linear(32, 64)
layer.weight.shape
layer.bias.shape
```
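
The snippet above does not show its results. The sketch below prints the shapes and checks that the layer computes `x @ W.T + b`; the batch size of 8 is an arbitrary choice.

```python
import torch

layer = torch.nn.Linear(32, 64)  # 32 input features, 64 output features
print(layer.weight.shape)  # torch.Size([64, 32])
print(layer.bias.shape)    # torch.Size([64])

x = torch.randn(8, 32)     # a batch of 8 samples
out = layer(x)
print(out.shape)           # torch.Size([8, 64])

# The layer applies an affine map to the last dimension of the input.
manual = x @ layer.weight.T + layer.bias
matches = torch.allclose(out, manual, atol=1e-5)
print(matches)
```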

### Activation functions

![activation_func.jpg](img/activation_func.jpg)
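
As a small illustration, the sketch below applies two activation modules to a hand-picked input. `nn.Sigmoid` is the one used by the model later in these notes; including `nn.ReLU` is an assumption about what the figure covers.

```python
import torch
import torch.nn as nn

x = torch.tensor([-1.0, 0.0, 2.0])

sig_out = nn.Sigmoid()(x)  # squashes values into (0, 1); sigmoid(0) = 0.5
relu_out = nn.ReLU()(x)    # zeroes out negative values

print(sig_out)
print(relu_out)  # tensor([0., 0., 2.])
```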

### Loss functions

![loss_func.jpg](img/loss_func.jpg)
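
The two loss functions used elsewhere in these notes are `nn.MSELoss` and `nn.CrossEntropyLoss`. The sketch below evaluates both on hand-picked inputs whose results can be verified by hand; the inputs are illustrative only.

```python
import torch
import torch.nn as nn

# Mean squared error: mean of (0^2, 0^2, 2^2) = 4 / 3.
pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 2.0, 5.0])
mse = nn.MSELoss()(pred, target)
print(mse)  # tensor(1.3333)

# Cross entropy takes raw scores of shape (N, C) and class indices of dtype long.
# With all-zero scores over 4 classes the prediction is uniform, so the loss is ln(4).
logits = torch.zeros(2, 4)
labels = torch.tensor([0, 3])
ce = nn.CrossEntropyLoss()(logits, labels)
print(ce)  # tensor(1.3863)
```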

### Defining a custom network

```python
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 32),
            nn.Sigmoid(),
            nn.Linear(32, 1)
        )
        
    def forward(self, x):
        return self.net(x)
```
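
A short usage sketch for the model above: it repeats the class definition so that it runs on its own, pushes a random batch through the network, and counts the parameters. The batch size of 4 is arbitrary.

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    """Same architecture as above: 10 -> 32 -> 1 with a sigmoid in between."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 32),
            nn.Sigmoid(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

model = MyModel()
x = torch.randn(4, 10)  # batch of 4 samples with 10 features each
out = model(x)          # calling the model invokes forward()
print(out.shape)        # torch.Size([4, 1])

# Weights and biases of both layers: 10*32 + 32 + 32*1 + 1 = 385.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)         # 385
```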

### Optimizers

![optim_func.jpg](img/optim_func.jpg)
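
To make the update rule concrete, the following sketch performs a single plain SGD step (no momentum) on a two-element parameter and compares it with `w - lr * grad`. The parameter values and the loss are chosen only so the arithmetic is easy to follow.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = w.pow(2).sum()   # gradient of the loss is 2 * w = [2, -4]
optimizer.zero_grad()   # clear any gradient left from a previous step
loss.backward()         # fill w.grad
optimizer.step()        # w <- w - lr * w.grad

print(w.grad)  # tensor([ 2., -4.])
print(w)       # approximately [0.8, -1.6]
```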

### Training the model

```python
dataset = MyDataset(file)  # read data via MyDataset
tr_set = DataLoader(dataset, 16, shuffle=True)  # put dataset into DataLoader
model = MyModel().to(device)  # construct model and move to device(cpu/cuda)
criterion = nn.MSELoss()  # set loss function
optimizer = torch.optim.SGD(model.parameters(), 0.1)  # set optimizer

for epoch in range(n_epochs):  # iterate n_epochs
    model.train()  # set model to train mode
    for x, y in tr_set:  # iterate through the dataloader
        optimizer.zero_grad()  # set gradient to zero
        x, y = x.to(device), y.to(device)  # move data to device(cpu/cuda)
        pred = model(x)  # forward pass(compute output)
        loss = criterion(pred, y)  # compute loss
        loss.backward()  # compute gradient(backpropagation)
        optimizer.step()  # update model with optimizer
```
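
The loop above depends on names defined elsewhere (`file`, `device`, `n_epochs`). The following self-contained sketch runs the same procedure end to end on synthetic regression data. The data, the use of `TensorDataset`, the 20 epochs, and the learning rate of 0.01 are all assumptions chosen for this toy problem, not values from the course material.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Synthetic data (assumption): the target is the sum of the 10 input features.
x_all = torch.randn(256, 10)
y_all = x_all.sum(dim=1, keepdim=True)
tr_set = DataLoader(TensorDataset(x_all, y_all), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(10, 32), nn.Sigmoid(), nn.Linear(32, 1)).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

n_epochs = 20
epoch_losses = []
for epoch in range(n_epochs):
    model.train()
    total = 0.0
    for x, y in tr_set:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()           # reset gradients from the previous step
        loss = criterion(model(x), y)   # forward pass and loss
        loss.backward()                 # backpropagation
        optimizer.step()                # parameter update
        total += loss.item() * len(x)   # weight each batch by its size
    epoch_losses.append(total / len(tr_set.dataset))

print(f'first epoch: {epoch_losses[0]:.3f}, last epoch: {epoch_losses[-1]:.3f}')
```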

### Validation

```python
model.eval()  # set model to evaluation mode
total_loss = 0
for x, y in dv_set:  # iterate through the dataloader
    x, y = x.to(device), y.to(device)  # move data to device(cpu/cuda)
    with torch.no_grad():  # disable gradient calculation
        pred = model(x)  # forward pass(compute output)
        loss = criterion(pred, y)  # compute loss
    total_loss += loss.cpu().item() * len(x)  # accumulate loss, weighted by batch size
avg_loss = total_loss / len(dv_set.dataset)  # compute averaged loss once the loop has finished
```
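
The validation loop can be wrapped in a function so it is reusable after every epoch. This is a sketch under assumptions: `evaluate` is a hypothetical helper, and the model and data are random stand-ins. The average is taken per sample, so a smaller final batch does not skew the result.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

def evaluate(model, loader, criterion, device):
    """Return the per-sample average loss over all batches in `loader`."""
    model.eval()
    total_loss = 0.0
    with torch.no_grad():  # no gradients are needed for evaluation
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = criterion(model(x), y)
            total_loss += loss.item() * len(x)
    return total_loss / len(loader.dataset)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(10, 1).to(device)

# Random stand-in validation data: 50 samples, so the last batch holds only 2.
x_dv, y_dv = torch.randn(50, 10), torch.randn(50, 1)
dv_set = DataLoader(TensorDataset(x_dv, y_dv), batch_size=16)

avg_loss = evaluate(model, dv_set, nn.MSELoss(), device)
print(avg_loss)
```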

### Testing

```python
model.eval()  # set model to evaluation mode
preds = []
for x in tt_set:  # iterate through the dataloader
    x = x.to(device)  # move data to device(cpu/cuda)
    with torch.no_grad():  # disable gradient calculation
        pred = model(x)  # forward pass(compute output)
        preds.append(pred.cpu())  # collect prediction
```
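
After the loop, `preds` is a list with one tensor per batch. A typical next step is to concatenate them into a single tensor, sketched below with a random stand-in model and test set (both are assumptions for illustration). Passing a plain tensor to `DataLoader` works because a tensor supports indexing and `len`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(10, 1).to(device)

# Unlabeled test data: 20 samples in batches of 8, 8, and 4.
tt_set = DataLoader(torch.randn(20, 10), batch_size=8)

model.eval()
preds = []
with torch.no_grad():
    for x in tt_set:
        preds.append(model(x.to(device)).cpu())  # collect on the CPU

preds = torch.cat(preds, dim=0)  # join the per-batch tensors along the batch dimension
print(preds.shape)  # torch.Size([20, 1])
```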

### Saving/loading a model

```python
# save
torch.save(model.state_dict(), path)

# load
ckpt = torch.load(path)
model.load_state_dict(ckpt)
```
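
A round trip through a temporary file shows that the restored model ends up with identical parameters. The file name and the small `nn.Linear` model are placeholders; `map_location='cpu'` is added so that a checkpoint saved on a GPU could also be loaded on a machine without one.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(5, 1)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model.ckpt')  # placeholder file name

    # save: only the parameters (state_dict), not the whole module object
    torch.save(model.state_dict(), path)

    # load: build a model with the same architecture, then copy the parameters in
    restored = nn.Linear(5, 1)
    ckpt = torch.load(path, map_location='cpu')
    restored.load_state_dict(ckpt)

same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)  # True
```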

### Some common errors

In [17]:
# # Tensor on a different device from the model
# model = torch.nn.Linear(5, 1).to('cuda')
# x = torch.Tensor([1,2,3,4,5]).to('cpu')
# y = model(x)

In [18]:
# send the tensor to GPU
model = torch.nn.Linear(5, 1).to('cuda')
x = torch.Tensor([1,2,3,4,5]).to('cuda')
y = model(x)
y.shape

torch.Size([1])

In [19]:
# # Mismatched Dimensions
# x = torch.randn(4, 5)
# y = torch.randn(5, 4)
# z = x + y

In [20]:
# the shape of a tensor is incorrect; use transpose, squeeze, or unsqueeze to align the dimensions
x = torch.randn(4, 5)
y = torch.randn(5, 4)
y = y.transpose(0, 1)
z = x + y
z.shape

torch.Size([4, 5])

In [21]:
# # cuda out of memory
# import torch
# import torchvision.models as models
# resnet18 = models.resnet18().to('cuda')
# data = torch.randn(512, 3, 244, 244)
# out = resnet18(data.to('cuda'))
# out.shape

In [22]:
# # The batch size of data is too large to fit in the GPU. Reduce the batch size
# import torch
# import torchvision.models as models
# resnet18 = models.resnet18().to('cuda')
# data = torch.randn(512, 3, 244, 244)
# data.shape
# for d in data:
#     out = resnet18(d.to('cuda').unsqueeze(0))
# out.shape
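
Processing one sample at a time is the extreme case of reducing the batch size. A middle ground is to split the data into smaller chunks, as in the sketch below. A small `nn.Linear` stands in for `resnet18` so the example runs quickly on any machine, and the chunk size of 64 is an arbitrary assumption.

```python
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(16, 4).to(device).eval()  # small stand-in for resnet18
data = torch.randn(512, 16)                 # kept on the CPU until needed

outs = []
with torch.no_grad():
    for chunk in data.split(64):            # 8 chunks of 64 samples each
        # Only one chunk lives on the device at a time.
        outs.append(model(chunk.to(device)).cpu())

out = torch.cat(outs, dim=0)
print(out.shape)  # torch.Size([512, 4])
```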

In [23]:
# #  Mismatched Tensor Type
# import torch.nn as nn
# L = nn.CrossEntropyLoss()
# outs = torch.randn(5, 5)
# labels = torch.Tensor([1,2,3,4,0])
# lossval = L(outs, labels)
# lossval

In [24]:
import torch.nn as nn
L = nn.CrossEntropyLoss()
outs = torch.randn(5, 5)
labels = torch.Tensor([1,2,3,4,0])
labels = labels.long()  # CrossEntropyLoss expects class indices of dtype torch.long
lossval = L(outs, labels)
lossval

tensor(1.9904)