# Common PyTorch Code Snippets

These notes are adapted from the Zhihu article [PyTorch常用代码段](https://zhuanlan.zhihu.com/p/104019160).

## Basic Configuration

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models
# If importing torchvision fails, the installed Pillow version may be too new for it.
# In that case, uninstall the local Pillow and run: pip install pillow==6.2.2
import torchvision
print(torch.__version__)
print(torch.version.cuda)
print(torch.backends.cudnn.version())
print(torch.cuda.get_device_name(0))

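### Reproducibility

Full reproducibility cannot be guaranteed across different hardware (CPU, GPU), even with identical random seeds. On the same device, however, results should be reproducible. To achieve this, fix the PyTorch random seed at the start of the program, and fix the NumPy random seed as well.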

In [None]:
import numpy as np
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
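The settings above can be collected into a single helper. This is a minimal sketch: `seed_everything` is a hypothetical helper name, not a PyTorch API, and it additionally seeds Python's built-in `random` module, which the cell above does not cover.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Seed the Python, NumPy, and PyTorch RNGs (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Safe to call without CUDA; it is ignored in that case.
    torch.cuda.manual_seed_all(seed)
    # Prefer deterministic cuDNN kernels over auto-tuned ones.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(0)
first = torch.rand(3)
seed_everything(0)
second = torch.rand(3)
print(torch.equal(first, second))
```

Calling the helper again with the same seed restarts the random sequence, which is why the two draws match.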

### GPU Settings

If only one GPU is needed:

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

To use several specific GPUs, for example GPUs 0 and 1 (set this before the first CUDA call):

In [None]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

You can also select the GPUs on the command line when launching the script:

In [None]:
# CUDA_VISIBLE_DEVICES=0,1 python train.py

Release unused cached GPU memory:

In [None]:
torch.cuda.empty_cache()

You can also reset the GPU from the command line:

In [None]:
# nvidia-smi --gpu-reset -i 0

## Tensor Operations

### Tensor Data Types

PyTorch has 9 CPU tensor types and 9 GPU tensor types.

<img src="./image/torch_tensor.jpg"  alt="Tensor types" />

### Basic Tensor Information

In [None]:
tensor = torch.randn(3,4,5)
print(tensor.type())  # data type
print(tensor.size())  # shape of the tensor, a torch.Size (a tuple subclass)
print(tensor.dim())   # number of dimensions

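### Named Tensors

Naming tensor dimensions is very useful: you can index and operate on dimensions by name, which improves readability and ease of use and helps prevent mistakes.

In [None]:
# Before PyTorch 1.3, dimension names had to be tracked in comments
# Tensor[N, C, H, W]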
images = torch.randn(32, 3, 56, 56)
print(images.size())
images.sum(dim = 1)
images.select(dim = 1, index = 0)

In [None]:
# Named tensors are only supported from PyTorch 1.3 onward
NCHW = ['N', 'C', 'H', 'W']
images = torch.randn(32, 3, 56, 56, names=NCHW)
images.sum('C')
images.select('C', index=0)
# Names can also be passed as a tuple
tensor = torch.rand(3,4,1,2,names=('C', 'N', 'H', 'W'))
# align_to reorders the dimensions by name
tensor = tensor.align_to('N', 'C', 'H', 'W')

### Data Type Conversion

In [None]:
# Set the default tensor type; in PyTorch, FloatTensor is much faster than DoubleTensor
torch.set_default_tensor_type(torch.FloatTensor)

# Device and type conversion
tensor = tensor.cuda()
tensor = tensor.cpu()
tensor = tensor.float()
tensor = tensor.long()

### Converting between torch.Tensor and np.ndarray

Except for CharTensor, every CPU tensor can be converted to a NumPy array and back.

In [None]:
ndarray = tensor.cpu().numpy()
tensor = torch.from_numpy(ndarray).float()
tensor = torch.from_numpy(ndarray.copy()).float()
type(tensor)

In [None]:
tensor
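One detail worth illustrating is why the cell above shows both `torch.from_numpy(ndarray)` and `torch.from_numpy(ndarray.copy())`: the first shares memory with the NumPy array, the second does not. A minimal sketch, with variable names chosen only for this example:

```python
import numpy as np
import torch

ndarray = np.zeros(3, dtype=np.float32)
shared = torch.from_numpy(ndarray)         # shares memory with ndarray
copied = torch.from_numpy(ndarray.copy())  # owns its own memory

# Mutate the NumPy array after creating both tensors.
ndarray[0] = 1.0
print(shared)  # reflects the change
print(copied)  # unchanged
```

Use the `.copy()` form when later in-place changes to the array must not leak into the tensor.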

### Converting between torch.Tensor and PIL.Image

In [None]:
import PIL.Image

In [None]:
data = torch.clamp(tensor * 255, min = 0, max = 255)
type(data.numpy())

In [None]:
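# PyTorch image tensors use the [N, C, H, W] layout (here a single [C, H, W] image)
# with values in [0, 1], so the tensor must be rescaled and transposed first.
# torch.Tensor -> PIL.Image
# fromarray expects a uint8 H x W x C ndarray, so convert the tensor before calling it
# (the original call chained .byte() onto the PIL image, which raised an error).
image = PIL.Image.fromarray(data.byte().permute(1,2,0).cpu().numpy())
# image = torchvision.transforms.functional.to_pil_image(tensor)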

In [None]:
image = torchvision.transforms.functional.to_pil_image(data)

In [None]:
image

In [None]:
import numpy as np

In [None]:
path = r'./image/lstm_3.png'
img_tensor = torch.from_numpy(np.asarray(PIL.Image.open(path))).permute(2,0,1).float() / 255

In [None]:
img_tensor.size()

In [None]:
# Equivalent to converting directly with torchvision
img_tensor = torchvision.transforms.functional.to_tensor(PIL.Image.open(path)) 

In [None]:
img_tensor.size()

In [None]:
new_image = torchvision.transforms.functional.to_pil_image(img_tensor)

In [None]:
new_image

Combine the steps into a function that loads an image and returns it for display:

In [None]:
# torchvision's to_tensor and to_pil_image are equivalent to the manual PIL approach, but more concise
def load_image(img_path: str):
#     img_tensor = torch.from_numpy(np.asarray(PIL.Image.open(img_path))).permute(2,0,1).float() / 255
    img_tensor = torchvision.transforms.functional.to_tensor(PIL.Image.open(img_path))
    new_image = torchvision.transforms.functional.to_pil_image(img_tensor)
    return img_tensor, new_image

In [None]:
img_tensor, new_image = load_image(r'./image/lstm_3.png')

In [None]:
type(new_image)

In [None]:
new_image

In [None]:
tensor

In [None]:
ndarray = tensor.cpu().numpy()

In [None]:
ndarray

In [None]:
tensor.size()

In [None]:
small_img = torchvision.transforms.functional.to_pil_image(tensor)

In [None]:
small_img

In [None]:
to_ndarray = np.asarray(small_img)

In [None]:
to_ndarray.shape

In [None]:
to_ndarray.reshape(3, 4, 5)

### Extracting the Value from a Single-Element Tensor

In [None]:
value = torch.rand(1).item()

In [None]:
value

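### Reshaping Tensors

In [None]:
# Feeding convolutional output into a fully connected layer usually requires reshaping the tensor.
# Unlike Tensor.view, torch.reshape also handles non-contiguous input tensors automatically.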
tensor = torch.rand(2,3,4)
print(tensor)
shape = (6, 4)
tensor = torch.reshape(tensor, shape)

In [None]:
tensor
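The difference between `view` and `reshape` only shows up on non-contiguous tensors. The following sketch uses a transposed tensor as the non-contiguous input; the variable names are illustrative.

```python
import torch

base = torch.rand(2, 3)
transposed = base.t()              # a non-contiguous view of `base`
print(transposed.is_contiguous())  # False

# view() needs compatible strides and refuses this layout.
try:
    transposed.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape() falls back to copying the data when a view is impossible.
flat = transposed.reshape(6)
print(view_failed, flat.shape)
```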

### Shuffling

In [None]:
tensor = torch.rand(3,3,4)
tensor

In [None]:
# Shuffle along the first dimension
tensor = tensor[torch.randperm(tensor.size(0))]
tensor

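### Horizontal Flip

In [None]:
# PyTorch does not support negative-step slicing such as tensor[::-1];
# a horizontal flip can be done with index tensors instead.
# Assume the tensor has shape [N, D, H, W].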
tensor = torch.rand(2, 3, 3, 4)
tensor

In [None]:
tensor = tensor[:, :, :, torch.arange(tensor.size(3) - 1, -1, -1).long()]

In [None]:
tensor
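PyTorch also provides `torch.flip`, which reverses a tensor along the given dimensions. The sketch below checks that it matches the index-based flip; `images` is a stand-in tensor created for this example.

```python
import torch

images = torch.rand(2, 3, 3, 4)  # [N, D, H, W]

# Index-based flip along the width axis, as in the cell above.
reverse_idx = torch.arange(images.size(3) - 1, -1, -1).long()
flipped_by_index = images[:, :, :, reverse_idx]

# Built-in equivalent: reverse the element order along dimension 3.
flipped_builtin = torch.flip(images, dims=[3])

print(torch.equal(flipped_by_index, flipped_builtin))
```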

### Swapping Tensor Dimensions

In [None]:
origin = torch.randn((2, 3, 4))
origin

In [None]:
transposed = torch.transpose(origin, 1, 2)
transposed

In [None]:
transposed.size()

### Copying Tensors

In [None]:
tensor = torch.rand(4, 3, 3, 2)

In [None]:
tensor

In [None]:
# Operation                 |  New/Shared memory | Still in computation graph |
tensor.clone()            # |        New         |          Yes               |

In [None]:
tensor.detach()           # |      Shared        |          No                |

In [None]:
# The original `tensor.detach.clone()()` failed because detach is a method and must be called first
tensor.detach().clone()   # |        New         |          No                |
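The memory and graph columns of the table above can be checked directly. This is a small sketch; `source` and the other names are chosen for the example, and `data_ptr()` is used to compare the underlying storage addresses.

```python
import torch

source = torch.ones(3, requires_grad=True)

cloned = source.clone()                # new memory, still in the graph
detached = source.detach()             # shared memory, out of the graph
independent = source.detach().clone()  # new memory, out of the graph

print(cloned.requires_grad, detached.requires_grad, independent.requires_grad)
print(detached.data_ptr() == source.data_ptr())
print(independent.data_ptr() == source.data_ptr())
```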

### Concatenating Tensors

In [None]:
'''
torch.cat concatenates along an existing dimension, whereas torch.stack adds a new one.
For example, given three 10x5 tensors, torch.cat produces a 30x5 tensor,
while torch.stack produces a 3x10x5 tensor.
'''
tensor_list = []
for i in range(3):
    tensor_list.append(torch.rand(4, 3))

tensor_cat = torch.cat(tensor_list, dim=0)
tensor_cat

In [None]:
tensor_stack = torch.stack(tensor_list, dim=0)
tensor_stack

### Converting Integer Labels to One-Hot Encoding

In [None]:
tensor = torch.tensor([1, 2, 0, 3])
N = tensor.size(0)
num_classes = 4
one_hot = torch.zeros(N, num_classes).long()
one_hot

In [None]:
one_hot.scatter_(dim=1, 
                 index=torch.unsqueeze(tensor, dim=1), 
                 src=torch.ones(N, num_classes).long())
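For comparison, `torch.nn.functional.one_hot` produces the same encoding in one call. A minimal sketch, using the same labels as above under the illustrative name `labels`:

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([1, 2, 0, 3])
num_classes = 4

# scatter_-based construction, in the same spirit as the cells above.
by_scatter = torch.zeros(labels.size(0), num_classes).long()
by_scatter.scatter_(dim=1, index=labels.unsqueeze(1), value=1)

# Built-in helper; returns an int64 tensor of shape (N, num_classes).
by_builtin = F.one_hot(labels, num_classes=num_classes)

print(torch.equal(by_scatter, by_builtin))
```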

### Finding Nonzero Elements

In [None]:
torch.nonzero(tensor) # index of non-zero elements

In [None]:
torch.nonzero(tensor==0) # index of zero elements

In [None]:
torch.nonzero(tensor).size(0) # number of non-zero elements

In [None]:
torch.nonzero(tensor == 0).size(0) # number of zero elements

### Checking Whether Two Tensors Are Equal

In [None]:
tensor1 = torch.rand([3, 4, 5])
# tensor1 = torch.tensor([[1, 2, 0, 3], [2, 3, 2, 5]])

In [None]:
tensor1

In [None]:
tensor2 = torch.rand([3, 4, 5])
# tensor2 = torch.tensor([1, 2, 0, 3])

In [None]:
tensor2

In [None]:
torch.allclose(tensor1, tensor2)  # float tensor

In [None]:
torch.equal(tensor1, tensor2)     # int tensor

### Expanding Tensors

In [None]:
# Expand tensor of shape 2*3 to shape 2*3*4*4.
tensor = torch.rand(2, 3)
tensor

In [None]:
torch.reshape(tensor, (2, 3, 1, 1))

In [None]:
torch.reshape(tensor, (2, 3, 1, 1)).expand(2, 3, 4, 4)

### Matrix Multiplication

#### 2-D Matrix Multiplication

In [None]:
tensor1_d2 = torch.randn([4, 3])
tensor1_d2

In [None]:
tensor2_d2 = torch.randn([3, 5])
tensor2_d2

In [None]:
torch.mm(tensor1_d2, tensor2_d2)

#### Batched (3-D) Matrix Multiplication

In [None]:
tensor1_d3 = torch.randn([3, 4, 3])
tensor1_d3

In [None]:
tensor2_d3 = torch.randn([3, 3, 5])
tensor2_d3

In [None]:
torch.bmm(tensor1_d3, tensor2_d3)

#### Element-wise Multiplication

In [None]:
tensor1_dot = torch.randn([3, 4])
tensor1_dot

In [None]:
tensor2_dot = torch.randn([3, 4])
tensor2_dot

In [None]:
tensor1_dot * tensor2_dot

### Pairwise Euclidean Distances between Two Sets of Points

In [None]:
# Use broadcasting
X1 = torch.randn([4, 5])
X1

In [None]:
X1[:, None, :]

In [None]:
X2 = torch.randn([4, 5])
X2

In [None]:
X1[:, None, :] - X2

In [None]:
torch.sum((X1[:,None,:] - X2)**2, dim=2)

In [None]:
torch.sqrt(torch.sum((X1[:,None,:] - X2) ** 2, dim = 2))
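The broadcasting steps above can be cross-checked against `torch.cdist`, which computes pairwise p-norm distances. In this sketch the two sets deliberately have different sizes (4 and 6 points, an assumption made for the example) so that the output shape is easy to read.

```python
import torch

X1 = torch.randn(4, 5)
X2 = torch.randn(6, 5)

# Broadcasting: (4, 1, 5) - (6, 5) -> (4, 6, 5), then reduce the feature axis.
dist_broadcast = torch.sqrt(torch.sum((X1[:, None, :] - X2) ** 2, dim=2))

# Built-in pairwise distance; p=2 is the Euclidean norm.
dist_cdist = torch.cdist(X1, X2, p=2)

print(dist_broadcast.shape)
print(torch.allclose(dist_broadcast, dist_cdist, atol=1e-5))
```

Entry `[i, j]` of either result is the distance between row `i` of `X1` and row `j` of `X2`.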

## Model Definition and Operations

### A Simple Two-Layer Convolutional Network

In [None]:
# convolutional neural network (2 convolutional layers)
class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(4*4*32, num_classes)
    
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out

In [None]:
model = ConvNet(num_classes=10).to(device)

In [None]:
model

### Bilinear Pooling

In [None]:
N = 100 # N: number of samples
D = 3 # D: RGB channels
H = 32 # Height
W = 32 # Width

In [None]:
origin_X = torch.randn(N, D, H, W)

In [None]:
origin_X.size()

In [None]:
X = torch.reshape(origin_X, (N, D, H * W))                        # Assume X has shape N*D*H*W

In [None]:
X.size()

In [None]:
torch.transpose(X, 1, 2).size()

In [None]:
b_X = torch.bmm(X, torch.transpose(X, 1, 2)) / (H * W) # Bilinear pooling; yields a 100 x 3 x 3 tensor

In [None]:
b_X[0:2]

In [None]:
assert b_X.size() == (N, D, D)

In [None]:
b_X = torch.reshape(b_X, (N, D * D)) # Flatten to N x (D*D) so that each sample is a single feature vector

In [None]:
b_X.size()

In [None]:
b_X[0:2]

In [None]:
torch.sqrt(torch.abs(b_X) + 1e-5)[0:2]

In [None]:
b_X = torch.sign(b_X) * torch.sqrt(torch.abs(b_X) + 1e-5) # Signed-sqrt normalization
b_X[0:2]

In [None]:
b_X = torch.nn.functional.normalize(b_X) # L2 normalization

In [None]:
b_X[0:2]
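The cells above can be summarized as one function. This is a sketch under the assumption that the pooled matrix is flattened to `D * D` features per sample before normalization; `bilinear_pool` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F


def bilinear_pool(features: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Bilinear pooling of an (N, D, H, W) feature map into (N, D * D)."""
    N, D, H, W = features.size()
    X = features.reshape(N, D, H * W)
    # Outer product of channel responses, averaged over spatial positions.
    X = torch.bmm(X, X.transpose(1, 2)) / (H * W)       # (N, D, D)
    X = X.reshape(N, D * D)
    X = torch.sign(X) * torch.sqrt(torch.abs(X) + eps)  # signed square root
    return F.normalize(X, dim=1)                        # L2-normalize each row


pooled = bilinear_pool(torch.randn(100, 3, 32, 32))
print(pooled.shape)
```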

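### Synchronized Batch Normalization (SyncBN) across GPUs

When code runs on multiple GPUs with torch.nn.DataParallel, PyTorch's BN layers by default compute the mean and standard deviation independently on each GPU. Synchronized BN computes them over the data on all GPUs together, which mitigates inaccurate estimates when the batch size is small. It is an effective performance trick in tasks such as object detection. Note that, according to the PyTorch documentation, torch.nn.SyncBatchNorm itself only supports torch.nn.parallel.DistributedDataParallel, not DataParallel.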

In [None]:
sync_bn = torch.nn.SyncBatchNorm(num_features=10, eps=1e-05, momentum=0.1, affine=True, 
                                 track_running_stats=True)

### Converting All BN Layers of an Existing Network to SyncBN

In [None]:
def convertBNtoSyncBN(module, process_group=None):
    '''Recursively replace all BN layers with SyncBN layers.

    Args:
        module (torch.nn.Module): the network to convert.
    '''
    if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
        sync_bn = torch.nn.SyncBatchNorm(module.num_features, module.eps, module.momentum,
                                         module.affine, module.track_running_stats, process_group)
        sync_bn.running_mean = module.running_mean
        sync_bn.running_var = module.running_var
        if module.affine:
            # Parameters must be wrapped in nn.Parameter before assignment
            sync_bn.weight = torch.nn.Parameter(module.weight.clone().detach())
            sync_bn.bias = torch.nn.Parameter(module.bias.clone().detach())
        return sync_bn
    else:
        # Recurse into the children and replace each one in place
        for name, child_module in module.named_children():
            setattr(module, name, convertBNtoSyncBN(child_module, process_group=process_group))
        return module
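PyTorch also ships a built-in conversion, `torch.nn.SyncBatchNorm.convert_sync_batchnorm`, that performs the same recursive replacement. The sketch below only demonstrates the conversion on a small stand-in network (`net` is made up for this example); actually running a SyncBatchNorm layer in training still requires a distributed setup.

```python
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# Returns the network with every BatchNorm*d layer replaced by SyncBatchNorm.
sync_net = nn.SyncBatchNorm.convert_sync_batchnorm(net)
print(sync_net)
```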

### BN-Style Moving Average

To implement something like BN's moving average, update the running value with an in-place operation inside the forward function.

```
class BN(torch.nn.Module):
    def __init__(self):
        ...
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, X):
        ...
        self.running_mean += momentum * (current - self.running_mean)
```
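The outline above leaves the details elided. One way to fill them in is sketched below, under the assumptions that the input has shape `(N, num_features)` and that the module simply subtracts the running mean; `RunningMean` is a hypothetical class, not a full BN layer.

```python
import torch


class RunningMean(torch.nn.Module):
    """Tracks a per-feature moving average (illustrative, not a full BN layer)."""

    def __init__(self, num_features: int, momentum: float = 0.1):
        super().__init__()
        self.momentum = momentum
        # Buffers are saved in state_dict and moved by .to(), but not trained.
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        if self.training:
            current = X.detach().mean(dim=0)  # assumes X is (N, num_features)
            # In-place update, so the registered buffer itself is modified.
            self.running_mean += self.momentum * (current - self.running_mean)
        return X - self.running_mean


tracker = RunningMean(num_features=2)
tracker(torch.ones(4, 2))
print(tracker.running_mean)
```

Because the update is in place on a registered buffer, the running mean is stored with the model's `state_dict` and follows the module across devices.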

### Counting Total Model Parameters

In [None]:
model = ConvNet(num_classes=10)
num_parameters = sum(torch.numel(parameter) for parameter in model.parameters())
num_parameters

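### Inspecting Network Parameters

Use model.named_parameters() to list all trainable parameters (including those inherited from parent classes); model.state_dict() returns them together with the model's persistent buffers.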

In [None]:
params = list(model.named_parameters())
for (name, param) in params:
    print(name)
    print(param.grad)
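The two accessors differ in what they include. A small sketch on a single BatchNorm layer (chosen here because it has both parameters and buffers) makes the difference visible:

```python
import torch.nn as nn

bn = nn.BatchNorm1d(4)

param_names = [name for name, _ in bn.named_parameters()]
state_names = list(bn.state_dict().keys())

print(param_names)  # learnable parameters only
print(state_names)  # parameters plus buffers such as the running statistics
```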

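### Model Visualization

First install Graphviz; the Windows download page is https://graphviz.gitlab.io/_pages/Download/Download_windows.html

After installation, add its bin directory to the PATH environment variable, e.g. C:\Program Files (x86)\Graphviz2.38\bin


Then install the Python package:
- pip install torchviz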

In [None]:
from torchviz import make_dot, make_dot_from_trace

In [None]:
make_dot(model(torch.randn(18, 1, 16, 16)), params=dict(model.named_parameters()))

In [None]:
test_model = nn.Sequential()
test_model.add_module('W0', nn.Linear(8, 16))
test_model.add_module('tanh', nn.Tanh())
test_model.add_module('W1', nn.Linear(16, 1))

x = torch.randn(1,8)
y = test_model(x)
make_dot(y.mean(), params=dict(test_model.named_parameters()))

### Keras-Style model.summary() Output

Install with either:
- pip install torchsummary
- git clone https://github.com/sksq96/pytorch-summary

#### Simple Demo

In [None]:
from torchsummary import summary

In [None]:
summary(test_model.to(device), (1, 16, 8))

In [None]:
summary(model.to(device), (1, 16, 16))

#### Custom Model

In [None]:
class Custom_Net(nn.Module):
    def __init__(self):
        super(Custom_Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

custom_model = Custom_Net().to(device)

summary(custom_model, (1, 28, 28))

#### VGG 16

In [None]:
vgg = models.vgg16().to(device)
summary(vgg, (3, 224, 224))

In [None]:
make_dot(vgg(torch.randn(1, 3, 224, 224).to(device)), params=dict(vgg.named_parameters()))

#### Multiple Inputs

In [None]:
class SimpleConv(nn.Module):
    def __init__(self):
        super(SimpleConv, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
        )

    def forward(self, x, y):
        x1 = self.features(x)
        x2 = self.features(y)
        return x1, x2

In [None]:
s_model = SimpleConv().to(device)

summary(s_model, [(1, 16, 16), (1, 28, 28)])

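### Model Weight Initialization

In [None]:
c_model = ConvNet(num_classes=10).to(device)
# Note the difference between model.modules() and model.children():
# model.modules() recursively iterates over every sub-layer, while model.children() only iterates over the direct children.
# Use a different initialization method for each layer type
for layer in c_model.modules():
    # Kaiming He proposed an initialization scheme designed for ReLU.
    # (PyTorch's default Conv2d initialization is a Kaiming-uniform variant.)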
    if isinstance(layer, torch.nn.Conv2d):
        torch.nn.init.kaiming_normal_(layer.weight, mode='fan_out',
                                      nonlinearity='relu')
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.BatchNorm2d):
        torch.nn.init.constant_(layer.weight, val=1.0)
        torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.Linear):
        torch.nn.init.xavier_normal_(layer.weight)
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)

In [None]:
layer.weight

In [None]:
random_value = torch.randn(3, 1, 16, 16)
random_value[0]

In [None]:
# Initialize the weight from a given tensor
layer.weight = torch.nn.Parameter(random_value)

In [None]:
layer.weight[0]

In [None]:
torch.cuda.empty_cache()

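### Extracting Specific Layers from a Model

modules() returns an iterator over all modules in the model and reaches the innermost layers, such as self.layer1.conv1.

The related methods named_children() and named_modules() return not only the modules but also their names.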
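The difference in traversal depth is easiest to see by listing the names each method yields. The nested `net` below is a stand-in built for this example:

```python
import torch.nn as nn

net = nn.Sequential(
    nn.Sequential(nn.Conv2d(1, 4, 3), nn.ReLU()),
    nn.Linear(4, 2),
)

child_names = [name for name, _ in net.named_children()]
module_names = [name for name, _ in net.named_modules()]

print(child_names)   # direct children only
print(module_names)  # the model itself ('') and every nested submodule
```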

In [None]:
model = ConvNet(num_classes=10).to(device)

In [None]:
model

In [None]:
summary(model.to(device), (1, 16, 16))

In [None]:
# Take the first two child layers of the model
new_model = nn.Sequential(*list(model.children())[:2])

In [None]:
new_model

In [None]:
summary(new_model.to(device), (1, 16, 16))

In [None]:
conv_model = nn.Sequential()
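# To extract all convolutional layers from the model, do the following
conv_layer_idx = 0
for layer in model.named_modules():
    if isinstance(layer[1],nn.Conv2d):
        # The original layer name layer[0] may contain '.', which add_module rejects,
        # so assign a fresh name to each convolutional layer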
        layer_name = 'conv_{0}'.format(conv_layer_idx)
        conv_model.add_module(layer_name, layer[1])
        conv_layer_idx += 1

In [None]:
conv_model

In [None]:
summary(conv_model.to(device), (1, 16, 16))

In [None]:
!nvidia-smi

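### Loading Pretrained Weights for Some Layers

Note that if the saved model was wrapped in torch.nn.DataParallel, the current model must be wrapped the same way.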

```
model.load_state_dict(torch.load('model.pth'), strict=False)
```
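The reason for this requirement is that a state dict saved from a DataParallel wrapper has keys prefixed with `module.`. An alternative to wrapping the current model is to strip that prefix before loading. The sketch below simulates such a checkpoint rather than loading a real file; `plain_model` and `wrapped_state` are names made up for the example.

```python
from collections import OrderedDict

import torch.nn as nn

plain_model = nn.Linear(4, 2)

# Simulate a checkpoint saved from a DataParallel-wrapped model:
# its keys carry a leading 'module.' prefix.
wrapped_state = OrderedDict(
    ('module.' + key, value) for key, value in nn.Linear(4, 2).state_dict().items()
)

# Strip the prefix so the keys match the unwrapped model.
prefix = 'module.'
stripped_state = OrderedDict(
    (key[len(prefix):] if key.startswith(prefix) else key, value)
    for key, value in wrapped_state.items()
)

plain_model.load_state_dict(stripped_state)
print(list(stripped_state.keys()))
```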

### Loading a GPU-Saved Model onto the CPU

```
model.load_state_dict(torch.load('model.pth', map_location='cpu'))
```

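### Importing the Matching Parts of Another Model into a New Model

Loading parameters directly raises an error when the two model structures differ. The following approach imports only the parts that the two models share.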

```
# model_new is the new model
# model_saved is the other model's state_dict, e.g. a checkpoint loaded with torch.load
model_new_dict = model_new.state_dict()
model_common_dict = {k:v for k, v in model_saved.items() if k in model_new_dict.keys()}
model_new_dict.update(model_common_dict)
model_new.load_state_dict(model_new_dict)
```
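A runnable variant of the same idea is sketched below. It uses two small stand-in models (`source_model` and `target_model`, hypothetical names) that differ only in the last layer, and it additionally checks that shapes match, which is an extra safeguard not present in the snippet above.

```python
import torch.nn as nn

source_model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 4))
target_model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 10))

source_dict = source_model.state_dict()
target_dict = target_model.state_dict()

# Keep only entries whose name AND shape match the target model.
common = {
    k: v for k, v in source_dict.items()
    if k in target_dict and v.shape == target_dict[k].shape
}
target_dict.update(common)
target_model.load_state_dict(target_dict)

print(sorted(common.keys()))
```

Only the first Linear layer is transferred here; the last layer keeps its own initialization because its shape differs.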

In [None]:
for k, v in model.state_dict().items():
    print(k)

## Data Processing

### Computing the Mean and Standard Deviation of a Dataset

In [None]:
import os
import cv2
import numpy as np
from torch.utils.data import Dataset
from PIL import Image


def compute_mean_and_std(dataset):
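    # Takes a PyTorch dataset and returns the per-channel mean and standard deviation.
    # Values are returned in channel-index order (channel 0 first), whatever the b/g/r variable names suggest.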
    mean_r = 0
    mean_g = 0
    mean_b = 0

    for img, _ in dataset:
        img = np.asarray(img) # change PIL Image to numpy array
        mean_b += np.mean(img[:, :, 0])
        mean_g += np.mean(img[:, :, 1])
        mean_r += np.mean(img[:, :, 2])

    mean_b /= len(dataset)
    mean_g /= len(dataset)
    mean_r /= len(dataset)

    diff_r = 0
    diff_g = 0
    diff_b = 0

    N = 0

    for img, _ in dataset:
        img = np.asarray(img)

        diff_b += np.sum(np.power(img[:, :, 0] - mean_b, 2))
        diff_g += np.sum(np.power(img[:, :, 1] - mean_g, 2))
        diff_r += np.sum(np.power(img[:, :, 2] - mean_r, 2))

        N += np.prod(img[:, :, 0].shape)

    std_b = np.sqrt(diff_b / N)
    std_g = np.sqrt(diff_g / N)
    std_r = np.sqrt(diff_r / N)

    mean = (mean_b.item() / 255.0, mean_g.item() / 255.0, mean_r.item() / 255.0)
    std = (std_b.item() / 255.0, std_g.item() / 255.0, std_r.item() / 255.0)
    return mean, std

### Getting Basic Video Information

```
import cv2
video = cv2.VideoCapture(mp4_path)
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
fps = int(video.get(cv2.CAP_PROP_FPS))
video.release()
```

### TSN: Sampling One Frame per Segment

```
K = self._num_segments
if is_train:
    if num_frames > K:
        # Random index for each segment.
        frame_indices = torch.randint(
            high=num_frames // K, size=(K,), dtype=torch.long)
        frame_indices += num_frames // K * torch.arange(K)
    else:
        frame_indices = torch.randint(
            high=num_frames, size=(K - num_frames,), dtype=torch.long)
        frame_indices = torch.sort(torch.cat((
            torch.arange(num_frames), frame_indices)))[0]
else:
    if num_frames > K:
        # Middle index for each segment.
        frame_indices = num_frames // K // 2
        frame_indices += num_frames // K * torch.arange(K)
    else:
        frame_indices = torch.sort(torch.cat((                              
            torch.arange(num_frames), torch.arange(K - num_frames))))[0]
assert frame_indices.size() == (K,)
return [frame_indices[i] for i in range(K)]
```
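The fragment above is a method body that depends on `self`. A standalone version is sketched below; `sample_frame_indices` is a hypothetical helper, and the `% num_frames` in the evaluation padding branch is a change made for this sketch so that padded indices always stay in range.

```python
import torch


def sample_frame_indices(num_frames: int, K: int, is_train: bool) -> torch.Tensor:
    """Pick K frame indices, one per segment (hypothetical standalone helper)."""
    if num_frames > K:
        seg_len = num_frames // K
        if is_train:
            # Random offset inside each segment.
            offsets = torch.randint(high=seg_len, size=(K,), dtype=torch.long)
        else:
            # Middle frame of each segment.
            offsets = torch.full((K,), seg_len // 2, dtype=torch.long)
        return offsets + seg_len * torch.arange(K)
    # Too few frames: use every frame and pad with repeated indices.
    if is_train:
        extra = torch.randint(high=num_frames, size=(K - num_frames,), dtype=torch.long)
    else:
        extra = torch.arange(K - num_frames) % num_frames
    return torch.sort(torch.cat((torch.arange(num_frames), extra)))[0]


print(sample_frame_indices(num_frames=20, K=4, is_train=False))
```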

### Common Preprocessing for Training and Validation Data

The ToTensor transform converts a PIL.Image, or an np.ndarray of shape H×W×D with values in [0, 255], into a torch.Tensor of shape D×H×W with values in [0.0, 1.0].

In [None]:
import torchvision
train_transform = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(size=224,
                                             scale=(0.08, 1.0)),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225)),
])
val_transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225)),
])

### Data Loading and Processing

In [None]:
from __future__ import print_function, division
import os
import torch
import pandas as pd
from skimage import io, transform
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils
%matplotlib inline
# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

# plt.ion()   # interactive mode

Load the face landmarks dataset:

In [None]:
landmarks_frame = pd.read_csv('./faces/face_landmarks.csv')

In [None]:
print('Face number: {0}'.format(len(landmarks_frame)))

Read the third row of data:

In [None]:
n = 2
print('Face number: {0}'.format(len(landmarks_frame)))
img_name = landmarks_frame.iloc[n, 0]
landmarks = landmarks_frame.iloc[n, 1:].values
# Split the landmark data into two columns: x coordinates first, y coordinates second
landmarks = landmarks.astype('float').reshape(-1, 2)

print('Image name: {}'.format(img_name))
print('Landmarks shape: {}'.format(landmarks.shape))
print('First 4 Landmarks: {}'.format(landmarks[:4]))

Display the face landmarks from the third row of the dataset:

In [None]:
def show_landmarks(image, landmarks):
    """Show image with landmarks"""
    plt.imshow(image)
    plt.scatter(landmarks[:, 0], landmarks[:, 1], s=10, marker='.', c='r')
    plt.pause(0.001)  # pause a bit so that plots are updated

In [None]:
plt.figure()
show_landmarks(io.imread(os.path.join('./faces/', img_name)),
               landmarks)
plt.show()

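### Introduction to the Dataset Class

#### How It Works

torch.utils.data.Dataset is the abstract class PyTorch uses to represent a dataset.

To handle your own dataset, subclass Dataset and override the following methods:

```__len__```: makes len(dataset) return the size of the dataset;

```__getitem__```: supports indexing, so that dataset[i] returns the i-th sample.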
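As a minimal illustration of these two methods, here is a toy dataset that needs no files. `SquaresDataset` is a hypothetical example, unrelated to the face dataset.

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Toy dataset: sample i is the pair (i, i**2). Hypothetical example."""

    def __init__(self, size: int):
        self.size = size

    def __len__(self):
        # Enables len(dataset).
        return self.size

    def __getitem__(self, idx):
        # Enables dataset[idx]; samples are built on demand.
        x = torch.tensor(float(idx))
        return {'x': x, 'y': x ** 2}


squares = SquaresDataset(5)
print(len(squares), squares[3])
```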

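#### Creating the Face Image Dataset

- The csv file is read in the class's ```__init__``` method;
- Images are read in the class's ```__getitem__``` method. This keeps memory usage low, since an image is loaded only when it is needed.
- The dataset also accepts an optional transform argument for modifying the samples; details follow below.
- Each returned sample is a dictionary of the form {'image': image, 'landmarks': landmarks}

We can now write the class definition: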

In [None]:
class FaceLandmarksDataset(Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        csv_path = os.path.join(root_dir, csv_file)
        self.landmarks_frame = pd.read_csv(csv_path)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir,
                                self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:].values
        landmarks = landmarks.astype('float').reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}

        if self.transform:
            sample = self.transform(sample)

        return sample

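#### Instantiating the Class

Next we instantiate the class defined above and iterate over the samples, printing the first 4 sample images together with their landmark coordinates.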

In [None]:
face_dataset = FaceLandmarksDataset(csv_file='face_landmarks.csv',
                                    root_dir='./faces/')

fig = plt.figure()

for i in range(len(face_dataset)):
    sample = face_dataset[i]
    print(i, sample['image'].shape, sample['landmarks'].shape)
    ax = plt.subplot(1, 4, i + 1)
    plt.tight_layout()
    ax.set_title('Sample #{}'.format(i))
    ax.axis('off')
    show_landmarks(sample['image'], sample['landmarks'])

    if i == 3:
        plt.show()
        break

### Transforms

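The images shown above all have different sizes, but a neural network usually expects inputs of a fixed size.

We therefore need some image preprocessing.

#### Implementing Common Transforms

Let's write three common transforms:

- Rescale: resize the image;
- RandomCrop: crop a random region of the image;
- ToTensor: convert an image stored as a NumPy array into a torch tensor.

We implement these as classes rather than functions: a function would need its parameters passed on every call, whereas a class stores them once. Each class only needs a ```__call__``` method and, where parameters are required, an ```__init__``` method.

The three implementations follow.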

In [None]:
class Rescale(object):
    """Rescale the image in a sample to a given size.

    Args:
        output_size (tuple or int): Desired output size. If tuple, output is
            matched to output_size. If int, smaller of image edges is matched
            to output_size keeping aspect ratio the same.
    """
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size

        new_h, new_w = int(new_h), int(new_w)

        img = transform.resize(image, (new_h, new_w))

        # h and w are swapped for landmarks because for images,
        # x and y axes are axis 1 and 0 respectively
        landmarks = landmarks * [new_w / w, new_h / h]

        return {'image': img, 'landmarks': landmarks}

In [None]:
class RandomCrop(object):
    """Crop randomly the image in a sample.

    Args:
        output_size (tuple or int): Desired output size. If int, square crop
            is made.
    """
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        if isinstance(output_size, int):
            self.output_size = (output_size, output_size)
        else:
            assert len(output_size) == 2
            self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        h, w = image.shape[:2]
        new_h, new_w = self.output_size

        # +1 because randint's upper bound is exclusive; this also allows h == new_h
        top = np.random.randint(0, h - new_h + 1)
        left = np.random.randint(0, w - new_w + 1)

        image = image[top: top + new_h,
                      left: left + new_w]

        landmarks = landmarks - [left, top]

        return {'image': image, 'landmarks': landmarks}

In [None]:
class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        image = image.transpose((2, 0, 1))
        return {'image': torch.from_numpy(image),
                'landmarks': torch.from_numpy(landmarks)}

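#### Composing the Transforms

Suppose we want to rescale the shorter side of the image to 256 and then randomly crop a 224×224 square from it. We can call ```torchvision.transforms.Compose``` to combine the ```Rescale``` and ```RandomCrop``` transforms.

The code below shows the result of applying each transform separately and of applying them combined with ```Compose```.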

In [None]:
scale = Rescale(256)
crop = RandomCrop(128)
composed = transforms.Compose([Rescale(256),
                               RandomCrop(224)])

# Apply each of the above transforms on sample.
fig = plt.figure()
sample = face_dataset[2]
for i, tsfrm in enumerate([scale, crop, composed]):
    transformed_sample = tsfrm(sample)

    ax = plt.subplot(1, 3, i + 1)
    plt.tight_layout()
    ax.set_title(type(tsfrm).__name__)
    show_landmarks(**transformed_sample)

plt.show()

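#### Combining the Dataset with Transforms and Iterating over It

A quick recap:

- The Dataset section introduced the dataset class;
- The Transforms section showed how to implement the individual transforms and compose them.

As you may recall, the dataset defined earlier takes a transform argument, but in the Transforms section we fetched a sample first and then applied the transforms, without passing them to the dataset.

So now we put everything together. Each time a sample is drawn:

- the image is read from file;
- the transforms are applied to the image;
- because the crop is random, this acts as data augmentation.

All we need to do is pass the composed transform to the dataset as an argument; nothing else changes.

We then use a for loop to fetch the samples one by one.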

In [None]:
transformed_dataset = FaceLandmarksDataset(csv_file='face_landmarks.csv',
                                           root_dir='./faces/',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(224),
                                               ToTensor()
                                           ]))

for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]
    print(i, sample['image'].size(), sample['landmarks'].size())
    if i == 3:
        break

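### Introduction to the DataLoader Class

We have now combined the dataset with the transforms and used a for loop to fetch individual samples, so it may seem that we are done.

But a plain for loop still lacks several things:

- batching the data according to batch_size;
- shuffling the data;
- loading the data in parallel with multiprocessing workers.

torch.utils.data.DataLoader handles all of these.

You only need to set the DataLoader arguments:

- The first argument is transformed_dataset, the Dataset instance with the transforms already applied.
- batch_size is the number of samples in each batch.
- shuffle is a boolean indicating whether to shuffle the data.
- num_workers is the number of worker subprocesses used to load the data.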
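These arguments can be tried on a tiny in-memory dataset first. In the sketch below, `ToyDataset` and its tensor shapes are made up for illustration and stand in for the face dataset; it shows that dictionary samples are collated into a dictionary of batched tensors.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Ten dict samples with a fake image and fake landmarks (illustrative only)."""

    def __len__(self):
        return 10

    def __getitem__(self, idx):
        return {'image': torch.zeros(3, 8, 8), 'landmarks': torch.zeros(5, 2)}


loader = DataLoader(ToyDataset(), batch_size=4, shuffle=True, num_workers=0)
batch_shapes = [tuple(batch['image'].shape) for batch in loader]
print(batch_shapes)
```

With 10 samples and a batch size of 4, the last batch is smaller than the others unless `drop_last=True` is passed.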

The code below uses DataLoader and displays a batch of samples.

In [None]:
# Helper function to show a batch
def show_landmarks_batch(sample_batched):
    """Show image with landmarks for a batch of samples."""
    images_batch, landmarks_batch = \
            sample_batched['image'], sample_batched['landmarks']
    batch_size = len(images_batch)
    im_size = images_batch.size(2)

    grid = utils.make_grid(images_batch)
    plt.imshow(grid.numpy().transpose((1, 2, 0)))

    for i in range(batch_size):
        plt.scatter(landmarks_batch[i, :, 0].numpy() + i * im_size,
                    landmarks_batch[i, :, 1].numpy(),
                    s=10, marker='.', c='r')

        plt.title('Batch from dataloader')

In [None]:
# On Windows 10, multi-process data loading failed here with a "Broken pipe" error, so num_workers is kept at 0
dataloader = DataLoader(transformed_dataset, 
                        batch_size=4,
                        shuffle=True, 
                        num_workers=0)

In [None]:
for i_batch, sample_batched in enumerate(dataloader):
    print(i_batch, 
          sample_batched['image'].size(),
          sample_batched['landmarks'].size())

    # observe 4th batch and stop.
    plt.figure()
    show_landmarks_batch(sample_batched)
    plt.axis('off')
    plt.ioff()
    plt.show()
    if i_batch == 3:
        break

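That completes the full processing pipeline for the dataset.

### torchvision

The torchvision package provides common datasets and transforms. With torchvision you may not even need to write your own processing functions.

The most general-purpose dataset in torchvision is ImageFolder, which assumes the following directory layout: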

```
root/ants/xxx.png
root/ants/xxy.jpeg
root/ants/xxz.png
.
.
.
root/bees/123.jpg
root/bees/nsdf3.png
root/bees/asd932_.png
```

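Here root is the root directory, ants and bees are the class labels, and the file names that follow are the individual images.

torchvision also provides common operations on PIL.Image, including RandomHorizontalFlip, Scale (called Resize in newer versions), and others.

The following is a minimal data pipeline implemented with torchvision: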

In [None]:
import torch
from torchvision import transforms, datasets

data_transform = transforms.Compose([
        transforms.RandomResizedCrop(500),  # RandomSizedCrop is the deprecated old name
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])
root_folder = r'./17flowers/train/'
flowers_dataset = datasets.ImageFolder(root=root_folder,
                                       transform=data_transform)
dataset_loader = torch.utils.data.DataLoader(flowers_dataset,
                                             batch_size=4, 
                                             shuffle=True,
                                             num_workers=0)

In [None]:
for dl in dataset_loader:
    print(dl[0].size(), dl[1])
    break

In [None]:
# Helper function to show a batch
def show_flowers_batch(sample_batched):
    """Show a grid of images for a batch of samples."""
    images_batch, tags_batch = \
            sample_batched[0], sample_batched[1]
    batch_size = len(images_batch)
    im_size = images_batch.size(2)
    grid = utils.make_grid(images_batch)
    plt.imshow(grid.numpy().transpose((1, 2, 0)))

In [None]:
for i_batch, sample_batched in enumerate(dataset_loader):
    print(i_batch, 
          sample_batched[0].size(),
          sample_batched[1])

    # observe 4th batch and stop.
    plt.figure()
    show_flowers_batch(sample_batched)
    plt.axis('off')
    plt.ioff()
    plt.show()
    if i_batch == 3:
        break