### Training and inference with torchvision.models.detection models
[Reference: branch `temp-tutorial` of pytorch/vision](https://github.com/pytorch/vision/tree/temp-tutorial/tutorials)<br>
##### 1. The model modules for classification, detection, segmentation, and keypoint detection live in [pytorch/vision/models](https://github.com/pytorch/vision/tree/master/torchvision/models)<br>
&ensp;The locally installed `torchvision` may differ considerably from the latest official release, so when working locally, treat the installed `torchvision` library as the reference.
##### 2. Reference code for fine-tuning and training models is in [pytorch/vision/references](https://github.com/pytorch/vision/tree/master/references)<br>
##### 3. Dependencies: `cocoapi, torchvision >= 0.3, pytorch, opencv-python, pillow`
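##### 4. baseline
(1)models<br>
`backbone(num_class, outchannels, resnet, fpn), rpn(anchor_generator), predictor(fastrcnn_predictor, maskrcnn_predictor)`<br>
(2)datasets<br>
(3)transforms<br>
&ensp;The detection models in `torchvision` already apply the most basic `transforms` internally, `normalize(inputs)` and `resize(inputs, targets)`, so when defining the dataset's `transforms` you only need to supply the other data augmentation methods;<br>
(4)train<br>
&ensp; optimizer, lr_scheduler<br>
(5)test/evaluation<br>
&ensp; The model has to be evaluated during training. To avoid reinventing the wheel, you can use the evaluation utilities of cocoapi directly to measure how the model performs on your own dataset. This requires wrapping your dataset in the COCO dataset format.<br>
&ensp; The main feature of a `coco_dataset` is that the annotation is a dictionary with three main entries, `{"categories": [], "images": [], "annotations": []}` (see the COCO dataset documentation for the exact format), so all we need to do is<br>
&ensp; pack our own dataset and annotations into this dictionary and then assign it to the `dataset` attribute of a `COCO()` object. See `coco_utils.py` for details.<br>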
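
A minimal sketch of the dictionary described above, in plain Python. The file name, image size, and box are made-up values for illustration, and `to_coco_dict` is a hypothetical helper, not a function from `coco_utils.py`. COCO stores boxes as `[x, y, width, height]`, so the `[xmin, ymin, xmax, ymax]` boxes used by the torchvision detection models have to be converted.

```python
def to_coco_dict(samples, category_names):
    """Build a COCO-style dict from (file_name, width, height, boxes, labels) tuples.

    Boxes are expected as [xmin, ymin, xmax, ymax]; COCO stores [x, y, w, h].
    """
    dataset = {"categories": [], "images": [], "annotations": []}
    for cat_id, name in category_names.items():
        dataset["categories"].append({"id": cat_id, "name": name})

    ann_id = 1  # annotation ids must be unique across the whole dataset
    for image_id, (file_name, width, height, boxes, labels) in enumerate(samples):
        dataset["images"].append(
            {"id": image_id, "file_name": file_name, "width": width, "height": height}
        )
        for (xmin, ymin, xmax, ymax), label in zip(boxes, labels):
            w, h = xmax - xmin, ymax - ymin
            dataset["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": label,
                "bbox": [xmin, ymin, w, h],
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return dataset


# hypothetical sample: one image containing a single pedestrian box
samples = [("example.png", 640, 480, [[10, 20, 110, 220]], [1])]
coco_dict = to_coco_dict(samples, {1: "person"})
print(coco_dict["annotations"][0]["bbox"])
```

With `pycocotools` installed, the dictionary can then be attached with `coco = COCO()`, `coco.dataset = coco_dict`, and `coco.createIndex()`.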
##### `The goal is to build and train models quickly and efficiently with highly encapsulated, modular APIs`<br>
APIs:
- [models](https://github.com/pytorch/vision/tree/master/torchvision/models)
- [detection](https://github.com/pytorch/vision/tree/master/torchvision/models/detection)
- [faster_rcnn.FasterRCNN](https://github.com/pytorch/vision/blob/master/torchvision/models/detection/faster_rcnn.py)


#### Faster R-CNN
##### FasterRCNN inherits from GeneralizedRCNN; its initialization parameters are
```python
(self, backbone, num_classes=None,
    # transform parameters
    min_size=800, max_size=1333,
    image_mean=None, image_std=None,
    # RPN parameters
    rpn_anchor_generator=None, rpn_head=None,
    rpn_pre_nms_top_n_train=2000, rpn_pre_nms_top_n_test=1000,
    rpn_post_nms_top_n_train=2000, rpn_post_nms_top_n_test=1000,
    rpn_nms_thresh=0.7,
    rpn_fg_iou_thresh=0.7, rpn_bg_iou_thresh=0.3,
    rpn_batch_size_per_image=256, rpn_positive_fraction=0.5,
    # Box parameters
    box_roi_pool=None, box_head=None, box_predictor=None,
    box_score_thresh=0.05, box_nms_thresh=0.5, box_detections_per_img=100,
    box_fg_iou_thresh=0.5, box_bg_iou_thresh=0.5,
    box_batch_size_per_image=512, box_positive_fraction=0.25,
    bbox_reg_weights=None)
```
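##### FasterRCNN mainly consists of a backbone, an rpn, roi_pooling, a fastrcnn_predictor, and so on.<br>
(1)You are free to choose the CNN used as the backbone, e.g. resnet, optionally combined with fpn. The backbone argument must be given explicitly.<br>
(2)The num_classes argument must be given explicitly (it counts the background as one class).<br>
(3)You can supply anchor size combinations that match the actual object sizes in your data to build the rpn network. The rpn network generates the candidate region proposals.<br>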
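
To make the anchor size combinations in (3) concrete, here is a small pure-Python sketch of how a set of sizes and aspect ratios turns into anchor boxes for one feature map. It follows my understanding of how `AnchorGenerator` builds its base anchors (aspect ratio taken as height / width, area kept at roughly `size**2`); `base_anchors` is a hypothetical helper written for illustration, not a torchvision function, and the sizes are the ones used for the five FPN levels in this notebook.

```python
import math


def base_anchors(sizes, aspect_ratios):
    """Return zero-centred anchors (xmin, ymin, xmax, ymax) for one feature map.

    An aspect ratio is height / width; every anchor keeps an area of about size**2.
    """
    anchors = []
    for ratio in aspect_ratios:
        h_ratio = math.sqrt(ratio)
        w_ratio = 1.0 / h_ratio
        for size in sizes:
            w, h = w_ratio * size, h_ratio * size
            anchors.append((round(-w / 2), round(-h / 2), round(w / 2), round(h / 2)))
    return anchors


# one size per FPN level, three aspect ratios per level
sizes = ((32,), (64,), (128,), (256,), (512,))
aspect_ratios = ((0.5, 1.0, 2.0),) * 5
for level, (s, r) in enumerate(zip(sizes, aspect_ratios)):
    print(level, base_anchors(s, r))
```

Each feature-map location therefore gets `len(sizes[level]) * len(aspect_ratios[level])` anchors, three per location in this configuration.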


#### Building an object detection model

In [7]:
import torch
import torchvision as tv
from torchvision import models
from torchvision.models.detection import faster_rcnn, mask_rcnn
from torchvision.models.detection.rpn import AnchorGenerator

num_classes = 2  # 1 object class + background
# In torchvision 0.3, resnet_fpn_backbone freezes the stem (conv1) and layer1 of the
# ResNet; only layer2, layer3, and layer4 (plus the FPN) are updated during training
backbone = faster_rcnn.resnet_fpn_backbone(backbone_name='resnet50', pretrained=True)
# one anchor size per FPN level (5 feature maps), three aspect ratios each
rpn_anchor_generator = AnchorGenerator(sizes=((32,), (64,), (128,), (256,), (512,)),
                                       aspect_ratios=((0.5, 1.0, 2.0),) * 5)
model = faster_rcnn.FasterRCNN(backbone=backbone, num_classes=num_classes,
                               min_size=600, max_size=600,
                               rpn_anchor_generator=rpn_anchor_generator)
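
The `min_size` and `max_size` arguments control how the model rescales each input image before the backbone sees it. The sketch below reproduces that rule as I understand it: scale the shorter side to `min_size`, unless that would push the longer side past `max_size`, in which case the longer side is scaled to `max_size` instead. `resized_size` is a hypothetical helper, the 480x640 image is an assumed example, and the exact rounding inside torchvision may differ by a pixel.

```python
def resized_size(height, width, min_size=800, max_size=1333):
    """Return the (height, width) an image would be rescaled to."""
    scale = min_size / min(height, width)
    # cap the scale so the longer side never exceeds max_size
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    return round(height * scale), round(width * scale)


# with min_size == max_size == 600 the longer side ends up at 600
print(resized_size(480, 640, min_size=600, max_size=600))
# with the FasterRCNN defaults the shorter side ends up at 800
print(resized_size(480, 640))
```

Setting both values to 600 therefore bounds the longer side at 600 pixels, which reduces memory use compared with the defaults.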

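#### Wrapping the dataset
##### 1. On Linux, `wget` makes it easy to download the file a link points to; on Windows 10 you can get the same from Python after `pip install wget`.
`import wget` <br>
##### 2. Use PennFudanPed to get hands-on practice.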

In [None]:
# import os
# import zipfile
# import wget

# url = 'https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip'
# out_fname = "datasets/PennFudanPed.zip"
# os.makedirs("datasets", exist_ok=True)
# file_fname = wget.download(url=url, out=out_fname)
# # extract the archive (it is a .zip, so use zipfile rather than tarfile)
# with zipfile.ZipFile(out_fname) as zf:
#     zf.extractall("datasets")
# # delete the downloaded archive
# os.remove(out_fname)

In [0]:
import os
import random
import numpy as np
from torch.utils import data
from PIL import Image

import transforms as T


class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root='datasets/PennFudanPed', train=True, transforms=None):
        self.root = root
        # load all image files, sorting them to
        # ensure that they are aligned
        imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

        # shuffle with a fixed seed so that the train and test instances
        # agree on the split; an unseeded shuffle would let the two overlap
        indices = list(range(len(imgs)))
        random.Random(0).shuffle(indices)
        # the last 50 shuffled samples are held out for testing
        indices = indices[:-50] if train else indices[-50:]
        self.imgs = [imgs[i] for i in indices]
        self.masks = [masks[i] for i in indices]

        if transforms is None:
            # default: tensor conversion, plus random flips when training
            steps = [T.ToTensor()]
            if train:
                steps.append(T.RandomHorizontalFlip(0.5))
            transforms = T.Compose(steps)
        self.transforms = transforms


    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)

        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        # target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)
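
The mask-to-box step in `__getitem__` is easier to follow on a tiny array. The sketch below applies the same idea (one integer id per instance, 0 for background) to a hand-made 4x5 mask; the mask values and the helper name `masks_to_boxes` are made up for illustration.

```python
import numpy as np

# toy instance mask: 0 = background, 1 and 2 = two separate instances
mask = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
], dtype=np.uint8)


def masks_to_boxes(mask):
    """Return one [xmin, ymin, xmax, ymax] box per non-zero instance id."""
    obj_ids = np.unique(mask)
    obj_ids = obj_ids[obj_ids != 0]          # drop the background id
    binary = mask == obj_ids[:, None, None]  # shape: (num_objs, H, W)
    boxes = []
    for m in binary:
        ys, xs = np.where(m)                 # row indices, column indices
        boxes.append([int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())])
    return boxes


boxes = masks_to_boxes(mask)
print(boxes)  # [[1, 1, 2, 2], [4, 2, 4, 3]]
```

The second instance is one pixel wide, so its box has `xmin == xmax` and zero area. Boxes like this are degenerate for training and are worth filtering out or padding if a dataset contains them.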

In [0]:
import utils

data_train = PennFudanDataset(train=True)
data_test = PennFudanDataset(train=False)
print('data_train num=', len(data_train), '\nfields:\n', data_train[0][1])
# print('data_test num=', len(data_test), '\nfields:\n', data_test[0][1])
trainLoader = data.DataLoader(data_train, batch_size=1, shuffle=True, collate_fn=utils.collate_fn)
testLoader = data.DataLoader(data_test, batch_size=1, shuffle=False, collate_fn=utils.collate_fn)
print(trainLoader, '\n', testLoader)
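
The `collate_fn` argument matters because detection samples cannot be stacked into one tensor: images differ in size and every target holds a different number of boxes. As far as I know, `utils.collate_fn` in `references/detection` is the one-liner reproduced below, which simply regroups a batch of `(image, target)` pairs into a tuple of images and a tuple of targets. In this sketch plain strings stand in for image tensors.

```python
def collate_fn(batch):
    # regroup [(img0, tgt0), (img1, tgt1), ...] into ((img0, img1, ...), (tgt0, tgt1, ...))
    return tuple(zip(*batch))


# placeholder batch: strings instead of image tensors, lists instead of box tensors
batch = [
    ("image_0", {"boxes": [[0, 0, 10, 10]], "labels": [1]}),
    ("image_1", {"boxes": [[5, 5, 20, 30], [1, 2, 3, 4]], "labels": [1, 1]}),
]
images, targets = collate_fn(batch)
print(images)
print([len(t["boxes"]) for t in targets])
```

The model then receives a list of images and a list of target dictionaries, which is the input format the torchvision detection models expect in training mode.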

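#### Training and optimization
##### 1. baseline
&ensp;(1)Instantiate the dataset and pass the dataset object to the data loader `torch.utils.data.DataLoader`;<br>
&ensp;(2)Define the optimizer and the learning rate scheduler;<br>
&ensp;(3)Train iteratively to update the model, evaluating on the validation set once per epoch;<br>
##### 2. reference
&ensp;(1)The basic training modules come from `pytorch/vision/references/detection`. The corresponding source files (mainly `engine.py`) need to be downloaded locally before writing your own training script;<br>
&ensp;(2)demo example<br>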
```python
from engine import train_one_epoch, evaluate

# ...
model.to(device)

# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)

# and a learning rate scheduler which decreases the learning rate by
# 10x every 3 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)

for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)                                            
```
&ensp;(3)For more details, including saving model parameters during training, training logs, and loss curves, see `train.py` and `utils.py` under `references/detection`<br>
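
To see what the `StepLR` settings in the demo do to the learning rate, the schedule can be written out by hand: the rate is multiplied by `gamma` once every `step_size` epochs. `step_lr` below is a hypothetical stand-alone function that mirrors that rule, not a PyTorch API.

```python
def step_lr(base_lr, epoch, step_size=3, gamma=0.1):
    """Learning rate in effect during `epoch` under a step decay schedule."""
    return base_lr * gamma ** (epoch // step_size)


# same settings as the demo: lr=0.005, step_size=3, gamma=0.1
for epoch in range(10):
    print(epoch, step_lr(0.005, epoch))
```

With these values the rate stays at 0.005 for epochs 0 to 2, then drops by a factor of 10 every three epochs, so by epoch 9 it is about 5e-6. For longer runs a larger `step_size` may be a better fit.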

In [9]:
from engine import train_one_epoch, evaluate

# fall back to the CPU when no GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)

lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)

for epoch in range(10):
    train_one_epoch(model, optimizer, trainLoader, device, epoch, print_freq=10)
    lr_scheduler.step()
    
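    # make sure the target directory exists before saving the checkpoint
    os.makedirs('checkpoints', exist_ok=True)
    utils.save_on_master({
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
        'lr_scheduler': lr_scheduler.state_dict()},
        os.path.join('checkpoints', 'model_{}.pth'.format(epoch)))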
                   
    evaluate(model, testLoader, device=device)
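
The numbers printed by `evaluate` are COCO-style average precision and recall values, which rest on the intersection over union (IoU) between predicted and ground-truth boxes. The same measure is what thresholds such as `rpn_fg_iou_thresh` and `box_fg_iou_thresh` are compared against. A minimal pure-Python sketch for boxes in `[xmin, ymin, xmax, ymax]` format follows; `box_iou` here is an illustrative helper and the two boxes are made-up values.

```python
def box_iou(a, b):
    """IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
    # corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = [0, 0, 10, 10]
prediction = [5, 0, 15, 10]
# overlap is 5 x 10 = 50, union is 100 + 100 - 50 = 150
print(box_iou(ground_truth, prediction))
```

An IoU of about 0.33, as in this example, would not count as a match at the 0.5 threshold that COCO uses for its loosest AP figure.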
