# **01. How to Train YOLOv7 on a Custom Dataset**

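This tutorial is based on WongKinYiu's **YOLOv7 GitHub repository** [[link]](https://github.com/WongKinYiu/yolov7) and **paper** [[link]](https://arxiv.org/abs/2207.02696). Chieh-Ming compiled it into step-by-step training material that also covers training on **your own custom objects**.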

### **Steps Covered in This Tutorial**

To train our object detection model, we follow these steps:

* Install YOLOv7 dependencies
* Load custom dataset from Roboflow in YOLOv7 format
* Run YOLOv7 training
* Evaluate YOLOv7 performance
* Run YOLOv7 inference on test images

In [1]:
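import torch

# Create a small random tensor; it is stored on the CPU by default
tensor = torch.rand(3, 4)
print(f"Device tensor is stored on: {tensor.device}")

print(f"CUDA is currently available: {torch.cuda.is_available()}")

# Move the tensor to the GPU only when CUDA is available, so the cell also runs on CPU-only machines
if torch.cuda.is_available():
    tensor = tensor.to('cuda')

print(f"Device tensor is stored on: {tensor.device}")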


Device tensor is stored on: cpu
CUDA is currently available: True
Device tensor is stored on: cuda:0


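# **03. Prepare Correctly Formatted Custom Data**

# **03-1. Link the Custom Data from Your Cloud Drive**
* Use the `md` command to create your own folder (`custom_dataset`) with `train` and `valid` subfolders for the data, and use the `copy` command to copy `coco.yaml` into it.
* Rename `coco.yaml` to `data.yaml` and edit its contents: the custom class names, the number of classes, and the `train` and `val` paths.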
    * `train: ../train/images`
    * `val: ../valid/images`
    * `# number of classes`
    * `nc: 3`
    * `# class names`
    * `names: [ 'none', 'bad', 'good']`

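* Define a `getYoloFormat` function that converts the XML label data to txt format.
* This tutorial uses a face-mask image dataset as its example; the three classes are none, bad, and good.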
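
As a rough illustration of the edited `data.yaml`, the sketch below builds the same fields from Python instead of editing the copied file by hand. The helper name `build_data_yaml` is hypothetical and not part of YOLOv7; the paths and class names are the ones listed above. The class order is assumed to match the label indices used later in the tutorial (none = 0, bad = 1, good = 2).

```python
def build_data_yaml(train_dir, val_dir, class_names):
    """Return the text of a minimal data.yaml (hypothetical helper, not part of YOLOv7)."""
    names = ", ".join(f"'{name}'" for name in class_names)
    lines = [
        f"train: {train_dir}",
        f"val: {val_dir}",
        "",
        "# number of classes",
        f"nc: {len(class_names)}",
        "",
        "# class names",
        f"names: [{names}]",
    ]
    return "\n".join(lines) + "\n"


# The list index of each name is the class id written into the label files
yaml_text = build_data_yaml("../train/images", "../valid/images", ["none", "bad", "good"])
print(yaml_text)

# To save it next to the dataset folders, uncomment:
# with open("custom_dataset/data.yaml", "w", encoding="utf-8") as f:
#     f.write(yaml_text)
```

Deriving `nc` from the length of the name list keeps the two fields from drifting apart when a class is added or removed.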

In [2]:
# Create the folders
!md "custom_dataset"
!md "custom_dataset/valid"
!md "custom_dataset/train"
!md "custom_dataset/valid/images"
!md "custom_dataset/train/images"
!md "custom_dataset/valid/labels"
!md "custom_dataset/train/labels"

# Copy coco.yaml into the newly created folder (it is renamed to data.yaml afterwards)
!copy "data/coco.yaml" "custom_dataset"

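A subdirectory or file custom_dataset already exists.
A subdirectory or file custom_dataset/valid already exists.
A subdirectory or file custom_dataset/train already exists.
A subdirectory or file custom_dataset/valid/images already exists.
A subdirectory or file custom_dataset/train/images already exists.
A subdirectory or file custom_dataset/valid/labels already exists.
A subdirectory or file custom_dataset/train/labels already exists.


        1 file(s) copied.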


In [3]:
from bs4 import BeautifulSoup
import os
import shutil

# Mask labels: class name -> class id
status_dic = {'good': 2, 'bad': 1, 'none': 0}

def getYoloFormat(filename, label_path, img_path, yolo_path, newname):
    """Convert one XML annotation file into a YOLO-format .txt label and copy its image."""
    with open(label_path + filename, 'r') as f:
        soup = BeautifulSoup(f.read(), 'xml')
        imgname = soup.select_one('filename').text
        image_w = int(soup.select_one('width').text)
        image_h = int(soup.select_one('height').text)
        ary = []
        for obj in soup.select('object'):
            # Corner coordinates of the bounding box, in pixels
            xmin = int(obj.select_one('xmin').text)
            xmax = int(obj.select_one('xmax').text)
            ymin = int(obj.select_one('ymin').text)
            ymax = int(obj.select_one('ymax').text)
            objclass = status_dic.get(obj.select_one('name').text)

            # YOLO format: box center and size, normalized by the image width and height
            x = (xmin + (xmax-xmin)/2) * 1.0 / image_w
            y = (ymin + (ymax-ymin)/2) * 1.0 / image_h
            w = (xmax-xmin) * 1.0 / image_w
            h = (ymax-ymin) * 1.0 / image_h
            ary.append(' '.join([str(objclass), str(x), str(y), str(w), str(h)]))
        # Copy the image and write its labels only when the image file exists
        if os.path.exists(img_path + imgname):
            shutil.copyfile(img_path + imgname, yolo_path + newname + '.jpg')
            with open(yolo_path + newname + '.txt', 'w') as out:
                out.write('\n'.join(ary))

In [4]:
# Create the folder that holds the labels and images
!md "yolo"

A subdirectory or file yolo already exists.


In [5]:
import os

# Paths of the mask dataset
labelpath = 'masks/labels/'
imgpath   = 'masks/images/'
yolopath  = 'yolo/'

ary = []
for idx, f in enumerate(os.listdir(labelpath)):
    try:
        getYoloFormat(f, labelpath,imgpath, yolopath, str(idx))
    except Exception as e:
        print(e)
        
# If the following warning appears:
# couldn't find a tree builder with the features you requested: xml. do you need to install a parser library?
# install the parser with: pip install lxml

float division by zero
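
The box conversion inside `getYoloFormat` can be checked on a single box. The sketch below is a standalone helper (the name `voc_to_yolo` and the sample numbers are hypothetical, not part of the tutorial code) that applies the same center and size normalization. It also rejects a zero image width or height, which is one plausible cause of the `float division by zero` message above; that cause is an assumption, since the offending annotation file is not shown.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, image_w, image_h):
    """Convert pixel corner coordinates to normalized YOLO (x_center, y_center, width, height)."""
    if image_w <= 0 or image_h <= 0:
        # An annotation with a zero width or height cannot be normalized
        raise ValueError(f"invalid image size: {image_w}x{image_h}")
    x_center = (xmin + xmax) / 2 / image_w
    y_center = (ymin + ymax) / 2 / image_h
    width = (xmax - xmin) / image_w
    height = (ymax - ymin) / image_h
    return x_center, y_center, width, height


# A 200x200 pixel box inside a 640x480 image (made-up numbers for illustration)
box = voc_to_yolo(100, 50, 300, 250, 640, 480)
print(box)

# A zero-sized image raises a descriptive error instead of ZeroDivisionError
try:
    voc_to_yolo(0, 0, 10, 10, 0, 0)
except ValueError as error:
    print(error)
```

All four returned values lie in the range 0 to 1 whenever the box is inside the image, which is what the YOLO label format expects.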


In [6]:
data_labels = ['yolo/'+ f for f in os.listdir('yolo/') if not f.endswith('.txt')]
print(len(data_labels))

677


In [8]:
import shutil
cut_number = int(len(data_labels)*0.2)
for i in range(len(data_labels)):
    basename = os.path.basename(data_labels[i])
    file_name = os.path.splitext(basename)[0]

    if i<=cut_number:
        shutil.move('yolo/'+file_name+'.txt', 'custom_dataset/valid/labels/'+file_name+'.txt')
        shutil.move('yolo/'+file_name+'.jpg', 'custom_dataset/valid/images/'+file_name+'.jpg')
    else:
        shutil.move('yolo/'+file_name+'.txt', 'custom_dataset/train/labels/'+file_name+'.txt')
        shutil.move('yolo/'+file_name+'.jpg', 'custom_dataset/train/images/'+file_name+'.jpg')
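
The cell above sends the first files in directory-listing order to the validation set. If a random but reproducible split is preferred, a minimal sketch could look like the following; the function name `split_dataset` and the fixed seed are assumptions for illustration. Unlike the cell above, whose `i<=cut_number` test places one extra file in validation, this sketch places exactly `int(len(items) * valid_fraction)` items there.

```python
import random


def split_dataset(items, valid_fraction=0.2, seed=0):
    """Shuffle items reproducibly and return (valid_items, train_items)."""
    shuffled = sorted(items)               # sort first so the result does not depend on listing order
    random.Random(seed).shuffle(shuffled)  # a private generator keeps the global random state untouched
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[:n_valid], shuffled[n_valid:]


# Ten made-up file names stand in for the converted images
valid_items, train_items = split_dataset([f"{i}.jpg" for i in range(10)])
print(len(valid_items), len(train_items))
```

The returned lists can then be passed to the same `shutil.move` calls as in the cell above.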

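# **04. Begin Custom Training**

We are now ready to start training.

Note: the main YOLOv7 training default we change in our example is `epochs`, which we reduce from 300 to 50 to speed up training (the command below also lowers the batch size to 4). If you want to change other settings, see the details in the [supplementary online material](https://blog.roboflow.com/yolov7-custom-dataset-training-tutorial/).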

### Download the COCO Starting Checkpoint
* Download the file from the following URL into the yolov7 folder:
https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt

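#### You can run `!python train.py` followed by any of the options below to customize the model settings:

1. `--weights`, initial weights path (default: `'yolo7.pt'`)
2. `--cfg`, model.yaml path (default: `''`)
3. `--data`, data.yaml path (default: `'data/coco.yaml'`)
4. `--hyp`, hyperparameters path (default: `'data/hyp.scratch.p5.yaml'`)
5. `--epochs`, number of passes over the training data (default: `300`)
6. `--batch-size`, total batch size across all GPUs (default: `16`)
7. `--img-size`, image size (default: `[640, 640]`)
8. `--rect`, enables rectangular (non-square) training; without it, non-square input images are padded with gray pixels into a square
9. `--resume`, resume the most recent training run; recommended only for interrupted training (default: `False`)
10. `--nosave`, save only the final checkpoint (model weights)
11. `--device`, CUDA device, i.e. 0 or 0,1,2,3 or cpu (default: `''`)
12. `--multi-scale`, vary img-size by +/- 50%
13. `--single-cls`, train multi-class data as a single class
14. `--adam`, use the `torch.optim.Adam()` optimizer
15. `--linear-lr`, linear learning-rate schedule
* Other tunable hyperparameters and settings can be found in the YOLOv7 GitHub repository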
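
To keep a longer option list readable, the command can also be assembled from a dictionary before it is run. This is only a sketch: `build_train_command` is a hypothetical helper, and the option values are examples similar to the ones used in this tutorial. It uses only flags from the list above.

```python
def build_train_command(options, flags=()):
    """Build the argument list for train.py from option/value pairs and on/off flags."""
    command = ["python", "train.py"]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    command += [f"--{flag}" for flag in flags]
    return command


command = build_train_command(
    {
        "weights": "yolov7.pt",
        "cfg": "cfg/training/yolov7.yaml",
        "data": "custom_dataset/data.yaml",
        "epochs": 50,
        "batch-size": 4,
        "device": 0,
    }
)
print(" ".join(command))

# To launch training from the yolov7 folder, uncomment:
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues, since each value reaches `train.py` exactly as written.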

### Notes
* Jupyter Notebook may stop displaying output because the output data rate limit was exceeded, with this message:
    * `IOPub data rate exceeded.`
    * `The notebook server will temporarily stop sending output`
    * `to the client in order to avoid crashing it.`
    * `To change this limit, set the config variable`
    * `--NotebookApp.iopub_data_rate_limit.`
* Solution: raise the limit by starting Jupyter from a terminal with `jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10`

In [2]:
# Run this cell to begin training
# Pass the weights path without single quotes; otherwise the quotes become part of the file name (see weights="'yolov7.pt'" in the Namespace output)
!python train.py --batch-size 4 --cfg cfg/training/yolov7.yaml --epochs 50 --data custom_dataset/data.yaml --weights yolov7.pt --device 0

wandb: Install Weights & Biases for YOLOR logging with 'pip install wandb' (recommended)

autoanchor: Analyzing anchors... anchors/target = 5.30, Best Possible Recall (BPR) = 0.9981
                 all         136         755    0.000696      0.0552    6.15e-05    8.52e-06
                 all         136         755    0.000706      0.0237    3.46e-05    5.87e-06
                 all         136         755     0.00148      0.0531    0.000132     2.2e-05
                 all         136         755     0.00147      0.0377    0.000144     2.6e-05
                 all         136         755      0.0234      0.0591     0.00463     0.00157
                 all         136         755      0.0695      0.0841      0.0257     0.00566
                 all         136         755       0.102       0.103      0.0682      0.0179
                 all         136         755       0.786       0.117      0.0933      0.0232
                 all         136         755    

YOLOR  v0.1-35-gef4dde4 torch 1.12.1 CUDA:0 (NVIDIA GeForce GTX 1650 SUPER, 4095.6875MB)

Namespace(adam=False, artifact_alias='latest', batch_size=4, bbox_interval=-1, bucket='', cache_images=False, cfg='cfg/training/yolov7.yaml', data='.\\custom_dataset\\data.yaml', device='0', entity=None, epochs=50, evolve=False, exist_ok=False, global_rank=-1, hyp='data/hyp.scratch.p5.yaml', image_weights=False, img_size=[640, 640], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, project='runs/train', quad=False, rect=False, resume=False, save_dir='runs\\train\\exp', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=4, upload_dataset=False, weights="'yolov7.pt'", workers=8, world_size=1)
[34m[1mtensorboard: [0mStart with 'tensorboard --logdir runs/train', view at http://localhost:6006/
[34m[1mhyperparameters: [0mlr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_

      0/49     3.34G   0.06912     1.415   0.02259     1.507        35       640:  45%|████▍     | 61/136 [00:51<00:48,  1.55it/s]
      0/49     3.34G    0.0691       1.4    0.0226     1.492        81       640:  45%|████▍     | 61/136 [00:52<00:48,  1.55it/s]
      0/49     3.34G    0.0691       1.4    0.0226     1.492        81       640:  46%|████▌     | 62/136 [00:52<00:47,  1.56it/s]
      0/49     3.34G   0.06926     1.385   0.02261     1.477        23       640:  46%|████▌     | 62/136 [00:52<00:47,  1.56it/s]
      0/49     3.34G   0.06926     1.385   0.02261     1.477        23       640:  46%|████▋     | 63/136 [00:52<00:47,  1.55it/s]
      0/49     3.34G   0.06921      1.37   0.02262     1.462        29       640:  46%|████▋     | 63/136 [00:53<00:47,  1.55it/s]
      0/49     3.34G   0.06921      1.37   0.02262     1.462        29       640:  47%|████▋     | 64/136 [00:53<00:46,  1.56it/s]
      0/49     3.34G   0.06926     1.356   0.02263     1.448        60       640:  

      1/49     3.33G    0.0716   0.07916   0.02249    0.1733        39       640:  38%|███▊      | 52/136 [00:33<00:52,  1.60it/s]
      1/49     3.33G    0.0716   0.07916   0.02249    0.1733        39       640:  39%|███▉      | 53/136 [00:33<00:52,  1.57it/s]
      1/49     3.33G   0.07164   0.07872    0.0225    0.1729        61       640:  39%|███▉      | 53/136 [00:34<00:52,  1.57it/s]
      1/49     3.33G   0.07164   0.07872    0.0225    0.1729        61       640:  40%|███▉      | 54/136 [00:34<00:51,  1.58it/s]
      1/49     3.33G    0.0716   0.07829   0.02251    0.1724        37       640:  40%|███▉      | 54/136 [00:35<00:51,  1.58it/s]
      1/49     3.33G    0.0716   0.07829   0.02251    0.1724        37       640:  40%|████      | 55/136 [00:35<00:51,  1.59it/s]
      1/49     3.33G   0.07157   0.07787   0.02252     0.172        41       640:  40%|████      | 55/136 [00:35<00:51,  1.59it/s]
      1/49     3.33G   0.07157   0.07787   0.02252     0.172        41       640:  

               Class      Images      Labels           P           R      mAP@.5  mAP@.5:.95:  71%|███████   | 12/17 [00:02<00:01,  3.90it/s]
               Class      Images      Labels           P           R      mAP@.5  mAP@.5:.95:  76%|███████▋  | 13/17 [00:03<00:01,  3.81it/s]
               Class      Images      Labels           P           R      mAP@.5  mAP@.5:.95:  82%|████████▏ | 14/17 [00:03<00:00,  3.50it/s]
               Class      Images      Labels           P           R      mAP@.5  mAP@.5:.95:  88%|████████▊ | 15/17 [00:03<00:00,  3.34it/s]
               Class      Images      Labels           P           R      mAP@.5  mAP@.5:.95:  94%|█████████▍| 16/17 [00:04<00:00,  3.41it/s]
               Class      Images      Labels           P           R      mAP@.5  mAP@.5:.95: 100%|██████████| 17/17 [00:04<00:00,  3.67it/s]
               Class      Images      Labels           P           R      mAP@.5  mAP@.5:.95: 100%|██████████| 17/17 [00:04<00:00,  3.93it/s]

     

     16/49     3.38G   0.05355   0.01957  0.007229   0.08035        38       640:  41%|████      | 56/136 [00:36<00:50,  1.58it/s]
     16/49     3.38G   0.05355   0.01957  0.007229   0.08035        38       640:  42%|████▏     | 57/136 [00:36<00:50,  1.58it/s]
     16/49     3.38G   0.05354   0.01971  0.007231   0.08049        66       640:  42%|████▏     | 57/136 [00:36<00:50,  1.58it/s]
     16/49     3.38G   0.05354   0.01971  0.007231   0.08049        66       640:  43%|████▎     | 58/136 [00:36<00:49,  1.58it/s]
     16/49     3.38G   0.05395   0.01962  0.007327   0.08089        59       640:  43%|████▎     | 58/136 [00:37<00:49,  1.58it/s]
     16/49     3.38G   0.05395   0.01962  0.007327   0.08089        59       640:  43%|████▎     | 59/136 [00:37<00:48,  1.58it/s]
     16/49     3.38G   0.05384   0.01959  0.007329   0.08076        36       640:  43%|████▎     | 59/136 [00:37<00:48,  1.58it/s]
     16/49     3.38G   0.05384   0.01959  0.007329   0.08076        36       640:  

     24/49     3.38G   0.04328   0.01598  0.006427   0.06569        36       640:  87%|████████▋ | 118/136 [01:15<00:11,  1.60it/s]
     24/49     3.38G   0.04328   0.01598  0.006427   0.06569        36       640:  88%|████████▊ | 119/136 [01:15<00:10,  1.59it/s]
     24/49     3.38G   0.04325   0.01594  0.006433   0.06563        26       640:  88%|████████▊ | 119/136 [01:15<00:10,  1.59it/s]
     24/49     3.38G   0.04325   0.01594  0.006433   0.06563        26       640:  88%|████████▊ | 120/136 [01:15<00:10,  1.60it/s]
     24/49     3.38G   0.04328   0.01594  0.006443   0.06567        30       640:  88%|████████▊ | 120/136 [01:16<00:10,  1.60it/s]
     24/49     3.38G   0.04328   0.01594  0.006443   0.06567        30       640:  89%|████████▉ | 121/136 [01:16<00:09,  1.60it/s]
     24/49     3.38G   0.04327   0.01595  0.006439   0.06565        65       640:  89%|████████▉ | 121/136 [01:16<00:09,  1.60it/s]
     24/49     3.38G   0.04327   0.01595  0.006439   0.06565        65      

     33/49     3.38G   0.03745   0.01476  0.005309   0.05752        64       640:  23%|██▎       | 31/136 [00:20<01:06,  1.59it/s]
     33/49     3.38G   0.03745   0.01476  0.005309   0.05752        64       640:  24%|██▎       | 32/136 [00:20<01:05,  1.59it/s]
     33/49     3.38G   0.03763   0.01509  0.005347   0.05806        63       640:  24%|██▎       | 32/136 [00:20<01:05,  1.59it/s]
     33/49     3.38G   0.03763   0.01509  0.005347   0.05806        63       640:  24%|██▍       | 33/136 [00:20<01:05,  1.58it/s]
     33/49     3.38G   0.03772   0.01509  0.005486    0.0583        42       640:  24%|██▍       | 33/136 [00:21<01:05,  1.58it/s]
     33/49     3.38G   0.03772   0.01509  0.005486    0.0583        42       640:  25%|██▌       | 34/136 [00:21<01:04,  1.59it/s]
     33/49     3.38G   0.03764   0.01514  0.005442   0.05822        46       640:  25%|██▌       | 34/136 [00:22<01:04,  1.59it/s]
     33/49     3.38G   0.03764   0.01514  0.005442   0.05822        46       640:  

     41/49     3.38G   0.03424   0.01502  0.004977   0.05423        50       640:  68%|██████▊   | 93/136 [00:59<00:27,  1.57it/s]
     41/49     3.38G   0.03424   0.01502  0.004977   0.05423        50       640:  69%|██████▉   | 94/136 [00:59<00:26,  1.58it/s]
     41/49     3.38G   0.03435   0.01523  0.005003   0.05458       150       640:  69%|██████▉   | 94/136 [00:59<00:26,  1.58it/s]
     41/49     3.38G   0.03435   0.01523  0.005003   0.05458       150       640:  70%|██████▉   | 95/136 [00:59<00:26,  1.57it/s]
     41/49     3.38G   0.03428   0.01518   0.00502   0.05449        38       640:  70%|██████▉   | 95/136 [01:00<00:26,  1.57it/s]
     41/49     3.38G   0.03428   0.01518   0.00502   0.05449        38       640:  71%|███████   | 96/136 [01:00<00:25,  1.58it/s]
     41/49     3.38G   0.03421    0.0151  0.004984   0.05429        27       640:  71%|███████   | 96/136 [01:01<00:25,  1.58it/s]
     41/49     3.38G   0.03421    0.0151  0.004984   0.05429        27       640:  
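
The validation summary rows in the log, such as `all 136 755 0.786 0.117 0.0933 0.0232`, follow the column header `Class Images Labels P R mAP@.5 mAP@.5:.95` that appears in the same output. A small parser like the one below can turn such a row into named fields for comparison across epochs. The names `ValSummary` and `parse_summary_row` are hypothetical, and the sketch assumes the row has exactly these seven whitespace-separated columns.

```python
from typing import NamedTuple


class ValSummary(NamedTuple):
    cls: str          # "all" or a class name
    images: int       # number of validation images
    labels: int       # number of ground-truth boxes
    precision: float  # column P
    recall: float     # column R
    map50: float      # column mAP@.5
    map50_95: float   # column mAP@.5:.95


def parse_summary_row(row):
    """Parse one whitespace-separated validation summary row from the training log."""
    parts = row.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 columns, got {len(parts)}: {row!r}")
    return ValSummary(parts[0], int(parts[1]), int(parts[2]), *map(float, parts[3:]))


# Row copied from the training output above
row = "                 all         136         755       0.786       0.117      0.0933      0.0232"
summary = parse_summary_row(row)
print(summary.precision, summary.recall, summary.map50, summary.map50_95)
```

Truncated rows, such as one that ends after the label count, raise a `ValueError` rather than producing partial values.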

In [None]:
# Visualize the training process with TensorBoard
%load_ext tensorboard
%tensorboard --logdir "runs/train" --host localhost

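# **05. Evaluation**

We can assess the performance of the custom-trained model by running the provided `detect.py` script on images.

Note that the following parameters can be customized. For details, see [the arguments accepted by detect.py](https://github.com/WongKinYiu/yolov7/blob/main/detect.py#L154).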

In [3]:
# Run evaluation
!python detect.py --weights runs/train/exp/weights/best.pt --conf 0.5 --source masks/images

YOLOR  v0.1-35-gef4dde4 torch 1.12.1 CUDA:0 (NVIDIA GeForce GTX 1650 SUPER, 4095.6875MB)

  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Model Summary: 314 layers, 36492560 parameters, 6194944 gradients, 103.2 GFLOPS


Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.5, device='', exist_ok=False, img_size=640, iou_thres=0.45, name='exp', no_trace=False, nosave=False, project='runs/detect', save_conf=False, save_txt=False, source='masks/images', update=False, view_img=False, weights=['runs/train/exp/weights/best.pt'])
Fusing layers... 
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
 Convert model to Traced-model... 
 traced_script_module saved! 
 model is traced! 

 The image with the result is saved in: runs\detect\exp\-1x-1.jpg
 The image with the result is saved in: runs\detect\exp\-I1-MS09uaqsLdGTFkgnS0Rcg1mmPyAj95ySg_eckoM.jpeg
 The image with the result is saved in: runs\detect\exp\0002526673.jpg
 The image with the result is saved in: runs\detect\exp\0009S6815V3PEU1N-C123-F4.jpg
 The image with the result is saved in: runs\detect\exp\000_1OC3DT.jpg
 The image with the result is saved in: runs\detect\exp\000_1ov3n5_0.jpeg
 The image with the 

 The image with the result is saved in: runs\detect\exp\20200128001360.jpg
 The image with the result is saved in: runs\detect\exp\20200128150215888112.jpeg
 The image with the result is saved in: runs\detect\exp\20200129000063M.jpg
 The image with the result is saved in: runs\detect\exp\20200129001040.jpg
 The image with the result is saved in: runs\detect\exp\20200129001153.jpg
 The image with the result is saved in: runs\detect\exp\2020012913501217696.jpg
 The image with the result is saved in: runs\detect\exp\20200129151820_fd638cb7a2000a1096ca3fbdc15a6fa9_1.jpeg
 The image with the result is saved in: runs\detect\exp\2020012921402900_1.jpg
 The image with the result is saved in: runs\detect\exp\20200130-031838_U14224_M588349_adf2.jpg
 The image with the result is saved in: runs\detect\exp\20200130004271.jpg
 The image with the result is saved in: runs\detect\exp\2020013023283369938.jpg
 The image with the result is saved in: runs\detect\exp\20200131-china-health_1.jpg
 The image w

In [None]:
# Display inference results on all test images

import glob
from IPython.display import Image, display

limit = 10000  # maximum number of images to display
# The detection output above contains both .jpg and .jpeg files, so match both extensions
image_paths = sorted(glob.glob('runs/detect/exp/*.jpg') + glob.glob('runs/detect/exp/*.jpeg'))
for image_path in image_paths[:limit]:
    display(Image(filename=image_path))
    print("\n")