# ☀️ Imports and Setup

According to the official [Train Custom Data](https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data) guide, YOLOv5 requires a certain directory structure. 

```
/parent_folder
    /dataset
         /images
         /labels
    /yolov5
```

* Create a `tmp` working directory. <br>
* Clone the YOLOv5 repository and pip install the required dependencies. <br>
* Install the latest version of W&B and log in with your wandb account. You can create your free W&B account [here](https://wandb.ai/site).

In [1]:
#make a directory for yolov5
!mkdir tmp
%cd tmp

mkdir: cannot create directory ‘tmp’: File exists
/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection/tmp


In [2]:
%pwd

'/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection/tmp'

In [3]:
# Download YOLOv5
!git clone https://github.com/ultralytics/yolov5  # clone repo
%cd yolov5
# Install dependencies
%pip install -qr requirements.txt  # install dependencies

%cd ../
import torch
print(f"Setup complete. Using torch {torch.__version__} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")

fatal: destination path 'yolov5' already exists and is not an empty directory.
/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection/tmp/yolov5
Note: you may need to restart the kernel to use updated packages.
/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection/tmp
Setup complete. Using torch 1.9.0+cu102 (TITAN Xp)


In [4]:
# Install W&B 
!pip install -q --upgrade wandb
# Login 
import os
key = os.getenv('WANDB_API_KEY')
# print(key)
import wandb
wandb.login(key=key)



Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: W&B API key is configured (use `wandb login --relogin` to force relogin)


True

In [5]:
# Necessary/extra dependencies. 
import os
import gc
import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm
from shutil import copyfile
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

#customize iPython writefile so we can write variables
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))

# 🦆 Hyperparameters

In [6]:
TRAIN_PATH = 'input/siim-covid19-resized-to-512px-png/train/'
IMG_SIZE = 512
BATCH_SIZE = 16
EPOCHS = 200
HOME = '~/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection'
CLASS_NUMBER = 1  # class id written to the label files (training below uses --single-cls)

In [7]:
%cd {HOME}
%pwd


/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection


'/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection'

# 🔨 Prepare Dataset

This is the most important section when it comes to training an object detector with YOLOv5. The directory structure, bounding box format, etc. must match what YOLOv5 expects. This section builds every piece needed to train a YOLOv5 model.

I am using [xhlulu's](https://www.kaggle.com/xhlulu) resized dataset. The uploaded 256x256 Kaggle dataset is [here](https://www.kaggle.com/xhlulu/siim-covid19-resized-to-256px-jpg). Find other image resolutions [here](https://www.kaggle.com/c/siim-covid19-detection/discussion/239918).

* Create train-validation split. <br>
* Create the required `/dataset` folder structure and move the images to that folder. <br>
* Create `data.yaml` file needed to train the model. <br>
* Create bounding box coordinates in the required YOLO format. 

In [8]:
# Everything is done from the HOME directory.
%cd {HOME}

# Load image level csv file
df = pd.read_csv('input/siim-covid19-detection/train_image_level.csv')

# Keep only the part of the id before the underscore
df['id'] = df.apply(lambda row: row.id.split('_')[0], axis=1)
# Add image path (relative to HOME)
df['path'] = df.apply(lambda row: TRAIN_PATH+row.id+'.jpg', axis=1)
# Get image level labels
df['image_level'] = df.apply(lambda row: row.label.split(' ')[0], axis=1)

df.head(5)

/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection


| | id | boxes | label | StudyInstanceUID | path | image_level |
|---|---|---|---|---|---|---|
| 0 | 000a312787f2 | [{'x': 789.28836, 'y': 582.43035, 'width': 102... | opacity 1 789.28836 582.43035 1815.94498 2499.... | 5776db0cec75 | input/siim-covid19-resized-to-512px-png/train/... | opacity |
| 1 | 000c3a3f293f | | none 1 0 0 1 1 | ff0879eb20ed | input/siim-covid19-resized-to-512px-png/train/... | none |
| 2 | 0012ff7358bc | [{'x': 677.42216, 'y': 197.97662, 'width': 867... | opacity 1 677.42216 197.97662 1545.21983 1197.... | 9d514ce429a7 | input/siim-covid19-resized-to-512px-png/train/... | opacity |
| 3 | 001398f4ff4f | [{'x': 2729, 'y': 2181.33331, 'width': 948.000... | opacity 1 2729 2181.33331 3677.00012 2785.33331 | 28dddc8559b2 | input/siim-covid19-resized-to-512px-png/train/... | opacity |
| 4 | 001bd15d1891 | [{'x': 623.23328, 'y': 1050, 'width': 714, 'he... | opacity 1 623.23328 1050 1337.23328 2156 opaci... | dfd9fdd85a3e | input/siim-covid19-resized-to-512px-png/train/... | opacity |


In [9]:
df['image_level'].value_counts()

opacity    4294
none       2040
Name: image_level, dtype: int64

In [10]:
# Load meta.csv file
# Original dimensions are required to scale the bounding box coordinates appropriately.
meta_df = pd.read_csv('input/siim-covid19-resized-to-512px-png/meta.csv')
train_meta_df = meta_df.loc[meta_df.split == 'train']
train_meta_df = train_meta_df.drop('split', axis=1)
train_meta_df.columns = ['id', 'dim0', 'dim1']

train_meta_df.head(10)

| | id | dim0 | dim1 |
|---|---|---|---|
| 0 | d8ba599611e5 | 2336 | 2836 |
| 1 | 29b23a11d1e4 | 3488 | 4256 |
| 2 | 8174f49500a5 | 2330 | 2846 |
| 3 | d54f6204b044 | 2330 | 2846 |
| 4 | d51cadde8626 | 3488 | 4256 |
| 5 | 47d014f9055a | 2991 | 2992 |
| 6 | 89fd7f185d77 | 3480 | 4248 |
| 7 | 7c40e04c6163 | 2540 | 2880 |
| 8 | 6a93346150a4 | 2540 | 2880 |
| 9 | 5b687c54d3fd | 3488 | 4256 |


In [11]:
# Merge both the dataframes
df = df.merge(train_meta_df, on='id',how="left")
df.head(2)

| | id | boxes | label | StudyInstanceUID | path | image_level | dim0 | dim1 |
|---|---|---|---|---|---|---|---|---|
| 0 | 000a312787f2 | [{'x': 789.28836, 'y': 582.43035, 'width': 102... | opacity 1 789.28836 582.43035 1815.94498 2499.... | 5776db0cec75 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 3488 | 4256 |
| 1 | 000c3a3f293f | | none 1 0 0 1 1 | ff0879eb20ed | input/siim-covid19-resized-to-512px-png/train/... | none | 2320 | 2832 |


## 🍘 Train-validation split

In [12]:
# Create train and validation split.
train_df, valid_df = train_test_split(df, test_size=0.2, random_state=42, stratify=df.image_level.values)

# assign() returns new DataFrames, which avoids pandas' SettingWithCopyWarning.
train_df = train_df.assign(split='train')
valid_df = valid_df.assign(split='valid')

df = pd.concat([train_df, valid_df]).reset_index(drop=True)
print(f'Size of dataset: {len(df)}, training images: {len(train_df)}. validation images: {len(valid_df)}')

Size of dataset: 6334, training images: 5067. validation images: 1267
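
Stratifying on the image-level label is meant to keep the `opacity`/`none` ratio roughly the same in both splits. A minimal sketch of how that ratio could be checked is shown below; `class_fractions` is a hypothetical helper, and the counts are the ones reported by `value_counts()` above.

```python
from collections import Counter

def class_fractions(labels):
    """Return the fraction of samples per class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Counts taken from the value_counts() output above: 4294 opacity, 2040 none.
full = ["opacity"] * 4294 + ["none"] * 2040
print(round(class_fractions(full)["opacity"], 3))  # 0.678
```

Calling `class_fractions(train_df.image_level)` and `class_fractions(valid_df.image_level)` should give values close to this one if the stratification worked as intended.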


## 🍚 Prepare Required Folder Structure

The required folder structure for the dataset directory is: 

```
/parent_folder
    /dataset
         /images
             /train
             /val
         /labels
             /train
             /val
    /yolov5
```

Note that I have named the dataset directory `covid` and the validation subfolders `valid` rather than `val`.
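
The same layout can also be created with a small helper. This is only a sketch: `make_yolo_layout` is a hypothetical function, and its defaults (`covid`, `train`, `valid`) simply mirror the names used in the cells below.

```python
from pathlib import Path

def make_yolo_layout(parent, dataset_name="covid", splits=("train", "valid")):
    """Create <parent>/<dataset_name>/{images,labels}/<split> and return the dataset root."""
    root = Path(parent) / dataset_name
    for kind in ("images", "labels"):
        for split in splits:
            # exist_ok=True makes the call safe to repeat when re-running the notebook
            (root / kind / split).mkdir(parents=True, exist_ok=True)
    return root
```

For example, `make_yolo_layout('tmp')` would produce `tmp/covid/images/train`, `tmp/covid/images/valid`, and the matching `labels` folders.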

In [13]:
df.head()

| | id | boxes | label | StudyInstanceUID | path | image_level | dim0 | dim1 | split |
|---|---|---|---|---|---|---|---|---|---|
| 0 | badf2d31cdbd | [{'x': 484.0363, 'y': 955.76643, 'width': 587.... | opacity 1 484.0363 955.76643 1071.75999 1657.7... | c508e7ff4063 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 2336 | 2836 | train |
| 1 | 8d95766f633e | [{'x': 1821.70335, 'y': 880.30619, 'width': 66... | opacity 1 1821.70335 880.30619 2485.89476 1875... | 2c78ef584129 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 2400 | 2880 | train |
| 2 | 1951ac929b71 | [{'x': 409.13877, 'y': 955.27207, 'width': 115... | opacity 1 409.13877 955.27207 1559.23612 3211.... | 24cd517d2846 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 3488 | 4256 | train |
| 3 | 69522a81a9b6 | [{'x': 1654.70833, 'y': 1096.9792, 'width': 78... | opacity 1 1654.70833 1096.9792 2434.94161 1975... | b22e2537daa0 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 2446 | 2630 | train |
| 4 | 98e895293a8c | [{'x': 2191.62551, 'y': 1077.50003, 'width': 1... | opacity 1 2191.62551 1077.50003 3729.32302 298... | 58072ae8b0f0 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 3480 | 4240 | train |


In [14]:
os.makedirs('tmp/covid/images/train', exist_ok=True)
os.makedirs('tmp/covid/images/valid', exist_ok=True)

os.makedirs('tmp/covid/labels/train', exist_ok=True)
os.makedirs('tmp/covid/labels/valid', exist_ok=True)


In [16]:
# Copy the images to the relevant split folder.
for i in tqdm(range(len(df))):
    row = df.loc[i]
    if row.split == 'train':
        copyfile(row.path, f'tmp/covid/images/train/{row.id}.png')
    else:
        copyfile(row.path, f'tmp/covid/images/valid/{row.id}.png')

100%|██████████| 6334/6334 [00:02<00:00, 2850.89it/s]


## 🍜 Create `.YAML` file

The `data.yaml` file is the dataset configuration file that defines:

1. an "optional" download command/URL for auto-downloading, 
2. a path to a directory of training images (or path to a *.txt file with a list of training images), 
3. a path to a directory of validation images (or path to a *.txt file with a list of validation images), 
4. the number of classes, 
5. a list of class names.

> 📍 Important: In this competition, each image has either the `opacity` or the `none` image-level label. Only `opacity` images carry bounding boxes, so I have set the number of classes, `nc`, to 1. YOLOv5 automatically handles the images without any bounding box coordinates.

> 📍 Note: The `data.yaml` is created in the `yolov5/data` directory as required. 
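
Since a mismatch between `nc` and `names` is an easy mistake to make, a quick sanity check before writing the file can help. The sketch below uses a hypothetical `check_data_config` helper (it is not part of YOLOv5) on the same values used in the next cell.

```python
def check_data_config(cfg):
    """Raise ValueError if a YOLOv5-style dataset config looks inconsistent."""
    for key in ("train", "val", "nc", "names"):
        if key not in cfg:
            raise ValueError(f"missing key: {key}")
    if cfg["nc"] != len(cfg["names"]):
        raise ValueError(f"nc={cfg['nc']} but {len(cfg['names'])} class name(s) given")
    return True

data_yaml = {
    "train": "../covid/images/train",
    "val": "../covid/images/valid",
    "nc": 1,
    "names": ["opacity"],
}
check_data_config(data_yaml)  # passes silently for a consistent config
```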

In [17]:
# Create .yaml file 
%cd {HOME}
import yaml

data_yaml = dict(
    train = '../covid/images/train',
    val = '../covid/images/valid',
    nc = 1,
    names = ['opacity']
)

# Note that I am creating the file in the yolov5/data/ directory.
with open('tmp/yolov5/data/data.yaml', 'w') as outfile:
    yaml.dump(data_yaml, outfile, default_flow_style=True)
    
%cat tmp/yolov5/data/data.yaml

/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection
{names: [opacity], nc: 1, train: ../covid/images/train, val: ../covid/images/valid}


## 🍮 Prepare Bounding Box Coordinates for YOLOv5

For every image with **bounding box(es)** a `.txt` file with the same name as the image will be created in the format shown below:

* One row per object. <br>
* Each row is in `class x_center y_center width height` format. <br>
* Box coordinates must be in normalized xywh format (from 0 to 1). We can normalize boxes given in pixels by dividing `x_center` and `width` by the image width, and `y_center` and `height` by the image height. <br>
* Class numbers are zero-indexed (start from 0). <br>

> 📍 Note: We don't have to remove the images without bounding boxes from the training or validation sets. 
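
As a worked example of the format described above, the sketch below converts one pixel-space box into a normalized YOLO row. The box and image size are made-up numbers, and `to_yolo` is a hypothetical helper rather than the function used later in this notebook.

```python
def to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert pixel corners to normalized (x_center, y_center, width, height)."""
    w = xmax - xmin
    h = ymax - ymin
    xc = xmin + w / 2
    yc = ymin + h / 2
    return xc / img_w, yc / img_h, w / img_w, h / img_h

# A 256x128 pixel box inside a 512x512 image, written with class id 0.
box = to_yolo(128, 64, 384, 192, img_w=512, img_h=512)
print(" ".join(str(v) for v in (0, *box)))  # 0 0.5 0.25 0.5 0.25
```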

In [18]:
# Get the raw bounding box by parsing the row value of the label column.
# Ref: https://www.kaggle.com/yujiariyasu/plot-3positive-classes
def get_bbox(row):
    bboxes = []
    bbox = []
    for i, l in enumerate(row.label.split(' ')):
        if i % 6 in (0, 1):  # skip the class name and confidence tokens
            continue
        bbox.append(float(l))
        if i % 6 == 5:
            bboxes.append(bbox)
            bbox = []  
            
    return bboxes

# Scale the bounding boxes according to the size of the resized image. 
def scale_bbox(row, bboxes):
    # Get scaling factor
    scale_x = IMG_SIZE/row.dim1
    scale_y = IMG_SIZE/row.dim0
    
    scaled_bboxes = []
    for bbox in bboxes:
        x = int(np.round(bbox[0]*scale_x, 4))
        y = int(np.round(bbox[1]*scale_y, 4))
        x1 = int(np.round(bbox[2]*(scale_x), 4))
        y1= int(np.round(bbox[3]*scale_y, 4))

        scaled_bboxes.append([x, y, x1, y1]) # xmin, ymin, xmax, ymax
        
    return scaled_bboxes

# Convert the bounding boxes in YOLO format.
def get_yolo_format_bbox(img_w, img_h, bboxes):
    yolo_boxes = []
    for bbox in bboxes:
        w = bbox[2] - bbox[0] # xmax - xmin
        h = bbox[3] - bbox[1] # ymax - ymin
        xc = bbox[0] + int(np.round(w/2)) # xmin + width/2
        yc = bbox[1] + int(np.round(h/2)) # ymin + height/2
        
        yolo_boxes.append([xc/img_w, yc/img_h, w/img_w, h/img_h]) # x_center y_center width height
    
    return yolo_boxes
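
To make the parsing and scaling steps concrete, here is a small standalone sketch. The label string and original image size are made-up values, and `parse_label` and `scale_box` are hypothetical stand-ins for `get_bbox` and `scale_bbox` above (they skip the rounding to integer pixels). As in `scale_bbox`, the x scale comes from the original width and the y scale from the original height.

```python
def parse_label(label):
    """Split 'name conf xmin ymin xmax ymax ...' into a list of [xmin, ymin, xmax, ymax]."""
    tokens = label.split()
    boxes = []
    for i in range(0, len(tokens), 6):
        name, _conf, *coords = tokens[i:i + 6]
        if name == "opacity":  # 'none' rows carry only a dummy box
            boxes.append([float(c) for c in coords])
    return boxes

def scale_box(box, orig_w, orig_h, new_size=512):
    """Scale a box from the original resolution to a new_size x new_size image."""
    sx, sy = new_size / orig_w, new_size / orig_h
    xmin, ymin, xmax, ymax = box
    return [xmin * sx, ymin * sy, xmax * sx, ymax * sy]

boxes = parse_label("opacity 1 1000 500 2000 1500 opacity 1 100 200 300 400")
scaled = [scale_box(b, orig_w=4096, orig_h=2048) for b in boxes]
print(scaled[0])  # [125.0, 125.0, 250.0, 375.0]
```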

In [19]:
df

| | id | boxes | label | StudyInstanceUID | path | image_level | dim0 | dim1 | split |
|---|---|---|---|---|---|---|---|---|---|
| 0 | badf2d31cdbd | [{'x': 484.0363, 'y': 955.76643, 'width': 587.... | opacity 1 484.0363 955.76643 1071.75999 1657.7... | c508e7ff4063 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 2336 | 2836 | train |
| 1 | 8d95766f633e | [{'x': 1821.70335, 'y': 880.30619, 'width': 66... | opacity 1 1821.70335 880.30619 2485.89476 1875... | 2c78ef584129 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 2400 | 2880 | train |
| 2 | 1951ac929b71 | [{'x': 409.13877, 'y': 955.27207, 'width': 115... | opacity 1 409.13877 955.27207 1559.23612 3211.... | 24cd517d2846 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 3488 | 4256 | train |
| 3 | 69522a81a9b6 | [{'x': 1654.70833, 'y': 1096.9792, 'width': 78... | opacity 1 1654.70833 1096.9792 2434.94161 1975... | b22e2537daa0 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 2446 | 2630 | train |
| 4 | 98e895293a8c | [{'x': 2191.62551, 'y': 1077.50003, 'width': 1... | opacity 1 2191.62551 1077.50003 3729.32302 298... | 58072ae8b0f0 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 3480 | 4240 | train |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6329 | ceff5e389de0 | [{'x': 2140.2, 'y': 1286.25001, 'width': 733.6... | opacity 1 2140.2 1286.25001 2873.89995 2489.75... | 6fc05a848fc4 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 4240 | 3480 | valid |
| 6330 | 600343c20434 | [{'x': 367.64518, 'y': 1109.91869, 'width': 84... | opacity 1 367.64518 1109.91869 1208.78702 1673... | f6ea0674baa8 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 2991 | 2992 | valid |
| 6331 | d6a56e79a52d | [{'x': 509.73116, 'y': 595.55194, 'width': 440... | opacity 1 509.73116 595.55194 950.34624 1537.2... | 816ff8fa8f42 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 2320 | 2828 | valid |
| 6332 | 222f258d61f7 | [{'x': 1701.8041, 'y': 1190.48126, 'width': 72... | opacity 1 1701.8041 1190.48126 2430.85867 1990... | 8a0139211dd5 | input/siim-covid19-resized-to-512px-png/train/... | opacity | 2544 | 3056 | valid |


In [20]:
# Prepare the txt files for bounding box
# for i in tqdm(range(10)):
for i in tqdm(range(len(df))):
    row = df.loc[i]
    # Get image id
    img_id = row.id
    # Get split
    split = row.split
    # Get image-level label
    label = row.image_level
    
    if row.split=='train':
        file_name = f'tmp/covid/labels/train/{row.id}.txt'
    else:
        file_name = f'tmp/covid/labels/valid/{row.id}.txt'
        
    
    if label=='opacity':
        # Get bboxes
        bboxes = get_bbox(row)
        # Scale bounding boxes
        scale_bboxes = scale_bbox(row, bboxes)
        # Format for YOLOv5
        yolo_bboxes = get_yolo_format_bbox(IMG_SIZE, IMG_SIZE, scale_bboxes)
        
        with open(file_name, 'w') as f:
            for bbox in yolo_bboxes:
                # One row per box: class x_center y_center width height
                values = [CLASS_NUMBER] + bbox
                f.write(' '.join(str(v) for v in values) + '\n')

100%|██████████| 6334/6334 [00:01<00:00, 4659.04it/s]


# 🚅 Train with W&B



In [21]:
%cd tmp/yolov5/

/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection/tmp/yolov5


```
--img {IMG_SIZE} \ # Input image size
--batch {BATCH_SIZE} \ # Batch size
--epochs {EPOCHS} \ # Number of epochs
--data data.yaml \ # Dataset configuration file
--weights yolov5l.pt \ # Pretrained weights to start from
--save_period 1 \ # Save the model after this many epochs
--project kaggle-siim-covid \ # W&B project name
--single-cls # Train as a single-class dataset
```

In [22]:
!python train.py --img {IMG_SIZE} \
                 --batch {BATCH_SIZE} \
                 --epochs {EPOCHS} \
                 --data data.yaml \
                 --weights yolov5l.pt \
                 --save_period 1\
                 --project kaggle-siim-covid\
                 --single-cls


train: weights=yolov5l.pt, cfg=, data=data.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=200, batch_size=16, img_size=[512], rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache_images=False, image_weights=False, device=, multi_scale=False, single_cls=True, adam=False, sync_bn=False, workers=8, project=kaggle-siim-covid, entity=None, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=1, artifact_alias=latest, local_rank=-1
github: Command 'git fetch && git config --get remote.origin.url' timed out after 5 seconds, for updates see https://github.com/ultralytics/yolov5
YOLOv5 🚀 v5.0-294-gdd62e2d torch 1.9.0+cu102 CUDA:0 (TITAN Xp, 12194.0625MB)

hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, 

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



    52/199     6.82G   0.04303   0.01853         0   0.06156        25       512
               Class     Images     Labels          P          R     mAP@.5 mAP@
                 all       1267       1582      0.496       0.45      0.416      0.126
Saving model artifact on epoch  53

     Epoch   gpu_mem       box       obj       cls     total    labels  img_size
    53/199     6.82G   0.04291   0.01841         0   0.06133        36       512
               Class     Images     Labels          P          R     mAP@.5 mAP@
                 all       1267       1582      0.514      0.481       0.43      0.125
Saving model artifact on epoch  54

     Epoch   gpu_mem       box       obj       cls     total    labels  img_size
    54/199     6.82G   0.04266    0.0184         0   0.06106        36       512
               Class     Images     Labels          P          R     mAP@.5 mAP@
                 all       1267       1582      0.524      0.453      0.394      0.119
Saving model artifa




   102/199     6.82G   0.03195   0.01488         0   0.04683        28       512
               Class     Images     Labels          P          R     mAP@.5 mAP@
                 all       1267       1582      0.484      0.455      0.354     0.0999
Saving model artifact on epoch  103

     Epoch   gpu_mem       box       obj       cls     total    labels  img_size
   103/199     6.82G   0.03152   0.01477         0   0.04629        21       512
               Class     Images     Labels          P          R     mAP@.5 mAP@
                 all       1267       1582      0.459      0.447       0.35      0.103
Saving model artifact on epoch  104

     Epoch   gpu_mem       box       obj       cls     total    labels  img_size
   104/199     6.82G   0.03153   0.01464         0   0.04618        28       512
               Class     Images     Labels          P          R     mAP@.5 mAP@
                 all       1267       1582      0.514      0.416      0.363      0.104
Saving model arti




   152/199     6.82G   0.02238    0.0107         0   0.03308        25       512
               Class     Images     Labels          P          R     mAP@.5 mAP@
                 all       1267       1582      0.493      0.418      0.332      0.095
Saving model artifact on epoch  153

     Epoch   gpu_mem       box       obj       cls     total    labels  img_size
   153/199     6.82G   0.02251   0.01089         0    0.0334        24       512^C
   153/199     6.82G   0.02251   0.01089         0    0.0334        24       512
Traceback (most recent call last):
  File "/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection/tmp/yolov5/train.py", line 661, in <module>
    main(opt)
  File "/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection/tmp/yolov5/train.py", line 559, in main
    train(opt.hyp, opt, device)
  File "/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection/tmp/yolov5/train.py", line 339, in train
    loss, loss_items = compute_loss(pred, targets.to(device))  # loss s

## Model Saved Automatically as Artifact

Since it's a kernel-based competition, you can easily download the best model from the W&B Artifacts UI and upload it as a Kaggle dataset that you can load in your inference kernel (internet disabled).

### [Path to saved model $\rightarrow$](https://wandb.ai/ayush-thakur/kaggle-siim-covid/artifacts/model/run_jbt74n7q_model/4c3ca5752dba99bd227e)

![img](https://i.imgur.com/KhRLQvR.png)

> 📍 Download the model with the `best` alias tagged to it. 

# Inference

You will probably use a `Submission.ipynb` kernel to run all the predictions. After training a YOLOv5 based object detector -> head to the artifacts page and download the best model -> upload the model as a Kaggle dataset -> Use it with the submission folder. 

> 📍 Note that you might have to clone the YOLOv5 repository in a Kaggle dataset as well. 

In this section, I will show you how you can do the inference and modify the predicted bounding box coordinates.

In [22]:
TEST_PATH = '/kaggle/input/siim-covid19-resized-to-512px-jpg/test/' # absolute path

Since I am training the model in this kernel itself, I will not be using the method that I have described above. The best model is saved in the directory `project_name/exp*/weights/best.pt`. In `exp*`, * can be 1, 2, etc. 
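
Because the `exp*` suffix changes from run to run, hard-coding the path can point at stale weights. One possible way to pick the path programmatically is sketched below. `latest_best_weights` is a hypothetical helper, and choosing the most recently modified file is an assumption about which run you want.

```python
from pathlib import Path

def latest_best_weights(project_dir):
    """Return the most recently modified <project_dir>/exp*/weights/best.pt, or None."""
    candidates = list(Path(project_dir).glob("exp*/weights/best.pt"))
    if not candidates:
        return None
    # Modification time is used as a proxy for "the latest training run"
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

With the project name used above, this could be called as `latest_best_weights('kaggle-siim-covid')`.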

In [23]:
MODEL_PATH = 'kaggle-siim-covid/exp/weights/best.pt'

```
--weights {MODEL_PATH} \ # path to the best model.
--source {TEST_PATH} \ # absolute path to the test images.
--img {IMG_SIZE} \ # Size of image
--conf 0.281 \ # Confidence threshold (default is 0.25)
--iou-thres 0.5 \ # IOU threshold (default is 0.45)
--max-det 3 \ # Number of detections per image (default is 1000) 
--save-txt \ # Save predicted bounding box coordinates as txt files
--save-conf # Save the confidence of prediction for each bounding box
```

In [24]:
!python detect.py --weights {MODEL_PATH} \
                  --source {TEST_PATH} \
                  --img {IMG_SIZE} \
                  --conf 0.281 \
                  --iou-thres 0.5 \
                  --max-det 3 \
                  --save-txt \
                  --save-conf

detect: weights=['kaggle-siim-covid/exp/weights/best.pt'], source=/kaggle/input/siim-covid19-resized-to-512px-jpg/test/, imgsz=512, conf_thres=0.281, iou_thres=0.5, max_det=3, device=, view_img=False, save_txt=True, save_conf=True, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False
YOLOv5 🚀 v5.0-294-gdd62e2d torch 1.9.0+cu102 CUDA:0 (TITAN Xp, 12194.0625MB)

Traceback (most recent call last):
  File "/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection/tmp/yolov5/detect.py", line 228, in <module>
    main(opt)
  File "/home/keith/AA_jupyter_tuts/kaggle_SIIM_COVID_Detection/tmp/yolov5/detect.py", line 223, in main
    run(**vars(opt))
  File "/home/keith/anaconda3/envs/p39/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/ho

### How to find the confidence score?

1. First, visit the [W&B run page](https://wandb.ai/ayush-thakur/kaggle-siim-covid/runs/jbt74n7q) generated by training the YOLOv5 model. 

2. Go to the media panel -> click on the F1_curve.png file to get a rough estimate of the threshold -> go to the Bounding Box Debugger panel and interactively adjust the confidence threshold. 

![img](https://i.imgur.com/cCUnTBw.gif)
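
Reading a threshold off the F1 curve amounts to picking the confidence value where F1 peaks. A rough sketch of that idea follows. The precision and recall readings are invented for illustration and are not taken from the run above, and `best_f1_threshold` is a hypothetical helper.

```python
def best_f1_threshold(points):
    """points: iterable of (threshold, precision, recall). Return (threshold, f1) with the highest F1."""
    best = (None, -1.0)
    for thr, p, r in points:
        f1 = 0.0 if p + r == 0 else 2 * p * r / (p + r)
        if f1 > best[1]:
            best = (thr, f1)
    return best

# Hypothetical (threshold, precision, recall) readings, for illustration only.
readings = [(0.1, 0.30, 0.70), (0.3, 0.50, 0.50), (0.5, 0.70, 0.20)]
thr, f1 = best_f1_threshold(readings)
print(thr)  # 0.3
```

The value found this way is only a starting point. The interactive debugger described above is still useful for fine-tuning it.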

> 📍 The bounding box coordinates are saved as one text file per image, named after the image. The files are saved in the `runs/detect/exp3/labels` directory. 

In [25]:
PRED_PATH = 'runs/detect/exp3/labels'
!ls {PRED_PATH}

ls: cannot access 'runs/detect/exp3/labels': No such file or directory


In [26]:
# Visualize predicted coordinates.
%cat runs/detect/exp3/labels/ba91d37ee459.txt

cat: runs/detect/exp3/labels/ba91d37ee459.txt: No such file or directory


> 📍 Note: 1 is class id (opacity), the first four float numbers are `x_center`, `y_center`, `width` and `height`. The final float value is `confidence`.

In [27]:
prediction_files = os.listdir(PRED_PATH)
print('Number of test images predicted as opaque: ', len(prediction_files))

FileNotFoundError: [Errno 2] No such file or directory: 'runs/detect/exp3/labels'

> 📍 Out of 1263 test images, 583 were predicted with `opacity` label and thus we have that many prediction txt files.

# Submission

In this section, I will show how you can use YOLOv5 as an object detector and prepare the `submission.csv` file.

In [None]:
# The submission requires xmin, ymin, xmax, ymax format. 
# YOLOv5 returns x_center, y_center, width, height
def correct_bbox_format(bboxes):
    correct_bboxes = []
    for b in bboxes:
        xc, yc = int(np.round(b[0]*IMG_SIZE)), int(np.round(b[1]*IMG_SIZE))
        w, h = int(np.round(b[2]*IMG_SIZE)), int(np.round(b[3]*IMG_SIZE))

        xmin = xc - int(np.round(w/2))
        xmax = xc + int(np.round(w/2))
        ymin = yc - int(np.round(h/2))
        ymax = yc + int(np.round(h/2))
        
        correct_bboxes.append([xmin, ymin, xmax, ymax])
        
    return correct_bboxes

# Read the txt file generated by YOLOv5 during inference and extract 
# confidence and bounding box coordinates.
def get_conf_bboxes(file_path):
    confidence = []
    bboxes = []
    with open(file_path, 'r') as file:
        for line in file:
            preds = line.strip('\n').split(' ')
            preds = list(map(float, preds))
            confidence.append(preds[-1])
            bboxes.append(preds[1:-1])
    return confidence, bboxes
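
As a quick worked example of these two steps, the sketch below parses one prediction row and converts it to pixel corners in `xmin, ymin, xmax, ymax` order. The row is a made-up example in the `class x_center y_center width height confidence` layout, and `yolo_to_corners` is a hypothetical helper, not the function defined above.

```python
def yolo_to_corners(xc, yc, w, h, img_size=512):
    """Convert normalized (x_center, y_center, width, height) to pixel (xmin, ymin, xmax, ymax)."""
    xc, yc, w, h = (v * img_size for v in (xc, yc, w, h))
    return (round(xc - w / 2), round(yc - h / 2), round(xc + w / 2), round(yc + h / 2))

# Made-up prediction row: class, four box values, then confidence.
line = "0 0.5 0.25 0.5 0.25 0.83"
cls, *box, conf = map(float, line.split())
print(yolo_to_corners(*box))  # (128, 64, 384, 192)
```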

In [None]:
# Read the submission file
sub_df = pd.read_csv('/kaggle/input/siim-covid19-detection/sample_submission.csv')
sub_df.tail()

In [None]:
# Prediction loop for submission
predictions = []

for i in tqdm(range(len(sub_df))):
    row = sub_df.loc[i]
    id_name = row.id.split('_')[0]
    id_level = row.id.split('_')[-1]
    
    if id_level == 'study':
        # do study-level classification
        predictions.append("Negative 1 0 0 1 1") # dummy prediction
        
    elif id_level == 'image':
        # we can do image-level classification here.
        # also we can rely on the object detector's classification head.
        # for this example submission we will use YOLO's classification head. 
        # since we already ran the inference we know which test images belong to opacity.
        if f'{id_name}.txt' in prediction_files:
            # opacity label
            confidence, bboxes = get_conf_bboxes(f'{PRED_PATH}/{id_name}.txt')
            bboxes = correct_bbox_format(bboxes)
            pred_string = ''
            for j, conf in enumerate(confidence):
                pred_string += f'opacity {conf} ' + ' '.join(map(str, bboxes[j])) + ' '
            predictions.append(pred_string[:-1]) 
        else:
            predictions.append("None 1 0 0 1 1")
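
The string assembled for each `opacity` image in the loop above is a space-separated run of `opacity conf xmin ymin xmax ymax` groups. A compact sketch of that formatting step, using a hypothetical `format_prediction` helper and made-up detections, looks like this:

```python
def format_prediction(confidences, boxes):
    """Join detections as 'opacity conf xmin ymin xmax ymax' groups separated by spaces."""
    parts = []
    for conf, (xmin, ymin, xmax, ymax) in zip(confidences, boxes):
        parts.append(f"opacity {conf} {xmin} {ymin} {xmax} {ymax}")
    return " ".join(parts)

# Two made-up detections for a single image.
print(format_prediction([0.83, 0.4], [(128, 64, 384, 192), (10, 20, 30, 40)]))
# opacity 0.83 128 64 384 192 opacity 0.4 10 20 30 40
```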

In [None]:
sub_df['PredictionString'] = predictions
sub_df.to_csv('submission.csv', index=False)
sub_df.tail()

# Get the best Weights from YOLOv5 (see wandb artifacts best_run)

In [None]:
import wandb
run = wandb.init()
artifact = run.use_artifact('kperkins411/kaggle-siim-covid/run_3bh5hck7_model:v199', type='model')
artifact_dir = artifact.download()


In [None]:
artifact_dir