## I. Introduction

In **YOLOV3**, ground-truth **bounding boxes** for the objects in the training images are a key input.  **Anchor boxes** are a key part of the YOLOV3 model configuration, and a well-chosen set can set YOLOV3 up for efficient training.  Anchor boxes are defined at the resolution chosen for training inputs.  As an example, RSNA images are **1024x1024**, but possible YOLOV3 training dimensions are **416x416**, **512x512**, and **608x608** (notice these are all multiples of 32).  During training, YOLOV3 learns offsets relative to the anchor box that most closely matches each ground truth bounding box.  So, while any reasonable set of anchor boxes can be adequate for model convergence using YOLOV3, this kernel analyzes the RSNA Stage 2 training inputs with the goal of choosing anchor boxes that may lead to more efficient training convergence.
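
To make the resolution arithmetic concrete, here is a minimal sketch.  It assumes the usual YOLOV3 strides of 32, 16, and 8 for the three prediction scales; the helper names `grid_sizes` and `scale_box` are made up for illustration and are not part of darknet.

```python
# Illustrative helpers (hypothetical names, not part of darknet).
DICOM_IMAGE_SIZE = 1024          # RSNA image width/height in pixels
YOLOV3_STRIDES = (32, 16, 8)     # assumed downsampling factors of the 3 scales


def grid_sizes(input_size, strides=YOLOV3_STRIDES):
    """Return the number of grid cells per side at each prediction scale."""
    if input_size % 32 != 0:
        raise ValueError("YOLOV3 input size should be a multiple of 32")
    return [input_size // s for s in strides]


def scale_box(bw, bh, input_size, source_size=DICOM_IMAGE_SIZE):
    """Rescale a box width/height from the source image to the training size."""
    factor = input_size / source_size
    return bw * factor, bh * factor


# Grid sizes and a rescaled 200x400 box for each candidate training size.
for size in (416, 512, 608):
    print(size, grid_sizes(size), scale_box(200, 400, size))
```

Anchor boxes live in the same rescaled coordinate system, which is why the box dimensions are rescaled before any clustering.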

V2 was a stable version with an input image size (defined by **YOLOV3_SIZE**) of **608x608**.  V3 analyzes anchor boxes at a YOLOV3_SIZE of **512** (that is, **512x512**).

Comments welcome!

References: [YOLOV3 Paper](https://pjreddie.com/media/files/papers/YOLOv3.pdf), [YOLOV3 Web Site](https://pjreddie.com/darknet/yolo/)

In [None]:
import numpy as np
import pandas as pd

from sklearn.cluster import MiniBatchKMeans
import matplotlib.pyplot as plt

In [None]:
# clone darknet
!git clone https://github.com/pjreddie/darknet

In [None]:
# let's look at the default anchor boxes in yolov3-tiny.cfg (6 anchor boxes)
# and yolov3.cfg (9 anchor boxes) and the associated input image sizes
!cp darknet/cfg/yolov3-tiny.cfg .
!grep -E 'width|height|anchors' yolov3-tiny.cfg

print ("---------")

!cp darknet/cfg/yolov3.cfg .
!grep -E 'width|height|anchors' yolov3.cfg

In [None]:
# so the default configuration of yolov3-tiny is 416x416, and the one for yolov3 is 608x608.
# yolov3 uses three anchor boxes per 'scale'.  Predictions are first made on the coarsest
# feature map (input size / 32).  The network then upsamples its feature maps by 2x and
# makes predictions again on the finer grid; the finer grids help detect smaller objects.
# yolov3-tiny has 2 scales (6 anchor boxes), and yolov3 has 3 scales (9 anchor boxes).
# Analysis Choices:
# V2: input width and height of 608 (19x19 cells at the coarsest scale) for both YOLOV3 Tiny and YOLOV3
# V3: input width and height of 512 (16x16 cells at the coarsest scale) for both YOLOV3 Tiny and YOLOV3
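
As a rough sketch of how anchors map to scales: in the stock yolov3.cfg, each `[yolo]` layer selects three of the nine anchors through its `mask` setting, with the largest anchors going to the coarsest grid.  The snippet below assumes that convention and uses the default yolov3.cfg anchors; `split_anchors_by_scale` is a hypothetical helper, not a darknet function.

```python
# Default anchors from yolov3.cfg, as (width, height) pairs.
DEFAULT_ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                   (59, 119), (116, 90), (156, 198), (373, 326)]


def split_anchors_by_scale(anchors, per_scale=3):
    """Sort anchors by area and group them, largest group first.

    Assumption: the largest anchors belong to the coarsest grid
    (stride 32), mirroring mask = 6,7,8 / 3,4,5 / 0,1,2 in yolov3.cfg.
    """
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    groups = [ordered[i:i + per_scale] for i in range(0, len(ordered), per_scale)]
    return groups[::-1]  # coarsest scale (largest anchors) first


for stride, group in zip((32, 16, 8), split_anchors_by_scale(DEFAULT_ANCHORS)):
    print("stride", stride, "->", group)
```

Because the masks index anchors by position, a list sorted by width alone can differ from an area-sorted list, so it may be worth checking the order before pasting anchors into a cfg file.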

In [None]:
# cleanup darknet download
!rm -rf darknet

In [None]:
# global variables
TRAIN_LABELS_CSV_FILE="../input/stage_2_train_labels.csv"
# pedantic nit: we are changing 'Target' to 'label' on the way in
TRAIN_LABELS_CSV_COLUMN_NAMES=['patientId', 'x1', 'y1', 'bw', 'bh', 'label']

DICOM_IMAGE_SIZE=1024
YOLOV3_SIZE=512

In [None]:
# read RSNA TRAIN_LABELS_CSV_FILE into a pandas dataframe
labelsbboxdf = pd.read_csv(TRAIN_LABELS_CSV_FILE,
                           names=TRAIN_LABELS_CSV_COLUMN_NAMES,
                           # skip the header line
                           header=0,
                           # index the dataframe on patientId
                           index_col='patientId')

labelsbboxdf.head(10)

In [None]:
# drop all fields except the bounding box dimensions and
# all rows except the Lung Opacity ones (rows without a box have NaN dimensions)
yolov3bboxesdf=labelsbboxdf[['bw', 'bh']].dropna()
yolov3bboxesdf.head(10)

In [None]:
# resize bounding boxes for YOLOV3_SIZE
yolov3bboxesdf=yolov3bboxesdf*(YOLOV3_SIZE/DICOM_IMAGE_SIZE)
yolov3bboxesdf.head(10)

In [None]:
# as reference, below are the vitals on bounding boxes at DICOM_IMAGE_SIZE
labelsbboxdf[['bw', 'bh']].describe(percentiles=[0.25, 0.5, 0.75, .95])

In [None]:
# below are the vitals on bounding boxes at chosen input size of YOLOV3_SIZE
yolov3bboxesdf.describe(percentiles=[0.25, 0.5, 0.75, 0.85, .95])

In [None]:
# we could hand-craft the following anchor boxes:
# ~<min, ~<25%, ~<50%, ~<75%, ~<85%, ~<95% and have a 6 anchor box set (for yolov3 tiny)
# anchors are comma-separated width,height pairs, as in the darknet cfg files
!printf '10,15,  75,75,  100,125,  100,175,  125,225,  150,275\n' > rsna-yolov3-manual-tiny-anchors.txt

In [None]:
!cat rsna-yolov3-manual-tiny-anchors.txt

In [None]:
# let's see what kmeans analysis gives us

In [None]:
# convert to numpy array
bboxarray=np.array(yolov3bboxesdf)

print (bboxarray.shape)
print (bboxarray)

In [None]:
# fit to 6 kmeans clusters (for yolov3 tiny)
kmeans=MiniBatchKMeans(n_clusters=6, verbose=1)
colors=['b.', 'g.', 'r.', 'c.', 'm.', 'y.',  'k.']
kmeans.fit(bboxarray)
centroids=kmeans.cluster_centers_
labels=kmeans.labels_

print (centroids.shape)
print (centroids)
print (labels.shape)
print (labels)
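
MiniBatchKMeans clusters with Euclidean distance on (bw, bh), so errors on large boxes weigh more than errors on small ones.  The YOLO papers describe clustering box dimensions with the distance d = 1 - IoU instead.  Below is a small pure-Python sketch of that idea, run on made-up boxes rather than the RSNA data.  `wh_iou` and `iou_kmeans` are illustrative names, and the mean is used as the centroid update here (some implementations use the median).

```python
import random


def wh_iou(box, anchor):
    """IoU of two boxes given as (width, height), aligned at a common corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def iou_kmeans(boxes, k, iterations=50, seed=0):
    """Cluster (width, height) pairs using 1 - IoU as the distance."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # start from k randomly chosen boxes
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # highest IoU is the same as smallest (1 - IoU) distance
            best = max(range(k), key=lambda j: wh_iou(box, centroids[j]))
            clusters[best].append(box)
        updated = []
        for j in range(k):
            members = clusters[j]
            if members:
                mean_w = sum(b[0] for b in members) / len(members)
                mean_h = sum(b[1] for b in members) / len(members)
                updated.append((mean_w, mean_h))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # assignments no longer change the centroids
        centroids = updated
    return sorted(centroids, key=lambda c: c[0] * c[1])  # smallest area first


# Made-up boxes for illustration only (not RSNA data).
demo_boxes = [(20, 30), (22, 28), (18, 33), (100, 120), (95, 130),
              (110, 115), (150, 280), (140, 300), (160, 270)]
demo_anchors = iou_kmeans(demo_boxes, k=3)
print([(round(w), round(h)) for w, h in demo_anchors])
```

To try this on the real data, the rows of `bboxarray` could be passed in as a list of (bw, bh) tuples; the result depends on the random seed, much like MiniBatchKMeans.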

In [None]:
# view computed centroids to bounding box dimensions' scatterplot
plt.figure(figsize=(10,10))
for i in range(len(bboxarray)):
    plt.plot(bboxarray[i][0], bboxarray[i][1], colors[labels[i]], markersize=10)   
plt.scatter(centroids[:,0], centroids[:,1], marker="x", s=150, linewidth=5, zorder=10)
plt.show()

In [None]:
# post process centroids
anchors=np.around(centroids)
print (len(anchors))
print (anchors)
print ("---------")
# np.lexsort treats the LAST key as the primary sort key: sort by width, then by height
ind = np.lexsort((anchors[:,1], anchors[:,0]))
#print (ind)
sortedanchors=anchors[ind]
print(sortedanchors)

In [None]:
# write anchor boxes to file
# organize anchor boxes in YOLOV3 format: comma-separated width,height pairs on one line
anchorrecord=",  ".join("{},{}".format(int(w), int(h)) for w, h in sortedanchors) + "\n"

print (anchorrecord)

# save anchor box specification to file
savedanchorsfilename='rsna-yolov3-kmeans-tiny-anchors.txt'
# the with-block closes the file on exit, so no explicit close() is needed
with open(savedanchorsfilename,'w') as anchorsfile:
    anchorsfile.write(anchorrecord)

In [None]:
!cat rsna-yolov3-kmeans-tiny-anchors.txt
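
A missing comma in an anchors line is easy to overlook, so a quick round-trip check can help.  This sketch assumes the cfg format of comma-separated integers read as consecutive width,height pairs; `format_anchors` and `parse_anchors` are hypothetical helpers, not part of darknet.

```python
def format_anchors(anchors):
    """Render (width, height) pairs as a darknet-style anchors string."""
    return ",  ".join("{},{}".format(int(w), int(h)) for w, h in anchors)


def parse_anchors(record):
    """Parse a comma-separated anchors string back into (width, height) pairs.

    int() raises ValueError on a token such as '75 100', which is what a
    missing comma between two pairs produces.
    """
    values = [int(token) for token in record.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values, got %d" % len(values))
    return list(zip(values[0::2], values[1::2]))


example_anchors = [(10, 13), (75, 75), (100, 125)]
example_record = format_anchors(example_anchors)
print(example_record)
print(parse_anchors(example_record) == example_anchors)
```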

In [None]:
# fit to 9 kmeans clusters
kmeans=MiniBatchKMeans(n_clusters=9, verbose=1)
colors=['b.', 'g.', 'r.', 'c.', 'm.', 'y.', 'k.', 'w.', 'b.']
kmeans.fit(bboxarray)
centroids=kmeans.cluster_centers_
labels=kmeans.labels_

print (centroids.shape)
print (centroids)
print (labels.shape)
print (labels)

In [None]:
# view computed centroids to bounding box dimensions' scatterplot
# color points by cluster label via a colormap: the single-letter color list above
# includes white ('w.', invisible on a white background) and repeats blue
plt.figure(figsize=(10,10))
plt.scatter(bboxarray[:,0], bboxarray[:,1], c=labels, cmap='tab10', s=25)
plt.scatter(centroids[:,0], centroids[:,1], marker="x", s=150, linewidth=5, zorder=10)
plt.show()

In [None]:
# post process centroids
anchors=np.around(centroids)
print (len(anchors))
print (anchors)
print ("---------")
# np.lexsort treats the LAST key as the primary sort key: sort by width, then by height
ind = np.lexsort((anchors[:,1], anchors[:,0]))
#print (ind)
sortedanchors=anchors[ind]
print(sortedanchors)

In [None]:
# write anchor boxes to file
# organize anchor boxes in YOLOV3 format: comma-separated width,height pairs on one line
anchorrecord=",  ".join("{},{}".format(int(w), int(h)) for w, h in sortedanchors) + "\n"

print (anchorrecord)

# save anchor box specification to file
savedanchorsfilename='rsna-yolov3-kmeans-anchors.txt'
# the with-block closes the file on exit, so no explicit close() is needed
with open(savedanchorsfilename,'w') as anchorsfile:
    anchorsfile.write(anchorrecord)

In [None]:
!cat rsna-yolov3-kmeans-anchors.txt

In [None]:
# everything together
# print default yolov3-tiny anchors
!grep anchors yolov3-tiny.cfg
# print hand-crafted yolov3 tiny anchors
!cat rsna-yolov3-manual-tiny-anchors.txt
# print kmeans suggested yolov3 tiny anchors
!cat rsna-yolov3-kmeans-tiny-anchors.txt
# print default yolov3 anchors
!grep anchors yolov3.cfg
# print kmeans suggested yolov3 anchors
!cat rsna-yolov3-kmeans-anchors.txt


## II.  Conclusion

1.  The default anchors will work but are likely to require significantly more training steps to converge.
2.  The hand-crafted anchor boxes for YOLOV3 Tiny appear better suited than what k-means gives us.  But we could do better by using the smallest default anchor box of **10,13** (YOLOV3 default) as the smallest anchor, just in case the test cases have very small **Lung Opacity** areas.  Beyond that, the hand-crafted anchor boxes for YOLOV3 Tiny look adequate.
3.  For YOLOV3, again we could do better with **10,13** as the smallest anchor box.  We could use the rest of the k-means proposed anchor boxes, dropping the largest one and fine-tuning the one before that if needed.  With three anchor boxes per scale, most bounding boxes should have a close enough anchor to converge efficiently.
4.  **With the choice of input image sizes**, proceeding with the following anchor box sets appears to be a good choice for RSNA Stage 2 training data [**NOTE**: k-means starts from a random initialization, so your results may vary slightly on subsequent runs]:
     
     **V3 (Input Image Size=512x512)**:
     
         YOLOV3-Tiny: 10,13, 75,75, 100,125, 100,175, 125,225, 150,275
     
         YOLOV3: 10,13, 74,63, 81,103, 96,146, 112,206, 122,251, 123,102, 136,165, 143,287
     
     **V2 (Input Image Size=608x608)**:
     
         YOLOV3-Tiny: 10,13, 100,100, 125,175, 150,250, 160,300, 175,325
     
         YOLOV3: 10,13, 80,76, 107,114, 114,169, 117,216, 130,267, 155,146, 157,326, 170,340

5.  The analysis makes us much more familiar with the data and helps us correlate it with training progress.
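
One way to put a number on comparisons like those in the list above is the mean best IoU: for each ground-truth box, take the IoU with its best-matching anchor (comparing widths and heights only) and average over all boxes.  The sketch below assumes that metric and uses a few made-up boxes in place of the RSNA data; `wh_iou` and `mean_best_iou` are illustrative helpers.

```python
def wh_iou(box, anchor):
    """IoU of two boxes given as (width, height), aligned at a common corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def mean_best_iou(boxes, anchors):
    """Average, over all boxes, of the IoU with the best-matching anchor."""
    return sum(max(wh_iou(b, a) for a in anchors) for b in boxes) / len(boxes)


# Made-up (bw, bh) boxes at 512x512, for illustration only.
# Swap in the rescaled RSNA boxes to compare anchor sets for real.
demo_boxes = [(90, 140), (120, 210), (75, 70), (130, 260), (105, 120)]

# Hand-crafted YOLOV3-Tiny anchors from this kernel (V3, 512x512).
tiny_manual = [(10, 13), (75, 75), (100, 125), (100, 175), (125, 225), (150, 275)]
# Default yolov3-tiny.cfg anchors (defined for a 416x416 input).
tiny_default = [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319)]

print("hand-crafted:", round(mean_best_iou(demo_boxes, tiny_manual), 3))
print("default:     ", round(mean_best_iou(demo_boxes, tiny_default), 3))
```

A higher value means the anchors start closer to the boxes they have to fit; the numbers printed here only describe the made-up boxes, not the RSNA training set.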