# Mapillary speed limit traffic signs cleanup

In [1]:
%load_ext autoreload
%autoreload 2
from fastai.vision.all import *

In [2]:
from MLfix import MLfix

## Load crops and labels into a Pandas DataFrame

In [3]:
path = Path('yolo-bbox-crops-aspects-traffic-signs')

In [4]:
fnames = get_image_files(path)

We'll create the DataFrame with the file names as the index and put the folder name in the `label` column, since we grouped the crops into folders by their ground truth class.

In [7]:
data = pd.DataFrame(dict(fname = fnames), index = [str(x) for x in fnames])
data['label'] = data.fname.map(lambda x: x.parent.name)
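As a minimal sketch of what the `lambda` above does for a single file: `Path.parent.name` returns the name of the directory that directly contains the file, which in this folder layout is the class name. The example uses only the standard library (the notebook itself gets `Path` through the fastai import), and the path is one of the rows shown in the table.

```python
from pathlib import Path

# One crop path, following the folder-per-class layout used in this notebook
fname = Path('yolo-bbox-crops-aspects-traffic-signs/Stop/004193-0.jpg')

# The parent directory name doubles as the ground truth label
label = fname.parent.name
print(label)
```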

In [8]:
data

Unnamed: 0,fname,label
yolo-bbox-crops-aspects-traffic-signs/Stop/004193-0.jpg,yolo-bbox-crops-aspects-traffic-signs/Stop/004193-0.jpg,Stop
yolo-bbox-crops-aspects-traffic-signs/Stop/003692-0.jpg,yolo-bbox-crops-aspects-traffic-signs/Stop/003692-0.jpg,Stop
yolo-bbox-crops-aspects-traffic-signs/Stop/003677-0.jpg,yolo-bbox-crops-aspects-traffic-signs/Stop/003677-0.jpg,Stop
yolo-bbox-crops-aspects-traffic-signs/Stop/003663-0.jpg,yolo-bbox-crops-aspects-traffic-signs/Stop/003663-0.jpg,Stop
yolo-bbox-crops-aspects-traffic-signs/Stop/003580-1.jpg,yolo-bbox-crops-aspects-traffic-signs/Stop/003580-1.jpg,Stop
...,...,...
yolo-bbox-crops-aspects-traffic-signs/Lane-Reduce-Left/001599-0.jpg,yolo-bbox-crops-aspects-traffic-signs/Lane-Reduce-Left/001599-0.jpg,Lane-Reduce-Left
yolo-bbox-crops-aspects-traffic-signs/Lane-Reduce-Left/002054-0.jpg,yolo-bbox-crops-aspects-traffic-signs/Lane-Reduce-Left/002054-0.jpg,Lane-Reduce-Left
yolo-bbox-crops-aspects-traffic-signs/Lane-Reduce-Left/002340-0.jpg,yolo-bbox-crops-aspects-traffic-signs/Lane-Reduce-Left/002340-0.jpg,Lane-Reduce-Left
yolo-bbox-crops-aspects-traffic-signs/Lane-Reduce-Left/002727-0.jpg,yolo-bbox-crops-aspects-traffic-signs/Lane-Reduce-Left/002727-0.jpg,Lane-Reduce-Left


## Find all the signs with "speed" in the label

In [9]:
speed_limits = data.copy()
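Note that the cell above keeps every class. If you only want the rows whose label mentions "speed", one possible filter is sketched below on a toy DataFrame. The labels are illustrative stand-ins, and the case-insensitive match is an assumption about how the class names are spelled.

```python
import pandas as pd

# Toy stand-in for the `data` DataFrame; labels are illustrative only
data = pd.DataFrame(
    {'label': ['Stop', 'Speed-Limit-30', 'Lane-Reduce-Left', 'Speed-Limit-50']},
    index=['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg'])

# Boolean mask: True where the label contains "speed", ignoring case
mask = data['label'].str.contains('speed', case=False)

# Copy the selection so later edits do not touch the original frame
speed_only = data[mask].copy()
print(speed_only)
```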

In [10]:
# To skip the manual review, you can load the results of the analysis we already did
invalid = pd.read_csv('invalid-traffic-signs.csv', index_col=0)
speed_limits['new_label'] = speed_limits['label']
speed_limits.loc[invalid.index, 'new_label'] = 'invalid'

In [11]:
speed_limits.head()

Unnamed: 0,fname,label,new_label
yolo-bbox-crops-aspects-traffic-signs/Stop/004193-0.jpg,yolo-bbox-crops-aspects-traffic-signs/Stop/004193-0.jpg,Stop,Stop
yolo-bbox-crops-aspects-traffic-signs/Stop/003692-0.jpg,yolo-bbox-crops-aspects-traffic-signs/Stop/003692-0.jpg,Stop,Stop
yolo-bbox-crops-aspects-traffic-signs/Stop/003677-0.jpg,yolo-bbox-crops-aspects-traffic-signs/Stop/003677-0.jpg,Stop,Stop
yolo-bbox-crops-aspects-traffic-signs/Stop/003663-0.jpg,yolo-bbox-crops-aspects-traffic-signs/Stop/003663-0.jpg,Stop,Stop
yolo-bbox-crops-aspects-traffic-signs/Stop/003580-1.jpg,yolo-bbox-crops-aspects-traffic-signs/Stop/003580-1.jpg,Stop,Stop


## Run MLfix

We can use the DataFrame we created to start an MLfix session. The photos are grouped by the `label` column (the `group` keyword argument), which holds the original ground truth labels. The fixed label from the `new_label` column (the `label` keyword argument) is displayed below each image, and the images are shown slightly enlarged at 200px (the `size` keyword argument).

We also give the paths to the class SVG icons so that MLfix can display them instead of the class names, which are not very readable.

In the UI you can select the invalid crops by clicking on them. The top bar also offers filters; for example, you can show only the invalid crops.

The MLfix call returns a new Pandas Series that's going to be updated as we select the invalid samples in the UI. We assign it to a variable so we will be able to inspect the results later on.

FIXME: we cannot assign to a new column because that seems to break the reference and assign a copy instead (which won't be updated when we use the UI). At one point the code just updated the column pointed to by the `label=` kwarg and we should probably get back to doing that.
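The copy behaviour described in the FIXME can be reproduced with plain pandas. In this minimal sketch, `live` is a hypothetical stand-in for the Series that MLfix returns, and the `.loc` assignment imitates a click in the UI. It is not MLfix code.

```python
import pandas as pd

df = pd.DataFrame({'label': ['Stop', 'Speed-Limit-30']},
                  index=['a.jpg', 'b.jpg'])

# Stand-in for the Series returned by MLfix (hypothetical)
live = df['label'].copy()

# Assigning the Series to a column: the column does not track later changes to `live`
df['new_label'] = live

# Imitate a UI update that marks one crop as invalid
live.loc['b.jpg'] = 'invalid'

print(live['b.jpg'])                # the Series sees the update
print(df.loc['b.jpg', 'new_label']) # the DataFrame column does not
```

This is why the notebook keeps the returned Series in its own variable instead of storing it as a new column.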

In [12]:
new_labels = MLfix(
    speed_limits,
    group='label',
    label='new_label',
    size=200)




yolo-bbox-crops-aspects-traffic-signs/Speed-Limit-30/000792-1.jpg label = invalid
yolo-bbox-crops-aspects-traffic-signs/Speed-Limit-30/000818-0.jpg label = invalid
yolo-bbox-crops-aspects-traffic-signs/Speed-Limit-30/000215-0.jpg label = invalid
yolo-bbox-crops-aspects-traffic-signs/Speed-Limit-30/000215-0.jpg label = Speed-Limit-30
yolo-bbox-crops-aspects-traffic-signs/Speed-Limit-30/000215-0.jpg label = invalid
yolo-bbox-crops-aspects-traffic-signs/Speed-Limit-30/000869-0.jpg label = invalid
yolo-bbox-crops-aspects-traffic-signs/Speed-Limit-30/000477-0.jpg label = invalid
yolo-bbox-crops-aspects-traffic-signs/Speed-Limit-30/000483-0.jpg label = invalid
yolo-bbox-crops-aspects-traffic-signs/Speed-Limit-30/000483-0.jpg label = Speed-Limit-30
yolo-bbox-crops-aspects-traffic-signs/Speed-Limit-30/000483-0.jpg label = invalid


Let's see how many mistakes we managed to find:

In [16]:
print(f"{(speed_limits['label'] != new_labels).mean() * 100:.2f} %")

2.34 %
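The percentage above comes from comparing the two label Series element by element and averaging the resulting booleans, where `True` counts as 1. A minimal sketch on made-up labels (the four values below are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy original and corrected labels, aligned on the same default index
old = pd.Series(['Stop', 'Speed-Limit-30', 'Speed-Limit-30', 'Stop'])
new = pd.Series(['Stop', 'invalid', 'Speed-Limit-30', 'Stop'])

# Element-wise comparison gives a boolean Series
changed = old != new

# The mean of booleans is the fraction of True values
pct = changed.mean() * 100
print(f"{pct:.2f} %")
```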


We can now update the CSV file and save our work:

In [18]:
new_labels[new_labels == 'invalid'].to_csv('invalid-traffic-signs.csv')
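To check that the saved file can be loaded again in the same way as at the top of this notebook, here is a small round-trip sketch. It writes to an in-memory buffer instead of `invalid-traffic-signs.csv`, and the file names and labels are made up for the example.

```python
import io
import pandas as pd

# Toy corrected labels; index values are hypothetical file names
new_labels = pd.Series(['Stop', 'invalid'],
                       index=['a.jpg', 'b.jpg'], name='new_label')

# Save only the rows marked invalid (in-memory buffer instead of a file)
buf = io.StringIO()
new_labels[new_labels == 'invalid'].to_csv(buf)
buf.seek(0)

# Reload and re-apply the marks to a fresh copy of the original labels
invalid = pd.read_csv(buf, index_col=0)
labels = pd.DataFrame({'label': ['Stop', 'Speed-Limit-30']},
                      index=['a.jpg', 'b.jpg'])
labels['new_label'] = labels['label']
labels.loc[invalid.index, 'new_label'] = 'invalid'
print(labels)
```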
