We are asked to create a Jupyter notebook to train an image segmentation model for damage detection based on the given training data. The following repository provides an implementation of Mask R-CNN.
Mask R-CNN is basically an extension of Faster R-CNN. Faster R-CNN is widely used for object detection tasks. For a given image, it returns the class label and bounding box coordinates for each object in the image. Faster R-CNN:
- first uses a ConvNet to extract feature maps from the images;
- these feature maps are then passed through a Region Proposal Network (RPN) which returns the candidate bounding boxes;
- we then apply a Region of Interest (RoI) pooling layer on these candidate bounding boxes to bring all the candidates to the same size;
- and finally, the proposals are passed to a fully connected layer to classify and output the bounding boxes for objects.
On top of this pipeline, Mask R-CNN also generates a segmentation mask. Once the RoIs have been selected based on their IoU (Intersection over Union) values, a mask branch is added to the existing architecture. This branch returns the segmentation mask for each region that contains an object. In other words, a mask is predicted for every object in the image.
Mask R-CNN can be thought of as a combination of a Faster R-CNN, which detects objects (class + bounding box), and a Fully Convolutional Network (FCN), which predicts pixel-wise object boundaries.
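Since RoIs are selected by their IoU values, a minimal sketch of that computation may help. The function name `box_iou` and the `(y1, x1, y2, x2)` box layout are assumptions made for illustration, not code taken from this repository.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two boxes given as (y1, x1, y2, x2).

    The box layout is an assumption for this sketch.
    """
    # Corners of the overlapping rectangle (if any).
    y1 = max(box_a[0], box_b[0])
    x1 = max(box_a[1], box_b[1])
    y2 = min(box_a[2], box_b[2])
    x2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    intersection = max(0, y2 - y1) * max(0, x2 - x1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A proposal is typically kept as a region of interest when its IoU with an annotated box exceeds a chosen threshold; the threshold itself is a tunable setting.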
We have 64 images (49 for training and 15 for validation) of damaged vehicles, already annotated with the proper polygon masks.
To train the Mask R-CNN model, we defined a custom Dataset class tailored to our case, overriding some functions to load the images and annotations and add them to the new dataset class. To evaluate our model, two additional functions apply a color splash effect to the damaged areas of the image once they have been detected by the trained model.
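The two ideas above, turning a polygon annotation into a pixel mask and applying a color splash, can be sketched in plain Python. This is an illustrative sketch only: the helper names `polygon_to_mask` and `color_splash`, the vertex-list input, and the simple channel-average grayscale are all assumptions, and the repository's own functions may work differently (for example by relying on an image library).

```python
def polygon_to_mask(xs, ys, height, width):
    """Rasterize one polygon into a height x width boolean mask.

    xs, ys are the polygon vertex coordinates (assumed input layout).
    Each pixel is tested at its centre with the even-odd ray-casting rule.
    """
    n = len(xs)
    mask = [[False] * width for _ in range(height)]
    for row in range(height):
        py = row + 0.5
        for col in range(width):
            px = col + 0.5
            inside = False
            j = n - 1
            for i in range(n):
                # Does the edge j -> i cross the horizontal ray from (px, py)?
                if (ys[i] > py) != (ys[j] > py):
                    x_cross = xs[i] + (py - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])
                    if px < x_cross:
                        inside = not inside
                j = i
            mask[row][col] = inside
    return mask


def color_splash(image, mask):
    """Keep the original color where mask is True, gray out the rest.

    image is a nested list of (r, g, b) tuples; gray is the integer
    average of the three channels (an assumption for this sketch).
    """
    result = []
    for image_row, mask_row in zip(image, mask):
        out_row = []
        for (r, g, b), keep in zip(image_row, mask_row):
            if keep:
                out_row.append((r, g, b))
            else:
                gray = (r + g + b) // 3
                out_row.append((gray, gray, gray))
        result.append(out_row)
    return result


if __name__ == "__main__":
    # A 4x4 image of a single color with a square "damage" polygon in the middle.
    image = [[(30, 60, 90)] * 4 for _ in range(4)]
    mask = polygon_to_mask([1, 3, 3, 1], [1, 1, 3, 3], height=4, width=4)
    splash = color_splash(image, mask)
    print(splash[0][0], splash[1][1])
```

Pixels inside the annotated polygon keep their color, so detected damage stands out against a grayscale vehicle.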
To reduce the training time of Mask R-CNN, we have decided to:
- train the model for only 5 epochs;
- reduce the number of steps per epoch to 50;
- change the backbone network to resnet50;
- exploit weights pre-trained on the COCO dataset, and train only the heads of the model.
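The choices listed above could be collected in one place, roughly as sketched below. The class name `DamageConfig` and the `NAME` value are hypothetical. The attribute names `BACKBONE` and `STEPS_PER_EPOCH` follow the Matterport `mrcnn.config.Config` class referenced at the end of this document, which the real configuration would presumably subclass; the sketch uses a plain class so that it runs without the library.

```python
class DamageConfig:
    """Stand-in for a subclass of mrcnn.config.Config (assumption)."""

    NAME = "damage"          # hypothetical experiment name
    BACKBONE = "resnet50"    # instead of the heavier ResNet-101 backbone
    STEPS_PER_EPOCH = 50     # reduced number of steps per epoch


EPOCHS = 5                   # train for only 5 epochs
TRAINABLE_LAYERS = "heads"   # keep the COCO-pretrained backbone frozen

# Total number of training steps implied by these settings.
total_steps = EPOCHS * DamageConfig.STEPS_PER_EPOCH
print(total_steps)

# With the Matterport library installed, training would look roughly like
# this (model, datasets and config object are assumed to exist):
#
#   model.load_weights("mask_rcnn_coco.h5", by_name=True,
#                      exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
#                               "mrcnn_bbox", "mrcnn_mask"])
#   model.train(dataset_train, dataset_val,
#               learning_rate=config.LEARNING_RATE,
#               epochs=EPOCHS, layers=TRAINABLE_LAYERS)
```

With these values the heads see only 250 training steps in total, which gives a sense of how short this training schedule is.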
The necessary requirements are specified in requirements.txt. To run the following program on your own machine:
- Clone this repository
- Install dependencies
pip3 install -r requirements.txt
- Run setup from the repository root directory
python3 setup.py install
- Download pre-trained weights on COCO dataset
wget "https://github.com/matterport/Mask_RCNN/releases/download/v1.0/mask_rcnn_coco.h5"
Training, evaluation, and some analyses of the dataset and of the final model weights are presented in the Jupyter notebook train_and_evaluate.ipynb.
Some damage has been correctly identified on the right front wheel, but a major scratch on the lowest part of the frontal car body has not been detected.
No damage has been detected, even though it was easy to spot.
The major drawback of this model is that it requires a lot of time to be properly trained. The performance of our model shows that the assumptions made in the Problem section to reduce the training time were a bit too optimistic. The fact that no damage was identified in the second picture may be partially explained by the difference between the background color and the vehicle damage being smaller than in the first test image: this could result in gradients of lower magnitude, making convergence of gradient descent harder to reach (especially when combined with a small number of epochs and steps per epoch). Another assumption that may have compromised the results is the choice of backbone network (ResNet-50 instead of ResNet-101). Once the model is trained, we can see that the bn_conv1 weights hit overflow; however, the resnet50 pre-trained weights already overflow before training, so this may be a source of the poor performance.
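One way to make the overflow observation checkable is to scan a layer's weights for non-finite or unusually large values. The sketch below is only an illustration under stated assumptions: the helper name `summarize_weights` and the `large` threshold are hypothetical, and the weights are assumed to have been flattened into a plain list of floats beforehand.

```python
import math


def summarize_weights(values, large=1e6):
    """Count non-finite and suspiciously large entries in a flat weight list.

    The `large` threshold is an arbitrary choice for this sketch.
    """
    non_finite = sum(1 for v in values if not math.isfinite(v))
    too_large = sum(1 for v in values if math.isfinite(v) and abs(v) > large)
    return {"count": len(values), "non_finite": non_finite, "too_large": too_large}


if __name__ == "__main__":
    # Toy values standing in for one layer's weights.
    print(summarize_weights([0.1, -0.3, float("inf"), float("nan"), 2e7]))
```

Running the same check on the pre-trained weights and on the trained weights would show whether the problem is introduced by training or is already present beforehand.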
Although the model and its applications seem promising and have shown great results across different domains and tasks, a deeper and more thorough training step (which could take up to several days) needs to be considered in order to obtain a model that behaves well on unseen vehicle footage. Further experiments will be carried out to prove its validity.
Mask R-CNN. Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, 2017
Mask R-CNN code. Waleed Abdulla, 2017, Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow