The programs in this repository train and use a Single Shot MultiBox Detector (SSD) to take an image and draw bounding boxes around objects of certain classes contained in it. The network is based on the VGG-16 model and uses the approach described in the paper "SSD: Single Shot MultiBox Detector" by Wei Liu et al. The software is generic and easily extensible to any dataset, although I have only tried it with Pascal VOC so far. All you need to do to introduce a new dataset is create a new file defining it.

Go here for more info.

Pascal VOC Results

Images and numbers speak louder than a thousand words, so here they are:

[Images: Example #1, Example #2]

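| Model  | Training data                     | mAP Train | mAP VOC12 test | Reference |
|--------|-----------------------------------|-----------|----------------|-----------|
| vgg300 | VOC07+12 trainval and VOC07 Test  | 79.5%     | 72.3%          | 72.4%     |
| vgg512 | VOC07+12 trainval and VOC07 Test  | 82.3%     | 75.0%          | 74.9%     |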
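
For readers unfamiliar with the mAP numbers above, here is a minimal sketch of how a per-class average precision (AP) can be computed with all-point interpolation, the variant used by Pascal VOC from 2010 onward. It is an illustration only, not the code from this repository: the function name and argument layout are hypothetical, and it assumes each detection has already been marked as a true or false positive (in VOC, a detection usually counts as a true positive when its overlap with an unmatched ground-truth box exceeds 0.5). The mAP is then the mean of these values over all classes.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for a single class (hypothetical helper).

    scores            -- confidence of each detection
    is_true_positive  -- bool per detection, same order as scores
    num_ground_truth  -- number of ground-truth boxes of this class
    """
    if num_ground_truth == 0 or not scores:
        return 0.0

    # Walk through the detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = 0
    fp = 0
    recalls = []
    precisions = []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Replace each precision with the maximum precision found at any
    # higher recall, which makes the curve monotonically non-increasing.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Integrate: sum the rectangles wherever recall increases.
    ap = 0.0
    previous_recall = 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class_ap):
    """Mean of the per-class AP values (dict: class name -> AP)."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

Note that the older VOC07 protocol uses an 11-point interpolation instead, so numbers computed the two ways are not directly comparable.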


To train the model on the Pascal VOC data, go to the pascal-voc directory and download the dataset:

cd pascal-voc
cd ..

You then need to preprocess the dataset before you can train the model on it. The default settings are fine, but if you want to customize anything, run the script with the --help parameter to see the available options.


You can then train the whole thing. It will take around 150 to 200 epochs to get good results. Again, you can try --help if you want to do something custom.


You can annotate images, dump raw predictions, print the AP stats, or export the results in the Pascal VOC compatible format using the inference script.

./ --help
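
To give an idea of what the Pascal VOC compatible export looks like, here is a small sketch of the detection results layout expected by the VOC development kit: one text file per class, with one line per detected box containing the image identifier, the confidence, and the box corners. This is not the inference script's actual code. The function names, the input tuple layout, the number formatting, and the default "comp4" and "test" file-name parts are assumptions made for illustration.

```python
from collections import defaultdict


def voc_result_lines(detections):
    """Group detections by class and render one text line per box.

    `detections` is a list of tuples
    (image_id, class_name, confidence, xmin, ymin, xmax, ymax);
    this input layout is an assumption made for the sketch.
    """
    lines = defaultdict(list)
    for image_id, cls, conf, xmin, ymin, xmax, ymax in detections:
        # VOC line layout: <image id> <confidence> <left> <top> <right> <bottom>
        lines[cls].append(
            f"{image_id} {conf:.6f} {xmin:.1f} {ymin:.1f} {xmax:.1f} {ymax:.1f}"
        )
    return dict(lines)


def voc_result_filename(cls, competition="comp4", image_set="test"):
    """File name for one class, e.g. comp4_det_test_car.txt."""
    return f"{competition}_det_{image_set}_{cls}.txt"


if __name__ == "__main__":
    sample = [
        ("000004", "car", 0.702732, 89, 112, 516, 466),
        ("000006", "person", 0.870849, 373, 168, 488, 229),
    ]
    for cls, cls_lines in voc_result_lines(sample).items():
        print(voc_result_filename(cls))
        for line in cls_lines:
            print("  " + line)
```

Writing each class's lines to its own file under the names produced by `voc_result_filename` gives a set of files in the layout the VOC evaluation tools read.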

To export the model to an inference-optimized graph, run the following (use result/result as the name of the output tensor):


If you want to run detection based on the inference model, check out:


Have Fun!