# Model specification and approach

Now that I have a feeling for the dataset and its structure, let's recap the problem and consider what options we have in terms of model choice. Here I brainstorm a few possibilities.

### Problem recap

We want to be able to identify the brands of any logos present in a test image. These brands should be car and/or clothing brands available in the BelgaLogos dataset. There may be more than one brand present in a test image, and we want to identify all of them (as best we can).


### Problem approaches

The problem could be approached in two ways:

1. A 'one-shot' classifier that both locates the logo in the image, and classifies it.
2. A two-step classifier where one stage locates the logo, and the second classifies it.

There are [many methods available](https://github.com/caocuong0306/awesome-object-proposals) for locating candidate object locations in an image. The simplest is the **sliding-window** method. In this method, a smaller patch or window of the test image is evaluated for the presence of the desired object. This window is scanned across the whole test image ([example gif](https://www.pyimagesearch.com/wp-content/uploads/2015/03/sliding-window-animated-adrian.gif)) until the entire image is covered. If the object-classification stage returns a match for a window position, that window is then considered a potential object location. The main drawback of the sliding-window method is slow evaluation, since the classifier must be run separately on every window. This is not a problem for training, however, and as time is short, training efficiency is the higher priority.
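
To make the scanning step concrete, here is a minimal sketch of a sliding-window generator. The function name, the square window, and the `window` and `stride` values are my own illustrative assumptions rather than anything fixed by the problem; the classifier that would be run on each box is left out.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def sliding_windows(image_width: int, image_height: int,
                    window: int, stride: int) -> Iterator[Box]:
    """Yield square boxes scanned left-to-right, top-to-bottom.

    Any remainder at the right or bottom edge that is narrower than
    one window is skipped in this simple version.
    """
    if window > image_width or window > image_height:
        return  # the window does not fit, so there is nothing to scan
    for top in range(0, image_height - window + 1, stride):
        for left in range(0, image_width - window + 1, stride):
            yield (left, top, left + window, top + window)


# Hypothetical sizes: a 640x480 image scanned with a 128-pixel window.
boxes = list(sliding_windows(image_width=640, image_height=480,
                             window=128, stride=64))
print(len(boxes), boxes[0], boxes[-1])
```

Each box yielded here costs one classifier evaluation, and in practice the scan would be repeated at several window sizes to cope with logos of different scales, which is where the slow evaluation comes from.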

### Potential solutions

We shall now survey some possible options (both one-shot, and using sliding-windows) for solving the problem.


 1. [**Template matching**](https://en.wikipedia.org/wiki/Template_matching) (sliding-window)
    - Scans a canonical template image across the test image, finding the location which maximises some measure of image similarity (or, equivalently, minimises a distance) between the template and the test image patch.
    
    - *Pro:* Simple to implement, no need for large training set.
    - *Con:* Very poor robustness w.r.t. noise, occlusion and transformations in the test image.


 2. [**Keypoint matching**](https://en.wikipedia.org/wiki/Scale-invariant_feature_transform) (one-shot/sliding window)
    - Models the geometry of a template image using algorithmically determined key-points, and attempts to match these to the key-points of a test image. The key-point characteristics should be reasonably invariant under affine transformations (e.g. scaling, rotation, shearing) so that the template can still be matched to a transformed copy of the logo in the test image.
    - *Pro:* No need for large training set, can handle occluded objects, noise etc.
    - *Con:* Potentially tricky to implement, may produce many false positives in complex images.
    
    
 3. [**Haar classifiers**](https://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework) (one-shot)
     - Uses a set of hand-constructed convolution kernels ([Haar features](https://en.wikipedia.org/wiki/Haar-like_feature)) as elements in a boosted classifier. Combines several layers (a cascade) of classifiers to localise the object in the image.
     - *Pro:* Good performance in the literature relative to keypoint/template matching algorithms.
     - *Con:* Needs a very large training dataset and a careful choice of Haar features.
 
 
 4. [**Convolutional nets**](https://en.wikipedia.org/wiki/Convolutional_neural_network) (one-shot/sliding window)
     - Learned, stacked convolutional kernels combined with a neural network classification stage. Far more flexible/powerful than Haar classifiers but with a correspondingly larger thirst for data and potential to over-fit.
     - *Pro:* State-of-the-art object classification. High-performance general models exist, no need for as much tuning as in Haar classifiers.
     - *Con:* Requires a very large training dataset and a long training time.
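
As an illustration of the simplest option above, the following is a minimal, dependency-free sketch of template matching. It uses the sum of squared differences (SSD) as the distance measure, which is my choice for the example; other measures such as normalised cross-correlation are equally valid. Images are plain nested lists of greyscale values and the toy data are made up. In practice a library routine such as OpenCV's `cv2.matchTemplate` would be used instead.

```python
from typing import Sequence, Tuple

Image = Sequence[Sequence[float]]  # greyscale image as rows of pixel values


def match_template(image: Image, template: Image) -> Tuple[int, int, float]:
    """Return (row, col, score) of the image patch with the smallest SSD."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = (0, 0, float("inf"))
    # Slide the template over every position where it fits entirely.
    for row in range(img_h - tpl_h + 1):
        for col in range(img_w - tpl_w + 1):
            score = sum(
                (image[row + dr][col + dc] - template[dr][dc]) ** 2
                for dr in range(tpl_h)
                for dc in range(tpl_w)
            )
            if score < best[2]:
                best = (row, col, score)
    return best


# Toy 4x5 "image" containing an exact copy of the 2x2 "logo".
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 1],
    [1, 9],
]
row, col, score = match_template(image, template)
print(row, col, score)
```

The weakness listed above is visible in the scoring: the comparison is pixel-by-pixel, so a rotated, rescaled or partly hidden logo no longer lines up with the template and its score degrades quickly.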
        

## Summary

There are many possible approaches to the problem. Two jump out at me as being the most promising:

1. **Keypoint matching**
2. **Convolutional neural networks**

In terms of training a convolutional network, the small number of images available in the BelgaLogos dataset appears to be a serious problem. In the literature this is typically handled by generating new 'simulated' training data samples. For example:

- [Su, Zhu, Gong 2016](https://arxiv.org/pdf/1612.09322.pdf) take a 'canonical' form of the logo, apply random affine transformations to it, and superimpose the transformed logo on a random image. This is then used to train a deep ConvNet.
- The [FlickrBelgaLogos](http://www-sop.inria.fr/members/Alexis.Joly/BelgaLogos/FlickrBelgaLogos.html) dataset takes logos from the original BelgaLogos dataset (the image regions within the annotated bounding boxes) and pastes them on top of random images from Flickr. This procedure for generating more samples has also been used in the literature.
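
The geometric part of this kind of data synthesis can be sketched as follows. This is only an illustration of the general idea, not the procedure of either reference: the parameter ranges, function names and image sizes are all assumptions of mine, and the actual pixel warping and pasting (which would need an imaging library) is omitted. The sketch draws a random rotation, shear and scale, applies it to the corners of a canonical logo, and picks a position that keeps the warped logo inside the background, returning the bounding box that would serve as the training label.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def random_affine(rng: random.Random) -> Tuple[float, float, float, float]:
    """Return the 2x2 matrix (a, b, c, d) for scale * rotation @ shear.

    The ranges below are illustrative assumptions only.
    """
    angle = math.radians(rng.uniform(-30, 30))
    scale = rng.uniform(0.5, 1.5)
    shear = rng.uniform(-0.3, 0.3)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (scale * cos_a, scale * (cos_a * shear - sin_a),
            scale * sin_a, scale * (sin_a * shear + cos_a))


def place_logo(logo_w: float, logo_h: float, bg_w: float, bg_h: float,
               rng: random.Random) -> Tuple[List[Point], Box]:
    """Warp the logo corners and place them at a random spot in the background."""
    a, b, c, d = random_affine(rng)
    corners = [(0, 0), (logo_w, 0), (logo_w, logo_h), (0, logo_h)]
    warped = [(a * x + b * y, c * x + d * y) for x, y in corners]
    min_x, max_x = min(p[0] for p in warped), max(p[0] for p in warped)
    min_y, max_y = min(p[1] for p in warped), max(p[1] for p in warped)
    width, height = max_x - min_x, max_y - min_y
    if width > bg_w or height > bg_h:
        raise ValueError("warped logo does not fit in the background")
    # Random offset that keeps the whole warped logo inside the background.
    off_x = rng.uniform(0, bg_w - width) - min_x
    off_y = rng.uniform(0, bg_h - height) - min_y
    placed = [(x + off_x, y + off_y) for x, y in warped]
    bbox = (min_x + off_x, min_y + off_y, max_x + off_x, max_y + off_y)
    return placed, bbox


rng = random.Random(0)  # fixed seed so the example is repeatable
corners, bbox = place_logo(100, 40, 640, 480, rng)
print(bbox)
```

Even in this reduced form the extra moving parts are apparent: sensible transformation ranges, background selection and blending all have to be chosen before any network training can start.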

Both of these methods have been used to good effect, but they do add considerable complication to the problem. Deep convolutional neural networks also require a great deal of training time. In order to maximise the likelihood of getting an acceptable solution in the provided timeframe, I'll begin by running a small feasibility analysis for keypoint matching.
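
As a first idea of what such a feasibility check might look like, here is a minimal sketch of the descriptor-matching step using a nearest-neighbour ratio test of the kind commonly applied to SIFT descriptors. The descriptors below are made-up 3-dimensional toy vectors (real SIFT descriptors have 128 dimensions and would come from a feature-extraction library), and the `ratio` threshold of 0.75 is an assumed, tunable value.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def ratio_test_matches(template_desc: Sequence[Descriptor],
                       image_desc: Sequence[Descriptor],
                       ratio: float = 0.75) -> List[Tuple[int, int]]:
    """Return (template_index, image_index) pairs that pass the ratio test.

    A template descriptor is matched only if its nearest image descriptor
    is clearly closer than the second nearest, which filters out
    ambiguous matches.
    """
    matches = []
    for t_idx, t in enumerate(template_desc):
        ranked = sorted((math.dist(t, d), i_idx)
                        for i_idx, d in enumerate(image_desc))
        if len(ranked) < 2:
            continue  # the test needs two candidates to compare
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((t_idx, best_idx))
    return matches


# Toy descriptors: the first template key-point has one clear partner,
# the second has two similar candidates and should be rejected.
template_desc = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
image_desc = [(0.9, 0.1, 0.0), (0.5, 0.5, 0.0), (0.4, 0.6, 0.0)]
matches = ratio_test_matches(template_desc, image_desc)
print(matches)
```

The number of surviving matches per logo template could then serve as a rough detection score, with the threshold on that count being something the feasibility analysis would need to determine.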