### CVMA (H) Assessed Exercise: Deep Convolutional Neural Networks & Analytic Convolutions
<font color="red"> This exercise **must** be your own work. A declaration of originality is required. </font>

**There will be a compulsory lab question on this exercise in the May 2018 degree exam: the exam paper will comprise a total of _4 questions_ of which _3 questions_ are to be answered by the candidate as follows:**

* **One question is compulsory** and examines this Assessed Exercise.
* **Two questions** are to be selected by the candidate from the remaining three questions set in the paper.

**Make sure you complete this exercise, and understand the material thoroughly.**

Do **_not_** copy and paste code from elsewhere into the exercise, **except** for code from the CVMA (H) lecture or lab notes.

**READ THESE INSTRUCTIONS CAREFULLY BEFORE YOU START WORK**

-----

## Submission, deadline, feedback
**Submission** Your submission will be the notebook **exercise.ipynb** with all of the code and your written work included. You should not use external files. Make sure you have run all the cells before you submit.

**Deadline** You must submit this on Moodle by **4.30pm, Monday 3rd December 2018**

**Feedback** You will receive feedback on your submission in the New Year. There is *no mark* assigned, but you *will* get an annotated version of your notebook with individual formative feedback on your work.



-----
## Learning Objectives

1. Understand Deep Learning concepts by hands-on coding and experience.
1. Learn PyTorch and how to construct, train and evaluate a DCNN to undertake a computer vision classification task.
1. Obtain a qualitative feel for how DCNNs operate by exploring their parameter space - “What do I need to adjust to achieve X?”
1. Understand concepts such as overfitting/underfitting, how to diagnose this and how to address it.
1. Understand how to combine DCNN methods with conventional image analysis within the same network formulation.
1. Understand issues regarding data efficiency when training networks.
1. Understand and explore the dichotomy between learned data processing and analytically defined data processing approaches.


## Introduction

The objective of this exercise is to understand how the classification accuracy and loss produced when training a DCNN vary with the key parameters of the network, such as its layer structure, activation function, level of inter-layer dropout and pooling method. The number of training cycles and the batch size are also explored in this context.

*Data efficiency* is a key issue when designing and training DCNNs. Accordingly, there is a dichotomy between learning all processing from data and preprocessing the input/training data to reduce its variability, thereby reducing training times and the need for very large training sets. An active area of research is the use of pre-defined network layers where the required transformations are well understood, as is the case for image analysis. In other words, why use large amounts of valuable data to train a network layer when a good, or even optimal, layer function is known via analytic signal processing? In this exercise we shall test the following hypothesis:

**Hypothesis: _it is possible to define analytically a set of kernels capable of deconstructing an input image over spatial scale and space._** The known optimal kernel for deconstructing a visual signal for this task is the Gabor filter. A Gabor filter computes what is known as a *wavelet*, a local Fourier Transform. Gabor filter kernels come in pairs comprising a real and imaginary component, as illustrated: 


<img src="imgs/even_gabor.png" width=30% /> 

**Real Component, phase: $\phi = 0$**

<img src="imgs/odd_gabor.png" width=30% />

**Imaginary Component, phase: $\phi = \pi/2$**

We can simply use each Gabor kernel as a filter within layer one of our DCNN. (Gabor kernels are defined in the section below.)

The idea behind this part of the exercise is to replace the first conv layer of our DCNN with a bank of 96 Gabor filter kernels (48 real and 48 imaginary) for a number of different scales and orientations defined in the specification section below. We can then freeze this layer while training the remaining layers in the DCNN and compare the classification accuracy and loss for this DCNN to that of the standard DCNN.

We can also hand-design our own filters, perhaps tuned to respond to specific image patterns that we want to be sure our DCNN will detect. The exercise suggests how to design a radial and tangential family of kernels. 

Finally, we explore what happens if we employ various combinations of analytic, hand-designed and learned filters.

The focus of this exercise is exploring DCNN properties, as opposed to programming, hence the coding required has been kept to a minimum. However, a degree of literacy with the relevant Python packages (PyTorch, matplotlib etc.) is expected.



### Gabor Filters
The Gabor filter (assumed to be centered at zero) is the product of a sinusoid and a Gaussian:

  $$g(x,y;\lambda,\theta,\phi,\gamma) = \exp\bigg(-\frac{x'^2+\gamma^2 y'^2}{2\sigma^2}\bigg) \cos\bigg(2\pi\frac{x'}{\lambda} + \phi\bigg),$$

where

  $$x'  = x\cos\theta + y\sin\theta,$$
  $$y'  = -x\sin\theta + y\cos\theta.$$

The filter has the following characteristics:

- Wavelength.  The wavelength of the sinusoid is $\lambda$ (in pixels per cycle), so its spatial frequency is $1/\lambda$ cycles/pixel.
- Orientation.  The angle of the normal to the sinusoid is $\theta$.
- Phase.  The offset of the sinusoid is $\phi$.
- Aspect Ratio.  Ellipticity is produced with $\gamma < 1$.
- The spatial envelope of the Gaussian, $\sigma$, is controlled by the bandwidth, which at unity gives $\sigma = 0.56\lambda$.
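
The formula above can be sampled directly on a pixel grid. The following is a minimal NumPy sketch, assuming an odd kernel size and a grid centred at zero; the function name `gabor_kernel` and its argument order are illustrative only, and its output is not guaranteed to match an OpenCV-generated kernel element for element, since sign and axis conventions may differ.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lam, gamma=1.0, phi=0.0):
    """Sample g(x, y) on a size x size grid centred at zero (size assumed odd)."""
    half = size // 2
    # Integer pixel coordinates; y indexes rows and x indexes columns.
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotated coordinates x' and y' from the definition above.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope multiplied by the sinusoidal carrier.
    envelope = np.exp(-(x_r**2 + gamma**2 * y_r**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_r / lam + phi)
    return envelope * carrier

# Real (phi = 0) and imaginary (phi = pi/2) components of one Gabor pair.
real_part = gabor_kernel(5, sigma=1.0, theta=0.0, lam=2.0, phi=0.0)
imag_part = gabor_kernel(5, sigma=1.0, theta=0.0, lam=2.0, phi=np.pi / 2.0)
```

With $\phi = 0$ the sampled kernel is even-symmetric about its centre, and with $\phi = \pi/2$ it is odd-symmetric, which is what the two example images above illustrate.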


**Gabor filters are a good model of _simple cells_ in the human visual cortex.**

The half-response spatial frequency bandwidth $b$ (in octaves) of a Gabor filter is related to the ratio $\sigma/\lambda$:

$$b  = \log_2 \frac{(\sigma/\lambda)\pi + \sqrt{\frac{\ln 2}{2}}}{(\sigma/\lambda)\pi - \sqrt{\frac{\ln 2}{2}}},$$

$$\frac{\sigma}{\lambda}  = \frac{1}{\pi} \sqrt{\frac{\ln 2}{2}}\, \frac{2^b+1}{2^b-1}.$$

In this parameterisation the value of $\sigma$ is not specified directly; it is changed only through the bandwidth $b$, which must be a positive real number. The default is $b = 1$, in which case $\sigma$ and $\lambda$ are related by $\sigma = 0.56\lambda$. The smaller the bandwidth, the larger $\sigma$, the support of the Gabor function and the number of visible parallel excitatory and inhibitory stripe zones. The Gabor maths above is adapted from: http://people.csail.mit.edu/tieu/notebook/gabor.tex
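
As a quick numerical check of the bandwidth relation, the sketch below evaluates both directions of the formula using only the standard library. The function names are illustrative, and the logarithm under the square root is taken to be the natural logarithm, which is the reading that reproduces $\sigma = 0.56\lambda$ at $b = 1$.

```python
import math

# Constant sqrt(ln 2 / 2) that appears in both forms of the relation.
C = math.sqrt(math.log(2.0) / 2.0)

def sigma_over_lambda(b):
    """Ratio sigma/lambda for a half-response bandwidth of b octaves (b > 0)."""
    return (C / math.pi) * (2.0**b + 1.0) / (2.0**b - 1.0)

def bandwidth(ratio):
    """Bandwidth b in octaves for a given sigma/lambda ratio (inverse of the above)."""
    return math.log2((ratio * math.pi + C) / (ratio * math.pi - C))

ratio_at_unit_bandwidth = sigma_over_lambda(1.0)  # approximately 0.56
```

Halving the bandwidth increases the ratio, consistent with the statement that a smaller bandwidth gives a larger $\sigma$.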

Examples of a few Gabor kernels (selected from the 96 filters specified in the exercise) are shown below:

<img src="imgs/gabors.png" width=600px>

A complete family of Gabor filters is shown below:

<img src="imgs/gabor_family.png" width=1000px>

Fortunately, **we don't need to worry about all the above maths too much**: we can easily obtain Gabor filter kernels by calling the following OpenCV commands:


`real_kernel = cv2.getGaborKernel((W, H), 1, orientation, scale, 1, 0)`

`imaginary_kernel = cv2.getGaborKernel((W,H), 1, orientation, scale, 1, np.pi / 2)`

Where: 
`cv2.getGaborKernel(ksize, sigma, theta, lambd, gamma[, psi[, ktype]]) → retval`

Parameters:

* ksize – Size of the filter returned.
* sigma – Standard deviation of the Gaussian envelope.
* theta – Orientation of the normal to the parallel stripes of a Gabor function.
* lambd – Wavelength of the sinusoidal factor.
* gamma – Spatial aspect ratio.
* psi – Phase offset.
* ktype – Type of filter coefficients. It can be `CV_32F` or `CV_64F`.



-----
### Specification

This exercise is based on the materials in Lab 5, the tutorials and notebooks on the PyTorch website on Deep Learning and PyTorch, and the code in the lectures and the assessed exercise notebook itself.

**Phase 1:**
1. Run the basic DCNN to classify the MNIST data set of handwritten numeric digits supplied in the exercise notebook.
1. Modify this code to plot out the network's classification accuracy and loss during training as a function of training epoch.
 1. The plots of accuracy and loss should be overlaid in the same display frame, with different colours for the loss and accuracy respectively, so that their relationship can be compared directly as a function of training epoch.
1. Investigate and compare the **baseline** performance of this system, i.e. in the above unmodified system, to when it has been modified by:
 1. Adding/removing layers: CNN layers and fully connected layers
 1. Varying batch sizes
 1. Varying numbers of training epochs
 1. Applying different levels of drop-out
 1. Applying different types of pooling
 1. Applying different types of activation function
1. Following an initial ad-hoc investigation by hand:
 1. Undertake a systematic investigation by writing a jupyter-notebook script to _automate_ the process. 
 1. Document your investigation as you undertake it. 
 1. Due to the limited time available for this lab, a relatively coarse, sequential (rather than _combinatorial_) exploration of the parameters is acceptable. Plot your baseline and best accuracy and loss results.
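
One way to automate a sequential (rather than combinatorial) exploration is sketched below: each parameter is varied in turn while the others are held at the best values found so far. This is only an illustration. The function name `sequential_search` is hypothetical, and `evaluate` stands for whatever routine you write to train the network for a given configuration and return a score to maximise, such as test accuracy.

```python
def sequential_search(baseline, grid, evaluate):
    """Tune one hyperparameter at a time, keeping the best value found so far.

    baseline: dict of starting hyperparameters.
    grid:     dict mapping a parameter name to the list of values to try.
    evaluate: callable taking a config dict and returning a score to maximise.
    Returns the best config, its score, and the full list of (config, score) trials.
    """
    best = dict(baseline)
    best_score = evaluate(best)
    history = [(dict(best), best_score)]
    for name, values in grid.items():
        current = best.get(name)  # value already scored for this parameter
        for value in values:
            if value == current:
                continue  # skip the setting that has already been evaluated
            trial = dict(best)
            trial[name] = value
            score = evaluate(trial)
            history.append((trial, score))
            if score > best_score:
                best, best_score = trial, score
    return best, best_score, history
```

The number of training runs grows with the sum of the grid sizes rather than their product, which is why this style of search fits the limited lab time. The returned `history` can be used to document the investigation and to plot the baseline against the best result.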

**Phase 2**
1. Modify the above network to employ a pre-computed filter-bank for the first DCNN layer:
 1. Code a set of Gabor convolution filters from first principles using NumPy (using the Gabor filter kernel generator: `cv2.getGaborKernel()`, documented at: https://docs.opencv.org/3.0-beta/modules/imgproc/doc/filtering.html).
 1. Use an initial Gabor filter parameterisation comprising 6 scales in the range *scale* = 2..3 for 8 *orientations* in the range 0..2$\small\pi$, computed for both the real and imaginary filter components, giving a total of 96 filter kernels.
 1. A minimum kernel size to investigate for the Gabor filter is 5$\small\times$5, but larger sizes such as 11$\small\times$11 or 13$\small\times$13 should perform better at the expense of training and run times.
 1. Each pair of Gabor kernels for a specific parameter instance can be generated using the following OpenCV calls (for a 5$\small\times$5 kernel): 
  1. `real_kernel = cv2.getGaborKernel((5, 5), 1, orientation, scale, 1, 0)`
  1. `imaginary_kernel = cv2.getGaborKernel((5, 5), 1, orientation, scale, 1, np.pi / 2)`
 1. Display the kernels you have produced all together in a "composite" image.
1. Use these kernels to initialise the first layer of the deep net implementation and adjust the remainder of the network definition accordingly.
1. Train this new net using the best parameterisation discovered for the regular DCNN above. Remember to _fix_ (i.e. **don't update the first layer during training**) the Gabor filter kernels you've initialised - refer to the **torch.nn documentation** on the PyTorch website on how to do this.
1. Undertake a limited exploration of the new system's parameter space and compare with the best parameterised standard system, plotting your baseline and best accuracy and loss results as in Phase 1 above.
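
The Phase 2 steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the required solution: the kernels are generated here by a small NumPy function (`gabor`, hypothetical) so that the example is self-contained, whereas the exercise asks for `cv2.getGaborKernel`; the scales and orientations are assumed to be evenly spaced, with the orientation endpoint 2π excluded; and the input is assumed to be single-channel (as for MNIST).

```python
import numpy as np
import torch
import torch.nn as nn

def gabor(size, sigma, theta, lam, gamma, phi):
    """Illustrative NumPy Gabor kernel; cv2.getGaborKernel could be used instead."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + gamma**2 * y_r**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * x_r / lam + phi)

def make_gabor_bank(size=5, n_scales=6, n_orientations=8):
    """Return an array of shape (2 * n_scales * n_orientations, size, size)."""
    scales = np.linspace(2.0, 3.0, n_scales)  # assumed even spacing over 2..3
    orientations = np.linspace(0.0, 2.0 * np.pi, n_orientations, endpoint=False)
    kernels = []
    for scale in scales:
        for theta in orientations:
            kernels.append(gabor(size, 1.0, theta, scale, 1.0, 0.0))          # real
            kernels.append(gabor(size, 1.0, theta, scale, 1.0, np.pi / 2.0))  # imaginary
    return np.stack(kernels).astype(np.float32)

def make_frozen_conv(bank):
    """Build a single-input-channel Conv2d whose weights are the bank, not trained."""
    n_kernels, k, _ = bank.shape
    conv = nn.Conv2d(1, n_kernels, kernel_size=k, padding=k // 2, bias=False)
    with torch.no_grad():
        conv.weight.copy_(torch.from_numpy(bank).unsqueeze(1))  # (n, 1, k, k)
    conv.weight.requires_grad_(False)  # exclude the weights from gradient updates
    return conv

bank = make_gabor_bank()
first_layer = make_frozen_conv(bank)
```

The layer that follows must accept 96 input channels. When building the optimiser, it is common to pass only the parameters that still require gradients, for example `[p for p in model.parameters() if p.requires_grad]`, where `model` is your own network.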

**Phase 3**
1. Now construct your own hand-designed set of binary convolution features:
 1. Only kernel values of [-1,0,1] are permitted, though you may need to scale these kernel values (by dividing through by the kernel's modulus) to ensure the kernel output stays in the range [-1, 1].
 1. Rotationally symmetric kernels would be a good starting place to consider: a simplified "dart-board pattern" comprising +1/-1 sectors (the figure below shows the simplest two-sector pattern) in effect computes tangential derivatives, while patterns comprising concentric rings compute radial derivatives.
 <img src="imgs/filters.png" width=600px>
 1. By varying the numbers of radial or concentric sectors, a family of features can be constructed to characterise the radial and tangential pattern space.  
 1. Design such a set of filters to explore for a set of orientations and scale factors.
1. Evaluate the DCNN architecture using the hand-designed features for layer 1 and compare with the prior architectures.
1. Undertake a limited exploration of the new system's parameter space and compare with the best parameterised standard system, plotting your baseline and best accuracy and loss results as in Phase 1 above.
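
A possible starting point for the hand-designed kernels is sketched below in NumPy. All function names are illustrative. The sector kernel assumes an even number of sectors so that the signs alternate all the way round, and `normalise` takes "modulus" to mean the sum of absolute kernel values, which is only one possible reading.

```python
import numpy as np

def _grid(size):
    """Pixel coordinates centred at zero (size assumed odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    return half, x, y

def sector_kernel(size, n_sectors, rotation=0.0):
    """Dart-board pattern: alternating +1/-1 angular sectors, 0 at the centre."""
    half, x, y = _grid(size)
    angle = np.mod(np.arctan2(y, x) - rotation, 2.0 * np.pi)
    index = np.floor(angle / (2.0 * np.pi / n_sectors)).astype(int) % n_sectors
    kernel = np.where(index % 2 == 0, 1.0, -1.0)
    kernel[half, half] = 0.0  # the centre pixel belongs to no sector
    return kernel

def ring_kernel(size, n_rings):
    """Concentric rings of alternating sign: +1 innermost, then -1, and so on."""
    half, x, y = _grid(size)
    ring_width = (half + 0.5) / n_rings
    index = np.floor(np.hypot(x, y) / ring_width).astype(int)
    kernel = np.where(index % 2 == 0, 1.0, -1.0)
    kernel[index >= n_rings] = 0.0  # pixels beyond the outermost ring
    return kernel

def normalise(kernel):
    """Divide by the sum of absolute values, so inputs in [0, 1] give outputs in [-1, 1]."""
    return kernel / np.abs(kernel).sum()

two_sector = sector_kernel(5, n_sectors=2)
two_ring = ring_kernel(5, n_rings=2)
```

Varying `n_sectors`, `n_rings`, `rotation` and `size` gives a family of kernels over orientation and scale, which can then be stacked into a filter bank in the same way as the Gabor kernels.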


**Phase 4**
1. What happens if the Gabor filters and your hand-defined filters are both used when training the DCNN - do the results improve?
1. What happens if we also define a set of kernels to be trained in conjunction with our pre-defined filters, i.e. combine the different pre-defined features and also learned features?
1. What happens if we initialise the DCNN with pre-defined filters and then allow the filters to be updated during training (as opposed to being fixed)? Print out the pre-defined filter set before and after training (it might be best to plot them out) to see which, if any, have changed and in what way.
1. Report on the investigation and interpret the results.
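
One way to combine pre-defined and learned filters in the first layer is to run a frozen convolution and a trainable convolution side by side and concatenate their outputs along the channel dimension. The sketch below assumes a single-channel input and a `fixed_kernels` tensor of shape `(n_fixed, k, k)`; the class name `MixedFirstLayer` is hypothetical, and other designs are equally valid.

```python
import torch
import torch.nn as nn

class MixedFirstLayer(nn.Module):
    """First layer with n_fixed frozen filters plus n_learned trainable filters."""

    def __init__(self, fixed_kernels, n_learned):
        super().__init__()
        n_fixed, k, _ = fixed_kernels.shape
        # Frozen branch: weights copied from the pre-defined kernels.
        self.fixed = nn.Conv2d(1, n_fixed, kernel_size=k, padding=k // 2, bias=False)
        with torch.no_grad():
            self.fixed.weight.copy_(fixed_kernels.unsqueeze(1))
        self.fixed.weight.requires_grad_(False)
        # Learned branch: an ordinary trainable convolution of the same size.
        self.learned = nn.Conv2d(1, n_learned, kernel_size=k, padding=k // 2)

    def forward(self, x):
        # Output has n_fixed + n_learned channels.
        return torch.cat([self.fixed(x), self.learned(x)], dim=1)
```

To see which filters change when a pre-defined bank is left trainable, one option is to take a snapshot before training with `before = conv.weight.detach().clone()` and afterwards inspect `(conv.weight.detach() - before).abs().flatten(1).max(dim=1).values`, which gives the largest absolute change per filter (`conv` here stands for your own first layer).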



### Evaluation
Use the code supplied in the exercise notebook and in the PyTorch tutorials and documentation.
 
*Use loops to automate the experimentation process.*

Discuss your results in prose. 


------
## Report
You will need to document your approach clearly in prose, with appropriate use of figures and diagrams to show **_and interpret_**  what you have done. See below for the report outline and details on how to fill it out. Explain your steps clearly and precisely, but do not write pages of text. **Keep it short and to the point.**  Make *sure* you have run all cells before submission -- I need to be able to see the output in the notebook.

Be brief, but explain yourself clearly. A paragraph or two of text for each section is sufficient, but make good use of images and graphs, especially in the results section. 

**You should combine your actual code with the text of the report in a literate programming style -- e.g. the approach section should contain the processing code, with interspersed Markdown cells documenting what you did. In the results sections, use Markdown cells to explain and interpret what you are showing, and use code cells to actually plot the result.**

----



#### Data Sets
As an alternative to employing the MNIST hand-written character data set, the CIFAR10 data set may also be used. This is a larger and richer data set in terms of visual content, but networks will take correspondingly longer to train. Only do this if you feel confident about the task.


### Coding hints
* Use `torch`, `skimage`, `scipy.ndimage` or `OpenCV` to perform image manipulations. 
* Don't forget to copy the Python utility functions supplied with the lectures as required.
* The PyTorch tutorials and torch.nn documentation are essential as well.
* **If you have problems understanding something - ask the lecturer or the lab assistant!**

