# README

<details>

## Project title

The aim of this project is to evaluate a simple hypothesis:
    
"Can an Autoencoder + Convolutional Neural Network approach perform better than a simple Convolutional Neural Network classifier?"

Autoencoders are neural networks that traditionally encode raw pixel data into a few nodes of a hidden layer, and then decode, or reconstruct, the image from that 'compression'.

The idea behind the hypothesis is that the autoencoder might learn to emphasize key distinguishing features when it is trained. Therefore, if we use the encoded representation of the image as input to our CNN, we might achieve better results than if we used the raw image pixel data as input.
    
## Motivation

The motivation behind this project was simply to explore what autoencoders can and cannot provide to other algorithms, and to evaluate their usefulness in deepening any particular algorithm.

Autoencoders are a popular algorithm for beginner deep learning practitioners because the idea is easy to understand, and it has an aesthetic appeal: doing something so simple might yield benefits to the classification algorithm in question.

However, as with any other hypothesis, we must evaluate it without bias, using the most important performance metrics (precision, accuracy, f-score, confusion matrix) to determine its efficacy. Speculating about the mechanics of the algorithm cannot by itself lead to objective conclusions, which is why we are testing the algorithm here.
    
 
## Table Of Contents
    
 - Overview
 - Requirements
 - Directory Structure
 - Tests
 - Code Example
 - References
 - Further Avenues to possibly pursue with this code
   
## Overview

We'll be exploring 5 different models and their performance on CIFAR-10 in this project:
    
 - Convolutional Autoencoder 
 - MiniVGGNet
 - ShallowNet
 - Encoder + MiniVGGNet
 - Encoder + ShallowNet
    
In the case of the Convolutional Autoencoder, we'll be testing its ability to recreate images from the CIFAR-10 dataset, and based on those results we'll use the encoder layers as inputs for both MiniVGGNet and ShallowNet. For the rest, we'll be evaluating the models on their ability to correctly classify images in our test set.
    
## Requirements

The relevant libraries and their versions are listed below. I used a conda virtual environment to run this project, so all installation instructions assume a conda environment.
- conda 4.8.3
- conda-build 3.18.8
- python 3.7.6
- tensorflow-gpu 2.1 (tensorflow 2.1 should also work here just fine)
- matplotlib 3.1.3
- numpy 1.18.1    
- pandas 1.0.2
- scikit-learn 0.21.3
- argparse 1.3.0 
- opencv 4.2.0
- seaborn 0.10.0

You can install Anaconda from https://www.anaconda.com/distribution/, and then run the following commands:
    
```bash
conda install -c conda-forge python=3.7.6
conda install -c anaconda tensorflow=2.1
conda install -c conda-forge matplotlib=3.1.3
conda install -c conda-forge numpy=1.18.1
conda install -c anaconda pandas=1.0.2
conda install scikit-learn=0.21.3
conda install -c anaconda argparse=1.3.0
conda install -c conda-forge opencv=4.2.0
conda install -c anaconda seaborn=0.10.0
```
    
</details>

## Bibliography

### Data augmentation: 
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.<br>
[2] M. Paschali, W. Simson, A. G. Roy, M. F. Naeem, R. Göbl, C. Wachinger, and N. Navab. Data augmentation with manifold exploring geometric transformations for increased performance and robustness. arXiv preprint arXiv:1901.04420, 2019.

### ShallowNet and ImageNet Representations
[3] A. Rosebrock, Deep Learning for Computer Vision, PyImageSearch, https://www.pyimagesearch.com/deep-learning-computer-vision-python-book/, accessed on 24 March 2020, pp 208-210, 231-241.

### Autoencoder Representation
[4] R. Flynn, Convolutional Autoencoders for the Cifar10 Dataset, https://github.com/rtflynn/Cifar-Autoencoder, accessed on 24 March 2020.

## Directory Structure

```bash
C:.
└───Models
    │   convautoencoder_cifar10.py
    │   convautoencoder_minivggnet_cifar10.py
    │   convautoencoder_shallownet_cifar10.py
    │   minivggnet_cifar10.py
    │   shallownet_cifar10.py
    │
    ├───modelcollection
    │   ├───callbacks
    │   ├───nn
    │   │   └───conv
    │   │        │   convautoencoder.py
    │   │        │   convautoencoder_minivggnet.py
    │   │        │   convautoencoder_shallownet.py
    │   │        │   minivggnet.py
    │   │        └───shallownet.py 
    │   ├───plot
    │   └───preprocessing
    │
    └───output
        ├───plots
        │   ├───convautoencoder_minivggnet
        │   ├───convautoencoder_shallownet
        │   ├───conveautoencoder
        │   ├───minivggnet
        │   └───shallownet
        └───weights
            ├───convautoencoder_minivggnet
            ├───convautoencoder_shallownet
            ├───conveautoencoder
            ├───minivggnet
            └───shallownet
```

The main scripts are in the Models folder, with each model having its own script.

The model classes are in Models/modelcollection/nn/conv if you need to look at or alter the structure of the networks.

Each model writes its performance metrics to its respective folder in Models/output/plots, and its best-performing weights to Models/output/weights.

## Tests 

All model outputs will be created in the default folders unless otherwise specified by command line arguments.

### Autoencoder

This test should be run first, because the autoencoder is trained separately from the rest of the CNN model. The Autoencoder + CNN models need an existing autoencoder weights file in order to run.

```bash
python convautoencoder_cifar10.py
```
<details>
<summary>Optional Arguments</summary>
<br>

- `--samples`: number of samples to visualize when decoding (default: `8`)
- `--image`: path to output image comparison file (default: `output/plots/conveautoencoder/autoencoder_only_output.png`)
- `--output`: path to output plot file (default: `output/plots/conveautoencoder/autoencoder_only_plot.png`)
- `--weights`: path to best model weights file (default: `output/weights/conveautoencoder/convautoencoder_cifar10_best_weights.hdf5`)

</details>
<details>
<summary>Outputs</summary>
<br>

- Image Output Comparison -> output/plots/conveautoencoder/autoencoder_only_output.png
- Training and Validation Loss Plot -> output/plots/conveautoencoder/autoencoder_only_plot.png
- Best Model (Lowest Validation Loss) -> output/weights/conveautoencoder/convautoencoder_cifar10_best_weights.hdf5
    
</details>
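
As a rough illustration of how the optional arguments listed above could be declared, here is a minimal `argparse` sketch. It is not the actual contents of `convautoencoder_cifar10.py`: the `build_parser` name, the argument types, and the description text are assumptions; only the flag names, help texts, and defaults come from the list above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented optional arguments."""
    parser = argparse.ArgumentParser(
        description="Train the convolutional autoencoder on CIFAR-10."
    )
    # The int type is an assumption; the README only states the default of 8.
    parser.add_argument("--samples", type=int, default=8,
                        help="number of samples to visualize when decoding")
    parser.add_argument("--image", type=str,
                        default="output/plots/conveautoencoder/autoencoder_only_output.png",
                        help="path to output image comparison file")
    parser.add_argument("--output", type=str,
                        default="output/plots/conveautoencoder/autoencoder_only_plot.png",
                        help="path to output plot file")
    parser.add_argument("--weights", type=str,
                        default="output/weights/conveautoencoder/convautoencoder_cifar10_best_weights.hdf5",
                        help="path to best model weights file")
    return parser


# Parsing an empty argument list returns the documented defaults.
args = build_parser().parse_args([])
print(args.samples, args.weights)
```

With a parser like this, a run that overrides one default would look like `python convautoencoder_cifar10.py --samples 16`. Because the default paths are relative, the sketch assumes the script is launched from the Models folder.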

### Autoencoder + MiniVGGNet

```bash
python convautoencoder_minivggnet_cifar10.py
```
<details>
<summary>Optional Arguments</summary>
<br>

- `--output`: path to the output plot folder (default: `output/plots/convautoencoder_minivggnet`)
- `--weights`: path to best model weights file (default: `output/weights/convautoencoder_minivggnet/convautoencoder_minivggnet_cifar10_best_weights.hdf5`)
- `--autoencoder`: path to best autoencoder model weights file (default: `output/weights/conveautoencoder/convautoencoder_cifar10_best_weights.hdf5`)

</details>
<details>
<summary>Outputs</summary>
<br>
    
- Classification Report -> output/plots/convautoencoder_minivggnet/cifar10_convautoencoder_minivggnet_classification_report.png
- Confusion Matrix -> output/plots/convautoencoder_minivggnet/cifar10_convautoencoder_minivggnet_conf_matrix.png
- Training and Validation Loss Plot -> output/plots/convautoencoder_minivggnet/cifar10_convautoencoder_minivggnet.png
- Best Model (Lowest Validation Loss) -> output/weights/convautoencoder_minivggnet/convautoencoder_minivggnet_cifar10_best_weights.hdf5
    
</details>

### Autoencoder + ShallowNet

```bash
python convautoencoder_shallownet_cifar10.py
```
<details>
<summary>Optional Arguments</summary>
<br>

- `--output`: path to the output plot folder (default: `output/plots/convautoencoder_shallownet`)
- `--weights`: path to best model weights file (default: `output/weights/convautoencoder_shallownet/convautoencoder_shallownet_cifar10_best_weights.hdf5`)
- `--autoencoder`: path to best autoencoder model weights file (default: `output/weights/conveautoencoder/convautoencoder_cifar10_best_weights.hdf5`)

</details>
<details>
<summary>Outputs</summary>
<br>
    
- Classification Report -> output/plots/convautoencoder_shallownet/cifar10_convautoencoder_shallownet_classification_report.png
- Confusion Matrix -> output/plots/convautoencoder_shallownet/cifar10_convautoencoder_shallownet_conf_matrix.png
- Training and Validation Loss Plot -> output/plots/convautoencoder_shallownet/cifar10_convautoencoder_shallownet.png
- Best Model (Lowest Validation Loss) -> output/weights/convautoencoder_shallownet/convautoencoder_shallownet_cifar10_best_weights.hdf5
    
</details>

### MiniVGGNet

```bash
python minivggnet_cifar10.py
```
<details>
<summary>Optional Arguments</summary>
<br>
  
- `--output`: path to the output plot folder (default: `output/plots/minivggnet`)
- `--weights`: path to best model weights file (default: `output/weights/minivggnet/minivggnet_cifar10_best_weights.hdf5`)

</details>
<details>
<summary>Outputs</summary>
<br>
    
- Classification Report -> output/plots/minivggnet/cifar10_minivggnet_classification_report.png
- Confusion Matrix -> output/plots/minivggnet/cifar10_minivggnet_conf_matrix.png
- Training and Validation Loss Plot -> output/plots/minivggnet/cifar10_minivggnet.png
- Best Model (Lowest Validation Loss) -> output/weights/minivggnet/minivggnet_cifar10_best_weights.hdf5
    
</details>

### ShallowNet

```bash
python shallownet_cifar10.py
```
<details>
<summary>Optional Arguments</summary>
<br>
    
- `--output`: path to the output plot folder (default: `output/plots/shallownet`)
- `--weights`: path to best model weights file (default: `output/weights/shallownet/shallownet_cifar10_best_weights.hdf5`)

</details>
<details>
<summary>Outputs</summary>
<br>
    
- Classification Report -> output/plots/shallownet/cifar10_shallownet_classification_report.png
- Confusion Matrix -> output/plots/shallownet/cifar10_shallownet_conf_matrix.png
- Training and Validation Loss Plot -> output/plots/shallownet/cifar10_shallownet.png
- Best Model (Lowest Validation Loss) -> output/weights/shallownet/shallownet_cifar10_best_weights.hdf5
    
</details>


## Experiment 



# Introduction

Autoencoders are neural networks that seek to encode data into a latent-space representation, and then decode this representation to obtain emphasized features. Being conceptually easy to explain, they are well known as one of the first networks that practitioners new to deep learning encounter, and they hold appeal through their simplicity.

Autoencoders have had various applications over the years, ranging from dimensionality reduction to denoising image datasets. Our aim here is to use their most traditional function, as an encoder, and attempt to answer the question: "Can an Autoencoder + Convolutional Neural Network approach perform better than a simple Convolutional Neural Network classifier on a classification task?"

The main idea of this experiment is to train an autoencoder, use the layers up to the latent-space representation as input for the Convolutional Neural Network, and see whether and how it outperforms a simple Convolutional Neural Network approach. We will be looking at the performance of the Autoencoder, some CNN models, and the Autoencoder + the aforementioned CNN models on the CIFAR-10 dataset.

In particular, we will demonstrate how adding the autoencoder layer to a shallow network such as the aptly-named ShallowNet can increase classification accuracy, but when applied to deeper networks such as MiniVGGNet actually can reduce classification accuracy compared to more classical approaches. It is concluded that this is because shallow networks do not have the depth to extract features, which autoencoders help to provide. In addition, established deeper architectures have specific methods to highlight relevant features, while autoencoders may break away from these methods to highlight different features that are not necessarily useful for the network. 

# Dataset Explanation 

For our experiment, we are going to employ one of the standard datasets used to evaluate image classification networks, CIFAR-10. 

There are a few notable qualities about this dataset:

First, it has 50,000 training images and 10,000 testing images, split evenly across 10 classes, giving us 5,000 training images and 1,000 testing images per class. While this is not in the millions like ImageNet, the even split ensures that no one class is trained significantly more than another.

It should be noted that for this particular experiment, we have the constraint to use only 50% of training data for the following three classes: bird, deer, and truck. Other classes maintain their full amount of training data.
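
The 50% constraint can be sketched as a per-class subsampling step. The snippet below is a minimal illustration, not the project's actual preprocessing code: the `subsample_classes` helper, the fixed seed, and the choice of random sampling are assumptions. The class indices follow the standard CIFAR-10 label order (bird = 2, deer = 4, truck = 9).

```python
import random
from collections import defaultdict

# Standard CIFAR-10 label order: bird = 2, deer = 4, truck = 9.
REDUCED_CLASSES = {2, 4, 9}


def subsample_classes(labels, reduced_classes=REDUCED_CLASSES, fraction=0.5, seed=42):
    """Return sorted indices to keep, dropping part of the reduced classes.

    Hypothetical helper: keeps `fraction` of the samples of each class in
    `reduced_classes` (chosen at random) and every sample of the other classes.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    kept = []
    for label, indices in by_class.items():
        if label in reduced_classes:
            n_keep = int(len(indices) * fraction)
            indices = rng.sample(indices, n_keep)
        kept.extend(indices)
    return sorted(kept)


# Toy labels with the same class balance as CIFAR-10 (10 classes, equal counts).
toy_labels = [i % 10 for i in range(1000)]
kept_indices = subsample_classes(toy_labels)
print(len(kept_indices))  # 7 full classes * 100 + 3 halved classes * 50
```

Applied to the real training set, the same rule leaves 7 x 5,000 + 3 x 2,500 = 42,500 training images.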

Next, the images themselves are 32 x 32 with 3 channels (RGB). This is a very small amount of data for each image, which makes it difficult to get high accuracy.

Finally, the images, depending on the class, can feature a significant amount of deformation, occlusion, viewpoint variation, and intra-class variation. This not only makes it difficult to achieve high accuracy, but also hurts the generalizability of the model. This means we need both a robust architecture and regularization techniques to assist in generalization.

# Model/Architecture explanation

Since our goal is to evaluate whether or not an autoencoder approach works better on simple CNN networks, we will be evaluating the classification task on the CIFAR-10 dataset with two less complex models: ShallowNet and MiniVGGNet[3]. Our choice of models allows us to evaluate the effectiveness of the autoencoder on the simplest of CNNs possible (ShallowNet) as a baseline, then evaluate it on a model with more complexity. Both of these networks will be evaluated on performance with and without the encoder as input.
## ShallowNet

![title](./Images/ShallowNet.png)

As mentioned above, ShallowNet contains just one CONV layer and one FC layer, making it the simplest possible implementation of a CNN. By using ShallowNet, we can test the baseline hypothesis that autoencoders are able to provide useful information and depth to very shallow algorithms, establishing that the latent-space representation can have value to a CNN.
## MiniVGGNet

![image.png](./Images/MiniVGGNet.png)

As VGGNet is usually evaluated on ImageNet instead of CIFAR-10 and is traditionally either 16 or 19 layers deep, we will be reducing the size of the model as seen in Table 2 to work with our smaller dataset[3]. We chose MiniVGGNet over other networks of similar depth because it applies multiple CONV => RELU layers before each POOL layer, which allows the model to glean richer features. If the performance of the network still improves with the autoencoder, this indicates the latent-space representation contains a great amount of relevant information for the convolutional layers to use in discerning features.

The batch normalization and the dropout used in the MiniVGGNet serve to combat overfitting and improve generalizability of our model on the validation data. 

## Autoencoder

![title](./Images/Convolutional_Autoencoder.png)

The autoencoder we are going to use for our model is a purely convolutional model (no fully connected layers) that employs shallow, strided convolutions in place of max pooling, with batch normalization. It has been demonstrated to accurately replicate images[4] from CIFAR-10 with no noticeable differences between the actual image data and the representations from the autoencoder. We will train this model separately, take the trained encoder and apply it to each of the aforementioned models.

Fully connected layers were found to reduce image quality [4]. Personal experiments were made to compress the data down to 8 x 8 and 4 x 4 in an attempt to improve classification accuracy. However, not only did the classification accuracy decrease, but too much detail was lost given the low resolution of the CIFAR-10 images, so the compression was kept to a minimum of 16 x 16 as per the author's recommendation. Sigmoid was used as the activation function in the last layer to produce output in the range [0, 1].
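
To make the compression levels concrete, the sketch below computes the spatial size after successive strided convolutions. It assumes `same` padding and a stride of 2 per downsampling layer, neither of which is stated above, and the `strided_output_size` helper is hypothetical.

```python
import math


def strided_output_size(size, stride=2):
    """Spatial size after a strided convolution with 'same' padding (assumed)."""
    return math.ceil(size / stride)


size = 32  # CIFAR-10 images are 32 x 32
sizes = [size]
for _ in range(3):
    size = strided_output_size(size)
    sizes.append(size)

for s in sizes:
    # The number of spatial positions shrinks 4x each time the side is halved.
    print(f"{s} x {s} -> {s * s} positions per channel")
```

Under these assumptions, one downsampling step gives the 16 x 16 representation that was kept, while 8 x 8 and 4 x 4 leave only 64 and 16 spatial positions per channel.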

## Autoencoder + ShallowNet, Autoencoder + MiniVGGNet

![title](./Images/Autoencoder+CNN.png)

For our autoencoder + CNN models, we will take the encoding portion of the autoencoder model (the layers up until the UPSAMPLING layer) and use that as input for our CNN. Note that since the encoder does not heavily compress the data, we do not need to add any upsampling or transposed convolution layers. However, in future experiments this is an option to consider to possibly raise accuracy.

# Experiment

For our experiment, we will be testing how well the network classifies image labels from CIFAR-10. We will use 50% of the training data from the bird, deer and truck classes and 100% of the training data from the rest of the classes in CIFAR-10. 

## Hyperparameters

For ShallowNet, data augmentation was employed with rotation up to 10 degrees, slight width and height shift range, and horizontal flip. The epochs and the batch size were set to 40 and 32, respectively. For our optimization algorithm, since we are not expecting any trouble converging quickly, SGD was used with a learning rate of 0.01 and no additional parameters. The simplicity of the hyperparameters reflects the simplicity and shallow depth of the overall model here.

For MiniVGGNet, we employed the same data augmentation as ShallowNet, with rotation up to 10 degrees, slight width and height shift range, and horizontal flip applied randomly. 150 epochs were used with a batch size of 64, as the model needed a large number of epochs to converge. SGD was used as the optimization algorithm of choice. The learning rate was set at 0.01 initially, and despite incremental raises the network did not converge faster. Instead, the variation in the validation loss increased significantly. Therefore we have kept it set to 0.01 to strike a balance between convergence speed and accuracy. The decay was set to 0.01/200, with momentum = 0.9 and Nesterov acceleration turned on to aid in convergence.
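
The effect of the decay setting can be estimated with a short calculation. This sketch assumes the value is passed as the time-based `decay` argument of the Keras SGD optimizer, which lowers the rate once per batch update as `lr / (1 + decay * iterations)`. It also assumes all 42,500 reduced training images are used in every epoch; the constant and function names are illustrative.

```python
import math

INITIAL_LR = 0.01
DECAY = 0.01 / 200      # decay value quoted above
BATCH_SIZE = 64
TRAIN_IMAGES = 42_500   # assumption: 7 * 5,000 + 3 * 2,500 images, no hold-out split


def decayed_lr(iteration, initial_lr=INITIAL_LR, decay=DECAY):
    """Time-based decay: lr = initial_lr / (1 + decay * iteration)."""
    return initial_lr / (1.0 + decay * iteration)


steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)

for epoch in (0, 50, 100, 150):
    iteration = epoch * steps_per_epoch
    print(f"epoch {epoch:3d}: lr = {decayed_lr(iteration):.5f}")
```

Under these assumptions the learning rate falls to roughly one sixth of its initial value by epoch 150, so the late epochs take much smaller steps than the early ones.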

For the autoencoder, 5 epochs [4] were found to be sufficient to ensure convergence given the small depth of the network.
Given the low number of epochs, Adam was used as our optimizer with a learning rate of 0.001 to support fast convergence. In addition, a batch size of 32 was sufficient to train the network.

For our Autoencoder + (ShallowNet, MiniVGGNet) models, the same hyperparameters were employed as for the base models. The only change is the increase of epochs to 200 for the Autoencoder + MiniVGGNet model. There was experimentation with the learning rate to make convergence faster, yet the best results were found with 200 epochs and an SGD learning rate of 0.01.

## Evaluation Metrics

For the CNN models, we'll be evaluating performance on the classification task by comparing the confusion matrices (actual vs predicted values for each class of CIFAR-10), validation accuracy, precision, recall, and f-score for each model.
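
For reference, the per-class metrics can all be derived from the confusion matrix. scikit-learn, which is listed in the requirements, provides these metrics directly; the dependency-free sketch below only shows how they relate to one another. The function names and the toy matrix are made up for illustration and are not results from this project.

```python
def per_class_metrics(confusion):
    """Compute precision, recall and F1 for each class.

    `confusion[i][j]` counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = true_positive / predicted_k if predicted_k else 0.0
        recall = true_positive / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


# Made-up 3-class confusion matrix, for illustration only.
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
for label, m in enumerate(per_class_metrics(toy_confusion)):
    print(label, {name: round(value, 3) for name, value in m.items()})
print("accuracy:", accuracy(toy_confusion))
```

A class with low precision and high recall is one that the network predicts too often: most of its true members are found, but many members of other classes are assigned to it as well.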

For the autoencoders, we'll be comparing the raw image data to the image representations created from the autoencoder and comparing the similarity visually. 

For both models, we'll also be visualizing the training versus validation loss in order to gauge whether the network is overfitting and ensure reasonable convergence.

## Figures



### Autoencoder
<table><tr>
<td> <img src="./Models/output/plots/convautoencoder/autoencoder_only_output.png" width="100"/></td>
<td><img src="./Models/output/plots/convautoencoder/autoencoder_only_plot.png" height="1300" width="600"/></td></tr></table>
<table cellspacing = "15"><tr>
<td><img src="./Images/autoencoder_image_replication_caption.png"/></td>
<td><img src="./Images/autoencoder_training_and_validation_loss_caption.png"/></td></tr></table>

With our autoencoder, we have quickly reached a low plateau for our validation loss. Upon inspection, it is hard to find any differences between the raw image data on the left and the image representations on the right.

The model has been deemed to work well in recreating the images.

### ShallowNet
<table style="background-color: white;"><tr><td><img src="./Models/output/plots/shallownet/cifar10_shallownet_conf_matrix.png" width="800"/></td></tr></table>
<table><tr><td><img src="./Images/shallownet_conf_matrix_caption.png" width="320"/></td></tr></table>
<table><tr>
<td> <img src="./Models/output/plots/shallownet/cifar10_shallownet.png" width="400"/></td>
<td><img src="./Models/output/plots/shallownet/cifar10_shallownet_classification_report.png" height="1300" width="700"/></td></tr></table>
<table cellspacing = "15"><tr>
<td> <img src="./Images/shallownet_training_and_validation_loss_caption.png"/></td>
<td><img src="./Images/shallownet_classification_report_caption.png"/></td></tr></table>

The validation loss has plateaued by 40 epochs, indicating that the network has converged and that any further training will lead to overfitting.

Given the precision and recall values of ships, trucks, and automobiles, it is safe to say that the network has not learned enough features to distinguish beyond automobiles to more specific vehicle types (trucks and ships).

In addition, the network has difficulty distinguishing between different small animals, with birds, deer, and dogs often mistaken for frogs (a class with low precision and high recall). Again, the model has not learned enough features to distinguish other small animals from frogs.

In theory, we should expect the autoencoder input to improve on these weaknesses. Let's see if this is the case.

### Autoencoder + ShallowNet
<table style="background-color: white;"><tr><td><img src="./Models/output/plots/convautoencoder_shallownet/cifar10_convautoencoder_shallownet_conf_matrix.png" width="800"/></td></tr></table>
<table><tr><td><img src="./Images/autoencoder_shallownet_conf_matrix_caption.png" width="450"/></td></tr></table>
<table><tr>
<td> <img src="./Models/output/plots/convautoencoder_shallownet/cifar10_convautoencoder_shallownet.png" width="400"/></td>
<td><img src="./Models/output/plots/convautoencoder_shallownet/cifar10_convautoencoder_shallownet_classification_report.png" height="1300" width="700"/></td></tr></table>
<table cellspacing = "15"><tr>
<td> <img src="./Images/autoencoder_shallownet_training_and_validation_loss_caption.png"/></td>
<td><img src="./Images/autoencoder_shallownet_classification_report_caption.png"/></td></tr></table>

Once again, the validation loss has plateaued by 40 epochs, indicating that the network has converged. However, there is slight overfitting, and any further training will exacerbate this.

Comparing the vehicle evaluation metrics of this model to those of ShallowNet, we see that the algorithm has learned features that improve its ability to move past generalizing all vehicles as automobiles and to correctly distinguish between ships, airplanes, and trucks. While this has reduced recall for the automobile class, the increase in recall for the other vehicles makes up for it.

We also observe nominal improvements in the precision of the frog class and in recall for the other small animals (excluding dog), indicating that the model has learned a few more specialized features to better distinguish frogs from other small animals.

While the model still has a way to go, it is clear that the latent-space representation provides more polished features that can distinguish between classes within certain groups (vehicles and small animals).

### MiniVGGNet
<table style="background-color: white;"><tr><td><img src="./Models/output/plots/minivggnet/cifar10_minivggnet_conf_matrix.png" width="800"/></td></tr></table>
<table><tr><td><img src="./Images/minivggnet_conf_matrix_caption.png" width="320"/></td></tr></table>
<table><tr>
<td> <img src="./Models/output/plots/minivggnet/cifar10_minivggnet.png" width="400"/></td>
<td><img src="./Models/output/plots/minivggnet/cifar10_minivggnet_classification_report.png" height="1300" width="700"/></td></tr></table>
<table cellspacing = "15"><tr>
<td> <img src="./Images/minivggnet_training_and_validation_loss_caption.png"/></td>
<td><img src="./Images/minivggnet_classification_report_caption.png"/></td></tr></table>

For this model, we see convergence of the validation accuracy and stabilization of the validation loss, with only very slight overfitting present.



### Autoencoder + MiniVGGNet
<table style="background-color: white;"><tr><td><img src="./Models/output/plots/convautoencoder_minivggnet/cifar10_convautoencoder_minivggnet_conf_matrix.png" width="800"/></td></tr></table>
<table><tr><td><img src="./Images/autoencoder_minivggnet_conf_matrix_caption.png" width="450"/></td></tr></table>
<table><tr>
<td> <img src="./Models/output/plots/convautoencoder_minivggnet/cifar10_convautoencoder_minivggnet.png" width="400"/></td>
<td><img src="./Models/output/plots/convautoencoder_minivggnet/cifar10_convautoencoder_minivggnet_classification_report.png" height="1300" width="700"/></td></tr></table>
<table cellspacing = "15"><tr>
<td> <img src="./Images/autoencoder_minivggnet_training_and_validation_loss_caption.png"/></td>
<td><img src="./Images/autoencoder_minivggnet_classification_report_caption.png"/></td></tr></table>

## Results

## Further Exploration

- Make the compression of the encoder stronger and see how that affects model performance
- Try other, more complex networks such as ResNet and SqueezeNet (to see how a small model size performs with the autoencoder)
- Limit the amount of training data for other classes