# <strong style="color: tomato;">Deep Learning</strong> $\color{blue}{\text{}}$
---

## <span style="color: yellowgreen;">1. </span>Introduction.

Section Topics and Goals:
- High Level Overview of Machine Learning
- Overview of understanding classification metrics.
- Cover Deep Learning Basics
- Keras Basics
- MNIST Data Overview
- Convolutional Neural Network Theory
- Keras CNN
- Deep Learning on Custom Image Files
- Understanding YOLO v3
- YOLO v3 with Python

## <span style="color: yellowgreen;">2. </span>Machine Learning basics.

Before we dive into Deep Learning, let’s work on understanding the general machine learning process we will be using.

The specific case of machine learning we will be conducting is known as supervised learning. Because different students have different backgrounds in math, we will keep the mathematics behind the machine learning algorithms light. **A great companion textbook** on general machine learning is **An Introduction to Statistical Learning by Gareth James et al.** **It’s freely available online**; simply search for the title of the book.

Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.

Where is Machine Learning used:
- Fraud detection.
- Web search results.
- Real-time ads on web pages 
- Credit scoring and next-best offers.
- Prediction of equipment failures.
- New pricing models.
- Network intrusion detection.
- Recommendation Engines
- Customer Segmentation
- Text Sentiment Analysis
- Predicting Customer Churn
- Pattern and image recognition.
- Email spam filtering.
- Financial Modeling

**Supervised learning** algorithms are trained using **labeled** examples, such as an input where the desired output is known. For example, a picture could have a category label, such as either Dog or Cat. The learning algorithm receives a set of inputs along with the corresponding correct outputs, and the algorithm learns by comparing its actual output with correct outputs to find errors. It then modifies the model accordingly. Supervised learning is commonly used in applications where historical data predicts likely future events. 

<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><img style="margin-top: 0.5em; margin-bottom: -0.3em; width: 30%;" src="./src/img/ML/ML_1.png"></div>

Steps in Supervised Learning (graph above):
- Get your data! Customers, Sensors, etc... 
- Clean and format your data (using Keras)
- Split the data
    - Test data (30% of images)
    - Training data (70% of images)
- Build the model and fit it to the training data only, so that it learns from this data alone.
- Evaluate how well the model performed: take the test data we previously set aside and predict on it. Because we already know the correct labels for the test data, we can compare them with the predictions and see how the model performed.
- If we want to adjust parameters in our model, we go back to the training and building phase, fit the model again, and retest. We can repeat this cycle over and over until we are satisfied with the model's evaluation metrics.

Once we've done that, we can deploy our model, essentially predicting on new incoming data that was outside of our original data set.

Image classification and recognition is a very common and widely applicable use of deep learning and machine learning with OpenCV and Keras.
Let’s continue by learning about how to evaluate a classification task.

## <span style="color: yellowgreen;">3. </span>Understanding classification metrics.

We just learned that after our machine learning process is complete, we will use performance metrics to evaluate how our model did.
Let’s discuss classification metrics in more detail!

The key classification metrics we need to understand are:
- Accuracy
- Recall
- Precision
- F1-Score

But first, we should understand the reasoning behind these metrics and how they will actually work in the real world!

Typically in any classification task your model can only achieve two results:
- Either your model was correct in its prediction.
- Or your model was incorrect in its prediction.

Fortunately, the idea of correct vs. incorrect extends to situations where you have multiple classes. For the purposes of explaining the metrics, let’s imagine a **binary classification** situation, where we only have two available classes. In our example, we will attempt to predict if an image is a dog or a cat. Since this is supervised learning, we will first **fit / train** a model on **training data**, then **test** the model on **testing data**. Once we have the model’s predictions for the **X_test** data, we compare them to the true **y values** (the correct labels).
<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><img style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;" src="./src/img/ClassMetr/ClassMetr_1.png"><img style="margin-top: 0.5em; margin-bottom: -0.3em; margin-left: 2em; width: 25%;" src="./src/img/ClassMetr/ClassMetr_2.png"></div>

We repeat this process for all the images in our X_test data. At the end we will have a count of correct matches and a count of incorrect matches. The key realization we need to make is that **in the real world, not all incorrect or correct matches hold equal value!** Also, in the real world a single metric won’t tell the complete story! To understand all of this, let’s bring back the 4 metrics we mentioned and see how they are calculated. We could organize our predicted values compared to the real values in a **confusion matrix**.

**Accuracy**:
- Accuracy in classification problems is the **number of correct predictions** made by the model divided by the **total number of predictions**.
- For example, if the X_test set was 100 images and our model correctly predicted 80 images, then we have 80/100.
  - 0.8 or 80% accuracy.
- Accuracy is useful when target classes are well balanced
  - In our example, we would have roughly the same amount of cat images as we have dog images.
- Accuracy is **not** a good choice with **unbalanced** classes!
- Imagine we had 99 images of dogs and 1 image of a cat. If our model was simply a rule that always predicted **dog**, we would get 99% accuracy!
  - In this situation we’ll want to understand recall and precision
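
To make the imbalance problem concrete, here is a minimal sketch in plain Python. The `accuracy` helper and the label lists are illustrative assumptions, not part of the course code; they simply mirror the 99 dogs / 1 cat scenario described above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical test set: 99 dog images and 1 cat image.
y_true = ["dog"] * 99 + ["cat"]

# A "model" that ignores its input and always answers "dog".
y_pred = ["dog"] * 100

print(accuracy(y_true, y_pred))  # 0.99
```

The score looks excellent even though the model never finds the single cat, which is why accuracy alone can mislead on unbalanced classes.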

**Recall**:
- Ability of a model to find all the relevant cases within a dataset. 
- The precise definition of recall is the number of true positives divided by the number of true positives plus the number of false negatives. 

**Precision**:
- Ability of a classification model to identify only the relevant data points.
- Precision is defined as the number of true positives divided by the number of true positives plus the number of false positives. 

**Recall and Precision tradeoffs**:
- Often you have a trade-off between Recall and Precision.
- While recall expresses the ability to find all relevant instances in a dataset, precision expresses the proportion of the data points our model labeled as relevant that actually were relevant.
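
The two definitions can be checked by counting true positives, false positives and false negatives directly. The sketch below is a hedged illustration: the helper name `precision_recall`, the tiny label lists, and the choice to return 0.0 when a denominator is zero are all assumptions made for this example.

```python
def precision_recall(y_true, y_pred, positive):
    """Compute precision and recall for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Assumption: report 0.0 instead of dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels: 4 real cats, 6 real dogs.
y_true = ["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog", "dog", "dog"]
y_pred = ["cat", "cat", "cat", "dog", "cat", "cat", "dog", "dog", "dog", "dog"]

precision, recall = precision_recall(y_true, y_pred, positive="cat")
print(precision)  # 3 TP / (3 TP + 2 FP) = 0.6
print(recall)     # 3 TP / (3 TP + 1 FN) = 0.75
```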

**F1-Score**:
- In cases where we want to find an optimal blend of precision and recall we can combine the two metrics using what is called the F1 score.
- The F1 score is the harmonic mean of precision and recall taking both metrics into account in the following equation:
$$ F_1 = 2 \times \frac{precision \times recall}{precision + recall}$$
- We use the harmonic mean instead of a simple average because it punishes extreme values. 
- A classifier with a precision of 1.0 and a recall of 0.0 has a simple average of 0.5 but an F1 score of 0. 
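
A short sketch of the harmonic mean, using the extreme case mentioned above. The helper name `f1` and the zero-denominator guard are assumptions for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumption: define F1 as 0 when both metrics are 0
    return 2 * precision * recall / (precision + recall)

precision, recall = 1.0, 0.0
print((precision + recall) / 2)  # simple average: 0.5
print(f1(precision, recall))     # F1 score: 0.0
```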

We can also view all our correctly classified versus incorrectly classified images in the form of a confusion matrix.
<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><img style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;" src="./src/img/ClassMetr/ClassMetr_3.png"><img style="margin-top: 0.5em; margin-bottom: -0.3em; margin-left: 2em; width: 25%;" src="./src/img/ClassMetr/ClassMetr_4.png"></div>

The main point to remember with the confusion matrix and the various calculated metrics is that they are all fundamentally ways of comparing the predicted values versus the true values.
What constitutes “good” metrics, will really depend on the specific situation!
We can use a confusion matrix to evaluate our model.
For example, imagine testing for disease.

<div style="margin-top: 0.5em; margin-bottom: 1em; display: flex; flex-direction: row; justify-content: center; align-items: center;">
  <ul style="text-align: center; list-style-type:none;">
      <li>Accuracy:</li>
      <li>Overall, how often is it correct?</li>
      <li>(TP + TN) / total = 150/165 = 0.91</li>
  </ul>
  <img style="margin-left: 2em; display: inline-block; width: 30%;" src="./src/img/ClassMetr/ClassMetr_6.png">
  <ul style="text-align: center; list-style-type:none;">
      <li>Misclassification Rate (Error Rate):</li>
      <li>Overall, how often is it wrong?</li>
      <li>(FP + FN) / total = 15/165 = 0.09</li>
    </ul>
</div>
Still confused on the confusion matrix?
No problem! Check out the Wikipedia page on the confusion matrix; it has a really good diagram with the formulas for all the metrics.
Throughout the training, we’ll usually just print out metrics (e.g. accuracy).
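
The accuracy and error-rate figures from the disease example can be reproduced with a few lines of arithmetic. Note that the four individual cell counts below are assumptions chosen only so that they match the totals quoted in the text (150 correct and 15 wrong out of 165); the figure may show a different split.

```python
# Assumed cell counts, chosen to match the quoted totals.
tp, tn, fp, fn = 100, 50, 10, 5

total = tp + tn + fp + fn
accuracy = (tp + tn) / total    # overall, how often is the model correct?
error_rate = (fp + fn) / total  # overall, how often is the model wrong?

print(total)                 # 165
print(round(accuracy, 2))    # 0.91
print(round(error_rate, 2))  # 0.09
```

Accuracy and misclassification rate always sum to 1, so reporting one of them is enough.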

## <span style="color: yellowgreen;">4. </span>Introduction to Deep Learning (THEORY).

The next series of lectures will cover:
- Neurons
- Neural Networks
- Cost Functions
- Gradient Descent and BackPropagation

Keep in mind, Keras will do all of this work for us on the back end with just a few simple function calls, but it's important to have an intuition for what is happening behind the scenes! The next series of lectures covers intuition and theory only.

### <span style="color: royalblue;">a) </span>Understanding a Neuron - introduction to the Perceptron

In this section we will talk about:
- Biological Neuron
- Perceptron Model
- Mathematical Representation

Before we launch straight into neural networks, we need to understand the individual components first, such as a single “neuron”.
Artificial Neural Networks (ANN) actually have a basis in biology! Let’s see how we can attempt to mimic biological neurons with an artificial neuron, known as a perceptron!

&ensp;

Biological neuron:
- Dendrites (inputs)
- Body (core)
- Axon (output)

The artificial neuron also has inputs and outputs! This simple model is known as a perceptron.
- Input 0
- Input 1
- Output

&ensp;

Simple example of how it can work:
- We have two inputs and an output. The inputs are feature values. Each input is multiplied by a weight, and the weights initially start off as random values. The weighted inputs are then summed and passed to an **activation function**. There are many activation functions to choose from, and we’ll cover them in more detail later! For now our activation function will be very simple:
	- If the sum of the weighted inputs is positive, output 1; if the sum is negative, output 0. In this case 6-4=2, so the activation function returns 1.

There is a possible issue. What if the original inputs started off as zero? Then any weight multiplied by the input would still result in zero! We fix this by adding a **bias term**; in this case we choose 1. So what does this look like mathematically? Let’s quickly think about how we can represent this perceptron model mathematically:

$$
\sum_{i=0}^{n-1} w_ix_i + b \\[0.4em]
\text{where:}
\begin{cases}
&n& & - & \text{number of inputs,} \\
&w& & - & \text{weight of the input,} \\
&x& & - & \text{input,} \\
&b& & - & \text{bias} \\
\end{cases}
$$
&ensp;

Once we have many perceptrons in a network we’ll see how we can easily extend this to a matrix form!
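
The perceptron described above can be sketched in a few lines of plain Python. The concrete inputs and weights are hypothetical values picked so that the weighted terms are 6 and -4, as in the worked example; treating a sum of exactly zero as output 0 is also an assumption, since the text only covers the positive and negative cases.

```python
def step(z):
    """Very simple activation: 1 for a positive sum, 0 otherwise."""
    return 1 if z > 0 else 0

def perceptron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus a bias, passed through the step activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

weights = [0.5, -1.0]   # hypothetical weights
inputs = [12.0, 4.0]    # hypothetical feature values

# 0.5 * 12 + (-1.0) * 4 = 6 - 4 = 2, which is positive.
print(perceptron(inputs, weights))                 # 1

# With all-zero inputs the weights have no effect, so the bias decides the output.
print(perceptron([0.0, 0.0], weights))             # 0
print(perceptron([0.0, 0.0], weights, bias=1.0))   # 1
```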

### <span style="color: royalblue;">b) </span>Understanding a Neural Network

We’ve seen how a single perceptron behaves, now let’s expand this concept to the idea of a neural network. Let’s see how to connect many perceptrons together and then how to represent this mathematically.

Multiple Perceptrons Network:
- Input Layer
    - Real values from the data
- Hidden Layers
    - Layers in between input and output
    - 3 or more layers is considered a “deep network”
- Output Layer
    - Final estimate of the output

As you go forwards through more layers, the level of abstraction increases. 
Let’s now discuss the **activation functions** in a little more detail.

Previously our activation function was just a simple step function that output 0 or 1. This is a pretty dramatic function, since small changes in the input aren’t reflected in the output. It would be nice if we could have a more dynamic function, for example a curve that rises gradually from 0 to 1. Lucky for us, this is exactly what the sigmoid function provides.

&ensp;

- **sigmoid function** (values from 0 to 1):
$$
f(x) = \frac{1}{1 + e^{-(x)}}\\[0.5em]
$$
Changing the activation function used can be beneficial depending on the task. Let’s discuss a few more activation functions that we’ll encounter.

&ensp;

- **Hyperbolic Tangent** (values from -1 to 1): $\tanh z \text{, where } z = wx + b$
$$
\cosh x = \frac{e^x + e^{-x}}{2} \\[0.5em]
\sinh x = \frac{e^x - e^{-x}}{2} \\[0.5em]
\tanh x = \frac{\sinh x}{\cosh x} 
$$
&ensp;

- **Rectified Linear Unit** (**ReLU**) (outputs 0 for negative z and z itself otherwise). This is common and actually a relatively simple function: 
$$
\max(0, z) \text{, where } z = wx + b
$$
&ensp;

ReLU and tanh tend to perform well in practice, so we will focus on these two. Deep Learning libraries (specifically Keras, which we are going to work with) have these built in for us, so we don’t need to worry about having to implement them manually.
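
Although Keras provides these functions, writing them out once helps build intuition. The sketch below implements the three formulas from this section with the standard `math` module; the function names are our own and the sample values of `z` are arbitrary.

```python
import math

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent, built from sinh and cosh; range (-1, 1)."""
    return math.sinh(z) / math.cosh(z)

def relu(z):
    """Rectified Linear Unit: 0 for negative z, z itself otherwise."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```

At `z = 0` the sigmoid returns 0.5 and tanh returns 0, which shows the different centers of the two curves.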

As we continue on, we’ll also talk about some more state of the art activation functions.
Up next, we’ll discuss cost functions, which will allow us to measure how well these neurons are performing.

### <span style="color: royalblue;">c) </span>Cost functions

Let’s now explore how we can evaluate performance of a neuron. We can use a **cost function** to measure how far off we are from the expected value.

We’ll use the following variables:
- **y** to represent the true value
- **a** to represent neuron’s prediction

In terms of weights and bias:
- $w x + b = z$
- Pass z into activation function $\sigma(z) = a$

&ensp;

- **Quadratic Cost**:
$$
C = \sum \frac{(y - a)^2}{n}
$$
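We can see that larger errors are more prominent due to the squaring. Unfortunately, this cost can slow down learning: with a sigmoid activation, a neuron whose output is saturated near 0 or 1 produces only a small gradient, even when its prediction is badly wrong.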

&ensp;

- **Cross Entropy** (our default / go to cost function):
$$
C = -\frac{1}{n} \sum{[\;y\ln a + (1 - y)\ln(1-a)\;]}
$$
This cost function allows for faster learning. The larger the difference, the faster the neuron can learn.
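
Both cost functions can be written directly from their formulas. This is a sketch rather than what Keras does internally: the function names, the sample predictions, and the small `eps` clipping that guards against `log(0)` are assumptions added for the example.

```python
import math

def quadratic_cost(y_true, a_pred):
    """Mean of the squared differences between true values and predictions."""
    n = len(y_true)
    return sum((y - a) ** 2 for y, a in zip(y_true, a_pred)) / n

def cross_entropy_cost(y_true, a_pred, eps=1e-12):
    """Binary cross entropy averaged over all samples."""
    n = len(y_true)
    total = 0.0
    for y, a in zip(y_true, a_pred):
        a = min(max(a, eps), 1.0 - eps)  # assumption: clip to avoid log(0)
        total += y * math.log(a) + (1 - y) * math.log(1 - a)
    return -total / n

y_true = [1, 0, 1, 0]
good_predictions = [0.9, 0.1, 0.8, 0.2]  # close to the true labels
bad_predictions = [0.1, 0.9, 0.2, 0.8]   # far from the true labels

print(quadratic_cost(y_true, good_predictions), quadratic_cost(y_true, bad_predictions))
print(cross_entropy_cost(y_true, good_predictions), cross_entropy_cost(y_true, bad_predictions))
```

Both costs are lower for the good predictions, and the cross entropy grows much faster than the quadratic cost as predictions move toward the wrong label.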

&ensp;

We now have 2 key aspects of learning with neural networks, the neurons with their activation function and the cost function. We’re still missing a key step, actually “learning”. We need to figure out how we can use our neurons and the measurement of error (our cost function) and then attempt to correct our prediction, in other words, “learn”. Now we will briefly cover how we can do this with Gradient Descent.


### <span style="color: royalblue;">d) </span>Gradient Descent and Backpropagation

If you’ve dabbled in machine learning before, you may have already heard of **Gradient Descent**. Let’s quickly go over it with a high level overview.

&ensp;

**Gradient descent** is an optimization algorithm for finding the minimum of a function. To find a local minimum, we take steps proportional to the negative of the gradient.
In 1 dimension we can see visually which parameter value minimizes our cost. Finding this minimum is simple in 1 dimension, but our cases will have many more parameters, meaning we’ll need to use the built-in linear algebra that our Deep Learning library provides. Using gradient descent we can figure out the best parameters for minimizing our cost, for example the best values for the weights of the neuron inputs.
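
Here is a minimal 1-dimensional sketch of the idea. The cost function `(w - 3)^2`, the starting point, the learning rate and the number of steps are all hypothetical choices made so the example is easy to follow; real networks optimize many parameters at once.

```python
def cost(w):
    """A toy cost with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy cost with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step in the direction of the negative gradient."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

w_min = gradient_descent(start=-5.0)
print(w_min, cost(w_min))  # w ends up very close to 3, cost very close to 0
```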

We now just have one issue to solve: how can we quickly adjust the parameters or weights across our entire network toward their optimal values? This is where backpropagation comes in. 

**Backpropagation** is used to calculate the error contribution of each neuron after a batch of data is processed. It relies heavily on the chain rule to go back through the network and calculate these errors. Backpropagation works by calculating the error at the output and then distributing it back through the network layers. It requires a known desired output for each input value (**supervised learning**). 
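
The chain rule step can be illustrated on a single sigmoid neuron with the cross entropy cost. This is only a sketch of one link in the chain, not a full backpropagation implementation; the parameter values are hypothetical, and the analytic gradient is compared against a numerical estimate as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Cross entropy cost of a single sigmoid neuron for one sample."""
    a = sigmoid(w * x + b)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def gradients(w, b, x, y):
    """Chain rule: dC/dw = dC/da * da/dz * dz/dw, which simplifies to (a - y) * x."""
    a = sigmoid(w * x + b)
    dC_dz = a - y            # dC/da * da/dz for sigmoid + cross entropy
    return dC_dz * x, dC_dz  # (dC/dw, dC/db)

w, b, x, y = 0.5, -0.2, 1.5, 1.0  # hypothetical values
grad_w, grad_b = gradients(w, b, x, y)

# Numerical check with a central difference.
h = 1e-6
numeric_w = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
print(grad_w, numeric_w)  # the two values should agree closely
```

In a full network the same rule is applied layer by layer, passing the error term backwards from the output.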

&ensp;

We now know enough terminology and theory to begin working with Keras! We’ll be using a data set of image **features** of counterfeit versus real bills.
Later on we will learn about **convolutional neural networks** for handling image data directly.

## <span style="color: yellowgreen;">5. </span>Keras Basics.

### <span style="color: royalblue;">a) </span>Introduction

Let’s learn how to create a machine learning model with the *Keras* library, using *TensorFlow* as the backend. 

We’ll start with some data on currency bank notes. Some of these bank notes were forgeries and others were legitimate. The researchers created a dataset from these bank notes by taking 400x400 images of the notes and then extracting various numerical features based off the wavelets of the images.

&ensp;

**Very important note!**
- The data we’re working with in this lecture **is not** an image. We’re focusing right now on how to use Keras for general machine learning. Instead we are using features that were extracted from the images.

&ensp;

**Our goal** now is to:
- Know how to use Keras, 
- General syntax.
- How to perform a machine learning task such as:
    - Getting data,
    - Reading it in,
    - Splitting it into test set and training set,
    - Fitting the model to that training data,
    - Then predicting a new unseen data such as the test set

Once we learn about Convolutional Neural Networks, then we can expand on Keras to feed in image data (pixel images) into a network. 

### <span style="color: royalblue;">b) </span>Implementation

In [None]:
import numpy as np
from numpy import genfromtxt # generate an array from a text file 

**Dataset**:

We will use the Bank Authentication Data Set to start off with. This data set consists of various image features derived from images that had 400 x 400 pixels. You should note **the data itself that we will be using ARE NOT ACTUAL IMAGES**, they are **features** of images. In the next lecture we will cover grabbing and working with image data with Keras. This notebook focuses on learning the basics of building a neural network with Keras.

More info on the data set [here](https://archive.ics.uci.edu/ml/datasets/banknote+authentication).

Data were extracted from images that were taken from genuine and forged banknote-like specimens. For digitization, an industrial camera usually used for print inspection was used. The final images have 400 x 400 pixels. Due to the object lens and distance to the investigated object, gray-scale pictures with a resolution of about 660 dpi were obtained. A Wavelet Transform tool was used to extract features from the images.


Attribute Information:

1. variance of Wavelet Transformed image (continuous) 
2. skewness of Wavelet Transformed image (continuous) 
3. curtosis of Wavelet Transformed image (continuous) 
4. entropy of image (continuous) 
5. class (integer)

&ensp;


**Reading in the Data Set**:

We already have a prepared file of comma-separated values (CSV).

In [None]:
# we are passing the delimiter parameter to specify that the features are separated by a comma
data = genfromtxt('../Computer-Vision-with-Python/DATA/bank_note_data.txt', delimiter=',')
data

We can notice that the last column contains 0s and 1s. This indicates whether or not it was an actual authentic note. 0 => forged, 1 => real. This column is called a **label** or a **class**. We are going to build a machine learning model that can classify these bank notes based on the features alone, without having the label.

To achieve that we first have to **separate the labels from the actual features**:

In [None]:
# we could make it more universal I guess, but here it does not matter
# label_index = len(data[1, :]) - 1
# labels = data[:, label_index]
# features_no = len(data[1, :]) - 1
# features = data[:, 0:features_no]

labels = data[:, 4] # only class telling real / fake
features = data[:, :4] # only features, no class

**Split the Data into Training and Test**:

It's time to split the data into a train / test set. Keep in mind, sometimes people like to split 3 ways, train/test/validation. We'll keep things simple for now.

&ensp;

By convention we use a capital X for the features and a lowercase y for the labels. This follows the mathematical notation used in papers: X is a 2D matrix and y is a 1D array / vector. We can still just use the 'features' and 'labels' names if we want.

In [None]:
X = features
y = labels

# train_test_split splits the features and the labels into a training set and a test set
# it also shuffles the data randomly by default, so we do not have to worry about the labels happening to be in sorted order
from sklearn.model_selection import train_test_split

# passing the X features; y labels; test size of 33%; random_state => seed to have the same shuffle every time
# why 42? => https://news.mit.edu/2019/answer-life-universe-and-everything-sum-three-cubes-mathematics-0910
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42) # copied straight from the docs

**Standardizing the Data**:

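Usually when using Neural Networks, you will get better performance when you scale the data. Strictly speaking, rescaling the values to fit a fixed range such as 0 to 1 or -1 to 1 is called min-max normalization, while standardization rescales each feature to zero mean and unit variance. Here we use the min-max approach.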

The [scikit learn library](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) also provides a nice function for this.

In [None]:
# force all the feature data to fall within a certain range
# this can actually help the neural network perform better
from sklearn.preprocessing import MinMaxScaler
scaler_object = MinMaxScaler()

# fit the scaler object to our training data
# fit() finds the min and max of each feature; transform() then rescales the given array using the min and max found during the fit
scaler_object.fit(X_train)
# we fit only on X_train and not on X_test because we do not want the scaler_object to peek at any test data. Doing so is called data leakage and is essentially cheating. So we fit on the training data but transform both sets
scaled_X_train = scaler_object.transform(X_train)
scaled_X_test = scaler_object.transform(X_test)

Ok, now we have the data scaled. We can move on to building the network with Keras.
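
For intuition about what the fit and transform steps compute, here is a hand-written sketch for a single feature column. It is not scikit-learn's implementation: the helper names and the sample values are hypothetical, and it assumes the column is not constant (otherwise the division would fail).

```python
def fit_min_max(column):
    """'Fit' step: remember the min and max of the training column."""
    return min(column), max(column)

def transform_min_max(column, lo, hi):
    """'Transform' step: rescale values using the remembered min and max."""
    return [(value - lo) / (hi - lo) for value in column]

train_column = [2.0, 4.0, 6.0, 10.0]  # hypothetical training values
test_column = [3.0, 12.0]             # hypothetical test values

lo, hi = fit_min_max(train_column)    # learned from the training data only
scaled_train = transform_min_max(train_column, lo, hi)
scaled_test = transform_min_max(test_column, lo, hi)

print(scaled_train)  # [0.0, 0.25, 0.5, 1.0]
print(scaled_test)   # [0.125, 1.25]
```

Because the range is learned from the training data only, a test value larger than the training maximum maps above 1. That is expected and is the price of avoiding data leakage.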

&ensp;

**Building the Network with Keras**:

Building a simple neural network.

In [None]:
from keras.models import Sequential
from keras.layers import Dense

# creates the model
model = Sequential()

# adding the layers
# add the first dense layer: 4 neurons, expecting 4 input features (input_dim), with the ReLU activation function
model.add(Dense(4, input_dim = 4, activation = 'relu'))

# here we can play around with the number of neurons; too many or too few => bad results; 1x or 2x the input dimension is a reasonable start; we do not specify input_dim because this is a hidden layer, not the first layer
model.add(Dense(8, activation= 'relu'))

# 1 because we only have 1 neuron which has 1 output and is outputting the result of either 0 or 1; activation type sigmoid => fit between 0 and 1
model.add(Dense(1, activation= 'sigmoid'))

**Compile Model**:

After setting up the model as sequential and adding the layers, we have to **compile** the model. For the compilation process we have to choose a loss and optimizer, and the metrics that we are concerned about during fitting.

In [None]:
model.compile(loss= 'binary_crossentropy', optimizer= 'adam', metrics= ['accuracy'])

**Fit (Train) the Model**:

epochs - the number of passes over the training data. 1 epoch => 1 iteration through all the data in the dataset. On a slower PC we may want to pick a lower number of epochs to keep the training time manageable.

verbose - reporting back / printing the results as it is training

In [None]:
model.fit(scaled_X_train, y_train, epochs= 50, verbose= 2)

**Predicting New Unseen Data**:

Let's see how we did by predicting on **new data**. Remember, our model has **never** seen the test data that we scaled previously. This is exactly the same process you would use on brand new data, for example a new bank note that you have just analyzed.

In [None]:
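# model.predict returns probabilities (the sigmoid output) by default.
# model.predict(scaled_X_test)

# Sequential.predict_classes was removed in TensorFlow 2.6; thresholding the probabilities at 0.5 gives the same class labels
(model.predict(scaled_X_test) > 0.5).astype("int32")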

**Evaluating Model Performance**:

So how well did we do? How do we actually measure "well"? Is 95% accuracy good enough? It all depends on the situation. We also need to take into account things like recall and precision.

In [None]:
model.metrics_names

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

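# predict_classes was removed in TensorFlow 2.6, so we threshold the predicted probabilities at 0.5 and flatten to a 1D array
predictions = (model.predict(scaled_X_test) > 0.5).astype("int32").ravel()

# we have the answers because we have the y_test vector
confusion_matrix(y_test, predictions)

# scikit-learn layout (rows = true class, columns = predicted class):
# [True Negative, False Positive]
# [False Negative, True Positive]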

In [None]:
# displaying the metrics
print(classification_report(y_test, predictions))

**Saving and Loading Models**:

Now that we have a model trained, we can save and load it.

In [None]:
# save the model
model.save('../myTestModel.h5')

In [None]:
# load the model
from keras.models import load_model
newmodel = load_model('../myTestModel.h5')

In [None]:
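# use the loaded model to predict classes (threshold the probabilities at 0.5, since predict_classes was removed in TensorFlow 2.6)
(newmodel.predict(scaled_X_test) > 0.5).astype("int32")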

## <span style="color: yellowgreen;">6. </span>MNIST Dataset.

A classic data set in Deep Learning is the **MNIST data set**. Let’s quickly cover some basics about it since we’ll be using similar data concepts quite frequently during this section.

Fortunately this data is easy to access with Keras. The data set has:
- 60,000 training images
- 10,000 test images

&ensp;

The MNIST data set contains handwritten single digits from 0 to 9. A single digit image can be represented as an array of 28 by 28 pixels, where the values represent the grayscale intensities. Often, when working with image data, we will normalize the data to fall in the range of 0 to 1. Note that the copy of MNIST that ships with Keras stores the pixels as integers from 0 to 255, so we divide by 255 ourselves. We can think of the entire group of the 60,000 images as a 4-dimensional array: 60,000 images of 1 channel, 28 by 28 pixels each. 

This array has **4** dimensions
- (60000, 28, 28, 1)
- (Samples, x, y, channels)
- For color images, the last dimension value would be 3 (RGB)

&ensp;

For the labels we’ll use One-Hot Encoding. This means that instead of having labels such as “One”, “Two”, etc… we’ll have a single array for each image. This means if the original labels of the images are given as a list of numbers
- [5, 0, 4, ...., 5, 6, 8]
- We will convert them to one-hot encoding (easily done with Keras)

The label is represented by the index position in the label array. The corresponding label will be a 1 at the index location and zero everywhere else. For example, a drawn digit of 4 would have this label array:
- [0, 0, 0, 0, **1**, 0, 0, 0, 0 , 0]

As a result, the labels for the training data end up being a large 2-d array (60000, 10).
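
One-hot encoding is simple enough to write by hand, which makes the index-based representation easy to see. The `one_hot` helper below is an illustrative assumption; in practice Keras offers a `to_categorical` utility for the same job.

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1 at the index given by `label`."""
    encoded = [0] * num_classes
    encoded[label] = 1
    return encoded

labels = [5, 0, 4]  # the first few labels from the example list above
encoded_labels = [one_hot(label) for label in labels]

print(one_hot(4))  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(len(encoded_labels), len(encoded_labels[0]))  # 3 10
```

With all 60,000 training labels, the same list of lists would have the shape (60000, 10).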

## <span style="color: yellowgreen;">7. </span>Convolutional Neural Networks.

### <span style="color: royalblue;">a) </span>Part one - Convolutional Neural Networks

We just created a Neural Network for already defined features. But what if we have the raw image data? We need to learn about **Convolutional Neural Networks** in order to effectively solve the problems that image data can present.

Just like the simple perceptron, CNNs also have their origins in biological research. Hubel and Wiesel studied the structure of the visual cortex in mammals, winning a Nobel Prize in 1981. Their research revealed that neurons in the visual cortex had a small local receptive field. This idea then inspired an ANN architecture that would become the CNN, famously implemented in the 1998 paper by Yann LeCun et al. Their LeNet-5 architecture was first used to classify the MNIST data set.

When learning about CNNs you’ll often see a diagram like this:
<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;">
<img src="./src/img/CNN/CNN_1.png">
</div></div>

Let’s break down the various aspects of a CNN seen here:
- Tensors
- DNN vs CNN
- Convolutions and Filters
- Padding
- Pooling Layers
- Review Dropout

&ensp;

Recall that Tensors are N-Dimensional Arrays that we build up to:
- Scalar - 3
- Vector - [3, 4, 5]
- Matrix - [ [3, 4] , [5, 6] , [7, 8] ]
- <span style="color: #fff;">Tensor - </span>
<span style="color: pink;"> [
  <span style="color: yellow;">[<span style="color: tomato;"> [ 1, 2], </span><span style="color: green;">[ 3, 4]</span> ],</span>
  <br>&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;
  <span style="color: royalblue;">[ [ 5, 6] , [ 7, 8] ]</span>
]</span>


Tensors make it very convenient to feed in sets of images into our model - (I,H,W,C):
- I : Images
- H: Height of Image in Pixels
- W: Width of Image in Pixels
- C: Color Channels: Grayscale - 1 channel, RGB - 3 channels

Now let’s explore the difference between a Densely Connected Neural Network (DNN) and a Convolutional Neural Network (CNN). Recall that we’ve already been able to create a DNN with Keras.

<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><span>Densely Connected layer:</span><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;">
<img src="./src/img/CNN/CNN_2.png">
</div><span>Convolutional Layer:</span><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;">
<img src="./src/img/CNN/CNN_3.png">
</div></div>

- In a **densely connected layer**, every neuron in one layer is directly connected to every neuron in the next layer.
- In a **convolutional layer**, each unit is connected to a smaller number of nearby units in the next layer.

So why bother with a CNN instead of a DNN?
- The MNIST images are 28 by 28 pixels (784 total), but most images are at least 256 by 256 pixels (more than 65k in total, and that is still a small image!). For a densely connected network this leads to too many parameters and does not scale to larger images.
- Convolutions also have a major advantage for image processing, because pixels near each other are much more strongly correlated than pixels far apart.
- Each CNN layer looks at an increasingly larger part of the image. Having units only connected to nearby units also aids in *invariance*.
- CNNs also help with regularization, limiting the search for weights to the size of the convolution.
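
The parameter argument can be made concrete with a little arithmetic. The layer sizes below (128 dense units, 32 filters of size 3 by 3) are hypothetical choices for illustration only, and the two helper functions are our own.

```python
def dense_parameter_count(height, width, channels, units):
    """Weights plus biases for a dense layer fed with a flattened image."""
    inputs = height * width * channels
    return inputs * units + units

def conv_parameter_count(kernel_height, kernel_width, channels, filters):
    """Weights plus biases for a convolutional layer; independent of image size."""
    return kernel_height * kernel_width * channels * filters + filters

print(dense_parameter_count(28, 28, 1, units=128))    # 100480
print(dense_parameter_count(256, 256, 1, units=128))  # 8388736
print(conv_parameter_count(3, 3, 1, filters=32))      # 320
```

The dense layer grows with the number of pixels, while the convolutional layer's parameter count depends only on the filter size and the number of filters.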

&ensp;

Let’s explore how the convolutional neural network relates to image recognition. We start with the input layer, the image itself. Units in a convolutional layer are only connected to the pixels in their respective receptive fields.
<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><span></span><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;">
<img src="./src/img/CNN/CNN_4.png">
</div></div>

We run into a possible issue for edge neurons. There may not be an input there for them. We can fix this by adding a “padding” of zeros around the image.

<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><span></span><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;">
<img src="./src/img/CNN/CNN_5.png">
</div></div>
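
As a rough illustration of zero padding, the sketch below surrounds a tiny image (a plain list of lists) with a border of zeros. The function name `zero_pad` and the 3 by 3 example image are assumptions made for this example only.

```python
def zero_pad(image, pad=1):
    """Return a copy of a 2-D list-of-lists image with `pad` zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]           # top border
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)  # left and right border
    padded.extend([0] * width for _ in range(pad))        # bottom border
    return padded


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

for row in zero_pad(image):
    print(row)
```

With this border in place, a filter centered on an edge pixel always has an input value (zero) for every position it covers.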

&ensp;

Let’s walk through 1-D Convolution in more detail, then expand this idea to 2-D Convolution. Let’s revisit our DNN and convert it to a CNN.

<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><div style="margin-top: 0.5em; margin-bottom: -0.3em; margin-left: 1em; width: 25%;"><span>A DNN:</span>
<img style="" src="./src/img/CNN/CNN_6.png">
</div><div style="margin-top: 0.5em; margin-bottom: -0.3em; margin-left: 1em; width: 25%;"><span>1-D Convolution:</span>
<img style="" src="./src/img/CNN/CNN_7.png">
</div></div>

We can treat these weights as a filter (1-D Convolution).
$$
y = w_1 x_1 + w_2 x_2 \\[0.3em]
\text{if } (w_1, w_2) = (1, -1) \text{, then: } y = x_1 - x_2 \\[0.3em]
\text{When is $y$ at a maximum?} \\[0.3em]
(x_1, x_2) = (1, 0)
$$

We now have a set of weights that can act as a filter for edge detection. We can then expand this idea to multiple filters.
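
A minimal sketch of this 1-D filter in plain Python: with the weights (1, -1) each output is `x1 - x2`, so the response is non-zero only where neighbouring inputs differ. The function name `conv1d` and the example signal are assumptions for illustration.

```python
def conv1d(signal, weights):
    """Slide `weights` over `signal` one position at a time (no padding)."""
    k = len(weights)
    return [
        sum(w * x for w, x in zip(weights, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]


signal = [0, 0, 1, 1, 0]   # flat, step up, flat, step down
edge_filter = [1, -1]      # y = x1 - x2

print(conv1d(signal, edge_filter))   # [0, -1, 0, 1]
```

The flat regions give 0, while the two edges give -1 and 1. The largest response appears where the pair of inputs is (1, 0).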

<div style="display: flex; justify-content: center; align-items: center;"><ul><li style="list-style: none;">Example:</li><li>Filters: 1</li><li>Filter Size: 2</li><li>Stride: 2 / 1 (1 unit at a time)</li></ul><div style="margin-top: 0.5em; margin-bottom: -0.3em; margin-left: 2em; width: 25%;">
<img src="./src/img/CNN/CNN_8.png">
</div></div>

In the example above we have 1 filter, the size of the filter is 2 because it has 2 weights, and the stride is 2 or 1.

*A stride of 2 means we move these weights two neurons at a time: starting from the bottom, we repeat the weights every two neurons along the input. This step size is known as the stride of the filter. When the stride of the filter is set to 1, as we can see in the image above (where we apply 4 filters), we repeat the weights at every neuron.*

Remember that:
- We can add zero padding to include more edge pixels and finish off that stride count.
- Each filter is detecting a different feature. 
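
The effect of stride and padding on the number of outputs can be sketched as follows. The names `conv1d_strided` and `output_length`, and the symmetric padding scheme, are assumptions chosen for this example.

```python
def conv1d_strided(signal, weights, stride=1, pad=0):
    """1-D filter with a configurable stride and `pad` zeros added on both ends."""
    padded = [0] * pad + list(signal) + [0] * pad
    k = len(weights)
    return [
        sum(w * x for w, x in zip(weights, padded[i:i + k]))
        for i in range(0, len(padded) - k + 1, stride)
    ]


def output_length(n, k, stride=1, pad=0):
    """Number of outputs for n inputs, filter size k, given stride and padding."""
    return (n + 2 * pad - k) // stride + 1


signal = [1, 2, 3, 4, 5]
weights = [1, 1]

print(conv1d_strided(signal, weights, stride=1))          # 4 outputs
print(conv1d_strided(signal, weights, stride=2))          # 2 outputs, the last input is skipped
print(conv1d_strided(signal, weights, stride=2, pad=1))   # 3 outputs, padding covers the edges
```

Without padding, a stride of 2 never reaches the fifth input. Adding one zero on each side lets the filter cover every input.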

&ensp;

For simplicity, we begin to describe and visualize these sets of neurons as blocks instead:

<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;">
<img src="./src/img/CNN/CNN_9.png">
</div></div>

Let’s now expand these concepts to 2-D Convolution, since we’ll mainly be dealing with images.
Each section of the input image corresponds to a section of the tensor.

<div style="margin-bottom: 2em; display: flex; justify-content: center; align-items: center; text-align: center;"><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;"><span>2D Convolution:</span>
<img src="./src/img/CNN/CNN_10.png">
</div><div style="margin-top: 0.5em; margin-bottom: -0.3em; margin-left: 2em; width: 25%;"><span>2D Color images:</span>
<img src="./src/img/CNN/CNN_11.png">
</div></div>

<div style="display: flex; flex-direction: row; flex-wrap: wrap; justify-content: center; align-items: center;">

  <div style="display: flex; flex-direction: column; justify-content: center; text-align: center; width: 35%; height: 40%; border: 1px solid #000;">
    <span>Filters are commonly visualized with grids:</span>
    <img style="max-width: 100%; margin: 1em; height: auto;" src="./src/img/CNN/CNN_12.png">
  </div>

  <div style="display: flex; flex-direction: column; justify-content: center; text-align: center; width: 35%; height: 40%; border: 1px solid #000;">
    <span>Stride Distance of 1 Example:</span>
    <img style="max-width: 100%; margin: 1em; height: auto;" src="./src/img/CNN/CNN_13.png">
  </div>

  <div style="display: flex; flex-direction: column; justify-content: center; text-align: center; width: 35%; height: 40%; border: 1px solid #000;">
    <span>Representation of Multiple Filters:</span>
    <img style=" max-width: 100%; margin: 1em; height: auto;" src="./src/img/CNN/CNN_14.png">
  </div>

  <div style="display: flex; flex-direction: column; justify-content: center; text-align: center; width: 35%; height: 40%; border: 1px solid #000;">
    <span>Back at the original CNN diagram, this is exactly what we saw:</span>
    <img style=" max-width: 100%; margin: 1em; height: auto;" src="./src/img/CNN/CNN_15.png">
  </div>

</div>
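
The same sliding-filter idea in two dimensions can be sketched in plain Python. The function name `conv2d`, the tiny image and the vertical-edge kernel are assumptions for illustration. Like deep learning libraries, this sketch does not flip the kernel (strictly speaking it is a cross-correlation).

```python
def conv2d(image, kernel):
    """Slide a 2-D kernel over a 2-D image with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]          # dark left half, bright right half

vertical_edge = [[1, -1],
                 [1, -1]]

print(conv2d(image, vertical_edge))   # [[0, -2, 0], [0, -2, 0]]
```

Only the middle column of the output responds, which is exactly where the image changes from dark to bright.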


Now it is time to discuss subsampling (pooling).


### <span style="color: royalblue;">b) </span>Part two - pooling layers

Now that we understand convolutional layers, let’s discuss **pooling layers**. Pooling layers subsample the input image, which reduces memory use and computational load, as well as the number of parameters in the layers that follow.

Let’s imagine a layer of pixels in our input image. For our MNIST digits set, each pixel had a value representing “darkness”. We create a 2 by 2 pool of pixels (known as a kernel) and evaluate the maximum value:

<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;">
<img src="./src/img/CNN/CNN_16.png">
</div></div>

Only the max value makes it to the next layer. We then move over by a “stride”, in this case, our stride is two.

<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;">
<img src="./src/img/CNN/CNN_17.png">
</div></div>

This pooling layer will end up removing a lot of information: even a small pooling “kernel” of 2 by 2 with a stride of 2 will remove 75% of the input data.
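
A minimal max pooling sketch in plain Python. The function name `max_pool2d` and the 4 by 4 example values are made up for illustration.

```python
def max_pool2d(image, pool=2, stride=2):
    """Keep only the maximum of each pool x pool window."""
    pooled = []
    for i in range(0, len(image) - pool + 1, stride):
        row = []
        for j in range(0, len(image[0]) - pool + 1, stride):
            window = [image[i + a][j + b] for a in range(pool) for b in range(pool)]
            row.append(max(window))
        pooled.append(row)
    return pooled


image = [[1, 3, 2, 0],
         [4, 2, 1, 5],
         [0, 1, 7, 2],
         [3, 2, 4, 6]]

pooled = max_pool2d(image)
print(pooled)                                  # [[4, 5], [3, 7]]

kept = len(pooled) * len(pooled[0]) / (len(image) * len(image[0]))
print(kept)                                    # 0.25, so 75% of the values are dropped
```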

Another common technique used with CNNs is called “Dropout”:
- Dropout can be thought of as a form of regularization to help prevent overfitting.
- During training, units are randomly dropped, along with their connections.
    - This helps prevent units from “co-adapting” too much.
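
The idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the Keras implementation: the function name `dropout` is an assumption, and so is the rescaling of the surviving units by `1 / (1 - rate)` (often called inverted dropout), which keeps the expected activation unchanged.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


rng = random.Random(0)   # seeded so the example is repeatable
activations = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

print(dropout(activations, rate=0.5, rng=rng))
```

Dropout is only applied during training. At prediction time all units are kept.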

Let’s also quickly point out some famous CNN architectures:
- LeNet-5 by Yann LeCun
- AlexNet by Alex Krizhevsky et al.
- GoogLeNet by Szegedy et al. at Google Research
- ResNet by Kaiming He et al.
- Check out the resource links to the papers discussing these architectures!

<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;">
<img src="./src/img/CNN/CNN_18.png">
</div><span style="margin: 1em;">AlexNet</span><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;">
<img src="./src/img/CNN/CNN_19.png">
</div></div>

&ensp;

<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;"><span>Convolutional Neural Network</span>
<img src="./src/img/CNN/CNN_1.png">
</div></div>

## <span style="color: yellowgreen;">8. </span>Keras CNN with MNIST dataset.

CNN - Convolutional Neural Networks (in case someone does not remember)

**Convolutional Neural Networks for Image Classification**:

In [None]:
from keras.datasets import mnist

# load the mnist data 
(x_train, y_train), (x_test, y_test) = mnist.load_data()

**Visualizing the Image Data**:

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

x_train.shape # does not have the color channel

# grab a single image
single_image = x_train[0]
plt.imshow(single_image, 'gray') # displays the image BUT it is reversed - the digit is white and bgc is black
plt.imshow(single_image, 'gray_r') # reverse the color mapping


**PreProcessing Data**:

We first need to make sure the labels will be understandable by our CNN.

**Labels**:

In [None]:
y_train

In [None]:
y_test

It looks like our labels are literally categories of numbers. We need to translate them to be "one hot encoded" so our CNN can understand them; otherwise it will think this is some sort of regression problem on a continuous axis. Luckily, Keras has an easy-to-use function for this:

In [None]:
from keras.utils import to_categorical # Converts a class vector (integers) to binary class matrix.

# categorical versions
# labels; no of classes
y_cat_test = to_categorical(y_test, 10)
y_cat_train = to_categorical(y_train, 10)

y_cat_train[0] # we know that is a 5 so we should see the 1 on the index 5 after one hot encoding
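
For intuition, here is a plain Python sketch of what one hot encoding does. It is not the Keras implementation, and the function name `one_hot` is an assumption.

```python
def one_hot(labels, num_classes):
    """Turn integer labels into rows with a single 1.0 at the label's index."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        encoded.append(row)
    return encoded


print(one_hot([5, 0, 4], 10)[0])   # the 1.0 sits at index 5
```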

**Processing X Data**:

We should normalize the X data, because we have the values in range from 0 to 255 and they should be between 0 and 1.

In [None]:
single_image.max()
# single_image.min()

x_train = x_train / x_train.max() # / 255
x_test = x_test / x_test.max() # / 255

scaled_image = x_train[0]
scaled_image.max()

# we do not see any visual difference
plt.imshow(scaled_image, 'gray_r')

**Reshaping the Data**:

Right now our data is 60,000 images stored in 28 by 28 pixel array formation (60000, 28, 28). 

This is correct for a CNN, but we need to add one more dimension to show we're dealing with a single color channel (the images are grayscale, with values from 0-255 on one channel). A color image would have 3 channels.

In [None]:
x_train.shape

Reshape to include channel dimension (in this case, 1 channel)

In [None]:
x_train = x_train.reshape(60000, 28, 28, 1) # just defining that there is a channel there
# we are just clarifying that we have a single channel
x_test = x_test.reshape(10000, 28, 28, 1)

**Training the Model**:

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten

# create a model
model = Sequential()

# CONVOLUTIONAL LAYER
# we can play around with those values but the ones given here are usually a good starting point
# although we can not change the input shape, it has to match the data
model.add(Conv2D(filters=32, kernel_size=(4, 4), input_shape=(28, 28, 1), activation='relu'))

# POOLING LAYER
# we can experiment with the pool size
model.add(MaxPool2D(pool_size=(2, 2)))

# FLATTEN THE POOLED FEATURE MAPS (12 x 12 x 32 = 4608 VALUES) BEFORE THE DENSE LAYERS (3D --> 1D)
# we have to transform the output of the convolutional and pooling layers into something that a dense layer can understand
model.add(Flatten())

# DENSE HIDDEN LAYER
# here we have 128 neurons in a hidden layer, but we can play around with these values
model.add(Dense(128, activation='relu'))

# OUTPUT LAYER
# can not play around with this one; 10 outputs (one per label) and softmax, which outputs a probability for each class
model.add(Dense(10, activation='softmax'))

# loss: String (name of objective function) or objective function. Configures the model for training.
# optimizer: String (name of optimizer) or optimizer instance. Configures the model for training.
# metrics: List of metrics to be evaluated by the model. Configures the model for training.
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

In [None]:
# Prints a string summary of the network.
model.summary()
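
The numbers in the summary can be checked by hand. The arithmetic below is a sketch that assumes the Keras defaults for this model: no padding and stride 1 in `Conv2D`, and a pooling stride equal to the pool size in `MaxPool2D`.

```python
conv_out = 28 - 4 + 1              # 25: a 4x4 kernel without padding
pool_out = conv_out // 2           # 12: 2x2 pooling with stride 2
flat = pool_out * pool_out * 32    # 4608 values after Flatten

conv_params = 4 * 4 * 1 * 32 + 32  # weights + biases of the 32 filters
hidden_params = flat * 128 + 128   # dense hidden layer
output_params = 128 * 10 + 10      # softmax output layer
total_params = conv_params + hidden_params + output_params

print(flat, conv_params, hidden_params, output_params, total_params)
```

Almost all parameters sit in the first dense layer, not in the convolutional layer.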

**Train the Model**:

This can take a while to compute; change the number of epochs if necessary.

In [None]:
model.fit(x_train, y_cat_train, epochs=2) # remember to pass the one hot encoded labels

**Evaluate the Model**:

In [None]:
model.metrics_names

In [None]:
model.evaluate(x_test, y_cat_test)

**Let us now test our model on images that it has not seen before**:

In [None]:
import numpy as np
from sklearn.metrics import classification_report

# predicting on images it does not know
# predict_classes() was removed from recent Keras versions, so take the argmax of the predicted probabilities instead
predictions = np.argmax(model.predict(x_test), axis=-1)

# here we use the original integer labels (not the one hot encoded ones) because the predictions are integer class labels too
print(classification_report(y_test, predictions))

Looks like good performance to me.
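
To see how the softmax output turns into a class label, here is a small plain Python sketch. The helper names `softmax` and `argmax` and the example scores are assumptions for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)                            # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])


scores = [0.1, 2.0, 0.3]       # made-up raw outputs for 3 classes
probs = softmax(scores)

print(probs)
print(argmax(probs))           # 1: the class with the highest probability
```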

## <span style="color: yellowgreen;">9. </span>Keras CNN with CIFAR-10.

We are going to work with color images. This is the famous dataset with color images of various objects.

&ensp;

**The Data**:

CIFAR-10 is a dataset of 50,000 32x32 color training images, labeled over 10 categories, and 10,000 test images.

In [None]:
from keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

In [None]:
x_train.shape # (50000, 32, 32, 3)
x_train[0].shape

In [None]:
import matplotlib.pyplot as plt

plt.imshow(x_train[0]) # frog
plt.imshow(x_train[12]) # horse

**PreProcessing**:

In [None]:
x_train.max() # 255
x_train.min() # 0

x_train = x_train / x_train.max()
x_test = x_test / x_test.max()

y_train

**Labels**:

We can see that the labels are in their integer form, so we have to change them to one hot encoded.

In [None]:
from keras.utils import to_categorical

# in this set we have 10 labels (hence the name)
y_cat_train = to_categorical(y_train, 10)
y_cat_test = to_categorical(y_test, 10)

**Building the Model**:

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten

model = Sequential()

# in this model we are going to have two sets of convolutional + pooling layers
model.add(Conv2D(filters=32, kernel_size=(4, 4), input_shape=(32, 32, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))

# input_shape is only needed on the first layer; later layers infer their input shape
model.add(Conv2D(filters=32, kernel_size=(4, 4), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))

model.add(Flatten())

# dense hidden layer
# the number of neurons is up to you but people tend to go for 2^n
model.add(Dense(256, activation='relu'))

# LAST LAYER IS THE CLASSIFIER, THUS 10 POSSIBLE CLASSES
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

In [None]:
model.summary()
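
The spatial sizes reported by the summary can be traced by hand. The helper below is a sketch that assumes no padding, a convolution stride of 1 and a pooling stride equal to the pool size. The name `trace_shapes` and the `('conv', k)` / `('pool', p)` layer notation are made up for this example.

```python
def trace_shapes(size, layers):
    """Follow the spatial size through ('conv', k) and ('pool', p) layers."""
    sizes = [size]
    for kind, k in layers:
        if kind == 'conv':
            size = size - k + 1      # stride 1, no padding
        elif kind == 'pool':
            size = size // k         # pool size k with stride k
        else:
            raise ValueError(f'unknown layer kind: {kind}')
        sizes.append(size)
    return sizes


small = trace_shapes(32, [('conv', 4), ('pool', 2), ('conv', 4), ('pool', 2)])
print(small)                    # [32, 29, 14, 11, 5]
print(small[-1] ** 2 * 32)      # 800 values reach the Flatten layer
```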

In [None]:
# verbose: Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.
model.fit(x_train, y_cat_train, verbose=1, epochs=10)

In [None]:
model.save('../cifar_10epochs.h5')

**Evaluate the model**:

In [None]:
model.metrics_names

In [None]:
model.evaluate(x_test, y_cat_test)

In [None]:
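import numpy as np
from sklearn.metrics import classification_report

# predict_classes() was removed from recent Keras versions; take the argmax of the predicted probabilities instead
predictions = np.argmax(model.predict(x_test), axis=-1)

print(classification_report(y_test, predictions))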

**Optional - Large Model**:

In [None]:
model = Sequential()

## FIRST SET OF LAYERS

# CONVOLUTIONAL LAYER
model.add(Conv2D(filters=32, kernel_size=(4, 4), input_shape=(32, 32, 3), activation='relu'))
# CONVOLUTIONAL LAYER (input_shape is only needed on the first layer)
model.add(Conv2D(filters=32, kernel_size=(4, 4), activation='relu'))

# POOLING LAYER
model.add(MaxPool2D(pool_size=(2, 2)))

## SECOND SET OF LAYERS

# CONVOLUTIONAL LAYER
model.add(Conv2D(filters=64, kernel_size=(4, 4), activation='relu'))
# CONVOLUTIONAL LAYER
model.add(Conv2D(filters=64, kernel_size=(4, 4), activation='relu'))

# POOLING LAYER
model.add(MaxPool2D(pool_size=(2, 2)))

# FLATTEN THE FEATURE MAPS (3 x 3 x 64 = 576 VALUES) BEFORE THE DENSE LAYERS
model.add(Flatten())

# 512 NEURONS IN DENSE HIDDEN LAYER (YOU CAN CHANGE THIS NUMBER OF NEURONS)
model.add(Dense(512, activation='relu'))

# LAST LAYER IS THE CLASSIFIER, THUS 10 POSSIBLE CLASSES
model.add(Dense(10, activation='softmax'))


model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

In [None]:
model.fit(x_train,y_cat_train,verbose=1,epochs=20)

In [None]:
model.evaluate(x_test,y_cat_test)

In [None]:
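import numpy as np
from sklearn.metrics import classification_report

# predict_classes() was removed from recent Keras versions; take the argmax of the predicted probabilities instead
predictions = np.argmax(model.predict(x_test), axis=-1)

print(classification_report(y_test, predictions))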

model.save('larger_CIFAR10_model.h5')

## <span style="color: yellowgreen;">10. </span>Deep Learning on custom images.

**Working with Custom Images**:

So far everything we've worked with has been nicely formatted for us already by Keras. Let's explore what it's like to work with a more realistic data set.

In a real world situation, you're not going to be able to simply load images using a Keras import tool. Instead, you'll have to work with raw JPEG or PNG files. Now you will see how you can work with your own custom image data sets.

For that we're going to download a pre-existing data set that has the raw image files. We're not going to be using an import tool in Keras. Instead, we'll be downloading a dataset of just raw images, and we'll learn how to resize them and how to perform different operations on them to make them suitable for training.

And then we'll actually work with those real custom image files to build out a neural network.

&ensp;

**The Data**:

PLEASE NOTE: THIS DATASET IS VERY LARGE. **USE OUR VERSION OF THE DATA. WE ALREADY ORGANIZED IT FOR YOU.**

**ORIGINAL DATA [SOURCE](https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765).**

The Kaggle Competition: [Cats and Dogs](https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition) includes 25,000 images of cats and dogs. We will be building a classifier that works with these images and attempt to detect dogs versus cats!

The pictures are numbered 0-12499 for both cats and dogs, thus we have 12,500 images of Dogs and 12,500 images of Cats. This is a huge dataset!!

&ensp;

**Note: We will be dealing with real image files, NOT numpy arrays, which means a large part of this process will be learning how to work with large groups of image files. This is too much data to fit in memory as a numpy array, so we'll need to feed it into our model in batches.**

### <span style="color: royalblue;">a) </span>Part one

To work with custom images we need a good file structure: two folders named "test" and "train", each containing a separate sub-folder for every category of image (with the same sub-folder names in both).

**Visualizing the Data**:

Let's take a closer look at the data.

In [None]:
import cv2
import matplotlib.pyplot as plt

cat4 = cv2.imread('../Computer-Vision-with-Python/CATS_DOGS/train/CAT/4.jpg')
cat4 = cv2.cvtColor(cat4, cv2.COLOR_BGR2RGB)

dog2 = cv2.imread('../Computer-Vision-with-Python/CATS_DOGS/train/DOG/2.jpg')
dog2 = cv2.cvtColor(dog2, cv2.COLOR_BGR2RGB)

plt.imshow(cat4)
plt.imshow(dog2)

All the images are different sizes - this is reflective of real world data.

**Preparing the Data for the model**:

There is too much data for us to read all at once in memory. We can use some built in functions in Keras to automatically process the data, generate a flow of batches from a directory, and also manipulate the images.

&ensp;

**Image Manipulation**:

It's usually a good idea to manipulate the images with rotation, resizing, and scaling so the model becomes more robust to images that differ from those in our data set. We can use the **ImageDataGenerator** to do this automatically for us. Check out the documentation for a full list of all the parameters you can use here!

In [None]:
from keras.preprocessing.image import ImageDataGenerator

image_gen = ImageDataGenerator(rotation_range=30, # randomly rotate images within a range of +/- 30 degrees
                               width_shift_range=0.1, # randomly shift the pic horizontally by a max of 10% of its width
                               height_shift_range=0.1, # randomly shift the pic vertically by a max of 10% of its height
                               rescale=1/255, # rescale the image by normalizing it
                               shear_range=0.2, # shear intensity: slants the image (shear angle in degrees); it does not crop it
                               zoom_range=0.2, # randomly zoom in or out by a max of 20%
                               horizontal_flip=True, # allow horizontal flipping; there is also a vertical flip but we do not want to train on upside down dogs
                               fill_mode='nearest' # fill in missing pixels with the nearest filled value; fill the gaps
                               )

# how it works?
plt.imshow(image_gen.random_transform(dog2))

**Generating many manipulated images from a directory**:

In order to use .flow_from_directory(), you must organize the images in sub-directories. This is an **absolute requirement**, otherwise the method won't work. The directories should only contain images of one class, so one folder per class of images.

Structure Needed:

- Image Data Folder
    - Class 1
        - 0.jpg
        - 1.jpg
        - ...
    - Class 2
        - 0.jpg
        - 1.jpg
        - ...
    - ...
    - Class n

In [None]:
# we have to run it twice => 1 on test, 1 on train
# here we are not doing anything with the images and we will run this later with additional parameters
image_gen.flow_from_directory('../Computer-Vision-with-Python/CATS_DOGS/train/')
image_gen.flow_from_directory('../Computer-Vision-with-Python/CATS_DOGS/test/')
# Found 18743 images belonging to 2 classes.
# Found 6251 images belonging to 2 classes.
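
Before handing a folder to `flow_from_directory`, it can help to confirm that it follows the required one-folder-per-class layout. The sketch below uses only the standard library. The function name `count_images_per_class`, the accepted file suffixes and the temporary demo folder are assumptions for this example.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png'}


def count_images_per_class(root):
    """Map each class sub-directory of `root` to the number of image files in it."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


# Demo on a throwaway folder with the required structure (empty placeholder files)
with tempfile.TemporaryDirectory() as tmp:
    for class_name, n_files in (('CAT', 3), ('DOG', 2)):
        class_dir = Path(tmp) / 'train' / class_name
        class_dir.mkdir(parents=True)
        for i in range(n_files):
            (class_dir / f'{i}.jpg').touch()
    demo_counts = count_images_per_class(Path(tmp) / 'train')

print(demo_counts)   # {'CAT': 3, 'DOG': 2}
```

Pointing the same function at the real `train` and `test` folders should give per-class counts that add up to the totals reported above.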

### <span style="color: royalblue;">b) </span>Part two

**Note**: We will introduce some slightly different imports to reflect the most recent changes to Keras. These are just a few different imports:
- MaxPool2D → MaxPooling2D
- Adding in Activation functions separately

&ensp;

**Resizing Images**:

Let's have Keras resize all the images to 150 pixels by 150 pixels once they've been manipulated.

In [None]:
# height, width, channels
image_shape = (150, 150, 3)

**Creating the Model**:

In [None]:
from keras.models import Sequential

# Activation - adding the activation function as a separate layer after the Dense layer
from keras.layers import Activation, Dense, Dropout, Conv2D, Flatten, MaxPooling2D

model = Sequential()

# this is a really complicated task so we will add 3 convolutional layers
model.add(Conv2D(filters=32, kernel_size=(3, 3), input_shape=image_shape, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# input_shape is only needed on the first layer; later layers infer their input shape
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())

model.add(Dense(128)) # here we will provide the activation function separately using the newly imported layer
model.add(Activation('relu')) # works exactly the same as if we would provide it inside the Dense

# DROPOUT LAYER
# Dropouts help reduce overfitting by randomly turning neurons off during training.
# Here we say randomly turn off 50% of neurons.
model.add(Dropout(0.5))

# it is binary here => 0 = cat, 1 = dog
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
model.summary()

**Training the Model**:

We need to choose a batch size, and a good starting point is a batch size of 16, although there is no single correct answer.

In [None]:
batch_size = 16


train_directory = '../Computer-Vision-with-Python/CATS_DOGS/train/'
test_directory = '../Computer-Vision-with-Python/CATS_DOGS/test/'

# target_size: Tuple of integers (height, width), default: (256, 256). The dimensions to which all images found will be resized.
#class_mode: One of "categorical", "binary", "sparse", "input", or None. Default: "categorical". Determines the type of label arrays that are returned
train_image_generator = image_gen.flow_from_directory(train_directory, 
                                                      target_size=image_shape[:2],
                                                      batch_size=batch_size,
                                                      class_mode='binary')

test_image_generator = image_gen.flow_from_directory(test_directory, 
                                                      target_size=image_shape[:2],
                                                      batch_size=batch_size,
                                                      class_mode='binary')

In [None]:
train_image_generator.class_indices
#{'CAT': 0, 'DOG': 1}

In [None]:
#the way to ignore the warning when the model can not read certain images
import warnings
warnings.filterwarnings('ignore')

In [None]:
# steps_per_epoch: Integer. Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch

# model.fit accepts generators directly in recent Keras versions (older versions used model.fit_generator)
results = model.fit(train_image_generator, epochs=1, # should be 100 but we do not have time for that
                    steps_per_epoch=150, # grab a batch of 16 for 150 times and call it an epoch; acts as a limiter because we have >18k images
                    validation_data=test_image_generator,
                    validation_steps=12)
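
A quick bit of arithmetic shows what these settings mean in terms of images. The variable names are assumptions, and the image count is the one reported by `flow_from_directory` earlier.

```python
import math

n_train_images = 18743     # reported by flow_from_directory for the train folder
batch_size = 16

steps_for_full_pass = math.ceil(n_train_images / batch_size)
images_seen_per_epoch = 150 * batch_size
validation_images_per_epoch = 12 * batch_size

print(steps_for_full_pass)           # 1172 steps would cover every training image once
print(images_seen_per_epoch)         # 2400 images with steps_per_epoch=150
print(validation_images_per_epoch)   # 192 images with validation_steps=12
```

With 150 steps, each epoch covers only a small part of the training set, which is why many epochs are needed.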

**Evaluating the Model**:

In [None]:
# checking the accuracy of the model
# the key is 'accuracy' in recent Keras versions ('acc' in older ones)
results.history['accuracy']

# we can plot the results to see how our accuracy changed through each epoch
plt.plot(results.history['accuracy'])

**Predicting on new images**:

For this we will use the provided pre-trained model, but we could also train it ourselves.

In [None]:
from keras.models import load_model

new_model = load_model('../Computer-Vision-with-Python/06-Deep-Learning-Computer-Vision/cat_dog_100epochs.h5')

Predicting on an image the model has never seen before:

In [None]:
from keras.preprocessing import image
import numpy as np

dog_file = '../Computer-Vision-with-Python/CATS_DOGS/test/DOG/10005.jpg'

# adjust the image to be the size we trained our model on (150, 150)
# it is actually reading the image into PIL image format
dog_img = image.load_img(dog_file, target_size=(150, 150))

# turn the image into array because the model works on array, not PIL Image type. So we have to convert it.
dog_img = image.img_to_array(dog_img)

dog_img.shape # (150, 150, 3) BUT has to be (1, 150, 150, 3)
# we have to expand the dimensions because the model expects a batch of images
# reshape would also work, but expand_dims easily achieves what we want here: it inserts a new axis at the given position
dog_img = np.expand_dims(dog_img, axis=0)

# normalize the image the same way as during training (rescale=1/255)
dog_img = dog_img / 255

In [None]:
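prediction_prob = new_model.predict(dog_img) # says how sure the model is about its decision (probability of class 1 = dog)
# predict_classes() was removed from recent Keras versions; threshold the sigmoid output instead
prediction = (prediction_prob > 0.5).astype('int32') # predicts the class / label

# Output prediction
print(f'Probability that image is a dog is: {prediction_prob}')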
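
The step from a sigmoid probability to a readable label can be sketched like this. The function name `label_from_probability`, the 0.5 threshold and the example probabilities are assumptions. The mapping is the one shown by `class_indices` earlier.

```python
def label_from_probability(prob, class_indices, threshold=0.5):
    """Map a sigmoid output to a class name using the generator's class_indices."""
    index_to_name = {index: name for name, index in class_indices.items()}
    return index_to_name[int(prob > threshold)]


class_indices = {'CAT': 0, 'DOG': 1}

print(label_from_probability(0.93, class_indices))   # DOG (0.93 is a made-up probability)
print(label_from_probability(0.20, class_indices))   # CAT
```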

How to create a confusion matrix when using an ImageDataGenerator: [**link**](https://datascience.stackexchange.com/questions/46182/keras-confusion-matrix-with-predict-generator).

## <span style="color: yellowgreen;">11. </span>YOLO v3 Object detection.

### <span style="color: royalblue;">a) </span>Introduction

Let’s learn about the state of the art image detection algorithm known as **YOLO (You Only Look Once)**. YOLO can view an image and draw bounding boxes over what it perceives as identified classes.

<div style="display: flex; justify-content: center; align-items: center; text-align: center;"><div style="margin-top: 0.5em; margin-bottom: -0.3em; width: 25%;">
<img src="./src/img/YOLO/YOLO_1.png">
</div></div>

We will be using version 3 of the YOLO Object Detection Algorithm, which further improves upon the original implementation in both speed and accuracy.

&ensp;

So what makes YOLO different from other detection algorithms?
- Prior detection systems repurpose classifiers or localizers to perform detection. They apply the model to an image at multiple locations and scales. High scoring regions of the image are considered detections.
- YOLO uses a totally different approach. We apply a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities.
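
The weighting of boxes by probabilities can be sketched as follows. This is a simplified illustration, not the code used by the YOLOv3 implementation in this section: the box format, the `objectness` and `class_probs` field names, the 0.5 threshold and all numbers are assumptions.

```python
def filter_detections(boxes, threshold=0.5):
    """Score each box as objectness * best class probability and keep confident ones."""
    detections = []
    for b in boxes:
        best_class = max(b['class_probs'], key=b['class_probs'].get)
        score = b['objectness'] * b['class_probs'][best_class]
        if score >= threshold:
            detections.append((best_class, round(score, 3), b['box']))
    return detections


# Made-up network outputs for two regions of an image
boxes = [
    {'box': (10, 20, 110, 220), 'objectness': 0.9, 'class_probs': {'dog': 0.8, 'cat': 0.2}},
    {'box': (0, 0, 50, 50), 'objectness': 0.3, 'class_probs': {'dog': 0.5, 'cat': 0.5}},
]

print(filter_detections(boxes))   # only the first box is confident enough
```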

&ensp;

YOLO has several advantages over classifier-based systems.
- It looks at the whole image at test time so its predictions are informed by global context in the image.
- It also makes predictions with a single network evaluation unlike systems like R-CNN which require thousands for a single image. This makes it extremely fast, more than 1000x faster than R-CNN and 100x faster than Fast R-CNN. 

We will load an already trained YOLO model and see how we can use it with either image or video data.

### <span style="color: royalblue;">b) </span>Python introduction

Let’s now explore how to implement YOLO v3 with Python. We will be using an implementation of YOLO v3 that has been trained on the COCO dataset.

The COCO dataset has over 1.5 million object instances with 80 different object categories. We will use a pre-trained model that has been trained on the COCO dataset and explore its capabilities. Realistically it would take many, many hours of training using a high end GPU to achieve a reasonable model.

Because of this, we will download the weights of the pre-trained network. This network is hugely complex, meaning the actual h5 file for the weights is over 200 MB.

<sup>Once you’ve downloaded the file, you will need to place it in the DATA directory of the YOLO folder.</sup>

Let's see how to use the state of the art in object detection. Since we can't reasonably train the YOLOv3 network ourselves, we will use a pre-established version.

[CODE SOURCE](https://github.com/xiaochus/YOLOv3)

REFERENCE (for original YOLOv3): 

      @article{YOLOv3,
            title={YOLOv3: An Incremental Improvement},
            author={Redmon, Joseph and Farhadi, Ali},
            year={2018}
      }

**YOU MUST PROPERLY SET UP THE MODEL AND WEIGHTS. THIS NOTEBOOK WON'T WORK UNLESS YOU FOLLOW THE EXACT SET UP.**

### <span style="color: royalblue;">c) </span>Python implementation

This was provided as a ready-to-use notebook / configuration, which can be found in "./Assessmests/Solutions Provided/06-YOLOv3/06-YOLO-Object-Detection.ipynb", **BUT** it requires the pretrained model weights in the "/06-YOLOv3/data" directory ("./Assessmests/Solutions Provided/06-YOLOv3/data/yolo.h5"), which are not contained in this repository.