## Keras Tutorial for Beginners - 1

Taking photos is very common nowadays thanks to smartphones and social media, so many people are interested in image recognition. Here, I want to show how to use Keras for image classification in Python. There are several other popular packages, but Keras is an open-source neural network library that is user-friendly and easy to learn. That makes Keras one of the best ways for a beginner to start learning about neural networks.

## Why Neural Networks?

A neural network (NN) is designed to solve complicated problems. It is loosely inspired by the human brain, but it does not really model biological functions. A NN learns a model by adjusting its parameters (weights) so that features computed from the input values predict the desired output, which makes it powerful for real-world, non-linear problems.

A NN is organized in layers. Each layer is computed as a weighted combination of the values in the previous layer (or of the input values), followed by a non-linear function called the activation function. Without the activation function, the whole model would reduce to a combination of linear regressions.
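To see why the activation function matters, here is a minimal sketch in plain Python. The weights and the choice of ReLU as the activation are assumptions made only for illustration; they do not come from this tutorial's model. Two stacked linear layers without an activation collapse into one linear function, while putting a ReLU between them does not.

```python
def linear(x, w, b):
    """One 'layer' with a single input and a single output: w * x + b."""
    return w * x + b


def relu(x):
    """ReLU activation: keeps positive values, replaces negatives with 0."""
    return max(0.0, x)


# Hypothetical weights, chosen only for illustration.
W1, B1 = 2.0, -1.0
W2, B2 = -3.0, 0.5


def two_layers_no_activation(x):
    return linear(linear(x, W1, B1), W2, B2)


def two_layers_with_relu(x):
    return linear(relu(linear(x, W1, B1)), W2, B2)


def collapsed(x):
    # The same function as two_layers_no_activation, written as ONE linear layer.
    return linear(x, W2 * W1, W2 * B1 + B2)


for x in (-2.0, 0.0, 1.0, 3.0):
    print(x, two_layers_no_activation(x), collapsed(x), two_layers_with_relu(x))
```

The second and third columns always agree, so the extra layer adds nothing without an activation. The last column differs for inputs where the first layer's output is negative, which is exactly the non-linearity the activation introduces.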

<img src="./images/nn.jpeg">
(Source: https://becominghuman.ai/artificial-neuron-networks-basics-introduction-to-neural-networks-3082f1dcca8c)

## Why Convolutional Neural Networks?

As mentioned earlier, regular neural networks use vectors for their hidden layers. So what is the challenge if we want to do image recognition? An image is not a vector. It is a two-dimensional matrix, or a three-dimensional array once the RGB color channels are included, and flattening it into a vector discards the spatial arrangement of the pixels.

In particular, the fully connected layers of a regular neural network are less time-efficient and easily lead to overfitting. For example, the images in our sample are a respectable 224 x 224 pixels, which means that every single neuron in the first hidden layer needs 224 x 224 x 3 = 150,528 weights. If we want to use multiple layers with many neurons each (as expected), the number of weights adds up quickly.

Convolutional Neural Networks (CNNs) are especially powerful when we must train on multi-dimensional data, such as images. CNNs consist of locally connected layers, which use far fewer weights than the densely connected layers of a regular neural network. The locally connected layers help reduce overfitting and are a natural fit for image data because they handle two-dimensional patterns directly.
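The difference in weight counts can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the layer sizes (100 hidden units, 16 filters of 3 x 3) are hypothetical values picked for the comparison, and each unit or filter is assumed to have one bias term.

```python
def dense_weights(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return (n_inputs + 1) * n_units


def conv_weights(filter_height, filter_width, in_channels, n_filters):
    """Weights plus one bias per filter in a convolutional layer."""
    return (filter_height * filter_width * in_channels + 1) * n_filters


# Number of input values in one 224 x 224 RGB image.
n_inputs = 224 * 224 * 3
print(n_inputs)  # 150528

# Hypothetical layer sizes, for comparison only.
HIDDEN_UNITS = 100
N_FILTERS = 16

print(dense_weights(n_inputs, HIDDEN_UNITS))  # 15052900
print(conv_weights(3, 3, 3, N_FILTERS))       # 448
```

The convolutional layer stays small because each filter is reused at every position of the image instead of having a separate weight for every pixel.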

<img src="./images/neural_net2.jpeg">
<img src="./images/cnn.jpeg">

(Top) Regular neural network. (Bottom) CNN. (Source: http://cs231n.github.io/convolutional-networks/#overview)

## 1. Data Preprocessing (Rescale images)

We will start by pre-processing the data. Keras requires us to input our images as a 4D array (aka a 4D tensor), with the shape

(n_samples, n_rows, n_columns, n_channels)

where `n_samples` corresponds to the total number of images (or samples), `n_rows` and `n_columns` correspond to the number of pixel rows and columns of each image, and `n_channels` corresponds to the number of color channels of each image, which is 3 for RGB.

I have a function that takes as input the path to a color image file and converts that image into the proper 4D tensor format to feed into our CNN. It converts the pixel values into a NumPy array and resizes each image so that all of them are uniformly 224 x 224 pixels.

<img src="./images/ex1.001.jpeg">

Then, when split into 3 color channels, each image is represented by a 3D array (224, 224, 3).

<img src="./images/ex1.002.jpeg">

Next, an extra dimension is added to the 3D array to allow for multiple images (samples) to be processed. Thus, the images are handled as 4D tensors. The returned tensor for a single image will always have the shape

(1, 224, 224, 3)

I have another function that takes an array of strings, with each string being the path to an image, as input to convert those images into a 4D tensor with the shape

(n_samples, 224, 224, 3)

Finally, I rescale the images by dividing every pixel in every image by 255, which changes the range of each image from 0-255 into 0-1.
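The shape handling described in this section can be sketched with NumPy alone. This is not the function used in this tutorial: it skips file loading and resizing, and it uses random arrays as stand-ins for images that are already 224 x 224 RGB. The helper names `image_to_tensor` and `images_to_tensor` are hypothetical.

```python
import numpy as np


def image_to_tensor(img):
    """Turn one (224, 224, 3) image array into a (1, 224, 224, 3) tensor."""
    return np.expand_dims(img.astype("float32"), axis=0)


def images_to_tensor(imgs):
    """Stack several images into one (n_samples, 224, 224, 3) tensor."""
    return np.vstack([image_to_tensor(img) for img in imgs])


# Random pixel values in 0..255 standing in for four already-resized images.
rng = np.random.default_rng(0)
fake_images = [
    rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8) for _ in range(4)
]

single = image_to_tensor(fake_images[0])
batch = images_to_tensor(fake_images) / 255.0  # rescale from 0-255 to 0-1

print(single.shape)  # (1, 224, 224, 3)
print(batch.shape)   # (4, 224, 224, 3)
print(batch.min(), batch.max())
```

With real files, the loading and resizing step would come first. Keras ships image helpers such as `load_img` and `img_to_array` for that purpose, although the module they live in depends on the Keras version.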

## 2. Design of CNN

A CNN consists of multiple layers, which include convolutional, pooling, and some fully connected layers (also used in regular Neural Networks).

(1) Convolutional Layer

A convolutional layer consists of locally connected nodes, meaning that the nodes are only connected to a small subset of the previous layer's nodes. To build a convolutional layer, we first select a width (columns) and height (rows) that define a convolution filter. The filter is a matrix that has its own characteristic pattern, and the convolutional layer has the task of searching for its filter's pattern in the image. To do this, we simply move the filter horizontally and vertically over the matrix of image pixels, and at each position the convolutional filter returns a numerical result that indicates how strongly its pattern was seen locally. The image below demonstrates how a convolutional layer works. In practice, one uses many filters, spread over several convolutional layers, each searching for its own pattern in the image, in order to identify complex structures.

<img src="./images/ex1.003.jpeg">
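The sliding-filter idea can be written out in plain Python. This is a minimal sketch with no padding and a stride of 1; the 4 x 4 image and the 2 x 2 filter are made-up values, not data from this tutorial. In a real CNN the filter values are learned during training rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply the filter with the patch under it and sum the products.
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]
```

The feature map is largest in the middle column, which is where the filter sits on top of the edge, and zero where the image is uniform.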

(2) Pooling Layer

Recall that a convolutional layer is a stack of feature maps where we have one feature map for each filter. More filters means a larger stack, which means that the dimensionality of our convolutional layers can get quite large. Higher dimensionality means we will need to use more parameters, which can lead to overfitting. Therefore, we need a method to reduce this dimensionality, and pooling layers within a CNN do exactly that.

Generally, there are two popular types of pooling layers. The first type is a max pooling layer. Max pooling layers take a stack of feature maps as input and keep only the maximum value within each small window of pixels in every feature map. The second type is a global average pooling layer. As the name implies, this pooling layer simply stores the average of all values in each feature map, rather than considering smaller windows. Global average pooling is therefore a more extreme type of dimensionality reduction.

<img src="./images/ex1.004.jpeg">
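Both pooling types are easy to sketch in plain Python for a single feature map. The 4 x 4 values below are made up for illustration, and the max pooling window is assumed to be 2 x 2 with a stride of 2, which is a common choice but not something this tutorial prescribes.

```python
def max_pool(feature_map, size=2):
    """Max pooling with a size x size window and a stride equal to the window size."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


def global_average_pool(feature_map):
    """Reduce a whole feature map to the average of all its values."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


# Hypothetical 4 x 4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

print(max_pool(feature_map))             # [[6, 2], [2, 9]]
print(global_average_pool(feature_map))  # 2.875
```

Max pooling shrinks the 4 x 4 map to 2 x 2, while global average pooling shrinks it to a single number.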

### Example of a CNN Architecture

Below I show a basic style of CNN architecture. If you ask me what the best way to design an optimized architecture is, my answer will be "I don't know". This process is a kind of hyperparameter tuning, and we cannot be sure of the best solution ("No Free Lunch"). However, there are a lot of example CNN architectures that people have used. We can start by mimicking what they do and then try to make our own architecture from that! Here is one example I suggest. Note that the result might not be great depending on the data, but it can be a good starting point.

<img src="./images/sample_cnn.png">
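As a rough idea of how such a design looks in Keras code, here is a minimal sketch. It is not necessarily the architecture in the figure: the number of layers, the filter counts (16, 32, 64), the 3 x 3 filter size, and the 10 output classes are all assumptions chosen for illustration. It assumes Keras is available through TensorFlow 2.x.

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 10  # hypothetical number of labels


def build_model(num_classes=NUM_CLASSES):
    """Small CNN: three convolution + max pooling stages, then global average pooling."""
    return keras.Sequential([
        keras.Input(shape=(224, 224, 3)),
        layers.Conv2D(16, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        # One average per feature map, so 64 values go into the classifier.
        layers.GlobalAveragePooling2D(),
        # One probability per class.
        layers.Dense(num_classes, activation="softmax"),
    ])


model = build_model()
model.summary()
```

`model.summary()` prints the output shape and parameter count of every layer, which is a quick way to check that each pooling stage reduces the spatial size as intended.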

## 3. Train a model 

### (1) Loss Function

After designing the CNN model, we need to specify a loss function so that we can quantify how well the model's predictions match the true labels. Since we are constructing a multiclass classifier, we will use categorical cross-entropy loss.

This loss function returns a numerical value that is lower when the model assigns a higher probability to the true label (in this case, the correct dog breed). As with other classification tasks in machine learning, we want to minimize the loss function to train our model to give us the highest accuracy possible. Let's assume that our data has 10 labels. The true labels are one-hot encoded, so each label is a vector with 10 entries. The model outputs a vector with 10 entries, where each entry corresponds to the predicted probability of that dog breed.

<img src="./images/skier_awesome.png">

Illustration of the error (loss) function. The goal is to minimize the error by seeking parameters (weights) that minimize the loss function. (Source: modified from Udacity Machine Learning Nano Degree image)

For example, let's consider an image of a Welsh Springer Spaniel. Our model predicts that there is a Brittany in the image with a 90% probability and a Welsh Springer Spaniel with a 10% probability. The categorical cross-entropy loss checks the true label vector (with only Welsh Springer Spaniel selected) against the prediction vector (which has a 90% chance of Brittany and only a 10% chance of Welsh Springer Spaniel), and returns a high value for the loss. The model then adjusts the weights, and if the prediction changes to favor the true label more, the loss decreases. Eventually, if our model is good enough, we would find at the end of our training that it correctly identifies most dog breeds.
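The numbers in this example can be worked through in plain Python. The sketch below uses the standard definition of categorical cross-entropy, the negative sum of `true * log(predicted)` over all classes, and only two classes to keep it short. The "better" prediction of 80% is a made-up value for comparison.

```python
import math


def categorical_cross_entropy(true_one_hot, predicted_probs):
    """Cross-entropy between a one-hot label and a predicted probability vector."""
    eps = 1e-12  # guards against log(0)
    return -sum(
        t * math.log(max(p, eps))
        for t, p in zip(true_one_hot, predicted_probs)
    )


# Class order: [Brittany, Welsh Springer Spaniel]
true_label = [0, 1]            # the dog really is a Welsh Springer Spaniel
bad_prediction = [0.9, 0.1]    # the prediction from the example above
better_prediction = [0.2, 0.8] # hypothetical prediction after more training

loss_bad = categorical_cross_entropy(true_label, bad_prediction)
loss_better = categorical_cross_entropy(true_label, better_prediction)

print(round(loss_bad, 4))     # 2.3026
print(round(loss_better, 4))  # 0.2231
```

Because the label is one-hot, only the probability given to the true class matters: the loss is `-log(0.1)` for the bad prediction and `-log(0.8)` for the better one.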

### (2) Data Augmentation

### (3) Epoch / Batch Size

During training, the weights are modified to improve the predictions. I chose to train the model for 5 epochs, and I saved the weights that correspond to the highest validation accuracy. This process took around 20 minutes on my laptop. Expect it to take more time if you want to train your model for a larger number of epochs.
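The "save the best weights" rule can be sketched without Keras. The validation accuracies below are made-up numbers for 5 epochs, not results from this tutorial, and the function name `train_with_checkpoint` is hypothetical. The actual weight update and the write to disk are replaced by comments and a list.

```python
def train_with_checkpoint(val_accuracy_per_epoch):
    """Return the best validation accuracy and the epochs at which weights were saved."""
    best_acc = float("-inf")
    saved_epochs = []
    for epoch, val_acc in enumerate(val_accuracy_per_epoch, start=1):
        # A real loop would update the weights on the training data here
        # and then measure val_acc on the validation data.
        if val_acc > best_acc:
            best_acc = val_acc
            saved_epochs.append(epoch)  # stands in for "write weights to disk"
    return best_acc, saved_epochs


# Made-up validation accuracies for 5 epochs, for illustration only.
val_accuracies = [0.21, 0.34, 0.41, 0.39, 0.40]

best_acc, saved_epochs = train_with_checkpoint(val_accuracies)
print(best_acc)      # 0.41
print(saved_epochs)  # [1, 2, 3]
```

Epochs 4 and 5 do not overwrite the saved weights because they do not beat epoch 3. In Keras, the `ModelCheckpoint` callback with `save_best_only=True` applies the same rule to the metric it monitors.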