# Neural Network Fashion MNIST

Fashion MNIST: https://becominghuman.ai/step-by-step-neural-network-tutorial-for-beginner-cc71a04eedeb
CNN: https://thedatamage.com/convolutional-neural-network-explained/

In [12]:
import pandas as pd

## Train-Validation-Test split

This step splits your data into three portions: training, validation, and test.

The training data is what the network learns from. Generally, more training data helps the network capture your data distribution better, so give this portion the largest share of the split.

Next is the validation data. This portion is held out and evaluated during the training process, and it is used to estimate the prediction error.

Finally, the test data. This is the data used to evaluate the finished neural network model. If the network performs well on the test data, you can consider bringing it to the production level.

If you do not have much data, say thousands or tens of thousands of samples, use 70–10–20 as the split strategy: 70% of the data goes to training, 10% to validation, and 20% to the test set.
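As a rough illustration of the 70–10–20 strategy, here is a minimal sketch that splits sample indices with the standard library only. The helper name `split_indices` and the sample count of 10,000 are assumptions made for this example; they are not part of the notebook, which uses the split that ships with Fashion MNIST.

```python
import random

def split_indices(n_samples, train_frac=0.7, val_frac=0.1, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is repeatable
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever is left, about 20%
    return train, val, test

train_idx, val_idx, test_idx = split_indices(10_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 7000 1000 2000
```

Shuffling before cutting matters when the data is stored in some order, for example sorted by class.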

In [13]:
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

In [14]:
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

For neural network projects in general, these rules are a good starting point.

1. Start simple. Use a single-layer perceptron and evaluate the result. If it is good, then proceed to deployment.

2. If the previous step is not good enough, make your network wider and/or deeper. Add more neurons to your single-layer perceptron, or add one more layer to the existing network. Evaluate and, if it is good, proceed to deployment. If not, iterate by adding more neurons or layers.

3. If the results are still not good after adding several more layers, you may need to change your network architecture. Use a Convolutional Neural Network (CNN) for images or a Recurrent Neural Network (RNN) for time series and text.

## Neural Network building code

## 1. Single-layer perceptron
Let’s start our neural network with a perceptron.

In [15]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= "https://miro.medium.com/max/1920/1*K0D3JOZJUTxX4Q-9CpiALw.jpeg")

Our images are 28x28, and therefore two-dimensional. Because our perceptron can only read one-dimensional data, let’s flatten them.

In [16]:
x_train = x_train.reshape(x_train.shape[0], -1) / 255.0
x_test = x_test.reshape(x_test.shape[0], -1) / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

You will see that the size of each image changed to 784 (28 x 28) because of the flattening. Print x_train.shape and x_test.shape to see your data size. Dividing by 255.0 also rescales the pixel values from the 0 to 255 range into the 0 to 1 range.

Your training data x_train is transformed from 60,000 x 28 x 28 to 60,000 x 784. Your testing data x_test follows suit, from 10,000 x 28 x 28 to 10,000 x 784.
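To see the flattening in isolation, the sketch below applies the same reshape and scaling to a small stand-in array. The array `fake_images` is random data invented for this example, not the real dataset.

```python
import numpy as np

# Stand-in for a few Fashion MNIST images: 5 images of 28x28 pixels, values 0-255.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Keep the first axis (number of images) and merge the rest: 28 * 28 = 784.
flat = fake_images.reshape(fake_images.shape[0], -1) / 255.0

print(fake_images.shape)  # (5, 28, 28)
print(flat.shape)         # (5, 784)
```

The `-1` tells NumPy to work out the remaining dimension itself, so the same line works for any image size.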

For the hidden layer, let’s set an arbitrary number of neurons. The number should be small and simple enough to follow rule number 1. Let’s choose 10 neurons.

For the output layer, because we have ten categories to predict, we need 10 output neurons. For each image, the target value of each of these neurons is 1 if it is the correct category and 0 if not.

For example, if you have a Sandal image, the target for the output layer should look like this: [0 0 0 0 0 1 0 0 0 0]. The index for the Sandal category (5) should be 1, and the others should be 0. Remember, the array is zero-indexed, so the sixth item has index 5.

This target is called a one-hot vector: the "hot" entry has the value 1, and all the others are zeros.
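The notebook relies on to_categorical for this encoding. The sketch below shows roughly what happens underneath, using plain NumPy. The helper `one_hot` is a hypothetical name written for this example and is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer class labels into one-hot rows."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=int)
    # In each row, set the position given by the label to 1.
    encoded[np.arange(labels.size), labels] = 1
    return encoded

sandal = one_hot([5])[0]
print(sandal)                  # [0 0 0 0 0 1 0 0 0 0]
print(int(np.argmax(sandal)))  # 5
```

Taking the argmax of a one-hot vector recovers the original integer label.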

In [17]:
model = Sequential()
model.add(Dense(10, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

The Sequential model is the simplest model Keras has. The first model.add call adds your hidden layer with 10 cells, fed by 784 input cells.

The second model.add call adds your output layer to the network. This has 10 cells, as explained above.
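A quick way to get a feel for the size of this network is to count its trainable parameters by hand. A Dense layer has one weight per input-to-cell connection plus one bias per cell. The helper `dense_params` below is a hypothetical name used only for this arithmetic.

```python
def dense_params(n_inputs, n_units):
    """One weight per input/unit pair, plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 10)  # 784 * 10 + 10
output = dense_params(10, 10)   # 10 * 10 + 10
print(hidden, output, hidden + output)  # 7850 110 7960
```

You can compare these numbers with the Param # column that model.summary() prints.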

The ReLU and softmax activations are non-linear. Being able to model non-linear relationships is what makes neural networks particularly useful: given enough neurons, they can approximate very complex data distributions.
You don’t have to know in detail what ReLU and softmax are to get started. You just need to follow these tips:

1. Use ReLU whenever possible, on every hidden layer.
2. Use softmax on output layers with more than two categories to be predicted.
3. Use sigmoid on an output layer with two categories.
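If you are curious anyway, the three activations named in the tips are short formulas. The NumPy versions below are a minimal sketch for intuition only. Keras uses its own implementations, and the input `scores` is made up for this example.

```python
import numpy as np

def relu(x):
    """Keep positive values, replace negative values with 0."""
    return np.maximum(0.0, x)

def softmax(x):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = x - np.max(x)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid(x):
    """Squash a single score into the range 0 to 1."""
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([-2.0, 0.0, 3.0])
print(relu(scores))     # [0. 0. 3.]
probs = softmax(scores)
print(probs.sum())      # approximately 1
print(sigmoid(0.0))     # 0.5
```

Softmax keeps the ordering of the scores, so the largest score always gets the largest probability.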

After creating your model, call the compile method to finish it. It usually takes three parameters. Use categorical_crossentropy as the loss for multiple categories, and binary_crossentropy for two categories. Use adam or rmsprop as the optimizer, since both work well in most cases. Use accuracy as the metric to check your network's performance.
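To give some intuition for what categorical_crossentropy measures, here is a simplified single-sample version in NumPy. It is a sketch under simplifying assumptions: the real Keras loss also averages over a batch, and its clipping details may differ. The example probabilities are invented.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.sum(y_true * np.log(y_pred)))

target = np.array([0.0, 1.0, 0.0])        # the true class is index 1
confident = np.array([0.05, 0.90, 0.05])  # most probability on the right class
unsure = np.array([0.40, 0.30, 0.30])     # probability spread out

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)
print(loss_confident < loss_unsure)  # True
```

The loss only looks at the probability assigned to the correct class, and it shrinks toward 0 as that probability approaches 1.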

### Train

In [18]:
model.fit(x_train, y_train, epochs=10, validation_split=0.1)

Train on 54000 samples, validate on 6000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x64c62c150>

We use 10% of the training data as the validation data, hence validation_split is set to 0.1.

epochs is the number of passes over the training data.

One epoch exposes all of our training data to the network once. More epochs give the network more chances to learn from the data, although too many can lead to overfitting.
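The numbers in the training log follow from simple arithmetic, sketched below. The batch size of 32 is the Keras default when fit is called without a batch_size argument, as in this notebook. The variable names are chosen for this example.

```python
import math

n_samples = 60_000       # size of x_train
validation_split = 0.1
batch_size = 32          # Keras default for fit()
epochs = 10

n_val = round(n_samples * validation_split)        # held out for validation
n_train = n_samples - n_val                        # actually used for training
steps_per_epoch = math.ceil(n_train / batch_size)  # weight updates per epoch
total_updates = steps_per_epoch * epochs

print(n_train, n_val)                  # 54000 6000
print(steps_per_epoch, total_updates)  # 1688 16880
```

So ten epochs here means the weights are updated 16,880 times, each time from a batch of up to 32 images.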

You get 85% accuracy on the validation data.
Training accuracy is less informative, because the network has already seen that data and usually scores higher on it. What matters is the accuracy on the validation data. Since the network has not been trained on any of the validation data, it shows how well the network can generalize.
Let’s move on to the testing data:


In [19]:
_, test_acc = model.evaluate(x_test, y_test)
print(test_acc)

0.8468


And you will get around 85% accuracy on the test data. Good enough for this simple architecture.
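The accuracy metric itself is easy to reproduce: take the most probable class for each sample and count how often it matches the true class. The sketch below does this on a tiny invented example with three classes. The helper `accuracy` is a hypothetical name, not a Keras function.

```python
import numpy as np

def accuracy(y_true_one_hot, y_prob):
    """Fraction of samples whose most probable class is the true class."""
    predicted = np.argmax(y_prob, axis=1)
    actual = np.argmax(y_true_one_hot, axis=1)
    return float(np.mean(predicted == actual))

y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
y_prob = np.array([[0.8, 0.1, 0.1],   # predicts class 0, correct
                   [0.2, 0.7, 0.1],   # predicts class 1, correct
                   [0.3, 0.4, 0.3],   # predicts class 1, wrong
                   [0.1, 0.6, 0.3]])  # predicts class 1, correct

print(accuracy(y_true, y_prob))  # 0.75
```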

### Make network wider

In [22]:
model2=Sequential()
model2.add(Dense(50,input_dim=784,activation='relu'))
model2.add(Dense(10,activation='softmax'))
model2.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
model2.fit(x_train,y_train,epochs=10,validation_split=0.1)

Train on 54000 samples, validate on 6000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x653133190>

Accuracy is 0.9009, which is pretty good. A bigger network can improve the result.

In [25]:
_,test_acc=model2.evaluate(x_test,y_test)
print(test_acc)

0.877


A slight improvement. Let's go deeper.

### Create a deeper network

In [26]:
model3=Sequential()
model3.add(Dense(50,input_dim=784,activation='relu'))
model3.add(Dense(50,activation='relu'))
model3.add(Dense(10,activation='softmax'))
model3.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
model3.fit(x_train,y_train,epochs=10,validation_split=0.1)

Train on 54000 samples, validate on 6000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x652d869d0>

We added another hidden layer with 50 cells.

Accuracy is about 0.9, which is better than the first model.

In [28]:
_,test_acc=model3.evaluate(x_test,y_test)
print(test_acc)



Accuracy is 0.88, which is not a big improvement. Maybe the perceptron is not the right approach, so let's try another method.

## 2. Convolutional Neural Network

In [29]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= "https://miro.medium.com/max/790/1*eAJeVQry42C4DmArr95NUQ.gif")

A convolutional neural network (CNN) is a neural network that can “see” a subset of our data at a time. It can detect patterns in images better than a perceptron.

In [30]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten
import numpy as np
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train[:,:,:,np.newaxis] / 255.0
x_test = x_test[:,:,:,np.newaxis] / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

Here we import the necessary layers and reshape our training data. This time we don’t flatten it, because a CNN reads our images as they are.

If you check your x_train, you will have 60,000 x 28 x 28 x 1 data.

Why x 1?

The data a CNN reads must be shaped like this: total_data x height x width x channels.
Height and width are self-explanatory. Channels are like Red, Green, or Blue in RGB images. In RGB there are three channels, so the data would be x 3. But because we work with grayscale images, every value on the Red, Green, and Blue channels would be the same, so we reduce them to one channel.
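The extra axis can be tried out on a small stand-in array. In the sketch below, `gray` is dummy data invented for this example. np.expand_dims is shown as an equivalent spelling, and np.repeat illustrates what a three-channel copy of a grayscale image would look like.

```python
import numpy as np

gray = np.zeros((5, 28, 28))  # 5 dummy grayscale images

# Add a trailing channel axis of length 1, as done for x_train and x_test.
with_channel = gray[:, :, :, np.newaxis]
print(with_channel.shape)     # (5, 28, 28, 1)

# The same thing, written with expand_dims.
same = np.expand_dims(gray, axis=-1)
print(same.shape)             # (5, 28, 28, 1)

# An RGB version of a grayscale image just repeats the same values 3 times.
rgb_like = np.repeat(with_channel, 3, axis=-1)
print(rgb_like.shape)         # (5, 28, 28, 3)
```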

Let’s build the architecture:

In [31]:
model4 = Sequential()
model4.add(Conv2D(filters=64, kernel_size=2, padding='same', activation='relu', input_shape=(28,28, 1))) 
model4.add(MaxPooling2D(pool_size=2))
model4.add(Flatten())
model4.add(Dense(10, activation='softmax'))
model4.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

The code is a little bit different. There are Conv2D, MaxPooling2D, and Flatten layers.
These are the three most common layers used in CNNs.

In [32]:
model4.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 28, 28, 64)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 12544)             0         
_________________________________________________________________
dense_9 (Dense)              (None, 10)                125450    
Total params: 125,770
Trainable params: 125,770
Non-trainable params: 0
_________________________________________________________________


Conv2D changes your 28x28x1 image to 28x28x64. It produces 64 filtered versions of the image, one per filter. You can loosely imagine this as 64 hidden layer cells.
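To make the idea of a filter concrete, here is a minimal NumPy sketch of a single 2x2 filter sliding over a small image with zero padding, so that the output keeps the input size. It is an illustration only. The function `conv2d_same` is a hypothetical helper, it pads only the bottom and right edges (an assumption that fits a 2x2 kernel at stride 1), and it ignores the bias and the activation. Conv2D learns 64 such filters instead of using fixed values.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide a small kernel over a 2-D image, keeping the output the same size."""
    kh, kw = kernel.shape
    # Zero-pad the bottom and right so every pixel has a full window.
    padded = np.pad(image, ((0, kh - 1), (0, kw - 1)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kh, j:j + kw]  # the part the filter "sees"
            out[i, j] = np.sum(window * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))  # a fixed example filter that sums each 2x2 window
result = conv2d_same(image, kernel)

print(result.shape)  # (4, 4)
print(result[0, 0])  # 10.0  (0 + 1 + 4 + 5)
```

Each output value depends only on a small window of the input, which is what lets the layer pick up local patterns.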

MaxPooling2D reduces the width and height so that you will not need to compute all the cells. It reduces the size to 14x14x64.
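Max pooling with pool_size=2 keeps the largest value in each 2x2 block. The sketch below shows this on one small feature map. The helper `max_pool_2x2` is a hypothetical name, and it assumes the height and width are even.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    # Split rows and columns into pairs, then take the max inside each pair block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]])
pooled = max_pool_2x2(fm)
print(pooled)
# [[4 2]
#  [2 8]]
```

A 4x4 map becomes 2x2 here, in the same way that each 28x28 map becomes 14x14 in the model.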

Finally, flatten just flattens out the output of MaxPooling into a hidden layer of 12544 cells.
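The shapes and parameter counts in the summary can be checked by hand. The arithmetic below uses only values from the model definition: 64 filters, a 2x2 kernel, 1 input channel, a pool size of 2, and 10 output cells.

```python
filters = 64
kernel_size = 2
in_channels = 1

# Each filter has kernel_size * kernel_size * in_channels weights plus 1 bias.
conv_params = (kernel_size * kernel_size * in_channels + 1) * filters

pooled_side = 28 // 2                              # 28x28 becomes 14x14
flat_size = pooled_side * pooled_side * filters    # cells after Flatten
dense_params = flat_size * 10 + 10                 # weights plus biases

print(conv_params)                 # 320
print(flat_size)                   # 12544
print(dense_params)                # 125450
print(conv_params + dense_params)  # 125770
```

Almost all of the parameters sit in the final Dense layer, not in the convolution itself.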

In [33]:
model4.fit(x_train, y_train, epochs=10, validation_split=0.1)

Train on 54000 samples, validate on 6000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x652dd2850>

In [34]:
_,test_acc=model4.evaluate(x_test,y_test)
print(test_acc)



Accuracy is more than 90%, and the same holds on the test data. A single convolutional layer can do better than the perceptron.