
In [34]:
import os

# Hide TensorFlow's INFO and WARNING log messages; set this before importing tensorflow.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist


# Load Data

In [35]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()



# Reshape / Normalize

In [36]:
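# Normalized variant: flatten each 28x28 image to 784 values and scale pixels to [0, 1]
# x_train = x_train.reshape(-1, 28*28).astype("float32") / 255.0
# x_test = x_test.reshape(-1, 28*28).astype("float32") / 255.0

# Unnormalized variant (active): flatten only, keep raw pixel values in [0, 255]
x_train = x_train.reshape(-1, 28*28).astype("float32")
x_test = x_test.reshape(-1, 28*28).astype("float32")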
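
The two variants above differ only in the division by 255. A minimal sketch on a synthetic batch (random `uint8` images standing in for MNIST, so nothing is downloaded; `fake_images` is a hypothetical name) shows what each variant produces:

```python
import numpy as np

# Synthetic stand-in for MNIST: 4 grayscale images, 28x28, uint8 pixels in [0, 255].
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_images[0, 0, 0] = 255  # make sure the maximum pixel value is present

# Flatten each image to a 784-element float vector, as the cell above does.
raw = fake_images.reshape(-1, 28 * 28).astype("float32")

# Normalized variant: scale pixel values into [0, 1].
scaled = raw / 255.0

print(raw.shape, raw.max())
print(scaled.min(), scaled.max())
```

Keeping inputs in a small, fixed range is a common default for dense networks; the results under suggestion 3 compare both variants on this model.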


# Sequential API (Very convenient, not very flexible)

In [37]:
# model = keras.Sequential(
#     [
#         keras.Input(shape=(28*28,)),  # shape must be a tuple, hence the trailing comma
#         layers.Dense(512, activation='relu'),
#         layers.Dense(256, activation='relu'),
#         layers.Dense(10),  # outputs logits: pair with a loss that uses from_logits=True
#     ]
# )


In [38]:
# model = keras.Sequential()
# model.add(keras.Input(shape=(784,)))
# model.add(layers.Dense(512, activation='relu'))
# model.add(layers.Dense(256, activation='relu', name='my_layer'))
# model.add(layers.Dense(10))  # outputs logits: pair with a loss that uses from_logits=True


In [39]:
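# Feature extraction: rebuild the Sequential model so it returns intermediate activations
# model = keras.Model(inputs=model.inputs,
#                     outputs=[layer.output for layer in model.layers])  # every layer's output
#                     # outputs=[model.get_layer('my_layer').output])    # one layer, by name
#                     # outputs=[model.layers[-2].output])               # one layer, by index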


# Functional API (A bit more flexible)


In [40]:
inputs = keras.Input(shape=(784,))  # shape must be a tuple, hence the trailing comma
x = layers.Dense(512, activation='relu', name='first_layer')(inputs)
x = layers.Dense(256, activation='relu', name='second_layer')(x)
x = layers.Dense(128, activation='relu', name='third_layer')(x)
# softmax outputs probabilities: pair with a loss that uses from_logits=False
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
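
The `from_logits` comments in the cells above mark two equivalent setups: either the final layer emits raw scores (logits) and the loss applies softmax internally, or the layer applies softmax and the loss receives probabilities. A plain-Python sketch of the underlying arithmetic for a single example (the helper names and the scores are illustrative, not Keras functions or real model outputs):

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_from_probs(probs, label):
    """Sparse categorical cross-entropy given probabilities (from_logits=False)."""
    return -math.log(probs[label])

def ce_from_logits(logits, label):
    """Same loss computed directly from logits via log-sum-exp (from_logits=True)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

logits = [2.0, 0.5, -1.0]  # hypothetical scores for a 3-class problem
label = 0                  # integer class index, as in the "sparse" loss

loss_a = ce_from_probs(softmax(logits), label)
loss_b = ce_from_logits(logits, label)
print(loss_a, loss_b)  # the two values agree up to floating-point error
```

The setting only has to match the model: a softmax output layer with `from_logits=True`, or a logits layer with `from_logits=False`, would apply softmax twice or not at all.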


In [41]:
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    # optimizer=keras.optimizers.SGD(learning_rate=0.001),  # Try SGD
    # optimizer=keras.optimizers.Adagrad(learning_rate=0.001),  # Try Adagrad
    # optimizer=keras.optimizers.RMSprop(learning_rate=0.001),  # Try RMSprop
    metrics=["accuracy"],
)


In [42]:
with tf.device('/device:GPU:0'):
    model.fit(x_train, y_train, batch_size=64, epochs=15, verbose=2)
    model.evaluate(x_test, y_test, batch_size=64, verbose=2)


Epoch 1/15
938/938 - 2s - loss: 0.8509 - accuracy: 0.9037
Epoch 2/15
938/938 - 1s - loss: 0.1675 - accuracy: 0.9543
Epoch 3/15
938/938 - 1s - loss: 0.1315 - accuracy: 0.9623
Epoch 4/15
938/938 - 1s - loss: 0.1083 - accuracy: 0.9686
Epoch 5/15
938/938 - 1s - loss: 0.0954 - accuracy: 0.9728
Epoch 6/15
938/938 - 1s - loss: 0.0860 - accuracy: 0.9758
Epoch 7/15
938/938 - 1s - loss: 0.0832 - accuracy: 0.9772
Epoch 8/15
938/938 - 1s - loss: 0.0726 - accuracy: 0.9796
Epoch 9/15
938/938 - 1s - loss: 0.0661 - accuracy: 0.9804
Epoch 10/15
938/938 - 1s - loss: 0.0581 - accuracy: 0.9835
Epoch 11/15
938/938 - 1s - loss: 0.0543 - accuracy: 0.9847
Epoch 12/15
938/938 - 1s - loss: 0.0489 - accuracy: 0.9872
Epoch 13/15
938/938 - 1s - loss: 0.0450 - accuracy: 0.9873
Epoch 14/15
938/938 - 1s - loss: 0.0415 - accuracy: 0.9887
Epoch 15/15
938/938 - 1s - loss: 0.0414 - accuracy: 0.9890
157/157 - 0s - loss: 0.1223 - accuracy: 0.9746
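
The step counts in the log (938 and 157) follow from the dataset sizes and the batch size: MNIST has 60,000 training and 10,000 test images, and the last, partial batch counts as a step. A quick check of that arithmetic:

```python
import math

batch_size = 64
train_examples = 60_000  # size of the MNIST training split
test_examples = 10_000   # size of the MNIST test split

# The final, smaller batch still counts as a step, hence the ceiling.
train_steps = math.ceil(train_examples / batch_size)
test_steps = math.ceil(test_examples / batch_size)

print(train_steps, test_steps)  # 938 157
```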


## **SUGGESTIONS:**


1. **See what accuracy you can reach by making the model larger, training for longer, and so on.  
You should be able to get over 98.2% on the test set!**
> Baseline: 0.9779  
> One extra layer, more epochs (5 -> 10): 0.9813  
> Larger batch size (32 -> 64), more epochs (10 -> 15): 0.9828

2. **Try optimizers other than Adam, 
for example gradient descent with momentum, Adagrad, and RMSprop.**
> SGD in the second model: 0.9225  
> Adagrad in the second model: 0.9401  
> RMSprop in the second model: 0.9799

3. **Is there any difference if you remove the normalization of the data?**
> Normalized: 0.9828  
> Not normalized: 0.9809
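
For suggestion 2, Keras exposes momentum through the `momentum` argument of `keras.optimizers.SGD`. The sketch below is not that implementation; it is a plain-Python illustration of the classical momentum update on a toy quadratic, with hypothetical function names and hyperparameters, showing why momentum can help when curvature differs strongly between directions:

```python
def grad(w):
    """Gradient of f(w) = 0.5 * (w[0]**2 + 100 * w[1]**2)."""
    return [w[0], 100.0 * w[1]]

def loss(w):
    return 0.5 * (w[0] ** 2 + 100.0 * w[1] ** 2)

def run_sgd(momentum, lr=0.01, steps=100):
    """Run gradient descent from w = [1, 1] and return the final loss."""
    w = [1.0, 1.0]
    velocity = [0.0, 0.0]
    for _ in range(steps):
        g = grad(w)
        # Classical momentum: the velocity is a decaying sum of past updates.
        velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
        w = [wi + v for wi, v in zip(w, velocity)]
    return loss(w)

plain = run_sgd(momentum=0.0)          # ordinary gradient descent
with_momentum = run_sgd(momentum=0.9)  # same learning rate, with momentum
print(plain, with_momentum)
```

On this toy problem the momentum run ends at a lower loss after the same number of steps. Whether that carries over to the MNIST model depends on the learning rate and momentum value chosen.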