<a href="https://colab.research.google.com/github/kjmobile/lb/blob/main/14_Neural_Network_2_Q.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Network 2: Deep Learning

In [34]:
# Import library and set seed
import tensorflow as tf

tf.keras.utils.set_random_seed(42)
tf.config.experimental.enable_op_determinism()

## Two layers

In [35]:
from tensorflow import keras

(train_input, train_target), (test_input, test_target) = keras.datasets.fashion_mnist.load_data()

### Preprocessing: normalization and splitting off a validation set

In [36]:
from sklearn.model_selection import train_test_split

train_scaled = train_input / 255.0
train_scaled = train_scaled.reshape(-1, 28*28)  # This reshape() can be replaced with a keras.layers.Flatten layer, as shown below

train_scaled, val_scaled, train_target, val_target = train_test_split(
    train_scaled, train_target, test_size=0.2, random_state=42)

## Design a Deep Neural Network

In [37]:
dense1 = keras.layers.Dense(100, activation='sigmoid', input_shape=(784,))
dense2 = keras.layers.Dense(10, activation='softmax')

In [38]:
model = keras.Sequential([dense1, dense2])

In [50]:
model.summary()

# The model has two dense layers: one hidden layer and one output layer. You may experiment by adding more hidden layers.
# The number of units (also called 'nodes' or 'neurons') in the hidden layer is set to 100.
# A rule of thumb is that a hidden layer should have more units than the output layer.
# The output layer has 10 units with softmax activation because this is a 10-class classification task.
# Param # 78500 = 784 (inputs) * 100 (units) + 100 (bias terms)
# Then why 1010? (Apply the same formula to the output layer.)
# Why does the output shape show None for the sample count? The batch dimension is left flexible so the model accepts any number of samples at once; fit() feeds it mini-batches of 32 samples by default.

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_14 (Dense)            (None, 100)               78500     
                                                                 
 dense_15 (Dense)            (None, 10)                1010      
                                                                 
Total params: 79510 (310.59 KB)
Trainable params: 79510 (310.59 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
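
The parameter counts in the summary above can be reproduced by hand. The helper below is a small illustrative sketch (`dense_param_count` is a hypothetical name, not a Keras function): a dense layer has one weight per input-unit pair plus one bias per unit.

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias term per unit."""
    return n_inputs * n_units + n_units


# Hidden layer: 28*28 = 784 pixel inputs feeding 100 units
hidden_params = dense_param_count(784, 100)
# Output layer: 100 hidden activations feeding 10 class units
output_params = dense_param_count(100, 10)
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```

The three values should match the `Param #` column and the `Total params` line of `model.summary()`.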


## Cf. alternative syntax equivalent to the above

In [44]:
model_1 = keras.Sequential([
    keras.layers.Dense(100, activation='sigmoid', input_shape=(784,), name='hidden'),
    keras.layers.Dense(10, activation='softmax', name='output')
], name='Fashion MNIST Model')

model_1.summary()

Model: "Fashion MNIST Model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 hidden (Dense)              (None, 100)               78500     
                                                                 
 output (Dense)              (None, 10)                1010      
                                                                 
Total params: 79510 (310.59 KB)
Trainable params: 79510 (310.59 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [45]:
model = keras.Sequential()
model.add(keras.layers.Dense(100, activation='sigmoid', input_shape=(784,)))
model.add(keras.layers.Dense(10, activation='softmax'))


In [46]:
model.summary()

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_14 (Dense)            (None, 100)               78500     
                                                                 
 dense_15 (Dense)            (None, 10)                1010      
                                                                 
Total params: 79510 (310.59 KB)
Trainable params: 79510 (310.59 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [47]:
import numpy as np
np.unique(train_target, return_counts=True)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8),
 array([4798, 4781, 4795, 4816, 4798, 4789, 4782, 4841, 4803, 4797]))

In [48]:
model.compile(loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_scaled, train_target, epochs=10)
# The loss is 'sparse_categorical_crossentropy' because the targets are integer class labels rather than one-hot vectors.
# i.e., Keras uses each integer label directly to pick out the predicted probability of the true class, so no one-hot encoding is needed.

# How many backpropagation passes (weight updates) occur with this setting?

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7dfe5f170be0>
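
One way to approach the question about the number of backpropagation passes is to count mini-batches. The sketch below assumes the default `batch_size=32` of `fit()` and the 48,000 training samples left after the 20% validation split (the class counts printed earlier sum to this number); each mini-batch triggers one forward pass, one backward pass, and one weight update.

```python
import math

n_train = 48_000   # 60,000 images minus the 20% validation split
batch_size = 32    # assumed: the default batch_size of fit()
epochs = 10

# One weight update (one backpropagation pass) per mini-batch
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)
```

Passing a different `batch_size` to `fit()` changes `steps_per_epoch` accordingly.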

In [13]:
model.evaluate(val_scaled, val_target)



[0.33367130160331726, 0.8821666836738586]
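
The first value returned by `evaluate()` is the loss. To illustrate why integer targets need no one-hot encoding, here is a minimal plain-Python sketch for a single sample; the two function names mirror the Keras loss names but are hypothetical re-implementations, and the probabilities are made-up illustration values.

```python
import math
from typing import Sequence


def sparse_categorical_crossentropy(label: int, probs: Sequence[float]) -> float:
    """Integer label: pick the predicted probability of the true class directly."""
    return -math.log(probs[label])


def categorical_crossentropy(one_hot: Sequence[float], probs: Sequence[float]) -> float:
    """One-hot label: the sum keeps only the term of the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


probs = [0.1, 0.7, 0.2]     # hypothetical softmax output for 3 classes
label = 1                   # integer target
one_hot = [0.0, 1.0, 0.0]   # the same target, one-hot encoded

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot, probs)
print(sparse_loss, dense_loss)
```

Both functions give the same loss; the sparse form simply skips building the one-hot vector.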

## Using ReLU instead of sigmoid as the activation function

In [81]:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28, 28)))
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))

# ReLU is often preferred over sigmoid for hidden layers in image classification models: it is cheaper to compute,
# and it mitigates the vanishing gradient problem because its gradient is 1 for every positive input,
# whereas the sigmoid's gradient shrinks toward 0 as the input moves away from 0.

# Here we use Keras's Flatten layer to reshape the input instead of calling train_scaled.reshape(-1, 28*28) as above.
# Because the Flatten layer only reshapes the input and has no trainable weights, the model is still a neural network of depth 2, not 3.


In [52]:
model.summary()

Model: "sequential_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_2 (Flatten)         (None, 784)               0         
                                                                 
 dense_16 (Dense)            (None, 100)               78500     
                                                                 
 dense_17 (Dense)            (None, 10)                1010      
                                                                 
Total params: 79510 (310.59 KB)
Trainable params: 79510 (310.59 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
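
The `Flatten` row in the summary has 0 parameters because it only rearranges each 28x28 image into a 784-element vector. The NumPy sketch below shows the equivalent reshape on a stand-in batch of two fake images (the array contents are arbitrary illustration values), assuming the usual row-major layout.

```python
import numpy as np

# Stand-in for a batch of 2 grayscale images of size 28x28
batch = np.arange(2 * 28 * 28, dtype=np.float32).reshape(2, 28, 28)

# Same operation as train_scaled.reshape(-1, 28*28): one row per image
flat = batch.reshape(-1, 28 * 28)

print(batch.shape, flat.shape)

# Row-major order: pixel (row r, column c) lands at index r * 28 + c
r, c = 3, 5
print(flat[0, r * 28 + c] == batch[0, r, c])
```

No values change and nothing is learned, which is why the layer does not add to the depth of the network.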


In [53]:
(train_input, train_target), (test_input, test_target) = keras.datasets.fashion_mnist.load_data()

train_scaled = train_input / 255.0

train_scaled, val_scaled, train_target, val_target = train_test_split(
    train_scaled, train_target, test_size=0.2, random_state=42)

In [54]:
model.compile(loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(train_scaled, train_target, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7dfe4b1793f0>

In [18]:
model.evaluate(val_scaled, val_target)



[0.3884997069835663, 0.8804166913032532]
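
The vanishing-gradient argument for ReLU can be checked numerically. The sketch below (helper names are hypothetical) compares the derivative of the sigmoid with that of ReLU at a few inputs: the sigmoid's derivative peaks at 0.25 and shrinks quickly, while ReLU's stays at 1 for any positive input.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

In this two-layer network the validation accuracies of the sigmoid and ReLU runs above are nearly the same; the gradient behavior tends to matter more as networks get deeper, since these factors multiply across layers during backpropagation.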

## Optimizers: see the slides to compare them

In [71]:
model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# optimizer='sgd' is shorthand for the explicit form below; the two are exactly equivalent.

In [72]:
sgd = keras.optimizers.SGD()
model.compile(optimizer=sgd, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

In [74]:
sgd = keras.optimizers.SGD(learning_rate=0.1)  # To change the default learning rate (0.01), explicitly instantiate an SGD object and pass the relevant argument.

In [80]:
sgd = keras.optimizers.SGD(momentum=0.9, nesterov=True)
# Setting momentum > 0 turns plain SGD into momentum optimization; nesterov=True then switches it to Nesterov momentum.
# In many cases, Nesterov momentum performs better than plain SGD.
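
To make the three SGD variants concrete, here is a minimal pure-Python sketch on a toy one-dimensional loss f(w) = 0.5 * w**2. It assumes the update rule given in the Keras `SGD` documentation (a velocity term, with an extra look-ahead step when `nesterov=True`); `run_sgd` and `grad` are hypothetical helpers, not Keras functions.

```python
def grad(w: float) -> float:
    """Gradient of the toy loss f(w) = 0.5 * w**2."""
    return w


def run_sgd(momentum: float = 0.0, nesterov: bool = False,
            learning_rate: float = 0.1, steps: int = 200, w: float = 1.0) -> float:
    velocity = 0.0
    for _ in range(steps):
        g = grad(w)
        # With momentum=0 this reduces to plain SGD: velocity = -learning_rate * g
        velocity = momentum * velocity - learning_rate * g
        if nesterov:
            # Nesterov: apply the momentum step once more as a look-ahead
            w = w + momentum * velocity - learning_rate * g
        else:
            w = w + velocity
    return w


w_plain = run_sgd()
w_momentum = run_sgd(momentum=0.9)
w_nesterov = run_sgd(momentum=0.9, nesterov=True)
print(w_plain, w_momentum, w_nesterov)
```

All three variants approach the minimum at w = 0 on this toy problem; how they compare on a real network depends on the loss surface and the learning rate.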

In [77]:
adagrad = keras.optimizers.Adagrad()
model.compile(optimizer=adagrad, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

In [78]:
rmsprop = keras.optimizers.RMSprop()
model.compile(optimizer=rmsprop, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
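
Adagrad and RMSprop both adapt the step size per parameter. The sketch below uses the textbook forms of the two update rules for a single scalar weight; the function names, the learning rate of 0.1, and the `eps` value are assumptions for illustration, and the Keras implementations and defaults may differ in detail.

```python
import math


def adagrad_step(w, g, accum, learning_rate=0.1, eps=1e-7):
    """Adagrad: accumulate all past squared gradients, so steps keep shrinking."""
    accum = accum + g * g
    w = w - learning_rate * g / (math.sqrt(accum) + eps)
    return w, accum


def rmsprop_step(w, g, avg_sq, learning_rate=0.1, rho=0.9, eps=1e-7):
    """RMSprop: exponentially decaying average of squared gradients instead of a sum."""
    avg_sq = rho * avg_sq + (1 - rho) * g * g
    w = w - learning_rate * g / (math.sqrt(avg_sq) + eps)
    return w, avg_sq


# Two Adagrad steps with the same gradient: the second step is smaller
w1, acc1 = adagrad_step(1.0, 1.0, 0.0)
w2, acc2 = adagrad_step(w1, 1.0, acc1)

# One RMSprop step from the same starting point
w_rms, avg1 = rmsprop_step(1.0, 1.0, 0.0)
print(w1, w2, w_rms)
```

Because Adagrad's accumulator only grows, its effective learning rate decays over training; RMSprop's moving average lets the step size recover when recent gradients become small.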

In [61]:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28, 28)))
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))

In [64]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(train_scaled, train_target, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7dfe5ef4b9a0>

In [65]:
model.evaluate(val_scaled, val_target)



[0.33631807565689087, 0.8829166889190674]

#### In this model, which hyperparameters must a human researcher determine?
- Number of hidden layers
- Number of neurons (units) in each hidden layer
- Choice of activation function
- Mini-batch size (fit() uses batch_size=32 by default)
- Choice of optimizer and its learning rate
- Number of epochs