**prerequisite**
- Neural networks

**Keras**:
- Keras is a high-level API for building neural networks on top of TensorFlow.
- Keras offers several model-building APIs, and which one we use depends on how much flexibility the model needs. In this section we use the **Sequential** and **Functional** APIs.

In [1]:
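import os
# suppress TensorFlow info and warning logs; set this before importing tensorflow
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'

## to set up the GPU (enable memory growth), uncomment the two lines below after tensorflow has been imported as tf
# physical_devices=tf.config.list_physical_devices("GPU")
# tf.config.experimental.set_memory_growth(physical_devices[0],True)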



In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist

In [3]:
(x_train,y_train),(x_test,y_test)=mnist.load_data()
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)


The training set has 60k grayscale images of shape 28*28, and the test set has 10k images. To feed them to dense layers, we flatten each image into a single vector of 28*28 = 784 pixel values. We also normalize the pixel values from the range [0, 255] to [0, 1], which typically helps the model train faster.

In [4]:
x_train=x_train.reshape(-1,28*28).astype("float32")/255.0 
x_test = x_test.reshape(-1, 28 * 28).astype("float32") / 255.0
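
As a quick sanity check of what the reshape and normalization do, here is a minimal sketch on a small synthetic array (random values standing in for MNIST images, so nothing has to be downloaded). The names fake_images and flat are only for illustration.

```python
import numpy as np

# Synthetic stand-in for a batch of MNIST images: 5 images, 28x28, uint8 in [0, 255].
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# -1 lets NumPy infer the batch dimension; each image becomes one row of 784 values.
flat = fake_images.reshape(-1, 28 * 28).astype("float32") / 255.0

print(flat.shape)   # (5, 784)
print(flat.dtype)   # float32
print(flat.min() >= 0.0, flat.max() <= 1.0)
```

Each row of flat is one image read row by row, so no pixel values are lost, only the 2D layout.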

We could also convert the image arrays into tensors explicitly with x_train=tf.convert_to_tensor(x_train), but TensorFlow does this internally. When the data is already in NumPy arrays, the conversion happens automatically, so we don't need to do it ourselves.

## **sequential model**:
Now I will build a neural network using the Sequential API of Keras. It is convenient but not very flexible: it only allows one input to map to one output, passing through a plain sequence of layers. This is a major limitation.

In [5]:
##sequential model
sequential_model=keras.Sequential(
[keras.Input(shape=(28*28,)), #declaring the input shape lets us print the summary before training
layers.Dense(512,activation='relu'),
 layers.Dense(256,activation='relu'),
 layers.Dense(10)]
)

In the code above, I initialized the neural network with the Sequential API. The process is simple: just pass a list of layers. Here there are two hidden layers with 512 and 256 neurons respectively, both using ReLU activations, followed by an output layer with 10 neurons, one per class label. We didn't declare an activation (softmax) in the output layer; it is applied inside the loss function instead.

We include keras.Input so that the input shape is known up front and the model summary can be printed before training.


We can also declare the model another way, adding one layer at a time.

In [6]:
# model=keras.Sequential()
# model.add(keras.Input(shape=(784,)))
# print(model.summary())
# model.add(layers.Dense(512,activation='relu'))
# print(model.summary())
# model.add(layers.Dense(256,activation='relu'))
# print(model.summary())
# model.add(layers.Dense(10))
# print(model.summary())

In [7]:
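# # to get the output of a specific layer (here the last one)
# feature_model=keras.Model(inputs=model.inputs,outputs=[model.layers[-1].output])
# feature=feature_model.predict(x_train)
# print(feature.shape)
# # to get the outputs of all layers, list every layer's output and loop over the results
# all_layers_model=keras.Model(inputs=model.inputs,outputs=[layer.output for layer in model.layers])
# features=all_layers_model.predict(x_train)
# for feature in features:
#     print(feature.shape)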

With this declaration method we can print the model summary after each layer is added. It's an effective debugging tool for larger neural networks.

In [8]:
print(sequential_model.summary())
# import sys
# sys.exit()  

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 512)               401920    
                                                                 
 dense_1 (Dense)             (None, 256)               131328    
                                                                 
 dense_2 (Dense)             (None, 10)                2570      
                                                                 
Total params: 535818 (2.04 MB)
Trainable params: 535818 (2.04 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None
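
The parameter counts in the summary above can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output unit. The helper name dense_params below is only for illustration.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Input size followed by the sizes of the three Dense layers.
layer_sizes = [28 * 28, 512, 256, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

print(per_layer)       # [401920, 131328, 2570]
print(sum(per_layer))  # 535818
```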


We can also see the model summary without putting keras.Input() in the layer list. In that case the model is only built once it has seen data, so we have to print the summary after model.fit.

Next, we tell Keras how to configure the training of our network, for example which loss function and optimizer to use. We do this with model.compile.


In [9]:
sequential_model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=.001),
    metrics=["accuracy"],

)



In the above:
- "Sparse" means the labels are plain integers (the class indices). If we dropped the Sparse variant, we would need one-hot encoded labels.
- We pass from_logits=True because the model has no softmax activation in its output layer. With from_logits=True, the loss applies softmax to the outputs first and then computes the sparse categorical cross-entropy.
- metrics tracks the running accuracy during training.

In conclusion, model.compile specifies the training configuration of the network.
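
To make the "Sparse" point concrete, here is a minimal NumPy sketch (not the Keras implementation) that computes the cross-entropy once from integer labels and once from one-hot labels. The logits and labels are made-up values, and the softmax helper is written here for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability; the result is unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])   # made-up raw outputs for 2 samples, 3 classes
labels = np.array([0, 1])              # sparse integer labels

probs = softmax(logits)

# Sparse form: pick the probability of the true class by integer index.
sparse_loss = -np.log(probs[np.arange(len(labels)), labels]).mean()

# One-hot form: the same labels written as one-hot rows.
one_hot = np.eye(logits.shape[1])[labels]
categorical_loss = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(sparse_loss, categorical_loss)  # the two values agree
```

Both forms give the same number; the sparse form just avoids building the one-hot matrix.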

In [10]:
sequential_model.fit(x_train,y_train,batch_size=32,epochs=5,verbose=2)

Epoch 1/5
1875/1875 - 17s - loss: 0.1827 - accuracy: 0.9444 - 17s/epoch - 9ms/step
Epoch 2/5
1875/1875 - 15s - loss: 0.0795 - accuracy: 0.9748 - 15s/epoch - 8ms/step
Epoch 3/5
1875/1875 - 17s - loss: 0.0532 - accuracy: 0.9833 - 17s/epoch - 9ms/step
Epoch 4/5
1875/1875 - 16s - loss: 0.0415 - accuracy: 0.9871 - 16s/epoch - 9ms/step
Epoch 5/5
1875/1875 - 16s - loss: 0.0324 - accuracy: 0.9895 - 16s/epoch - 9ms/step


<keras.src.callbacks.History at 0x1dfaff65f10>

We use model.fit to specify the concrete details of training: the data, the batch size, and the number of epochs.
- verbose=2 prints one line per epoch.

Now after training we want to evaluate our model.

In [11]:
sequential_model.evaluate(x_test,y_test,batch_size=32,verbose=2)

313/313 - 1s - loss: 0.0869 - accuracy: 0.9785 - 810ms/epoch - 3ms/step


[0.08692192286252975, 0.9785000085830688]

Here we don't need to set the number of epochs, because evaluate does not train: it makes a single pass over the test set and reports the loss and accuracy.
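
The step counts in the logs above (1875 during training and 313 during evaluation) are simply the number of batches needed to cover each dataset once. A short check, assuming the last partial batch is kept:

```python
import math

batch_size = 32
train_steps = math.ceil(60000 / batch_size)  # batches per training epoch
test_steps = math.ceil(10000 / batch_size)   # batches for one evaluation pass

print(train_steps, test_steps)  # 1875 313
```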

In summary, the Sequential API is convenient to use. In situations where it can't be used, we can use the Functional API.

## Functional API
- more flexible
- can handle multiple inputs and outputs
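
As a rough illustration of the multiple inputs and outputs point, here is a minimal sketch of a model that a Sequential stack could not express. The second input ("extra") and the second output head ("parity") are hypothetical and not part of this tutorial's MNIST model; the data is random and only used to check shapes.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Two hypothetical inputs: a flattened image and a small vector of extra features.
image_in = keras.Input(shape=(784,), name="image")
extra_in = keras.Input(shape=(8,), name="extra")

x = layers.Dense(64, activation="relu")(image_in)
merged = layers.concatenate([x, extra_in])

# Two hypothetical output heads sharing the merged features.
digit_out = layers.Dense(10, name="digit")(merged)
parity_out = layers.Dense(1, name="parity")(merged)

multi_io_model = keras.Model(inputs=[image_in, extra_in], outputs=[digit_out, parity_out])

# Forward pass on random data just to check the output shapes.
digit_pred, parity_pred = multi_io_model.predict(
    [np.random.rand(4, 784).astype("float32"), np.random.rand(4, 8).astype("float32")],
    verbose=0,
)
print(digit_pred.shape, parity_pred.shape)  # (4, 10) (4, 1)
```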

In [12]:
# functional model
inputs=keras.Input(shape=(784,))
x=layers.Dense(512,activation='relu',name="first_layer")(inputs)
x=layers.Dense(256,activation='relu',name="second_layer")(x)
outputs=layers.Dense(10,activation='softmax',name="output_layer")(x)

functional_model=keras.Model(inputs=inputs,outputs=outputs)

- keras.Model takes the inputs and outputs we declared and builds the model that connects them.
- Unlike before, we now use softmax in the output layer, so we need to change the configuration of the loss function (from_logits=False).
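
The two loss configurations should agree as long as from_logits matches what the model outputs. A small sketch with made-up logits, comparing the loss on raw logits (from_logits=True) with the loss on softmax probabilities (from_logits=False):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

logits = tf.constant([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])  # made-up raw outputs
labels = tf.constant([0, 1])

# Model without softmax: pass raw logits and let the loss apply softmax.
loss_on_logits = keras.losses.SparseCategoricalCrossentropy(from_logits=True)(labels, logits)

# Model with softmax in the output layer: pass probabilities.
loss_on_probs = keras.losses.SparseCategoricalCrossentropy(from_logits=False)(
    labels, tf.nn.softmax(logits)
)

print(float(loss_on_logits), float(loss_on_probs))  # nearly identical values
```

Mixing the two up (for example softmax outputs with from_logits=True) would apply softmax twice and give a different loss.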

In [13]:
print(functional_model.summary())

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 784)]             0         
                                                                 
 first_layer (Dense)         (None, 512)               401920    
                                                                 
 second_layer (Dense)        (None, 256)               131328    
                                                                 
 output_layer (Dense)        (None, 10)                2570      
                                                                 
Total params: 535818 (2.04 MB)
Trainable params: 535818 (2.04 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None


In [14]:
functional_model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False), #by default false
    optimizer=keras.optimizers.Adam(learning_rate=.001),
    metrics=["accuracy"],
)



In [15]:
# fit the model
functional_model.fit(x_train,y_train,batch_size=32,epochs=5,verbose=2)

Epoch 1/5
1875/1875 - 17s - loss: 0.1853 - accuracy: 0.9440 - 17s/epoch - 9ms/step
Epoch 2/5
1875/1875 - 16s - loss: 0.0791 - accuracy: 0.9752 - 16s/epoch - 8ms/step
Epoch 3/5
1875/1875 - 19s - loss: 0.0560 - accuracy: 0.9820 - 19s/epoch - 10ms/step
Epoch 4/5
1875/1875 - 17s - loss: 0.0413 - accuracy: 0.9869 - 17s/epoch - 9ms/step
Epoch 5/5
1875/1875 - 18s - loss: 0.0326 - accuracy: 0.9892 - 18s/epoch - 9ms/step


<keras.src.callbacks.History at 0x1dfb1751010>

In [16]:
# evaluation
functional_model.evaluate(x_test,y_test,batch_size=32,verbose=2)

313/313 - 1s - loss: 0.0708 - accuracy: 0.9797 - 789ms/epoch - 3ms/step


[0.07081051915884018, 0.9797000288963318]

### SUGGESTIONS FOR **FUNCTIONAL AND SEQUENTIAL API**

- By using different layer sizes (larger ones in particular) and training for longer, it is possible to get over 98.2% on the test set!
- Try optimizers other than Adam, e.g. gradient descent with momentum, Adagrad, and RMSprop.
- Train with and without normalization and compare the results.
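
One possible way to set up the optimizer comparison is sketched below. The learning rates are illustrative guesses rather than tuned values, build_model is a helper written for this sketch, and random data stands in for MNIST so the snippet runs on its own; swap in x_train and y_train (and more epochs) for a real comparison.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    # Same architecture as the sequential model above, rebuilt fresh for each optimizer.
    return keras.Sequential([
        keras.Input(shape=(784,)),
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(10),
    ])

# Learning rates here are illustrative guesses, not tuned values.
optimizers = {
    "sgd_momentum": lambda: keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "adagrad": lambda: keras.optimizers.Adagrad(learning_rate=0.01),
    "rmsprop": lambda: keras.optimizers.RMSprop(learning_rate=0.001),
}

# Random stand-in data so the sketch runs without downloading MNIST.
x_demo = np.random.rand(64, 784).astype("float32")
y_demo = np.random.randint(0, 10, size=(64,))

results = {}
for name, make_optimizer in optimizers.items():
    model = build_model()
    model.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=make_optimizer(),
        metrics=["accuracy"],
    )
    history = model.fit(x_demo, y_demo, batch_size=32, epochs=1, verbose=0)
    results[name] = history.history["loss"][-1]

print(results)
```

Building a fresh model per optimizer keeps the comparison fair, since no run starts from weights another run has already trained.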