# **Implementing Feedforward neural networks with Keras and TensorFlow**
## **a. Import the necessary packages**
## **b. Load the training and testing data (MNIST/CIFAR10)**
## **c. Define the network architecture using Keras**
## **d. Train the model using SGD**
## **e. Evaluate the network**
## **f. Plot the training loss and accuracy**

## **a. Importing Libraries**

In [None]:
import numpy as np  # needed for np.argmax in the prediction cell
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import random

## **b. Loading MNIST Dataset**

In [None]:
mnist = tf.keras.datasets.mnist

In [None]:
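(x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = mnist.load_data()
x_train_mnist, x_test_mnist = x_train_mnist / 255.0, x_test_mnist / 255.0  # scale pixels from 0-255 to 0-1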

In [None]:
x_train_mnist.shape

(60000, 28, 28)

In [None]:
y_train_mnist.shape

(60000,)

## **c. Define the network architecture using Keras**

In [None]:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 128)               100480    
                                                                 
 dense_1 (Dense)             (None, 10)                1290      
                                                                 
Total params: 101770 (397.54 KB)
Trainable params: 101770 (397.54 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
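
The parameter counts in the summary can be checked by hand: a Dense layer with `n_in` inputs and `n_out` units holds `n_in * n_out` weights plus `n_out` biases. A minimal sketch of that arithmetic (the helper name `dense_params` is hypothetical, not a Keras function):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

flatten_out = 28 * 28                     # Flatten has no parameters; it only reshapes
hidden = dense_params(flatten_out, 128)   # first Dense layer
output = dense_params(128, 10)            # output Dense layer
total = hidden + output
print(flatten_out, hidden, output, total)
```

Assuming each parameter is stored as a 4-byte float32, 101770 parameters take 407080 bytes, which matches the 397.54 KB reported in the summary.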


## **d. Train the model using SGD**

In [None]:
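# The model must be compiled before fit; SGD is the optimizer named in this section
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
history = model.fit(x_train_mnist, y_train_mnist, epochs=10, validation_data=(x_test_mnist, y_test_mnist))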

## **e. Evaluate the model**

In [None]:
test_loss, test_accuracy = model.evaluate(x_test_mnist, y_test_mnist)

print(f'Test Loss: {test_loss:.4f}')
print(f'Test Accuracy: {test_accuracy:.4f}')


In [None]:
n = 20
plt.imshow(x_test_mnist[n])
predicted_value = model.predict(x_test_mnist)
print("Predicted Number: ", np.argmax(predicted_value[n]))
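
`model.predict` returns one row of class probabilities per image, and the predicted digit is the index of the largest entry in that row. As a rough sketch of how such predictions turn into an accuracy figure, the example below uses small made-up arrays (`probs` and `labels` are hypothetical stand-ins for the model output and `y_test_mnist`):

```python
import numpy as np

# Hypothetical probabilities for 4 images over 3 classes (each row sums to 1)
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
labels = np.array([0, 1, 2, 1])  # hypothetical true classes

predicted = np.argmax(probs, axis=1)     # most likely class per row
accuracy = np.mean(predicted == labels)  # fraction of correct predictions
print(predicted, accuracy)
```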

## **f. Plot the training loss and accuracy**

In [None]:
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(['Train', 'Validation'], loc='upper right')

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Validation'], loc='lower right')

plt.tight_layout()
plt.show()

In [None]:
# This notebook relies on the following libraries:
# numpy for numerical operations (np.argmax in the prediction cell).
# matplotlib for data visualization.
# tensorflow for building and training deep learning models.
# pandas (data manipulation) and seaborn (statistical plots) are common companions but are not needed here.
# Sequential, Dense, and Flatten are the components of TensorFlow's Keras API used to build the network; Dropout is a related layer described below.

# Sequential is a linear stack of layers in a neural network model. It is a simple and straightforward way to build a neural network model, layer by layer, where each layer has exactly one input tensor and one output tensor.

# Dense is a fully connected layer in a neural network. In a dense layer, each neuron (or node) is connected to every neuron in the previous layer, and each connection has a weight associated with it. The output from the dense layer is calculated by applying an activation function to the weighted sum of the inputs.

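# Dropout is a regularization technique used to prevent overfitting in neural networks. During training, a fraction of randomly selected neurons are "dropped out," meaning their outputs are set to zero for that training step. At inference time all neurons are kept.

# Flatten is a layer used to convert multidimensional data into a one-dimensional array. In the context of image data, it turns each 28x28 image into a vector of 784 values before the first Dense layer.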
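
# The three layer types described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not how Keras implements them: the random images, the weight scale of 0.05, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of 2 grayscale images, 28x28 each
images = rng.random((2, 28, 28))

# Flatten: (2, 28, 28) -> (2, 784)
flat = images.reshape(len(images), -1)

# Dense(128, relu): weighted sum plus bias, then the activation
W = rng.normal(scale=0.05, size=(784, 128))
b = np.zeros(128)
hidden = np.maximum(0.0, flat @ W + b)

# Dropout(0.2) at training time: zero about 20% of activations and
# rescale the rest by 1 / (1 - rate) so the expected sum is unchanged
rate = 0.2
keep_mask = rng.random(hidden.shape) >= rate
dropped = hidden * keep_mask / (1.0 - rate)

print(flat.shape, hidden.shape, dropped.shape)
```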

In [None]:
# This line binds the name mnist to the MNIST dataset module in TensorFlow's Keras datasets. MNIST is a dataset of 28x28 grayscale images of handwritten digits (0 through 9).

In [None]:
# This line loads the MNIST dataset into training and testing sets, with images (x_train_mnist and x_test_mnist) and corresponding labels (y_train_mnist and y_test_mnist).

In [None]:
# These lines define a simple neural network model using Keras:
# Sequential([...]) builds the model from an ordered list of layers.
# Flatten(input_shape=(28, 28)) flattens the 28x28 input images into a 1D array of 784 values.
# Dense(128, activation='relu') adds a fully connected layer with 128 neurons and ReLU activation.
# Dropout(0.2) is an optional layer that would zero 20% of the hidden activations during training to reduce overfitting; the model summarized in section c does not include it.
# Dense(10, activation='softmax') adds the output layer with 10 neurons (for digits 0-9) using softmax activation.
# compile configures the model for training by setting the optimizer (SGD in section d; Adam is a common alternative), the sparse categorical crossentropy loss, and accuracy as the metric.
# summary() prints a summary of the model architecture.


In [None]:
# This line trains the model on the MNIST training data for 10 epochs and, after each epoch, evaluates it on the test set passed as validation_data. The per-epoch losses and metrics are stored in history.history.
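
# An epoch is one full pass over the training set. Keras fit uses a batch size of 32 when none is given, so 60000 images give 60000 / 32 = 1875 update steps per epoch. The loop below is a minimal sketch of mini-batch SGD on a toy linear problem; the data, learning rate, and variable names are hypothetical, and it does not reproduce Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data following y = 3x + 1 (hypothetical stand-in for a real dataset)
x = rng.uniform(-1, 1, size=256)
y = 3.0 * x + 1.0

w, b = 0.0, 0.0
learning_rate, batch_size, epochs = 0.1, 32, 50
steps_per_epoch = len(x) // batch_size

for epoch in range(epochs):
    order = rng.permutation(len(x))  # reshuffle every epoch
    for step in range(steps_per_epoch):
        idx = order[step * batch_size:(step + 1) * batch_size]
        error = (w * x[idx] + b) - y[idx]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 * np.mean(error * x[idx])
        grad_b = 2.0 * np.mean(error)
        # SGD update: step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(steps_per_epoch, round(w, 2), round(b, 2))
```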

In [None]:
# ReLU, or Rectified Linear Unit, is a widely used activation function in neural networks. It introduces non-linearity by outputting the input for positive values and zero for negative values.

# Softmax is an activation function often used in the output layer of a neural network for multi-class classification problems. It transforms raw output scores into probabilities, ensuring that the sum of the probabilities for all classes is equal to one. Softmax is especially useful when the network needs to make mutually exclusive predictions, assigning high probabilities to the most likely class.
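
# Both activations are short enough to write out directly. The sketch below uses NumPy with hypothetical input scores; subtracting the maximum inside softmax is a common numerical-stability step and does not change the result.

```python
import numpy as np

def relu(z):
    """Pass positive values through and replace negatives with zero."""
    return np.maximum(0.0, z)

def softmax(z):
    """Turn raw scores into probabilities that sum to one."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

scores = np.array([-2.0, 0.0, 3.0])  # hypothetical raw outputs
activated = relu(scores)
probs = softmax(scores)
print(activated, probs, probs.sum())
```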

In [None]:
# Adam, short for Adaptive Moment Estimation, is an optimization algorithm commonly used in training deep neural networks. It combines the benefits of both momentum and RMSprop methods, maintaining moving averages of the gradients and squared gradients. Adam adapts the learning rates for each parameter individually, providing efficient and effective convergence by adjusting the step size based on the historical information of gradients.
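
# The update described above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the Keras optimizer: the function name adam_minimize, the toy objective, and the learning rate of 0.1 are hypothetical, and the beta and epsilon values are the commonly quoted defaults.

```python
import numpy as np

def adam_minimize(grad_fn, theta, steps=500, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    m = np.zeros_like(theta)  # moving average of gradients
    v = np.zeros_like(theta)  # moving average of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        # Per-parameter step size: larger past gradients give smaller steps
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Hypothetical objective f(theta) = sum((theta - 5)^2), with gradient 2 * (theta - 5)
theta = adam_minimize(lambda th: 2.0 * (th - 5.0), np.array([0.0, 10.0]))
print(theta)
```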

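# Sparse Categorical Crossentropy is a loss function employed in classification tasks where each input belongs to one class out of multiple classes. It expects integer class labels, such as the digits 0-9 in y_train_mnist, rather than one-hot vectors, and it penalizes the negative log of the probability assigned to the correct class.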
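
# As a minimal sketch of that loss, assuming integer labels and rows of predicted probabilities: the function name sparse_cce and the sample arrays are hypothetical, and the clipping value is an assumption made to avoid log(0).

```python
import numpy as np

def sparse_cce(labels, probs, eps=1e-7):
    """Mean of -log(probability assigned to the true class)."""
    probs = np.clip(probs, eps, 1.0)
    picked = probs[np.arange(len(labels)), labels]  # true-class probability per row
    return float(np.mean(-np.log(picked)))

probs = np.array([[0.9, 0.05, 0.05],
                  [0.2, 0.5, 0.3]])
labels = np.array([0, 2])  # integer class ids, not one-hot vectors
loss = sparse_cce(labels, probs)
print(loss)
```

# The first row is a confident, correct prediction and contributes little to the loss; the second assigns only 0.3 to the true class and contributes most of it.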