
# Code Questions

# Steps to Build an Image Classification Model using CNN

Before we train a CNN model, let’s build a basic, Fully Connected Neural Network for the dataset. The basic steps to build an image classification model using a neural network are:

1. Flatten the input image dimensions to 1D (width pixels x height pixels)
2. Normalize the image pixel values (divide by 255)
3. One-Hot Encode the categorical column
4. Build a model architecture (Sequential) with Dense layers (fully connected layers)
5. Train the model and make predictions
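As a rough illustration of steps 1 to 3, the sketch below applies them to a tiny, made-up batch using only NumPy. The `images` and `labels` arrays are hypothetical stand-ins for real data, and the identity-matrix trick is just one common way to one-hot encode.

```python
import numpy as np

# Hypothetical stand-in for a batch of grayscale images: 4 images of 28x28 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3])

# Step 1: flatten each 28x28 image into a 784-element vector
flat = images.reshape(images.shape[0], -1)

# Step 2: convert to float and scale pixel values into [0, 1]
flat = flat.astype("float32") / 255.0

# Step 3: one-hot encode the integer labels
# (row i of the identity matrix is the encoding of class i)
n_classes = 10
one_hot = np.eye(n_classes, dtype="float32")[labels]

print(flat.shape)     # (4, 784)
print(one_hot.shape)  # (4, 10)
```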

Here’s how you can build a neural network model for MNIST. I have used ReLU and softmax as the activation functions and the Adam optimizer, with accuracy as the evaluation metric. The code contains all the steps from data loading to preprocessing to fitting the model. I have commented the relevant parts of the code for better understanding:

In [3]:
# keras imports for the dataset and building our neural network
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, MaxPool2D
from keras.utils import to_categorical
#from keras.utils import np_utils


In [4]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Flattening the images from 28x28 pixels to 1D vectors of 784 pixels
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# normalizing the data to help with the training
X_train /= 255.
X_test /= 255.

# one-hot encoding using keras' numpy-related utilities
n_classes = 10
print("Shape before one-hot encoding: ", y_train.shape)
Y_train = to_categorical(y_train, n_classes)
Y_test = to_categorical(y_test, n_classes)
print("Shape after one-hot encoding: ", Y_train.shape)


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Shape before one-hot encoding:  (60000,)
Shape after one-hot encoding:  (60000, 10)


In [8]:
# building a linear stack of layers with the sequential model
model = Sequential()

# hidden layer

model.add(Dense(100, input_shape=(784,), activation='relu'))

# output layer

model.add(Dense(10, activation='softmax'))


# looking at the model summary
model.summary()
# compiling the sequential model
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')


# training the model for 20 epochs (use model.fit function)
model.fit(X_train, Y_train, batch_size=256, epochs=20, validation_data=(X_test, Y_test))


Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_2 (Dense)             (None, 100)               78500     
                                                                 
 dense_3 (Dense)             (None, 10)                1010      
                                                                 
Total params: 79510 (310.59 KB)
Trainable params: 79510 (310.59 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.src.callbacks.History at 0x7c1615c0b550>

After running the above code, you will see that the model easily reaches a validation accuracy of around 97%.
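The parameter counts reported in the model summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters in a Dense layer: weights plus biases."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 100)  # flattened 28x28 input -> 100 hidden units
output = dense_params(100, 10)   # 100 hidden units -> 10 classes
total = hidden + output

print(hidden, output, total)  # 78500 1010 79510
```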

One major advantage of using ConvNets over fully connected networks is that you do not need to flatten the input images to 1D, as they can work with image data in 2D. This helps retain the “spatial” properties of images.

# Full Code for the CNN Model


In [9]:
# keras imports for the dataset and building our neural network
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, MaxPool2D, Flatten
from keras.utils import to_categorical  # used for the one-hot encoding below

# to calculate accuracy
from sklearn.metrics import accuracy_score


In [10]:
# loading the dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# reshaping the images to (samples, 28, 28, 1): height, width, and one grayscale channel
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# normalizing the data to help with the training
X_train /= 255.
X_test /= 255.

# one-hot encoding using keras' numpy-related utilities
n_classes = 10
print("Shape before one-hot encoding: ", y_train.shape)
Y_train = to_categorical(y_train, n_classes)
Y_test = to_categorical(y_test, n_classes)
print("Shape after one-hot encoding: ", Y_train.shape)



Shape before one-hot encoding:  (60000,)
Shape after one-hot encoding:  (60000, 10)


In [16]:
# building a linear stack of layers with the sequential model
model = Sequential()

# convolutional layer

model.add(Conv2D(32, kernel_size=(3,3), strides=(1,1), padding='valid', activation='relu', input_shape=(28,28,1)))


# MaxPool layer

model.add(MaxPool2D(pool_size=(2,2)))

# flatten output of conv

model.add(Flatten())

# hidden layer

model.add(Dense(100, activation='relu'))

# output layer

model.add(Dense(10, activation='softmax'))


# compiling the sequential model
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

# training the model for 10 epochs
model.fit(X_train, Y_train, batch_size=256, epochs=10, validation_data=(X_test, Y_test))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7c161d35de70>

In [17]:
model.summary()

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_3 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_3 (MaxPoolin  (None, 13, 13, 32)        0         
 g2D)                                                            
                                                                 
 flatten_3 (Flatten)         (None, 5408)              0         
                                                                 
 dense_10 (Dense)            (None, 100)               540900    
                                                                 
 dense_11 (Dense)            (None, 10)                1010      
                                                                 
Total params: 542230 (2.07 MB)
Trainable params: 542230 (2.07 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

Even though our max validation accuracy by using a simple neural network model was around 97%, the CNN model is able to get 98%+ with just a single convolution layer! :)
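The output shapes and parameter counts in the CNN summary follow from simple arithmetic. The sketch below reproduces them; the helper names are hypothetical, and the size formula assumes `'valid'` (unpadded) convolution and pooling as used in the model above.

```python
def valid_output_size(size, kernel, stride=1):
    """Spatial output size of an unpadded ('valid') convolution or pooling."""
    return (size - kernel) // stride + 1

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """One kernel_h x kernel_w x in_channels kernel per filter, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    """Weights plus biases of a Dense layer."""
    return n_inputs * n_units + n_units

conv_size = valid_output_size(28, kernel=3)                   # 26
pool_size = valid_output_size(conv_size, kernel=2, stride=2)  # 13
flat_size = pool_size * pool_size * 32                        # 5408

conv = conv2d_params(3, 3, 1, 32)           # 320
hidden = dense_params(flat_size, 100)       # 540900
output = dense_params(100, 10)              # 1010
total = conv + hidden + output              # 542230

print(conv_size, pool_size, flat_size)
print(conv, hidden, output, total)
```

Note that almost all of the parameters sit in the first Dense layer, not in the convolutional layer.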

# Explanatory Questions




## Q1. Explain each activation function used in neural networks and compare them:

a. Sigmoid

b. Tanh

c. ReLU



### a. Sigmoid Activation Function:

The sigmoid function, also known as the logistic function, squashes its input to a range between 0 and 1. The mathematical form of the sigmoid function is given by:

sigmoid(x) = 1 / (1 + e^(-x))

**Characteristics:**
- Output range: (0, 1)
- Smooth, differentiable, and monotonic.
- Suitable for binary classification problems, as it can be interpreted as a probability.
- Prone to the vanishing gradient problem, especially in deep networks, which can slow down learning.

### b. Tanh Activation Function:

The hyperbolic tangent (tanh) function is similar to the sigmoid but maps its input to a range between -1 and 1. The mathematical form of the tanh function is given by:

tanh(x) = (e^(x) - e^(-x)) / (e^(x) + e^(-x))

**Characteristics:**
- Output range: (-1, 1)
- Zero-centered, which tends to make optimization easier than with sigmoid.
- Still saturates for large |x|, so it suffers from the vanishing gradient problem in deep networks, though less severely than sigmoid (its maximum derivative is 1, versus 0.25 for sigmoid).
- Commonly used in hidden layers of neural networks.

### c. ReLU (Rectified Linear Unit) Activation Function:

ReLU is a popular activation function that introduces non-linearity by outputting the input for positive values and zero for negative values. The mathematical form of the ReLU function is given by:

ReLU(x) = max(0, x)

**Characteristics:**
- Simple and computationally efficient.
- Addresses the vanishing gradient problem for positive values.
- Prone to the "dying ReLU" problem, where neurons can become inactive during training and stop learning.
- Commonly used in hidden layers but may not be suitable for all types of networks.

### Comparison:

1. **Output Range:**
   - Sigmoid: (0, 1)
   - Tanh: (-1, 1)
   - ReLU: [0, +∞)

2. **Vanishing Gradient:**
   - Sigmoid and tanh are prone to the vanishing gradient problem, especially in deep networks.
   - ReLU helps mitigate the vanishing gradient problem for positive values.

3. **Zero-Centered:**
   - Sigmoid and ReLU are not zero-centered.
   - Tanh is zero-centered, which can help with optimization.

4. **Common Usage:**
   - Sigmoid: Typically used in the output layer for binary classification.
   - Tanh: Commonly used in hidden layers.
   - ReLU: Popular in hidden layers.
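To make the comparison concrete, here is a minimal NumPy sketch of the three functions and their derivatives. The function names are chosen for illustration; evaluating the derivatives at a large input shows the saturation behind the vanishing gradient problem.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 when x = 0

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # peaks at 1.0 when x = 0

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)  # 1 for positive inputs, 0 otherwise

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print("sigmoid      :", sigmoid(x))
print("tanh         :", np.tanh(x))
print("relu         :", relu(x))
print("sigmoid grad :", sigmoid_grad(x))
print("tanh grad    :", tanh_grad(x))
print("relu grad    :", relu_grad(x))
```

At x = 10 the sigmoid and tanh derivatives are close to zero, while the ReLU derivative is still 1. At x = -10 all three derivatives are (close to) zero, which for ReLU corresponds to the "dying ReLU" case.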


## Q2. Explain Epoch, Batch, and Iteration in Neural Networks

**Epoch**: One complete pass through the entire training dataset.

**Batch**: A subset of the training dataset, used for updating the model parameters.

**Iteration**: One update of the model's weights, typically after processing one batch.
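As a worked example, the numbers below use the settings from the fully connected MNIST model earlier (60,000 training samples, batch size 256, 20 epochs). The sketch assumes the final, smaller batch is kept rather than dropped.

```python
import math

n_samples = 60000
batch_size = 256
epochs = 20

# One iteration per batch; the last batch may be smaller than batch_size
iterations_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (iterations_per_epoch - 1) * batch_size
total_iterations = iterations_per_epoch * epochs

print(iterations_per_epoch)  # 235
print(last_batch_size)       # 96
print(total_iterations)      # 4700
```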

## Q3.  What is the difference between a convolutional neural network and a fully connected neural network?



1. **Local Connectivity vs. Global Connectivity:**
   - **CNN:** CNNs are designed to exploit the spatial locality of patterns in input data. They use convolutional layers with filters (kernels) that slide over the input to capture local features. This allows CNNs to learn hierarchical representations of spatial patterns, making them well-suited for tasks like image recognition.
   - **FCN:** In contrast, fully connected networks connect every neuron in one layer to every neuron in the next layer. This global connectivity is effective for learning relationships between different features, but it doesn't explicitly capture spatial structure in the input data.

2. **Parameter Sharing:**
   - **CNN:** CNNs use parameter sharing through the use of filters. The same filter is applied across different spatial locations, allowing the network to learn shared patterns. This significantly reduces the number of parameters compared to a fully connected network, making CNNs computationally more efficient.
   - **FCN:** In fully connected networks, each connection between neurons has its own set of parameters. This can result in a large number of parameters, especially when dealing with high-dimensional input data.

3. **Translation Invariance:**
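   - **CNN:** Convolutional layers are translation-equivariant by construction: because the same filter is applied at every location, a shifted input produces a correspondingly shifted feature map. Pooling layers then add a degree of translation invariance, so the network can recognize patterns largely regardless of their exact spatial position in the input.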
   - **FCN:** Fully connected networks lack translation invariance as they process the entire input globally. They may struggle to recognize patterns in different spatial locations without explicitly learning variations.

4. **Input Size:**
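   - **CNN:** CNNs are well-suited for grid-like data, such as images, where spatial relationships are crucial. Convolutional and pooling layers can process inputs of varying sizes because the same filters are reused at every location, although a model that ends in Flatten and Dense layers still expects a fixed input size.
   - **FCN:** Fully connected networks require a fixed input size, since the first layer's weight matrix is tied to the number of input features. They also struggle with high-dimensional grid-like data due to the vast number of parameters.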

5. **Use Cases:**
   - **CNN:** Primarily used for computer vision tasks, such as image classification, object detection, and segmentation, where spatial hierarchies and local patterns are important.
   - **FCN:** More general-purpose and used in a variety of tasks, including classification and regression problems. They are not specialized for grid-like data and may require feature engineering to handle structured inputs effectively.
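The effect of parameter sharing can be illustrated with a rough count. The sketch compares the convolutional layer from the MNIST model above (32 filters of size 3x3 on a 28x28x1 input) with a hypothetical fully connected layer that maps the same input to an output of the same size (26x26x32 values).

```python
# Convolutional layer: 32 filters of size 3x3x1, each with one bias, shared across all positions
conv_params = 3 * 3 * 1 * 32 + 32

# Hypothetical Dense layer producing the same number of outputs from the flattened input
n_inputs = 28 * 28 * 1
n_outputs = 26 * 26 * 32
dense_equivalent_params = n_inputs * n_outputs + n_outputs

print(conv_params)              # 320
print(dense_equivalent_params)  # 16981120
```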


## Q4.  What is the role of activation functions in neural networks?

Activation functions play a crucial role in neural networks by introducing non-linearities to the model. The primary purpose of activation functions is to enable neural networks to learn complex patterns and relationships in the data. Without activation functions, a neural network would essentially be a linear model, and stacking multiple layers of linear transformations would not increase the model's capacity to capture non-linearities.
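The claim that stacked linear layers collapse into a single linear layer can be checked numerically. In this sketch the shapes and random weights are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))  # 5 samples, 4 features

# Two linear layers without activation functions
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping written as a single linear layer
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True

# Inserting a ReLU between the layers breaks this equivalence
with_relu = np.maximum(0.0, x @ W1 + b1) @ W2 + b2
print(np.allclose(with_relu, one_layer))
```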

## Q5.  What is a dropout layer, and how is it used in neural networks?

- With dropout, randomly chosen nodes are set to zero during a training step, i.e. temporarily removed from the network. They therefore have no influence on the prediction or on backpropagation for that step. As a result, a slightly different network architecture is trained at each step, and the network learns to produce good predictions without relying on any particular node.

- During Training: At each training iteration, randomly selected neurons are "dropped out" or deactivated, meaning their outputs are set to zero. Dropout is typically applied independently to each neuron with a specified probability, referred to as the dropout rate.

- During Testing/Prediction: In the testing phase, all neurons are used. In the original formulation, their outputs are scaled by the keep probability (1 - dropout rate) to account for the fact that only a fraction of neurons was active during training. Common implementations such as the Keras `Dropout` layer instead use "inverted dropout": the surviving outputs are scaled up by 1 / (1 - dropout rate) during training, so no scaling is needed at test time. Either way, the expected total input to a neuron remains approximately the same during training and testing.
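A minimal NumPy sketch of dropout is shown below. It assumes the "inverted dropout" variant, in which the surviving activations are scaled up by 1 / (1 - rate) during training so that nothing needs to be rescaled at test time. The `dropout` function is a hypothetical helper written for illustration, not a library API.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each element with probability `rate` during training."""
    if not training or rate == 0.0:
        return x  # at test time all neurons are used unchanged
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # True for neurons that are kept
    return x * mask / keep_prob             # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
test_out = dropout(activations, rate=0.5, training=False, rng=rng)

print(np.unique(train_out))  # kept values are scaled to 2.0, dropped values are 0.0
print(train_out.mean())      # close to 1.0 on average
print(test_out.mean())       # 1.0
```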