**kavaskar**   
**2347230**

# Lab 3

### 1. **Data Preprocessing:**

In [2]:
import numpy as np
import tensorflow as tf
from tensorflow.keras import datasets, utils, layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()

# Normalize pixel values to the range [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# One-hot encode class labels
y_train = utils.to_categorical(y_train, 10)
y_test = utils.to_categorical(y_test, 10)

# Data Augmentation (optional)
datagen = ImageDataGenerator(
    horizontal_flip=True,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1
)
datagen.fit(x_train)


### 2. **Network Architecture Design:**

In [3]:
# Define the model: a small feedforward convolutional neural network (CNN)
model = models.Sequential()

# Input Layer (32x32x3 for CIFAR-10 images)
model.add(layers.InputLayer(input_shape=(32, 32, 3)))

# Hidden Layers
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))

model.add(layers.Flatten())

# Dense layer before output
model.add(layers.Dense(128, activation='relu'))

# Output Layer with Softmax activation
model.add(layers.Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Summary of the model
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 32, 32, 32)        896       
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        9248      
                                                                 
 max_pooling2d (MaxPooling2  (None, 16, 16, 32)        0         
 D)                                                              
                                                                 
 conv2d_2 (Conv2D)           (None, 16, 16, 64)        18496     
                                                                 
 conv2d_3 (Conv2D)           (None, 16, 16, 64)        36928     
                                                                 
 max_pooling2d_1 (MaxPoolin  (None, 8, 8, 64)          0         
 g2D)                                                   

**Justification**:
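- **Convolutional Layers**: Extract local features from images (edges, textures, patterns). Stacking them lets later layers combine simple features into more complex ones, capturing the spatial hierarchy of an image.
- **MaxPooling Layers**: Halve the spatial dimensions (32x32 to 16x16 to 8x8), which cuts the computational cost and the number of parameters in the dense layer, and can help limit overfitting.
- **Dense Layer**: After flattening, a 128-unit hidden layer combines the extracted feature maps into a representation used for classification.
- **Softmax Output Layer**: Produces a probability for each of the 10 classes, which suits multi-class classification.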
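
The parameter counts in the summary above can be checked by hand. The following is a minimal sketch, assuming the layer sizes from the model definition; the helper names `conv2d_params` and `dense_params` are made up for illustration. The summary output is cut off before the dense layers, so those two counts are derived from the layer definitions rather than copied from the output.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights of one kernel per filter, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_features, units):
    """One weight per input/unit pair, plus one bias per unit."""
    return in_features * units + units


# (input channels, filters) for the four 3x3 convolutions in the model
conv_layers = [(3, 32), (32, 32), (32, 64), (64, 64)]
conv_counts = [conv2d_params(3, 3, c_in, f) for c_in, f in conv_layers]
print(conv_counts)  # [896, 9248, 18496, 36928]

# Two 2x2 poolings shrink 32x32 to 8x8; with 64 channels the flattened size is 8*8*64
flat_features = 8 * 8 * 64
dense_counts = [dense_params(flat_features, 128), dense_params(128, 10)]
print(dense_counts)  # [524416, 1290]
```

Most of the parameters sit in the first dense layer, which is one reason pooling before flattening matters.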

### 3. **Activation Functions:**

We use **ReLU** for the hidden layers and **Softmax** for the output layer.
- **ReLU**: Outputs zero for negative inputs and passes positive inputs through unchanged. Because its gradient is 1 for positive inputs, it mitigates the vanishing gradient problem and tends to speed up convergence. The zeroed outputs also give sparse activations.
- **Softmax**: Converts the raw output scores (logits) into probabilities that sum to 1 across the 10 classes.
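
To make the two activations concrete, here is a small NumPy sketch. It is an illustration of the math only, not how Keras implements these functions internally; the input `scores` is a made-up example.

```python
import numpy as np


def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0.0, x)


def softmax(logits):
    """Softmax over the last axis; subtracting the max keeps exp() from overflowing."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


scores = np.array([2.0, -1.0, 0.5])  # made-up raw scores for three classes
activated = relu(scores)
probs = softmax(scores)

print(activated)           # negative entry becomes 0
print(probs, probs.sum())  # probabilities that sum to 1
```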

### 4. **Loss Function and Optimizer:**

In [4]:
# Compile with categorical cross-entropy loss and the Adam optimizer
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

- **Categorical Cross Entropy**: Suitable for multi-class classification with one-hot labels. It penalizes the model according to the negative log of the probability assigned to the true class.
- **Adam Optimizer**: Combines momentum with RMSprop-style per-parameter adaptive step sizes, which usually gives faster and more stable convergence than plain SGD.
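
As an illustration of the loss, the sketch below computes categorical cross-entropy in NumPy for two made-up predictions. The function name and the clipping constant `eps` are assumptions for this example, not taken from the Keras source.

```python
import numpy as np


def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))


y_true = np.array([[0.0, 1.0, 0.0]])        # true class is index 1
confident = np.array([[0.05, 0.90, 0.05]])  # made-up confident, correct prediction
unsure = np.array([[0.40, 0.30, 0.30]])     # made-up uncertain prediction

loss_confident = categorical_cross_entropy(y_true, confident)
loss_unsure = categorical_cross_entropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss is small when the true class gets high probability and grows quickly as that probability drops.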

**Effect of Learning Rate**: A learning rate that is too high can make training unstable, with updates overshooting the minimum. A rate that is too low slows convergence. If the model does not converge, adjust the learning rate during training, for example with a learning rate scheduler.
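
The effect can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², which is an assumption chosen for clarity and is not the CIFAR-10 loss; the three learning rates are likewise illustrative.

```python
def gradient_descent(learning_rate, steps=50, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; the gradient is 2*w."""
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
    return w


final_good = gradient_descent(0.1)    # shrinks w by a factor of 0.8 per step
final_slow = gradient_descent(0.001)  # shrinks w by a factor of 0.998 per step
final_high = gradient_descent(1.1)    # multiplies w by -1.2 per step: oscillates and diverges

print(final_good, final_slow, final_high)
```

After 50 steps the moderate rate is close to the minimum at 0, the low rate has barely moved from the start, and the high rate has blown up.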

### 5. **Training the Model:**

In [5]:
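# Train the model on augmented batches
# Note: the test set is also used as validation data here, so the
# validation metrics are not an independent estimate of performance
history = model.fit(datagen.flow(x_train, y_train, batch_size=64),
                    epochs=50,
                    validation_data=(x_test, y_test))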

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
 67/782 [=>............................] - ETA: 1:59 - loss: 0.9312 - accuracy: 0.6777

KeyboardInterrupt: 

**Backpropagation**:
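- Backpropagation applies the chain rule to compute the gradient of the loss with respect to every weight, layer by layer from the output back to the input. The optimizer then updates each weight against its gradient, with the learning rate controlling the step size. If the loss decreases too slowly, increase the rate; if it oscillates or diverges, decrease it.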
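
The sketch below shows the idea on a single softmax layer in NumPy, which is a deliberate simplification of the CNN above. All inputs are made up. It computes the gradient analytically, checks one entry against a finite-difference estimate, and takes one gradient descent step.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def loss_fn(W, b, x, y):
    """Cross-entropy of softmax(W @ x + b) against a one-hot label y."""
    return float(-np.sum(y * np.log(softmax(W @ x + b))))


rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0, 0.1])     # made-up input with 4 features
y = np.array([0.0, 1.0, 0.0])           # one-hot label, true class is index 1
W = rng.normal(scale=0.1, size=(3, 4))  # small random weights
b = np.zeros(3)

# Backward pass: for softmax + cross-entropy, dLoss/dz = probs - y,
# and the chain rule through z = W @ x + b gives the weight and bias gradients.
probs = softmax(W @ x + b)
dz = probs - y
dW = np.outer(dz, x)
db = dz

# Finite-difference check of a single weight gradient
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
numeric = (loss_fn(W_plus, b, x, y) - loss_fn(W_minus, b, x, y)) / (2 * eps)

# One gradient descent update
learning_rate = 0.1
loss_before = loss_fn(W, b, x, y)
loss_after = loss_fn(W - learning_rate * dW, b - learning_rate * db, x, y)
print(dW[0, 0], numeric, loss_before, loss_after)
```

In Keras the same gradients are computed automatically during `model.fit`; the manual version is only meant to show what happens in one update.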

### 6. **Model Evaluation:**

In [13]:
# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc}")

# Confusion Matrix and classification report
from sklearn.metrics import confusion_matrix, classification_report
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true = np.argmax(y_test, axis=1)

print(confusion_matrix(y_true, y_pred_classes))
print(classification_report(y_true, y_pred_classes))

Test accuracy: 0.8159999847412109
[[847  15  37   3  13   5   5   9  26  40]
 [  7 928   0   1   1   1   4   1   5  52]
 [ 48   2 737  18  47  40  69  23   4  12]
 [ 20  11  47 562  54 136  90  33  13  34]
 [ 13   2  39  22 781  18  44  71   5   5]
 [  7   5  31  78  34 749  40  44   2  10]
 [  6   4  17   8  17  11 915   6   5  11]
 [ 11   5  19  17  27  32   5 868   3  13]
 [ 81  23   4   2   1   1   6   3 849  30]
 [ 20  36   3   2   2   0   3   3   7 924]]
              precision    recall  f1-score   support

           0       0.80      0.85      0.82      1000
           1       0.90      0.93      0.91      1000
           2       0.79      0.74      0.76      1000
           3       0.79      0.56      0.66      1000
           4       0.80      0.78      0.79      1000
           5       0.75      0.75      0.75      1000
           6       0.77      0.92      0.84      1000
           7       0.82      0.87      0.84      1000
           8       0.92      0.85      0.88     

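- **Precision, Recall, F1-score**: Per-class metrics that show how well the model separates each class. Here class 1 (automobile) is recognized most reliably (F1-score 0.91), while class 3 (cat) has the lowest recall (0.56).
- **Confusion Matrix**: Shows which classes are confused with each other (rows are true classes, columns are predicted classes). The largest off-diagonal entry is 136: class 3 (cat) images predicted as class 5 (dog).

**Improving Accuracy**: Consider a deeper architecture (more layers), stronger data augmentation than the flips, rotations, and shifts already applied, or hyperparameter tuning (batch size, learning rate).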
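
The per-class metrics can be derived directly from the confusion matrix. The sketch below copies the matrix printed above and recomputes precision, recall, F1-score, and accuracy with NumPy; it is meant as a cross-check of the scikit-learn report, not a replacement for it.

```python
import numpy as np

# Confusion matrix from the output above: rows are true classes, columns are predictions
cm = np.array([
    [847,  15,  37,   3,  13,   5,   5,   9,  26,  40],
    [  7, 928,   0,   1,   1,   1,   4,   1,   5,  52],
    [ 48,   2, 737,  18,  47,  40,  69,  23,   4,  12],
    [ 20,  11,  47, 562,  54, 136,  90,  33,  13,  34],
    [ 13,   2,  39,  22, 781,  18,  44,  71,   5,   5],
    [  7,   5,  31,  78,  34, 749,  40,  44,   2,  10],
    [  6,   4,  17,   8,  17,  11, 915,   6,   5,  11],
    [ 11,   5,  19,  17,  27,  32,   5, 868,   3,  13],
    [ 81,  23,   4,   2,   1,   1,   6,   3, 849,  30],
    [ 20,  36,   3,   2,   2,   0,   3,   3,   7, 924],
])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # correct / all predicted as that class
recall = true_positives / cm.sum(axis=1)     # correct / all true members of that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()

# Most frequent confusion: largest off-diagonal entry
off_diagonal = cm.copy()
np.fill_diagonal(off_diagonal, 0)
true_cls, pred_cls = (int(i) for i in np.unravel_index(off_diagonal.argmax(), cm.shape))

print(np.round(precision, 2))
print(np.round(recall, 2))
print(accuracy, (true_cls, pred_cls))
```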

### 7. **Optimization Strategies:**

- **Early Stopping**: Monitors validation loss to stop training if it stops improving.
- **Learning Rate Scheduling**: Reduces the learning rate gradually to ensure smooth convergence.
- **Weight Initialization**: Proper initialization (e.g., He initialization for ReLU layers) helps avoid vanishing or exploding gradients.
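
The early stopping rule can be sketched in a few lines of plain Python. Keras provides this behaviour through its `EarlyStopping` callback; the function below is a hypothetical stand-in that only shows the logic, and the validation losses are made up rather than taken from the training run above.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: improvement stalls after epoch 3
val_losses = [1.20, 0.95, 0.80, 0.78, 0.79, 0.81, 0.80, 0.85]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

Restoring the weights from `best_epoch` after stopping keeps the model from the point where validation loss was lowest.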

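**Importance of Weight Initialization**: If the initial weights are too small, activations and gradients shrink layer by layer; if they are too large, they grow. Either case leads to slow or unstable convergence. Scaling the initial weights to the layer size keeps signal magnitudes roughly constant across layers and helps the network converge faster.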
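
A small NumPy experiment illustrates the point. It pushes random inputs through a stack of ReLU layers, which is an assumed toy setup with made-up sizes rather than the CNN above, and compares a fixed small weight scale against the He scale sqrt(2 / fan_in).

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 256, 10  # hypothetical layer width and number of layers


def final_activation_std(weight_std):
    """Standard deviation of activations after `depth` ReLU layers."""
    x = rng.normal(size=(128, width))
    for _ in range(depth):
        weights = rng.normal(scale=weight_std, size=(width, width))
        x = np.maximum(0.0, x @ weights)
    return float(x.std())


std_small = final_activation_std(0.01)               # fixed small scale
std_he = final_activation_std(np.sqrt(2.0 / width))  # He scale for ReLU

print(std_small, std_he)
```

With the small scale the activations collapse toward zero after a few layers, while the He scale keeps them at a usable magnitude. In Keras, layers accept a `kernel_initializer` argument such as `'he_normal'`; the model in this lab uses the default initializer.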

### 8. **Report**:

- **Model Architecture**: Detailed explanation and rationale for choices.
- **Training/Test Accuracy**: Relevant plots for loss/accuracy over epochs.
- **Hyperparameters**: Learning rate, batch size, number of epochs, etc.
- **Challenges**: E.g., managing overfitting, tuning hyperparameters.