# Convolutional Neural Networks (CNNs)

## Problem Type
**Convolutional Neural Networks (CNNs)** are primarily used for:
- **Supervised** learning on grid-structured data such as images
- **Image Classification** problems
- Related vision tasks: **Object Detection**, **Image Segmentation**, **Video Analysis**

### How CNNs Work
- **Convolutional layers:**
  - Apply convolutional filters (kernels) to input images to detect spatial patterns like edges, textures, and more complex features.
  - Multiple filters learn different features at each layer, capturing increasingly abstract representations.
- **Pooling layers:**
  - Reduce the spatial dimensions (width and height) of the feature maps, keeping the most essential information and reducing computational complexity.
  - Common pooling methods include Max Pooling (selecting the maximum value) and Average Pooling (taking the average value).
- **Activation functions:**
  - Introduce non-linearity into the model to allow it to learn complex patterns.
  - Common functions include ReLU (Rectified Linear Unit), which replaces negative values with zero, and others like sigmoid or tanh.
- **Fully connected layers:**
  - The feature maps are flattened into a single vector and passed through fully connected (dense) layers for the final classification.
  - The last layer usually applies a softmax activation for multi-class classification or sigmoid for binary classification.
- **Backpropagation:**
  - Computes the gradient of the loss (the error between predicted and actual labels) with respect to every weight; the optimizer then uses these gradients to update the weights.
  - The learning process involves minimizing a loss function through optimization techniques like stochastic gradient descent (SGD) or Adam.
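
The first three building blocks above (convolution, activation, pooling) can be sketched in plain NumPy. This is only an illustration under simplifying assumptions: a single-channel image, a single hand-written filter rather than a learned one, stride 1 and no padding. The function names `conv2d_valid`, `relu`, and `max_pool_2x2` are hypothetical helpers, not part of any library.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products.

    Like deep learning libraries, this does not flip the kernel
    (strictly speaking it is a cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Replace negative values with zero."""
    return np.maximum(x, 0)


def max_pool_2x2(x):
    """Keep the maximum of each non-overlapping 2x2 block."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2  # drop odd edge rows/cols
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


# Toy 6x6 image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge filter (illustrative, not learned)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = relu(conv2d_valid(image, kernel))  # shape (4, 4)
pooled = max_pool_2x2(feature_map)               # shape (2, 2)

print(feature_map)
print(pooled)
```

The feature map responds only where the filter window straddles the dark-to-bright boundary, which is the sense in which a filter "detects" an edge. In a real CNN the kernel values are learned by backpropagation rather than written by hand.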

### Key Tuning Hyperparameters
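- **`filter_size` (Kernel Size; `kernel_size` in Keras):**
  - **Description:** Size of the convolutional filter applied to the input data.
  - **Impact:** Larger filters capture broader features but may lose fine details and add parameters; common sizes include 3x3 and 5x5.
  - **Default:** No universal default (Keras `Conv2D` requires it to be set); `3x3` is the most common choice, with `5x5` also used.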
- **`stride`:**
  - **Description:** Step size by which the filter moves across the input.
  - **Impact:** Larger strides shrink the output and speed up computation, but the filter may skip over important features.
  - **Default:** `1`.
- **`padding`:**
  - **Description:** Padding added to the input to control the spatial size of the output.
  - **Impact:** `same` padding keeps the output the same size as the input when the stride is 1, while `valid` padding (no padding) shrinks it by `kernel size - 1` in each dimension.
  - **Default:** `valid` (no padding).
- **`number of filters`:**
  - **Description:** Number of filters (or feature maps) in a convolutional layer.
  - **Impact:** More filters capture more features, increasing model capacity but also computational cost.
  - **Default:** Varies by layer; usually starts with `32` or `64` and increases in deeper layers.
- **`learning_rate`:**
  - **Description:** Step size for updating weights during training.
  - **Impact:** Higher values speed up training but may cause instability; lower values provide more stable convergence but slow down training.
  - **Default:** `0.001` for Adam in Keras; other optimizers differ (e.g. `0.01` for Keras SGD).
- **`dropout_rate`:**
  - **Description:** Fraction of units to drop during training to prevent overfitting.
  - **Impact:** Helps in regularization; typical values range from `0.2` to `0.5`.
  - **Default:** Varies, typically `0.5` in fully connected layers.
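
To make the effect of kernel size, stride, and padding concrete, the spatial output size along one dimension can be computed directly. The sketch below assumes the `valid`/`same` conventions used by Keras and TensorFlow (`floor((n - k) / s) + 1` and `ceil(n / s)` respectively); `conv_output_size` is a hypothetical helper name.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension of a conv or pooling layer."""
    if padding == "valid":
        # No padding: the window must fit entirely inside the input
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # Input is padded so that only the stride shrinks the output
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding: {padding!r}")


# A 28x28 input through a 3x3 convolution, then 2x2 max pooling with stride 2
after_conv = conv_output_size(28, kernel_size=3)                    # 26
after_pool = conv_output_size(after_conv, kernel_size=2, stride=2)  # 13
flattened = after_pool * after_pool * 32                            # 32 filters -> 5408 values

# 'same' padding with stride 2 halves the size instead of preserving it
same_stride_2 = conv_output_size(28, kernel_size=3, stride=2, padding="same")  # 14

print(after_conv, after_pool, flattened, same_stride_2)
```

The same arithmetic explains why the number of filters drives cost: each additional filter adds one more output map of that spatial size, plus its own set of kernel weights.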

### Pros vs Cons

| Pros                                                  | Cons                                                   |
|-------------------------------------------------------|--------------------------------------------------------|
| Highly effective for image and spatial data           | Requires large amounts of labeled data for training    |
| Automatically learns hierarchical feature representations | Computationally expensive, especially with deep networks |
| Can handle large input dimensions (e.g., images)      | Prone to overfitting without sufficient regularization  |
| Transfer learning allows reuse of pre-trained models  | Complex architecture requires careful tuning of hyperparameters |
| Well-supported by deep learning libraries like TensorFlow and PyTorch | Difficult to interpret learned features and decisions  |

### Evaluation Metrics
- **Accuracy (Classification):**
  - **Description:** Ratio of correct predictions to total predictions.
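  - **Good Value:** Higher is better, but judge it against a baseline: random guessing gives 0.1 on a balanced 10-class problem, and on imbalanced data always predicting the majority class can already score high.
  - **Bad Value:** At or below that baseline (chance level or the majority-class rate) suggests the model has learned little.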
- **Precision (Classification):**
  - **Description:** Proportion of true positives among all positive predictions.
  - **Good Value:** Higher values indicate fewer false positives, especially important in imbalanced datasets.
  - **Bad Value:** Low values suggest many false positives.
- **Recall (Classification):**
  - **Description:** Proportion of actual positives correctly identified.
  - **Good Value:** Higher values indicate fewer false negatives, important in recall-sensitive applications.
  - **Bad Value:** Low values suggest many false negatives.
- **F1 Score (Classification):**
  - **Description:** Harmonic mean of Precision and Recall.
  - **Good Value:** Higher values indicate a good balance between Precision and Recall.
  - **Bad Value:** Low values suggest a poor balance between Precision and Recall.
- **AUC-ROC (Classification):**
  - **Description:** Measures the ability of the model to distinguish between classes across all thresholds.
  - **Good Value:** Values closer to 1 indicate strong separability between classes.
  - **Bad Value:** Values near 0.5 suggest random guessing.
- **Cross-Entropy Loss (Log Loss):**
  - **Description:** Average negative log of the probability the model assigns to the true class; confident wrong predictions are penalized most heavily.
  - **Good Value:** Lower values indicate better model calibration and performance.
  - **Bad Value:** Higher values suggest poor probabilistic predictions.
- **Top-K Accuracy (for multi-class classification):**
  - **Description:** Measures if the correct label is among the top K predicted probabilities.
  - **Good Value:** Higher values, especially in problems with many classes.
  - **Bad Value:** Lower values mean the correct label is often missing even from the model's K most likely guesses.
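
Several of the metrics above are short enough to compute by hand, which can help when interpreting library output. The following is a minimal NumPy sketch on made-up labels and probabilities (not results from any model); `binary_metrics`, `cross_entropy`, and `top_k_accuracy` are hypothetical helper names, and in practice `sklearn.metrics` provides tested equivalents.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def cross_entropy(y_true, probs, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    y_true = np.asarray(y_true)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.log(probs[np.arange(len(y_true)), y_true])))


def top_k_accuracy(y_true, probs, k):
    """Fraction of samples whose true label is among the k highest scores."""
    top_k = np.argsort(np.asarray(probs), axis=1)[:, -k:]
    return float(np.mean([label in row for label, row in zip(y_true, top_k)]))


# Made-up binary example
scores = binary_metrics([1, 0, 1, 1, 0, 0, 1, 0],
                        [1, 0, 0, 1, 0, 1, 1, 0])

# Made-up 3-class example: one row of predicted probabilities per sample
labels = [0, 1, 1]
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.3, 0.6],
         [0.2, 0.5, 0.3]]
loss = cross_entropy(labels, probs)
top1 = top_k_accuracy(labels, probs, k=1)
top2 = top_k_accuracy(labels, probs, k=2)

print(scores, loss, top1, top2)
```

In the 3-class example the second sample is wrong at top-1 but correct at top-2, which is the kind of near miss that Top-K accuracy is designed to credit.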



In [None]:
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"  # Suppresses INFO and WARNING messages
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from keras.datasets import mnist
from keras.layers import Conv2D, Dense, Dropout, Flatten, Input, MaxPooling2D
from keras.models import Sequential
from keras.utils import to_categorical
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
# Load the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [None]:
plt.imshow(train_images[0], cmap='gray')
plt.title(f"Label: {train_labels[0]}")
plt.show()

In [None]:
# Preprocess the data
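# Add a channel dimension, as expected by Conv2D: (N, 28, 28) -> (N, 28, 28, 1)
# Using -1 lets NumPy infer the number of images instead of hard-coding it
train_images = train_images.reshape((-1, 28, 28, 1))
test_images = test_images.reshape((-1, 28, 28, 1))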

# Normalise images between 0 and 1
train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255

# One-hot encode the labels (10 digit classes)
train_labels = to_categorical(train_labels, num_classes=10)
test_labels = to_categorical(test_labels, num_classes=10)

In [None]:
# Build the model
model = Sequential()
model.add(Input(shape=(28, 28, 1)))
# Feature extraction: 32 filters of size 3x3, then 2x2 max pooling
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# Classification head: flatten, one hidden dense layer, softmax over 10 digits
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model, holding out 10% of the training data for validation
# so that the test set stays unseen until the final evaluation
model.fit(train_images, train_labels, batch_size=128, epochs=10, verbose=1, validation_split=0.1)

# Evaluate the model
score = model.evaluate(test_images, test_labels, verbose=2)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

In [None]:
# After training the model, predict class probabilities for the test set
predictions = model.predict(test_images)
predictions = np.argmax(predictions, axis=1)  # Convert predicted probabilities to class labels
true_labels = np.argmax(test_labels, axis=1)  # Convert one-hot encoded test_labels back to class labels

# Print the classification report
print("Classification Report:")
print(classification_report(true_labels, predictions))

In [None]:
# Create a confusion matrix
cm = confusion_matrix(true_labels, predictions)

# Use Seaborn to plot the confusion matrix
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
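
A confusion matrix can also be summarized numerically. The sketch below uses a small made-up 3-class matrix (not output from the model above) and assumes the scikit-learn layout, where rows are true labels and columns are predicted labels. It derives per-class recall and the most frequent misclassification; the same lines should apply to `cm` from the previous cell.

```python
import numpy as np

# Made-up confusion matrix: rows = true class, columns = predicted class
cm_example = np.array([[50, 2, 0],
                       [3, 45, 2],
                       [0, 5, 45]])

# Per-class recall: correct predictions divided by the number of true samples
per_class_recall = cm_example.diagonal() / cm_example.sum(axis=1)

# Most common mistake: the largest off-diagonal entry
off_diagonal = cm_example.copy()
np.fill_diagonal(off_diagonal, 0)
true_cls, pred_cls = np.unravel_index(np.argmax(off_diagonal), off_diagonal.shape)

print("Per-class recall:", np.round(per_class_recall, 3))
print(f"Most confused: true class {true_cls} predicted as {pred_cls} "
      f"({off_diagonal[true_cls, pred_cls]} times)")
```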