<a href="https://colab.research.google.com/github/shstreuber/Data-Mining/blob/master/Module9_Convolutional_Neural_Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Module 9: Convolutional Neural Networks**
A Convolutional Neural Network (CNN) is a specialized type of deep neural network designed for processing grid-like data, such as images or videos. It is particularly effective in tasks where the spatial relationships between neighboring pixels or regions are important, making it well-suited for tasks like image classification, object detection, and segmentation.

At the end of this module, you will be able to:

* Explain the components of a Convolutional Neural Network
* Configure TensorFlow and Keras to work with a CNN
* Solve a simple CNN problem

## **0. Key Components and Functions of CNNs**


**1. Convolutional Layers**
 * **Function:** These layers apply learnable filters (kernels) to small regions of the input data, allowing the network to detect features like edges, textures, and patterns at different spatial scales.
 * **Example:** In an image classification task, early convolutional layers might learn to detect simple features like edges or corners, while deeper layers might combine these features into more complex patterns like textures or parts of objects.

**2. Pooling (Subsampling) Layers**
 * **Function:** Pooling layers reduce the spatial dimensions (width and height) of the input, while retaining important information. They help make the learned features more robust to variations in input size or position.
 * **Example:** Max pooling, for instance, takes the maximum value from each patch of the feature map, reducing its size and focusing on the most important features detected by the convolutional layers.

**3. Activation Functions and Non-linearity**
 * **Function:** Activation functions like ReLU (Rectified Linear Unit) introduce non-linearities into the network, allowing it to learn and model complex relationships in the data.
 * **Example:** ReLU is commonly used after convolutional and fully connected layers to introduce non-linearities, enabling the network to approximate more complex functions.

**4. Fully Connected Layers**
 * **Function:** These layers integrate the spatial information learned by the convolutional layers and pooling layers, producing the final output of the network. In image classification, they map the extracted features to the output classes.
 * **Example:** In an image classification CNN, fully connected layers take the flattened output of the preceding layers and produce logits or probabilities for each class using softmax activation.

Here is a great representation of this process, from [Analytics Vidhya](https://www.analyticsvidhya.com/blog/2022/01/convolutional-neural-network-an-overview/):

<img src = "https://editor.analyticsvidhya.com/uploads/59954intro%20to%20CNN.JPG">

And here is [a great code example](https://www.kaggle.com/code/ifeoluwaoduwaiye/cats-vs-dogs-image-classification-using-cnn-95) based on the famous Cats vs Dogs dataset.
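
To make the first three components concrete, here is a minimal NumPy sketch that traces a convolution, a ReLU, and a 2x2 max pooling step by hand. The 6x6 image and the vertical-edge filter are made up for illustration; in a real CNN the filter values are learned during training, and the helper names (`conv2d_valid`, `relu`, `max_pool_2x2`) are hypothetical, not part of Keras.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Replace negative values with zero."""
    return np.maximum(x, 0)

def max_pool_2x2(x):
    """Keep the largest value in each non-overlapping 2x2 patch."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Made-up 6x6 grayscale image: dark left half (0), bright right half (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge filter (illustrative; a CNN would learn these values)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = relu(conv2d_valid(image, kernel))  # shape (4, 4)
pooled = max_pool_2x2(feature_map)               # shape (2, 2)

print(feature_map)
print(pooled)
```

The feature map is large only where the filter window straddles the dark-to-bright boundary, and pooling keeps that strong response while shrinking the map from 4x4 to 2x2.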



## **1. Example of CNN in Image Classification with the CIFAR Dataset**

Let's consider an example of using a CNN to classify images from the CIFAR-10 dataset, which contains 60,000 32x32 color images in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck).

To learn more about the CIFAR, click [here](https://www.cs.toronto.edu/~kriz/cifar.html). The following description comes from the introduction to this dataset:

> The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class ... The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.

Here are the classes in the dataset, as well as 10 random images from each:
<center>
<img src= "https://github.com/shstreuber/Data-Mining/blob/master/images/CIFAR.JPG?raw=true">
</center>



First, we load and preprocess the CIFAR-10 dataset, normalizing pixel values to the range [0, 1]. Normalizing pixel values by dividing by 255 is a common preprocessing step in machine learning tasks that involve images. Pixel values in typical 8-bit images range from 0 to 255: in a grayscale image, 0 represents black and 255 represents white, and in an RGB color image each color channel (Red, Green, Blue) has its own 0 to 255 range. Dividing by 255 rescales every value into the range 0 to 1, which keeps the inputs small and on a consistent scale so that the network trains more stably. (Some pipelines rescale to the range -1 to 1 instead, which requires a different formula.)
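
As a quick illustration of what this scaling does, here is a small sketch on synthetic data. The `fake_images` array is a random stand-in, not CIFAR-10 itself, and the [-1, 1] variant at the end is shown only for comparison; it is not used in this module.

```python
import numpy as np

# Synthetic stand-in for a batch of two 32x32 RGB images stored as 8-bit integers
rng = np.random.default_rng(seed=0)
fake_images = rng.integers(0, 256, size=(2, 32, 32, 3), dtype=np.uint8)

# True division promotes uint8 to float64 and maps 0..255 onto 0..1
scaled = fake_images / 255.0

# Alternative scaling onto -1..1 (for comparison only)
centered = fake_images / 127.5 - 1.0

print(fake_images.dtype, fake_images.min(), fake_images.max())
print(scaled.dtype, scaled.min(), scaled.max())
print(centered.min(), centered.max())
```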

In [221]:
import tensorflow as tf
from tensorflow.keras import layers, models, datasets

# Load and preprocess CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0  # Normalize pixel values

Then, we define the Model Architecture and Compile the Model.

* **Architecture**
 * The CNN starts with a series of convolutional and max pooling layers that extract and downsample features from the images. The convolutional layers detect patterns by sliding small filters across the image; each filter learns to recognize a different feature, such as an edge or a texture.
 * The *Conv2D* layers with ReLU activation start the process by applying convolutional filters to detect these patterns.
 * The *MaxPooling2D* layers reduce the size of the image representation while retaining important information. They help make the network more robust to variations in the position or size of objects in the images. *MaxPooling2D* with (2, 2) halves each spatial dimension (rounding down when the size is odd).
 * The Flatten layer transforms the 2D matrix data into a 1D vector, preparing it for the fully connected layers.
 * Dense layers (aka Fully Connected Layers) at the end classify the images into one of the 10 classes using softmax activation. These layers integrate all the features learned by the convolutional and pooling layers and perform the final classification. The last layer (Dense(10, activation='softmax')) outputs a probability for each of the 10 classes (such as 'cat', 'dog', or 'truck').

* **Compiling**
 * model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) configures the model for training.
 * optimizer='adam' specifies the Adam optimizer, a popular choice for gradient-based optimization.
 * loss='sparse_categorical_crossentropy' sets the loss function appropriate for multi-class classification tasks where the labels are integers.
 * metrics=['accuracy'] specifies that accuracy should be monitored during training and evaluation.
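
To show what the loss and the metric measure, here is a simplified NumPy sketch of sparse categorical cross-entropy and accuracy on two made-up predictions over three classes. It is an illustration of the idea, not the Keras implementation, which additionally guards against taking the logarithm of zero.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability the model assigned to the true class)."""
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=float)
    picked = probs[np.arange(len(labels)), labels]  # probability of each true class
    return float(np.mean(-np.log(picked)))

# Made-up softmax outputs for two images and three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])  # integer labels, which is why the "sparse" variant applies

loss = sparse_categorical_crossentropy(labels, probs)
accuracy = float(np.mean(np.argmax(probs, axis=1) == labels))

print(f"loss = {loss:.4f}, accuracy = {accuracy:.2f}")
```

The loss shrinks toward 0 as the probability of the correct class approaches 1, whereas accuracy only checks whether the most probable class matches the label.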

In [222]:
# Define CNN architecture
model = models.Sequential([
    # Convolutional layers: detect patterns in images
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),  # Pooling layers: downsample the image
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),

    # Flatten layer: prepare data for fully connected layers
    layers.Flatten(),

    # Fully connected layers: make the final classification
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')  # Output layer with softmax for 10 classes
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # Loss function for classification
              metrics=['accuracy'])  # Metric to monitor during training
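
The sizes flowing through this layer stack can be worked out by hand. The sketch below assumes the Keras defaults used above (no padding and stride 1 for Conv2D, stride equal to the pool size for MaxPooling2D); the helper names are hypothetical. The result should agree with what `model.summary()` reports.

```python
def conv_out(size, kernel=3):
    """Output width/height of a Conv2D layer with no padding and stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width/height of a MaxPooling2D layer with stride equal to the pool size."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

size = 32
size = conv_out(size)    # 30 after Conv2D(32)
size = pool_out(size)    # 15 after MaxPooling2D
size = conv_out(size)    # 13 after Conv2D(64)
size = pool_out(size)    # 6 after MaxPooling2D
size = conv_out(size)    # 4 after Conv2D(64)
flat = size * size * 64  # 1024 values after Flatten

total_params = (conv_params(3, 32) + conv_params(32, 64) + conv_params(64, 64)
                + dense_params(flat, 64) + dense_params(64, 10))

print(size, flat, total_params)  # 4 1024 122570
```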

Lastly, we train and evaluate the model:

* **Training:** The model is trained (fit) on the training data (train_images, train_labels) for 10 epochs. Because the test set is passed as validation_data, Keras also reports the loss and accuracy on the test images at the end of every epoch.
* **Evaluation:** After training, the model is evaluated (evaluate) on the test data (test_images, test_labels) to see how well it classifies images it was not trained on.
 * test_loss, test_acc = model.evaluate(test_images, test_labels) computes the loss and accuracy on the test data.
 * print(f"Test Accuracy: {test_acc}") prints the test accuracy after evaluation.

In [None]:
# Train the model (this will take about 3-5 minutes)
model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test Accuracy: {test_acc}")

Epoch 1/10
  47/1563 [..............................] - ETA: 58s - loss: 2.2569 - accuracy: 0.1463
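
The 1563 in the progress bar is the number of batches per epoch, not the number of images. A small sketch of the arithmetic, assuming the Keras default batch size of 32 (the call above does not set `batch_size`):

```python
import math

num_train_images = 50_000  # CIFAR-10 training set size
batch_size = 32            # Keras default when model.fit is called without batch_size

steps_per_epoch = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 1563
print(last_batch_size)  # 16
```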

## **2. Steps to Classify a New Image**
**1. Preprocess the Image**

* Ensure the new image is in the same format as the CIFAR-10 dataset images: 32x32 pixels with RGB channels (3 channels).
* Normalize the pixel values of the new image to be in the range [0, 1], similar to how the training and test images were preprocessed.

We will be using a zipfile of images located [here](https://github.com/shstreuber/Data-Mining/blob/master/images/CIFAR_exercise.zip).


**2. Load the Trained Model**
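* Load the saved model that you trained on the CIFAR-10 dataset. If you haven't saved it yet, you should save it after training using model.save('cifar_model.h5'). If you are still in the same Colab session, the model variable from Section 1 is still in memory and you can use it directly.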

**3. Upload the Images**
* Download the [zipfile](https://github.com/shstreuber/Data-Mining/blob/master/images/CIFAR_exercise.zip) from the instructor's GitHub.
* Unzip the file on your computer
* In Colab, locate the file folder icon on the left side of the screen and click on it
* Drag and drop the unzipped image files from your computer into the Files space in Colab or use the file upload icon.
* Once the files are uploaded, right-click on a filename and select Copy Path from the popup menu, then paste the path into the new_image_path variable, keeping the surrounding quotation marks ('').

**4. Perform Prediction**
* Use the loaded model to predict the class probabilities for the new image.
* Convert the model's output (probabilities) into a class label by selecting the class with the highest probability.

Classify the images you have uploaded under point 3 above using the trained CIFAR-10 model:

In [None]:
# First, we install OpenCV to load the image
!pip install opencv-python

In [None]:
# Now, we are ready to go

import tensorflow as tf
from tensorflow.keras import models
import numpy as np
import cv2  # Assuming you have OpenCV installed for image loading

# Load the CIFAR-10 model
# If you saved the model earlier, uncomment the next line and adjust the path:
# model = models.load_model('cifar_model.h5')
# Otherwise, the `model` variable trained in Section 1 is reused as-is.

# Load the new image
new_image_path = '<paste here the path to the image from the zip file you want to classify. See 3. above>'
new_image = cv2.imread(new_image_path)  # Load the image using OpenCV (returns None if the path is wrong)
if new_image is None:
    raise FileNotFoundError(f"Could not read an image at: {new_image_path}")
print(new_image.shape) # Verifying that the image has loaded

In [None]:
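# Preprocess the image
new_image = cv2.cvtColor(new_image, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR; CIFAR-10 images are RGB
new_image = cv2.resize(new_image, (32, 32))  # Resize the image to 32x32 pixels
new_image = new_image.astype('float32') / 255.0  # Normalize pixel values to [0, 1]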

# Expand dimensions to create a batch of 1 image (required by the model)
new_image = np.expand_dims(new_image, axis=0)
print(new_image.shape) # Verifying the preprocessed shape; output should be (1, 32, 32, 3)

You should see the output (1, 32, 32, 3). This is the shape of the 4D array that a convolutional neural network (CNN) typically expects as input for image classification tasks. Here’s a breakdown of each dimension:

* 1: This is the batch size, indicating the number of images in the batch. In this case, it is 1, meaning we are processing a single image.
* 32: This is the height of the image in pixels. The image has 32 pixels in height.
* 32: This is the width of the image in pixels. The image has 32 pixels in width.
* 3: This is the number of color channels in the image. A typical color image has three channels corresponding to Red, Green, and Blue (RGB).
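
The batch dimension can be seen in isolation with a blank placeholder image. This is a minimal sketch; the all-zero array stands in for a real preprocessed image.

```python
import numpy as np

# Placeholder for one preprocessed 32x32 RGB image
image = np.zeros((32, 32, 3), dtype=np.float32)

# Add a leading batch axis so the array looks like "a batch of one image"
batch_of_one = np.expand_dims(image, axis=0)

# Several images of the same size can be stacked into one larger batch
batch_of_four = np.stack([image] * 4, axis=0)

print(image.shape)          # (32, 32, 3)
print(batch_of_one.shape)   # (1, 32, 32, 3)
print(batch_of_four.shape)  # (4, 32, 32, 3)
```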

In [None]:
# Perform prediction
predictions = model.predict(new_image)
predicted_class_index = np.argmax(predictions[0])  # Get the index of the class with the highest probability

# Print the predicted class
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
predicted_class = class_names[predicted_class_index]
print(f"Predicted Class: {predicted_class}")
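
Looking only at the top class hides how confident the model is. The sketch below lists the three most probable classes instead. The probability vector is made up for illustration; with the real model you would use the `predictions` array returned by `model.predict`.

```python
import numpy as np

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Made-up softmax output for a single image (one row, ten class probabilities)
example_predictions = np.array([[0.02, 0.01, 0.10, 0.55, 0.05,
                                 0.20, 0.03, 0.02, 0.01, 0.01]])

# Sort class indices by probability, highest first, and keep the top three
top3 = np.argsort(example_predictions[0])[::-1][:3]
top3_names = [class_names[i] for i in top3]

for i in top3:
    print(f"{class_names[i]}: {example_predictions[0][i]:.2f}")
```

A small gap between the first and second probabilities suggests the model is unsure, which is common when an uploaded photo looks quite different from the 32x32 training images.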

# Your Turn
Go to https://www.kaggle.com/code/ifeoluwaoduwaiye/cats-vs-dogs-image-classification-using-cnn-95

Here, you will see a great notebook based on the famous [Cats vs Dogs](https://www.kaggle.com/c/dogs-vs-cats/data) dataset.
<center>
<img src= "https://storage.googleapis.com/kaggle-media/competitions/kaggle/3362/media/woof_meow.jpg">
</center>

As you can see, the model works pretty well given its training data. Now you will use it to analyze the data from the instructor's zip file.

**Here is your To Do**
* Copy the code from the Kaggle page into your own Google Colab Notebook
* Follow point 3 above to download, unzip, and upload the images in the [zipfile](https://github.com/shstreuber/Data-Mining/blob/master/images/CIFAR_exercise.zip) from the instructor's GitHub.
* Use the Kaggle model to classify the images from the zipfile. You can use the code field below to get started.


