# Deep Learning for Computer Vision

Welcome to this notebook on Deep Learning for Computer Vision, part of the 'Part_4_Deep_Learning_and_Specializations' section of our machine learning tutorial series. In this notebook, we'll explore the fundamentals of computer vision using deep learning techniques, focusing on Convolutional Neural Networks (CNNs). CNNs are a class of neural networks designed to process and analyze visual data, which makes them well suited to tasks such as image classification and object detection.

## What You'll Learn
- The basics of computer vision and its applications in machine learning.
- The architecture of Convolutional Neural Networks (CNNs), including convolutional layers, pooling layers, and fully connected layers.
- How to build and train a CNN for image classification using TensorFlow and Keras.
- Practical implementation on the MNIST dataset for handwritten digit recognition.

Let's dive into the fascinating world of computer vision with deep learning!

## 1. Introduction to Computer Vision

Computer vision is a field of artificial intelligence that enables machines to interpret and understand visual information from the world, such as images and videos. It involves tasks like:
- **Image Classification**: Identifying what an image represents (e.g., cat vs. dog).
- **Object Detection**: Locating and identifying objects within an image.
- **Image Segmentation**: Dividing an image into meaningful regions or segments.

Deep learning, particularly through CNNs, has revolutionized computer vision by achieving state-of-the-art performance in these tasks. Unlike traditional methods that rely on hand-crafted features, CNNs automatically learn hierarchical feature representations from raw pixel data.

## 2. Understanding Convolutional Neural Networks (CNNs)

CNNs are a specialized type of neural network designed for processing grid-like data, such as images. They are inspired by the human visual system and are particularly effective for tasks involving visual data. The key components of a CNN include:

- **Convolutional Layers**: These layers apply small learnable filters to the input to extract features like edges, textures, or patterns. Each filter slides over the image (a process called convolution) and produces one feature map.
- **Pooling Layers**: These layers reduce the spatial dimensions of the feature maps (e.g., max pooling keeps only the maximum value in each region). This lowers the computational cost and makes the features less sensitive to small shifts in the input, which can also help limit overfitting.
- **Fully Connected Layers**: At the end of the network, these layers combine the extracted features to make predictions or classifications.
- **Activation Functions**: Typically, ReLU (Rectified Linear Unit) is used to introduce non-linearity after convolutional and fully connected layers.

CNNs are powerful because they learn hierarchical features: low-level features (like edges) in early layers and high-level features (like object shapes) in deeper layers.
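
To make the components above concrete, here is a minimal NumPy sketch of a single filter pass followed by ReLU and 2x2 max pooling. It is an illustration only, not how Keras implements these layers: the function names (`conv2d_valid`, `relu`, `max_pool_2x2`), the toy 6x6 image, and the hand-picked edge filter are all assumptions made for this example. In a real CNN the filter values are learned during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the filter with the patch under it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Replace negative values with zero."""
    return np.maximum(x, 0.0)

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 region."""
    h = feature_map.shape[0] // 2 * 2
    w = feature_map.shape[1] // 2 * 2
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Toy 6x6 image (hypothetical): dark left half, bright right half
toy_image = np.zeros((6, 6), dtype=np.float32)
toy_image[:, 3:] = 1.0

# Hand-picked filter that responds to dark-to-bright vertical edges
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=np.float32)

feature_map = relu(conv2d_valid(toy_image, edge_kernel))
pooled = max_pool_2x2(feature_map)

print(feature_map.shape)  # (4, 4)
print(pooled.shape)       # (2, 2)
```

The feature map is non-zero only where the filter overlaps the edge, and pooling halves each spatial dimension while keeping that strong response.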

## 3. Setting Up the Environment

Before we build our CNN, let's import the necessary libraries. We'll use TensorFlow and Keras for building and training the model, and matplotlib for visualizations.

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt

# Set random seeds so runs are more repeatable (some GPU operations can still vary slightly)
tf.random.set_seed(42)
np.random.seed(42)

## 4. Loading and Preprocessing the MNIST Dataset

We'll use the MNIST dataset, a classic dataset for handwritten digit recognition. It consists of 60,000 training images and 10,000 test images of digits (0-9), each 28x28 pixels in grayscale.

Let's load the dataset, normalize the pixel values to the range [0, 1], and reshape the data to include a channel dimension, since Keras convolutional layers expect each image to have the shape (height, width, channels).

In [None]:
# Load MNIST dataset from TensorFlow
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values to range [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# Reshape data to include channel dimension (28, 28, 1)
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

# Convert labels to categorical (one-hot encoding)
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

# Display dataset shapes
print(f"Training data shape: {X_train.shape}")
print(f"Test data shape: {X_test.shape}")
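
The three preprocessing steps can also be written out in plain NumPy, which may help show what each one does to the arrays. This is a sketch on a small synthetic batch rather than on MNIST itself: `raw_images`, `raw_labels`, and the random pixel values are assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-in for a batch of MNIST images: 4 images, 28x28, uint8 pixels
rng = np.random.default_rng(0)
raw_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
raw_labels = np.array([3, 0, 9, 3])

# Step 1: scale integer pixels in [0, 255] to floats in [0, 1]
scaled = raw_images.astype('float32') / 255.0

# Step 2: add a trailing channel axis, (4, 28, 28) -> (4, 28, 28, 1)
with_channel = scaled[..., np.newaxis]

# Step 3: one-hot encode labels by picking rows of a 10x10 identity matrix
one_hot = np.eye(10, dtype='float32')[raw_labels]

print(with_channel.shape)  # (4, 28, 28, 1)
print(one_hot.shape)       # (4, 10)
```

Taking `np.argmax` along the last axis of `one_hot` recovers the original integer labels, which is how the plotting cells in this notebook display the digit for each image.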

## 5. Visualizing Sample Images

Let's visualize a few sample images from the MNIST dataset to understand what we're working with.

In [None]:
# Plot a few sample images
plt.figure(figsize=(10, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_train[i].reshape(28, 28), cmap='gray')
    plt.title(f'Digit: {np.argmax(y_train[i])}')
    plt.axis('off')
plt.show()

## 6. Building a Convolutional Neural Network

Now, let's build a simple CNN model for classifying handwritten digits. Our model will consist of:
- Two convolutional blocks, each a convolutional layer with ReLU activation followed by a max pooling layer.
- A flatten layer to convert 2D feature maps to a 1D vector.
- Two fully connected (dense) layers, with the final layer outputting probabilities for 10 classes (digits 0-9).

In [None]:
# Build the CNN model
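model = models.Sequential([
    # Explicit input layer: 28x28 grayscale images with one channel
    # (newer Keras versions warn when input_shape is passed to a layer instead)
    layers.Input(shape=(28, 28, 1)),

    # First Convolutional Block
    layers.Conv2D(32, (3, 3), activation='relu'),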
    layers.MaxPooling2D((2, 2)),
    
    # Second Convolutional Block
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    # Flatten the output for dense layers
    layers.Flatten(),
    
    # Dense Layers
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')  # Output layer for 10 classes
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Display model summary
model.summary()
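
The output shapes and parameter counts reported by `model.summary()` can be checked by hand. The sketch below assumes the defaults used in the model above (3x3 kernels, stride 1, no padding, 2x2 pooling); the helper names `conv_output_size` and `pool_output_size` and the dictionary keys are made up for this example.

```python
def conv_output_size(size, kernel, stride=1):
    """Output length along one axis for an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

def pool_output_size(size, pool):
    """Output length along one axis for non-overlapping pooling."""
    return size // pool

side, channels = 28, 1
params = {}

# First block: 32 filters, each with 3*3*channels weights plus one bias
params['conv_block_1'] = (3 * 3 * channels + 1) * 32
side, channels = conv_output_size(side, 3), 32   # 26x26x32
side = pool_output_size(side, 2)                 # 13x13x32

# Second block: 64 filters over 32 input channels
params['conv_block_2'] = (3 * 3 * channels + 1) * 64
side, channels = conv_output_size(side, 3), 64   # 11x11x64
side = pool_output_size(side, 2)                 # 5x5x64

# Flatten, then two dense layers (weights plus one bias per unit)
flat_features = side * side * channels           # 1600
params['dense_hidden'] = (flat_features + 1) * 128
params['dense_output'] = (128 + 1) * 10

total_params = sum(params.values())
print(flat_features)   # 1600
print(total_params)    # 225034
```

Most of the parameters sit in the first dense layer, not in the convolutional layers, because every one of the 1,600 flattened features connects to each of the 128 hidden units.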

## 7. Training the CNN Model

Let's train the model on the MNIST training data for 5 epochs. We'll use a batch size of 64 and reserve 20% of the training data for validation to monitor performance during training.

In [None]:
# Train the model
history = model.fit(X_train, y_train, 
                    epochs=5, 
                    batch_size=64, 
                    validation_split=0.2)
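
It can help to work out what these settings mean in terms of batches. The arithmetic below is a sketch based on the values used in the training call; the variable names are chosen for this example. Note that Keras takes the validation samples from the end of the arrays, before any shuffling.

```python
import math

n_samples = 60_000        # size of the MNIST training set
validation_split = 0.2
batch_size = 64
epochs = 5

# Samples held out for validation and samples left for training
n_val = int(n_samples * validation_split)
n_train = n_samples - n_val

# Each epoch processes the training samples once, one batch per weight update
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(n_train, n_val)     # 48000 12000
print(steps_per_epoch)    # 750
print(total_updates)      # 3750
```

The value 750 is the step count shown in the progress bar for each epoch.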

## 8. Evaluating the Model

After training, let's evaluate the model's performance on the test dataset to see how well it generalizes to unseen data.

In [None]:
# Evaluate the model on test data
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")
print(f"Test loss: {test_loss:.4f}")
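
To see what the two reported numbers measure, here is a NumPy sketch of softmax, categorical cross-entropy, and accuracy. It mirrors the definitions rather than the Keras implementation, and it uses three classes instead of ten to keep the arrays readable. The scores in `example_logits` are hypothetical values chosen for this example, not model outputs.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

def accuracy(y_true, y_prob):
    """Fraction of rows where the most probable class is the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))

# Hypothetical raw scores for 3 samples and 3 classes
example_logits = np.array([[2.0, 0.5, 0.1],
                           [0.2, 0.1, 3.0],
                           [1.5, 1.6, 0.0]])
example_true = np.eye(3)[[0, 2, 0]]  # one-hot labels for classes 0, 2, 0

example_probs = softmax(example_logits)
example_loss = categorical_crossentropy(example_true, example_probs)
example_acc = accuracy(example_true, example_probs)

print(f"loss: {example_loss:.4f}")
print(f"accuracy: {example_acc:.4f}")
```

The third sample is a near miss: the wrong class scores only slightly higher than the true one, so it counts fully against accuracy but adds only a moderate amount to the loss. This is why loss is often a more sensitive signal than accuracy.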

## 9. Visualizing Training Progress

Let's plot the training and validation accuracy and loss over the epochs to understand how the model learned.

In [None]:
# Plot training history
plt.figure(figsize=(12, 4))

# Plot accuracy
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

# Plot loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.show()
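
Besides looking at the curves, the same history dictionary can be inspected numerically. The sketch below uses invented numbers purely for illustration; they are not results from this notebook, and the helper names `best_epoch` and `generalization_gap` are assumptions made for this example.

```python
# Invented numbers for illustration only; not results from this notebook
example_history = {
    'loss':     [0.20, 0.06, 0.04, 0.03, 0.02],
    'val_loss': [0.08, 0.06, 0.05, 0.055, 0.06],
}

def best_epoch(history_dict):
    """Index (0-based) of the epoch with the lowest validation loss."""
    val_loss = history_dict['val_loss']
    return min(range(len(val_loss)), key=val_loss.__getitem__)

def generalization_gap(history_dict):
    """Validation loss minus training loss for each epoch."""
    return [v - t for t, v in zip(history_dict['loss'], history_dict['val_loss'])]

best = best_epoch(example_history)
gaps = generalization_gap(example_history)

print(f"Lowest validation loss at epoch {best + 1}")
print([round(g, 3) for g in gaps])
```

A training loss that keeps falling while the validation loss rises, so that the gap widens, is the usual sign of overfitting. With the real `history.history` from the training cell, the same functions can be applied directly.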

## 10. Making Predictions

Finally, let's use the trained model to make predictions on a few test images and visualize the results.

In [None]:
# Make predictions on a few test images
predictions = model.predict(X_test[:10])
predicted_labels = np.argmax(predictions, axis=1)
true_labels = np.argmax(y_test[:10], axis=1)

# Visualize predictions
plt.figure(figsize=(10, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_test[i].reshape(28, 28), cmap='gray')
    plt.title(f'Pred: {predicted_labels[i]}\nTrue: {true_labels[i]}')
    plt.axis('off')
plt.show()
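
Ten images give only a small glimpse of the model's behavior. One common way to summarize predictions over many samples is a confusion matrix, sketched here in NumPy. The label arrays below are hypothetical and the function name `confusion_matrix` is chosen for this example; with the notebook's own arrays, `true_labels` and `predicted_labels` could be passed instead.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes=10):
    """Count samples per (true class, predicted class) pair."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix

# Hypothetical labels: one sample of class 5 is predicted as 6
example_true_labels = np.array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9])
example_pred_labels = np.array([7, 2, 1, 0, 4, 1, 4, 9, 6, 9])

cm = confusion_matrix(example_true_labels, example_pred_labels)
wrong_indices = np.flatnonzero(example_true_labels != example_pred_labels)

print(cm.trace(), "correct out of", cm.sum())
print("Misclassified sample indices:", wrong_indices)
```

Rows correspond to true classes and columns to predicted classes, so off-diagonal entries show which digits get confused with which.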

## 11. Conclusion

In this notebook, we've explored the basics of deep learning for computer vision using Convolutional Neural Networks (CNNs). We built and trained a CNN on the MNIST dataset for handwritten digit recognition and evaluated it on the held-out test set. A model of this size typically reaches roughly 98-99% test accuracy on MNIST, though the exact figure varies from run to run. CNNs are a cornerstone of modern computer vision, and this example is just the beginning. In more advanced applications, CNNs can be used for complex tasks like object detection, facial recognition, and medical image analysis.

### Key Takeaways
- CNNs are designed to process visual data by learning hierarchical features through convolutional and pooling layers.
- Preprocessing (scaling pixel values to [0, 1], adding a channel dimension, and one-hot encoding the labels) puts the data in the form the model and loss function expect.
- Visualization of training progress and predictions helps in understanding model performance.

Feel free to experiment with the model architecture (e.g., adding more layers, changing hyperparameters) or try other datasets to deepen your understanding of CNNs!

## 12. Further Exploration

If you're interested in diving deeper into computer vision, consider exploring:
- **Advanced Architectures**: Learn about architectures like VGG, ResNet, or Inception for more complex tasks.
- **Object Detection**: Explore frameworks like YOLO or Faster R-CNN for detecting objects in images.
- **Transfer Learning**: Use pre-trained models on custom datasets to leverage existing knowledge (covered in a later notebook in this series).
- **Image Augmentation**: Apply techniques to artificially expand your dataset and improve model robustness.

Stay tuned for more specialized topics in this 'Part_4_Deep_Learning_and_Specializations' section!