# Week 1: Foundations of Convolutional Neural Networks

Implement the foundational layers of CNNs (pooling, convolutions) and stack them properly in a deep network to solve multi-class image classification problems.

**Learning Objectives**:

* Explain the convolution operation
* Apply two different types of pooling operations
* Identify the components used in a convolutional neural network (padding, stride, filter, ...) and their purpose
* Build a convolutional neural network
* Implement convolutional and pooling layers in NumPy, including forward propagation
* Implement helper functions to use when implementing a TensorFlow model
* Create a mood classifier using the TF Keras Sequential API
* Build a ConvNet to identify sign language digits using the TF Keras Functional API
* Build and train a ConvNet in TensorFlow for a binary classification problem
* Build and train a ConvNet in TensorFlow for a multiclass classification problem
* Explain different use cases for the Sequential and Functional APIs

---


## Computer Vision

In this introduction to Convolutional Neural Networks (CNNs), the focus is on the transformative power of computer vision and the technical necessity of convolutions when dealing with high-resolution image data.

### Importance and Impact of Computer Vision

* **Rapid Advancement:** Deep learning has propelled computer vision into real-world utility, enabling self-driving cars, advanced face recognition, and relevant content curation in consumer apps.
* **Cross-Fertilization:** Architectural innovations in computer vision often inspire breakthroughs in other fields, such as speech recognition.

### Key Computer Vision Problems

* **Image Classification:** Determining whether an object (e.g., a cat) is present in an image.
* **Object Detection:** Not only identifying objects but also determining their specific positions and drawing bounding boxes around them.
* **Neural Style Transfer:** Repainting a content image in the artistic style of a reference image (e.g., turning a landscape photo into a "Picasso" style painting).

<img src='images/cv.png' width=750px>

### The Challenge of Input Scale

* **Small Images:** A $64 \times 64$ RGB image has 12,288 features ($64 \times 64 \times 3$), which is manageable for standard fully connected networks.
* **Large Images:** A modest $1000 \times 1000$ (1-megapixel) image results in 3,000,000 input features.
* **Parameter Explosion:** In a fully connected layer with just 1,000 hidden units, a 1-megapixel input would require a weight matrix with 3 billion parameters ($1{,}000 \times 3{,}000{,}000$).
    * **Overfitting:** With billions of parameters, models are highly prone to overfitting without massive amounts of data.
    * **Resource Constraints:** The memory and computational power required to train such a network are generally infeasible for standard hardware.
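
The arithmetic behind these numbers can be checked with a few lines of Python. This is a rough sketch that assumes 32-bit floats (4 bytes per weight) and counts only the first layer's weight matrix, ignoring biases, gradients, and optimizer state, which would all add to the total.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights stored as 32-bit floats

def input_features(height, width, channels=3):
    """Number of input features for an RGB image flattened into a vector."""
    return height * width * channels

small = input_features(64, 64)        # 64 x 64 x 3
large = input_features(1000, 1000)    # 1000 x 1000 x 3
hidden_units = 1000

# Weight matrix of a fully connected layer: one weight per (input, unit) pair
weights = large * hidden_units
gb = weights * BYTES_PER_FLOAT32 / 1e9  # memory for the weights alone, in GB

print(f"{small:,} {large:,} {weights:,} {gb:.1f} GB")
# 12,288 3,000,000 3,000,000,000 12.0 GB
```

Even before training starts, storing this single weight matrix would take about 12 GB under these assumptions.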

<img src='images/input_scale.png' width=750px>

### The Solution: Convolutional Operations

* To process high-resolution images efficiently without a parameter explosion, deep learning uses **Convolutional Neural Networks (CNNs)**.
* **Convolutions:** This operation is the fundamental building block of CNNs, allowing the network to learn local patterns (like edges) while drastically reducing the number of parameters compared to fully connected layers.
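
To make the parameter savings concrete, the sketch below compares a fully connected layer with a convolutional layer on the same 1-megapixel RGB input. The convolutional layer's configuration (10 filters of size $3 \times 3$) is a hypothetical choice for illustration, and both counts include one bias per unit or filter. Because each filter is reused at every position, its parameter count does not depend on the image's height or width.

```python
def dense_params(n_inputs, n_units):
    # one weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

def conv_params(f, n_channels, n_filters):
    # each f x f x n_channels filter has its own weights plus one bias;
    # the image height and width never appear in this count
    return (f * f * n_channels + 1) * n_filters

fc = dense_params(1000 * 1000 * 3, 1000)
conv = conv_params(f=3, n_channels=3, n_filters=10)  # hypothetical layer
print(f"{fc:,} vs {conv:,}")  # 3,000,001,000 vs 280
```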

---

## Edge Detection Example

The convolution operation is a fundamental building block of Convolutional Neural Networks (CNNs). It allows a model to learn features—starting with simple edges and progressing to complex objects—by sliding a filter over an input image.

### What is the Convolution Operation?

In computer vision, convolution is used to detect specific features, such as vertical or horizontal lines.
* **The Input:** A grayscale image is represented as a matrix of pixel intensities (e.g., a $6 \times 6 \times 1$ matrix).
* **The Filter (or Kernel):** A smaller matrix (typically $3 \times 3$) designed to identify a specific pattern. For a vertical edge detector, a common filter is:

$$ \begin{bmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{bmatrix}$$

* **The Notation:** In math and deep learning, the asterisk ($*$) denotes the convolution operation (not to be confused with standard multiplication).
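
The vertical edge filter above can be written as a NumPy array. As an illustration of the horizontal case mentioned earlier, transposing it gives a commonly used horizontal edge detector; the variable names here are just for this sketch.

```python
import numpy as np

# Vertical edge detector: responds to bright-to-dark transitions
# from left to right
vertical_filter = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
])

# Transposing swaps rows and columns, giving a detector for
# bright-to-dark transitions from top to bottom
horizontal_filter = vertical_filter.T
print(horizontal_filter)
```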

### The Mechanics of Convolution

The process of convolving a $6 \times 6$ image with a $3 \times 3$ filter results in a $4 \times 4$ output matrix.

1. **Overlay:** Place the $3 \times 3$ filter over the top-left $3 \times 3$ patch of the image.
2. **Element-wise Product:** Multiply each of the 9 numbers in the filter by the corresponding pixel value in the image patch.
3. **Summation:** Add those 9 products together to get a single value for the first cell of the output matrix.
4. **Shift (Slide):** Move the filter one pixel to the right (a stride of 1) and repeat the calculation. When the filter reaches the end of a row, move it down one pixel and start again from the left.
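
The four steps above can be sketched in NumPy with two nested loops. This is a minimal, unoptimized version assuming a single-channel image, a stride of 1, and no padding. Like deep learning frameworks, it does not flip the kernel (strictly speaking, this is cross-correlation, which is what deep learning calls convolution).

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over a 2-D `image` with stride 1 and no padding."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out_h, out_w = n_h - f_h + 1, n_w - f_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):          # step 4: move down one row at a time
        for j in range(out_w):      # step 4: slide right one pixel at a time
            patch = image[i:i + f_h, j:j + f_w]   # step 1: overlay the filter
            out[i, j] = np.sum(patch * kernel)     # steps 2-3: multiply, then sum
    return out

image = np.arange(36).reshape(6, 6)   # arbitrary 6 x 6 test image
kernel = np.array([[1, 0, -1]] * 3)   # vertical edge filter
print(convolve2d_valid(image, kernel).shape)  # (4, 4)
```

Here `np.sum(patch * kernel)` relies on NumPy's `*` being element-wise multiplication, which is exactly steps 2 and 3.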

### Intuition: Why It Detects Edges

The filter acts as a mathematical "transition detector."
* **Vertical Edge Case:** Imagine an image where the left half is bright (pixel value 10) and the right half is dark (pixel value 0).
* **The Calculation:** When the filter (with 1s on the left and -1s on the right) sits on the transition:
    * The 1s multiply the bright pixels, giving a large positive contribution ($3 \times 10 = 30$).
    * The -1s multiply the dark pixels, contributing nothing (since $-1 \times 0 = 0$).
* **The Result:** The sum is a large positive number (here, 30). In areas where the intensity is uniform, the 1s and -1s cancel out (all 10s) or every product is already 0 (all 0s), so the result is 0.
* **Visual Output:** Each row of the final $4 \times 4$ matrix is $[0, 30, 30, 0]$, so displayed as an image it shows a bright vertical strip in the middle, representing the detected edge.
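
The bright-left, dark-right example can be reproduced with a vectorized sketch. It assumes NumPy 1.20 or newer, which provides `numpy.lib.stride_tricks.sliding_window_view`; extracting all $3 \times 3$ patches at once replaces the explicit loops.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# 6 x 6 image: left half bright (10), right half dark (0)
image = np.array([[10, 10, 10, 0, 0, 0]] * 6)
vertical_filter = np.array([[1, 0, -1]] * 3)

# Every 3 x 3 patch of the image: shape (4, 4, 3, 3)
patches = sliding_window_view(image, vertical_filter.shape)
# Multiply each patch element-wise by the filter, then sum within each patch
output = (patches * vertical_filter).sum(axis=(2, 3))
print(output)
```

Every row of `output` is `[0, 30, 30, 0]`: the middle two columns, where the filter straddles the transition, form the bright strip.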

### Practical Implementation

In deep learning frameworks, you don't compute these sums manually; built-in functions handle convolutions over high-dimensional inputs:
* **TensorFlow:** `tf.nn.conv2d`
* **Keras:** `Conv2D` layer (`tf.keras.layers.Conv2D`)
* **Output Dimensions:** With no padding and a stride of 1, convolving an $n \times n$ image with an $f \times f$ filter gives an $(n - f + 1) \times (n - f + 1)$ output. In our example, $6 - 3 + 1 = 4$.
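
The output-size formula is easy to encode as a small helper. This is a sketch for the no-padding, stride-1 case only; the function name is hypothetical.

```python
def valid_output_size(n, f):
    """Side length of the output when an n x n input is convolved with
    an f x f filter using stride 1 and no padding."""
    if not 1 <= f <= n:
        raise ValueError("filter size must be between 1 and the input size")
    return n - f + 1

print(valid_output_size(6, 3))     # 4, as in the example above
print(valid_output_size(1000, 3))  # 998
```

This matches the spatial output shape of `tf.nn.conv2d` with `padding="VALID"` and a stride of 1, which is also the default behavior of Keras `Conv2D` (`padding="valid"`, `strides=(1, 1)`).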