# Week 1 Notes

## Computer Vision

### Introduction

Many applications in computer vision are now solved by deep learning. There are two benefits from studying these problems and solutions:

* There is still scope for creating products and applications using deep-learning-inspired computer vision techniques.

* The computer vision community has been very inventive in solving problems using deep learning. Its approaches can provide insight when working on other problems such as speech or text.

Computer vision is concerned mostly with inputs that are image or video data.

### Examples of CV Problems

#### Image Classification

Classifying an object as one of several classes (one of which may be the background or unassigned class). The simplest case is a binary classifier - the canonical teaching example being a cat classifier.

#### Object Detection

Object detection is the compound task of locating an object in an image (with a bounding box or a pixel-level segmentation of the scene) and then classifying it.

#### Neural Style Transfer

Neural style transfer takes two images: the style is extracted from one of them, and the other is then redrawn in that style to create a new image that combines the two.

#### Face Recognition
The face recognition task is to locate a face in an image and then check whether it matches one of the faces in a register, returning an unmatched classification if it does not. The desired properties of such a classifier depend on the details of the application. For example, face recognition used to access a bank account needs very high precision, while low recall is acceptable, because giving the wrong person access is not permitted.
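
To make the precision versus recall trade-off concrete, here is a small sketch. The counts are made up for illustration and the helper name `precision_recall` is hypothetical; it only shows how a strict matcher can have perfect precision while still rejecting many genuine users.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) computed from raw counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Hypothetical counts for a strict bank-access matcher:
# it never admits the wrong person (no false positives),
# but it turns away 40 of 100 genuine users (false negatives).
precision, recall = precision_recall(true_pos=60, false_pos=0, false_neg=40)
print(precision, recall)  # 1.0 0.6
```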


### Large Inputs: Deep Learning on High Resolution Images

#### Scaling Problem with Large Inputs

When the number of inputs runs into the millions, training a fully connected neural network (FCNN) becomes very difficult, because the number of trainable parameters can run into the billions or hundreds of billions. Even with current hardware, training such a network takes too long and uses too much compute.

Image data at high resolution will easily have millions of pixels of input - hence a different approach is needed.
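
A quick back-of-the-envelope count shows the scale of the problem. The image size and the number of hidden units below are assumptions chosen only for illustration.

```python
# Assumed example: a 1000 x 1000 RGB image feeding a first hidden layer of 1000 units.
height, width, channels = 1000, 1000, 3
hidden_units = 1000

n_inputs = height * width * channels   # 3,000,000 input features
n_weights = n_inputs * hidden_units    # one weight per (input, unit) pair
n_biases = hidden_units                # one bias per hidden unit

print(f"{n_inputs:,} inputs")
print(f"{n_weights + n_biases:,} parameters in the first layer")
```

Under these assumptions the first layer alone has about three billion parameters, before any deeper layers are added.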

#### Convolution Operation Mitigates Scaling on Large Inputs

The convolution operator applies a filter to regions of an image. For each region it multiplies the filter weights elementwise with the pixel values and sums the products. The output is a multidimensional array storing one such multiply-sum per region.

The convolution operation effectively shares parameters (the filter weights) locally across inputs, thus needing fewer parameters and less compute to be learned.
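
The effect of parameter sharing can be sketched by counting parameters. The helper names and the layer sizes here are hypothetical; the point is only that the convolutional count does not depend on the image height or width.

```python
def conv_layer_params(filter_size, in_channels, n_filters):
    """Weights plus one bias per filter; independent of image height and width."""
    return (filter_size * filter_size * in_channels + 1) * n_filters

def dense_layer_params(n_inputs, n_units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return (n_inputs + 1) * n_units

# Assumed sizes for illustration only: a 1000 x 1000 RGB input.
n_inputs = 1000 * 1000 * 3
print(conv_layer_params(filter_size=3, in_channels=3, n_filters=10))  # 280
print(dense_layer_params(n_inputs, n_units=10))                        # 30000010
```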

Using the convolution operator, it becomes possible to learn a hierarchy of features and to be robust to shifts (translations) of the input while still producing good outputs.

## Convolution Operation

## Derivative of An Image

Finding interesting features of an image involves finding the areas where the image changes the most. One way to do this is to find the derivative of the image.

Since an image is a discretized representation, the smallest change is 1 unit. The forward, backward and central differences are just different approximations of the derivative:

* Forward: `f(x+1) - f(x)`

* Central: `f(x+1) - f(x-1)`

* Backward: `f(x) - f(x-1)`
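
As a small worked example, the three differences can be evaluated on a sampled function. The signal `f(x) = x**2` and the index `x = 2` are arbitrary choices for illustration.

```python
import numpy as np

f = np.array([0, 1, 4, 9, 16, 25])  # f(x) = x**2 sampled at x = 0..5
x = 2                                # an interior index, so x-1 and x+1 exist

forward = f[x + 1] - f[x]        # 9 - 4 = 5
backward = f[x] - f[x - 1]       # 4 - 1 = 3
central = f[x + 1] - f[x - 1]    # 9 - 1 = 8, spans two units

print(forward, backward, central)  # 5 3 8
```

Note that the central difference as written spans two units, so dividing it by 2 gives 4, which matches the exact slope `2x` at `x = 2`.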

To get a good estimate of the derivative at a pixel, we average around it. Since a pixel usually has 8 direct neighbors, using the 8 local pixels is common. In total, 9 pixels (the pixel itself and its 8 neighbors) are used to compute the estimate.

It can be shown that these and other derivatives are just masks applied to an image.
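
A minimal sketch of this idea in one dimension: each difference corresponds to a small mask that is slid along the signal. The masks and the sample signal are illustrative choices, and `np.correlate` is used here because it slides the mask without flipping it.

```python
import numpy as np

f = np.array([0, 1, 4, 9, 16, 25])   # f(x) = x**2 sampled at x = 0..5

forward_mask = np.array([-1, 1])      # f(x+1) - f(x)
central_mask = np.array([-1, 0, 1])   # f(x+1) - f(x-1)

forward_diff = np.correlate(f, forward_mask, mode="valid")
central_diff = np.correlate(f, central_mask, mode="valid")

print(forward_diff.tolist())  # [1, 3, 5, 7, 9]
print(central_diff.tolist())  # [4, 8, 12, 16]
```

The backward difference uses the same `[-1, 1]` mask as the forward difference; only the position the result is assigned to changes.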

Though the layers of a CNN are colloquially referred to as convolutions, this is only by convention.

Mathematically, the operation is a sliding dot product, or cross-correlation, of an image against a filter. This can be expressed using the NumPy `einsum` function, which is based on Einstein summation notation.
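
One possible way to write this with `einsum` is sketched below. The function name `cross_correlate2d` is hypothetical, and the sketch assumes no padding, a stride of 1, and NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation via einsum (no padding, stride 1)."""
    # windows has shape (out_rows, out_cols, kernel_rows, kernel_cols)
    windows = sliding_window_view(image, kernel.shape)
    # Multiply each window by the kernel elementwise and sum over the window axes.
    return np.einsum("ijkl,kl->ij", windows, kernel)

image = np.arange(16).reshape(4, 4)
kernel = np.array([[1, 0],
                   [0, -1]])
result = cross_correlate2d(image, kernel)
print(result.tolist())  # [[-5, -5, -5], [-5, -5, -5], [-5, -5, -5]]
```

Each output entry is `image[i, j] - image[i+1, j+1]`, which is always -5 for this image because moving one step down and one step right adds 5.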

The filter, also known as a kernel in the literature, is an array of equal or smaller size than the image that is positioned (overlaid) over successive sub regions of the image.

For each sub region (like a subsample), a single number is returned: the sum of the entries of the Hadamard (elementwise) product of the filter and the region. These numbers are stored sequentially in the output matrix, whose size is determined by the size of the image and the size of the kernel. For an `n x n` image and an `f x f` filter, the output size is `(n - f + 1) x (n - f + 1)`.
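
The procedure above can be sketched with explicit loops. The function name `cross_correlate2d_loops` is hypothetical, and the sketch assumes a square image and a square filter with no padding and a stride of 1.

```python
import numpy as np

def cross_correlate2d_loops(image, kernel):
    """Naive valid-mode cross-correlation for a square image and kernel."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            region = image[i:i + f, j:j + f]      # sub region under the filter
            out[i, j] = np.sum(region * kernel)   # sum of the Hadamard product
    return out

image = np.ones((6, 6))
kernel = np.ones((3, 3))
out = cross_correlate2d_loops(image, kernel)
print(out.shape)  # (4, 4)
```

With `n = 6` and `f = 3` the output is `4 x 4`, matching `(n - f + 1) x (n - f + 1)`.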

This convolution is technically a cross correlation of the filter and the image.
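
The difference between the two operations can be seen in one dimension: a true convolution flips the kernel before sliding it, while a cross-correlation slides it as is. The signal and kernel below are arbitrary illustrative values.

```python
import numpy as np

signal = np.array([0, 1, 4, 9, 16, 25])
kernel = np.array([-1, 0, 1])

corr = np.correlate(signal, kernel, mode="valid")   # slides the kernel as-is
conv = np.convolve(signal, kernel, mode="valid")    # flips the kernel first
conv_via_corr = np.correlate(signal, kernel[::-1], mode="valid")

print(corr.tolist())           # [4, 8, 12, 16]
print(conv.tolist())           # [-4, -8, -12, -16]
print(conv_via_corr.tolist())  # [-4, -8, -12, -16]
```

For a symmetric kernel the two operations give the same result, and for a learned filter the distinction does not matter in practice, since the network can simply learn the flipped weights.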

`sliding_dot_product(image_subsample, filter)` is the calculation that finds the output value for a single sub region.

A convolutional neural network consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of a series of convolutional layers, each of which computes a sliding dot product of its input with a filter.

The activation function is commonly ReLU, and it is followed by additional layers such as pooling layers, fully connected layers and normalization layers. These are referred to as hidden layers because their inputs and outputs are masked by the activation function and the final convolution. The filter weights are learned during training, typically by backpropagation.
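
As a rough illustration of how these pieces fit together, here is a sketch of a single forward pass through a convolution, a ReLU and a 2x2 max pooling step. All function names, the fixed filter and the input image are assumptions for illustration; a real network would learn its filters and stack many such layers.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def conv2d(image, kernel):
    """Valid-mode sliding dot product (no padding, stride 1)."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def relu(x):
    """Elementwise max(x, 0)."""
    return np.maximum(x, 0)

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling; drops a trailing odd row or column."""
    rows, cols = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:rows, :cols]
    return x.reshape(rows // 2, 2, cols // 2, 2).max(axis=(1, 3))

# Hypothetical 6x6 input: bright left half, dark right half.
image = np.zeros((6, 6), dtype=int)
image[:, :3] = 10
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

features = max_pool2x2(relu(conv2d(image, kernel)))
print(features.tolist())  # [[30, 30], [30, 30]]
```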

## Edge Detection Examples

### Vertical Edge Detection

Different filters detect different aspects of an image. We'll show a hard-coded filter (a 3x3 vertical edge filter) being applied to an image here. 

First, a simple example of the mechanics involved in calculating a sliding dot product. This is for a `filter_arr` that fits exactly into the `image_arr`.

In [2]:
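def sliding_dot_product(image_arr, filter_arr):
    # Minimal sketch: assumes both are NumPy arrays of identical shape,
    # so the filter covers the whole image and one number is returned.
    return (image_arr * filter_arr).sum()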

#### Vertical Edge Filter

Here is a vertical edge filter. It looks for an edge by approximating the horizontal derivative of the image over each pixel's 3x3 neighborhood (the pixel and its 8 local neighbors).

In [8]:
import numpy as np

v_edge_filter = np.array([1,0,-1,1,0,-1,1,0,-1]).reshape((3,3))
v_edge_filter

array([[ 1,  0, -1],
       [ 1,  0, -1],
       [ 1,  0, -1]])

This kernel is a simple array that computes a central difference along the x-axis (up to sign) and sums it over three adjacent rows, which acts as an unnormalized average. Large responses therefore mark rapid changes in the horizontal direction, which is what a vertical edge looks like.

#### Example 1: Image With No Vertical Edges

We're going to apply the vertical edge filter to an image that has no vertical edges.
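
A minimal sketch of such a case, using a made-up 6x6 image that contains only a horizontal edge. The helper `apply_filter` is hypothetical and assumes a square filter, no padding and a stride of 1.

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide kernel over image (no padding, stride 1) and sum the products."""
    f = kernel.shape[0]
    rows, cols = image.shape[0] - f + 1, image.shape[1] - f + 1
    out = np.zeros((rows, cols), dtype=image.dtype)
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

v_edge_filter = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

# Hypothetical image: bright top half, dark bottom half (a horizontal edge only).
image = np.zeros((6, 6), dtype=int)
image[:3, :] = 10

result = apply_filter(image, v_edge_filter)
print(result.tolist())
# [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Every row of this image is constant from left to right, so the left and right columns of the filter cancel and the response is zero everywhere.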

#### Example 2: Image With Clear Vertical Edges
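
A minimal sketch of this case, using a made-up 6x6 image whose left half is bright and whose right half is dark. The helper `apply_filter` is hypothetical and assumes a square filter, no padding and a stride of 1.

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide kernel over image (no padding, stride 1) and sum the products."""
    f = kernel.shape[0]
    rows, cols = image.shape[0] - f + 1, image.shape[1] - f + 1
    out = np.zeros((rows, cols), dtype=image.dtype)
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

v_edge_filter = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

# Hypothetical image: bright left half, dark right half (one vertical edge).
image = np.zeros((6, 6), dtype=int)
image[:, :3] = 10

result = apply_filter(image, v_edge_filter)
print(result.tolist())
# [[0, 30, 30, 0], [0, 30, 30, 0], [0, 30, 30, 0], [0, 30, 30, 0]]
```

The two middle columns of the output light up where the filter straddles the bright-to-dark transition. With this filter a dark-to-bright transition would give -30 instead.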



## More Edge Detection

### Vertical Filters

### Horizontal Filters

### General Edge Filters

#### Sobel Filter

#### Scharr Filter

### Learning Filters


## Padding

## Strided Convolutions

## Convolutions Over Volumes

## One Layer Of A Convolutional Network

## Simple Convolutional Network Example

## Pooling Layers

## CNN Example

## Why Convolutions?