# CH5 Deep Learning For Computer Vision

* Convolutional neural networks, also known as convnets, are a type of deep-learning model almost universally used in computer vision applications.
  
* Importantly, a convnet takes as input tensors of shape (image_height, image_width, image_channels), not including the batch dimension. In this chapter's first example, we'll configure the convnet to process inputs of size (28, 28, 1), i.e. 28 × 28 grayscale images with a single channel (the format of MNIST digits).
  
* The output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels).
  
* The width and height dimensions tend to shrink as you go deeper in the network. The number of channels is controlled by the first argument passed to the Conv2D layers (32 or 64).
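
The shape bookkeeping above can be traced by hand. The sketch below is plain Python, not Keras, and assumes a small five-layer stack of the kind this chapter describes: Conv2D(32, (3, 3)), MaxPooling2D((2, 2)), Conv2D(64, (3, 3)), MaxPooling2D((2, 2)), Conv2D(64, (3, 3)), all with "valid" padding. The helper names `conv2d_shape` and `maxpool2d_shape` are hypothetical.

```python
def conv2d_shape(shape, filters, window=3):
    """Output shape of a Conv2D layer with 'valid' padding and stride 1."""
    height, width, _ = shape
    return (height - window + 1, width - window + 1, filters)


def maxpool2d_shape(shape, pool=2):
    """Output shape of 2 x 2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)


# Assumed layer stack (an illustration, not taken from a specific model definition)
layers = [
    ("Conv2D(32, (3, 3))", lambda s: conv2d_shape(s, 32)),
    ("MaxPooling2D((2, 2))", maxpool2d_shape),
    ("Conv2D(64, (3, 3))", lambda s: conv2d_shape(s, 64)),
    ("MaxPooling2D((2, 2))", maxpool2d_shape),
    ("Conv2D(64, (3, 3))", lambda s: conv2d_shape(s, 64)),
]

shape = (28, 28, 1)  # one input image; the batch axis is not included
shapes = [shape]
for name, layer in layers:
    shape = layer(shape)
    shapes.append(shape)
    print(f"{name:<22} -> {shape}")
```

The printed shapes go (26, 26, 32), (13, 13, 32), (11, 11, 64), (5, 5, 64), (3, 3, 64): width and height shrink with depth, while the channel count follows the first argument of each Conv2D layer.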

## Necessary Background 
  
* A tensor is a container for numerical data. It is how we store the data our model works with.
  * Three attributes:
    1. Rank: the tensor's number of axes
    2. Shape: the number of dimensions (entries) along each axis
    3. Data type: the type of the data housed in the container
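
In NumPy, these three attributes are exposed as `ndim`, `shape`, and `dtype`. A minimal sketch with an arbitrary 2 × 3 matrix:

```python
import numpy as np

# A 2 x 3 matrix stored as 32-bit floats
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]], dtype=np.float32)

print(x.ndim)   # rank, i.e. the number of axes -> 2
print(x.shape)  # entries along each axis -> (2, 3)
print(x.dtype)  # type of the stored values -> float32
```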

* Vector vs. Matrix (both are tensors)
  * A vector has only one axis, so it has rank 1. A matrix has two axes, so it has rank 2.

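* Shape
  * A square matrix may have shape (2, 2).
  * A tensor of rank 3 may have shape (3, 5, 8). A vector is a rank-1 (1-axis) tensor, but it can still have 3 dimensions, meaning 3 entries along its single axis, e.g. shape (3,).
    * [3,5,8]: 1 axis, rank 1, shape (3,)
    * [[3,5,8]]: 2 axes, rank 2, shape (1, 3)
    * [[[3,5,8]]]: 3 axes, rank 3, shape (1, 1, 3)
    * In all three cases the last axis holds the same 3 entries; each extra pair of brackets adds an axis of length 1.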
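
NumPy confirms the bracket-nesting rule: each extra pair of brackets adds an axis while the three entries stay on the innermost axis. The variable names below are only for illustration.

```python
import numpy as np

vector = np.array([3, 5, 8])      # 1 axis
matrix = np.array([[3, 5, 8]])    # 2 axes
rank3 = np.array([[[3, 5, 8]]])   # 3 axes

for name, t in [("vector", vector), ("matrix", matrix), ("rank3", rank3)]:
    # ndim counts axes (the rank); shape lists the entries along each axis
    print(name, t.ndim, t.shape)

# A rank-3 tensor with shape (3, 5, 8), filled with zeros
big = np.zeros((3, 5, 8))
print(big.ndim, big.shape)
```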

* Common data types
  * float32, float64
  * uint8
  * int32
  * int64
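
A typical use of these types in computer vision: pixels are often stored as uint8 values in [0, 255] and converted to float32 in [0, 1] before being fed to a network. A sketch with a made-up 2 × 2 image; it also shows one reason for the conversion, namely that uint8 arithmetic silently wraps around.

```python
import numpy as np

pixels = np.array([[0, 128], [255, 64]], dtype=np.uint8)  # hypothetical 2 x 2 grayscale image
scaled = pixels.astype(np.float32) / 255                  # rescale to [0, 1] as 32-bit floats

print(pixels.dtype, scaled.dtype)  # uint8 float32
print(scaled.max())                # 1.0

# uint8 arrays wrap on overflow: 255 + 1 becomes 0
wrapped = pixels + np.uint8(1)
print(wrapped[1, 0])               # 0
```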

* A zero-dimensional tensor, or scalar
  * Contains only a single number and has zero axes (rank 0)
  * Because it has no axes, it is written without brackets: 12 is a zero-dimensional tensor

* <code>import numpy as np<br>
  tensor_scalar = np.array(42)<br>
  tensor_scalar.shape # () <br>
  tensor_scalar.ndim # rank = ndim = 0</code>

* <code>tensor_vector = np.array([42,24,16,8])<br>
  tensor_vector.shape # (4,) <br>
  tensor_vector.ndim # rank = ndim = 1</code>

* <code>tensor_matrix = np.array([[42,24,24],[42,24,24]])<br>
  tensor_matrix.shape # (2, 3) <br>
  tensor_matrix.ndim # rank = ndim = 2</code>

* Common per-sample tensor shapes (batch axis not included):
  * Vectors: 1D, (features)
  * Sequences: 2D, (timesteps, features)
  * Images: 3D, (height, width, channels)
  * Videos: 4D, (frames, height, width, channels)

* Deep-learning models don't process an entire dataset at once; they process it in small subsets called batches.

* When working with batches, the tensor's first axis (axis 0) is reserved for the batch size (the number of samples).

* For example, if you're handling 2D tensors (matrices), a batch of them has 3 axes in total: (samples, rows, columns).
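
Slicing along axis 0 is how batches are usually drawn. A sketch with a hypothetical dataset of 1,000 random 28 × 28 grayscale images and an assumed batch size of 128; the last lines add the explicit channels axis a convnet expects.

```python
import numpy as np

# Hypothetical dataset: 1,000 grayscale 28 x 28 images with random pixel values
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)

batch_size = 128
# Axis 0 is the samples axis, so slicing along it produces batches
batches = [images[i:i + batch_size] for i in range(0, len(images), batch_size)]

print(len(batches))       # 8 (seven full batches plus one of 104 samples)
print(batches[0].shape)   # (128, 28, 28): (samples, rows, columns)
print(batches[-1].shape)  # (104, 28, 28)

# A convnet expects an explicit channels axis: (samples, height, width, channels)
images_with_channels = images.reshape((1000, 28, 28, 1))
print(images_with_channels.shape)  # (1000, 28, 28, 1)
```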

## Back to Introduction to Convnets

### The Convolution Operation
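* Dense layers learn global patterns in their input feature space (for an MNIST digit, patterns involving all pixels).

* Convolution layers learn local patterns: in images, patterns found in small 2D windows of the input.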

* This gives convnets two interesting properties:
  * The patterns they learn are translation invariant.
    * After learning a certain pattern in the lower-right corner of a picture, a convnet can recognize it anywhere, for example in the upper-left corner.
  * They can learn spatial hierarchies of patterns.
    * A first convolution layer will learn small local patterns such as edges, a second convolution layer will learn larger patterns made of the features of the first layer, and so on.

* Convolutions operate over 3D tensors, called feature maps, with two spatial axes (height and width) as well as a depth axis (also called the channels axis). For an RGB image, the depth axis has size 3; for a grayscale image, it has size 1.

* Convolutions are defined by two key parameters:
  * Size of the patches extracted from the inputs: typically 3 × 3 or 5 × 5.
  * Depth of the output feature map: the number of filters computed by the convolution. The example network started with a depth of 32 and ended with a depth of 64.

* In Keras Conv2D layers, these parameters are the first arguments passed to the layer: Conv2D(output_depth, (window_height, window_width)).

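* A convolution works by sliding these windows of size 3 × 3 or 5 × 5 over the 3D input feature map, stopping at every possible location, and extracting the 3D patch of surrounding features (shape (window_height, window_width, input_depth)). Each patch is then transformed, via a tensor product with a learned weight tensor called the convolution kernel, into a 1D vector of shape (output_depth,), and these vectors are reassembled spatially into a 3D output feature map of shape (height, width, output_depth).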

* ![Convolution_Process](./5.4.PNG)
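
The sliding-window process can be written out naively in NumPy. This is a teaching sketch, not how Keras implements Conv2D: it assumes "valid" padding, stride 1, and a kernel laid out as (window_height, window_width, input_depth, output_depth). Like deep-learning libraries, it does not flip the kernel (strictly speaking, it computes a cross-correlation). The function name `conv2d_valid` is hypothetical.

```python
import numpy as np


def conv2d_valid(inputs, kernel, bias):
    """Naive 'valid' convolution with stride 1.

    inputs: (height, width, input_depth)
    kernel: (window_height, window_width, input_depth, output_depth)
    bias:   (output_depth,)
    """
    h, w, _ = inputs.shape
    kh, kw, _, out_depth = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1, out_depth), dtype=inputs.dtype)
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = inputs[i:i + kh, j:j + kw, :]  # (kh, kw, input_depth)
            # Tensor product of the patch with the kernel gives a vector of shape (output_depth,)
            out[i, j] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out


rng = np.random.default_rng(0)
image = rng.random((28, 28, 1), dtype=np.float32)                 # one grayscale image
kernel = rng.standard_normal((3, 3, 1, 32), dtype=np.float32)     # 32 filters of size 3 x 3
bias = np.zeros(32, dtype=np.float32)

fmap = conv2d_valid(image, kernel, bias)
print(fmap.shape)  # (26, 26, 32)
```

The 28 × 28 input shrinks to 26 × 26 because a 3 × 3 window fits at only 26 positions along each axis, and the output depth equals the number of filters in the kernel.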

* If you want an output feature map with the same spatial dimensions as the input, you can use padding. Without it, the output shrinks because a window can only be centered where it fits entirely inside the input: a 3 × 3 window fits at only 3 × 3 positions of a 5 × 5 input, and a 28 × 28 input becomes 26 × 26. Padding consists of adding an appropriate number of rows and columns on each side of the input feature map so as to make it possible to fit center convolution windows around every input tile. For a 3 × 3 window, you add one column on the right, one column on the left, one row at the top, and one row at the bottom.

* In Conv2D layers, padding is configurable via the padding argument, which takes two values: "valid", which means no padding (only valid window locations will be used); and "same", which means “pad in such a way as to have an output with the same width and height as the input.” The padding argument defaults to "valid".
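
Zero padding for a 3 × 3 window can be sketched with `np.pad`, which fills with zeros by default (`mode="constant"`). The 5 × 5 single-channel input is made up, and `valid_positions` is a hypothetical helper.

```python
import numpy as np

feature_map = np.arange(1, 26, dtype=np.float32).reshape(5, 5, 1)  # hypothetical 5 x 5 x 1 input

# One row at the top and bottom, one column on the left and right; the channels axis is not padded
padded = np.pad(feature_map, pad_width=((1, 1), (1, 1), (0, 0)))


def valid_positions(size, window=3):
    """How many positions a window can occupy along one axis without padding."""
    return size - window + 1


print(padded.shape)                             # (7, 7, 1)
print(valid_positions(5), valid_positions(7))   # 3 5
```

After padding, a 3 × 3 window fits at 5 positions along each axis, so convolving the padded 7 × 7 map produces a 5 × 5 output, the same size as the original input. That is the effect of padding="same" at stride 1.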

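* The other factor that can influence output size is the notion of strides. So far, the description of convolution has assumed that the center tiles of the convolution windows are all contiguous, but the distance between two successive windows is a parameter of the convolution, called its stride, which defaults to 1. With a stride of 2, the width and height of the output feature map are roughly halved: a 5 × 5 input with a 3 × 3 window yields a 2 × 2 output.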
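
Padding, window size, and stride together determine the output size along each spatial axis. A sketch assuming the usual TensorFlow-style conventions for "valid" and "same" padding; `conv_output_length` is a hypothetical helper, and the same formula also covers pooling windows.

```python
import math


def conv_output_length(input_length, window, padding="valid", stride=1):
    """Output length along one spatial axis of a convolution or pooling window."""
    if padding == "valid":
        # Only windows that fit entirely inside the input, stepping by `stride`
        return (input_length - window) // stride + 1
    if padding == "same":
        # Padding is chosen so that the output size depends only on the stride
        return math.ceil(input_length / stride)
    raise ValueError(f"unknown padding: {padding!r}")


print(conv_output_length(28, 3))                  # 26
print(conv_output_length(28, 3, padding="same"))  # 28
print(conv_output_length(5, 3, stride=2))         # 2
print(conv_output_length(26, 2, stride=2))        # 13 (2 x 2 max pooling)
```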

### The Max-Pooling Operation

* The role of max pooling is to aggressively downsample feature maps, much like strided convolutions.

* For instance, before the first MaxPooling2D layer, the feature map is 26 × 26, but the max-pooling operation halves it to 13 × 13.

* Max pooling consists of extracting windows from the input feature maps and outputting the max value of each channel. It’s conceptually similar to convolution, except that instead of transforming local patches via a learned linear transformation (the convolution kernel), they’re transformed via a hardcoded max tensor operation. Max pooling is usually done with 2 × 2 windows and stride 2, which downsamples the feature maps by a factor of 2.
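
2 × 2 max pooling with stride 2 can be sketched in NumPy by viewing each spatial axis as blocks of two and taking the maximum inside each block, channel by channel. The function name `max_pool_2x2` is hypothetical, and a trailing odd row or column is dropped, as with "valid" pooling.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 over a (height, width, channels) tensor."""
    height, width, channels = feature_map.shape
    h2, w2 = height // 2, width // 2
    # Drop a trailing odd row/column, as 'valid' pooling would
    trimmed = feature_map[:h2 * 2, :w2 * 2, :]
    # View each spatial axis as (blocks, 2), then take the max inside every 2 x 2 block per channel
    blocks = trimmed.reshape(h2, 2, w2, 2, channels)
    return blocks.max(axis=(1, 3))


x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 6],
              [2, 8, 3, 4]], dtype=np.float32).reshape(4, 4, 1)
pooled = max_pool_2x2(x)
print(pooled[..., 0].tolist())  # [[4.0, 5.0], [8.0, 7.0]]

fmap = np.random.default_rng(0).random((26, 26, 32))
print(max_pool_2x2(fmap).shape)  # (13, 13, 32)
```

Unlike the convolution sketch, there are no weights here: the max is a fixed operation, which is exactly the difference the notes describe.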

* Because convnets learn local, translation-invariant features, they’re highly data efficient on perceptual problems. Training a convnet from scratch on a very small image dataset will still yield reasonable results despite a relative lack of data, without the need for any custom feature engineering.


## Tensor Examples in NumPy

* <code>pip install numpy</code> (only needed if NumPy is not already installed)



* <code>import numpy as np<br>
  tensor_matrix = np.array([[42,24,24], [42,24,24]])<br>
  tensor_matrix.shape # (2, 3)<br>
  tensor_matrix.ndim # 2</code>

* <code>four_dimensional_tensor = np.array([[[[1,1,1,1], [1,1,1,1], [1,1,1,1], [1,1,1,1]]]])<br>
  four_dimensional_tensor.shape # (1, 1, 4, 4)<br>
  four_dimensional_tensor.ndim # 4</code>