# CNN basics

    - CNNs are inspired by the biology of vision cells in the visual cortex of mammals
    - In a fully connected DNN each unit in one layer is connected to EVERY unit in the next layer
    - In a CNN each unit in one layer is connected to only a FEW units in the next layer
        - Each successive CNN layer looks at an increasingly larger part of the image
        - For each layer we define a set of Filters
        - Layer 1
            - 1 Filter of shape 3*3
            - This filter would have 9 weights
            - This filter would be applied to 9 pixels at a time
            - Each pixel value would be multiplied by its matching weight
            - The products would be summed to get a scalar
            - So the output of one filter operation is a scalar
            - Likewise we can have multiple filters in one layer
        - Layer 2
            - 1 Filter of shape 4*4
            - This filter would have 16 weights
            - This filter would be applied to 16 values at a time (the outputs of Layer 1, not raw pixels)
            - Each value would be multiplied by its matching weight
            - The products would be summed to get a scalar
            - So the output of one filter operation is a scalar
            - Likewise we can have multiple filters in one layer
        - We can stack multiple CNN layers, each with multiple filters, and the filter dimensions can differ from layer to layer. However, within a given layer every filter has the same dimensions
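
The single-filter operation described above (multiply each pixel by its weight, then sum) can be sketched in plain Python. This is only an illustration: the pixel and weight values are made up, the name `apply_filter` is hypothetical, and the bias term is left out.

```python
# One 3*3 patch of pixels and one 3*3 filter (values are made up).
patch = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
weights = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]


def apply_filter(patch, weights):
    """Multiply each pixel by its matching weight and sum to one scalar."""
    return sum(
        pixel * weight
        for patch_row, weight_row in zip(patch, weights)
        for pixel, weight in zip(patch_row, weight_row)
    )


result = apply_filter(patch, weights)
print(result)  # -6
```

A layer with several filters would simply repeat this with a different set of weights per filter, giving one scalar per filter for the same patch.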
        
        - Concept of Strides
            - A stride has a value N, say 2; it is the number of pixels the filter moves between reads
            - Say a convolution layer has a filter of dimension 3*3
            - Say an input image has shape 10*10 pixels; think of it as a matrix for understanding
            - Filter would read first 3 pixels on x-axis (& 3 on y); with stride as 1, it would then read the next 3 pixels starting from the 2nd pixel
            - Filter would read first 3 pixels on x-axis (& 3 on y); with stride as 2, it would then read the next 3 pixels starting from the 3rd pixel
            - Filter would read first 3 pixels on x-axis (& 3 on y); with stride as 3, it would then read the next 3 pixels starting from the 4th pixel
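
To make the stride arithmetic concrete, here is a small sketch that lists where each 3-pixel window begins along one axis of a 10-pixel image. It assumes no padding (only windows that fit fully inside the image are used), and `window_starts` is a hypothetical helper name.

```python
def window_starts(input_size, filter_size, stride):
    """Return the 1-based pixel index where each window begins along one axis."""
    last_start = input_size - filter_size  # 0-based start of the last full window
    return [start + 1 for start in range(0, last_start + 1, stride)]


for stride in (1, 2, 3):
    starts = window_starts(10, 3, stride)
    print(f"stride={stride}: windows start at pixels {starts} ({len(starts)} outputs)")
```

With stride 1 the second window starts at pixel 2, with stride 2 at pixel 3, and with stride 3 at pixel 4. A larger stride also means fewer windows, so the output gets smaller.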
            
        - Concept of Pooling layer
            - Say an input image has shape 10*10 pixels, think of it as a matrix for understanding
            - A pooling layer also slides a window over the input, say of size 2*2, but the window has no weights to learn
            - Say stride is 2
            - Now, the window would read first 2 pixels on x-axis (& 2 on y), find the max value of those pixels and extract that (max pooling)
            - The window then uses the stride to read the next set of pixels and finds the next max value
            - This way a Pooling layer reduces the amount of data the following layers need to process
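
A minimal sketch of max pooling with a 2*2 window and stride 2, in plain Python. A 4*4 input with made-up values is used instead of 10*10 to keep it readable, and `max_pool` is a hypothetical helper name.

```python
def max_pool(image, size=2, stride=2):
    """Slide a size*size window over a 2D list and keep the max of each window."""
    height, width = len(image), len(image[0])
    pooled = []
    for top in range(0, height - size + 1, stride):
        row = []
        for left in range(0, width - size + 1, stride):
            window = [
                image[top + dy][left + dx]
                for dy in range(size)
                for dx in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


image = [
    [1, 3, 2, 0],
    [4, 6, 1, 5],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
pooled = max_pool(image)
print(pooled)  # [[6, 5], [7, 9]]
```

The 4*4 input becomes 2*2, so only a quarter of the values are passed on. The same function applied to a 10*10 input would return a 5*5 result.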
            
        - Famous CNN networks like ResNet and GoogLeNet are basically CNN architectures with the following components
            - 1 or more convolution layers
                - Each convolution layer with 1 or more Filters and a defined Stride
            - 1 or more pooling layers
                - Each pooling layer with a defined window size and Stride
            - So a RahulNet could be like this:
                - 1 CN layer, 2 Filters (3*3), Stride (2)
                - 1 Pooling layer, window (3*3), Stride (2), applied to each of the 2 channels
                - 1 CN layer, 5 Filters (2*2), Stride (3)
                - 1 Pooling layer, window (2*2), Stride (3), applied to each of the 5 channels
                - Final Dense layer
                - Final Dense layer
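
As a rough sketch, the shapes flowing through the RahulNet stack above can be traced with simple arithmetic. The notes do not give an input size, so a 28*28*1 image is assumed here; no padding is assumed either, pooling is taken to leave the channel count unchanged, and `out_size` is a hypothetical helper name.

```python
def out_size(size, window, stride):
    """Output length along one axis for a sliding window with no padding."""
    return (size - window) // stride + 1


# (kind, number of filters or None for pooling, window size, stride)
layers = [
    ("conv", 2, 3, 2),
    ("pool", None, 3, 2),
    ("conv", 5, 2, 3),
    ("pool", None, 2, 3),
]

height, width, channels = 28, 28, 1  # assumed input shape, not given in the notes
shapes = [(height, width, channels)]
for kind, filters, window, stride in layers:
    height = out_size(height, window, stride)
    width = out_size(width, window, stride)
    if kind == "conv":
        channels = filters  # one output channel per filter
    shapes.append((height, width, channels))
    print(f"{kind}: {(height, width, channels)}")

dense_inputs = height * width * channels  # values flattened into the final Dense layer
print(dense_inputs)  # 5
```

Under these assumptions the shapes go (28,28,1), (13,13,2), (6,6,2), (2,2,5), (1,1,5). Strides this large shrink the image very quickly, which is worth keeping in mind when picking stride values.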

# Understanding Filters, Kernels & Input Shape

    - Consider the example below
    - Here we are doing the following:
        - Creating one Conv2D layer
        - Number of filters is 200; which means we have 200 filters, each of which would learn to identify a particular pattern
        - Each filter has a kernel size of 3; which means its weights have shape (3*3*1), where the last 1 is the number of input channels
        - Input shape is (28,28,1); which means that each input image is a Rank 3 tensor
        - Number of parameters this layer has to estimate is:
            - 200 * (3*3*1+1) which is 2000 (9 weights plus 1 bias per filter)

    - model.add(Conv2D(filters=200, kernel_size=3, strides=(1,1), activation='relu', input_shape=(28,28,1)))
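
The parameter count and output shape for this layer can be checked with a few lines of arithmetic. This is a sketch in plain Python rather than a Keras call; the helper names are hypothetical, and the output shape assumes Keras's default `padding='valid'`.

```python
def conv2d_param_count(filters, kernel_size, input_channels):
    """Each filter has kernel_size * kernel_size * input_channels weights plus one bias."""
    weights_per_filter = kernel_size * kernel_size * input_channels
    return filters * (weights_per_filter + 1)


def conv2d_output_shape(height, width, filters, kernel_size, stride):
    """Output (height, width, channels) of a convolution with no padding."""
    out_h = (height - kernel_size) // stride + 1
    out_w = (width - kernel_size) // stride + 1
    return (out_h, out_w, filters)


params = conv2d_param_count(filters=200, kernel_size=3, input_channels=1)
shape = conv2d_output_shape(28, 28, filters=200, kernel_size=3, stride=1)
print(params)  # 2000
print(shape)   # (26, 26, 200)
```

The same numbers should show up in the "Param #" and "Output Shape" columns of `model.summary()` for this layer, with an extra leading batch dimension in the shape.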

# Tips: 
        - When using Conv & MaxPooling, the depth (number of channels) of the feature map typically increases through the layers while its height and width shrink. This is a good pattern to look for when you print the model summary.
        - Each Conv2D layer accepts a Rank 3 tensor per sample (the batch dimension is extra)
        - Layer 1 accepts the image, which is represented as a Rank 3 tensor (Height, Width, Channel)
        - Each layer takes a tensor as input and outputs a tensor
        - The number of filters in a layer determines the number of channels in its output tensor. For e.g. if a layer has 10 filters then its output tensor would have 10 channels, like (?,?,10)
        - Filters in convolution layers have a specific advantage over MLP layers:
            - Filters learn localized representations of images; for e.g. an edge, a line, an ear, a nose etc. Since they learn local patterns, they can apply those learnings when they see those patterns again, even at a different position in the image.
            - In an MLP layer, all inputs are processed at once, so a pattern learned at one position is not reused when it appears elsewhere in new data
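
The reuse of local patterns can be illustrated with a tiny sketch: one filter with fixed weights is slid across an image that contains the same pattern at two positions, and it responds identically at both. The image, the filter values, and the name `slide_filter` are all made up for illustration; a real network would learn its weights.

```python
# A 3*3 filter that responds to a dark-to-bright step from left to right.
edge_filter = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# A 3*8 image with the same bright column at x=2 and x=6 (0-based).
row = [0, 0, 1, 0, 0, 0, 1, 0]
image = [row, row, row]


def slide_filter(image, weights):
    """Slide the filter left to right (stride 1) and return one response per position."""
    size = len(weights)
    width = len(image[0])
    responses = []
    for left in range(width - size + 1):
        total = sum(
            image[dy][left + dx] * weights[dy][dx]
            for dy in range(size)
            for dx in range(size)
        )
        responses.append(total)
    return responses


responses = slide_filter(image, edge_filter)
print(responses)  # [3, 0, -3, 0, 3, 0]
```

The response of 3 at positions 0 and 4 comes from the same 9 weights. A dense layer would need separate weights for each position to detect the same pattern twice.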