# Deep Computer Vision using Convolutional Neural Networks

## Class Notes (W7 Pred analytics)

#### Advantages of Deep Neural Networks over Shallow Networks:

Parameter Efficiency:

    Deep networks represent higher-order (more complex) functions more efficiently

    They can achieve the same performance as shallow networks with fewer total parameters

Hierarchical Feature Learning:

    Deep networks learn features at multiple levels of abstraction

    Lower layers learn simple features like edges/textures

    Higher layers combine these into complex concepts (objects, images)

Smoother Loss Surface:

    For reasons we don't fully understand, deep neural networks can find better generalizations because they find more stable minima than shallow networks. This means you can find solutions that are more representative of the variation you see in the test data.

Decrease in error after parameters exceed data points:

    If the model has more parameters than there are data points, we get a smoothing (implicit regularization) effect that results in better generalization.


#### Activation Functions:

Crucial in neural networks for two key reasons:

1. Introducing Non-linearity
    
    Without activation functions, neural networks would just be linear transformations

    Non-linearity allows networks to learn complex patterns and relationships

2. Feature Transformation

    Transforms inputs into more useful representations

    Maps inputs to different ranges (e.g., probabilities between 0 and 1)
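Both points can be checked on a toy example. The sketch below is only an illustration: the weight values and the names `W1`, `W2`, `relu`, and `sigmoid` are chosen here for the demonstration and do not come from the class material.

```python
import numpy as np

# Two "layers" with hand-picked weights (illustrative values only)
W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
W2 = np.array([[1.0, 1.0]])
x = np.array([1.0, 0.0])

# Without an activation, two layers collapse into one linear map
stacked = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x

def relu(z):
    """Rectified linear unit: keeps positive values, zeroes the rest."""
    return np.maximum(z, 0.0)

def sigmoid(z):
    """Squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# With ReLU between the layers, the result differs from the linear map
with_relu = W2 @ relu(W1 @ x)

# Sigmoid maps arbitrary scores to values usable as probabilities
probs = sigmoid(np.array([-5.0, 0.0, 5.0]))

print(stacked, collapsed, with_relu, probs)
```

Here `stacked` and `collapsed` are identical, while `with_relu` is not, which is the non-linearity the activation adds.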

### Gradient Clipping

Technique used to prevent exploding gradients by limiting the maximum value gradients can take.

At each point on the loss surface, the gradient can differ considerably from one direction to another. Momentum deals with some of this, but very large gradients need to be handled with clipping.

How it works:

    1. Calculate gradients normally during backpropagation

    2. Before applying gradients, clip them:

        Clip by value: Cap each individual gradient element to a threshold

        Clip by norm: Scale the entire gradient vector if its norm exceeds a threshold

By combining momentum (for handling varying gradient directions and plateaus) with clipping (for handling extreme gradient magnitudes), we get more stable and effective training across the complex loss landscapes of deep neural networks.
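The two clipping variants can be sketched in a few lines of NumPy. The function names `clip_by_value` and `clip_by_norm` and the threshold of 1.0 are assumptions made for this illustration; deep learning frameworks provide their own equivalents.

```python
import numpy as np

def clip_by_value(grad, threshold):
    """Cap every gradient element to the range [-threshold, threshold]."""
    return np.clip(grad, -threshold, threshold)

def clip_by_norm(grad, max_norm):
    """Rescale the whole gradient vector if its L2 norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

grad = np.array([3.0, -4.0])          # L2 norm is 5
by_value = clip_by_value(grad, 1.0)   # each element capped separately
by_norm = clip_by_norm(grad, 1.0)     # direction kept, length shrunk to 1

print(by_value, by_norm)
```

Clipping by value can change the direction of the update, because each element is capped independently. Clipping by norm keeps the direction and only shortens the step.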

### Backpropagation

Think of it as calculating, on the backward pass, the derivative of the loss with respect to every weight in every layer. It is what lets us apply gradient descent to a neural network: we need the derivatives for all weights to determine the direction to move.

Once we calculate gradients starting from the output layer and working backwards, we can move the weights in the direction that reduces error. In this case the learning rate controls how big a step to take.

The end goal is always to minimise the loss function for our model.
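As a rough sketch of the whole loop, here is backpropagation written by hand for a tiny network with one hidden layer. Everything specific is an assumption for illustration: the random data, the layer sizes, the tanh activation, the mean squared error loss, and the learning rate of 0.05.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # 8 samples, 3 features (toy data)
y = rng.normal(size=(8, 1))          # regression targets (toy data)

W1 = rng.normal(size=(3, 4)) * 0.5   # input -> hidden weights
W2 = rng.normal(size=(4, 1)) * 0.5   # hidden -> output weights

def forward(X, W1, W2):
    h = np.tanh(X @ W1)              # hidden activations
    return h, h @ W2                 # also return the prediction

def loss_fn(y_hat, y):
    return np.mean((y_hat - y) ** 2)

def backward(X, y, W1, W2):
    """Chain rule from the output layer back to the first layer."""
    h, y_hat = forward(X, W1, W2)
    d_yhat = 2.0 * (y_hat - y) / y.size   # dLoss/dPrediction
    dW2 = h.T @ d_yhat                    # gradient for output weights
    d_h = d_yhat @ W2.T                   # error sent back to hidden layer
    d_pre = d_h * (1.0 - h ** 2)          # through the tanh derivative
    dW1 = X.T @ d_pre                     # gradient for first-layer weights
    return dW1, dW2

learning_rate = 0.05                  # step size (assumed value)
losses = []
for _ in range(100):
    _, y_hat = forward(X, W1, W2)
    losses.append(loss_fn(y_hat, y))
    dW1, dW2 = backward(X, y, W1, W2)
    W1 -= learning_rate * dW1         # move against the gradient
    W2 -= learning_rate * dW2

print(losses[0], losses[-1])
```

In practice a framework computes these derivatives automatically, but the order of operations is the same: output layer first, then backwards layer by layer.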

### Tricks to make Neural Networks work

Dropout: Randomly switches off units during training, so each training step sees a different version of the loss surface.

He Initialisation (another hyperparameter): When initialising weights, we can fix the variance of the random distribution the weights are drawn from. This addresses the problem of variance being amplified from layer to layer on the backward pass.
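A minimal sketch of both tricks follows. It assumes the usual He formulation (zero-mean Gaussian with variance 2 / fan_in) and the "inverted" form of dropout, where surviving activations are rescaled during training. The function names, layer sizes, and dropout rate are illustrative choices, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_normal(fan_in, fan_out, rng):
    """Draw weights from N(0, 2 / fan_in), the He initialisation variance."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction of units, rescale the rest."""
    if not training or rate == 0.0:
        return activations            # no-op at inference time
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

W = he_normal(512, 256, rng)
print(W.var(), 2.0 / 512)             # empirical vs target variance

a = np.ones((4, 1000))
dropped = dropout(a, rate=0.5, rng=rng)
print(np.unique(dropped))             # units are either dropped or scaled up
```

Because a fresh mask is drawn on every call, each training step effectively trains a different sub-network.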

### Convolutional Neural Networks

Specialized neural networks designed primarily for processing grid-like data such as images. Essentially a repeated stack of convolutional (filter) layers and pooling layers, with an MLP (multilayer perceptron) on top before the output. The advantage is that image data can have a very large input space (especially at high resolution); the filter and pooling layers remove redundancies in the input space and reduce the spatial dimensions of our inputs.

Because convolutional layers are sparsely connected, each neuron only sees a small local region of the layer before it. This forces localisation, so the early layers learn simpler, local features.

The further you get from the input signal, the harder the model becomes to interpret. Convolutional/pooling layers are easier to interpret than the fully connected MLP that sits on top.

1. Convolutional Layers

        Essentially feature extraction

        Apply filters that scan across the input and detect specific patterns (edges etc.)

        Each convolutional layer typically has many filters (e.g. 32 or 64)

        Each filter creates one feature map in the output

        More filters = more patterns detected

        Share weights across the entire image (parameter efficiency)

        Preserve spatial relationships in the data

2. Pooling Layers

        Image classification is often massively overparameterised because of the size of the input space; pooling layers reduce this redundancy.

        In a sense, what we are interested in is very small compared to the whole input space (pixels of an image)

        Reduce spatial dimensions (downsampling)

        Take the output of the convolutional layer (the feature bank) and downsize it

        Make the network less sensitive to exact positions

        Help with Computational efficiency

        

3. Fully Connected layers

        Usually placed after convolutional/pooling layers

        Process high level features extracted by convolutions

        Make final predictions based on these features

4. Hierarchical Learning

        Each layer builds upon previous layers like so:

        Pixels → Edges → Textures → Patterns → Parts → Objects
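To make the convolution and pooling steps concrete, here is a small NumPy sketch with one hand-written vertical-edge filter. The helper names `conv2d_valid` and `max_pool2d`, the 6x6 image, and the filter values are assumptions for illustration. The loops are written for readability rather than speed, and, as in most deep learning libraries, the "convolution" is implemented as a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a 2D kernel over a 2D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size        # drop any ragged border
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Toy image: dark left half, bright right half (a vertical edge)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)   # 3x3 edge filter

feature_map = conv2d_valid(image, vertical_edge)   # shape (4, 4)
pooled = max_pool2d(feature_map)                   # shape (2, 2)

print(feature_map)
print(pooled)
```

The filter has only 9 weights no matter how large the image is, whereas a fully connected layer from the 36 input pixels to the 16 outputs would need 36 * 16 = 576 weights. The feature map responds only where the window straddles the edge, and pooling halves each spatial dimension while keeping that response.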


### Skip Connections in Neural Networks

#### Basic Structure
- Skip connections are local shortcuts in the network
- They jump over a few layers at a time
- Multiple skip connections exist at different points
- They don't skip to the final output

#### Information Flow
1. Input enters a network block
2. Information splits into two paths:
   - Main path: Through several layers
   - Skip path: Bypasses these layers
3. At the connection point:
   - Main path output and skip path meet
   - Values are added together
   - Combined result feeds into next block
4. Process repeats through network

#### Building Analogy
Think of a building where:
- Regular layers = Taking stairs between floors
- Skip connections = Express elevators
- You don't have one elevator from lobby to penthouse
- Instead, you have multiple express elevators:
  - Ground to Floor 3
  - Floor 3 to Floor 6
  - Floor 6 to Floor 9
  - And so on...

This creates a network with multiple "local bypasses" rather than one big skip to the end. It smooths and widens the loss surface.
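The information flow described above can be sketched as a small residual block. This assumes a two-layer main path with ReLU activations and equal input and output widths, so the two paths can be added directly. The name `residual_block` and the sizes are illustrative only.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Main path through two layers, plus a skip path that bypasses them."""
    main = relu(x @ W1) @ W2      # main path: through several layers
    return relu(main + x)         # paths meet: values are added together

rng = np.random.default_rng(0)
x = np.abs(rng.normal(size=(2, 8)))   # non-negative toy input, width 8

# Several blocks in a row: multiple local skips, not one big skip to the end
blocks = [(rng.normal(size=(8, 8)) * 0.1, rng.normal(size=(8, 8)) * 0.1)
          for _ in range(3)]
out = x
for W1, W2 in blocks:
    out = residual_block(out, W1, W2)

# With all-zero weights the main path contributes nothing,
# so the block simply passes its input through
zero = np.zeros((8, 8))
identity_out = residual_block(x, zero, zero)

print(out.shape, np.allclose(identity_out, x))
```

The zero-weight case shows that a block can fall back to passing its input straight through.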

### Time Series Neural Networks & Signal Sampling

#### Nyquist Rate Fundamentals
- Nyquist rate = minimum sampling rate required to accurately capture a signal
- It equals 2x the highest frequency component in the signal
- Nyquist frequency = half the sampling frequency, i.e. the highest frequency that can be accurately represented
- Sampling frequency must be > 2 * highest signal frequency
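A quick way to see why the rule matters is to sample the same sine wave above and below the required rate and look at where the spectrum peaks. The helpers `sample_sine` and `dominant_frequency`, the 100 Hz test tone, and the two sampling rates are assumptions chosen for this sketch.

```python
import numpy as np

def sample_sine(freq_hz, fs, duration_s=1.0):
    """Sample a pure sine wave of freq_hz at fs samples per second."""
    t = np.arange(int(fs * duration_s)) / fs
    return np.sin(2 * np.pi * freq_hz * t)

def dominant_frequency(signal, fs):
    """Return the frequency (Hz) of the largest peak in the spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[0] = 0.0                  # ignore the constant (DC) term
    return freqs[np.argmax(spectrum)]

# 100 Hz tone sampled well above 2 * 100 Hz: frequency is recovered
good = dominant_frequency(sample_sine(100.0, fs=400.0), fs=400.0)

# Same tone sampled at 150 Hz (below 200 Hz): it shows up at the wrong place
aliased = dominant_frequency(sample_sine(100.0, fs=150.0), fs=150.0)

print(good, aliased)
```

At 150 Hz the Nyquist frequency is 75 Hz, so the 100 Hz tone folds back and appears at 50 Hz. Nothing in the sampled data indicates that this is wrong.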

#### Determining Sampling Rate
1. Identify highest frequency component (fmax)
   - Through signal analysis or domain knowledge
   - Consider all important frequencies in your data
   - Account for noise frequencies if relevant

2. Calculate minimum sampling rate
   - Minimum sampling frequency = 2 * fmax
   - In practice, use 2.5x to 4x for safety margin
   - Example: 
     - Signal frequency = 100 Hz
     - Minimum sampling = 200 Hz
     - Recommended sampling ≈ 250-400 Hz

#### Time Between Samples
1. Basic calculation:
   - Time interval = 1/sampling_frequency
   - Example: 400 Hz sampling → 1/400 = 0.0025 seconds between samples

2. Practical considerations:
   - System capabilities (sensors, processing)
   - Storage constraints
   - Real-time requirements
   - Signal variability
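The arithmetic from the two sections above is simple enough to wrap in helper functions. The function names are made up for this sketch, and the 2.5x to 4x margin is the rule of thumb quoted in these notes rather than a fixed standard.

```python
def min_sampling_rate(f_max_hz):
    """Theoretical minimum: twice the highest frequency component."""
    return 2.0 * f_max_hz

def recommended_sampling_range(f_max_hz, low=2.5, high=4.0):
    """Practical range with a safety margin (2.5x to 4x by default)."""
    return low * f_max_hz, high * f_max_hz

def sample_interval(fs_hz):
    """Time between consecutive samples, in seconds."""
    return 1.0 / fs_hz

f_max = 100.0                                  # Hz, from the example above
print(min_sampling_rate(f_max))                # 200.0
print(recommended_sampling_range(f_max))       # (250.0, 400.0)
print(sample_interval(400.0))                  # 0.0025
```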

#### Time Series NN Architecture Choices
1. Input window size:
   - Must capture relevant temporal patterns
   - Should cover multiple cycles of the lowest frequency of interest
   - Consider domain-specific time dependencies

2. Common architectures:
   - RNN/LSTM for sequential data
   - 1D CNN for pattern detection
   - Transformer for long-range dependencies
   - Hybrid models for complex patterns

3. Output considerations:
   - Single-step vs multi-step prediction
   - Classification vs regression
   - Real-time requirements
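Whichever architecture is chosen, the series usually has to be cut into input windows and targets first. The sketch below shows one way to do that. `window_size_for` and `make_windows` are hypothetical helpers, and the default of 3 cycles is an assumed reading of "multiple cycles".

```python
import math
import numpy as np

def window_size_for(f_min_hz, fs_hz, cycles=3):
    """Samples needed to cover `cycles` periods of the lowest frequency."""
    return math.ceil(cycles * fs_hz / f_min_hz)

def make_windows(series, window_size, horizon=1):
    """Split a 1D series into (input window, next `horizon` values) pairs."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window_size - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this window_size and horizon")
    X = np.stack([series[i:i + window_size] for i in range(n)])
    y = np.stack([series[i + window_size:i + window_size + horizon]
                  for i in range(n)])
    return X, y

series = np.arange(10.0)                                   # toy series 0..9

X1, y1 = make_windows(series, window_size=4, horizon=1)    # single-step
X3, y3 = make_windows(series, window_size=4, horizon=3)    # multi-step

print(X1.shape, y1.shape, X3.shape, y3.shape)
print(window_size_for(f_min_hz=5.0, fs_hz=400.0))          # 240 samples
```

For a classification task the targets would be labels instead of future values, but the windowing of the inputs stays the same.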

#### Common Pitfalls
- Undersampling (aliasing)
- Oversampling (computational waste)
- Irregular sampling intervals
- Missing data handling
- Non-stationary signals

#### Best Practices
1. Signal analysis:
   - Perform frequency analysis before deciding rate
   - Consider all frequency components
   - Account for future signal changes

2. Implementation:
   - Use consistent sampling intervals
   - Handle missing data appropriately
   - Monitor for signal drift
   - Validate sampling adequacy

3. Validation:
   - Check reconstruction quality
   - Verify no information loss
   - Test with different conditions
   - Monitor system performance
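For the implementation points on consistent intervals and missing data, one simple option is to interpolate the recorded samples onto a regular time grid. The sketch below uses linear interpolation, which is only one possible choice and is an assumption here. `resample_uniform` is a hypothetical helper, and missing readings are assumed to be marked with NaN.

```python
import numpy as np

def resample_uniform(timestamps, values, fs):
    """Linearly interpolate irregular samples onto a uniform grid at fs Hz."""
    timestamps = np.asarray(timestamps, dtype=float)
    values = np.asarray(values, dtype=float)
    keep = ~np.isnan(values)                  # drop missing readings
    step = 1.0 / fs
    span = timestamps[-1] - timestamps[0]
    n = int(np.floor(span / step + 1e-9)) + 1
    grid = timestamps[0] + step * np.arange(n)
    return grid, np.interp(grid, timestamps[keep], values[keep])

# Irregular timestamps (seconds) with one missing reading
t = [0.0, 0.1, 0.25, 0.3, 0.4]
v = [0.0, 1.0, np.nan, 3.0, 4.0]

grid, filled = resample_uniform(t, v, fs=10.0)
print(grid)
print(filled)
```

Linear interpolation is only reasonable when gaps are short relative to how fast the signal changes. Longer gaps may need a different strategy, such as discarding the affected windows.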

## Textbook Notes