A CNN has two fundamental components: the input and the kernel.

The input is typically an image or a multidimensional array representing data.
The kernel is a small matrix of weights that is applied to the input through a convolution operation.

You may be wondering: what is convolution?
It is a sliding-window operation that combines two pieces of information, the input and the kernel.
It is the fundamental operation of a CNN: the kernel is placed on the input, the overlapping entries are multiplied and summed, and the same step is repeated at every position to produce the output (another matrix).

**Convolution** and **cross-correlation** are both mathematical operations that combine two signals to produce a third signal. They are widely used in signal processing, image processing, machine learning, and other fields. While they are similar, they have key differences in their definitions and applications.

### 1. **Definition**

- **Convolution:**
  - Convolution is an operation that combines two functions (or sequences) to produce a third function (or sequence). It is defined as the integral (or sum, in the discrete case) of the product of the two functions after one is flipped and shifted.
  - **Mathematically (continuous):**
    \[
    (f * g)(t) = \int_{-\infty}^{\infty} f(\tau) \cdot g(t - \tau) \, d\tau
    \]
  - **Mathematically (discrete):**
    \[
    (f * g)[n] = \sum_{m=-\infty}^{\infty} f[m] \cdot g[n - m]
    \]
  - **Key Point:** The second function (or signal) is flipped (time-reversed) before shifting and multiplying.

- **Cross-Correlation:**
  - Cross-correlation measures the similarity between two functions (or sequences) as one is shifted relative to the other.
  - **Mathematically (continuous):**
    \[
    (f \star g)(t) = \int_{-\infty}^{\infty} f(\tau) \cdot g(t + \tau) \, d\tau
    \]
  - **Mathematically (discrete):**
    \[
    (f \star g)[n] = \sum_{m=-\infty}^{\infty} f[m] \cdot g[m + n]
    \]
  - **Key Point:** The second function (or signal) is shifted but not flipped.
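
The two discrete sums above can be written out directly. The following is a minimal sketch for finite, real-valued sequences, assuming every sample outside the arrays is zero; the function names `conv1d_full` and `xcorr1d_full` are illustrative and do not come from any library.

```python
import numpy as np

def conv1d_full(f, g):
    """(f * g)[n] = sum_m f[m] * g[n - m], for n = 0 .. len(f) + len(g) - 2."""
    out = np.zeros(len(f) + len(g) - 1)
    for n in range(len(out)):
        for m in range(len(f)):
            if 0 <= n - m < len(g):      # samples outside g count as zero
                out[n] += f[m] * g[n - m]
    return out

def xcorr1d_full(f, g):
    """(f ⋆ g)[n] = sum_m f[m] * g[m + n], for lags n = -(len(f) - 1) .. len(g) - 1."""
    lags = range(-(len(f) - 1), len(g))
    out = np.zeros(len(lags))
    for i, n in enumerate(lags):
        for m in range(len(f)):
            if 0 <= m + n < len(g):      # samples outside g count as zero
                out[i] += f[m] * g[m + n]
    return out

f = [1.0, 2.0, 3.0]
g = [0.0, 1.0, 0.5]
conv_result = conv1d_full(f, g)      # values: 0, 1, 2.5, 4, 1.5
xcorr_result = xcorr1d_full(f, g)    # values: 0, 3, 3.5, 2, 0.5
print(conv_result)
print(xcorr_result)
```

The only difference between the two loops is the index into `g`: `n - m` (flipped) for convolution and `m + n` (not flipped) for cross-correlation.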

### 2. **Key Differences**

- **Flipping:** 
  - In convolution, the second signal is flipped (reversed in time) before being shifted. In cross-correlation, there is no flipping; the signal is only shifted.

- **Order of Operations:**
  - Convolution is commutative: \(f * g = g * f\), so the order of the two functions does not matter.
  - Cross-correlation is generally not commutative: swapping the two functions reverses the lag axis, \((f \star g)(t) = (g \star f)(-t)\) for real-valued signals.

- **Usage in Convolutional Neural Networks (CNNs):**
  - In CNNs, what is often called "convolution" is technically cross-correlation. The filters are not flipped; they are directly shifted across the input to produce the output.

- **Mathematical Relationship:**
  - Convolution can be seen as cross-correlation with one of the signals flipped. With the definitions above, substituting \(\tau \to -\tau\) in the cross-correlation integral gives:
    \[
    (f * g)(t) = (\tilde{f} \star g)(t)
    \]
    Where \( \tilde{f}(t) = f(-t) \) is the time-reversed version of \(f(t)\).
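
How the two operations behave when the arguments are swapped or flipped can be checked numerically. This is a small sketch using NumPy's built-in `np.convolve` and `np.correlate` on two arbitrary example sequences. Note that `np.correlate(a, v)` slides `v` over `a`, so its argument order is not the same as the \(f \star g\) notation used here; in NumPy's convention it is the second argument that has to be reversed to obtain a convolution.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

# Convolution gives the same result whichever argument comes first.
conv_fg = np.convolve(f, g)
conv_gf = np.convolve(g, f)

# Swapping the arguments of cross-correlation reverses the lag axis.
corr_fg = np.correlate(f, g, mode="full")
corr_gf = np.correlate(g, f, mode="full")

# Reversing the second input turns NumPy's cross-correlation into convolution.
corr_flipped = np.correlate(f, g[::-1], mode="full")

print(np.allclose(conv_fg, conv_gf))          # True
print(np.allclose(corr_fg, corr_gf))          # False
print(np.allclose(corr_fg, corr_gf[::-1]))    # True
print(np.allclose(corr_flipped, conv_fg))     # True
```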

### 3. **Applications**

- **Convolution:**
  - Widely used in signal processing (e.g., filtering, linear systems) where the flipping operation is important.
  - Used in differential equation solving and system response analysis.
  - In image processing, convolution is used for operations like blurring, sharpening, edge detection, etc.

- **Cross-Correlation:**
  - Used in signal alignment, time delay estimation, and pattern recognition.
  - In image processing, cross-correlation is used for template matching, where a small template is matched with a larger image to find regions that are similar.

### 4. **Example in Image Processing**

- **Convolution:**
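  - Applying a filter (e.g., Gaussian blur) to an image involves convolving the image with the filter kernel, which is flipped before the operation. For a symmetric kernel such as a Gaussian, the flip changes nothing, so convolution and cross-correlation give the same result.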
  
- **Cross-Correlation:**
  - Matching a small template (e.g., a patch of an image) to different locations in a larger image involves cross-correlating the template with the image to find where the template best fits (no flipping involved).
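
Template matching by cross-correlation can be sketched as follows. The image and template are made-up toy data, and `xcorr2d_valid` is an illustrative helper, not a library function. Raw cross-correlation scores are biased toward bright regions, so practical matchers usually normalize the scores; this sketch skips that step because the toy image is zero everywhere except the pasted patch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def xcorr2d_valid(image, kernel):
    """Slide `kernel` over `image` without flipping; keep fully overlapping positions."""
    windows = sliding_window_view(image, kernel.shape)   # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Hypothetical data: a blank 6x7 image with one 2x2 patch pasted at row 2, column 3.
template = np.array([[1.0, 2.0],
                     [3.0, 4.0]])
image = np.zeros((6, 7))
image[2:4, 3:5] = template

scores = xcorr2d_valid(image, template)
best = np.unravel_index(np.argmax(scores), scores.shape)
print(best)  # row and column of the best match
```

The highest score lands at the position where the template was pasted, because that is where every template entry lines up with an equal image entry.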

### 5. **Summary**

- **Convolution** involves flipping one of the functions before shifting and combining. It is commutative, so the order of the two functions does not change the result, and it is essential in systems analysis and filtering.
  
- **Cross-correlation** measures the similarity between two signals by shifting one without flipping, making it useful in tasks like signal alignment and pattern recognition.

In practice, especially in machine learning and image processing, the term "convolution" is often used interchangeably with "cross-correlation," even though they are technically different operations.

## So conv(I, K) = I ⋆ rot180(K), where ⋆ is cross-correlation



1. (star) - ⋆ - cross-correlation
2. (asterisk) -  * - convolution

So, **What is Real Convolution?**

It is the same sliding-window operation, performed after rotating the kernel by 180 degrees.

That is, the new kernel matrix is the previous one rotated by 180 degrees:

| 0 | -1 |
| --- | --- |
| 2 | 1 |

We can formulate convolution as :

> conv(I, K) = I ⋆ rot180(K), or equivalently I * K = I ⋆ rot180(K), where I * K denotes convolution and ⋆ denotes cross-correlation.
> 

So, the convolution of I and K is the cross-correlation of I with the 180-degree-rotated version of K.
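
The relation can be sketched in NumPy as below. The kernel `K` is inferred from the rotated matrix in the table (rotating it back by 180 degrees); the 3x3 input `I` is a made-up example, and both function names are illustrative.

```python
import numpy as np

def cross_correlate2d_valid(image, kernel):
    """Slide the kernel over the image without flipping it (VALID positions only)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the patch under it, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d_valid(image, kernel):
    """conv(I, K) = I ⋆ rot180(K): rotate the kernel, then cross-correlate."""
    return cross_correlate2d_valid(image, np.rot90(kernel, 2))

I = np.array([[1.0, 6.0, 2.0],
              [5.0, 3.0, 1.0],
              [7.0, 0.0, 4.0]])      # hypothetical input
K = np.array([[1.0, 2.0],
              [-1.0, 0.0]])          # rot180(K) is [[0, -1], [2, 1]]

xcorr_out = cross_correlate2d_valid(I, K)   # [[8, 7], [4, 5]]
conv_out = convolve2d_valid(I, K)           # [[7, 5], [11, 3]]
print(xcorr_out)
print(conv_out)
```

Because this kernel is not symmetric under a 180-degree rotation, the two outputs differ; for a symmetric kernel they would be identical.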

There are multiple ways to perform cross-correlation, and hence convolution.

What we’ve seen above is called VALID cross-correlation. The kernel is placed fully inside the input, a product-sum is computed at each position, and the sliding stops when the kernel hits the border of the input. For an \(n \times n\) input and a \(k \times k\) kernel, the output is \((n - k + 1) \times (n - k + 1)\).

There is another way to perform this operation, called FULL cross-correlation.

In this version, we compute the product-sum as soon as the kernel and the input overlap at all, treating everything outside the input as zero. The output is therefore larger than in the VALID case: \((n + k - 1) \times (n + k - 1)\).

One instance is shown here,
![alt text](Untitled.png)
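
One way to sketch the two modes is to implement FULL cross-correlation as VALID cross-correlation on a zero-padded input. The function name `cross_correlate2d`, its `mode` argument, and the example matrices below are assumptions made for illustration.

```python
import numpy as np

def cross_correlate2d(image, kernel, mode="valid"):
    """VALID or FULL cross-correlation; FULL zero-pads the input by (k - 1) per side."""
    kh, kw = kernel.shape
    if mode == "full":
        image = np.pad(image, ((kh - 1, kh - 1), (kw - 1, kw - 1)))  # zeros by default
    elif mode != "valid":
        raise ValueError("mode must be 'valid' or 'full'")
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

I = np.array([[1.0, 6.0, 2.0],
              [5.0, 3.0, 1.0],
              [7.0, 0.0, 4.0]])      # hypothetical 3x3 input
K = np.array([[1.0, 2.0],
              [-1.0, 0.0]])          # hypothetical 2x2 kernel

valid = cross_correlate2d(I, K, mode="valid")
full = cross_correlate2d(I, K, mode="full")
print(valid.shape, full.shape)  # (2, 2) (4, 4)
```

The VALID result sits in the middle of the FULL result (`full[1:3, 1:3]` here); the extra border entries come from positions where the kernel only partly overlaps the input.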

We end this module here. I assume you’ve got a basic understanding of convolution.

Let’s move forward to module 2, which is very interesting.

- A convolution layer takes a 3-dimensional block of data as input: W x H x C, where C is the depth.
- The layer has trainable parameters, among them the kernels.
- Each kernel has the same depth as the input, meaning it extends through the full depth of the input.
- Each layer can have multiple kernels: `out_channels=8` means 8 kernels.


In a convolutional layer within a neural network, the depth of the kernels (also known as filters) must match the depth of the input. 

Here's a more detailed explanation:

### Input Depth
- The depth of the input refers to the number of channels in the input data. For example, an RGB image has a depth of 3 because it has three channels: Red, Green, and Blue.

### Kernel (Filter) Depth
- The kernels in a convolutional layer also have a depth dimension. Each kernel is applied to all the channels of the input.

### Matching Depth
- For the convolution operation to be valid, the depth of the kernel must be the same as the depth of the input. This allows the kernel to interact with each channel of the input. If the input has 3 channels, the kernel must also have 3 channels.
- Each channel of the kernel is convolved with the corresponding channel of the input, and the results are summed to produce a single value in the output feature map.

### Example
- If you have an input with a size of \( 32 \times 32 \times 3 \) (e.g., a color image), and you use a kernel of size \( 5 \times 5 \), the kernel's size would actually be \( 5 \times 5 \times 3 \).
- With stride 1 and no padding, the convolution operation would produce an output of size \( 28 \times 28 \times \text{number of kernels} \) (since \( 32 - 5 + 1 = 28 \)), where the number of kernels determines the number of output channels (also known as the depth of the output feature map).
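
The shapes in this example can be checked with a small NumPy sketch. It assumes stride 1, no padding, no bias term, random values, 8 kernels, and a kernel layout of (number of kernels, height, width, channels); deep learning frameworks use their own layouts, so treat the array shapes here as illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))       # input with 3 channels
kernels = rng.standard_normal((8, 5, 5, 3))    # 8 kernels, each 5 x 5 x 3

# Every 5 x 5 x 3 patch of the input: shape (28, 28, 5, 5, 3).
windows = sliding_window_view(image, (5, 5, 3))[:, :, 0]

# For each patch and each kernel, multiply element-wise and sum over
# height, width, and channels, giving one value per output position.
output = np.einsum("ijklc,oklc->ijo", windows, kernels)
print(output.shape)  # (28, 28, 8)
```

Each kernel spans all 3 input channels and contributes exactly one output channel, which is why the output depth equals the number of kernels rather than the input depth.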

### Why This Requirement?
- The convolution operation computes a weighted sum across the spatial dimensions (height and width) and across the depth (channels). For this to work, the kernel must have the same depth as the input so that it can properly combine information from all channels.

### Summary
- In a convolutional layer, the depth of the kernels must be the same as the depth of the input to allow for proper convolution across all channels.