<a href="https://colab.research.google.com/github/manjuiitm/ML-Algorithms/blob/main/Transformers_and_Generative_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Content


1.   Introduction to Computer Vision (Day 1)
2.   Introduction to NumPy and PyTorch (AutoGrad) (Day 1)
3.   Attention based Models (Day 1)
4.   Language Transformers and Vision Transformers (Day 2)
5.   AutoEncoders and Diffusion Models (Day 3)

# Introduction to NumPy

## What is NumPy?

NumPy (Numerical Python) is a powerful open-source library for scientific computing in Python. It provides:

* **High-performance multidimensional arrays:** These are the fundamental data structure in NumPy. They allow efficient storage and manipulation of numerical data.
* **Mathematical functions:** A vast collection of functions for operations on arrays (e.g., linear algebra, Fourier transforms, random number generation).
* **Broadcasting:**  A mechanism that enables arithmetic operations between arrays of different shapes, promoting code conciseness.
* **Tools for integrating with other languages:** NumPy can interface with code written in C/C++ and Fortran, allowing you to leverage existing libraries.

## Why NumPy?

NumPy's arrays offer several advantages over standard Python lists:

* **Compactness:** NumPy arrays use less memory than Python lists, especially when dealing with large datasets.
* **Speed:**  NumPy's vectorized operations (performing operations on entire arrays at once) are significantly faster than iterating over Python lists.
* **Convenience:**  NumPy provides a wide range of built-in functions for mathematical and scientific operations, simplifying your code.
* **Functionality:** NumPy offers advanced features like linear algebra, random number generation, and Fourier transforms, making it essential for scientific computing.
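
To make the speed and convenience points concrete, here is a minimal sketch that computes the same result with a Python loop and with one vectorized expression. The variable names are illustrative only, and no timings are shown because the actual speedup depends on your machine and array size.

```python
import numpy as np

values = list(range(1_000))

# Pure-Python approach: square each element in an explicit loop
squares_list = [v * v for v in values]

# NumPy approach: one vectorized expression over the whole array
arr = np.array(values)
squares_arr = arr * arr

# Both approaches produce the same numbers
print(np.array_equal(squares_arr, np.array(squares_list)))  # True

# Size of the array's data buffer in bytes (elements * bytes per element)
print(arr.nbytes)
```

In a notebook, the `%timeit` magic can be used on each version to compare their run times.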

## Installing NumPy

If you haven't already installed NumPy, you can easily do so using the `pip` package manager:

```bash
pip install numpy
```


In [None]:
!pip install numpy

In [None]:
import numpy as np

(The np alias is a convention to shorten the name for easier use throughout your code.)

# The Core Concept: NumPy Arrays

NumPy arrays (often referred to as `ndarray`, short for "n-dimensional array") are the building blocks of NumPy and the foundation for efficient numerical operations. They are homogeneous collections of elements, all of the same data type, and can be of any dimension.

## Creating NumPy Arrays

There are various ways to create NumPy arrays:

### 1. From Python Lists or Tuples:

In [None]:
# 1D array from a list
arr1d = np.array([1, 2, 3, 4])

# 2D array from a list of lists
arr2d = np.array([[1, 2], [3, 4]])

# Array from a tuple
arr_tuple = np.array((5, 6, 7))

### 2. Using Built-in Functions:

* `arange()`: Similar to Python's `range()`, but returns a NumPy array.
* `linspace()`: Generates evenly spaced numbers within a specified interval.
* `zeros()`: Creates an array filled with zeros.
* `ones()`: Creates an array filled with ones.
* `eye()`: Creates an identity matrix.
* `full()`: Creates an array filled with a specified value.


In [None]:
arr_arange = np.arange(5, 15, 2)  # Start at 5, stop before 15, step by 2
arr_linspace = np.linspace(0, 1, 5)  # 5 evenly spaced points from 0 to 1 (both endpoints included)
arr_zeros = np.zeros((2, 3))  # 2 rows, 3 columns
arr_ones = np.ones((3, 2))  # 3 rows, 2 columns
arr_eye = np.eye(3) # 3x3 identity matrix
arr_full = np.full((2, 2), 7) # 2x2 matrix filled with 7

print("arr_arange:", arr_arange)
print("arr_linspace:", arr_linspace)
print("arr_zeros:\n", arr_zeros)  # \n adds a newline for better formatting
print("arr_ones:\n", arr_ones)
print("arr_eye:\n", arr_eye)
print("arr_full:\n", arr_full)

## Array Attributes

Important attributes of NumPy arrays:

* `shape`: Returns a tuple indicating the dimensions of the array (e.g., `(3, 2)` for a 3x2 array).
* `dtype`: Specifies the data type of the array elements (e.g., `int64`, `float32`, `complex128`).
* `ndim`: Indicates the number of dimensions of the array (e.g., `1` for 1D, `2` for 2D).
* `size`: Total number of elements in the array.

In [None]:
arr1d = np.array([1, 2, 3, 4])
arr2d = np.array([[5, 6], [7, 8]])
arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print("arr1d:")
print("- shape:", arr1d.shape)  # (4,)
print("- dtype:", arr1d.dtype)  # int64 on most platforms (the default integer type is platform-dependent)
print("- ndim:", arr1d.ndim)   # 1
print("- size:", arr1d.size)   # 4

print("\narr2d:")
print("- shape:", arr2d.shape)  # (2, 2)
print("- dtype:", arr2d.dtype)  # int64
print("- ndim:", arr2d.ndim)   # 2
print("- size:", arr2d.size)   # 4

print("\narr3d:")
print("- shape:", arr3d.shape)  # (2, 2, 2)
print("- dtype:", arr3d.dtype)  # int64
print("- ndim:", arr3d.ndim)   # 3
print("- size:", arr3d.size)   # 8

## Indexing and Slicing

NumPy arrays provide powerful indexing and slicing mechanisms to access and manipulate their elements.

### Basic Indexing:

* Accessing Individual Elements: Use square brackets `[]` with the element's index (starting from 0).

In [None]:
arr1d = np.array([10, 20, 30, 40])
print(arr1d[0])  # Output: 10
print(arr1d[2])  # Output: 30

arr2d = np.array([[1, 2], [3, 4]])
print(arr2d[0, 1])  # Output: 2 (first row, second column)
print(arr2d[1, 0])  # Output: 3 (second row, first column)


### Slicing:
* Accessing Ranges of Elements: Use a colon `:` to specify the start (inclusive) and end (exclusive) indices of the slice.

In [None]:
print(arr1d[1:3])  # Output: [20 30] (indices 1 and 2; the end index is excluded)
print(arr2d[0, :])  # Output: [1 2] (first row, all columns)
print(arr2d[:, 1])  # Output: [2 4] (all rows, second column)

#### Stride:
An optional third parameter in a slice can specify the step size between elements.

In [None]:
print(arr1d[::2])  # Output: [10 30] (every other element)
print(arr2d[::-1, ::-1]) # Output: [[4 3] [2 1]] (reversed rows and columns)

# Boolean Indexing
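
Boolean indexing selects elements with a mask of `True`/`False` values instead of integer positions. The following is a minimal sketch of the idea; the array contents and variable names are made up for illustration.

```python
import numpy as np

arr = np.array([10, 25, 30, 45, 50])

# Comparing an array with a scalar yields a boolean array of the same shape
mask = arr > 28
print(mask)        # [False False  True  True  True]

# Indexing with the mask keeps only the elements where the mask is True
print(arr[mask])   # [30 45 50]

# Masks can be combined with & (and), | (or), and ~ (not)
print(arr[(arr > 20) & (arr < 50)])  # [25 30 45]

# A mask can also appear on the left-hand side to assign in place
arr[arr < 30] = 0
print(arr)         # [ 0  0 30 45 50]
```

Note the parentheses around each comparison when combining masks: `&` binds more tightly than `>` and `<` in Python.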

## Array Broadcasting

Broadcasting is a powerful mechanism in NumPy that enables arithmetic operations between arrays of different shapes. It's a way to "stretch" or "replicate" smaller arrays to match the shape of larger arrays, avoiding the need for explicit loops or resizing operations. This leads to concise and efficient code.

### How Broadcasting Works:

1. **Shape Compatibility:**  NumPy compares the shapes of the arrays involved in the operation, starting from the trailing dimensions (the rightmost ones).
2. **Matching or 1:**  Two dimensions are compatible when they are equal or one of them is 1.
3. **Expansion:**  If a dimension is 1, NumPy expands (or "broadcasts") it to match the corresponding dimension of the other array.
4. **Element-wise Operation:** Once the shapes are compatible, the arithmetic operation is performed element-wise.


In [None]:
arr1 = np.array([[1, 2, 3], [3, 4, 3]])  # Shape: (2, 3)
arr2 = np.array([10, 20, 30])  # Shape: (3,), broadcast across both rows of arr1
print(arr2.ndim)
print(arr1.ndim)

result = arr1 * arr2
print(result)


### Rules of Broadcasting:

Broadcasting follows a set of rules to determine if two arrays are compatible for operations:

1. **Dimension Matching:** If the arrays have different numbers of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side until both arrays have the same number of dimensions.

2. **Element-wise Compatibility:** The arrays are compatible in a dimension if they have the same size in that dimension, or if one of the arrays has size 1 in that dimension.

3. **Expansion:** If one array has a dimension of size 1 and the other array has a dimension of size greater than 1, the array with size 1 is "broadcasted" (stretched) to match the size of the other array.
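
The three rules above can be traced on a small example. This is a sketch under the assumption that `np.broadcast_shapes` is available (it was added in NumPy 1.20); the arrays are illustrative.

```python
import numpy as np

a = np.ones((2, 3))          # shape (2, 3)
b = np.array([10, 20, 30])   # shape (3,)

# Rule 1: b's shape (3,) is padded on the left to (1, 3)
# Rules 2-3: the size-1 dimension is stretched to 2, giving (2, 3)
print(np.broadcast_shapes(a.shape, b.shape))  # (2, 3)
print((a + b).shape)                          # (2, 3)

# A column of shape (2, 1) combined with a row of shape (3,) also gives (2, 3)
col = np.array([[1], [2]])
print(col + b)
# [[11 21 31]
#  [12 22 32]]
```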

### When Broadcasting Fails:

Broadcasting fails if:

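* **Incompatible Dimensions:** After aligning the shapes from the trailing (rightmost) dimension, some pair of dimensions has different sizes and neither size is 1.
* **Dimension Mismatch:** Padding with ones happens only on the leading (left) side, so a mismatch in a trailing dimension is not repaired automatically. For example, shapes `(2, 3)` and `(2,)` are incompatible because `(2,)` is padded to `(1, 2)`, and 3 does not match 2.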
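
A short sketch of a failing case and one possible fix follows. The arrays are illustrative; reshaping to a column is just one way to make the shapes compatible.

```python
import numpy as np

a = np.ones((2, 3))
b = np.array([10, 20])  # shape (2,): padded to (1, 2), and 2 does not match 3

try:
    a + b
except ValueError as err:
    print("Broadcasting failed:", err)

# Reshaping b into a column of shape (2, 1) makes the shapes compatible
result = a + b.reshape(2, 1)
print(result.shape)  # (2, 3)
```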

### Benefits of Broadcasting:
* **Conciseness**: Broadcasting eliminates the need for explicit loops or manual resizing of arrays, resulting in more concise and readable code.
* **Efficiency**: NumPy's implementation of broadcasting is highly optimized, leading to efficient computations, especially with large arrays.
* **Flexibility**: Broadcasting allows for seamless operations between arrays of different shapes, making it versatile for various scientific computing tasks.
* **Vectorization**: Broadcasting leverages NumPy's vectorization capabilities, performing operations on entire arrays at once, leading to faster execution than element-by-element operations in Python loops.

# Working with Multi-Dimensional Arrays

While one-dimensional arrays (vectors) are useful, NumPy truly shines when dealing with multi-dimensional arrays. This section will cover creating, indexing, slicing, and performing operations on these arrays.

## Creating Multi-Dimensional Arrays

We've already seen how to create 2D arrays from lists of lists:


In [None]:
arr2d = np.array([[1, 2, 3], [4, 5, 6]])  # 2x3 array

arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # 2x2x2 array


arr2d_zeros = np.zeros((3, 5))   # 3x5 array filled with zeros
arr3d_ones = np.ones((2, 3, 4))  # 2x3x4 array filled with ones

# Indexing and Slicing
arr2d = np.array([[1, 2, 3], [4, 5, 6]])  # 2x3 array
print("2D array:\n",arr2d)

arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # 2x2x2 array
print("\n3D array:\n",arr3d)

In [None]:
print(arr2d[0, 2])  # Output: 3 (first row, third column)
print(arr3d[1, 0, 1])  # Output: 6 (second block, first row, second column)
print(arr2d[:, 1])  # Output: [2 5] (all rows, second column)
print(arr3d[0, ...])  # Output: [[1 2] [3 4]] (first block, all rows and columns)

### Choosing from Arrays:

NumPy provides convenient ways to randomly select elements from existing arrays:

* `choice()`:  Randomly selects elements from a given array. You can choose whether to sample with or without replacement (i.e., whether an element can be selected multiple times).



In [None]:
choices = np.array(['A', 'B', 'C', 'D'])
random_choice = np.random.choice(choices, 3, replace=False)  # 3 samples without replacement
print(random_choice)  # Output: e.g., ['B' 'D' 'A']

random_choice_with_replacement = np.random.choice(choices, 5, replace=True)
print(random_choice_with_replacement)  # Output: e.g., ['D' 'A' 'D' 'B' 'C']

## Linear Algebra

NumPy's `linalg` module provides a powerful suite of functions for linear algebra operations. These functions are essential for solving systems of linear equations, performing matrix factorizations, calculating eigenvalues and eigenvectors, and much more.

### Basic Linear Algebra Operations:

* `linalg.det()`: Calculates the determinant of a square matrix. The determinant is a scalar value that summarizes important properties of the matrix.
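
As a small illustration of `linalg.det()`, the sketch below computes the determinant of a 2x2 matrix and then uses `linalg.solve()` on the same matrix. The matrix and right-hand side are made-up values.

```python
import numpy as np

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])

# For a 2x2 matrix [[a, b], [c, d]] the determinant is a*d - b*c
det_A = np.linalg.det(A)
print(round(det_A, 6))  # -6.0

# A nonzero determinant means A is invertible, so A @ x = rhs has a unique solution
rhs = np.array([10.0, 12.0])
x = np.linalg.solve(A, rhs)
print(np.allclose(A @ x, rhs))  # True
```

The result is rounded before printing because the determinant is computed in floating point and may differ from the exact value by a tiny amount.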



# Image as a Matrix

In [None]:
from matplotlib import pyplot as plt
import numpy as np
import skimage
from skimage import io

**Read 2 images from URL using skimage**.<br>
First image is RGB image and second one is the grayscale version

In [None]:
img = io.imread('https://iith.ac.in/assets/images/towers/tower2.jpg')
img_gray = io.imread('https://iith.ac.in/assets/images/towers/tower2.jpg',as_gray = True)

In [None]:
plt.imshow(img_gray,)

In [None]:
plt.imshow(img)

**Print image as a matrix**

In [None]:
img

In [None]:
type(img)

**Shape of image**<br>
Since it's an RGB image, it has 3 channels

In [None]:
img.shape

**Plot the RGB channels separately**<br>
Remember that each channel takes values between 0 and 255 and has the same height and width, so to visualize these channels it is essential that we choose the appropriate color map for the respective channel.

In [None]:
fig, axes = plt.subplots(1, 3,figsize=(15,15))
axes[0].imshow(img[:,:,0],cmap=plt.cm.Reds_r)
axes[1].imshow(img[:,:,1], cmap=plt.cm.Greens_r)  # channel 1 is green
axes[2].imshow(img[:,:,2], cmap=plt.cm.Blues_r)   # channel 2 is blue

# RGB Pixel 0,0,200, 0,255,0, 0,0,255

**Plot the grayscale image.**<br>
Remember to use the appropriate color map

In [None]:
plt.imshow(img_gray,cmap = 'gray')

**Print grayscale image as a matrix**<br>
Here, the values are normalized between 0 and 1, which is done by skimage while converting RGB image to grayscale. We can always renormalize the values between 0 and 255.
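
A minimal sketch of that renormalization is shown below. It uses a small stand-in array instead of the downloaded image so that it runs on its own; the variable names are illustrative. (skimage also offers `skimage.util.img_as_ubyte` for this kind of conversion.)

```python
import numpy as np

# Stand-in for a grayscale image with float values in [0, 1]
gray_float = np.array([[0.0, 0.25],
                       [0.5, 1.0]])

# Scale to [0, 255], round to the nearest integer, and store as 8-bit
gray_uint8 = np.round(gray_float * 255).astype(np.uint8)
print(gray_uint8)
# [[  0  64]
#  [128 255]]
```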

**Print shape of grayscale image**<br>
Here, there are only 2 dimensions: a grayscale image has a single channel, as opposed to 3 for an RGB image, so the third dimension is not needed.

In [None]:
img_gray.shape

# Image as a Function

In [None]:
from matplotlib import pyplot as plt
import numpy as np
import skimage
from mpl_toolkits import mplot3d

In [None]:
# Get an image from skimage.data
# skimage.data provides a set of sample images for our use.

In [None]:
img = skimage.data.horse()

**Plot the image of a horse**

In [None]:
plt.imshow(img,cmap='gray')

**Print image as a matrix**<br>
Here we notice that the NumPy ndarray is filled with True and False instead of numbers. This is because we are using a binary image that has only two values, 0 and 1. Storing the values as Boolean instead of int is more compact for binary images.

In [None]:
img

In [None]:
# f: (x,y) -> intensity

**Plot the image as a function**

In computer vision (CV), the "image as a function" concept treats an image mathematically as a function: each pixel is a coordinate with a corresponding intensity value, so the image can be manipulated and analyzed by applying mathematical operations to this functional representation.



*   **Pixel as a coordinate:** Each pixel in an image is considered a point in a 2D grid, with its x and y coordinates defining its position within the image.
*   **Intensity value as output:** The value of the function at a specific pixel coordinate represents the intensity level (brightness) of that pixel, usually ranging from 0 (black) to 255 (white) for 8-bit grayscale images.




By treating an image as a function, various image processing operations like filtering, edge detection, and transformations can be implemented using mathematical operations like derivatives, convolutions, and matrix manipulations.
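
As one concrete instance of the derivative and convolution operations mentioned above, the sketch below applies a common 3x3 second-derivative (Laplacian) kernel to a toy image. The kernel choice, the toy image, and the variable names are assumptions made for illustration; they are not taken from this notebook.

```python
import numpy as np

# A common 4-neighbour discrete Laplacian kernel
laplacian_kernel = np.array([[0,  1, 0],
                             [1, -4, 1],
                             [0,  1, 0]])

# Toy image: dark left half, bright right half (a vertical edge)
toy = np.zeros((5, 6))
toy[:, 3:] = 1.0

# Slide the kernel over every interior pixel ("valid" region only)
out = np.zeros((toy.shape[0] - 2, toy.shape[1] - 2))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        patch = toy[i:i + 3, j:j + 3]
        out[i, j] = np.sum(laplacian_kernel * patch)

print(out)  # nonzero only in the two columns next to the edge
```

The response is zero in flat regions and changes sign across the edge, which is why this kind of filter is used to locate rapid intensity changes.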



Algorithmic efficiency:
This approach allows for efficient computational processing of images because algorithms can operate on pixel values directly using mathematical functions.


# Laplacian and why it cannot be performed on an image matrix

The Laplacian of an image is a second-order derivative operator that measures the rate of change of gradients. It is used to detect regions of rapid intensity change, such as edges.

When considering an image as a function, the Laplacian can be computed analytically by taking the second derivatives of the function. However, when treating an image as a matrix, we only have discrete pixel values, making it impossible to compute true second derivatives. Instead, we approximate it using discrete convolution filters, such as the Laplacian kernel.

In [None]:
fig = plt.figure(figsize=(10,10))
ax = plt.axes(projection='3d')

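def f(x,y):
  # Intensity at row x, column y
  return img[x,y]

x = np.arange(img.shape[0])  # row indices (328 for the horse image)
y = np.arange(img.shape[1])  # column indices (400 for the horse image)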

X, Y = np.meshgrid(x, y)
Z = f(X, Y)

ax.plot_wireframe(X, Y, Z)

# Image Transformations

In [None]:
img = skimage.data.camera()

In [None]:
fig, axes = plt.subplots(1, 4,figsize=(15,15))
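# Fix the display range so the brightness change is visible and comparable
axes[0].imshow(img, cmap='gray', vmin=0, vmax=255)
# Widen the dtype before adding, then clip: uint8 + 40 would wrap around past 255
brighter = np.clip(img.astype(np.int16) + 40, 0, 255).astype(np.uint8)
axes[1].imshow(brighter, cmap='gray', vmin=0, vmax=255)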
axes[2].imshow(img[::-1],cmap='gray')
axes[3].imshow(img[:,::-1],cmap='gray')
axes[0].title.set_text('Original Image')
axes[1].title.set_text('Image with increased Intensity')
axes[2].title.set_text('Flip Rows')
axes[3].title.set_text('Flip Columns')

In [None]:
img = skimage.data.camera()

# Max  and Min Value of an 8 bit image
IMAX = 255
IMIN = 0


# Reversing the contrast
img_2 = IMAX - img + IMIN

fig, axes = plt.subplots(1, 2,figsize=(10,10))
axes[0].imshow(img,cmap='gray')
axes[1].imshow(img_2,cmap='gray')


# Cross-Correlation vs Convolution (using Impulse signal)

In [None]:
# 0 0 0  0
# 0 1 2
# 0 4 5


In [None]:
import numpy as np
import matplotlib.pyplot as plt

**Create a black image with a single white pixel in the middle**

In [None]:
img = np.zeros((7,7))
img[3,3] = 5
img



In [None]:
plt.imshow(img,cmap='gray')

**Create a filter that goes from black to white using np.linspace**

In [None]:
filter_ = np.linspace(0,1,9).reshape(3,3)
filter_

In [None]:
flipped_filter  = np.array([[1, 0.875, 0.75],[0.625, 0.5, 0.375],[0.25, 0.125, 0]])

plt.imshow(flipped_filter,cmap="gray")

In [None]:
plt.imshow(filter_,cmap='gray')

## Cross Correlation

**Store the filter size and compute value of k** <br>
You can obtain k from the following equation $2*k+1=filter\_size$

In [None]:
filter_size = filter_.shape[0]
k = (filter_size - 1) // 2

In [None]:
corr_out = []

In [None]:
for i in range(k,img.shape[0]-k):
  temp = []
  for j in range(k,img.shape[1]-k):
    mat = img[i-k:i+k+1,j-k:j+k+1]
    temp.append(np.sum(filter_ * mat))
  corr_out.append(temp)

In [None]:
corr_out = np.array(corr_out)
corr_out.shape

In [None]:
corr_out

In [None]:
plt.imshow(corr_out,cmap='gray')
# You will notice that the output shows the filter flipped (rotated by 180 degrees)

## Convolution

In [None]:
# Convolution is just like correlation, except that we flip over the filter before correlating.
# The key difference between the two is that convolution is associative.
#  It is very convenient to have convolution be associative.
# Suppose, for example, we want to smooth an image and then take its derivative. We
# could do this by convolving the image with a Gaussian filter, and then convolving it with
# a derivative filter. But we could alternatively convolve the derivative filter with the
# Gaussian to produce a filter called a Difference of Gaussian (DOG), and then convolve
# this with our image. The nice thing about this is that the DOG filter can be precomputed,
# and we only have to convolve one filter with our image.

# In general, people use convolution for image processing operations such as smoothing,
# and they use correlation to match a template to an image. Then, we don’t mind that
# correlation isn’t associative, because it doesn’t really make sense to combine two
# templates into one with correlation, whereas we might often want to combine two filters
# together for convolution.
conv_out = []
for i in range(k,img.shape[0]-k):
  temp = []
  for j in range(k,img.shape[1]-k):
    mat = img[i-k:i+k+1,j-k:j+k+1][::-1,::-1]    # Flipping the patch is equivalent to flipping the filter; np.flip also works
    temp.append(np.sum(filter_ * mat))
  conv_out.append(temp)

conv_out = np.array(conv_out)
conv_out.shape

In [None]:
conv_out

In [None]:
# You will notice that output is not flipped for convolution

plt.imshow(conv_out,cmap='gray')
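
The relationship between the two operations can also be checked numerically. The sketch below wraps the loops in two helper functions, `correlate2d_valid` and `convolve2d_valid`; both names are hypothetical and introduced only for this illustration. It uses an impulse of height 1 so that the output can be compared with the filter directly.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Cross-correlate without padding (hypothetical helper for this sketch)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(kernel * image[i:i + kh, j:j + kw])
    return out

def convolve2d_valid(image, kernel):
    """Convolution is cross-correlation with the kernel flipped on both axes."""
    return correlate2d_valid(image, kernel[::-1, ::-1])

impulse = np.zeros((7, 7))
impulse[3, 3] = 1.0
kernel = np.linspace(0, 1, 9).reshape(3, 3)

corr = correlate2d_valid(impulse, kernel)
conv = convolve2d_valid(impulse, kernel)

# Around the impulse, correlation reproduces the flipped kernel,
# while convolution reproduces the kernel itself
print(np.allclose(corr[1:4, 1:4], kernel[::-1, ::-1]))  # True
print(np.allclose(conv[1:4, 1:4], kernel))              # True
```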

# Basics of PyTorch

PyTorch is a Python-based scientific computing package targeted at two sets of
audiences:

-  A replacement for NumPy optimized for the power of GPUs
-  A deep learning platform that provides significant flexibility
   and speed

At its core, PyTorch provides a few key features:

- A multidimensional [Tensor](https://pytorch.org/docs/stable/tensors.html) object, similar to [NumPy Array](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) but with GPU acceleration.
- An optimized **autograd** engine for automatically computing derivatives.
- A clean, modular API for building and deploying **deep learning models**.

You can find more information about PyTorch in the Appendix.
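
To give a first impression of the autograd engine, here is a minimal sketch that differentiates a simple scalar function. The function and the input value are chosen only for illustration.

```python
import torch

# requires_grad=True asks autograd to track operations on x
x = torch.tensor(3.0, requires_grad=True)

# y = x^2 + 2x, so dy/dx = 2x + 2
y = x ** 2 + 2 * x

# backward() computes dy/dx and stores it in x.grad
y.backward()
print(x.grad)  # tensor(8.)
```

At `x = 3` the analytical derivative is `2 * 3 + 2 = 8`, which matches the value stored in `x.grad`.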

In [None]:
import torch

In [None]:
# We can construct a tensor directly from common Python iterables such as
# lists and tuples. Nested iterables can also be handled, as long as the
# dimensions are compatible.

# tensor from a list
a = torch.tensor([0, 1, 2])

#tensor from a tuple of tuples
b = ((1.0, 1.1), (1.2, 1.3))
b = torch.tensor(b)

# tensor from a numpy array
c = np.ones([2, 3])
c = torch.tensor(c)

print(f"Tensor a: {a}")
print(f"Tensor b: {b}")
print(f"Tensor c: {c}")

### Manipulating Tensors in PyTorch

**Indexing**

Just as in NumPy, elements in a tensor can be accessed by index. The first element has index 0, and a range `start:stop` includes `start` but excludes `stop`. Negative indices access elements relative to the end of the tensor. Selecting a range of elements this way is called slicing.

For example, `[-1]` selects the last element; `[1:3]` selects the second and the third elements, and `[:-2]` will select all elements excluding the last and second-to-last elements.

In [None]:
x = torch.arange(0, 10)
print(x)
print(x[-1])
print(x[1:3])
print(x[:-2])

In [None]:
# make a 5D tensor
x = torch.rand(1, 2, 3, 4, 5)
print(x.shape)
print(f" shape of x[0]:{x[0].shape}")
print(f" shape of x[0][0]:{x[0][0].shape}")
print(f" shape of x[0][0][0]:{x[0][0][0].shape}")

In [None]:
# By default, when we create a tensor it will *not* live on the GPU!

x = torch.randn(10)
print(x.device)

# Colab notebooks do not have access to a GPU by default. To start using GPUs we need to request one. We can do this from the Runtime menu at the top of the page.

# By following *Runtime* → *Change runtime type* and selecting **GPU** from the *Hardware Accelerator* dropdown list, we can start playing with sending tensors to GPUs.

# Once you have done this your runtime will restart and you will need to rerun the first setup cell to reimport PyTorch. Then proceed to the next cell.


In [None]:
print(torch.cuda.is_available())


[CUDA](https://developer.nvidia.com/cuda-toolkit) is an API developed by Nvidia for interfacing with GPUs. PyTorch provides us with a layer of abstraction, and allows us to launch CUDA kernels using pure Python.

In short, we get the power of parallelizing our tensor computations on GPUs, whilst only writing (relatively) simple Python!

Here, we define the function `set_device`, which returns the device used in the notebook, i.e., `cpu` or `cuda`. Unless otherwise specified, we call this function at the top of every tutorial and store the result in a variable:

```python
DEVICE = set_device()
```

Let's define the function using the PyTorch package `torch.cuda`, which is lazily initialized, so we can always import it, and use `is_available()` to determine if our system supports CUDA.

In [None]:
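# common device-agnostic way of writing code that can run on cpu OR gpu
# that we provide for you in each of the tutorials
def set_device():
  """Return "cuda" if a CUDA-capable GPU is available, otherwise "cpu"."""
  return "cuda" if torch.cuda.is_available() else "cpu"

DEVICE = set_device()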

# we can specify a device when we first create our tensor
x = torch.randn(2, 2, device=DEVICE)
print(x.dtype)
print(x.device)

# we can also use the .to() method to change the device a tensor lives on
y = torch.randn(2, 2)
print(f"y before calling to() | device: {y.device} | dtype: {y.type()}")

y = y.to(DEVICE)
print(f"y after calling to() | device: {y.device} | dtype: {y.type()}")

In [None]:
x = torch.tensor([0, 1, 2], device=DEVICE)
y = torch.tensor([3, 4, 5], device="cpu")
z = torch.tensor([6, 7, 8], device=DEVICE)

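# moving to cpu
# Tensors must live on the same device before they can be combined;
# if x is on the GPU, x + y raises a RuntimeError unless x is moved first
x = x.to("cpu")  # alternatively, you can use x = x.cpu()
print(x + y)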

# moving to gpu
y = y.to(DEVICE)  # alternatively, you can use y = y.cuda()
print(y + z)