## Introduction to Basic PyTorch

Before diving into PyTorch Lightning, let us focus on PyTorch itself, since Lightning is a wrapper that lets us organize PyTorch code so that it can be trained and deployed at scale. In this tutorial we will build some survival skills in the following areas:

1. Data Manipulation and processing 
2. Linear Algebra 
3. Differentiation

### Data Manipulation 

To train neural networks we need to ingest and manipulate data, so let us get our hands dirty with n-dimensional arrays. Most operations on PyTorch tensors closely mirror those of NumPy, the popular scientific computing library, so this section should be a breeze for most readers. There are two main differences: first, the tensor class supports automatic differentiation, which is crucial for training neural networks; second, tensors can be placed on a GPU to accelerate computation, whereas NumPy arrays live on the CPU. The subsections below walk through the main data manipulation tasks.

Note: Refer to the [PyTorch documentation](https://docs.pytorch.org/docs/stable/torch.html) for more details.

#### Tensor Creation 

In [8]:
import torch 

# Create a tensor of evenly spaced integers from 0 to 7 (end=8 is exclusive)
x = torch.arange(start=0, end = 8, step = 1)

print("Tensor x:", x)

print("Shape of x:", x.shape)

print("Data type of x:", x.dtype)

print("Type of x:", type(x))

# Create a tensor of 20 evenly spaced values between 0 and 1 (both endpoints included)
y = torch.linspace(start=0, end=1, steps=20, dtype=torch.float32)
# linspace computes the spacing automatically from start, end, and steps
print("Tensor y:", y)
print("Shape of y:", y.shape)
print("Data type of y:", y.dtype)
print("Type of y:", type(y))

# Create a tensor filled with zeros
zeros_tensor = torch.zeros(3, 3)
print("Zeros Tensor:\n", zeros_tensor)

# Create a tensor filled with ones
ones_tensor = torch.ones(3, 3)
print("Ones Tensor:\n", ones_tensor)

# Create a tensor with random values
random_tensor = torch.rand(3, 3)
print("Random Tensor:\n", random_tensor)

# Create a tensor with specified values
tensor_from_list = torch.tensor([[1, 2], [3, 4]])
print("Tensor from List:\n", tensor_from_list)

Tensor x: tensor([0, 1, 2, 3, 4, 5, 6, 7])
Shape of x: torch.Size([8])
Data type of x: torch.int64
Type of x: <class 'torch.Tensor'>
Tensor y: tensor([0.0000, 0.0526, 0.1053, 0.1579, 0.2105, 0.2632, 0.3158, 0.3684, 0.4211,
        0.4737, 0.5263, 0.5789, 0.6316, 0.6842, 0.7368, 0.7895, 0.8421, 0.8947,
        0.9474, 1.0000])
Shape of y: torch.Size([20])
Data type of y: torch.float32
Type of y: <class 'torch.Tensor'>
Zeros Tensor:
 tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
Ones Tensor:
 tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
Random Tensor:
 tensor([[0.4276, 0.5639, 0.3950],
        [0.7499, 0.3426, 0.2318],
        [0.4370, 0.9344, 0.6116]])
Tensor from List:
 tensor([[1, 2],
        [3, 4]])
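
The values printed for `random_tensor` above change on every run, because `torch.rand` draws fresh samples each time. The following is a minimal sketch of how the draws can be made repeatable with `torch.manual_seed`; the seed value `0` and the variable names are arbitrary choices for illustration. It also shows two related constructors, `torch.zeros_like` and `torch.full`.

```python
import torch

# Seed the global random number generator (the value 0 is an arbitrary choice).
torch.manual_seed(0)
first = torch.rand(3, 3)

# Re-seeding with the same value replays the same sequence of samples.
torch.manual_seed(0)
second = torch.rand(3, 3)

print("Identical draws:", torch.equal(first, second))

# Related constructors: copy the shape of an existing tensor, or fill with a constant.
zeros_like_first = torch.zeros_like(first)
filled = torch.full((2, 2), 7.0)
print("zeros_like shape:", zeros_like_first.shape)
print("full:\n", filled)
```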


#### Key Tensor Attributes

1. **`.shape`** or **`.size()`**:
   - **Description**: Returns the shape of the tensor as a `torch.Size` object, which is a subclass of a tuple.
   - **Example**:
     ```python
     tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])
     print("Shape:", tensor.shape)
     print("Size:", tensor.size())
     ```

2. **`.dtype`**:
   - **Description**: Returns the data type of the tensor elements.
   - **Example**:
     ```python
     tensor = torch.tensor([1, 2, 3], dtype=torch.float32)
     print("Data Type:", tensor.dtype)
     ```

3. **`.device`**:
   - **Description**: Indicates the device (CPU or GPU ) on which the tensor is allocated.
   - **Example**:
     ```python
     tensor = torch.tensor([1, 2, 3])
     print("Device (CPU):", tensor.device)

     # Allocate on an NVIDIA GPU only when CUDA is available
     if torch.cuda.is_available():
         tensor_gpu = torch.tensor([1, 2, 3], device='cuda')
         print("Device (GPU):", tensor_gpu.device)

     # Allocate on an Apple Silicon GPU (M1, M2, M3) only when MPS is available
     if torch.backends.mps.is_available():
         tensor_mps = torch.tensor([1, 2, 3], device='mps')
         print("Device (MPS):", tensor_mps.device)
     ```

4. **`.requires_grad`**:
   - **Description**: Indicates whether the tensor requires gradient computation. This is essential for training neural networks.
   - **Example**:
     ```python
     tensor = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
     print("Requires Grad:", tensor.requires_grad)
     ```

5. **`.grad`**:
   - **Description**: Holds the gradient of the tensor after backpropagation. This attribute is `None` for tensors that do not require gradients or before gradients are computed.
   - **Example**:
     ```python
     tensor = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
     result = tensor * 2
     result.sum().backward()
     print("Gradient:", tensor.grad)
     ```

6. **`.is_leaf`**:
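   - **Description**: Indicates whether the tensor is a leaf of the autograd graph. Tensors created directly by the user are leaves, while a tensor produced by an operation on a tensor that requires gradients is not. By convention, every tensor with `requires_grad=False` is also a leaf.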
   - **Example**:
     ```python
     a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
     b = a * 2
     print("Is Leaf (a):", a.is_leaf)  # True
     print("Is Leaf (b):", b.is_leaf)  # False
     ```

7. **`.is_cuda`**:
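   - **Description**: Indicates whether the tensor is stored on a CUDA (NVIDIA) GPU. It is `False` for CPU tensors and for tensors on other devices such as `mps`.
   - **Example**:
     ```python
     tensor_cpu = torch.tensor([1, 2, 3])
     print("Is CUDA (CPU Tensor):", tensor_cpu.is_cuda)

     # Moving to the GPU only works when CUDA is available
     if torch.cuda.is_available():
         tensor_gpu = tensor_cpu.to('cuda')
         print("Is CUDA (GPU Tensor):", tensor_gpu.is_cuda)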
     ```

8. **`.numel()`**:
   - **Description**: Returns the total number of elements in the tensor.
   - **Example**:
     ```python
     tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])
     print("Number of Elements:", tensor.numel())
     ```

9. **`.ndimension()`**:
   - **Description**: Returns the number of dimensions of the tensor.
   - **Example**:
     ```python
     tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])
     print("Number of Dimensions:", tensor.ndimension())
     ```

10. **`.stride()`**:
    - **Description**: Returns, for each dimension, the number of elements to skip in memory to move to the next index along that dimension.
    - **Example**:
      ```python
      tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])
      print("Stride:", tensor.stride())
      ```
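
The `.device` and `.is_cuda` attributes above are usually combined with a small device-selection step at the top of a script. The following is a minimal sketch of that pattern, assuming we prefer CUDA, then MPS, then the CPU; the preference order and the variable names are illustrative choices rather than a fixed convention.

```python
import torch

# Pick the best available backend, falling back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Create a tensor on the CPU, then move it to the selected device.
tensor = torch.tensor([1.0, 2.0, 3.0])
tensor = tensor.to(device)

print("Selected device:", device)
print("Tensor device:", tensor.device)
print("Is CUDA:", tensor.is_cuda)
```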

**Important Notes**

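- **Setting `requires_grad=True`** is essential for **automatic differentiation**: it tells autograd to record the operations applied to the tensor so that `.backward()` can compute its gradient, which optimizers like `torch.optim.SGD` then use. Calling `.backward()` on a result that does not depend on any such tensor raises an error.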

- **Specifying `device='cuda'` or `device='mps'`** ensures tensors are allocated on the **GPU (NVIDIA)** or **Apple Silicon GPU**, which can **drastically speed up training**. Mismatched device usage (e.g., mixing CPU and GPU tensors) will raise errors.

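- **Use `.to(device)`** (or `.cuda()` for NVIDIA GPUs; tensors have no `.mps()` method, so use `.to('mps')` on Apple Silicon) to move models and data to the appropriate computation device before training. Here `model` and `input_tensor` stand for an existing `torch.nn.Module` and an existing input tensor:
  ```python
  device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
  model.to(device)
  input_tensor = input_tensor.to(device)
  ```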

#### Indexing and slicing 

In [12]:
# Retrieve the second row of the tensor (a single index selects along the first dimension)
print(random_tensor[1])
# Retrieve the last element of the tensor
print(random_tensor[-1, -1])

# Retrieve a slice of the tensor
print(random_tensor[0:2, 0:2])  # First two rows and first two columns
# Retrieve a specific row
print(random_tensor[1, :])  # Second row
# Retrieve a specific column
print(random_tensor[:, 1])  # Second column

tensor([0.7499, 0.3426, 0.2318])
tensor(0.6116)
tensor([[0.4276, 0.5639],
        [0.7499, 0.3426]])
tensor([0.7499, 0.3426, 0.2318])
tensor([0.5639, 0.3426, 0.9344])
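
One detail worth knowing about the slices above is that basic indexing returns a view of the original tensor, while boolean-mask indexing returns a copy. The following sketch illustrates the difference on a small integer tensor; the tensor `t` and the other names are made up for this example.

```python
import torch

t = torch.arange(9).reshape(3, 3)

# Basic slicing returns a view: writing through it changes the original tensor.
first_row = t[0, :]
first_row[0] = 100
print("Original after writing through the slice:\n", t)

# Boolean-mask indexing returns a copy containing only the selected elements.
evens = t[t % 2 == 0]
evens[0] = -1
print("Even elements (copy):", evens)
print("Original is unchanged by edits to the copy:", t[0, 0].item())

# .clone() makes an explicit copy when a slice must not alias the original.
safe_row = t[1, :].clone()
safe_row[0] = -5
print("Second row of original:", t[1, :])
```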


#### Basic Operations 

Now that we know how to create tensors, let us explore how to apply scalar operations to each element of a tensor.

In [16]:
print("Tensor from List:\n", tensor_from_list)
# Perform basic operations
print("Addition:\n", tensor_from_list + 2)  # Add 2 to each element
print("Subtraction:\n", tensor_from_list - 1)  # Subtract 1 from each element
print("Multiplication:\n", tensor_from_list * 3)  # Multiply each element by 3
print("Division:\n", tensor_from_list / 2)  # Divide each element by 2  
print("Square:\n", tensor_from_list ** 2)  # Square each element
print("Apply exp for each element", tensor_from_list.exp())

Tensor from List:
 tensor([[1, 2],
        [3, 4]])
Addition:
 tensor([[3, 4],
        [5, 6]])
Subtraction:
 tensor([[0, 1],
        [2, 3]])
Multiplication:
 tensor([[ 3,  6],
        [ 9, 12]])
Division:
 tensor([[0.5000, 1.0000],
        [1.5000, 2.0000]])
Square:
 tensor([[ 1,  4],
        [ 9, 16]])
Apply exp for each element tensor([[ 2.7183,  7.3891],
        [20.0855, 54.5982]])


In [None]:
# Reshape a tensor
reshaped_tensor = random_tensor.view(1, 9)
print("Reshaped Tensor:\n", reshaped_tensor)
# Reshape a tensor to a different shape
reshaped_tensor_2 = random_tensor.reshape(9, 1)
print("Reshaped Tensor (9, 1):\n", reshaped_tensor_2)
# Difference between view and reshape: view always returns a tensor that shares data
# with the original and requires a compatible (e.g. contiguous) memory layout,
# while reshape returns a view when possible and copies the data otherwise.
# Flatten a tensor
flattened_tensor = random_tensor.flatten()
print("Flattened Tensor:\n", flattened_tensor)

# Transpose a tensor
transposed_tensor = random_tensor.t()
print("Transposed Tensor:\n", transposed_tensor)

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
# Concatenate tensors
concatenated_tensor = torch.cat((a, b), dim=0)
print("Concatenated Tensor:\n", concatenated_tensor)

# Example of stacking along a new dimension (dim=1): a and b become the columns
verticle_stack = torch.stack([a, b], dim=1)
print("Verticle Stacked Tensor:\n", verticle_stack)
# example of splitting 
split_tensor = torch.split(tensor=random_tensor, split_size_or_sections=3, dim=0)
print("Split Tensor:\n", split_tensor)

# example of chunking 

chunk_tensor = random_tensor.chunk(chunks=3, dim=1)
print("Chunked Tensor:\n", chunk_tensor)

# Difference between split and chunk: split takes the size of each piece (or a list of sizes),
# while chunk takes the number of pieces; in both cases the last piece may be smaller.

Reshaped Tensor:
 tensor([[0.4276, 0.5639, 0.3950, 0.7499, 0.3426, 0.2318, 0.4370, 0.9344, 0.6116]])
Reshaped Tensor (9, 1):
 tensor([[0.4276],
        [0.5639],
        [0.3950],
        [0.7499],
        [0.3426],
        [0.2318],
        [0.4370],
        [0.9344],
        [0.6116]])
Flattened Tensor:
 tensor([0.4276, 0.5639, 0.3950, 0.7499, 0.3426, 0.2318, 0.4370, 0.9344, 0.6116])
Transposed Tensor:
 tensor([[0.4276, 0.7499, 0.4370],
        [0.5639, 0.3426, 0.9344],
        [0.3950, 0.2318, 0.6116]])
Concatenated Tensor:
 tensor([1, 2, 3, 4, 5, 6])
Verticle Stacked Tensor:
 tensor([[1, 4],
        [2, 5],
        [3, 6]])
Split Tensor:
 (tensor([[0.4276, 0.5639, 0.3950],
        [0.7499, 0.3426, 0.2318],
        [0.4370, 0.9344, 0.6116]]),)
Chunked Tensor:
 (tensor([[0.4276],
        [0.7499],
        [0.4370]]), tensor([[0.5639],
        [0.3426],
        [0.9344]]), tensor([[0.3950],
        [0.2318],
        [0.6116]]))
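
The differences between `view` and `reshape`, and between `split` and `chunk`, do not show up on the 3x3 example above because every shape divides evenly and the tensor is contiguous. The following is a small sketch on inputs chosen to expose them; the tensors `m`, `mt`, and `v` are made up for this illustration.

```python
import torch

m = torch.arange(6).reshape(2, 3)
mt = m.t()  # transposing yields a non-contiguous view of the same storage

# view never copies, so it fails when the memory layout cannot express the new shape.
try:
    mt.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True
print("view on a non-contiguous tensor failed:", view_failed)

# reshape falls back to copying the data when a view is not possible.
flat = mt.reshape(6)
print("reshape result:", flat)

# split takes the size of each piece; chunk takes the number of pieces.
v = torch.arange(5)
split_sizes = [p.numel() for p in torch.split(v, 2)]
chunk_sizes = [p.numel() for p in torch.chunk(v, 2)]
print("split(v, 2) piece sizes:", split_sizes)
print("chunk(v, 2) piece sizes:", chunk_sizes)
```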


In [26]:
# Element-wise comparison
a = torch.tensor([1, 2, 3])
b = torch.tensor([3, 2, 1])
comparison = torch.eq(a, b)
print("Element-wise Comparison (Equal):\n", comparison)

# Element-wise greater than comparison
greater_than = torch.gt(a, b)
print("Element-wise Comparison (Greater than):\n", greater_than)

# Using torch.where to select elements based on condition
condition = a > b
selected_elements = torch.where(condition, a, b)
print(a,b,selected_elements)
print("Selected Elements (using where):\n", selected_elements)

# conditional tensors 
# Check if all elements satisfy a condition
all_elements_greater = torch.all(a > 0)
print("All elements are greater than 0:\n", all_elements_greater)

# Check if any element satisfies a condition
any_element_greater = torch.any(a > 2)
print("Any element is greater than 2:\n", any_element_greater)

# Masking tensor elements based on condition
masked_tensor = torch.masked_select(a, a > 1)
print("Masked Tensor (elements > 1):\n", masked_tensor)

# Replacing elements based on condition
replaced_tensor = torch.where(a > 2, torch.tensor(-1), a)
print("Replaced Tensor (elements > 2 replaced with -1):\n", replaced_tensor)

Element-wise Comparison (Equal):
 tensor([False,  True, False])
Element-wise Comparison (Greater than):
 tensor([False, False,  True])
tensor([1, 2, 3]) tensor([3, 2, 1]) tensor([3, 2, 3])
Selected Elements (using where):
 tensor([3, 2, 3])
All elements are greater than 0:
 tensor(True)
Any element is greater than 2:
 tensor(True)
Masked Tensor (elements > 1):
 tensor([2, 3])
Replaced Tensor (elements > 2 replaced with -1):
 tensor([ 1,  2, -1])
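
A common use of element-wise comparison is measuring how many predictions match their labels. The sketch below computes a classification accuracy this way; the `predictions` and `targets` tensors are hypothetical values invented for the example.

```python
import torch

# Hypothetical predicted and true class labels for six samples.
predictions = torch.tensor([0, 2, 1, 1, 0, 2])
targets = torch.tensor([0, 1, 1, 1, 0, 0])

# Element-wise equality gives a boolean mask of correct predictions.
correct = torch.eq(predictions, targets)

# Booleans sum as 0/1, so the count and the fraction follow directly.
num_correct = correct.sum().item()
accuracy = correct.float().mean().item()

print("Correct mask:", correct)
print("Number correct:", num_correct)
print("Accuracy:", accuracy)
```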


#### Broadcasting 

We already understand how element-wise binary operations are performed on two tensors of the same shape. Under certain conditions, we can still perform element-wise binary operations on tensors of different shapes by invoking the broadcasting mechanism. It works in two steps:

1. Expand one or both tensors along axes of length 1 so that both have the same shape. No data is actually copied; the expansion behaves like a view.

2. Perform the element-wise operation on the resulting tensors. Let us see some examples below.

In [21]:
A = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])       # Shape: (2, 3)

B = torch.tensor([[10],
                  [100]])           # Shape: (2, 1)

C = A * B

print("Element-wise multiplication result:\n", C)
# Broadcasting allows us to perform operations on tensors of different shapes.
# In this case, B is broadcasted to match the shape of A during multiplication.
# Broadcasting works by expanding the smaller tensor along the dimensions of the larger tensor. 
# Example of broadcasting
A = torch.tensor([[10],
                  [20],
                  [30]])  # Shape: (3, 1)
B = torch.tensor([[1, 2]])  # Shape: (1, 2)

C = A + B
print("Broadcasting result:\n", C)
# In this case, both tensors are broadcast to the common shape (3, 2) before the addition.


Element-wise multiplication result:
 tensor([[ 10,  20,  30],
        [400, 500, 600]])
Broadcasting result:
 tensor([[11, 12],
        [21, 22],
        [31, 32]])
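
To make the shape-matching step concrete, here is a sketch of the broadcasting rule in plain Python: shapes are compared from the trailing dimension backwards, and two sizes are compatible when they are equal or one of them is 1. The helper `broadcast_shape` is a hypothetical function written for this tutorial, not part of PyTorch.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or raise ValueError."""
    result = []
    # Compare dimensions from the trailing (rightmost) one backwards.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1  # missing dims count as 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        elif dim_b == 1:
            result.append(dim_a)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not broadcastable")
    return tuple(reversed(result))


print(broadcast_shape((2, 3), (2, 1)))   # shapes from the A * B example above
print(broadcast_shape((3, 1), (1, 2)))   # shapes from the A + B example above
print(broadcast_shape((4, 3), (3,)))     # a missing leading dimension is treated as 1
```

Shapes such as `(2, 3)` and `(2,)` are rejected, because the trailing sizes 3 and 2 differ and neither is 1.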


#### Saving Memory 
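
Running operations can cause new memory to be allocated to hold the results. In the example below, `Y = Y + X` creates a new tensor and rebinds the name `Y` to it, which Python's `id()` reveals.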


In [23]:
X = torch.tensor([1,2,3])
Y = torch.tensor([4,5,6])
before = id(Y)
Y = Y + X
after = id(Y)
print("Before operation, Y ID:", before)
print("After operation, Y ID:", after)

Before operation, Y ID: 4378512944
After operation, Y ID: 5754938064


This behavior is often undesirable in machine learning: a model can have thousands of parameters or more, and allocating new memory for every update duplicates memory unnecessarily. Fortunately, we can update values in place, without allocating a new tensor, by assigning into a slice:

In [24]:
before = id(Y)
Y[:] = Y + X
after = id(Y)

print("Before inplace operation, Y ID:", before)
print("After inplace operation, Y ID:", after)

Before inplace operation, Y ID: 5754938064
After inplace operation, Y ID: 5754938064
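
Slice assignment is not the only way to update a tensor in place. The sketch below shows three alternatives (a method with a trailing underscore, augmented assignment, and the `out=` argument) and checks the storage address with `data_ptr()`, which is a more direct signal than `id()`. The variable names are chosen for illustration.

```python
import torch

X = torch.tensor([1, 2, 3])
Y = torch.tensor([4, 5, 6])

# data_ptr() is the address of the first element in the underlying storage.
ptr_before = Y.data_ptr()

# Methods with a trailing underscore modify the tensor in place.
Y.add_(X)
same_after_add_ = Y.data_ptr() == ptr_before

# Augmented assignment on a tensor is also performed in place.
Y += X
same_after_iadd = Y.data_ptr() == ptr_before

# Many functions accept out= to write the result into an existing tensor.
Z = torch.zeros_like(Y)
z_ptr = Z.data_ptr()
torch.add(X, Y, out=Z)

print("Y:", Y)
print("Storage reused by add_:", same_after_add_)
print("Storage reused by +=:", same_after_iadd)
print("Z:", Z, "storage reused:", Z.data_ptr() == z_ptr)
```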


#### Conversion to python objects 

In [25]:
# conversion to python objects
# We can convert a tensor to a Python object using the .item() method for single-element tensors or the .tolist() method for multi-element tensors.
single_element_tensor = torch.tensor(42)
multi_element_tensor = torch.tensor([1, 2, 3, 4]) 

# Convert single-element tensor to Python object
single_element_value = single_element_tensor.item()
print("Single Element Tensor to Python Object:", single_element_value, float(single_element_value), int(single_element_value))
# Convert multi-element tensor to Python list
multi_element_list = multi_element_tensor.tolist()
print("Multi Element Tensor to Python List:", multi_element_list, list(multi_element_list), tuple(multi_element_list))
# Convert multi-element tensor to numpy array
multi_element_numpy = multi_element_tensor.numpy()
print("Multi Element Tensor to Numpy Array:", multi_element_numpy, type(multi_element_numpy))


Single Element Tensor to Python Object: 42 42.0 42
Multi Element Tensor to Python List: [1, 2, 3, 4] [1, 2, 3, 4] (1, 2, 3, 4)
Multi Element Tensor to Numpy Array: [1 2 3 4] <class 'numpy.ndarray'>
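
When converting between tensors and NumPy arrays, it helps to know that on the CPU the two objects share the same memory, so a change to one is visible in the other. The following sketch illustrates this, along with the need to call `detach()` first on tensors that track gradients; the variable names are made up for the example.

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0])

# On the CPU, .numpy() returns an array that shares memory with the tensor.
arr = t.numpy()
t[0] = 10.0
print("Array sees the in-place tensor update:", arr)

# torch.from_numpy goes the other way and also shares memory.
back = torch.from_numpy(arr)
arr[1] = 20.0
print("Tensor sees the array update:", back)

# A tensor that tracks gradients must be detached before conversion.
g = torch.tensor([1.0, 2.0], requires_grad=True)
try:
    g.numpy()
    needs_detach = False
except RuntimeError:
    needs_detach = True
print("Direct .numpy() on a grad-tracking tensor failed:", needs_detach)
print("After detach:", g.detach().numpy())
```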


### Linear Algebra 

We have already seen many scalar and vector operations above: manipulating a single value is a scalar operation, while manipulating a fixed-length array of scalars is a vector operation.

Scalars are 0th-order tensors, vectors are 1st-order tensors, and matrices are 2nd-order tensors.

In [27]:
# Some examples of matrix operations

A = torch.arange(1, 10).reshape(3, 3)

print("Matrix A:\n", A)
A = A.T

print("Transposed Matrix A:\n", A)

Matrix A:
 tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Transposed Matrix A:
 tensor([[1, 4, 7],
        [2, 5, 8],
        [3, 6, 9]])


#### Tensors 

A tensor is the general way of describing an n-th order array; the name is used in particular for arrays of order greater than 2. Tensors become essential when we start working with images, which are generally represented as 3rd-order tensors (for example height, width, and color channel).

The general notation for an element of a 3rd-order tensor is $X_{i,j,k}$, where $i$, $j$, and $k$ are the indices of the element along each of the three dimensions.

In [28]:
torch.arange(36).reshape(3, 4, 3)

tensor([[[ 0,  1,  2],
         [ 3,  4,  5],
         [ 6,  7,  8],
         [ 9, 10, 11]],

        [[12, 13, 14],
         [15, 16, 17],
         [18, 19, 20],
         [21, 22, 23]],

        [[24, 25, 26],
         [27, 28, 29],
         [30, 31, 32],
         [33, 34, 35]]])
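
To connect the notation $X_{i,j,k}$ with the tensor printed above, the sketch below reads one element by its three indices and then recovers the same element from the flattened data using the tensor's strides. The particular indices `(1, 2, 0)` are an arbitrary choice.

```python
import torch

X = torch.arange(36).reshape(3, 4, 3)

# X[i, j, k] picks one element: block i, row j, column k.
i, j, k = 1, 2, 0
element = X[i, j, k].item()

# For a contiguous tensor, the strides say how far to jump in memory per index step.
stride_i, stride_j, stride_k = X.stride()
flat_index = i * stride_i + j * stride_j + k * stride_k

print("X[1, 2, 0]:", element)
print("Strides:", X.stride())
print("Flat index from strides:", flat_index)
print("Same element via the flattened tensor:", X.flatten()[flat_index].item())
```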

#### Reduction 



In [None]:
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])  # Shape: (2 rows, 3 columns)

# 1. Sum
print("Sum of all elements:", torch.sum(x))  # → scalar
print("Sum along dim=0 (column-wise reduction):", torch.sum(x, dim=0))
print("Sum along dim=1 (row-wise reduction):", torch.sum(x, dim=1))

# 2. Mean
print("Mean of all elements:", torch.mean(x.float()))
print("Mean along dim=1 (mean per row):", torch.mean(x.float(), dim=1))

# 3. Max and Min
max_vals, max_indices = torch.max(x, dim=1)  # max of each row (dim=1)
print("Max values per row:", max_vals)
print("Max indices per row:", max_indices)

min_vals, min_indices = torch.min(x, dim=0)  # min of each column (dim=0)
print("Min values per column:", min_vals)
print("Min indices per column:", min_indices)

# 4. Argmax and Argmin
print("Argmax along dim=0 (per column):", torch.argmax(x, dim=0))
print("Argmin along dim=1 (per row):", torch.argmin(x, dim=1))

# 5. Product
y = torch.tensor([1, 2, 3, 4])
print("Product of elements:", torch.prod(y))  # → scalar 24

# 6. Standard Deviation and Variance
z = torch.tensor([1.0, 2.0, 3.0, 4.0])
print("Standard Deviation:", torch.std(z))
print("Variance:", torch.var(z))

Sum of all elements: tensor(21)
Sum along dim=0 (column-wise reduction): tensor([5, 7, 9])
Sum along dim=1 (row-wise reduction): tensor([ 6, 15])
Mean of all elements: tensor(3.5000)
Mean along dim=1 (mean per row): tensor([2., 5.])
Max values per row: tensor([3, 6])
Max indices per row: tensor([2, 2])
Min values per column: tensor([1, 2, 3])
Min indices per column: tensor([0, 0, 0])
Argmax along dim=0 (per column): tensor([1, 1, 1])
Argmin along dim=1 (per row): tensor([0, 0])
Product of elements: tensor(24)
Standard Deviation: tensor(1.2910)
Variance: tensor(1.6667)


**Note**

In the above example, axis and dimension refer to the same concept in PyTorch: they specify along which dimension the reduction is performed.

For a tensor with n dimensions, the dimensions are indexed from 0 to n-1.
When we specify `dim=k` in a reduction, the reduction is performed along the k-th dimension: that dimension is collapsed, and the other dimensions are kept.

For example, for a tensor of shape (2, 3), `dim=0` collapses the rows and yields one value per column (shape (3,)), while `dim=1` collapses the columns and yields one value per row (shape (2,)).

In [None]:
# 7. Keepdim Example

sum_x = torch.sum(x, dim=1, keepdim=True)
print("Sum with keepdim (preserve dim=1):", sum_x, sum_x.shape )

sum_y = torch.sum(x, dim=1)
print("Sum without keepdim (dim=1):", sum_y, sum_y.shape )


Sum with keepdim (preserve dim=1): tensor([[ 6],
        [15]]) torch.Size([2, 1])
Sum without keepdim (dim=1): tensor([ 6, 15]) torch.Size([2])


**Note**

As you can see, when keepdim=True is used, the original dimension is preserved in the result with size 1.
This means the shape of sum_x will be (2, 1) instead of being reduced to (2,).

This is especially useful for broadcasting in later operations, where you want to keep the tensor shape consistent for elementwise computation.

If keepdim=False (default), the reduced dimension is removed entirely, which can make broadcasting harder or require reshaping.
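
As an illustration of why `keepdim=True` helps with broadcasting, the sketch below normalizes each row of a matrix so that it sums to 1. The input values are the same as in the reduction example, converted to floats; the other names are chosen for this example.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])  # Shape: (2, 3)

# With keepdim=True the row sums have shape (2, 1), which broadcasts against (2, 3).
row_sums = x.sum(dim=1, keepdim=True)
normalized = x / row_sums
print("Row-normalized:\n", normalized)
print("Each row now sums to:", normalized.sum(dim=1))

# Without keepdim the row sums have shape (2,), which does not line up with (2, 3).
try:
    x / x.sum(dim=1)
    broadcast_ok = True
except RuntimeError:
    broadcast_ok = False
print("Division without keepdim succeeded:", broadcast_ok)
```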

#### Dot Product 

In [33]:
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

dot = torch.dot(a, b)
print("Dot product:", dot)

Dot product: tensor(32)


The dot product is a reduction operation that multiplies two vectors elementwise and then sums the result:

$$
\text{dot}(a, b) = (1 \times 4) + (2 \times 5) + (3 \times 6) = 4 + 10 + 18 = 32
$$

In PyTorch, `torch.dot()` is used specifically for **1D tensors (vectors)** of equal length.  
It returns a **scalar (0-D tensor)** as the result.

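This is different from **elementwise multiplication** (`a * b`), which returns a tensor of the same shape as its inputs, and from **matrix multiplication** (`torch.matmul` or `@`) of two matrices, which combines rows with columns and returns a matrix.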

---

##### Dot Product Applications

Dot products are useful in a wide range of contexts.  
For example, given a vector of values **x** and a vector of weights **w**,  
the **weighted sum** of the values in **x** according to the weights **w** can be expressed as the dot product:

$$
\mathbf{x} \cdot \mathbf{w} = \sum_{i=1}^n x_i w_i
$$

##### Weighted Average

When the weights are **nonnegative** and **sum to 1**, i.e.,

$$
w_i \geq 0 \quad \text{for all } i, \quad \text{and} \quad \sum_{i=1}^n w_i = 1,
$$

the dot product $\mathbf{x} \cdot \mathbf{w}$ expresses a **weighted average** of the elements in $\mathbf{x}$.

##### Cosine Similarity
The dot product is related to the angle $\theta$ between two vectors by

$$
\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \, \|\mathbf{b}\| \cos(\theta)
$$

So after **normalizing** two vectors $\mathbf{a}$ and $\mathbf{b}$ to have unit length:

$$
\|\mathbf{a}\| = \|\mathbf{b}\| = 1,
$$

their dot product becomes:

$$
\mathbf{a} \cdot \mathbf{b} = \cos(\theta)
$$

where $\theta$ is the angle between the two vectors.  
This gives a **cosine similarity** measure ranging from $-1$ to $1$.

We will formally introduce this notion of **vector length** (also called the **L2 norm**) later in this section.
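
The weighted average and cosine similarity described above can be sketched in a few lines. The weights `w` below are hypothetical values chosen to be nonnegative and sum to 1, and `torch.nn.functional.cosine_similarity` is used only as a cross-check for the manual computation.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# Weighted average: the weights are nonnegative and sum to 1.
w = torch.tensor([0.2, 0.3, 0.5])
weighted_avg = torch.dot(a, w)
print("Weighted average of a:", weighted_avg)

# Cosine similarity: normalize both vectors to unit length, then take the dot product.
a_unit = a / torch.linalg.norm(a)
b_unit = b / torch.linalg.norm(b)
cos_manual = torch.dot(a_unit, b_unit)

# PyTorch's built-in helper, for comparison.
cos_builtin = F.cosine_similarity(a, b, dim=0)

print("Cosine similarity (manual):", cos_manual)
print("Cosine similarity (built-in):", cos_builtin)
```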

#### Matrix Multiplication 

In [None]:

# Define 2 matrices
A = torch.arange(1, 13).view(3,4)   # Shape: (3, 4)

B = torch.arange(1, 13).view(4,3)   # Shape: (4, 3)

# Matrix multiplication
C = torch.matmul(A, B)       # or: C = A @ B

print("Result of A @ B:\n", C)
print("Shape:", C.shape)

Result of A @ B:
 tensor([[ 70,  80,  90],
        [158, 184, 210],
        [246, 288, 330]])
Shape: torch.Size([3, 3])



Matrix multiplication involves taking the **dot product** of rows from the first matrix with columns from the second matrix.

If:
- Matrix **A** has shape $(m \times n)$
- Matrix **B** has shape $(n \times p)$  
Then:
- The result **C = A @ B** will have shape $(m \times p)$

##### Equation

$$
C_{ij} = \sum_{k=1}^{n} A_{ik} \cdot B_{kj}
$$

That is, each element $C_{ij}$ in the result matrix is the dot product of:
- The **i-th row** of matrix **A**
- The **j-th column** of matrix **B**
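
The equation above can be checked directly by filling in the result one entry at a time. The following sketch does this with explicit loops for the same `A` and `B` as in the example; it is meant to illustrate the definition and is far slower than `torch.matmul` in practice.

```python
import torch

A = torch.arange(1, 13).reshape(3, 4)  # Shape: (3, 4)
B = torch.arange(1, 13).reshape(4, 3)  # Shape: (4, 3)

m, n = A.shape
p = B.shape[1]

# Fill C one entry at a time: C[i, j] = sum_k A[i, k] * B[k, j].
C_manual = torch.zeros(m, p, dtype=A.dtype)
for i in range(m):
    for j in range(p):
        C_manual[i, j] = (A[i, :] * B[:, j]).sum()

print("Manual result:\n", C_manual)
print("Matches A @ B:", torch.equal(C_manual, A @ B))
```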

#### Norms

In [35]:
# Define a vector
v = torch.tensor([3.0, -4.0])

# L1 norm: sum of absolute values
l1_norm = torch.norm(v, p=1)
print("L1 Norm (Manhattan Distance):", l1_norm)

# L2 norm: square root of sum of squares
l2_norm = torch.norm(v, p=2)
print("L2 Norm (Euclidean Distance):", l2_norm)

L1 Norm (Manhattan Distance): tensor(7.)
L2 Norm (Euclidean Distance): tensor(5.)


In deep learning, we often solve optimization problems: we maximize the probability assigned to the observed data, or minimize the distance between predictions and ground-truth observations. Distances such as the L1 and L2 norms above are often used to construct the objectives of deep learning algorithms.
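
To see how norms turn into training objectives, the sketch below relates the L1 and L2 distances between predictions and targets to the mean absolute error and mean squared error. The `pred` and `target` values are hypothetical, and `torch.linalg.norm` is used here in place of `torch.norm`; the built-in losses serve only as a cross-check.

```python
import torch
import torch.nn.functional as F

# Hypothetical predictions and ground-truth values.
pred = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])
error = pred - target

# L1 and L2 distances between predictions and targets.
l1_dist = torch.linalg.norm(error, ord=1)
l2_dist = torch.linalg.norm(error, ord=2)

# Mean absolute error is the L1 distance divided by the number of elements;
# mean squared error is the squared L2 distance divided by the number of elements.
mae = l1_dist / error.numel()
mse = l2_dist ** 2 / error.numel()

print("L1 distance:", l1_dist, "L2 distance:", l2_dist)
print("MAE:", mae, "built-in:", F.l1_loss(pred, target))
print("MSE:", mse, "built-in:", F.mse_loss(pred, target))
```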

### Automatic differentiation 

While teaching full calculus is out of scope for this course, we will focus on the basics of automatic differentiation, a powerful tool that PyTorch provides to handle gradient computation automatically.

Gradients are essential for optimizing neural networks using methods like gradient descent. In PyTorch, we don't need to compute partial derivatives manually. Instead, PyTorch records the operations applied to tensors and uses reverse-mode automatic differentiation when `.backward()` is called.



In [36]:
# Scalar example
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 2 * x + 1   # Define a simple function
y.backward()             # Computes dy/dx
print("dy/dx:", x.grad)  # Output: 3x^2 + 2 = 3*4 + 2 = 14

dy/dx: tensor(14.)


In [37]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # y = x₁² + x₂² + x₃²
y.backward()
print("dy/dx:", x.grad)  # dy/dx = 2x → [2., 4., 6.]

dy/dx: tensor([2., 4., 6.])
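
A simple way to build trust in autograd is to compare its result with a numerical approximation. The sketch below repeats the scalar example and checks the gradient against a central finite difference; the step size `h` and the use of `float64` are choices made for this illustration.

```python
import torch


def f(x):
    # The same function as in the scalar example above.
    return x ** 3 + 2 * x + 1


# Gradient from autograd.
x = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)
f(x).backward()
autograd_grad = x.grad.item()

# Central finite-difference approximation: (f(x + h) - f(x - h)) / (2h).
h = 1e-5
x0 = 2.0
numeric_grad = (f(x0 + h) - f(x0 - h)) / (2 * h)

print("Autograd gradient:", autograd_grad)
print("Finite-difference gradient:", numeric_grad)
```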


##### detach 
In PyTorch, `detach()` returns a new tensor that shares data with the original but is excluded from gradient tracking. This is useful when:
- You want to freeze a tensor so it won’t affect gradient updates.
- You need to convert a tensor to a NumPy array without tracking gradients.
- You want to reuse a value from a forward pass without backpropagating through it again.


In [38]:

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = x * 2

# y is part of the autograd graph; the detached tensor z is not and does not require grad
z = y.detach()  # z shares data with y, but has no autograd history
print("Requires grad for z:", z.requires_grad)  # False

Requires grad for z: False
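
The effect of `detach()` on gradient flow is easiest to see in a small computation. In the sketch below, `u` carries the values of `y` but is treated as a constant during backpropagation, so the gradient of `z = u * x` with respect to `x` is `u` rather than `3 * x**2`. The variable names are chosen for this example.

```python
import torch

x = torch.arange(4.0, requires_grad=True)  # [0., 1., 2., 3.]
y = x * x

# u has the same values as y but is treated as a constant by autograd.
u = y.detach()
z = u * x

# Since u is a constant, dz/dx = u rather than 3 * x**2.
z.sum().backward()
print("x.grad:", x.grad)
print("Equals u:", torch.equal(x.grad, u))

# Gradients still flow through y itself.
x.grad.zero_()
y.sum().backward()
print("Gradient through y:", x.grad)  # dy/dx = 2x
```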


##### In place gradient reset 

After each .backward() call, gradients accumulate (i.e., they are added, not overwritten).
To avoid incorrect gradient updates, you must zero the gradients manually before the next backward pass.

In [43]:
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

for _ in range(2):
    y = (w ** 2).sum()
    y.backward()
    print("Gradients:", w.grad)  # Accumulates across iterations
w.grad.zero_()  # Reset the accumulated gradients before the second experiment
for _ in range(2):
    y = (w ** 2).sum()
    y.backward()
    print("Gradients:", w.grad)  # Same value each time, because we reset below
    w.grad.zero_()  # Zero the gradients after each iteration

Gradients: tensor([2., 4., 6.])
Gradients: tensor([ 4.,  8., 12.])
Gradients: tensor([2., 4., 6.])
Gradients: tensor([2., 4., 6.])
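
In a training loop, the reset is usually done through the optimizer rather than on each tensor. The following is a minimal sketch using `torch.optim.SGD` to minimize a toy loss $(w - 3)^2$; the loss, the learning rate, and the number of steps are assumptions made for this illustration.

```python
import torch

# A single parameter, fitted so that (w - 3)^2 is minimized.
w = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for step in range(50):
    optimizer.zero_grad()      # clear gradients left over from the previous step
    loss = (w - 3.0) ** 2
    loss.backward()            # populate w.grad
    optimizer.step()           # w <- w - lr * w.grad

print("Fitted w:", w.item())
print("Final loss:", loss.item())
```

Without the `optimizer.zero_grad()` call, each step would use the sum of all gradients computed so far, as in the accumulation example above.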


### Notable Resources

We cannot cover every PyTorch function and class here. Please use the resources below to explore the PyTorch API and its use cases in more depth.

1. [PyTorch Documentation](https://docs.pytorch.org/docs/stable/index.html)
2. [d2l](https://d2l.ai/chapter_preliminaries/index.html)
3. [PyTorch Tutorials](https://docs.pytorch.org/tutorials/)