<a href="https://colab.research.google.com/github/iamrahill7/Let-s-learn-PyTorch/blob/main/PyTorch_Basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to PyTorch

## History of PyTorch

* In 2002, **Torch** was introduced as an open-source framework for scientific computing and machine learning, built around fast tensor operations (GPU support arrived in later versions). It was mainly designed for researchers who needed a powerful numerical framework.
* Several early deep learning projects, notably at Facebook AI Research, were built with Torch.

However, Torch had a big problem that limited its adoption:  

1. **Lua based programming:**

  * It used **Lua**, a programming language that was far less popular than Python.  

  * Developers had to learn Lua before they could build anything, which put Torch out of reach for many people.  

  * Researchers at Facebook AI Research (now Meta AI) saw that Torch was excellent at **tensor-based operations**, which are essential for deep learning. Because Lua was a barrier, they created a library that combined Torch’s power with the ease of Python.  

  * This led to the birth of **PyTorch**, a Python-friendly successor to Torch that made deep learning more accessible and widely adopted.

## 🤔 Why PyTorch?


PyTorch became popular because it made deep learning easier, faster, and more flexible than older frameworks such as Torch (Lua-based) and TensorFlow before version 2.0.

It provides three major advantages:

  - **Tensor Computation** – Handles numerical data efficiently.

  - **GPU Acceleration** – Uses GPUs to speed up processing. PyTorch allows us to move tensor operations to the GPU, which processes them much faster.

  - **Dynamic Computation Graph** – The graph of operations is built as the code runs, which makes model development more flexible and intuitive. *(The upcoming chapters include code examples for this concept. Be sure to check them out for a deeper understanding!)*
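
As a minimal sketch of what "dynamic" means in practice (the function name `forward` and the input values are illustrative assumptions, not taken from the later chapters): ordinary Python control flow decides which operations are recorded, and gradients follow whichever branch actually ran.

```python
import torch

def forward(x):
    # A plain Python `if` chooses the operations that get recorded in the graph
    if x.sum() > 0:
        return (x ** 2).sum()   # gradient of this branch is 2 * x
    return (-x).sum()           # gradient of this branch is -1 for every element

x_pos = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
forward(x_pos).backward()
print(x_pos.grad)  # gradient from the first branch

x_neg = torch.tensor([-1.0, -2.0], requires_grad=True)
forward(x_neg).backward()
print(x_neg.grad)  # gradient from the second branch
```

Because the graph is rebuilt on every call, the two inputs can take different paths without any special graph-definition syntax.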

### **PyTorch vs TensorFlow: Key Differences**  

### *Which one should you use?*


| Feature            | **PyTorch** 🔵 | **TensorFlow** 🟠 |
|-------------------|--------------|----------------|
| **Ease of Use** | Simple, Pythonic, beginner-friendly | More complex, but improved in TF 2.0 |
| **Computation Graph** | **Dynamic** (built on the fly, easy to debug) | **Static (TF 1.x)** & **Dynamic (TF 2.0)** |
| **Debugging** | Easy (uses eager execution) | Harder, requires `tf.function` for optimization |
| **Performance** | Fast to iterate with in research, slightly slower in deployment | Optimized for large-scale production & inference |
| **Deployment** | TorchServe (not as optimized) | **TensorFlow Serving, TensorFlow Lite, TensorFlow.js** |
| **Industry Adoption** | More common in **academia & research** | More common in **industry & production** |
| **GPU/TPU Support** | Strong GPU support, but TPU support is limited | Optimized for **GPUs & TPUs** |
| **Mobile & Edge AI** | Less support for mobile devices | Better support (**TensorFlow Lite**) |
| **Backed By** | Meta AI (Facebook) | Google |
| **Community & Learning** | Popular in **research & AI papers** | Popular in **companies running large-scale, production-level AI systems** |

### **Which One Should You Learn?**  
| **Goal** | **Recommended Framework** |
|---------|------------------------|
| **Beginner-friendly learning** | ✅ PyTorch |
| **Research & AI experiments** | ✅ PyTorch |
| **Large-scale production** | ✅ TensorFlow |
| **Mobile & Edge AI** | ✅ TensorFlow |
| **Industry Jobs** | ✅ TensorFlow (but knowing both helps) |

🚀 **Final Recommendation:** Start with **PyTorch** (easier), then learn **TensorFlow** if needed for deployment.

#### Note:

Writing code in **Keras (for TensorFlow) is easier** compared to raw TensorFlow, and **PyTorch is also easy to use**.  

### **Why?**  
✅ **Keras (for TensorFlow)** → Provides a **high-level API**, making it easier to build and train deep learning models with fewer lines of code.  
✅ **PyTorch** → Feels more like standard Python, making it intuitive and beginner-friendly.  

📌 **Keras (TF) and PyTorch both simplify deep learning**, but **Keras is best for TensorFlow users**, while **PyTorch is best for research and experimentation**.

---

## **What is a Tensor?**  

* A **tensor** is a **multi-dimensional array** used in deep learning and machine learning.
* It is similar to **arrays (NumPy)** or **matrices**, but it can have more dimensions.  
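
To make the NumPy comparison concrete, here is a small sketch (variable names are illustrative) that converts an array to a tensor with `torch.from_numpy` and back with `.numpy()`. On the CPU the two objects share the same memory, so a change made through one is visible through the other.

```python
import numpy as np
import torch

arr = np.array([[1, 2], [3, 4]])
t = torch.from_numpy(arr)   # tensor view over the same memory as `arr`

arr[0, 0] = 10              # modify the NumPy array in place
print(t)                    # the tensor reflects the change

back = t.numpy()            # convert back to a NumPy array (CPU tensors only)
print(type(back), back.shape)
```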




---


## **Types of Tensors (Scalar to 5D)**  

| **Tensor Type** | **Shape** | **Example** | **Usage** |
|---------------|---------|---------|---------|
| **Scalar (0D Tensor)** | `[]` | `5` | **Single number** (e.g., temperature, loss value) |
| **1D Tensor (Vector)** | `[N]` | `[5, 10, 15]` | **List of numbers** (e.g., features of a single data point) |
| **2D Tensor (Matrix)** | `[N, M]` | `[[1, 2], [3, 4]]` | **Matrix** (e.g., grayscale image, tabular data (rows & columns)) |
| **3D Tensor** | `[N, M, P]` | `[[[1,2],[3,4]],[[5,6],[7,8]]]` | **RGB Image** (Channels × Height × Width in PyTorch; many image libraries use Height × Width × Channels) |
| **4D Tensor** | `[Batch, Channels, Height, Width]` | `shape=[32, 3, 224, 224]` | **A batch of images processed together** (used in CNNs) |
| **5D Tensor** | `[Batch, Depth, Channels, Height, Width]` | `shape=[8, 10, 3, 224, 224]` | **A batch of videos (sequences of images)**, where Depth is the number of frames per video |
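
The rows of the table can be checked directly. The sketch below builds one tensor per row (the image and video sizes are shrunk, as an assumption, to keep memory use small) and prints its number of dimensions and shape.

```python
import torch

scalar = torch.tensor(5)                                      # 0D
vector = torch.tensor([5, 10, 15])                            # 1D
matrix = torch.tensor([[1, 2], [3, 4]])                       # 2D
cube = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])     # 3D
image_batch = torch.zeros(32, 3, 64, 64)                      # 4D: batch, channels, height, width
video_batch = torch.zeros(2, 10, 3, 64, 64)                   # 5D: batch, frames, channels, height, width

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix),
                ("cube", cube), ("image_batch", image_batch), ("video_batch", video_batch)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```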

---



## **Why Do We Need Tensors Instead of Arrays?**  

### **1. Tensors Work with GPUs for Faster Computation**  
- Machine learning tasks involve **millions or even billions of calculations**.  
- Running operations on a **GPU** makes computations significantly faster.  
- **NumPy arrays do not support this**, but tensors are optimized for GPU acceleration.  
- GPUs have thousands of cores, allowing for parallel processing: many calculations run at the same time instead of one after another.
- This is crucial for deep learning, where large datasets and complex models require high-speed computations. 🚀
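
A common way to use this in practice is to pick the device once and create tensors on it. The sketch below is one such pattern (the variable name `device` is a convention, not a requirement) and it falls back to the CPU when no CUDA GPU is available.

```python
import torch

# Use the GPU when one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(1000, 1000, device=device)
b = torch.randn(1000, 1000, device=device)
c = a @ b   # runs on whichever device holds `a` and `b`

print(c.device)
```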

### **2. Tensors Represent Real-World Data Efficiently**  
- Real-world data comes in different forms: **images, videos, text, and audio**.  
- These data types are naturally multi-dimensional (e.g., an image has height, width, and color channels).  
- **Tensors handle multi-dimensional data seamlessly**, making them ideal for deep learning applications.  

### **3. Tensors Support Advanced Mathematical Operations**  
- Machine learning involves **complex mathematical functions** like:  
  - Matrix multiplications  
  - Dot products  
  - Convolutions  
  - Transformations  
- Tensors have built-in support for these operations, making them essential for **neural networks and AI models**.  
- Unlike regular arrays, tensors **allow automatic differentiation**, which is crucial for training models.  
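
Automatic differentiation can be illustrated with a one-variable function. In this sketch (the function is chosen only for illustration), `y = x² + 3x`, so the derivative is `2x + 3`, which equals 7 at `x = 2`.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)   # track operations on x
y = x ** 2 + 3 * x                          # y = x^2 + 3x

y.backward()                                # compute dy/dx
print(x.grad)                               # dy/dx = 2x + 3 evaluated at x = 2
```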

---


In [None]:
# Let's code

# No need to install PyTorch: it comes pre-installed in Colab notebooks

import torch

print(torch.__version__)

2.6.0+cu124


### Creating a Tensor

In [None]:
# Using empty: allocates a 2x4 tensor without initializing its values

A = torch.empty(2,4)

In [None]:
# check type
type(A)

torch.Tensor

In [None]:
# Using zeros: creates a tensor filled with zeros

# Initializing weights and biases in neural networks:
# models have weights and biases that must be initialized before training.

# Bias vectors are often initialized as zero tensors,
# so that every unit starts from the same offset.

torch.zeros(3,3)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
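
As a sketch of the bias-initialization use case mentioned above, the snippet below zeroes the bias of a small `nn.Linear` layer with `nn.init.zeros_`. The layer sizes are arbitrary assumptions chosen for illustration.

```python
import torch
from torch import nn

layer = nn.Linear(in_features=4, out_features=2)   # bias has shape [2]
nn.init.zeros_(layer.bias)                         # overwrite the bias in place with zeros

print(layer.bias)
```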

In [None]:
# Using ones

torch.ones(2,7)

tensor([[1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1.]])

In [None]:
# Using rand: samples values uniformly from [0, 1)

# The values differ on every run because no seed is set

torch.rand(2,3)

tensor([[0.5319, 0.6946, 0.7776],
        [0.0654, 0.4336, 0.3366]])

In [None]:
# Therefore, use a seed.

# Machine learning models often involve random processes, such as initializing the weights of a neural network.

# If a model gives different results every time, debugging becomes difficult.

# Calling `torch.manual_seed(100)` makes the random numbers that follow reproducible, which makes issues easier to track.

torch.manual_seed(100)

torch.rand(2,3)


tensor([[0.1117, 0.8158, 0.2626],
        [0.4839, 0.6765, 0.7539]])
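
The effect of the seed can be checked by resetting it and drawing again. In this short sketch (variable names are illustrative), two draws after the same seed match, while consecutive draws without a reset do not.

```python
import torch

torch.manual_seed(100)
first = torch.rand(2, 3)
second = torch.rand(2, 3)    # generator has advanced, so this differs from `first`

torch.manual_seed(100)       # reset the generator to the same state
repeat = torch.rand(2, 3)

print(torch.equal(first, repeat))   # same seed, same values
print(torch.equal(first, second))   # no reset in between, different values
```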

In [None]:
### Assigning a dtype to a tensor (64-bit integers here)

torch.tensor([1.0, 2.0, 3.0], dtype=torch.int64)

tensor([1, 2, 3])

In [None]:
## Similarly, for 64-bit floats

torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)

tensor([1., 2., 3.], dtype=torch.float64)
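
Two details about dtypes are worth seeing in code: PyTorch infers a default dtype from the values, and converting floats to integers truncates toward zero rather than rounding. This is a small sketch with made-up values, assuming the default dtype has not been changed.

```python
import torch

floats = torch.tensor([1.7, -1.7, 2.2])   # float literals default to torch.float32
ints = torch.tensor([1, 2, 3])            # integer literals default to torch.int64

truncated = floats.to(torch.int64)        # drops the fractional part (no rounding)

print(floats.dtype, ints.dtype)
print(truncated)
```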

### Let's perform some operations on our Tensor

In [None]:
X = torch.tensor([[1,2,3],[4,5,6]])
X

tensor([[1, 2, 3],
        [4, 5, 6]])

In [None]:
type(X)

torch.Tensor

In [None]:
X.shape

torch.Size([2, 3])

In [None]:
X.ndim

2

In [None]:
# Addition
X_add = X + 2
# OR
X_add = torch.add(X, 2)

print(X_add)

tensor([[3, 4, 5],
        [6, 7, 8]])


In [None]:
# Subtraction
X_sub = X - 1
# OR
X_sub = torch.sub(X, 1)

print(X_sub)

tensor([[0, 1, 2],
        [3, 4, 5]])


In [None]:
# Multiplying each element by 3
X_mul = X * 3
# OR
X_mul = torch.mul(X, 3)

print(X_mul)

tensor([[ 3,  6,  9],
        [12, 15, 18]])


In [None]:
# Dividing each element by 2
X_div = X / 2
# OR
X_div = torch.div(X, 2)

print(X_div)

tensor([[0.5000, 1.0000, 1.5000],
        [2.0000, 2.5000, 3.0000]])


In [None]:
# Squaring each element
X_pow = X ** 2
# OR
X_pow = torch.pow(X, 2)

print(X_pow)

tensor([[ 1,  4,  9],
        [16, 25, 36]])
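
Operations such as `X + 2` work because PyTorch broadcasts the scalar across every element. The same rule applies to tensors of compatible shapes, as in this sketch (the `row` and `col` tensors are illustrative values).

```python
import torch

X = torch.tensor([[1, 2, 3], [4, 5, 6]])   # shape [2, 3]
row = torch.tensor([10, 20, 30])           # shape [3], added to every row
col = torch.tensor([[100], [200]])         # shape [2, 1], added to every column

X_plus_row = X + row
X_plus_col = X + col

print(X_plus_row)
print(X_plus_col)
```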


In [None]:
## Matrix Multiplication - matmul()
## X has shape (2, 3) and Y has shape (3, 2), so the result has shape (2, 2)

Y = torch.tensor([[1, 2], [3, 4], [5, 6]])
X_matmul = torch.matmul(X, Y)  # OR X @ Y

print(X_matmul)

tensor([[22, 28],
        [49, 64]])
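
Matrix multiplication only works when the inner dimensions match: an `(n, k)` matrix times a `(k, m)` matrix gives an `(n, m)` result. The sketch below checks the shapes and shows that a mismatch raises a `RuntimeError`.

```python
import torch

X = torch.tensor([[1, 2, 3], [4, 5, 6]])       # shape (2, 3)
Y = torch.tensor([[1, 2], [3, 4], [5, 6]])     # shape (3, 2)

XY = X @ Y    # (2, 3) @ (3, 2) -> (2, 2)
YX = Y @ X    # (3, 2) @ (2, 3) -> (3, 3)
print(XY.shape, YX.shape)

mismatch_error = None
try:
    X @ X     # (2, 3) @ (2, 3): inner dimensions 3 and 2 do not match
except RuntimeError as err:
    mismatch_error = err
    print("Shape mismatch:", err)
```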


In [None]:
X

tensor([[1, 2, 3],
        [4, 5, 6]])

In [None]:
## Abs - Absolute value of each element of a tensor

## Negative values become positive; all other values are unchanged

abs_tensor = torch.tensor([1,-2,4,6])

torch.abs(abs_tensor)

tensor([1, 2, 4, 6])

In [None]:
# torch.mean needs a floating-point dtype, so convert the integer tensor first
X = X.to(torch.float)

torch.mean(X)

tensor(3.5000)

In [None]:
# With an even number of elements, torch.median returns the lower of the two middle values
torch.median(X)

tensor(3.)

In [None]:
# max
torch.max(X)

tensor(6.)

In [None]:
# min
torch.min(X)

tensor(1.)

In [None]:
# product

torch.prod(X)

tensor(720.)

In [None]:
# Standard deviation
torch.std(X)

tensor(1.8708)

In [None]:
# Variance
torch.var(X)

tensor(3.5000)
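
By default `torch.var` and `torch.std` use the sample formula, dividing by `N - 1` rather than `N`. The sketch below recomputes the values by hand to show where 3.5 and 1.8708 come from; the population variance (dividing by `N`) is included only for comparison.

```python
import torch

X = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
n = X.numel()

squared_deviations = ((X - X.mean()) ** 2).sum()   # sum of (x - mean)^2
sample_var = squared_deviations / (n - 1)          # what torch.var returns by default
population_var = squared_deviations / n            # divides by N instead

print(sample_var, torch.var(X))
print(sample_var.sqrt(), torch.std(X))
print(population_var)
```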

In [None]:
# Argmax
# Returns the index of the maximum value in the flattened tensor

torch.argmax(X)

tensor(5)

In [None]:
# Argmin
torch.argmin(X)

tensor(0)
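
The reductions above collapse the whole tensor to one value. Passing `dim` reduces along a single axis instead, and a flat `argmax` index can be turned back into a row and column. This is a small sketch using the same `X`.

```python
import torch

X = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

col_means = X.mean(dim=0)      # one mean per column
row_means = X.mean(dim=1)      # one mean per row
row_max = X.max(dim=1)         # named tuple with .values and .indices per row

flat_index = torch.argmax(X).item()          # index into the flattened tensor
row, col = divmod(flat_index, X.shape[1])    # convert to (row, column)

print(col_means, row_means)
print(row_max.values, row_max.indices)
print(row, col)
```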

In [None]:
A = torch.tensor([1,2,3])
B = torch.tensor([4,5,6])

In [None]:
# Matrix multiplication: for two 1-D tensors, matmul returns their dot product
torch.matmul(A,B)

tensor(32)

In [None]:
# dot product
torch.dot(A,B)

tensor(32)
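
The value 32 is the sum of the element-wise products, `1*4 + 2*5 + 3*6`. A short check:

```python
import torch

A = torch.tensor([1, 2, 3])
B = torch.tensor([4, 5, 6])

elementwise = A * B             # multiply matching elements
manual_dot = elementwise.sum()  # add them up

print(elementwise, manual_dot)
```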

In [None]:
X

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [None]:
# Transpose
X_transposed = X.transpose(0,1)  # Swaps rows and columns with each other
X_transposed

tensor([[1., 4.],
        [2., 5.],
        [3., 6.]])
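
For 2-D tensors, `X.T` is a shorthand for `X.transpose(0, 1)`. Transposing is often what makes a matrix product possible, as in this sketch where a `(2, 3)` matrix is multiplied by its own `(3, 2)` transpose.

```python
import torch

X = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

same = torch.equal(X.T, X.transpose(0, 1))   # both give the (3, 2) transpose
gram = X @ X.T                               # (2, 3) @ (3, 2) -> (2, 2)

print(same)
print(gram)
```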

In [None]:
### Inverse

### The inverse of a matrix is like the "opposite" of the matrix. When you multiply a matrix by its inverse,
### you get a special matrix called the identity matrix (which has 1s on the diagonal and 0s everywhere else).

### Note: for a matrix to have an inverse, it must be square (same number of rows and columns)
### and non-singular (its determinant must be non-zero).

Defining property of the inverse of a matrix:
$$ A \times A^{-1} = A^{-1} \times A = I $$

In [None]:
torch.manual_seed(100)

M  = torch.rand(3,3)

inverse_matrix = torch.inverse(M)

inverse_matrix

tensor([[ 2.1540, -3.1479,  8.6896],
        [ 1.9345, -0.9087,  0.8514],
        [-3.1183,  4.1622, -6.3412]])
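
The defining property can be verified numerically. The sketch below uses a small hand-picked matrix (an assumption made for illustration, not the random `M` above) and `torch.linalg.inv`, then checks that the product is the identity up to floating-point error.

```python
import torch

A = torch.tensor([[2., 1.],
                  [1., 3.]])          # determinant is 5, so A is invertible
A_inv = torch.linalg.inv(A)

product = A @ A_inv                   # should be (approximately) the identity
identity = torch.eye(2)

print(A_inv)
print(torch.allclose(product, identity, atol=1e-6))
```

Comparing with `torch.allclose` rather than exact equality matters here, because floating-point arithmetic rarely produces exact 0s and 1s.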

##  How Fast is GPU Compared to CPU?

In [None]:
import torch
import time

# Define matrix size
size = 10000

# Create two large tensors on CPU
A = torch.randn(size, size)
B = torch.randn(size, size)

# Measure time on CPU
start_cpu = time.perf_counter()
C_cpu = torch.matmul(A, B)  # Matrix multiplication on CPU
end_cpu = time.perf_counter()

# Move tensors to GPU (requires a CUDA-capable GPU runtime)
A_gpu = A.to("cuda")
B_gpu = B.to("cuda")

# CUDA operations run asynchronously, so synchronize before reading the clock
torch.cuda.synchronize()

# Measure time on GPU
start_gpu = time.perf_counter()
C_gpu = torch.matmul(A_gpu, B_gpu)  # Matrix multiplication on GPU
torch.cuda.synchronize()  # Wait until the multiplication has actually finished
end_gpu = time.perf_counter()

# Print time taken
print(f"CPU Time: {end_cpu - start_cpu:.4f} seconds")
print(f"GPU Time: {end_gpu - start_gpu:.4f} seconds")


CPU Time: 20.5405 seconds
GPU Time: 0.1294 seconds
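
Timings like these vary with the hardware and with one-off start-up costs. One way to get steadier numbers is sketched below: a hypothetical helper, `time_matmul` (not part of PyTorch), does a warm-up run, averages over a few repeats, and skips the GPU measurement when CUDA is unavailable. The default size is kept small so it also runs quickly on a CPU-only machine.

```python
import time
import torch

def time_matmul(device, size=512, repeats=3):
    """Average seconds per matrix multiplication on `device` (illustrative helper)."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)

    torch.matmul(a, b)                # warm-up run, excluded from the timing
    if device == "cuda":
        torch.cuda.synchronize()      # make sure the warm-up has finished

    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()      # wait for all queued GPU work
    return (time.perf_counter() - start) / repeats

cpu_seconds = time_matmul("cpu")
print(f"CPU: {cpu_seconds:.6f} s per matmul")

if torch.cuda.is_available():
    gpu_seconds = time_matmul("cuda")
    print(f"GPU: {gpu_seconds:.6f} s per matmul")
```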
