# 📘 Lesson 1: Tensors - The Building Blocks of Machine Learning

---

### 🎯 Why this lesson is important
Machine Learning is about teaching computers to **learn from data**.  
But all data — text, images, sound, video — must first be represented as **numbers**.  

A **Tensor** is the universal way to store and manipulate these numbers.  
If you understand Tensors, you understand the "language" that ML models speak.  

In this lesson, we will not only learn *how* to use Tensors in PyTorch,  
but also *why each operation is important for ML*.


In [1]:
# Import libraries
import torch
import numpy as np


## 🔍 Part 1: What is a Tensor?

- A Tensor is just a **multi-dimensional array** (like a list, but more powerful).  
- Why not just use Python lists?  
  - Lists are **slow** for numeric work and cannot run on a GPU.  
  - Training runs **millions of arithmetic operations**; tensors execute them in optimized, vectorized code.  

💡 **In ML context**:  
- A grayscale image (28×28) is stored as a **2D tensor**.  
- A color image (32×32×3) is a **3D tensor**.  
- A batch of 100 color images (100×32×32×3) is a **4D tensor**.  

So every time we talk about data in ML, we talk about **Tensors**.
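
The shapes listed above can be checked directly. This is a minimal sketch that uses zero-filled placeholders instead of real images; the variable names are illustrative only.

```python
import torch

# Placeholder tensors with the shapes described above (not real image data)
gray_image = torch.zeros(28, 28)             # height x width
color_image = torch.zeros(32, 32, 3)         # height x width x channels
image_batch = torch.zeros(100, 32, 32, 3)    # batch x height x width x channels

print("Grayscale dims:", gray_image.dim())
print("Color dims:", color_image.dim())
print("Batch dims:", image_batch.dim())
```

`dim()` returns the number of axes, so the three calls report 2, 3, and 4.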


In [2]:
# Compare Python list vs PyTorch tensor
python_list = [1, 2, 3, 4]
tensor = torch.tensor([1, 2, 3, 4])

print("Python list:", python_list)
print("PyTorch tensor:", tensor)
print("Type of list:", type(python_list))
print("Type of tensor:", type(tensor))


Python list: [1, 2, 3, 4]
PyTorch tensor: tensor([1, 2, 3, 4])
Type of list: <class 'list'>
Type of tensor: <class 'torch.Tensor'>
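
The two objects print similarly but behave very differently under arithmetic. The sketch below reuses the values from the cell above to show that `+` and `*` mean concatenation and repetition for lists, but element-wise math for tensors.

```python
import torch

python_list = [1, 2, 3, 4]
tensor = torch.tensor([1, 2, 3, 4])

# "+" joins two lists end to end, but adds tensors element by element
list_sum = python_list + python_list
tensor_sum = tensor + tensor

# "*" with an integer repeats a list, but scales every element of a tensor
list_times_two = python_list * 2
tensor_times_two = tensor * 2

print("List + list:", list_sum)
print("Tensor + tensor:", tensor_sum)
print("List * 2:", list_times_two)
print("Tensor * 2:", tensor_times_two)
```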


## 🔧 Part 2: Creating Tensors

Why so many ways to create tensors?  
Because data in ML comes from many sources:
- Raw numbers (Python list)
- Data science pipelines (NumPy arrays)
- Empty containers to be filled (zeros, ones)
- Random values for model initialization

💡 **In ML context**:  
When training a neural network, weights are usually **randomly initialized**.  
Zeros/ones tensors are used as placeholders or masks.
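
As a rough illustration of these two uses, the sketch below creates small random weights for a hypothetical layer with 3 inputs and 2 outputs, and applies a 0/1 mask to hide some values. The layer sizes, the `0.01` scale, and the mask are assumptions chosen for the example, not a recommended initialization scheme.

```python
import torch

torch.manual_seed(0)  # fixed seed so the random values are reproducible

# Small random weights and a zero bias for a hypothetical 3-input, 2-output layer
weights = torch.randn(3, 2) * 0.01
bias = torch.zeros(2)

# A mask of ones and zeros: 1 keeps a value, 0 hides it (for example, padding)
values = torch.tensor([5.0, 7.0, 9.0, 11.0])
mask = torch.tensor([1.0, 1.0, 0.0, 0.0])
masked = values * mask

print("Weights shape:", weights.shape)
print("Bias:", bias)
print("Masked values:", masked)
```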


In [3]:
# From Python list
tensor_from_list = torch.tensor([1, 2, 3, 4, 5])
print("From list:", tensor_from_list)

# From NumPy array (from_numpy shares memory with the array instead of copying it)
numpy_array = np.array([1, 2, 3, 4, 5])
tensor_from_numpy = torch.from_numpy(numpy_array)
print("From NumPy:", tensor_from_numpy)

# Zeros
zeros_tensor = torch.zeros(3, 4)
print("Zeros (3x4):\n", zeros_tensor)

# Ones
ones_tensor = torch.ones(2, 3)
print("Ones (2x3):\n", ones_tensor)

# Random values drawn from a standard normal distribution (mean 0, std 1)
random_tensor = torch.randn(2, 3)
print("Random (2x3):\n", random_tensor)

# Range
range_tensor = torch.arange(0, 10, 2)
print("Range tensor:", range_tensor)


From list: tensor([1, 2, 3, 4, 5])
From NumPy: tensor([1, 2, 3, 4, 5])
Zeros (3x4):
 tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
Ones (2x3):
 tensor([[1., 1., 1.],
        [1., 1., 1.]])
Random (2x3):
 tensor([[-1.0658, -0.6084,  0.2289],
        [ 1.2029,  0.7661,  1.2428]])
Range tensor: tensor([0, 2, 4, 6, 8])
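
One detail worth knowing when converting from NumPy: `torch.from_numpy` reuses the array's memory, while `torch.tensor` makes a copy. A small sketch of the difference:

```python
import numpy as np
import torch

numpy_array = np.array([1, 2, 3])

shared = torch.from_numpy(numpy_array)  # shares memory with numpy_array
copied = torch.tensor(numpy_array)      # independent copy of the data

numpy_array[0] = 99  # change the NumPy array in place

print("Shared tensor:", shared)  # reflects the change
print("Copied tensor:", copied)  # keeps the original values
```

Use the copying form when the NumPy array may be modified later and the tensor should not change with it.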


## 📊 Part 3: Tensor Attributes

We always need to know:
- **Shape** → what kind of data are we holding?  
- **Dimensions** → is it a vector (1D), matrix (2D), color image (3D), or batch of images (4D)?  
- **Data type (dtype)** → integers (labels) vs floats (features).  
- **Device** → CPU for preparation, GPU for training.  

💡 **In ML context**:  
- Shape mismatches cause errors in training.  
- A GPU can make training many times faster; the actual speedup depends on the model, batch size, and hardware.  


In [4]:
sample_tensor = torch.randn(3, 4, 5)

print("Shape:", sample_tensor.shape)
print("Size (same):", sample_tensor.size())
print("Dimensions:", sample_tensor.dim())
print("Data type:", sample_tensor.dtype)
print("Device:", sample_tensor.device)
print("Number of elements:", sample_tensor.numel())


Shape: torch.Size([3, 4, 5])
Size (same): torch.Size([3, 4, 5])
Dimensions: 3
Data type: torch.float32
Device: cpu
Number of elements: 60
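
The attributes above matter most when they need to change or when they disagree. The following sketch, with made-up example values, shows default dtypes, a dtype conversion, a device move that falls back to CPU when no GPU is present, and the kind of error a shape mismatch produces.

```python
import torch

# Pick the GPU if one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

labels = torch.tensor([0, 1, 2])           # Python ints become torch.int64
features = torch.tensor([0.5, 1.5, 2.5])   # Python floats become torch.float32

labels_as_float = labels.to(torch.float32)  # convert dtype
features_on_device = features.to(device)    # move to the chosen device

print("Label dtype:", labels.dtype)
print("Feature dtype:", features.dtype)
print("Converted dtype:", labels_as_float.dtype)
print("Device:", features_on_device.device)

# A shape mismatch: (3, 4) cannot be matrix-multiplied with (5, 2)
mismatch_caught = False
try:
    torch.randn(3, 4) @ torch.randn(5, 2)
except RuntimeError as err:
    mismatch_caught = True
    print("Shape mismatch error:", err)
```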


## 🎯 Part 4: Indexing and Slicing

Why do we need slicing?  
Because ML is about working with parts of data:  
- Extracting a row of pixels from an image  
- Taking a single word embedding from a sentence  
- Splitting a dataset into smaller batches  

Without slicing, we can’t “look inside” our tensors.


In [5]:
matrix = torch.tensor([[1, 2, 3, 4],
                       [5, 6, 7, 8], 
                       [9, 10, 11, 12]])
print("Matrix:\n", matrix)

# Indexing
print("Element [0,0]:", matrix[0,0])
print("Element [1,2]:", matrix[1,2])

# Slicing
print("First row:", matrix[0, :])
print("First column:", matrix[:, 0])
print("Sub-matrix:\n", matrix[0:2, 1:3])


Matrix:
 tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
Element [0,0]: tensor(1)
Element [1,2]: tensor(7)
First row: tensor([1, 2, 3, 4])
First column: tensor([1, 5, 9])
Sub-matrix:
 tensor([[2, 3],
        [6, 7]])
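
Splitting a dataset into batches, mentioned above, uses the same slicing syntax. This is a minimal sketch with a toy dataset of 10 samples and an assumed batch size of 4; real training code would typically use a data loader instead.

```python
import torch

# Toy dataset: 10 samples with 2 features each
dataset = torch.arange(20).reshape(10, 2)
batch_size = 4  # assumed value for illustration

# Slice consecutive rows; the last batch may be smaller than batch_size
batches = [dataset[start:start + batch_size]
           for start in range(0, len(dataset), batch_size)]

for index, batch in enumerate(batches):
    print("Batch", index, "shape:", batch.shape)
```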


## 🧮 Part 5: Basic Operations

ML models learn by doing **millions of arithmetic operations**:
- Adding bias  
- Multiplying weights  
- Dividing by normalization factors  
- Raising numbers to powers and taking exponentials (squared-error losses square their inputs; softmax uses exponentials)  

💡 Example:  
When you scale your data from 0–255 (pixels) to 0–1, you are doing division on tensors.


In [7]:
a = torch.tensor([1, 2, 3, 4])
b = torch.tensor([5, 6, 7, 8])

print("a + b =", a + b)
print("a * b =", a * b)
print("b / a =", b / a)
print("a squared =", a ** 2)


a + b = tensor([ 6,  8, 10, 12])
a * b = tensor([ 5, 12, 21, 32])
b / a = tensor([5.0000, 3.0000, 2.3333, 2.0000])
a squared = tensor([ 1,  4,  9, 16])
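
The pixel-scaling example mentioned above can be sketched as follows. The pixel values are made up; the conversion to float before dividing is written out explicitly to make the dtype change visible.

```python
import torch

# Made-up 2x3 grayscale patch with 8-bit pixel values (0-255)
pixels = torch.tensor([[0, 128, 255],
                       [64, 192, 32]], dtype=torch.uint8)

# Convert to float, then divide every element by 255 to land in the range 0-1
scaled = pixels.float() / 255.0

print("Original dtype:", pixels.dtype)
print("Scaled dtype:", scaled.dtype)
print("Scaled values:\n", scaled)
```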


## 🔢 Part 6: Matrix Operations

This is the **heart of Machine Learning**.  
Why?
- A fully connected neural network layer is essentially a **matrix multiplication**  
  - Input (data) × Weights (parameters) + Bias = Output (features)  

Transpose, sum, mean → used in:  
- Loss functions  
- Feature extraction  
- Statistics on data  

If you understand matrix multiplication, you understand most of the math behind deep learning.


In [8]:
matrix_a = torch.tensor([[1, 2],
                         [3, 4]], dtype=torch.float)
matrix_b = torch.tensor([[5, 6],
                         [7, 8]], dtype=torch.float)

print("A:\n", matrix_a)
print("B:\n", matrix_b)

print("A @ B:\n", torch.matmul(matrix_a, matrix_b))
print("Transpose of A:\n", matrix_a.T)
print("Sum of A:", matrix_a.sum())
print("Mean of A:", matrix_a.mean())
print("Max of A:", matrix_a.max())
print("Min of A:", matrix_a.min())


A:
 tensor([[1., 2.],
        [3., 4.]])
B:
 tensor([[5., 6.],
        [7., 8.]])
A @ B:
 tensor([[19., 22.],
        [43., 50.]])
Transpose of A:
 tensor([[1., 3.],
        [2., 4.]])
Sum of A: tensor(10.)
Mean of A: tensor(2.5000)
Max of A: tensor(4.)
Min of A: tensor(1.)
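
To connect this to the "layer = matrix multiplication" idea, here is a minimal sketch of a single fully connected layer written by hand. The sizes (4 samples, 3 input features, 2 output features) are assumptions for illustration. The second half shows that `sum` can also reduce along one dimension instead of the whole matrix.

```python
import torch

torch.manual_seed(0)  # fixed seed so the random values are reproducible

inputs = torch.randn(4, 3)    # 4 samples, 3 features each
weights = torch.randn(3, 2)   # maps 3 input features to 2 output features
bias = torch.zeros(2)

# One layer: matrix multiplication followed by adding the bias
outputs = inputs @ weights + bias
print("Output shape:", outputs.shape)

# Reductions can run along a single dimension
matrix_a = torch.tensor([[1., 2.],
                         [3., 4.]])
column_sums = matrix_a.sum(dim=0)  # collapse rows: one sum per column
row_sums = matrix_a.sum(dim=1)     # collapse columns: one sum per row
print("Column sums:", column_sums)
print("Row sums:", row_sums)
```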


## 🔄 Part 7: Reshape

Why reshape?  
Because ML requires data in specific formats:
- Images: (Batch, Channels, Height, Width)  
- Text: (Batch, Sequence_length, Embedding_size)  

Reshaping is how we “tell PyTorch” what our data looks like.  
Example: turning 784 pixels into a flat vector for feeding into a neural network.


In [9]:
original = torch.arange(12)
print("Original:", original)

reshaped_2d = original.reshape(3, 4)
print("Reshaped (3x4):\n", reshaped_2d)

reshaped_3d = original.reshape(2, 2, 3)
print("Reshaped (2x2x3):\n", reshaped_3d)

flattened = reshaped_2d.flatten()
print("Flattened:", flattened)


Original: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Reshaped (3x4):
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
Reshaped (2x2x3):
 tensor([[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]]])
Flattened: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
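
The 784-pixel example mentioned above can be sketched like this, using a zero-filled placeholder for a batch of 100 grayscale 28×28 images. Passing `-1` asks PyTorch to infer that dimension from the number of elements.

```python
import torch

# Placeholder batch: 100 grayscale images of 28x28 pixels
images = torch.zeros(100, 28, 28)

# Keep the batch dimension and flatten each image into 28 * 28 = 784 values
flat = images.reshape(100, -1)

# flatten(start_dim=1) does the same without naming the batch size
flat_alt = images.flatten(start_dim=1)

print("Original shape:", images.shape)
print("Flattened shape:", flat.shape)
print("Same result:", flat.shape == flat_alt.shape)
```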


## 📡 Part 8: Broadcasting

Broadcasting = letting tensors of different shapes work together by virtually stretching the smaller one to match the larger one.  

Why important?  
- Adding a bias vector to every row of a dataset  
- Normalizing images by subtracting mean and dividing by std  
- Applying the same weight across multiple samples  

Without broadcasting, we’d have to manually duplicate data → very inefficient.


In [10]:
tensor_2d = torch.tensor([[1, 2, 3],
                          [4, 5, 6]])
scalar = 10
vector = torch.tensor([100, 200, 300])

print("Tensor + scalar:\n", tensor_2d + scalar)
print("Tensor + vector:\n", tensor_2d + vector)


Tensor + scalar:
 tensor([[11, 12, 13],
        [14, 15, 16]])
Tensor + vector:
 tensor([[101, 202, 303],
        [104, 205, 306]])
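
Broadcasting compares shapes starting from the last dimension: two sizes are compatible when they are equal, when one of them is 1, or when one tensor has no dimension left. The sketch below applies this to per-channel image normalization (with made-up mean and std values) and shows one incompatible case together with a reshape that fixes it.

```python
import torch

# Per-channel normalization: (3, 2, 2) image with (3, 1, 1) statistics
image = torch.ones(3, 2, 2)                              # C x H x W
mean = torch.tensor([0.5, 0.5, 0.5]).reshape(3, 1, 1)    # made-up values
std = torch.tensor([0.25, 0.25, 0.25]).reshape(3, 1, 1)  # made-up values
normalized = (image - mean) / std
print("Normalized shape:", normalized.shape)

# Incompatible: last dimensions are 3 and 2, and neither is 1
tensor_2d = torch.tensor([[1, 2, 3],
                          [4, 5, 6]])    # shape (2, 3)
bad_vector = torch.tensor([10, 20])      # shape (2,)
broadcast_failed = False
try:
    tensor_2d + bad_vector
except RuntimeError as err:
    broadcast_failed = True
    print("Broadcast error:", err)

# Reshaping to (2, 1) adds one value per row instead
fixed = tensor_2d + bad_vector.reshape(2, 1)
print("Fixed:\n", fixed)
```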


## 🎯 Part 9: Practice Exercises


In [11]:
# Example: Student scores
scores = torch.tensor([[85, 90, 78, 92],
                       [88, 85, 91, 89],
                       [79, 95, 87, 93]])
print("Scores:\n", scores)

print("Average per student:", scores.float().mean(dim=1))
print("Average per subject:", scores.float().mean(dim=0))
print("Highest score:", scores.max())
print("Lowest score:", scores.min())


Scores:
 tensor([[85, 90, 78, 92],
        [88, 85, 91, 89],
        [79, 95, 87, 93]])
Average per student: tensor([86.2500, 88.2500, 88.5000])
Average per subject: tensor([84.0000, 90.0000, 85.3333, 91.3333])
Highest score: tensor(95)
Lowest score: tensor(78)


In [12]:
# Example: Image
image = torch.randn(32, 32, 3)   # H x W x C
print("Image shape:", image.shape)

image_chw = image.permute(2, 0, 1)   # Convert to (C, H, W)
print("CHW format:", image_chw.shape)

batch_images = image_chw.unsqueeze(0)  # Add batch dimension
print("Batch images shape:", batch_images.shape)


Image shape: torch.Size([32, 32, 3])
CHW format: torch.Size([3, 32, 32])
Batch images shape: torch.Size([1, 3, 32, 32])
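
`unsqueeze(0)` builds a batch containing a single image. When there are several images of the same shape, `torch.stack` is one way to combine them into a batch. A minimal sketch with four random placeholder images:

```python
import torch

# Four placeholder images, already in (C, H, W) format
images = [torch.randn(3, 32, 32) for _ in range(4)]

# stack adds a new leading dimension and places each image along it
batch = torch.stack(images)

print("Single image shape:", images[0].shape)
print("Batch shape:", batch.shape)
```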


## 📚 Summary

✅ What we learned:
- Tensors = multi-dimensional arrays (foundation of ML)  
- How to create tensors (from lists, NumPy, zeros, ones, random, ranges)  
- Tensor attributes: shape, dimensions, dtype, device  
- Indexing & slicing like NumPy  
- Basic operations (+, -, *, /, power)  
- Matrix operations (matmul, transpose, sum, mean, min, max)  
- Reshape & flatten  
- Broadcasting rules  
- Real-world practice (scores, images)

🚀 Next Lesson: **Automatic Differentiation (Gradients)**
