
# Day 13: PyTorch Introduction

Welcome to day 13!

Today you will learn:
- What PyTorch is and why it’s widely used
- Tensors: PyTorch’s core data structure
- Basic tensor operations
- GPU usage for acceleration

---

# What is PyTorch?

PyTorch is an open-source deep learning framework originally developed by Facebook AI Research (FAIR, now part of Meta AI).  
It is designed for building, training, and deploying neural networks efficiently, with support for both CPU and GPU computation.  

Key components of PyTorch:

- **Tensors**: multidimensional arrays, similar to NumPy arrays, but can run on GPUs.  
- **Autograd**: automatic differentiation engine that computes gradients for learning.  
- **nn module**: provides layers, loss functions, and utilities to define neural networks.  
- **optim module**: optimizers like SGD, Adam to update network parameters.  
- **Data utilities**: `Dataset` and `DataLoader` for batching, shuffling, and feeding data.

## Why do we need PyTorch?

Training deep learning models manually is extremely complex because:

1. **Models have millions of parameters**: manually computing gradients is impractical.  
2. **Layers and operations are chained**: forward pass and backpropagation involve many computations.  
3. **GPU acceleration is necessary**: CPUs are too slow for large-scale training.  

PyTorch solves these problems by:

- Computing gradients automatically via autograd.  
- Allowing GPU-accelerated operations with minimal code changes.  
- Providing pre-built layers and optimizers to avoid writing everything from scratch.  
- Supporting flexible model design using dynamic computation graphs.

## Why PyTorch over other frameworks?

- **Dynamic computation graph**: Unlike TensorFlow 1.x, PyTorch builds the graph on-the-fly. This is more intuitive for debugging and experimentation.  
- **NumPy-like syntax**: If you know NumPy, PyTorch feels familiar, making it easy to transition to deep learning.  
- **Research + production**: Widely adopted by researchers for experiments and by engineers for production deployments.  
- **Community & ecosystem**: Strong support, libraries, tutorials, and pre-trained models available.

## Summary

- PyTorch is a tool to simplify deep learning.  
- It handles tensors, gradients, and optimization automatically.  
- It makes training, debugging, and experimenting with neural networks far easier.  
- It lets you build modern deep learning models quickly, without deriving gradients by hand.

## Quick Mental Model

Think of PyTorch as a smart math engine:

1. You define your inputs, weights, and operations.  
2. You do a forward pass → PyTorch calculates outputs.  
3. You do a backward pass → PyTorch computes all gradients automatically.  
4. You update weights → model learns iteratively.  

Without PyTorch, you’d be calculating derivatives manually, which is error-prone and slow.
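
The four steps above can be sketched on a toy problem. This is a minimal illustration, not a recommended training recipe: the single data point, the starting weight, the learning rate `lr`, and the number of steps are all assumptions chosen so the loop converges quickly.

```python
import torch

# Hypothetical toy problem: learn w so that w * x matches the target.
x = torch.tensor([2.0])                       # 1. input
target = torch.tensor([6.0])                  #    desired output (ideal w is 3)
w = torch.tensor([1.0], requires_grad=True)   #    weight to learn

lr = 0.1  # learning rate (arbitrary choice for this sketch)

for step in range(50):
    pred = w * x                              # 2. forward pass
    loss = ((pred - target) ** 2).sum()       #    scalar loss
    loss.backward()                           # 3. backward pass fills w.grad
    with torch.no_grad():
        w -= lr * w.grad                      # 4. gradient-descent update
    w.grad.zero_()                            #    reset the gradient for the next step

print(w.item())  # close to 3.0
```

Gradients accumulate across calls to `backward()`, which is why the loop zeroes `w.grad` after each update.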

In [1]:
# Install if not available (uncomment if needed)
# !pip install torch

import torch


# What is a Tensor?

A tensor is a multidimensional array that generalizes scalars, vectors, and matrices.

- **0D tensor** → scalar (single number)  
- **1D tensor** → vector (array of numbers)  
- **2D tensor** → matrix (table of numbers)  
- **3D tensor** → cube of numbers (e.g., image with height × width × channels)  
- **ND tensor** → higher-dimensional arrays

In short:
> Tensor = multidimensional array that can be used in linear algebra and deep learning.

## Why do we need Tensors?

Deep learning deals with inputs, weights, and outputs in multidimensional form:

- Images → 3D tensor (Height × Width × Channels)  
- Text sequences → 2D tensor (Sequence Length × Embedding Size)  
- Batch of images → 4D tensor (Batch × Height × Width × Channels)  

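Tensors are the core data structure of every major DL framework (PyTorch, TensorFlow, etc.). The axis orders above are channels-last; PyTorch's image layers conventionally expect channels-first, i.e. (Batch × Channels × Height × Width).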

## Tensor Order (Rank) and Indices

The order of a tensor = number of indices required to identify a single element.

| Object | Symbol | Index Notation | Order |
|------|------|---------------|------|
| Scalar | $a$ | none | 0 |
| Vector | $v_i$ | 1 index | 1 |
| Matrix | $M_{ij}$ | 2 indices | 2 |
| Tensor | $T_{ijk}$ | 3 indices | 3 |
| General | $T_{i_1 i_2 \dots i_n}$ | $n$ indices | $n$ |

Each index corresponds to one axis / dimension.
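
In PyTorch the order of a tensor is available as `.ndim`. A small sketch, with shapes chosen arbitrarily for illustration:

```python
import torch

examples = {
    "scalar": torch.tensor(3.5),         # order 0
    "vector": torch.tensor([1.0, 2.0]),  # order 1
    "matrix": torch.zeros(2, 3),         # order 2
    "tensor3": torch.zeros(2, 3, 4),     # order 3
}

for name, t in examples.items():
    # ndim is the number of indices needed to address one element
    print(name, "order:", t.ndim, "shape:", tuple(t.shape))

orders = [t.ndim for t in examples.values()]
```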

## Different Orders of Tensors
### 1. Scalars (0th-Order Tensor)

A scalar is a single number:
$$
a \in \mathbb{R}
$$

Examples:
- Loss value $L$
- Learning rate $\eta$
- Bias offset

No direction, no structure.

### 2. Vectors (1st-Order Tensor)

A vector is an ordered collection of scalars:
$$
\mathbf{v} =
\begin{bmatrix}
v_1 \\
v_2 \\
\vdots \\
v_n
\end{bmatrix}
\quad\text{or}\quad v_i
$$

- One index `i`
- Shape: $(n)$

Meaning:
- Each component represents a feature
- Direction + magnitude

Examples:
- Feature vector
- Bias vector
- Word embedding

### 3. Matrices (2nd-Order Tensor)

A matrix has two indices:
$$
M_{ij} =
\begin{bmatrix}
m_{11} & m_{12} & \dots \\
m_{21} & m_{22} & \dots \\
\vdots & \vdots & \ddots
\end{bmatrix}
$$

- `i` → row index
- `j` → column index
- Shape: $(m, n)$

#### Matrix–vector multiplication
$$
y_i = \sum_{j=1}^{n} M_{ij} x_j
$$

This equation is the core computation of a neural layer.


#### Example

$$
M =
\begin{bmatrix}
1 & 0 & -2 \\
3 & 4 & 1
\end{bmatrix}
$$

$$
\quad
x =
\begin{bmatrix}
2 \\
-1 \\
3
\end{bmatrix}
$$

Compute each output:

$$
\begin{aligned}
y_1
&= M_{11}x_1 + M_{12}x_2 + M_{13}x_3 \\
&= (1)(2) + (0)(-1) + (-2)(3) \\
&= -4
\end{aligned}
$$

$$
\begin{aligned}
y_2
&= M_{21}x_1 + M_{22}x_2 + M_{23}x_3 \\
&= (3)(2) + (4)(-1) + (1)(3) \\
&= 5
\end{aligned}
$$

Final result:

$$
y =
\begin{bmatrix}
-4 \\
5
\end{bmatrix}
$$
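
The worked example can be checked in PyTorch. The sketch below computes the product twice: once with the `@` operator and once with an explicit loop that mirrors the summation formula.

```python
import torch

M = torch.tensor([[1.0, 0.0, -2.0],
                  [3.0, 4.0, 1.0]])
x = torch.tensor([2.0, -1.0, 3.0])

# Built-in matrix-vector product
y = M @ x
print(y.tolist())  # [-4.0, 5.0]

# Same computation written as y_i = sum_j M_ij * x_j
y_loop = torch.zeros(2)
for i in range(2):
    for j in range(3):
        y_loop[i] += M[i, j] * x[j]
print(y_loop.tolist())  # [-4.0, 5.0]
```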


### 4. 3D Tensor (3rd-Order Tensor)

A 3rd-order tensor is a tensor with three indices:

$$
T_{ijk}
$$

Each index answers one question:
- $i$ → row (height)
- $j$ → column (width)
- $k$ → slice / channel / feature map

So $T_{ijk}$ means:
> the value at **row $i$**, **column $j$**, in the **$k$-th matrix**.

**Shape Meaning**

If a tensor has shape:

$$
H \times W \times C
$$

then:
- $H$ = number of rows in each matrix
- $W$ = number of columns in each matrix
- $C$ = number of matrices (channels)

> A 3rd-order tensor is a stack of $C$ matrices, each of size $H \times W$.

#### Example: Tensor of Shape $2 \times 2 \times 3$

This tensor contains **3 matrices**, each of size $2 \times 2$.

Channel $k = 1$
$$
T_{::1} =
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}
$$

Channel $k = 2$
$$
T_{::2} =
\begin{bmatrix}
5 & 6 \\
7 & 8
\end{bmatrix}
$$

Channel $k = 3$
$$
T_{::3} =
\begin{bmatrix}
9 & 10 \\
11 & 12
\end{bmatrix}
$$

Together, these three matrices form the tensor $T$ with shape $2 \times 2 \times 3$.


**Single Element**

$$
T_{2,1,3} = 11
$$

Read as:
- row $i = 2$
- column $j = 1$
- channel $k = 3$

Value at that position is **11**.
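
One way to build this tensor in PyTorch is to stack the three channel matrices along a new last axis. The variable names `c1`, `c2`, `c3` are just illustrative. Note that the math above uses 1-based indices while Python indexing is 0-based.

```python
import torch

c1 = torch.tensor([[1, 2], [3, 4]])      # channel k = 1
c2 = torch.tensor([[5, 6], [7, 8]])      # channel k = 2
c3 = torch.tensor([[9, 10], [11, 12]])   # channel k = 3

# Stack along a new last axis so the layout is (H, W, C) = (2, 2, 3)
T = torch.stack([c1, c2, c3], dim=2)
print(T.shape)  # torch.Size([2, 2, 3])

# T_{2,1,3} in 1-based math notation is T[1, 0, 2] in 0-based Python
element = T[1, 0, 2].item()
print(element)  # 11
```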

#### Slicing a 3rd-Order Tensor

Slicing is selecting a subset of elements along one or more axes of a tensor, which may reduce its dimensionality depending on how many indices are fixed.

| Term      | Dimensional Meaning               | Physical Analogy                        |
|-----------|---------------------------------|----------------------------------------|
| Frontal   | Fixes the depth index (`k`)       | One slice of bread from a loaf         |
| Horizontal| Fixes the row index (`i`)         | A layer cut parallel to the floor      |
| Lateral   | Fixes the column index (`j`)      | A vertical cut from the side           |


#### Fixing the third index ($k$) - The frontal slice

The k-th frontal slice of T

$$
T_{::k}
$$

This gives a 2D matrix (one slice of the tensor).

Example:
$$
T_{::2} =
\begin{bmatrix}
5 & 6 \\
7 & 8
\end{bmatrix}
$$


#### Fixing the first index ($i$) - The horizontal slice

The i-th horizontal slice of T

$$
T_{i::}
$$

This gives a 2D matrix of shape $W \times C$: row $i$ taken from every channel.

Example:
$$
T_{1::} =
\begin{bmatrix}
1 & 5 & 9 \\
2 & 6 & 10
\end{bmatrix}
$$

(conceptually: row 1 from each matrix, one channel per column)

#### Fixing the second index ($j$) - The lateral slice

The j-th lateral slice of T

$$
T_{:j:}
$$

This gives a 2D matrix of shape $H \times C$: column $j$ taken from every channel.

Example: 

$$
T_{:2:} =
\begin{bmatrix}
2 & 6 & 10 \\
4 & 8 & 12
\end{bmatrix}
$$
(conceptually: column 2 from each matrix)
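
The three kinds of slice map directly onto PyTorch indexing. The sketch below rebuilds the $2 \times 2 \times 3$ example tensor in (H, W, C) layout and fixes one index at a time; each result is a 2D matrix. Indices are shifted by one because Python counts from 0.

```python
import torch

channels = [
    torch.tensor([[1, 2], [3, 4]]),
    torch.tensor([[5, 6], [7, 8]]),
    torch.tensor([[9, 10], [11, 12]]),
]
T = torch.stack(channels, dim=2)   # shape (2, 2, 3) = (H, W, C)

frontal = T[:, :, 1]      # T_{::2}: fix k = 2
horizontal = T[0, :, :]   # T_{1::}: fix i = 1, shape (W, C)
lateral = T[:, 1, :]      # T_{:2:}: fix j = 2, shape (H, C)

print(frontal.tolist())     # [[5, 6], [7, 8]]
print(horizontal.tolist())  # [[1, 5, 9], [2, 6, 10]]
print(lateral.tolist())     # [[2, 6, 10], [4, 8, 12]]
```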

#### Fixing Two Indices → Fiber

When two indices are fixed, the resulting 1D vector is called a **fiber**.  
Fibers are the lines running through the tensor along the axis that wasn’t fixed.


- **Mode-1 fiber (fix j and k, vary i):**  
$$
T_{:,j,k}  
$$
Column along the first dimension  

- **Mode-2 fiber (fix i and k, vary j):**  
$$
T_{i,:,k}  
$$
Row along the second dimension  


- **Mode-3 fiber (fix i and j, vary k):**  
$$
T_{i,j,:}  
$$
Vector along the third dimension (channels)  


#### Examples of Two Indices Fixed (Fibers)

Tensor of Shape $2 \times 2 \times 3$

This tensor contains **3 matrices**, each of size $2 \times 2$.

Channel $k = 1$:
$$
T_{::1} =
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}
$$

Channel $k = 2$:
$$
T_{::2} =
\begin{bmatrix}
5 & 6 \\
7 & 8
\end{bmatrix}
$$

Channel $k = 3$:
$$
T_{::3} =
\begin{bmatrix}
9 & 10 \\
11 & 12
\end{bmatrix}
$$

1. **Mode-3 fiber (fix i=1, j=2):**  
$$
T_{1,2,:} = [2, 6, 10]
$$  
Vector of values across channels at **row 1, column 2**.

2. **Mode-2 fiber (fix i=2, k=3):**  
$$
T_{2,:,3} = [11, 12]
$$  
Row vector of **row 2 in channel 3**.

3. **Mode-1 fiber (fix j=1, k=1):**  
$$
T_{:,1,1} = [1, 3]
$$  
Column vector of **column 1 in channel 1**.
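
The same three fibers can be read off with PyTorch indexing. As before, this sketch assumes the (H, W, C) layout and converts the 1-based math indices to 0-based Python indices.

```python
import torch

channels = [
    torch.tensor([[1, 2], [3, 4]]),
    torch.tensor([[5, 6], [7, 8]]),
    torch.tensor([[9, 10], [11, 12]]),
]
T = torch.stack(channels, dim=2)   # shape (2, 2, 3)

mode3 = T[0, 1, :]   # T_{1,2,:}: fix i = 1, j = 2, vary k
mode2 = T[1, :, 2]   # T_{2,:,3}: fix i = 2, k = 3, vary j
mode1 = T[:, 0, 0]   # T_{:,1,1}: fix j = 1, k = 1, vary i

print(mode3.tolist())  # [2, 6, 10]
print(mode2.tolist())  # [11, 12]
print(mode1.tolist())  # [1, 3]
```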


#### Single Element (all three indices fixed)

$$
T_{2,1,3} = 11
$$

- row $i=2$  
- column $j=1$  
- channel $k=3$  

#### Key Intuition

- Fix **one index** → **matrix slice**  
- Fix **two indices** → **fiber (vector)**  
- Fix **all three indices** → **scalar (single number)**  

### 5. Higher-Order Tensors (ND)

General tensor:
$$
T_{i_1 i_2 i_3 \dots i_n}
$$

Each index corresponds to:
- A dimension
- A semantic axis

Examples in DL:
- Batch dimension
- Time steps
- Feature channels

## Tensor Shape and Dimension Meaning

### 1. Tensor Shape

- The shape of a tensor describes how many elements it has along each axis.  
- Written as a tuple:  
  $$
  (d_1, d_2, d_3, \dots, d_n)
  $$  
  where $d_k$ is the size of the $k$-th axis.

- Examples:
  - Scalar → shape `()` → 0D  
  - Vector of length 5 → shape `(5,)` → 1D  
  - Matrix 3×4 → shape `(3, 4)` → 2D  
  - 3rd-order tensor 2×2×3 → shape `(2, 2, 3)` → 3D  
  - Batch of 10 images (RGB 32×32) → shape `(10, 32, 32, 3)` → 4D

### 2. Tensor Dimension

- Dimension (or rank / order) = number of axes in the tensor.  
- Number of axes = length of the shape tuple.
- Examples:
  - Scalar → 0D  
  - Vector → 1D  
  - Matrix → 2D  
  - 3rd-order tensor → 3D  
  - ND tensor → N dimensions

### 3. Semantic Meaning of Axes

- Each axis often has a conceptual meaning:
  - **Batch axis** → number of samples  
  - **Height / width axis** → spatial dimensions of an image  
  - **Channel / feature axis** → color channels, feature maps, embedding dimensions  
  - **Time axis** → positions in a sequence

- Example: Batch of 10 RGB images of size 32×32:
  $$
  \text{Shape: } (10, 32, 32, 3)
  $$
  - Axis 0 → batch size  
  - Axis 1 → height  
  - Axis 2 → width  
  - Axis 3 → channels

### 4. Quick Reference Table

| Tensor | Shape Example | Dimensions | Semantic Axes |
|--------|---------------|------------|---------------|
| Scalar | ()            | 0D         | N/A           |
| Vector | (5,)          | 1D         | features      |
| Matrix | (3,4)         | 2D         | rows × columns |
| 3D Tensor | (2,2,3)    | 3D         | height × width × channels |
| Batch of images | (10,32,32,3) | 4D | batch × height × width × channels |


### 5. Why Shape & Dimension Matter

- Operations like matrix multiplication, convolution, and broadcasting depend on each tensor's shape and on what each axis means.  
- Misunderstanding axes often causes runtime errors or wrong results in deep learning.
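
Two small illustrations of why axis order and shape matter. The first reorders a channels-last batch into the channels-first layout that PyTorch layers such as `torch.nn.Conv2d` expect by default; the second shows the kind of runtime error a shape mismatch produces. The tensor sizes are arbitrary.

```python
import torch

# Batch of 10 RGB images stored channels-last: (batch, height, width, channels)
batch_nhwc = torch.zeros(10, 32, 32, 3)

# Reorder axes to channels-first: (batch, channels, height, width)
batch_nchw = batch_nhwc.permute(0, 3, 1, 2)
print(batch_nchw.shape)  # torch.Size([10, 3, 32, 32])

# Matrix multiplication needs the inner sizes to agree: (2, 3) @ (2, 3) does not
a = torch.zeros(2, 3)
b = torch.zeros(2, 3)
shape_error = False
try:
    a @ b
except RuntimeError:
    shape_error = True
print("shape mismatch raised an error:", shape_error)

# Transposing b gives (2, 3) @ (3, 2), which is valid
ok = a @ b.T
print(ok.shape)  # torch.Size([2, 2])
```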


## Tensor Operations: Index Notation

Tensor operations can be neatly described using index notation, which helps understand how elements interact along axes.

### (a) Element-wise Operations

- These operate on tensors of the same shape.  
- No index is removed. The result has the same shape as the inputs.

Example: Addition of two matrices

$$
A =
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}, \quad
B =
\begin{bmatrix}
5 & 6 \\
7 & 8
\end{bmatrix}
$$

Element-wise addition:

$$
C_{ij} = A_{ij} + B_{ij}
$$

Compute each element:

$$
C =
\begin{bmatrix}
1+5 & 2+6 \\
3+7 & 4+8
\end{bmatrix} =
\begin{bmatrix}
6 & 8 \\
10 & 12
\end{bmatrix}
$$

- Shape of $C$ = shape of $A$ = shape of $B$  
- Indices $i,j$ are retained.

### (b) Tensor Contraction (Generalized Dot Product)

- Some indices are summed over → they disappear in the output.  
- Remaining indices define the shape of the resulting tensor.  
- Matrix multiplication is a special case.

Matrix multiplication example

$$
C_{ik} = \sum_{j} A_{ij} B_{jk}
$$

- $A$ shape = (2×3)  
- $B$ shape = (3×2)  
- Sum over index $j=1..3$  
- Result $C$ shape = (2×2)  

Compute Example:

$$
A =
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}, \quad
B =
\begin{bmatrix}
7 & 8 \\
9 & 10 \\
11 & 12
\end{bmatrix}
$$

Compute $C_{ik}$:

- $C_{11} = 1*7 + 2*9 + 3*11 = 58$  
- $C_{12} = 1*8 + 2*10 + 3*12 = 64$  
- $C_{21} = 4*7 + 5*9 + 6*11 = 139$  
- $C_{22} = 4*8 + 5*10 + 6*12 = 154$

$$
C =
\begin{bmatrix}
58 & 64 \\
139 & 154
\end{bmatrix}
$$

**Key Points:**

- Index $j$ is contracted (summed over) → disappears in $C$  
- Remaining indices $i,k$ define the output shape 
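
The contraction above can be reproduced in PyTorch. The sketch computes $C$ with the built-in `@` operator and again with explicit loops, where the innermost loop is the sum over the contracted index $j$.

```python
import torch

A = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])      # shape (2, 3)
B = torch.tensor([[7.0, 8.0],
                  [9.0, 10.0],
                  [11.0, 12.0]])         # shape (3, 2)

C = A @ B
print(C.tolist())  # [[58.0, 64.0], [139.0, 154.0]]

# Explicit index form: C_ik = sum_j A_ij * B_jk
C_manual = torch.zeros(2, 2)
for i in range(2):
    for k in range(2):
        for j in range(3):               # j is summed over and disappears
            C_manual[i, k] += A[i, j] * B[j, k]

print(torch.equal(C, C_manual))  # True
```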


### (c) Generalization to Higher-Order Tensors

- Tensor contraction generalizes dot products to any order:  
  $$
  C_{ab\ldots} = \sum_{i} A_{ai\ldots} B_{i b\ldots}
  $$

- This operation underlies all neural network layers:
  - Fully connected layers → matrix multiplication  
  - Convolution layers → sum over input channels  
  - Attention → sum over sequence positions  

## Einstein Summation Convention

The Einstein summation convention, which `einsum` functions such as `torch.einsum` are named after, is a concise notation for expressing tensor operations without explicitly writing summation symbols.

### Basic Rule

- Repeated index in a product implies summation over that index
- Example (matrix multiplication):

$$
C_{ik} = \sum_j A_{ij} B_{jk} \quad \longrightarrow \quad C_{ik} = A_{ij} B_{jk}
$$

- Here, index `j` appears twice → automatically summed over  
- Indices `i` and `k` appear once → remain in the output

### Benefits

- Removes clutter of explicit summation symbols  
- Makes operations more readable for higher-order tensors  
- Easily generalizes to ND tensor contractions

### Examples

**(a) Matrix Multiplication**

$$
C_{ik} = A_{ij} B_{jk}
$$

- Shapes: $A$ = (2×3), $B$ = (3×2)  
- Index `j` summed → output shape = (2×2)

**(b) Vector Dot Product**

$$
s = x_i y_i
$$

- Repeated index `i` → sum over all elements  
- Equivalent to $\sum_i x_i y_i$

**(c) 3rd-Order Tensor Contraction**

$$
C_{il} = A_{ijk} B_{jkl}
$$

- Repeated indices `j` and `k` → summed  
- Remaining indices `i` and `l` → output shape
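
The three examples translate almost symbol for symbol into `torch.einsum`, where the index letters before `->` label the input axes and the letters after it label the output axes. The tensor values below are arbitrary.

```python
import torch

# (a) Matrix multiplication: C_ik = A_ij B_jk
A = torch.arange(6.0).reshape(2, 3)
B = torch.arange(6.0).reshape(3, 2)
C = torch.einsum("ij,jk->ik", A, B)
print(torch.allclose(C, A @ B))  # True

# (b) Vector dot product: s = x_i y_i
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])
s = torch.einsum("i,i->", x, y)
print(s.item())  # 32.0

# (c) 3rd-order contraction: C_il = A_ijk B_jkl
A3 = torch.randn(2, 3, 4)
B3 = torch.randn(3, 4, 5)
C3 = torch.einsum("ijk,jkl->il", A3, B3)
print(C3.shape)  # torch.Size([2, 5])
```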


## Tensor Products (Outer Products)

The tensor product, often called the outer product, is an operation that combines lower-order tensors to produce a higher-order tensor.

For two vectors $a$ and $b$:

$$
T_{ij} = a_i b_j
$$

- Input: two vectors $a \in \mathbb{R}^{m}$, $b \in \mathbb{R}^{n}$
- Output: a matrix $T \in \mathbb{R}^{m \times n}$  
- No summation occurs; each element of $T$ is simply the product of one element of $a$ and one element of $b$.

> Tensor product generalizes to any order of tensors, building higher-order tensors from lower-order ones.

Example: Vector Outer Product

Let:

$$
a = 
\begin{bmatrix}
1 \\ 2 \\ 3
\end{bmatrix}, \quad
b =
\begin{bmatrix}
4 \\ 5
\end{bmatrix}
$$

Compute outer product $T = a \otimes b$:

$$
T_{ij} = a_i b_j
$$

$$
T =
\begin{bmatrix}
1*4 & 1*5 \\
2*4 & 2*5 \\
3*4 & 3*5
\end{bmatrix} =
\begin{bmatrix}
4 & 5 \\
8 & 10 \\
12 & 15
\end{bmatrix}
$$

- Shape of $T$ = (3 × 2)  
- Each row corresponds to one element of $a$ multiplied by all elements of $b$  


### Generalization

- Outer product can combine **higher-order tensors**:  
  $$
  T_{ijkl} = A_{ij} B_{kl}
  $$
- Input: 2×2 tensor $A$, 2×3 tensor $B$  
- Output: 4D tensor $T$ with shape 2×2×2×3  

> Outer products appear in deep learning in bilinear layers, higher-order feature interactions, and the weight gradient of a linear layer (the outer product of the upstream gradient and the layer's input).
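
Both outer products above can be sketched in PyTorch: `torch.outer` handles the vector case, and `torch.einsum` with no repeated index handles the general case. The random matrices are placeholders.

```python
import torch

# Vector outer product: T_ij = a_i b_j
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0])
T = torch.outer(a, b)
print(T.tolist())  # [[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]

# Higher-order outer product: T_ijkl = A_ij B_kl (no index repeats, so nothing is summed)
A = torch.randn(2, 2)
B = torch.randn(2, 3)
T4 = torch.einsum("ij,kl->ijkl", A, B)
print(T4.shape)  # torch.Size([2, 2, 2, 3])

# Spot check one entry against the definition
print(torch.isclose(T4[1, 0, 1, 2], A[1, 0] * B[1, 2]).item())  # True
```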


## Why Gradients Are Tensors

In deep learning, gradients of the loss with respect to parameters are themselves tensors. This comes from the fact that each parameter can have multiple components, and the gradient must match that shape.

### Scalar Loss

Let the loss be a scalar:

$$
L \in \mathbb{R}
$$

- This is just a single number representing error.

### Gradient w.r.t a Vector

Suppose we have a vector parameter $v \in \mathbb{R}^n$.  
The gradient of the loss w.r.t $v$ is:

$$
\frac{\partial L}{\partial v_i}, \quad i = 1, \dots, n
$$

- This is a vector of the same size as $v$
- Each element tells us how changing that component of $v$ affects the loss.

**Example:**

$$
v = 
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}, \quad
\frac{\partial L}{\partial v} =
\begin{bmatrix} \frac{\partial L}{\partial v_1} \\ \frac{\partial L}{\partial v_2} \\ \frac{\partial L}{\partial v_3} \end{bmatrix}
$$

### Gradient w.r.t a Matrix

Suppose we have a weight matrix $W \in \mathbb{R}^{m \times n}$.  
The gradient is:

$$
\frac{\partial L}{\partial W_{ij}}, \quad i = 1,\dots,m, \; j = 1,\dots,n
$$

- This is a matrix of the same shape as $W$
- Each element tells how changing that weight affects the loss.

**Example:**

$$
W =
\begin{bmatrix}
w_{11} & w_{12} \\
w_{21} & w_{22}
\end{bmatrix}, \quad
\frac{\partial L}{\partial W} =
\begin{bmatrix}
\frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{12}} \\
\frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{22}}
\end{bmatrix}
$$

### Rule of Thumb

> **The gradient has the same shape as the parameter it is taken with respect to**

- Scalar → gradient is scalar  
- Vector → gradient is vector  
- Matrix → gradient is matrix  
- Higher-order tensor → gradient is higher-order tensor  

### Implications in Deep Learning

- **Weights are tensors** → gradients are tensors  
- **Backpropagation** → sequence of tensor operations (matrix multiplication, contraction, addition)  
- This is why automatic differentiation frameworks treat gradients as tensors, matching the shapes of parameters.
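
The shape rule is easy to confirm with autograd. In this sketch the weight matrix `W`, bias `b`, input `x`, and the squared-mean loss are all made-up placeholders; only the shapes matter.

```python
import torch

W = torch.randn(3, 4, requires_grad=True)   # matrix parameter
b = torch.randn(4, requires_grad=True)      # vector parameter
x = torch.randn(5, 3)                       # batch of 5 inputs (no gradient needed)

# Any scalar loss will do for checking shapes
loss = ((x @ W + b) ** 2).mean()
loss.backward()

print(loss.shape)                  # torch.Size([]), a scalar
print(W.grad.shape == W.shape)     # True
print(b.grad.shape == b.shape)     # True
```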


## Neural Networks = Tensor Transformations

A neural network layer:
$$
Z = XW + b
$$

In index form:
$$
Z_{ik} = \sum_j X_{ij} W_{jk} + b_k
$$

Followed by:
$$
A_{ik} = f(Z_{ik})
$$

Training adjusts tensor values to minimize:
$$
L(y, f_\theta(X))
$$
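
A single layer written out in PyTorch, as a sketch. The batch size, feature counts, and the choice of ReLU for $f$ are assumptions made for illustration; the einsum line mirrors the index form $Z_{ik} = \sum_j X_{ij} W_{jk} + b_k$.

```python
import torch

torch.manual_seed(0)

X = torch.randn(4, 3)   # 4 samples, 3 input features
W = torch.randn(3, 2)   # maps 3 features to 2 outputs
b = torch.randn(2)      # one bias per output, broadcast over the batch

Z = X @ W + b           # matrix form
A = torch.relu(Z)       # element-wise nonlinearity f

# Index form of the same computation
Z_index = torch.einsum("ij,jk->ik", X, W) + b

print(A.shape)                                   # torch.Size([4, 2])
print(torch.allclose(Z, Z_index, atol=1e-6))     # True
```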

## Quick Mental Model 

- **Scalars** → measure error
- **Vectors** → represent features
- **Matrices** → mix features
- **Tensors** → organize learning at scale

Deep learning is:
> **Tensor algebra + nonlinear functions + optimization**


## Final Insight

> A neural network is a sequence of tensor contractions followed by nonlinear mappings, optimized via gradient descent.

If this sentence eventually feels *obvious*, you truly understand deep learning.


# Implementation

In [2]:
# -------------------------
# Scalar Tensor
# -------------------------

# Create a 0-dimensional tensor (a scalar)
# torch.tensor(5) wraps the Python number 5 into a PyTorch tensor object
a = torch.tensor(5)

# Print the tensor
# Even though it looks like a number, it is a tensor with shape ()
print("Scalar:", a)


# -------------------------
# Vector Tensor
# -------------------------

# Create a 1-dimensional tensor (vector) with 3 elements
# The list defines values along a single axis
v = torch.tensor([1.0, 2.0, 3.0])

# Print the vector tensor
# Shape is (3,)
print("Vector:", v)


# -------------------------
# Matrix Tensor
# -------------------------

# Create a 2-dimensional tensor (matrix)
# Outer list → rows
# Inner lists → columns
m = torch.tensor([
    [1.0, 2.0],
    [3.0, 4.0]
])

# Print the matrix tensor
# Shape is (2, 2)
print("Matrix:\n", m)


# -------------------------
# 3D Tensor
# -------------------------

# Create a 3-dimensional tensor filled with ones
# Shape: (2, 2, 2)
# This means:
# - 2 blocks
# - each block has 2 rows
# - each row has 2 columns
t3 = torch.ones((2, 2, 2))

# Print the 3D tensor
print("3D Tensor:\n", t3)



Scalar: tensor(5)
Vector: tensor([1., 2., 3.])
Matrix:
 tensor([[1., 2.],
        [3., 4.]])
3D Tensor:
 tensor([[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]])


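`torch.tensor` (like factory functions such as `torch.ones`) does three things:

- Allocates memory for the tensor's data
- Assigns a shape and a datatype (inferred from the input unless you pass `dtype`)
- Places the tensor on a device (CPU by default), where PyTorch operations and, if `requires_grad=True`, autograd can work with it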

In [3]:
# -------------------------
# Tensor attributes
# -------------------------

# -------------------------
# Scalar attributes
# -------------------------

# Shape: number of elements along each axis
# For a scalar, shape is empty → ()
print("Shape of scalar:", a.shape)

# Data type of elements stored in the tensor
# Important for precision, memory, and performance
print("Datatype of scalar:", a.dtype)

# Device where the tensor lives (CPU or GPU)
print("Device of scalar:", a.device)

Shape of scalar: torch.Size([])
Datatype of scalar: torch.int64
Device of scalar: cpu


In [4]:
# -------------------------
# Vector attributes
# -------------------------

# Shape of vector → (3,)
# One axis with 3 elements
print("Shape of vector:", v.shape)

# Floating point type (default: float32)
print("Datatype of vector:", v.dtype)

# Device of vector
print("Device of vector:", v.device)

Shape of vector: torch.Size([3])
Datatype of vector: torch.float32
Device of vector: cpu


In [5]:
# -------------------------
# Matrix attributes
# -------------------------

# Shape of matrix → (2, 2)
# Two axes: rows × columns
print("Shape of matrix:", m.shape)

# Datatype of matrix
print("Datatype of matrix:", m.dtype)

# Device of matrix
print("Device of matrix:", m.device)

Shape of matrix: torch.Size([2, 2])
Datatype of matrix: torch.float32
Device of matrix: cpu


In [6]:
# -------------------------
# 3D Tensor attributes
# -------------------------

# Shape of 3D tensor → (2, 2, 2)
# Three axes: blocks × rows × columns
print("Shape of 3D tensor:", t3.shape)

# Datatype of 3D tensor
print("Datatype of 3D tensor:", t3.dtype)

# Device of 3D tensor
print("Device of 3D tensor:", t3.device)

Shape of 3D tensor: torch.Size([2, 2, 2])
Datatype of 3D tensor: torch.float32
Device of 3D tensor: cpu


In [7]:
# -------------------------
# Basic operations
# -------------------------

# -------------------------
# Element-wise addition
# -------------------------

# Create two 1D tensors (vectors) of the same shape (2,)
x = torch.tensor([1.0, 2.0])
y = torch.tensor([3.0, 4.0])

# Element-wise addition
# Each element is added independently:
# [1+3, 2+4]
print("Add:", x + y)

Add: tensor([4., 6.])


In [8]:
# -------------------------
# Element-wise multiplication
# -------------------------

# Element-wise multiplication (Hadamard product)
# Each element is multiplied independently:
# [1*3, 2*4]
print("Multiply:", x * y)


Multiply: tensor([3., 8.])


In [9]:
# -------------------------
# Matrix multiplication
# -------------------------

# Create two 2×2 matrices
# Explicitly set dtype to float32 (required for most DL ops)
a = torch.tensor([[1, 2],
                  [3, 4]], dtype=torch.float32)

b = torch.tensor([[2, 0],
                  [0, 2]], dtype=torch.float32)

# Matrix multiplication
print("MatMul:\n", torch.matmul(a, b))

MatMul:
 tensor([[2., 4.],
        [6., 8.]])


In [10]:
# -------------------------
# Reshaping tensors
# -------------------------

# Create a 1D tensor with values from 0 to 11
# torch.arange(12) → [0, 1, 2, ..., 11]
t = torch.arange(12)

# Print the original tensor
# Shape: (12,)
print("Original:", t)


# Reshape the tensor into a 3×4 matrix
t_reshaped = t.view(3,4)

# Print reshaped tensor
print("Reshaped (3x4):\n", t_reshaped)

Original: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Reshaped (3x4):
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
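
A short sketch of two details that often come up with reshaping: `-1` lets PyTorch infer one dimension, and `view` only works when the requested shape is compatible with the tensor's memory layout, whereas `reshape` falls back to copying when needed.

```python
import torch

t = torch.arange(12).view(3, 4)

# -1 asks PyTorch to infer that dimension from the total number of elements
inferred = t.view(-1, 6)
print(inferred.shape)  # torch.Size([2, 6])

# Transposing changes strides, not memory, so the result is non-contiguous
tt = t.t()
print(tt.is_contiguous())  # False

# view cannot flatten this layout without copying, so it raises an error
view_failed = False
try:
    tt.view(12)
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)

# reshape copies when it has to
flat = tt.reshape(12)
print(flat.tolist())  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```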


In [11]:
# -------------------------
# Gradients & Autograd
# -------------------------

# Create a tensor with value 2.0
# requires_grad=True tells PyTorch to track operations on this tensor
# so that derivatives (gradients) can be computed later
x = torch.tensor([2.0], requires_grad=True)

# Define a mathematical function using x
# This is the forward pass
# y = x^2 + 3x + 1
y = x**2 + 3*x + 1

# Trigger backpropagation
# PyTorch computes dy/dx using the stored computation graph
# Calling backward() without arguments works because y holds a single element
y.backward()

# Access the gradient of y with respect to x
# The result is stored in x.grad
print("Gradient dy/dx:", x.grad)

Gradient dy/dx: tensor([7.])


In [12]:
# Check GPU availability
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Device:", device)

Device: cuda


CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform and API. It lets developers use NVIDIA GPUs for general-purpose computation, not just graphics. By spreading work across thousands of GPU cores, it can greatly speed up data-intensive workloads such as AI, scientific simulation, and high-performance computing (HPC).

In [13]:
# Move tensor to GPU if available

x = x.to(device)
print("Tensor after moving to device:", x)

Tensor after moving to device: tensor([2.], device='cuda:0', grad_fn=<ToCopyBackward0>)


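- `torch.cuda.is_available()` → Checks whether PyTorch can use a CUDA-capable GPU.  
- `.to(device)` → Returns the tensor on the selected device (`cpu` or `cuda`), copying it only if it is not already there.  
- On a CPU-only machine, `device` falls back to `"cpu"`, so the same code still runs.  
- This is the standard PyTorch pattern for device-agnostic code.  
- Because `x` was created with `requires_grad=True`, the GPU copy is a non-leaf tensor, which is why the output above shows `grad_fn=<ToCopyBackward0>`.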
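
A compact sketch of the device-agnostic pattern. It shows two equivalent ways to get a tensor onto the chosen device (creating it there directly, or moving it with `.to`) and how to bring a result back to the CPU. The tensor values are arbitrary.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Option 1: create the tensor directly on the target device
a = torch.ones(2, 3, device=device)

# Option 2: create on CPU, then move
b = torch.ones(3, 2).to(device)

# Operands of an operation must live on the same device
c = a @ b
print(c.device.type == device.type)  # True

# Bring the result back to the CPU, e.g. for printing or NumPy conversion
c_cpu = c.cpu()
print(c_cpu.tolist())  # [[3.0, 3.0], [3.0, 3.0]]
```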


In [14]:
# -------------------------
# GPU vs CPU Computation
# -------------------------

# We create two square matrices of size 1000 × 1000
# This size is large enough to see CPU vs GPU difference
size = (1000, 1000)

# torch.randn generates random values from a normal distribution
# These tensors live in CPU memory by default
a_cpu = torch.randn(size)
b_cpu = torch.randn(size)


# Move tensors to GPU 
if device == "cuda":
    
    a_gpu = a_cpu.to(device)
    b_gpu = b_cpu.to(device)

    # %timeit is a Jupyter magic command
    # It runs the operation multiple times and reports average execution time

    # Matrix multiplication on CPU
    %timeit torch.matmul(a_cpu, b_cpu)

    # Matrix multiplication on GPU
    %timeit torch.matmul(a_gpu, b_gpu)

7.51 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
564 µs ± 3.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


The GPU is significantly faster than the CPU for a 1000×1000 matrix multiplication: 7.51 ms versus 564 µs, roughly a 13× speedup. Matrix multiplication is a highly parallel, compute-intensive operation that maps well to GPU architecture. Since the tensors were moved to the GPU before timing, the measurement excludes CPU–GPU transfer overhead. One caveat: CUDA kernels launch asynchronously, so a rigorous benchmark should call `torch.cuda.synchronize()` before reading the clock, and the GPU figure here is best treated as approximate. For large tensor operations and repeated computations (as in neural network training), GPUs provide substantial speedups over CPUs.
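
For timing outside a notebook, or to account for asynchronous CUDA execution, a manual benchmark can be sketched as follows. The helper name `time_matmul`, the matrix size, and the repeat count are illustrative choices, and the numbers it prints will vary by machine.

```python
import time
import torch

def time_matmul(device: str, size: int = 1000, repeats: int = 5) -> float:
    """Return the average seconds per (size x size) matmul on the given device."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)

    torch.matmul(a, b)                 # warm-up run (excluded from timing)
    if device == "cuda":
        torch.cuda.synchronize()       # wait for the warm-up to finish

    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()       # wait for all queued GPU work
    return (time.perf_counter() - start) / repeats

cpu_time = time_matmul("cpu")
print(f"CPU: {cpu_time * 1e3:.2f} ms per matmul")

if torch.cuda.is_available():
    gpu_time = time_matmul("cuda")
    print(f"GPU: {gpu_time * 1e3:.2f} ms per matmul")
```

Without the `synchronize()` calls, the timer could stop while GPU work is still queued, which would understate the GPU time.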

# Key Takeaways from Day 13

- PyTorch tensors = NumPy arrays + GPU + autograd
- Supports automatic differentiation with `requires_grad`
- GPU usage drastically accelerates large computations
- PyTorch provides the foundation for building neural networks

---

<p style="text-align:center; font-size:18px;">
© 2025 Mostafizur Rahman
</p>
