## 00. PyTorch Fundamentals

In [1]:
import torch
import numpy as np
import matplotlib.pyplot as plt
import pandas

print(torch.__version__)

2.2.2


### Introduction to tensors

### Scalar
A scalar is a single number; in tensor-speak it's a zero-dimensional tensor.

In [2]:
scalar = torch.tensor(7)
print(f'value of the tensor: {scalar.item()}')
print(f'Dimensions of the tensor: {scalar.ndim}')

value of the tensor: 7
Dimensions of the tensor: 0


### Vector

A vector is a one-dimensional tensor that can contain many numbers.

For example, you could have a vector [3, 2] to describe [bedrooms, bathrooms] in your house. Or you could have [3, 2, 2] to describe [bedrooms, bathrooms, car_parks] in your house.

The important point is that a vector is flexible in what it can represent (the same goes for tensors).

In [3]:
vector = torch.tensor([8, 7])
print(f'Dimension of the vector: {vector.ndim}')
print(f'Shape of the vector: {vector.shape}')

Dimension of the vector: 1
Shape of the vector: torch.Size([2])


### Matrix 

Matrices are as flexible as vectors, except they've got an extra dimension.

In [4]:
MATRIX = torch.tensor([[7, 9], [0, 9]])

print(f'Dimension of the matrix: {MATRIX.ndim}')
print(f'Shape of the matrix: {MATRIX.shape}')

Dimension of the matrix: 2
Shape of the matrix: torch.Size([2, 2])


In [5]:
# Tensor
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])
print(f'Dimension of the tensor: {TENSOR.ndim}')
print(f'Shape of the tensor: {TENSOR.shape}')

Dimension of the tensor: 3
Shape of the tensor: torch.Size([1, 3, 3])


# Let's Summarize

| Name    | What is it?                                                                 | Number of dimensions                                                | Lower or Upper (usually/example) |
|---------|-----------------------------------------------------------------------------|----------------------------------------------------------------------|----------------------------------|
| Scalar  | A single number                                                             | 0                                                                    | Lower (a)                        |
| Vector  | A number with direction (e.g. wind speed with direction) or an array of numbers | 1                                                                    | Lower (y)                        |
| Matrix  | A 2-dimensional array of numbers                                            | 2                                                                    | Upper (Q)                        |
| Tensor  | An n-dimensional array of numbers (0D = scalar, 1D = vector, etc.)          | Can be any number of dimensions                                      | Upper                            |


# Random Tensors

We've established that **tensors represent some form of data**.  

Machine learning models such as neural networks manipulate and seek patterns within tensors.  

However, when building models with **PyTorch**, it's rare to create tensors by hand (like we've been doing so far).  

Instead, a machine learning model often starts out with **large random tensors of numbers** and adjusts these random numbers as it works through data to better represent it.  

---

### In essence:
1. Start with random numbers  
2. Look at data  
3. Update random numbers  
4. Look at data  
5. Update random numbers...  

---

As a data scientist, you define how the machine learning model:  
- **Initializes** → how it starts with random numbers  
- **Represents data** → how it looks at data  
- **Optimizes** → how it updates the random numbers  
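
The loop above can be sketched on a toy problem. This is only a minimal illustration, not how real models are trained: the data (`y = 2x`), the learning rate, and the hand-written gradient are all assumptions chosen for the example.

```python
import torch

torch.manual_seed(0)

# Toy data that follows the pattern y = 2x (assumed for illustration)
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

# 1. Start with a random number
weight = torch.rand(1)

learning_rate = 0.01  # hypothetical value picked for this toy problem
for step in range(200):
    # 2. Look at data: how far off is the current guess?
    error = weight * x - y
    # 3. Update the random number using the gradient of the mean squared error
    gradient = (2 * error * x).mean()
    weight = weight - learning_rate * gradient

print(weight)  # ends up very close to 2.0
```

The single random weight drifts toward 2.0 because each update nudges it in the direction that reduces the error on the data.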

---

### Creating Random Tensors in PyTorch

In [6]:
random_tensor = torch.rand(size=(4, 4, 9))
print(f'Shape of the tensor: {random_tensor.shape}')
print(f'Data type: {random_tensor.dtype} \n')

print(random_tensor)

Shape of the tensor: torch.Size([4, 4, 9])
Data type: torch.float32 

tensor([[[0.0504, 0.1529, 0.7912, 0.7139, 0.3253, 0.7680, 0.7462, 0.4134,
          0.4700],
         [0.3667, 0.0061, 0.6525, 0.4448, 0.3475, 0.7442, 0.3174, 0.9970,
          0.5822],
         [0.0468, 0.0233, 0.1331, 0.3126, 0.8998, 0.0353, 0.0408, 0.1027,
          0.0201],
         [0.3672, 0.2071, 0.7648, 0.5944, 0.5853, 0.9856, 0.2334, 0.4265,
          0.7335]],

        [[0.0793, 0.5394, 0.9670, 0.3694, 0.2816, 0.6009, 0.7456, 0.7286,
          0.4025],
         [0.1240, 0.7781, 0.2448, 0.7096, 0.2484, 0.4169, 0.4272, 0.9825,
          0.4164],
         [0.9887, 0.2832, 0.6494, 0.7354, 0.1801, 0.2302, 0.0594, 0.9070,
          0.3206],
         [0.6450, 0.1805, 0.6504, 0.2554, 0.5649, 0.5604, 0.9675, 0.3219,
          0.9737]],

        [[0.2475, 0.2523, 0.2507, 0.3250, 0.8399, 0.8780, 0.9019, 0.3290,
          0.7969],
         [0.0798, 0.7652, 0.3747, 0.3051, 0.3304, 0.7422, 0.8784, 0.5432,
          0.672

In [7]:
# Create a tensor of all zeros
zeros = torch.zeros(size=(3, 4))
zeros, zeros.dtype

(tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]),
 torch.float32)

In [8]:
# Create a tensor of all ones
ones = torch.ones(size=(3, 4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)

In [9]:
# zero_to_ten_deprecated = torch.range(0, 10)  # torch.range() is deprecated, use torch.arange() instead

In [10]:
zero_to_ten = torch.arange(start=0, end=11, step=1)
zero_to_ten

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [11]:
zeros_like  = torch.zeros_like(input=zero_to_ten)
zeros_like

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [12]:
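# Float inputs give a float32 tensor when dtype=None (integer inputs would give int64)
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None,           # defaults to torch.float32 for float inputs
                               device=None,          # defaults to the CPU
                               requires_grad=False)  # if True, operations on the tensor are recorded for autograd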

float_32_tensor.shape

torch.Size([3])

# Tensor Datatypes in PyTorch

Tensors in PyTorch can be created with **different datatypes** (`dtype`). The datatype determines:

* **How the tensor stores numbers** (integers, floats, booleans, etc.)
* **The precision of numbers** (8-bit, 16-bit, 32-bit, 64-bit).

A tensor's **device** (CPU or GPU) is a separate attribute from its datatype, but the two are usually checked together because both affect which operations are allowed.

---

## 🔹 Why so many datatypes?

1. **Precision (detail of numbers)**

   * `torch.float16` (half precision) uses **less memory**, runs **faster**, but is **less accurate**.
   * `torch.float32` (single precision) is the **default**, balancing speed and accuracy.
   * `torch.float64` (double precision) is **more accurate**, but **slower** and uses **more memory**.

2. **Device compatibility**

   * A tensor's `device` attribute shows where it lives: `cpu`, or a GPU device such as `cuda:0`.
   * If no device is set, tensors are created on the **CPU** by default.

3. **Operations compatibility**

   * Tensors must be on the **same device** for operations, and many operations (such as `torch.matmul`) also require the **same datatype**.
   * Example: element-wise addition of a `float32` tensor and a `float16` tensor is type-promoted to `float32`, but `torch.matmul` on that pair raises an error until one of them is cast.
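
A short sketch of how mixed datatypes behave in practice. The tensor names are made up for the example; the point is that element-wise arithmetic promotes to the wider type, while `torch.matmul` expects matching dtypes.

```python
import torch

a32 = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float32)
b16 = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float16)

# Element-wise addition promotes the float16 operand to float32
summed = a32 + b16
print(summed.dtype)  # torch.float32

# matmul does not promote: mixing dtypes raises a RuntimeError
try:
    torch.matmul(a32, b16)
    matmul_failed = False
except RuntimeError as err:
    matmul_failed = True
    print(f"matmul failed: {err}")

# Casting one operand so both dtypes match fixes it
result = torch.matmul(a32, b16.to(torch.float32))
print(result)  # tensor(14.)
```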

---

## 🔹 Summary Table of PyTorch Datatypes

| **Datatype**     | **Alias**      | **Size** | **Description**                                  | **Use case**                                   |
| ---------------- | -------------- | -------- | ------------------------------------------------ | ---------------------------------------------- |
| `torch.float16`  | `torch.half`   | 16-bit   | Low precision float (half precision)             | Faster training (e.g. mixed precision on GPUs) |
| `torch.float32`  | `torch.float`  | 32-bit   | Default float (single precision)                 | Standard ML/DL computations                    |
| `torch.float64`  | `torch.double` | 64-bit   | High precision float (double precision)          | Scientific computing, when accuracy > speed    |
| `torch.int8`     | -              | 8-bit    | Integer                                          | Compact storage, quantized models              |
| `torch.int16`    | `torch.short`  | 16-bit   | Small integer                                    | Rarely used                                    |
| `torch.int32`    | `torch.int`    | 32-bit   | Standard integer                                 | General integer ops                            |
| `torch.int64`    | `torch.long`   | 64-bit   | Large integer                                    | Indexing, counters                             |
| `torch.bool`     | -              | 8-bit    | Boolean values (True/False), stored as 1 byte    | Masks, conditions                              |
| `torch.bfloat16` | -              | 16-bit   | Brain floating point (better range than float16) | Training on TPUs / some GPUs                   |
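
The storage sizes can be checked directly. A small sketch using `Tensor.element_size()`, which reports bytes per element (note that `torch.bool` occupies a full byte per element):

```python
import torch

dtypes = [torch.float16, torch.bfloat16, torch.float32, torch.float64,
          torch.int8, torch.int16, torch.int32, torch.int64, torch.bool]

sizes = {}
for dtype in dtypes:
    # element_size() returns the number of bytes used by one element
    sizes[dtype] = torch.zeros(1, dtype=dtype).element_size()
    print(f"{str(dtype):15} {sizes[dtype]} byte(s)")
```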

---

✅ **Key notes**:

* **Default**: `torch.float32`.
* **Best for deep learning**: Mix of `torch.float16` (for speed) and `torch.float32` (for stability).
* **GPU**: Use `tensor.to("cuda")` for GPU acceleration.
* **Precision tradeoff**: Higher precision → slower but more accurate.

In [15]:
# Default tensor (float32)
float_32_tensor = torch.tensor([3.0, 6.0, 9.0])  
print(float_32_tensor.dtype)  # torch.float32

# Half precision (float16)
float_16_tensor = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)  
print(float_16_tensor.dtype)  # torch.float16

# Double precision (float64)
float_64_tensor = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float64)  
print(float_64_tensor.dtype, float_64_tensor)  # torch.float64

# Integer tensor (int32)
int_tensor = torch.tensor([1, 2, 3], dtype=torch.int32)  
print(int_tensor.dtype)  # torch.int32

# Boolean tensor
bool_tensor = torch.tensor([True, False, True])  
print(bool_tensor.dtype)  # torch.bool

torch.float32
torch.float16
torch.float64 tensor([3., 6., 9.], dtype=torch.float64)
torch.int32
torch.bool


In [21]:
rand_tensor = torch.rand(3, 4)

print(f'Tensor: {rand_tensor}')
print(f'Shape of the tensor: {rand_tensor.shape}')
print(f'Data type of the tensor: {rand_tensor.dtype}')
print(f'Device which the data is stored on: {rand_tensor.device}')

Tensor: tensor([[0.1187, 0.0805, 0.0168, 0.0373],
        [0.9154, 0.6223, 0.3162, 0.0125],
        [0.7083, 0.7527, 0.5687, 0.2659]])
Shape of the tensor: torch.Size([3, 4])
Data type of the tensor: torch.float32
Device which the data is stored on: cpu


In [34]:
tensor = torch.ones(4, 4)

print(f'tensor: {tensor} \n')
print(f'First row: {tensor[0]} \n')
print(f'First column: {tensor[:, 0]}')
print(f'Last column: {tensor[..., -1]}')
tensor[:, 1] = 0  # set every element of the second column to zero
print(f'Modified tensor now: \n {tensor}')

tensor: tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]) 

First row: tensor([1., 1., 1., 1.]) 

First column: tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
Modified tensor now: 
 tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])


### Basic Operations 
Let's start with a few of the fundamental operations: addition (`+`), subtraction (`-`), and multiplication (`*`).

In [35]:
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [38]:
tensor - 10

tensor([-9, -8, -7])

In [37]:
tensor * 10

tensor([10, 20, 30])

# 🧮 Matrix Multiplication in PyTorch

Matrix multiplication is **one of the most fundamental operations** in machine learning and deep learning, powering neural networks, embeddings, and transformations.

In PyTorch, matrix multiplication can be done using:

* `torch.matmul()`
* The `@` operator (shortcut for matrix multiplication)

---

## 🔑 Rules of Matrix Multiplication

When multiplying two matrices (or tensors):

1. **Inner dimensions must match**

   * If one matrix has shape `(m, n)` and the other `(n, p)`, then multiplication is possible.
   * Example: `(2, 3) @ (3, 2)` works, but `(3, 2) @ (3, 2)` does not.

2. **The resulting matrix shape comes from the outer dimensions**

   * Example: `(2, 3) @ (3, 2)` results in `(2, 2)`
   * Example: `(3, 2) @ (2, 3)` results in `(3, 3)`

👉 Think of it as: **inner dimensions collapse, outer dimensions remain.**
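
The two rules can be checked with random tensors. A minimal sketch; the names `a` and `b` are placeholders for this example only.

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 2)

# Inner dimensions match (3 and 3); the result takes the outer dimensions
print((a @ b).shape)  # torch.Size([2, 2])
print((b @ a).shape)  # torch.Size([3, 3])

# (3, 2) @ (3, 2): inner dimensions 2 and 3 differ, so this raises
try:
    b @ b
    shape_error = None
except RuntimeError as err:
    shape_error = err
    print(f"Shape mismatch: {err}")

# Transposing the second operand makes the inner dimensions line up
print((b @ b.T).shape)  # torch.Size([3, 3])
```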

---

## ✨ Element-wise vs Matrix Multiplication

It is important to distinguish between two types of multiplication:

* **Element-wise multiplication (`*`)**
  Each entry in one matrix is multiplied by the corresponding entry in the other.
  Requires matrices of the same shape (or broadcastable).

  * Example: `(2, 2) * (2, 2)` → `(2, 2)`

* **Matrix multiplication (`@` or `matmul`)**
  Uses linear algebra rules. Inner dimensions must align, and the result takes the shape of the outer dimensions.

  * Example: `(2, 3) @ (3, 2)` → `(2, 2)`
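
To make the difference concrete, here is a small sketch applying both operations to the same pair of 2×2 matrices (values chosen arbitrarily for illustration):

```python
import torch

a = torch.tensor([[1, 2],
                  [3, 4]])
b = torch.tensor([[5, 6],
                  [7, 8]])

# Element-wise: each entry is multiplied by the entry in the same position
elementwise = a * b
print(elementwise)  # tensor([[ 5, 12], [21, 32]])

# Matrix multiplication: each entry is a row of `a` dotted with a column of `b`
matrix_product = a @ b
print(matrix_product)  # tensor([[19, 22], [43, 50]])
```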

---

## 🧩 Why is this important in Deep Learning?

* **Linear layers (fully connected layers)** are matrix multiplications between:

  * Input data (batch size × input features)
  * Weights (input features × output features)
  * Output (batch size × output features)

* **Transformations in neural networks** (embeddings, convolutions in linear form, attention in Transformers) are built on matrix multiplication.

* **GPUs** excel at matrix multiplication, making it the backbone of efficient deep learning computations.

---

## 📊 Summary Table

| Operation                       | Symbol                | Shape Rule                                                 | Example                      |
| ------------------------------- | --------------------- | ---------------------------------------------------------- | ---------------------------- |
| **Element-wise multiplication** | `*`                   | Shapes must match (or be broadcastable)                    | `(2, 2) * (2, 2)` → `(2, 2)` |
| **Matrix multiplication**       | `@` or `torch.matmul` | Inner dimensions must match; result takes outer dimensions | `(2, 3) @ (3, 2)` → `(2, 2)` |

---

📚 **Resources**

* [PyTorch Documentation — `torch.matmul`](https://pytorch.org/docs/stable/generated/torch.matmul.html)
* [Wikipedia: Matrix Multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication)

In [44]:
tensor = torch.tensor([1, 2, 5])
tensor.shape, tensor.ndim

(torch.Size([3]), 1)

In [42]:
tensor * tensor

tensor([ 1,  4, 25])

In [43]:
tensor @ tensor

tensor(30)

In [45]:
%%time
# Dot product by hand
# (avoid element-by-element Python for loops where possible; vectorized operations are usually much faster on larger tensors)
value = 0
for i in range(len(tensor)):
  value += tensor[i] * tensor[i]
value

CPU times: user 1.02 ms, sys: 833 µs, total: 1.85 ms
Wall time: 1.21 ms


tensor(30)

In [46]:
%%time
torch.matmul(tensor, tensor)

CPU times: user 269 µs, sys: 169 µs, total: 438 µs
Wall time: 5.97 ms


tensor(30)

In [52]:
# Inner dimensions must match: (3, 2) @ (3, 2) fails, so tensor_B is transposed to (2, 3)
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

# print(f'Shape of tensor_A: {tensor_A.shape}')

tensor_B = torch.tensor([[7, 10],
                         [8, 11], 
                         [9, 12]], dtype=torch.float32)

# print(f'Shape of tensor_B: {tensor_B.shape}')

# (3, 2) @ (2, 3) -> (3, 3)
matmul = torch.matmul(tensor_A, tensor_B.T)
matmul

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

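> **Note:**  
> Matrix multiplication is sometimes loosely called the **dot product** of two matrices. More precisely, each entry of the result is the dot product of a row of the first matrix with a column of the second.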

# 🔗 Neural Networks and Matrix Multiplication

At their core, **neural networks are built on matrix multiplications and dot products.**

One of the most common building blocks in PyTorch is the **`torch.nn.Linear()` module**, also called a:

* **Feed-forward layer**
* **Fully connected layer (FC layer)**

This layer essentially performs a **linear transformation** of the input data.

---

## ⚙️ How it Works

The formula behind a linear layer is:

$$
y = xA^T + b
$$

Where:

* **x** → The **input** to the layer (a vector or matrix).
* **A** → The **weights matrix** created by the layer.

  * Starts out as random numbers.
  * Gets adjusted during training as the model learns patterns in the data.
  * Notice the **transpose (T)**: the layer stores `A` with shape `(out_features, in_features)`, so it is transposed to line up its inner dimension with `x`.
* **b** → The **bias term**, which allows shifting the output and gives the model more flexibility.
* **y** → The **output**, which is a transformed version of the input.
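
The formula can be checked against PyTorch itself. A minimal sketch: `weight` and `bias` are the parameter attributes that `torch.nn.Linear` exposes, while the input values and layer sizes are chosen arbitrarily.

```python
import torch

torch.manual_seed(42)

linear = torch.nn.Linear(in_features=2, out_features=6)
x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

# The weights are stored as (out_features, in_features)
print(linear.weight.shape)  # torch.Size([6, 2])

# y = x A^T + b, written out by hand
manual = x @ linear.weight.T + linear.bias

# The layer's own output should agree with the manual computation
matches = torch.allclose(linear(x), manual)
print(matches)
```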

---

## 📐 Connection to High School Algebra

This is just a higher-dimensional version of the simple **linear function** you may have seen before:

$$
y = mx + b
$$

* Here, **m** is the slope (weights in neural networks).
* **b** is the bias (same as in neural networks).
* The difference is that in deep learning, instead of a single line, we’re working with **matrices and vectors** to represent higher-dimensional relationships.

---

## 🔑 Why This Matters in Deep Learning

* Every **fully connected layer** in a neural network is a **matrix multiplication + bias addition**.
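* By stacking many layers with non-linear activation functions between them, neural networks can represent complex, non-linear patterns (stacked linear layers on their own would still be a single linear transformation).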
* Training adjusts the weights (**A**) and biases (**b**) to minimize the difference between predicted output and actual data.

---

✅ **Summary:**
`torch.nn.Linear()` = **Matrix Multiplication (input × transposed weights) + Bias**

This is the foundation of most deep learning architectures.

In [54]:
torch.manual_seed(42)

linear = torch.nn.Linear(in_features=2, out_features=6)

x = tensor_A
output = linear(x)

In [56]:
print(f"Input shape: {x.shape}\n")
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

Input shape: torch.Size([3, 2])

Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])


In [63]:
x = torch.arange(0, 100, 10)
print(f'the data type: {x.dtype}')
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
# print(f"Mean: {x.mean()}") # this will error: mean() needs a floating point dtype
print(f"Mean: {x.type(torch.float32).mean()}") # cast to float32 first
print(f"Sum: {x.sum()}")

the data type: torch.int64
Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450


In [64]:
tensor = torch.arange(10., 100., 10.)
tensor.dtype

torch.float32

In [65]:
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

In [66]:
tensor_int8 = tensor.type(torch.int8)
tensor_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)

### 🔹 Reshaping, stacking, squeezing, and unsqueezing tensors

* **torch.reshape(input, shape)**
  Changes the arrangement of data into a new shape.
  Example: `(2, 3)` → reshape → `(3, 2)`
  (same 6 elements, just rearranged).

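* **Tensor.view(shape)**
  Like reshape, but always returns a view that shares the same data as the original tensor (and raises an error if the memory layout does not allow it). `torch.reshape` returns a view when it can and a copy otherwise.
  Example: `(4, 2)` → view → `(2, 4)`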

* **torch.stack(tensors, dim=0)**
  Combines multiple tensors of the same size by adding a new dimension.
  Example: three tensors of shape `(2, 2)` → stack → `(3, 2, 2)`

* **torch.squeeze(input)**
  Removes dimensions of size 1.
  Example: `(1, 3, 1, 5)` → squeeze → `(3, 5)`

* **torch.unsqueeze(input, dim)**
  Adds a new dimension of size 1.
  Example: `(3, 5)` → unsqueeze at dim=0 → `(1, 3, 5)`
  Example: `(3, 5)` → unsqueeze at dim=2 → `(3, 5, 1)`

* **torch.permute(input, dims)**
  Rearranges the order of dimensions.
  Example: `(2, 3, 5)` → permute to `(3, 5, 2)`
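
The shape changes listed above can be reproduced directly. A small sketch using placeholder tensors of zeros (only the shapes matter here):

```python
import torch

x = torch.arange(6)  # shape (6,)

reshaped = torch.reshape(x, (2, 3))                        # (6,) -> (2, 3)
viewed = reshaped.view(3, 2)                               # (2, 3) -> (3, 2)
stacked = torch.stack([torch.zeros(2, 2)] * 3, dim=0)      # three (2, 2) -> (3, 2, 2)
squeezed = torch.squeeze(torch.zeros(1, 3, 1, 5))          # (1, 3, 1, 5) -> (3, 5)
unsqueezed_0 = torch.unsqueeze(torch.zeros(3, 5), dim=0)   # (3, 5) -> (1, 3, 5)
unsqueezed_2 = torch.unsqueeze(torch.zeros(3, 5), dim=2)   # (3, 5) -> (3, 5, 1)
permuted = torch.permute(torch.zeros(2, 3, 5), (1, 2, 0))  # (2, 3, 5) -> (3, 5, 2)

for name, t in [("reshaped", reshaped), ("viewed", viewed), ("stacked", stacked),
                ("squeezed", squeezed), ("unsqueezed_0", unsqueezed_0),
                ("unsqueezed_2", unsqueezed_2), ("permuted", permuted)]:
    print(f"{name:13} {tuple(t.shape)}")
```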

In [71]:
x = torch.arange(1., 10.)
x.shape, x, x.dtype

(torch.Size([9]), tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), torch.float32)

In [79]:
x.reshape(1, 3, 3)

tensor([[[1., 2., 3.],
         [4., 5., 6.],
         [7., 8., 9.]]])

In [80]:
z = x.view(1, 9)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

In [84]:
z[:, 3]

tensor([4.])
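
Because a view shares memory with its source, writing through the view also changes the original. The sketch below illustrates that, and also shows a case where `view` is not possible but `reshape` falls back to a copy (the variable names are placeholders for this example):

```python
import torch

x = torch.arange(1., 10.)
z = x.view(1, 9)

# Writing through the view changes the original tensor too
z[:, 0] = 5.0
print(x[0])  # tensor(5.)

# A transposed tensor is non-contiguous, so it cannot be flattened as a view
t = x.view(3, 3).T
try:
    t.view(9)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)  # True

# reshape() falls back to a copy here, so edits no longer reach x
flat = t.reshape(9)
flat[0] = -1.0
print(x[0])  # still tensor(5.)
```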

In [98]:
x_stacked = torch.vstack([x, x, x, x])
x_stacked.shape, x_stacked

(torch.Size([4, 9]),
 tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.],
         [1., 2., 3., 4., 5., 6., 7., 8., 9.],
         [1., 2., 3., 4., 5., 6., 7., 8., 9.],
         [1., 2., 3., 4., 5., 6., 7., 8., 9.]]))

In [100]:
x_reshaped_by1 = x.reshape(1, 9)
x_reshaped_by3 = x.reshape(3, 3)

In [103]:
x_reshaped_by1.squeeze()

tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [105]:
# No dimension has size 1, so squeeze() leaves the shape unchanged
x_reshaped_by3.squeeze().shape

torch.Size([3, 3])

In [111]:
x_unsqueezed = x_reshaped_by3.unsqueeze(dim=1)
x_unsqueezed.shape, x_unsqueezed

(torch.Size([3, 1, 3]),
 tensor([[[1., 2., 3.]],
 
         [[4., 5., 6.]],
 
         [[7., 8., 9.]]]))

In [110]:
x_reshaped_by3.unsqueeze(dim=0).shape

torch.Size([1, 3, 3])

### Understanding `dim=0`

* `dim=0` refers to the **first axis** of a tensor.
* In a **2D matrix**, this is the **rows** axis.
* When you perform an operation along `dim=0`, PyTorch combines values **across rows**, while keeping columns separate.

**Example:**
Suppose you have a tensor of shape `(3, 4)` → 3 rows, 4 columns.

```
[
 [1, 2, 3, 4],      <- row 0
 [5, 6, 7, 8],      <- row 1
 [9, 10, 11, 12]    <- row 2
]
```

* If you **sum over `dim=0`**, you’re adding values down each column:
  Result: `[1+5+9, 2+6+10, 3+7+11, 4+8+12]` → `[15, 18, 21, 24]`
* Shape changes from `(3, 4)` → `(4,)`

So, `dim=0` means: **operate top-to-bottom across rows**.

---

### Understanding `dim=1`

* `dim=1` refers to the **second axis** of a tensor.
* In a **2D matrix**, this is the **columns** axis.
* When you perform an operation along `dim=1`, PyTorch combines values **across columns**, while keeping rows separate.

**Example:**
Same matrix `(3, 4)`:

```
[
 [1, 2, 3, 4],      <- row 0
 [5, 6, 7, 8],      <- row 1
 [9, 10, 11, 12]    <- row 2
]
```

* If you **sum over `dim=1`**, you’re adding values across each row:
  Row 0 → `1+2+3+4 = 10`
  Row 1 → `5+6+7+8 = 26`
  Row 2 → `9+10+11+12 = 42`

Result: `[10, 26, 42]`

* Shape changes from `(3, 4)` → `(3,)`

So, `dim=1` means: **operate left-to-right across columns**.
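
The two worked examples above can be reproduced with `Tensor.sum`. A brief sketch; the `keepdim=True` variant is an extra shown only to illustrate how the reduced axis can be kept with size 1.

```python
import torch

matrix = torch.tensor([[1, 2, 3, 4],
                       [5, 6, 7, 8],
                       [9, 10, 11, 12]])

# dim=0 collapses the rows: one sum per column
column_sums = matrix.sum(dim=0)
print(column_sums, column_sums.shape)  # tensor([15, 18, 21, 24]) torch.Size([4])

# dim=1 collapses the columns: one sum per row
row_sums = matrix.sum(dim=1)
print(row_sums, row_sums.shape)  # tensor([10, 26, 42]) torch.Size([3])

# keepdim=True keeps the reduced axis with size 1
row_sums_kept = matrix.sum(dim=1, keepdim=True)
print(row_sums_kept.shape)  # torch.Size([3, 1])
```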

---

### Extending Beyond 2D

* **1D Tensor (vector):** Only `dim=0` exists.
* **3D Tensor (e.g., image batches):**

  * `dim=0` → batch (different images).
  * `dim=1` → rows/height.
  * `dim=2` → columns/width.

Each new dimension just adds another “direction” to your data.

In [113]:
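# Image-like tensor: (height, width, color_channels)
x_original = torch.rand(size=(224, 224, 3))

# Reorder the axes to (color_channels, height, width); permute returns a view of the same data
x_permuted = x_original.permute(2, 0, 1)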

In [114]:
x_permuted.shape

torch.Size([3, 224, 224])

# PyTorch Tensors & NumPy

Since **NumPy** is one of the most popular Python libraries for numerical computing, PyTorch provides built-in functionality to easily interact with it. This makes it convenient to switch between the two libraries depending on what operations you need.

The two most important methods are:

* **`torch.from_numpy(ndarray)`**
  Converts a **NumPy array** into a **PyTorch tensor**.
  This is useful when you already have data in NumPy and want to use PyTorch features (like GPU acceleration or autograd).

* **`torch.Tensor.numpy()`**
  Converts a **PyTorch tensor** into a **NumPy array**.
  This is helpful when you want to leverage NumPy’s rich ecosystem of functions, visualization tools, or integrations with other Python libraries.

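⚡ Note: For CPU tensors, `torch.from_numpy()` and `Tensor.numpy()` share the **same underlying memory** with the source object, so in-place changes to one are reflected in the other. `torch.from_numpy()` also keeps NumPy's default `float64` dtype unless you convert it.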
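
A short sketch of what shared memory means in practice. It contrasts an in-place update (visible through the tensor) with rebinding the name to a new array (not visible):

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
print(tensor.dtype)  # torch.float64, inherited from NumPy's default

# In-place update: the tensor sees the change because memory is shared
array += 1
print(tensor)  # tensor([2., 3., 4., 5., 6., 7., 8.], dtype=torch.float64)

# Rebinding creates a brand-new array; the tensor still points at the old memory
array = array + 1
print(array[0], tensor[0])  # 3.0 tensor(2., dtype=torch.float64)

# Convert to float32 if the rest of your code expects PyTorch's default dtype
tensor_32 = tensor.type(torch.float32)
print(tensor_32.dtype)  # torch.float32
```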

In [124]:
array = np.arange(1.0, 8.)

tensor = torch.from_numpy(array)

numpy_tensor = tensor.numpy()

tensor, numpy_tensor

(tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64),
 array([1., 2., 3., 4., 5., 6., 7.]))

# Reproducibility (Taking the Random Out of Random)

As you dive deeper into **neural networks and machine learning**, you’ll notice how much randomness plays a role.

Of course, this is not *true randomness*, but rather **pseudorandomness**. Computers are deterministic by design (each step is predictable), so the randomness they generate is simulated.

---

### Why does randomness matter in neural networks?

Neural networks often **start with random numbers** to initialize their parameters (like weights). These numbers initially describe patterns in data very poorly. Through training, the network applies many **tensor operations** and adjustments, improving these random numbers step by step until they represent useful patterns.

In short:
random numbers → tensor operations → better numbers → repeat until improved

---

### The problem with randomness

While randomness is powerful, it makes experiments harder to reproduce. For example:

* You design a model that achieves a certain level of performance.
* Your friend runs the same code but gets slightly different results.

This inconsistency happens because of the randomness in initialization and other operations.

---

### The solution: Reproducibility

Reproducibility means being able to **get the same (or very similar) results** when running the same code on different machines or at different times.

It is crucial for:

* **Experiment verification** – others can confirm your results.
* **Debugging** – you can consistently test changes.
* **Scientific progress** – reliable results are needed to build upon prior work.

---

## Best Practices for Reproducibility in PyTorch

To reduce randomness and make experiments reproducible, you can follow these practices:

1. **Set random seeds**

   * PyTorch, NumPy, and Python’s `random` module all generate random numbers.
   * Setting the same seed ensures they produce the same sequences each time.

2. **Fix CUDA randomness** (when using GPUs)

   * Some GPU operations are nondeterministic for efficiency reasons.
   * You can enable deterministic algorithms in PyTorch to reduce this.

3. **Control data shuffling**

   * Data loaders often shuffle batches randomly during training.
   * Fixing the random seed ensures the same batch order across runs.

4. **Document your environment**

   * Versions of PyTorch, CUDA, and other libraries can impact reproducibility.
   * Always record software versions when running experiments.

5. **Be aware of hardware differences**

   * CPUs vs. GPUs, or even different GPU models, can sometimes lead to slight numerical differences.
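
As a rough sketch of the first practice, the three seeds can be set in one place. `seed_everything` is a hypothetical helper name used only for this example, not a PyTorch function.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Hypothetical helper: seed Python, NumPy and PyTorch together."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Stricter option, not enabled here: torch.use_deterministic_algorithms(True)


seed_everything(42)
first = (random.random(), np.random.rand(), torch.rand(2))

seed_everything(42)
second = (random.random(), np.random.rand(), torch.rand(2))

# All three libraries repeat their sequences after re-seeding
print(first[0] == second[0])              # True
print(first[1] == second[1])              # True
print(torch.equal(first[2], second[2]))   # True
```

For data shuffling, one option is to pass a seeded `torch.Generator` to the data loader's `generator` argument; whether that is needed depends on how your data pipeline is set up.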

In [127]:
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(f"Tensor A:\n{random_tensor_A}\n")
print(f"Tensor B:\n{random_tensor_B}\n")
print(f"Does Tensor A equal Tensor B? (anywhere)")
random_tensor_A == random_tensor_B

Tensor A:
tensor([[0.8892, 0.1317, 0.7164, 0.2706],
        [0.6560, 0.6614, 0.3648, 0.1490],
        [0.8693, 0.0294, 0.5449, 0.0794]])

Tensor B:
tensor([[0.2313, 0.2927, 0.3386, 0.3708],
        [0.0831, 0.0944, 0.6568, 0.7708],
        [0.1053, 0.1467, 0.1932, 0.9118]])

Does Tensor A equal Tensor B? (anywhere)


tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

### 1️⃣ Randomness on a computer is fake

Computers are **deterministic**, meaning if you give the same input, you always get the same output. True randomness doesn’t naturally exist for computers.

So what computers do is generate **pseudorandom numbers**—numbers that **look random** but are actually calculated using a deterministic algorithm.

---

### 2️⃣ Seeds “flavor” the randomness

The pseudorandom number generator (PRNG) works like this:

```
next_number = f(previous_number)
```

* It starts with a number called the **seed**.
* From that seed, it deterministically generates a sequence of numbers that **looks random**.
* If you start from the same seed, the sequence is exactly the same every time.

Think of it like starting a song from the same exact point: the melody will always be identical.

---

### 3️⃣ Why 42 is used

* There’s **nothing magical** about 42 mathematically.
* It became a **popular convention** because of *The Hitchhiker’s Guide to the Galaxy*.
* Any integer could be used (1, 123, 2025…) and it would work the same for reproducibility.

It’s purely for **convenience and culture**, not for randomness itself.

---

### 4️⃣ Why you care about seeds in PyTorch

Imagine you’re training a neural network:

* If you initialize weights randomly without a seed, every run gives slightly different results.
* This makes debugging very hard, because you can’t tell if a change in output is due to your code or just random chance.

Setting a seed with `torch.manual_seed(seed)`:

```python
import torch

torch.manual_seed(42)
a = torch.rand(3)

torch.manual_seed(42)
b = torch.rand(3)

print(a)
print(b)
```

**The two outputs will always match each other**, e.g. (on CPU):

```
tensor([0.8823, 0.9150, 0.3829])
tensor([0.8823, 0.9150, 0.3829])
```

Even though the numbers are “random-looking,” using the same seed guarantees reproducibility.

---

### 5️⃣ Analogy: baking cookies

* Seed = recipe
* PRNG = oven
* Random numbers = cookies

If you use the **same recipe (seed)**, you’ll always get the **same batch of cookies**, even if they look random (different shapes, chocolate chips, etc.)

---

✅ **Key points:**

1. Computers can’t generate true randomness.
2. Seeds are just starting points for a pseudorandom sequence.
3. 42 is a cultural convention, not special mathematically.
4. Using the same seed makes your “random” outputs reproducible.

In [128]:
import random

In [133]:
RANDOM_SEED = 42

torch.manual_seed(seed=RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

torch.manual_seed(seed=RANDOM_SEED)
random_tensor_D = torch.rand(3, 4)

print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print(f"Does Tensor C equal Tensor D? (anywhere)")

random_tensor_C == random_tensor_D

Tensor C:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Tensor D:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Does Tensor C equal Tensor D? (anywhere)


tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])

## 1️⃣ The basic idea

A **PRNG** is an algorithm that produces a sequence of numbers that:

* Looks random
* Can be **reproduced** if you know the starting point (the seed)

It’s called **pseudo** because it’s **deterministic**—the same input always gives the same output.

Think of it like a really complicated math function that takes a number (seed) and churns out “random-looking” numbers.

---

## 2️⃣ How it works internally (simplified)

Most PRNGs follow this pattern:

```
state = seed
while generating numbers:
    state = f(state)      # a deterministic function
    output = transform(state)
    yield output
```

Where:

* `state` = internal memory of the generator
* `f(state)` = mathematical function that scrambles the state
* `transform(state)` = converts the state into a number between 0 and 1

---

### Example: Linear Congruential Generator (LCG)

One of the simplest PRNGs is the **LCG**, used in many languages:

```
X_(n+1) = (a * X_n + c) % m
```

Where:

* `X_n` = current state
* `a`, `c`, `m` = carefully chosen constants
* `% m` = modulo operation

Then you normalize the number:

```
random_number = X_n / m   # now it’s between 0 and 1
```

**Step by step**:

1. Pick a seed `X_0`
2. Apply the formula to get `X_1`
3. Output `X_1 / m` as your “random” number
4. Repeat for `X_2, X_3, ...`

Even though this is purely deterministic, the sequence **appears random**.
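
The steps above fit in a few lines of plain Python. This is a teaching sketch, not the generator PyTorch uses; the constants `a`, `c` and `m` are the ones popularised by *Numerical Recipes* and are an assumption of this example.

```python
def lcg(seed, count, a=1664525, c=1013904223, m=2**32):
    """Yield `count` pseudorandom floats in [0, 1) from a linear congruential generator."""
    state = seed
    for _ in range(count):
        state = (a * state + c) % m  # X_(n+1) = (a * X_n + c) % m
        yield state / m              # normalize to [0, 1)


run_1 = list(lcg(seed=42, count=5))
run_2 = list(lcg(seed=42, count=5))
run_3 = list(lcg(seed=43, count=5))

print(run_1)
print(run_1 == run_2)  # True: same seed, same sequence
print(run_1 == run_3)  # False: different seed, different sequence
```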

---

## 3️⃣ Why changing the seed gives different sequences

* The seed is just `X_0`.
* Even changing it by 1 gives a completely different sequence because the function `f(state)` **amplifies differences** over iterations.
* Good PRNGs make it so that sequences from similar seeds are **unpredictable and statistically independent**.

---

## 4️⃣ Why it looks random

PRNGs are designed to pass statistical tests:

* Uniformity: all numbers are equally likely
* Independence: one number doesn’t predict the next
* Long period: the sequence doesn’t repeat for a huge number of numbers

Modern PRNGs, like **Mersenne Twister** (used by Python's `random` module and PyTorch's default CPU generator), have:

* Period = 2^19937 − 1 (a HUGE number)
* Very uniform distribution
* Good randomness properties for simulations and machine learning

---

## 5️⃣ Key points about PRNGs

1. **Deterministic** → reproducible if you know the seed
2. **Any integer seed works** → it just sets the starting state
3. **Not truly random** → computers need hardware RNGs for real entropy
4. **Good PRNGs** → sequences pass randomness tests and appear unpredictable

---

### Analogy

Imagine a **huge maze with numbered doors**:

* Seed = starting door
* PRNG = a rule that tells you which door to go to next
* Numbers you see = output of PRNG

Even though the maze has a fixed layout, if you start at **the same door (seed)**, your path (sequence of numbers) is exactly the same. Start at a **different door**, and your path looks completely different.

In [134]:
!nvidia-smi

zsh:1: command not found: nvidia-smi


In [136]:
torch.backends.mps.is_available()

True

In [137]:
device = "mps" if torch.backends.mps.is_available() else "cpu"
device

'mps'
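
The check above only covers Apple's MPS backend. A slightly more general sketch, assuming you want the same notebook to run on NVIDIA GPUs, Apple silicon, or plain CPU:

```python
import torch

# Prefer CUDA, then Apple's MPS backend, then fall back to the CPU
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

tensor = torch.tensor([1, 2, 3, 4], device=device)
print(f"Using device: {device}")
print(tensor.device)

# Moving back to the CPU works the same way regardless of where the tensor started
back_on_cpu = tensor.cpu()
print(back_on_cpu.device)  # cpu
```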

### Putting tensors (and models) on the GPU

In [139]:
tensor = torch.tensor([1, 2, 3, 4])

print(f"Tensor: {tensor} \nDevice: {tensor.device}")

Tensor: tensor([1, 2, 3, 4]) 
Device: cpu


In [142]:
# Moving tensor to GPU 
tensor_on_gpu = tensor.to(device)

tensor_on_gpu

tensor([1, 2, 3, 4], device='mps:0')

In [144]:
# NumPy arrays live in CPU memory, so copy the tensor back to the CPU before converting
tensor_on_gpu.cpu().numpy()

array([1, 2, 3, 4])

In [145]:
tensor_on_gpu

tensor([1, 2, 3, 4], device='mps:0')