# 📌 Understanding Tensors — The Foundation of PyTorch

Before we start coding, let's first **understand what a Tensor actually is**.

---

## 🔹 What is a Tensor?

A **Tensor** is a **multi-dimensional array**, a generalization of scalars, vectors, and matrices.

| Object Type | Dimensions | Example | PyTorch Shape |
|--------------|-------------|----------|----------------|
| **Scalar**   | 0D | $x = 5$ | `torch.Size([])` |
| **Vector**   | 1D | $\mathbf{v} = [1, 2, 3]$ | `torch.Size([3])` |
| **Matrix**   | 2D | $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ | `torch.Size([2, 3])` |
| **Tensor (3D+)** | 3D+ | Think of it as a cube or higher-dimensional block | `torch.Size([depth, height, width])` |

Formally, a real-valued tensor with $k$ dimensions (axes) of sizes $n_1, n_2, \dots, n_k$ is an element of the space:

$$
\mathbb{R}^{n_1 \times n_2 \times \dots \times n_k}
$$
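
As a quick sanity check of the shapes listed in the table above, the following sketch builds one tensor of each kind and prints its number of dimensions and shape. The variable names (`scalar`, `vector`, `matrix`, `cube`) are illustrative choices, not part of the original notebook.

```python
import torch

scalar = torch.tensor(5)                        # 0D: a single number
vector = torch.tensor([1, 2, 3])                # 1D: shape (3,)
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])   # 2D: shape (2, 3)
cube = torch.zeros(2, 3, 4)                     # 3D: shape (depth, height, width)

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```

Here `k` from the formula corresponds to `t.ndim`, and the sizes $n_1, \dots, n_k$ correspond to the entries of `t.shape`.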

---

## 🔹 Why Are Tensors Useful?

✅ **Universal Data Structure** — Tensors can represent almost any kind of data:
- A grayscale image → 2D tensor `[height, width]`
- An RGB image → 3D tensor `[channels, height, width]`
- A batch of images → 4D tensor `[batch, channels, height, width]`

✅ **Hardware Accelerated** — PyTorch Tensors run on **GPU** using CUDA for massive speedups.  
Large tensor operations (such as addition or matrix multiplication) can run many times faster on a GPU than on a CPU; the actual speedup depends on the tensor size and the hardware.

✅ **Automatic Differentiation** — When `requires_grad=True`, PyTorch automatically tracks operations to compute **gradients** during training (key for Deep Learning).

✅ **Mathematical Expressiveness** — Tensors make it easy to express:
- Linear algebra operations  
- Convolutions  
- Attention mechanisms (in Transformers)  
- Gradient computations
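
The automatic differentiation point above can be illustrated with a very small sketch. The function $y = x^2 + 2x$ is an arbitrary choice for illustration; its derivative is $2x + 2$, so the gradient at $x = 3$ should be 8.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)   # ask PyTorch to track operations on x
y = x ** 2 + 2 * x                          # y = x^2 + 2x
y.backward()                                # compute dy/dx and store it in x.grad

print("x.grad:", x.grad)                    # dy/dx = 2x + 2, evaluated at x = 3
```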

---

## 🔹 Tensor Analogy

| Concept | Intuition |
|----------|------------|
| **Scalar** | A single number — point on a line |
| **Vector (1D tensor)** | Direction and magnitude: a point in n-dimensional space |
| **Matrix (2D tensor)** | Grid of numbers — transformation in space |
| **Tensor (3D+)** | Stack of matrices — complex data like images or volumes |

---

## 🔹 Visualization Idea

Imagine stacking sheets of paper (each a 2D matrix):  
The whole stack becomes a **3D tensor**.  
If you stack multiple such stacks, you get **4D and beyond**.

---

## ✅ Summary

- A **Tensor** is the core data structure of PyTorch.  
- It’s a **multi-dimensional array** with GPU acceleration and gradient tracking.  
- All deep learning models (CNNs, RNNs, Transformers) process tensors as inputs, weights, and outputs.  
- In this notebook, we’ll begin with **1D tensors (vectors)** — the simplest and most fundamental form.


In [91]:
!pip install torch



# 📘 Step 1: Creating 1D Tensors in PyTorch

Now that we know what tensors are, let’s start **creating 1D tensors** (vectors) — the simplest form of tensor.

A **1D Tensor** is like a mathematical vector:
$$
\mathbf{x} = [x_0, x_1, \dots, x_{n-1}]
$$

In PyTorch:
- 1D tensors have shape `(n,)`
- They store data of a specific **dtype** (`float32`, `int64`, etc.)
- They can live on different **devices** (CPU, GPU)

---

### 🔹 1. Create from Python List

You can create a tensor directly from a Python list using `torch.tensor()`.

---

### 🔹 2. Factory Functions

PyTorch provides convenient tensor **factory functions**:

| Function | Description | Example |
|-----------|--------------|----------|
| `torch.zeros(n)` | n zeros | `[0, 0, 0]` |
| `torch.ones(n)` | n ones | `[1, 1, 1]` |
| `torch.arange(start, end, step)` | Range sequence | `[0, 2, 4, 6, 8]` |
| `torch.linspace(start, end, steps)` | Evenly spaced values | `[0.0, 0.25, 0.5, 0.75, 1.0]` |
| `torch.randn(n)` | Random normal values | e.g., `[0.24, -0.89, 1.02]` |

---

### 🔹 3. Dtype & Device

You can specify:
- `dtype=torch.float32` or `torch.int64`
- `device='cpu'` or `device='cuda'` (if GPU available)
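
Besides setting `dtype` and `device` at creation time, an existing tensor can be converted with `.to()`. The sketch below assumes no particular hardware: it falls back to the CPU when CUDA is not available. The names `ints`, `floats`, and `moved` are illustrative.

```python
import torch

ints = torch.tensor([1, 2, 3])            # dtype inferred from Python ints: int64
floats = ints.to(torch.float32)           # new tensor with the requested dtype

# Pick the GPU when present, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
moved = floats.to(device)                 # tensor placed on the chosen device

print(ints.dtype, floats.dtype, moved.device)
```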

---

### ✅ Example Summary

We'll:
1. Create tensors from lists  
2. Use factory functions  
3. Explore dtype and shape

In [92]:
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [93]:
## Check torch version 
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

PyTorch version: 2.9.0
CUDA available: False


In [94]:
# 1️⃣ From Python lists 
# 1.1 From Integer List
list_of_integer = [1, 2, 3, 4, 5, 6, 7, 8]
v1 = torch.tensor(list_of_integer)
print(f"v1: {v1}")
print(f"Shape: {v1.shape} | dType: {v1.dtype} | type: {v1.type()}")

# 1.2 From Float List
list_of_float = [1.0, 2.1, 3.0, 4.7, 5.2, 6.9, 7.0, 8.6]
v2 = torch.tensor(list_of_float)
print(f"v2: {v2}")
print(f"Shape: {v2.shape} | dType: {v2.dtype} | type: {v2.type()}")

v1: tensor([1, 2, 3, 4, 5, 6, 7, 8])
Shape: torch.Size([8]) | dType: torch.int64 | type: torch.LongTensor
v2: tensor([1.0000, 2.1000, 3.0000, 4.7000, 5.2000, 6.9000, 7.0000, 8.6000])
Shape: torch.Size([8]) | dType: torch.float32 | type: torch.FloatTensor


In [95]:
# 2️⃣ With explicit dtype
v2 = torch.tensor([1.2, 2.3, 3.4, 5.0], dtype=torch.float32)
print(f"v2: {v2}")
print(f"Shape: {v2.shape} | dType: {v2.dtype} | type: {v2.type()}")

v3 = torch.tensor([1, 2, 3, 4, 5, 6], dtype=torch.int64) 
print(f"v3: {v3}")
print(f"Shape: {v3.shape} | dType: {v3.dtype} | type: {v3.type()}")

# 3️⃣ With legacy typed constructors (torch.tensor(..., dtype=...) is the preferred form)
v4 = torch.FloatTensor([1.2, 2.5, 3.7, 4.3])
print(f"v4: {v4}")
print(f"Shape: {v4.shape} | dType: {v4.dtype} | type: {v4.type()}")

v5 = torch.IntTensor([1, 2, 3, 4, 5])
print(f"v5: {v5}")
print(f"Shape: {v5.shape} | dType: {v5.dtype} | type: {v5.type()}")


v2: tensor([1.2000, 2.3000, 3.4000, 5.0000])
Shape: torch.Size([4]) | dType: torch.float32 | type: torch.FloatTensor
v3: tensor([1, 2, 3, 4, 5, 6])
Shape: torch.Size([6]) | dType: torch.int64 | type: torch.LongTensor
v4: tensor([1.2000, 2.5000, 3.7000, 4.3000])
Shape: torch.Size([4]) | dType: torch.float32 | type: torch.FloatTensor
v5: tensor([1, 2, 3, 4, 5], dtype=torch.int32)
Shape: torch.Size([5]) | dType: torch.int32 | type: torch.IntTensor


In [96]:
# 4️⃣ Converting float to int (the fractional part is truncated)

list_of_floats = [1.2, 2.0, 3.2]
v5 = torch.tensor(list_of_floats, dtype=torch.int64)
print(f"v5: {v5}")

# Another way to convert a float list to an int tensor (legacy typed constructor)

v6 = torch.IntTensor(list_of_floats)
print(f"v6: {v6}")

v5: tensor([1, 2, 3])
v6: tensor([1, 2, 3], dtype=torch.int32)


In [97]:
# 5️⃣ tensor_obj.size() & tensor_obj.ndimension() methods

tensor_obj = torch.tensor([1, 2, 3, 4], dtype=torch.int64)

print(f"Size of the tensor object: {tensor_obj.size()}")
print(f"Dimensions of the tensor object: {tensor_obj.ndimension()}")


Size of the tensor object: torch.Size([4])
Dimensions of the tensor object: 1


In [98]:
# 6️⃣ Factory functions — quick ways to create tensors

# 🔹 Create a 1D tensor of all zeros (length = 5)
zeros = torch.zeros(5)   # [0., 0., 0., 0., 0.]

# 🔹 Create a 1D tensor of all ones (length = 4)
ones = torch.ones(4)     # [1., 1., 1., 1.]

# 🔹 Create evenly spaced values within a range using step size
#    torch.arange(start, end, step)
#    end is *exclusive* (like Python range)
arange = torch.arange(0, 10, 2)  # [0, 2, 4, 6, 8]



# 🔹 Create evenly spaced values between two numbers (inclusive)
#    torch.linspace(start, end, steps)
linspace = torch.linspace(0, 1, 5)  # [0.0000, 0.2500, 0.5000, 0.7500, 1.0000]

linspace_1 = torch.linspace(0, 1, steps=5)

# 🔹 Create random values from a standard normal distribution (mean=0, std=1)
randn = torch.randn(6)   # e.g., [0.12, -1.45, 0.73, ...]

# ✅ Print results
print("\nzeros:", zeros)
print("ones:", ones)
print("arange:", arange)
print("linspace:", linspace)
print("linspace_1:", linspace_1)
print("randn:", randn)



zeros: tensor([0., 0., 0., 0., 0.])
ones: tensor([1., 1., 1., 1.])
arange: tensor([0, 2, 4, 6, 8])
linspace: tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
linspace_1: tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
randn: tensor([-0.5177,  1.2713, -0.0881,  0.2428,  0.7371,  0.2405])


In [99]:
# 7️⃣ Device demonstration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
v_device = torch.tensor([10., 20., 30.], device=device)
print("\nCreated on device:", v_device.device)


Created on device: cpu


# 📘 Step 2: Indexing, Slicing, and Views

Now that we can create tensors, let’s learn how to **access**, **slice**, and **manipulate** parts of them.

Just like lists and NumPy arrays, PyTorch tensors support **Pythonic indexing and slicing**.

---

## 🔹 1. Indexing

For a 1D tensor $\mathbf{t} = [t_0, t_1, t_2, \dots, t_{n-1}]$:

- `t[i]` → gives the *i-th element* (0-based)
- `t[-1]` → gives the *last element*

✅ **Example:**  
If  
$$
t = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
$$  
then  
$t[0] = 0$, $t[3] = 3$, $t[-1] = 9$
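
One detail that the math notation hides: indexing a tensor returns a 0-dimensional tensor, not a plain Python number. A minimal sketch of the difference, using `.item()` to extract the value (the names `elem` and `value` are illustrative):

```python
import torch

t = torch.arange(10)
elem = t[3]              # indexing returns a 0-dimensional tensor
value = elem.item()      # .item() extracts the underlying Python number

print(elem, elem.ndim)   # a tensor with zero dimensions
print(value, type(value))
```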

---

## 🔹 2. Slicing

You can extract **sub-tensors** using:
```python
t[start:end:step]
```
- `start` → index to begin (default = 0)
- `end` → index to stop before (excluded)
- `step` → stride length (default = 1)

### ✅ Examples
```python
t[2:6]    # elements 2,3,4,5
t[:5]     # first 5 elements
t[5:]     # elements from index 5 to end
t[::2]    # every second element
```
---

## 🔹 3. Negative Indexing

- `t[-1]` → last element
- `t[-3:]` → last 3 elements
- `t[:-3]` → all except last 3

---

## 🔹 4. Views vs Copies
Basic slicing in PyTorch returns a **view**, not a copy: the sliced tensor shares memory with the original tensor.

If you modify the view, the original tensor changes too.
- To make an independent copy, use `.clone()`.
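
The view behaviour applies to basic slicing. As a hedged illustration, the sketch below contrasts a slice with indexing by a list of positions, which returns a copy. The names `sliced` and `picked` are illustrative.

```python
import torch

t = torch.arange(6)
sliced = t[1:4]             # basic slicing: a view of t
picked = t[[1, 2, 3]]       # indexing with a list of positions: a copy

sliced[0] = 100             # writes through to t
picked[1] = -1              # does not affect t

print("t:", t)
print("picked:", picked)
# A view starts at the same memory address as the matching region of t
print("shares memory:", sliced.data_ptr() == t[1:].data_ptr())
```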
---

## ✅ Quick Recap

| Operation   | Description    | Example            | Result      |
| ----------- | -------------- | ------------------ | ----------- |
| `t[i]`      | Single element | `t[3]`             | 0-D tensor (scalar) |
| `t[a:b]`    | Slice (view)   | `t[2:5]`           | `[2, 3, 4]` |
| `torch.flip(t, dims=[0])` | Reverse (negative steps such as `t[::-1]` are not supported) | `torch.flip(t, dims=[0])` | `[9, 8, 7, ..., 0]` |
| `t.clone()` | Copy           | `t[2:5].clone()`   | Independent tensor |


In [100]:
# 1️⃣ Create a simple 1D tensor
t = torch.arange(10)
print("Original tensor:", t)

# 2️⃣ Indexing examples
print("\nt[0] ->", t[0])       # first element
print("t[3] ->", t[3])         # 4th element
print("t[-1] ->", t[-1])       # last element

# 3️⃣ Slicing examples
print("\nt[2:6] ->", t[2:6])   # elements from index 2 to 5
print("t[:5]  ->", t[:5])      # first 5 elements
print("t[5:]  ->", t[5:])      # elements from 5 to end
print("t[::2] ->", t[::2])     # every 2nd element
print("torch.flip ->", torch.flip(t, dims=[0]))   # reverse the tensor

# 4️⃣ Negative slicing
print("\nt[-3:]  ->", t[-3:])  # last 3 elements
print("t[:-3]  ->", t[:-3])    # all except last 3

# 5️⃣ Views vs copies demonstration
orig = torch.arange(6, dtype=torch.float32)
view = orig[1:4]  # this is a view (shares memory)
print("\norig before modifying view:", orig)
view[0] = 99.0
print("orig after modifying view:", orig)  # changed!

# To make a copy, use clone()
copy = orig[1:4].clone()
copy[0] = -1.0
print("\norig after modifying copy:", orig)  # unchanged
print("copy:", copy)


Original tensor: tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

t[0] -> tensor(0)
t[3] -> tensor(3)
t[-1] -> tensor(9)

t[2:6] -> tensor([2, 3, 4, 5])
t[:5]  -> tensor([0, 1, 2, 3, 4])
t[5:]  -> tensor([5, 6, 7, 8, 9])
t[::2] -> tensor([0, 2, 4, 6, 8])
torch.flip -> tensor([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

t[-3:]  -> tensor([7, 8, 9])
t[:-3]  -> tensor([0, 1, 2, 3, 4, 5, 6])

orig before modifying view: tensor([0., 1., 2., 3., 4., 5.])
orig after modifying view: tensor([ 0., 99.,  2.,  3.,  4.,  5.])

orig after modifying copy: tensor([ 0., 99.,  2.,  3.,  4.,  5.])
copy: tensor([-1.,  2.,  3.])


# 📘 Step 3: Tensor Functions and Reductions

Now that we know how to create and slice tensors,  
let’s explore the **mathematical operations** and **summary (reduction) functions** that PyTorch provides.  
These are essential for deep-learning computations and transformations.

---

## 🔹 1️⃣ Element-wise Mathematical Functions

These functions operate **individually on each element** of a tensor —  
just like NumPy’s universal functions (ufuncs).

| Function | Description | Example (Math) |
|-----------|--------------|----------------|
| `torch.add(x, y)` | Addition | $z_i = x_i + y_i$ |
| `torch.sub(x, y)` | Subtraction | $z_i = x_i - y_i$ |
| `torch.mul(x, y)` | Multiplication | $z_i = x_i \times y_i$ |
| `torch.div(x, y)` | Division | $z_i = x_i / y_i$ |
| `torch.pow(x, 2)` | Power | $x_i^2$ |
| `torch.sqrt(x)` | Square root | $\sqrt{x_i}$ |
| `torch.exp(x)` | Exponential | $e^{x_i}$ |
| `torch.log(x)` | Natural log | $\ln(x_i)$ |
| `torch.sin(x)` / `torch.cos(x)` | Trigonometric ops | $\sin(x_i)$, $\cos(x_i)$ |

🧠 **Note:** These are **vectorized** — they act on all elements without loops and run efficiently on GPU.

---

## 🔹 2️⃣ Reduction Functions

Reduction functions **aggregate** values of a tensor into a single result  
(e.g., mean, sum, or max).  
They’re used to compute statistics, metrics, or losses.

| Function | Description | Formula |
|-----------|--------------|----------|
| `t.sum()` | Sum of all elements | $\sum_i t_i$ |
| `t.mean()` | Mean value | $\frac{1}{n}\sum_i t_i$ |
| `t.median()` | Median value | middle element of the sorted values (the lower of the two middle elements when $n$ is even) |
| `t.max()` / `t.min()` | Maximum / Minimum | $\max(t_i)$, $\min(t_i)$ |
| `t.std()` / `t.var()` | Standard deviation / Variance (sample versions by default, dividing by $n-1$) | $\sqrt{\mathrm{Var}(t)}$, $\mathrm{Var}(t)$ |

You can also get the **index** of the max/min value with `t.argmax()` and `t.argmin()`.
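
One detail worth checking in practice: `var()` and `std()` apply Bessel's correction (dividing by $n-1$) by default. The sketch below compares that default with the population variance obtained through the `unbiased=False` argument, and uses `argmax()` to look up the largest value. Variable names are illustrative.

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])

sample_var = t.var()                      # default: divides by n - 1
population_var = t.var(unbiased=False)    # divides by n
best = t.argmax()                         # index of the largest value

print("sample variance:", sample_var)
print("population variance:", population_var)
print("largest value:", t[best], "at index", best)
```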

---

## 🔹 3️⃣ Mixed Example — Element-wise + Reduction

Sometimes we apply a **mathematical operation first**,  
then a **reduction operation** to summarize the result.  
This is common in deep-learning pipelines (for example, activation → loss).

**Example:**

→ Compute the mean of sine values across a tensor:  
`torch.sin(x).mean()`

→ Compute the sum of exponentials:  
`torch.exp(x).sum()`

**Mathematical Form:**

$\text{mean}(\sin(x)) = \frac{1}{n}\sum_{i=1}^{n}\sin(x_i)$

🧩 This pattern is common in neural networks — apply a non-linear transformation, then aggregate (like average loss or normalization).
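
A familiar instance of this pattern is the mean squared error: an element-wise squared difference followed by a mean. The sketch below uses made-up numbers for `pred` and `target`; they are illustrative only.

```python
import torch

pred = torch.tensor([2.5, 0.0, 2.0, 8.0])      # hypothetical predictions
target = torch.tensor([3.0, -0.5, 2.0, 7.0])   # hypothetical targets

squared_error = (pred - target) ** 2   # element-wise step
mse = squared_error.mean()             # reduction step

print("squared errors:", squared_error)
print("MSE:", mse)
```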

---

## 🔹 4️⃣ Functions with Dimensions (`dim=`)

Most reduction functions can also work **along a specific dimension**  
for multi-dimensional tensors.

For a 2D tensor $A \in \mathbb{R}^{m \times n}$:

- `A.sum(dim=0)` → sums **down the rows**, producing shape `[n]`  
- `A.sum(dim=1)` → sums **across columns**, producing shape `[m]`

**Example:**

Given:  
$A = \begin{bmatrix}1 & 2 & 3 \\ 4 & 5 & 6\end{bmatrix}$  

Then:  
`A.sum(dim=0)` → `[5, 7, 9]`  
`A.sum(dim=1)` → `[6, 15]`

This **dimension-wise control** is critical in ML:  
e.g., reducing over the feature axis to get one value per sample (`dim=1`) or computing feature-wise statistics across the batch (`dim=0`).
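
A related option is `keepdim=True`, which keeps the reduced axis with size 1 so the result can be combined with the original tensor. The sketch below uses it to normalize each row of the example matrix so that it sums to 1; the names `row_sums` and `row_normalized` are illustrative.

```python
import torch

A = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

row_sums = A.sum(dim=1, keepdim=True)   # shape (2, 1) instead of (2,)
row_normalized = A / row_sums           # (2, 3) / (2, 1): each row divided by its own sum

print("row_sums shape:", row_sums.shape)
print("row sums after normalizing:", row_normalized.sum(dim=1))
```

Without `keepdim=True` the sums would have shape `(2,)`, which does not line up with the rows of a `(2, 3)` matrix.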

---

## ✅ Summary

| Category | Example | Description |
|-----------|----------|-------------|
| Element-wise | `torch.sin(x)` | Operates on each tensor element |
| Reduction | `t.mean()` | Aggregates tensor into a scalar |
| Combined | `torch.exp(x).sum()` | Element-wise + reduction |
| Dim-wise | `t.sum(dim=1)` | Reduces along a specific axis |

---

📌 **Key Takeaway:**  
Tensor functions make PyTorch expressive and fast — all these computations are vectorized, GPU-optimized, and fundamental for ML & DL pipelines.


In [101]:
# 📘 Step 3: Tensor Functions and Reductions — Code Practice
# -----------------------------------------------------------

# 1️⃣ Create a simple 1D tensor for demonstration
t = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
print("Tensor:", t)

# ===========================================================
# 🔹 1. Element-wise Mathematical Functions
# ===========================================================

print("\n🔹 Element-wise Mathematical Operations:")

# Square each element
print("Square (t ** 2):", torch.pow(t, 2))

# Square root of each element
print("Square Root:", torch.sqrt(t))

# Exponential function e^x
print("Exponential:", torch.exp(t))

# Natural logarithm (ln x)
print("Logarithm:", torch.log(t))

# Trigonometric functions
print("Sine:", torch.sin(t))
print("Cosine:", torch.cos(t))

# Arithmetic operations between tensors
a = torch.tensor([10.0, 20.0, 30.0, 40.0, 50.0])
print("\nAddition (t + a):", torch.add(t, a))
print("Subtraction (a - t):", torch.sub(a, t))
print("Multiplication (t * 2):", torch.mul(t, 2))
print("Division (a / t):", torch.div(a, t))

# ===========================================================
# 🔹 2. Reduction Functions
# ===========================================================

print("\n🔹 Reduction Functions:")

# Sum of all elements
print("Sum:", t.sum())

# Mean (average) of all elements
print("Mean:", t.mean())

# Median value
print("Median:", t.median())

# Minimum and maximum values
print("Min:", t.min())
print("Max:", t.max())

# Variance and standard deviation
print("Variance:", t.var())
print("Standard Deviation:", t.std())

# Index of max and min
print("Index of Max (argmax):", t.argmax())
print("Index of Min (argmin):", t.argmin())

# ===========================================================
# 🔹 3. Mixed Example — Element-wise + Reduction
# ===========================================================

print("\n🔹 Mixed Example — Element-wise + Reduction:")

# Create a tensor of evenly spaced values between 0 and π
x = torch.linspace(0, torch.pi, 5)
print("x:", x)

# Mean of sine values
print("Mean of sin(x):", torch.sin(x).mean())

# Sum of exponential values
print("Sum of exp(x):", torch.exp(x).sum())

# ===========================================================
# 🔹 4. Functions with Dimensions (dim=)
# ===========================================================

print("\n🔹 Functions with Dimensions (dim=):")

# Create a 2D tensor (matrix)
A = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype=torch.float32)
print("Matrix A:\n", A)

# Sum along dimension 0 (down the rows)
print("\nSum along dim=0 (column-wise):", A.sum(dim=0))

# Sum along dimension 1 (across columns)
print("Sum along dim=1 (row-wise):", A.sum(dim=1))

# Mean along dim=0 and dim=1
print("\nMean along dim=0 (column-wise):", A.mean(dim=0))
print("Mean along dim=1 (row-wise):", A.mean(dim=1))



Tensor: tensor([1., 2., 3., 4., 5.])

🔹 Element-wise Mathematical Operations:
Square (t ** 2): tensor([ 1.,  4.,  9., 16., 25.])
Square Root: tensor([1.0000, 1.4142, 1.7321, 2.0000, 2.2361])
Exponential: tensor([  2.7183,   7.3891,  20.0855,  54.5982, 148.4132])
Logarithm: tensor([0.0000, 0.6931, 1.0986, 1.3863, 1.6094])
Sine: tensor([ 0.8415,  0.9093,  0.1411, -0.7568, -0.9589])
Cosine: tensor([ 0.5403, -0.4161, -0.9900, -0.6536,  0.2837])

Addition (t + a): tensor([11., 22., 33., 44., 55.])
Subtraction (a - t): tensor([ 9., 18., 27., 36., 45.])
Multiplication (t * 2): tensor([ 2.,  4.,  6.,  8., 10.])
Division (a / t): tensor([10., 10., 10., 10., 10.])

🔹 Reduction Functions:
Sum: tensor(15.)
Mean: tensor(3.)
Median: tensor(3.)
Min: tensor(1.)
Max: tensor(5.)
Variance: tensor(2.5000)
Standard Deviation: tensor(1.5811)
Index of Max (argmax): tensor(4)
Index of Min (argmin): tensor(0)

🔹 Mixed Example — Element-wise + Reduction:
x: tensor([0.0000, 0.7854, 1.5708, 2.3562, 3.1416])
Mean 

# 📘 Step 4: Tensor Operations

Now that we’ve explored individual tensor functions and reductions,  
let’s move to **Tensor Operations** — how tensors interact, combine, and transform.  

Tensor operations are the heart of PyTorch’s numerical engine — powering everything from vector math to deep neural networks.

---

## 🔹 1️⃣ Arithmetic Operations (Tensor ↔ Tensor and Tensor ↔ Scalar)

PyTorch supports all **basic arithmetic operations** both between tensors  
and between tensors and scalars.

| Operation | Example | Description | Math |
|------------|----------|--------------|------|
| Addition | `a + b` or `torch.add(a,b)` | Element-wise addition | $c_i = a_i + b_i$ |
| Subtraction | `a - b` or `torch.sub(a,b)` | Element-wise subtraction | $c_i = a_i - b_i$ |
| Multiplication | `a * b` or `torch.mul(a,b)` | Element-wise multiplication | $c_i = a_i \times b_i$ |
| Division | `a / b` or `torch.div(a,b)` | Element-wise division | $c_i = a_i / b_i$ |
| Power | `a ** 2` or `torch.pow(a,2)` | Element-wise exponentiation | $a_i^2$ |

### 🔹 Scalar Operations
Tensors can interact with **scalars** directly —  
the scalar is *broadcast* to match the tensor shape.

Examples:
- `a + 5` → adds 5 to every element  
- `a * 10` → multiplies every element by 10  
- `a / 2` → divides each element by 2  

Mathematically:  
$$
a = [1, 2, 3] \quad \Rightarrow \quad a + 5 = [6, 7, 8]
$$

🧠 **Note:**  
All these operations are **vectorized** and GPU-accelerated.

---

## 🔹 2️⃣ Comparison & Logical Operations

PyTorch allows **element-wise comparisons** between tensors,  
returning Boolean tensors that can be used for masking or conditions.

| Type | Function | Description | Math |
|------|-----------|----------|-------------|
| Comparison | `torch.eq(a,b)` | Equal | $a_i = b_i$ |
| 〃 | `torch.ne(a,b)` | Not equal | $a_i \ne b_i$ |
| 〃 | `torch.gt(a,b)` | Greater than | $a_i > b_i$ |
| 〃 | `torch.ge(a,b)` | Greater or equal | $a_i \ge b_i$ |
| 〃 | `torch.lt(a,b)` | Less than | $a_i < b_i$ |
| 〃 | `torch.le(a,b)` | Less or equal | $a_i \le b_i$ |
| Logical | `torch.logical_and(x,y)` | AND | $x_i \land y_i$ |
| 〃 | `torch.logical_or(x,y)` | OR | $x_i \lor y_i$ |
| 〃 | `torch.logical_not(x)` | NOT | $\neg x_i$ |

✅ Boolean tensors are crucial for **filtering**, **thresholding**, or **conditional masking** in ML pipelines.
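
As a small sketch of filtering versus thresholding: indexing with a mask keeps only the selected values, while `torch.where` keeps the original shape and substitutes another value elsewhere. The threshold `2.5` and the variable names are illustrative.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0, 4.0])
mask = a > 2.5                                        # boolean tensor

kept = a[mask]                                        # filtering: only values above the threshold
clipped = torch.where(mask, a, torch.zeros_like(a))   # thresholding: same shape, zeros elsewhere

print("mask:", mask)
print("kept:", kept)
print("clipped:", clipped)
```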

---

## 🔹 3️⃣ Matrix Operations (Linear Algebra)

Matrix operations are central to all deep-learning layers.

| Operation | Function | Description | Math |
|------------|-----------|-------------|------|
| Dot Product | `torch.dot(a,b)` | Inner product (1D tensors) | $a \cdot b = \sum_i a_i b_i$ |
| Matrix Multiplication | `torch.mm(A,B)` or `A @ B` | 2D matrix multiplication | $C = AB$ |
| Batch Matrix Mult | `torch.bmm(A,B)` | 3D tensors (batch of matrices) | $C_i = A_i B_i$ |
| Transpose | `A.T` or `torch.transpose(A,0,1)` | Swap axes | $A^T$ |
| Determinant | `torch.det(A)` | Square matrix determinant | $\det(A)$ |
| Inverse | `torch.inverse(A)` | Matrix inverse | $A^{-1}$ |

🧮 **Example:**
$$
A = 
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}, \quad
B = 
\begin{bmatrix}
5 & 6 \\
7 & 8
\end{bmatrix}
$$  
Then:  
$A @ B =
\begin{bmatrix}
19 & 22 \\
43 & 50
\end{bmatrix}$
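
To connect the worked example to code, the sketch below recomputes the product and checks two entries by hand: entry $(i, j)$ of the result is the dot product of row $i$ of $A$ with column $j$ of $B$. The names `c00` and `c11` are illustrative.

```python
import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

C = A @ B
# Entry (i, j) is the dot product of row i of A with column j of B
c00 = torch.dot(A[0], B[:, 0])   # 1*5 + 2*7
c11 = torch.dot(A[1], B[:, 1])   # 3*6 + 4*8

print(C)
print(c00, c11)
```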

---

## 🔹 4️⃣ Broadcasting Rules

PyTorch supports **broadcasting** — automatic expansion of smaller tensors  
to match the shape of larger ones for element-wise operations.

Example shapes:
- `a` → `(3, 1)`
- `b` → `(1, 4)`

Operation → `a + b`  
Result → `(3, 4)`  

**Broadcasting Rules:**
1. Compare dimensions from **right to left**; a missing leading dimension is treated as size 1  
2. Two dimensions are compatible when they are **equal** or **one of them is 1**  
3. Otherwise, the operation raises a `RuntimeError`

**Example Visualization:**

$$
\begin{bmatrix}
1 \\
2 \\
3
\end{bmatrix}
+
\begin{bmatrix}
10 & 20 & 30 & 40
\end{bmatrix}
\Rightarrow
\begin{bmatrix}
11 & 21 & 31 & 41 \\
12 & 22 & 32 & 42 \\
13 & 23 & 33 & 43
\end{bmatrix}
$$
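
The visualization above can be reproduced directly. This sketch also uses `torch.broadcast_shapes` to check a result shape without building any data, and shows that incompatible shapes raise an error. The flag name `failed` is illustrative.

```python
import torch

a = torch.tensor([[1], [2], [3]])        # shape (3, 1)
b = torch.tensor([[10, 20, 30, 40]])     # shape (1, 4)
print(a + b)                             # broadcast to shape (3, 4)

# Shapes can be checked without allocating any data
print(torch.broadcast_shapes((3, 1), (1, 4)))

# Incompatible trailing dimensions (3 vs 4) raise an error
try:
    torch.ones(3) + torch.ones(4)
    failed = False
except RuntimeError:
    failed = True
print("incompatible shapes raised an error:", failed)
```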

---

## 🔹 5️⃣ In-place vs Out-of-place Operations (Preview)

In PyTorch, **in-place operations** modify the tensor in memory  
and are denoted with an underscore `_`, e.g. `t.add_(1)`.

| Type | Example | Description |
|------|----------|-------------|
| Out-of-place | `y = x + 1` | Creates a new tensor |
| In-place | `x.add_(1)` | Updates tensor `x` directly |

⚠️ **Important:**  
Avoid in-place ops on tensors that require gradients:  
they can overwrite values that autograd needs for the backward pass, and PyTorch raises a `RuntimeError` when it detects this.
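
The memory difference between the two kinds of operation can be observed with `data_ptr()`, which returns the address of a tensor's first element. A minimal sketch, with illustrative names and no gradients involved:

```python
import torch

x = torch.ones(3)
ptr_before = x.data_ptr()

y = x + 1          # out-of-place: the result lives in new memory
x.add_(1)          # in-place: x is updated where it is

print(x, y)
print("x kept its buffer:", x.data_ptr() == ptr_before)
print("y uses another buffer:", y.data_ptr() != ptr_before)
```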

---

## ✅ Summary

| Category | Example | Description |
|-----------|----------|-------------|
| Arithmetic | `a + b`, `a * 5`, `a / 2` | Element-wise & scalar ops |
| Comparison | `torch.eq(a,b)` | Boolean comparisons |
| Matrix | `A @ B`, `torch.mm()` | Linear algebra ops |
| Broadcasting | Auto shape expansion | Smart dimension matching |
| In-place | `x.add_(1)` | Updates tensor directly in memory |

---

📌 **Key Takeaway:**  
Tensor operations (including scalar arithmetic) are the foundation of PyTorch’s power —  
fast, GPU-optimized, and seamlessly broadcasted for efficient computation.


In [102]:
# ---------------------------------------------------------
# Step 4 — Tensor Operations: examples with clear comments
# ---------------------------------------------------------


# --------------------------
# 1) Basic tensor-to-tensor arithmetic (element-wise)
# --------------------------
a = torch.tensor([1.0, 2.0, 3.0])   # 1D float tensor
b = torch.tensor([10.0, 20.0, 30.0]) 

# Element-wise add/sub/mul/div (shapes must match or be broadcastable)
print("a:", a)
print("b:", b)
print("a + b  ->", a + b)            # element-wise addition
print("a - b  ->", a - b)            # element-wise subtraction
print("a * b  ->", a * b)            # element-wise multiplication
print("a / b  ->", a / b)            # element-wise division
print()

# --------------------------
# 2) Tensor <-> Scalar arithmetic (scalar is broadcast to shape)
# --------------------------
print("a + 5  ->", a + 5)            # add scalar 5 to every element
print("a * 2  ->", a * 2)            # multiply every element by 2
print("a / 2  ->", a / 2)            # divide every element by 2
print("a ** 2 ->", a ** 2)           # square each element (power)
print()

# --------------------------
# 3) Comparison & logical ops (produce boolean tensors)
# --------------------------
print("a == 2        ->", a == 2.0)                      # equality check
print("a > 1.5       ->", a > 1.5)                       # greater-than
print("torch.eq(a,b) ->", torch.eq(a, b))                # element-wise equality
print()

# Logical operations (work on boolean tensors)
mask = a > 1.5
print("mask (a > 1.5):", mask)
print("masked values (a[mask]):", a[mask])               # use mask to filter values
print("logical_and mask with (b > 10):", torch.logical_and(mask, b > 10))
print()

# --------------------------
# 4) Broadcasting examples
# --------------------------
# a: shape (3,)
# row vector r: shape (1,3)
# column vector c: shape (3,1)
r = torch.tensor([[1.0, 2.0, 3.0]])   # shape (1,3)
c = torch.tensor([[10.0], [20.0], [30.0]])  # shape (3,1)

print("r shape:", r.shape, "c shape:", c.shape)
# r + a -> r is (1,3), a is (3,) -> a is treated as (1,3) -> result (1,3)
print("r + a ->", (r + a).shape, "\n", r + a)

# c + r -> c (3,1) broadcasts with r (1,3) -> result (3,3)
print("c + r ->", (c + r).shape, "\n", c + r)
print()

# --------------------------
# 5) Matrix / Linear algebra operations
# --------------------------
# Prepare small matrices (float) for linear algebra
A = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
B = torch.tensor([[5.0, 6.0],
                  [7.0, 8.0]])

print("Matrix A:\n", A)
print("Matrix B:\n", B)

# Matrix multiplication (2D)
C = A @ B                   # same as torch.mm(A, B)
print("\nA @ B =\n", C)

# Transpose
print("\nA.T =\n", A.T)

# Dot product (1D vectors)
v1 = torch.tensor([1.0, 2.0, 3.0])
v2 = torch.tensor([4.0, 5.0, 6.0])
print("\nDot product v1 · v2 ->", torch.dot(v1, v2))  # scalar

# Determinant and inverse (A is invertible)
detA = torch.det(A)
invA = torch.inverse(A)
print("\ndet(A) ->", detA)
print("A^{-1} ->\n", invA)
# Verify A @ invA ~= Identity
print("A @ A^{-1} ->\n", A @ invA)
print()

# Batch matrix multiplication example (3D tensors)
# Create batch of two (2x2) matrices: shape (batch, n, m)
batchA = torch.stack([A, A * 2])   # shape (2,2,2)
batchB = torch.stack([B, B * 0.5]) # shape (2,2,2)
print("batchA shape:", batchA.shape, "batchB shape:", batchB.shape)
batchC = torch.bmm(batchA, batchB)  # batch-wise matrix multiplication
print("batchC shape:", batchC.shape, "\n", batchC)
print()

# --------------------------
# 6) In-place vs Out-of-place operations (and autograd caution)
# --------------------------
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # requires grad
y = x + 1.0   # out-of-place: creates new tensor (computation graph preserved)
print("Before in-place: x =", x, " y =", y)

# In-place modification of a leaf tensor that requires grad is rejected by autograd:
# PyTorch raises a RuntimeError, which the try/except below catches.
try:
    x.add_(1.0)   # in-place: modifies x directly
    print("After in-place add_: x =", x)
except RuntimeError as e:
    print("In-place operation failed (as expected when it breaks autograd):", e)

# Safer approach: operate on a clone if you need to preserve original or avoid graph issues
x_safe = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x_clone = x_safe.clone()
x_clone.add_(1.0)   # in-place on the clone is fine; original x_safe unchanged
print("Original (x_safe):", x_safe, "Clone after in-place:", x_clone)

# Use out-of-place ops when you want to keep computation graph intact:
x2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y2 = x2 + 1.0       # out-of-place
loss = y2.sum()
loss.backward()
print("Gradients (dL/dx2):", x2.grad)   # gradient computed successfully
print()

# --------------------------
# 7) Masking + conditional updates (use boolean masks)
# --------------------------
t = torch.arange(6.).reshape(3,2)   # shape (3,2)
print("t:\n", t)
mask = t > 2.0
print("mask:\n", mask)

# Select values with mask (returns a new 1D tensor, a copy, of the True elements)
print("t[mask] ->", t[mask])

# Conditional assignment: set masked elements to -1 (in-place on a copy/cloned tensor if needed)
t2 = t.clone()
t2[mask] = -1.0
print("t2 after mask assignment:\n", t2)

# --------------------------
# End of demonstrations
# --------------------------
print("\n✅ Completed tensor operations demo (element-wise, scalar ops, comparisons, broadcasting, matrix ops, and in-place vs out-of-place).")


a: tensor([1., 2., 3.])
b: tensor([10., 20., 30.])
a + b  -> tensor([11., 22., 33.])
a - b  -> tensor([ -9., -18., -27.])
a * b  -> tensor([10., 40., 90.])
a / b  -> tensor([0.1000, 0.1000, 0.1000])

a + 5  -> tensor([6., 7., 8.])
a * 2  -> tensor([2., 4., 6.])
a / 2  -> tensor([0.5000, 1.0000, 1.5000])
a ** 2 -> tensor([1., 4., 9.])

a == 2        -> tensor([False,  True, False])
a > 1.5       -> tensor([False,  True,  True])
torch.eq(a,b) -> tensor([False, False, False])

mask (a > 1.5): tensor([False,  True,  True])
masked values (a[mask]): tensor([2., 3.])
logical_and mask with (b > 10): tensor([False,  True,  True])

r shape: torch.Size([1, 3]) c shape: torch.Size([3, 1])
r + a -> torch.Size([1, 3]) 
 tensor([[2., 4., 6.]])
c + r -> torch.Size([3, 3]) 
 tensor([[11., 12., 13.],
        [21., 22., 23.],
        [31., 32., 33.]])

Matrix A:
 tensor([[1., 2.],
        [3., 4.]])
Matrix B:
 tensor([[5., 6.],
        [7., 8.]])

A @ B =
 tensor([[19., 22.],
        [43., 50.]])

A.T =


# 📘 Step 5: Tensor Reshaping & Dimension Manipulation

Deep learning models often require tensors to be reshaped, flattened, or expanded —  
for example, converting an image tensor into a vector or adding a batch dimension.  

In PyTorch, we use **view**, **reshape**, **squeeze**, **unsqueeze**, **flatten**, **expand**, and **repeat**  
to control the *shape* and *dimensionality* of tensors efficiently.

---

## 🔹 1️⃣ Tensor Shape Basics

Each tensor has a **shape** (tuple of dimensions).

$$
\text{Tensor shape} = (B, C, H, W)
$$

- **B** → Batch size  
- **C** → Channels  
- **H, W** → Height and Width  

Use:
- `t.shape` → get shape  
- `t.ndim` → number of dimensions  
- `t.numel()` → total number of elements  

---

## 🔹 2️⃣ Reshape vs View

Both `reshape()` and `view()` change the tensor’s shape **without changing the data**.

| Function | Description | Example |
|-----------|--------------|----------|
| `t.reshape(new_shape)` | Returns a reshaped tensor (may copy data if needed) | `t.reshape(2, 3)` |
| `t.view(new_shape)` | Returns a view that shares memory; raises an error if a view is not possible | `t.view(2, 3)` |

**Mathematically:**

If tensor $t$ has $m \times n$ elements,  
then `t.reshape(a, b)` must satisfy:

$$
a \times b = m \times n
$$

🧠 **Difference:**  
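- `view()` never copies: the requested shape must be compatible with the tensor's memory layout (a **contiguous** tensor always qualifies), otherwise it raises an error.  
- `reshape()` returns a view when possible and automatically creates a copy if required.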

**Example Idea:**  
If a tensor has shape `(6,)`, we can reshape it into `(2, 3)` or `(3, 2)` —  
since $2 \times 3 = 3 \times 2 = 6$.
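
The memory-sharing behavior can be checked directly. The sketch below assumes a small contiguous tensor, so both `view()` and `reshape()` return views of the same storage; the variable names are illustrative.

```python
import torch

t = torch.arange(6)      # tensor([0, 1, 2, 3, 4, 5]), contiguous
v = t.view(2, 3)         # a view: shares storage with t
r = t.reshape(3, 2)      # t is contiguous, so reshape also returns a view

v[0, 0] = 100            # write through the view
print(t[0].item())       # 100, the original sees the change
print(r[0, 0].item())    # 100, so does the other view

# All three tensors start at the same memory address
print(v.data_ptr() == t.data_ptr())  # True
print(r.data_ptr() == t.data_ptr())  # True

# A shape whose element count does not match is rejected
try:
    t.view(4, 2)         # 4 * 2 = 8, but t has 6 elements
except RuntimeError as err:
    print("view failed:", err)
```

Because the result may alias the input, an in-place write on a reshaped tensor can change the original. Call `.clone()` first when an independent copy is needed.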

---

## 🔹 3️⃣ Contiguity (`contiguous()`)

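Some operations, such as `transpose()`, `permute()`, or strided slicing, return **non-contiguous** tensors: the elements are no longer stored in memory in the order the shape suggests.  
Calling `.view()` on such a tensor usually raises a `RuntimeError`.

To fix this, use:

- `t = t.contiguous().view(new_shape)`

✅ After a transpose or slice, call `.contiguous()` before `.view()`, or use `.reshape()`, which copies only when needed.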
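
One way to see why a transpose breaks `.view()` is to look at strides, the number of elements to skip in memory to move one step along each dimension. This is a small sketch on an assumed 3×4 matrix.

```python
import torch

M = torch.arange(12).reshape(3, 4)
Mt = M.T                       # same storage, swapped strides

print(M.stride())              # (4, 1): row-major, contiguous
print(Mt.stride())             # (1, 4): no longer row-major
print(Mt.is_contiguous())      # False

# reshape() falls back to a copy because no 1D view of Mt exists
flat = Mt.reshape(-1)
print(flat)                    # tensor([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])
print(flat.data_ptr() == M.data_ptr())  # False: new memory

# contiguous() + view() produces the same values
same = Mt.contiguous().view(-1)
print(torch.equal(flat, same))  # True
```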

---

## 🔹 4️⃣ Squeeze and Unsqueeze

These are used to **add** or **remove** singleton dimensions (size = 1).

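| Function | Description | Example | Shape change |
|-----------|--------------|----------|---------|
| `t.unsqueeze(dim)` | Inserts a new dimension of size 1 at position `dim` | `t.unsqueeze(0)` | `(3,) → (1, 3)` |
| `t.squeeze()` | Removes **all** dimensions of size 1 | `t.squeeze()` | `(1, 3, 1) → (3,)` |
| `t.squeeze(dim)` | Removes dimension `dim` only if its size is 1 | `t.squeeze(0)` | `(1, 3, 1) → (3, 1)` |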

**Mathematical Intuition:**

If  
$$
t = [1, 2, 3]
$$  
then  
$$
\text{t.unsqueeze(0)} \Rightarrow [[1, 2, 3]]
$$

- `unsqueeze(0)` adds a new axis at the start → shape `(1,3)`  
- `unsqueeze(1)` adds a new axis at the end (for a 1D tensor) → shape `(3,1)`  
- `squeeze()` removes all axes with size = 1; `squeeze(dim)` removes only axis `dim`, and only if its size is 1  

Used for:  
- Adding batch dimensions → `x.unsqueeze(0)`  
- Removing redundant dimensions → `t.squeeze()`
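
The difference between `squeeze()` and `squeeze(dim)` is easy to miss, so here is a minimal sketch. The tensor sizes are assumptions picked for illustration.

```python
import torch

x = torch.zeros(1, 3, 1)

print(x.squeeze().shape)     # torch.Size([3])        all size-1 axes removed
print(x.squeeze(0).shape)    # torch.Size([3, 1])     only axis 0 removed
print(x.squeeze(-1).shape)   # torch.Size([1, 3])     only the last axis removed
print(x.squeeze(1).shape)    # torch.Size([1, 3, 1])  axis 1 has size 3: unchanged

# Adding a batch dimension to a single (hypothetical) image
img = torch.zeros(3, 28, 28)
batch = img.unsqueeze(0)
print(batch.shape)           # torch.Size([1, 3, 28, 28])

# Pitfall: squeeze() without dim also drops a batch axis of size 1
single = torch.zeros(1, 10)      # one sample, 10 features
print(single.squeeze().shape)    # torch.Size([10])
```

Passing an explicit `dim` is the safer habit in model code, since a batch of size 1 would otherwise lose its batch axis.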

---

## 🔹 5️⃣ Flatten

Flattens a tensor into a **1D vector**, or collapses only a chosen range of dimensions.

| Function | Shape change | Description |
|-----------|----------|--------------|
| `torch.flatten(t)` | `(B, C, H, W) → (B*C*H*W,)` | Flattens every dimension into a single 1D vector |
| `torch.flatten(t, start_dim=1)` | `(B, C, H, W) → (B, C*H*W)` | Flattens everything after the batch dimension |

**Example Explanation:**

For an image tensor of shape `(B=32, C=3, H=28, W=28)`, flattening with `start_dim=1` gives:

$$
\text{Flattened shape} = (32, 3 \times 28 \times 28) = (32, 2352)
$$

This is essential before feeding tensors into a **fully connected (linear) layer**.
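
The shapes above can be reproduced with a short sketch. The linear layer with 10 outputs is a hypothetical stand-in for a classifier head, not something defined elsewhere in this notebook.

```python
import torch

x = torch.randn(32, 3, 28, 28)                 # (B, C, H, W)

print(torch.flatten(x).shape)                  # torch.Size([75264])  everything in 1D
print(torch.flatten(x, start_dim=1).shape)     # torch.Size([32, 2352])  batch kept
print(torch.flatten(x, start_dim=2).shape)     # torch.Size([32, 3, 784])  only H and W merged

# Hypothetical linear layer: 3*28*28 input features, 10 outputs
linear = torch.nn.Linear(3 * 28 * 28, 10)
out = linear(torch.flatten(x, start_dim=1))
print(out.shape)                               # torch.Size([32, 10])
```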

---

## 🔹 6️⃣ Expand vs Repeat

Both expand and repeat are used to **replicate tensor data**,  
but differ in **memory behavior**.

| Function | Example | Copies Data? | Description |
|-----------|----------|--------------|--------------|
| `t.expand(new_shape)` | `t.expand(3, 4)` | ❌ No | Creates a *view* (no real memory copy) |
| `t.repeat(repeats)` | `t.repeat(2, 3)` | ✅ Yes | Physically copies data in memory |

🧠 **Rule of Thumb:**
- Use `expand()` when you just want to *broadcast* data.  
- Use `repeat()` when you need actual repeated data.

**Example:**

If  
$$
t = [1, 2, 3]
$$  

Then:  
- `t.expand(3, 3)` → uses broadcasting →  
$$
\begin{bmatrix}
1 & 2 & 3 \\
1 & 2 & 3 \\
1 & 2 & 3
\end{bmatrix}
$$  
(no extra memory)

- `t.repeat(2, 1)` → copies the data twice →  
$$
\begin{bmatrix}
1 & 2 & 3 \\
1 & 2 & 3
\end{bmatrix}
$$
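
The memory difference shows up in the strides and in what happens when the source tensor changes. A minimal sketch, using the same $t = [1, 2, 3]$:

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0])
e = t.expand(3, 3)       # broadcast view of t
r = t.repeat(3, 1)       # independent copy with the same values

print(torch.equal(e, r))              # True: identical contents
print(e.stride())                     # (0, 1): stride 0 re-reads the same row
print(r.stride())                     # (3, 1): three real rows in memory
print(e.data_ptr() == t.data_ptr())   # True: shares t's memory
print(r.data_ptr() == t.data_ptr())   # False: separate allocation

# Changing the source is visible through the expanded view only
t[0] = 99.0
print(e[:, 0])           # tensor([99., 99., 99.])
print(r[:, 0])           # tensor([1., 1., 1.])
```

Since all rows of an expanded tensor alias the same memory, it is safest to treat it as read-only and call `.clone()` before any in-place write.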

---

## 🔹 7️⃣ Practical Example Overview

Let’s take $t = [1, 2, 3]$ and visualize how different operations change its shape:

| Operation | Output | Shape |
|------------|----------|--------|
| `t.unsqueeze(0)` | `[[1, 2, 3]]` | `(1, 3)` |
| `t.unsqueeze(1)` | `[[1], [2], [3]]` | `(3, 1)` |
| `t.squeeze()` | `[1, 2, 3]` | `(3,)` |
| `t.expand(3, 3)` | broadcasted rows | `(3, 3)` |
| `t.repeat(2, 1)` | duplicated rows | `(2, 3)` |
| `t.view(1, 3)` | reshaped | `(1, 3)` |
| `t.reshape(3, 1)` | reshaped (a view here, since `t` is contiguous) | `(3, 1)` |
| `t.flatten()` | flattened | `(3,)` |
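
The shapes in this table can be verified in one pass. The dictionary below is just a convenient way to loop over the operations; it is not a PyTorch idiom.

```python
import torch

t = torch.tensor([1, 2, 3])

# Each entry applies one operation from the table to the same tensor
results = {
    "t.unsqueeze(0)": t.unsqueeze(0),
    "t.unsqueeze(1)": t.unsqueeze(1),
    "t.squeeze()": t.squeeze(),
    "t.expand(3, 3)": t.expand(3, 3),
    "t.repeat(2, 1)": t.repeat(2, 1),
    "t.view(1, 3)": t.view(1, 3),
    "t.reshape(3, 1)": t.reshape(3, 1),
    "t.flatten()": t.flatten(),
}

for name, out in results.items():
    print(f"{name:16} -> shape {tuple(out.shape)}")
```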

---

## ✅ Summary

| Operation | Purpose | Example |
|------------|----------|----------|
| `view()` / `reshape()` | Change shape | `t.view(2,3)` |
| `contiguous()` | Fix memory layout | `t = t.contiguous()` |
| `unsqueeze()` / `squeeze()` | Add / remove axes | `t.unsqueeze(0)` |
| `flatten()` | Collapse dims | `torch.flatten(t)` |
| `expand()` / `repeat()` | Broadcast or copy | `t.expand(3,3)`, `t.repeat(2,1)` |

---

📌 **Key Takeaway:**  
Mastering reshaping and dimensionality control is essential for deep learning —  
especially when batching images, flattening outputs, or adapting tensors for linear layers.


In [103]:
# --------------------------
# 0) Helper printer for clarity
# --------------------------
def p(name, t):
    """
    Clean tensor printer for notebooks.
    Shows: name, shape, dtype, device, and contiguity.
    Works for all tensor types (CPU/GPU, int/float).
    """
    print(f"{name:25} | shape={tuple(t.shape)} | dtype={t.dtype} | device={t.device} | contiguous={t.is_contiguous()}")
    print("  ->", t)
    print()


In [104]:
# ------------------------------------------------------------------------
# Step 5 — Tensor Reshaping & Dimension Manipulation (examples + comments)
# ------------------------------------------------------------------------

print("PyTorch version:", torch.__version__)
print()

# --------------------------
# 1) Basic tensor & shape helpers
# --------------------------
t = torch.arange(6)            # 1D tensor [0,1,2,3,4,5]
p("original t", t)
print("t.ndim:", t.ndim, " t.numel():", t.numel())
print()

# --------------------------
# 2) view() vs reshape()
# --------------------------
# view() returns a view (no copy) but requires the tensor to be contiguous.
# reshape() returns a tensor with desired shape; it will copy if necessary.
t2 = t.reshape(2, 3)          # safe: reshape a (6,) -> (2,3)
p("t.reshape(2,3)", t2)

t3 = t.view(2, 3)             # view() also works here (contiguous)
p("t.view(2,3)", t3)

# reshape that produces a copy when necessary: show with transpose later

# --------------------------
# 3) Creating a non-contiguous tensor (transpose)
# --------------------------
M = torch.arange(12).reshape(3, 4)   # shape (3,4)
p("M (3x4)", M)

Mt = M.T                             # transpose -> shape (4,3)
p("Mt = M.T (transposed)", Mt)

# Transpose often produces a non-contiguous tensor
print("Is Mt contiguous?", Mt.is_contiguous())
print("Attempting Mt.view(-1) may fail if non-contiguous; use contiguous() first.")
print()

# Demonstrate view() failing safely with try/except, then fix with contiguous()
try:
    # This may raise an error if Mt is non-contiguous
    v_bad = Mt.view(-1)
    print("Mt.view(-1) succeeded (unexpected):", v_bad.shape)
except RuntimeError as e:   # view() raises RuntimeError on an incompatible memory layout
    print("Mt.view(-1) failed (expected for non-contiguous):", type(e).__name__, "-", e)

# Fix: make contiguous then view
v_fix = Mt.contiguous().view(-1)
p("Mt.contiguous().view(-1)", v_fix)

# --------------------------
# 4) unsqueeze() and squeeze()
# --------------------------
a = torch.tensor([1.0, 2.0, 3.0])   # shape (3,)
p("a (orig)", a)

a_u0 = a.unsqueeze(0)               # add batch dim at front -> (1,3)
p("a.unsqueeze(0)", a_u0)

a_u1 = a.unsqueeze(1)               # add dim at the end -> (3,1)
p("a.unsqueeze(1)", a_u1)

# squeeze removes dims of size 1
b = torch.tensor([[[7.0], [8.0], [9.0]]])  # shape (1,3,1)
p("b (1,3,1)", b)

b_s = b.squeeze()                    # removes all size-1 dims -> (3,)
p("b.squeeze()", b_s)

# squeeze with dim parameter - only remove if that dim is size 1
b_s_dim = b.squeeze(0)               # remove first dim only -> (3,1)
p("b.squeeze(0)", b_s_dim)

# --------------------------
# 5) flatten()
# --------------------------
img = torch.arange(2*3*4).reshape(2, 3, 4)  # pretend (B=2, C=3, W=4)
p("img (2,3,4)", img)

flat_all = torch.flatten(img)         # flatten all dims -> (24,)
p("torch.flatten(img)", flat_all)

flat_from1 = torch.flatten(img, start_dim=1)  # keep batch (2, 12)
p("torch.flatten(img, start_dim=1)", flat_from1)

# --------------------------
# 6) expand() vs repeat()
# --------------------------
# expand returns a view that reuses the same memory (no new copy),
# repeat actually copies data into a new tensor.

t_small = torch.tensor([1.0, 2.0, 3.0])   # shape (3,)
p("t_small", t_small)

# expand: make it (3,3) by broadcasting (no extra memory).
# t_small.expand(3, 3) alone would repeat t_small as rows; here we first add a
# trailing axis, (3,) -> (3,1), so each element is broadcast across its own row.
t_expand = t_small.unsqueeze(1).expand(3, 3)  # (3,1) -> (3,3) by broadcasting
p("t_small.unsqueeze(1).expand(3,3)", t_expand)

# repeat: physically copies data
t_repeat = t_small.repeat(2, 1)   # (2,3) creates an actual copy
p("t_small.repeat(2,1)", t_repeat)

# Show the storage pointers of the expanded view and the repeated copy.
# untyped_storage() replaces Tensor.storage(), which is deprecated in PyTorch 2.x.
print("Storage ptrs -> expand.data_ptr:", t_expand.untyped_storage().data_ptr(),
      "repeat.data_ptr:", t_repeat.untyped_storage().data_ptr())
print("Note: different ptrs for repeat => new memory; expand may reuse original storage (broadcast view).")
print()

# --------------------------
# 7) view/reshape with -1 (inferred dimension)
# --------------------------
x = torch.arange(24)
p("x (24)", x)

y = x.view(2, -1)   # infer second dim => (2, 12)
p("x.view(2, -1)", y)

z = x.reshape(-1, 6)  # infer first dim => (4, 6)
p("x.reshape(-1, 6)", z)

# --------------------------
# 8) chaining ops: common patterns
# --------------------------
# Example: image (B,C,H,W) -> flatten per sample -> feed to linear layer
img2 = torch.randn(8, 3, 28, 28)   # batch of 8 images
p("img2 (8,3,28,28)", img2)

# keep batch dimension, flatten rest
img2_flat = img2.view(img2.size(0), -1)   # common pattern, requires contiguous layout
p("img2.view(B, -1)", img2_flat)

# safer alternative using flatten:
img2_flat2 = torch.flatten(img2, start_dim=1)
p("torch.flatten(img2, start_dim=1)", img2_flat2)

# --------------------------
# 9) Practical caution: when to use contiguous()
# --------------------------
# Start from a transposed tensor and then want to view -> must call contiguous()
m = torch.randn(2, 3, 4)
m_t = m.transpose(1, 2)   # swap dims -> likely non-contiguous
p("m", m)
p("m.transpose(1,2)", m_t)
print("m_t.is_contiguous() ->", m_t.is_contiguous())
# Fix before view
m_t_fixed = m_t.contiguous().view(m_t.shape[0], -1)
p("m_t.contiguous().view(B,-1)", m_t_fixed)

# --------------------------
# 10) Summary prints
# --------------------------
print("Quick summary:")
print("- use .view() for reshaping when tensor is contiguous (fast, no-copy).")
print("- use .reshape() if you want convenience and don't care about copying.")
print("- use .contiguous() after transpose/slice before .view().")
print("- use .unsqueeze() / .squeeze() to add/remove singleton dims (batch dims).")
print("- use .flatten(start_dim=..) to collapse dims safely.")
print("- use .expand() to broadcast a view (no memory copy) and .repeat() to copy data.")
print()
print("✅ Step 5 code demo complete.")


PyTorch version: 2.9.0

original t                | shape=(6,) | dtype=torch.int64 | device=cpu | contiguous=True
  -> tensor([0, 1, 2, 3, 4, 5])

t.ndim: 1  t.numel(): 6

t.reshape(2,3)            | shape=(2, 3) | dtype=torch.int64 | device=cpu | contiguous=True
  -> tensor([[0, 1, 2],
        [3, 4, 5]])

t.view(2,3)               | shape=(2, 3) | dtype=torch.int64 | device=cpu | contiguous=True
  -> tensor([[0, 1, 2],
        [3, 4, 5]])

M (3x4)                   | shape=(3, 4) | dtype=torch.int64 | device=cpu | contiguous=True
  -> tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

Mt = M.T (transposed)     | shape=(4, 3) | dtype=torch.int64 | device=cpu | contiguous=False
  -> tensor([[ 0,  4,  8],
        [ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11]])

Is Mt contiguous? False
Attempting Mt.view(-1) may fail if non-contiguous; use contiguous() first.

Mt.view(-1) failed (expected for non-contiguous): RuntimeError - view size is not compati