# 01. PyTorch Fundamentals

Welcome to PyTorch! Now that you understand what machine learning is, let's dive into the framework that will power your AI journey.

By the end of this notebook, you'll understand:

- **What PyTorch is** and why it's amazing for machine learning
- **Tensors**: The fundamental building blocks of PyTorch
- **Device management**: CPU vs GPU computation
- **Basic tensor operations** and mathematics
- **Advanced shape manipulation**: reshape, view, squeeze, unsqueeze, transpose, permute
- **Tensor memory and performance**: contiguous tensors, cloning, and detaching
- **Debugging tensor shapes**: common errors and how to fix them
- **Practice exercises** to solidify your understanding

Let's get started!


## What is PyTorch?

PyTorch is like a **supercharged version of NumPy** designed specifically for machine learning. Think of it as:

### NumPy + Superpowers = PyTorch

| Feature                      | NumPy       | PyTorch                        |
| ---------------------------- | ----------- | ------------------------------ |
| **Multi-dimensional arrays** | ✅ Arrays   | ✅ Tensors                     |
| **Mathematical operations**  | ✅ Fast     | ✅ Fast, with optional GPU use |
| **GPU acceleration**         | ❌ CPU only | ✅ GPU + CPU                   |
| **Automatic gradients**      | ❌ Manual   | ✅ Automatic                   |
| **Neural networks**          | Complex     | Easy                           |
| **Deep learning**            | Very hard   | Built-in                       |

### Key PyTorch Concepts:

1. **Tensors**: Multi-dimensional arrays (like NumPy arrays, but better)
2. **Autograd**: Automatic differentiation (computes gradients for you)
3. **Neural Networks**: Pre-built components for deep learning
4. **GPU Support**: Lightning-fast computation on graphics cards
5. **Dynamic Graphs**: Flexible networks that can change during runtime
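
To make the autograd idea concrete, here is a minimal sketch. The function `y = x^2 + 2x` and the input value 3.0 are chosen purely for illustration; the point is that PyTorch computes the derivative without any manual calculus.

```python
import torch

# A scalar input that autograd should track
x = torch.tensor(3.0, requires_grad=True)

# Build a small computation: y = x^2 + 2x
y = x ** 2 + 2 * x

# Backpropagate: computes dy/dx and stores it in x.grad
y.backward()

print(f"y = {y.item()}")           # 3^2 + 2*3 = 15.0
print(f"dy/dx = {x.grad.item()}")  # 2*3 + 2 = 8.0
```

The graph is built while the Python code runs, which is what "dynamic graphs" refers to: ordinary `if` statements and loops can change the computation from one run to the next.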


## Setting Up PyTorch

Let's start by importing PyTorch and checking our setup:


In [20]:
# Import PyTorch and related libraries
import torch
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for better-looking plots
plt.style.use("seaborn-v0_8")
sns.set_palette("husl")

print("PyTorch Setup Information")
print("=" * 40)
print(f"PyTorch version: {torch.__version__}")

# Check device availability
print("\nAvailable Devices:")
print("CPU: ✅ Always available")

# Check CUDA (NVIDIA GPU)
if torch.cuda.is_available():
    print(f"CUDA GPU: ✅ {torch.cuda.get_device_name(0)}")
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Number of GPUs: {torch.cuda.device_count()}")
else:
    print("CUDA GPU: ❌ Not available (CPU only)")

# Check MPS (Apple Silicon GPU)
if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    print("MPS (Apple Silicon): ✅ Available")
else:
    print("MPS (Apple Silicon): ❌ Not available")

print("\nReady to learn PyTorch!")

PyTorch Setup Information
========================================
PyTorch version: 2.9.1

Available Devices:
CPU: ✅ Always available
CUDA GPU: ❌ Not available (CPU only)
MPS (Apple Silicon): ✅ Available

Ready to learn PyTorch!


## Device Management: CPU vs GPU

One of PyTorch's superpowers is the ability to run computations on different devices. Let's understand this:


In [21]:
# Let's determine the best device for our computations
def get_device():
    """Get the best available device for PyTorch computations"""
    if torch.cuda.is_available():
        device = torch.device("cuda")
        device_name = torch.cuda.get_device_name(0)
        device_type = "NVIDIA GPU"
    elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        device = torch.device("mps")
        device_name = "Apple Silicon GPU"
        device_type = "Apple GPU"
    else:
        device = torch.device("cpu")
        device_name = "CPU"
        device_type = "CPU"

    return device, device_name, device_type


device, device_name, device_type = get_device()

print(f"Selected device: {device}")
print(f"Device name: {device_name}")
print(f"Device type: {device_type}")

Selected device: mps
Device name: Apple Silicon GPU
Device type: Apple GPU


In [22]:
# Create a simple tensor and move it to our device
x = torch.tensor([1, 2, 3, 4, 5], device=device)
print(f"\nSample tensor: {x}")
print(f"Tensor device: {x.device}")
print(f"Tensor dtype: {x.dtype}")
print(f"Tensor shape: {x.shape}")

# Why does the device matter?
print("\nWhy Device Selection Matters:")
if device.type == "cuda":
    print("✅ Using GPU: often 10-100x faster than CPU for large, parallel workloads")
    print("Perfect for training neural networks")
elif device.type == "mps":
    print("✅ Using Apple Silicon: Efficient and fast!")
    print("Great balance of speed and power efficiency")
else:
    print("✅ Using CPU: Always works, good for learning")
    print("Perfect for understanding concepts")


Sample tensor: tensor([1, 2, 3, 4, 5], device='mps:0')
Tensor device: mps:0
Tensor dtype: torch.int64
Tensor shape: torch.Size([5])

Why Device Selection Matters:
✅ Using Apple Silicon: Efficient and fast!
Great balance of speed and power efficiency
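
Tensors can also be moved between devices after they are created. The sketch below repeats the device-selection logic from above so that it runs on its own; the variable names are illustrative. It assumes nothing about your hardware and falls back to the CPU.

```python
import torch

# Pick a device the same way as above: CUDA, then MPS, then CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

cpu_tensor = torch.ones(2, 3)   # created on the CPU by default
moved = cpu_tensor.to(device)   # returns a tensor that lives on `device`
result = (moved * 2).sum()      # the computation runs on `device`

# Converting to NumPy requires a CPU tensor, so move the result back first
back_on_cpu = result.cpu()
print(f"moved.device: {moved.device}")
print(f"back_on_cpu.device: {back_on_cpu.device}")
print(f"value: {back_on_cpu.item()}")  # six ones, each doubled: 12.0
```

Operations between tensors on different devices raise an error, so it is common to move all inputs to the same device before computing.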


## Understanding Tensors

**Tensors are the foundation of PyTorch.** Think of them as multi-dimensional arrays with superpowers!

### Tensor Dimensions Explained:

- **0D Tensor (Scalar)**: A single number → `5`
- **1D Tensor (Vector)**: A list of numbers → `[1, 2, 3]`
- **2D Tensor (Matrix)**: A table of numbers → `[[1, 2], [3, 4]]`
- **3D Tensor**: A cube of numbers → Used for RGB images
- **4D Tensor**: Multiple cubes → Batch of images
- **nD Tensor**: As many dimensions as you need!

<video width="800" controls autoplay muted loop>
  <source src='../12_assets/understanding_tensors.mp4' type="video/mp4">
  Your browser does not support the video tag.
</video>


In [23]:
# 0D Tensor (Scalar)
scalar = torch.tensor(42)
print(f"0D Tensor (Scalar): {scalar}")
print(f"Shape: {scalar.shape}, Dimensions: {scalar.ndim}")

0D Tensor (Scalar): 42
Shape: torch.Size([]), Dimensions: 0


In [24]:
# 1D Tensor (Vector)
vector = torch.tensor([1, 2, 3, 4, 5])
print(f"\n1D Tensor (Vector): {vector}")
print(f"Shape: {vector.shape}, Dimensions: {vector.ndim}")


1D Tensor (Vector): tensor([1, 2, 3, 4, 5])
Shape: torch.Size([5]), Dimensions: 1


In [25]:
# 2D Tensor (Matrix)
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(f"\n2D Tensor (Matrix):\n{matrix}")
print(f"Shape: {matrix.shape}, Dimensions: {matrix.ndim}")


2D Tensor (Matrix):
tensor([[1, 2, 3],
        [4, 5, 6]])
Shape: torch.Size([2, 3]), Dimensions: 2


In [26]:
# 3D Tensor
tensor_3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(f"\n3D Tensor:\n{tensor_3d}")
print(f"Shape: {tensor_3d.shape}, Dimensions: {tensor_3d.ndim}")


3D Tensor:
tensor([[[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]]])
Shape: torch.Size([2, 2, 2]), Dimensions: 3


## Creating Tensors

There are many ways to create tensors in PyTorch. Let's explore the most common methods:


In [27]:
# 1. From Python lists
from_list = torch.tensor([1, 2, 3, 4, 5])
print(f"From list: {from_list}")

From list: tensor([1, 2, 3, 4, 5])


In [28]:
# 2. From NumPy arrays
numpy_array = np.array([1, 2, 3, 4, 5])
from_numpy = torch.from_numpy(numpy_array)
print(f"From NumPy: {from_numpy}")

From NumPy: tensor([1, 2, 3, 4, 5])
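
One detail worth knowing: `torch.from_numpy()` shares memory with the NumPy array, while `torch.tensor()` makes a copy. The short sketch below illustrates the difference (the variable names `shared` and `copied` are just for this example).

```python
import numpy as np
import torch

numpy_array = np.array([1, 2, 3, 4, 5])

shared = torch.from_numpy(numpy_array)  # shares the NumPy buffer
copied = torch.tensor(numpy_array)      # makes an independent copy

# Modify the NumPy array in place
numpy_array[0] = 100

print(f"shared: {shared}")  # first element is now 100
print(f"copied: {copied}")  # first element is still 1
```

Use `torch.tensor()` (or `.clone()`) when later changes to the NumPy array must not leak into the tensor.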


In [29]:
# 3. Filled with zeros
zeros_tensor = torch.zeros(3, 4)  # 3x4 matrix of zeros
print(f"Zeros tensor (3x4):\n{zeros_tensor}")

Zeros tensor (3x4):
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])


In [30]:
# 4. Filled with ones
ones_tensor = torch.ones(2, 3)  # 2x3 matrix of ones
print(f"Ones tensor (2x3):\n{ones_tensor}")

Ones tensor (2x3):
tensor([[1., 1., 1.],
        [1., 1., 1.]])


In [31]:
# 5. Random numbers
random_tensor = torch.rand(2, 3)  # Random numbers between 0 and 1
print(f"Random tensor (2x3):\n{random_tensor}")

Random tensor (2x3):
tensor([[0.7832, 0.4306, 0.0361],
        [0.5041, 0.3105, 0.5953]])


In [32]:
# 6. Random integers
random_int = torch.randint(0, 10, (3, 3))  # Random integers 0-9
print(f"Random integers (3x3, 0-9):\n{random_int}")

Random integers (3x3, 0-9):
tensor([[5, 4, 0],
        [6, 6, 2],
        [8, 4, 9]])


In [33]:
# 7. Linear sequence
linear_seq = torch.arange(0, 10, 2)  # 0, 2, 4, 6, 8
print(f"Linear sequence (0 to 10, step 2): {linear_seq}")

Linear sequence (0 to 10, step 2): tensor([0, 2, 4, 6, 8])


In [34]:
# 8. Linearly spaced
linspace = torch.linspace(0, 1, 5)  # 5 evenly spaced numbers from 0 to 1
print(f"Linspace (0 to 1, 5 points): {linspace}")

Linspace (0 to 1, 5 points): tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])


In [35]:
# 9. Like another tensor (same shape)
template = torch.tensor([[1, 2], [3, 4]])
zeros_like = torch.zeros_like(template)
ones_like = torch.ones_like(template)
print(f"Template tensor:\n{template}")
print(f"\nZeros like template:\n{zeros_like}")
print(f"\nOnes like template:\n{ones_like}")

Template tensor:
tensor([[1, 2],
        [3, 4]])

Zeros like template:
tensor([[0, 0],
        [0, 0]])

Ones like template:
tensor([[1, 1],
        [1, 1]])


## Tensor Properties and Information

Every tensor has important properties that tell us about its structure and data:


In [36]:
# Create a sample tensor for demonstration
sample_tensor = torch.randn(3, 4, 5)

print(f"\nSample tensor:\n{sample_tensor}")


Sample tensor:
tensor([[[ 1.5640e-01,  9.7927e-01, -5.1039e-01,  3.0801e+00,  2.3008e+00],
         [-1.2666e+00,  1.8803e+00,  5.2863e-01,  9.8053e-01,  7.8732e-02],
         [-3.2803e-02,  3.9363e-01, -6.9764e-01, -3.5430e-01,  8.3407e-01],
         [ 2.0363e-01,  1.1565e+00, -5.7861e-01, -1.1036e+00,  4.9071e-01]],

        [[-3.6829e-01,  3.2015e-01,  4.8171e-02, -4.4054e-01,  3.9004e-01],
         [-3.3492e-01,  1.1768e+00,  5.6455e-01,  6.8401e-02, -1.5138e+00],
         [ 1.1983e+00,  6.9467e-01, -4.3558e-01, -1.4403e+00,  1.4983e-01],
         [ 1.0869e+00, -8.4305e-01, -8.3348e-01,  8.7935e-01,  3.6670e-01]],

        [[ 3.2765e-01, -2.4384e-01, -6.6729e-01, -1.4888e-03, -1.1025e+00],
         [-5.9479e-01, -1.9178e+00, -8.8494e-01,  7.5928e-01, -5.6905e-02],
         [-1.8809e+00,  8.8587e-02,  9.1350e-02, -2.4446e-01,  6.1463e-01],
         [-4.1297e-01,  5.7677e-01, -1.1181e+00,  5.7626e-01,  1.6109e+00]]])


In [37]:
print("Tensor Properties")
print("=" * 30)

print(f"Shape: {sample_tensor.shape} or {sample_tensor.size()}")
print(f"Number of dimensions: {sample_tensor.ndim}")
print(f"Total number of elements: {sample_tensor.numel()}")
print(f"Data type: {sample_tensor.dtype}")
print(f"Device: {sample_tensor.device}")
print(f"Requires gradient: {sample_tensor.requires_grad}")

Tensor Properties
==============================
Shape: torch.Size([3, 4, 5]) or torch.Size([3, 4, 5])
Number of dimensions: 3
Total number of elements: 60
Data type: torch.float32
Device: cpu
Requires gradient: False


**Shape Explanation:**

- The first dimension (3) represents the number of matrices.
- The second dimension (4) represents the number of rows in each matrix.
- The third dimension (5) represents the number of columns in each matrix (the number of elements in each row).


In [38]:
# Common data types
print("Common PyTorch Data Types:")
types_demo = {
    "torch.float32 (default)": torch.tensor([1.0, 2.0]),
    "torch.int64": torch.tensor([1, 2]),
    "torch.bool": torch.tensor([True, False]),
    "torch.float16 (half precision)": torch.tensor([1.0, 2.0], dtype=torch.float16),
}

for name, tensor in types_demo.items():
    print(f"  {name}: {tensor} (dtype: {tensor.dtype})")

Common PyTorch Data Types:
  torch.float32 (default): tensor([1., 2.]) (dtype: torch.float32)
  torch.int64: tensor([1, 2]) (dtype: torch.int64)
  torch.bool: tensor([ True, False]) (dtype: torch.bool)
  torch.float16 (half precision): tensor([1., 2.], dtype=torch.float16) (dtype: torch.float16)


Pro Tip: Different data types use different amounts of memory!

- float32 = 4 bytes per number
- float16 = 2 bytes per number (saves memory!)
- int64 = 8 bytes per number
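
These sizes can be checked directly. The sketch below uses `element_size()` (bytes per element) and `numel()` (element count) to estimate the memory used by a tensor's data; the 1000x1000 shape is an arbitrary choice for illustration.

```python
import torch

shape = (1000, 1000)
for dtype in (torch.float32, torch.float16, torch.int64):
    t = torch.zeros(shape, dtype=dtype)
    # element_size() is bytes per element; numel() is the element count
    total_bytes = t.element_size() * t.numel()
    print(f"{str(dtype):15s} {t.element_size()} bytes/element, "
          f"{total_bytes / 1e6:.1f} MB total")
```

For one million elements this gives 4 MB for float32, 2 MB for float16, and 8 MB for int64.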


## Basic Tensor Operations

Now let's learn the fundamental operations you can perform with tensors:


In [39]:
# Create some sample tensors
a = torch.tensor([1, 2, 3, 4])
b = torch.tensor([5, 6, 7, 8])
matrix_a = torch.tensor([[1, 2], [3, 4]])
matrix_b = torch.tensor([[5, 6], [7, 8]])

print(f"Tensor a: {a}")
print(f"Tensor b: {b}")

Tensor a: tensor([1, 2, 3, 4])
Tensor b: tensor([5, 6, 7, 8])


In [40]:
# Arithmetic operations
print("Arithmetic Operations:\n")
print(f"Addition: {a + b}")
print(f"Subtraction: {a - b}")
print(f"Multiplication: {a * b}")
print(f"Division: {a / b}")
print(f"Power: {a ** 2}")

Arithmetic Operations:

Addition: tensor([ 6,  8, 10, 12])
Subtraction: tensor([-4, -4, -4, -4])
Multiplication: tensor([ 5, 12, 21, 32])
Division: tensor([0.2000, 0.3333, 0.4286, 0.5000])
Power: tensor([ 1,  4,  9, 16])


In [41]:
print("Statistical Operations:\n")
print(f"Sum: {a.sum()}")
print(f"Mean: {a.float().mean():.2f}")
print(f"Max: {a.max()}")
print(f"Min: {a.min()}")
print(f"Standard deviation: {a.float().std():.2f}")

Statistical Operations:

Sum: 10
Mean: 2.50
Max: 4
Min: 1
Standard deviation: 1.29


In [42]:
print("Matrix Operations:\n")
print(f"Matrix A:\n{matrix_a}")
print(f"Matrix B:\n{matrix_b}")
print(f"\nMatrix multiplication:\n{torch.matmul(matrix_a, matrix_b)}")
print(f"\nTranspose of A:\n{matrix_a.T}")

Matrix Operations:

Matrix A:
tensor([[1, 2],
        [3, 4]])
Matrix B:
tensor([[5, 6],
        [7, 8]])

Matrix multiplication:
tensor([[19, 22],
        [43, 50]])

Transpose of A:
tensor([[1, 3],
        [2, 4]])


In [43]:
# Element-wise vs matrix multiplication
print("Element-wise vs Matrix Multiplication:")
print(f"Element-wise (A * B):\n{matrix_a * matrix_b}")
print(f"Matrix multiplication (A @ B):\n{matrix_a @ matrix_b}")

Element-wise vs Matrix Multiplication:
Element-wise (A * B):
tensor([[ 5, 12],
        [21, 32]])
Matrix multiplication (A @ B):
tensor([[19, 22],
        [43, 50]])


In [44]:
print("Key Difference:")
print("   * : Element-wise (each element multiplied independently)")
print("   @ : Matrix multiplication (linear algebra rules)")

Key Difference:
   * : Element-wise (each element multiplied independently)
   @ : Matrix multiplication (linear algebra rules)
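
To see what "linear algebra rules" means in practice, the sketch below computes one entry of `A @ B` by hand, using the same two matrices as above. Entry `[i, j]` of the product is the dot product of row `i` of A with column `j` of B.

```python
import torch

matrix_a = torch.tensor([[1, 2], [3, 4]])
matrix_b = torch.tensor([[5, 6], [7, 8]])

# Entry [i, j] of A @ B is the dot product of row i of A and column j of B
i, j = 0, 1
manual = (matrix_a[i, :] * matrix_b[:, j]).sum()  # 1*6 + 2*8 = 22

product = matrix_a @ matrix_b
print(f"Manual entry [0, 1]: {manual.item()}")
print(f"From A @ B:          {product[i, j].item()}")
```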


<video width="800" controls autoplay muted loop>
  <source src='../12_assets/matrix_multiplication.mp4' type="video/mp4">
  Your browser does not support the video tag.
</video>


## Tensor Shape Manipulation - Deep Dive

Understanding how to manipulate tensor shapes is crucial for deep learning. Let's explore all the important operations in detail!


### reshape() vs view() vs resize\_()

These three methods can change tensor shapes, but they have important differences:


In [45]:
# Create a simple tensor
original = torch.arange(12)
print(f"Original tensor: {original}")
print(f"Shape: {original.shape}")

Original tensor: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Shape: torch.Size([12])


In [46]:
# 1. reshape() - Returns a view when possible, otherwise a copy
reshaped = original.reshape(3, 4)
reshaped

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

In [47]:
# 2. view() - Never copies, but requires a compatible (e.g. contiguous) memory layout
viewed = original.view(3, 4)
viewed

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

In [48]:
# 3. resize_() - In-place operation, changes the original!
to_resize = original.clone()
to_resize.resize_(3, 4)
to_resize

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
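
On a freshly created tensor all three methods give the same result, so the difference only shows up with a non-contiguous tensor. The sketch below uses a transposed matrix as an example of one; it is an illustration of the behavior, not an exhaustive list of cases.

```python
import torch

original = torch.arange(12).reshape(3, 4)
transposed = original.T  # a view with swapped strides, shape (4, 3)
print(f"transposed.is_contiguous(): {transposed.is_contiguous()}")

# view() cannot reinterpret this layout without copying, so it raises
try:
    transposed.view(12)
    view_failed = False
except RuntimeError:
    view_failed = True
print(f"view(12) failed: {view_failed}")

# reshape() falls back to copying when a view is not possible
flat = transposed.reshape(12)

# contiguous() makes a compact copy, after which view() works
flat_via_view = transposed.contiguous().view(12)
print(f"Same values: {torch.equal(flat, flat_via_view)}")
```

A reasonable default is `reshape()`; reach for `view()` when you want an error rather than a silent copy.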

### squeeze() and unsqueeze() - In Depth

These operations add or remove dimensions of size 1. They're essential for matching shapes in neural networks!


In [49]:
# Create a tensor with extra dimensions
tensor_with_ones = torch.rand(1, 3, 1, 4, 1)
print(f"Original tensor shape: {tensor_with_ones.shape}")
print(f"Dimensions: {tensor_with_ones.ndim}")

Original tensor shape: torch.Size([1, 3, 1, 4, 1])
Dimensions: 5


In [50]:
# squeeze() - Remove all dimensions of size 1
squeezed = tensor_with_ones.squeeze()
print(f"{tensor_with_ones.shape} -> {squeezed.shape}")

torch.Size([1, 3, 1, 4, 1]) -> torch.Size([3, 4])


In [51]:
# squeeze(dim) - Remove specific dimension
squeeze_dim0 = tensor_with_ones.squeeze(0)  # Remove first dimension
squeeze_dim2 = tensor_with_ones.squeeze(2)  # Remove third dimension
print(f"{tensor_with_ones.shape} → {squeeze_dim0.shape}")
print(f"{tensor_with_ones.shape} → {squeeze_dim2.shape}")

torch.Size([1, 3, 1, 4, 1]) → torch.Size([3, 1, 4, 1])
torch.Size([1, 3, 1, 4, 1]) → torch.Size([1, 3, 4, 1])


In [52]:
# unsqueeze() - Add dimension of size 1
base = torch.tensor([1, 2, 3, 4])

# Add dimension at different positions
unsqueeze_0 = base.unsqueeze(0)  # Add at beginning
unsqueeze_1 = base.unsqueeze(1)  # Add at end
unsqueeze_neg = base.unsqueeze(-1)  # Add at end (negative indexing)

print(f"unsqueeze(0): {base.shape} → {unsqueeze_0.shape}")
print(f"unsqueeze(1): {base.shape} → {unsqueeze_1.shape}")
print(f"unsqueeze(-1): {base.shape} → {unsqueeze_neg.shape}")

unsqueeze(0): torch.Size([4]) → torch.Size([1, 4])
unsqueeze(1): torch.Size([4]) → torch.Size([4, 1])
unsqueeze(-1): torch.Size([4]) → torch.Size([4, 1])


In [53]:
# Practical example: Preparing for batch processing
print("Practical Example: Batch Processing\n")
single_image = torch.rand(3, 224, 224)  # Single image: [channels, height, width]
print(f"Single image shape: {single_image.shape}")

# Add batch dimension
batched_image = single_image.unsqueeze(0)
print(f"After unsqueeze(0): {batched_image.shape}")

Practical Example: Batch Processing

Single image shape: torch.Size([3, 224, 224])
After unsqueeze(0): torch.Size([1, 3, 224, 224])


### transpose() and permute()

These operations rearrange dimensions, crucial for matching expected input shapes in neural networks.


In [54]:
# Create a sample tensor
tensor_3d = torch.rand(2, 3, 4)
tensor_3d.shape

torch.Size([2, 3, 4])

In [55]:
# transpose() - Swap two dimensions
trans_0_1 = tensor_3d.transpose(0, 1)
trans_0_2 = tensor_3d.transpose(0, 2)
trans_1_2 = tensor_3d.transpose(1, 2)

print(f"transpose(0, 1): {tensor_3d.shape} ‚Üí {trans_0_1.shape}")
print(f"transpose(0, 2): {tensor_3d.shape} ‚Üí {trans_0_2.shape}")
print(f"transpose(1, 2): {tensor_3d.shape} ‚Üí {trans_1_2.shape}")

transpose(0, 1): torch.Size([2, 3, 4]) ‚Üí torch.Size([3, 2, 4])
transpose(0, 2): torch.Size([2, 3, 4]) ‚Üí torch.Size([4, 3, 2])
transpose(1, 2): torch.Size([2, 3, 4]) ‚Üí torch.Size([2, 4, 3])


In [56]:
# .T shortcut for 2D tensors
matrix = torch.rand(3, 4)
print(f"2D Matrix: {matrix.shape}")
print(f"matrix.T: {matrix.T.shape}")
print(f"matrix.transpose(0, 1): {matrix.transpose(0, 1).shape}")

2D Matrix: torch.Size([3, 4])
matrix.T: torch.Size([4, 3])
matrix.transpose(0, 1): torch.Size([4, 3])


In [57]:
# permute() - Rearrange all dimensions at once
print(f"Original: {tensor_3d.shape}")

perm1 = tensor_3d.permute(2, 0, 1)  # [dim2, dim0, dim1]
perm2 = tensor_3d.permute(1, 2, 0)  # [dim1, dim2, dim0]
perm3 = tensor_3d.permute(2, 1, 0)  # [dim2, dim1, dim0] - reverse order

print(f"permute(2, 0, 1): {tensor_3d.shape} ‚Üí {perm1.shape}")
print(f"permute(1, 2, 0): {tensor_3d.shape} ‚Üí {perm2.shape}")
print(f"permute(2, 1, 0): {tensor_3d.shape} ‚Üí {perm3.shape}")

Original: torch.Size([2, 3, 4])
permute(2, 0, 1): torch.Size([2, 3, 4]) ‚Üí torch.Size([4, 2, 3])
permute(1, 2, 0): torch.Size([2, 3, 4]) ‚Üí torch.Size([3, 4, 2])
permute(2, 1, 0): torch.Size([2, 3, 4]) ‚Üí torch.Size([4, 3, 2])


In [58]:
# Practical example: Image format conversion
print("Practical Example: Image Format Conversion\n")

# PyTorch uses [batch, channels, height, width]
pytorch_format = torch.rand(32, 3, 224, 224)
print(f"PyTorch format (NCHW): {pytorch_format.shape}")
print("[batch, channels, height, width]\n")

# Some libraries use [batch, height, width, channels]
other_format = pytorch_format.permute(0, 2, 3, 1)
print(f"Other format (NHWC): {other_format.shape}")
print("[batch, height, width, channels]")

Practical Example: Image Format Conversion

PyTorch format (NCHW): torch.Size([32, 3, 224, 224])
[batch, channels, height, width]

Other format (NHWC): torch.Size([32, 224, 224, 3])
[batch, height, width, channels]


Key Differences between transpose() and permute():

- transpose() swaps exactly two dimensions of a tensor with any number of dimensions (`.T` is the shortcut for the 2D case).
- permute() is a more general operation that can rearrange all dimensions of a tensor.

Use transpose() for simple swaps, and permute() for more complex rearrangements.
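
The relationship between the two can be checked directly. In this sketch (tensor name `t` and the chosen orderings are illustrative), a single swap gives the same result either way, while a cyclic reordering takes two `transpose()` calls but only one `permute()`.

```python
import torch

t = torch.rand(2, 3, 4)

# One swap: both calls give the same result
single_swap_equal = torch.equal(t.transpose(0, 2), t.permute(2, 1, 0))

# A cyclic reordering needs two transposes but a single permute
via_permute = t.permute(2, 0, 1)                    # shape (4, 2, 3)
via_transposes = t.transpose(0, 2).transpose(1, 2)  # (4, 3, 2), then (4, 2, 3)

print(f"Single swap equal: {single_swap_equal}")
print(f"Cyclic reorder equal: {torch.equal(via_permute, via_transposes)}")

# Both operations return views: no data is copied, only the strides change
print(f"Shares memory with t: {via_permute.data_ptr() == t.data_ptr()}")
print(f"Contiguous: {via_permute.is_contiguous()}")
```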


### Common Shape Errors and How to Fix Them

Let's look at common shape-related errors you'll encounter and how to solve them!


In [59]:
# Error 1: Matrix multiplication shape mismatch
a = torch.rand(3, 4)
b = torch.rand(3, 5)

print(f"Tensor a: {a.shape}")
print(f"Tensor b: {b.shape}")
print("\nTrying a @ b...")
try:
    result = a @ b
except RuntimeError as e:
    print(f"ERROR: {e}")

print("\nSolution: Transpose one of them")
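# Option A: a.T @ b has shapes (4, 3) @ (3, 5) = (4, 5), which works
# because the inner dimensions now match. Note that a @ b.T would not:
# (3, 4) @ (5, 3) still has mismatched inner dimensions.

# Option B: use a second matrix whose row count equals a's column count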
b_correct = torch.rand(4, 5)
result = a @ b_correct
print(f"a @ b_correct: {a.shape} @ {b_correct.shape} = {result.shape}")
print("Rule: (m, n) @ (n, p) = (m, p)")

Tensor a: torch.Size([3, 4])
Tensor b: torch.Size([3, 5])

Trying a @ b...
ERROR: mat1 and mat2 shapes cannot be multiplied (3x4 and 3x5)

Solution: Transpose one of them
a @ b_correct: torch.Size([3, 4]) @ torch.Size([4, 5]) = torch.Size([3, 5])
Rule: (m, n) @ (n, p) = (m, p)


In [60]:
# Error 2: Missing batch dimension
single_sample = torch.rand(10)  # Single sample with 10 features
print(f"Single sample shape: {single_sample.shape}")
print("Neural networks expect: [batch_size, features]\n")

# Solution 1: unsqueeze
batched_v1 = single_sample.unsqueeze(0)
print(f"Solution 1 - unsqueeze(0): {batched_v1.shape}")

# Solution 2: indexing
batched_v2 = single_sample[None, :]  # or single_sample[np.newaxis, :]
print(f"Solution 2 - [None, :]: {batched_v2.shape}")

# Solution 3: reshape
batched_v3 = single_sample.reshape(1, -1)
print(f"Solution 3 - reshape(1, -1): {batched_v3.shape}")

Single sample shape: torch.Size([10])
Neural networks expect: [batch_size, features]

Solution 1 - unsqueeze(0): torch.Size([1, 10])
Solution 2 - [None, :]: torch.Size([1, 10])
Solution 3 - reshape(1, -1): torch.Size([1, 10])


In [61]:
# Error 3: Dimension mismatch in operations
x = torch.rand(32, 10)  # Batch of 32 samples, 10 features each
y = torch.rand(10)  # 10 values

print(f"x shape: {x.shape}")
print(f"y shape: {y.shape}")
print("\nWant to add y to each sample in x")

# This works thanks to broadcasting!
result = x + y
print(f"x + y works! Result shape: {result.shape}")
print("Broadcasting automatically expands y to (32, 10)\n")

# But this won't work:
y_wrong = torch.rand(32)
print(f"If y has shape {y_wrong.shape}:")
try:
    result = x + y_wrong.unsqueeze(0)  # Try to add (1, 32) to (32, 10)
except RuntimeError as e:
    print("ERROR: Shape mismatch!\n")

print("Solution: Make sure dimensions align")
y_correct = y_wrong.unsqueeze(1)  # (32,) → (32, 1)
result = x + y_correct  # (32, 10) + (32, 1) broadcasts to (32, 10)
print(f"x + y_wrong.unsqueeze(1): {result.shape}")

x shape: torch.Size([32, 10])
y shape: torch.Size([10])

Want to add y to each sample in x
x + y works! Result shape: torch.Size([32, 10])
Broadcasting automatically expands y to (32, 10)

If y has shape torch.Size([32]):
ERROR: Shape mismatch!

Solution: Make sure dimensions align
x + y_wrong.unsqueeze(1): torch.Size([32, 10])
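
You can test whether two shapes broadcast without allocating any tensors. The sketch below uses `torch.broadcast_shapes()` on the shapes from this example. The rule it applies: compare dimensions from right to left, and each pair must be equal or contain a 1.

```python
import torch

# Compatible: trailing dimensions are equal, or one of them is 1
shape_a = torch.broadcast_shapes((32, 10), (10,))
shape_b = torch.broadcast_shapes((32, 10), (32, 1))
print(shape_a)  # torch.Size([32, 10])
print(shape_b)  # torch.Size([32, 10])

# Incompatible: the trailing dimensions are 10 and 32
try:
    torch.broadcast_shapes((32, 10), (32,))
    compatible = True
except (RuntimeError, ValueError):
    compatible = False
print(f"(32, 10) with (32,) compatible: {compatible}")
```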


## Indexing and Slicing - Advanced

Just like NumPy arrays, you can select specific elements or parts of tensors. Let's explore basic and advanced indexing techniques:


In [62]:
data = torch.arange(24).reshape(4, 6)  # 4x6 matrix
data

tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17],
        [18, 19, 20, 21, 22, 23]])

### Basic Indexing


In [63]:
print(f"First row: {data[0]}")
print(f"Last row: {data[-1]}")
print(f"Element at position [1, 3]: {data[1, 3]}")
print(f"First element of each row: {data[:, 0]}")

First row: tensor([0, 1, 2, 3, 4, 5])
Last row: tensor([18, 19, 20, 21, 22, 23])
Element at position [1, 3]: 9
First element of each row: tensor([ 0,  6, 12, 18])


### Slicing


In [64]:
print("First 2 rows:")
print(data[:2])

First 2 rows:
tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11]])


In [65]:
print("Last 3 columns:")
print(data[:, -3:])

Last 3 columns:
tensor([[ 3,  4,  5],
        [ 9, 10, 11],
        [15, 16, 17],
        [21, 22, 23]])


In [66]:
print("Middle 2x3 submatrix:")
print(data[1:3, 2:5])

Middle 2x3 submatrix:
tensor([[ 8,  9, 10],
        [14, 15, 16]])


In [67]:
print("Every other row and column:")
print(data[::2, ::2])

Every other row and column:
tensor([[ 0,  2,  4],
        [12, 14, 16]])


### Boolean Indexing


In [68]:
mask = data > 10
print(f"Elements greater than 10: {data[mask]}")

Elements greater than 10: tensor([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23])


In [69]:
print(f"Number of elements > 10: {(data > 10).sum()}")

Number of elements > 10: 13
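
Boolean masks can also be used for assignment and can be combined. The following sketch rebuilds the same 4x6 `data` tensor so that it runs on its own; the threshold values are arbitrary.

```python
import torch

data = torch.arange(24).reshape(4, 6)

# Assign through a mask: cap every value above 10 at 10
clipped = data.clone()       # work on a copy so `data` stays unchanged
clipped[clipped > 10] = 10

# Combine conditions with & (and) or | (or); the parentheses are required
between = data[(data >= 5) & (data < 9)]

print(f"Max after clipping: {clipped.max()}")
print(f"Values in [5, 9): {between}")
```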


### Advanced Indexing Techniques


In [70]:
# Ellipsis (...) indexing
tensor_4d = torch.rand(2, 3, 4, 5)
tensor_4d.shape, tensor_4d

(torch.Size([2, 3, 4, 5]),
 tensor([[[[0.7908, 0.1824, 0.7160, 0.9256, 0.2808],
           [0.1740, 0.2295, 0.2597, 0.2145, 0.2552],
           [0.7239, 0.4379, 0.0972, 0.3034, 0.2554],
           [0.1236, 0.0605, 0.7103, 0.7975, 0.1538]],
 
          [[0.2465, 0.5959, 0.2569, 0.8514, 0.4644],
           [0.2938, 0.5732, 0.4248, 0.3513, 0.3371],
           [0.1525, 0.2905, 0.6000, 0.1701, 0.0221],
           [0.4289, 0.2826, 0.5970, 0.7360, 0.3822]],
 
          [[0.3544, 0.8231, 0.8708, 0.5377, 0.0314],
           [0.3406, 0.8524, 0.3711, 0.7938, 0.6055],
           [0.8699, 0.4564, 0.1173, 0.4095, 0.9201],
           [0.2632, 0.0147, 0.1753, 0.6964, 0.6919]]],
 
 
         [[[0.1353, 0.5035, 0.8191, 0.6948, 0.6619],
           [0.8941, 0.4973, 0.5123, 0.4088, 0.5292],
           [0.9059, 0.6570, 0.1745, 0.5709, 0.5902],
           [0.6171, 0.2286, 0.0893, 0.9714, 0.2501]],
 
          [[0.7886, 0.5231, 0.9759, 0.1455, 0.9904],
           [0.4901, 0.3624, 0.0062, 0.6042, 0.3608],
           ... (output truncated)

In [71]:
# Take index 0 along the last dimension, keeping all other dimensions
result1 = tensor_4d[..., 0]  # Same as tensor_4d[:, :, :, 0]
result1.shape

torch.Size([2, 3, 4])

In [72]:
# Select from first and last dimension
result2 = tensor_4d[0, ..., 0]  # Same as tensor_4d[0, :, :, 0]
result2.shape, result2

(torch.Size([3, 4]),
 tensor([[0.7908, 0.1740, 0.7239, 0.1236],
         [0.2465, 0.2938, 0.1525, 0.4289],
         [0.3544, 0.3406, 0.8699, 0.2632]]))

In [73]:
# torch.where() - conditional selection
x = torch.randn(3, 4)
x

tensor([[-0.1807, -0.1213, -0.0417, -0.1967],
        [-1.9986, -0.3001, -0.4529, -0.7093],
        [ 0.0046, -0.9967,  0.2855,  0.1409]])

In [74]:
# Replace negative values with 0
result = torch.where(x > 0, x, torch.tensor(0.0))
result

tensor([[0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000],
        [0.0046, 0.0000, 0.2855, 0.1409]])

In [75]:
# Get indices of elements satisfying condition
indices = torch.where(x > 0)
indices

(tensor([2, 2, 2]), tensor([0, 2, 3]))

In [76]:
# torch.index_select() - Select along a dimension
tensor = torch.rand(4, 5)
indices = torch.tensor([0, 2, 3])

selected_rows = torch.index_select(tensor, dim=0, index=indices)
selected_cols = torch.index_select(tensor, dim=1, index=indices)

print(f"Original: {tensor.shape}")
print(f"Select rows [0, 2, 3]: {selected_rows.shape}")
print(f"Select cols [0, 2, 3]: {selected_cols.shape}")

Original: torch.Size([4, 5])
Select rows [0, 2, 3]: torch.Size([3, 5])
Select cols [0, 2, 3]: torch.Size([4, 3])


In [77]:
# torch.masked_select() - Select using boolean mask
x = torch.randn(3, 4)
mask = x > 0
selected = torch.masked_select(x, mask)
print(f"Original shape: {x.shape}")
print(f"Selected (x > 0): {selected}")
print(f"Selected shape: {selected.shape} (flattened!)")

Original shape: torch.Size([3, 4])
Selected (x > 0): tensor([0.7016])
Selected shape: torch.Size([1]) (flattened!)


## Practice Exercises

Test your understanding with these exercises! Try to solve them before looking at the solutions.


### Exercise 1: Shape Manipulation

Given a tensor of shape (10, 20, 30):

1. Add a batch dimension at the beginning
2. Swap the last two dimensions
3. Flatten the last two dimensions
4. Remove all dimensions of size 1


In [109]:
# Your code here
x = torch.rand(10, 20, 30)
print(f"Starting shape: {x.shape}\n")

# Solution
print("Solutions:")
step1 = x.unsqueeze(0)
print(f"1. Add batch dimension: {step1.shape}")

step2 = x.transpose(-2, -1)  # or transpose(1, 2)
print(f"2. Swap last two dims: {step2.shape}")

step3 = x.flatten(start_dim=-2)  # or reshape(10, -1)
print(f"3. Flatten last two dims: {step3.shape}")

x_with_ones = x.unsqueeze(0).unsqueeze(2)
step4 = x_with_ones.squeeze()
print(f"4. After squeeze: {x_with_ones.shape} → {step4.shape}")

Starting shape: torch.Size([10, 20, 30])

Solutions:
1. Add batch dimension: torch.Size([1, 10, 20, 30])
2. Swap last two dims: torch.Size([10, 30, 20])
3. Flatten last two dims: torch.Size([10, 600])
4. After squeeze: torch.Size([1, 10, 1, 20, 30]) → torch.Size([10, 20, 30])


### Exercise 2: Broadcasting Challenge

Given:

- tensor A with shape (32, 1, 10)
- tensor B with shape (1, 5, 10)

Perform element-wise multiplication. What will the output shape be?


In [110]:
# Your code here
A = torch.rand(32, 1, 10)
B = torch.rand(1, 5, 10)

# Solution
print("Solution:")
result = A * B
print(f"A shape: {A.shape}")
print(f"B shape: {B.shape}")
print(f"Result shape: {result.shape}")
print("\nExplanation: Broadcasting aligns dimensions from right to left")
print("A: (32, 1, 10) → broadcasts to (32, 5, 10)")
print("B: ( 1, 5, 10) → broadcasts to (32, 5, 10)")

Solution:
A shape: torch.Size([32, 1, 10])
B shape: torch.Size([1, 5, 10])
Result shape: torch.Size([32, 5, 10])

Explanation: Broadcasting aligns dimensions from right to left
A: (32, 1, 10) → broadcasts to (32, 5, 10)
B: ( 1, 5, 10) → broadcasts to (32, 5, 10)


### Exercise 3: Matrix Multiplication Challenge

You have:

- Batch of images: (64, 3, 224, 224)
- Weight matrix: (50176, 1000)

Reshape images and multiply to get (64, 1000) output.
Note: 50176 = 3 _ 224 _ 224


In [111]:
# Your code here
images = torch.rand(64, 3, 224, 224)
weights = torch.rand(50176, 1000)

# Solution
print("Solution:")
print(f"Images shape: {images.shape}")
print(f"Weights shape: {weights.shape}")

# Step 1: Flatten each image
images_flat = images.flatten(start_dim=1)  # Keep batch dimension
print(f"\nAfter flattening: {images_flat.shape}")

# Step 2: Matrix multiplication
# output = images_flat @ weights
# print(f"After matmul: {output.shape}")

print("\nThis is exactly what happens in neural networks!")

Solution:
Images shape: torch.Size([64, 3, 224, 224])
Weights shape: torch.Size([50176, 1000])

After flattening: torch.Size([64, 150528])

This is exactly what happens in neural networks!
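
The flatten-then-multiply pattern can also be checked at a much smaller scale, which is useful when experimenting on a machine with limited memory. This is only a sketch: the sizes below (4 images of 8×8 pixels, 10 outputs) are made-up values, not part of the exercise.

```python
import torch

torch.manual_seed(0)

# Hypothetical small sizes so the example runs instantly.
batch, channels, height, width, num_classes = 4, 3, 8, 8, 10
images = torch.rand(batch, channels, height, width)

# The weight matrix needs one row per flattened input feature.
in_features = channels * height * width  # 3 * 8 * 8 = 192
weights = torch.rand(in_features, num_classes)

# flatten(start_dim=1) keeps the batch dimension and merges the rest.
images_flat = images.flatten(start_dim=1)  # (4, 192)
output = images_flat @ weights             # (4, 192) @ (192, 10) -> (4, 10)

print(images_flat.shape)  # torch.Size([4, 192])
print(output.shape)       # torch.Size([4, 10])

# reshape(batch, -1) is an equivalent way to flatten.
assert torch.equal(images_flat, images.reshape(batch, -1))
```

The key constraint is that the inner dimensions match: the number of flattened features per image must equal the number of rows in the weight matrix.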


### Exercise 4: Memory Efficiency

Create a function that checks if an operation created a view or copy.
Test it with: transpose, reshape, clone, and contiguous.


In [112]:
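def shares_memory(tensor1, tensor2):
    """Check if two tensors share the same underlying storage"""
    # untyped_storage() replaces the deprecated storage() in PyTorch 2.x
    return tensor1.untyped_storage().data_ptr() == tensor2.untyped_storage().data_ptr()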


# Solution
x = torch.arange(12).reshape(3, 4)
print(f"Original tensor shape: {x.shape}\n")

operations = [
    ("transpose", x.transpose(0, 1)),
    ("reshape", x.reshape(2, 6)),
    ("clone", x.clone()),
    ("contiguous", x.transpose(0, 1).contiguous()),
]

for name, result in operations:
    is_view = shares_memory(x, result)
    print(f"{name:15s}: {'VIEW (shares memory)' if is_view else 'COPY (new memory)'}")

Original tensor shape: torch.Size([3, 4])

transpose      : VIEW (shares memory)
reshape        : VIEW (shares memory)
clone          : COPY (new memory)
contiguous     : COPY (new memory)
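
Note that `reshape` returned a view above only because `x` is contiguous. When the requested shape cannot be expressed with strides over the existing memory, `reshape` silently falls back to a copy. A small sketch of that case, assuming PyTorch 2.x for `untyped_storage()`; `same_storage` is a hypothetical helper name:

```python
import torch


def same_storage(a, b):
    """Hypothetical helper: True if both tensors are backed by the same storage."""
    return a.untyped_storage().data_ptr() == b.untyped_storage().data_ptr()


x = torch.arange(12).reshape(3, 4)
xt = x.transpose(0, 1)        # non-contiguous view of x, shape (4, 3)

view_case = x.reshape(2, 6)   # x is contiguous, so reshape can return a view
copy_case = xt.reshape(2, 6)  # layout not expressible with strides, so reshape copies

print(same_storage(x, view_case))  # True
print(same_storage(x, copy_case))  # False

# Writing through a view changes the original; writing to a copy does not.
view_case[0, 0] = 100
copy_case[0, 0] = -1
print(x[0, 0].item())  # 100
```

If you need a guarantee that no copy happens, `view` is the stricter choice: it raises an error instead of copying.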


### Exercise 5: Real-World Scenario

You're building an image classifier that processes batches of images.

Task: Write a function that:

1. Takes a single image (3, 224, 224) with values in [0, 1]
2. Rescales it to the [-1, 1] range
3. Adds batch dimension
4. Returns it ready for model input (1, 3, 224, 224)


In [113]:
def prepare_image(image):
    """
    Prepare a single image for model input

    Args:
        image: Tensor of shape (3, 224, 224) with values in [0, 1]

    Returns:
        Tensor of shape (1, 3, 224, 224) with values in [-1, 1]
    """
    # Your code here
    # Step 1: Normalize to [-1, 1]
    normalized = image * 2 - 1

    # Step 2: Add batch dimension
    batched = normalized.unsqueeze(0)

    return batched


# Test the function
test_image = torch.rand(3, 224, 224)  # Random image in [0, 1]
print(f"Input shape: {test_image.shape}")
print(f"Input range: [{test_image.min():.3f}, {test_image.max():.3f}]")

prepared = prepare_image(test_image)
print(f"\nOutput shape: {prepared.shape}")
print(f"Output range: [{prepared.min():.3f}, {prepared.max():.3f}]")
print("\nPerfect! Ready for model input.")

Input shape: torch.Size([3, 224, 224])
Input range: [0.000, 1.000]

Output shape: torch.Size([1, 3, 224, 224])
Output range: [-1.000, 1.000]

Perfect! Ready for model input.
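
The same idea extends to several images at once. The sketch below is one possible variant, not part of the exercise: `prepare_batch` and `to_unit_range` are hypothetical helper names, and the shape check assumes every image is a 3-channel tensor of the same height and width.

```python
import torch


def prepare_batch(images):
    """Hypothetical helper: stack (3, H, W) images in [0, 1] into an (N, 3, H, W) batch in [-1, 1]."""
    for img in images:
        if img.dim() != 3 or img.shape[0] != 3:
            raise ValueError(f"Expected shape (3, H, W), got {tuple(img.shape)}")
    batch = torch.stack(images, dim=0)  # adds the batch dimension at position 0
    return batch * 2 - 1                # maps [0, 1] onto [-1, 1]


def to_unit_range(batch):
    """Inverse of the rescaling above: maps [-1, 1] back to [0, 1]."""
    return (batch + 1) / 2


imgs = [torch.rand(3, 224, 224) for _ in range(4)]
batch = prepare_batch(imgs)
print(batch.shape)  # torch.Size([4, 3, 224, 224])

restored = to_unit_range(batch)  # useful before plotting the images again
```

`torch.stack` creates the new batch dimension itself, so there is no need to call `unsqueeze(0)` on each image first.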


## Summary & What's Next

Congratulations! You've mastered the fundamentals of PyTorch!

### What You Learned:

1. **PyTorch Basics**:

   - What PyTorch is and why it's powerful
   - Device management (CPU vs GPU)
   - Setting up your environment

2. **Tensors Mastery**:

   - Creating tensors in multiple ways
   - Understanding tensor properties (shape, dtype, device)
   - Basic operations and mathematical functions
   - Advanced shape manipulation (reshape, view, squeeze, unsqueeze, transpose, permute)
   - Indexing and slicing

3. **Memory and Performance**:

   - Contiguous vs non-contiguous tensors
   - When copying happens
   - clone() vs detach() vs copy\_()
   - Memory-efficient operations

4. **Debugging Skills**:

   - Common shape errors and solutions
   - Tensor inspection techniques
   - Comprehensive troubleshooting guide

5. **Random Numbers**:
   - Different distributions (uniform, normal, etc.)
   - Setting seeds for reproducibility
   - Why randomness matters in ML
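
As a compact recap of two items from the list above, here is a small sketch contrasting `clone()`, `detach()`, and `copy_()`, and showing what seeding buys you. The variable names are illustrative only.

```python
import torch

# Seeding makes "random" tensors repeatable.
torch.manual_seed(42)
a = torch.rand(3)
torch.manual_seed(42)
b = torch.rand(3)
print(torch.equal(a, b))  # True

w = torch.ones(3, requires_grad=True)

cloned = w.clone()        # new memory, still connected to the autograd graph
detached = w.detach()     # same memory, cut off from the autograd graph
target = torch.zeros(3)
target.copy_(w.detach())  # in-place copy of values into an existing tensor

print(cloned.requires_grad)                 # True
print(detached.requires_grad)               # False
print(detached.data_ptr() == w.data_ptr())  # True
print(cloned.data_ptr() == w.data_ptr())    # False
```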

### What's Next?

Now that you have a solid foundation in PyTorch tensors, you're ready to:

1. **Build your first neural network** - See the next notebook for a comprehensive introduction to neural networks
2. **Learn about automatic differentiation** with PyTorch's autograd
3. **Explore advanced tensor operations** and linear algebra
4. **Dive into deep learning architectures**

### Key Takeaways:

- **Tensors are the foundation** of everything in PyTorch
- **Understanding shapes** is crucial - when in doubt, print the shape!
- **Device management** is important for performance
- **Memory efficiency** matters - use views when possible
- **Debugging systematically** saves time - inspect tensors thoroughly

**Ready to build neural networks? Continue to the next notebook!**


## Additional Resources for Going Deeper

Want to master PyTorch fundamentals? Here are curated resources to help you dive deeper into tensors, operations, and deep learning with PyTorch.

### 📺 Video Tutorials & Courses

**1. Zero to Mastery - Learn PyTorch for Deep Learning**

- 25 hours of beginner-friendly material available on YouTube
- All course materials available for free as an online book
- Code examples in Google Colab notebooks
- [Website](https://www.learnpytorch.io/) | [GitHub Repository](https://github.com/mrdbourke/pytorch-deep-learning)

**2. Official PyTorch YouTube Series**

- Self-contained examples introducing fundamental PyTorch concepts
- Follows the official PyTorch Beginner Series
- [Tutorial Documentation](https://docs.pytorch.org/tutorials/beginner/introyt/)

### 📚 Articles & Written Guides

**1. A Beginner's Guide to Tensor Operations in PyTorch (Medium)**

- Complete walkthrough of tensor initialization, operations, indexing, and reshaping
- Covers broadcasting and GPU management
- [Read Article](https://medium.com/@piyushkashyap045/a-beginners-guide-to-tensor-operations-in-pytorch-learn-the-basics-and-beyond-c32d53b28292)

**2. How to Learn PyTorch From Scratch in 2026 (DataCamp)**

- Step-by-step tutorials with practical tips
- 8-week learning plan to master deep learning
- [Read Guide](https://www.datacamp.com/blog/how-to-learn-pytorch)

**3. PyTorch in One Hour (Sebastian Raschka, PhD)**

- Fast-paced guide from tensors to training neural networks on multiple GPUs
- Perfect for those who want a quick but comprehensive overview
- [Read Article](https://sebastianraschka.com/teaching/pytorch-1h/)

**4. Tensor Operations in PyTorch (GeeksforGeeks)**

- Comprehensive coverage of tensor operations
- Great for quick reference
- [Read Article](https://www.geeksforgeeks.org/tensor-operations-in-pytorch/)

**5. Manipulating Tensors in PyTorch (MachineLearningMastery.com)**

- Practical guide to tensor manipulation
- [Read Article](https://machinelearningmastery.com/manipulating-tensors-in-pytorch/)

### 🔧 Official Documentation

**1. PyTorch Tensors Tutorial**

- Over 1,200 tensor operations comprehensively described
- Includes arithmetic, linear algebra, matrix manipulation, sampling, and more
- [Official Documentation](https://docs.pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html)

**2. Introduction to PyTorch Tensors (Deep Dive)**

- Deeper exploration of PyTorch tensors
- [Official Tutorial](https://docs.pytorch.org/tutorials/beginner/introyt/tensors_deeper_tutorial.html)

**3. Broadcasting Semantics**

- Essential reading for understanding tensor shape compatibility
- Explains broadcasting rules and behavior in detail
- [Official Documentation](https://docs.pytorch.org/docs/stable/notes/broadcasting.html)

**4. Learn the Basics - PyTorch Tutorial Series**

- Official quickstart guide covering tensors, datasets, models, and more
- [Start Learning](https://docs.pytorch.org/tutorials/beginner/basics/intro.html)

### 🎓 Interactive Courses (Free & Paid)

**1. DataCamp - Introduction to Deep Learning with PyTorch**

- Build neural networks and tackle classification/regression problems
- Interactive coding environment
- [Start Course](https://www.datacamp.com/courses/introduction-to-deep-learning-with-pytorch)

**2. Coursera - Deep Neural Networks with PyTorch**

- Introduction to Neural Networks and PyTorch
- Free to audit
- [Course Link](https://www.coursera.org/learn/deep-neural-networks-with-pytorch)

**3. Scaler - PyTorch for Deep Learning Certification (FREE)**

- Free certification course designed for beginners
- [Enroll Now](https://www.scaler.com/topics/course/pytorch-for-deep-learning-free-course/)

### 💡 Practice Resources

**1. CodingNomads - Broadcasting with PyTorch**

- Interactive tutorials on broadcasting
- [Learn Broadcasting](https://codingnomads.com/broadcasting-with-pytorch)

**2. The PyTorch Book - Broadcasting Chapter**

- Detailed examples and explanations
- [Read Chapter](https://aayushmnit.com/pytorch_book/nbs/3_broadcasting.html)

### 🚀 Next Steps

After mastering the fundamentals:

1. Move to the next notebook in this course for neural networks
2. Build small projects to practice (MNIST classifier, image recognition, etc.)
3. Join the PyTorch community forums
4. Contribute to open-source PyTorch projects
5. Follow PyTorch on social media for latest updates

**Pro Tip**: The best way to learn is by doing! Try implementing the concepts you've learned in this notebook using the resources above, then build your own projects.

Happy Learning! 🔥
