# Introduction to PyTorch: From Tensors to Training Neural Networks

**Based on Sebastian Raschka's PyTorch Tutorial**

This notebook will introduce you to the essential concepts of PyTorch in a hands-on, interactive way. By the end of this tutorial, you'll understand:

1. What PyTorch is and why it's popular
2. Tensors - the fundamental data structure
3. Automatic differentiation (autograd)
4. Building neural networks
5. Training models with a typical training loop
6. Working with GPUs

---

## Part 1: What is PyTorch?

PyTorch is an open-source Python deep learning library that has been one of the most widely used frameworks for research and model development since around 2019. It aims to balance:
- **Ease of use**: Intuitive, Python-native API
- **Flexibility**: Full control for customization
- **Performance**: GPU acceleration for fast training

### The Three Core Components of PyTorch:

1. **Tensor Library**: Like NumPy but with GPU support
2. **Automatic Differentiation (Autograd)**: Computes gradients automatically for backpropagation
3. **Deep Learning Library**: Pre-built modules, loss functions, and optimizers

Let's start by installing and checking PyTorch.

In [None]:
# Install PyTorch (uncomment if needed)
# !pip install torch

# Import PyTorch
import torch
import numpy as np

# Check PyTorch version
print(f"PyTorch version: {torch.__version__}")

# Check if GPU is available
print(f"CUDA (GPU) available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Number of GPUs: {torch.cuda.device_count()}")
    print(f"Current GPU: {torch.cuda.get_device_name(0)}")

---

## Part 2: Understanding Tensors

**What are tensors?**

Tensors are the fundamental data structure in PyTorch - they're multi-dimensional arrays that can store data:

- **Scalar (0D tensor)**: Just a number (e.g., 5)
- **Vector (1D tensor)**: Array of numbers (e.g., [1, 2, 3])
- **Matrix (2D tensor)**: Table of numbers (e.g., [[1, 2], [3, 4]])
- **3D+ tensors**: Higher-dimensional arrays

### Creating Tensors

In [None]:
# Create a scalar (0D tensor)
tensor0d = torch.tensor(1)
print(f"Scalar: {tensor0d}")
print(f"Shape: {tensor0d.shape}\n")

# Create a vector (1D tensor)
tensor1d = torch.tensor([1, 2, 3])
print(f"Vector: {tensor1d}")
print(f"Shape: {tensor1d.shape}\n")

# Create a matrix (2D tensor)
tensor2d = torch.tensor([[1, 2, 3],
                         [4, 5, 6]])
print(f"Matrix:\n{tensor2d}")
print(f"Shape: {tensor2d.shape}\n")

# Create a 3D tensor
tensor3d = torch.tensor([[[1, 2], [3, 4]], 
                         [[5, 6], [7, 8]]])
print(f"3D Tensor:\n{tensor3d}")
print(f"Shape: {tensor3d.shape}")

### Exercise 1: Create Your Own Tensors

**TODO**: Create the following tensors:
1. A scalar with value 42
2. A 1D tensor with values [10, 20, 30, 40, 50]
3. A 2×2 identity matrix [[1, 0], [0, 1]]

In [None]:
# Your code here
my_scalar = None  # TODO
my_vector = None  # TODO
my_matrix = None  # TODO

print(f"My scalar: {my_scalar}")
print(f"My vector: {my_vector}")
print(f"My matrix:\n{my_matrix}")

### Tensor Data Types

PyTorch tensors have data types (dtypes) that specify how the data is stored:
- **Integers**: `torch.int64` (default for Python integers)
- **Floats**: `torch.float32` (default for Python floats, most common in deep learning)

**Why float32?** It balances precision and computational efficiency, and GPUs are optimized for it.
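
To make the storage trade-off concrete, here is a minimal sketch that compares bytes per element across dtypes and shows which dtype results when an integer tensor is combined with a float tensor. The variable names are illustrative and not part of the tutorial.

```python
import torch

ints = torch.tensor([1, 2, 3])          # inferred as int64
floats = torch.tensor([1.0, 2.0, 3.0])  # inferred as float32
doubles = floats.to(torch.float64)

# element_size() reports the number of bytes used per element
print(ints.element_size())     # 8
print(floats.element_size())   # 4
print(doubles.element_size())  # 8

# Combining an integer tensor with a float tensor promotes to the float dtype
mixed = ints + floats
print(mixed.dtype)             # torch.float32
```

A float32 tensor takes half the memory of a float64 tensor with the same shape, which is one reason it is the usual choice for model weights and activations.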

In [None]:
# Check data types
int_tensor = torch.tensor([1, 2, 3])
print(f"Integer tensor dtype: {int_tensor.dtype}")

float_tensor = torch.tensor([1.0, 2.0, 3.0])
print(f"Float tensor dtype: {float_tensor.dtype}")

# Convert data types
converted = int_tensor.to(torch.float32)
print(f"Converted dtype: {converted.dtype}")

### Common Tensor Operations

PyTorch provides a NumPy-like API for tensor operations.

In [None]:
# Create a 2D tensor
tensor = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])

print(f"Original tensor:\n{tensor}")
print(f"Shape: {tensor.shape}\n")

# Reshape/View - change dimensions
reshaped = tensor.view(3, 2)
print(f"Reshaped (3×2):\n{reshaped}\n")

# Transpose - flip across diagonal
transposed = tensor.T
print(f"Transposed:\n{transposed}\n")

# Matrix multiplication
result = tensor @ tensor.T
print(f"Matrix multiplication (tensor @ tensor.T):\n{result}")
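
One detail worth knowing about `.view()`: it never copies data, so it only works when the requested shape is compatible with the tensor's memory layout. The sketch below, which reuses the same 2×3 tensor, shows a transposed tensor rejecting `.view()` while `.reshape()` succeeds by copying when needed. The `view_failed` flag is just an illustrative helper.

```python
import torch

tensor = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])
transposed = tensor.T  # shares storage with `tensor`, but is not contiguous

print(transposed.is_contiguous())  # False

try:
    transposed.view(6)  # view requires a compatible memory layout
    view_failed = False
except RuntimeError:
    view_failed = True
print(f"view raised an error: {view_failed}")

# reshape returns a view when it can and falls back to a copy otherwise
flat = transposed.reshape(6)
print(flat)
```

When in doubt, `.reshape()` is the more forgiving choice; `.view()` is useful when you want a guarantee that no copy is made.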

### Exercise 2: Tensor Operations

**TODO**: Given the tensor below:
1. Reshape it to shape (4, 3)
2. Compute the transpose
3. Find the sum of all elements using `.sum()`

In [None]:
practice_tensor = torch.tensor([[1, 2, 3, 4],
                                [5, 6, 7, 8],
                                [9, 10, 11, 12]])

# Your code here
reshaped = None  # TODO
transposed = None  # TODO
total_sum = None  # TODO

print(f"Reshaped:\n{reshaped}\n")
print(f"Transposed:\n{transposed}\n")
print(f"Sum of all elements: {total_sum}")

---

## Part 3: Computation Graphs and Automatic Differentiation

### What is a Computation Graph?

A computation graph tracks the sequence of operations needed to compute an output. PyTorch builds this graph automatically to compute gradients during backpropagation.

**Example**: Simple logistic regression forward pass
```
z = x1 * w1 + b     (net input)
a = sigmoid(z)      (activation)
loss = BCE(a, y)    (binary cross-entropy loss)
```
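
PyTorch records this graph on the tensors themselves. As a rough sketch, the snippet below builds the three steps above with tracking switched on (via `requires_grad=True`) and inspects each result's `grad_fn`, the node that remembers which operation produced it. The exact `grad_fn` class names can vary between PyTorch versions, so none are listed here.

```python
import torch
import torch.nn.functional as F

x1 = torch.tensor([1.1])
y = torch.tensor([1.0])
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

z = x1 * w1 + b
a = torch.sigmoid(z)
loss = F.binary_cross_entropy(a, y)

# Leaf tensors are created directly by the user and have no grad_fn
print(w1.is_leaf, w1.grad_fn)

# Intermediate results remember the operation that produced them
for name, t in [("z", z), ("a", a), ("loss", loss)]:
    print(name, t.is_leaf, type(t.grad_fn).__name__)
```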

### Automatic Differentiation (Autograd)

PyTorch's autograd engine automatically computes gradients - no manual calculus needed!

In [None]:
import torch.nn.functional as F

# Set up a simple computation
y = torch.tensor([1.0])  # true label
x1 = torch.tensor([1.1])  # input feature

# Parameters - requires_grad=True enables gradient tracking
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

# Forward pass
z = x1 * w1 + b
a = torch.sigmoid(z)
loss = F.binary_cross_entropy(a, y)

print(f"Loss: {loss.item():.4f}")

# Backward pass - compute gradients automatically!
loss.backward()

print(f"Gradient w.r.t. w1: {w1.grad}")
print(f"Gradient w.r.t. b: {b.grad}")

**Key Points**:
- `requires_grad=True`: Tells PyTorch to track operations for gradient computation
- `.backward()`: Computes all gradients automatically
- `.grad`: Stores the computed gradient for each tensor

This is the magic that makes training neural networks easy!
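
To see that autograd matches the calculus, we can compare against a hand-derived result. For a sigmoid followed by binary cross-entropy, the chain rule simplifies to dL/dz = a - y, so dL/dw1 = (a - y) * x1 and dL/db = a - y. The sketch below repeats the computation with the same values and checks both gradients; the `manual_grad_*` names are illustrative.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([1.0])
x1 = torch.tensor([1.1])
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

a = torch.sigmoid(x1 * w1 + b)
loss = F.binary_cross_entropy(a, y)
loss.backward()

# Hand-derived gradients: dL/dz = a - y
manual_grad_b = (a - y).detach()
manual_grad_w1 = manual_grad_b * x1

print(torch.allclose(w1.grad, manual_grad_w1, atol=1e-6))
print(torch.allclose(b.grad, manual_grad_b, atol=1e-6))
```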

### Exercise 3: Manual Gradient Computation

**TODO**: Complete the forward pass and compute gradients
1. Compute `z = x * w + b`
2. Compute the loss: `loss = (z - target)**2` (squared error for this single sample)
3. Call `.backward()` to compute gradients
4. Print the gradients

In [None]:
# Setup
x = torch.tensor([2.0])
target = torch.tensor([5.0])
w = torch.tensor([1.0], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

# Your code here
z = None  # TODO: compute z = x * w + b
loss = None  # TODO: compute loss = (z - target)**2

# TODO: call backward to compute gradients

print(f"Loss: {loss.item():.4f}")
print(f"Gradient w.r.t. w: {w.grad}")
print(f"Gradient w.r.t. b: {b.grad}")

---

## Part 4: Building Neural Networks

In PyTorch, we build neural networks by subclassing `torch.nn.Module`. This gives us:
- Automatic parameter tracking
- Easy layer composition
- Built-in training/evaluation modes

### Anatomy of a PyTorch Model

```python
class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Define layers here
    
    def forward(self, x):
        # Define forward pass here
        return output
```

In [None]:
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super().__init__()
        self.layers = nn.Sequential(
            # First hidden layer
            nn.Linear(num_inputs, 30),
            nn.ReLU(),
            
            # Second hidden layer
            nn.Linear(30, 20),
            nn.ReLU(),
            
            # Output layer
            nn.Linear(20, num_outputs)
        )
    
    def forward(self, x):
        return self.layers(x)

# Create a model
model = NeuralNetwork(num_inputs=50, num_outputs=3)
print(model)

# Count parameters
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nTotal trainable parameters: {num_params}")
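
The parameter count can be checked by hand: each `nn.Linear(n_in, n_out)` holds `n_in * n_out` weights plus `n_out` biases. The following sketch rebuilds the same layer stack as a plain `nn.Sequential` (a stand-in for the `NeuralNetwork` class, so the block runs on its own) and compares the formula against the counted value.

```python
import torch.nn as nn

layer_sizes = [50, 30, 20, 3]  # inputs, hidden 1, hidden 2, outputs

# Weights plus biases for each pair of consecutive layer sizes
expected = sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

layers = nn.Sequential(
    nn.Linear(50, 30), nn.ReLU(),
    nn.Linear(30, 20), nn.ReLU(),
    nn.Linear(20, 3),
)
counted = sum(p.numel() for p in layers.parameters() if p.requires_grad)

print(expected, counted)  # 2213 2213
```

The breakdown is 1530 + 620 + 63 = 2213; the ReLU layers contribute no parameters.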

### Making Predictions

To use the model, we pass data through it. The model returns **logits** (raw scores), which we can convert to probabilities.

In [None]:
# Set random seed for reproducibility
torch.manual_seed(123)

# Create random input (1 sample, 50 features)
X = torch.rand((1, 50))

# Forward pass - get logits
logits = model(X)
print(f"Logits: {logits}\n")

# Convert to probabilities with softmax
probabilities = torch.softmax(logits, dim=1)
print(f"Probabilities: {probabilities}")
print(f"Sum of probabilities: {probabilities.sum():.4f}")

# Get predicted class
predicted_class = torch.argmax(probabilities, dim=1)
print(f"Predicted class: {predicted_class.item()}")
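
Two properties of logits are worth verifying. Softmax preserves the ordering of the scores, so the predicted class is the same whether you take the argmax of the logits or of the probabilities. And `F.cross_entropy` expects raw logits, since it applies log-softmax internally. The sketch below checks both on made-up scores for a single sample.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5]])  # made-up scores for one sample
target = torch.tensor([0])                 # assumed true class

probabilities = torch.softmax(logits, dim=1)

# argmax agrees on logits and probabilities
same_prediction = torch.equal(torch.argmax(logits, dim=1),
                              torch.argmax(probabilities, dim=1))
print(same_prediction)

# Cross-entropy on logits equals -log(probability of the true class)
loss_from_logits = F.cross_entropy(logits, target)
loss_by_hand = -torch.log(probabilities[0, target[0]])
print(torch.allclose(loss_from_logits, loss_by_hand, atol=1e-6))
```

In practice this means the softmax step is only needed when you want to report probabilities; passing softmax outputs to `F.cross_entropy` would apply the normalization twice.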

### Exercise 4: Build Your Own Network

**TODO**: Create a neural network with:
- Input size: 10
- Hidden layer 1: 64 neurons with ReLU activation
- Hidden layer 2: 32 neurons with ReLU activation  
- Output size: 5

In [None]:
class MyNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # TODO: Define your layers using nn.Sequential
        self.layers = nn.Sequential(
            # Your layers here
        )
    
    def forward(self, x):
        return self.layers(x)

# Create and test your model
my_model = MyNetwork()
print(my_model)

# Test with random input
test_input = torch.rand((1, 10))
output = my_model(test_input)
print(f"\nOutput shape: {output.shape}")

---

## Part 5: Data Loading

PyTorch provides `Dataset` and `DataLoader` classes for efficient data loading:

1. **Dataset**: Defines how to access individual samples
2. **DataLoader**: Batches data, shuffles, and loads in parallel

### Creating a Custom Dataset

In [None]:
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    def __init__(self, X, y):
        self.features = X
        self.labels = y
    
    def __getitem__(self, index):
        # Return one sample
        return self.features[index], self.labels[index]
    
    def __len__(self):
        # Return dataset size
        return len(self.labels)

# Create toy data
X_train = torch.tensor([[-1.2, 3.1],
                        [-0.9, 2.9],
                        [-0.5, 2.6],
                        [2.3, -1.1],
                        [2.7, -1.5]])
y_train = torch.tensor([0, 0, 0, 1, 1])

X_test = torch.tensor([[-0.8, 2.8],
                       [2.6, -1.6]])
y_test = torch.tensor([0, 1])

# Create datasets
train_dataset = ToyDataset(X_train, y_train)
test_dataset = ToyDataset(X_test, y_test)

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")

### Creating DataLoaders

In [None]:
# Set seed for reproducible shuffling
torch.manual_seed(123)

# Create data loaders
train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=2,
    shuffle=True,
    drop_last=True  # Drop last incomplete batch
)

test_loader = DataLoader(
    dataset=test_dataset,
    batch_size=2,
    shuffle=False
)

# Iterate over batches
print("Training batches:")
for batch_idx, (features, labels) in enumerate(train_loader):
    print(f"Batch {batch_idx + 1}: Features shape={features.shape}, Labels={labels}")
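
The effect of `drop_last` is easiest to see by counting batch sizes. This sketch uses the built-in `TensorDataset` as a stand-in for `ToyDataset` (so the block runs on its own) with five placeholder samples and a batch size of two.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.arange(10, dtype=torch.float32).reshape(5, 2)  # 5 samples
labels = torch.tensor([0, 0, 0, 1, 1])
dataset = TensorDataset(features, labels)

keep_last = DataLoader(dataset, batch_size=2, shuffle=False, drop_last=False)
drop_last = DataLoader(dataset, batch_size=2, shuffle=False, drop_last=True)

print([len(batch_labels) for _, batch_labels in keep_last])  # [2, 2, 1]
print([len(batch_labels) for _, batch_labels in drop_last])  # [2, 2]
```

Dropping the last batch keeps every training batch the same size, at the cost of skipping one sample per epoch here. With shuffling enabled, the skipped sample changes from epoch to epoch.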

---

## Part 6: The Training Loop

Now we combine everything: model, data, loss function, and optimizer.

### Key Components:
1. **Model**: The neural network
2. **Loss Function**: Measures prediction error (e.g., cross-entropy)
3. **Optimizer**: Updates weights to minimize loss (e.g., SGD, Adam)
4. **Training Loop**: Iterate over data, compute loss, update weights
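
One detail of the loop deserves a closer look: gradients accumulate across `.backward()` calls unless they are reset. The minimal sketch below uses a single parameter and a made-up function `3 * w` to show the accumulation and the effect of `optimizer.zero_grad()`.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(3.0 * w).sum().backward()
first = w.grad.clone()        # d(3w)/dw = 3

(3.0 * w).sum().backward()    # no zero_grad() in between
accumulated = w.grad.clone()  # 3 + 3 = 6

optimizer.zero_grad()         # clears the stored gradient
(3.0 * w).sum().backward()
after_reset = w.grad.clone()  # back to 3

print(first, accumulated, after_reset)
```

This is why every iteration of a typical training loop calls `optimizer.zero_grad()` before `loss.backward()`.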

In [None]:
import torch.nn.functional as F

# Set seed for reproducibility
torch.manual_seed(123)

# Create model (2 inputs, 2 outputs for binary classification)
model = NeuralNetwork(num_inputs=2, num_outputs=2)

# Define optimizer (Stochastic Gradient Descent)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

# Training loop
num_epochs = 3

for epoch in range(num_epochs):
    model.train()  # Set model to training mode
    
    for batch_idx, (features, labels) in enumerate(train_loader):
        # Forward pass
        logits = model(features)
        loss = F.cross_entropy(logits, labels)
        
        # Backward pass
        optimizer.zero_grad()  # Reset gradients
        loss.backward()  # Compute gradients
        optimizer.step()  # Update weights
        
        # Logging
        print(f"Epoch {epoch+1}/{num_epochs} | "
              f"Batch {batch_idx+1}/{len(train_loader)} | "
              f"Loss: {loss.item():.4f}")

print("\nTraining complete!")

### Model Evaluation

After training, we evaluate the model's accuracy.

In [None]:
def compute_accuracy(model, dataloader):
    model.eval()  # Set to evaluation mode
    correct = 0
    total = 0
    
    with torch.no_grad():  # Disable gradient computation
        for features, labels in dataloader:
            logits = model(features)
            predictions = torch.argmax(logits, dim=1)
            correct += (predictions == labels).sum().item()
            total += len(labels)
    
    return correct / total

# Evaluate on training and test sets
# (train_loader uses drop_last=True, so one of the five training samples is skipped)
train_acc = compute_accuracy(model, train_loader)
test_acc = compute_accuracy(model, test_loader)

print(f"Training Accuracy: {train_acc * 100:.2f}%")
print(f"Test Accuracy: {test_acc * 100:.2f}%")
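
`model.eval()` and `torch.no_grad()` do different jobs, and evaluation code typically uses both. As a small sketch with a single `nn.Linear` layer (a stand-in for the full model): `no_grad()` stops the graph from being recorded, while `eval()` and `train()` only flip the `training` flag that layers such as dropout and batch normalization consult.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 2)
x = torch.rand(1, 2)

tracked = layer(x)            # part of a graph, can be backpropagated
with torch.no_grad():
    untracked = layer(x)      # same values, no graph recorded

print(tracked.requires_grad, untracked.requires_grad)  # True False

# eval()/train() change the mode flag, not gradient tracking
layer.eval()
print(layer.training)  # False
layer.train()
print(layer.training)  # True
```

The network in this tutorial has no dropout or batch normalization, so `eval()` does not change its outputs, but calling it is a good habit for when such layers are added.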

### Exercise 5: Complete Training Loop

**TODO**: Fill in the missing parts of the training loop below

In [None]:
# Create a fresh model
torch.manual_seed(42)
student_model = NeuralNetwork(num_inputs=2, num_outputs=2)

# TODO: Create an Adam optimizer with learning rate 0.01
optimizer = None  # TODO

num_epochs = 5
for epoch in range(num_epochs):
    student_model.train()
    
    for features, labels in train_loader:
        # TODO: Forward pass - compute logits
        logits = None  # TODO
        
        # TODO: Compute cross-entropy loss
        loss = None  # TODO
        
        # TODO: Zero gradients
        # TODO
        
        # TODO: Backward pass
        # TODO
        
        # TODO: Update weights
        # TODO
    
    # Print the last batch's loss once per epoch
    print(f"Epoch {epoch+1}/{num_epochs} | Loss: {loss.item():.4f}")

# Evaluate
final_acc = compute_accuracy(student_model, test_loader)
print(f"\nFinal Test Accuracy: {final_acc * 100:.2f}%")

---

## Part 7: Saving and Loading Models

After training, we want to save our model for later use.

In [None]:
# Save model
torch.save(model.state_dict(), "my_model.pth")
print("Model saved!")

# Load model
loaded_model = NeuralNetwork(num_inputs=2, num_outputs=2)
loaded_model.load_state_dict(torch.load("my_model.pth", weights_only=True))
loaded_model.eval()
print("Model loaded!")

# Verify it works
test_acc = compute_accuracy(loaded_model, test_loader)
print(f"Loaded model test accuracy: {test_acc * 100:.2f}%")
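
A `state_dict` is an ordered mapping from parameter names to tensors, which is why the architecture has to be recreated before loading. The sketch below illustrates the round trip with a single `nn.Linear` layer standing in for the tutorial's network; the temporary directory and the file name `linear.pth` are assumptions made so that the example cleans up after itself.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(2, 2)  # small stand-in for the tutorial's network
print(list(model.state_dict().keys()))  # ['weight', 'bias']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "linear.pth")  # hypothetical file name
    torch.save(model.state_dict(), path)

    restored = nn.Linear(2, 2)  # must match the saved architecture
    restored.load_state_dict(torch.load(path, weights_only=True))

# Every restored tensor should match the original exactly
same = all(torch.equal(p, q)
           for p, q in zip(model.state_dict().values(),
                           restored.state_dict().values()))
print(same)
```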

---

## Part 8: GPU Training (Optional)

If you have a GPU available, you can significantly speed up training, especially for models and batches larger than the toy examples used here.

### Key Concepts:
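1. Move model to GPU: `model.to(device)` (modules are moved in place)
2. Move data to GPU: `data = data.to(device)` (for tensors, `.to()` returns a new tensor, so keep the result)
3. All tensors involved in an operation must be on the same device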
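
The difference between moving a module and moving a tensor is a common source of device-mismatch errors. The sketch below, which falls back to the CPU when no GPU is present, shows that `Module.to()` returns the same module object, whereas `Tensor.to()` returns a tensor that has to be assigned. The layer and data shapes are arbitrary.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(2, 2)
returned = model.to(device)   # modules are moved in place
print(returned is model)      # True

data = torch.rand(4, 2)
data = data.to(device)        # tensors: keep the returned tensor

param_device = next(model.parameters()).device
print(param_device.type == data.device.type)

output = model(data)          # works because both share a device
print(output.shape)
```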

In [None]:
# Check GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Create model and move to GPU
torch.manual_seed(123)
gpu_model = NeuralNetwork(num_inputs=2, num_outputs=2)
gpu_model.to(device)

# Optimizer
optimizer = torch.optim.SGD(gpu_model.parameters(), lr=0.5)

# Training loop with GPU
num_epochs = 3
for epoch in range(num_epochs):
    gpu_model.train()
    
    for batch_idx, (features, labels) in enumerate(train_loader):
        # Move data to GPU
        features, labels = features.to(device), labels.to(device)
        
        # Forward pass
        logits = gpu_model(features)
        loss = F.cross_entropy(logits, labels)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        print(f"Epoch {epoch+1}/{num_epochs} | "
              f"Batch {batch_idx+1}/{len(train_loader)} | "
              f"Loss: {loss.item():.4f}")

### Exercise 6: GPU Training

**TODO**: Modify the accuracy computation function to work with GPU

In [None]:
def compute_accuracy_gpu(model, dataloader, device):
    model.eval()
    correct = 0
    total = 0
    
    with torch.no_grad():
        for features, labels in dataloader:
            # TODO: Move features and labels to device
            features, labels = None, None  # TODO
            
            logits = model(features)
            predictions = torch.argmax(logits, dim=1)
            correct += (predictions == labels).sum().item()
            total += len(labels)
    
    return correct / total

# Test your function
test_acc = compute_accuracy_gpu(gpu_model, test_loader, device)
print(f"GPU Model Test Accuracy: {test_acc * 100:.2f}%")

---

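## Part 9: Putting It All Together - Complete Example

Let's create a complete example with a larger synthetic dataset. Because the labels are drawn at random, independently of the features, there is nothing for the model to generalize from: expect test accuracy to stay near chance (about 33% for three classes), even if training accuracy climbs through memorization. The goal here is to exercise the full pipeline, not to reach a high score.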

In [None]:
# Create synthetic dataset
torch.manual_seed(42)

# Generate 1000 samples with 20 features
n_samples = 1000
n_features = 20
n_classes = 3

# Random features and labels
X = torch.randn(n_samples, n_features)
y = torch.randint(0, n_classes, (n_samples,))

# Split into train/test (80/20)
n_train = int(0.8 * n_samples)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"Features: {n_features}")
print(f"Classes: {n_classes}")
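
A useful reference point for the accuracies reported during training is a majority-class baseline: always predict the most frequent training label. The sketch below regenerates labels with the same `torch.randint` call as above. Since it skips the `torch.randn` draw, the exact label values differ from the tutorial's, so treat the printed number as illustrative.

```python
import torch

torch.manual_seed(42)
n_classes = 3
y = torch.randint(0, n_classes, (1000,))
y_train, y_test = y[:800], y[800:]

# Most frequent class in the training labels
class_counts = torch.bincount(y_train, minlength=n_classes)
majority_class = torch.argmax(class_counts)

# Accuracy of always predicting that class on the test labels
baseline_acc = (y_test == majority_class).float().mean().item()
print(f"Majority-class baseline: {baseline_acc * 100:.2f}%")
```

A trained model that does not clearly beat this baseline on the test set has not learned anything that generalizes.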

In [None]:
# Create datasets and loaders
train_dataset = ToyDataset(X_train, y_train)
test_dataset = ToyDataset(X_test, y_test)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# Create model
torch.manual_seed(42)
model = NeuralNetwork(num_inputs=n_features, num_outputs=n_classes)
model.to(device)

# Define optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    
    for features, labels in train_loader:
        features, labels = features.to(device), labels.to(device)
        
        logits = model(features)
        loss = F.cross_entropy(logits, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
    
    avg_loss = total_loss / len(train_loader)
    
    # Evaluate every epoch
    train_acc = compute_accuracy_gpu(model, train_loader, device)
    test_acc = compute_accuracy_gpu(model, test_loader, device)
    
    print(f"Epoch {epoch+1}/{num_epochs} | "
          f"Loss: {avg_loss:.4f} | "
          f"Train Acc: {train_acc*100:.2f}% | "
          f"Test Acc: {test_acc*100:.2f}%")

---

## Summary

Congratulations! You've learned the essential PyTorch concepts:

### ✅ Key Takeaways:

1. **Tensors**: Multi-dimensional arrays (similar to NumPy) with GPU support
2. **Autograd**: Automatic gradient computation for backpropagation
3. **nn.Module**: Base class for building neural networks
4. **DataLoader**: Efficient batching and data loading
5. **Training Loop**: 
   - Forward pass → Compute loss
   - Backward pass → Compute gradients
   - Optimizer step → Update weights
6. **GPU Support**: Simple `.to(device)` for acceleration

### 🎯 Next Steps:

- Experiment with different architectures
- Try different optimizers (Adam, AdamW, etc.)
- Explore real datasets (MNIST, CIFAR-10)
- Learn about CNNs, RNNs, Transformers
- Build your own projects!

### 📚 Further Resources:

- **Official PyTorch Tutorials**: https://pytorch.org/tutorials/
- **PyTorch Documentation**: https://pytorch.org/docs/
- **Original Tutorial**: https://sebastianraschka.com/teaching/pytorch-1h/
- **Books**:
  - *Deep Learning with PyTorch* by Stevens, Antiga, and Viehmann
  - *Machine Learning with PyTorch and Scikit-Learn* by Raschka et al.

---

## Bonus Exercise: Build a Complete Project

**Challenge**: Create a neural network to classify the Iris dataset

**Steps**:
1. Load the Iris dataset (sklearn)
2. Create PyTorch datasets and dataloaders
3. Build a neural network
4. Train for multiple epochs
5. Evaluate and report accuracy

**Starter code below:**

In [None]:
# Bonus challenge - Iris classification
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split and scale
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
y_test = torch.tensor(y_test, dtype=torch.long)

print(f"Dataset: {len(X_train)} training, {len(X_test)} test samples")
print(f"Features: {X_train.shape[1]}, Classes: {len(iris.target_names)}")

# TODO: Complete the rest of the implementation
# 1. Create datasets and dataloaders
# 2. Define a neural network
# 3. Train the model
# 4. Evaluate accuracy