
## Notebook Goal

Provide a clear, example-driven reference for PyTorch optimizer utilities so parameter updates, learning rate usage, and optimizer mechanics are unambiguous.

## Prerequisites

Understanding tensors and parameters.
Basic idea that optimizers update parameters using gradients.

## After This Notebook You Can

Choose and configure common optimizers.
Explain optimizer.step() and optimizer.zero_grad().
Pass parameters correctly to optimizers.
Answer optimizer-related interview questions confidently.

## Out of Scope

Optimizer math derivations.
Learning rate schedules.
Advanced optimization strategies.

---

## METHODS COVERED (SUMMARY)

Optimizers:

* torch.optim.SGD
* torch.optim.Adam

Core calls:

* optimizer.step
* optimizer.zero_grad

Configuration:

* learning rate (lr)
* momentum (SGD)
* weight_decay

---

## torch.optim.SGD

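What it does:
Updates each parameter by stepping against its gradient, scaled by the learning rate. In the plain case (no momentum, no weight decay) the update is param = param - lr * grad.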

When to use:
Simple models, strong baselines, when you want explicit control.

Minimal example:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
```

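Important parameters:

* lr: learning rate (step size applied to each gradient)
* momentum (optional, default 0): carries a fraction of the previous update into the current one
* weight_decay (optional, default 0): adds weight_decay * param to each gradient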

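Common mistake:
Passing the model itself instead of model.parameters(). The optimizer expects an iterable of tensors (or parameter-group dicts), so optim.SGD(model, lr=0.01) raises a TypeError.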
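A minimal sketch of what one SGD step does, assuming plain SGD with no momentum or weight decay. The sample `x`, the target `y`, and `lr = 0.1` are illustrative values, not taken from the notebook. The last part reproduces the construction mistake noted above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(1, 1)
lr = 0.1  # illustrative value
optimizer = optim.SGD(model.parameters(), lr=lr)

# One toy sample (assumed values for illustration)
x = torch.tensor([[2.0]])
y = torch.tensor([[1.0]])

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Snapshot the weight and its gradient before the update
weight_before = model.weight.detach().clone()
grad = model.weight.grad.detach().clone()

optimizer.step()

# Plain SGD: new_weight = old_weight - lr * grad
manual_update = weight_before - lr * grad
sgd_matches_manual = torch.allclose(model.weight.detach(), manual_update)
print("step matches manual update:", sgd_matches_manual)

# Passing the module instead of its parameters fails at construction
try:
    optim.SGD(model, lr=lr)
    raised_type_error = False
except TypeError:
    raised_type_error = True
print("TypeError on optim.SGD(model, ...):", raised_type_error)
```

With momentum or weight_decay set, the manual formula above no longer matches, because the optimizer modifies the gradient before applying it.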

---

## torch.optim.Adam

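What it does:
Adaptive optimizer that scales each parameter's step using running averages of its gradient and squared gradient, so parameters with very different gradient magnitudes get comparable step sizes.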

When to use:
Most deep learning tasks as a strong default.

Minimal example:

```python
optimizer = optim.Adam(model.parameters(), lr=0.001)
```

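Important parameters:

* lr: base step size (default 1e-3)
* betas: decay rates for the running averages of the gradient and squared gradient (default (0.9, 0.999))
* weight_decay: adds weight_decay * param to each gradient (default 0)

Common mistake:
Assuming Adam removes the need to tune the learning rate. lr still sets the overall step size.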
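To make "per parameter" concrete, the sketch below compares the first update of SGD and Adam on two parameters whose gradients differ by six orders of magnitude. The helper `first_step_sizes`, the toy loss, and `lr = 0.01` are hypothetical and exist only for this illustration. Under default betas and eps, Adam's first step should come out close to lr for both parameters, while SGD's step scales with the gradient.

```python
import torch
import torch.optim as optim

lr = 0.01  # illustrative value


def first_step_sizes(optimizer_cls):
    """Return how far each parameter moves on the first update."""
    # Two parameters whose gradients differ by six orders of magnitude
    w = torch.nn.Parameter(torch.tensor([1.0, 1.0]))
    optimizer = optimizer_cls([w], lr=lr)
    before = w.detach().clone()

    loss = 1000.0 * w[0] + 0.001 * w[1]  # gradients: [1000, 0.001]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    return (before - w.detach()).abs()


sgd_steps = first_step_sizes(optim.SGD)
adam_steps = first_step_sizes(optim.Adam)
print("SGD step sizes: ", sgd_steps)
print("Adam step sizes:", adam_steps)
```

Because Adam's step size tracks lr rather than the raw gradient, a poorly chosen lr affects every parameter, which is why it still needs tuning.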

---

## optimizer.zero_grad

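What it does:
Resets the .grad of every parameter the optimizer manages. backward() adds new gradients to whatever is already stored in .grad, so stale values must be cleared.

When to use:
Once per training iteration, before backward().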

Minimal example:

```python
optimizer.zero_grad()
```

Common mistake:
Calling zero_grad() between backward() and step(), which wipes the gradients before they are used. Calling it after step() is fine.
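A small sketch of the accumulation behaviour, assuming a mock batch `x` and a throwaway `nn.Linear(3, 1)` model (both illustrative). `set_to_none=True` is passed explicitly because the default for that argument has differed between PyTorch versions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(3, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(4, 3)  # mock batch

# First backward pass: .grad holds one batch's gradient
model(x).sum().backward()
grad_once = model.weight.grad.detach().clone()

# Second backward pass without zero_grad(): the new gradient is added
model(x).sum().backward()
grad_twice = model.weight.grad.detach().clone()
print("accumulated to 2x:", torch.allclose(grad_twice, 2 * grad_once))

# Clearing resets the state; set_to_none=True replaces .grad with None
optimizer.zero_grad(set_to_none=True)
print("grad is None after clearing:", model.weight.grad is None)
```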

---

## optimizer.step

What it does:
Updates every managed parameter in place using the values currently stored in its .grad.

When to use:
After backward() has computed gradients.

Minimal example:

```python
optimizer.step()
```

Common mistake:
Calling step() before backward() has run. Parameters whose .grad is None are skipped, so nothing is updated and no error is raised.
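The sketch below shows why this mistake is easy to miss: on a freshly created model, `step()` runs without complaint and leaves the weights untouched. The model shape and `lr` are illustrative.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(3, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

weight_before = model.weight.detach().clone()

# No backward() has run, so every .grad is still None
grads_missing = all(p.grad is None for p in model.parameters())

optimizer.step()  # runs without error, but has nothing to apply

weights_unchanged = torch.equal(model.weight.detach(), weight_before)
print("grads missing:", grads_missing)
print("weights unchanged:", weights_unchanged)
```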

---

## Weight Decay (Conceptual)

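What it does:
Shrinks weights toward zero on every update. In SGD and Adam, weight_decay adds weight_decay * param to each gradient, which corresponds to an L2 penalty on the weights.

When to use:
To reduce overfitting by discouraging large weights.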

Minimal example:

```python
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
```

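Common mistake:
Confusing weight_decay with dropout. Weight decay penalizes large weights inside the optimizer; dropout randomly zeroes activations inside the model during training.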
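To isolate the effect of weight_decay, this sketch uses SGD with a loss whose gradient is exactly zero, so the only thing moving the parameter is the decay term. The values of `lr` and `weight_decay` are deliberately exaggerated and are assumptions for illustration only.

```python
import torch
import torch.optim as optim

lr, weight_decay = 0.1, 0.5  # exaggerated values for illustration
w = torch.nn.Parameter(torch.tensor([2.0, -4.0]))
optimizer = optim.SGD([w], lr=lr, weight_decay=weight_decay)

# A loss whose gradient with respect to w is exactly zero
loss = (w * 0.0).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Only the decay term acts: w_new = w - lr * weight_decay * w
expected = torch.tensor([2.0, -4.0]) * (1 - lr * weight_decay)
decay_matches = torch.allclose(w.detach(), expected)
print("weights after one step:", w.detach())
print("matches w * (1 - lr * weight_decay):", decay_matches)
```

Each weight shrinks by the same fraction regardless of sign, which is the "pull toward zero" that weight decay provides.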

---

## HANDS-ON PRACTICE

1. Create a simple nn.Linear model and attach an SGD optimizer.
2. Switch to Adam and compare required learning rates.
3. Simulate a training step: zero_grad → backward (mock) → step.
4. Explain why gradients accumulate if zero_grad is skipped.

---

## METHODS RECAP (ONE PLACE)

SGD, Adam, optimizer.step(), optimizer.zero_grad(), lr, momentum, weight_decay

---

## ONE-SENTENCE SUMMARY

Optimizers update parameters from gradients, and the update is only correct when the calls run in order: zero_grad(), backward(), step().
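The ordering in the summary can be sketched as a short training loop. The toy data (y = 2x + 1), the step count, and `lr = 0.1` are assumptions chosen so the example runs quickly; they are not from the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy regression data: y = 2x + 1 (assumed for illustration)
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = 2 * x + 1

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for _ in range(100):
    optimizer.zero_grad()            # 1. clear gradients from the last step
    loss = criterion(model(x), y)    # 2. forward pass
    loss.backward()                  # 3. compute fresh gradients
    optimizer.step()                 # 4. apply the update
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Swapping `optim.SGD` for `optim.Adam` in this loop needs no other change to the four calls, only a learning rate suited to the new optimizer.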

---


In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

# Simple model for demonstration
model = nn.Linear(10, 1)
# Mock input and target
input_data = torch.randn(1, 10)
target = torch.randn(1, 1)
criterion = nn.MSELoss()

In [2]:
# torch.optim.SGD example with lr, momentum, and weight_decay
optimizer_sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-5)
print("SGD Optimizer initialized.")

SGD Optimizer initialized.


In [3]:
# torch.optim.Adam example with lr and weight_decay
optimizer_adam = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
print("Adam Optimizer initialized.")

Adam Optimizer initialized.


In [4]:
# optimizer.zero_grad example
# This clears existing gradients so they don't accumulate
optimizer_adam.zero_grad()
print("Gradients cleared.")

Gradients cleared.


In [5]:
# optimizer.step example
# 1. Perform a forward pass
output = model(input_data)
loss = criterion(output, target)

# 2. Backward pass to compute gradients
loss.backward()

# 3. Update parameters using the gradients
optimizer_adam.step()
print("Model parameters updated.")

Model parameters updated.
