In [1]:
pip install torch torchvision

Collecting torchvision
  Downloading torchvision-0.17.2-cp312-cp312-macosx_10_13_x86_64.whl.metadata (6.6 kB)
Downloading torchvision-0.17.2-cp312-cp312-macosx_10_13_x86_64.whl (1.7 MB)
Installing collected packages: torchvision
Successfully installed torchvision-0.17.2
Note: you may need to restart the kernel to use updated packages.


In [3]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [6]:
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

# Use the same root as the training set so both splits share one download folder
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

In [7]:
batch_size=64

train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True)
# Shuffling is not needed for evaluation, so the test loader keeps the dataset order
test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=False)



In [10]:
def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape)) 
    print("Values: \n{}".format(x))
    



In [12]:
train_dataloader

<torch.utils.data.dataloader.DataLoader at 0x15947ec90>

In [13]:
test_dataloader

<torch.utils.data.dataloader.DataLoader at 0x158d2dc40>

In [14]:
for X,y in train_dataloader:
    print(f"Shape of X [N,C,H,W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break



Shape of X [N,C,H,W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64


In [15]:
# Use one consistent type (a device string) for both branches
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")



Using cpu device




---

### **`class NeuralNetwork(nn.Module):`**

1. **What it does:**
   - This defines a new class called `NeuralNetwork` (the code cell below names it `neural_network`; the role is the same).
   - The class inherits from `nn.Module`, which is part of PyTorch's library.

2. **Why it's important:**
   - In PyTorch, `nn.Module` is the base class for all neural network models. By inheriting from it, `NeuralNetwork` gains all the functionality required to define, train, and run neural networks.

---

### **`def __init__(self):`**

Think of `__init__` as the constructor that defines the "blueprint" of the neural network.

1. **What it does:**
   - This is the constructor method for the `NeuralNetwork` class.
   - It runs whenever you create an instance of the class and creates the components (e.g., `Flatten`, `Linear`, `ReLU`) the model will use.

2. **Why it's important:**
   - This method sets up the architecture of the model by defining its layers and parameters.
   - The layers are only constructed here; they are not applied to any data until they are called in the `forward` method.

---

### **`super().__init__()`**

1. **What it does:**
   - This line calls the `__init__` method of the parent class (`nn.Module`).
   - It ensures that the base class (`nn.Module`) is properly initialized before adding any custom functionality.

2. **Why it's important:**
   - Without this line, `nn.Module` never creates the internal registries it uses to track layers and parameters, so assigning a layer such as `self.flatten = nn.Flatten()` raises an `AttributeError`.
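
The failure mode can be reproduced in a few lines. This is a minimal sketch; `BrokenNetwork` is a hypothetical class written only for this illustration and is not part of the notebook.

```python
from torch import nn


class BrokenNetwork(nn.Module):  # hypothetical class, for illustration only
    def __init__(self):
        # super().__init__() is intentionally omitted here
        self.layer = nn.Linear(4, 2)


try:
    BrokenNetwork()
    error_message = None
except AttributeError as exc:
    # nn.Module refuses to register a submodule before its own __init__ has run
    error_message = str(exc)

print(error_message)
```

Adding `super().__init__()` as the first line of `__init__` makes the same class construct without error.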

---

### Putting it together:

This code defines a neural network class that:
1. Inherits PyTorch's `nn.Module` to leverage built-in features like automatic gradient computation and parameter management.
2. Initializes the parent class to ensure a proper setup for the custom model.
3. Provides a structure for defining the architecture of your neural network, which you would do by adding layers in the `__init__` method and implementing the forward pass in another method (e.g., `forward`).





In [18]:
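class neural_network(nn.Module):
    def __init__(self):
        super().__init__()
        # Flattens each (1, 28, 28) image into a vector of 784 values
        self.flatten = nn.Flatten()
        # Two hidden layers of 512 units, then 10 output logits
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits


# Instantiate outside the class body; inside it, the class name is not defined yet
model = neural_network().to(device)
print(model)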

neural_network(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)




### **What is `flatten` doing?**

1. **`nn.Flatten` Explanation:**
   - `nn.Flatten()` keeps the first (batch) dimension and collapses all remaining dimensions into one, so each sample becomes a 1D vector. This is needed here because the first `nn.Linear` layer expects each sample to have exactly 784 input features.
   - For example, a batch of grayscale images of shape `(batch_size, 1, 28, 28)` is reshaped into `(batch_size, 28*28)`.

2. **Why flattening is important:**
   - It allows the image data to flow into a fully connected (dense) layer by converting the grid-like 2D data into a flat 1D format.
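
A quick shape check illustrates the reshaping. This sketch uses a random tensor in place of real FashionMNIST images; the batch size of 64 matches the `batch_size` used earlier in the notebook.

```python
import torch
from torch import nn

# Random stand-in for one batch of grayscale images: [N, C, H, W]
batch = torch.rand(64, 1, 28, 28)

# Flatten keeps dim 0 (the batch) and merges dims 1..-1 into one
flat = nn.Flatten()(batch)

print(tuple(batch.shape), "->", tuple(flat.shape))
```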


---

### **What does `nn.Linear(28*28, 512)` mean?**

1. **The components:**
   - `28x28`: Represents the dimensions of the input image. A grayscale image with a height and width of 28 pixels each.
   - `512`: Represents the number of **neurons (nodes)** in the first hidden layer.

2. **Flow of data:**
   - **Before flattening:** The image is a 2D grid (28x28) for each sample.
   - **After flattening:** The image becomes a 1D vector of size 784 (`28 * 28`), which is the input to the first dense layer.

---

### **How the layers are structured:**

1. **Input Layer:**
   - The input to the network is a batch of images of size `(batch_size, 1, 28, 28)` (one grayscale channel).
   - The `flatten` layer converts this to `(batch_size, 784)` so it can be fed into the dense layers.

2. **First Fully Connected Layer:**
   - `nn.Linear(28*28, 512)` takes the flattened input (size 784) and connects it to 512 nodes in the first hidden layer.
   - Each of the 784 input values is connected to each of the 512 neurons, creating **784 * 512 = 401,408 weights**, plus one bias per neuron.

3. **ReLU Activation:**
   - `nn.ReLU()` introduces non-linearity after the first dense layer. It helps the network learn complex patterns by activating only positive outputs and zeroing out negative ones.

4. **Second Fully Connected Layer:**
   - `nn.Linear(512, 512)` connects the 512 neurons in the first hidden layer to another 512 neurons in the second hidden layer.

5. **ReLU Activation (again):**
   - Another `nn.ReLU()` is applied to introduce non-linearity.

6. **Output Layer:**
   - `nn.Linear(512, 10)` connects the 512 neurons in the second hidden layer to 10 output neurons.
   - The `10` output neurons represent the **classes**: the ten FashionMNIST clothing categories, from T-shirt/top to Ankle boot.
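
The "connections" of a dense layer are stored as its weight matrix. As a small sketch, the first layer can be inspected on its own; the variable names here are illustrative and not taken from the notebook.

```python
from torch import nn

first_layer = nn.Linear(28 * 28, 512)

# PyTorch stores the weight as (out_features, in_features)
weight_shape = tuple(first_layer.weight.shape)
bias_shape = tuple(first_layer.bias.shape)
num_weights = first_layer.weight.numel()

print(weight_shape, bias_shape, num_weights)
```

The weight count equals 784 * 512, one weight per input/neuron pair, and the bias holds one value per neuron.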

---

### **Imagining the Layers:**

1. **Layer 1 (Flatten):**
   - **Input:** `(28x28 image)` → **Output:** `(784,)` (flattened vector).

2. **Layer 2 (First Dense Layer):**
   - **Input:** `(784,)` → **Output:** `(512,)`.
   - **Visualization:** Think of 512 nodes, each receiving inputs from all 784 pixels.

3. **Layer 3 (ReLU):**
   - Applies non-linearity to the 512 outputs.

4. **Layer 4 (Second Dense Layer):**
   - **Input:** `(512,)` → **Output:** `(512,)`.

5. **Layer 5 (ReLU):**
   - Another non-linear transformation.

6. **Layer 6 (Output Layer):**
   - **Input:** `(512,)` → **Output:** `(10,)`.
   - These 10 outputs are raw class scores (logits); applying `softmax` converts them into class probabilities.

---

### **Flow of Data:**

- The image data starts as a grid (28x28).
- It is flattened into a 1D vector (784).
- Passes through two hidden layers, each with 512 neurons.
- Finally, it is reduced to 10 outputs, representing class scores (logits).
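
The flow described above can be traced by passing one sample through the layers one at a time and recording the shape after each. This is a sketch that rebuilds the same stack as a single `nn.Sequential` (an assumption made for easy iteration; the notebook's class keeps `flatten` separate) and uses a random input.

```python
import torch
from torch import nn

layers = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

x = torch.rand(1, 1, 28, 28)  # one random "image"
shapes = []
with torch.no_grad():
    for layer in layers:
        x = layer(x)
        shapes.append(tuple(x.shape))
        print(f"{layer.__class__.__name__:<8} -> {tuple(x.shape)}")
```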




In [20]:
model = neural_network().to(device)
print(model)



neural_network(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


In [22]:
X=torch.rand(1,28,28,device=device)

logits=model(X)



In [23]:
logits

tensor([[-0.0041, -0.0547,  0.0245, -0.0414,  0.0081,  0.0433,  0.0917, -0.0217,
         -0.0039, -0.1045]], grad_fn=<AddmmBackward0>)

In [25]:
pred_probab = nn.Softmax(dim=1)(logits)
pred_probab

tensor([[0.1001, 0.0951, 0.1030, 0.0964, 0.1013, 0.1049, 0.1101, 0.0983, 0.1001,
         0.0905]], grad_fn=<SoftmaxBackward0>)
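
Softmax rescales logits into values that sum to 1 without changing which class scores highest. A small sketch with made-up logits (three classes instead of ten, chosen only to keep the numbers readable):

```python
import torch
from torch import nn

logits_demo = torch.tensor([[2.0, 1.0, 0.1]])  # illustrative values, not model output
probs_demo = nn.Softmax(dim=1)(logits_demo)

total = probs_demo.sum().item()
same_winner = probs_demo.argmax(1).item() == logits_demo.argmax(1).item()

print(probs_demo, total, same_winner)
```

Because the ordering is preserved, `argmax` can be taken on the logits directly when only the predicted class is needed.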

In [26]:
y_pred=pred_probab.argmax(1)

In [27]:
print(f"Predicted class: {y_pred}")

Predicted class: tensor([5])


In [34]:
model.parameters

<bound method Module.parameters of neural_network(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)>
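
Note that `model.parameters` without parentheses only displays the bound method. Calling it yields the actual tensors. The sketch below rebuilds the same layer stack as a plain `nn.Sequential` (an assumption, so the block runs on its own) and counts the learnable values.

```python
from torch import nn

stack = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# named_parameters() yields (name, tensor) pairs for every weight and bias
for name, param in stack.named_parameters():
    print(f"{name:<10} {tuple(param.shape)}")

total_params = sum(p.numel() for p in stack.parameters())
print("total:", total_params)
```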

In [32]:
loss_fn=nn.CrossEntropyLoss()
optimizer= torch.optim.SGD(model.parameters(), lr=1e-3)
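
`nn.CrossEntropyLoss` takes raw logits and integer class labels, which is why the model does not end with a softmax layer. The sketch below, using random logits and arbitrary labels, checks that it matches log-softmax followed by negative log-likelihood.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits_batch = torch.randn(4, 10)        # 4 samples, 10 classes (random values)
targets = torch.tensor([0, 3, 9, 1])     # arbitrary class indices

loss_direct = nn.CrossEntropyLoss()(logits_batch, targets)
loss_manual = F.nll_loss(F.log_softmax(logits_batch, dim=1), targets)

print(loss_direct.item(), loss_manual.item())
```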

In [36]:
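def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()  # switch layers such as dropout or batch norm to training mode
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Forward pass: compute predictions and the loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation: compute gradients, update parameters, reset gradients
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Report progress every 100 batches
        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}      ")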



### **1. `optimizer.step()`**

#### **What it does:**
- Updates the parameters of the model (weights and biases) based on the gradients that were computed during backpropagation.

#### **How it works:**
- After calling `loss.backward()`, the gradients of the loss with respect to each model parameter are stored in the `grad` attribute of the respective parameter.
- When you call `optimizer.step()`, the optimizer uses these gradients to adjust the parameters according to the chosen optimization algorithm (e.g., SGD, Adam).
- For example, in stochastic gradient descent (SGD):
  $$
  \text{new\_param} = \text{param} - \text{learning\_rate} \times \text{gradient}
  $$

#### **Why it's important:**
- Without `optimizer.step()`, the model’s weights wouldn’t be updated, and the training process wouldn’t progress.
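
The update rule can be checked by hand on a tiny example. This is a sketch with a two-element parameter and a toy loss (both made up for illustration), using plain SGD with no momentum or weight decay.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer_demo = torch.optim.SGD([w], lr=0.1)

loss_demo = (w ** 2).sum()   # toy loss; its gradient is 2 * w
loss_demo.backward()         # w.grad is now [2.0, -4.0]

# Manual update: param - learning_rate * gradient
expected = w.detach() - 0.1 * w.grad

optimizer_demo.step()        # the optimizer applies the same rule in place
print(w.detach(), expected)
```

After the step, `w` holds `[0.8, -1.6]`, which matches the manual computation.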

---

### **2. `optimizer.zero_grad()`**

#### **What it does:**
- Clears the gradients of all model parameters.

#### **Why it’s needed:**
- Gradients in PyTorch accumulate by default. This means that every time you call `loss.backward()`, the computed gradients are added to the ones already stored in the `grad` attribute of the parameters.
- If you don’t clear the gradients using `optimizer.zero_grad()`, the old gradients from the previous step will mix with the new gradients, leading to incorrect parameter updates.

#### **When to call it:**
- It must run once per training step, before the next `loss.backward()`: either at the start of the step, or right after `optimizer.step()` as the `train` function above does.

#### **Example of Accumulation Issue:**
```python
import torch
from torch import nn

# Without zero_grad(): each backward() adds to the stored gradient
layer = nn.Linear(1, 1)
x, y = torch.ones(1, 1), torch.zeros(1, 1)
criterion = nn.MSELoss()

for i in range(2):
    loss = criterion(layer(x), y)
    loss.backward()
    # The second printed gradient is twice the first, because nothing cleared it
    print(layer.weight.grad)
```

---

### **How They Work Together in a Training Loop:**

Here’s how `optimizer.step()` and `optimizer.zero_grad()` are typically used:

```python
for inputs, targets in dataloader:  # Loop over batches of data
    optimizer.zero_grad()          # 1. Clear old gradients
    predictions = model(inputs)    # 2. Forward pass
    loss = loss_fn(predictions, targets)  # 3. Compute loss
    loss.backward()                # 4. Backpropagate to compute gradients
    optimizer.step()               # 5. Update model parameters
```



In [41]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

In [40]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 1.058276 [   64/60000      
loss: 1.101940 [ 6464/60000      
loss: 1.173568 [12864/60000      
loss: 1.093444 [19264/60000      
loss: 0.984082 [25664/60000      
loss: 0.987985 [32064/60000      
loss: 1.016739 [38464/60000      
loss: 1.008723 [44864/60000      
loss: 1.122202 [51264/60000      
loss: 0.900272 [57664/60000      
Test Error: 
 Accuracy: 65.3%, Avg loss: 0.986220 

Epoch 2
-------------------------------
loss: 0.959381 [   64/60000      
loss: 0.943933 [ 6464/60000      
loss: 0.884231 [12864/60000      
loss: 0.977122 [19264/60000      
loss: 0.838946 [25664/60000      
loss: 0.804766 [32064/60000      
loss: 0.905596 [38464/60000      
loss: 0.814231 [44864/60000      
loss: 0.882775 [51264/60000      
loss: 0.924000 [57664/60000      
Test Error: 
 Accuracy: 66.8%, Avg loss: 0.915468 

Epoch 3
-------------------------------
loss: 0.872355 [   64/60000      
loss: 1.152720 [ 6464/60000      
loss: 0.832544 [12864/60000 

In [42]:
torch.save(model.state_dict(),"model.pth")
print("saved pytorch model")

saved pytorch model


In [44]:
model = neural_network().to(device)
model.load_state_dict(torch.load("model.pth", weights_only=True))

<All keys matched successfully>

In [45]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
