# MODEL

A model is a neural network defined using the `torch.nn.Module` class. It consists of layers and the forward computation logic that connects them.

## DEFINING A NEURAL NETWORK

A neural network can be defined in several ways:

1. Creating a sequence of layers with `nn.Sequential`
2. Subclassing `nn.Module`
3. Combining `nn.Module` and `nn.Sequential` (hybrid)
4. Using pre-defined models (transfer learning)


### nn.Sequential

`nn.Sequential` stacks layers into a pipeline. The input passes through each layer in the order in which the layers are defined.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 4),
    nn.ReLU(),
    nn.Linear(4, 8),
)
print(model)
```

Output:

```
Sequential(
  (0): Linear(in_features=2, out_features=4, bias=True)
  (1): ReLU()
  (2): Linear(in_features=4, out_features=8, bias=True)
)
```
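To see the sequential model in action, here is a minimal sketch that passes a batch through the same three layers. The batch size of 3 and the variable names `x`, `y`, and `first_layer` are assumptions made for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 4),
    nn.ReLU(),
    nn.Linear(4, 8),
)

# Hypothetical batch: 3 samples with 2 features each
x = torch.randn(3, 2)

# Calling the model runs the layers in the order they were defined
y = model(x)
print(y.shape)  # torch.Size([3, 8])

# Layers in a Sequential can be accessed by index
first_layer = model[0]
print(first_layer)
```

The last dimension of the input must match `in_features` of the first layer, and the output takes `out_features` of the last layer.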

### nn.Module

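`nn.Module` is the base class for all neural network modules in PyTorch. To build a custom model, subclass it, create the layers in `__init__`, and describe the computation in `forward`.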

```python
import torch
import torch.nn as nn

class BasicNeuralNetwork(nn.Module):
    def __init__(self, input, output):
        super(BasicNeuralNetwork, self).__init__()
        # Define custom layers
        self.fc1 = nn.Linear(input, 128)   # First layer
        self.fc2 = nn.Linear(128, 64)      # Second layer
        self.fc3 = nn.Linear(64, output)   # Output layer

    def forward(self, x):
        # Apply activations
        x = torch.relu(self.fc1(x))  # Apply ReLU to the first layer
        x = torch.relu(self.fc2(x))  # Apply ReLU to the second layer
        output = self.fc3(x)         # Linear activation for the output layer
        return output

# Define the model
model = BasicNeuralNetwork(input=784, output=10)

# Print the model
print(model)
```

Output:

```
BasicNeuralNetwork(
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=10, bias=True)
)
```
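The hybrid approach from the list of definition styles (option 3) has no example yet, so here is a minimal sketch of one way to do it. The class name `HybridNetwork`, the attribute names `features` and `classifier`, and the layer sizes are assumptions chosen to mirror the network above.

```python
import torch
import torch.nn as nn

class HybridNetwork(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_features, out_features):
        super().__init__()
        # Group the hidden layers into one nn.Sequential block
        self.features = nn.Sequential(
            nn.Linear(in_features, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
        )
        # Keep the output layer as a separate attribute
        self.classifier = nn.Linear(64, out_features)

    def forward(self, x):
        x = self.features(x)       # runs the whole Sequential block
        return self.classifier(x)  # raw scores, no activation

hybrid_model = HybridNetwork(in_features=784, out_features=10)
batch = torch.randn(5, 784)  # hypothetical batch of 5 samples
logits = hybrid_model(batch)
print(logits.shape)  # torch.Size([5, 10])
```

Grouping layers this way keeps `forward` short while still leaving room for custom logic between the blocks.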


## MODEL METHODS

`torch.nn.Module` has a lot of built-in methods. The most commonly used are:

1. `__init__`
2. `forward`
3. `eval()`
4. `train()`
5. `state_dict()`
6. `load_state_dict()`
7. `parameters()`
8. `zero_grad()`
9. `to()`
10. `cuda()`, `cpu()`

### `__init__`

This is where we define the architecture of the neural network: `__init__` creates the layers the model uses. Any `nn.Module` or `nn.Parameter` assigned as an attribute in `__init__` is registered with the model, so its parameters appear in `model.parameters()`.

```python
class BasicNeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(BasicNeuralNetwork, self).__init__()
        # define the other layers here

```
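`super().__init__()` runs the `nn.Module` constructor first, which sets up the internal state that layer registration relies on. It must be called before any layers are assigned.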
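To make the registration rule concrete, the following sketch compares attributes that are registered with one that is not. `TinyNetwork` and its attribute names are hypothetical and exist only for this illustration.

```python
import torch
import torch.nn as nn

class TinyNetwork(nn.Module):  # hypothetical class, for illustration only
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 3)                 # nn.Module: registered
        self.scale = nn.Parameter(torch.ones(1))  # nn.Parameter: registered
        self.plain = torch.zeros(3)               # plain tensor: not registered

tiny = TinyNetwork()

# named_parameters() yields (name, parameter) pairs for registered parameters
names = [name for name, _ in tiny.named_parameters()]
print(names)
```

The printed names include `scale`, `fc.weight`, and `fc.bias`, but not `plain`, because a plain tensor attribute is not tracked as a parameter.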

### forward

This method defines the forward pass of the neural network. It is the prediction pipeline: given an input, the model passes it through its layers and returns the predicted output.

You rarely call `forward` explicitly. Instead, call the model instance with your inputs and it returns the outputs:
```python
import torch
import torch.nn as nn

class BasicNeuralNetwork(nn.Module):
    def __init__(self, input, hidden, output):
        super(BasicNeuralNetwork, self).__init__()
        # layers
        self.fc1 = nn.Linear(input, hidden)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden, output)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

basicNeuralNetwork = BasicNeuralNetwork(input=256, hidden=512, output=2048)
inputs = torch.randn(4, 256)  # example batch: 4 samples, 256 features each
outputs = basicNeuralNetwork(inputs)  # calls forward() under the hood
```

__NOTE__: Layers are not defined in the `forward` method. They are defined in the `__init__` method; `forward` only describes how data flows through them.
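One reason to call the model instance rather than `forward` directly is that the call path also runs any registered hooks. The sketch below illustrates this with a forward hook on a single layer. The hook function `record_call` and the list `calls` are hypothetical names.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)
calls = []  # records one entry each time the hook fires

def record_call(module, inputs, output):
    # Forward hooks receive the module, its inputs, and its output
    calls.append(output.shape)

handle = layer.register_forward_hook(record_call)

x = torch.randn(5, 3)
out_call = layer(x)             # goes through __call__, so the hook fires
out_forward = layer.forward(x)  # bypasses __call__, so the hook does not fire

handle.remove()  # detach the hook when done
print(len(calls))  # 1
```

Both calls compute the same values, but only the first one triggers the hook.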

### eval

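`eval()` sets the model to evaluation mode. Layers that behave differently during training switch to their inference behavior: `Dropout` stops zeroing activations, and `BatchNorm` uses its stored running statistics instead of per-batch statistics. This makes the forward pass deterministic. Note that `eval()` does not turn off gradient tracking; use `torch.no_grad()` for that.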

```python
basicNeuralNetwork.eval()
```

### train

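`train()` sets the model to training mode, which is the default for a newly created module. `Dropout` randomly zeroes activations again, and `BatchNorm` normalizes with per-batch statistics while updating its running statistics.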

```python
basicNeuralNetwork.train()
```
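The difference between the two modes is easiest to see on a single `Dropout` layer. This is a minimal sketch; the drop probability of 0.5 and the all-ones input are assumptions chosen to make the effect obvious.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()         # training mode: dropout is active
train_out = drop(x)  # each element is either 0 or scaled to 1 / (1 - p) = 2

drop.eval()          # evaluation mode: dropout is a no-op
eval_out = drop(x)   # identical to the input

print(drop.training)             # False
print(torch.equal(eval_out, x))  # True
```

The `training` attribute reflects the current mode and is what layers like `Dropout` check internally.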

### state_dict() & load_state_dict()

`state_dict()` returns a Python dictionary that maps each parameter and buffer name to its tensor. It contains the model's learnable parameters (weights and biases) and its buffers, such as the running statistics of batch normalization layers. `load_state_dict()` copies such a dictionary back into a model.

This is crucial for saving and loading models in PyTorch:
```python
torch.save(basicNeuralNetwork.state_dict(), 'basicNeuralNetwork.pth')
```

`basicNeuralNetwork.state_dict()` returns the model's learned weights and buffers, and `torch.save` writes them to the file `basicNeuralNetwork.pth`.

To load the model, first create a new instance of the model class (or reuse the existing one). Note that the architecture must match the one used when the `state_dict` was saved:

```python
newBasicInstanceNeuralNetwork = BasicNeuralNetwork(input=256, hidden=512, output=2048)
newBasicInstanceNeuralNetwork.load_state_dict(torch.load('basicNeuralNetwork.pth'))
```
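The save and load round trip can be checked end to end with a small sketch. To avoid writing a file, it uses an in-memory `io.BytesIO` buffer as a stand-in for the `.pth` file, and a single `nn.Linear` layer as the model. Both are assumptions made for brevity.

```python
import io
import torch
import torch.nn as nn

source = nn.Linear(4, 2)
print(list(source.state_dict().keys()))  # ['weight', 'bias']

# Save the state_dict to an in-memory buffer (stand-in for a .pth file)
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)
buffer.seek(0)  # rewind before reading

# A fresh instance with the same architecture but different random weights
restored = nn.Linear(4, 2)
result = restored.load_state_dict(torch.load(buffer))

print(result.missing_keys, result.unexpected_keys)
print(torch.equal(source.weight, restored.weight))  # True
```

`load_state_dict` reports any missing or unexpected keys, which is a quick way to spot an architecture mismatch.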

Optimizers also have `state_dict()` and `load_state_dict()`, so you can save and load optimizer state the same way by replacing `model` with `optimizer`.

__N/B:__ `torch.save(model.state_dict(), 'name.pth')` saves only the parameters and buffers, which makes the file more portable across PyTorch versions and code changes. `torch.save(model, 'name.pth')` pickles the whole model object (architecture and parameters), so loading it depends on the original class definition being importable and is less portable.
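A common pattern that combines both ideas is to store the model and optimizer state together in one checkpoint dictionary. The sketch below is one possible layout: the keys `model_state` and `optimizer_state` are hypothetical names, and an in-memory buffer again stands in for a file on disk.

```python
import io
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Bundle both state dicts into one checkpoint (key names are arbitrary)
checkpoint = {
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}
buffer = io.BytesIO()
torch.save(checkpoint, buffer)
buffer.seek(0)

# Restore into fresh objects; the optimizer starts with different settings
loaded = torch.load(buffer)
new_model = nn.Linear(4, 2)
new_optimizer = torch.optim.SGD(new_model.parameters(), lr=0.1)
new_model.load_state_dict(loaded["model_state"])
new_optimizer.load_state_dict(loaded["optimizer_state"])

# The saved hyperparameters replace the ones passed to the constructor
print(new_optimizer.param_groups[0]["lr"])  # 0.01
```

Restoring the optimizer state matters when resuming training, since it carries the hyperparameters and any internal buffers such as momentum.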

### parameters()

`parameters()` gives access to the model's parameters (weights and biases). It returns an iterator that you can loop over to reach individual parameters, or convert to a list.

```python
for param in basicNeuralNetwork.parameters():
    print(param)
```

This will loop through all weights and biases and print them out.
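Printing full tensors gets noisy for larger models, so a shape summary is often more useful. The sketch below uses `named_parameters()` and `numel()` on a small hypothetical model (`small`) to list each parameter's shape and count the total.

```python
import torch.nn as nn

# Hypothetical small model, sized so the count is easy to verify by hand
small = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# named_parameters() yields the name alongside each parameter
for name, param in small.named_parameters():
    print(name, tuple(param.shape))

# (4*3 + 3) + (3*2 + 2) = 15 + 8 = 23
total = sum(p.numel() for p in small.parameters())
print(total)  # 23
```

The `ReLU` layer contributes nothing to the count because it has no parameters.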

#### Use cases of parameters()
1. Passing them to optimizers for updates: `optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)`
2. Freezing layers to prevent them from updating during training, by setting `requires_grad` to `False`
```python
for params in model.fc1.parameters():
    params.requires_grad = False # it freezes the params of the first layer fc1
```
3. Manually inspecting and updating parameters, for example clearing their gradients
```python
for param in model.fc2.parameters():
    param.grad = None #zeroing gradients
```
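The use cases above can be combined into a single training step. This is a minimal sketch, assuming a small hypothetical `nn.Sequential` model, random data, and an MSE loss; none of these come from the original example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Freeze the first layer
for param in model[0].parameters():
    param.requires_grad = False

# Hand only the trainable parameters to the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

frozen_before = model[0].weight.clone()
head_bias_before = model[2].bias.clone()

# One training step on random data
x, target = torch.randn(16, 4), torch.randn(16, 1)
optimizer.zero_grad()  # clear gradients from any previous step
loss = nn.functional.mse_loss(model(x), target)
loss.backward()        # frozen parameters receive no gradient
optimizer.step()

print(torch.equal(frozen_before, model[0].weight))  # True: unchanged
print(model[0].weight.grad)                         # None
```

After the step, the frozen layer keeps its original weights while the last layer has been updated.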

### is_cuda

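`is_cuda` is a boolean attribute of tensors, not of `nn.Module`, so `model.is_cuda` raises an `AttributeError`. To check whether the model is currently on a GPU, inspect one of its parameters:

```python
if next(model.parameters()).is_cuda:
    print('model is on GPU')
```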

### to

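`to()` moves the model's parameters and buffers to a different device:
```python
model.to('cuda')  # move to the GPU by name
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)  # or pick the device at runtime
```

Alternatively, you can use: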
```python
model.cuda()
model.cpu()
```
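The model and its inputs must live on the same device, so a device-agnostic setup is a common pattern. The sketch below falls back to the CPU when no GPU is available. The layer sizes and variable names are assumptions.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)  # Module.to moves parameters in place and returns the module

# Tensor.to returns a new tensor, so the result must be assigned
inputs = torch.randn(3, 4).to(device)

outputs = model(inputs)
print(next(model.parameters()).device, outputs.device)
```

Unlike `Module.to`, calling `inputs.to(device)` without assigning the result leaves `inputs` on its original device.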