# PyTorch Model Training API

## Define Model in PyTorch

This is the general structure for defining a model in PyTorch. We follow it for any type of model, whether it's logistic regression or a deep neural network.

In PyTorch we **define models as classes** that **inherit from** `torch.nn.Module`, which gives us a set of properties that are helpful later during model training.

**`__init__` constructor**: here we define the model's parameters and structure, that is, its layers (e.g. linear, attention) along with their input and output sizes.

**`forward()` method**: here we pass in the input data points and define how the model computes its predictions from them.

```py
import torch

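class MyModelName(torch.nn.Module):
  def __init__(self):
    super().__init__()  # required so torch.nn.Module can set itself up
    # define model parameters or layers or modules here

  def forward(self, inputs):
    # define how the model produces outputs from inputs
    outputs = ...  # placeholder: replace with the actual computation
    return outputs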
```
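To make the template concrete, here is a minimal sketch of a model that follows this structure. The class name `TinyRegressor` and the sizes used are hypothetical and chosen only for illustration.

```python
import torch

class TinyRegressor(torch.nn.Module):  # hypothetical example model
    def __init__(self, num_features):
        super().__init__()
        # a single linear layer mapping num_features inputs to 1 output
        self.linear = torch.nn.Linear(num_features, 1)

    def forward(self, inputs):
        # calling model(inputs) runs this method
        return self.linear(inputs)

model = TinyRegressor(num_features=4)
outputs = model(torch.zeros(5, 4))  # batch of 5 examples, 4 features each

# layers assigned in __init__ are registered as parameters automatically
param_names = [name for name, _ in model.named_parameters()]
print(outputs.shape)  # torch.Size([5, 1])
print(param_names)    # ['linear.weight', 'linear.bias']
```

Note that we call `model(...)` rather than `model.forward(...)` directly.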

## Train Model
Suppose we have implemented the `MyModelName` model class. We then initialize the model with `model = MyModelName()`.

Then we **initialize the optimizer**. Here we use stochastic gradient descent, so we initialize it with `optimizer = torch.optim.SGD(...)`.

**Epoch**: one single pass over the whole training dataset. The outer loop iterates over the training epochs.

Within each training epoch, we iterate over mini-batches; for each one we compute the model output and then the loss.

**train_dataloader**: produces the mini-batches for us.

After computing the loss, we first call `optimizer.zero_grad()`, then compute the gradient of the loss with respect to each parameter using `loss.backward()`, and finally call `optimizer.step()` to update the model parameters.

**Why `optimizer.zero_grad()`?** It resets the gradients from previous iterations. If we omit it, gradients accumulate across iterations, which we usually don't want. For more clarity, check out the micrograd autograd engine.
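The accumulation behaviour can be seen in a small sketch. The parameter `w`, the toy loss `sum(w^2)`, and the learning rate are assumptions made up for this illustration.

```python
import torch

# a single toy parameter vector; the loss sum(w^2) has gradient 2w
w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
loss.backward()
first_grad = w.grad.clone()        # tensor([2., 4.])

# second backward pass WITHOUT zeroing: the new gradient is added on top
loss = (w ** 2).sum()
loss.backward()
accumulated_grad = w.grad.clone()  # tensor([4., 8.])

# zeroing first gives the fresh gradient again
optimizer.zero_grad()
loss = (w ** 2).sum()
loss.backward()
fresh_grad = w.grad.clone()        # tensor([2., 4.])

print(first_grad, accumulated_grad, fresh_grad)
```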

```py
model = MyModelName() # initialize the model
optimizer = torch.optim.SGD(...) # initialize optimizer

for epoch in range(no_of_epochs):
  for x, y in train_dataloader:
    # forward pass
    outputs = model(x)
    loss = loss_fn(outputs, y)

    # backward pass (backpropagation)
    optimizer.zero_grad()
    loss.backward() # computes the gradients

    # update the model parameters using the computed gradients
    optimizer.step()
```
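The loop above is a template with placeholders. As a runnable sketch, the version below fills them in under some assumptions: a synthetic dataset following `y = 2x + 1`, a single `torch.nn.Linear` layer standing in for `MyModelName`, mean squared error as `loss_fn`, and a learning rate of 0.1. None of these choices come from the original notes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# synthetic data (assumption for illustration): y = 2x + 1
X = torch.linspace(-1, 1, 32).unsqueeze(1)
y = 2 * X + 1
train_dataloader = DataLoader(TensorDataset(X, y), batch_size=8, shuffle=True)

model = torch.nn.Linear(1, 1)  # stand-in for MyModelName()
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
no_of_epochs = 20

with torch.no_grad():
    initial_loss = loss_fn(model(X), y).item()

for epoch in range(no_of_epochs):
    for x_batch, y_batch in train_dataloader:
        # forward pass
        outputs = model(x_batch)
        loss = loss_fn(outputs, y_batch)

        # backward pass
        optimizer.zero_grad()
        loss.backward()

        # parameter update
        optimizer.step()

with torch.no_grad():
    final_loss = loss_fn(model(X), y).item()

print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The loss on the full dataset should be lower after training than before it.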

## Neural Network Layers in PyTorch

`torch.nn.Linear`: a neural network layer that computes the weighted sum of its inputs.
- It creates and initializes the weight vector and bias for us.

In [None]:
import torch

In [None]:
torch.manual_seed(123) # for reproducibility, so we get the same result every time; not required in real-world use

# below is the linear layer, i.e. the layer that computes the weighted sum
# with 3 input features x1, x2, x3
# and 1 output feature, meaning 1 weighted sum
# out_features=1 means there is only 1 neuron in the layer, and it has 3 inputs
# if we increase out_features to 2, the layer has 2 neurons, each receiving the same 3 inputs
linear = torch.nn.Linear(in_features=3, out_features=1) # this creates the weights and bias

As we can see, the weights and bias are initialized with small random numbers close to 0. That's because it's best practice to start from small random numbers when we train neural network models.

In [None]:
linear.weight # it contains 3 weights because we defined 3 input features

Parameter containing:
tensor([[-0.2354,  0.0191, -0.2867]], requires_grad=True)

In [None]:
linear.bias # checking the bias unit

Parameter containing:
tensor([0.2177], requires_grad=True)
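As a small sketch of the point made in the comments about increasing the number of output features, the shapes below show how the parameters grow when the layer has 2 neurons. The variable name `layer` and the all-ones input are illustrative only.

```python
import torch

# 3 input features, 2 output features (2 neurons)
layer = torch.nn.Linear(in_features=3, out_features=2)

weight_shape = layer.weight.shape  # one row of 3 weights per neuron
bias_shape = layer.bias.shape      # one bias per neuron

out = layer(torch.ones(1, 3))      # a single example with 3 features

print(weight_shape)  # torch.Size([2, 3])
print(bias_shape)    # torch.Size([2])
print(out.shape)     # torch.Size([1, 2])
```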

### Computing weighted sum

$$z = XW^{T} + b$$

- It's the weighted sum of X and W plus the bias unit.
- If we have a single training example, `X` is a vector, and the weight vector `w` has as many weights as the input has features.
- We can express this via a dot product or a matrix multiplication.

In [None]:
# single training example
x = torch.tensor([[1.2, 0.5]])
x

tensor([[1.2000, 0.5000]])

In [None]:
linear = torch.nn.Linear(in_features=2, out_features=1)

In [None]:
w = linear.weight.detach() # get the weight vector from the linear layer, detached from the computation graph
b = linear.bias.detach()
print(f'shape of x is {x.shape}')
print(f'shape of w is {w.shape}')
print(f'shape of w transpose is {w.T.shape}')
z = x.matmul(w.T) + b

print(f'z is {z}')
print(f'shape of z {z.shape}')

# NOTE: this is not how we normally use the linear layer; it only shows what happens inside it

shape of x is torch.Size([1, 2])
shape of w is torch.Size([1, 2])
shape of w transpose is torch.Size([2, 1])
z is tensor([[-0.9778]])
shape of z torch.Size([1, 1])


In [None]:
# here is the usual way to use the linear layer
z = linear(x) # pass the input directly to the linear layer
z # gives the same output as above

tensor([[-0.9778]], grad_fn=<AddmmBackward0>)

### How does the linear layer handle mini-batches?


In [None]:
# suppose we have a mini-batch consisting of 3 training examples
x = torch.tensor([[1.2, 0.5],
                  [0.7, 0.8],
                  [0.1, 0.2]]) # 3 training examples

print(x) # 3 training examples with 2 features each: rows are examples, columns are features

tensor([[1.2000, 0.5000],
        [0.7000, 0.8000],
        [0.1000, 0.2000]])


In [None]:
w.T

tensor([[-0.6025],
        [ 0.5183]])

In [None]:
# to compute the net input (weighted sum), we do a matmul of x with the transpose of the weight vector w
z = x.matmul(w.T) + b

print(z)

tensor([[-0.9778],
        [-0.5210],
        [-0.4705]])


In [None]:
# The linear layer takes care of this for us, so we don't have to worry about it:
# just call linear on the input, and if the input is a batch of multiple training examples, everything is handled automatically
z = linear(x)
z # same result as above

tensor([[-0.9778],
        [-0.5210],
        [-0.4705]], grad_fn=<AddmmBackward0>)
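Rather than comparing the printed tensors by eye, the equivalence can be checked programmatically. This is a minimal sketch that creates its own layer with an arbitrary seed, so the actual values will differ from the outputs shown above; the tolerance passed to `torch.allclose` is an assumption.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability
linear = torch.nn.Linear(in_features=2, out_features=1)

x = torch.tensor([[1.2, 0.5],
                  [0.7, 0.8],
                  [0.1, 0.2]])  # mini-batch of 3 examples

with torch.no_grad():
    manual = x.matmul(linear.weight.T) + linear.bias  # z = XW^T + b by hand
    via_layer = linear(x)                             # same computation via the layer

same = torch.allclose(manual, via_layer, atol=1e-6)
print(manual.shape, same)
```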

# Implementing Logistic Regression using PyTorch

## Defining the Class

In [None]:
# define the class and inherit from torch.nn.Module
class LogisticRegression(torch.nn.Module):

  # the constructor defines the model's layers and structure
  def __init__(self, no_of_inputs):
    super().__init__() # calls the __init__ of Module, which sets up the important things needed to make training work
    self.linear = torch.nn.Linear(no_of_inputs, 1) # initialize the linear layer

  def forward(self, x):
    logits = self.linear(x) # logits is the technical term for the weighted sum
    probs = torch.sigmoid(logits) # compute the probability using the sigmoid function

    return probs


In [None]:
torch.manual_seed(1)
# Initialize the LogisticRegression Model

model = LogisticRegression(2)

In [None]:
x = torch.tensor([1.1, 2.1])

# torch.no_grad() disables building the computation graph, because we don't need gradients for prediction;
# it's best practice since it saves memory and makes prediction faster
with torch.no_grad():
  probas = model(x)
probas # 0.4033 is the predicted probability

tensor([0.4033])
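The same model also accepts a mini-batch, and the probabilities can be turned into class labels. The sketch below redefines the class so it runs on its own; the input values and the 0.5 decision threshold are assumptions for illustration, not something fixed by the notes above.

```python
import torch

class LogisticRegression(torch.nn.Module):
    def __init__(self, no_of_inputs):
        super().__init__()
        self.linear = torch.nn.Linear(no_of_inputs, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

torch.manual_seed(1)
model = LogisticRegression(2)

# made-up mini-batch: 3 examples, 2 features each
X_batch = torch.tensor([[1.1, 2.1],
                        [-0.5, 0.3],
                        [2.0, -1.0]])

with torch.no_grad():
    probas = model(X_batch)                # one probability per example

predicted_labels = (probas > 0.5).long()   # assumed threshold of 0.5
print(probas.shape, predicted_labels.shape)
```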

## Define the Dataloader class

For this we need to import the `Dataset` and `DataLoader` classes from `torch.utils.data`.


In [None]:
from torch.utils.data import Dataset, DataLoader

### MyDataset class
# a child class of the Dataset parent class
# it takes the features X and the labels y
class MyDataset(Dataset):
  def __init__(self, X, y):
    self.features = torch.tensor(X, dtype=torch.float32)
    self.labels = torch.tensor(y, dtype=torch.float32)

  def __getitem__(self, index):
    # fetches an individual training example (record) by its index
    # x holds the features and y the label of that example
    x = self.features[index]
    y = self.labels[index]

    return x, y

  def __len__(self):
    return self.labels.shape[0]
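A minimal sketch of how such a dataset might be wrapped in a `DataLoader` to produce mini-batches. The toy values in `X_train` and `y_train`, the batch size, and the shuffling are assumptions chosen for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, X, y):
        self.features = torch.tensor(X, dtype=torch.float32)
        self.labels = torch.tensor(y, dtype=torch.float32)

    def __getitem__(self, index):
        return self.features[index], self.labels[index]

    def __len__(self):
        return self.labels.shape[0]

# made-up data: 5 examples with 2 features each
X_train = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
y_train = [0, 0, 1, 1, 1]

train_ds = MyDataset(X_train, y_train)
train_dataloader = DataLoader(train_ds, batch_size=2, shuffle=True)

# collect the batch sizes seen in one pass over the data
batch_sizes = [x_batch.shape[0] for x_batch, y_batch in train_dataloader]
print(len(train_ds), batch_sizes)  # 5 [2, 2, 1]
```

With 5 examples and a batch size of 2, the last batch holds the single remaining example.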
