## 00. PyTorch Fundamentals

Resource notebook: https://www.learnpytorch.io/00_pytorch_fundamentals/

If you have a question: https://github.com/mrdbourke/pytorch-deep-learning/discussions

In [2]:
pip install torch pandas numpy matplotlib


Note: you may need to restart the kernel to use updated packages.


In [3]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
print(torch.__version__)

2.2.2


In [4]:
pip install scikit-learn


Note: you may need to restart the kernel to use updated packages.


In [5]:
import sklearn

Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at
the same time. Both libraries are known to be incompatible and this
can cause random crashes or deadlocks on Linux when loaded in the
same Python program.
Using threadpoolctl may cause crashes or deadlocks. For more
information and possible workarounds, please see
    https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md



In [6]:
from sklearn.datasets import make_circles

# Make 1000 samples
n_samples = 1000

# Create circles
X, y = make_circles(n_samples,
                    noise=0.03,
                    random_state=42)

In [7]:
pip install pandas

Note: you may need to restart the kernel to use updated packages.


In [8]:
import pandas as pd
circles = pd.DataFrame({"X1": X[:, 0], 
                        "X2": X[:, 1],
                        "label": y})
circles.head(10)

         X1        X2  label
0  0.754246  0.231481      1
1 -0.756159  0.153259      1
2 -0.815392  0.173282      1
3 -0.393731  0.692883      1
4  0.442208 -0.896723      0
5 -0.479646  0.676435      1
6 -0.013648  0.803349      1
7  0.771513  0.147760      1
8 -0.169322 -0.793456      1
9 -0.121486  1.021509      0


In [9]:
type(X), X.dtype

(numpy.ndarray, dtype('float64'))

In [10]:
import torch

In [11]:
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

X[:5], y[:5]

(tensor([[ 0.7542,  0.2315],
         [-0.7562,  0.1533],
         [-0.8154,  0.1733],
         [-0.3937,  0.6929],
         [ 0.4422, -0.8967]]),
 tensor([1., 1., 1., 1., 0.]))

In [12]:
# Split data into training and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y,
                                                    test_size=0.2, # 0.2 = 20% of data will be test & 80% will be train
                                                    random_state=42) 

In [13]:
len(X_train), len(X_test), len(y_train), len(y_test)

(800, 200, 800, 200)

# Build a model
1. Set up device-agnostic code so the model runs on a GPU if one is present
2. Construct a model
3. Define a loss function and an optimizer
4. Create a training loop and a testing loop


In [14]:
import torch
from torch import nn

# Make device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cpu'

# Create a model
1. Subclass `nn.Module`
2. Create 2 `nn.Linear()` layers
3. Define the `forward()` method


In [15]:
# 1. Construct a model that subclasses nn.Module
class CircleModelV0(nn.Module):
    def __init__(self):
        super().__init__()
        # 2. Create 2 nn.Linear layers: 2 input features -> 5 hidden units -> 1 output
        self.layer_1 = nn.Linear(in_features=2, out_features=5)
        self.layer_2 = nn.Linear(in_features=5, out_features=1)

    # 3. Define the forward() method: x -> layer_1 -> layer_2 -> output
    def forward(self, x):
        return self.layer_2(self.layer_1(x))

# 4. Create an instance of the model and send it to the target device
model_0 = CircleModelV0().to(device)
model_0

CircleModelV0(
  (layer_1): Linear(in_features=2, out_features=5, bias=True)
  (layer_2): Linear(in_features=5, out_features=1, bias=True)
)

In [16]:
# Another way to create the same model without writing a subclass: nn.Sequential
model_0= nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1)
).to(device)
model_0

Sequential(
  (0): Linear(in_features=2, out_features=5, bias=True)
  (1): Linear(in_features=5, out_features=1, bias=True)
)

In [17]:
model_0.state_dict()

OrderedDict([('0.weight',
              tensor([[-0.0146,  0.6007],
                      [-0.6083, -0.4471],
                      [ 0.0680,  0.1780],
                      [-0.4916,  0.4064],
                      [ 0.5202, -0.1994]])),
             ('0.bias', tensor([-0.6684, -0.0918,  0.1690,  0.3457, -0.0747])),
             ('1.weight',
              tensor([[ 0.4268, -0.0371,  0.4377, -0.1859, -0.3906]])),
             ('1.bias', tensor([0.3256]))])

# Set up loss function
1. `loss_fn` -> two options
 a. `nn.BCELoss` -> requires the model output to go through a sigmoid activation function first; the resulting probabilities are then passed to the loss
 b. `nn.BCEWithLogitsLoss` -> combines the sigmoid activation function with BCE loss, so it takes raw logits directly
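
To make the difference between the two options concrete, here is a minimal sketch comparing them on the same inputs. The logits and targets are made-up values used only for illustration. Both routes should give the same loss up to floating-point error; the PyTorch documentation notes that `nn.BCEWithLogitsLoss` is the more numerically stable of the two.

```python
import torch
from torch import nn

# Hypothetical logits and targets, chosen only for illustration
logits = torch.tensor([0.8, -1.2, 0.3, 2.0])
targets = torch.tensor([1.0, 0.0, 0.0, 1.0])

# Option a: apply sigmoid ourselves, then pass probabilities to nn.BCELoss
loss_bce = nn.BCELoss()(torch.sigmoid(logits), targets)

# Option b: pass raw logits straight to nn.BCEWithLogitsLoss
loss_bce_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_bce.item(), loss_bce_logits.item())

# The two values agree up to floating-point error
same = torch.allclose(loss_bce, loss_bce_logits, atol=1e-6)
print(same)
```

This is why the training loop below passes `y_logits` directly to `loss_fn` rather than `torch.sigmoid(y_logits)`.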

In [18]:
# loss_fn = nn.BCELoss()
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr=0.1)

In [19]:
# Calculate accuracy: percentage of predictions that match the true labels
def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()  # number of matching elements
    acc = (correct / len(y_pred)) * 100
    return acc
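
A quick sanity check of the accuracy function, as a sketch with made-up labels. It also shows a shape pitfall worth knowing about: if predictions have shape `(N, 1)` and labels have shape `(N,)`, `torch.eq` broadcasts to `(N, N)` and the count is wrong, which is one reason the training loop calls `.squeeze()` on the model output.

```python
import torch

def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()
    return (correct / len(y_pred)) * 100

# Hypothetical labels and predictions: 3 of the 5 match
y_true = torch.tensor([1., 0., 1., 1., 0.])
y_pred = torch.tensor([1., 0., 0., 1., 1.])
acc = accuracy_fn(y_true, y_pred)
print(f"Accuracy: {acc:.2f}%")

# Pitfall: a column of predictions, shape (5, 1), broadcasts against shape (5,)
y_pred_column = y_pred.unsqueeze(dim=1)
mismatched = torch.eq(y_true, y_pred_column)
print(mismatched.shape)  # torch.Size([5, 5])

# Squeezing the extra dimension restores the element-wise comparison
fixed_acc = accuracy_fn(y_true, y_pred_column.squeeze())
print(f"Accuracy after squeeze: {fixed_acc:.2f}%")
```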

# 3. Train the model
Build a training loop with the following steps:

1. Forward pass
2. Calculate the loss
3. Optimizer zero grad
4. Loss backward (backpropagation)
5. Optimizer step (gradient descent)

# 3.1 Going from raw logits -> prediction probabilities -> prediction labels

The model's raw outputs are **logits**.
We convert **logits** into prediction probabilities by passing them through an activation function (e.g. sigmoid for binary classification, softmax for multi-class classification).

We then convert the prediction probabilities into **prediction labels** by rounding (binary) or taking the **argmax()** (multi-class).
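
The two conversion routes can be sketched side by side. The logits here are made-up values, not outputs of `model_0`.

```python
import torch

# Binary case: one logit per sample -> sigmoid -> round at 0.5
binary_logits = torch.tensor([2.0, -1.0, 0.5])   # hypothetical values
binary_probs = torch.sigmoid(binary_logits)      # each value lies in (0, 1)
binary_labels = torch.round(binary_probs)        # 1 if prob > 0.5, else 0
print(binary_probs, binary_labels)

# Multi-class case: one logit per class -> softmax over classes -> argmax
multi_logits = torch.tensor([[1.0, 3.0, 0.2],
                             [0.1, 0.2, 2.5]])   # 2 samples, 3 classes (hypothetical)
multi_probs = torch.softmax(multi_logits, dim=1) # each row sums to 1
multi_labels = multi_probs.argmax(dim=1)         # index of the most probable class
print(multi_probs, multi_labels)
```

Since sigmoid and softmax preserve ordering, the labels could also be read directly from the logits (a logit above 0 for binary, the argmax of the logits for multi-class); the probabilities are mainly useful for interpretation.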

In [20]:
model_0.eval()
with torch.inference_mode():
  y_logits = model_0(X_test.to(device))[:5]
y_logits 

tensor([[ 0.3483],
        [ 0.4207],
        [ 0.0176],
        [ 0.4426],
        [-0.2065]])

In [21]:
y_test[:5]

tensor([1., 0., 1., 0., 1.])

# The model outputs raw logits, while y_test holds labels (0 or 1),
 so we first convert the logits into probabilities and then assign labels based on those probabilities

In [22]:
# Use the sigmoid activation function on the model logits to turn them into prediction probabilities
y_pred_probs=torch.sigmoid(y_logits)
y_pred_probs

tensor([[0.5862],
        [0.6036],
        [0.5044],
        [0.6089],
        [0.4486]])

In [23]:
torch.round(y_pred_probs)

tensor([[1.],
        [1.],
        [1.],
        [1.],
        [0.]])

In [24]:
# Find the predicted labels
y_preds = torch.round(y_pred_probs)

# In full: logits -> prediction probabilities -> prediction labels
y_pred_labels = torch.round(torch.sigmoid(model_0(X_test.to(device))[:5]))

print(f"\n {y_preds}")
print(f"\n  {y_pred_labels}")


 tensor([[1.],
        [1.],
        [1.],
        [1.],
        [0.]])

  tensor([[1.],
        [1.],
        [1.],
        [1.],
        [0.]], grad_fn=<RoundBackward0>)


# 3.2  Build training loop and testing loop

In [25]:
torch.manual_seed(42)

epochs=100

# put data to target device 
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

# build training loop and evaluation loop
for epoch in range(epochs):
    ## training 
    model_0.train()

    ## 1. forward pass
    y_logits=model_0(X_train).squeeze()
    y_pred=torch.round(torch.sigmoid(y_logits))
    # logits -> prediction probabilities -> prediction labels

    ## 2. Calculate loss/accuracy
    # loss = loss_fn(torch.sigmoid(y_logits), y_train)
    # (nn.BCELoss expects prediction probabilities as input)
    loss = loss_fn(y_logits, y_train)
    # nn.BCEWithLogitsLoss expects raw logits as input
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)

    # Note the argument order: loss_fn takes (predictions, targets),
    # while accuracy_fn takes (y_true, y_pred), following the scikit-learn metrics convention

    ## 3. optimizer zero grad 
    optimizer.zero_grad()

    ##4. loss_backward(backpropagation)
    loss.backward()

    ##5. optimizer step(gradient descent)
    optimizer.step()

    ### Testing
    model_0.eval()
    with torch.inference_mode():
        # 1. Forward pass
        test_logits=model_0(X_test).squeeze()
        test_pred=torch.round(torch.sigmoid(test_logits))

        #2. calculate test loss/acc
        test_loss=loss_fn(test_logits,y_test)
        test_acc=accuracy_fn(y_true=y_test, y_pred=test_pred)

    # Print progress every 10 epochs (this must sit inside the for loop)
    if epoch % 10 == 0:
        print(f"Epoch: {epoch} | Loss: {loss:.5f}, Acc: {acc:.2f}% | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%")


In [26]:
torch.manual_seed(42)
torch.cuda.manual_seed(42) 

# Set the number of epochs
epochs = 100

# Put data to target device 
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

# Build training and evaluation loop
for epoch in range(epochs):
  ### Training
  model_0.train()

  # 1. Forward pass
  y_logits = model_0(X_train).squeeze()
  y_pred = torch.round(torch.sigmoid(y_logits)) # turn logits -> pred probs -> pred labels

  # 2. Calculate loss/accuracy
  # loss = loss_fn(torch.sigmoid(y_logits), # nn.BCELoss expects prediction probabilities as input
  #                y_train)
  loss = loss_fn(y_logits, # nn.BCEWithLogitsLoss expects raw logits as input
                 y_train)
  acc = accuracy_fn(y_true=y_train, 
                    y_pred=y_pred)
  
  # 3. Optimizer zero grad
  optimizer.zero_grad()

  # 4. Loss backward (backpropagation)
  loss.backward()

  # 5. Optimizer step (gradient descent)
  optimizer.step() 

  ### Testing
  model_0.eval()
  with torch.inference_mode():
    # 1. Forward pass 
    test_logits = model_0(X_test).squeeze()
    test_pred = torch.round(torch.sigmoid(test_logits))

    # 2. Calculate test loss/acc
    test_loss = loss_fn(test_logits,
                        y_test)
    test_acc = accuracy_fn(y_true=y_test,
                           y_pred=test_pred)
  
  # Print out what's happenin'
  if epoch % 10 == 0:
    print(f"Epoch: {epoch} | Loss: {loss:.5f}, Acc: {acc:.2f}% | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%")

Epoch: 0 | Loss: 0.69336, Acc: 50.62% | Test loss: 0.69710, Test acc: 45.50%
Epoch: 10 | Loss: 0.69328, Acc: 50.75% | Test loss: 0.69679, Test acc: 46.50%
Epoch: 20 | Loss: 0.69321, Acc: 50.88% | Test loss: 0.69652, Test acc: 47.50%
Epoch: 30 | Loss: 0.69316, Acc: 50.88% | Test loss: 0.69629, Test acc: 47.50%
Epoch: 40 | Loss: 0.69312, Acc: 50.88% | Test loss: 0.69609, Test acc: 47.00%
Epoch: 50 | Loss: 0.69309, Acc: 50.88% | Test loss: 0.69592, Test acc: 47.00%
Epoch: 60 | Loss: 0.69307, Acc: 51.12% | Test loss: 0.69577, Test acc: 46.50%
Epoch: 70 | Loss: 0.69305, Acc: 51.12% | Test loss: 0.69564, Test acc: 47.50%
Epoch: 80 | Loss: 0.69303, Acc: 51.25% | Test loss: 0.69553, Test acc: 47.00%
Epoch: 90 | Loss: 0.69302, Acc: 51.12% | Test loss: 0.69543, Test acc: 47.50%


## 5. How to improve the model

1. Add more layers
2. Add more hidden units (e.g. go from 5 hidden units to 10)
3. Fit for longer (more epochs)
4. Change the activation function
5. Change the learning rate
6. Change the loss function

These options all work from the model's perspective: they change the model and its training setup, not the data. Values like these that we set ourselves are known as **hyperparameters**.
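
As a sketch of options 1, 2 and 4 combined, the model below adds a layer, widens the hidden layers to 10 units, and places a ReLU between the linear layers. The class name `CircleModelV1` and the layer sizes are assumptions for illustration, not part of the original notebook. The motivation is that a stack of `nn.Linear` layers with no non-linear activation in between is still a linear function of its input, so on its own it can only draw a straight decision boundary, which would be consistent with the roughly 50% accuracy seen above on circular data.

```python
import torch
from torch import nn

class CircleModelV1(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, hidden_units=10):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_features=2, out_features=hidden_units),
            nn.ReLU(),  # non-linearity between the linear layers
            nn.Linear(in_features=hidden_units, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=1),
        )

    def forward(self, x):
        return self.layers(x)

torch.manual_seed(42)
model_1 = CircleModelV1()

# Check the output shape on a dummy batch of 8 samples with 2 features each
dummy_batch = torch.randn(8, 2)
out = model_1(dummy_batch)
print(out.shape)  # torch.Size([8, 1])
```

The model still outputs one raw logit per sample, so the same `nn.BCEWithLogitsLoss`, optimizer setup and training loop can be reused with `model_1.parameters()`.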



# How to improve accuracy greater than 80%

# Multi class classification

In [28]:
class BlobModel(nn.Module):
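    def __init__(self, input_features, output_features, hidden_units=8):
        """Initializes a multi-class classification model.

        Args:
            input_features (int): Number of input features to the model.
            output_features (int): Number of output features (one per class).
            hidden_units (int): Number of hidden units between layers, default 8.
        """
        super().__init__()
        self.linear_layer_stack = nn.Sequential(
            nn.Linear(in_features=input_features, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=output_features)
        )

    # forward() must be a method of the class, not nested inside __init__
    def forward(self, x):
        return self.linear_layer_stack(x)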

#instance of model
model_4=BlobModel(input_features=2,
                  output_features=4,
                  hidden_units=8).to(device)
model_4                


BlobModel(
  (linear_layer_stack): Sequential(
    (0): Linear(in_features=2, out_features=8, bias=True)
    (1): ReLU()
    (2): Linear(in_features=8, out_features=8, bias=True)
    (3): ReLU()
    (4): Linear(in_features=8, out_features=4, bias=True)
  )
)

In [29]:
# create a loss function and optimizer for multiclass model
loss_fn=nn.CrossEntropyLoss()

optimizer=torch.optim.SGD(params=model_4.parameters(),
                          lr=0.1
                          )

In [None]:
# Same pattern here as in the binary case:
# raw logits -> prediction probabilities (softmax) -> prediction labels (argmax)
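
The multi-class cells use `x_blob_train` and `y_blob_train`, but the notebook does not show how they are created. One possible way to build such a dataset, assuming scikit-learn's `make_blobs` with 2 features and 4 classes to match `BlobModel(input_features=2, output_features=4)`, is sketched below. The sample count, `cluster_std` and the `*_test` names are assumptions.

```python
import torch
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

NUM_CLASSES = 4    # matches output_features=4 in BlobModel
NUM_FEATURES = 2   # matches input_features=2 in BlobModel

# Create multi-class data (sample count and cluster_std are assumed values)
x_blob, y_blob = make_blobs(n_samples=1000,
                            n_features=NUM_FEATURES,
                            centers=NUM_CLASSES,
                            cluster_std=1.5,
                            random_state=42)

# Turn data into tensors; nn.CrossEntropyLoss expects class indices as integers (long)
x_blob = torch.from_numpy(x_blob).type(torch.float)
y_blob = torch.from_numpy(y_blob).type(torch.long)

# Split into training and test sets (80% / 20%)
x_blob_train, x_blob_test, y_blob_train, y_blob_test = train_test_split(
    x_blob, y_blob, test_size=0.2, random_state=42)

print(x_blob_train.shape, y_blob_train.shape)
print(y_blob_train[:5])
```

Before training, the tensors would also need to be moved to the target device with `.to(device)`, as was done for the circles data.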

In [31]:
# create a training and testing loop 

In [None]:
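# Assumes x_blob_train and y_blob_train already exist as tensors on the target device
epochs = 100

for epoch in range(epochs):
    ### Training
    model_4.train()

    # 1. Forward pass: logits -> prediction probabilities -> prediction labels
    y_logits = model_4(x_blob_train)
    y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1)

    # 2. Calculate loss/accuracy (nn.CrossEntropyLoss expects raw logits as input)
    loss = loss_fn(y_logits, y_blob_train)
    acc = accuracy_fn(y_true=y_blob_train,
                      y_pred=y_pred)

    # 3. Optimizer zero grad
    optimizer.zero_grad()

    # 4. Loss backward (backpropagation)
    loss.backward()

    # 5. Optimizer step (gradient descent)
    optimizer.step()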
    
    

# A few more classification metrics
1. Accuracy: a good metric, but only for balanced datasets, for example 500 points in one class and 600 in the other. With imbalanced data, say 5 data points in one class and 2000 in the other, accuracy is misleading, which is why we use precision and recall instead.

For intuition, see this blog post:
https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c
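
The imbalance problem can be seen in a small worked example. This is a sketch in plain Python with made-up predictions, using the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The helper name `binary_metrics` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here for simplicity.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0  # 0.0 by convention here
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return accuracy, precision, recall

# Imbalanced labels mirroring the text: 5 positives, 2000 negatives
y_true = [1] * 5 + [0] * 2000

# Classifier A (hypothetical): always predicts the majority class
y_pred_a = [0] * 2005
acc_a, prec_a, rec_a = binary_metrics(y_true, y_pred_a)
print(f"A: accuracy={acc_a:.4f}, precision={prec_a:.2f}, recall={rec_a:.2f}")

# Classifier B (hypothetical): finds 4 of the 5 positives, with 6 false alarms
y_pred_b = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 1994
acc_b, prec_b, rec_b = binary_metrics(y_true, y_pred_b)
print(f"B: accuracy={acc_b:.4f}, precision={prec_b:.2f}, recall={rec_b:.2f}")
```

Classifier A scores above 99% accuracy while never finding a single positive, so its recall is 0. Classifier B has slightly lower accuracy but is far more useful, which only precision and recall reveal.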