## A to Z PyTorch
(based on a tutorial by Python Engineer on YouTube)

In [None]:
import torch
import torch.nn as nn

Steps:
1. training set
    using PyTorch
    - remarks:
        - the training inputs and outputs must be 2D, one row per sample: [[1],[2]] instead of [1,2]
        - we need to know the input and output sizes
2. weight initialization
    and updates with PyTorch
    - remarks:
        - use the optimizer torch.optim.SGD, which needs the variables to optimize (here [w]) and the learning rate
    - remarks for the full-PyTorch version:
        - here we don't define w ourselves, so model.parameters() is passed to torch.optim.SGD instead
3. model prediction
    using PyTorch
    - remarks:
        - use model = nn.Linear(input_size, output_size); it needs to know the input and output sizes
        - so the forward_pass function is replaced by calling model(X)
        - **when retraining the same model, don't forget model.reset_parameters()**
4. loss
    using PyTorch
    - remarks:
        - use the torch.nn loss function torch.nn.MSELoss()
5. gradient
    calculating with PyTorch

    - important remarks:
        - variables must be torch tensors instead of NumPy arrays
        - the variable to be optimized must be created with requires_grad=True
        - loss.backward() accumulates gradients, so w.grad.zero_() must be called after each update to clear them
        - the weight update itself must not be recorded in the computational graph,
          so use one of these three options:
            1. w.requires_grad_(False)
            2. y = w.detach()
            3. the with torch.no_grad(): wrapper (what we used here)
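The gradient remarks above describe the manual version, where a bare tensor w is trained by hand. A minimal sketch of that loop, assuming the same f = 2 * x data; the iteration count of 100 is an illustrative choice, not taken from the tutorial:

```python
import torch

# training data for f(x) = 2 * x (1D is fine here because w is a single scalar)
X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)

# the variable to optimize must be a tensor with requires_grad=True
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

learning_rate = 0.01
n_iters = 100  # illustrative iteration count

for epoch in range(n_iters):
    y_pred = w * X                    # forward pass
    l = ((y_pred - Y) ** 2).mean()    # MSE loss
    l.backward()                      # autograd stores dl/dw in w.grad

    # the update itself must not be recorded in the computational graph
    with torch.no_grad():
        w -= learning_rate * w.grad

    # backward() accumulates into w.grad, so clear it before the next iteration
    w.grad.zero_()

print(f'w = {w.item():.3f}')
```

With this data each step shrinks the error w - 2 by a constant factor, so w ends up very close to 2.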
     


## General Training Pipeline in PyTorch

There are 3 steps:
1. Design the model (input size, output size, forward pass)
2. Construct the loss and optimizer
3. Training loop
    - forward pass: compute the prediction (instead of a hand-written forward function, we call the model)
    - backward pass: get the gradients
    - update the weights

    Repeat step 3 for the chosen number of iterations.
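The three steps map onto a small reusable loop. A sketch, assuming a helper named `train` (hypothetical, not part of PyTorch) and the same data and hyperparameters as the cells below; the seed is an arbitrary choice:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

def train(model, loss_fn, optimizer, X, Y, n_iters):
    """Hypothetical helper: forward pass, backward pass, weight update, repeated."""
    for epoch in range(n_iters):
        y_pred = model(X)          # forward pass
        l = loss_fn(y_pred, Y)     # loss: input first, target second
        l.backward()               # backward pass: compute the gradients
        optimizer.step()           # update the weights
        optimizer.zero_grad()      # clear the gradients for the next iteration
    return l.item()

# 1. design the model (input size 1, output size 1)
X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
Y = 2 * X
model = nn.Linear(1, 1)

# 2. construct the loss and optimizer
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 3. training loop
final_loss = train(model, loss_fn, optimizer, X, Y, n_iters=1000)
print(f'final loss = {final_loss:.6f}')
```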

In [41]:
#linear regression f = w * x where w = 2

#X = np.array([1,2,3,4], dtype=np.float32)
#Y = np.array([2,4,6,8], dtype=np.float32) # as the function is 2 * x
# to use PyTorch, the NumPy arrays are replaced by torch tensors

# training set with PyTorch
# nn.Linear expects 2D input of shape (n_samples, n_features), so each value goes in its own row
#X = torch.tensor([1,2,3,4], dtype=torch.float32)
#Y = torch.tensor([2,4,6,8], dtype=torch.float32) # as the function is 2 * x
X = torch.tensor([[1],[2],[3],[4]], dtype=torch.float32)
Y = torch.tensor([[2],[4],[6],[8]], dtype=torch.float32)
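Rather than typing the nested lists by hand, a 1D tensor can be reshaped into the (n_samples, n_features) layout. A small sketch; the names `x_1d`, `x_2d_a` and `x_2d_b` are illustrative:

```python
import torch

# 1D tensor: shape (4,), one number per sample and no feature dimension
x_1d = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

# two equivalent ways to get shape (n_samples, n_features) = (4, 1)
x_2d_a = x_1d.view(-1, 1)    # -1 lets PyTorch infer the number of rows
x_2d_b = x_1d.unsqueeze(1)   # insert a new dimension of size 1 at position 1

print(x_1d.shape, x_2d_a.shape)  # torch.Size([4]) torch.Size([4, 1])
```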

In [42]:
# weight initialization with PyTorch
# (manual version: w needs gradients, so it was created with requires_grad=True)
# w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)
# with nn.Linear, w no longer needs to be initialized by hand

n_samples, n_features = X.shape   # 4 by 1, 4 samples and 1 feature for each
print(n_samples, n_features)

X_test = torch.tensor([5], dtype=torch.float32)  # a single sample with 1 feature
input_size = n_features
output_size = Y.shape[1]  # 1 target value per sample (equal to n_features here)

model = nn.Linear(input_size, output_size)

4 1
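To make "the model replaces the forward function" concrete, the sketch below checks that `nn.Linear` computes x @ W^T + b with a weight of shape (out_features, in_features). The data matches the cells above; the comparison itself is only illustrative:

```python
import torch
import torch.nn as nn

X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
model = nn.Linear(1, 1)

# nn.Linear stores weight with shape (out_features, in_features) and bias with shape (out_features,)
print(model.weight.shape, model.bias.shape)  # torch.Size([1, 1]) torch.Size([1])

# its forward pass computes x @ W^T + b
manual = X @ model.weight.T + model.bias
print(torch.allclose(model(X), manual))  # True
```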


In [43]:
# model prediction: replaced by model(X), so the hand-written forward pass is no longer needed
#def forward_pass(x):
#    return w * x

In [44]:
# loss calculation: replaced by nn.MSELoss() below
#def loss(y_gold,y_pred):
#    return((y_pred- y_gold)**2).mean()


In [45]:
# gradient calculation: not needed, autograd computes it in l.backward()
#def gradient(x,y_gold,y_pred):
#    return np.mean(2*x*(y_pred-y_gold))
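For f = w * x and the loss mean((w*x - y)^2), the hand-derived gradient is dl/dw = mean(2 * x * (w*x - y)). A quick sketch comparing it with what autograd puts in `w.grad`, using the 1D data from the earlier manual version:

```python
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, requires_grad=True)

# autograd: build the loss and let backward() compute dl/dw
l = ((w * X - Y) ** 2).mean()
l.backward()

# hand-derived gradient: dl/dw = mean(2 * x * (w*x - y))
with torch.no_grad():
    manual_grad = (2 * X * (w * X - Y)).mean()

print(w.grad.item(), manual_grad.item())  # -30.0 -30.0
```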

In [46]:
print(f'Prediction before training:f(5)={model(X_test).item():.3f}')

Prediction before training:f(5)=3.274


In [47]:
learning_rate = 0.01
n_iters = 10

In [48]:
# loss
loss = nn.MSELoss()

#optimizer
#optimizer = torch.optim.SGD([w], lr=learning_rate)
# there is no explicit w anymore, so pass all of the model's parameters (weight and bias)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
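To see what `nn.MSELoss` and `optimizer.step()` do, the sketch below checks them against the hand-written formulas: the mean of the squared errors, and p - lr * p.grad for plain SGD without momentum. The seed is an arbitrary choice for reproducibility:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed
X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
Y = 2 * X
model = nn.Linear(1, 1)
loss = nn.MSELoss()
learning_rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

y_pred = model(X)
l = loss(y_pred, Y)
# MSELoss with the default reduction='mean' is the mean of the squared errors
assert torch.allclose(l, ((y_pred - Y) ** 2).mean())

l.backward()
# expected result of one plain SGD step: p - lr * p.grad
expected = [(p - learning_rate * p.grad).detach().clone() for p in model.parameters()]
optimizer.step()
for p, e in zip(model.parameters(), expected):
    assert torch.allclose(p, e)
print('MSELoss and the SGD step match the manual formulas')
```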

In [49]:
for epoch in range(n_iters):
    # prediction = forward_pass
    #y_pred = forward_pass(X)
    y_pred = model(X)
    
    # loss
    l = loss(y_pred, Y)  # nn.MSELoss expects (input, target)
    
    # grad = backward pass
    l.backward() #dl/dw
    
    # update weights 
    # the update must not be recorded in the computational graph, so the manual version wrapped it in torch.no_grad()
    #with torch.no_grad():
    #    w -= learning_rate * w.grad # gradient descent step; autograd stored dl/dw in w.grad
    optimizer.step()
        
        
    # zero the gradients: backward() accumulates them in .grad
    #w.grad.zero_()
    optimizer.zero_grad()
    
    if epoch % 1 == 0: # we want to print every step
        [w, b] = model.parameters()  # weight and bias
        print(f'epoch {epoch+1}: w = {w[0][0].item():.3f}, loss = {l.item():.8f}')
        
print(f'Prediction after training:f(5)={model(X_test).item():.3f}')

epoch 1: w = 1.021, loss = 15.74764252
epoch 2: w = 1.201, loss = 10.93159485
epoch 3: w = 1.351, loss = 7.58981133
epoch 4: w = 1.477, loss = 5.27099276
epoch 5: w = 1.581, loss = 3.66198397
epoch 6: w = 1.667, loss = 2.54550076
epoch 7: w = 1.740, loss = 1.77076936
epoch 8: w = 1.800, loss = 1.23317313
epoch 9: w = 1.850, loss = 0.86011970
epoch 10: w = 1.891, loss = 0.60123926
Prediction after training:f(5)=9.091


In [54]:
# in the previous part there were 10 iterations; let's see what happens with 1000 of them
#w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

# IMPORTANT: re-initialize the weights so that training starts from scratch
model.reset_parameters()

learning_rate = 0.01
n_iters = 1000

# re-create the optimizer (essential in the manual version, where the new w is a different tensor from
# the one the old optimizer holds, so it would stay 0; reset_parameters() re-initializes in place)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for epoch in range(n_iters):
    # prediction = forward_pass
    #y_pred = forward_pass(X)
    y_pred = model(X)
    
    # loss
    l = loss(y_pred, Y)  # nn.MSELoss expects (input, target)
    
    # grad
    l.backward()
    
    # update weights
    #with torch.no_grad():
    #    w -= learning_rate * w.grad # gradient descent step; autograd stored dl/dw in w.grad
    optimizer.step()
    
    optimizer.zero_grad()
    
    if epoch % 100 == 0: # print every 100 steps
        # unpack the parameters
        [w, b] = model.parameters() # w is a 2D tensor of shape (1, 1)
        print(f'epoch {epoch+1}: w = {w[0][0].item():.3f}, loss = {l:.8f}') # so we have w[0][0]
        
print(f'Prediction after training:f(5)={model(X_test).item():.3f}')

epoch 1: w = 0.955, loss = 11.18599129
epoch 101: w = 1.901, loss = 0.01418898
epoch 201: w = 1.927, loss = 0.00778957
epoch 301: w = 1.946, loss = 0.00427640
epoch 401: w = 1.960, loss = 0.00234769
epoch 501: w = 1.970, loss = 0.00128885
epoch 601: w = 1.978, loss = 0.00070756
epoch 701: w = 1.984, loss = 0.00038845
epoch 801: w = 1.988, loss = 0.00021325
epoch 901: w = 1.991, loss = 0.00011707
Prediction after training:f(5)=9.986
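Whether the optimizer has to be re-created after `model.reset_parameters()` depends on whether the parameter objects themselves change. A quick check, assuming the current `nn.Linear` implementation, which re-initializes weight and bias in place:

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

weight_before = model.weight  # keep a reference to the Parameter object itself

model.reset_parameters()  # re-initializes the values in place

print(model.weight is weight_before)                           # True: same object
print(optimizer.param_groups[0]['params'][0] is model.weight)  # True: the optimizer still tracks it
```

Since plain SGD without momentum keeps no per-parameter state, reusing the old optimizer would also work here; re-creating it is simply the safer habit.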


## What is in torch.nn.Linear?

In [55]:
# a custom model class that wraps nn.Linear; it behaves like using nn.Linear directly

class LinearRegression(nn.Module): # subclasses nn.Module

    def __init__(self, input_dim, output_dim):
        super().__init__()
        
        #define layers
        self.lin = nn.Linear(input_dim, output_dim)
        
        
    def forward(self, x):  # must be named forward so that model_scratch(x) calls it
        return self.lin(x)
    
model_scratch = LinearRegression(input_size, output_size)
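Calling a module, as in `model_scratch(X)`, runs its method named `forward`, which is why the method must have exactly that name. A small sketch contrasting the two cases; the class `WrongName` is hypothetical and exists only to show the error:

```python
import torch
import torch.nn as nn

class LinearRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.lin = nn.Linear(input_dim, output_dim)

    def forward(self, x):       # called by model(x)
        return self.lin(x)

class WrongName(nn.Module):     # hypothetical, for illustration only
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(1, 1)

    def forward_pass(self, x):  # NOT called by model(x)
        return self.lin(x)

X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
print(LinearRegression(1, 1)(X).shape)  # torch.Size([4, 1])

raised = False
try:
    WrongName()(X)
except NotImplementedError:
    raised = True
print('missing forward raises NotImplementedError:', raised)  # True
```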

In [39]:

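# train for 1000 iterations
# note: as written, this continues training `model` (already trained above), not `model_scratch`,
# which is why the loss starts close to zero; to train the custom class, use model_scratch in its place
#w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)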
learning_rate = 0.01
n_iters = 1000

# re-create the optimizer (essential in the manual version, where the new w is a different tensor from
# the one the old optimizer holds, so it would stay 0; reset_parameters() re-initializes in place)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for epoch in range(n_iters):
    # prediction = forward_pass
    #y_pred = forward_pass(X)
    y_pred = model(X)
    
    # loss
    l = loss(y_pred, Y)  # nn.MSELoss expects (input, target)
    
    # grad
    l.backward()
    
    # update weights
    #with torch.no_grad():
    #    w -= learning_rate * w.grad # gradient descent step; autograd stored dl/dw in w.grad
    optimizer.step()
    
    optimizer.zero_grad()
    
    if epoch % 100 == 0: # print every 100 steps
        # unpack the parameters
        [w, b] = model.parameters() # w is a 2D tensor of shape (1, 1)
        print(f'epoch {epoch+1}: w = {w[0][0].item():.3f}, loss = {l:.8f}') # so we have w[0][0]
        
print(f'Prediction after training:f(5)={model(X_test).item():.3f}')

epoch 1: w = 1.988, loss = 0.00019901
epoch 101: w = 1.991, loss = 0.00010925
epoch 201: w = 1.994, loss = 0.00005998
epoch 301: w = 1.995, loss = 0.00003293
epoch 401: w = 1.996, loss = 0.00001808
epoch 501: w = 1.997, loss = 0.00000992
epoch 601: w = 1.998, loss = 0.00000545
epoch 701: w = 1.999, loss = 0.00000299
epoch 801: w = 1.999, loss = 0.00000164
epoch 901: w = 1.999, loss = 0.00000090
Prediction after training:f(5)=9.999
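The cell above passes `model.parameters()` to the optimizer and calls `model(X)`, so it keeps training the earlier `nn.Linear` model rather than `model_scratch`. A sketch that trains the custom class from a fresh initialization instead, assuming the same data, learning rate and 1000 iterations; the seed is an arbitrary choice:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, for reproducibility

class LinearRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.lin = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.lin(x)

X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)
X_test = torch.tensor([5], dtype=torch.float32)

model_scratch = LinearRegression(1, 1)
loss = nn.MSELoss()
# the optimizer must receive the parameters of the model being trained
optimizer = torch.optim.SGD(model_scratch.parameters(), lr=0.01)

for epoch in range(1000):
    y_pred = model_scratch(X)  # calls forward()
    l = loss(y_pred, Y)
    l.backward()
    optimizer.step()
    optimizer.zero_grad()

print(f'Prediction after training: f(5) = {model_scratch(X_test).item():.3f}')
```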
