# PyTorch Tutorials 2.0
07/05/2023 <br>
This notebook walks through the official [PyTorch Tutorials](https://pytorch.org/tutorials/beginner/basics/intro.html#learn-the-basics) to help readers get familiar with the latest PyTorch.


Content:
1. [Tensors](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html#tensors)
2. [Datasets and DataLoaders](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#datasets-dataloaders)
3. [Build Model](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html#build-the-neural-network)
4. [Automatic Differentiation](https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html#automatic-differentiation-with-torch-autograd) & [Optimization Loop](https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html#optimizing-model-parameters)
5. [Save, Load and Use Model](https://pytorch.org/tutorials/beginner/basics/saveloadrun_tutorial.html#save-and-load-the-model)

## [Tensors](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html#tensors)
Tensors are a specialized data structure that are very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

Tensors are similar to NumPy’s ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and NumPy arrays can often share the same underlying memory, eliminating the need to copy data. Tensors are also optimized for automatic differentiation. 

In a nutshell, tensors are the basic building blocks of PyTorch models:
the input data must be converted to tensors, and the parameters of a model are themselves tensors.
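
The memory sharing mentioned above can be checked with a minimal sketch. The variable names are illustrative and not part of the original notebook. It relies on `torch.from_numpy` reusing the array's buffer, while `torch.tensor` copies the data.

```python
import numpy as np
import torch

shared_array = np.array([1, 2, 3])
shared_tensor = torch.from_numpy(shared_array)  # shares memory with shared_array
copied_tensor = torch.tensor(shared_array)      # makes an independent copy

shared_array[0] = 100  # modify the NumPy array in place

print(shared_tensor.tolist())  # [100, 2, 3] -> the change is visible through the tensor
print(copied_tensor.tolist())  # [1, 2, 3]   -> the copy is unaffected
```

If independent data is needed, copying (as `torch.tensor` does here) avoids surprises from in-place changes on either side.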

In [1]:
import torch #PyTorch package
import numpy as np #NumPy package

### list to tensors

In [2]:
data = [[1, 2],[3, 4]] 
tensor_data=torch.tensor(data) #we can convert a list into a tensor

In [3]:
tensor_data

tensor([[1, 2],
        [3, 4]])

In [4]:
# data = [[1, 2],[3, 4],[1]]
# tensor_data=torch.tensor(data) #raises ValueError: every row must have the same length (no ragged lists)

In [5]:
data = [1,2]  
tensor_data=torch.tensor(data)
tensor_data

tensor([1, 2])

### numpy array to tensors

In [6]:
np_array = np.array([[1, 2],[3, 4]])
x_np = torch.from_numpy(np_array)
x_np

tensor([[1, 2],
        [3, 4]])

### create tensors from another tensor (based on its shape & datatype)

In [7]:
torch.ones_like(x_np)

tensor([[1, 1],
        [1, 1]])

In [8]:
torch.rand_like(x_np, dtype=torch.float)

tensor([[0.4807, 0.2131],
        [0.2904, 0.0420]])

### create tensors based on shape info

In [9]:
shape = (2,3,)
torch.rand(shape)

tensor([[0.1027, 0.0010, 0.1098],
        [0.3816, 0.1311, 0.9208]])

In [10]:
torch.ones(shape)

tensor([[1., 1., 1.],
        [1., 1., 1.]])

In [11]:
torch.zeros(shape)

tensor([[0., 0., 0.],
        [0., 0., 0.]])

### attributes of a tensor

In [12]:
x_np

tensor([[1, 2],
        [3, 4]])

shape

In [13]:
x_np.shape

torch.Size([2, 2])

data type

In [14]:
x_np.dtype

torch.int64

device tensor is stored on

In [15]:
x_np.device

device(type='cpu')

### operations on tensors

move to other device if available

In [16]:
device = 'cpu' #or 'cuda' if GPUs are available
x_np.to(device) #.to() returns a tensor on the requested device (the same tensor if it is already there)

tensor([[1, 2],
        [3, 4]])

check if a GPU (CUDA device) is available

In [17]:
torch.cuda.is_available()

False
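
The two cells above can be combined into a device-agnostic pattern. This is only a sketch with illustrative variable names. It picks the GPU when `torch.cuda.is_available()` returns `True` and falls back to the CPU otherwise, so the same code runs on either kind of machine.

```python
import torch

# Choose the device once and reuse it everywhere
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

cpu_tensor = torch.ones(2, 2)                   # created on the CPU by default
moved_tensor = cpu_tensor.to(device)            # a tensor on the chosen device
direct_tensor = torch.zeros(2, 2, device=device)  # or create it there directly

print(moved_tensor.device.type == device.type)  # True
```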

### indexing and slicing

In [18]:
tensor = torch.ones(4, 4)
print(f'Input tensors:\n{tensor}')
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}")
print(f"Last column: {tensor[:, -1]}")

Input tensors:
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
First row: tensor([1., 1., 1., 1.])
First column: tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])


### modify the tensor

In [19]:
tensor[:,1] = 0 #change the second column to a zero column
print(tensor)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
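
One detail worth knowing when modifying tensors: basic slicing returns a view that shares storage with the original tensor, so in-place changes to the slice also change the source. The sketch below uses illustrative names and contrasts a view with an explicit copy made by `clone()`.

```python
import torch

base = torch.ones(4, 4)

column_view = base[:, 1]          # a view: shares storage with base
column_view.zero_()               # in-place change propagates to base
print(base[:, 1].tolist())        # [0.0, 0.0, 0.0, 0.0]

column_copy = base[:, 0].clone()  # clone() makes an independent copy
column_copy.zero_()               # base is not affected
print(base[:, 0].tolist())        # [1.0, 1.0, 1.0, 1.0]
```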


### concatenation* (important operations)

concatenate by rows

In [20]:
torch.cat([tensor, tensor, tensor], dim=0)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

concatenate by columns

In [21]:
torch.cat([tensor, tensor, tensor], dim=1)

tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
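
To make the effect of `dim` concrete, the sketch below compares the resulting shapes. `torch.stack` is not used in the notebook and is included here only for contrast: it adds a new dimension instead of extending an existing one.

```python
import torch

t = torch.ones(4, 4)

by_rows = torch.cat([t, t, t], dim=0)    # extends dimension 0
by_cols = torch.cat([t, t, t], dim=1)    # extends dimension 1
stacked = torch.stack([t, t, t], dim=0)  # creates a new leading dimension

print(tuple(by_rows.shape))  # (12, 4)
print(tuple(by_cols.shape))  # (4, 12)
print(tuple(stacked.shape))  # (3, 4, 4)
```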

### arithmetic operations* (important operations)

#### transpose

In [22]:
print(f'Input tensors:\n{tensor}')
print(f'Transposed tensors:\n{tensor.T}')

Input tensors:
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
Transposed tensors:
tensor([[1., 1., 1., 1.],
        [0., 0., 0., 0.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])


#### element-wise operations

In [23]:
tensor+tensor.T

tensor([[2., 1., 2., 2.],
        [1., 0., 1., 1.],
        [2., 1., 2., 2.],
        [2., 1., 2., 2.]])

In [24]:
tensor*tensor.T

tensor([[1., 0., 1., 1.],
        [0., 0., 0., 0.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

In [25]:
tensor.mul(tensor.T) # = tensor*tensor.T

tensor([[1., 0., 1., 1.],
        [0., 0., 0., 0.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

#### matrix multiplication

In [26]:
tensor1,tensor2=torch.rand((3,4)),torch.rand((4,2))
print(f'Input tensors:\n{tensor1}\n{tensor2}')
print(f'\nMatrix multiplication:\n{tensor1.matmul(tensor2)}')

Input tensors:
tensor([[0.9151, 0.4169, 0.7529, 0.3129],
        [0.5968, 0.6585, 0.7329, 0.8763],
        [0.1691, 0.8825, 0.9527, 0.9169]])
tensor([[0.5196, 0.5374],
        [0.3964, 0.2188],
        [0.3953, 0.5543],
        [0.4013, 0.6413]])

Matrix multiplication:
tensor([[1.0639, 1.2011],
        [1.2124, 1.4330],
        [1.1822, 1.4000]])
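
The shape rule behind the result above is (3, 4) times (4, 2) gives (3, 2): the inner dimensions must match. A short sketch with illustrative names shows that the `@` operator is equivalent to `matmul` and that each entry is a row-by-column dot product.

```python
import torch

a = torch.rand(3, 4)
b = torch.rand(4, 2)

product = a @ b                        # same as a.matmul(b)
print(tuple(product.shape))            # (3, 2)

# Entry [0, 0] is the dot product of row 0 of a and column 0 of b
manual_entry = (a[0] * b[:, 0]).sum()
print(torch.allclose(product[0, 0], manual_entry, atol=1e-6))  # True
```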


#### dot product/scalar product/inner product (vectors)

In [27]:
v1,v2=torch.rand((3)),torch.rand((3))
v1,v2

(tensor([0.8564, 0.0744, 0.4400]), tensor([0.4661, 0.2564, 0.3391]))

In [28]:
v1.dot(v2)

tensor(0.5674)
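
The dot product is the sum of element-wise products. The sketch below uses fixed vectors (chosen here for illustration, not taken from the notebook) so the result is easy to verify by hand: 1*4 + 2*5 + 3*6 = 32.

```python
import torch

u = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([4.0, 5.0, 6.0])

print(u.dot(w).item())      # 32.0
print((u * w).sum().item())  # 32.0, the same value computed element-wise
```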

#### row/column-wise sum/mean/std/min/max

In [29]:
print(f'Input tensors:\n{tensor1}\n')

print(f'row-wise sum:\n{tensor1.sum(dim=1)}\n')

print(f'row-wise mean:\n{tensor1.mean(dim=1)}\n')

print(f'column-wise std:\n{tensor1.std(dim=0)}\n')



Input tensors:
tensor([[0.9151, 0.4169, 0.7529, 0.3129],
        [0.5968, 0.6585, 0.7329, 0.8763],
        [0.1691, 0.8825, 0.9527, 0.9169]])

row-wise sum:
tensor([2.3979, 2.8644, 2.9212])

row-wise mean:
tensor([0.5995, 0.7161, 0.7303])

column-wise std:
tensor([0.3744, 0.2328, 0.1215, 0.3376])



In [30]:
print(f'row-wise min:\n{tensor1.min(dim=1)}\n')

print(f'column-wise max:\n{tensor1.max(dim=0)}\n')

# note that min/max with a dim argument return both the values and their indices along that dimension (the argmin/argmax)

row-wise min:
torch.return_types.min(
values=tensor([0.3129, 0.5968, 0.1691]),
indices=tensor([3, 0, 0]))

column-wise max:
torch.return_types.max(
values=tensor([0.9151, 0.8825, 0.9527, 0.9169]),
indices=tensor([0, 2, 2, 2]))
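
The returned object can be unpacked or accessed by field name. The sketch below copies the values of `tensor1` from the output above into a new, illustratively named tensor, and checks that the indices agree with `argmin`.

```python
import torch

# Values copied from the tensor1 output shown above
example = torch.tensor([[0.9151, 0.4169, 0.7529, 0.3129],
                        [0.5968, 0.6585, 0.7329, 0.8763],
                        [0.1691, 0.8825, 0.9527, 0.9169]])

row_min_values, row_min_indices = example.min(dim=1)  # tuple-style unpacking
print(row_min_indices.tolist())                       # [3, 0, 0]

col_max = example.max(dim=0)                          # access by field name
print(col_max.indices.tolist())                       # [0, 2, 2, 2]

# The indices are exactly what argmin returns
print(torch.equal(row_min_indices, example.argmin(dim=1)))  # True
```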



#### aggregating all values of a tensor into one value

In [31]:
tensor1

tensor([[0.9151, 0.4169, 0.7529, 0.3129],
        [0.5968, 0.6585, 0.7329, 0.8763],
        [0.1691, 0.8825, 0.9527, 0.9169]])

In [32]:
tensor1.sum()

tensor(8.1835)

In [33]:
tensor1.mean()

tensor(0.6820)

In [34]:
tensor1.std()

tensor(0.2604)

In [35]:
tensor1.min()

tensor(0.1691)

In [36]:
tensor1.max()

tensor(0.9527)
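
Each aggregation above returns a zero-dimensional tensor. `item()` converts it to a plain Python number. The sketch below, on a small hand-made tensor, also checks that `std()` by default uses the sample formula (dividing by n - 1).

```python
import math
import torch

values = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

total = values.sum().item()  # plain Python float
print(total)                 # 10.0

# Sample standard deviation computed by hand: divide by (n - 1)
n = values.numel()
sample_variance = ((values - values.mean()) ** 2).sum().item() / (n - 1)
print(math.isclose(values.std().item(), math.sqrt(sample_variance), rel_tol=1e-5))  # True
```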

## [Datasets & Dataloaders](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#datasets-dataloaders)

Code for processing data samples can get messy and hard to maintain; we ideally want our dataset code to be decoupled from our model training code for better readability and modularity. PyTorch provides two data primitives: torch.utils.data.DataLoader and torch.utils.data.Dataset that allow you to use pre-loaded datasets as well as your own data. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.

The primary purpose of 'DataLoader' is to facilitate efficient data loading and preprocessing. For example, the DataLoader provides an iterable interface, allowing you to load and process data in a streaming fashion rather than loading the entire dataset into memory all at once. This is particularly useful when working with datasets that are too large to fit in memory. The data loader fetches data on-the-fly as needed, reducing the memory footprint.
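
Before loading a real file, the `Dataset` and `DataLoader` contract can be sketched on synthetic data. `ToyDataset` is a hypothetical class written for this illustration. A map-style dataset only needs `__len__` and `__getitem__`, and the loader handles batching.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical in-memory dataset: 10 samples with 3 features each."""
    def __init__(self):
        self.features = torch.arange(30, dtype=torch.float).reshape(10, 3)
        self.labels = torch.arange(10, dtype=torch.float).reshape(10, 1)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        # Return one (features, label) pair; the DataLoader stacks them into batches
        return self.features[idx], self.labels[idx]

toy_loader = DataLoader(ToyDataset(), batch_size=4, shuffle=False)
batch_sizes = [len(batch_features) for batch_features, _ in toy_loader]
print(batch_sizes)  # [4, 4, 2] -> the last batch holds the remaining samples
```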

### load your own data

This demo uses a [public dataset](https://github.com/Sutanoy/Public-Regression-Datasets/blob/main/Bank_Marketing.csv).

In [37]:
import pandas as pd
from torch.utils.data import Dataset, DataLoader, random_split

class CustomDataset(Dataset):
    def __init__(self, file_path,features,target,delimiter=';'):
        self.data = pd.read_csv(file_path,delimiter=delimiter) #use the delimiter argument instead of a hard-coded ';'
        self.X=self.data[features]
        self.y=self.data[target].replace(('yes', 'no'), (1, 0)) #converted the binary category to 0 and 1
    
    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):  
        if isinstance(idx, torch.Tensor):
            idx = idx.tolist()
        return torch.from_numpy(self.X.iloc[idx].values).type(torch.float), torch.from_numpy(self.y.iloc[idx].values).type(torch.float)

In [38]:
features=['age', 'balance', 'day','duration', 'pdays','previous'] #picked some numerical features
target=['y']
dataset=CustomDataset(file_path='./Bank_Marketing.csv',features=features,target=target,delimiter=';')

In [39]:
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train_size,test_size

(36168, 9043)

In [40]:
trainset, testset = random_split(dataset, [train_size, test_size])

In [41]:
# Dataloaders
batch_size=64
trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True)
testloader = DataLoader(testset, batch_size=batch_size, shuffle=False)

In [42]:
# Use gpu if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [43]:
train_features, train_labels = next(iter(trainloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")

Feature batch shape: torch.Size([64, 6])
Labels batch shape: torch.Size([64, 1])


<b>note</b>: in the custom dataset class, we applied a simple conversion to the target "y" (yes/no to 1/0) and selected only numerical features. In practice, your dataset might contain all kinds of features, and you will need to preprocess them (for example, one-hot encoding or embeddings for categorical columns) so that every value is either a float or an int.
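
As a rough illustration of the one-hot encoding mentioned in the note, the sketch below encodes a made-up categorical column. The `jobs` values and all variable names are hypothetical and are not read from the dataset. `torch.nn.functional.one_hot` expects integer class indices, so the strings are mapped to indices first.

```python
import torch
import torch.nn.functional as F

jobs = ["admin", "technician", "admin", "services"]  # hypothetical categorical column

categories = sorted(set(jobs))  # ['admin', 'services', 'technician']
category_to_index = {name: i for i, name in enumerate(categories)}

job_indices = torch.tensor([category_to_index[job] for job in jobs])
job_one_hot = F.one_hot(job_indices, num_classes=len(categories)).float()

print(job_one_hot.tolist())
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

The resulting float columns can then be concatenated with the numerical features, for example with `torch.cat(..., dim=1)`.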

## [Build Model](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html#build-the-neural-network)

In [44]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader

The template of the model looks as follows:

    class ClassName(nn.Module):
        def __init__(self,inputDim,outputDim):
            super().__init__()
            ##define the components of your model
        def forward(self, x):
            ##show how the input flows through the model using the components defined above
            return y

I used three examples to show how to use the template:
1. linear regression
2. logistic regression
3. a simple neural network

### linear regression

In [45]:
class LinearRegression(nn.Module):
    def __init__(self,inputDim,outputDim):
        super().__init__()
        self.linear = nn.Linear(inputDim, outputDim)

    def forward(self, x):
        y=self.linear(x)
        return y

[Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear) applies a linear transformation to the incoming data: y = xA^T + b, where A is the weight matrix and b is the bias.<br>
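
The transformation can be verified by hand. In the sketch below (illustrative names, random weights), the layer output is compared with `x @ weight.T + bias`. Note that `nn.Linear` stores its weight with shape `(out_features, in_features)`.

```python
import torch
from torch import nn

torch.manual_seed(0)  # only to make the sketch repeatable

layer = nn.Linear(6, 1)
x = torch.rand(2, 6)  # a batch of 2 samples with 6 features

manual_output = x @ layer.weight.T + layer.bias  # the same affine map, written out

print(tuple(layer.weight.shape))  # (1, 6)
print(tuple(layer.bias.shape))    # (1,)
print(torch.allclose(layer(x), manual_output, atol=1e-6))  # True
```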

In [46]:
# Use gpu if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [47]:
LinearRegressionModel = LinearRegression(inputDim=6,outputDim=1).to(device)
print(LinearRegressionModel)

LinearRegression(
  (linear): Linear(in_features=6, out_features=1, bias=True)
)


In [48]:
X = torch.rand(1, 6, device=device)
X

tensor([[0.7313, 0.5935, 0.2909, 0.3071, 0.4727, 0.1642]])

In [49]:
LinearRegressionModel(X)

tensor([[0.1848]], grad_fn=<AddmmBackward0>)

### logistic regression

In [50]:
class LogisticRegression(nn.Module):
    def __init__(self,inputDim,outputDim):
        super().__init__()
        self.linear = nn.Linear(inputDim, outputDim)
        self.lastLayer=nn.Sigmoid()
        
    def forward(self, x):
        x=self.linear(x)
        y = self.lastLayer(x)
        return y

[Sigmoid](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html#torch.nn.Sigmoid) applies the element-wise sigmoid function.
<br>
Since we first apply a linear transformation and then pass the result through the sigmoid function, the model is a logistic regression.
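
For reference, the sigmoid maps any real number z to 1 / (1 + exp(-z)), a value between 0 and 1. A small sketch with hand-picked inputs checks this against `torch.sigmoid`.

```python
import torch

z = torch.tensor([-2.0, 0.0, 2.0])

manual_sigmoid = 1 / (1 + torch.exp(-z))  # the definition, written out

print(torch.allclose(torch.sigmoid(z), manual_sigmoid))  # True
print(torch.sigmoid(torch.tensor(0.0)).item())           # 0.5
```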

In [51]:
LogisticRegressionModel = LogisticRegression(inputDim=6,outputDim=1).to(device)
print(LogisticRegressionModel)

LogisticRegression(
  (linear): Linear(in_features=6, out_features=1, bias=True)
  (lastLayer): Sigmoid()
)


In [52]:
LogisticRegressionModel(X)

tensor([[0.6701]], grad_fn=<SigmoidBackward0>)

### neural network

In [53]:
class SimpleNN(nn.Module):
    def __init__(self,inputDim,outputDim):
        super().__init__()
        self.structure = nn.Sequential(
            nn.Linear(inputDim, 16),
            nn.ReLU(),
            nn.Linear(16, 4),
            nn.ReLU(),
            nn.Linear(4, outputDim),
            nn.Sigmoid()
        )
        
    def forward(self, x):
        y = self.structure(x)
        return y

[nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) is an ordered container of modules. The data is passed through all the modules in the same order as defined. 
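
The ordering guarantee can be checked by applying the modules one at a time. The sketch below uses a smaller, illustrative stack (not the exact `SimpleNN` architecture) and compares the manual pass with calling the container directly.

```python
import torch
from torch import nn

torch.manual_seed(0)  # only to make the sketch repeatable

stack = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
x = torch.rand(1, 6)

# Apply each module in the order it was registered
manual_output = x
for layer in stack:
    manual_output = layer(manual_output)

print(torch.allclose(manual_output, stack(x)))  # True
```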

In [54]:
SimpleNNModel = SimpleNN(inputDim=6,outputDim=1).to(device)
print(SimpleNNModel)

SimpleNN(
  (structure): Sequential(
    (0): Linear(in_features=6, out_features=16, bias=True)
    (1): ReLU()
    (2): Linear(in_features=16, out_features=4, bias=True)
    (3): ReLU()
    (4): Linear(in_features=4, out_features=1, bias=True)
    (5): Sigmoid()
  )
)


In [55]:
SimpleNNModel(X) #the sigmoid output is a single value in (0, 1), read as the probability of the positive class

tensor([[0.4196]], grad_fn=<SigmoidBackward0>)

[Here](https://pytorch.org/docs/stable/nn.html) is the full list of modules (layers, activations, loss functions) you can use to build your model.

## [Autograd](https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html#automatic-differentiation-with-torch-autograd) & [Optimization](https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html#optimizing-model-parameters) (training)

All the models defined above are freshly initialized: their parameters are random, so although each model returns a prediction, its parameters have not yet been fitted to the training data. <br>

To make the model learn from the data, we typically use the gradient of the loss function with respect to the parameters to adjust them.

To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd. It supports automatic computation of gradients for any computational graph.
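
A one-parameter example shows what autograd computes. The numbers below are chosen for illustration: with loss = (w*x - y)^2, the derivative with respect to w is 2*(w*x - y)*x, which for w=1, x=2, y=3 equals 2*(-1)*2 = -4.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)  # the parameter we want a gradient for
x = torch.tensor(2.0)
y = torch.tensor(3.0)

loss = (w * x - y) ** 2  # (1*2 - 3)^2 = 1
loss.backward()          # autograd fills in w.grad

print(loss.item())    # 1.0
print(w.grad.item())  # -4.0
```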

In [56]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Move the batch to the same device as the model
        X, y = X.to(device), y.to(device)
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        
        loss, current = loss.item(), (batch + 1) * len(X)
        if batch % 100 == 0:
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
        
def test_loop(dataloader, model, loss_fn, threshold=0.5):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device) # move the batch to the same device as the model
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            yhat=(pred >= threshold).long()
            correct += (yhat == y).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
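
The call to `optimizer.zero_grad()` in the training loop matters because gradients accumulate across `backward()` calls. The sketch below shows this on a single illustrative parameter, resetting the gradient by hand instead of through an optimizer.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(w * 2).backward()
first_grad = w.grad.item()        # 2.0

(w * 2).backward()                # a second backward pass adds to the stored gradient
accumulated_grad = w.grad.item()  # 4.0

w.grad.zero_()                    # reset before the next step
reset_grad = w.grad.item()        # 0.0

print(first_grad, accumulated_grad, reset_grad)  # 2.0 4.0 0.0
```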


### binary classification: loss function = Binary Cross Entropy
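
For a prediction p and a label y, binary cross entropy is -(y*log(p) + (1-y)*log(1-p)), averaged over the batch by default. The sketch below checks `nn.BCELoss` against that formula on two hand-picked predictions (the values are illustrative only).

```python
import math
import torch
from torch import nn

pred = torch.tensor([0.9, 0.2])    # predicted probabilities of the positive class
target = torch.tensor([1.0, 0.0])  # true labels

bce = nn.BCELoss()(pred, target).item()

# Manual computation: -(log(0.9) + log(1 - 0.2)) / 2, about 0.164
manual_bce = -(math.log(0.9) + math.log(1 - 0.2)) / 2

print(math.isclose(bce, manual_bce, rel_tol=1e-4))  # True
```

Because `nn.BCELoss` takes probabilities, the model must end with a sigmoid, as `SimpleNN` does.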

In [57]:
# setup the dataloader
features=['age', 'balance', 'day','duration', 'pdays','previous'] #picked some numerical features
target=['y']

dataset=CustomDataset(file_path='./Bank_Marketing.csv',features=features,target=target,delimiter=';')

train_size = int(0.75 * len(dataset))
test_size = len(dataset) - train_size

trainset, testset = random_split(dataset, [train_size, test_size])
trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True)
testloader = DataLoader(testset, batch_size=batch_size, shuffle=False)

inputs, labels = next(iter(trainloader))

In [58]:
# init the model
SimpleNNModel = SimpleNN(inputDim=6,outputDim=1).to(device)

After `pip install tensorboard`, <br>
you can launch TensorBoard by running `tensorboard --logdir=runs` in the terminal.

#### visualize your model

In [59]:
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('./runs/')
writer.add_graph(model=SimpleNNModel,input_to_model=inputs.to(device)) #the sample input must be on the same device as the model
writer.flush()

In [60]:
learning_rate = 1e-2
# batch_size is not reset here: the DataLoaders above were already created with batch_size=64
epochs = 5

loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(SimpleNNModel.parameters(), lr=learning_rate)


for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(trainloader, SimpleNNModel, loss_fn, optimizer)
    test_loop(testloader, SimpleNNModel, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 7.607918  [   64/33908]
loss: 0.919494  [ 6464/33908]
loss: 0.502805  [12864/33908]
loss: 0.508079  [19264/33908]
loss: 0.421540  [25664/33908]
loss: 0.476836  [32064/33908]
Test Error: 
 Accuracy: 87.7%, Avg loss: 0.419637 

Epoch 2
-------------------------------
loss: 0.426845  [   64/33908]
loss: 0.273120  [ 6464/33908]
loss: 0.372531  [12864/33908]
loss: 0.352171  [19264/33908]
loss: 0.343127  [25664/33908]
loss: 0.344391  [32064/33908]
Test Error: 
 Accuracy: 87.8%, Avg loss: 0.391770 

Epoch 3
-------------------------------
loss: 0.392212  [   64/33908]
loss: 0.379488  [ 6464/33908]
loss: 0.533383  [12864/33908]
loss: 0.389350  [19264/33908]
loss: 0.330368  [25664/33908]
loss: 0.373036  [32064/33908]
Test Error: 
 Accuracy: 87.8%, Avg loss: 0.361651 

Epoch 4
-------------------------------
loss: 0.396607  [   64/33908]
loss: 0.387730  [ 6464/33908]
loss: 0.391032  [12864/33908]
loss: 0.223836  [19264/33908]
loss: 0.430041  [25664/3

## [save and load model](https://pytorch.org/tutorials/beginner/basics/saveloadrun_tutorial.html#save-and-load-the-model)

In [61]:
SimpleNNModel.eval()

SimpleNN(
  (structure): Sequential(
    (0): Linear(in_features=6, out_features=16, bias=True)
    (1): ReLU()
    (2): Linear(in_features=16, out_features=4, bias=True)
    (3): ReLU()
    (4): Linear(in_features=4, out_features=1, bias=True)
    (5): Sigmoid()
  )
)

In [62]:
torch.save(SimpleNNModel, 'SimpleNNModel_demo_weights') #despite the file name, this pickles the entire model object, not just its weights

In [63]:
test_loop(testloader, SimpleNNModel, loss_fn)

Test Error: 
 Accuracy: 87.8%, Avg loss: 0.340805 



In [64]:
SimpleNNModel = SimpleNN(inputDim=6,outputDim=1).to(device) #a freshly initialized model has random parameters, so it performs poorly
test_loop(testloader, SimpleNNModel, loss_fn)

Test Error: 
 Accuracy: 18.8%, Avg loss: 39.887017 



In [65]:
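SimpleNNModel=torch.load('SimpleNNModel_demo_weights', weights_only=False) #loading a whole pickled model needs weights_only=False on recent PyTorch releases; only do this for files you trust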
test_loop(testloader, SimpleNNModel, loss_fn)

Test Error: 
 Accuracy: 87.8%, Avg loss: 0.340805 
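
An alternative to saving the whole model object is to save only its `state_dict` and load it into a freshly constructed model of the same architecture. The sketch below is a minimal illustration, not the notebook's own workflow: it uses a small `nn.Linear` as a stand-in for a trained model, and the file name is hypothetical.

```python
import os
import tempfile

import torch
from torch import nn

torch.manual_seed(0)  # only to make the sketch repeatable

source_model = nn.Linear(6, 1)  # stand-in for a trained model
x = torch.rand(3, 6)

with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = os.path.join(tmp_dir, "demo_state_dict.pt")  # hypothetical file name
    torch.save(source_model.state_dict(), weights_path)  # save the parameters only

    restored_model = nn.Linear(6, 1)                      # same architecture, new random weights
    restored_model.load_state_dict(torch.load(weights_path))

restored_model.eval()
outputs_match = torch.equal(source_model(x), restored_model(x))
print(outputs_match)  # True
```

This approach requires the model class to be available at load time, but the saved file contains only tensors rather than a pickled Python object.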

