## PyTorch Tutorial with Code Examples
### Name: Payal Pitroda  
### Library: PyTorch 
### URL: https://github.com/pitrodap/OIM_7502_classwork 
### Description:   
PyTorch is a popular open-source machine learning library for Python. It provides a flexible framework for building and training deep learning models. At its core, PyTorch provides two main features: an n-dimensional Tensor, similar to a NumPy array but able to run on GPUs, and automatic differentiation for building and training neural networks. PyTorch builds its computational graphs dynamically, as the code runs. It is widely used for applications such as natural language processing (NLP), computer vision, and reinforcement learning.

In [None]:
pip install torch torchvision torchaudio

In [29]:
import torch
import torchvision
import numpy as np
import math
import torchvision.datasets as datasets
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader
from torch import nn

If you need direct instructions on installation, please use this link: 
https://pytorch.org/get-started/locally/

## How to create a Tensor?

In [7]:
z = torch.zeros(5, 3)
print(z)
print(z.dtype)

i = torch.ones((5, 3), dtype=torch.int16)
print(i)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
torch.float32
tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]], dtype=torch.int16)


Here, we created a 5x3 matrix filled with zeros and queried its datatype, which shows that the zeros are 32-bit floating-point numbers, the PyTorch default. If you want integers instead, you can override the default with the `dtype` argument, as the second tensor (5x3 ones of type `torch.int16`) does. When the datatype differs from the default, the tensor helpfully reports it when printed.
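As a small sketch of the same idea, the cell below checks the default floating-point type with `torch.get_default_dtype()` and converts an existing tensor to another type with `Tensor.to()`. The variable names are illustrative, and the commented outputs assume the default dtype has not been changed elsewhere.

```python
import torch

# torch.zeros uses the default floating-point dtype (float32 unless changed)
zeros_f = torch.zeros(5, 3)
print(torch.get_default_dtype())  # torch.float32
print(zeros_f.dtype)              # torch.float32

# .to() returns a new tensor with the requested dtype; zeros_f is unchanged
zeros_i = zeros_f.to(torch.int16)
print(zeros_i.dtype)              # torch.int16
print(zeros_f.dtype)              # still torch.float32
```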

We can also initialize learning weights randomly, often with a specific seed for the pseudorandom number generator (PRNG) so that results are reproducible:

In [8]:
torch.manual_seed(1729)
r1 = torch.rand(2, 2)
print('A random tensor:')
print(r1)

r2 = torch.rand(2, 2)
print('\nA different random tensor:')
print(r2) # new values

torch.manual_seed(1729)
r3 = torch.rand(2, 2)
print('\nShould match r1:')
print(r3) # repeats values of r1 because of re-seed

A random tensor:
tensor([[0.3126, 0.3791],
        [0.3087, 0.0736]])

A different random tensor:
tensor([[0.4216, 0.0691],
        [0.2332, 0.4047]])

Should match r1:
tensor([[0.3126, 0.3791],
        [0.3087, 0.0736]])
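`torch.manual_seed` resets the global random state. A minimal sketch of an alternative, assuming you want reproducibility for one piece of code without touching the global state: create a dedicated `torch.Generator` and pass it to `torch.rand` through its `generator` argument.

```python
import torch

# A dedicated generator keeps its own random state, separate from the
# global state that torch.manual_seed controls
g = torch.Generator().manual_seed(1729)
s1 = torch.rand(2, 2, generator=g)
s2 = torch.rand(2, 2, generator=g)   # next draw from g: new values

# Re-seeding the generator replays the same sequence
g.manual_seed(1729)
s3 = torch.rand(2, 2, generator=g)

print(torch.equal(s1, s3))  # True
print(torch.equal(s1, s2))  # False
```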


PyTorch tensors support arithmetic operations intuitively. Tensors of the same shape (or, more generally, broadcastable shapes) can be added, multiplied, and so on, element-wise. Operations with scalars are applied to every element of the tensor.

In [9]:
ones = torch.ones(2, 3)
print(ones)

twos = torch.ones(2, 3) * 2 # every element is multiplied by 2
print(twos)

threes = ones + twos       # addition allowed because the shapes are identical
print(threes)              # tensors are added element-wise
print(threes.shape)        # the result has the same shape as the input tensors

r1 = torch.rand(2, 3)
r2 = torch.rand(3, 2)
# uncomment this line to get a runtime error
# r3 = r1 + r2

r = (torch.rand(2, 2) - 0.5) * 2 # values between -1 and 1
print('A random matrix, r:')
print(r)

# Common mathematical operations are supported:
print('\nAbsolute value of r:')
print(torch.abs(r))

# ...as are trigonometric functions:
print('\nInverse sine of r:')
print(torch.asin(r))

# ...and linear algebra operations like determinant and singular value decomposition
print('\nDeterminant of r:')
print(torch.det(r))
print('\nSingular value decomposition of r:')
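print(torch.svd(r))  # torch.svd is deprecated in favor of torch.linalg.svd (which returns Vh instead of V)

# ...and statistical and aggregate operations:
print('\nAverage and standard deviation of r:')
print(torch.std_mean(r))  # note: returns (std, mean), in that order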
print('\nMaximum value of r:')
print(torch.max(r))

tensor([[1., 1., 1.],
        [1., 1., 1.]])
tensor([[2., 2., 2.],
        [2., 2., 2.]])
tensor([[3., 3., 3.],
        [3., 3., 3.]])
torch.Size([2, 3])
A random matrix, r:
tensor([[ 0.9956, -0.2232],
        [ 0.3858, -0.6593]])

Absolute value of r:
tensor([[0.9956, 0.2232],
        [0.3858, 0.6593]])

Inverse sine of r:
tensor([[ 1.4775, -0.2251],
        [ 0.3961, -0.7199]])

Determinant of r:
tensor(-0.5703)

Singular value decomposition of r:
torch.return_types.svd(
U=tensor([[-0.8353, -0.5497],
        [-0.5497,  0.8353]]),
S=tensor([1.1793, 0.4836]),
V=tensor([[-0.8851, -0.4654],
        [ 0.4654, -0.8851]]))

Average and standard deviation of r:
(tensor(0.7217), tensor(0.1247))

Maximum value of r:
tensor(0.9956)
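The comment in the cell above notes that adding a (2, 3) tensor to a (3, 2) tensor produces a runtime error. A minimal sketch of the underlying rule, broadcasting: shapes are compared from the last dimension backwards, and two sizes are compatible when they are equal or one of them is 1. The tensors here are illustrative.

```python
import torch

a_ones = torch.ones(2, 3)
row = torch.tensor([[10.0, 20.0, 30.0]])  # shape (1, 3)

# The size-1 dimension of `row` is stretched to match `a_ones`,
# so every row of `a_ones` gets the same values added
b_sum = a_ones + row
print(b_sum.shape)  # torch.Size([2, 3])
print(b_sum)

# Shapes (2, 3) and (3, 2) are not broadcastable: trailing sizes 3 and 2 differ
failed_broadcast = False
try:
    torch.rand(2, 3) + torch.rand(3, 2)
except RuntimeError as err:
    failed_broadcast = True
    print("RuntimeError:", err)
```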


### Let's now learn more about Tensors

Tensors can be created directly from data. The data type is automatically inferred.

In [10]:
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

Tensors can also be created from NumPy arrays (and converted back with `.numpy()`).

In [11]:
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
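One detail worth knowing about this conversion: `torch.from_numpy` does not copy the data, so the tensor and the array share memory, while `torch.tensor` makes a copy. A short sketch with illustrative names:

```python
import numpy as np
import torch

arr = np.array([[1, 2], [3, 4]])
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # copies the data

arr[0, 0] = 100                  # modify the NumPy array in place
print(shared[0, 0].item())       # 100: the change is visible through the tensor
print(copied[0, 0].item())       # 1: the copy is unaffected

# Going the other way, .numpy() on a CPU tensor also shares the same buffer
back = shared.numpy()
print(np.shares_memory(back, arr))  # True
```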

Tensors created with `torch.ones_like`, `torch.rand_like`, and similar functions retain the properties (shape, datatype) of the argument tensor, unless explicitly overridden.

In [12]:
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.1384, 0.4759],
        [0.7481, 0.0361]]) 



With random or constant values: `shape` is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor.

In [13]:
shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Random Tensor: 
 tensor([[0.5062, 0.8469, 0.2588],
        [0.2707, 0.4115, 0.6839]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])


Tensor attributes describe their shape, datatype, and the device on which they are stored.

In [14]:
tensor = torch.rand(3,4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
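Tensors are created on the CPU by default. A short sketch of moving one to an accelerator with `Tensor.to()`, using the same CUDA, then MPS, then CPU fallback order as the model section of this notebook (the name `dev` is illustrative):

```python
import torch

# Pick the best available device; falls back to CPU
if torch.cuda.is_available():
    dev = torch.device("cuda")
elif torch.backends.mps.is_available():
    dev = torch.device("mps")
else:
    dev = torch.device("cpu")

cpu_tensor = torch.rand(3, 4)
moved = cpu_tensor.to(dev)   # returns the tensor on `dev` (no copy if it is already there)
print(f"Device tensor is stored on: {moved.device}")
```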


### Operations on Tensors 
Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling and more, are described in the PyTorch documentation:
https://pytorch.org/docs/stable/torch.html

Standard NumPy-like indexing and slicing:

In [15]:
tensor = torch.ones(4, 4)
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}")
print(f"Last column: {tensor[..., -1]}")
tensor[:,1] = 0
print(tensor)

First row: tensor([1., 1., 1., 1.])
First column: tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])


Joining tensors: you can use `torch.cat` to concatenate a sequence of tensors along a given dimension. See also `torch.stack`, another tensor joining operator that is subtly different from `torch.cat`.

In [16]:
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
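The difference between the two operators shows up in the output shapes. A minimal sketch with two illustrative 2x3 tensors: `torch.cat` joins along an existing dimension, while `torch.stack` inserts a new one (so all inputs must have the same shape).

```python
import torch

a_zeros = torch.zeros(2, 3)
b_ones = torch.ones(2, 3)

# cat joins along an existing dimension
cat0 = torch.cat([a_zeros, b_ones], dim=0)
cat1 = torch.cat([a_zeros, b_ones], dim=1)
print(cat0.shape)     # torch.Size([4, 3])
print(cat1.shape)     # torch.Size([2, 6])

# stack adds a new leading dimension of size 2 (one slot per input tensor)
stacked = torch.stack([a_zeros, b_ones], dim=0)
print(stacked.shape)  # torch.Size([2, 2, 3])
```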


Arithmetic operations: the cell below computes the matrix multiplication of `tensor` with its transpose in three equivalent ways, so `y1`, `y2`, and `y3` have the same value. `tensor.T` returns the transpose of a tensor.

In [17]:
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

y3 = torch.rand_like(y1)
torch.matmul(tensor, tensor.T, out=y3)

tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]])

This computes the element-wise product; `z1`, `z2`, and `z3` have the same value.

In [18]:
z1 = tensor * tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
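Because `tensor` is mostly ones, the two kinds of product are easy to confuse above. A small sketch on an illustrative 2x2 matrix makes the difference between `@` (matrix product) and `*` (element-wise product) explicit:

```python
import torch

m = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

mat_prod = m @ m    # rows times columns: [[7, 10], [15, 22]]
elem_prod = m * m   # each entry squared: [[1, 4], [9, 16]]
print(mat_prod)
print(elem_prod)
```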

Single-element tensors: if you have a one-element tensor, for example from aggregating all values of a tensor into one value, you can convert it to a Python numerical value using `item()`:

In [19]:
agg = tensor.sum()
agg_item = agg.item()
print(agg_item, type(agg_item))

12.0 <class 'float'>
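`item()` only works on tensors with exactly one element. A short sketch, with illustrative names, of what happens otherwise and of `tolist()`, which converts a tensor of any size into nested Python lists. The exact exception type has varied between PyTorch versions, so the sketch catches both `RuntimeError` and `ValueError`.

```python
import torch

small = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

total = small.sum().item()   # one element: a Python float
as_list = small.tolist()     # any size: nested Python lists
print(total, as_list)

# item() on a multi-element tensor raises an error
failed_item = False
try:
    small.item()
except (RuntimeError, ValueError) as err:
    failed_item = True
    print(type(err).__name__, err)
```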


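In-place operations: operations that store the result into the operand are called in-place. They are denoted by a `_` suffix. For example, `x.copy_(y)` and `x.t_()` will change `x`. In-place operations save some memory, but they can be problematic when computing derivatives because the previous values are lost immediately, so their use is discouraged.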

In [20]:
print(f"{tensor} \n")
tensor.add_(5)
print(tensor)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]]) 

tensor([[6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.]])
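To contrast the two styles side by side, a minimal sketch with an illustrative tensor: `add` returns a new tensor and leaves its input alone, while `add_` overwrites the input.

```python
import torch

x_demo = torch.ones(2, 2)

y_demo = x_demo.add(5)              # out-of-place: new tensor, x_demo unchanged
unchanged = x_demo[0, 0].item()
print(unchanged, y_demo[0, 0].item())  # 1.0 6.0

x_demo.add_(5)                      # in-place: x_demo itself now holds 6s
print(x_demo[0, 0].item())          # 6.0
```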


### Working with Data

The `torchvision.datasets` module contains `Dataset` objects for many real-world vision datasets, such as CIFAR and COCO. Here we use the FashionMNIST dataset. Every TorchVision `Dataset` accepts two arguments, `transform` and `target_transform`, to modify the samples and labels respectively.

In [23]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

In [27]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
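The loaders above read the data in a fixed order. A small sketch of the shuffling option mentioned in the paragraph, using a hypothetical in-memory `TensorDataset` instead of FashionMNIST so it runs without a download: with `shuffle=True` the samples are reordered on every pass, which is common for training data, and the last batch is smaller when the dataset size is not a multiple of `batch_size`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 10 samples with 3 features each, labels 0..9
features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
labels = torch.arange(10)
toy_data = TensorDataset(features, labels)

loader = DataLoader(toy_data, batch_size=4, shuffle=True)

seen, batch_sizes = [], []
for X_batch, y_batch in loader:
    print(X_batch.shape, y_batch.tolist())
    seen.extend(y_batch.tolist())
    batch_sizes.append(len(y_batch))
```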


To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function. To accelerate operations in the neural network, we move it to the GPU or MPS if available.

In [30]:
# Get cpu, gpu or mps device for training.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
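To see what the network returns, you can pass it a dummy input. The sketch below rebuilds the same layer stack as `NeuralNetwork` with `nn.Sequential` (under the illustrative name `demo_model`, so it does not replace `model`), feeds it one random 28x28 "image", and turns the 10 raw logits into class probabilities with `nn.Softmax`.

```python
import torch
from torch import nn

# Same layers as NeuralNetwork above, rebuilt so the cell stands alone
demo_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

X_fake = torch.rand(1, 28, 28)          # one random 28x28 input
logits = demo_model(X_fake)             # raw scores, shape (1, 10)
probs = nn.Softmax(dim=1)(logits)       # probabilities per class, summing to 1
y_hat = probs.argmax(1)                 # index of the most likely class
print(logits.shape, y_hat)
```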


### Optimizing the Model Parameters
To train a model, we need a loss function and an optimizer. In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model’s parameters. We also check the model’s performance against the test dataset to ensure it is learning.

In [31]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

In [32]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

In [33]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
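The `train` function above follows the standard pattern: forward pass, loss, `loss.backward()`, `optimizer.step()`, `optimizer.zero_grad()`. A minimal sketch of that same loop on a hypothetical toy problem (fitting the line y = 3x + 1 with a single `nn.Linear` layer and `nn.MSELoss`), small enough to run in seconds on a CPU. The `toy_` names are illustrative and do not replace the notebook's `model`, `loss_fn`, or `optimizer`.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy data: y = 3x + 1 plus a little noise
X_toy = torch.linspace(-1, 1, 100).unsqueeze(1)   # shape (100, 1)
y_toy = 3 * X_toy + 1 + 0.1 * torch.randn_like(X_toy)

toy_model = nn.Linear(1, 1)
toy_loss_fn = nn.MSELoss()
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

losses = []
for step in range(100):
    pred = toy_model(X_toy)             # forward pass
    loss = toy_loss_fn(pred, y_toy)     # prediction error
    loss.backward()                     # fill .grad of each parameter
    toy_optimizer.step()                # update parameters using those gradients
    toy_optimizer.zero_grad()           # reset gradients so they do not accumulate
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
print(toy_model.weight.item(), toy_model.bias.item())  # should approach 3 and 1
```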

The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.

In [34]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.297716  [   64/60000]
loss: 2.296212  [ 6464/60000]
loss: 2.278649  [12864/60000]
loss: 2.276129  [19264/60000]
loss: 2.249752  [25664/60000]
loss: 2.224671  [32064/60000]
loss: 2.232621  [38464/60000]
loss: 2.200275  [44864/60000]
loss: 2.207772  [51264/60000]
loss: 2.166907  [57664/60000]
Test Error: 
 Accuracy: 44.9%, Avg loss: 2.165901 

Epoch 2
-------------------------------
loss: 2.176962  [   64/60000]
loss: 2.175058  [ 6464/60000]
loss: 2.121765  [12864/60000]
loss: 2.136795  [19264/60000]
loss: 2.083737  [25664/60000]
loss: 2.026645  [32064/60000]
loss: 2.058328  [38464/60000]
loss: 1.987872  [44864/60000]
loss: 2.000674  [51264/60000]
loss: 1.916838  [57664/60000]
Test Error: 
 Accuracy: 59.1%, Avg loss: 1.916489 

Epoch 3
-------------------------------
loss: 1.952897  [   64/60000]
loss: 1.926626  [ 6464/60000]
loss: 1.810859  [12864/60000]
loss: 1.846733  [19264/60000]
loss: 1.740672  [25664/60000]
loss: 1.679634  [32064/600

### Saving Models
A common way to save a model is to serialize the internal state dictionary (containing the model parameters).

In [35]:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth


### Loading Models
The process for loading a model includes re-creating the model structure and loading the state dictionary into it.

In [36]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth"))

<All keys matched successfully>

This model can now be used to make predictions.

In [37]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
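The save and load cells above write to a file on disk. A self-contained sketch of the same state-dict round trip, using an in-memory `io.BytesIO` buffer in place of `model.pth` and a hypothetical small `nn.Linear` model, then checking that every parameter survived unchanged:

```python
import io

import torch
from torch import nn

toy_net = nn.Linear(4, 2)              # hypothetical small model

buffer = io.BytesIO()                  # in-memory stand-in for a file like "model.pth"
torch.save(toy_net.state_dict(), buffer)
buffer.seek(0)                         # rewind before reading

# Re-create the same structure, then load the saved parameters into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))

same = all(
    torch.equal(p, q)
    for p, q in zip(toy_net.state_dict().values(), restored.state_dict().values())
)
print(same)  # True
```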


### NumPy and PyTorch

In [38]:
# What we can do in NumPy
l = np.random.normal(loc=10, size=50)
print(f"{l.mean():.3f}")
print(f"{l.std():.3f}") 

9.989
1.005


This code generates an array l of 50 random numbers drawn from a normal distribution with mean 10 (loc=10) and a default standard deviation of 1. Then it prints out the mean and standard deviation of this array rounded to three decimal places.

In [39]:
# Let's do the same thing using PyTorch
# Generate random numbers from a normal distribution
l = torch.normal(mean=10, std=1, size=(50,))

# Calculate and print mean
print(f"{l.mean().item():.3f}")

# Calculate and print standard deviation
print(f"{l.std().item():.3f}")


10.039
1.010


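This code performs the same task as the NumPy code but uses PyTorch tensors. It draws 50 random numbers from a normal distribution with mean 10 and standard deviation 1, calculates their mean and standard deviation with PyTorch tensor methods, and prints the results rounded to three decimal places. One difference: `torch.std` applies Bessel's correction by default (dividing by N - 1), whereas `np.std` divides by N, so the two standard deviations are not computed identically.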
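A quick sketch comparing the two standard-deviation conventions on the same data (`np.std` divides by N by default, `torch.std` by N - 1), and two ways of making them agree. The names `samples` and `t_samples` are illustrative.

```python
import numpy as np
import torch

samples = np.random.normal(loc=10, size=50)
t_samples = torch.from_numpy(samples)

# Default conventions differ slightly
print(np.std(samples), t_samples.std().item())

# Option 1: ask NumPy for the N - 1 version with ddof=1
match_sample = np.isclose(np.std(samples, ddof=1), t_samples.std().item())

# Option 2: compute the divide-by-N (population) version directly in PyTorch
pop_std = ((t_samples - t_samples.mean()) ** 2).mean().sqrt().item()
match_population = np.isclose(np.std(samples), pop_std)

print(match_sample, match_population)  # True True
```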

### Trying with a model - NumPy

This code performs a simple polynomial regression using gradient descent to minimize the loss function. Here's a breakdown of what it does:

- It creates 2000 evenly spaced input values x from -π to π (with `np.linspace`) and calculates the corresponding sine values y.

- It randomly initializes the coefficients a, b, c, and d.

- It sets a learning rate.

- It runs a loop of 2000 iterations of gradient descent. In each iteration, it:
    - computes the predicted y values (y_pred) using the current coefficients;
    - calculates the loss, which is the sum of squared differences between the predicted y and the actual y;
    - prints the loss every 100 iterations;
    - computes the gradients of the loss with respect to the coefficients (a, b, c, d);
    - updates each coefficient by subtracting the learning rate times its gradient.

- After the loop, it prints the final polynomial with the learned coefficients.

So essentially, it's fitting a polynomial curve of degree 3 (a + b*x + c*x^2 + d*x^3) to approximate the sine function over the given range using gradient descent optimization.
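The gradient formulas used in the loop below follow from differentiating the loss with respect to each coefficient. Written out, with $\eta$ standing for `learning_rate` and the factor $2(\hat{y}_i - y_i)$ corresponding to `grad_y_pred` in the code:

$$
\hat{y}_i = a + b x_i + c x_i^2 + d x_i^3, \qquad L = \sum_{i} (\hat{y}_i - y_i)^2
$$

$$
\frac{\partial L}{\partial a} = \sum_i 2(\hat{y}_i - y_i), \quad
\frac{\partial L}{\partial b} = \sum_i 2(\hat{y}_i - y_i)\, x_i, \quad
\frac{\partial L}{\partial c} = \sum_i 2(\hat{y}_i - y_i)\, x_i^2, \quad
\frac{\partial L}{\partial d} = \sum_i 2(\hat{y}_i - y_i)\, x_i^3
$$

$$
\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}, \qquad \theta \in \{a, b, c, d\}
$$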

In [40]:
# Create evenly spaced input values and the corresponding output data
x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# Randomly initialize weights
a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    # y = a + b x + c x^2 + d x^3
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a} + {b} x + {c} x^2 + {d} x^3')

99 347.5586995439629
199 246.53144041330597
299 175.71781745736752
399 126.05302614856004
499 91.20295086009364
599 66.73649608174503
699 49.551879328112406
799 37.476528060498666
899 28.987835290541824
999 23.018123516719303
1099 18.81832763399651
1199 15.862654354654904
1299 13.781857806935205
1399 12.316513541697756
1499 11.284277894790847
1599 10.556933980089006
1699 10.044290386451157
1799 9.682881049903365
1899 9.428030845692025
1999 9.248281884135942
Result: y = 0.021550341351434122 + 0.860702176296195 x + -0.003717792482350343 x^2 + -0.09389385419589356 x^3


### Trying with a model - PyTorch

This code performs the same task as the previous one, but with PyTorch tensors instead of NumPy arrays; the gradients are still computed by hand. Here's what the code does:

- It sets the data type (dtype) to torch.float and the device to CPU.

- It creates 2000 evenly spaced input values x from -π to π and calculates the corresponding sine values y. These tensors are created on the specified device.

- It randomly initializes the coefficients a, b, c, and d as scalar tensors on the specified device.

- It sets a learning rate.

- It runs a loop of 2000 iterations of gradient descent. In each iteration, it:
    - computes the predicted y values (y_pred) using the current coefficients;
    - calculates the loss as the sum of squared errors;
    - prints the loss every 100 iterations;
    - computes the gradients of the loss with respect to the coefficients (a, b, c, d) using the same hand-derived formulas as the NumPy version;
    - updates the coefficients using gradient descent.

- After the loop, it prints the final polynomial with the learned coefficients.

This code performs polynomial regression with PyTorch tensors and achieves the same goal as the NumPy code. The main benefit here is that it can run on a GPU just by changing `device` (see the commented-out line). PyTorch's autograd (`requires_grad=True` and `loss.backward()`) can also compute the gradients automatically, which this cell does not use.

In [41]:
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# Create evenly spaced input values and the corresponding output data
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d


print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

99 1770.32763671875
199 1191.91943359375
299 804.0960693359375
399 543.8612060546875
499 369.101318359375
599 251.64505004882812
699 172.63504028320312
799 119.44034576416016
899 83.59333801269531
999 59.413963317871094
1099 43.08884811401367
1199 32.05574035644531
1299 24.59162139892578
1399 19.53670310974121
1499 16.109737396240234
1599 13.783905029296875
1699 12.203714370727539
1799 11.12890625
1899 10.397041320800781
1999 9.898110389709473
Result: y = -0.024886634200811386 + 0.8344007134437561 x + 0.004293360281735659 x^2 + -0.09015269577503204 x^3
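The cell above still computes the gradients by hand, exactly as in the NumPy version. A sketch of the same fit using autograd instead, in the style of the official "Learning PyTorch with Examples" tutorial: the coefficients are created with `requires_grad=True`, `loss.backward()` fills their `.grad` fields, and the update runs inside `torch.no_grad()` so it is not recorded. The fixed seed is an added assumption for reproducibility.

```python
import math

import torch

torch.manual_seed(0)
dtype = torch.float
device = torch.device("cpu")

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# requires_grad=True tells autograd to track operations on these tensors
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
losses = []
for t in range(2000):
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    loss = (y_pred - y).pow(2).sum()   # keep as a tensor so backward() can be called
    losses.append(loss.item())
    if t % 100 == 99:
        print(t, loss.item())

    # Autograd computes dL/da, dL/db, dL/dc, dL/dd into a.grad, ..., d.grad
    loss.backward()

    # Update the weights without recording these operations in the graph
    with torch.no_grad():
        for param in (a, b, c, d):
            param -= learning_rate * param.grad
            param.grad = None          # reset so gradients do not accumulate

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
```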


# Thank you!