In [4]:
!pip install torch torchvision torchaudio

Defaulting to user installation because normal site-packages is not writeable


In [5]:
import torch

# Tensors
At its core, PyTorch is a library for processing tensors. A tensor is a number, vector, matrix, or any n-dimensional array. Let's create a tensor with a single number.

In [6]:
# Number
t1 = torch.tensor(4.)
t1

tensor(4.)

4. is a shorthand for 4.0. It is used to indicate to Python (and PyTorch) that you want to create a floating-point number. We can verify this by checking the dtype attribute of our tensor.

In [7]:
t1.dtype

torch.float32
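To make the role of the trailing dot concrete, here is a small sketch that compares an integer literal with a float literal. It assumes PyTorch's default dtypes have not been changed; the variable names are only for illustration.

```python
import torch

# An integer literal produces an integer tensor,
# a float literal produces a floating-point tensor.
int_scalar = torch.tensor(4)
float_scalar = torch.tensor(4.)
print(int_scalar.dtype)    # torch.int64
print(float_scalar.dtype)  # torch.float32

# The dtype can also be requested explicitly.
explicit = torch.tensor(4, dtype=torch.float32)
print(explicit.dtype)      # torch.float32
```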

Let's try creating more complex tensors.

In [8]:
# Vector
t2 = torch.tensor([1., 2, 3, 4])
t2

tensor([1., 2., 3., 4.])

In [9]:
# Matrix
t3 = torch.tensor([[5., 6], 
                   [7, 8], 
                   [9, 10]])
t3

tensor([[ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.]])

In [10]:
# 3-dimensional array
t4 = torch.tensor([
    [[11, 12, 13], 
     [13, 14, 15]], 
    [[15, 16, 17], 
     [17, 18, 19.]]])
t4

tensor([[[11., 12., 13.],
         [13., 14., 15.]],

        [[15., 16., 17.],
         [17., 18., 19.]]])

Tensors can have any number of dimensions and different lengths along each dimension. We can inspect the length along each dimension using the .shape property of a tensor.

In [11]:
print(t1)
t1.shape

tensor(4.)


torch.Size([])

In [12]:
print(t2)
t2.shape

tensor([1., 2., 3., 4.])


torch.Size([4])

In [13]:
print(t3)
t3.shape

tensor([[ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.]])


torch.Size([3, 2])

In [14]:
print(t4)
t4.shape

tensor([[[11., 12., 13.],
         [13., 14., 15.]],

        [[15., 16., 17.],
         [17., 18., 19.]]])


torch.Size([2, 2, 3])
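Besides .shape, a tensor also reports its number of dimensions and its total number of elements. The sketch below uses the same values as the 3-dimensional tensor above; `t` and `scalar` are illustrative names.

```python
import torch

t = torch.tensor([[[11., 12., 13.], [13., 14., 15.]],
                  [[15., 16., 17.], [17., 18., 19.]]])

print(t.shape)    # torch.Size([2, 2, 3])
print(t.ndim)     # 3 dimensions
print(t.numel())  # 12 elements in total (2 * 2 * 3)

# A single number has an empty shape and zero dimensions.
scalar = torch.tensor(4.)
print(scalar.ndim)  # 0
```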

Note that it's not possible to create a tensor with an improper shape, for example from rows of different lengths.

In [15]:
# Matrix
t5 = torch.tensor([[5., 6, 11], 
                   [7, 8], 
                   [9, 10]])
t5

ValueError: expected sequence of length 3 at dim 1 (got 2)

A ValueError is thrown because the lengths of the rows [5., 6, 11] and [7, 8] don't match.
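If the data might be ragged, one option is to catch the error or to check the row lengths first. This is only a sketch; the names `rows`, `created`, and `lengths` are made up for the example.

```python
import torch

rows = [[5., 6, 11], [7, 8], [9, 10]]

# Building a tensor from rows of unequal length raises ValueError.
try:
    torch.tensor(rows)
    created = True
except ValueError as err:
    created = False
    print("Could not build tensor:", err)

# A simple guard: collect the distinct row lengths before converting.
lengths = {len(row) for row in rows}
print(lengths)  # {2, 3}, so the rows are not rectangular
```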

# Tensor operations and gradients
We can combine tensors with the usual arithmetic operations. Let's look at an example:

In [None]:
# Create tensors.
x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)
x, w, b

We've created three tensors: x, w, and b, all numbers. w and b have an additional parameter requires_grad set to True. We'll see what it does in just a moment.

Let's create a new tensor y by combining these tensors.

In [None]:
# Arithmetic operations
y = w * x + b
y

As expected, y is a tensor with the value 3 * 4 + 5 = 17. A key feature of PyTorch is that it can automatically compute the derivative of y w.r.t. the tensors that have requires_grad set to True, i.e., w and b. This feature is called autograd (automatic gradients).

To compute the derivatives, we can invoke the .backward method on our result y.



In [None]:
# Compute derivatives
y.backward()

The derivatives of y with respect to the input tensors are stored in the .grad property of the respective tensors.

In [None]:
# Display gradients
print('dy/dx:', x.grad)
print('dy/dw:', w.grad)
print('dy/db:', b.grad)

As expected, dy/dw has the same value as x, i.e., 3, and dy/db has the value 1. Note that x.grad is None because x doesn't have requires_grad set to True.

The "grad" in w.grad is short for gradient, which is another term for derivative. The term gradient is primarily used while dealing with vectors and matrices.
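Since y = w * x + b, calculus gives dy/dw = x and dy/db = 1. The self-contained sketch below repeats the steps above and prints the values so they can be compared with those derivatives.

```python
import torch

x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)

y = w * x + b
y.backward()  # fills in .grad for tensors with requires_grad=True

print(y.item())       # 17.0
print(w.grad.item())  # 3.0, equal to x
print(b.grad.item())  # 1.0
print(x.grad)         # None, because x does not track gradients
```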



# Tensor functions
Apart from arithmetic operations, the torch module also contains many functions for creating and manipulating tensors. Let's look at some examples.

In [None]:
# Create a tensor with a fixed value for every element
t6 = torch.full((3, 2), 42)
t6

In [None]:
t3

In [None]:
# Concatenate two tensors with compatible shapes
t7 = torch.cat((t3, t6))
t7

In [None]:
# Compute the sin of each element
t8 = torch.sin(t7)
t8

In [None]:
# Change the shape of a tensor
t9 = t8.reshape(3, 2, 2)
t9

You can learn more about tensor operations here: https://pytorch.org/docs/stable/torch.html . Experiment with some more tensor functions and operations.
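To see how these functions change shapes, here is a small sketch that tracks the shape after each step. It uses a float fill value (42.) so both tensors share a dtype, and it also shows concatenation along columns with dim=1; the variable names are illustrative.

```python
import torch

a = torch.tensor([[5., 6.], [7., 8.], [9., 10.]])  # shape (3, 2)
filled = torch.full((3, 2), 42.)                    # shape (3, 2), all 42.0

rows = torch.cat((a, filled))          # default dim=0: stack rows -> (6, 2)
cols = torch.cat((a, filled), dim=1)   # join columns -> (3, 4)
reshaped = torch.sin(rows).reshape(3, 2, 2)  # same 12 elements, new layout

print(rows.shape)      # torch.Size([6, 2])
print(cols.shape)      # torch.Size([3, 4])
print(reshaped.shape)  # torch.Size([3, 2, 2])
```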

# Interoperability with Numpy
Numpy is a popular open-source library used for mathematical and scientific computing in Python. It enables efficient operations on large multi-dimensional arrays and has a vast ecosystem of supporting libraries, including:

Pandas for file I/O and data analysis
Matplotlib for plotting and visualization
OpenCV for image and video processing

Instead of reinventing the wheel, PyTorch interoperates well with Numpy to leverage its existing ecosystem of tools and libraries.

Here's how we create an array in Numpy:

In [23]:
import numpy as np

x = np.array([[1, 2], [3, 4.]])
x

array([[1., 2.],
       [3., 4.]])

We can convert a Numpy array to a PyTorch tensor using torch.from_numpy.

In [24]:
# Convert the numpy array to a torch tensor.
y = torch.from_numpy(x)
y

tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)

Let's verify that the numpy array and torch tensor have the same data type.

In [26]:
x.dtype, y.dtype

(dtype('float64'), torch.float64)

We can convert a PyTorch tensor to a Numpy array using the .numpy method of a tensor.

In [27]:
# Convert a torch tensor to a numpy array
z = y.numpy()
z

array([[1., 2.],
       [3., 4.]])

The interoperability between PyTorch and Numpy is essential because most datasets you'll work with will likely be read and preprocessed as Numpy arrays.
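One detail worth knowing: torch.from_numpy does not copy the data, so the tensor and the array share the same memory, while torch.tensor makes an independent copy. The sketch below illustrates the difference with made-up names `arr`, `shared`, and `copied`.

```python
import numpy as np
import torch

arr = np.array([[1., 2.], [3., 4.]])
shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # independent copy of the data

arr[0, 0] = 100.  # modify the Numpy array in place

print(shared[0, 0].item())  # 100.0, the change is visible in the tensor
print(copied[0, 0].item())  # 1.0, the copy is unaffected
```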

You might wonder why we need a library like PyTorch at all since Numpy already provides data structures and utilities for working with multi-dimensional numeric data. There are two main reasons:

Autograd: The ability to automatically compute gradients for tensor operations is essential for training deep learning models.
GPU support: While working with massive datasets and large models, PyTorch tensor operations can be performed efficiently using a Graphics Processing Unit (GPU). Computations that might typically take hours can be completed within minutes using GPUs.
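As a rough illustration of the GPU point, the sketch below creates tensors on a GPU when one is available and falls back to the CPU otherwise. The matrix sizes are arbitrary, and no timing claims are made here.

```python
import torch

# Pick the GPU if PyTorch can see one, otherwise use the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.rand(200, 300, device=device)
b = torch.rand(300, 100, device=device)

# The matrix product runs on whichever device holds the operands.
c = a @ b
print(c.shape)        # torch.Size([200, 100])
print(c.device.type)  # "cuda" or "cpu", depending on the machine
```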

# Linear regression from scratch using PyTorch

In [16]:
import numpy as np
import torch

In [17]:
#making training data 
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [18]:
# Targets (apples, oranges)
target = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

In [19]:
#Convert input and target to tensors
inputs = torch.from_numpy(inputs)
target = torch.from_numpy(target)

print(inputs,"\n")
print(target)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]]) 

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [20]:
# weights and biases
w = torch.randn(2,3 , requires_grad=True)
b = torch.randn(2, requires_grad=True)

print(w)
print(b)

tensor([[-0.2885, -0.8025,  0.4200],
        [ 1.0883,  1.0860,  0.1029]], requires_grad=True)
tensor([1.9191, 0.6816], requires_grad=True)


In [21]:
#define the model

def model(x):
    return x @ w.t() + b

In [42]:
# prediction
preds = model(inputs)
print(preds)

tensor([[-145.6381,  248.4880],
        [-186.7533,  325.7734],
        [-244.9442,  394.5558],
        [-139.2003,  240.0287],
        [-177.9548,  315.8690]], grad_fn=<AddBackward0>)
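The model is a single matrix multiplication plus a bias: inputs of shape (5, 3) times the transposed weights of shape (3, 2) give one row per sample and one column per target. The sketch below checks this with random stand-in data (not the apples and oranges values) and recomputes one entry by hand.

```python
import torch

torch.manual_seed(0)
inputs = torch.rand(5, 3)                   # 5 samples, 3 features (stand-in data)
w = torch.randn(2, 3, requires_grad=True)   # one row of weights per target
b = torch.randn(2, requires_grad=True)      # one bias per target

preds = inputs @ w.t() + b
print(preds.shape)  # torch.Size([5, 2])

# Entry [0, 1] is the dot product of sample 0 with the weights
# of target 1, plus the bias of target 1.
manual = (inputs[0] * w[1]).sum() + b[1]
matches = torch.allclose(preds[0, 1], manual, atol=1e-6)
print(matches)
```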


In [43]:
#actual
print(target)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [44]:
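# loss function: mean squared error (MSE)
# The result is the same whichever argument holds the predictions,
# because the difference is squared.
def MSE(actual, target):
    diff = actual - target
    return torch.sum(diff * diff) / diff.numel()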

In [45]:
# error
loss = MSE(target, preds)
print(loss)

tensor(58049.6797, grad_fn=<DivBackward0>)
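For comparison, PyTorch ships a built-in mean squared error, torch.nn.functional.mse_loss. The sketch below uses a tiny hand-checkable example (not the data above) to show that a hand-written version and the built-in agree; the lowercase `mse` helper is defined only for this example.

```python
import torch
import torch.nn.functional as F

def mse(preds, targets):
    # Mean of the squared element-wise differences.
    diff = preds - targets
    return torch.sum(diff * diff) / diff.numel()

preds = torch.tensor([[1., 2.], [3., 4.]])
targets = torch.tensor([[1., 0.], [0., 4.]])

manual = mse(preds, targets)           # (0 + 4 + 9 + 0) / 4
builtin = F.mse_loss(preds, targets)   # default reduction is the mean

print(manual.item())   # 3.25
print(builtin.item())  # 3.25
```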


In [46]:
# compute gradients
loss.backward()

In [47]:
print(w, "\n")
print(w.grad)

tensor([[-0.8547, -1.3262,  0.1009],
        [ 1.3051,  1.7187,  0.8429]], requires_grad=True) 

tensor([[-21315.3203, -23948.7676, -14509.3320],
        [ 18106.4492,  18883.3789,  11704.7217]])


In [48]:
print(b, "\n")
print(b.grad)

tensor([1.2686, 1.8193], requires_grad=True) 

tensor([-255.0982,  212.9430])


In [49]:
#reset grad
w.grad.zero_()
b.grad.zero_()

print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
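The reason for resetting is that .backward() adds to whatever is already stored in .grad instead of overwriting it. A minimal sketch with a single made-up parameter shows the effect:

```python
import torch

w = torch.tensor(2., requires_grad=True)

(3 * w).backward()
first = w.grad.item()
print(first)   # 3.0

(3 * w).backward()       # the new gradient is added to the old one
second = w.grad.item()
print(second)  # 6.0

w.grad.zero_()           # reset before the next backward pass
(3 * w).backward()
third = w.grad.item()
print(third)   # 3.0
```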


In [50]:
# recompute predictions (the parameters have not been adjusted yet)

preds = model(inputs)
print(preds)

tensor([[-145.6381,  248.4880],
        [-186.7533,  325.7734],
        [-244.9442,  394.5558],
        [-139.2003,  240.0287],
        [-177.9548,  315.8690]], grad_fn=<AddBackward0>)


In [51]:
# loss
loss = MSE(target, preds)
print(loss)

tensor(58049.6797, grad_fn=<DivBackward0>)


In [52]:
loss.backward()

print(w.grad, "\n")
print(b.grad)

tensor([[-21315.3203, -23948.7676, -14509.3320],
        [ 18106.4492,  18883.3789,  11704.7217]]) 

tensor([-255.0982,  212.9430])


In [53]:
# adjust weights & reset grads
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()

In [54]:
print(w)
print(b)

tensor([[-0.6415, -1.0867,  0.2460],
        [ 1.1240,  1.5299,  0.7259]], requires_grad=True)
tensor([1.2712, 1.8172], requires_grad=True)
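This manual update is plain gradient descent. As a sanity check, the sketch below applies one manual step and one step of torch.optim.SGD to identical copies of a weight matrix and compares the results. The data is random stand-in data, and the learning rate of 1e-2 is chosen only for the example.

```python
import torch

torch.manual_seed(0)
x = torch.rand(5, 3)
y = torch.rand(5, 2)
lr = 1e-2

w_manual = torch.randn(2, 3, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)

# Manual step: subtract learning rate times gradient, then reset.
loss = ((x @ w_manual.t() - y) ** 2).mean()
loss.backward()
with torch.no_grad():
    w_manual -= w_manual.grad * lr
    w_manual.grad.zero_()

# The same step using the built-in SGD optimizer.
optimizer = torch.optim.SGD([w_optim], lr=lr)
loss = ((x @ w_optim.t() - y) ** 2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

same_update = torch.allclose(w_manual, w_optim)
print(same_update)
```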


In [55]:
# calculate again
preds = model(inputs)
loss = MSE(target, preds)
print(loss)

tensor(39294.2891, grad_fn=<DivBackward0>)


In [56]:
# Training for multiple epochs
epochs = 400
for i in range(epochs):
    preds = model(inputs)
    loss = MSE(target, preds)
    loss.backward()

    with torch.no_grad():
        w -= w.grad * 1e-5  # learning rate
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()
    print(f"Epochs({i}/{epochs}) & Loss {loss}")

Epochs(0/400) & Loss 39294.2890625
Epochs(1/400) & Loss 26653.197265625
Epochs(2/400) & Loss 18132.46484375
Epochs(3/400) & Loss 12388.4150390625
Epochs(4/400) & Loss 8515.5517578125
Epochs(5/400) & Loss 5903.6845703125
Epochs(6/400) & Loss 4141.6083984375
Epochs(7/400) & Loss 2952.219482421875
Epochs(8/400) & Loss 2148.779052734375
Epochs(9/400) & Loss 1605.448974609375
Epochs(10/400) & Loss 1237.4271240234375
Epochs(11/400) & Loss 987.5665283203125
Epochs(12/400) & Loss 817.3563232421875
Epochs(13/400) & Loss 700.8442993164062
Epochs(14/400) & Loss 620.5413208007812
Epochs(15/400) & Loss 564.66162109375
Epochs(16/400) & Loss 525.262451171875
Epochs(17/400) & Loss 496.9911193847656
Epochs(18/400) & Loss 476.240234375
Epochs(19/400) & Loss 460.5782165527344
Epochs(20/400) & Loss 448.36688232421875
Epochs(21/400) & Loss 438.50146484375
Epochs(22/400) & Loss 430.23748779296875
Epochs(23/400) & Loss 423.0728454589844
Epochs(24/400) & Loss 416.6690368652344
Epochs(25/400) & Loss 410.797546

In [57]:
preds = model(inputs)
loss = MSE(target, preds)
print(loss)

tensor(18.4036, grad_fn=<DivBackward0>)


In [58]:
from math import sqrt
sqrt(loss)

4.289942249896412

In [59]:
preds

tensor([[ 57.8801,  71.4046],
        [ 84.2536,  97.3319],
        [112.8927, 138.7951],
        [ 24.1537,  41.2168],
        [103.9399, 111.2345]], grad_fn=<AddBackward0>)

In [60]:
target

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

You can see that the predictions are now quite close to the targets.
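To put a number on "close", the sketch below recomputes the error from the predictions and targets printed above (values copied by hand, so they are rounded to four decimals). The names `mse`, `rmse`, and `max_err` are illustrative.

```python
import torch

# Predictions and targets copied from the printed outputs above.
preds = torch.tensor([[57.8801, 71.4046],
                      [84.2536, 97.3319],
                      [112.8927, 138.7951],
                      [24.1537, 41.2168],
                      [103.9399, 111.2345]])
target = torch.tensor([[56., 70.],
                       [81., 101.],
                       [119., 133.],
                       [22., 37.],
                       [103., 119.]])

diff = preds - target
mse = (diff * diff).mean()    # about 18.40, matching the loss above
rmse = torch.sqrt(mse)        # about 4.29
max_err = diff.abs().max()    # largest single miss, about 7.77

print(mse.item(), rmse.item(), max_err.item())
```

On average the predictions are off by roughly 4 units, with the worst single prediction off by about 8.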

# Neural Network using PyTorch

In [65]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt

In [67]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data\FashionMNIST\raw\train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:17<00:00, 1536936.27it/s]


Extracting data\FashionMNIST\raw\train-images-idx3-ubyte.gz to data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data\FashionMNIST\raw\train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 207979.98it/s]


Extracting data\FashionMNIST\raw\train-labels-idx1-ubyte.gz to data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:01<00:00, 3444650.90it/s]


Extracting data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz to data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 2592421.30it/s]

Extracting data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz to data\FashionMNIST\raw






In [68]:
type(training_data)

torchvision.datasets.mnist.FashionMNIST

In [69]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    # print(X)
    # print(y)
    break

Shape of X [N, C, H, W]:  torch.Size([64, 1, 28, 28])
Shape of y:  torch.Size([64]) torch.int64


X.shape prints the shape of the input data tensor X in the format [N, C, H, W], where N is the batch size, C is the number of channels, H is the height, and W is the width of each sample in the batch. FashionMNIST images are grayscale, so C is 1 here (for RGB images, C would be 3).
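Batching can be tried out without downloading anything by wrapping random tensors in a TensorDataset. This is only a sketch: the 100 random "images" and labels are stand-ins, not FashionMNIST data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 fake grayscale 28x28 images with labels from 0 to 9.
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))

loader = DataLoader(TensorDataset(images, labels), batch_size=64)

# The last batch is smaller because 100 is not a multiple of 64.
shapes = [tuple(X.shape) for X, y in loader]
print(shapes)       # [(64, 1, 28, 28), (36, 1, 28, 28)]
print(len(loader))  # 2 batches
```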

In [70]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cpu device


In [71]:
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
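To get a feel for the size of this network, the sketch below rebuilds the same layer stack as a plain nn.Sequential (an equivalent arrangement used here for brevity, not the class above), counts its parameters, and passes a random batch through it.

```python
import torch
from torch import nn

net = nn.Sequential(
    nn.Flatten(),                 # [N, 1, 28, 28] -> [N, 784]
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),           # one output (logit) per class
)

# Weights plus biases of the three linear layers:
# 784*512 + 512, 512*512 + 512, 512*10 + 10
n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # 669706

logits = net(torch.rand(64, 1, 28, 28))
print(logits.shape)  # torch.Size([64, 10])
```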


In [72]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
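nn.CrossEntropyLoss expects raw logits and integer class labels; it applies log-softmax internally and then takes the negative log-likelihood of the correct class. The sketch below checks that equivalence on random stand-in logits.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # 4 samples, 10 classes (random stand-ins)
labels = torch.tensor([9, 0, 3, 3])  # correct class index per sample

ce = nn.CrossEntropyLoss()(logits, labels)

# Same quantity in two explicit steps.
log_probs = F.log_softmax(logits, dim=1)
manual = F.nll_loss(log_probs, labels)

equal = torch.allclose(ce, manual)
print(equal)
```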

In [73]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

In [74]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
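The accuracy line works by taking the index of the largest logit in each row and comparing it with the label. A tiny hand-made example (three classes, four samples) makes the arithmetic visible:

```python
import torch

pred = torch.tensor([[0.1, 2.0, 0.3],
                     [1.5, 0.2, 0.1],
                     [0.0, 0.1, 3.0],
                     [0.9, 0.8, 0.1]])
y = torch.tensor([1, 0, 2, 1])

predicted_classes = pred.argmax(1)  # index of the largest value per row
print(predicted_classes)            # tensor([1, 0, 2, 0])

correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = correct / len(y)
print(accuracy)  # 0.75, three of four predictions match
```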

In [75]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.295748  [    0/60000]
loss: 2.291415  [ 6400/60000]
loss: 2.273428  [12800/60000]
loss: 2.275123  [19200/60000]
loss: 2.246910  [25600/60000]
loss: 2.224137  [32000/60000]
loss: 2.241773  [38400/60000]
loss: 2.207633  [44800/60000]
loss: 2.204364  [51200/60000]
loss: 2.174191  [57600/60000]
Test Error: 
 Accuracy: 45.8%, Avg loss: 2.168380 

Epoch 2
-------------------------------
loss: 2.176975  [    0/60000]
loss: 2.172533  [ 6400/60000]
loss: 2.112639  [12800/60000]
loss: 2.132830  [19200/60000]
loss: 2.081084  [25600/60000]
loss: 2.021886  [32000/60000]
loss: 2.067733  [38400/60000]
loss: 1.988711  [44800/60000]
loss: 1.996453  [51200/60000]
loss: 1.928489  [57600/60000]
Test Error: 
 Accuracy: 55.0%, Avg loss: 1.918153 

Epoch 3
-------------------------------
loss: 1.950550  [    0/60000]
loss: 1.927153  [ 6400/60000]
loss: 1.800188  [12800/60000]
loss: 1.846008  [19200/60000]
loss: 1.740488  [25600/60000]
loss: 1.679178  [32000/600

In [76]:
#save model
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth


In [77]:
#load model
model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth"))

<All keys matched successfully>
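The save and load round trip can be checked on a much smaller module. The sketch below saves the state dict of a single nn.Linear layer to a temporary file (the file name `layer.pth` is made up for the example) and confirms that a fresh layer ends up with identical parameters.

```python
import os
import tempfile

import torch
from torch import nn

layer = nn.Linear(3, 2)
path = os.path.join(tempfile.mkdtemp(), "layer.pth")
torch.save(layer.state_dict(), path)

# A new layer starts with different random weights...
restored = nn.Linear(3, 2)
# ...until the saved parameters are loaded into it.
restored.load_state_dict(torch.load(path))

same = all(
    torch.equal(a, b)
    for a, b in zip(layer.state_dict().values(),
                    restored.state_dict().values())
)
print(same)
```

Note that only the parameters are saved, so the model class must be available to rebuild the architecture before loading.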

In [79]:
## Prediction

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
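The model outputs raw logits, and argmax picks the largest one. If a probability-like score is wanted as well, softmax can be applied to the logits. The sketch below uses made-up logits rather than real model output.

```python
import torch

classes = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Hypothetical logits for one image (not produced by the model above).
logits = torch.tensor([[-1.0, 0.5, 0.0, 0.2, -0.3, 1.1, 0.4, 1.5, 0.9, 3.0]])

probs = torch.softmax(logits, dim=1)   # non-negative, each row sums to 1
top_prob, top_idx = probs.max(dim=1)

print(classes[top_idx.item()])  # Ankle boot
print(f"confidence: {top_prob.item():.3f}")
```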
