## Installation

Installation instructions for each platform are available at https://pytorch.org.

In [1]:
# Uncomment and run the appropriate command for your operating system, if required

# Linux / Binder
# !pip install numpy torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

# Windows
# !pip install numpy torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

# MacOS
# !pip install numpy torch torchvision torchaudio

In [2]:
import torch

## Tensors

At its core, PyTorch is a library for processing tensors. A tensor is a number, vector, matrix, or any n-dimensional array. Let's create a tensor with a single number.

In [3]:
# Number
t1 = torch.tensor(4.)
t1

tensor(4.)

`4.` is a shorthand for `4.0`. It is used to indicate to Python (and PyTorch) that you want to create a floating-point number. We can verify this by checking the `dtype` attribute of our tensor.

In [5]:
t1.dtype

torch.float32
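
The dtype is inferred from the Python values passed to `torch.tensor`. As a small sketch (the variable names are only illustrative, and PyTorch's default dtype settings are assumed to be unchanged), an integer literal yields an integer tensor, while a float literal or an explicit `dtype` argument yields a floating-point tensor:

```python
import torch

int_tensor = torch.tensor(4)                         # integer literal
float_tensor = torch.tensor(4.)                      # float literal
forced_float = torch.tensor(4, dtype=torch.float32)  # explicit dtype

print(int_tensor.dtype, float_tensor.dtype, forced_float.dtype)
```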

In [6]:
# Vector
t2 = torch.tensor([1., 2, 3, 4])
t2

tensor([1., 2., 3., 4.])

In [7]:
# Matrix
t3 = torch.tensor([[5.,6],
                   [7,8],
                   [9,10]])
t3

tensor([[ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.]])

In [8]:
# 3-dimensional array
t4 = torch.tensor([
    [[11, 12, 13],
     [13, 14, 15]],
    [[15, 16, 17],
     [17, 18, 19.]]])
t4

tensor([[[11., 12., 13.],
         [13., 14., 15.]],

        [[15., 16., 17.],
         [17., 18., 19.]]])

Tensors can have any number of dimensions and different lengths along each dimension. We can inspect the length along each dimension using the `.shape` property of a tensor.

In [9]:
print(t1)
t1.shape

tensor(4.)


torch.Size([])

In [10]:
print(t2)
t2.shape

tensor([1., 2., 3., 4.])


torch.Size([4])

In [11]:
print(t3)
t3.shape

tensor([[ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.]])


torch.Size([3, 2])

In [12]:
print(t4)
t4.shape

tensor([[[11., 12., 13.],
         [13., 14., 15.]],

        [[15., 16., 17.],
         [17., 18., 19.]]])


torch.Size([2, 2, 3])
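
Besides `.shape`, a tensor also reports its number of dimensions and its total number of elements. A minimal sketch, using a stand-in tensor `t` with the same shape as `t4`:

```python
import torch

t = torch.zeros(2, 2, 3)       # stand-in with the same shape as t4

print(t.shape)                 # length along each dimension
print(t.ndim)                  # number of dimensions
print(t.numel())               # total number of elements
print(t.shape[0], t.size(2))   # lengths of individual dimensions
```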

Note that it's not possible to create tensors with an improper shape.

In [13]:
# Matrix
t5 = torch.tensor([[5., 6, 11],
                   [7, 8],
                   [9, 10]])
t5

ValueError: ignored

A `ValueError` is thrown because the lengths of the rows `[5., 6, 11]` and `[7, 8]` don't match.

## Tensor operations and gradients

We can combine tensors with the usual arithmetic operations. Let's look at an example:

In [14]:
# Create tensors.
x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)
x, w, b

(tensor(3.), tensor(4., requires_grad=True), tensor(5., requires_grad=True))

We've created three tensors: `x`, `w`, and `b`, all numbers. `w` and `b` have an additional parameter `requires_grad` set to `True`. We'll see what it does in just a moment.

Let's create a new tensor `y` by combining these tensors.

In [15]:
# Arithmetic operations
y = w * x + b
y

tensor(17., grad_fn=<AddBackward0>)

As expected, `y` is a tensor with the value `3 * 4 + 5 = 17`. What makes PyTorch especially useful is that it can automatically compute the derivative of `y` with respect to the tensors that have `requires_grad` set to `True`, i.e., `w` and `b`. This feature of PyTorch is called _autograd_ (automatic gradients).

To compute the derivatives, we can invoke the `.backward` method on our result `y`.

In [16]:
# Compute derivatives
y.backward()

The derivatives of `y` with respect to the input tensors are stored in the `.grad` property of the respective tensors.

In [17]:
# Display gradients
print('dy/dx:', x.grad)
print('dy/dw:', w.grad)
print('dy/db:', b.grad)

dy/dx: None
dy/dw: tensor(3.)
dy/db: tensor(1.)


As expected, `dy/dw` has the same value as `x`, i.e., `3`, and `dy/db` has the value `1`. Note that `x.grad` is `None` because `x` doesn't have `requires_grad` set to `True`.

The "grad" in `w.grad` is short for _gradient_, which is another term for derivative. The term _gradient_ is primarily used while dealing with vectors and matrices.
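
One detail worth checking for yourself is that PyTorch adds new gradients to `.grad` rather than overwriting the old ones. The sketch below reuses the values from the example above, calls `.backward()` on two freshly computed results, and then resets the gradient with `.zero_()`:

```python
import torch

x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)

(w * x + b).backward()
first = w.grad.clone()    # dy/dw after one backward pass

(w * x + b).backward()
second = w.grad.clone()   # gradients of both passes added together

w.grad.zero_()            # reset in place before the next computation
print(first, second, w.grad)
```

This accumulation is the reason gradients are usually reset to zero between training steps.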

## Tensor functions

Apart from arithmetic operations, the `torch` module also contains many functions for creating and manipulating tensors. Let's look at some examples.

In [18]:
# Create a tensor with a fixed value for every element
t6 = torch.full((3, 2), 42)
t6

tensor([[42, 42],
        [42, 42],
        [42, 42]])

In [19]:
# Concatenate two tensors with compatible shapes
t7 = torch.cat((t3, t6))
t7

tensor([[ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.],
        [42., 42.],
        [42., 42.],
        [42., 42.]])

In [20]:
# Compute the sine of each element
t8 = torch.sin(t7)
t8

tensor([[-0.9589, -0.2794],
        [ 0.6570,  0.9894],
        [ 0.4121, -0.5440],
        [-0.9165, -0.9165],
        [-0.9165, -0.9165],
        [-0.9165, -0.9165]])

In [21]:
# Change the shape of a tensor
t9 = t8.reshape(3, 2, 2)
t9

tensor([[[-0.9589, -0.2794],
         [ 0.6570,  0.9894]],

        [[ 0.4121, -0.5440],
         [-0.9165, -0.9165]],

        [[-0.9165, -0.9165],
         [-0.9165, -0.9165]]])

You can learn more about tensor operations here: https://pytorch.org/docs/stable/torch.html. Experiment with some more tensor functions and operations in cells of your own.
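
As a starting point for experimenting, here is a short sketch of a few more functions from the `torch` module. The tensors and variable names are made up for illustration:

```python
import torch

zeros = torch.zeros(2, 3)                        # 2x3 tensor filled with 0.
steps = torch.arange(6.).reshape(2, 3)           # values 0..5 arranged as 2x3
side_by_side = torch.cat((zeros, steps), dim=1)  # concatenate along columns
col_sums = steps.sum(dim=0)                      # sum over rows, one value per column
transposed = steps.t()                           # swap the two dimensions

print(side_by_side.shape, col_sums, transposed.shape)
```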

## Interoperability with Numpy

[Numpy](http://www.numpy.org/) is a popular open-source library used for mathematical and scientific computing in Python. It enables efficient operations on large multi-dimensional arrays and has a vast ecosystem of supporting libraries, including:

* [Pandas](https://pandas.pydata.org/) for file I/O and data analysis
* [Matplotlib](https://matplotlib.org/) for plotting and visualization
* [OpenCV](https://opencv.org/) for image and video processing


Instead of reinventing the wheel, PyTorch interoperates well with Numpy to leverage its existing ecosystem of tools and libraries.

Here's how we create an array in Numpy:

In [22]:
import numpy as np

x = np.array([[1, 2], [3, 4.]])
x

array([[1., 2.],
       [3., 4.]])

We can convert a Numpy array to a PyTorch tensor using `torch.from_numpy`.

In [23]:
# Convert the numpy array to a torch tensor.
y = torch.from_numpy(x)
y

tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)

Let's verify that the Numpy array and the torch tensor have the same data type.

In [24]:
x.dtype, y.dtype

(dtype('float64'), torch.float64)

In [25]:
# Convert a torch tensor to a numpy array
z = y.numpy()
z

array([[1., 2.],
       [3., 4.]])

The interoperability between PyTorch and Numpy is essential because most datasets you'll work with will likely be read and preprocessed as Numpy arrays.
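
One point to keep in mind when converting: `torch.from_numpy` does not copy the data, so the tensor and the array share the same memory, whereas `torch.tensor` makes an independent copy. A minimal sketch with illustrative names:

```python
import numpy as np
import torch

arr = np.array([1., 2., 3.])
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # independent copy of the data

arr[0] = 100.                    # modify the Numpy array in place

print(shared)   # reflects the change made to arr
print(copied)   # still holds the original values
```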

You might wonder why we need a library like PyTorch at all since Numpy already provides data structures and utilities for working with multi-dimensional numeric data. There are two main reasons:

1. **Autograd**: The ability to automatically compute gradients for tensor operations is essential for training deep learning models.
2. **GPU support**: While working with massive datasets and large models, PyTorch tensor operations can be performed efficiently using a Graphics Processing Unit (GPU). Computations that might typically take hours can be completed within minutes using GPUs.
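
The GPU point can be sketched as follows. The code picks a CUDA device when one is available and otherwise falls back to the CPU, so it should run on either kind of machine; the variable names are illustrative:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

on_device = torch.ones(2, 2, device=device)   # created directly on the device
moved = torch.ones(2, 2).to(device)           # created on the CPU, then moved
result = (on_device + moved).cpu()            # compute on the device, copy the result back

print(device, result)
```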



## Linear regression from scratch using PyTorch

In [26]:
import numpy as np
import torch

In [27]:
#making training data
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70]], dtype='float32')

In [28]:
# Targets (apples, oranges)
target = np.array([[56, 70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119]], dtype='float32')

In [29]:
#Convert input and target to tensors
inputs = torch.from_numpy(inputs)
target = torch.from_numpy(target)

print(inputs,"\n")
print(target)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]]) 

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [30]:
# Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

print(w)
print(b)

tensor([[ 0.9869, -0.3994, -1.4240],
        [-0.7071,  0.5633,  1.4926]], requires_grad=True)
tensor([-0.9238,  0.4101], requires_grad=True)


In [31]:
#define the model

def model(x):
  return x @ w.t() + b
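
To see why the transpose is needed, it may help to trace the shapes. The sketch below uses stand-in tensors with hypothetical names (`demo_x`, `demo_w`, `demo_b`) that have the same shapes as `inputs`, `w`, and `b`:

```python
import torch

demo_x = torch.ones(5, 3)            # 5 samples, 3 input features
demo_w = torch.ones(2, 3)            # 2 outputs, 3 weights per output
demo_b = torch.tensor([10., 20.])    # one bias per output

# (5, 3) @ (3, 2) -> (5, 2); the bias is broadcast across the 5 rows
out = demo_x @ demo_w.t() + demo_b
print(out.shape)
print(out[0])
```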

In [32]:
# prediction
preds = model(inputs)
print(preds)

tensor([[-16.8726,  50.7135],
        [-37.4000,  81.1595],
        [-51.1746, 100.9421],
        [ 29.8759,   7.7336],
        [-70.8504, 110.1776]], grad_fn=<AddBackward0>)


In [33]:
#actual
print(target)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [34]:
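# Loss function: mean squared error (MSE).
# The squared difference is symmetric, so the argument order does not affect the result.
def MSE(actual, target):
  diff = actual - target
  return torch.sum(diff * diff) / diff.numel()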
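
As a sanity check, a hand-written mean squared error like this one can be compared with PyTorch's built-in `torch.nn.functional.mse_loss`. The sketch below uses a copy of the function under the hypothetical name `mse_manual` and made-up values:

```python
import torch
import torch.nn.functional as F

def mse_manual(a, b):
    diff = a - b
    return torch.sum(diff * diff) / diff.numel()

demo_preds = torch.tensor([[1., 2.], [3., 4.]])
demo_targets = torch.tensor([[1., 0.], [0., 4.]])

manual = mse_manual(demo_preds, demo_targets)
builtin = F.mse_loss(demo_preds, demo_targets)   # default reduction is the mean
print(manual, builtin)
```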

In [35]:
# error
loss = MSE(target, preds)
print(loss)

tensor(8130.2056, grad_fn=<DivBackward0>)


In [36]:
# Compute gradients
loss.backward()

In [37]:
print(w, "\n")
print(w.grad)

tensor([[ 0.9869, -0.3994, -1.4240],
        [-0.7071,  0.5633,  1.4926]], requires_grad=True) 

tensor([[ -8418.3262, -10891.2080,  -6491.8745],
        [ -1919.2699,  -1887.8640,  -1131.7783]])


In [38]:
print(b, "\n")
print(b.grad)

tensor([-0.9238,  0.4101], requires_grad=True) 

tensor([-105.4843,  -21.8547])
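
Gradients like these can be cross-checked by hand. With predictions `p = x @ w.t() + b` and the loss defined as the mean of `(p - t)**2` over all `N` elements, the chain rule gives `dL/dp = 2 * (p - t) / N`, `dL/dw = (dL/dp).t() @ x`, and `dL/db` as the column-wise sum of `dL/dp`. The sketch below checks this against autograd on small random stand-in data (all `_demo` names are hypothetical):

```python
import torch

torch.manual_seed(0)
x_demo = torch.randn(5, 3)
t_demo = torch.randn(5, 2)
w_demo = torch.randn(2, 3, requires_grad=True)
b_demo = torch.randn(2, requires_grad=True)

p = x_demo @ w_demo.t() + b_demo
diff = p - t_demo
loss = torch.sum(diff * diff) / diff.numel()
loss.backward()                        # gradients from autograd

with torch.no_grad():
    grad_p = 2 * diff / diff.numel()   # dL/dp, shape (5, 2)
    grad_w = grad_p.t() @ x_demo       # dL/dw, shape (2, 3)
    grad_b = grad_p.sum(dim=0)         # dL/db, shape (2,)

w_matches = torch.allclose(w_demo.grad, grad_w, atol=1e-5)
b_matches = torch.allclose(b_demo.grad, grad_b, atol=1e-5)
print(w_matches, b_matches)
```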


In [39]:
#reset grad
w.grad.zero_()
b.grad.zero_()

print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])


In [40]:
# Recompute the predictions (the parameters have not been updated yet)

preds = model(inputs)
print(preds)

tensor([[-16.8726,  50.7135],
        [-37.4000,  81.1595],
        [-51.1746, 100.9421],
        [ 29.8759,   7.7336],
        [-70.8504, 110.1776]], grad_fn=<AddBackward0>)


In [41]:
# Loss
loss = MSE(target, preds)
print(loss)

tensor(8130.2056, grad_fn=<DivBackward0>)


In [42]:
loss.backward()

print(w.grad, "\n")
print(b.grad)

tensor([[ -8418.3262, -10891.2080,  -6491.8745],
        [ -1919.2699,  -1887.8640,  -1131.7783]]) 

tensor([-105.4843,  -21.8547])


In [43]:
# Adjust weights & reset gradients
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()
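
The `torch.no_grad()` block matters here: PyTorch refuses an in-place update of a leaf tensor that requires gradients while autograd is tracking operations. A small sketch with a hypothetical tensor `w_demo`:

```python
import torch

w_demo = torch.tensor([1., 2.], requires_grad=True)

try:
    w_demo -= 0.1            # in-place update while autograd is tracking
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

with torch.no_grad():
    w_demo -= 0.1            # allowed: the update is not recorded

print(w_demo)
```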

In [44]:
print(w)
print(b)

tensor([[ 1.0711, -0.2905, -1.3591],
        [-0.6879,  0.5821,  1.5040]], requires_grad=True)
tensor([-0.9228,  0.4103], requires_grad=True)


In [45]:
# calculate again
preds = model(inputs)
loss = MSE(target, preds)
print(loss)

tensor(5942.1074, grad_fn=<DivBackward0>)


In [46]:
# Training for multiple epochs
num_epochs = 400
for i in range(num_epochs):
  preds = model(inputs)
  loss = MSE(target, preds)
  loss.backward()

  with torch.no_grad():
    w -= w.grad * 1e-5 # learning rate
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()
  print(f"Epochs({i}/{num_epochs}) & Loss {loss}")

Epochs(0/400) & Loss 5942.107421875
Epochs(1/400) & Loss 4462.32568359375
Epochs(2/400) & Loss 3459.93408203125
Epochs(3/400) & Loss 2779.31298828125
Epochs(4/400) & Loss 2315.591796875
Epochs(5/400) & Loss 1998.098876953125
Epochs(6/400) & Loss 1779.2086181640625
Epochs(7/400) & Loss 1626.827392578125
Epochs(8/400) & Loss 1519.326171875
Epochs(9/400) & Loss 1442.12841796875
Epochs(10/400) & Loss 1385.4100341796875
Epochs(11/400) & Loss 1342.5513916015625
Epochs(12/400) & Loss 1309.0902099609375
Epochs(13/400) & Loss 1282.01806640625
Epochs(14/400) & Loss 1259.307373046875
Epochs(15/400) & Loss 1239.591064453125
Epochs(16/400) & Loss 1221.948486328125
Epochs(17/400) & Loss 1205.7557373046875
Epochs(18/400) & Loss 1190.593994140625
Epochs(19/400) & Loss 1176.179443359375
Epochs(20/400) & Loss 1162.3204345703125
Epochs(21/400) & Loss 1148.88720703125
Epochs(22/400) & Loss 1135.791259765625
Epochs(23/400) & Loss 1122.972900390625
Epochs(24/400) & Loss 1110.390625
Epochs(25/400) & Loss 109
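
The manual update step can also be delegated to one of PyTorch's built-in optimizers. The following is a sketch rather than the notebook's own method: it swaps in `torch.optim.SGD` for the hand-written parameter update, using the same training data and learning rate. The `fit` helper and the `_demo` names are made up for illustration, and the seed is fixed only to make the run repeatable.

```python
import torch

inputs_demo = torch.tensor([[73., 67., 43.],
                            [91., 88., 64.],
                            [87., 134., 58.],
                            [102., 43., 37.],
                            [69., 96., 70.]])
targets_demo = torch.tensor([[56., 70.],
                             [81., 101.],
                             [119., 133.],
                             [22., 37.],
                             [103., 119.]])

def fit(num_epochs, lr=1e-5):
    torch.manual_seed(0)
    w_demo = torch.randn(2, 3, requires_grad=True)
    b_demo = torch.randn(2, requires_grad=True)
    optimizer = torch.optim.SGD([w_demo, b_demo], lr=lr)
    losses = []
    for _ in range(num_epochs):
        preds = inputs_demo @ w_demo.t() + b_demo
        loss = torch.mean((preds - targets_demo) ** 2)   # MSE over all elements
        optimizer.zero_grad()   # reset gradients from the previous step
        loss.backward()         # compute new gradients
        optimizer.step()        # apply param -= lr * grad
        losses.append(loss.item())
    return losses

losses = fit(100)
print(f"first loss: {losses[0]:.2f}, last loss: {losses[-1]:.2f}")
```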

In [47]:
preds = model(inputs)
loss = MSE(target, preds)
print(loss)

tensor(88.9986, grad_fn=<DivBackward0>)


In [48]:
from math import sqrt
sqrt(loss)

9.433908347497573

In [49]:
preds

tensor([[ 59.2467,  69.8648],
        [ 74.9273, 103.8244],
        [131.8999, 126.5003],
        [ 33.2049,  34.3815],
        [ 81.9989, 126.2431]], grad_fn=<AddBackward0>)

In [50]:
target

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

In [51]:
# The predictions are now close to the targets.

## Neural Network using PyTorch

In [53]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt

In [54]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:14<00:00, 1866750.73it/s]


Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 136357.95it/s]


Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:01<00:00, 2477147.92it/s]


Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 23885262.16it/s]

Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw






In [55]:
type(training_data)

torchvision.datasets.mnist.FashionMNIST

In [56]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    # print(X)
    # print(y)
    break

Shape of X [N, C, H, W]:  torch.Size([64, 1, 28, 28])
Shape of y:  torch.Size([64]) torch.int64
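
To get a feel for how a `DataLoader` splits a dataset into batches without downloading anything, here is a small sketch on synthetic data. It uses `TensorDataset`, which is not part of the notebook, and the names are illustrative:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.randn(10, 3)      # 10 samples with 3 features each
labels = torch.arange(10)          # one integer label per sample
toy_dataset = TensorDataset(features, labels)

toy_loader = DataLoader(toy_dataset, batch_size=4)
batch_sizes = [len(xb) for xb, yb in toy_loader]

# 10 samples in batches of 4: the last batch holds the remainder
print(len(toy_dataset), len(toy_loader), batch_sizes)
```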


In [57]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cpu device


In [58]:
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
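
The printed layer sizes determine how many trainable parameters the network has. As a sketch, the same stack of layers can be rebuilt on its own (here under the hypothetical name `stack`) and its parameters counted:

```python
from torch import nn

stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Each Linear layer contributes a weight matrix and a bias vector
for name, param in stack.named_parameters():
    print(name, tuple(param.shape))

total = sum(p.numel() for p in stack.parameters())
print("total parameters:", total)
```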


In [59]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
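
`nn.CrossEntropyLoss` takes raw scores (logits) and integer class labels. For a single sample it equals the negative log of the softmax probability assigned to the correct class, which the sketch below checks on made-up values:

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 0.5, -1.0]])   # hypothetical scores for 3 classes
label = torch.tensor([0])                   # index of the correct class

loss = nn.CrossEntropyLoss()(logits, label)
manual = -torch.log_softmax(logits, dim=1)[0, 0]   # -log p(correct class)
print(loss, manual)
```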

In [60]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

In [61]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
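
The accuracy computation inside `test` can be traced on a tiny example. The logits and labels below are made up; `argmax(1)` picks the highest-scoring class in each row, and the comparison with the labels counts the matches:

```python
import torch

demo_logits = torch.tensor([[0.1, 2.0, 0.3],
                            [1.5, 0.2, 0.1],
                            [0.0, 0.1, 3.0],
                            [0.9, 0.8, 0.7]])
demo_labels = torch.tensor([1, 0, 2, 2])

predicted = demo_logits.argmax(1)     # most likely class per row
correct = (predicted == demo_labels).type(torch.float).sum().item()
accuracy = correct / len(demo_labels)
print(predicted, accuracy)
```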

In [62]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.300568  [    0/60000]
loss: 2.294092  [ 6400/60000]
loss: 2.277620  [12800/60000]
loss: 2.277789  [19200/60000]
loss: 2.248350  [25600/60000]
loss: 2.225376  [32000/60000]
loss: 2.232074  [38400/60000]
loss: 2.195417  [44800/60000]
loss: 2.196385  [51200/60000]
loss: 2.162493  [57600/60000]
Test Error: 
 Accuracy: 43.5%, Avg loss: 2.165449 

Epoch 2
-------------------------------
loss: 2.175286  [    0/60000]
loss: 2.168028  [ 6400/60000]
loss: 2.113466  [12800/60000]
loss: 2.131112  [19200/60000]
loss: 2.064315  [25600/60000]
loss: 2.012563  [32000/60000]
loss: 2.040306  [38400/60000]
loss: 1.956064  [44800/60000]
loss: 1.962467  [51200/60000]
loss: 1.892064  [57600/60000]
Test Error: 
 Accuracy: 50.4%, Avg loss: 1.895198 

Epoch 3
-------------------------------
loss: 1.930159  [    0/60000]
loss: 1.900386  [ 6400/60000]
loss: 1.785848  [12800/60000]
loss: 1.829207  [19200/60000]
loss: 1.714321  [25600/60000]
loss: 1.671116  [32000/600

In [63]:
#save model
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth
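
A state dict is a mapping from parameter names to tensors. The following sketch saves and restores one for a small stand-in layer (`small` and `restored` are hypothetical names), writing to a temporary directory so that no file is left behind:

```python
import os
import tempfile

import torch
from torch import nn

small = nn.Linear(3, 2)
keys = list(small.state_dict().keys())   # parameter names
print(keys)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "small.pth")
    torch.save(small.state_dict(), path)
    restored = nn.Linear(3, 2)            # fresh layer with different random weights
    restored.load_state_dict(torch.load(path))

same = all(
    torch.equal(a, b)
    for a, b in zip(small.state_dict().values(), restored.state_dict().values())
)
print(same)
```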


In [64]:
#load model
model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth"))

<All keys matched successfully>

In [65]:
## Prediction

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
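
The model returns raw scores rather than probabilities. If a confidence value is wanted, one option is to pass the scores through a softmax. The sketch below does this on made-up logits with a shortened class list, so the numbers are not outputs of the trained model:

```python
import torch

class_names = ["T-shirt/top", "Trouser", "Pullover"]   # shortened list for the demo
demo_logits = torch.tensor([[1.0, 3.0, 0.5]])          # hypothetical model output

probs = torch.softmax(demo_logits, dim=1)   # normalize the scores so they sum to 1
best = probs[0].argmax(0).item()            # index of the most likely class
print(f"{class_names[best]}: {probs[0, best].item():.3f}")
```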
