# PyTorch Basics

In [None]:
import torch
import numpy as np
torch.manual_seed(1234)

## Tensors

* Scalar is a single number.
* Vector is an array of numbers.
* Matrix is a 2-D array of numbers.
* Tensors are N-D arrays of numbers.
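
As a rough illustration of the four cases above, the sketch below builds one tensor of each kind and prints its number of dimensions. The variable names are illustrative and not part of the original notebook.

```python
import torch

scalar = torch.tensor(3.0)                 # 0-D: a single number
vector = torch.tensor([1.0, 2.0, 3.0])     # 1-D: an array of numbers
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])        # 2-D: rows and columns
tensor3d = torch.zeros(2, 3, 4)            # 3-D: an N-D array with N = 3

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor3d", tensor3d)]:
    print(name, "ndim =", t.ndim, "shape =", tuple(t.shape))
```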

#### Creating Tensors

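You can create a tensor by passing its shape as arguments. Here is a tensor with 2 rows and 3 columns. Note that `torch.Tensor(2, 3)` allocates memory without initializing it, so the printed values are arbitrary.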

In [None]:
def describe(x):
    print("Shape: {}".format(x.shape))
    print("Type: {}".format(x.type()), x.dtype, x.device)
    print(x)
    print()

In [None]:
describe(torch.Tensor(2, 3))

It's common in prototyping to create a tensor with random numbers of a specific shape.

In [None]:
describe(torch.randn(3, 2)) # normal distribution (0,1)

In [None]:
describe(torch.rand(2, 3)) # uniform distribution [0,1)

You can also initialize tensors of ones or zeros.

In [None]:
describe(torch.zeros(2, 3))
x = torch.ones(2, 3)
describe(x)
x.fill_(5)
describe(x)

Tensors can be initialized and then filled in place.

Note: operations that end in an underscore (`_`) are in-place operations.

In [None]:
x = torch.Tensor(3,4).fill_(5)
describe(x)
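
To make the underscore convention concrete, here is a small sketch comparing `add` (which returns a new tensor) with `add_` (which modifies the tensor it is called on). The variable names are illustrative.

```python
import torch

x = torch.ones(2, 3)

y = x.add(1)      # out-of-place: returns a new tensor, x is unchanged
print(x)          # still all ones
print(y)          # all twos

x.add_(1)         # in-place: modifies x itself
print(x)          # now all twos

# An in-place operation returns the very tensor it modified.
same = x.mul_(2) is x
print(same)
```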

Tensors can be initialized from a list of lists.

In [None]:
x = torch.Tensor([[1, 2,],  
                  [3, 4,],  
                  [5, 6,]])
describe(x)

In [None]:
x = torch.Tensor([[1, 2, 3],  
                  [4, 5, 6]])
describe(x)

Tensors can be initialized from NumPy arrays. `torch.from_numpy` keeps the array's dtype (here `float64`) and shares memory with the array.

In [None]:
data = [[1., 2.],[3., 4.]]
np_array = np.array(data)
x = torch.from_numpy(np_array)
describe(x)
print(np_array.dtype)
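
The following sketch illustrates the memory sharing of `torch.from_numpy`: a change to the NumPy array is visible in the tensor created with `from_numpy`, but not in a copy made with `torch.tensor`. The names `shared` and `copied` are illustrative.

```python
import numpy as np
import torch

np_array = np.array([[1., 2.], [3., 4.]])   # float64 by default
shared = torch.from_numpy(np_array)          # shares memory, keeps float64
copied = torch.tensor(np_array)              # makes an independent copy

np_array[0, 0] = 100.                        # modify the NumPy array

print(shared[0, 0].item())   # reflects the change
print(copied[0, 0].item())   # unaffected
print(shared.dtype)
```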

Tensors can be initialized from other tensors, reusing their shape and dtype.

In [None]:
x_ones = torch.ones_like(x)
describe(x_ones)

We can create a vector of consecutive integers with `arange`.

In [None]:
x = torch.arange(6)
describe(x)

#### Tensor Types

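`FloatTensor` (dtype `torch.float32`) is PyTorch's default floating-point tensor type, and most of the tensors created above are of this type. Exceptions include the tensor built from the `float64` NumPy array and the integer tensor returned by `torch.arange`.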

Long Tensors are used for indexing operations and mirror the `int64` numpy type.

In [None]:
x = torch.LongTensor([[1, 2, 3],  
                      [4, 5, 6],
                      [7, 8, 9]])
describe(x)
print(x.dtype)
print(x.numpy().dtype)
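
A related point that is easy to miss: the legacy `torch.Tensor(...)` constructor and the `torch.tensor(...)` factory choose dtypes differently. The sketch below assumes the default dtype has not been changed with `torch.set_default_dtype`.

```python
import torch

a = torch.Tensor([1, 2, 3])     # legacy constructor: uses the default float type
b = torch.tensor([1, 2, 3])     # infers dtype from the data: integers give int64
c = torch.tensor([1., 2., 3.])  # floats give the default float type

print(a.dtype, b.dtype, c.dtype)
```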

We can convert between types with casts such as `.long()` and `.float()`:

In [None]:
x = torch.FloatTensor([[1, 2, 3],  
                       [4, 5, 6]])
describe(x)

x = x.long() # type-cast
describe(x)

# This is the same as when we provide the dtype initially
x = torch.tensor([[1, 2, 3], 
                  [4, 5, 6]], dtype=torch.int64)
describe(x)

x = x.float() # type-cast
describe(x)

In [None]:
x = torch.randn(2, 3)
describe(x)

In [None]:
describe(torch.add(x, x))

In [None]:
describe(x + x)

In [None]:
x = torch.arange(6)
describe(x)

In [None]:
x = x.view(2, 3)
describe(x)

In [None]:
describe(torch.sum(x, dim=0)) # sum over the rows    (summarize each column)
describe(torch.sum(x, dim=1)) # sum over the columns (summarize each row)

In [None]:
describe(torch.transpose(x, 0, 1)) # swapping the dimensions

In [None]:
describe(x.permute(dims=(1, 0))) # re-ordering the dimensions

## Operations

Using tensors to do linear algebra is a foundation of modern deep learning practice.

Reshaping changes the shape of a tensor without changing its elements or their order. In PyTorch, the basic reshaping operation is `view`.

In [None]:
x = torch.arange(0, 20)

print(x.view(1, 20))
print(x.view(2, 10))
print(x.view(4, 5))
print(x.view(5, 4))
print(x.view(10, 2))
print(x.view(20, 1))

If you are concerned about memory usage and want to guarantee that the two tensors share the same data, use `Tensor.view`. You can also use `torch.reshape` (or `Tensor.reshape`), which returns a view when possible and otherwise copies the underlying data.

In [None]:
x = torch.arange(0, 20)

print(x.reshape(1, 20))
print(x.reshape(2, 10))
print(x.reshape(4, 5))
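
The difference between `view` and `reshape` shows up on non-contiguous tensors. In this sketch (variable names are illustrative), `view` refuses to operate on a transposed tensor, while `reshape` silently falls back to a copy.

```python
import torch

x = torch.arange(6)
v = x.view(2, 3)           # a view: shares storage with x
v[0, 0] = 100
print(x[0].item())         # 100, because v and x share data

t = v.t()                  # transposing gives a non-contiguous view
print(t.is_contiguous())

view_failed = False
try:
    t.view(6)              # view needs a compatible memory layout
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)

r = t.reshape(6)           # reshape falls back to copying here
r[0] = -1
print(x[0].item())         # still 100: r is a copy, not a view
```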

Reshape can have a -1 dimension which will infer the size automatically from the remaining data.

In [None]:
print(x.reshape(5, -1))

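We can use `view` to add size-1 dimensions, which is useful when combining a tensor with others of a different shape. In an operation such as addition, size-1 dimensions are automatically expanded to match the other tensor. This is called broadcasting.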

In [None]:
x = torch.arange(12).view(3, 4)
y = torch.arange(4).view(1, 4)
z = torch.arange(3).view(3, 1)

print("x:", x)
print("y:", y)
print("z:", z)
print("x+y:", x + y)
print("x+z:", x + z)
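
As a minimal sketch of the broadcasting rule, adding a `(3, 1)` tensor to a `(1, 4)` tensor stretches both size-1 dimensions and yields a `(3, 4)` result. The explicit `expand` calls are only there to show what broadcasting does implicitly.

```python
import torch

col = torch.arange(3).view(3, 1)   # shape (3, 1)
row = torch.arange(4).view(1, 4)   # shape (1, 4)

# Size-1 dimensions are stretched to match: (3, 1) + (1, 4) -> (3, 4)
grid = col + row
print(grid.shape)
print(grid)

# The same result, written with explicit expansion
explicit = col.expand(3, 4) + row.expand(3, 4)
print(torch.equal(grid, explicit))
```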

`unsqueeze` adds a size-1 dimension at the given position, and `squeeze` removes size-1 dimensions.

In [None]:
x = torch.arange(12).view(3, 4)
print(x.shape)

x = x.unsqueeze(dim=1)
print(x.shape)

x = x.squeeze()
print(x.shape)
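
Called without arguments, `squeeze` removes every size-1 dimension, which may be more than intended (for example when the batch size happens to be 1). A small sketch of the `dim` argument:

```python
import torch

x = torch.zeros(1, 3, 1)

all_squeezed = x.squeeze()         # removes every size-1 dimension
first_only = x.squeeze(dim=0)      # removes only dimension 0
unchanged = x.squeeze(dim=1)       # dimension 1 has size 3, so nothing happens
extended = x.unsqueeze(dim=-1)     # appends a new size-1 dimension at the end

print(all_squeezed.shape, first_only.shape, unchanged.shape, extended.shape)
```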

All of the standard mathematical operations apply (such as `add` below).

In [None]:
x = torch.rand(3,4)
print("x: \n", x)
print("--")
print("torch.add(x, x): \n", torch.add(x, x))
print("--")
print("x+x: \n", x + x)

The convention of `_` indicating in-place operations continues:

In [None]:
x = torch.arange(12).reshape(3, 4)
print(x)
print(x.add_(x))

Many operations reduce a dimension, such as `sum`:

In [None]:
x = torch.arange(12).reshape(3, 4)
print("x: \n", x)
print("---")
print("Summing across rows (dim=0): \n", x.sum(dim=0))
print("---")
print("Summing across columns (dim=1): \n", x.sum(dim=1))
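
Reductions drop the reduced dimension by default. Passing `keepdim=True` keeps it as a size-1 dimension, which is convenient for broadcasting the result back against the original tensor. The row normalization below is just an illustrative use.

```python
import torch

x = torch.arange(12).reshape(3, 4).float()

col_sums = x.sum(dim=0)                 # shape (4,): the row dimension is dropped
row_sums = x.sum(dim=1, keepdim=True)   # shape (3, 1): kept as a size-1 dimension

# (3, 4) / (3, 1) broadcasts, so every row is divided by its own sum
normalized = x / row_sums
print(col_sums.shape, row_sums.shape)
print(normalized.sum(dim=1))            # each row now sums to 1
```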

#### Indexing, Slicing, Joining and Mutating

In [None]:
x = torch.arange(6).view(2, 3)
print("x: \n", x)
print("---")
print("x[:2, :2]: \n", x[:2, :2])
print("---")
print("x[0][1]: \n", x[0][1])
print("---")
print("x[0,1]: \n", x[0,1])
print("---")
print("x[[0,1],[0,2]]: \n", x[[0,1],[0,2]])
print("---")
print("Setting [0][1] to be 8")
x[0][1] = 8
print(x)

In [None]:
print(x[[0,1], [0,1,2]]) # raises IndexError: the index lists have lengths 2 and 3 and cannot be broadcast

We can select a subset of a tensor using `index_select`:

In [None]:
x = torch.arange(9).view(3,3)
print(x)

print("---")
indices = torch.LongTensor([0, 2])
print(torch.index_select(x, dim=0, index=indices)) # select 0th and 2nd row

print("---")
indices = torch.LongTensor([0, 2])
print(torch.index_select(x, dim=1, index=indices)) # select 0th and 2nd column

print("---")
indices = torch.LongTensor([0, 0]) # select twice the same index
print(torch.index_select(x, dim=0, index=indices))

We can also use numpy-style advanced indexing:

In [None]:
x = torch.arange(9).view(3,3)
indices = torch.LongTensor([0, 2])

print(x[indices])
print("---")
print(x[indices, :])
print("---")
print(x[:, indices])

We can concatenate along an existing dimension, here dimension 1 (the columns).

In [None]:
x = torch.arange(9).view(3,3)

print(x)
print("---")
new_x = torch.cat([x, x], dim=1) # extends an existing dimension
print(new_x.shape)
print(new_x)

We can also concatenate on a new 0th dimension to "stack" the tensors:

In [None]:
x = torch.arange(9).view(3,3)
print(x)
print("---")
new_x = torch.stack([x, x]) # adds a new dimension
print(new_x.shape)
print(new_x)
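
The difference between `cat` and `stack` is easiest to see in the resulting shapes. This sketch also checks that stacking is equivalent to adding a new dimension with `unsqueeze` and then concatenating along it.

```python
import torch

x = torch.arange(6).view(2, 3)

cat_rows = torch.cat([x, x], dim=0)       # (4, 3): more rows
cat_cols = torch.cat([x, x], dim=1)       # (2, 6): more columns
stacked = torch.stack([x, x], dim=0)      # (2, 2, 3): a new leading dimension
stacked_last = torch.stack([x, x], dim=2) # (2, 3, 2): a new trailing dimension

# stack == unsqueeze each tensor, then cat along the new dimension
equivalent = torch.cat([x.unsqueeze(0), x.unsqueeze(0)], dim=0)

print(cat_rows.shape, cat_cols.shape, stacked.shape, stacked_last.shape)
print(torch.equal(stacked, equivalent))
```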

#### Linear Algebra Tensor Functions

Transposing swaps two dimensions of a tensor. For a matrix, this turns all the rows into columns and vice versa.

In [None]:
x = torch.arange(0, 12).view(3,4)
print("x: \n", x) 
print("---")
print("x.transpose(1, 0): \n", x.transpose(1, 0))

A three dimensional tensor would represent a batch of sequences, where each sequence item has a feature vector.  It is common to switch the batch and sequence dimensions so that we can more easily index the sequence in a sequence model. 

Note: `transpose` only swaps 2 axes. `permute` (shown after the next cell) can reorder any number of axes at once.

In [None]:
batch_size = 3
seq_size = 4
feature_size = 5

x = torch.arange(batch_size * seq_size * feature_size).view(batch_size, seq_size, feature_size)

print("x.shape: \n", x.shape)
print("x: \n", x)
print("-----")

print("x.transpose(1, 0).shape: \n", x.transpose(1, 0).shape)
print("x.transpose(1, 0): \n", x.transpose(1, 0))

Permute is a more general version of transpose:

In [None]:
batch_size = 3
seq_size = 4
feature_size = 5

x = torch.arange(batch_size * seq_size * feature_size).view(batch_size, seq_size, feature_size)

print("x.shape: \n", x.shape)
print("x: \n", x)
print("-----")

print("x.permute(1, 0, 2).shape: \n", x.permute(1, 0, 2).shape)
print("x.permute(1, 0, 2): \n", x.permute(1, 0, 2))
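
A quick sketch of how the two relate: swapping two axes with `transpose` gives the same result as the matching `permute`, while a reordering of all three axes needs `permute`. The shapes here are arbitrary.

```python
import torch

x = torch.arange(24).view(2, 3, 4)

a = x.transpose(0, 1)        # swap axes 0 and 1
b = x.permute(1, 0, 2)       # the same swap, written as a full ordering
print(torch.equal(a, b))

c = x.permute(2, 0, 1)       # move the last axis to the front: shape (4, 2, 3)
print(c.shape)

# c[k, i, j] refers to the same element as x[i, j, k]
print(c[3, 1, 2].item(), x[1, 2, 3].item())
```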

Matrix multiplication is `mm`:

In [None]:
x1 = torch.arange(6).view(2, 3).float()
describe(x1)

x2 = torch.ones(3, 2)
x2[:, 1] += 1
describe(x2)

describe(torch.mm(x1, x2))
describe(torch.matmul(x1, x2)) # or this
describe(x1 @ x2) # or this

In [None]:
x = torch.arange(0, 12).view(3,4).float()
print(x)

x2 = torch.ones(4, 2)
x2[:, 1] += 1
print(x2)

print(x.mm(x2))
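
Matrix multiplication requires the inner dimensions to agree: an `(n, k)` matrix times a `(k, m)` matrix gives an `(n, m)` result. The sketch below shows the shape rule, the error raised on a mismatch, and the contrast with the element-wise `*` operator.

```python
import torch

a = torch.ones(2, 3)
b = torch.ones(3, 4)

prod = a @ b                 # inner sizes match (3 and 3): result is (2, 4)
print(prod.shape)
print(prod[0, 0].item())     # 3.0: a sum of three 1 * 1 products

mismatch_raised = False
try:
    b @ a                    # inner sizes 4 and 2 do not match
except RuntimeError:
    mismatch_raised = True
print(mismatch_raised)

# `*` is element-wise and needs equal (or broadcastable) shapes instead
elementwise = a * a
print(elementwise.shape)
```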

See the [PyTorch Math Operations Documentation](https://pytorch.org/docs/stable/torch.html#math-operations) for more!

## AutoGrad

In [None]:
x = torch.tensor([[2,3]], requires_grad=True, dtype=torch.float32)
z = 3 * x
print(z)

In the snippet below, you can see the gradient computations at work. We create a tensor, multiply it by 3 and add 1. Then we create a scalar output using `sum()`, because a scalar is needed as the loss variable. Calling `backward()` on the loss computes its rate of change with respect to the inputs. Since the scalar was created with `sum()`, each element of x affects the loss only through the matching element of z.

The rate of change of the output with respect to x is just the constant 3 that we multiplied x by.

In [None]:
x = torch.tensor([[2,3]], requires_grad=True, dtype=torch.float32)
print("x: \n", x)
print("---")
z = 3 * x + 1
print("z = 3*x + 1: \n", z) # derivative of 3 * x + 1 is 3
print("---")

loss = z.sum()
print("loss = z.sum(): \n", loss)
print("---")

loss.backward()

print("after loss.backward(), x.grad: \n", x.grad)
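
For a non-linear function the gradient depends on the input. The sketch below uses the loss `sum(x**2)`, whose gradient is `2 * x`, and also shows that repeated `backward()` calls accumulate into `x.grad` until it is reset. This example is an addition for illustration, not part of the original notebook.

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)

loss = (x ** 2).sum()
loss.backward()
first = x.grad.clone()
print(first)                 # 2 * x = [4., 6.]

# A second backward pass adds to the stored gradient instead of replacing it
loss = (x ** 2).sum()
loss.backward()
second = x.grad.clone()
print(second)                # [8., 12.]

x.grad.zero_()               # reset, as optimizer.zero_grad() does for parameters
print(x.grad)
```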


### CUDA Tensors

PyTorch's operations can be used seamlessly on the GPU or on the CPU. There are a few basic operations for moving tensors between the two.

In [None]:
print(torch.cuda.is_available())

In [None]:
x = torch.rand(3,3)
describe(x)

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

In [None]:
x = torch.rand(3, 3).to(device)
describe(x)
print(x.device)

In [None]:
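y = torch.rand(3, 3) # created on the CPU
x + y # fails with a RuntimeError when x is on the GPU (tensors on different devices)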

In [None]:
cpu_device = torch.device("cpu")
y = y.to(cpu_device)
x = x.to(cpu_device)
x + y

In [None]:
if torch.cuda.is_available(): # only if GPU is available
    a = torch.rand(3,3).to(device='cuda:0') #  CUDA Tensor
    print(a)
    
    b = torch.rand(3,3).cuda()
    print(b)

    print(a + b)

    a = a.cpu() # move a back to the CPU
    print(a + b) # error expected: a is on the CPU, b is on the GPU

Important: tensors can be converted to NumPy arrays only when they are on the CPU!

In [None]:
x.numpy() # this fails, when x is on cuda

In [None]:
x.cpu().numpy()
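
Tensors that track gradients also need `detach()` before conversion. A device-independent pattern, sketched here with an illustrative tensor named `weights`, is `detach().cpu().numpy()`; it runs the same way with or without a GPU.

```python
import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
weights = torch.rand(3, 3, device=device, requires_grad=True)

numpy_failed = False
try:
    weights.numpy()          # refuses: the tensor requires grad (and may be on the GPU)
except (RuntimeError, TypeError):
    numpy_failed = True
print(numpy_failed)

arr = weights.detach().cpu().numpy()   # drop gradient tracking, move to CPU, convert
print(type(arr), arr.shape)
```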

# Quick Start
https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html

In [None]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [None]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

In [None]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

# Have a look at a data sample
for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

In [None]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten() # flatten the 2D input from (1,28,28) to (784)
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512), # this is basically: torch.mm(x, W) + b , W=(784,512), b=(1,512)
            nn.ReLU(),             # this is an activation function
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)     # there are 10 output units (classes)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits # this is the "raw" output (no softmax yet)

model = NeuralNetwork().to(device)
print(model)
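
A quick way to sanity-check such a model is to push a dummy batch through it and inspect the output shape. The sketch below uses an `nn.Sequential` stand-in with the same layer sizes as the network above (the name `stand_in` and the random input are illustrative) and counts its parameters.

```python
import torch
from torch import nn

# A stand-in with the same layer sizes as the network above
stand_in = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

dummy_batch = torch.rand(64, 1, 28, 28)   # fake images: [N, C, H, W]
logits = stand_in(dummy_batch)
print(logits.shape)                        # one row of 10 logits per image

# weights plus biases of the three linear layers
n_params = sum(p.numel() for p in stand_in.parameters())
print(n_params)
```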

In [None]:
loss_fn = nn.CrossEntropyLoss() # the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3) # add the model parameters
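
`nn.CrossEntropyLoss` expects raw logits because it applies log-softmax internally. The sketch below checks, on random illustrative data, that it matches log-softmax followed by the negative log-likelihood loss, and also the same quantity computed by hand.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)              # 4 samples, 10 classes
targets = torch.tensor([1, 0, 9, 3])     # the correct class per sample

ce = F.cross_entropy(logits, targets)

log_probs = F.log_softmax(logits, dim=1)
via_nll = F.nll_loss(log_probs, targets)

# By hand: mean of the negative log-probability of the correct class
by_hand = -log_probs[torch.arange(4), targets].mean()

print(ce.item(), via_nll.item(), by_hand.item())
```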

In [None]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device) # to gpu if possible (data always comes from cpu)

        # Compute prediction error
        pred = model(X) # this calls NeuralNetwork.forward(X)
        loss = loss_fn(pred, y) # loss function applies softmax internally

        # Backpropagation
        optimizer.zero_grad() # erase gradients
        loss.backward()       # compute gradients
        optimizer.step()      # update parameters

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
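
To see what `optimizer.step()` does, the following sketch compares it against the plain SGD update rule `p = p - lr * grad`. It assumes SGD without momentum or weight decay, and uses a tiny linear model with random data purely for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
lr = 0.1
tiny_model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(tiny_model.parameters(), lr=lr)

X = torch.randn(8, 3)
y = torch.randn(8, 1)

loss = nn.functional.mse_loss(tiny_model(X), y)
optimizer.zero_grad()   # erase old gradients
loss.backward()         # compute new gradients

# What plain SGD should produce for each parameter: p - lr * grad
expected = [p.detach() - lr * p.grad for p in tiny_model.parameters()]

optimizer.step()        # update the parameters in place

matches = [torch.allclose(p.detach(), e)
           for p, e in zip(tiny_model.parameters(), expected)]
print(matches)
```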

In [None]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()                  # evaluation mode: changes layers such as dropout and batch norm (does not disable gradients)
    test_loss, correct = 0, 0
    # no gradients to be computed during test (faster inference, less memory, no training)
    with torch.no_grad(): 
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            # argmax over the logits (same as on softmax)
            correct += (pred.argmax(1) == y).type(torch.float).sum().item() 
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
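
`model.eval()` and `torch.no_grad()` do different jobs, which is why the test function uses both. The sketch below uses a small illustrative model with a dropout layer: `eval()` makes the forward pass deterministic, while only `no_grad()` stops gradient tracking.

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 100), nn.Dropout(p=0.5))
x = torch.randn(8, 4)

net.train()
out_train = net(x)
has_dropped = bool((out_train == 0).any())     # dropout zeroes some activations

net.eval()
out_eval = net(x)
deterministic = torch.equal(out_eval, net(x))  # dropout is disabled in eval mode
eval_tracks_grad = out_eval.requires_grad      # gradients are still tracked

with torch.no_grad():
    out_ng = net(x)
no_grad_tracks = out_ng.requires_grad          # no graph is built here

print(has_dropped, deterministic, eval_tracks_grad, no_grad_tracks)
```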

In [None]:
epochs = 3
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

### Exercises

Some of these exercises require operations not covered in the notebook.  You will have to look at [the documentation](https://pytorch.org/docs/) (on purpose!)


(Answers are at the bottom)

#### Exercise 1

Create a 2D tensor and then insert a dimension of size 1 at the 0th axis.

#### Exercise 2

Remove the extra dimension you just added to the previous tensor.

#### Exercise 3

Create a random tensor of shape 5x3 in the interval [3, 7)

#### Exercise 4

Create a tensor with values from a normal distribution (mean=0, std=1).

#### Exercise 5

Retrieve the indexes of all the non zero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).

#### Exercise 6

Create a random tensor of size (3,1) and then horizontally stack 4 copies together.

#### Exercise 7

Return the batch matrix-matrix product of two 3-dimensional tensors (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).

#### Exercise 8

Return the batch matrix-matrix product of a 3D tensor and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).

### END