The following is from the PyTorch tutorials:
https://pytorch.org/tutorials/beginner/basics/data_tutorial.html

# Summary  
prepare the data  
build a model  
train and predict

# prepare the data
Define a Dataset subclass and implement __len__ and __getitem__  
Pass the Dataset to a DataLoader  

# build a model
Define a class inherited from nn.Module  
layers in __init__  
operations in forward  

# train and predict
Define the loss function and the optimizer to use  
loss_fn = nn.CrossEntropyLoss() : combines nn.LogSoftmax and nn.NLLLoss, so it takes raw logits  
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  
Compute the prediction and the loss  
optimizer.zero_grad() : reset gradients, since by default they add up on every backward call  
loss.backward() : backpropagate, storing each parameter's gradient in its .grad  
optimizer.step() : adjust the parameters using the gradients collected in the backward pass  
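A minimal sketch of these steps on random synthetic data (the toy `nn.Linear(4, 3)` model, the batch, and the learning rate are assumptions for illustration). It also checks numerically that `nn.CrossEntropyLoss` on logits equals `nn.NLLLoss` on `nn.LogSoftmax` output, and shows why `optimizer.zero_grad()` is needed: two `backward()` calls without it double the stored gradient.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy setup: 8 samples, 4 features, 3 classes
model = nn.Linear(4, 3)
X = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# CrossEntropyLoss on raw logits == NLLLoss on log-softmax output
logits = model(X)
ce = loss_fn(logits, y)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), y)

# Gradients accumulate: two backward passes without zero_grad double .grad
optimizer.zero_grad()
loss_fn(model(X), y).backward()
grad_once = model.weight.grad.clone()
loss_fn(model(X), y).backward()
grad_twice = model.weight.grad.clone()

# One proper training step: reset, backpropagate, update
weight_before = model.weight.detach().clone()
optimizer.zero_grad()
loss = loss_fn(model(X), y)
loss.backward()
optimizer.step()
```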

In [None]:
torch.Tensor
.shape
.size() : same as .shape
.squeeze(): remove dimensions of size 1, e.g., (1, 28, 28) -> (28, 28)
.dtype
.device -> cpu or cuda
.to('cuda') if torch.cuda.is_available()
.matmul(tensor.T) : matrix multiplication
.mul(tensor) : element-wise multiplication
.sum()
.item() : convert a one-element tensor to a Python numerical value
.add_(5) : in-place operation (trailing _), discouraged when computing derivatives because of the loss of history
.numpy() : NumPy array that shares underlying memory with a CPU tensor
.argmax(dim) : index of the maximum along dim
.type(torch.float) : cast to another dtype
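A short sketch exercising the attributes and methods listed above on a small hand-written tensor (the values are arbitrary). The last part shows that `.numpy()` shares memory with a CPU tensor, so an in-place `add_` is visible through the NumPy array.

```python
import torch

t = torch.tensor([[1., 2.], [3., 4.]])
print(t.shape, t.size(), t.dtype, t.device)   # .shape and .size() are equivalent

# Move to the GPU only if one is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
t_dev = t.to(device)

mm = t.matmul(t.T)             # matrix product
ew = t.mul(t)                  # element-wise product
total = t.sum().item()         # one-element tensor -> Python float
col_argmax = t.argmax(dim=0)   # index of the max along dim 0 (one per column)

squeezed = torch.zeros(1, 28, 28).squeeze()   # drops size-1 dims -> (28, 28)

# .numpy() shares memory with a CPU tensor: in-place changes show up in both
c = torch.ones(3)
n = c.numpy()
c.add_(5)                      # in-place add; n now holds 6s as well

as_long = t.type(torch.long)   # cast to int64
```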

How to create tensors
torch.tensor(list)
torch.from_numpy(np_array)
torch.ones_like(tensor) : retains shape, datatype of the argument tensor
torch.rand_like(tensor) : same shape, random values in [0, 1); requires a floating-point dtype
torch.rand(shape)
torch.ones(shape)
torch.zeros(shape)
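The creation functions above, sketched on small example values. Note that `torch.rand_like` on an integer tensor needs a float `dtype` override, and that `torch.from_numpy` shares memory with the source array.

```python
import numpy as np
import torch

from_list = torch.tensor([[1, 2], [3, 4]])   # dtype inferred: int64
np_array = np.array([[1, 2], [3, 4]])
from_np = torch.from_numpy(np_array)         # shares memory with np_array

ones = torch.ones_like(from_list)            # same shape and dtype as from_list
rand = torch.rand_like(from_list, dtype=torch.float)   # override: rand needs a float dtype

shape = (2, 3)
r, o, z = torch.rand(shape), torch.ones(shape), torch.zeros(shape)
```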

Operations on Tensors
NumPy-style indexing and slicing
torch.cat([tensor, tensor], dim=1) : join along an existing dimension; for 2-D tensors, dim=1 joins horizontally (columns) and dim=0 vertically (rows)
torch.stack : concatenate along a new dimension, e.g., two (28, 28) tensors -> (2, 28, 28)
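A quick shape check of `torch.cat` versus `torch.stack`, plus a NumPy-style slice, using two (28, 28) tensors as in the note.

```python
import torch

a = torch.zeros(28, 28)
b = torch.ones(28, 28)

vertical = torch.cat([a, b], dim=0)     # (56, 28): rows appended
horizontal = torch.cat([a, b], dim=1)   # (28, 56): columns appended
stacked = torch.stack([a, b])           # (2, 28, 28): new leading dimension

sub = b[:, 1:3]   # NumPy-style slicing: all rows, columns 1 and 2 -> (28, 2)
```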

In [None]:
# sample codes for preparing a data
class ourDataset(Dataset): implement def __len__ and def __getitem__
training_data = ourDataset(...)
X_tensor = torch.from_numpy(X_train)
y_tensor = torch.from_numpy(y_train).type(torch.long)
train_dataset = TensorDataset(X_tensor, y_tensor)
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
train_dataloader : iterable
enumerate(train_dataloader), iter(train_dataloader) -> iterator
for X, y in train_dataloader:
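A runnable version of the sample above, assuming random NumPy arrays as stand-ins for `X_train` and `y_train` (100 samples, 8 features, 3 classes, all hypothetical). `ourDataset` here is a minimal illustrative implementation; `TensorDataset` is the built-in shortcut for in-memory tensors.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset

# Hypothetical stand-ins for X_train / y_train: 100 samples, 8 features, 3 classes
X_train = np.random.rand(100, 8).astype(np.float32)
y_train = np.random.randint(0, 3, size=100)

class ourDataset(Dataset):
    """Minimal custom Dataset: only __len__ and __getitem__ are required."""
    def __init__(self, X, y):
        self.X = torch.from_numpy(X)
        self.y = torch.from_numpy(y).type(torch.long)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

training_data = ourDataset(X_train, y_train)
# Equivalent shortcut for in-memory tensors
train_dataset = TensorDataset(torch.from_numpy(X_train),
                              torch.from_numpy(y_train).type(torch.long))

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)

batch_shapes = []
for X, y in train_dataloader:   # the DataLoader is an iterable
    batch_shapes.append((tuple(X.shape), tuple(y.shape)))
```

With 100 samples and `batch_size=64`, the last batch holds the remaining 36 samples (pass `drop_last=True` to discard it).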

In [None]:
# sample codes for building a model
class ourNN(nn.Module):
def __init__(self):
nn.Flatten()
nn.Sequential(*layers)
nn.Linear(in_features, out_features)
nn.ReLU()
nn.Softmax(dim)

model = ourNN()
for name, param in model.named_parameters():
    print(name, param.size(), param[:2])
model.parameters()
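A minimal sketch putting the listed pieces together; the layer sizes (784 -> 64 -> 10) are arbitrary assumptions. `named_parameters()` yields names derived from the attribute and `nn.Sequential` index, and Softmax with `dim=1` turns each row of logits into probabilities.

```python
import torch
from torch import nn

class ourNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        layers = [nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10)]
        self.stack = nn.Sequential(*layers)   # unpack a list of layers

    def forward(self, x):
        return self.stack(self.flatten(x))

model = ourNN()
param_shapes = {name: tuple(param.size()) for name, param in model.named_parameters()}
for name, param in model.named_parameters():
    print(name, param.size(), param[:2])

n_params = sum(p.numel() for p in model.parameters())
logits = model(torch.rand(3, 1, 28, 28))   # batch of 3 random "images"
probs = nn.Softmax(dim=1)(logits)
```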

In [None]:
torch.tensor(..., requires_grad=True)
x.requires_grad_(True)  # inplace

loss.grad_fn -> <BinaryCrossEntropyWithLogitsBackward object at 0x7f3b202b48d0>, an object of class Function

loss.backward()
w.grad, b.grad

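with torch.no_grad(): z = torch.matmul(x, w) + b  ->  z.requires_grad is False; alternatively, z.detach() returns a tensor that does not track gradients
Use these when backpropagation is not needed, e.g., in test loops or when freezing parameters.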
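A small sketch of gradient tracking on a hand-picked tensor (the function `sum(a**2)` is an arbitrary example): `backward()` fills `.grad` with 2a, while `torch.no_grad()` and `.detach()` produce results that do not require gradients. The exact `grad_fn` class name printed may vary between PyTorch versions.

```python
import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)   # leaf tensor tracked by autograd
out = (a ** 2).sum()
print(out.grad_fn)   # a Function object, e.g. <SumBackward0 ...>
out.backward()       # d(out)/da = 2a, stored in a.grad

with torch.no_grad():
    out_eval = (a ** 2).sum()             # computed without building a graph

out_detached = (a ** 2).sum().detach()    # same value, cut off from the graph

x = torch.ones(2)
x.requires_grad_(True)   # in-place switch for an existing tensor
```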

TIP: finding the flattened input size for a Linear layer
print(x.size()) in forward
X = torch.rand(1, c, h, w); model(X) : make a random sample, pass it through the model, and read the printed sizes
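A sketch of the same trick without editing `forward`: pass one random sample through the convolutional part and read the flattened size. The conv stack (one 3x3 conv with 8 channels, then 2x2 max pooling) and the 1x28x28 input are assumptions for illustration.

```python
import torch
from torch import nn

# Hypothetical conv feature extractor for 1x28x28 inputs
features = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),   # -> (8, 26, 26)
    nn.ReLU(),
    nn.MaxPool2d(2),                  # -> (8, 13, 13)
    nn.Flatten(),
)

c, h, w = 1, 28, 28
with torch.no_grad():
    X = torch.rand(1, c, h, w)        # random sample with a batch dimension of 1
    n_flat = features(X).shape[1]     # flattened size for the next Linear layer

model = nn.Sequential(features, nn.Linear(n_flat, 10))
```

Recent PyTorch versions also provide `nn.LazyLinear`, which infers `in_features` on the first forward pass.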

Tips and Problems

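For autoencoders (where the inputs also serve as targets), pass the tensor directly to the DataLoader so that each batch comes back as a tensor.  
Wrapping it in TensorDataset instead makes the DataLoader return each batch as a list containing one tensor  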
https://discuss.pytorch.org/t/dataloader-returns-the-batch-as-a-list/59902/2  
"I would say this is expected as getitem returns tuples"
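A quick demonstration of the difference, using a random (10, 4) tensor as hypothetical autoencoder input. `TensorDataset.__getitem__` returns a tuple, and the default collate function turns a batch of tuples into a list, hence the one-element list.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.rand(10, 4)   # hypothetical autoencoder inputs (also the targets)

direct = next(iter(DataLoader(X, batch_size=5)))
wrapped = next(iter(DataLoader(TensorDataset(X), batch_size=5)))

print(type(direct), type(wrapped))   # Tensor vs. list of one tensor
(batch,) = wrapped                   # unpack the single element
```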

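A Keras optimizer created without lr seems to adjust the learning rate automatically; in fact it falls back to a default learning rate, and adaptive optimizers such as Adam rescale per-parameter steps in both frameworks.  
In PyTorch, a learning-rate schedule must be applied explicitly with a scheduler from torch.optim.lr_scheduler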
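A minimal sketch of an explicit schedule with `StepLR`, which multiplies the learning rate by `gamma` every `step_size` epochs. The model, the initial learning rate, and the schedule values are assumptions; a real loop would run `train_loop` where the comment indicates.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)   # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Multiply the learning rate by gamma every step_size epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

lrs = []
for epoch in range(4):
    # ... train_loop(...) would go here; optimizer.step() runs inside it
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()   # once per epoch, after optimizer.step()
```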

In [29]:
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt

import os
import pandas as pd  # used by CustomImageDataset to read the annotations CSV
from torchvision.io import read_image

# Prepare the data

For training, we need the features as normalized tensors, and the labels as one-hot encoded tensors. To make these transformations, we use ToTensor and Lambda.

ToTensor()
ToTensor converts a PIL image or NumPy ndarray into a FloatTensor and scales the image’s pixel intensity values into the range [0., 1.]

Tips and Issues  
There is no need to set requires_grad=True on input tensors; gradients are needed only for the parameters being optimized.

In [24]:
training_data = datasets.FashionMNIST(
    root='data',
    train=True,
    download=True,
    transform=ToTensor()
)
# ToTensor() transforms np.array or PIL img (H, W, C) to torch tensor (C, H, W)
training_data[0][0]
# __getitem__ is implemented, so a sample can be accessed by index; [0][0] is the image tensor of the first sample

In [31]:
class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform
    
    def __len__(self):
        return len(self.img_labels)
    
    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        sample = {'image': image, 'label': label}
        return sample

While training a model, we typically want to pass samples in “minibatches”, reshuffle the data at every epoch to reduce model overfitting, and use Python’s multiprocessing to speed up data retrieval.
DataLoader is an iterable that abstracts this complexity for us in an easy API.
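Because `CustomImageDataset` returns a dict, the DataLoader's default collation batches each key separately. The sketch below shows this with a hypothetical in-memory `ToyDictDataset` (random 1x8x8 "images", integer labels), so it runs without an annotations file or image directory.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDictDataset(Dataset):
    """Hypothetical stand-in for CustomImageDataset: random 1x8x8 'images', int labels."""
    def __init__(self, n=10):
        self.images = torch.rand(n, 1, 8, 8)
        self.labels = list(range(n))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return {'image': self.images[idx], 'label': self.labels[idx]}

loader = DataLoader(ToyDictDataset(), batch_size=4, shuffle=True)

batch = next(iter(loader))   # a dict of batched tensors
n_batches = len(loader)      # ceil(10 / 4) = 3
```

Multiprocess loading is enabled with the `num_workers` argument of `DataLoader`.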

In [32]:
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)

In [37]:
train_features, train_labels = next(iter(train_dataloader))  # iter(iterable) -> iterator

In [47]:
type(train_features)

torch.Tensor

In [44]:
train_features.shape, train_labels.shape

(torch.Size([64, 1, 28, 28]), torch.Size([64]))

In [46]:
train_features[0].squeeze().shape

torch.Size([28, 28])

# Build a model

In [50]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

cpu


Define a neural network by subclassing nn.Module and initialize the layers in __init__.  
Every nn.Module subclass implements the operations on input data in the forward method.

In [51]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
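            nn.Linear(512, 10),
            nn.ReLU()  # NOTE: ReLU on the output clamps negative logits to 0; later versions of the tutorial end with nn.Linear(512, 10)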
        )
        
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [52]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)


In [53]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)

In [55]:
logits.shape

torch.Size([1, 10])

In [60]:
pred_probab = nn.Softmax(dim=1)(logits)  # dim=1: softmax over the class dimension, so each row sums to 1

In [63]:
y_pred = pred_probab.argmax(1)

tensor(0)
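A check on hand-picked logits (hypothetical values for 2 samples and 3 classes): each softmax row sums to 1, and `argmax(1)` returns one class index per sample, so for logits of shape (1, 10) the result has shape (1,). Softmax is monotonic, so the argmax of the probabilities matches the argmax of the logits.

```python
import torch
from torch import nn

logits = torch.tensor([[1.0, 3.0, 2.0],
                       [0.5, 0.1, 0.2]])   # hypothetical logits: 2 samples, 3 classes
pred_probab = nn.Softmax(dim=1)(logits)    # each row sums to 1
y_pred = pred_probab.argmax(1)             # one class index per sample
```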

# Automatic Differentiation

In backpropagation, parameters are adjusted according to the gradient of the loss function with respect to each parameter.  
Consider a one-layer network with input x, parameters w and b, and a loss function.  
To optimize w and b, we need the derivatives of the loss with respect to them, namely ∂loss/∂w and ∂loss/∂b, for fixed values of the input x and the target y.
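Worked out for the tutorial's setting z = xw + b, assuming the loss is binary cross-entropy with logits averaged over the n outputs (as with `binary_cross_entropy_with_logits` and its default mean reduction), the chain rule gives:

```latex
z_j = \sum_i x_i w_{ij} + b_j, \qquad
\frac{\partial \mathcal{L}}{\partial w_{ij}} = x_i \,\frac{\partial \mathcal{L}}{\partial z_j}, \qquad
\frac{\partial \mathcal{L}}{\partial b_j} = \frac{\partial \mathcal{L}}{\partial z_j}

\mathcal{L} = \frac{1}{n}\sum_{j=1}^{n}\Big[-y_j \log \sigma(z_j) - (1 - y_j)\log\big(1 - \sigma(z_j)\big)\Big]
\;\Longrightarrow\;
\frac{\partial \mathcal{L}}{\partial z_j} = \frac{\sigma(z_j) - y_j}{n}
```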

A function applied to tensors to construct the computational graph is an object of class Function.  
A reference to its backward propagation function is stored in the resulting tensor's .grad_fn property.

autograd keeps a record of tensors and all executed operations in a directed acyclic graph (DAG) consisting of Function objects.  
In a forward pass, autograd runs the requested operation and records the operation's gradient function in the DAG.  
In the backward pass (started by calling .backward() on the DAG root), it computes the gradients from each .grad_fn,  
accumulates them in the respective tensor's .grad attribute,  
and, using the chain rule, propagates all the way to the leaf tensors.
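To see that `loss.backward()` applies the chain rule described above, this sketch compares `w.grad` and `b.grad` with gradients computed by hand for z = xw + b under binary cross-entropy with logits (mean over the 3 outputs). The tensor sizes follow the tutorial's example; the random seed is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(5)                          # input
y = torch.zeros(3)                         # target
w = torch.randn(5, 3, requires_grad=True)  # leaf parameters
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = F.binary_cross_entropy_with_logits(z, y)   # mean over the 3 outputs
loss.backward()   # walks the DAG from loss.grad_fn back to the leaves w and b

# Manual chain rule: dL/dz_j = (sigmoid(z_j) - y_j) / n
with torch.no_grad():
    dz = (torch.sigmoid(z) - y) / z.numel()
    manual_w = torch.outer(x, dz)   # dL/dw_ij = x_i * dL/dz_j
    manual_b = dz                   # dL/db_j = dL/dz_j
```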

# Optimizing model parameters

In [70]:
learning_rate = 1e-3
batch_size = 64
epochs = 5

In [71]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [72]:
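def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()  # training mode (matters for layers such as dropout or batch norm)
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)  # keep the data on the same device as the model
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)
        
        # Backpropagation
        optimizer.zero_grad()  # gradients add up by default, so reset them first
        loss.backward()
        optimizer.step()
        
        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")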
            
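def test_loop(dataloader, model, loss_fn, optimizer):  # optimizer is unused here
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0
    
    model.eval()  # evaluation mode (matters for layers such as dropout or batch norm)
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()  # loss_fn returns the mean loss of the batch
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()  # num correct in a batch
            
    test_loss /= num_batches  # average over batches, since each term is already a batch mean
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")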

In [74]:
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn, optimizer)
print("Done!")

Epoch 1
-------------------
loss: 2.306182  [    0/60000]
loss: 2.301007  [ 6400/60000]
loss: 2.295176  [12800/60000]
loss: 2.294940  [19200/60000]
loss: 2.276731  [25600/60000]
loss: 2.272218  [32000/60000]
loss: 2.276416  [38400/60000]
loss: 2.273481  [44800/60000]
loss: 2.257003  [51200/60000]
loss: 2.247311  [57600/60000]


NameError: name 'test_dataloader' is not defined