This is based on the PyTorch Blitz tutorial. I have also compiled a list of useful operations that I read about along the way.

In [1]:
from __future__ import print_function
import torch

# Basic Operations

In [3]:
x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float)  # zeros, empty, ones, randn, randn_like etc. also create tensors
x[:, 1] = 0  # Set the second column (index 1) of every row to 0
arr = torch.arange(1, 11)  # Like Python's range(1, 11)

temp_x = x.numpy() # Both tensor and array will share the same memory
x = torch.from_numpy(temp_x)
print(x)

x = torch.add(x, torch.ones(3, 3))  # Can also use x+y or y.add_(x). Can use out=result
y = x.view(9)                    # Reshape x into a 1-D tensor of size 9
z = x.view(-1, 3)                # Size 3x3. The first dimension (-1) is inferred from the second.
print(x, y, z)

torch.is_tensor(x)

t = torch.ones(2,1,2)
sq = torch.squeeze(t, 1)  # Squeeze dimension 1: New size 2x2
nz = torch.nonzero(t)  # Index of non-zero elements only
print(sq, nz)

_ = torch.take(t, torch.LongTensor([1, 2]))  # Flatten a tensor and return element # 1, 2
_ = torch.transpose(t, 0, 1)  # Transpose dimension 0 and 1

r1 = torch.dot(torch.Tensor([4, 2]), torch.Tensor([3, 1])) # Dot product of 2 tensors. Result=14
r2 = torch.mv(torch.randn(2, 4), torch.randn(4))  # Matrix-vector multiplication
r3 = torch.mm(torch.randn(2, 3), torch.randn(3, 4))  # Matrix-matrix multiplication

tensor([[1., 0., 3.],
        [4., 0., 6.],
        [7., 0., 9.]])
tensor([[ 2.,  1.,  4.],
        [ 5.,  1.,  7.],
        [ 8.,  1., 10.]]) tensor([ 2.,  1.,  4.,  5.,  1.,  7.,  8.,  1., 10.]) tensor([[ 2.,  1.,  4.],
        [ 5.,  1.,  7.],
        [ 8.,  1., 10.]])
tensor([[1., 1.],
        [1., 1.]]) tensor([[0, 0, 0],
        [0, 0, 1],
        [1, 0, 0],
        [1, 0, 1]])
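
As a quick check of the memory sharing noted in the cell above, here is a minimal sketch. The names `a`, `b` and `c` are illustrative and not from the tutorial. It shows that an in-place change to a CPU tensor is visible through its NumPy array, and that `view` returns a tensor over the same storage rather than a copy.

```python
import torch

a = torch.ones(2, 3)
b = a.numpy()            # NumPy array backed by the same memory as `a` (CPU tensors only)
a.add_(1)                # in-place add; the trailing underscore means "modify in place"
shared_with_numpy = bool((b == 2).all())   # the array sees the update

c = a.view(-1)           # flat view with 6 elements over the same storage
c[0] = 10.0              # writing through the view also changes `a`
shared_with_view = a[0, 0].item() == 10.0

print(shared_with_numpy, shared_with_view)
```

If an independent copy is needed, calling `.clone()` on the tensor before modifying it avoids these side effects.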


The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run. Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor produced by an operation has a .grad_fn attribute that references the Function that created it.
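
A small sketch of the `.grad_fn` attribute described above, assuming nothing beyond `torch` itself. A tensor created directly by the user is a leaf and has no creating Function, while the result of an operation does.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)   # leaf tensor created by the user
y = x + 2                                   # result of an operation on x

print(x.is_leaf, x.grad_fn)   # leaf tensors have grad_fn set to None
print(y.is_leaf, y.grad_fn)   # y references the backward node for the addition
print(y.requires_grad)        # tracking propagates to results
```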


In [4]:
x = torch.ones(2, 2, requires_grad=True)  # Starts to track all operations on it
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()  # Equivalent to out.backward(torch.tensor(1.)). It calculates all the gradients automatically.
print(x.grad)
x = x.detach()  # Stop tracking history. Alternatively, wrap code in a `with torch.no_grad():` block

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


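You should have got a matrix of 4.5. Let’s call the out Tensor “o”. We have that $o = \frac{1}{4}\sum_i z_i$, $z_i = 3(x_i + 2)^2$ and $z_i\big|_{x_i=1} = 27$. Therefore, $\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i + 2)$, hence $\frac{\partial o}{\partial x_i}\big|_{x_i=1} = \frac{9}{2} = 4.5$.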
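
The derivation can be checked numerically. The sketch below rebuilds the same computation and compares the autograd result with the closed form (3/2)(x + 2); the name `analytic` is introduced here only for illustration.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()       # same computation as z.mean() above
out.backward()

# Closed-form gradient from the derivation: d(out)/dx_i = (3/2) * (x_i + 2)
analytic = 1.5 * (x.detach() + 2)
print(torch.allclose(x.grad, analytic))
```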

In [5]:
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)
print(x.grad)  # y is not a scalar, so backward() needs a vector; x.grad = gradients * dy/dx, where dy/dx = 1024 in this run

tensor([ 102.4000, 1024.0000,    0.1024])
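
Because the loop above depends on random values, the scaling factor can differ between runs. A deterministic sketch of the same idea, with a fixed factor of 4 chosen purely for illustration, shows that `backward(v)` on a non-scalar output computes a vector-Jacobian product.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 4                                  # Jacobian dy/dx is 4 * identity
v = torch.tensor([0.1, 1.0, 0.0001])       # vector passed to backward()

y.backward(v)                              # computes v^T * J
print(torch.allclose(x.grad, 4 * v))
```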


An nn.Module contains layers, and a method forward(input) that returns the output. The learnable parameters of a model are returned by net.parameters().
torch.nn layers expect inputs that are a mini-batch of samples rather than a single sample. For example, nn.Conv2d takes a 4D Tensor of nSamples x nChannels x Height x Width. If you have a single sample, use input.unsqueeze(0) to add a fake batch dimension.
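
A minimal sketch of adding the batch dimension before a convolution. The layer sizes (1 input channel, 6 output channels, 5x5 kernel) and the 32x32 input are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)  # illustrative sizes

single = torch.randn(1, 32, 32)    # one sample: nChannels x Height x Width
batch = single.unsqueeze(0)        # add a batch dimension: 1 x 1 x 32 x 32
out = conv(batch)                  # 5x5 kernel without padding shrinks 32 to 28

print(batch.shape, out.shape)
```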

Recap:
* torch.Tensor - A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.
* nn.Module - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
* nn.Parameter - A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a Module.
* autograd.Function - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to the functions that created a Tensor and encodes its history.
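
To illustrate the nn.Parameter point from the recap, here is a sketch with a hypothetical module named `Scale` (not part of PyTorch or the tutorial). Only the attribute wrapped in `nn.Parameter` shows up in `named_parameters()`.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3))  # registered automatically
        self.offset = torch.zeros(3)               # plain tensor attribute: not registered

    def forward(self, input):
        return input * self.weight + self.offset

net = Scale()
names = [name for name, _ in net.named_parameters()]
output = net(torch.ones(3))
print(names)
```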

The simplest update rule used in practice is the Stochastic Gradient Descent (SGD):
weight = weight - learning_rate * gradient
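
The rule can be applied by hand on a toy problem. In the sketch below the loss, the starting weights and the learning rate are made up for illustration; the update runs under `torch.no_grad()` so that it is not recorded in the autograd graph.

```python
import torch

weight = torch.tensor([1.0, -2.0], requires_grad=True)
learning_rate = 0.1

loss = (weight ** 2).sum()     # toy loss; its gradient is 2 * weight
loss.backward()

with torch.no_grad():          # the update itself should not be tracked
    weight -= learning_rate * weight.grad
weight.grad.zero_()            # clear the gradient before the next step

print(weight)
```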

However, as you use neural networks, you will want to try different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. PyTorch provides a small package for this: torch.optim

In [6]:
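import torch
import torch.nn as nn
import torch.optim as optim

# Stand-ins so the cell runs on its own (illustrative, not from the tutorial)
net = nn.Linear(4, 2)          # tiny model
criterion = nn.MSELoss()       # loss function
input = torch.randn(8, 4)      # mini-batch of 8 samples
target = torch.randn(8, 2)

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update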

Generally, when you have to deal with image, text, audio or video data, you can use standard Python packages that load data into a NumPy array. Then you can convert this array into a torch.*Tensor.

* For images, packages such as Pillow, OpenCV are useful
* For audio, packages such as scipy and librosa
* For text, either raw Python or Cython based loading, or NLTK and SpaCy are useful
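
As a sketch of the array-to-tensor step, the example below uses a random `uint8` array as a stand-in for a decoded image (no image file or loading library is assumed). It converts the array to a tensor, moves channels first, and rescales to floats in [0, 1].

```python
import numpy as np
import torch

# Stand-in for an image loaded by Pillow or OpenCV: Height x Width x Channels, uint8
image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

tensor = torch.from_numpy(image)                   # shares memory with the array
tensor = tensor.permute(2, 0, 1).float() / 255.0   # Channels x Height x Width, values in [0, 1]

print(tensor.shape, tensor.dtype)
```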

Specifically for vision, there is a package called torchvision, which has dataset classes for common datasets such as ImageNet, CIFAR10, MNIST, etc. (torchvision.datasets) and image transforms (torchvision.transforms). These datasets are batched and shuffled with torch.utils.data.DataLoader.
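
To show how `torch.utils.data.DataLoader` batches a dataset without downloading anything, the sketch below uses `TensorDataset` with random data in place of a torchvision dataset. The sizes and batch size are arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.randn(10, 3)          # 10 samples with 3 features each
labels = torch.arange(10)              # one label per sample
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=False)
batch_sizes = [xb.shape[0] for xb, yb in loader]   # the last batch holds the remainder

print(batch_sizes)
```

A torchvision dataset can be passed to `DataLoader` in the same way, since the loader only needs an object that supports indexing and `len()`.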