In [1]:
from __future__ import division, print_function, unicode_literals
import torch
import torch.autograd as autograd
import torchvision
import torch.nn as nn
import numpy as np
import torch.utils.data as data
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

## Introduction to Torch's tensor library

All of deep learning is computation on tensors, which are generalizations of a matrix that can be indexed in more than 2 dimensions.
### Creating Tensors
Tensors can be created from Python lists with the torch.Tensor() function.

In [2]:
# Create a torch.Tensor object with the given data.  It is a 1D vector
V_data = [1., 2., 3.]
V = torch.Tensor(V_data)
print(V)

# Creates a matrix
M_data = [[1., 2., 3.], [4., 5., 6]]
M = torch.Tensor(M_data)
print(M)

# Create a 3D tensor of size 2x2x2.
T_data = [[[1.,2.], [3.,4.]],
          [[5.,6.], [7.,8.]]]
T = torch.Tensor(T_data)
print(T)


 1
 2
 3
[torch.FloatTensor of size 3]


 1  2  3
 4  5  6
[torch.FloatTensor of size 2x3]


(0 ,.,.) = 
  1  2
  3  4

(1 ,.,.) = 
  5  6
  7  8
[torch.FloatTensor of size 2x2x2]



What is a 3D tensor anyway? Think about it like this. If you have a vector, indexing into the vector gives you a scalar. If you have a matrix, indexing into the matrix gives you a vector. If you have a 3D tensor, then indexing into the tensor gives you a matrix!
Vectors and matrices are special cases of torch.Tensors, with dimension 1 and 2 respectively.

In [3]:
# Index into V and get a scalar
print(V[0])

# Index into M and get a vector
print(M[0])

# Index into T and get a matrix
print(T[0])

1.0

 1
 2
 3
[torch.FloatTensor of size 3]


 1  2
 3  4
[torch.FloatTensor of size 2x2]



You can also create tensors of other datatypes. The default, as you can see, is Float. To create a tensor of integer types, try torch.LongTensor(). Check the documentation for more data types, but Float and Long will be the most common.
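
As a minimal sketch of the datatypes mentioned above, the snippet below builds float and integer tensors and converts between them. It assumes PyTorch 0.4 or later, where `torch.tensor` and the `.dtype` attribute are available; the variable names are illustrative only.

```python
import torch

# torch.tensor infers the dtype from the data it is given.
float_t = torch.tensor([1.0, 2.0, 3.0])  # floating-point data -> torch.float32
long_t = torch.tensor([1, 2, 3])         # Python ints -> torch.int64 (a "Long" tensor)

# The typed constructor mentioned in the text produces the same integer type.
legacy_long_t = torch.LongTensor([1, 2, 3])

# Convert explicitly when an operation needs matching dtypes.
as_float = long_t.float()

print(float_t.dtype, long_t.dtype, legacy_long_t.dtype, as_float.dtype)
```
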
You can create a tensor with random data and the supplied dimensionality with torch.randn().

In [4]:
x = torch.randn((3, 4, 5))
print(x)


(0 ,.,.) = 
  0.0047  1.0618  1.0424 -0.1168 -2.3131
  1.4306  0.8319 -0.3556 -2.4006 -0.0879
  1.7537 -0.3260 -0.3313 -1.2495  1.1342
 -0.7303  0.8508 -2.0885  0.0979  0.0586

(1 ,.,.) = 
  0.4753 -0.9999 -0.2265 -0.7197  0.4517
 -0.1617  1.1514  0.8323 -0.4526  0.3752
  0.4124 -0.1802  2.1509  0.0572  1.5805
  0.1084 -0.2148 -0.4405 -1.6300  1.2038

(2 ,.,.) = 
 -0.0601  0.8835 -0.7062  0.7289  0.3507
  0.1325  1.4146  0.2929 -0.5886 -0.2819
  0.1537  0.9382  0.3309 -0.4866  0.3045
 -0.0201 -0.0195 -0.6008  2.2516  1.3379
[torch.FloatTensor of size 3x4x5]



### Operations with Tensors
You can operate on tensors in the ways you would expect.

In [5]:
x = torch.Tensor([ 1., 2., 3. ])
y = torch.Tensor([ 4., 5., 6. ])
z = x + y
print(z)


 5
 7
 9
[torch.FloatTensor of size 3]



See [the documentation](http://pytorch.org/docs/master/torch.html) for a complete list of the massive number of operations available to you. They expand beyond just mathematical operations.
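
To give a flavour of what is available beyond addition, here is a small sketch of three common operations. It assumes PyTorch 0.4 or later for `torch.tensor`; the values are made up for illustration.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])

elementwise = x * y    # element-by-element product
dot = torch.dot(x, y)  # inner product of two 1D tensors

M = torch.tensor([[1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0]])
mat_vec = torch.mv(M, x)  # matrix-vector product: (2x3) times (3) gives (2)

print(elementwise)
print(dot)
print(mat_vec)
```
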
One helpful operation is concatenation.

In [6]:
# By default, it concatenates along the first axis (concatenates rows)
x_1 = torch.randn(2, 5)
y_1 = torch.randn(3, 5)
z_1 = torch.cat([x_1, y_1])
print(z_1)

# Concatenate columns:
x_2 = torch.randn(2, 3)
y_2 = torch.randn(2, 5)
z_2 = torch.cat([x_2, y_2], 1) # second arg specifies which axis to concat along
print(z_2)

# If your tensors are not compatible, torch will complain.  Uncomment to see the error
# torch.cat([x_1, x_2])


 1.3471 -0.1466 -0.9906  0.5951 -1.5556
 2.5255  0.0325 -1.7167 -2.0417 -0.0240
-1.8169  0.4611  0.3595  0.1300  1.1515
 1.0112  0.0194 -0.6737  0.4334 -0.3552
 0.4381 -1.5101  0.6412 -0.3603 -0.2700
[torch.FloatTensor of size 5x5]


-0.3829  1.3229  0.1888  0.4019 -0.6048 -0.4197 -0.4910  2.1996
 0.4921  0.4320  0.4794 -0.9540 -1.9745 -0.2275  0.0149  2.0007
[torch.FloatTensor of size 2x8]



### Reshaping Tensors
Use the .view() method to reshape a tensor. This method receives heavy use, because many neural network components expect their inputs to have a certain shape. Often you will need to reshape before passing your data to the component.

In [7]:
x = torch.randn(2, 3, 4)
print(x)
print(x.view(2, 12)) # Reshape to 2 rows, 12 columns
print(x.view(2, -1)) # Same as above.  If one of the dimensions is -1, its size can be inferred


(0 ,.,.) = 
 -0.0182 -1.3130 -0.2984 -2.3117
 -0.3251  0.8293 -0.4353  0.9406
 -1.6709  0.7706  1.1908 -0.0078

(1 ,.,.) = 
 -2.0340 -0.1752  1.3080 -0.4059
  0.6679  0.6864  0.7874 -0.3695
  0.7440  0.7288  0.6193 -0.3907
[torch.FloatTensor of size 2x3x4]



Columns 0 to 9 
-0.0182 -1.3130 -0.2984 -2.3117 -0.3251  0.8293 -0.4353  0.9406 -1.6709  0.7706
-2.0340 -0.1752  1.3080 -0.4059  0.6679  0.6864  0.7874 -0.3695  0.7440  0.7288

Columns 10 to 11 
 1.1908 -0.0078
 0.6193 -0.3907
[torch.FloatTensor of size 2x12]



Columns 0 to 9 
-0.0182 -1.3130 -0.2984 -2.3117 -0.3251  0.8293 -0.4353  0.9406 -1.6709  0.7706
-2.0340 -0.1752  1.3080 -0.4059  0.6679  0.6864  0.7874 -0.3695  0.7440  0.7288

Columns 10 to 11 
 1.1908 -0.0078
 0.6193 -0.3907
[torch.FloatTensor of size 2x12]
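
One detail worth knowing is that `.view()` returns a new view onto the same underlying data rather than a copy. The sketch below, assuming PyTorch 0.4 or later, uses deterministic values so the effect is easy to see.

```python
import torch

x = torch.arange(24.0).view(2, 3, 4)  # values 0..23 arranged as 2x3x4
flat = x.view(2, -1)                  # -1 is inferred as 3 * 4 = 12

print(flat.shape)

# Writing through the view also changes the original tensor,
# because both share the same storage.
flat[0, 0] = 100.0
print(x[0, 0, 0])
```

If you need an independent copy, call `.clone()` on the result.
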



## Computation Graphs and Automatic Differentiation

**Concepts of backpropagation and automatic differentiation will be covered in detail later. Here, a very brief introduction is given for the sake of completeness.**

The concept of a computation graph is essential to efficient deep learning programming, because it spares you from writing the backpropagation gradients yourself. A computation graph is simply a specification of how your data is combined to give you the output. Since the graph fully specifies which parameters were involved in which operations, it contains enough information to compute derivatives. This probably sounds vague, so let's see what is going on using the fundamental class of PyTorch: ``autograd.Variable``.

First, think from a programmer's perspective. What is stored in the torch.Tensor objects we were creating above? Obviously the data and the shape, and maybe a few other things. But when we added two tensors together, we got an output tensor. All this output tensor knows is its data and shape. It has no idea that it was the sum of two other tensors (it could have been read in from a file, it could be the result of some other operation, etc.).
The Variable class keeps track of how it was created. Let's see it in action.

In [8]:
# Variables wrap tensor objects
x = autograd.Variable( torch.Tensor([1., 2., 3]), requires_grad=True )
# You can access the data with the .data attribute
print(x.data)

# You can also do all the same operations you did with tensors with Variables.
y = autograd.Variable( torch.Tensor([4., 5., 6]), requires_grad=True )
z = x + y
print(z.data)

# BUT z knows something extra.
print(z.grad_fn)


 1
 2
 3
[torch.FloatTensor of size 3]


 5
 7
 9
[torch.FloatTensor of size 3]

<torch.autograd.function.AddBackward object at 0x7f9b99900528>


So Variables know what created them. z knows that it wasn't read in from a file, and that it wasn't the result of a multiplication or exponential or whatever. And if you keep following the chain that starts at z.grad_fn, you will find yourself at x and y.
But how does that help us compute a gradient?

In [9]:
# Let's sum up all the entries in z
s = z.sum()
print(s)
print(s.grad_fn)

Variable containing:
 21
[torch.FloatTensor of size 1]

<torch.autograd.function.SumBackward object at 0x7f9b99900ed8>


So now, what is the derivative of this sum with respect to the first component of x?  In math, we want
    $$\frac{\partial s}{\partial x_0}$$
Well, s knows that it was created as a sum of the tensor z.  z knows that it was the sum x + y.
So
    $$s = \overbrace{x_0 + y_0}^\text{$z_0$} + \overbrace{x_1 + y_1}^\text{$z_1$} + \overbrace{x_2 + y_2}^\text{$z_2$}$$
And so s contains enough information to determine that the derivative we want is 1!

Of course this glosses over the challenge of how to actually compute that derivative.  The point here is that s is carrying along enough information that it is possible to compute it.  In reality, the developers of PyTorch program the sum() and + operations to know how to compute their gradients, and run the backpropagation algorithm.  An in-depth discussion of that algorithm is beyond the scope of this tutorial.

Let's have PyTorch compute the gradient and see that we were right. (Note that if you run this block multiple times, the gradient will increment. That is because PyTorch accumulates the gradient into the .grad property, since for many models this is very convenient.)

In [10]:
s.backward() # calling .backward() on any variable will run backprop, starting from it.
print(x.grad)

Variable containing:
 1
 1
 1
[torch.FloatTensor of size 3]
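
The accumulation behaviour can be checked directly. This is a minimal sketch that assumes PyTorch 0.4 or later, where a tensor created with `requires_grad=True` tracks gradients itself and the `Variable` wrapper is no longer needed; it rebuilds the same sum twice and then resets the stored gradient.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)

# First backward pass: d(sum(x + y)) / dx is 1 for every component.
(x + y).sum().backward()
first = x.grad.clone()

# Second backward pass: the new gradient is added to the stored one.
(x + y).sum().backward()
second = x.grad.clone()

# Reset the accumulated gradient in place before the next pass.
x.grad.zero_()

print(first, second, x.grad)
```

In a training loop this reset is usually done once per iteration, before computing the new loss.
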



In [11]:
x = torch.randn((2,2))
y = torch.randn((2,2))
z = x + y # These are Tensor types, and backprop would not be possible

var_x = autograd.Variable( x )
var_y = autograd.Variable( y )
var_z = var_x + var_y # var_z contains enough information to compute gradients, as we saw above
print(var_z.grad_fn)

var_z_data = var_z.data # Get the wrapped Tensor object out of var_z...
new_var_z = autograd.Variable( var_z_data ) # Re-wrap the tensor in a new variable

# ... does new_var_z have information to backprop to x and y?
# NO!
print(new_var_z.grad_fn)
# And how could it?  We yanked the tensor out of var_z (that is what var_z.data is).  This tensor
# doesn't know anything about how it was computed.  We pass it into new_var_z, and this is all the information
# new_var_z gets.  If var_z_data doesn't know how it was computed, there's no way new_var_z will.
# In essence, we have broken the variable away from its past history

<torch.autograd.function.AddBackward object at 0x7f9b98a68050>
None


Here is the basic, extremely important rule for computing with autograd.Variables (note this is more general than PyTorch. There is an equivalent object in every major deep learning toolkit):

**If you want the error from your loss function to backpropagate to a component of your network, you MUST NOT break the Variable chain from that component to your loss Variable. If you do, the loss will have no idea your component exists, and its parameters can't be updated.**
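
The rule can be illustrated with a short sketch. It assumes PyTorch 0.4 or later, where `.detach()` is the usual way to cut a tensor off from its history; the names `w`, `hidden` and `detached` are hypothetical.

```python
import torch

w = torch.tensor([2.0, 3.0], requires_grad=True)
hidden = w * 2

# detach() returns the same values with the history removed.
detached = hidden.detach()
print(detached.grad_fn)        # no record of how it was computed
print(detached.requires_grad)  # a loss built on it cannot reach w

# A loss built on the intact chain does reach w.
connected_loss = hidden.sum()
connected_loss.backward()
grad_connected = w.grad.clone()
print(grad_connected)
```
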

Check [this](http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html) tutorial for more information on automatic differentiation.

## Loading data from numpy

To load data from a numpy array, you need to convert it to a torch tensor with torch.from_numpy(). The .numpy() method converts in the other direction.

In [12]:
a = np.array([[1,2], [3,4]])
b = torch.from_numpy(a)      # convert numpy array to torch tensor
c = b.numpy()                # convert torch tensor to numpy array
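
Note that `torch.from_numpy` shares memory with the source array instead of copying it. The sketch below shows the difference from a copying constructor; it assumes PyTorch 0.4 or later for `torch.tensor`, and the variable names are illustrative.

```python
import numpy as np
import torch

a = np.array([[1, 2], [3, 4]])
shared = torch.from_numpy(a)  # shares memory with a

a[0, 0] = 99
shared_value = shared[0, 0].item()  # the tensor sees the change made to a

independent = torch.tensor(a)  # torch.tensor copies the data
a[0, 0] = -1

print(shared_value)
print(shared[0, 0].item(), independent[0, 0].item())
```
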

Different libraries have different internal representations for their data structures. Within PyTorch itself, tensors have different representations on the CPU and on the GPU. If you want to operate on a tensor on the GPU, you have to move it there with the .cuda() method. Tensors live on the CPU by default unless you specify otherwise.

In [13]:
b.cuda()


 1  2
 3  4
[torch.cuda.LongTensor of size 2x2 (GPU 0)]

Notice the change in the type of the tensor. If the line above raises an error, your PyTorch installation cannot use a GPU (either it was built without CUDA support or no compatible GPU is present). To check, you can use the following command: ``torch.cuda.is_available()``

In [14]:
torch.cuda.is_available()

True

In [15]:
b.cpu()


 1  2
 3  4
[torch.LongTensor of size 2x2]

Using .cuda() and .cpu(), you can move your tensors between GPU and CPU. Whether you need this depends on the network architecture you use and the hardware you run it on.
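
A common way to write code that runs with or without a GPU is to pick a device once and move tensors to it. This is a sketch assuming PyTorch 0.4 or later, where `torch.device` and the `.to()` method are available.

```python
import torch

# Fall back to the CPU when no usable GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

b = torch.tensor([[1, 2], [3, 4]])
b_dev = b.to(device)   # a no-op if b is already on that device
b_back = b_dev.cpu()   # bring the data back to the CPU

print(b_dev.device, b_back.device)
```
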

## Implementing the input pipeline

The first time you run this block, it downloads the CIFAR10 dataset from the internet, which takes some time.

Also make sure that the machine running the notebook has internet access. You can check this using the ``ping`` command.

In [None]:
# Download and construct dataset.
train_dataset = dsets.CIFAR10(root='./data/',
                               train=True, 
                               transform=transforms.ToTensor(),
                               download=True)

# Select one data pair (read data from disk).
image, label = train_dataset[0]
print(image.size())
print(label)

# Data Loader (this provides queue and thread in a very simple way).
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=100, 
                                           shuffle=True,
                                           num_workers=2)

# When iteration starts, queue and thread start to load dataset from files.
data_iter = iter(train_loader)

# Mini-batch images and labels.
images, labels = next(data_iter)

# Actual usage of data loader is as below.
for images, labels in train_loader:
    # Your training code will be written here
    pass

Downloading http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


## Input pipeline for custom dataset

In [None]:
# You should build custom dataset as below.
class CustomDataset(data.Dataset):
    def __init__(self):
        # TODO
        # 1. Initialize file path or list of file names. 
        pass
    def __getitem__(self, index):
        # TODO
        # 1. Read one data from file (e.g. using numpy.fromfile, PIL.Image.open).
        # 2. Preprocess the data (e.g. torchvision.transforms).
        # 3. Return a data pair (e.g. image and label).
        pass
    def __len__(self):
        # You should change 0 to the total size of your dataset.
        return 0 

# Then, you can just use torch's prebuilt data loader.
custom_dataset = CustomDataset()
train_loader = torch.utils.data.DataLoader(dataset=custom_dataset,
                                           batch_size=100, 
                                           shuffle=True,
                                           num_workers=2)
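
For a concrete instance of the template above, here is a minimal sketch of a dataset that keeps everything in memory. `SquaresDataset` is a hypothetical toy class invented for illustration, not part of torch; the code assumes PyTorch 0.4 or later and uses `num_workers=0` so that batches are loaded in the main process.

```python
import torch
import torch.utils.data as data


class SquaresDataset(data.Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i squared)."""

    def __init__(self, size):
        # Build all inputs and targets up front, since the data is tiny.
        self.inputs = torch.arange(size, dtype=torch.float32)
        self.targets = self.inputs ** 2

    def __getitem__(self, index):
        # Return one (input, target) pair.
        return self.inputs[index], self.targets[index]

    def __len__(self):
        return self.inputs.size(0)


toy_dataset = SquaresDataset(10)
toy_loader = data.DataLoader(dataset=toy_dataset,
                             batch_size=4,
                             shuffle=False,
                             num_workers=0)

# Ten samples in batches of four give two full batches and one of size two.
batch_shapes = [tuple(inputs.shape) for inputs, targets in toy_loader]
print(batch_shapes)
```

For real data, `__getitem__` would typically read a file from disk and apply transforms, as outlined in the template.
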