# CSE6250BDH Deep Learning Labs
## 0. Introduction to PyTorch

In this chapter, we will learn the basic usage of PyTorch.
There are many good PyTorch tutorials on the web.
We highly recommend following the official [tutorial](http://pytorch.org/tutorials/); much of this chapter is adapted from it.

### Import

After installing PyTorch, you can import `torch` in Python to use PyTorch.

In [1]:
import torch

Let's check the PyTorch version; it should be 1.0 or higher.

In [2]:
print(torch.__version__)

1.0.0
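If you would rather have a script fail early on an old installation, the check can be automated. The following is a minimal sketch, assuming the version string starts with `major.minor` (as in `1.0.0`); the variable names are only illustrative.

```python
import torch

# Keep only the leading "major.minor" part; a local suffix such as "+cu118" is dropped first.
release = torch.__version__.split("+")[0]
major, minor = (int(part) for part in release.split(".")[:2])

assert (major, minor) >= (1, 0), f"PyTorch 1.0+ expected, found {torch.__version__}"
print(major, minor)
```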


### Tensor Creation

PyTorch is very similar to NumPy; its developers describe it as a replacement for NumPy that can use the power of GPUs. Although some components are still missing, it offers many of the same or similar functions for constructing and manipulating Tensors.

The basic object in PyTorch is the `Tensor`, which is equivalent to `ndarray` in NumPy. As in NumPy, there are multiple Tensor types, e.g., Float, Double, Int, and Long. Most of the time, however, we will use FloatTensor (the default type for most functions) for GPU computation, and sometimes LongTensor for target/label values.

Let's try to create a Tensor. If you call `torch.Tensor(rows, cols)`, it returns a FloatTensor of that shape without initialization, so it contains garbage values.

In [3]:
x = torch.Tensor(5, 3) # same as torch.FloatTensor(5, 3)
x

tensor([[2.6135e+16, 3.0624e-41, 5.7453e-44],
        [0.0000e+00,        nan, 0.0000e+00],
        [1.3733e-14, 6.4076e+07, 2.0706e-19],
        [7.3909e+22, 2.4176e-12, 1.1625e+33],
        [8.9605e-01, 1.1632e+33, 5.6003e-02]])
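A more explicit way to request uninitialized memory is `torch.empty`. The sketch below assumes the default dtype has not been changed from `torch.float32`; it fills the Tensor in place before reading it, since the initial contents are arbitrary.

```python
import torch

# torch.empty allocates memory without initializing it, like torch.Tensor(5, 3).
x_empty = torch.empty(5, 3)
print(x_empty.shape, x_empty.dtype)

# Overwrite the garbage values in place before reading them.
x_empty.fill_(0.5)
print(x_empty)
```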

Similar to `numpy.array()`, you can create a Tensor from existing values using `torch.tensor(values)`.

In [4]:
x_manual = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
x_manual

tensor([[1., 2.],
        [3., 4.]])
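`torch.tensor` infers the Tensor type from the values it is given, which matters when you need a FloatTensor for inputs or a LongTensor for labels. A small sketch, assuming the default floating-point dtype is still `torch.float32`:

```python
import torch

t_float = torch.tensor([[1.0, 2.0], [3.0, 4.0]])          # Python floats -> torch.float32
t_int = torch.tensor([0, 1, 2])                           # Python ints -> torch.int64 (a LongTensor)
t_double = torch.tensor([1.0, 2.0], dtype=torch.float64)  # explicit dtype overrides inference

print(t_float.dtype, t_int.dtype, t_double.dtype)
```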

You can also create an initialized Tensor filled with 1s, 0s, or random numbers from a uniform distribution on [0, 1) by using `torch.ones`, `torch.zeros`, or `torch.rand`, respectively.

In [5]:
x_ones = torch.ones(5,3)
print(x_ones)

x_zeros = torch.zeros(5,3)
print(x_zeros)

x_uniform = torch.rand(5,3)
print(x_uniform)

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
tensor([[0.6617, 0.6074, 0.7318],
        [0.4898, 0.9365, 0.4340],
        [0.4794, 0.5075, 0.9448],
        [0.8782, 0.4373, 0.2330],
        [0.6070, 0.9320, 0.0694]])


### Exercise: Try `torch.eye`, `torch.linspace`, `torch.logspace`, etc.
### Exercise: Try other random functions from [here](http://pytorch.org/docs/master/torch.html#random-sampling)
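As a possible starting point for the first exercise, the sketch below calls three of the constructors with small arguments chosen only for illustration; try changing the sizes and ranges.

```python
import torch

identity = torch.eye(3)                 # 3x3 identity matrix
lin = torch.linspace(0, 1, steps=5)     # 5 evenly spaced points from 0 to 1, both ends included
log = torch.logspace(0, 2, steps=3)     # 10**0, 10**1, 10**2

print(identity)
print(lin)
print(log)
```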

### Converting from/to NumPy ndarray

You can also create a Tensor from a NumPy ndarray, or vice versa. In practice we do this often in a project, since we want to use many NumPy-based libraries (e.g., Pandas, Scikit-learn, Matplotlib) alongside GPU computation.

You can simply call `torch.from_numpy(ndarray)` to create a `Tensor` from a `numpy.ndarray`. **Be careful: the returned Tensor and the original ndarray share the same memory.** Therefore, if you modify the Tensor, the change is reflected in the ndarray, and vice versa.

In [6]:
import numpy as np
np_array = np.array([1., 2., 3.], dtype=np.float32)  # set dtype=np.float32 to get a FloatTensor
print(np_array)
torch_tensor = torch.from_numpy(np_array)
print(torch_tensor)

# Modify the Tensor
torch_tensor[0] = -1.0
print(torch_tensor)
# np_array has also been modified
print(np_array)

[1. 2. 3.]
tensor([1., 2., 3.])
tensor([-1.,  2.,  3.])
[-1.  2.  3.]
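When the shared memory is not what you want, one option is to build the Tensor with `torch.tensor`, which copies the data. The sketch below contrasts the two; the variable names are illustrative.

```python
import numpy as np
import torch

np_source = np.array([1., 2., 3.], dtype=np.float32)

shared = torch.from_numpy(np_source)   # shares memory with np_source
copied = torch.tensor(np_source)       # torch.tensor copies the data

copied[0] = -1.0
print(np_source)   # unchanged by the write to `copied`

shared[1] = -2.0
print(np_source)   # reflects the write to `shared`
```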


For the reverse conversion, you can call `numpy()` on a Tensor. Again, the resulting ndarray shares its memory with the Tensor.

In [7]:
another_torch_tensor = torch.rand(3)
print(another_torch_tensor)
another_np_array = another_torch_tensor.numpy()
print(another_np_array)

# Modify ndarray
another_np_array[0] *= 2.0
print(another_torch_tensor)

tensor([0.3749, 0.5540, 0.2663])
[0.37494987 0.553994   0.26629055]
tensor([0.7499, 0.5540, 0.2663])
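Calling `numpy()` directly works only for CPU Tensors that are not tracked by autograd. A common pattern for the other cases is sketched below; it should behave the same whether or not a GPU is present.

```python
import torch

t = torch.rand(3, requires_grad=True)

# t.numpy() would raise a RuntimeError here because t requires grad.
# detach() drops the autograd history; cpu() is a no-op for a Tensor already on the CPU.
arr = t.detach().cpu().numpy()
print(type(arr), arr.shape)
```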


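To extract the value from a single-element Tensor, e.g., a Tensor storing a loss value, you can call `item()` on it. The result is a plain Python number; because the Tensor stores 32-bit floats, the value printed below is the nearest float32 to 1.23.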

In [8]:
single_element_tensor = torch.Tensor([1.23])
print(single_element_tensor)
single_value = single_element_tensor.item()
print(single_value)

tensor([1.2300])
1.2300000190734863
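`item()` is meant for single-element Tensors. For a Tensor with several elements, you could either convert everything with `tolist()` or reduce it to one element first, as in this small sketch:

```python
import torch

values = torch.tensor([1.0, 2.5, 4.0])

as_list = values.tolist()        # Python list of floats
total = values.sum().item()      # reduce to one element first, then extract it

print(as_list, total)
```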


### Basic Operations

#### Indexing

You can use standard NumPy-like indexing.

In [9]:
A = torch.rand(3,3)
print(A)
print(A[:, 1])  # get the column at index 1 (the second column)
print(A[:2, :])  # get the first two rows

tensor([[0.9285, 0.0259, 0.4460],
        [0.1122, 0.8947, 0.6859],
        [0.9858, 0.0953, 0.9046]])
tensor([0.0259, 0.8947, 0.0953])
tensor([[0.9285, 0.0259, 0.4460],
        [0.1122, 0.8947, 0.6859]])
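Two further indexing behaviours carry over from NumPy: boolean masks, and the fact that basic slices are views onto the original data. The sketch below uses a small deterministic matrix (not the random `A` above) so the results are easy to follow.

```python
import torch

M = torch.arange(9.0).reshape(3, 3)   # rows: [0, 1, 2], [3, 4, 5], [6, 7, 8]

second_col = M[:, 1]        # the values 1, 4, 7
large = M[M > 4]            # boolean mask -> 1-D Tensor with the values 5, 6, 7, 8

# Basic slicing returns a view, so writing through it changes M as well.
second_col[0] = -1.0
print(M[0, 1])
print(large)
```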


#### Arithmetic Operations
Arithmetic operations with the `+`, `-`, `*`, and `/` operators are all element-wise. Therefore, matrix computations such as matrix-matrix or matrix-vector multiplication require separate functions such as `torch.mm` and `torch.mv`.

In [10]:
B = torch.rand(3,3)
print(A+B)
print(A*B)
# Another elementwise multiplication
print(torch.mul(A,B))

# Matrix-Matrix multiplication
print(torch.mm(A,B))
# Matrix-Vector multiplication
print(torch.mv(A,B[:,1]))

tensor([[1.1311, 0.9856, 0.5035],
        [0.1141, 1.5496, 1.2764],
        [1.1244, 0.3644, 1.1973]])
tensor([[1.8818e-01, 2.4884e-02, 2.5658e-02],
        [2.0847e-04, 5.8597e-01, 4.0501e-01],
        [1.3658e-01, 2.5652e-02, 2.6476e-01]])
tensor([[1.8818e-01, 2.4884e-02, 2.5658e-02],
        [2.0847e-04, 5.8597e-01, 4.0501e-01],
        [1.3658e-01, 2.5652e-02, 2.6476e-01]])
tensor([[0.2500, 1.0280, 0.1993],
        [0.1194, 0.8782, 0.7355],
        [0.3253, 1.2520, 0.3778]])
tensor([1.0280, 0.8782, 1.2520])
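Besides `torch.mm` and `torch.mv`, the `@` operator and `torch.matmul` cover both cases and pick the behaviour from the operand shapes. A short sketch with fresh random inputs (named `P`, `Q`, and `v` here to avoid clashing with `A` and `B`):

```python
import torch

P = torch.rand(3, 3)
Q = torch.rand(3, 3)
v = torch.rand(3)

# For two 2-D operands, `@` matches torch.mm; with a 1-D right operand, matmul matches torch.mv.
same_mm = torch.allclose(P @ Q, torch.mm(P, Q))
same_mv = torch.allclose(torch.matmul(P, v), torch.mv(P, v))
print(same_mm, same_mv)
```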


There are many predefined operations for your convenience such as batch multiplication with addition, etc. Please read [PyTorch Docs](http://pytorch.org/docs/master/torch.html#math-operations) for more information.
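As one illustration of such predefined operations, `torch.addmm` and `torch.baddbmm` combine a multiplication with an addition, for a single matrix pair and for a batch of pairs respectively. The shapes below are arbitrary choices for this sketch.

```python
import torch

bias = torch.rand(2, 4)
M1 = torch.rand(2, 3)
M2 = torch.rand(3, 4)

# addmm: bias + M1 @ M2 in one call
fused = torch.addmm(bias, M1, M2)

# baddbmm: the same idea applied independently to each matrix pair in a batch of 5
batch_bias = torch.rand(5, 2, 4)
B1 = torch.rand(5, 2, 3)
B2 = torch.rand(5, 3, 4)
batched = torch.baddbmm(batch_bias, B1, B2)

print(fused.shape, batched.shape)
```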

### GPU Acceleration

If we have one or more NVIDIA GPUs, we can accelerate computation by moving Tensors onto a GPU.
Let's see how much a GPU can speed things up, especially for matrix operations.
We will run a matrix-matrix multiplication on 5000-by-5000 matrices on both the CPU and the GPU.

In [11]:
mat_cpu = torch.rand(5000, 5000)
mat_cpu

tensor([[0.2130, 0.6759, 0.2104,  ..., 0.1507, 0.7708, 0.7172],
        [0.1117, 0.4493, 0.6779,  ..., 0.1140, 0.3263, 0.9410],
        [0.3007, 0.0067, 0.9703,  ..., 0.3241, 0.1301, 0.4399],
        ...,
        [0.3882, 0.5549, 0.4373,  ..., 0.3492, 0.5614, 0.1680],
        [0.2265, 0.5140, 0.8682,  ..., 0.8785, 0.3173, 0.6228],
        [0.3648, 0.9283, 0.0312,  ..., 0.7357, 0.2212, 0.9060]])

In [12]:
mat_cpu.size()

torch.Size([5000, 5000])

In [13]:
%%time
torch.mm(mat_cpu.t(), mat_cpu)

CPU times: user 6.21 s, sys: 1.04 s, total: 7.25 s
Wall time: 290 ms


tensor([[1695.7577, 1285.5846, 1256.9727,  ..., 1270.6284, 1257.8542,
         1269.5782],
        [1285.5846, 1683.9097, 1256.9144,  ..., 1264.7170, 1253.2314,
         1267.9730],
        [1256.9727, 1256.9144, 1638.8047,  ..., 1244.1445, 1239.7261,
         1252.8428],
        ...,
        [1270.6284, 1264.7170, 1244.1445,  ..., 1669.2982, 1247.4923,
         1262.1633],
        [1257.8541, 1253.2316, 1239.7261,  ..., 1247.4923, 1668.7012,
         1263.3229],
        [1269.5781, 1267.9731, 1252.8429,  ..., 1262.1633, 1263.3229,
         1677.7290]])

#### We need a GPU for this comparison
We can check whether one is available as follows:

In [14]:
cuda = torch.cuda.is_available()  # True if a CUDA-capable GPU can be used
cuda

True
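A common alternative to carrying a boolean flag around is to build a `torch.device` once and pass it wherever Tensors are created or moved. This is a sketch of that pattern, not something the rest of this notebook depends on; it falls back to the CPU when no GPU is found.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate directly on the chosen device, or move an existing Tensor with .to(device).
mat = torch.rand(1000, 1000, device=device)
vec = torch.rand(1000).to(device)

out = torch.mv(mat, vec)
print(out.device)
```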

In [15]:
mat_gpu = torch.rand(5000, 5000)
if cuda:
    mat_gpu = mat_gpu.cuda()
mat_gpu

tensor([[0.6907, 0.9506, 0.7832,  ..., 0.0178, 0.2380, 0.7641],
        [0.6972, 0.5160, 0.4726,  ..., 0.1372, 0.0847, 0.6451],
        [0.4686, 0.7244, 0.0956,  ..., 0.9701, 0.7684, 0.5043],
        ...,
        [0.4801, 0.7096, 0.2673,  ..., 0.0248, 0.8264, 0.7739],
        [0.5860, 0.4998, 0.2829,  ..., 0.5636, 0.7892, 0.2528],
        [0.2868, 0.1217, 0.2432,  ..., 0.0665, 0.8304, 0.6522]],
       device='cuda:0')

In [16]:
mat_gpu.size()

torch.Size([5000, 5000])

In [17]:
%%time
torch.mm(mat_gpu.t(), mat_gpu)

CPU times: user 0 ns, sys: 3.51 ms, total: 3.51 ms
Wall time: 2.98 ms


tensor([[1670.1737, 1253.2634, 1260.9958,  ..., 1259.3188, 1263.1609,
         1244.2560],
        [1253.2634, 1665.1344, 1274.3370,  ..., 1264.2490, 1260.3347,
         1248.6567],
        [1260.9958, 1274.3370, 1683.3060,  ..., 1274.7039, 1263.1434,
         1256.4707],
        ...,
        [1259.3188, 1264.2490, 1274.7039,  ..., 1682.9913, 1250.9548,
         1264.6089],
        [1263.1609, 1260.3347, 1263.1434,  ..., 1250.9548, 1685.4556,
         1265.3738],
        [1244.2560, 1248.6567, 1256.4707,  ..., 1264.6089, 1265.3738,
         1682.0267]], device='cuda:0')

Can you see the speed-up? The difference becomes even more important with larger matrices, more matrix computations, and deeper neural network models.
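One caveat when timing GPU code: CUDA operations are launched asynchronously, so a timer that stops right after the call may capture only the launch rather than the full computation. A more careful measurement waits for the GPU with `torch.cuda.synchronize()`. The helper `timed_mm` below is a hypothetical name introduced for this sketch, and the matrix is smaller than 5000-by-5000 only to keep it quick.

```python
import time
import torch

def timed_mm(mat):
    """Return (product, seconds) for mat.t() @ mat, waiting for the GPU if one is used."""
    if mat.is_cuda:
        torch.cuda.synchronize()      # finish pending GPU work before starting the clock
    start = time.perf_counter()
    product = torch.mm(mat.t(), mat)
    if mat.is_cuda:
        torch.cuda.synchronize()      # wait until the multiplication has actually finished
    return product, time.perf_counter() - start

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
mat = torch.rand(1000, 1000, device=device)
product, seconds = timed_mm(mat)
print(f"{device}: {seconds * 1000:.2f} ms")
```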

### ~~Variable~~ Autograd

**Variable wrapping has been deprecated since PyTorch 0.4.** You can use a regular Tensor with autograd and control gradient tracking through the Tensor's `requires_grad` flag.

PyTorch provides automatic differentiation through the `autograd` package, and `torch.Tensor` (instead of `Variable` in older versions) is now the key class for using it.

A Tensor keeps its value and the gradient with respect to that value. Also, almost all built-in operations in PyTorch support automatic differentiation. Therefore, after finishing a computation on a graph, e.g., a neural network, we can call `.backward()` on its output to get the automatically accumulated gradient for each Tensor in the graph that has `requires_grad=True`.

Let's try a simple example for easier understanding.

#### Old code using Variable

```python
from torch.autograd import Variable

# Create some Tensors and a Variable
x = Variable(torch.FloatTensor([2.0]), requires_grad=False)
w = Variable(torch.FloatTensor([0.5]), requires_grad=True)
b = Variable(torch.FloatTensor([0.1]), requires_grad=True)
print(x)
print(w)
print(b)

# Define a computational graph
y = w*x + b # Currently, y = 0.5x + 0.1 and y(2) = 1.1
print(y)
```

#### Current code with PyTorch >= 0.4

In [18]:
# Create some Tensors
x = torch.tensor(2.0, requires_grad=False)
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)
print(x)
print(w)
print(b)

# Define a computational graph
y = w*x + b # Currently, y = 0.5x + 0.1 and y(2) = 1.1
print(y)

tensor(2.)
tensor(0.5000, requires_grad=True)
tensor(0.1000, requires_grad=True)
tensor(1.1000, grad_fn=<AddBackward0>)


Let's compute the gradients on the graph `y` and print the gradient with respect to each Tensor.

In [19]:
# Compute gradients
y.backward()

print(x.grad)
print(w.grad)
print(b.grad)

None
tensor(2.)
tensor(1.)


Since we set `requires_grad=False` for Tensor `x`, its `.grad` is `None`.
Also, if we differentiate manually, we can easily get:
$$
\begin{aligned}
\frac{\partial y}{\partial w} &= \frac{\partial}{\partial w}\left(wx + b\right) = x, \\
\frac{\partial y}{\partial w}\Bigr|_{x=2} &= 2.
\end{aligned}
$$
Similarly,
$$
\begin{aligned}
\frac{\partial y}{\partial b} &= \frac{\partial}{\partial b}\left(wx + b\right) = 1, \\
\frac{\partial y}{\partial b}\Bigr|_{x=2} &= 1.
\end{aligned}
$$
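The hand-derived values can be checked against autograd, and the same check shows what "accumulated" means in practice: a second backward pass adds to the stored gradients rather than replacing them. This sketch rebuilds `x`, `w`, and `b` so that it runs on its own.

```python
import torch

x = torch.tensor(2.0)
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)

(w * x + b).backward()
first = (w.grad.item(), b.grad.item())     # matches the manual derivatives: (2.0, 1.0)

# A second backward pass on a fresh graph adds to the stored gradients.
(w * x + b).backward()
second = (w.grad.item(), b.grad.item())    # (4.0, 2.0)

# Reset in place before the next independent computation.
w.grad.zero_()
b.grad.zero_()
print(first, second, w.grad, b.grad)
```

This is why training loops usually clear the gradients before each backward pass.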

Thanks to the functionality of automatic differentiation, we can build a very complex computational graph such as a neural network with many layers without manually computing the gradients of parameters.
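To see how these gradients get used, here is a minimal sketch of a single gradient-descent step on the same `w` and `b`. The `target` value, the squared-error loss, and the learning rate `lr` are assumptions made for illustration and are not part of the example above.

```python
import torch

x = torch.tensor(2.0)
target = torch.tensor(3.0)          # hypothetical value we want w*x + b to produce
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)
lr = 0.1                            # hypothetical learning rate

loss = (w * x + b - target) ** 2    # squared error
loss.backward()                     # fills w.grad and b.grad

# Update the parameters without recording the update itself in the graph.
with torch.no_grad():
    w -= lr * w.grad
    b -= lr * b.grad

print(w.item(), b.item())
```

Here the gradients are 2(y - target)x = -7.6 for `w` and 2(y - target) = -3.8 for `b`, so one step moves `w` to about 1.26 and `b` to about 0.48.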

Please refer to the official [tutorial](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html) for more details.

In the next chapter, we will build a simple feed-forward neural network using the PyTorch components we have learned here.