In [2]:
import torch
tv = torch.__version__
print('Using PyTorch version: ', tv)
# check we have PyTorch 0.2.x
assert tv[0] == '0' and tv[2] == '2', tv

import numpy as np

Using PyTorch version:  0.2.0_4


# First things first: The world becomes tensorized

In [3]:
# Every deep learning framework is built upon Tensors
# These are marvelous multi-dimensional structures
# We can create Tensors out of Python lists or NumPy arrays
my_list = [0, 1, 2, 3]
my_array = np.array(my_list)
my_list_T = torch.LongTensor(my_list)
my_array_T = torch.LongTensor(my_array)
# Both are LongTensors, so the type check below passes
assert type(my_list_T) == type(my_array_T)

# Now we'll create a multi-dimensional array out of a list of lists of lists (3-D)
T_3 = [[[0, 1, 2.], [5, 6, 7]], [[0.2, 0.4, 2.2], [4.5, -6, -9]]]
T_3 = np.array(T_3)

assert T_3.ndim == 3, T_3.ndim
print('Number of dimensions: ', T_3.ndim)
print('Shape of each dimension: ', T_3.shape)
# the dimensions of this NumPy array are [2, 2, 3]

Number of dimensions:  3
Shape of each dimension:  (2, 2, 3)
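The cell above stops at the NumPy array. As a minimal sketch of the remaining step, the array can be handed to PyTorch with `torch.from_numpy`, which shares memory with the source array rather than copying it. The names `arr`, `t` and `back` are illustrative and not part of the original notebook.

```python
import numpy as np
import torch

# Same 3-D nested list as in the cell above
arr = np.array([[[0, 1, 2.], [5, 6, 7]], [[0.2, 0.4, 2.2], [4.5, -6, -9]]])

# from_numpy keeps the dtype (float64 here) and shares the underlying buffer
t = torch.from_numpy(arr)
print('Tensor size: ', t.size())

# .numpy() goes the other way, again without copying
back = t.numpy()

# Because memory is shared, writing to the array is visible through the Tensor
arr[0, 0, 0] = 42.0
first_element = float(t[0, 0, 0])
print('First element seen through the Tensor: ', first_element)
```

If an independent copy is needed, constructing the Tensor from a list (as with `torch.LongTensor(my_list)` earlier) avoids the shared buffer.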


### Congratz for your marvelous Tensors, but now what? 
Tensors have:
1. Info about the data type and the size of each dimension (NumPy arrays have this too!)
2. GPU capabilities (NumPy arrays DO NOT)
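To make the second point concrete, here is a hedged sketch of moving Tensors to the GPU only when one is present, so the same code also runs on a CPU-only machine. The variable names (`a`, `v`, `a_dev`, `v_dev`) are made up for this illustration.

```python
import torch

a = torch.randn(100, 25)
v = torch.randn(25)

# Move both operands to the GPU only if CUDA is usable on this machine
if torch.cuda.is_available():
    a_dev, v_dev = a.cuda(), v.cuda()
else:
    a_dev, v_dev = a, v

# The product is computed on whichever device holds the operands
y_dev = torch.matmul(a_dev, v_dev)

# Bring the result back to host memory (no change if it is already there)
y_cpu = y_dev.cpu()
print('result size: ', y_cpu.size())
```

Both operands must live on the same device; mixing a CPU Tensor with a GPU Tensor in one operation raises an error.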

In [4]:
# We can operate with Tensors of course
# weights matrix with [outputs x inputs] = [100 x 25]
W = torch.randn(100, 25)
# bias vector [100]
b = torch.zeros(100)
# input vector [25]
x = torch.randn(25)
# Yes, this is a single layer fully connected neural network
y = torch.matmul(W, x) + b
# y ~ [100] output vector
print('x size: ', x.size())
print('W size: ', W.size())
print('b size: ', b.size())
print('y = Wx + b, size: ', y.size())

x size:  torch.Size([25])
W size:  torch.Size([100, 25])
b size:  torch.Size([100])
y = Wx + b, size:  torch.Size([100])


### Some PyTorch notation for Tensor properties:

In [5]:
# NumPy --> PyTorch translation
# --------------------------------
# 1) shape --> size()
y.size()
print('y size: ', y.size())

# 2) reshape() --> view()
z = y.view(10, 10)
print('z size (y reshaped to 10x10): ', z.size())

# 3) expand_dims() --> unsqueeze()
Y = y.unsqueeze(-1)
print('Y size (y unsqueezed in last dim): ', Y.size())

# 4) transpose() / .T --> t()  (2-D Tensors only)
Y_t = Y.t()
print('Y transposed size: ', Y_t.size())

y size:  torch.Size([100])
z size (y reshaped to 10x10):  torch.Size([10, 10])
Y size (y unsqueezed in last dim):  torch.Size([100, 1])
Y transposed size:  torch.Size([1, 100])
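The translation table above can be checked side by side. The sketch below applies each NumPy call and its PyTorch counterpart to the same data and compares the resulting shapes. The names `a`, `t` and `pairs` are illustrative only.

```python
import numpy as np
import torch

a = np.arange(100, dtype=np.float32)
t = torch.from_numpy(a)

# Column versions, needed for the 2-D transpose comparison
a_col = np.expand_dims(a, -1)
t_col = t.unsqueeze(-1)

# (NumPy shape, PyTorch size) for each entry of the translation table
pairs = [
    (a.shape, t.size()),                                # shape --> size()
    (a.reshape(10, 10).shape, t.view(10, 10).size()),   # reshape() --> view()
    (a_col.shape, t_col.size()),                        # expand_dims() --> unsqueeze()
    (a_col.T.shape, t_col.t().size()),                  # .T --> t()
]

for np_shape, torch_size in pairs:
    assert np_shape == tuple(torch_size)
    print(np_shape, tuple(torch_size))
```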


### The "magic" behind AUTOGRAD

**Variable:** It wraps a Tensor and supports nearly all of the operations defined on it. Once you finish your computation, you can call `.backward()` and have all the gradients computed automatically.

You can access the raw tensor through the `.data` attribute, while the gradient w.r.t. this variable is accumulated into `.grad`[[1]](http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html).
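"Accumulated" is meant literally: each call to `.backward()` adds to `.grad` instead of overwriting it. A small sketch of that behaviour, using a hypothetical Variable `w` that does not appear elsewhere in the notebook:

```python
import torch
from torch.autograd import Variable

w = Variable(torch.ones(3), requires_grad=True)

# First backward pass: d(sum(2 * w))/dw = 2 for every element
(w * 2).sum().backward()
first = w.grad.data.clone()

# Second backward pass on a fresh graph: the new gradient is ADDED to .grad
(w * 2).sum().backward()
second = w.grad.data.clone()

print(first.tolist())   # gradient after one pass
print(second.tolist())  # gradient after two passes

# Reset the accumulated gradient in place before the next computation
w.grad.data.zero_()
```

This is why training loops usually zero the gradients before each backward pass.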

In [6]:
from torch.autograd import Variable

T = torch.randn(10, 10)
# we make the Variable by just wrapping the Tensor with it
V = Variable(T)
# This is a Variable containing a FloatTensor
print(V)

Variable containing:
 0.6762 -1.1325 -1.2233  0.6894  0.2096  0.2356 -1.3594 -0.3461 -1.6773  0.4457
-0.2565 -0.5017 -0.0967 -2.1807  1.4041 -1.4717  0.8760 -0.2092 -0.7740  3.6363
 1.1446 -1.9860  1.1396 -1.9739  0.7423  1.2037  0.4946 -0.1856 -0.4834 -1.3952
-0.0324  0.4748  0.0979  1.1420 -0.4775  0.2321 -2.5791 -1.3295 -0.2176  0.5631
-0.8356 -0.8526  0.2375  1.7524  1.1189 -0.1390  0.0773 -0.6712 -1.4016  0.8095
 1.7392 -0.0452  0.0349  0.6899 -1.5706  0.7190 -1.0788 -0.3919 -1.0044  0.8994
-0.1986  0.9016 -0.1839  0.4053 -0.5015 -0.2803  0.6005  2.2463  0.4193  0.0309
-0.4738  0.3423  0.3874  0.3733  0.5238 -0.4901  1.0321 -0.4264  1.4701  0.4584
-2.6641  1.2440 -1.8874 -0.2299  0.0175 -1.0083 -1.7755 -0.3415  1.7290 -0.7525
-1.7420  0.2854  0.2101 -0.9111 -1.4660 -0.6723  0.7580  0.1770  0.3068 -0.0180
[torch.FloatTensor of size 10x10]



### The reason to create Variables: the Graph

Tensors are nodes in the graph. Edges are the computations relating Tensors (as in TensorFlow). However, the main difference between PyTorch and TensorFlow is: **DYNAMIC GRAPH!**

<img src="dynamic_graph.gif" width="600px">

[comment]: (Reference_for_the_figure:https://medium.com/intuitionmachine/pytorch-dynamic-computational-graphs-and-modular-deep-learning-7e7f89f18d1)

The graph is built operation by operation, at runtime!
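One consequence is that ordinary Python control flow decides the shape of the graph. In the sketch below, the number of multiplications recorded depends on a plain function argument, and autograd differentiates whichever graph was built on that call. The helper `grad_after_doubling` is a made-up name for illustration.

```python
import torch
from torch.autograd import Variable

def grad_after_doubling(n_steps):
    """Double x n_steps times, sum the result, and return d(out)/dx."""
    x = Variable(torch.ones(3), requires_grad=True)
    h = x
    # Each iteration appends one more multiplication node to the graph
    for _ in range(n_steps):
        h = h * 2
    h.sum().backward()
    return x.grad.data.tolist()

# Two calls, two different graphs, no recompilation step in between
g2 = grad_after_doubling(2)  # gradient is 2**2 per element
g5 = grad_after_doubling(5)  # gradient is 2**5 per element
print(g2, g5)
```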

In [7]:
# Example of graph creation: out = sum(x + y)
# requires_grad tells the framework to compute the gradient w.r.t. that Variable
x = Variable(torch.ones(10), requires_grad=True)
y = Variable(torch.ones(10), requires_grad=True)
z = x + y
out = z.sum()

In [8]:
out.backward()
print(z)
print(z.grad)  # None: z is an intermediate result, not a leaf Variable
print(x.grad)
print(y.grad)

Variable containing:
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
[torch.FloatTensor of size 10]

None
Variable containing:
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
[torch.FloatTensor of size 10]

Variable containing:
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
[torch.FloatTensor of size 10]
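In the output above, `z.grad` prints `None` because `z` is an intermediate result: autograd only keeps gradients for leaf Variables such as `x` and `y`. One way to still observe the gradient flowing through `z` is to register a hook on it before calling `.backward()`. The sketch below assumes nothing beyond `register_hook`; the `captured` list and `save_grad` function are illustrative names.

```python
import torch
from torch.autograd import Variable

x = Variable(torch.ones(10), requires_grad=True)
y = Variable(torch.ones(10), requires_grad=True)
z = x + y

captured = []

def save_grad(grad):
    # Called during backward with d(out)/dz; returning None leaves it unchanged
    captured.append(grad)

z.register_hook(save_grad)

out = z.sum()
out.backward()

# d(sum(z))/dz is 1 for every element of z
z_grad = captured[0].data.tolist()
print(z_grad)
```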



For further reference: http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html