*Colormap of the notebook:*

* <span style="color:red">assignment problem</span>. Red marks a task that you should complete.
* <span style="color:green">debugging</span>. Green tells you what the expected outcome is. Its primary goal is to help you get the correct answer.
* <span style="color:blue">comments, hints</span>.

Assignment 1 (pytorch basics)
======================


<img src="fig/pytorch-logo-dark.png" style="height:64px;" />

#### Useful Links:

* pytorch official documentation
http://pytorch.org/docs/master/index.html

* pytorch discussion
https://discuss.pytorch.org/

* official tutorials
https://pytorch.org/tutorials/

* pytorch tutorials
https://github.com/yunjey/pytorch-tutorial

* pytorch examples
https://github.com/jcjohnson/pytorch-examples


### Preliminaries

In [1]:
# for compatibility (python 2 & python 3)
from __future__ import print_function
from __future__ import division

In [2]:
import numpy as np
import torch

In [3]:
# random seed settings
torch.manual_seed(42)
np.random.seed(42)

###  Tensors

The tensor is one of the main data types in PyTorch.
We will start with the concept of a tensor and how it is used in PyTorch.

<img src="fig/tensors.jpg" style="height:512px;" />

#### Tensor Initialization

In [4]:
# 1d tensor of size 64 of type float (default)
# (torch.empty only allocates memory: the values are uninitialized, i.e. arbitrary)
v = torch.empty(64)

print(" * the first 4 elements of 'v' are:")
print(v[:4]) # print the first four elements of the tensor

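# initialize with array [0,1,...,63]
# (dtype is given explicitly: recent PyTorch versions return integers for integer arguments)
v = torch.arange(0, 64, dtype=torch.float32)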

print(" * the first 4 elements of 'v' are:")
print(v[:4]) # print the first four elements of the tensor

print(" * the size of the 'v' is ")
print(v.size())

 * the first 4 elements of 'v' are:
tensor([-2.4736e-27,  4.5695e-41, -2.3428e-26,  4.5695e-41])
 * the first 4 elements of 'v' are:
tensor([ 0.,  1.,  2.,  3.])
 * the size of the 'v' is 
torch.Size([64])
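
The difference between allocating and initializing a tensor can be checked with a minimal sketch like the one below. The names `u`, `z` and `r` are only illustrative and are not used elsewhere in the notebook.

```python
import torch

# torch.empty only allocates memory; its contents are arbitrary.
u = torch.empty(4)

# torch.zeros and torch.arange produce well-defined values.
z = torch.zeros(4)
r = torch.arange(0, 4, dtype=torch.float32)

print(u.shape, u.dtype)  # the shape and dtype are defined, the values are not
print(z)
print(r)
```

Because the values of `u` depend on whatever happened to be in memory, they should never be relied upon before being overwritten.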


In [5]:
# 2d tensor of size 8x8 (64 elements) of type float
x = torch.zeros(8, 8).type(torch.FloatTensor)

print(" * the top-left 4x4 block of 'x' is:")
print(x[:4, :4]) # print the top-left 4x4 block of the tensor

# initialize with all ones
x = torch.ones(8, 8).type(torch.FloatTensor)

print(" * the top-left 4x4 block of 'x' is:")
print(x[:4, :4]) # print the top-left 4x4 block of the tensor

print(" * the size of the 'x' is ")
print(x.size())

print(" * the size of 'x' can also be obtained with the numpy-style 'shape' attribute")
print(x.shape)

 * the top-left 4x4 block of 'x' is:
tensor([[ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.]])
 * the top-left 4x4 block of 'x' is:
tensor([[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]])
 * the size of the 'x' is 
torch.Size([8, 8])
 * the size of 'x' can also be obtained with the numpy-style 'shape' attribute
torch.Size([8, 8])


-----

<span style="color:red"> **[PROBLEM I]:** </span>   

<span style="color:red"> Initialize X </span>  
<span style="color:red"> 3d Tensor of size (4,4,4) </span>  
<span style="color:red"> of type IntTensor with all elements equal to zero </span>

-----

In [8]:
# YOUR CODE HERE
X = torch.zeros(4, 4, 4, dtype=torch.int32)  # torch.int32 is the dtype of IntTensor

In [9]:
X.shape

torch.Size([4, 4, 4])

#### Reshaping, broadcasting

Tensors are reshaped with the 'view' method:

In [10]:
a = torch.tensor([[1,2], [3,4]])
a_reshaped = a.view(4) # reshape into one-dimensional tensor of size 4

print(a)
print(a_reshaped)

tensor([[ 1,  2],
        [ 3,  4]])
tensor([ 1,  2,  3,  4])
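
The section title also mentions broadcasting, so here is a small sketch of both ideas together. It assumes the same 2x2 tensor `a` as above; the names `flat`, `col`, `row` and `summed` are illustrative only.

```python
import torch

a = torch.tensor([[1, 2], [3, 4]])

# -1 asks view to infer that dimension from the number of elements.
flat = a.view(-1)      # shape (4,)
col = a.view(-1, 1)    # shape (4, 1)

# Broadcasting: adding a tensor of shape (2,) to one of shape (2, 2)
# adds it to every row.
row = torch.tensor([10, 20])
summed = a + row

print(flat.shape, col.shape)
print(summed)
```

Note that `view` does not copy data: the reshaped tensor shares storage with the original one.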


-----

<span style="color:red"> **[PROBLEM II]:** </span> 

<span style="color:red"> Use the 'view' command to reshape v and X into 2d tensors of size (8,8) --> v' and X'. </span>  
<span style="color:red"> Add these reshaped tensors to x, namely calculate v' + X' + x. </span>  
<span style="color:red"> Finally, display the result. </span>

-----

In [None]:
# YOUR CODE HERE

In [11]:
print(result_add)

tensor([[  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.],
        [  9.,  10.,  11.,  12.,  13.,  14.,  15.,  16.],
        [ 17.,  18.,  19.,  20.,  21.,  22.,  23.,  24.],
        [ 25.,  26.,  27.,  28.,  29.,  30.,  31.,  32.],
        [ 33.,  34.,  35.,  36.,  37.,  38.,  39.,  40.],
        [ 41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.],
        [ 49.,  50.,  51.,  52.,  53.,  54.,  55.,  56.],
        [ 57.,  58.,  59.,  60.,  61.,  62.,  63.,  64.]])
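
One possible way to arrive at the output above is sketched below. It is not necessarily the intended solution. It recreates `v`, `X` and `x` so that it runs on its own, and the names `v_2d` and `X_2d` are made up for this example.

```python
import torch

v = torch.arange(0, 64, dtype=torch.float32)  # values 0..63
X = torch.zeros(4, 4, 4, dtype=torch.int32)   # 64 zeros
x = torch.ones(8, 8)                          # 8x8 matrix of ones

v_2d = v.view(8, 8)
X_2d = X.view(8, 8).float()  # cast explicitly so that all operands share one dtype

result_add = v_2d + X_2d + x
print(result_add)
```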


### Numpy bridge

In [12]:
# create numpy array
a = np.array([[1,2], [3,4]])
# transform numpy array into torch.Tensor
b = torch.from_numpy(a)
# make operation on this Tensor (in this case transpose)
b = b.transpose(1,0)
# transform back to numpy
c = b.numpy()                

print(a, type(a))
print(b)
print(c, type(c))

[[1 2]
 [3 4]] <class 'numpy.ndarray'>
tensor([[ 1,  3],
        [ 2,  4]])
[[1 3]
 [2 4]] <class 'numpy.ndarray'>
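
One detail of the bridge is worth seeing in a short sketch: `torch.from_numpy` shares memory with the numpy array, while `torch.tensor` makes a copy. The variable names below are illustrative only.

```python
import numpy as np
import torch

a = np.zeros(3)
t = torch.from_numpy(a)   # shares memory with `a`

t.add_(1)                 # in-place update of the tensor
print(a)                  # the numpy array has changed as well

t_copy = torch.tensor(a)  # copies the data
t_copy.add_(1)
print(a)                  # unchanged by the update of the copy
```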


-----

<span style="color:red"> **[PROBLEM III]:** </span> 

In [13]:
# using these two random matrices do the following:
x = np.random.randn(3, 10)
y = np.random.randn(4, 10)

<span style="color:red"> Do the following: </span>
* <span style="color:red">transform $\mathbf{x}$ and $\mathbf{y}$ to torch.Tensors</span>
* <span style="color:red">perform matrix multiplication $\mathbf{r1} = \mathbf{x} \cdot \mathbf{y}^T $</span>  
<span style="color:blue"> see the pytorch function http://pytorch.org/docs/master/torch.html#torch.mm </span>  
* <span style="color:red">perform element-wise matrix multiplication $\mathbf{r2} = \mathbf{r1} \odot \mathbf{r1} $</span>  
<span style="color:blue"> see the pytorch function http://pytorch.org/docs/master/torch.html#torch.mul </span> 
* <span style="color:red">perform scalar multiplication and scalar addition $\mathbf{r3} = 2 * \mathbf{r2} + 3 $</span>  
* <span style="color:red">transform the result back to numpy </span>

-----

In [21]:
# YOUR CODE HERE
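
A minimal sketch of the steps listed above, offered as one possible approach rather than the reference solution. It recreates `x` and `y` so that it runs on its own; `x_t`, `y_t` and `result` are names chosen for this example.

```python
import numpy as np
import torch

np.random.seed(42)
x = np.random.randn(3, 10)
y = np.random.randn(4, 10)

# numpy -> torch
x_t = torch.from_numpy(x)
y_t = torch.from_numpy(y)

r1 = torch.mm(x_t, y_t.t())  # (3, 10) x (10, 4) -> (3, 4)
r2 = torch.mul(r1, r1)       # element-wise product
r3 = 2 * r2 + 3              # scalar multiplication and addition

# torch -> numpy
result = r3.numpy()
print(result.shape)
```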

### CUDA stuff

Let us run on CUDA, if it is available.

We will use ``torch.device`` objects to move tensors to and from the GPU.

In [15]:
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    x = torch.as_tensor(x)                 # make sure x is a tensor (it may still be a numpy array)
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
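
A common variation, shown here only as a sketch, is to pick the device once and fall back to the CPU, so that the same code runs on machines without a GPU. The tensor shapes are arbitrary.

```python
import torch

# Use the GPU when it is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(2, 3, device=device)
y = torch.ones_like(x)   # created on the same device as x
z = (x + y).to("cpu")    # move the result back to the CPU

print(device)
print(z)
```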

###  Autograd: automatic differentiation

*torch.Tensor* is the central class of the package. If you set its attribute *.requires_grad* to True, it starts to track all operations on it. When you finish your computation, you can call *.backward()* to have all the gradients computed automatically. The gradient for this tensor is accumulated into its *.grad* attribute.

**use of autograd**

Let's start with a simple example.
Consider the following function:
$$f = (x + y) \cdot z$$

For concreteness, let's take $x=2$, $y=-7$, $z=3$. The 'forward' calculation is shown in <span style="color:green"> green </span> on the image below.

Automatic differentiation provides an elegant tool to calculate the derivatives of $f$ with respect to all variables via the 'backward' pass, which applies the chain rule. With $u = x + y$:

$$f = (x + y) \cdot z = u \cdot z $$

$$ \frac{\partial f}{\partial u} = z $$

$$ \frac{\partial f}{\partial z} = u = -5 $$

$$ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial x} = z = 3$$

$$ \frac{\partial f}{\partial y} = \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial y} = z = 3$$

<p align="center">
<img src="fig/comp_graph_1.png" width="500" />
</p>

In [16]:
# Create tensors (floating point: recent PyTorch versions only allow
# floating-point tensors to require gradients).
x = torch.tensor([2.], requires_grad=True)
y = torch.tensor([-7.], requires_grad=True)
z = torch.tensor([3.], requires_grad=True)

# Build a computational graph.
f = (x + y) * z

# Compute gradients.
f.backward()

# Print out the gradients.
print(x.grad)    # df/dx = z
print(y.grad)    # df/dy = z
print(z.grad)    # df/dz = x + y

tensor([ 3.])
tensor([ 3.])
tensor([-5.])


<span style="color:red"> **[PROBLEM V]**: </span> 

<span style="color:red"> Next we will consider the computational graph of the following function  </span>

$$f = \frac{1}{1 + e^{-(w_0 \cdot x_0 + w_1 \cdot x_1 + b )}} = \frac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b )}}$$

<img src="fig/comp_graph_2.png" style="height:320px;" />

<span style="color:red"> We are interested in computing partial derivatives:  </span>

<span style="color:red">$$ \frac{\partial f}{\partial \mathbf{w}}  $$ </span>

<span style="color:red">$$ \frac{\partial f}{\partial b}  $$ </span>

<span style="color:red">$$ \frac{\partial f}{\partial \mathbf{x}}  $$ </span>

<span style="color:blue">define $\{x_0, x_1\}$ and $\{w_0, w_1\}$ as vector variables $\mathbf{x}$ and $\mathbf{w}$ </span>  
<span style="color:blue"> see the pytorch exponential function http://pytorch.org/docs/master/torch.html#torch.exp </span>  
<span style="color:blue">use matrix operations</span>

<span style="color:green">You should get the same numbers as in the figure</span>

In [17]:
w = torch.tensor([3., 5.], requires_grad=True)
x = torch.tensor([-2., 1.], requires_grad=True)
b = torch.tensor([2.], requires_grad=True)
f = None #YOUR CODE HERE

# Compute gradients.
f.backward()

# Print out the gradients.
print(w.grad)
print(x.grad)      
print(b.grad) 

tensor([-0.3932,  0.1966])
tensor([ 0.5898,  0.9831])
tensor([ 0.1966])


In [18]:
f

tensor([[ 0.7311]])
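
One possible way to fill in `f` so that the numbers above are reproduced is sketched below. This is an assumption about the intended approach (matrix form via `torch.mm`), not the reference solution. The intermediate name `s` is made up for this example.

```python
import torch

w = torch.tensor([3., 5.], requires_grad=True)
x = torch.tensor([-2., 1.], requires_grad=True)
b = torch.tensor([2.], requires_grad=True)

# Matrix form of w . x + b: (1, 2) x (2, 1) -> (1, 1)
s = torch.mm(w.view(1, 2), x.view(2, 1)) + b
f = 1 / (1 + torch.exp(-s))

# f holds a single element, so backward() needs no argument.
f.backward()

print(f)
print(w.grad)
print(x.grad)
print(b.grad)
```

Here $\mathbf{w} \cdot \mathbf{x} + b = 1$, so $f = 1/(1 + e^{-1}) \approx 0.7311$ and $\partial f / \partial s = f (1 - f) \approx 0.1966$. The gradients with respect to $\mathbf{w}$ and $\mathbf{x}$ are this factor times $\mathbf{x}$ and $\mathbf{w}$ respectively.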

In [19]:
print(w.grad)
print(x.grad)      
print(b.grad) 

tensor([-0.3932,  0.1966])
tensor([ 0.5898,  0.9831])
tensor([ 0.1966])


The gradient can be reset to zero in place with the *.zero_()* method:

In [20]:
w.grad.zero_()
print(w.grad)

tensor([ 0.,  0.])
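
Resetting matters because gradients are accumulated into *.grad* across calls to *.backward()*. The sketch below illustrates this with a made-up function $\sum_i w_i^2$; the names `first`, `second` and `third` are illustrative only.

```python
import torch

w = torch.tensor([1., 2.], requires_grad=True)

(w * w).sum().backward()
first = w.grad.clone()    # gradient 2 * w

(w * w).sum().backward()  # a second call adds to the stored gradient
second = w.grad.clone()   # twice the value of `first`

w.grad.zero_()            # reset before the next backward pass
(w * w).sum().backward()
third = w.grad.clone()    # back to 2 * w

print(first, second, third)
```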
