# Basics of Constructing Deep Learning Models with TensorFlow/PyTorch

## TensorFlow and PyTorch

* Many deep learning libraries have been developed so far: TensorFlow, PyTorch, Caffe, MXNet, CNTK, ...
* According to recent reports, TensorFlow and PyTorch are the most popular libraries in deep learning research.
* TensorFlow is suited to building static-graph models, while PyTorch excels at dynamic (eager) modeling.
* Although TensorFlow 2.0 will make eager execution the default and PyTorch 1.0 will merge in Caffe2 to handle static models, for now we use each library in the mode it suits best.


## Tensors

Matrices are not enough in deep learning, because

* a single instance may already be a 2-D (or higher-dimensional) array:
    * a sentence: $ \left|\{\text{word-embedding dims}\}\right| \times \left|\{\text{words in the sentence}\}\right| $
    * a color image: $ \left|\{\text{R, G, B}\}\right| \times \left|\{\text{horizontal pixels}\}\right| \times \left|\{\text{vertical pixels}\}\right| $
* mini-batch training is common, and it stacks instances along one more axis.
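To make these shapes concrete, here is a small NumPy sketch. The sizes (a 300-d embedding, 12 words, a 32 × 32 image, a batch of 8) are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

# Arbitrary illustrative sizes (assumptions, not taken from any dataset).
emb_dim, n_words = 300, 12
height, width = 32, 32
batch_size = 8

# One sentence: (embedding dims) x (words); a single instance is already 2-D.
sentence = np.zeros((emb_dim, n_words), dtype=np.float32)

# One color image: (R, G, B) channels x horizontal pixels x vertical pixels.
image = np.zeros((3, width, height), dtype=np.float32)

# A mini-batch stacks instances along a new leading axis, adding one dimension.
sentence_batch = np.stack([sentence] * batch_size)
image_batch = np.stack([image] * batch_size)

print(sentence.shape, sentence_batch.shape)  # (300, 12) (8, 300, 12)
print(image.shape, image_batch.shape)        # (3, 32, 32) (8, 3, 32, 32)
```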

In [None]:
import numpy as np

import tensorflow as tf
import torch

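### Set environment variables for GPU usage

In [None]:
import os
# These must be set before TensorFlow/PyTorch first initialize CUDA.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # number GPUs the same way nvidia-smi does
os.environ["CUDA_VISIBLE_DEVICES"] = "3"        # e.g. "2,3" for multiple; visible GPUs are renumbered from 0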

### With NumPy

#### Creation
* `np.array` is the most common API for creating an N-dimensional (nd-) array.

* For other creation APIs, see [creation routines](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html) and [random sampling](https://docs.scipy.org/doc/numpy/reference/routines.random.html).

* Element-wise operations on nd-arrays "broadcast" when the operands have different but compatible shapes. See [this](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
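A minimal broadcasting sketch: shapes are compared from the rightmost axis, and an axis of size 1 (or a missing leading axis) is stretched to match the other operand. The arrays here are made up for illustration.

```python
import numpy as np

col = np.array([[0.], [10.], [20.]])   # shape (3, 1)
row = np.array([1., 2., 3., 4.])       # shape (4,)

# Aligned from the right: (3, 1) and (4,) -> (3, 4).
table = col + row
print(table.shape)   # (3, 4)
# rows: [1, 2, 3, 4], [11, 12, 13, 14], [21, 22, 23, 24]
print(table)

# A scalar broadcasts against any shape.
print((row * 2).tolist())   # [2.0, 4.0, 6.0, 8.0]

# Incompatible shapes raise an error: (3,) and (4,) cannot be aligned.
try:
    np.ones(3) + np.ones(4)
except ValueError as err:
    print("ValueError:", err)
```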

In [None]:
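def print_info(obj, name):
    """Print an array/tensor with its shape, number of dimensions, dtype and type."""
    print("{} =\n".format(name), obj,
          "\n shape =", obj.shape,
          "\n ndim =", len(obj.shape),   # len(shape) also works for torch tensors
          "\n dtype =", obj.dtype,
          "\n type =", type(obj), "\n")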

arr = np.array([0., 1., 2., 3.], dtype=np.float32)
print_info(arr, "arr")

vec = np.array(arr+1., ndmin=2).T    # .T means transpose
print_info(vec, "vec")

mat = np.vstack((arr**2, np.hstack((vec, vec-1., vec*2, vec/2))))
print_info(mat, "mat")

ten = np.stack((mat, mat))
print_info(ten, "ten")

#### Indexing, slicing and masking

* As with Python lists, `[]` is used to index elements of nd-arrays.

* The indices inside `[]` can be integers, slices, or array-likes; indexing with array-likes ("fancy" indexing) returns a copy of the data, while slicing returns a view.

* However, unlike nested lists, which need one `[]` per dimension, e.g. `ls[d0][d1]`, nd-arrays take all indices in a single `[]` separated by `,`, e.g. `arr[d0, d1]`.

* Boolean arrays, produced by element-wise comparisons (which broadcast), can be used as masks to select elements.

For more detail, see [this](https://docs.scipy.org/doc/numpy/user/basics.indexing.html).
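The copy/view distinction and boolean masking can be checked directly with `np.shares_memory`; the small array below is made up for illustration.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

s = a[:, 1:]          # basic slicing -> view on the same memory
f = a[:, [1, 2]]      # integer-array (fancy) indexing -> copy

s[0, 0] = 100         # writes through to `a`
f[1, 1] = -1          # changes only the copy

print(np.shares_memory(a, s), np.shares_memory(a, f))   # True False
print(a.tolist())     # [[0, 100, 2], [3, 4, 5]]

# Boolean masking: `a > 2` broadcasts the scalar 2 and yields a bool array.
mask = a > 2
print(a[mask].tolist())   # [100, 3, 4, 5]  (selected in row-major order)

# Masks also work on the left-hand side of an assignment.
a[a % 2 == 1] = 0
print(a.tolist())         # [[0, 100, 2], [0, 4, 0]]
```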



In [None]:
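print(mat[3, 1], "\n")                  # single element: row 3, column 1

print(mat[1:, -1:-3:-1], "\n")          # same as [slice(1, None), slice(-1, -3, -1)]

print(mat[np.arange(1, mat.shape[0]), np.array([3, 2, 1, 0])], "\n")   # elements (1, 3), (2, 2), (3, 1), (4, 0)

print(ten[..., 2], "\n")                # same as [Ellipsis, 2]

print(ten[ten > 1], "\n")               # boolean mask -> 1-D array of the selected elements

print(arr[:, np.newaxis])               # adds an axis: shape (4,) -> (4, 1)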

### With TensorFlow

#### Creation
Use the following low-level APIs to create tensor objects:

* `tf.Variable`
* `tf.constant`
* `tf.placeholder`
* `tf.SparseTensor`

To look up how these APIs are used in Jupyter:
1. Place the cursor right after the `(` following each name and press `Shift` + `Tab` to show the API's documentation.
2. Press `Shift` + `Tab` 2 to 4 times in quick succession to expand it further.
3. Press `Esc` once or twice to close the documentation.

In [None]:
# Not meant to be executed: place the cursor after each `(` and press Shift + Tab.
tf.Variable()
tf.random_uniform()
tf.zeros()
tf.placeholder()

For more detail, see this [guide](https://www.tensorflow.org/guide/tensors).

**Notice**: There is an error in the official guide above. To give a variable tensor a specific dtype, write `tf.Variable({value}, dtype=tf.{dtype})` rather than `tf.Variable({value}, tf.{dtype})`, because the second positional parameter of `tf.Variable` is `trainable`, not `dtype`.
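To see why the positional form goes wrong, here is a sketch with a hypothetical `make_variable` function (not TensorFlow code) whose parameter order loosely imitates TF 1.x's `tf.Variable`: `initial_value` first, `trainable` second, and `dtype` further back. The standard-library `inspect.signature` lists the parameter order, and the two calls show where a positional dtype actually lands.

```python
import inspect


# Hypothetical stand-in: the real tf.Variable has more parameters between
# `trainable` and `dtype`; only their relative order matters here.
def make_variable(initial_value=None, trainable=True, name=None, dtype=None):
    return {"initial_value": initial_value, "trainable": trainable,
            "name": name, "dtype": dtype}


sig = inspect.signature(make_variable)
print(list(sig.parameters))   # ['initial_value', 'trainable', 'name', 'dtype']

wrong = make_variable([1, 2], "float32")        # "float32" silently lands in `trainable`
right = make_variable([1, 2], dtype="float32")  # the keyword argument reaches `dtype`

print(wrong["trainable"], wrong["dtype"])   # float32 None
print(right["trainable"], right["dtype"])   # True float32
```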

**Exercise 1**. Create the following graph objects in TensorFlow:
* a _variable_ $ 60 \times 128 $ random matrix $w$ with name `weight`
* a _variable_ $128$-d zero vector $b$ with name `bias`
* a $ 16 \times 6 \times 10 $ tensor _placeholder_ $x$ with name `input` 

All the `dtype`s must be `tf.float32`.

Hints: 
1. There are many APIs to initialize variable tensor values, such as `tf.ones`, `tf.zeros`, `tf.fill` and `tf.random_*`. 
2. You can also use `numpy` to initialize variable or constant values.
3. The `name` arguments set the node names shown in the TensorFlow graph (e.g. in TensorBoard).

In [None]:
with tf.device('/cpu:0'):                     # place these ops on the CPU
    # your code here
    pass

**Exercise 2**. Construct a graph of $y = x'w + b$, where $x'$ is the placeholder $x$ reshaped into a $16 \times 60$ matrix.

Hints:
1. Use `x_` to denote $x'$, and `tf.reshape` to obtain it from `x`.
2. Use `tf.matmul` or `@` to perform the matrix multiplication.
3. TensorFlow broadcasts element-wise operations the same way NumPy does.
4. $y$ is a $16 \times 128$ matrix.
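A NumPy version of the same computation can serve as a reference for checking the shape (and, given the same inputs, the values) of your TensorFlow result. The random inputs and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only for reproducibility

x = rng.random((16, 6, 10), dtype=np.float32)   # batch of 16 instances, each 6 x 10
w = rng.random((60, 128), dtype=np.float32)     # weight matrix
b = np.zeros(128, dtype=np.float32)             # bias vector

x_ = x.reshape(16, 60)   # x': flatten each 6 x 10 instance into a 60-d row
y = x_ @ w + b           # (16, 60) @ (60, 128) -> (16, 128); b broadcasts over the rows

print(y.shape, y.dtype)  # (16, 128) float32
```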

In [None]:
with tf.device('/device:GPU:0'):                 # the first visible GPU
    # your code here
    pass

#### Graph execution

Use `tf.Session` to create a TensorFlow session that executes the operations in a graph.

A session holds resources and should be closed once execution has finished. Python's `with` statement closes it automatically.
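The `with` statement relies on Python's context-manager protocol. The sketch below uses a hypothetical `DemoSession` class (not TensorFlow's `tf.Session`) to show that the resource is closed on exit, including when an error is raised inside the block.

```python
class DemoSession:
    """Hypothetical stand-in for a resource such as tf.Session (not the real class)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    # `with` calls __enter__ on entry and binds its return value after `as` ...
    def __enter__(self):
        return self

    # ... and calls __exit__ on exit, even when the block raises an exception.
    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False   # do not swallow exceptions


# Roughly equivalent to: demo = DemoSession(); try: ... finally: demo.close()
with DemoSession() as demo:
    print("inside:", demo.closed)   # inside: False
print("after:", demo.closed)        # after: True

try:
    with DemoSession() as failing:
        raise RuntimeError("op failed")
except RuntimeError:
    pass
print("after error:", failing.closed)   # after error: True
```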

In [None]:

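with tf.Session() as sess:
    init = tf.global_variables_initializer()   # op that initializes all variables
    sess.run(init)

    x_val = np.random.rand(16, 6, 10).astype(np.float32)   # avoid shadowing the built-in `input`
    result = sess.run(y, feed_dict={x: x_val})             # feed the placeholder, fetch y
    print_info(result, "result")

    # Write the graph for TensorBoard (view it with `tensorboard --logdir ./runs`).
    writer = tf.summary.FileWriter('./runs', sess.graph)
    writer.close()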



### With PyTorch

* PyTorch offers a NumPy-like API that is
    * pythonic;
    * usable in either an object-oriented or a functional style.

* `torch.tensor` is the main function for creating a tensor from data.
* There are also many convenience functions for constructing particular kinds of tensors, such as `torch.zeros`, `torch.ones` and `torch.rand*`.

Now practice looking up API documentation as described above.

In [None]:
# Not meant to be executed: place the cursor after each `(` and press Shift + Tab.
torch.tensor()
torch.zeros()
torch.rand()

**Exercise 3**. Create the following tensor objects in PyTorch:
* a $ 60 \times 128 $ random matrix $w$
* a $128$-d zero vector $b$
* a $ 16 \times 6 \times 10 $ random tensor $x$ 

All the `dtype`s must be `torch.float32`.

In [None]:
# your code here


**Exercise 4**. Calculate $y = x'w + b$, where $x'$ is $x$ reshaped into a $16 \times 60$ matrix.

Hints:
1. Use the tensor method `Tensor.view` rather than `torch.reshape` to flatten $x$.
2. Use `torch.matmul` or `@` to perform the matrix multiplication.
3. Broadcasting works as well.
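Why prefer `view`? PyTorch's `Tensor.view` only ever returns a view and raises an error when the memory layout does not allow one (for example on a transposed tensor), whereas `torch.reshape`, like NumPy's `reshape`, quietly returns a copy in that case. The NumPy sketch below shows the two cases that `view` distinguishes; the array is made up for illustration.

```python
import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)   # C-contiguous
t = a.T                                             # transpose: same memory, not C-contiguous

flat_a = a.reshape(6)   # element order matches memory order -> view
flat_t = t.reshape(6)   # element order differs from memory order -> copy

print(a.flags["C_CONTIGUOUS"], t.flags["C_CONTIGUOUS"])         # True False
print(np.shares_memory(a, flat_a), np.shares_memory(a, flat_t))  # True False
print(flat_t.tolist())   # [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]
```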

In [None]:
# your code here


In [None]:
print_info(y, 'y')

if torch.cuda.is_available():
    device = torch.device('cuda:0')   # the first visible GPU
    z = y.to(device)                  # copy of y on the GPU; y itself stays on the CPU
    print(z)


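A CPU tensor in PyTorch can be converted to a NumPy array with `Tensor.numpy()`, and a NumPy array to a tensor with `torch.from_numpy`; in both directions the result shares memory with the original.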

In [None]:
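y_np = y.numpy()                   # CPU tensor -> NumPy array (shares memory with y)
print_info(y_np, 'y_np')

y_back = torch.from_numpy(y_np)    # NumPy array -> tensor (shares memory with y_np)
print_info(y_back, 'y_back')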