# Basics of Constructing Deep Learning Models with TensorFlow/PyTorch

## TensorFlow and PyTorch

* Many deep learning libraries have been developed so far: TensorFlow, PyTorch, Caffe, MXNet, CNTK...
* According to recent reports, TensorFlow and PyTorch are the most popular libraries in deep learning research.
* TensorFlow is suited to constructing static graph models, while PyTorch is good at dynamic (eager) modeling.
* Although TensorFlow 2.0 will make eager mode the default and PyTorch 1.0 will incorporate Caffe2 to handle static models, for now we use each library in the way that suits it best.


## Tensors

Matrices are not enough in deep learning, since

* one instance may be an array with two or more dimensions.
    * a sentence: $ \#\left|\{\text{dims of word embedding}\}\right| \times \#\left|\{\text{words of the sentence}\}\right| $
    * a color image: $ \#\left|\text{(R, G, B)}\right| \times \#\left|\{\text{horizontal pixels}\}\right| \times \#\left|\{\text{vertical pixels}\}\right| $
* mini-batch training is commonly used, which adds a batch dimension.
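To make the dimension counting above concrete, here is a minimal NumPy sketch. All sizes (`embed_dim`, `num_words`, `n_horizontal`, `n_vertical`, `batch_size`) are hypothetical values chosen only for illustration; the axis order follows the formulas above.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
embed_dim, num_words = 8, 5          # one sentence: embedding dims x words
n_horizontal, n_vertical = 32, 24    # one image: (R, G, B) x horizontal x vertical
batch_size = 16

sentence = np.zeros((embed_dim, num_words), dtype=np.float32)      # 2-d instance
image = np.zeros((3, n_horizontal, n_vertical), dtype=np.float32)  # 3-d instance

# A mini-batch stacks instances along a new leading axis,
# so each batch has one more dimension than a single instance.
sentence_batch = np.stack([sentence] * batch_size)
image_batch = np.stack([image] * batch_size)

print(sentence.ndim, sentence_batch.shape)   # 2 (16, 8, 5)
print(image.ndim, image_batch.shape)         # 3 (16, 3, 32, 24)
```

A batch of images is therefore already a 4-d array, which a matrix cannot represent.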

In [1]:
import numpy as np

import tensorflow as tf
import torch

### Set environment variables for GPU use

In [2]:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" # so the IDs match nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "0"       # e.g. "0,1,2" for multiple GPUs

### With NumPy

#### Creation
* `np.array` is the most common API for creating an N-dimensional (nd-) array.

* For other creation APIs, see [creation routines](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html) and [random sampling](https://docs.scipy.org/doc/numpy/reference/routines.random.html).

* Operations on nd-arrays "broadcast" when the operands have different shapes. See [the broadcasting guide](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
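As a small illustration of broadcasting, the sketch below adds arrays of different shapes. The names `row`, `col` and `grid` are made up for this example.

```python
import numpy as np

row = np.array([0., 1., 2., 3.], dtype=np.float32)        # shape (4,)
col = np.array([[10.], [20.], [30.]], dtype=np.float32)   # shape (3, 1)

# A scalar is broadcast to every element.
shifted = row + 1.0                                       # shape (4,)

# Shapes (3, 1) and (4,) are aligned from the right and stretched to (3, 4).
grid = col + row
print(grid.shape)                                         # (3, 4)

# Shapes that cannot be aligned raise a ValueError.
try:
    np.ones((3, 4)) + np.ones((4, 3))
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)                                           # True
```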

In [3]:
def print_info(obj, name):
    
    print("{} =\n".format(name), obj, "\n shape =", obj.shape, 
          "\n ndim =", obj.ndim if hasattr(obj, 'ndim') else obj.ndimension(), 
          "\n dtype =", obj.dtype, "\n type =", type(obj), "\n")

arr = np.array([0., 1., 2., 3.], dtype=np.float32)
print_info(arr, "arr")

vec = np.array(arr+1., ndmin=2).T    # .T means transpose
print_info(vec, "vec")

mat = np.vstack((arr**2, np.hstack((vec, vec-1., vec*2, vec/2))))
print_info(mat, "mat")

ten = np.stack((mat, mat))
print_info(ten, "ten")

arr =
 [0. 1. 2. 3.] 
 shape = (4,) 
 ndim = 1 
 dtype = float32 
 type = <class 'numpy.ndarray'> 

vec =
 [[1.]
 [2.]
 [3.]
 [4.]] 
 shape = (4, 1) 
 ndim = 2 
 dtype = float32 
 type = <class 'numpy.ndarray'> 

mat =
 [[0.  1.  4.  9. ]
 [1.  0.  2.  0.5]
 [2.  1.  4.  1. ]
 [3.  2.  6.  1.5]
 [4.  3.  8.  2. ]] 
 shape = (5, 4) 
 ndim = 2 
 dtype = float32 
 type = <class 'numpy.ndarray'> 

ten =
 [[[0.  1.  4.  9. ]
  [1.  0.  2.  0.5]
  [2.  1.  4.  1. ]
  [3.  2.  6.  1.5]
  [4.  3.  8.  2. ]]

 [[0.  1.  4.  9. ]
  [1.  0.  2.  0.5]
  [2.  1.  4.  1. ]
  [3.  2.  6.  1.5]
  [4.  3.  8.  2. ]]] 
 shape = (2, 5, 4) 
 ndim = 3 
 dtype = float32 
 type = <class 'numpy.ndarray'> 



#### Indexing, slicing and masking

* As with built-in Python lists, `[]` is used to index elements of nd-arrays.

* The indices inside `[]` can be array-like objects or slices; indexing with an array-like returns a copy of the data, while slicing returns a view.

* However, unlike nested lists, which are indexed with multiple `[]`, e.g. `ls[d0][d1]`, nd-arrays take comma-separated indices inside a single `[]`, e.g. `arr[d0, d1]`.

* Boolean masks can be built with comparison operations (thanks to broadcasting) and used for indexing.

For more detail, see [the indexing guide](https://docs.scipy.org/doc/numpy/user/basics.indexing.html).
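The difference between a view and a copy matters as soon as you write to the result. The following sketch uses a made-up array `base` to show it; `np.shares_memory` is used only to check whether two arrays overlap in memory.

```python
import numpy as np

base = np.arange(12, dtype=np.float32).reshape(3, 4)

view = base[1:, :2]        # slicing -> view on the same memory
copy = base[[1, 2], :2]    # array-like index -> independent copy

view[0, 0] = -1.0          # also changes base[1, 0]
copy[0, 1] = -2.0          # base[1, 1] is left untouched

print(np.shares_memory(base, view))   # True
print(np.shares_memory(base, copy))   # False
print(base[1, 0], base[1, 1])         # -1.0 5.0

# A comparison yields a boolean mask; indexing with it returns a 1-d copy.
mask = base > 8
print(base[mask])                     # [ 9. 10. 11.]
```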



In [4]:
print(mat[3, 1], "\n")

print(mat[1:, -1:-3:-1], "\n")     # same as [slice(1, None), slice(-1, -3, -1)]

print(mat[np.arange(1, mat.shape[0]), np.array([3, 2, 1, 0])], "\n")

print(ten[..., 2], "\n")           # same as [Ellipsis, 2]

print(ten[ten>1], "\n")

print(arr[:, np.newaxis])

2.0 

[[0.5 2. ]
 [1.  4. ]
 [1.5 6. ]
 [2.  8. ]] 

[0.5 4.  2.  4. ] 

[[4. 2. 4. 6. 8.]
 [4. 2. 4. 6. 8.]] 

[4.  9.  2.  2.  4.  3.  2.  6.  1.5 4.  3.  8.  2.  4.  9.  2.  2.  4.
 3.  2.  6.  1.5 4.  3.  8.  2. ] 

[[0.]
 [1.]
 [2.]
 [3.]]


### With TensorFlow

#### Creation
Use the low-level APIs to create a tensor object.

* `tf.Variable`
* `tf.constant`
* `tf.placeholder`
* `tf.SparseTensor`

To learn how to use these APIs:
1. Press `shift` + `tab` while the cursor is just after the `(` of each function name to see the API's documentation.
2. Try pressing `shift` + `tab` 2~4 times quickly.
3. Press `esc` 1~2 times to close the documentation.

In [None]:
tf.Variable()
tf.random_uniform()
tf.zeros()
tf.placeholder()

For more detail, see this [guide](https://www.tensorflow.org/guide/tensors).

**Notice**: There is a _bug_ in the above official guide. When initializing a variable tensor with a specific dtype, use `tf.Variable({value}, dtype=tf.{dtype})` rather than `tf.Variable({value}, tf.{dtype})`, because `dtype` is currently not the second positional argument of `tf.Variable`.

**Exercise 1**. Create the following graph objects in TensorFlow:
* a _variable_ $ 30 \times 48 $ random matrix $w$ with name `weight`
* a _variable_ $48$-d zero vector $b$ with name `bias`
* a $ 4 \times 5 \times 6 $ tensor _placeholder_ $x$ with name `input` 

All the `dtype`s must be `tf.float32`.

Hints: 
1. There are many APIs to initialize variable tensor values, such as `tf.ones`, `tf.zeros`, `tf.fill` and `tf.random_*`. 
2. You can also use `numpy` to initialize variable or constant values.
3. The names above refer to the node names in the TensorFlow graph.

In [5]:
with tf.device('/cpu:0'):                        # using CPU
    # your code:
    # wrap the random initial value in tf.Variable so that w is a variable, not a plain tensor
    w = tf.Variable(tf.random_uniform((30, 48), 0, 1, dtype=tf.float32), name='weight')
    b = tf.Variable(np.zeros(48), dtype=tf.float32, name='bias')           # same as tf.zeros
    x = tf.placeholder(tf.float32, shape=(4, 5, 6), name='input')


**Exercise 2**. Construct a graph of $y = x'w + b$, where $x'$ is a $4 \times 30$ matrix placeholder reshaped from $x$.

Hints:
1. Use `x_` to denote $x'$.
2. Use `tf.matmul` or `@` to perform the matrix multiplication.
3. Tensors broadcast in operations the same way as arrays do in NumPy.
4. $y$ is a $4 \times 48$ matrix.
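The shape bookkeeping behind these hints can be checked with a NumPy stand-in before building the graph. This is only a sketch of the arithmetic, not TensorFlow code: the arrays below are ordinary NumPy arrays that happen to use the same names as the exercise.

```python
import numpy as np

# NumPy stand-ins for the graph tensors (assumption: same shapes as in the exercise).
x = np.random.rand(4, 5, 6).astype(np.float32)
w = np.random.rand(30, 48).astype(np.float32)
b = np.zeros(48, dtype=np.float32)

# -1 lets the first dimension be inferred: 4 * 5 * 6 / 30 = 4.
x_ = x.reshape(-1, 30)
print(x_.shape)          # (4, 30)

# (4, 30) @ (30, 48) -> (4, 48); b of shape (48,) is broadcast over the 4 rows.
y = x_ @ w + b
print(y.shape)           # (4, 48)
```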

In [6]:
with tf.device('/device:GPU:0'):                        # using GPU
    # your code:
    x_ = tf.reshape(x, (-1, 30), name='input_reshaped')             # same as (4, 30)
    y = tf.matmul(x_, w) + b

#### Graph execution

Use `tf.Session` to create a TensorFlow session that executes operations in a graph.

A session should be closed when execution has finished. Python's `with` statement handles this automatically.

In [7]:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True                     # allocate GPU memory on demand; keep True in a shared GPU environment

with tf.Session(config=config) as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    
    x_val = np.random.rand(4, 5, 6)                        # avoid shadowing the built-in `input`
    result = sess.run(y, feed_dict={x: x_val})
    print_info(result, "result")
    writer = tf.summary.FileWriter('./runs', sess.graph)   # write the graph for TensorBoard
    writer.close()                                         # flush the event file to disk



result =
 [[ 8.358244   8.309652   7.779122   8.227763   7.0537233  7.1728964
   9.389893   8.024819   7.1485944  6.424975  10.394542   8.543082
   8.750342   7.904936   8.124608   8.31452    7.604998   8.994152
   6.3616753  9.653735   9.461673   7.726028   9.160962   8.698362
   9.423408   8.009501  10.327158   8.63269    7.6286883  8.833927
   7.2258496  9.192509   7.854035   8.343961   8.373005   9.335024
   7.9447107  9.919081   9.639628   9.591763   8.400371   7.546234
   8.400998   8.997999   6.342138   7.4642334  7.2081466  8.408829 ]
 [ 7.405282   9.07373    6.6454363  7.0295205  6.335593   7.3966513
   7.907217   6.678685   6.2604074  5.5376782  8.235144   7.117328
   6.977808   6.784752   8.556215   8.376909   5.840652   7.871629
   6.1440296  8.175112   7.8326406  7.369941   8.195439   8.494647
   7.430075   6.760482   9.397528   8.229395   6.331881   7.308034
   6.668237   7.621803   7.1842155  8.000627   6.117611   7.5221763
   7.376913   8.091796   8.390335   7.7559032  

### With PyTorch

* PyTorch provides a set of NumPy-like APIs
    * Pythonic
    * either an OOP or a functional style can be used.

* `torch.tensor` is the main API for creating a tensor.
* There are also many convenient APIs for constructing particular tensors, such as `torch.zeros`, `torch.ones` and `torch.rand*`.

Now, review how to look up the documentation of these APIs.
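A brief sketch of the creation APIs and the two programming styles mentioned above. The variable names are made up for this example.

```python
import torch

t = torch.tensor([[1., 2.], [3., 4.]])   # from Python data; dtype inferred as float32
zeros = torch.zeros(2, 2)                # all zeros
ones = torch.ones_like(t)                # ones with the same shape and dtype as t
rand = torch.rand(2, 2)                  # uniform samples from [0, 1)

# The same reduction written in functional style and in OOP (method) style.
total_fn = torch.sum(t)
total_m = t.sum()
print(total_fn.item(), total_m.item())   # 10.0 10.0
```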

In [None]:
torch.tensor()
torch.zeros()
torch.rand()

**Exercise 3**. Create the following tensor objects in PyTorch.
* a $ 30 \times 48 $ random matrix $w$
* a $48$-d zero vector $b$
* a $ 4 \times 5 \times 6 $ random tensor $x$ 

All the `dtype`s must be `torch.float32`.

In [8]:
# your code:
w = torch.rand((30, 48), dtype=torch.float32)
b = torch.zeros(48, dtype=torch.float32)
x = torch.rand((4, 5, 6), dtype=torch.float32)

**Exercise 4**. Calculate $y = x'w + b$, where $x'$ is a $4 \times 30$ matrix reshaped from $x$.

Hints:
1. Use `{tensor}.view` rather than `torch.reshape` to flatten.
2. Use `torch.matmul` or `@` to perform the matrix multiplication.
3. Broadcasting works as well.
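The hints do not say why `view` is preferred. One plausible reason (an assumption, not stated in this notebook) is that `view` never copies data and fails loudly when a copy would be needed, whereas `reshape` silently copies in that case. A minimal sketch with a made-up tensor:

```python
import torch

x = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# view returns a tensor that shares storage with x.
flat = x.view(-1)
flat[0] = 100.0
print(x[0, 0].item())          # 100.0

# A transposed tensor is not contiguous, so view cannot reinterpret it.
xt = x.t()
try:
    xt.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)             # True

# reshape falls back to copying when a view is impossible.
flat_t = xt.reshape(-1)
print(flat_t.tolist())         # [100.0, 3.0, 1.0, 4.0, 2.0, 5.0]
```

Calling `xt.contiguous().view(-1)` is another way to get the same result explicitly.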

In [9]:
# your code:
y = torch.matmul(x.view(-1, 30), w) + b

In [10]:
print_info(y, 'y')

if torch.cuda.is_available():
    device = torch.device('cuda:0')
    z = y.to(device)
    print_info(z, 'z')


y =
 tensor([[5.5667, 8.0265, 6.5486, 8.1449, 7.3204, 7.5531, 7.8389, 5.9027, 8.4445,
         7.1179, 7.3936, 7.7458, 7.1569, 9.1940, 8.8887, 7.7558, 7.4410, 8.7810,
         6.4504, 7.1810, 8.4517, 7.4026, 7.9868, 7.2360, 7.0586, 6.9077, 8.6155,
         7.2232, 6.8022, 5.6862, 6.2417, 7.9757, 7.4014, 8.2183, 6.4553, 8.3895,
         8.2536, 7.2091, 8.2178, 7.7101, 6.5175, 6.6017, 8.4134, 5.6454, 7.9216,
         7.5809, 6.9308, 7.9102],
        [4.7944, 7.6596, 5.3806, 7.3439, 6.3041, 7.7558, 7.2205, 6.5450, 6.8329,
         6.0705, 7.0618, 6.7989, 7.6537, 7.7555, 8.1527, 7.7323, 6.4669, 8.0622,
         6.6007, 8.1995, 7.4816, 6.5633, 7.8278, 7.4247, 5.1542, 6.0255, 7.4717,
         6.0527, 6.4665, 4.9215, 5.2722, 7.2250, 6.6226, 7.0848, 6.1495, 6.4953,
         7.9893, 6.5039, 6.4224, 8.0600, 6.8567, 5.8803, 7.0927, 5.9097, 6.2198,
         7.8803, 6.1650, 8.1800],
        [5.2311, 7.6031, 6.3994, 8.3263, 6.7625, 7.9609, 7.8390, 5.6355, 8.1225,
         6.4564, 7.5129, 8.0405, 7.5

Tensors in PyTorch can be easily converted to NumPy arrays, and vice versa.
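One detail worth knowing: for CPU tensors, `Tensor.numpy()` and `torch.from_numpy` do not copy the data, so the tensor and the array share memory. The sketch below uses made-up names to show this, and uses `clone()` when an independent copy is wanted. A tensor on the GPU would first have to be moved back with `.cpu()`.

```python
import numpy as np
import torch

t = torch.zeros(3, dtype=torch.float32)

a = t.numpy()               # shares memory with t
a[0] = 1.0                  # visible through t as well

back = torch.from_numpy(a)  # shares memory with a (and therefore with t)
back[1] = 2.0

independent = t.clone().numpy()   # clone first to get a separate buffer
independent[2] = 3.0              # does not affect t

print(t.tolist())           # [1.0, 2.0, 0.0]
print(a.tolist())           # [1.0, 2.0, 0.0]
print(independent.tolist()) # [1.0, 2.0, 3.0]
```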

In [11]:
y_np = y.numpy()
print_info(y_np, 'y_np')

y_np2t = torch.from_numpy(y_np)
print_info(y_np2t, 'y_np2t')

y_np =
 [[5.566656  8.02651   6.548632  8.144884  7.320438  7.5531383 7.8388743
  5.9026737 8.444513  7.1178865 7.393579  7.745831  7.156865  9.194041
  8.888714  7.755849  7.4410286 8.781012  6.4504075 7.1809797 8.45172
  7.402625  7.9868283 7.235982  7.058561  6.907666  8.615546  7.2231636
  6.8022313 5.6861963 6.2417436 7.9756975 7.4014196 8.218342  6.4552665
  8.389525  8.253632  7.2090964 8.217768  7.710062  6.5174675 6.6016555
  8.413447  5.645404  7.9215565 7.580886  6.9308214 7.9102364]
 [4.7944493 7.6595645 5.380621  7.3438683 6.304113  7.755822  7.220464
  6.5449853 6.8328843 6.070502  7.0617924 6.7989345 7.6537113 7.7555285
  8.152721  7.7322893 6.4669175 8.062236  6.6007056 8.1995325 7.4815793
  6.5632544 7.8278246 7.424659  5.15418   6.0254593 7.4716516 6.0527406
  6.4664617 4.9215183 5.272157  7.2250047 6.622629  7.084839  6.1494765
  6.495301  7.9892516 6.503865  6.4223833 8.059981  6.8567424 5.8802686
  7.0926504 5.909717  6.2197986 7.880308  6.1649575 8.180039 ]
 [5.23

## Release the GPU resource

Finally, click `File`->`Close and Halt` to shut down this notebook so that its Python process no longer occupies the GPU.