# Lab 1: TensorFlow Basics

## Section 0: Setup

In [None]:
# Import essential libraries
import numpy as np
import tensorflow as tf

## Section 1: Tensors
If you want to review any of this material, look over https://docs.scipy.org/doc/numpy-1.15.0/user/quickstart.html and https://www.tensorflow.org/guide/tensors.

### 1.1: Tensor values, shapes, rank, and axes
Make tensor values by hand (e.g. `x = np.array([[1, 2, 3], [4, 5, 6]])`) of the following shapes:
 * a: (2, 2)
 * b: (3)
 * c: (3, 1)
 * d: (1, 3)
 * e: ()
 * f: (1)
 * g: (2, 2, 2)
 * h: (2, 3, 1, 2)
 
 For each, put its tensor rank and total number of elements in a comment.
 Yes, this is pretty boring, but it's also short and it's really important to understand what tensors of different shapes look like and how shapes, rank, and axes interact.
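
As a quick way to check your answers, NumPy arrays report their shape, rank, and element count directly. The sketch below uses a shape that is not on the list above; the name `example` is purely illustrative.

```python
import numpy as np

# A hypothetical example array with shape (2, 1, 3); not one of a-h.
example = np.array([[[1, 2, 3]],
                    [[4, 5, 6]]])

print(example.shape)  # (2, 1, 3)
print(example.ndim)   # rank: 3
print(example.size)   # total number of elements: 2 * 1 * 3 = 6
```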

In [None]:
# Your code here

### 1.2: Slices and reductions
Use slicing or `tf.reduce_mean`, `tf.reduce_sum`, and `tf.reduce_any` on the tensors defined below to print:
 * The (1-2-3)-st element of `a`
 * The first column of `b`
 * The shape-(2, 3, 2) tensor obtained by selecting the second and third elements of the third axis of `a`
 * The sum of all values in `b`
 * The 2-vector containing means of each row of `b` 
 * The (1, 3) tensor containing, for each column in `c`, whether that column contains any `True` values
 
Each statement should take the form 
```
tf.print(something[...])
```
or 
```
tf.print(tf.reduce_something(...))
```
Follow each with a comment stating the shape of the output.
For a rank-2 tensor, the first index specifies row and the second specifies column.
Make sure to pay attention to the `axis` and `keepdims` arguments of the `reduce` functions.
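
To see how `axis` and `keepdims` change the output shape, here is a minimal sketch on a made-up tensor `m` (an assumption for illustration, unrelated to the `a`, `b`, and `c` used in this problem).

```python
import tensorflow as tf

# A hypothetical tensor, unrelated to the a, b, c used in the exercise.
m = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])  # shape (2, 3)

col_sums = tf.reduce_sum(m, axis=0)                      # collapses the row axis
col_sums_kept = tf.reduce_sum(m, axis=0, keepdims=True)  # keeps it as a size-1 axis
last_row = m[-1, :]                                      # integer indexing drops that axis

tf.print(col_sums)       # shape (3,)
tf.print(col_sums_kept)  # shape (1, 3)
tf.print(last_row)       # shape (3,)
```

Reducing over `axis=0` sums down each column; with `keepdims=True` the reduced axis stays in the shape with length 1.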
 
 
For this problem, I'll set up the name scope, but for all future problems you'll need to do that.

In [None]:
a = tf.constant(np.ones((2, 3, 4))) # Tensor of ones with shape (2, 3, 4)
b = tf.constant([[1., 2.], 
                 [3., 4.]]) # Tensor of the matrix [1 2; 3 4] with shape (2, 2)
c = tf.constant([[True, True, False],
                 [False, True, False]]) # Binary tensor with shape (2, 3)

In [None]:
with tf.name_scope('slices_and_reductions'):
    # Your code here

### 1.3: Transposition and reshaping
Use `tf.transpose` to print:
 * `b` with its rows and columns swapped
 * `a` with its second and third axes swapped; comment its shape
 
Use `tf.reshape` to print:
 * The values of `b` in a tensor with shape (1, 4)
 * The values of `b` in a tensor with shape (4, 1)
 
Do this all inside the name scope "transposition_and_reshaping".
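
Transposing and reshaping can produce the same shape with different contents. The sketch below illustrates this on hypothetical tensors `t` and `u` (not the tensors from this problem), and shows how the `perm` argument reorders axes.

```python
import tensorflow as tf

# A hypothetical tensor with shape (2, 3), unrelated to the exercise tensors.
t = tf.constant([[1, 2, 3],
                 [4, 5, 6]])

transposed = tf.transpose(t)      # shape (3, 2): rows become columns
reshaped = tf.reshape(t, (3, 2))  # shape (3, 2): same row-major order, regrouped

tf.print(transposed)  # rows: [1 4], [2 5], [3 6]
tf.print(reshaped)    # rows: [1 2], [3 4], [5 6]

# For higher ranks, `perm` names the source axis for each output axis.
u = tf.zeros((2, 3, 4))
print(tf.transpose(u, perm=[2, 1, 0]).shape)  # (4, 3, 2)
```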

In [None]:
# Your code here

## Section 2: Computing with Operations and Graphs 

### 2.1: The dot product (as a sum of scalar products)
Write a function `dot_sum()` that takes in two rank-1 tensors `a` and `b` of equal shape and returns a tensor that holds their dot product, $$\text{result} = a \cdot b = \sum_{i = 1}^{\dim{a}} a_i \cdot b_i $$

The computation should first multiply the elements in $a$ and $b$ into a vector $a \odot b$ (the [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)) of $a$ and $b$), then sum across the vector to produce a scalar. 
Your implementation should be _vectorized_: it should not explicitly use the shape of an input tensor or do any looping.
The tensor output by your function must be rank-0.
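
As a small worked example (values chosen only for illustration), take $a = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$ and $b = \begin{bmatrix} 4 & 5 & 6 \end{bmatrix}$:
$$
\begin{align}
a \odot b &= \begin{bmatrix} 1 \cdot 4 & 2 \cdot 5 & 3 \cdot 6 \end{bmatrix} = \begin{bmatrix} 4 & 10 & 18 \end{bmatrix} \\
a \cdot b &= 4 + 10 + 18 = 32
\end{align}
$$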

The entire computation should use the name scope "dot_sum" and the tensor you return should have the name "result".

TensorFlow operations to look at:
 * `tf.multiply` (or equivalently, the binary operation *)
 * `tf.reduce_sum`

In [None]:
def dot_sum(a, b):
    '''
    Given rank-1 tensors a and b with equal shapes, return the dot product 
    of a and b as a rank-0 tensor computed via Hadamard product.
    '''
    # Your code here

### 2.2: The dot product (as matrix multiplication)
Write a function `dot_multiply()` that takes in two rank-1 tensors `a` and `b` of equal shape and returns a tensor that holds their dot product, $$\text{result} = a \cdot b = a^T b $$

The computation should use `tf.matmul` to perform the multiplication, which expects that your input tensors have rank of at least two (they should be matrices, not vectors).
Since your input vectors are rank-1, this means you'll need to use `tf.expand_dims` with `axis=-1` to add a "dummy dimension".
This is a subtle but important point: your vectors start with shape [n], but the product $a^T b$ is only defined when the operands are matrices with shapes [1, n] and [n, 1].
Depending on how you do it, you will probably get a rank-2 tensor with a shape like [1, 1].
You must return a rank-0 tensor, so use `tf.squeeze` to eliminate dummy dimensions.
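
The shape bookkeeping can be sketched as follows. The vector `v` and the constant passed to `tf.squeeze` are made-up values for illustration, and the product shown is deliberately an outer product rather than the dot product this problem asks for.

```python
import tensorflow as tf

v = tf.constant([1., 2., 3.])  # shape (3,), rank 1

as_column = tf.expand_dims(v, axis=-1)  # shape (3, 1)
as_row = tf.expand_dims(v, axis=0)      # shape (1, 3)

# A (3, 1) by (1, 3) product is an outer product, not a dot product.
outer = tf.matmul(as_column, as_row)    # shape (3, 3)

# tf.squeeze removes every size-1 axis, so a (1, 1) tensor becomes rank 0.
scalar = tf.squeeze(tf.constant([[5.]]))  # shape ()

print(as_column.shape, as_row.shape, outer.shape, scalar.shape)
```

The order of the operands matters: swapping which side is the row and which is the column changes the result from an [n, n] matrix to a [1, 1] matrix.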

The entire computation should use the name scope "dot_multiply" and the tensor you return should have the name "result".
This will not collide with the previous "result" tensor because of name scoping.
(If it did, it would be renamed to "result_1" in the graph.)

TensorFlow operations to look at:
 * `tf.matmul`
 * `tf.transpose`
 * `tf.expand_dims`
 * `tf.squeeze`

In [None]:
def dot_multiply(a, b):
    '''
    Given rank-1 tensors a and b with equal shapes, return the dot product 
    of a and b as a rank-0 tensor computed via matrix multiplication.
    '''
    # Your code here

### 2.3: A single ReLU unit
The "default" activation function for modern neural networks is the [rectified linear unit](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) (or "ReLU"):
$$ \text{relu}(x) = \max(0, x). $$

In a neural network using ReLU activation, a single unit with $n$ inputs has parameters $w$ (an $n$-vector of weights) and $b$ (a scalar).
It computes the function
$$ f(x; w, b) = \text{relu}(w \cdot x + b). $$

Using either `dot_sum` or `dot_multiply`, add these tensors and operations to the default graph:
$$
\begin{align}
&x: \space \text{input variable} \\
&w = \begin{bmatrix}2 & 0.5 & -1\end{bmatrix} \\
&b = 0.3 \\
&\text{state} = w \cdot x + b \\
&\text{activation} = \max(\text{state}, 0)
\end{align}
$$
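
As a sanity check on the arithmetic, here are two worked inputs that are not on the list you are asked to print (they are chosen only for illustration). For $x = \begin{bmatrix} 0 & 2 & 1 \end{bmatrix}$:
$$
\begin{align}
&\text{state} = 2 \cdot 0 + 0.5 \cdot 2 + (-1) \cdot 1 + 0.3 = 0.3 \\
&\text{activation} = \max(0.3, 0) = 0.3
\end{align}
$$
For $x = \begin{bmatrix} 0 & 0 & 2 \end{bmatrix}$:
$$
\begin{align}
&\text{state} = 2 \cdot 0 + 0.5 \cdot 0 + (-1) \cdot 2 + 0.3 = -1.7 \\
&\text{activation} = \max(-1.7, 0) = 0
\end{align}
$$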

`x` should have shape [3] and dtype `tf.float32`, and all tensors should be named and placed under the name scope "ReLU".
This includes the tensors created through your dot product function, but do not change your implementation to add to the name!

Then wrap all of this in a function called `relu` that takes one argument, used to initialize the `tf.Variable` `x`, and returns `activation`.

Then, print `relu` for:
 * $x = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}$
 * $x = \begin{bmatrix} -1 & 2 & 0 \end{bmatrix}$
 * $x = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$
 * $x = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}$

TensorFlow operations to look at:
 * `tf.constant`
 * `tf.Variable`
 * `tf.add`
 * `tf.maximum`

In [None]:
# Your graph code here

In [None]:
# Your print code here

#### Aside on activation functions

One way to derive feedforward neural networks is to begin by saying "I'd like to do a simple (linear) transformation on my input features to make them easier to model, then use a simple model (e.g. linear regression) that instead uses the transformed features."
Doing this means your total model is $y = ABx + b$ where $B$ is the matrix multiplying an input point $x$ into a new representation and $A$ is the matrix parameterizing the linear regression.

But $AB$ is just another matrix, so adding a representation has not made your model any more powerful. If you instead "twisted" the input space after applying $B$, the overall map would be nonlinear and the composite model would have greater representational power.
Activation functions perform this "twisting".
Deep neural networks come from the observation that it'd be easier to get a good representation (top layer) if it was based on a lower-level representation (early layers).
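
The collapse of $AB$ into a single matrix can be checked numerically. This is a minimal NumPy sketch with randomly generated stand-ins for $A$, $B$, and $x$ (the sizes and the seed are arbitrary assumptions), using ReLU as the "twist".

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 4))  # hypothetical regression weights
B = rng.normal(size=(4, 3))  # hypothetical representation map
x = rng.normal(size=3)

def relu(z):
    return np.maximum(z, 0)

def twisted(v):
    # Two-layer model with a ReLU between the linear maps.
    return A @ relu(B @ v)

# Without an activation, the two-layer model equals one linear map C = AB.
C = A @ B
print(np.allclose(A @ (B @ x), C @ x))  # True

# A linear map satisfies f(-x) == -f(x); the twisted model generally does not.
print(np.allclose(twisted(-x), -twisted(x)))
```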

Here's a great article explaining the geometric interpretation of activation functions: https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/.
The general idea is that neural networks can learn parameters that use the "twists" such that the entire network deforms space so that the manifold defined by your input data is simple. 

## Section 3: Optimization

### Minimizing a function with gradient descent
Minimize the scalar function $f(x) = (x-1)(2x-2)(x-3)(x-4)$, plotted below, using gradient descent.
It has a local minimum near $x = 1$ and a global minimum near $x = 3.5$.

![f(x)](./images/plot_f.png)

The steps to build the graph are:
 1. Use `tf.Variable` to create a variable named `x`, initialized with a value drawn by `np.random.uniform` from the range [-1, 5].
 2. Make a `tf.optimizers.SGD` named "optimizer" with a learning rate of 0.01.
 3. Make a function `optimize` that takes in an optimizer as an argument and represents each step of the training loop.
 4. Create a `tf.GradientTape` object and make a tensor `y` that represents $f(x)$ given a value of `x` under it.
 5. Get the gradients of `y` from the `tf.GradientTape` and apply them to the optimizer.
 
Remember steps 4 and 5 go inside the `optimize` function!
The whole subgraph for this problem should go under a name scope of "minimize_f", and operations to compute `y` should have an additional name scope of "compute_f". 
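
The tape-then-apply pattern in steps 4 and 5 can be sketched on a simpler objective. Everything here is an illustrative assumption: the stand-in function $g(x) = (x - 2)^2$, the fixed starting point, the learning rate of 0.1, and the helper name `step`. Name scopes are omitted.

```python
import tensorflow as tf

# Hypothetical stand-in objective g(x) = (x - 2)^2, not the f(x) from this problem.
x = tf.Variable(0.0, name="x")
optimizer = tf.optimizers.SGD(learning_rate=0.1)

def step(opt):
    # Record the forward computation so the tape can differentiate it.
    with tf.GradientTape() as tape:
        y = (x - 2.0) ** 2
    grads = tape.gradient(y, [x])
    # Each update moves x against the gradient, scaled by the learning rate.
    opt.apply_gradients(zip(grads, [x]))
    return y

for _ in range(100):
    step(optimizer)

print(x.numpy())  # close to 2.0, the minimizer of g
```

Because $g$ has a single minimum, this sketch always ends up in the same place; the function in this problem does not share that property.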

In a comment, rewrite the `optimize` function using the `minimize` function on the optimizer instead of getting the gradients and applying them. Is `tf.GradientTape` necessary? You do not need to worry about `tf.name_scope` for this.

Then, the steps to minimize the function once are:
 1. Print the initial values of `x` and `y`.
 2. Run `optimize` 1000 times.
 3. Print the final values of `x` and `y`.
 
Minimize the function a few times. If you did it right, you'll find that in each run the optimizer finds one of two minima. Running minimization a few times, you should see it find both eventually. What determines which minimum is found? Answer in the markdown box below.

In [None]:
# Your graph code here

In [None]:
# Your training loop here

Your answer here.