# TensorFlow Introduction

As motivation, let's say that we have to minimize the following cost function:
$$  J(w)  = w^2 -10 w + 25 $$
Notice that this function is the expanded form of $(w-5)^2$, so the minimum is attained at $w=5$. We will use TensorFlow to recover this result numerically.
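
As a quick check by hand, setting the derivative of the cost to zero gives the same answer:
$$ \frac{dJ}{dw} = 2w - 10 = 0 \;\Rightarrow\; w = 5, \qquad J(5) = 25 - 50 + 25 = 0 $$
Since the coefficient of $w^2$ is positive, this stationary point is a minimum.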

In [8]:
# importing the libraries
import numpy as np
import tensorflow as tf

Next we will define the parameter $w$. We use `tf.Variable()` to signify that this is a variable, initialized to 0, whose type `dtype` is a floating point number, `tf.float32`.

In [13]:
w = tf.Variable(0, dtype = tf.float32)
print(w)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=0.0>


Next, we define the optimization algorithm, in this case Adam with a learning rate of 0.1, and then the cost function.

In [11]:
optimizer = tf.keras.optimizers.Adam(0.1)  # alpha = 0.1

# cost = w**2 - 10*w + 25

We only need to implement forward propagation, i.e. we only need to write the code that computes the value of the cost function. TensorFlow can figure out how to do backpropagation, that is, the gradient computation.

- One way to do that is to use `GradientTape`. The name is an analogy to an old-school cassette tape: it records the sequence of operations as we compute the cost function in the forward pass. When the tape is played backward, it revisits those operations in reverse order and computes the gradients along the way.

- We also need to define a training step function, `train_step()`, to loop over.
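
To make the tape analogy concrete, here is a toy sketch in plain Python. It is not how TensorFlow implements `GradientTape`; the class `ToyTape` and its methods `record` and `gradient` are hypothetical names made up for this illustration. Each operation is recorded together with its local derivatives, and the gradient is obtained by walking the recorded list in reverse and applying the chain rule.

```python
# Toy illustration of the "tape" idea (hypothetical, not TensorFlow's implementation).
class ToyTape:
    def __init__(self):
        # Each entry: (output name, {input name: d(output)/d(input)})
        self.ops = []

    def record(self, out, local_grads):
        self.ops.append((out, local_grads))

    def gradient(self, target, source):
        grads = {target: 1.0}  # d(target)/d(target) = 1
        # "Play the tape backward": visit operations in reverse order.
        for out, local_grads in reversed(self.ops):
            upstream = grads.get(out, 0.0)
            for inp, d in local_grads.items():
                # Chain rule: accumulate upstream * local derivative.
                grads[inp] = grads.get(inp, 0.0) + upstream * d
        return grads.get(source, 0.0)


w = 3.0
tape = ToyTape()

# Forward pass for cost = w**2 - 10*w + 25, recorded step by step.
a = w ** 2
tape.record("a", {"w": 2 * w})
b = -10 * w
tape.record("b", {"w": -10.0})
cost = a + b + 25
tape.record("cost", {"a": 1.0, "b": 1.0})

# Backward pass: should match the analytic derivative 2*w - 10.
dcost_dw = tape.gradient("cost", "w")
print(cost, dcost_dw)
```

At $w = 3$ the analytic derivative is $2 \cdot 3 - 10 = -4$, which is what the backward walk over the recorded operations produces.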

In [17]:
def train_step():
    with tf.GradientTape() as tape:
        cost = w**2 - 10*w + 25
        
    trainable_variables = [w]
    grads = tape.gradient(cost, trainable_variables)
    optimizer.apply_gradients(zip(grads, trainable_variables))

- To carry out one iteration of training, we first define the trainable variables, `trainable_variables`. Here it is just a list containing only $w$.
- Then we compute the gradients, `grads`, with `tape.gradient()`.
- Finally, we use the optimizer to apply the gradients with `apply_gradients()`. The `zip` function takes the list of gradients and the list of trainable variables and pairs up their corresponding elements.
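
As a small illustration of the pairing that `zip` performs, the sketch below uses plain Python values in place of tensors. The second variable name `"b"` and both gradient values are made up for the example; in the notebook there is only $w$.

```python
# Plain-Python stand-ins for gradients and variables (illustrative values only).
grads = [-10.0, 0.5]
trainable_variables = ["w", "b"]  # "b" is a hypothetical second variable

# zip pairs the i-th gradient with the i-th variable.
pairs = list(zip(grads, trainable_variables))
print(pairs)
```

Each resulting tuple has the form `(gradient, variable)`, which is the order `apply_gradients()` is given in `train_step()` above.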

Now, let's run one step of our algorithm and print the new value of $w$.

In [18]:
train_step()
print(w)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=0.09999997>


We see that the value of $w$ has changed from 0 to approximately 0.1. Let's run 1000 more iterations of the algorithm.

In [19]:
for i in range(1000):
    train_step()
    
print(w)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=5.000001>
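
The two printed values (about 0.1 after one step, about 5 after 1000 more) can be reproduced with a hand-written version of the textbook Adam update in plain Python. This is a sketch under assumptions, not TensorFlow's exact implementation: the function name `adam_minimize` is hypothetical, the hyperparameters `beta1=0.9`, `beta2=0.999` and `eps=1e-7` are assumed defaults, and the gradient $2w - 10$ is written by hand instead of being computed by a tape.

```python
import math


def adam_minimize(grad_fn, w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7):
    """Textbook Adam on a single scalar parameter; returns w after each step."""
    m, v = 0.0, 0.0  # running averages of the gradient and squared gradient
    history = []
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(w)
    return history


def grad_J(w):
    # Derivative of J(w) = w**2 - 10*w + 25, written by hand.
    return 2 * w - 10


# One step plus 1000 more, mirroring the two notebook cells above.
history = adam_minimize(grad_J, w=0.0, steps=1001)
w_after_first_step = history[0]
w_final = history[-1]
print(w_after_first_step, w_final)
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is essentially the sign of the gradient, so $w$ moves by almost exactly the learning rate, 0.1, regardless of how large the gradient is.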


We see that $w$ is now approximately 5, which is the value of $w$ that minimizes the cost function. We only specified the cost function, and TensorFlow found the minimizer for us. In other words, it has minimized the cost function for us.

## With respect to Neural Network

In the above example, $w$ was the only quantity in the cost function. What if the function we want to minimize depends not just on $w$ but also on the training data? Let us say that we have some data, $x$ and $y$, and we are training a neural network whose cost function depends on that data. How do we get the training data into a TensorFlow program?

Let's use the same $w$ and the same Adam optimizer.

In [20]:
w = tf.Variable(0, dtype = tf.float32)
x = np.array([1.0, -10.0, 25.0], dtype =np.float32)
optimizer = tf.keras.optimizers.Adam(0.1) 

In the cell above, the numbers in the array `x` play the role of the coefficients of the cost function.

Now, let's define the cost function. It is the same as before, except that the coefficients are now read from `x`:

In [23]:
def cost_fn():
    return x[0]* w**2 + x[1]*w + x[2]

optimizer.minimize(cost_fn, [w])
print(w)

# With this we ran the algorithm once (one-step)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=0.09999997>


The call `optimizer.minimize(cost_fn, [w])` is a shorter way of doing the same thing as the three lines of code we wrote above with `GradientTape`: it evaluates the cost, computes the gradients, and applies them.

In [28]:
w = tf.Variable(0, dtype = tf.float32)
x = np.array([1.0, -10.0, 25.0], dtype =np.float32)
optimizer = tf.keras.optimizers.Adam(0.1) 

def training(x, w, optimizer):
    def cost_fn():
        return x[0]* (w**2) + x[1]*w + x[2]
    for i in range(1000):
        optimizer.minimize(cost_fn, [w])
    return w

w = training(x,w, optimizer)
print(w)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=5.000001>


Thus, $w$ converges to the minimizer $w = 5$, as before.
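
Because the cost is a quadratic whose coefficients come from `x`, the result can be cross-checked against the closed-form minimizer $w^* = -x_1 / (2 x_0)$. The helper `quadratic_minimizer` below is a hypothetical name introduced only for this check, and it uses a plain Python list in place of the NumPy array.

```python
def quadratic_minimizer(x):
    """Minimizer of x[0]*w**2 + x[1]*w + x[2], assuming x[0] > 0."""
    a, b, _ = x
    if a <= 0:
        raise ValueError("the quadratic has no minimum unless x[0] > 0")
    return -b / (2 * a)


x = [1.0, -10.0, 25.0]  # same coefficients as in the notebook
w_star = quadratic_minimizer(x)
print(w_star)
```

Changing the entries of `x` changes the minimizer, which is the sense in which the cost now depends on the data as well as on $w$.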