# Intro to Tensorflow

In our industry, the lines between what constitutes Machine Learning, Deep Learning, AI, etc. are becoming blurry. This in part due to widely misunderstood theory behind the respective techniques and technologies, and certainly there's an element of tech branding behind the labels. But despite the fantastical images painted of this tech, it is widely accessible-- not just to developers, but to researchers as well. 

While machine learning has shown great value in consumer-facing fields, deep learning is making a case for itself in laboratory settings in providing insights into data with which traditional machine learning approaches struggle.

In the Python universe, there are several well maintained, widely adopted deep learning frameworks. *Tensorflow* is arguably the most popular, guaging by GitHub stars and forks. Originally developed by the Google Brain team, it was open-sourced in 2015 and quickly rose to prominence in the deep learning community.

## Dataflow
In TensorFlow, the user builds a *graph* that specifies the flow of computation. At a lower level, TF determines the dependencies between the operations and efficiently run them accordingly when executed. This is not unlike *dataflow* models in parallel and distributed computing. This framework has the benefit of being able to parallelize operations, and even distribute your model across devices and servers.

In [11]:
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    with tf.Session() as sess:
        x = tf.constant(5, name='x', dtype=tf.float32)
        w=  tf.constant(1, name='w', dtype=tf.float32)
        b = tf.constant(1, name='b', dtype=tf.float32)
        z = w*x + b
        sigmoid = 1 / (1 + tf.exp(-z))
        tanh = tf.tanh(z)

Above, there are six nodes defined. These variables don't contain values, but rather associates the calculations therein with a computation graph. As the method implies, the first three objects `x`, `w`, and `b` are nodes to output constants. `name=` and `dtype=` are optional *source operations*, which set the attributes of the nodes that can be used to set the object's characteristics which can be used for later reference.

Looking at the latter three objects, they depend on several of the preceding objects. The activation functions `sigmoid` and `tanh` cannot be evaluated without `z`, thus these functions have a *direct dependency* on `z`. `z` cannot be evaulated without the defined constants, on which `z` is also directly dependent, and `sigmoid` and `tanh` are *indirectly dependent*. When the operations are evaluated within the session, the computation graph used will reflect this order.


Despite the above code, there has not been any calculations done. This is evident when trying to print a variable. What *has* been done is the instantiation of `Operation` objects (ops, for short). The output in printing these ops are `Tensor` objects, which are handles that make reference to the output that *will be* returned once the graph is executed. 


In [12]:
print(sigmoid)
print(tanh)

Tensor("truediv:0", shape=(), dtype=float32)
Tensor("Tanh:0", shape=(), dtype=float32)


## Creating a Graph
Behind the scenes, TensoFlow is associating the flow of computations to a `Graph` object. Once we import tensorflow, a default graph is formed to which subsequent node definitions are associated. When nodes are defined, (e.g. `x = tf.constant(5, name='x', dtype=tf.float32)`), the variable `x` represents a node within the default graph. Any number of graphs can be instantiated and executed.

In [13]:
print(tf.get_default_graph())

g2 = tf.Graph()

with g2.as_default():
    print(tf.get_default_graph())

<tensorflow.python.framework.ops.Graph object at 0x0000021B17F7FE10>
<tensorflow.python.framework.ops.Graph object at 0x0000021B3AFFCF28>


In building this graph, each node is represented as a *tensor*, or ndarray, instead of a value. To evaluate any of these expresions, it must be executed in a *session*.


Building a project in Tensorflow can largely be described as two phases: The construction phase and the execution phase:
* The **construction phase** involves building the computation graph required for training the given model (above). 
* The **execution phase** involves training the model iteratively to optimize the model's parameters.

In the below `with` clause: we are running a Tensorflow session. This "session" contains the user defined variables, and appropriately designates resources to evaluate the graph. 

In [14]:
with g.as_default():
    with tf.Session() as sess:
        sigmoid_val, tanh_val = sess.run([sigmoid, tanh])
        print("sigmoid: {:.5f}".format(sigmoid_val))        
        print("tanh: {:.5f}".format(tanh_val))

sigmoid: 0.99753
tanh: 0.99999


## Creating a Session

This is done in the *execution phase* of a basic tensorflow project. In calling `tf.Session()`, Tensorflow appropriately designates compute resources within the environment's CPU, GPU, etc, to execute the computations defined in the graph. Rather than instantiating a session variable and closing it after execution, we can run it in a `with` block.

Within the open session, we can run a given node and print its output. The variables executed within the `run()` method are called *fetches*.

While the `with` block convention is common practice in scripting and app development, in exploration and testing environments (i.e. notebooks, IDE's, etc), `InteractiveSession` can be launched in lieu of `Session` to install itself ass the default session, which can avoid needing to constantly refer to the session object(s). In this context, the `.eval()` method can be called on a node for execution.

In [24]:
sess = tf.InteractiveSession()

x = tf.Variable(5.)
y = tf.Variable(2.)
z = x**2 + 2*y + 5

init = tf.global_variables_initializer()
sess.run(init)
print(f"z= {z.eval()}")

sess.close()

z= 34.0


## Variables

While Tensor objects like `z` and `sigmoid` could be potentially filled with different values between different executions, `Variable` objects maintain their state within the graph. Rather than empty the tensor between computations, a `Variable`'s state persists between executions. So if a variable $a$ is fit to a linear model with a value of say 2.0, *the value of $a$ will be 2.0 the next time the graph is executed*. This is important for iterating on variables between executions when optimizing a model. 


A variable object requires two steps in a program: 
1. calling the `Variable()` function 
2. initializing its value(s).

In [70]:
tf.reset_default_graph()
g = tf.Graph()

def var_init():
    with g.as_default():
        # Define the variable
        x = tf.Variable(5, name="x")
        # Create variable initializer
        init = tf.global_variables_initializer()
        with tf.Session() as sess:
            # Initialize variable
            sess.run(init)
            print("Session 1 x-init:",sess.run(x))
            print("name:",x.name, "\n")

            x = x + 5
            print("Session 1 +5:",sess.run(x))
            print("name:",x.name, '\n')

        with tf.Session() as sess2:
            sess2.run(init)
            print("Session 2 Init:",sess2.run(x))
            print("name:",x.name, '\n')        

var_init()
print("Re-run Function...\n")
var_init()

Session 1 x-init: 5
name: x:0 

Session 1 +5: 10
name: add:0 

Session 2 Init: 10
name: add:0 

Re-run Function...

Session 1 x-init: 5
name: x_1:0 

Session 1 +5: 10
name: add_1:0 

Session 2 Init: 10
name: add_1:0 



You can see that between different sessions, the value stored in the Variable `x` persists between different sessions and maintains a fixed state in the graph. Running the same initialization returns a different name for the variable, and a new variable is created in the graph every time. There is no teardown after the session ends.

## Placeholders
Utilization of placeholders (`np.zeros`, `np.ones`, etc) in Python in the context of scientific computing is commonplace, as it makes storage of values much more efficient on compute resources.

In [26]:
def fast_store(n):
    x = np.zeros(n)
    for i in range(n):
        x[i] = i
    return x

def slow_store(n):
    x = np.array([])
    for i in range(n):
        np.append(x, i)
    return x

%timeit -n 10 fast_store(10000)
%timeit -n 10 slow_store(10000)

895 µs ± 47.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
33.2 ms ± 699 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


TensorFlow's `placeholder` object functions similarly, functioning as an empty data structure to be filled with data at execution, with a key difference that defining its shape is optional. A common practice is to define the placeholder's shape as `(None, n_x)`, where number of features of the trained data $n_x$ is known and leaving the number of samples $m$ unspecified (as batch size can vary).

In [167]:
s = tf.placeholder(tf.float32, shape=(None, 15))
s

<tf.Tensor 'Placeholder_1:0' shape=(?, 15) dtype=float32>

In [168]:
X_init = np.zeros((5, 5))

with tf.Graph().as_default():
    X = tf.placeholder(tf.float32, shape=(5, 5))
    tens = X + 10
    with tf.Session() as sess:
        outs = sess.run(tens, feed_dict={X: X_init})

print("outs = {}".format(outs))

outs = [[10. 10. 10. 10. 10.]
 [10. 10. 10. 10. 10.]
 [10. 10. 10. 10. 10.]
 [10. 10. 10. 10. 10.]
 [10. 10. 10. 10. 10.]]


## Simple Graph Example 

For instance, take a simple linear model.
Below is an evaluation of the normal equation to fit a linear regression model on training data $X$ and target $y$,  we can find least squares through the normal equation 

$W = (X^TX)^{-1}X^Ty$

where $X$ is of shape ($m$, $n$). Within this computation graph, tensorflow parallelizes the calculations within the matrix multiplication on the user's GPU card (given you've installed the supporting Tensorflow GPU support packet).

In [299]:
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler

tf.reset_default_graph()

# Get the data
X, y = make_regression(n_features=10, n_informative=10, bias=1)
m, n = X.shape
b = np.ones((m, 1))

# Add bias term
X_b = np.c_[b, data]
X_b_scaled = StandardScaler().fit_transform(X)

# Define variables
X = tf.constant(X_b_scaled, dtype=tf.float32, name="X")
y = tf.constant(y.reshape(-1, 1), dtype=tf.float32, name='y')

# Build normal equation
X_transpose = tf.transpose(X)
W = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(X_transpose, X)), X_transpose), y)

with tf.Session() as sess:
    W_vals = W.eval()

print(W_vals)

[[ 33.389023]
 [ 81.66528 ]
 [112.97624 ]
 [ 15.458442]
 [ 20.357716]
 [ 58.611958]
 [ 18.960642]
 [ 97.52006 ]
 [ 63.445244]
 [ 86.58184 ]]


## Optimization
With the use of `Variable`s and `placeholder`s, we can effectively optimize the parameters of a learning model. To do so, we first define a cost function (see "ML_Cost_Functions.ipynb"). Using the example above, a basic appraoch would be to use an MSE cost, and iteratively calculate the gradient of the cost with respect to the parameters and updating `W` accordingly (see "Gradient_Descent_Theory.ipynb").

In [318]:
X, y = make_regression(n_features=10, n_informative=1, bias=1, random_state=1)

def lin_reg(X, y, eta=0.01, epochs=1000, optimizer=None):
    """
    Functiont to fit """
    tf.reset_default_graph()
    # Get the data
    m, n = X.shape
    b = np.ones((m, 1))

    # Add bias term
    X_b = np.c_[b, data]
    X_b_scaled = StandardScaler().fit_transform(X)

    # Define variables
    X = tf.constant(X_b_scaled, dtype=tf.float32, name="X")
    X_transpose = tf.transpose(X)
    y = tf.constant(y.reshape(-1, 1), dtype=tf.float32, name='y')

    # Create variable for weights
    W_init = tf.random.uniform([n, 1], -1, 1)
    W = tf.Variable(W_init, name='W')

    # Build cost functions
    y_pred = tf.matmul(X, W, name="predictions")
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name="mse")

    # Build optimizer
    if optimizer == None:
        gradients = 2/m * tf.matmul(X_transpose, error)
        optim_op = tf.assign(W, W - eta * gradients)
    else:
        optimizer = optimizer(learning_rate=eta)
        optim_op = optimizer.minimize(mse)

    # Initialize the variables
    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)

        for epoch in range(EPOCHS):
            if epoch % 100 == 0:
                print("Epoch {}:".format(epoch), "MSE =", mse.eval())
            sess.run(optim_op)

        W_best = W.eval()
    return W_best

lin_reg(X, y)

Epoch 0: MSE = 1352.946
Epoch 100: MSE = 50.47625
Epoch 200: MSE = 22.143654
Epoch 300: MSE = 20.871607
Epoch 400: MSE = 20.800337
Epoch 500: MSE = 20.795948
Epoch 600: MSE = 20.795664
Epoch 700: MSE = 20.795643
Epoch 800: MSE = 20.795643
Epoch 900: MSE = 20.795643


array([[-2.2154120e-05],
       [-1.9091724e-05],
       [ 1.1751610e-05],
       [ 2.0417933e-06],
       [-8.0235886e-06],
       [ 5.7001848e-06],
       [-2.3571036e-05],
       [ 4.2903166e-06],
       [-3.4671698e-06],
       [ 3.6011684e+01]], dtype=float32)

## Autodiff

Computing gradients of parameters like above is essential in optimizing machine learning models, but hard coding these derivatives can obviously become unwieldy for even moderately complex algorithms, especially in networks that require back-propagation across several layers. Rather than define your own optimization by assigning a new value to a variable, TensorFlow has built-in optimizers within its `train` module, which can be implemented by defining the optimizer, and passing your cost function to the method of interest (e.g. `.minimize()`). These features are built on TensorFlow's autodiff functionality which enables efficient computation of gradients across a variable, given an op. 

In [319]:
optimizer = tf.train.GradientDescentOptimizer
lin_reg(X, y, optimizer=optimizer)

Epoch 0: MSE = 1254.07
Epoch 100: MSE = 50.86708
Epoch 200: MSE = 22.216215
Epoch 300: MSE = 20.877363
Epoch 400: MSE = 20.800753
Epoch 500: MSE = 20.795975
Epoch 600: MSE = 20.795664
Epoch 700: MSE = 20.795645
Epoch 800: MSE = 20.795643
Epoch 900: MSE = 20.795643


array([[-2.1615038e-05],
       [-1.5646498e-05],
       [ 1.5039540e-05],
       [ 5.0544027e-06],
       [-7.9438887e-06],
       [ 4.9725713e-06],
       [-2.4574336e-05],
       [ 7.3482374e-06],
       [-2.9006189e-06],
       [ 3.6011684e+01]], dtype=float32)

In [313]:
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
X = np.random.randn(100, 3)
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: X})
    B_val_2 = B.eval(feed_dict={A: X})
    
print(B_val_1)
print(B_val_2)

[[3.5512772 4.3955736 6.1375   ]
 [4.526003  3.2090962 4.816981 ]
 [4.195693  5.1885405 5.3121157]
 [3.3237796 4.7952223 5.1048284]
 [7.1511574 5.1735024 6.221342 ]
 [5.460819  5.457445  4.812509 ]
 [4.450136  3.4602032 4.1982913]
 [4.405355  4.9720197 3.8809934]
 [5.1558924 5.887164  4.9223204]
 [4.2831454 5.5837054 5.9139395]
 [5.857666  5.780221  4.6501865]
 [5.146264  4.8054833 5.1663284]
 [4.012257  5.1810975 5.403373 ]
 [6.0992317 5.4476004 4.8229213]
 [3.094407  3.9399343 4.4695272]
 [6.34593   5.543147  6.5373464]
 [5.153954  3.8028445 5.5816817]
 [6.820577  3.2217987 5.6709   ]
 [5.8448586 2.9144728 3.1781154]
 [3.9795852 4.7052193 4.9806695]
 [6.0521607 5.2096705 4.5544376]
 [3.7426653 5.5440264 4.481957 ]
 [6.140338  5.2969484 6.296299 ]
 [4.346713  5.5494175 5.8985453]
 [5.583038  7.176785  4.091737 ]
 [4.8679514 4.850676  4.36546  ]
 [6.023919  4.8202796 6.7576275]
 [5.5712953 6.5120955 4.2843857]
 [5.4347973 5.4325767 3.7615004]
 [3.798301  5.9995546 3.6785645]
 [6.991279

## Name Scopes
As a TF graph grows and becomes more complex and nodes begin to pile up, the use of name scopes can group nodes as a subset to be referenced later, and can also help simplify visualizations in tensorboard where nodes can be rendered as one namescope cell.