# TensorFlow
Using TensorFlow, you define your own computation graph, and TensorFlow then takes that graph and runs it efficiently using optimized C++ code. A graph can be broken up into several sections so that it can be trained using distributed computing across hundreds of servers.

In [1]:
import tensorflow as tf
print(tf.__version__)

1.10.0


In [2]:
import platform 
platform.python_version()

'3.5.6'

In [3]:

# This code does not actually perform any computation
# It just creates a computation graph.
X = tf.Variable(3, name='X')
y = tf.Variable(4, name='y')
f = X*X*y + y +2


In [4]:
f

<tf.Tensor 'add_1:0' shape=() dtype=int32>

In [5]:
#  Session initializes the variables and 
#  evaluates f.
with tf.Session() as sess:
    X.initializer.run()
    y.initializer.run()
    result = f.eval()
    
result

42

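Inside the with block, the session is set as the default session. Calling X.initializer.run() is equivalent to calling tf.get_default_session().run(X.initializer), and f.eval() is likewise equivalent to calling tf.get_default_session().run(f).

This makes the code easier to read, and the session is automatically closed at the end of the block.

Instead of manually running the initializer for every single variable, you can use global_variables_initializer(). It does not initialize anything immediately; it creates a node in the graph that initializes all the variables when it is run.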

In [6]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run() #initialize all the variables
    result = f.eval()

A TensorFlow program is typically split into two parts: the construction phase and the execution phase. The construction phase builds the computation graph representing the ML model and the computations required to train it.

The execution phase generally runs a loop that evaluates a training step repeatedly (e.g. one step per mini-batch), gradually improving the model parameters.
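
The split between the two phases can be illustrated without TensorFlow at all. The following is a toy sketch in plain Python, not how TensorFlow is implemented: the Node class, its evaluate() method and the _as_node() helper are hypothetical names invented for this illustration. Building the expression only records the operations; no arithmetic happens until evaluate() is called.

```python
import operator


class Node:
    """Toy graph node (hypothetical, for illustration only)."""

    def __init__(self, op=None, inputs=(), value=None):
        self.op = op          # function applied at execution time (None for a leaf)
        self.inputs = inputs  # upstream nodes this node depends on
        self.value = value    # stored value, used only by leaf nodes

    def __add__(self, other):
        return Node(operator.add, (self, _as_node(other)))

    def __mul__(self, other):
        return Node(operator.mul, (self, _as_node(other)))

    def evaluate(self):
        # Execution phase: evaluate the dependencies first, then this node.
        if self.op is None:
            return self.value
        return self.op(*(node.evaluate() for node in self.inputs))


def _as_node(x):
    # Wrap plain Python numbers in a leaf node.
    return x if isinstance(x, Node) else Node(value=x)


# Construction phase: this only builds the graph.
X = Node(value=3)
y = Node(value=4)
f = X * X * y + y + 2

# Execution phase: the graph is evaluated on demand.
result = f.evaluate()
print(result)  # 42
```

Here f is just a description of the computation, much like the tf.Tensor printed in cell 4; the number 42 only appears once the graph is executed.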

## Managing Graphs
Any node you create is automatically added to the default graph.


In [7]:
x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

True

Sometimes you may want to manage multiple independent graphs. You can do this by creating a new Graph and temporarily making it the default graph inside a with block.

In [8]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)
    
x2.graph is graph

True

In [9]:
x2.graph is tf.get_default_graph()

False

# Lifecycle of a Node Value
When you evaluate a node, TensorFlow automatically determines the set of nodes that it depends on, and it evaluates these nodes first.

In [10]:
w = tf.constant(3)
X = w + 2
y = X + 5
z = X * 3

with tf.Session() as sess:
    print(y.eval()) 
    print(z.eval()) 

10
15


In the code above, when evaluating y and z, TensorFlow first finds that y depends on X, which depends on w, so it evaluates w, then X, then y. When z is then evaluated in the same session, it does not reuse the earlier results: it repeats the process and re-evaluates w and X before z.

If you want to evaluate y and z efficiently, without evaluating w and X twice, you must ask TensorFlow to evaluate both y and z in just one graph run.

In [11]:
with tf.Session() as sess:
    y_val,z_val = sess.run([y,z])
    print(y_val)
    print(z_val)

10
15
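
The cost of evaluating y and z separately can be made visible with a small counting experiment. This is a pure-Python sketch under the assumption that intermediate node values are discarded between runs; the graph dictionary and the run() helper are hypothetical stand-ins, not TensorFlow APIs.

```python
# Each entry maps a node name to (function, list of dependency names).
graph = {
    "w": (lambda: 3, []),
    "x": (lambda w: w + 2, ["w"]),
    "y": (lambda x: x + 5, ["x"]),
    "z": (lambda x: x * 3, ["x"]),
}


def run(fetches):
    """Evaluate the requested nodes in one pass, caching shared dependencies."""
    cache, counts = {}, {}

    def evaluate(name):
        if name not in cache:
            fn, deps = graph[name]
            cache[name] = fn(*(evaluate(dep) for dep in deps))
            counts[name] = counts.get(name, 0) + 1
        return cache[name]

    return [evaluate(name) for name in fetches], counts


# Two separate runs: the cache is thrown away in between, so x is computed twice.
(y_val,), counts_y = run(["y"])
(z_val,), counts_z = run(["z"])
x_evals_separate = counts_y["x"] + counts_z["x"]

# One combined run: x is computed once and shared by y and z.
(y_val2, z_val2), counts_both = run(["y", "z"])
x_evals_combined = counts_both["x"]

print(y_val, z_val, x_evals_separate)    # 10 15 2
print(y_val2, z_val2, x_evals_combined)  # 10 15 1
```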


Every TensorFlow session has its own state and does not share it with any other session. In distributed TensorFlow, variable state is stored on the servers, not in the sessions, so multiple sessions can share the same variables.

# Linear Regression with TensorFlow


In [12]:
import numpy as np
from sklearn.datasets import fetch_california_housing

In [13]:
housing = fetch_california_housing()
m, n = housing.data.shape

# adds bias to all training instances 
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]



# create two tensorflow nodes to hold this data and targets
x = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="x")
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name="y")


# Define theta, the weights of the linear regression,
# computed in closed form with the Normal Equation
XT = tf.transpose(x)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, x)), XT), y)


# Start a session to evaluate theta
with tf.Session() as sess:
    theta_value = theta.eval()
    
theta_value # weights

array([[-3.7465141e+01],
       [ 4.3573415e-01],
       [ 9.3382923e-03],
       [-1.0662201e-01],
       [ 6.4410698e-01],
       [-4.2513184e-06],
       [-3.7732250e-03],
       [-4.2664889e-01],
       [-4.4051403e-01]], dtype=float32)
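
The same Normal Equation, theta = (X^T X)^-1 X^T y, can be cross-checked in plain NumPy. The sketch below uses synthetic data generated in the code (not the housing data), so the variable names and the chosen true_theta are assumptions made purely for illustration. It also compares the explicit inverse with np.linalg.lstsq, which solves the same least-squares problem without forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 200, 3

# Synthetic features and a known weight vector (bias term first).
features = rng.normal(size=(m, n))
true_theta = np.array([[4.0], [3.0], [-2.0], [0.5]])

# Add the bias column of ones, then build slightly noisy targets.
X_b = np.c_[np.ones((m, 1)), features]
y = X_b @ true_theta + 0.01 * rng.normal(size=(m, 1))

# Normal Equation with an explicit matrix inverse.
theta_inv = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Same problem solved with a least-squares routine.
theta_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(theta_inv.ravel())
print(theta_lstsq.ravel())
```

Both results should land very close to true_theta. The least-squares routine is generally the more numerically robust choice when X^T X is close to singular.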

# Gradient Descent
* random_uniform() creates a node in the graph that will generate a tensor containing random values, given its shape and value range.

* assign() creates a node that will assign a new value to a variable.

* The main loop executes the training step n_epochs times, and every 100 iterations it prints out the current Mean Squared Error (MSE).

In [21]:
tf.reset_default_graph() 
tf.set_random_seed(42)

n_epochs = 1000
learning_rate = 0.01

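# Scale the inputs 
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]
# Caution: standardizing after the bias column has been added also centers
# that column of ones, turning it into zeros, so the model is left without
# an intercept. Scaling housing.data first and prepending the ones afterwards
# avoids this.
scaled_housing_data_plus_bias = scaler.fit_transform(housing_data_plus_bias)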


# Create two constant tensors for inputs. 
# like multidimensional arrays
X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name="y")

# Creates a Variable
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")

# predictions are input data X multiplied by the weights
y_pred = tf.matmul(X, theta, name="predictions")

#Calculate errors
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

# Gradient of the MSE cost function with respect to theta.
gradients = 2/m * tf.matmul(tf.transpose(X), error)

# Training step: assign the updated weights to theta.
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init) # initialize the variables
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch:", epoch, "MSE =", mse.eval())
        sess.run(training_op) # automatically evaluates all the dependencies
    best_theta = theta.eval() # call eval() to fetch the trained weights

Epoch: 0 MSE = 7.589863
Epoch: 100 MSE = 4.874135
Epoch: 200 MSE = 4.81821
Epoch: 300 MSE = 4.8120546
Epoch: 400 MSE = 4.80947
Epoch: 500 MSE = 4.807726
Epoch: 600 MSE = 4.8064837
Epoch: 700 MSE = 4.805584
Epoch: 800 MSE = 4.8049355
Epoch: 900 MSE = 4.8044643
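
The update rule used above, theta := theta - learning_rate * (2/m) X^T (X theta - y), can be sketched in plain NumPy as well. The data here is synthetic and every name is an assumption made for the example. Note that the features are standardized first and the column of ones is prepended afterwards, because standardizing a constant column would turn it into zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 500, 2

# Two synthetic features on very different scales, plus noisy targets.
raw = rng.normal(loc=10.0, scale=[5.0, 0.1], size=(m, n))
y = 2.0 + raw @ np.array([[0.3], [-4.0]]) + 0.1 * rng.normal(size=(m, 1))

# Standardize the features, then prepend the bias column of ones.
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)
X = np.c_[np.ones((m, 1)), scaled]

theta = rng.uniform(-1.0, 1.0, size=(n + 1, 1))
learning_rate, n_epochs = 0.01, 1000

for epoch in range(n_epochs):
    error = X @ theta - y
    gradients = 2 / m * X.T @ error            # gradient of the MSE w.r.t. theta
    theta = theta - learning_rate * gradients  # Batch Gradient Descent step

mse = float(np.mean((X @ theta - y) ** 2))

# Closed-form least-squares solution for comparison.
theta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
print(mse)
print(theta.ravel(), theta_exact.ravel())
```

With standardized inputs the iterates should end up essentially on top of the closed-form solution after the 1000 epochs.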


## Autodiff 
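There are efficient and inefficient ways of determining the partial derivatives, also known as gradients. The hand-derived formula used above is not always the most efficient way to compute them, and deriving it by hand becomes tedious and error-prone for more complex models. TensorFlow's autodiff feature can compute the gradients for you automatically and efficiently. To use it, replace the gradients = ... line in the code above with: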

gradients = tf.gradients(mse, [theta])[0]

The gradients() function takes an op (in this case mse) and a list of variables (in this case just theta), and it creates a list of ops, one per variable, to compute the gradients of the op with regard to each variable. So the gradients node will compute the gradient vector of the MSE with regard to theta.
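
To give a feel for what autodiff does, here is a minimal reverse-mode sketch for scalar values in pure Python. It is a toy under stated assumptions: the Value class and its backward() method are hypothetical names, only + and * are supported, and TensorFlow's own implementation is far more elaborate. Each operation records its inputs and local derivatives, and backward() then applies the chain rule from the output back to the inputs.

```python
class Value:
    """Scalar that records how it was computed (hypothetical toy class)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # Values this one was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Build a topological order (parents before children), then walk it
        # in reverse so a node's gradient is complete before it is propagated.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad


# Same function as at the start of the notebook: f = x*x*y + y + 2
x = Value(3.0)
y = Value(4.0)
f = x * x * y + y + 2

f.backward()
print(f.data, x.grad, y.grad)  # 42.0 24.0 10.0
```

The results match the hand-derived partial derivatives: df/dx = 2xy = 24 and df/dy = x^2 + 1 = 10 at x = 3, y = 4.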

### Using an Optimizer 
TensorFlow also provides a number of optimizers out of the box, e.g. a Gradient Descent optimizer:

In [25]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

## Feeding Data to the Training Algorithm
To implement Mini-batch Gradient Descent, we need to replace X and y at every iteration with the next mini-batch. Placeholder nodes help here: they don't perform any computation, they just output the data you tell them to output at runtime. They are typically used to pass the training data to TensorFlow at runtime. Specifying None for a dimension of a placeholder's shape means "any size".

In [30]:
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5

with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A:[[4,5,6],[7,8,9]]})
    
print(B_val_1)
print(B_val_2)

[[6. 7. 8.]]
[[ 9. 10. 11.]
 [12. 13. 14.]]
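
The mini-batch idea described above can be sketched in plain NumPy. The helper below is hypothetical (its name, its signature and the synthetic data are assumptions made for this example). It draws a random batch from arrays already held in memory rather than loading anything from disk, and it samples with replacement for simplicity.

```python
import numpy as np


def fetch_batch_in_memory(X_all, y_all, epoch, batch_index, n_batches, batch_size):
    """Return one random mini-batch from in-memory arrays (hypothetical helper)."""
    # Seed from the epoch and batch index so that each batch is reproducible.
    rng = np.random.default_rng(epoch * n_batches + batch_index)
    indices = rng.integers(0, len(X_all), size=batch_size)
    return X_all[indices], y_all[indices]


# Synthetic stand-in for a dataset with m instances and n features plus a bias column.
m, n = 1050, 8
X_all = np.random.default_rng(0).normal(size=(m, n + 1))
y_all = np.random.default_rng(1).normal(size=(m, 1))

batch_size = 100
n_batches = int(np.ceil(m / batch_size))  # number of batches per epoch

X_batch, y_batch = fetch_batch_in_memory(
    X_all, y_all, epoch=0, batch_index=0, n_batches=n_batches, batch_size=batch_size
)
print(n_batches, X_batch.shape, y_batch.shape)  # 11 (100, 9) (100, 1)
```

In the TensorFlow setting, arrays like X_batch and y_batch are what would be passed to the placeholders through feed_dict at each training step.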


In [None]:
X = tf.placeholder(tf.float32, shape=(None, n+1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

batch_size = 100
# m is the number of training instances
n_batches = int(np.ceil(m / batch_size)) 

def fetch_batch(epoch, batch_index, batch_size):
    [...] #load the data from disk
    return X_batch, y_batch
with tf.session