# Eager Execution

TensorFlow's eager execution evaluates operations immediately, without an extra graph-building step. Operations return concrete values instead of constructing a computational graph to run later. This makes it easy to get started with TensorFlow, debug models, and reduce boilerplate code, and it's ~~fun~~ amazing!

# Setup and basic usage

In [1]:
from __future__ import absolute_import, division, print_function

import tensorflow as tf

tf.enable_eager_execution()

In [2]:
tf.executing_eagerly()

x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))
# tf.Tensor objects reference concrete values instead of
# symbolic handles to nodes in a computational graph. COOL
# Since there isn't a computational graph to build and run later
# in a session, it's easy to inspect results using print() or a debugger.

# TensorFlow math operations convert Python objects
# and NumPy arrays to tf.Tensor objects. The tf.Tensor.numpy
# method returns the object's value as a NumPy ndarray. THANK GOD

hello, [[4.]]


In [3]:
a = tf.constant([[1, 2],
                 [3, 4]])
print(a)

# Broadcasting support
b = tf.add(a, 1) #a+1
print(b)

# Operator overloading is supported
print(a * b)     #a.*b    NEAT!!
print("-------")

# Use NumPy values
import numpy as np
c = np.multiply(a, b)
print(c)

# Obtain numpy value from a tensor:
print(a.numpy())

tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)
tf.Tensor(
[[2 3]
 [4 5]], shape=(2, 2), dtype=int32)
tf.Tensor(
[[ 2  6]
 [12 20]], shape=(2, 2), dtype=int32)
-------
[[ 2  6]
 [12 20]]
[[1 2]
 [3 4]]
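The comments #a+1 and #a.*b above use Octave notation. As a quick cross-check outside TensorFlow, here is a minimal NumPy-only sketch of the same operations, plus the true matrix product for contrast: * is elementwise (like a.*b), while matrix multiplication needs the @ operator or np.matmul (the NumPy counterpart of the tf.matmul call in the previous cell).

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

b = a + 1                   # broadcasting: the scalar 1 is added to every element
elementwise = a * b         # like Octave a.*b
matrix_product = a @ b      # like Octave a*b; same as np.matmul(a, b)

print(b)
print(elementwise)
print(matrix_product)
```

The elementwise result matches the tf.Tensor printed above for a * b, while the matrix product is a different array entirely.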


# Eager training

## Automatic differentiation

In [4]:
import tensorflow.contrib.eager as tfe

w = tfe.Variable([[1.0]])
with tfe.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, [w])
print(grad)  # Getting the hang of it: you can print the details of any
             # tf object
    
# During eager execution, use tfe.GradientTape to trace 
# operations for computing gradients later.

[<tf.Tensor: id=31, shape=(1, 1), dtype=float32, numpy=array([[2.]], dtype=float32)>]
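Since loss = w * w, the tape's answer [[2.]] is just d(w^2)/dw = 2w evaluated at w = 1. A central finite difference is a cheap, framework-independent way to sanity-check a gradient like this. The plain-Python sketch below does that; numerical_derivative is a hypothetical helper, not a TensorFlow function.

```python
def numerical_derivative(f, x, h=1e-5):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def loss(w):
    return w * w  # same function the tape recorded above

analytic = 2 * 1.0                         # d(w^2)/dw = 2w at w = 1
numeric = numerical_derivative(loss, 1.0)  # should agree to many decimal places
print(analytic, numeric)
```

The two values agree up to floating-point rounding, which is the kind of check worth running whenever a hand-written model's gradients look suspicious.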


In [5]:
# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 1000
training_inputs = tf.random_normal([NUM_EXAMPLES])  # 1000 samples from a standard normal (mean 0, stddev 1)
noise = tf.random_normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

def prediction(input, weight, bias):
  return input * weight + bias

# A loss function using mean-squared error
def loss(weights, biases):
  error = prediction(training_inputs, weights, biases) - training_outputs
  return tf.reduce_mean(tf.square(error)) 
    #handles matrices, very cool

# Return the derivative of loss with respect to weight and bias
def grad(weights, biases):
  with tfe.GradientTape() as tape:
    loss_value = loss(weights, biases) 
  return tape.gradient(loss_value, [weights, biases]) 
    # tape.gradient(loss_value, [weights, biases])
    # returns the derivative of the loss with respect to each variable! Neat.
    # Note to self: separate biases from weights

train_steps = 200
learning_rate = 0.01
# Start with arbitrary values for W and B on the same batch of data
W = tfe.Variable(5.)   # weight, theta(2) in Octave
B = tfe.Variable(10.)  # bias, theta(1) in Octave
# tfe.Variable is a standard (mutable) variable, in this case holding a float

print("Initial loss: {:.3f}".format(loss(W, B)))

for i in range(train_steps):
  dW, dB = grad(W, B)
  W.assign_sub(dW * learning_rate) # W -=('gradient' * alpha)
  B.assign_sub(dB * learning_rate)
  # basically gradient descent for a simple linear regression model
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss(W, B)))

print("Final loss: {:.3f}".format(loss(W, B)))
print("W = {}, B = {}".format(W.numpy(), B.numpy()))

# Basically, in this example:
# with tfe.GradientTape() as tape: records the operations,
# then tape.gradient(loss_value, [weights, biases])
# takes: cost function, [theta(2:end), theta(1)]  (in Octave terms)
# returns: a list of derivatives, one signed float tensor per variable

# *Can also be used for:
# backpropagation,
# derivatives with respect to things,
# partial derivatives,
# *And can be:
# overloaded

# Beautiful. (& complicated!)

Initial loss: 69.396
Loss at step 000: 66.681
Loss at step 020: 30.206
Loss at step 040: 13.978
Loss at step 060: 6.758
Loss at step 080: 3.545
Loss at step 100: 2.115
Loss at step 120: 1.479
Loss at step 140: 1.196
Loss at step 160: 1.070
Loss at step 180: 1.014
Final loss: 0.990
W = 3.0184741020202637, B = 2.1261708736419678
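The tape saves us from deriving the gradients by hand, but for this model they are short enough to write out. With error e_i = W * x_i + B - y_i and loss L = mean(e_i^2), the partial derivatives are dL/dW = mean(2 * e_i * x_i) and dL/dB = mean(2 * e_i). The NumPy sketch below runs the same 200-step, learning-rate-0.01 loop with these hand-derived gradients. It draws its own seeded random data (an assumption for reproducibility), so its numbers will not match the run above exactly, but W and B should likewise move toward 3 and 2.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded so the run is reproducible
NUM_EXAMPLES = 1000
x = rng.standard_normal(NUM_EXAMPLES)
noise = rng.standard_normal(NUM_EXAMPLES)
y = 3 * x + 2 + noise               # toy data around 3 * x + 2

def loss(W, B):
    error = x * W + B - y
    return np.mean(error ** 2)      # mean-squared error

def grad(W, B):
    error = x * W + B - y
    dW = np.mean(2 * error * x)     # dL/dW
    dB = np.mean(2 * error)         # dL/dB
    return dW, dB

W, B = 5.0, 10.0                    # same arbitrary starting point as above
learning_rate = 0.01
print("Initial loss: {:.3f}".format(loss(W, B)))
for i in range(200):
    dW, dB = grad(W, B)
    W -= learning_rate * dW         # plain gradient-descent step
    B -= learning_rate * dB
print("Final loss: {:.3f}".format(loss(W, B)))
print("W = {}, B = {}".format(W, B))
```

B converges more slowly than W here simply because it starts much farther from its target (10 vs 2), and every step shrinks the remaining distance by roughly the same fraction.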


## Build and train models

Eager execution encourages the use of the Keras-style layer classes in the "tf.keras.layers" module. Additionally, the "tf.train.Optimizer" classes provide sophisticated techniques to calculate parameter updates.

In [6]:
model = tf.keras.Sequential([
  tf.keras.layers.Dense(10, input_shape=(784,)),  # must declare input shape
  tf.keras.layers.Dense(10)
]) 
# tf includes Keras (tf.keras)! You can also derive your own classes from Keras ones, e.g. tf.keras.layers.Layer

# And of course they updated the tutorial while I was in the middle of it. That confused me for a good 20 minutes.

## Build a model
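While you can use any Python object to represent a layer, TensorFlow has tf.keras.layers.Layer as a convenient base class; inherit from it to implement your own layer. The cell below does not subclass Layer, though: it rebuilds the same Sequential model from built-in Dense layers.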

In [7]:
model = tf.keras.Sequential([
  tf.keras.layers.Dense(10, input_shape=(784,)),  # must declare input shape
  tf.keras.layers.Dense(10)
])
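A Dense(units) layer holds a kernel of shape (input_dim, units) and a bias of shape (units,), so this model has 784 * 10 + 10 = 7,850 parameters in its first layer and 10 * 10 + 10 = 110 in its second. The sketch below does that arithmetic with dense_param_count, a hypothetical plain-Python helper (not a Keras API). In Keras itself, model.summary() reports the per-layer counts.

```python
def dense_param_count(input_dim, units, use_bias=True):
    """Parameters in a fully connected layer: kernel (input_dim x units) plus bias (units)."""
    return input_dim * units + (units if use_bias else 0)

layer_sizes = [784, 10, 10]  # input width, then the units of each Dense layer above
per_layer = [dense_param_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```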

## Train a model

In [8]:
# Create a tensor representing a blank image
batch = tf.zeros([1, 1, 784])
print(batch.shape)  # => (1, 1, 784)

result = model(batch)
# => tf.Tensor([[[ 0.  0., ..., 0.]]], shape=(1, 1, 10), dtype=float32)

(1, 1, 784)
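The (1, 1, 10) output shape in the comment follows from how a dense layer is applied: the kernel multiplies only the last axis, so a (1, 1, 784) batch becomes (1, 1, 10) and the leading axes pass through unchanged. Both layers use no activation, and Keras Dense layers initialize their biases to zeros by default, so a blank (all-zero) input produces all-zero outputs. The NumPy sketch below mimics the two layers with randomly drawn kernels; it does not reproduce the initializer Keras actually uses, which does not matter for a zero input.

```python
import numpy as np

rng = np.random.default_rng(0)
kernel1, bias1 = rng.standard_normal((784, 10)), np.zeros(10)  # Dense(10, input_shape=(784,))
kernel2, bias2 = rng.standard_normal((10, 10)), np.zeros(10)   # Dense(10)

batch = np.zeros((1, 1, 784))  # the "blank image" batch from the cell above

# np.matmul contracts the last axis of the batch with the first axis of the kernel,
# so the leading (1, 1) axes are carried through unchanged.
hidden = np.matmul(batch, kernel1) + bias1
result = np.matmul(hidden, kernel2) + bias2
print(result.shape)
```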


In [15]:
import dataset  # dataset.py (the MNIST helper) must be downloaded next to this notebook first
dataset_train = dataset.train('./datasets').shuffle(60000).repeat(4).batch(32)

To train a model, define a loss function to optimize and then calculate gradients. Use an optimizer to update the variables:

In [16]:
def loss(model, x, y):
  prediction = model(x)
  return tf.losses.sparse_softmax_cross_entropy(labels=y, logits=prediction)
    # loss function; tf.losses is a library of loss functions,
    # and sparse_softmax_cross_entropy is one of them

def grad(model, inputs, targets):
  with tfe.GradientTape() as tape:  # same tape as in the earlier cells
    loss_value = loss(model, inputs, targets)
  return tape.gradient(loss_value, model.variables)
    # a Keras model has .variables, which passes nicely into tape.gradient

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
# the gradient-descent optimizer is apparently an object we create and reuse

# In the TensorFlow version used here, iterating the Dataset directly raised
# "TypeError: 'BatchDataset' object is not iterable", so wrap it in tfe.Iterator.
x, y = next(tfe.Iterator(dataset_train))
print("Initial loss: {:.3f}".format(loss(model, x, y)))

# Training loop
for (i, (x, y)) in enumerate(tfe.Iterator(dataset_train)):
  # Calculate derivatives of the loss with respect to the model's parameters.
  grads = grad(model, x, y)
  # Apply the gradients to the model
  optimizer.apply_gradients(zip(grads, model.variables),
                            global_step=tf.train.get_or_create_global_step())
    # kinda gross, but the optimizer object gets (gradient, variable) pairs
    # and a global step tensor, and does the gradient-descent update
    # still better than implementing backpropagation myself! Hah (even if it might be fun...)

  if i % 200 == 0:
    print("Loss at step {:04d}: {:.3f}".format(i, loss(model, x, y)))

print("Final loss: {:.3f}".format(loss(model, x, y)))
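tf.losses.sparse_softmax_cross_entropy takes integer class labels (the "sparse" part) and unnormalized logits, applies a softmax, and, with its default settings, averages the negative log-probability of the correct class over the batch. The NumPy sketch below is a plain reimplementation for intuition only: the function name is hypothetical, and it ignores the weights and reduction options of the TensorFlow function. One useful consequence: if the model outputs identical logits for all 10 digit classes, the loss is ln 10, about 2.303, a handy reference point for the initial loss of an untrained 10-class classifier.

```python
import numpy as np

def sparse_softmax_cross_entropy(labels, logits):
    """Mean of -log(softmax(logits)[label]) over the batch; labels are integer class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(labels)), labels]    # log-probability of each true class
    return -picked.mean()

uniform_logits = np.zeros((32, 10))  # a batch of 32 where all 10 classes are equally likely
labels = np.random.default_rng(0).integers(0, 10, size=32)
print(sparse_softmax_cross_entropy(labels, uniform_logits))
```

Subtracting the row maximum before exponentiating leaves the softmax unchanged but keeps np.exp from overflowing on large logits.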

### Variables and optimizers
tfe.Variable objects store mutable tf.Tensor values accessed during training to make automatic differentiation easier. 

### Use objects for state during eager execution

With graph execution, program state (such as the variables) is stored in global collections and their lifetime is managed by the tf.Session object. In contrast, during eager execution the lifetime of state objects is determined by the lifetime of their corresponding Python object.
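In practice this is ordinary Python scoping: state lives as long as something references the object that owns it. The plain-Python sketch below illustrates that rule with ModelState, a hypothetical stand-in class (not a TensorFlow type), and a weak reference. Under CPython's reference counting, the object is reclaimed as soon as its last strong reference is deleted, with no session or global collection keeping it alive.

```python
import weakref

class ModelState:
    """Hypothetical stand-in for an object that owns training state (e.g. variables)."""
    def __init__(self):
        self.weights = [5.0, 10.0]

state = ModelState()
tracker = weakref.ref(state)  # observes the object without keeping it alive
print(tracker() is state)     # the object is still referenced

del state                     # drop the only strong reference
print(tracker() is None)      # on CPython the state is gone together with the object
```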