# Eager Execution

TensorFlow's eager execution evaluates operations immediately, without an extra graph-building step. Operations return concrete values instead of constructing a computational graph to run later. This makes it easy to get started with TensorFlow, debug models, and reduce boilerplate code, and it is ~~fun~~ amazing!

# Setup and basic usage

In [1]:
from __future__ import absolute_import, division, print_function

import tensorflow as tf

tf.enable_eager_execution()

In [2]:
tf.executing_eagerly()

x = [[2.]]
m = tf.matmul(x, x) 
print("hello, {}".format(m))
# tf.Tensor objects reference concrete values instead of 
# symbolic handles to nodes in a computational graph.           COOL
# Since there isn't a computational graph to build and run later 
# in a session, it's easy to inspect results using print() or a debugger.

# TensorFlow math operations convert Python objects 
# and NumPy arrays to tf.Tensor objects. The tf.Tensor.numpy    THANK GOD
# method returns the object's value as a NumPy ndarray.

hello, [[4.]]


In [3]:
a = tf.constant([[1, 2],
                 [3, 4]])
print(a)

# Broadcasting support
b = tf.add(a, 1) #a+1
print(b)

# Operator overloading is supported
print(a * b)     #a.*b    NEAT!!
print("-------")

# Use NumPy values
import numpy as np
c = np.multiply(a, b)
print(c)

# Obtain numpy value from a tensor:
print(a.numpy())

tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)
tf.Tensor(
[[2 3]
 [4 5]], shape=(2, 2), dtype=int32)
tf.Tensor(
[[ 2  6]
 [12 20]], shape=(2, 2), dtype=int32)
-------
[[ 2  6]
 [12 20]]
[[1 2]
 [3 4]]


# Eager training

## Automatic differentiation

In [4]:
import tensorflow.contrib.eager as tfe

w = tfe.Variable([[1.0]])
with tfe.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, [w])
print(grad) # getting the hang of it, you can print out details of any
            # tf object
    
# During eager execution, use tfe.GradientTape to trace 
# operations for computing gradients later.

[<tf.Tensor: id=31, shape=(1, 1), dtype=float32, numpy=array([[2.]], dtype=float32)>]
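
As a sanity check on the gradient printed above, the same derivative can be approximated without TensorFlow. The sketch below is plain Python, not the tfe API: `numerical_derivative` is a hypothetical helper that uses a central finite difference, and `loss_fn` mirrors the `loss = w * w` line from the cell.

```python
def loss_fn(w):
    # Same function as the taped computation above: loss = w * w
    return w * w


def numerical_derivative(f, x, eps=1e-6):
    # Central finite difference: (f(x + eps) - f(x - eps)) / (2 * eps)
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


approx_grad = numerical_derivative(loss_fn, 1.0)
print(round(approx_grad, 4))  # 2.0, matching d(w*w)/dw = 2w at w = 1
```

The tape computes the exact derivative rather than an approximation, so a finite-difference check like this is only useful for spotting mistakes in a loss function.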


In [5]:
# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 1000
training_inputs = tf.random_normal([NUM_EXAMPLES]) #draws 1000 samples from a normal distribution (mean 0, stddev 1)
noise = tf.random_normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

def prediction(input, weight, bias):
  return input * weight + bias

# A loss function using mean-squared error
def loss(weights, biases):
  error = prediction(training_inputs, weights, biases) - training_outputs
  return tf.reduce_mean(tf.square(error)) 
    #handles matrices, very cool

# Return the derivative of loss with respect to weight and bias
def grad(weights, biases):
  with tfe.GradientTape() as tape:
    loss_value = loss(weights, biases) 
  return tape.gradient(loss_value, [weights, biases]) 
    # tape.gradient(squared_error, [theta, biases])
    # returns derivatives of loss for each theta! neat. 
    # Note to self: separate biases from weights

train_steps = 200
learning_rate = 0.01
# Start with arbitrary values for W and B on the same batch of data
W = tfe.Variable(5.) #theta
B = tfe.Variable(10.) #theta(1) in octave
#standard variable, in this case a decimal

print("Initial loss: {:.3f}".format(loss(W, B)))

for i in range(train_steps):
  dW, dB = grad(W, B)
  W.assign_sub(dW * learning_rate) # W -=('gradient' * alpha)
  B.assign_sub(dB * learning_rate)
    #basically gradient descent for a simple linear regression model
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss(W, B)))

print("Final loss: {:.3f}".format(loss(W, B)))
print("W = {}, B = {}".format(W.numpy(), B.numpy()))

# Basically, in this example:
# with tfe.GradientTape() as tape,
# we can call tape.gradient(loss_value, [weights, biases]),
# which takes: cost_fcn, [theta(2:end), theta(1)]
# returns: a list of derivatives, one float tensor per variable

# *Can also be used to do:
# backpropagation,
# derivatives with respect to any variable,
# partial derivatives,
# *And can be:
# overloaded,

# Beautiful. (& complicated!)

Initial loss: 68.494
Loss at step 000: 65.857
Loss at step 020: 30.238
Loss at step 040: 14.185
Loss at step 060: 6.946
Loss at step 080: 3.681
Loss at step 100: 2.206
Loss at step 120: 1.540
Loss at step 140: 1.240
Loss at step 160: 1.104
Loss at step 180: 1.042
Final loss: 1.015
W = 3.043541193008423, B = 2.1179614067077637
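
For comparison, the gradients that the tape returns for this loss can be written out by hand. With error e = x * W + B - y and loss = mean(e^2), the derivatives are dLoss/dW = 2 * mean(e * x) and dLoss/dB = 2 * mean(e). The plain-Python sketch below uses those formulas. It is not TensorFlow code, and the small noise-free dataset (`xs`, `ys`) and the helper `mse_gradients` are made up for illustration.

```python
# Hypothetical noise-free data on the line y = 3 * x + 2
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3.0 * x + 2.0 for x in xs]


def mse_gradients(w, b):
    # error_i = (x_i * w + b) - y_i ; loss = mean(error_i ** 2)
    errors = [x * w + b - y for x, y in zip(xs, ys)]
    n = float(len(xs))
    d_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n  # dLoss/dW
    d_b = 2.0 * sum(errors) / n                             # dLoss/dB
    return d_w, d_b


w, b = 5.0, 10.0        # same starting values as W and B above
learning_rate = 0.01
for _ in range(2000):
    d_w, d_b = mse_gradients(w, b)
    w -= learning_rate * d_w   # mirrors W.assign_sub(dW * learning_rate)
    b -= learning_rate * d_b

print(round(w, 3), round(b, 3))
```

Because this toy data has no noise, `w` and `b` approach 3 and 2 exactly, whereas the run above settles near those values with a loss around 1 (the variance of the added noise).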


## Build and train models

Eager execution encourages the use of the Keras-style layer classes in the "tf.keras.layers" module. Additionally, the "tf.train.Optimizer" classes provide sophisticated techniques to calculate parameter updates.

In [6]:
model = tf.keras.Sequential([
  tf.keras.layers.Dense(10, input_shape=(784,)),  # must declare input shape
  tf.keras.layers.Dense(10)
]) 
#tf includes Keras in it! Can derive own class from Keras as well

# And of course they updated the tutorial while I was in the middle of doing it. That confused me for a good 20 min

## Build a model
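While you can use any Python object to represent a layer, TensorFlow has tf.keras.layers.Layer as a convenient base class; inherit from it to implement your own layer. The cell below does not define a custom layer; it rebuilds the same Sequential model from the built-in Dense layers: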

In [7]:
model = tf.keras.Sequential([
  tf.keras.layers.Dense(10, input_shape=(784,)),  # must declare input shape
  tf.keras.layers.Dense(10)
])

## Train a model

In [8]:
# Create a tensor representing a blank image
batch = tf.zeros([1, 1, 784])
print(batch.shape)  # => (1, 1, 784)

result = model(batch)
# => tf.Tensor([[[ 0.  0., ..., 0.]]], shape=(1, 1, 10), dtype=float32)

(1, 1, 784)


In [9]:
import dataset  # download dataset.py file
dataset_train = dataset.train('./datasets').shuffle(60000).repeat(4).batch(32)

To train a model, define a loss function to optimize and then calculate gradients. Use an optimizer to update the variables:

In [10]:
def loss(model, x, y):
  prediction = model(x)
  return tf.losses.sparse_softmax_cross_entropy(labels=y, logits=prediction)
    # tf.losses is the module of built-in loss functions;
    # sparse_softmax_cross_entropy is softmax cross-entropy for integer class labels

def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return tape.gradient(loss_value, model.variables)
    # keras object has .variables, which nicely passes into tape.gradient

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
# Gradient optimizer is apparently an object we use

x, y = next(iter(dataset_train))
print("Initial loss: {:.3f}".format(loss(model, x, y)))

# Training loop
for (i, (x, y)) in enumerate(dataset_train):
  # Calculate derivatives of the input function with respect to its parameters.
  grads = grad(model, x, y)
  # Apply the gradient to the model
  optimizer.apply_gradients(zip(grads, model.variables),
                            global_step=tf.train.get_or_create_global_step())
    # kinda gross, but the optimizer object gets the above inputs (model info)
    # and a global step tensor to do gradient descent
    # still better than implementing backpropagation myself! Hah (even if it might be fun...)

  if i % 200 == 0:
    print("Loss at step {:04d}: {:.3f}".format(i, loss(model, x, y)))

print("Final loss: {:.3f}".format(loss(model, x, y)))
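
To make the loss above less of a black box: for one example with integer label y and logits z, sparse softmax cross-entropy is -log(softmax(z)[y]). The plain-Python sketch below computes that for a single example. `sparse_softmax_cross_entropy_single` is a hypothetical helper for illustration, not the tf.losses implementation, which operates on whole batches of tensors.

```python
import math


def sparse_softmax_cross_entropy_single(logits, label):
    # Subtract the max logit for numerical stability; softmax is unchanged.
    shifted = [z - max(logits) for z in logits]
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    # -log(softmax(logits)[label]) == log_sum_exp - shifted[label]
    return log_sum_exp - shifted[label]


# Ten equal logits give every class probability 1/10,
# so the loss is log(10) regardless of the label.
uniform_loss = sparse_softmax_cross_entropy_single([0.0] * 10, label=3)
print("{:.3f}".format(uniform_loss))  # 2.303
```

A loss near 2.303 on a 10-class problem therefore means the model is doing no better than guessing uniformly.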

### Variables and optimizers
tfe.Variable objects store mutable tf.Tensor values accessed during training to make automatic differentiation easier. 

In [11]:
class Model(tf.keras.Model):
  def __init__(self):
    super(Model, self).__init__()
    self.W = tfe.Variable(5., name='weight') 
    self.B = tfe.Variable(10., name='bias')
  def predict(self, inputs):
    return inputs * self.W + self.B
    # can initialize custom keras models and configure own
    # thetas, bias, and prediction


# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 2000
training_inputs = tf.random_normal([NUM_EXAMPLES])
noise = tf.random_normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

# The loss function to be optimized
def loss(model, inputs, targets):
  error = model.predict(inputs) - targets
  return tf.reduce_mean(tf.square(error))

def grad(model, inputs, targets):
  with tfe.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return tape.gradient(loss_value, [model.W, model.B])
    # utilize GradientTape in gradient descent

# Define:
# 1. A model.
# 2. Derivatives of a loss function with respect to model parameters.
# 3. A strategy for updating the variables based on the derivatives.
model = Model()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)

print("Initial loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))

# Training loop
for i in range(300):
  grads = grad(model, training_inputs, training_outputs)
  optimizer.apply_gradients(zip(grads, [model.W, model.B]),
                            global_step=tf.train.get_or_create_global_step())
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss(model, training_inputs, training_outputs)))

print("Final loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))
print("W = {}, B = {}".format(model.W.numpy(), model.B.numpy()))

Initial loss: 68.354
Loss at step 000: 65.693
Loss at step 020: 29.883
Loss at step 040: 13.890
Loss at step 060: 6.747
Loss at step 080: 3.557
Loss at step 100: 2.133
Loss at step 120: 1.497
Loss at step 140: 1.212
Loss at step 160: 1.086
Loss at step 180: 1.029
Loss at step 200: 1.004
Loss at step 220: 0.992
Loss at step 240: 0.987
Loss at step 260: 0.985
Loss at step 280: 0.984
Final loss: 0.984
W = 2.9868690967559814, B = 2.052278995513916


## Use objects for state during eager execution

With graph execution, program state (such as the variables) is stored in global collections and their lifetime is managed by the tf.Session object. In contrast, during eager execution the lifetime of state objects is determined by the lifetime of their corresponding Python object.

### Variables are objects
During eager execution, variables persist until the last reference to the object is removed, and are then deleted.

In [13]:
v = tfe.Variable(tf.random_normal([1000, 1000]))
v = None  # v no longer takes up GPU memory
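
The same lifetime rule applies to any Python object. The sketch below uses the standard `weakref` module to observe it. `BigBuffer` is a hypothetical stand-in for a variable holding a large tensor, and the immediate release relies on CPython's reference counting.

```python
import gc
import weakref


class BigBuffer(object):
    # Hypothetical stand-in for an object that owns a large block of memory
    def __init__(self, size):
        self.data = [0.0] * size


v = BigBuffer(1000)
watcher = weakref.ref(v)          # a weak reference does not keep v alive
alive_before = watcher() is not None

v = None                          # drop the last strong reference
gc.collect()                      # not needed in CPython; harmless elsewhere
alive_after = watcher() is not None

print(alive_before, alive_after)  # True False
```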

### Object-based saving
tfe.Checkpoint can save and restore tfe.Variables to and from checkpoints:

In [14]:
x = tfe.Variable(10.)

checkpoint = tfe.Checkpoint(x=x)  # save as "x"

x.assign(2.)   # Assign a new value to the variables and save.
save_path = checkpoint.save('./ckpt/')

x.assign(11.)  # Change the variable after saving.

# Restore values from the checkpoint
checkpoint.restore(save_path)

print(x)  # => 2.0

# Cool cool cool

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>


To save and load models, tfe.Checkpoint stores the internal state of objects, without requiring hidden variables. To record the state of a model, an optimizer, and a global step, pass them to a tfe.Checkpoint.

### Object-oriented metrics
tfe.metrics are stored as objects. Update a metric by passing the new data to the callable, and retrieve the result using the tfe.metrics.result method, for example:

In [19]:
m = tfe.metrics.Mean("loss")
m(0)
m(5)
m.result()  # => 2.5
m([8, 9])
m.result()  # => 5.5

# So there are some metrics in which you can pass
# values, then do m.result() and it will output the result
# of the previous inputs... hmm, wonder how many there are

# Not as many as I'd expect... seems like that's because it's Eager.
# I'm going to have to go lower level after I complete my logistic
# classifier

<tf.Tensor: id=18110, shape=(), dtype=float64, numpy=5.5>
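
One way to picture what the metric object does is a small accumulator that keeps a running total and a count. The class below is a plain-Python sketch under that assumption, not the tfe.metrics.Mean implementation. `RunningMean` is a hypothetical name.

```python
class RunningMean(object):
    """Callable accumulator: feed values in, read the mean with result()."""

    def __init__(self, name):
        self.name = name
        self.total = 0.0
        self.count = 0

    def __call__(self, values):
        # Accept either a single number or a list of numbers
        if not isinstance(values, (list, tuple)):
            values = [values]
        self.total += sum(values)
        self.count += len(values)

    def result(self):
        return self.total / self.count


m = RunningMean("loss")
m(0)
m(5)
first = m.result()    # 2.5
m([8, 9])
second = m.result()   # 5.5
print(first, second)
```

The second result is 5.5 because the mean covers everything seen so far (0, 5, 8, 9), not just the most recent call.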

### Summaries and TensorBoard
TensorBoard is a visualization tool for understanding, debugging and optimizing the model training process. It uses summary events that are written while executing the program.

tf.contrib.summary is compatible with both eager and graph execution environments. Summary operations, such as tf.contrib.summary.scalar, are inserted during model construction. For example, to record summaries once every 100 global steps:

In [22]:
# writer = tf.contrib.summary.create_file_writer(logdir)
# global_step=tf.train.get_or_create_global_step()  # return global step var

# writer.set_as_default()

# for _ in range(iterations):
#   global_step.assign_add(1)
#   # Must include a record_summaries method
#   with tf.contrib.summary.record_summaries_every_n_global_steps(100):
#     # your model code goes here
#     tf.contrib.summary.scalar('loss', loss)
#     # I'm not exactly sure how this works, but it seems like a way
#     # to track how your model performs during training.
    
# # TensorBoard seems like its own whole thing as well. Apparently it's
# # useful for Neural Networks
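
The "record once every 100 global steps" idea in the commented-out cell can be pictured with a plain counter. This is a plain-Python sketch of the pattern, not the tf.contrib.summary API. `fake_loss` and the `recorded` list are hypothetical stand-ins for a real loss value and the event file.

```python
RECORD_EVERY = 100
iterations = 300

global_step = 0
recorded = []  # stand-in for the summary events written to disk


def fake_loss(step):
    # Hypothetical loss that shrinks as training progresses
    return 1.0 / step


for _ in range(iterations):
    global_step += 1                      # mirrors global_step.assign_add(1)
    if global_step % RECORD_EVERY == 0:   # only every 100th step is recorded
        recorded.append((global_step, fake_loss(global_step)))

print([step for step, _ in recorded])  # [100, 200, 300]
```

Recording only every n-th step keeps the log small while still giving TensorBoard enough points to plot a training curve.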

### Performance
Computation is automatically offloaded to GPUs during eager execution. To control where a computation runs, enclose it in a tf.device('/gpu:0') block (or the CPU equivalent).

A tf.Tensor object can also be copied to a different device to execute its operations there.

In [24]:
# good to know, not relevant as of right now for me 
# but looks quite intuitive

### Work with graphs
While eager execution makes development and debugging more interactive, TensorFlow graph execution has advantages for distributed training, performance optimizations, and production deployment. However, writing graph code can feel different from writing regular Python code and can be more difficult to debug.

In [25]:
# Eager is more debug friendly

### Write compatible code
The same code written for eager execution will also build a graph during graph execution. Do this by simply running the same code in a new Python session where eager execution is not enabled.

1) Use tf.data for input processing instead of queues. It's faster and easier.

2) Use object-oriented layer APIs—like tf.keras.layers and tf.keras.Model—since they have explicit storage for variables.

3) Once eager execution is enabled with tf.enable_eager_execution, it cannot be turned off. Start a new Python session to return to graph execution.

Write, debug, and iterate in eager execution, then import the model graph for production deployment. Use tfe.Checkpoint to save and restore model variables; this allows movement between eager and graph execution environments. See the examples in: tensorflow/contrib/eager/python/examples.

In [26]:
# ok, Eager for development, graph for the final version.

# End of Eager Execution

Next, I'm going to copy their example for a linear classifier, then implement a logistic classifier based on it by overloading functions.