In [1]:
import tensorflow as tf



In [3]:
tf.__version__

'1.11.0'

# TensorFlow basics


## 1. With graph execution (i.e. without eager execution)

TensorFlow programs have two halves: defining the graph, and opening a session to run the graph.

In [4]:
# first half of the script looks like this:
x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x * x * y + y + 2

In [5]:
# second half of the script looks like this:
sess = tf.Session()
sess.run(x.initializer) # note that you need to initialise your tf.Variables
sess.run(y.initializer)
result = sess.run(f)
print(result)

RuntimeError: The Session graph is empty.  Add operations to the graph before calling run().

In [None]:
# or, equivalently, like this:

with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    # result = f.eval() would also work: inside the `with` block, sess is the
    # default session, so f.eval() is equivalent to sess.run(f)
    result = sess.run(f)
    

In [None]:
print(result)

So, what's going on?

*Variables*
`tf.Variable` objects store mutable `tf.Tensor` values accessed during training to make automatic differentiation easier. The parameters of a model can be set as variables.
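
To make the define-then-run split concrete, here is a toy pure-Python analogy of the two halves. It is only a sketch and not how TensorFlow is implemented: `Node`, `Var`, `run` and the `state` dictionary are hypothetical names made up for illustration, with `state` playing the role of the session.

```python
# Toy analogy only: this is NOT how TensorFlow is implemented.
# Node, Var, run and state are hypothetical names made up for this sketch.

class Node:
    """A deferred computation: building one does no arithmetic."""

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = inputs

    def __add__(self, other):
        return Node(lambda a, b: a + b, (self, _as_node(other)))

    def __mul__(self, other):
        return Node(lambda a, b: a * b, (self, _as_node(other)))


class Var(Node):
    """A named value that must be initialised before the graph can run."""

    def __init__(self, initial_value, name):
        super().__init__(op=None)
        self.initial_value = initial_value
        self.name = name


def _as_node(value):
    """Wrap plain Python numbers as constant nodes."""
    return value if isinstance(value, Node) else Node(lambda: value)


def run(node, state):
    """Evaluate a node, reading variable values from `state` (the 'session')."""
    if isinstance(node, Var):
        if node.name not in state:
            raise RuntimeError("variable %r is not initialised" % node.name)
        return state[node.name]
    args = [run(inp, state) for inp in node.inputs]
    return node.op(*args)


# First half: define the graph. Nothing is computed yet.
x = Var(3, name='x')
y = Var(4, name='y')
f = x * x * y + y + 2

# Second half: initialise the variables, then run the graph.
state = {}
state[x.name] = x.initial_value
state[y.name] = y.initial_value
result = run(f, state)
print(result)  # 3*3*4 + 4 + 2 = 42
```

In this sketch, calling `run(f, {})` without initialising the variables raises an error, which mirrors why the real script runs `x.initializer` and `y.initializer` first.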



## 2. With eager execution


Eager execution is good for:

* debugging
* beginners: you use Python control flow, instead of a graph structure

However, deploying code written for eager execution is more difficult: you should either generate a graph from the model, or run the Python runtime and code directly on the server.


With graph execution, program state (such as the variables) is stored in global collections, and its lifetime is managed by the `tf.Session` object that contains the execution. In contrast, during eager execution the lifetime of state objects is determined by the lifetime of their corresponding Python object.

To use eager execution, you have to explicitly enable it with `tf.enable_eager_execution()`. This needs to be done right at program startup, so we'll do it in another notebook.


# Keras basics

Keras is a high-level API on top of TensorFlow. It is intended to be user-friendly and human-readable.

It's worth understanding Keras before trying to use eager execution, because eager execution works well with `tf.keras`.

Google has adopted Keras as an interface to TensorFlow. Note that `tf.keras` is not exactly the same as the standalone Keras package: it is TensorFlow's own implementation of the Keras API, merged into TensorFlow on adoption. To use it, just do `from tensorflow import keras`.

Keras docs: https://keras.io/#guiding-principles
TensorFlow on Keras: https://www.tensorflow.org/guide/keras

In Keras, you assemble layers to build models. A model is (usually) a graph of layers. The most common type of model is a stack of layers: the `tf.keras.Sequential` model.




In [7]:
from tensorflow import keras


## 1. Define the model

In [8]:
# Now you can define your model. It needs a kind (eg Sequential) and then some layers (here densely connected layers)

model = keras.Sequential()
# Adds a densely-connected layer with 64 units to the model:
model.add(keras.layers.Dense(64, activation='relu'))
# Add another:
model.add(keras.layers.Dense(64, activation='relu'))
# Add a softmax layer with 10 output units:
model.add(keras.layers.Dense(10, activation='softmax'))


* **There are many `tf.keras.layers`.**


* There are also many kinds of _construction parameter_ to pass to the layers on creation. eg:


 * `activation`: Set the activation function for the layer. By default, no activation is applied.


 * `kernel_initializer` and `bias_initializer`: The initialization schemes that create the layer's weights (kernel and bias). Each is given as a name or a callable object. The kernel defaults to the "Glorot uniform" initializer. That'll probably do for us.


 * `kernel_regularizer` and `bias_regularizer`: The regularization schemes that are applied to the layer's weights (kernel and bias), such as `L1` or `L2` regularization. By default, no regularization is applied.
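
As a rough illustration of what these construction parameters control, the sketch below computes one densely connected layer by hand in plain Python. It is not the Keras implementation: `glorot_uniform`, `relu`, `dense` and `l2_penalty` are hypothetical helper names, and the zero-initialised bias and the regularisation strength of 0.01 are assumptions chosen for the example.

```python
import math
import random

# Illustrative sketch in plain Python, not the Keras implementation.
# glorot_uniform, relu, dense and l2_penalty are hypothetical helper names.


def glorot_uniform(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out kernel from U(-limit, limit),
    where limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


def dense(inputs, kernel, bias, activation=None):
    """Compute activation(inputs . kernel + bias) for one sample."""
    outputs = [
        sum(x * kernel[i][j] for i, x in enumerate(inputs)) + bias[j]
        for j in range(len(bias))
    ]
    # As in the list above: with no activation given, none is applied.
    return activation(outputs) if activation is not None else outputs


def l2_penalty(kernel, strength):
    """L2 regularisation term: strength * sum of squared weights."""
    return strength * sum(w * w for row in kernel for w in row)


rng = random.Random(0)
kernel = glorot_uniform(fan_in=32, fan_out=64, rng=rng)
bias = [0.0] * 64  # assumption: zero-initialised bias
sample = [rng.random() for _ in range(32)]

hidden = dense(sample, kernel, bias, activation=relu)
print(len(hidden), min(hidden) >= 0.0)  # 64 True
print(l2_penalty(kernel, strength=0.01))  # extra term added to the loss
```

The penalty is added to the training loss, so larger weights cost more; that is the sense in which a regularizer is "applied to" the kernel.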


If your model is just a simple stack of layers, then instantiate it as a `Sequential` model. Easy.

On the other hand, if your model is more complicated -- multi-input or multi-output -- then use the Keras Functional API.

## 2. Compile the model

In [10]:
# Then configure the model for training by calling .compile()

model.compile(optimizer=tf.train.AdamOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

* **tf.keras.Model.compile takes three important arguments -- optimizer! loss! metrics!**



 * `optimizer`: This object specifies the training procedure. Pass it optimizer instances from the `tf.train` module, such as `AdamOptimizer`, `RMSPropOptimizer`, or `GradientDescentOptimizer`.


* `loss`: The function to minimize during optimization. Common choices include mean square error (`mse`), `categorical_crossentropy`, and `binary_crossentropy`. Loss functions are specified by name or by passing a callable object from the `tf.keras.losses` module.


* `metrics`: Used to monitor training. These are string names or callables from the `tf.keras.metrics` module.



e.g.

    # Configure a model for mean-squared error regression.
    model.compile(optimizer=tf.train.AdamOptimizer(0.01),
                  loss='mse',       # mean squared error
                  metrics=['mae'])  # mean absolute error

    # Configure a model for categorical classification.
    model.compile(optimizer=tf.train.RMSPropOptimizer(0.01),
                  loss=keras.losses.categorical_crossentropy,
                  metrics=[keras.metrics.categorical_accuracy])
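
The loss and metric names used above can be unpacked by hand. The following plain-Python sketch shows what each one measures for a single example; the function bodies are simplified assumptions for illustration (Keras works on batches of tensors and has its own numerical safeguards), and the `eps` clipping value is made up for the sketch.

```python
import math

# Simplified, single-example versions of the named losses and metrics.
# These are illustrative helpers, not the tf.keras implementations.


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """-sum(t * log(p)) for a one-hot target and a probability vector."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def categorical_accuracy(y_true, y_pred):
    """1.0 if the highest-scoring class matches the target class, else 0.0."""
    return float(y_true.index(max(y_true)) == y_pred.index(max(y_pred)))


# Regression example
print(mse([1.0, 2.0], [1.5, 2.5]))  # 0.25
print(mae([1.0, 2.0], [1.5, 2.5]))  # 0.5

# Classification example: one-hot target, softmax-style prediction
target = [0.0, 0.0, 1.0]
prediction = [0.1, 0.2, 0.7]
print(categorical_crossentropy(target, prediction))  # equals -log(0.7)
print(categorical_accuracy(target, prediction))      # 1.0
```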
              

## 3. Fit the model

Give it some data! 

For small datasets, you can use NumPy arrays in memory.
For large datasets, you can use the `tf.data` Datasets API.


* **`tf.keras.Model.fit` takes three important arguments:** 


* `epochs`: Training is structured into epochs. An epoch is one iteration over the entire input data (this is done in smaller batches).


* `batch_size`: When passed NumPy data, the model slices the data into smaller batches and iterates over these batches during training. This integer specifies the size of each batch. Be aware that the last batch may be smaller if the total number of samples is not divisible by the batch size.


* `validation_data`: When prototyping a model, you want to easily monitor its performance on some validation data. Passing this argument — a tuple of inputs and labels — allows the model to display the loss and metrics in inference mode for the passed data, at the end of each `epoch`.
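
To see how `epochs` and `batch_size` interact, the sketch below slices 1000 samples into batches of 32, the same numbers used in the cell that follows. `iterate_batches` is a hypothetical helper written for this illustration; it ignores shuffling and is not how `fit` is implemented.

```python
import math


def iterate_batches(num_samples, batch_size):
    """Yield (start, stop) index pairs covering all samples once (one epoch)."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


num_samples, batch_size, epochs = 1000, 32, 10
batches = list(iterate_batches(num_samples, batch_size))
steps_per_epoch = math.ceil(num_samples / batch_size)

print(steps_per_epoch)                  # 32 batches per epoch
print(batches[-1][1] - batches[-1][0])  # 8: the last batch is smaller
print(steps_per_epoch * epochs)         # 320 batches processed in total
```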

In [11]:
import numpy as np

data = np.random.random((1000, 32))
labels = np.random.random((1000, 10))

val_data = np.random.random((100, 32))
val_labels = np.random.random((100, 10))

model.fit(data, labels, epochs=10, batch_size=32,
          validation_data=(val_data, val_labels))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0xb29ce2c18>

## 4. Evaluate and predict!

Evaluate with `tf.keras.Model.evaluate`. Predict with `tf.keras.Model.predict`.

Both methods accept either NumPy data (fine for a small dataset) or a `tf.data.Dataset` (advised for a large dataset).

In [12]:
# model.evaluate(x, y, batch_size=32) # for a small dataset (NumPy arrays)

# model.evaluate(dataset, steps=30) # for a large dataset (tf.data.Dataset)

Great!

You should also probably be aware that Keras runs just fine in a distributed manner. The story is basically the same as for TensorFlow more generally.

`tf.keras` models can run on multiple GPUs using `tf.contrib.distribute.DistributionStrategy`. This API provides distributed training on multiple GPUs with almost no changes to existing code.

There are a few distribution strategies -- synchronous and asynchronous parameter parallelism, and model parallelism. `MirroredStrategy` is one synchronous parameter parallel strategy. It is the only distribution strategy currently supported by `tf.keras` (as of the TensorFlow 1.11 used here).

To use `DistributionStrategy` with Keras, convert the `tf.keras.Model` to a `tf.estimator.Estimator` with `tf.keras.estimator.model_to_estimator`, then train the estimator. It takes another line or two, but it's not (in theory) too hard.
