In [1]:
import tensorflow as tf
from tensorflow import keras

# Build a simple model
## Sequential model
A model is (usually) a graph of layers. The most common type of model is a stack of layers: the `tf.keras.Sequential` model.

To build a simple, fully-connected network (i.e. multi-layer perceptron):

In [2]:
model = keras.Sequential()

# Adds a densely-connected layer with 64 units to the model
model.add(keras.layers.Dense(64, activation='relu'))
# Add another
model.add(keras.layers.Dense(64, activation='relu'))
# Add a softmax layer with 10 output units:
model.add(keras.layers.Dense(10, activation='softmax'))

### [ReLU and Softmax Activation Functions](https://github.com/Kulbear/deep-learning-nano-foundation/wiki/ReLU-and-Softmax-Activation-Functions)
#### Sigmoid
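Sigmoid saturates for large positive or negative inputs, where its gradient is close to zero. The deeper your network is, the more of the training signal from the data is "lost" as it is propagated back through the layers.

- So it is usually not used for hidden layers.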

#### Rectified Linear Units
Simplest non-linear activation function.

$$f(x) = \max(0, x)$$
    
- ReLUs result in much faster training for large networks.
- Unfortunately, ReLU units can be fragile during training and can "die": once a unit's weights are pushed to a point where it outputs zero for every input, its gradient is zero and it may never activate again.
- With a proper setting of the learning rate this is less frequently an issue.
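
The formula and the "dying" behavior above can be illustrated with a minimal NumPy sketch. The helper names `relu` and `relu_grad` are made up for this example and are not part of `tf.keras`; the point is only that a unit whose pre-activations are all negative receives zero gradient.

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 where x > 0, else 0."""
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))       # negative inputs are clipped to zero
print(relu_grad(x))  # gradient flows only through positive inputs

# A "dead" unit: every pre-activation is negative, so every gradient is zero
# and gradient descent cannot move the unit's weights any more.
dead = np.array([-3.0, -1.0, -0.1])
print(relu_grad(dead).sum())
```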

#### Softmax function
Squashes the outputs of each unit to be between 0 and 1, just like a sigmoid function. But it also divides each output such that the total sum of the outputs is equal to 1.

$$\sigma(z)_j = \frac{e^{z_j}}{\sum^K_{k=1}e^{z_k}}$$
`z` is a vector of the inputs to the output layer (if you have 10 output units, then there are 10 elements in `z`). And `j` indexes the output units, so $j = 1, 2, ..., K$.

While sigmoid is only suitable for 2 classes, softmax can be used for hundreds.
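
As a rough illustration of the softmax formula, the NumPy sketch below computes it for a small vector and checks that the outputs sum to 1. The helper names `softmax` and `sigmoid` are assumptions for this example; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis; subtracting the max avoids overflow."""
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([2.0, 1.0, 0.1])
probs = softmax(z)
print(probs, probs.sum())  # each entry lies in (0, 1); the sum is 1

# With two classes, softmax reduces to a sigmoid of the logit difference.
two = np.array([0.7, -0.4])
print(softmax(two)[0], sigmoid(two[0] - two[1]))
```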

## Configure the layers
There are many `tf.keras.layers` available with some common constructor parameters:

- *activation*: Set the activation function for the layer. This parameter is specified by the name of a built-in function or as a callable object. By default, no activation is applied.

- *kernel_initializer* and *bias_initializer*: The initialization schemes that create the layer's weights (kernel and bias). This parameter is a name or a callable object. The kernel defaults to the "Glorot uniform" initializer and the bias to zeros.

- *kernel_regularizer* and *bias_regularizer*: The regularization schemes that apply to the layer's weights (kernel and bias), such as L1 or L2 regularization. By default, no regularization is applied.

The following instantiates `tf.keras.layers.Dense` layers using constructor arguments:
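
A sketch of such constructor calls, assuming the `tf` and `keras` imports from the first cell. The variable names (`layer_a` and so on) and the factor values are illustrative only.

```python
import tensorflow as tf
from tensorflow import keras

# A sigmoid layer, with the activation given by name:
layer_a = keras.layers.Dense(64, activation='sigmoid')
# The same layer, with the activation given as a callable:
layer_b = keras.layers.Dense(64, activation=tf.sigmoid)

# A linear layer with L1 regularization of factor 0.01 applied to the kernel:
layer_c = keras.layers.Dense(64, kernel_regularizer=keras.regularizers.l1(0.01))
# A linear layer with L2 regularization of factor 0.01 applied to the bias:
layer_d = keras.layers.Dense(64, bias_regularizer=keras.regularizers.l2(0.01))

# A linear layer with a kernel initialized to a random orthogonal matrix:
layer_e = keras.layers.Dense(64, kernel_initializer='orthogonal')
# A linear layer with a bias vector initialized to 2.0:
layer_f = keras.layers.Dense(64, bias_initializer=keras.initializers.Constant(2.0))
```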

### Regularization
A regression model that uses the L1 regularization technique is called **Lasso Regression** (Least Absolute Shrinkage and Selection Operator). It adds the “absolute value of magnitude” of the coefficients as a penalty term to the loss function.
$$\sum^n_{i=1}(y_i - \hat y_i)^2 +\lambda\sum^P_{j=1}|\beta_j|$$

A model that uses L2 regularization is called **Ridge Regression**, which adds the “squared magnitude” of the coefficients as a penalty term to the loss function.
$$\sum^n_{i=1}(y_i - \hat y_i)^2 +\lambda\sum^P_{j=1}\beta_j^2$$

If $\lambda$ is zero, we get back ordinary least squares (OLS), whereas a very large value shrinks the coefficients toward zero, so the model under-fits.

Lasso shrinks the coefficients of the less important features to zero, thus removing some features altogether. So, for
- a huge number of features: L1 or L2 regularization
- a small set of features: cross-validation, stepwise regression, etc.

Traditional methods for handling overfitting and performing feature selection, like cross-validation and stepwise regression, work well with a small set of features, but L1 and L2 regularization are a great alternative when we are dealing with a large set of features.
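
The two penalized losses above can be evaluated directly. This is a minimal NumPy sketch with made-up numbers; `lasso_loss` and `ridge_loss` are hypothetical helper names, not library functions.

```python
import numpy as np

def lasso_loss(y, y_hat, beta, lam):
    """Sum of squared errors plus lam * sum(|beta_j|) (L1 penalty)."""
    return np.sum((y - y_hat) ** 2) + lam * np.sum(np.abs(beta))

def ridge_loss(y, y_hat, beta, lam):
    """Sum of squared errors plus lam * sum(beta_j ** 2) (L2 penalty)."""
    return np.sum((y - y_hat) ** 2) + lam * np.sum(beta ** 2)

y = np.array([1.0, 2.0, 3.0])       # targets
y_hat = np.array([1.5, 2.0, 2.0])   # predictions
beta = np.array([0.5, -2.0])        # model coefficients

# With lam = 0 both losses reduce to the plain squared error (OLS objective).
print(lasso_loss(y, y_hat, beta, 0.0), ridge_loss(y, y_hat, beta, 0.0))
# With lam > 0 the large coefficient (-2.0) is punished harder by L2 than by L1.
print(lasso_loss(y, y_hat, beta, 0.1), ridge_loss(y, y_hat, beta, 0.1))
```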

# Train and evaluate
## Set up training
After the model is constructed, configure its learning process by calling the `compile` method:

In [3]:
model.compile(optimizer=tf.train.AdamOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

`tf.keras.Model.compile` takes three important arguments:

- `optimizer`: This object specifies the training procedure. Pass it optimizer instances from the `tf.train` module, such as `AdamOptimizer`, `RMSPropOptimizer`, or `GradientDescentOptimizer`.
- `loss`: The function to minimize during optimization. Common choices include mean square error (`mse`), `categorical_crossentropy`, and `binary_crossentropy`. Loss functions are specified by name or by passing a callable object from the `tf.keras.losses` module.
- `metrics`: Used to monitor training. These are string names or callables from the `tf.keras.metrics` module.

The following shows a few examples of configuring a model for training:
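
A sketch of two such configurations. It passes optimizers by string name (`'adam'`, `'rmsprop'`) rather than as `tf.train` instances so that it does not depend on a particular TensorFlow version; the `build_model` helper and the variable names are assumptions for this example.

```python
from tensorflow import keras

def build_model():
    """Hypothetical helper: a small stack of Dense layers."""
    return keras.Sequential([
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dense(10, activation='softmax'),
    ])

# Configure a model for mean-squared-error regression.
reg_model = build_model()
reg_model.compile(optimizer='adam',
                  loss='mse',       # mean squared error
                  metrics=['mae'])  # mean absolute error

# Configure a model for categorical classification,
# passing the loss and metric as callables instead of names.
clf_model = build_model()
clf_model.compile(optimizer='rmsprop',
                  loss=keras.losses.categorical_crossentropy,
                  metrics=[keras.metrics.categorical_accuracy])
```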

## Input NumPy data
For small datasets, use in-memory *NumPy* arrays to train and evaluate a model. The model is "fit" to the training data using the `fit` method:

In [4]:
import numpy as np

data = np.random.random((1000, 32))
labels = np.random.random((1000, 10))

#model.fit(data, labels, epochs=10, batch_size=32)

#model.evaluate(data, labels, batch_size=32)

#model.predict(data, batch_size=32)

`tf.keras.Model.fit` takes three important arguments:
- `epochs`: Training is structured into epochs. An epoch is one iteration over the entire input data (this is done in smaller batches).
- `batch_size`: When passed NumPy data, the model slices the data into smaller batches and iterates over these batches during training. This integer specifies the size of each batch. Be aware that the last batch may be smaller if the total number of samples is not divisible by the batch size.
- `validation_data`: When prototyping a model, you want to easily monitor its performance on some validation data. Passing this argument—a tuple of inputs and labels—allows the model to display the loss and metrics in inference mode for the passed data, at the end of each epoch.
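
The relationship between `epochs`, `batch_size`, and the smaller last batch is simple arithmetic. The sketch below works it out for the 1000-sample toy data and `batch_size=32` used in this section; the variable names are illustrative.

```python
import math

num_samples = 1000
batch_size = 32

# Number of batches needed to see every sample once (one epoch).
steps_per_epoch = math.ceil(num_samples / batch_size)
# Size of the final, possibly smaller, batch.
last_batch = num_samples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, last_batch)  # 32 8
```

So one epoch here consists of 31 full batches of 32 samples and a final batch of 8.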

In [5]:
val_data = np.random.random((100, 32))
val_labels = np.random.random((100, 10))

model.fit(data, labels, epochs=10, batch_size=32,
          validation_data=(val_data, val_labels))

model.evaluate(data, labels, batch_size=32)

model.predict(data, batch_size=32)

Train on 1000 samples, validate on 100 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


array([[0.10935095, 0.10236258, 0.09818337, ..., 0.09453773, 0.10262468,
        0.09647124],
       [0.131173  , 0.09998714, 0.09044728, ..., 0.1107971 , 0.08764174,
        0.11112654],
       [0.10746676, 0.1036151 , 0.09376768, ..., 0.1043117 , 0.11596734,
        0.09098008],
       ...,
       [0.09077893, 0.08651523, 0.10472265, ..., 0.12508228, 0.11357807,
        0.10846335],
       [0.08848815, 0.10286779, 0.0975915 , ..., 0.11754835, 0.0990286 ,
        0.10251258],
       [0.08824801, 0.10511488, 0.09578815, ..., 0.09402472, 0.09972115,
        0.09904332]], dtype=float32)

## Input tf.data datasets
Use the *Datasets API* to scale to large datasets or multi-device training. Pass a `tf.data.Dataset` instance to the `fit` method:

In [6]:
# Instantiates a toy dataset instance:
#print(data.shape)
#data = tf.constant(1,shape=[1000,32])
#print (data.shape)
#labels = tf.constant(1,shape=[1000,1])

dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32)
dataset = dataset.repeat()

# Don't forget to specify `steps_per_epoch` when calling `fit`
# on a dataset.
model.fit(dataset, epochs=10, steps_per_epoch=30)

# model.evaluate(dataset, steps=30)

# model.predict(dataset, steps=30)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x11bd30810>

In [12]:
dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32).repeat()

val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_labels))
val_dataset = val_dataset.batch(32).repeat()

model.fit(dataset, epochs=10, steps_per_epoch=30,
          validation_data=val_dataset,
          validation_steps=3)

model.evaluate(dataset, steps=30)

model.predict(dataset, steps=30)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


array([[0.5212436 , 0.5286048 , 0.5968594 , ..., 0.5587488 , 0.5999805 ,
        0.59404695],
       [0.5781163 , 0.58532274, 0.46443263, ..., 0.5537153 , 0.5532794 ,
        0.50749433],
       [0.5642902 , 0.62500095, 0.491225  , ..., 0.57367074, 0.57789665,
        0.53011835],
       ...,
       [0.54619735, 0.57631195, 0.506721  , ..., 0.5471295 , 0.6129462 ,
        0.52478355],
       [0.5426198 , 0.5788304 , 0.52312434, ..., 0.5604776 , 0.568208  ,
        0.5508774 ],
       [0.6686078 , 0.599493  , 0.48047528, ..., 0.62443024, 0.49602535,
        0.5199309 ]], dtype=float32)

## Evaluate and predict
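The `tf.keras.Model.evaluate` and `tf.keras.Model.predict` methods can use NumPy data and a `tf.data.Dataset`.

`evaluate` returns the inference-mode loss and metrics for the data provided, and `predict` returns the output of the last layer in inference mode for the data provided, as a NumPy array.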

# Build advanced models
## Functional API
The `tf.keras.Sequential` model is a simple stack of layers that cannot represent arbitrary models.
Use the [Keras functional API](https://keras.io/getting-started/functional-api-guide/) to build complex model topologies such as:
- Multi-input models
- Multi-output models
- Models with shared layers (the same layer called several times)
- Models with non-sequential data flows (e.g. residual connections)

The following builds a simple, fully-connected network using the functional API:

In [8]:
inputs = keras.Input(shape=(32,)) # Returns a placeholder tensor

# A layer instance is callable on a tensor, and returns a tensor.
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dense(64, activation='relu')(x)
predictions = keras.layers.Dense(10, activation='softmax')(x)

# Instantiate the model given inputs and outputs.
model = keras.Model(inputs=inputs, outputs=predictions)

# The compile step specifies the training configuration
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Trains for 5 epochs
model.fit(data, labels, batch_size=32, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x11c223450>

## Model subclassing
Build a fully-customizable model by subclassing `tf.keras.Model` and defining your own forward pass. Create layers in the `__init__` method and set them as attributes of the class instance. Define the forward pass in the `call` method.

Model subclassing is particularly useful when *eager execution* is enabled since the forward pass can be written imperatively.

> With more flexibility come more opportunities for user error. If possible, prefer the functional API.

A subclassed `tf.keras.Model` using a custom forward pass:

In [9]:
class MyModel(keras.Model):
    def __init__(self, num_classes=10):
        super(MyModel, self).__init__(name='my_model')
        self.num_classes = num_classes
        
        # Define your layers here.
        self.dense_1 = keras.layers.Dense(32, activation='relu')
        self.dense_2 = keras.layers.Dense(num_classes, activation='sigmoid')
        
    def call(self, inputs):
        ''' Define your forward pass here,
            Using layers you previously defined (in `__init__`).
        '''
        x = self.dense_1(inputs)
        return self.dense_2(x)
    
    def compute_output_shape(self, input_shape):
        # You need to override this function if you want to use the
        # subclassed model as part of a functional-style model.
        # Otherwise, this method is optional.
        shape = tf.TensorShape(input_shape).as_list()
        shape[-1] = self.num_classes
        return tf.TensorShape(shape)

In [10]:
# Instantiates the subclassed model
model = MyModel(num_classes=10)

# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Trains for 5 epochs
model.fit(data, labels, batch_size=32, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x11bd53950>

## Custom layers
Create a custom layer by subclassing `tf.keras.layers.Layer` and implementing the following methods:
- `build`: Create the weights of the layer. Add weights with the `add_weight` method.
- `call`: Define the forward pass.
- `compute_output_shape`: Specify how to compute the output shape of the layer given the input shape.
- Optionally, a layer can be serialized by implementing the `get_config` method and the `from_config` class method.

An example of a custom layer that implements a `matmul` of an input with a kernel matrix:

In [19]:
class MyLayer(keras.layers.Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)
    
    def build(self, input_shape):
        shape = tf.TensorShape((input_shape[1], self.output_dim))
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=shape,
                                      initializer='uniform',
                                      trainable=True)
        # Be sure to call this at the end
        super(MyLayer, self).build(input_shape)
        
    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)
    
    def compute_output_shape(self, input_shape):
        shape = tf.TensorShape(input_shape).as_list()
        shape[-1] = self.output_dim
        return tf.TensorShape(shape)
    
    def get_config(self):
        base_config = super(MyLayer, self).get_config()
        base_config['output_dim'] = self.output_dim
        # Return the config so that `from_config` can rebuild the layer.
        return base_config
        
    @classmethod
    def from_config(cls, config):
        return cls(**config)

In [25]:
# Create a model using the custom layer
model1 = keras.Sequential([MyLayer(10),
                           keras.layers.Activation('softmax')])

# The compile step specifies the training configuration
model1.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
               loss='categorical_crossentropy',
               metrics=['accuracy'])

# Trains for 5 epochs
model1.fit(data, labels, batch_size=32, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x11b976810>

## Callbacks
A callback is an object passed to a model to customize and extend its behavior during training. You can write your own custom callback, or use the built-in `tf.keras.callbacks` that include:

- `tf.keras.callbacks.ModelCheckpoint`: Save checkpoints of your model at regular intervals.
- `tf.keras.callbacks.LearningRateScheduler`: Dynamically change the learning rate.
- `tf.keras.callbacks.EarlyStopping`: Interrupt training when validation performance has stopped improving.
- `tf.keras.callbacks.TensorBoard`: Monitor the model's behavior using TensorBoard.
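
A custom callback can be sketched by subclassing `tf.keras.callbacks.Callback` and overriding the hooks you need. The `LossHistory` class below is a hypothetical example, not a built-in. Its hooks are called by hand here only to show what it records; during real training Keras calls them for you.

```python
from tensorflow import keras

class LossHistory(keras.callbacks.Callback):
    """Hypothetical callback that records the training loss after each epoch."""

    def on_train_begin(self, logs=None):
        self.losses = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get('loss'))

history = LossHistory()
# Simulate what `fit` would do at the start of training and after two epochs.
history.on_train_begin()
history.on_epoch_end(0, logs={'loss': 0.5})
history.on_epoch_end(1, logs={'loss': 0.4})
print(history.losses)  # [0.5, 0.4]
```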

To use a `tf.keras.callbacks.Callback`, pass it to the model's `fit` method:

In [29]:
callbacks = [
    # Interrupt training if `val_loss` stops improving for over 2 epochs
    keras.callbacks.EarlyStopping(patience=2, monitor='val_loss'),
    keras.callbacks.TensorBoard(log_dir='./logs')
]

model1.fit(data, labels, batch_size=32, epochs=5, callbacks=callbacks,
         validation_data=(val_data, val_labels))

Train on 1000 samples, validate on 100 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x10c247050>
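
To make the `patience` argument concrete, here is a simplified pure-Python sketch of the early-stopping rule: stop once the monitored value has failed to improve for `patience` consecutive epochs. It is an approximation for illustration, not the Keras implementation (it ignores options such as `min_delta`), and the function name `early_stopping_epoch` is made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# The loss stops improving after epoch 2, so training halts at epoch 4.
print(early_stopping_epoch([0.9, 0.7, 0.71, 0.72, 0.6]))  # 4
# The loss improves every epoch, so training runs to completion.
print(early_stopping_epoch([0.9, 0.8, 0.7, 0.6, 0.5]))    # None
```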

# Save and restore
## Weights only
Save and load the weights of a model in the *TensorFlow checkpoint* file format using `tf.keras.Model.save_weights`:

In [27]:
# Save weights to a TensorFlow Checkpoint file
model1.save_weights('./my_model_weights')

# Restore the model's state,
# this requires a model with the same *architecture*.
model1.load_weights('my_model_weights')

<tensorflow.python.training.checkpointable.util.CheckpointLoadStatus at 0x11d00be10>

Weights can also be saved to the *Keras HDF5* format (the default for the multi-backend implementation of Keras):

In [31]:
# Save weights to a HDF5 file (requires the `h5py` package)
model1.save_weights('./my_model_weights.h5', save_format='h5')

# Restore the model's state
model1.load_weights('my_model_weights.h5')

ImportError: `save_weights` requires h5py when saving in hdf5.

## Configuration Only
A model's configuration can be saved - this serializes the model architecture without any weights. A saved configuration can recreate and initialize the same model, even without the code that defined the original model.

> *Caution*: Subclassed models are not serializable because their architecture is defined by the Python code in the body of the *call* method.

Keras supports JSON and YAML serialization formats:


In [33]:
# Serialize a model to JSON format
json_string = model.to_json()

# Recreate the model (freshly initialized)
fresh_model = keras.models.model_from_json(json_string)

# Serialize a model to YAML format
yaml_string = model.to_yaml()

# Recreate the model
fresh_model = keras.models.model_from_yaml(yaml_string)

TypeError: super(type, obj): obj must be an instance or subtype of type

## Entire model
The entire model can be saved to a file that contains the weight values, the model's configuration, and even the optimizer's configuration.

This allows you to checkpoint a model and resume training later - from the exact same state - without access to the original code.

In [35]:
# Create a trivial model
model2 = keras.Sequential([
    keras.layers.Dense(10, activation='softmax', input_shape=(32,)),
    keras.layers.Dense(10, activation='softmax')
])

model2.compile(optimizer='rmsprop',
               loss='categorical_crossentropy',
               metrics=['accuracy'])

model2.fit(data, labels, batch_size=32, epochs=5)

# Save entire model to a HDF5 file
model2.save('my_model.h5')

# Recreate the exact same model, including weights and optimizer.
model2 = keras.models.load_model('my_model.h5')

# Eager execution
**Eager execution** is an imperative programming environment that evaluates operations immediately. This is not required for Keras, but is supported by `tf.keras` and useful for inspecting your program and debugging.

All of the tf.keras model-building APIs are compatible with eager execution. And while the Sequential and functional APIs can be used, eager execution especially benefits model subclassing and building custom layers—the APIs that require you to write the forward pass as code (instead of the APIs that create models by assembling existing layers).

# Distribution
## Estimators
The Estimators API is used for training models for distributed environments. This targets industry use cases such as distributed training on large datasets that can export a model for production.

A `tf.keras.Model` can be trained with the `tf.estimator` API by converting the model to a `tf.estimator.Estimator` object with `tf.keras.estimator.model_to_estimator`. See Creating Estimators from Keras models.

Note: Enable `eager execution` for debugging Estimator input functions and inspecting data.

In [38]:
model3 = keras.Sequential([keras.layers.Dense(10, activation='softmax'),
                           keras.layers.Dense(10, activation='softmax')])

model3.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
               loss='categorical_crossentropy',
               metrics=['accuracy'])

estimator = keras.estimator.model_to_estimator(model3)

INFO:tensorflow:Using the Keras model provided.
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x10d8f9850>, '_evaluation_master': '', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_device_fn': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': '/var/folders/dy/_9hwtjyn0vd3d7fj95ry9mgw0000gn/T/tmpjPhOs2', '_train_distribute': None, '_save_summary_steps': 100}


## Multiple GPUs
`tf.keras` models can run on multiple GPUs using `tf.contrib.distribute.DistributionStrategy`. This API provides distributed training on multiple GPUs with almost no changes to existing code.

Currently, `tf.contrib.distribute.MirroredStrategy` is the only supported distribution strategy. MirroredStrategy does in-graph replication with synchronous training using all-reduce on a single machine. To use DistributionStrategy with Keras, convert the `tf.keras.Model` to a `tf.estimator.Estimator` with `tf.keras.estimator.model_to_estimator`, then train the estimator.
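
The idea behind synchronous training with all-reduce can be sketched in plain NumPy, without any TensorFlow distribution code. Each "replica" computes a gradient on its own slice of the batch, and the gradients are then averaged so that every replica applies the same update. The linear model, the `mse_grad` helper, and the replica count are assumptions made for this illustration; this is not how `MirroredStrategy` is implemented internally.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.random_sample((32, 10))  # one batch of 32 samples
y = rng.random_sample((32, 1))
w = np.zeros((10, 1))            # weights, mirrored on every replica

def mse_grad(x, y, w):
    """Gradient of the mean squared error of a linear model x @ w."""
    return 2.0 * x.T @ (x @ w - y) / len(x)

num_replicas = 4
# Each replica processes an equal slice of the input batch.
x_slices = np.split(x, num_replicas)
y_slices = np.split(y, num_replicas)
replica_grads = [mse_grad(xs, ys, w) for xs, ys in zip(x_slices, y_slices)]

# "All-reduce": average the per-replica gradients.
reduced = np.mean(replica_grads, axis=0)

# With equal slices, the averaged gradient matches the full-batch gradient.
full = mse_grad(x, y, w)
print(np.allclose(reduced, full))  # True
```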

The following example distributes a `tf.keras.Model` across multiple GPUs on a single machine.

First, define a simple model:

In [41]:
model4 = keras.Sequential()
model4.add(keras.layers.Dense(16, activation='relu', input_shape=(10,)))
model4.add(keras.layers.Dense(1, activation='sigmoid'))

optimizer = tf.train.GradientDescentOptimizer(0.2)

model4.compile(loss='binary_crossentropy', optimizer=optimizer)
model4.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_15 (Dense)             (None, 16)                176       
_________________________________________________________________
dense_16 (Dense)             (None, 1)                 17        
Total params: 193
Trainable params: 193
Non-trainable params: 0
_________________________________________________________________


Define an *input pipeline*. The `input_fn` returns a `tf.data.Dataset` object used to distribute the data across multiple devices - with each device processing a slice of the input batch.

In [42]:
def input_fn():
    x = np.random.random((1024, 10))
    y = np.random.randint(2, size=(1024, 1))  # random binary labels (0 or 1)
    x = tf.cast(x, tf.float32)
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    dataset = dataset.repeat(10)
    dataset = dataset.batch(32)
    return dataset

Next, create a `tf.estimator.RunConfig` and set the *train_distribute* argument to the `tf.contrib.distribute.MirroredStrategy` instance. When creating MirroredStrategy, you can specify a list of devices or set the *num_gpus* argument. The default uses all available GPUs, like the following:

In [47]:
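# With no arguments, MirroredStrategy uses all available GPUs.
# On a machine without GPUs this can raise the AssertionError shown below;
# pass `devices` or `num_gpus` to choose devices explicitly.
strategy = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=strategy)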

AssertionError: Must specify at least one device.

Convert the Keras model to a `tf.estimator.Estimator` instance:

In [44]:
keras_estimator = keras.estimator.model_to_estimator(
    keras_model=model4,
    config=config,
    model_dir='./model_dir'
)

NameError: name 'config' is not defined

Finally, train the `Estimator` instance by providing the `input_fn` and `steps` arguments:

In [45]:
keras_estimator.train(input_fn=input_fn, steps=10)

NameError: name 'keras_estimator' is not defined