# Model Training and Evaluation with Estimator API
In this lab, we use the [TensorFlow Estimator API](https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator) to build, train, and evaluate a [DNNLinearCombinedClassifier](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNLinearCombinedClassifier). This lab covers the following:
1. Implement a data **input_fn** using the **transform schema**
2. Create **feature columns** using the **transform schema**
3. Instantiate a premade **DNNLinearCombinedClassifier**
4. **Train** and **evaluate** the model
5. **Export** the model for **serving**

<br/>
<img valign="middle" src="imgs/tf-estimator.png" width="800">

In [1]:
import os
import tensorflow as tf
import tensorflow.io as tf_io
import tensorflow_data_validation as tfdv
import tensorflow_transform as tft

print('TF version: {}'.format(tf.__version__))
print('TFDV version: {}'.format(tfdv.__version__))
print('TFT version: {}'.format(tft.__version__))

TF version: 1.15.0
TFDV version: 0.15.0
TFT version: 0.15.0


In [2]:
WORKSPACE = 'workspace' # you can set to a GCS location
TRANSFORMED_DATA_DIR = os.path.join(WORKSPACE, 'transformed_data')
TRANSFORM_ARTEFACTS_DIR = os.path.join(WORKSPACE, 'transform_artifacts')
TRANSFORMED_TRAIN_DATA_FILE = os.path.join(TRANSFORMED_DATA_DIR,'train-*.tfrecords')
TRANSFORMED_EVAL_DATA_FILE = os.path.join(TRANSFORMED_DATA_DIR,'eval-*.tfrecords')
RAW_SCHEMA_LOCATION = os.path.join(WORKSPACE, 'raw_schema.pbtxt')
MODELS_DIR = os.path.join(WORKSPACE, 'models')

### Load TFT Outputs

In [3]:
transform_output = tft.TFTransformOutput(TRANSFORM_ARTEFACTS_DIR)

## 1. Implement TFRecords Input function
* Use [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) APIs: **list_files()**, **skip()**, **map()**, **filter()**, **batch()**, **shuffle()**, **repeat()**, **prefetch()**, **cache()**, etc.
* Use [tf.data.experimental.make_csv_dataset](https://www.tensorflow.org/api_docs/python/tf/data/experimental/make_csv_dataset) to read and parse CSV data files.
* Use [tf.data.experimental.make_batched_features_dataset](https://www.tensorflow.org/api_docs/python/tf/data/experimental/make_batched_features_dataset) to read and parse TFRecords data files.

In [4]:
def make_input_fn(tfrecords_files, 
                  batch_size, num_epochs=1, shuffle=False):
   
    def input_fn():
        dataset = tf.data.experimental.make_batched_features_dataset(
            file_pattern=tfrecords_files,
            batch_size=batch_size,
            features=transform_output.transformed_feature_spec(),
            label_key=TARGET_FEATURE_NAME,
            reader=tf.data.TFRecordDataset,
            num_epochs=num_epochs,
            shuffle=shuffle
        )
        return dataset

    return input_fn

## 2. Create Feature Columns

<br/>
<img valign="middle" src="imgs/feature-columns.png" width="800">

Base feature columns
  1. [numeric_column](https://www.tensorflow.org/api_docs/python/tf/feature_column/numeric_column)
  2. [categorical_column_with_vocabulary_list](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_list)
  3. [categorical_column_with_vocabulary_file](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_file)
  4. [categorical_column_with_identity](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_identity)
  5. [categorical_column_with_hash_bucket](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_hash_bucket)

Extended feature columns
  1. [bucketized_column](https://www.tensorflow.org/api_docs/python/tf/feature_column/bucketized_column)
  2. [indicator_column](https://www.tensorflow.org/api_docs/python/tf/feature_column/indicator_column)
  3. [crossed_column](https://www.tensorflow.org/api_docs/python/tf/feature_column/crossed_column)
  4. [embedding_column](https://www.tensorflow.org/api_docs/python/tf/feature_column/embedding_column)
  

In [5]:
TARGET_FEATURE_NAME = 'income_bracket'
TARGET_LABELS = [' <=50K', ' >50K']
WEIGHT_COLUMN_NAME = 'fnlwgt'

Creating feature columns can be **metadata** driven, with the help of the **transform schema**!

In [7]:
import math

def create_feature_columns():
    
    wide_columns = []
    deep_columns = []
    
    transformed_features = transform_output.transformed_metadata.schema.feature

    for feature in transformed_features:
        if feature.name in [TARGET_FEATURE_NAME, WEIGHT_COLUMN_NAME]:
            continue

        # Categorical features
        if hasattr(feature, 'int_domain') and feature.int_domain.is_categorical:
            vocab_size = feature.int_domain.max + 1
            
            # Create a categorical feature column with identity
            categorical_feature_column = tf.feature_column.categorical_column_with_identity(
                feature.name, num_buckets=vocab_size
            )
            
            wide_columns.append(categorical_feature_column)
            
            
            # Create embedding column
            embedding_feature_column = tf.feature_column.embedding_column(
                categorical_feature_column, 
                dimension = int(math.sqrt(vocab_size))
            )
            
            deep_columns.append(embedding_feature_column)
            
        
        # Numeric features
        else:
            deep_columns.append(
                tf.feature_column.numeric_column(feature.name))
            
    # Create a crossed feature column from education and workclass.
    education_X_workclass = tf.feature_column.crossed_column(
        ['education_integerized', 'workclass_integerized'], hash_bucket_size=int(1e4))
    wide_columns.append(education_X_workclass)

    # Create embeddings for the crossed column.
    education_X_workclass_embedded = tf.feature_column.embedding_column(
        education_X_workclass, dimension=10)
    deep_columns.append(education_X_workclass_embedded)
    
    return wide_columns, deep_columns
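
The two sizing rules used in `create_feature_columns` can be checked without TensorFlow. The following is a minimal sketch: the vocabulary sizes are made-up illustrations, and the hypothetical helper `cross_bucket` uses `hashlib.md5` as a stand-in for the hash that `crossed_column` applies internally, so its bucket ids will not match TensorFlow's.

```python
import hashlib
import math

# Hypothetical vocabulary sizes, for illustration only. In the notebook the
# real values come from feature.int_domain.max + 1 in the transform schema.
vocab_sizes = {'education': 16, 'occupation': 15, 'native_country': 42}

# Same rule as in create_feature_columns: dimension = int(sqrt(vocab_size)).
embedding_dims = {
    name: int(math.sqrt(size)) for name, size in vocab_sizes.items()
}
print(embedding_dims)


def cross_bucket(values, hash_bucket_size):
    """Map a combination of feature values to one of hash_bucket_size buckets.

    md5 is an assumption made for this sketch, not TensorFlow's hash function.
    """
    key = '_X_'.join(str(v) for v in values).encode('utf-8')
    return int(hashlib.md5(key).hexdigest(), 16) % hash_bucket_size


# Two integerized feature values (made up) crossed into 10,000 buckets.
bucket = cross_bucket([3, 7], hash_bucket_size=int(1e4))
print(bucket)
```

The point of hashing is that the cross never needs an explicit vocabulary: any pair of values lands in a fixed-size id space, at the cost of occasional collisions.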

In [8]:
#wide, deep = create_feature_columns()

## 3. Instantiate a [Wide and Deep Estimator](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNLinearCombinedClassifier)
1. An ML model that combines **generalization** (deep part) and **memorization** (wide part).
2. **Categorical** (sparse) features are fed into the **wide** part, while **numerical** (dense) features are fed into the **deep** part.
3. You can use different representations of the same feature in both parts:
    1. **Categorical** features can be **embedded** and fed into the **deep** part.
    2. **Numerical** features can be **bucketized** and fed into the **wide** part.
    
See [Wide & Deep Learning: Better Together with TensorFlow](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html) blog post for more details.

![alt text](imgs/wide-n-deep.png "Wide ")
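
As a rough illustration of how the two parts combine, the sketch below computes a wide logit and a deep logit separately, sums them, and applies a sigmoid. All weights and inputs are made-up numbers, and the single hidden layer is far smaller than a real DNN; only the summing of the two logits is meant to mirror the combined estimator.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def wide_logit(active_ids, weights, bias):
    # Linear model over sparse features: sum the weights of the active ids.
    return bias + sum(weights[i] for i in active_ids)


def deep_logit(dense_inputs, hidden_weights, output_weights):
    # One ReLU hidden layer followed by a single linear output unit.
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, dense_inputs)))
        for row in hidden_weights
    ]
    return sum(w * h for w, h in zip(output_weights, hidden))


# Made-up parameters and inputs, for illustration only.
wide = wide_logit(active_ids=[0, 2], weights=[0.5, -1.0, 0.25], bias=-0.5)
deep = deep_logit(
    dense_inputs=[1.0, 2.0],
    hidden_weights=[[0.5, -0.25], [1.0, 0.5]],
    output_weights=[1.0, -0.5],
)

# The two parts are trained jointly; their logits are added before the sigmoid.
combined_logit = wide + deep
probability = sigmoid(combined_logit)
print(wide, deep, combined_logit, probability)
```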

### Implement adaptive learning rate
* [exponential_decay](https://www.tensorflow.org/api_docs/python/tf/train/exponential_decay)
* [cosine_decay](https://www.tensorflow.org/api_docs/python/tf/train/cosine_decay)
* [linear_cosine_decay](https://www.tensorflow.org/api_docs/python/tf/train/linear_cosine_decay)
* [cosine_decay_restarts](https://www.tensorflow.org/api_docs/python/tf/train/cosine_decay_restarts)
* [polynomial_decay](https://www.tensorflow.org/api_docs/python/tf/train/polynomial_decay)
* [piecewise_constant_decay](https://www.tensorflow.org/api_docs/python/tf/train/piecewise_constant_decay)
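
To get a feel for the schedule that `cosine_decay_restarts` produces, the sketch below re-implements the formula in plain Python. It follows the documented behaviour as I understand it (cosine decay within a period, a restart at the end of each period, period lengths multiplied by `t_mul`); treat it as an approximation for intuition rather than a drop-in replacement for the TensorFlow op.

```python
import math


def cosine_decay_restarts(initial_lr, step, first_decay_steps,
                          t_mul=2.0, m_mul=1.0, alpha=0.0):
    """Plain-Python rendering of cosine decay with warm restarts."""
    completed = step / first_decay_steps
    if t_mul == 1.0:
        # All periods have the same length.
        i_restart = math.floor(completed)
        completed -= i_restart
    else:
        # Index of the current period when each period is t_mul times longer.
        i_restart = math.floor(
            math.log(1.0 - completed * (1.0 - t_mul)) / math.log(t_mul))
        elapsed_periods = (1.0 - t_mul ** i_restart) / (1.0 - t_mul)
        completed = (completed - elapsed_periods) / t_mul ** i_restart

    # m_mul scales the peak learning rate after each restart.
    amplitude = m_mul ** i_restart
    cosine = 0.5 * amplitude * (1.0 + math.cos(math.pi * completed))
    return initial_lr * ((1.0 - alpha) * cosine + alpha)


# With first_decay_steps=50 and t_mul=2.0 the periods are 50, 100, 200, ... steps.
for step in (0, 25, 49, 50, 100, 149):
    lr = cosine_decay_restarts(0.001, step, first_decay_steps=50)
    print(step, lr)
```

The learning rate falls from its initial value towards zero within each period and jumps back up at steps 50, 150, 350, and so on.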

In [9]:
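def update_optimizer(initial_learning_rate, decay_steps):
    # NOTE: decay_steps is accepted but not used; the first restart period
    # is fixed at 50 steps and doubles after each restart (t_mul=2.0).
    learning_rate = tf.train.cosine_decay_restarts(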
        initial_learning_rate,
        tf.train.get_global_step(),
        first_decay_steps=50,
        t_mul=2.0,
        m_mul=1.0,
        alpha=0.0,
    )
    
    tf.summary.scalar('learning_rate', learning_rate)
    return tf.train.AdamOptimizer(learning_rate=learning_rate)

### Add an evaluation metric
* [tf.metrics](https://www.tensorflow.org/api_docs/python/tf/metrics)
* [tf.estimator.add_metrics](https://www.tensorflow.org/api_docs/python/tf/estimator/add_metrics)


In [10]:
def metric_fn(labels, predictions):
    
    metrics = {}
    # Map the string labels to their integer indices in TARGET_LABELS.
    label_index = tf.contrib.lookup.index_table_from_tensor(
        tf.constant(TARGET_LABELS)).lookup(labels)
    
    # Mean of the per-class accuracies, so each class contributes equally.
    metrics['mean_per_class_accuracy'] = tf.metrics.mean_per_class_accuracy(
        labels=label_index,
        predictions=predictions['class_ids'],
        num_classes=len(TARGET_LABELS)
    )
    
    return metrics
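
Mean per-class accuracy differs from plain accuracy when the classes are imbalanced, which is why it is worth adding as an extra metric. The plain-Python sketch below illustrates this with made-up labels and predictions; the helper names are hypothetical, and classes without any examples are skipped here as a simplification.

```python
def accuracy(labels, predictions):
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def mean_per_class_accuracy(labels, predictions, num_classes):
    per_class = []
    for c in range(num_classes):
        # Indices of the examples whose true label is class c.
        idx = [i for i, y in enumerate(labels) if y == c]
        if not idx:
            continue  # Simplification: ignore classes with no examples.
        correct = sum(1 for i in idx if predictions[i] == c)
        per_class.append(correct / len(idx))
    return sum(per_class) / len(per_class)


# Made-up data: 8 examples of class 0, 2 of class 1, and a model that
# always predicts class 0.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predictions = [0] * 10

overall = accuracy(labels, predictions)
per_class_mean = mean_per_class_accuracy(labels, predictions, num_classes=2)
print(overall, per_class_mean)
```

A model that ignores the minority class still reaches 80% accuracy here, while its mean per-class accuracy is only 50%.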

In [11]:
def create_estimator(params, run_config):
    
    wide_columns, deep_columns = create_feature_columns()
    
    estimator = tf.estimator.DNNLinearCombinedClassifier(

        n_classes=len(TARGET_LABELS),
        label_vocabulary=TARGET_LABELS,
        weight_column=WEIGHT_COLUMN_NAME,

        dnn_feature_columns=deep_columns,
        dnn_optimizer=lambda: update_optimizer(
            params.learning_rate, params.max_steps),
#         dnn_optimizer=tf.train.AdamOptimizer(
#           learning_rate=params.learning_rate),
        dnn_hidden_units=params.hidden_units,
        dnn_dropout=params.dropout,
        dnn_activation_fn=tf.nn.relu,
        batch_norm=True,

        linear_feature_columns=wide_columns,
        linear_optimizer='Ftrl',

        config=run_config
    )
    
    estimator = tf.estimator.add_metrics(
        estimator, metric_fn)
    
    return estimator

## 4. Run a Train and Evaluate Experiment
**Delete** the **model_dir** directory if you don't want a **warm start**.
* If it is not deleted and you **alter** the model, training will fail because the saved checkpoint no longer matches the model.

[TrainSpec](https://www.tensorflow.org/api_docs/python/tf/estimator/TrainSpec)
* Set **shuffle** in the **input_fn** to **True**
* Set **num_epochs** in the **input_fn** to **None**
* Set **max_steps**. One batch (feed-forward pass and backpropagation) corresponds to one training step.

[EvalSpec](https://www.tensorflow.org/api_docs/python/tf/estimator/EvalSpec)
* Set **shuffle** in the **input_fn** to **False**
* Set **num_epochs** in the **input_fn** to **1**
* Set **steps** to **None** if you want to use all the evaluation data. 
* Otherwise, set **steps** to the number of batches you want to use for evaluation, and set **shuffle** to True.
* Set **start_delay_secs** to 0  to start evaluation as soon as a checkpoint is produced.
* Set **throttle_secs** to 0 to re-evaluate as soon as a new checkpoint is produced.

### 4.1 Implement an experiment

In [13]:
def run_experiment(estimator, params, run_config, 
                   resume=False, train_hooks=None):
    
    tf.logging.set_verbosity(tf.logging.INFO)
    
    if not resume: 
        if tf_io.gfile.exists(run_config.model_dir):
            print("Removing previous model checkpoints...")
            tf_io.gfile.rmtree(run_config.model_dir)
    else:
        print("Resuming training...")

    # Create train specs.
    train_spec = tf.estimator.TrainSpec(
        input_fn = make_input_fn(
            TRANSFORMED_TRAIN_DATA_FILE,
            batch_size=params.batch_size,
            num_epochs=None, # Run until the max_steps is reached.
            shuffle=True
        ),
        max_steps=params.max_steps,
        hooks=train_hooks
    )

    # Create eval specs.
    eval_spec = tf.estimator.EvalSpec(
        input_fn = make_input_fn(
            TRANSFORMED_EVAL_DATA_FILE,
            batch_size=params.batch_size, 
        ),
        exporters=None, # This can be set to export a saved model 
        start_delay_secs=0,
        throttle_secs=0,
        steps=None # Set to limit number of steps for evaluation.
    )
  

    print("Experiment started...")
    print(".......................................")
  
    # Run train and evaluate.
    tf.estimator.train_and_evaluate(
        estimator=estimator,
        train_spec=train_spec, 
        eval_spec=eval_spec
    )

    print(".......................................")
    print("Experiment finished.")
    print("")


### Set parameters and run_config.

* Set **model_dir** in the **run_config**
* If the data **size is known**, the number of training **steps** for a given number of **epochs** is: **ceil(training_size / batch_size) * epochs**
* By default, a **checkpoint** is saved every 600 seconds. Because evaluation runs only on new checkpoints, the model is **evaluated** at most every 10 minutes.
* To change this behaviour, set one of the following parameters in the **run_config**:
    * **save_checkpoints_secs**: save a checkpoint every this many **seconds**.
    * **save_checkpoints_steps**: save a checkpoint every this many **steps**.
* Set the number of checkpoints to keep using **keep_checkpoint_max**.
* Set the **train_distribute** and/or **eval_distribute** strategy for multi-GPU training.
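
As a worked example of the step arithmetic, the sketch below uses the training-set size and batch size of this notebook. Rounding up means a final, partially filled batch still counts as one step; the lowercase variable names are only for this illustration.

```python
import math

train_data_size = 32561  # Number of training examples in this notebook.
batch_size = 128
epochs = 1000

# 254 full batches cover 32512 examples, leaving 49 for one more batch.
full_batches, remainder = divmod(train_data_size, batch_size)

steps_per_epoch = math.ceil(train_data_size / batch_size)
max_steps = steps_per_epoch * epochs
print(full_batches, remainder, steps_per_epoch, max_steps)
```

Setting `save_checkpoints_steps` to `steps_per_epoch` then gives roughly one checkpoint, and therefore one evaluation, per epoch.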

In [15]:
RANDOM_SEED = 19820610

class Parameters():
    pass

MODEL_NAME = 'dnn_classifier'
MODEL_DIR = os.path.join(MODELS_DIR, MODEL_NAME)

TRAIN_DATA_SIZE = 32561

params = Parameters()
params.learning_rate = 0.001
params.hidden_units = [128, 128, 128]
params.dropout = 0.15
params.batch_size =  128

# Set number of steps with respect to epochs.
epochs = 1000
steps_per_epoch = int(math.ceil(TRAIN_DATA_SIZE / params.batch_size))
params.max_steps = steps_per_epoch * epochs

run_config = tf.estimator.RunConfig(
    tf_random_seed=RANDOM_SEED,
    save_checkpoints_steps=steps_per_epoch, # Save a checkpoint after each epoch, evaluate the model after each epoch.
    keep_checkpoint_max=3, # Keep the 3 most recently  produced checkpoints.
    model_dir=MODEL_DIR,
    save_summary_steps=100, # Summary steps for Tensorboard.
    log_step_count_steps=50
)

### 4.2 Run experiment with early stopping hook
* [stop_if_higher_hook](https://www.tensorflow.org/api_docs/python/tf/contrib/estimator/stop_if_higher_hook) 
* [stop_if_lower_hook](https://www.tensorflow.org/api_docs/python/tf/contrib/estimator/stop_if_lower_hook) 
* [stop_if_no_increase_hook](https://www.tensorflow.org/api_docs/python/tf/contrib/estimator/stop_if_no_increase_hook)
* [stop_if_no_decrease_hook](https://www.tensorflow.org/api_docs/python/tf/contrib/estimator/stop_if_no_decrease_hook)
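
The "no increase" and "no decrease" hooks compare the latest evaluated step with the step at which the monitored metric was last best, and request a stop once that gap reaches a threshold. The sketch below is a simplified plain-Python model of the "no increase" rule; `should_stop` and the evaluation history are hypothetical and are not part of the TensorFlow API.

```python
def should_stop(eval_results, max_steps_without_increase):
    """Return True if the metric has not improved for too many steps.

    eval_results maps a global step to the metric value measured at that step.
    """
    best_value = None
    best_step = None
    for step in sorted(eval_results):
        value = eval_results[step]
        if best_value is None or value > best_value:
            best_value, best_step = value, step
        if step - best_step >= max_steps_without_increase:
            return True
    return False


# Made-up evaluation history: accuracy peaks at step 510.
history = {255: 0.80, 510: 0.82, 765: 0.81, 1020: 0.815}

print(should_stop(history, max_steps_without_increase=500))
print(should_stop(history, max_steps_without_increase=600))
```

Because the rule only sees steps that were actually evaluated, a threshold much smaller than the checkpoint interval can trigger a stop after a single evaluation without improvement.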

In [18]:
estimator = create_estimator(params, run_config)

early_stopping_hook = tf.estimator.experimental.stop_if_no_increase_hook(
    estimator,
    'accuracy',
    max_steps_without_increase=100,
    run_every_secs=None,
    run_every_steps=500
)

%time run_experiment(estimator, params, run_config, train_hooks=[early_stopping_hook])

INFO:tensorflow:Using config: {'_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 50, '_save_checkpoints_steps': 255, '_master': '', '_session_creation_timeout_secs': 7200, '_eval_distribute': None, '_num_worker_replicas': 1, '_experimental_distribute': None, '_num_ps_replicas': 0, '_global_id_in_cluster': 0, '_protocol': None, '_keep_checkpoint_max': 3, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fa4ec69e240>, '_tf_random_seed': 19820610, '_train_distribute': None, '_evaluation_master': '', '_save_checkpoints_secs': None, '_task_type': 'worker', '_model_dir': 'workspace/models/dnn_classifier', '_service': None, '_task_id': 0, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_is_chief': True, '_experimental_max_worker_delay_secs': None, '_save_summary_steps': 100, '_device_fn': None}
INFO:tensorflow:Using config: {'_keep_checkpoint_every_n_hours': 10000, '_lo

## 5. Export the Model for Serving 

### 5.1 Implement a serving_input_receiver_fn
This function exposes a **raw** data interface, then applies the **transformation** to the received features.

In [19]:
from tensorflow_transform.tf_metadata import schema_utils

def _serving_input_receiver_fn():
    
    source_raw_schema = tfdv.load_schema_text(RAW_SCHEMA_LOCATION)
    raw_feature_spec = schema_utils.schema_as_feature_spec(source_raw_schema).feature_spec
    raw_feature_spec.pop(TARGET_FEATURE_NAME)
    raw_feature_spec.pop(WEIGHT_COLUMN_NAME)

    # Create the interface for the serving function with the raw features
    raw_features = tf.estimator.export.build_parsing_serving_input_receiver_fn(
        raw_feature_spec)().features

    receiver_tensors = {
        feature: tf.placeholder(shape=[None], dtype=raw_features[feature].dtype) 
        for feature in raw_features
    }

    receiver_tensors_expanded = {
        tensor: tf.reshape(receiver_tensors[tensor], (-1, 1)) 
        for tensor in receiver_tensors
    }

    # Apply the transform function 
    transformed_features = transform_output.transform_raw_features(
        receiver_tensors_expanded)

    return tf.estimator.export.ServingInputReceiver(
        transformed_features, receiver_tensors)

### 5.2 Export SavedModel

In [25]:
tf.logging.set_verbosity(tf.logging.ERROR)

export_dir = os.path.join(MODEL_DIR, 'export')

if tf.gfile.Exists(export_dir):
    tf.gfile.DeleteRecursively(export_dir)
        
saved_model_location = estimator.export_savedmodel(
    export_dir_base=export_dir,
    serving_input_receiver_fn=_serving_input_receiver_fn
)

print(saved_model_location)

b'workspace/models/dnn_classifier/export/1581339146'


In [26]:
!saved_model_cli show --dir=${saved_model_location} --all


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['age'] tensor_info:
        dtype: DT_INT64
        shape: (-1)
        name: Placeholder_9:0
    inputs['capital_gain'] tensor_info:
        dtype: DT_INT64
        shape: (-1)
        name: Placeholder:0
    inputs['capital_loss'] tensor_info:
        dtype: DT_INT64
        shape: (-1)
        name: Placeholder_11:0
    inputs['education'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: Placeholder_4:0
    inputs['education_num'] tensor_info:
        dtype: DT_INT64
        shape: (-1)
        name: Placeholder_7:0
    inputs['gender'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: Placeholder_3:0
    inputs['hours_per_week'] tensor_info:
        dtype: DT_INT64
        shape: (-1)
        name: Placeholder_10:0
    inputs['marital_status'] tensor_info:
   

In [29]:
predictor_fn = tf.contrib.predictor.from_saved_model(
    export_dir = saved_model_location,
    signature_def_key="predict"
)

output = predictor_fn(
    {
        'age': [34.0],
        'workclass': ['Private'],
        'education': ['Doctorate'],
        'education_num': [10.0],
        'marital_status': ['Married-civ-spouse'],
        'occupation': ['Prof-specialty'],
        'relationship': ['Husband'],
        'race': ['White'],
        'gender': ['Male'],
        'capital_gain': [0.0], 
        'capital_loss': [0.0], 
        'hours_per_week': [40.0],
        'native_country':['England']
    }
)

output

{'all_class_ids': array([[0, 1]], dtype=int32),
 'all_classes': array([[b' <=50K', b' >50K']], dtype=object),
 'class_ids': array([[0]]),
 'classes': array([[b' <=50K']], dtype=object),
 'logistic': array([[0.10173763]], dtype=float32),
 'logits': array([[-2.178065]], dtype=float32),
 'probabilities': array([[0.8982623 , 0.10173761]], dtype=float32)}
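
The fields of the prediction output are related to each other. For a binary classifier, `logistic` is the sigmoid of `logits`, `probabilities` is the pair (1 - logistic, logistic), and `class_ids` is the index of the larger probability. The sketch below checks this in plain Python against the numbers printed above.

```python
import math

logit = -2.178065  # The 'logits' value from the prediction output.

# Sigmoid of the logit gives the probability of the positive class (' >50K').
logistic = 1.0 / (1.0 + math.exp(-logit))

# Probabilities for [' <=50K', ' >50K'].
probabilities = [1.0 - logistic, logistic]

# The predicted class is the index with the highest probability.
class_id = max(range(len(probabilities)), key=lambda i: probabilities[i])
print(logistic, probabilities, class_id)
```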

### Serializing the estimator object for use in the following lab

In [None]:
import joblib
joblib.dump(estimator, './estimator.joblib')

In [33]:
#!pip install -q tensorflow-model-analysis

In [32]:
import tensorflow_model_analysis as tfma

### 5.3 Implement an eval_input_receiver_fn
This function expects serialized **raw** examples, applies the **transformation**, and passes both the raw and the **transformed** features to TFMA.

In [35]:
def eval_input_receiver_fn():
    
    source_raw_schema = tfdv.load_schema_text(RAW_SCHEMA_LOCATION)
    raw_feature_spec = schema_utils.schema_as_feature_spec(source_raw_schema).feature_spec
    serving_input_receiver = tf.estimator.export.build_parsing_serving_input_receiver_fn(
        raw_feature_spec, default_batch_size=None)()
    
    features = serving_input_receiver.features.copy()
    transformed_features = transform_output.transform_raw_features(features)

    features.update(transformed_features)
    
    return tfma.export.EvalInputReceiver(
        features=features,
        receiver_tensors=serving_input_receiver.receiver_tensors,
        labels=transformed_features[TARGET_FEATURE_NAME]
    )

In [36]:
eval_model_dir = os.path.join(MODEL_DIR, "export/evaluate")
if tf_io.gfile.exists(eval_model_dir):
    tf_io.gfile.rmtree(eval_model_dir)

tfma.export.export_eval_savedmodel(
        estimator=estimator,
        export_dir_base=eval_model_dir,
        eval_input_receiver_fn=eval_input_receiver_fn
)

b'workspace/models/dnn_classifier/export/evaluate/1581340469'

In [40]:
import joblib
joblib.dump(estimator, './estimator.joblib')

['./estimator.joblib']

In [41]:
e = joblib.load('./estimator.joblib')
tfma.export.export_eval_savedmodel(
        estimator=e,
        export_dir_base=eval_model_dir,
        eval_input_receiver_fn=eval_input_receiver_fn
)

b'workspace/models/dnn_classifier/export/evaluate/1581340812'