## Estimator API

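`tf.estimator.Estimator` is the base class of the Estimator API; canned estimators such as `tf.estimator.LinearRegressor`, used in the demo below, are built on it.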

### Demo: Housing Price Model

In [4]:
import tensorflow as tf
import numpy as np
import shutil

In [5]:
shutil.rmtree("outdir", ignore_errors=True)  # start fresh each time

In [7]:
def train_input_fn():
    features = {"sq_footage": [1000, 2000, 3000, 1000, 2000, 3000],
                "type":       ["house","house","house","apt","apt","apt"]}
    labels =                  [ 500, 1000, 1500,  700, 1300, 1900]   # in thousands
    return features, labels

featcols = [
    tf.feature_column.numeric_column("sq_footage"),
    tf.feature_column.categorical_column_with_vocabulary_list("type", ["house", "apt"])
]

model = tf.estimator.LinearRegressor(feature_columns=featcols, model_dir="outdir")

model.train(train_input_fn, steps=2000)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_master': '', '_evaluation_master': '', '_is_chief': True, '_task_id': 0, '_num_worker_replicas': 1, '_keep_checkpoint_max': 5, '_train_distribute': None, '_tf_random_seed': None, '_task_type': 'worker', '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_save_summary_steps': 100, '_log_step_count_steps': 100, '_num_ps_replicas': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f56bb51ee10>, '_session_config': None, '_global_id_in_cluster': 0, '_model_dir': 'outdir'}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into outdir/model.ckpt.
INFO:tensorflow:step = 1, loss = 9290000.0
INFO:

<tensorflow.python.estimator.canned.linear.LinearRegressor at 0x7f56bb51ec50>

In [9]:
def predict_input_fn():
    features = {"sq_footage": [1500, 1500, 2500, 2500],
                "type": ["house", "apt", "house", "apt"]}
    return features

predictions = model.predict(predict_input_fn)

print(next(predictions))
print(next(predictions))
print(next(predictions))
print(next(predictions))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from outdir/model.ckpt-2000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
{'predictions': array([844.07806], dtype=float32)}
{'predictions': array([871.82715], dtype=float32)}
{'predictions': array([1414.5388], dtype=float32)}
{'predictions': array([1442.2878], dtype=float32)}
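
Because `LinearRegressor` fits a linear model, the four predictions above should be explained by one price-per-square-foot slope plus a per-type offset. The sketch below is a plain-Python sanity check on the printed values only; the helper names are illustrative, and it does not read the model's actual weights.

```python
# Predictions printed above, keyed by (sq_footage, type).
preds = {
    (1500, "house"): 844.07806,
    (1500, "apt"): 871.82715,
    (2500, "house"): 1414.5388,
    (2500, "apt"): 1442.2878,
}

def slope(kind):
    """Price change per extra square foot for one property type."""
    return (preds[(2500, kind)] - preds[(1500, kind)]) / (2500 - 1500)

def offset(kind):
    """Intercept for one type (model bias plus that type's weight)."""
    return preds[(1500, kind)] - slope(kind) * 1500

slope_house, slope_apt = slope("house"), slope("apt")
apt_premium = offset("apt") - offset("house")

print(f"slope: house={slope_house:.4f}, apt={slope_apt:.4f} (thousands per sq ft)")
print(f"apt minus house offset: {apt_premium:.2f} (thousands)")
```

Both slopes come out at roughly 0.57, and apartments are priced about 27.7 (thousand) above houses of the same size. The model therefore shares a single slope across types, even though the training labels rise faster for apartments (0.6 per square foot) than for houses (0.5).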


### Checkpointing

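Estimators checkpoint automatically during training. Pass a model directory (`model_dir`), and `train` writes checkpoints there; a later `train` call on the same directory resumes from the latest checkpoint instead of starting over: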
```
model = tf.estimator.LinearRegressor(feature_columns=featcols, model_dir="./model_trained")

model.train(train_input_fn, steps=2000)
```

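Listing the model directory shows the saved checkpoints (`%ls model_trained` for the snippet above; the demo model was written to `outdir`):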

In [10]:
%ls outdir

checkpoint
events.out.tfevents.1548844028.jxw-Inspiron-7460
graph.pbtxt
model.ckpt-1.data-00000-of-00001
model.ckpt-1.index
model.ckpt-1.meta
model.ckpt-2000.data-00000-of-00001
model.ckpt-2000.index
model.ckpt-2000.meta
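
Each checkpoint file name carries the global step at which it was saved, so the listing above shows saves at step 1 and step 2000. A minimal sketch, assuming the `model.ckpt-<step>` naming seen in that listing, that recovers the latest step from file names alone (`latest_step` is a hypothetical helper, not a TensorFlow function):

```python
import re

# File names copied from the directory listing above.
files = [
    "checkpoint",
    "events.out.tfevents.1548844028.jxw-Inspiron-7460",
    "graph.pbtxt",
    "model.ckpt-1.data-00000-of-00001",
    "model.ckpt-1.index",
    "model.ckpt-1.meta",
    "model.ckpt-2000.data-00000-of-00001",
    "model.ckpt-2000.index",
    "model.ckpt-2000.meta",
]

def latest_step(names):
    """Return the highest step among model.ckpt-<step>.index files, or None."""
    matches = (re.fullmatch(r"model\.ckpt-(\d+)\.index", n) for n in names)
    steps = [int(m.group(1)) for m in matches if m]
    return max(steps) if steps else None

print(latest_step(files))  # 2000
```

This agrees with the `Restoring parameters from outdir/model.ckpt-2000` line in the prediction log. In real code, `tf.train.latest_checkpoint(model_dir)` answers the same question by reading the `checkpoint` file.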


### Training on in-memory datasets

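In-memory data usually lives in NumPy arrays or pandas DataFrames, and both can be fed to an estimator directly through `tf.estimator.inputs.numpy_input_fn` and `tf.estimator.inputs.pandas_input_fn`.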

![train_input](train_input.png)
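
The in-memory input helpers in TF 1.x (`tf.estimator.inputs.numpy_input_fn` and `pandas_input_fn`) are configured mainly through `batch_size`, `num_epochs`, and `shuffle`. The following is a plain-Python sketch of what those three settings mean for data held in memory. It does not call TensorFlow, `batches` is a hypothetical helper, and details such as queueing or how the real helpers treat epoch boundaries are not reproduced.

```python
import random

features = {"sq_footage": [1000, 2000, 3000, 1000, 2000, 3000],
            "type": ["house", "house", "house", "apt", "apt", "apt"]}
labels = [500, 1000, 1500, 700, 1300, 1900]  # in thousands

def batches(features, labels, batch_size, num_epochs, shuffle, seed=0):
    """Yield (feature_dict, label_list) mini-batches from in-memory columns."""
    rng = random.Random(seed)
    n = len(labels)
    for _ in range(num_epochs):
        order = list(range(n))
        if shuffle:
            rng.shuffle(order)  # new row order for every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield ({k: [v[i] for i in idx] for k, v in features.items()},
                   [labels[i] for i in idx])

all_batches = list(batches(features, labels, batch_size=4, num_epochs=2, shuffle=True))
for feats, labs in all_batches:
    print(feats["sq_footage"], feats["type"], labs)
```

With six rows, a batch size of 4, and two epochs, the generator yields four batches (4 + 2 rows per epoch), and every row keeps its features aligned with its label after shuffling.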

## Train on large datasets with the Dataset API

![](TextLineDataset.png)
![](TextLineDataset2.png)
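
The Dataset API copes with data that does not fit in memory by reading files lazily and chaining transformations: `tf.data.TextLineDataset` yields one line at a time, and steps such as `map`, `shuffle`, `repeat`, and `batch` are applied to the stream. The block below is only a plain-Python analogy of that streaming pattern built from generators, not TensorFlow code; the CSV column layout and the helper names are assumptions made for illustration.

```python
import csv
import io
import itertools

# Stand-in for a large CSV file; a real job would open files on disk.
csv_text = "1000,house,500\n2000,house,1000\n3000,house,1500\n1000,apt,700\n2000,apt,1300\n"

def read_lines(fileobj):
    """Yield one stripped line at a time, never loading the whole file."""
    for line in fileobj:
        yield line.rstrip("\n")

def decode_line(line):
    """Parse 'sq_footage,type,price' into (features, label)."""
    sq_footage, kind, price = next(csv.reader([line]))
    return {"sq_footage": float(sq_footage), "type": kind}, float(price)

def batch(rows, size):
    """Group a stream of rows into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk

pipeline = batch(map(decode_line, read_lines(io.StringIO(csv_text))), size=2)
batches_out = list(pipeline)
for b in batches_out:
    print(b)
```

Nothing is parsed until a batch is requested, which is the property that lets the same pattern scale from this five-line string to files far larger than memory.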

## Big jobs, Distributed training

![](train_and_evaluate.png)
![](run_config.png)
![](TrainSpec.png)
![](EvalSpec.png)
![](recap.png)

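Shuffling is even more important in distributed training: each worker typically reads only part of the data, so unshuffled input can leave a worker training on a skewed slice.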
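
A small plain-Python sketch of why this matters, assuming two hypothetical workers that each take a contiguous half of the demo data (real sharding strategies vary, and `contiguous_shards` is an illustrative helper):

```python
import random

# Demo rows in their original order: all houses first, then all apartments.
rows = [("house", 500), ("house", 1000), ("house", 1500),
        ("apt", 700), ("apt", 1300), ("apt", 1900)]

def contiguous_shards(data, num_workers):
    """Split data into equal contiguous slices, one per worker."""
    size = len(data) // num_workers
    return [data[w * size:(w + 1) * size] for w in range(num_workers)]

def types_seen(shard):
    """Set of property types present in one worker's shard."""
    return {kind for kind, _ in shard}

unshuffled = contiguous_shards(rows, num_workers=2)
print([types_seen(s) for s in unshuffled])  # each worker sees a single type

shuffled_rows = rows[:]
random.Random(0).shuffle(shuffled_rows)
shuffled = contiguous_shards(shuffled_rows, num_workers=2)
print([types_seen(s) for s in shuffled])
```

Without shuffling, one worker trains only on houses and the other only on apartments. After shuffling, each shard is far more likely to contain a mix of both types.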

<center>**Real World ML Models**</center>

|Problem|Solution|
|:-:|:-:|
|Out-of-memory data|Use the Dataset API|
|Distribution|Use `train_and_evaluate`|
|Need to evaluate during training|Use `train_and_evaluate` + TensorBoard|
|Deployments that scale|Use a serving input function|

### Serving Input Function

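Serving-time and training-time inputs are often very different: training reads batches from arrays or files, while a deployed model typically receives individual requests, for example as JSON. The serving input function converts what arrives at serving time into the features the model expects.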

![](serving_input_function.png)
![](json_predict.png)
![](json_predict2.png)
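
To make the idea concrete, here is a plain-Python sketch of the conversion a serving path has to perform: turning row-oriented JSON instances into the column-oriented feature dictionary used by the input functions earlier. The `"instances"` wrapper, the field names in the request, and the `parse_request` helper are all assumptions for illustration, not part of the TensorFlow API.

```python
import json

# Hypothetical request body; the "instances" wrapper and field names are assumptions.
request_body = ('{"instances": [{"sq_footage": 1500, "type": "house"}, '
                '{"sq_footage": 2500, "type": "apt"}]}')

def parse_request(body, feature_names=("sq_footage", "type")):
    """Convert row-oriented JSON instances into a column-oriented feature dict."""
    instances = json.loads(body)["instances"]
    missing = [n for inst in instances for n in feature_names if n not in inst]
    if missing:
        raise ValueError(f"missing features: {sorted(set(missing))}")
    return {name: [inst[name] for inst in instances] for name in feature_names}

served_features = parse_request(request_body)
print(served_features)
```

In TensorFlow itself, this role is played by the serving input function supplied when the model is exported, which declares the placeholders a request fills and maps them to model features.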

**Decoding JPEGs (Base64-encoded binary strings)**
![](decodes_jpegs.png)
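
Raw image bytes cannot be placed in JSON as-is, so they are sent as Base64 text and decoded on the serving side before the image itself is decoded. A minimal standard-library sketch of that round trip follows; the `image_bytes` key and the `{"b64": ...}` wrapper are assumptions about the request format, and the payload is a stand-in, not a real image.

```python
import base64
import json

# Not a real image: just the JPEG start-of-image marker plus filler bytes.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 8

# Client side: encode the binary data as Base64 text so it fits in JSON.
payload = json.dumps(
    {"image_bytes": {"b64": base64.b64encode(fake_jpeg).decode("ascii")}}
)

def decode_image_bytes(body):
    """Return the raw bytes carried in the request's Base64 field."""
    encoded = json.loads(body)["image_bytes"]["b64"]
    return base64.b64decode(encoded, validate=True)

raw = decode_image_bytes(payload)
print(raw == fake_jpeg, raw[:2] == b"\xff\xd8")
```

Once the bytes are recovered, a function such as `tf.image.decode_jpeg` can turn them into a pixel tensor for the model.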