# Estimators

The actual model is contained in a `tf.estimator.Estimator` instance. In this notebook we will work with the premade TensorFlow estimators. Estimators have a Scikit-Learn style API, with `train`, `evaluate`, and `predict` methods.

Estimators automatically provide many features that would otherwise have to be hand-coded. Two features worth highlighting are:
1. Logging and checkpointing as the model trains.
2. Distributing the computation graph for training on the cloud.

The idea behind working with Estimators is that the only things about the computation graph that need to change between the three phases of machine learning (training, evaluation, and prediction) are how and what data is fed into the model. For example, during training we iterate for many epochs, while for evaluation and prediction we don't; for training and evaluation we have labels in addition to features, while for prediction we only have features.

The way TensorFlow handles this is that each method takes an `input_fn`: a function that takes no arguments and returns ops to be added to the graph. **Note:** during training, for example, the `input_fn` is not called on each batch of data or anything like that. It is called only once, and is used to add nodes to the computation graph.
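
The "called only once" behaviour can be mimicked in plain Python. The sketch below involves no TensorFlow at all: `toy_input_fn` and `toy_train` are hypothetical stand-ins, and a generator plays the role of the graph nodes that produce batches.

```python
# Plain-Python stand-in for the "called once" behaviour; no TensorFlow involved.
calls = {"input_fn": 0}


def toy_input_fn():
    """Hypothetical input_fn: builds the data pipeline and returns it."""
    calls["input_fn"] += 1
    # A generator plays the role of the graph nodes that yield batches.
    return ([i, i + 1] for i in range(0, 10, 2))


def toy_train(input_fn):
    """Hypothetical training loop: calls input_fn once, then pulls batches."""
    pipeline = input_fn()      # pipeline-building step, happens exactly once
    n_batches = 0
    for _batch in pipeline:    # each iteration stands for one training step
        n_batches += 1
    return n_batches


steps = toy_train(toy_input_fn)
print(calls["input_fn"], steps)
```

The counter stays at one however many batches are consumed, which is the same division of labour an Estimator uses: build the pipeline once, then run it repeatedly.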

In [1]:
import datetime
import pathlib
import shutil

import numpy as np
import pandas as pd
import tensorflow as tf
from matplotlib import pyplot as plt
plt.style.use('seaborn')

## 0. Prepare data

We need to split our data into train and evaluation datasets. A good design pattern with sharded data is to make separate folders for training and evaluation data. A bad design pattern is to not randomize which data goes to train and which goes to test, but nonetheless we will just set aside `boston-0.csv` as test data and use the rest as train.

In [2]:
data_dir = pathlib.Path('data/sharded_data')
train_dir = data_dir.parent / 'train'
test_dir = data_dir.parent / 'test'

if not train_dir.exists():
    train_dir.mkdir()
if not test_dir.exists():
    test_dir.mkdir()
    
for f in data_dir.glob('*.csv'):
    if f.stem.endswith('0'):
        shutil.copyfile(f, test_dir / f.name)
    else:
        shutil.copyfile(f, train_dir / f.name)
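
For comparison, a randomized split at the shard level might look like the sketch below. `split_shards` is a hypothetical helper and the shard names are made up for illustration; the notebook itself globs the real files from disk. Shuffling whole shards is still coarser than shuffling individual rows.

```python
import random


def split_shards(names, test_fraction=0.2, seed=0):
    """Randomly assign shard file names to train/test (hypothetical helper)."""
    shuffled = sorted(names)               # sort first so the result is reproducible
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, independent of global state
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Shard names are made up for illustration.
shards = ['boston-{}.csv'.format(i) for i in range(5)]
train_files, test_files = split_shards(shards)
print(len(train_files), len(test_files))
```

Fixing the seed keeps the split stable between runs, so the train and test folders do not silently change contents.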

## 1. Estimator setup

Estimators take feature columns as one of the inputs to their constructors, so let's start off by instantiating the feature columns we will be using in our model. Since all of our csv files have the same schema, we can load a few rows of one using pandas to get some useful information about the data we're ingesting into our model.

In [3]:
data_dir = pathlib.Path('data/sharded_data')
file = (data_dir / 'boston-0.csv').as_posix()

df = pd.read_csv(file, nrows=10)
df.head()

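|   | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | b | lstat | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0837 | 45.0 | 3.44 | Y | 0.437 | 7.185 | 38.9 | 4.5667 | 5 | 398.0 | 15.2 | 396.9 | 5.39 | 34.9 |
| 1 | 0.05023 | 35.0 | 6.06 | Y | 0.4379 | 5.706 | 28.4 | 6.6407 | 1 | 304.0 | 16.9 | 394.02 | 12.43 | 17.1 |
| 2 | 0.03961 | 0.0 | 5.19 | Y | 0.515 | 6.037 | 34.5 | 5.9853 | 5 | 224.0 | 20.2 | 396.9 | 8.01 | 21.1 |
| 3 | 1.38799 | 0.0 | 8.14 | Y | 0.538 | 5.95 | 82.0 | 3.99 | 4 | 307.0 | 21.0 | 232.6 | 27.71 | 13.2 |
| 4 | 1.35472 | 0.0 | 8.14 | Y | 0.538 | 6.072 | 100.0 | 4.175 | 4 | 307.0 | 21.0 | 376.73 | 13.04 | 14.5 |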


In [4]:
csv_columns = list(df.columns)
label_col = 'target'
# Decoding the csv requires a default value for each column. The defaults are
# passed as a list of lists, and their types set the dtype of each parsed tensor.
default_values = [[0.0]] * 14
default_values[3] = ['_UNKNOWN']  # chas is a string column
default_values[8] = [0]  # rad is an integer column
# Group the feature columns by dtype
feature_cols = [c for c in csv_columns if c != label_col]
byte_cols = ['chas']
int64_cols = ['rad']
float_cols = [c for c in feature_cols if c not in int64_cols and c not in byte_cols]
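
Setting defaults by position is easy to get wrong if the schema changes. One possible alternative is to build the list from column names, as sketched here in plain Python. `build_defaults` is a hypothetical helper, not part of TensorFlow, and the column list is copied from the csv header shown above.

```python
# Column names copied from the csv header shown above.
csv_columns = ['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis',
               'rad', 'tax', 'ptratio', 'b', 'lstat', 'target']


def build_defaults(columns, string_cols=(), int_cols=()):
    """Hypothetical helper: one single-element default list per column."""
    defaults = []
    for col in columns:
        if col in string_cols:
            defaults.append(['_UNKNOWN'])
        elif col in int_cols:
            defaults.append([0])
        else:
            defaults.append([0.0])
    return defaults


default_values = build_defaults(csv_columns, string_cols=['chas'], int_cols=['rad'])
print(default_values[3], default_values[8], len(default_values))
```

Each column also gets its own list object this way, whereas `[[0.0]] * 14` repeats a single shared inner list (harmless here only because entries are reassigned rather than mutated).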

In [5]:
# make feature columns
byte_cols = [
    tf.feature_column.indicator_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
            name, ['Y', 'N']
        )
    )
    for name in byte_cols
]
int64_cols = [
    tf.feature_column.numeric_column(name, dtype=tf.int64)
    for name in int64_cols
]
float_cols = [
    tf.feature_column.numeric_column(name, dtype=tf.float32)
    for name in float_cols
]
feature_columns = byte_cols + int64_cols + float_cols

There is one more optional component to set up for our model, namely its configuration details. To evaluate the model's performance we will want to examine its train and test error. This requires us to set some configurations in advance.

As TensorFlow models train, they write checkpoints after a set number of steps or seconds of training. By default, an `Estimator` writes a checkpoint every 10 minutes, plus one when training starts and one when it completes, so a short run like ours only produces those two. However, evaluation metrics, such as test error, are only computed at checkpoints. Thus to get a test loss curve, we need to set our model to save checkpoints more often.

Setting runtime configuration for our model is done by passing a `tf.estimator.RunConfig` object to the estimator. Below we set the `RunConfig` to save multiple checkpoints.
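
To get a feel for how many checkpoints (and therefore evaluation points) a run produces, here is a rough plain-Python model. It assumes one checkpoint at step 0, one every `save_every` steps, and one at the final step, which is the pattern visible in this notebook's training logs rather than a documented guarantee. `checkpoint_steps` and `retained` are hypothetical helpers, and 1263 is the final global step reported by the last run in this notebook.

```python
def checkpoint_steps(total_steps, save_every):
    """Hypothetical model of which global steps get a checkpoint.

    Assumes a checkpoint at step 0, one every `save_every` steps, and one
    at the final step.
    """
    steps = list(range(0, total_steps, save_every))
    steps.append(total_steps)
    return steps


def retained(steps, keep_max):
    """Only the most recent `keep_max` checkpoints stay on disk."""
    return steps[-keep_max:]


all_steps = checkpoint_steps(total_steps=1263, save_every=100)
print(len(all_steps), all_steps[-2:])
print(retained(all_steps, keep_max=5))
```

With `keep_max=5` most of the early checkpoints would be deleted, which is why `keep_checkpoint_max` is raised alongside `save_checkpoints_steps`.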

**Aside:** One of the things that is a bit strange about working in TensorFlow is that a lot of configuration and parameters are set by passing various objects which hold configuration and specification details. This is a drastic contrast from, say, Scikit-Learn, where configurations are set by passing strings or numeric values to an estimator, transformer, or whatever. Moreover, in Scikit-Learn there are only a few base classes one works with, and a standardized API.

With TensorFlow estimators, there is a lot more flexibility in how things are configured, and accordingly, more complicated objects used to configure them. These configuration objects are mostly glorified holders for key/value pairs and don't have methods that a user would typically access. 

In [6]:
model_dir = pathlib.Path('models')
if not model_dir.exists():
    model_dir.mkdir()

model_dir = model_dir / datetime.datetime.now().strftime('model_%Y%m%d_%H%M%S/')

run_config = tf.estimator.RunConfig(
    save_checkpoints_steps=100,
    keep_checkpoint_max=20,
)

estimator = tf.estimator.LinearRegressor(
    feature_columns,
    model_dir=model_dir,
    config=run_config
)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'models\\model_20190302_144436', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000001F53E8876A0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


Before moving on, let's recap what the dataset we produced before is doing. In particular, we defined a function `parse_row` which takes a string tensor and returns a tuple of tensors, one for each column of the csv.

In [7]:
data_dir = pathlib.Path('data')
files = (data_dir / 'train' / 'boston-*.csv').as_posix()

# Training parameters
n_epochs = 10
batch_size = 2

def parse_row(row):
    return tf.decode_csv(row, record_defaults=default_values)

# Build data set
dataset = tf.data.Dataset.list_files(files)
dataset = dataset.flat_map(lambda f: tf.data.TextLineDataset(f).skip(1))
dataset = dataset.map(parse_row)
# Repeat the dataset
dataset = dataset.repeat(n_epochs)
# Shuffle data
dataset = dataset.shuffle(buffer_size=1024)
# Get a batch of data
dataset = dataset.batch(batch_size)
# Preload next batch to speed up training
dataset = dataset.prefetch(buffer_size=batch_size)
iterator = dataset.make_one_shot_iterator()
batch = iterator.get_next()

with tf.Session() as sess:
    batch1 = sess.run(batch)

# Each batch is a tuple of arrays, one per csv column
print(batch1)

(array([ 2.37857, 14.4208 ], dtype=float32), array([0., 0.], dtype=float32), array([18.1, 18.1], dtype=float32), array([b'Y', b'Y'], dtype=object), array([0.583, 0.74 ], dtype=float32), array([5.871, 6.461], dtype=float32), array([41.9, 93.3], dtype=float32), array([3.724 , 2.0026], dtype=float32), array([24, 24]), array([666., 666.], dtype=float32), array([20.2, 20.2], dtype=float32), array([370.73,  27.49], dtype=float32), array([13.34, 18.05], dtype=float32), array([20.6,  9.6], dtype=float32))


## 2. Training

During training and evaluation, data is fed to the estimator as a tuple of features and labels. The features should be in a dict, where each key matches the name of the `tf.feature_column` that the value gets sent to. The main thing we need to change about our current dataset implementation is that instead of `parse_row` returning a tuple of all of the csv columns, it needs to return a `(features, label)` tuple. We also need to only take in training data.

In [8]:
def parse_row(row):
    # Get tuple of csv row
    parsed = tf.decode_csv(row, record_defaults=default_values)
    # Get dict of col_name: value pairs
    features = dict(zip(csv_columns, parsed))
    # Remove label from features
    label = features.pop(label_col)
    return features, label

# File for training
files = (data_dir / 'train' / 'boston-*.csv').as_posix()

# Build data set
dataset = tf.data.Dataset.list_files(files)
dataset = dataset.flat_map(lambda f: tf.data.TextLineDataset(f).skip(1))
dataset = dataset.map(parse_row)
# Repeat the dataset
dataset = dataset.repeat(n_epochs)
# Shuffle data
dataset = dataset.shuffle(buffer_size=1024)
# Get a batch of data
dataset = dataset.batch(batch_size)
# Preload next batch to speed up training
dataset = dataset.prefetch(buffer_size=batch_size)
iterator = dataset.make_one_shot_iterator()
batch = iterator.get_next()

with tf.Session() as sess:
    batch1 = sess.run(batch)

# Now we get a (dict, value) pair
print(batch1) 

({'crim': array([28.6558 ,  0.06466], dtype=float32), 'zn': array([ 0., 70.], dtype=float32), 'indus': array([18.1 ,  2.24], dtype=float32), 'chas': array([b'Y', b'Y'], dtype=object), 'nox': array([0.597, 0.4  ], dtype=float32), 'rm': array([5.155, 6.345], dtype=float32), 'age': array([100. ,  20.1], dtype=float32), 'dis': array([1.5894, 7.8278], dtype=float32), 'rad': array([24,  5]), 'tax': array([666., 358.], dtype=float32), 'ptratio': array([20.2, 14.8], dtype=float32), 'b': array([210.97, 368.24], dtype=float32), 'lstat': array([20.08,  4.97], dtype=float32)}, array([16.3, 22.5], dtype=float32))
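
The `dict(zip(...))` and `pop` steps in `parse_row` are ordinary Python, so the reshaping can be tried without a session. The sketch below is a plain-Python analogue, not what TensorFlow runs: `parse_row_py` is a hypothetical name, and the row is the first data row of `boston-0.csv` as displayed earlier.

```python
import csv

csv_columns = ['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis',
               'rad', 'tax', 'ptratio', 'b', 'lstat', 'target']
label_col = 'target'


def parse_row_py(row):
    """Plain-Python analogue of parse_row (hypothetical, for illustration)."""
    values = next(csv.reader([row]))            # split one csv line into fields
    features = dict(zip(csv_columns, values))   # col_name: value pairs
    label = float(features.pop(label_col))      # remove the label from the features
    return features, label


# First data row of boston-0.csv as displayed earlier.
row = '0.0837,45.0,3.44,Y,0.437,7.185,38.9,4.5667,5,398.0,15.2,396.9,5.39,34.9'
features, label = parse_row_py(row)
print(label, features['chas'], len(features))
```

Unlike `tf.decode_csv`, this version leaves the feature values as strings; in the TensorFlow version the defaults list decides each column's dtype.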


In [9]:
# Wrap in a train input_fn
n_epochs = 5
batch_size = 32

def train_input_fn():
    # Files for training
    files = (data_dir / 'train' / 'boston-*.csv').as_posix()
    # Build data set
    dataset = tf.data.Dataset.list_files(files)
    dataset = dataset.flat_map(lambda f: tf.data.TextLineDataset(f).skip(1))
    dataset = dataset.map(parse_row)
    # Repeat the dataset
    dataset = dataset.repeat(n_epochs)
    # Shuffle data
    dataset = dataset.shuffle(buffer_size=1024)
    # Get a batch of data
    dataset = dataset.batch(batch_size)
    # Preload next batch to speed up training
    dataset = dataset.prefetch(buffer_size=batch_size)
    iterator = dataset.make_one_shot_iterator()
    features, label = iterator.get_next()
    return features, label

In [10]:
# Train that bad boy on all of the data
estimator.train(input_fn=train_input_fn)

Instructions for updating:
Colocations handled automatically by placer.
INFO:tensorflow:Calling model_fn.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into models\model_20190302_144436\model.ckpt.
INFO:tensorflow:loss = 18183.35, step = 1
INFO:tensorflow:Saving checkpoints for 64 into models\model_20190302_144436\model.ckpt.
INFO:tensorflow:Loss for final step: 684.8249.


<tensorflow_estimator.python.estimator.canned.linear.LinearRegressor at 0x1f53e887ba8>

## 3. Evaluate

Similar deal, but we don't need to shuffle the data or run for multiple epochs. We also need to take our data from a different directory.

In [11]:
# Wrap in an eval input_fn
n_epochs = 1
batch_size = 32

def eval_input_fn():
    # Files for evaluation
    files = (data_dir / 'test' / 'boston-*.csv').as_posix()
    # Build data set
    dataset = tf.data.Dataset.list_files(files)
    dataset = dataset.flat_map(lambda f: tf.data.TextLineDataset(f).skip(1))
    dataset = dataset.map(parse_row)
    # Repeat the dataset
    dataset = dataset.repeat(n_epochs)
    # Shuffle data
    # dataset = dataset.shuffle(buffer_size=1024)
    # Get a batch of data
    dataset = dataset.batch(batch_size)
    # Preload next batch to speed up training
    dataset = dataset.prefetch(buffer_size=batch_size)
    iterator = dataset.make_one_shot_iterator()
    features, label = iterator.get_next()
    return features, label

In [12]:
metrics = estimator.evaluate(input_fn=eval_input_fn)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-03-02T19:44:40Z
INFO:tensorflow:Graph was finalized.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models\model_20190302_144436\model.ckpt-64
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-03-02-19:44:40
INFO:tensorflow:Saving dict for global step 64: average_loss = 83.72892, global_step = 64, label/mean = 23.321783, loss = 2114.1553, prediction/mean = 23.39104
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 64: models\model_20190302_144436\model.ckpt-64


In [13]:
for k, v in metrics.items():
    print('{0}: {1}'.format(k, v))

average_loss: 83.72891998291016
label/mean: 23.3217830657959
loss: 2114.1552734375
prediction/mean: 23.391040802001953
global_step: 64


## 4. Predict

Guess. The `predict` method returns an iterator over the results, where each result is a dict whose `'predictions'` key holds the predicted value.

In [14]:
# Wrap in a predict input_fn
n_epochs = 1
batch_size = 32

def pred_input_fn():
    # Files for prediction
    files = (data_dir / 'test' / 'boston-*.csv').as_posix()
    # Build data set
    dataset = tf.data.Dataset.list_files(files)
    dataset = dataset.flat_map(lambda f: tf.data.TextLineDataset(f).skip(1))
    dataset = dataset.map(parse_row)
    # Repeat the dataset
    dataset = dataset.repeat(n_epochs)
    # Shuffle data
    # dataset = dataset.shuffle(buffer_size=1024)
    # Get a batch of data
    dataset = dataset.batch(batch_size)
    # Preload next batch to speed up training
    dataset = dataset.prefetch(buffer_size=batch_size)
    iterator = dataset.make_one_shot_iterator()
    features, label = iterator.get_next()
    # We could actually also return the labels here; they would get ignored.
    # The commented out returns all work
    
    # return features, label
    # return features, None
    return features
    

In [15]:
predictions = estimator.predict(input_fn=pred_input_fn)

In [16]:
for _ in range(5):
    print(next(predictions))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from models\model_20190302_144436\model.ckpt-64
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
{'predictions': array([28.807257], dtype=float32)}
{'predictions': array([26.307745], dtype=float32)}
{'predictions': array([23.498741], dtype=float32)}
{'predictions': array([15.082013], dtype=float32)}
{'predictions': array([24.148497], dtype=float32)}


In [17]:
# Examining the results
file = data_dir / 'test' / 'boston-0.csv'
df = pd.read_csv(file)
# Need to re-predict since we used up some of the iterator...
predictions = estimator.predict(input_fn=pred_input_fn)

df['predicted'] = np.array(list(x['predictions'][0] for x in predictions))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from models\model_20190302_144436\model.ckpt-64
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


In [18]:
df.head()

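|   | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | b | lstat | target | predicted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0837 | 45.0 | 3.44 | Y | 0.437 | 7.185 | 38.9 | 4.5667 | 5 | 398.0 | 15.2 | 396.9 | 5.39 | 34.9 | 28.807257 |
| 1 | 0.05023 | 35.0 | 6.06 | Y | 0.4379 | 5.706 | 28.4 | 6.6407 | 1 | 304.0 | 16.9 | 394.02 | 12.43 | 17.1 | 26.307745 |
| 2 | 0.03961 | 0.0 | 5.19 | Y | 0.515 | 6.037 | 34.5 | 5.9853 | 5 | 224.0 | 20.2 | 396.9 | 8.01 | 21.1 | 23.498741 |
| 3 | 1.38799 | 0.0 | 8.14 | Y | 0.538 | 5.95 | 82.0 | 3.99 | 4 | 307.0 | 21.0 | 232.6 | 27.71 | 13.2 | 15.082013 |
| 4 | 1.35472 | 0.0 | 8.14 | Y | 0.538 | 6.072 | 100.0 | 4.175 | 4 | 307.0 | 21.0 | 376.73 | 13.04 | 14.5 | 24.148497 |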


In [19]:
# Note this is basically the same value as what the eval did.
mse = ((df['target'] - df['predicted']) ** 2).mean()
print('mse:', mse)

mse: 83.7289246990946


## 5. Better design patterns

Clearly all of the different `input_fn`s we had to write differ only superficially. A better pattern is to make a generic function that can return different dataset ops depending on what we want to do with it.

In [20]:
def generic_input_fn(file, batch_size=32, n_repeat=1, shuffle=False, return_labels=False):
    # Build data set
    dataset = tf.data.Dataset.list_files(file)
    dataset = dataset.flat_map(lambda f: tf.data.TextLineDataset(f).skip(1))
    dataset = dataset.map(parse_row)
    dataset = dataset.repeat(n_repeat)
    if shuffle:
        dataset = dataset.shuffle(buffer_size=1024)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(buffer_size=batch_size)
    iterator = dataset.make_one_shot_iterator()
    features, label = iterator.get_next()
    if not return_labels:
        label = None
    return features, label

Now we can either use a `lambda` function to make the desired `input_fn`, or wrap the above function in a function factory.

In [21]:
# Lambda example:
file = (data_dir / 'test' / 'boston-*.csv').as_posix()
estimator.evaluate(input_fn=lambda: generic_input_fn(file, return_labels=True))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-03-02T19:44:43Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from models\model_20190302_144436\model.ckpt-64
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-03-02-19:44:43
INFO:tensorflow:Saving dict for global step 64: average_loss = 83.72892, global_step = 64, label/mean = 23.321783, loss = 2114.1553, prediction/mean = 23.39104
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 64: models\model_20190302_144436\model.ckpt-64


{'average_loss': 83.72892,
 'label/mean': 23.321783,
 'loss': 2114.1553,
 'prediction/mean': 23.39104,
 'global_step': 64}

In [22]:
# Function factory example:
def get_input_fn(mode):
    if mode == tf.estimator.ModeKeys.TRAIN:
        files = (data_dir / 'train' / 'boston-*.csv').as_posix()
        return lambda: generic_input_fn(
            files, 
            batch_size=32, 
            n_repeat=100, 
            shuffle=True, 
            return_labels=True
        )
    elif mode == tf.estimator.ModeKeys.EVAL:
        files = (data_dir / 'test' / 'boston-*.csv').as_posix()
        return lambda: generic_input_fn(
            files, 
            batch_size=32, 
            n_repeat=20, 
            shuffle=False, 
            return_labels=True
        )
    elif mode == tf.estimator.ModeKeys.PREDICT:
        files = (data_dir / 'test' / 'boston-*.csv').as_posix()
        return lambda: generic_input_fn(
            files, 
            batch_size=32, 
            n_repeat=1, 
            shuffle=False, 
            return_labels=False
        )
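
A third option, not used in this notebook, is `functools.partial` from the standard library, which binds arguments up front and returns a callable that takes none. The sketch below uses a stand-in with the same signature as `generic_input_fn` so that it runs without TensorFlow; `generic_input_fn_stub` and `eval_input_fn_partial` are hypothetical names.

```python
import functools


def generic_input_fn_stub(file, batch_size=32, n_repeat=1, shuffle=False,
                          return_labels=False):
    """Stand-in with the same signature as generic_input_fn.

    It only echoes its settings so the example runs without TensorFlow.
    """
    return {'file': file, 'batch_size': batch_size, 'n_repeat': n_repeat,
            'shuffle': shuffle, 'return_labels': return_labels}


# partial binds the arguments now and returns a callable that takes none.
eval_input_fn_partial = functools.partial(
    generic_input_fn_stub, 'data/test/boston-*.csv', return_labels=True)

settings = eval_input_fn_partial()
print(settings['file'], settings['return_labels'])
```

Compared with a `lambda`, a partial object keeps the bound arguments inspectable via its `args` and `keywords` attributes, which can help when debugging which settings an `input_fn` was built with.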

In [23]:
estimator.evaluate(input_fn=get_input_fn(tf.estimator.ModeKeys.EVAL))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-03-02T19:44:44Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from models\model_20190302_144436\model.ckpt-64
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-03-02-19:44:44
INFO:tensorflow:Saving dict for global step 64: average_loss = 83.728905, global_step = 64, label/mean = 23.321775, loss = 2642.6936, prediction/mean = 23.391035
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 64: models\model_20190302_144436\model.ckpt-64


{'average_loss': 83.728905,
 'label/mean': 23.321775,
 'loss': 2642.6936,
 'prediction/mean': 23.391035,
 'global_step': 64}

In the function factory implementation we used the `ModeKeys` object, which is basically just an object whose attributes are standardized names for training, evaluation, and prediction phases (to avoid things like one person's code using `'eval'` vs. another person's code using `'test'`, etc.)
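
The two evaluations above report the same `average_loss` but different `loss` values (2114.1553 with one pass over the test data, 2642.6936 with `n_repeat=20`). One reading that reproduces both numbers is that `average_loss` is the mean squared error per example, while `loss` is the summed loss per batch averaged over batches. The sketch below checks that arithmetic; both the definition and the row count of 101 are assumptions inferred from the reported values, not stated anywhere in this notebook.

```python
import math


def mean_batch_loss(average_loss, n_rows, batch_size):
    """Mean over batches of the summed per-example loss (assumed definition)."""
    n_batches = math.ceil(n_rows / batch_size)
    return average_loss * n_rows / n_batches


# n_rows=101 is an assumption inferred from the logged values.
single_pass = mean_batch_loss(83.72892, n_rows=101, batch_size=32)
repeated = mean_batch_loss(83.728905, n_rows=101 * 20, batch_size=32)
print(round(single_pass, 3), round(repeated, 3))
```

Under this reading `loss` depends on how the rows happen to fall into batches, so `average_loss` is the number to compare across runs.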

## 6. Train and evaluate

Lastly, TensorFlow provides a utility function `tf.estimator.train_and_evaluate` which trains the model, then evaluates it. Why not just do those two steps one after the other as above, you ask? According to the TensorFlow documentation, `train_and_evaluate` provides consistent behavior for both local and distributed training/evaluation. In particular, it lets us test out our model training and evaluation pipelines locally, and get the same behavior when we move to the cloud to train with Cloud ML Engine.

**Note** In the olden days, CMLE worked by running an instance of the TensorFlow `Experiment` class, which is now deprecated. This class was replaced by the `train_and_evaluate` function. You will run across examples using the `Experiment` class online. Ignore them.

In order to use `train_and_evaluate` we need to pass it specifications for how training and evaluation should run. These are controlled by `TrainSpec` and `EvalSpec` objects, respectively. It is best to think of these objects as just packages of parameters and not worry about them too much. They don't have any functionality in their own right.

Both `TrainSpec` and `EvalSpec` take as their primary argument the corresponding `input_fn`. For `TrainSpec`, there is another argument for `max_steps` which specifies the maximum number of steps for which to train. If set to `None`, the TensorFlow documentation says that the model will train forever, but this is wrong. It trains until the dataset iterator throws an `OutOfRangeError`.
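
With `max_steps=None`, the number of training steps is therefore fixed by the dataset: rows times repeats, divided by the batch size and rounded up, because the pipeline repeats before it batches. A small sketch of that arithmetic follows. `steps_until_exhausted` is a hypothetical helper, and the figure of 404 training rows is an assumption inferred from the step counts in the logs.

```python
import math


def steps_until_exhausted(n_rows, n_repeat, batch_size):
    """Training steps before the iterator runs dry (final partial batch counts)."""
    return math.ceil(n_rows * n_repeat / batch_size)


# n_rows=404 is an assumption inferred from the step counts in the logs.
first_run = steps_until_exhausted(404, n_repeat=5, batch_size=32)
long_run = steps_until_exhausted(404, n_repeat=100, batch_size=32)
print(first_run, long_run)
```

This prints 64 and 1263. The first matches the 64 steps of the initial `train` call, and a run that resumes from that checkpoint would be expected to end at global step 64 + 1263 = 1327.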

In [25]:
train_spec = tf.estimator.TrainSpec(
    input_fn=get_input_fn(tf.estimator.ModeKeys.TRAIN),
    max_steps=None
)
eval_spec = tf.estimator.EvalSpec(
    input_fn=get_input_fn(
        tf.estimator.ModeKeys.EVAL
    )
)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from models\model_20190302_144436\model.ckpt-64
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 64 into models\model_20190302_144436\model.ckpt.
INFO:tensorflow:loss = 1280.8647, step = 65
INFO:tensorflow:global_step/sec: 522.504
INFO:tensorflow:loss = 1219.2782, step = 165 (0.191 sec)
INFO:tensorflow:global_step/sec: 7

({'average_loss': 46.249863,
  'label/mean': 23.321775,
  'loss': 1459.7612,
  'prediction/mean': 23.53985,
  'global_step': 1327},
 [])

We can also export the model for serving while using `train_and_evaluate`. To do so, we need to pass an `Exporter` to our `EvalSpec`.

Just like how the `TrainSpec` and `EvalSpec` need an `input_fn` to know what nodes to add to the model's computation graph to prepare it for training and evaluation, the `Exporter` needs a `serving_input_fn` to know what nodes to add to the graph for serving predictions. There are a few different types of exporters, such as `LatestExporter`, `BestExporter`, and `FinalExporter`.

You might think that we would just use the `pred_input_fn` above for this `serving_input_fn`, but this isn't quite the case. Our `pred_input_fn` is designed for ingesting data from a file we have access to. When we serve the model on the cloud, predictions will be made by sending JSON requests to the server serving our model. Thus we need to add `tf.placeholder` nodes for input to the model, and some logic to parse a JSON request. This additional logic is handled by a `ServingInputReceiver` object. *I need to look into these more...*


To recap: evaluation only runs at checkpoints, so in TensorBoard you will have one eval metric point for each checkpoint. The `model_dir` also contains an initial checkpoint, which doesn't count; the evaluation runs at each of the later ones. So if you want to plot an actual eval loss curve, you need to configure the model to checkpoint more frequently.

According to the TensorFlow documentation on checkpointing frequency, by default the Estimator saves checkpoints in the `model_dir` according to the following schedule:

1. Writes a checkpoint every 10 minutes (600 seconds).
2. Writes a checkpoint when the `train` method starts (first iteration) and completes (final iteration).
3. Retains only the 5 most recent checkpoints in the directory.

You may alter the default schedule by taking the following steps:

1. Create a `tf.estimator.RunConfig` object that defines the desired schedule.
2. When instantiating the Estimator, pass that `RunConfig` object to the Estimator's `config` argument.

For example, passing `save_checkpoints_secs=20*60` and `keep_checkpoint_max=10` to `RunConfig` changes the checkpointing schedule to every 20 minutes and retains the 10 most recent checkpoints.

In [26]:
def get_dtype(col):
    # Not a great solution, but whatever
    if col == 'chas':
        return tf.string
    elif col == 'rad':
        return tf.int64
    else:
        return tf.float32

def serving_input_fn():
    # One placeholder per feature in the expected JSON input; shape [None]
    # leaves the batch dimension open
    feature_placeholders = dict(
        (col, tf.placeholder(get_dtype(col), shape=[None])) for col in feature_cols
    )
    # Can do additional transformations to prepare for model if needed
    features = feature_placeholders
    return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)
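
To make the JSON side concrete, the sketch below builds a request body whose keys match the placeholder names in `serving_input_fn`. The `{"instances": [...]}` envelope is an assumption about what the serving endpoint expects and should be checked against the serving platform's documentation. The feature values are taken from the first row of `boston-0.csv` shown earlier.

```python
import json

feature_cols = ['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis',
                'rad', 'tax', 'ptratio', 'b', 'lstat']

# Feature values taken from the first row of boston-0.csv shown earlier.
values = [0.0837, 45.0, 3.44, 'Y', 0.437, 7.185, 38.9, 4.5667, 5, 398.0,
          15.2, 396.9, 5.39]

# One dict per instance, keyed by the placeholder names in serving_input_fn.
instance = dict(zip(feature_cols, values))
# The {"instances": [...]} envelope is an assumption about the serving API.
request_body = json.dumps({'instances': [instance]})

decoded = json.loads(request_body)
print(sorted(decoded['instances'][0]) == sorted(feature_cols))
```

Note that the label column is absent: at serving time only features are sent, which is why the placeholders are built from `feature_cols` rather than `csv_columns`.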

In [27]:
# Reinstantiating model so that logs aren't polluted by previous runs
model_dir = pathlib.Path('models')
if not model_dir.exists():
    model_dir.mkdir()

model_dir = model_dir / datetime.datetime.now().strftime('model_%Y%m%d_%H%M%S/')

run_config = tf.estimator.RunConfig(
    save_checkpoints_steps=100,
    keep_checkpoint_max=20,
)

estimator = tf.estimator.LinearRegressor(
    feature_columns,
    model_dir=model_dir,
    config=run_config
)

train_spec = tf.estimator.TrainSpec(
    input_fn=get_input_fn(tf.estimator.ModeKeys.TRAIN),
    max_steps=None
)

# Build an exporter for the EvalSpec
exporter = tf.estimator.LatestExporter('exporter', serving_input_fn)

eval_spec = tf.estimator.EvalSpec(
    input_fn=get_input_fn(tf.estimator.ModeKeys.EVAL),
    # exporters=exporter,  # disabled in this run; uncomment to export the model
)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

INFO:tensorflow:Using config: {'_model_dir': 'models\\model_20190302_144601', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 100, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000001F544274668>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and eval

INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from models\model_20190302_144601\model.ckpt-700
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [10/100]
INFO:tensorflow:Evaluation [20/100]
INFO:tensorflow:Evaluation [30/100]
INFO:tensorflow:Evaluation [40/100]
INFO:tensorflow:Evaluation [50/100]
INFO:tensorflow:Evaluation [60/100]
INFO:tensorflow:Finished evaluation at 2019-03-02-19:46:15
INFO:tensorflow:Saving dict for global step 700: average_loss = 53.43649, global_step = 700, label/mean = 23.321775, loss = 1686.5892, prediction/mean = 22.005993
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 700: models\model_20190302_144601\model.ckpt-700
INFO:tensorflow:global_step/sec: 69.1682
INFO:tensorflow:loss = 530.1941, step = 701 (1.447 sec)
INFO:tensorflow:Saving checkpoints for 800 into models\model_20190302_144601\model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done c

({'average_loss': 48.288555,
  'label/mean': 23.321775,
  'loss': 1524.1075,
  'prediction/mean': 23.739315,
  'global_step': 1263},
 [])