<center> Building Input Functions with tf.contrib.learn </center>
---

In [1]:
import itertools
import pandas as pd
import numpy as np
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.INFO)

---
Custom Input Pipelines with input_fn
---

When training a neural network using tf.contrib.learn, it's possible to pass your feature and target data directly into your fit, evaluate, or predict operations. This approach works well when little to no manipulation of source data is required. But in cases where more feature engineering is needed, tf.contrib.learn supports using a custom input function (input_fn) to encapsulate the logic for preprocessing and piping data into your models.

The following code illustrates the basic skeleton for an input function:

In [2]:
def my_input_fn():
    # Preprocess your data here...

    # ...then return 1) a mapping of feature columns to Tensors with
    # the corresponding feature data, and 2) a Tensor containing labels
    return feature_cols, labels

The body of the input function contains the specific logic for preprocessing your input data, such as scrubbing out bad examples or feature scaling.

Input functions must return the following two values containing the final feature and label data to be fed into your model (as shown in the above code skeleton):

    - feature_cols: a dict containing key/value pairs that map feature column names to Tensors (or SparseTensors) containing the corresponding feature data.
    - labels: a Tensor containing your label (target) values: the values your model aims to predict.

**If your feature/label data is stored in pandas DataFrames or NumPy arrays, you'll need to convert it to Tensors before returning it from your input_fn.**

For continuous data, you can create and populate a Tensor using tf.constant:



In [3]:
feature_column_data = [1, 2.4, 0, 9.9, 3, 120]
feature_tensor = tf.constant(feature_column_data)

For sparse, categorical data (data where the majority of values are 0), you'll instead want to populate a SparseTensor, which is instantiated with three arguments:

    -dense_shape -> The shape of the tensor. Takes a list indicating the number of elements in each dimension. For example, dense_shape=[3,6] specifies a two-dimensional 3x6 tensor, dense_shape=[2,3,4] specifies a three-dimensional 2x3x4 tensor, and dense_shape=[9] specifies a one-dimensional tensor with 9 elements.
    -indices -> The indices of the elements in your tensor that contain nonzero values. Takes a list of terms, where each term is itself a list containing the index of a nonzero element. (Elements are zero-indexed; i.e., [0,0] is the index value for the element in the first column of the first row in a two-dimensional tensor.) For example, indices=[[1,3], [2,4]] specifies that the elements with indices [1,3] and [2,4] have nonzero values.
    -values -> A one-dimensional tensor of values. Term i in values corresponds to term i in indices and specifies its value. For example, given indices=[[1,3], [2,4]], the parameter values=[18, 3.6] specifies that element [1,3] of the tensor has a value of 18, and element [2,4] of the tensor has a value of 3.6.


In [4]:
sparse_tensor = tf.SparseTensor(indices=[[0,1], [2,4]],
                                values=[6, 0.5],
                                dense_shape=[3, 5])
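
To make the three arguments concrete, the sketch below expands the same indices, values, and dense_shape into an ordinary nested list in plain Python. The helper to_dense is hypothetical and written only for illustration; it handles the two-dimensional case and is not part of TensorFlow.

```python
def to_dense(indices, values, dense_shape):
    """Expand 2-D sparse components into a nested list (illustration only)."""
    rows, cols = dense_shape
    # Start from an all-zero grid of the requested shape.
    dense = [[0] * cols for _ in range(rows)]
    # Term i of values is written at the position given by term i of indices.
    for (row, col), value in zip(indices, values):
        dense[row][col] = value
    return dense

dense = to_dense(indices=[[0, 1], [2, 4]], values=[6, 0.5], dense_shape=[3, 5])
for row in dense:
    print(row)
```

Only elements [0,1] and [2,4] end up nonzero; every other position in the 3x5 grid stays at 0.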

---
Passing input_fn Data to Your Model
---

To feed data to your model for training, you simply pass the input function you've created to your fit operation as the value of the input_fn parameter, e.g.:

**classifier.fit(input_fn=my_input_fn, steps=2000)**

Note that the input_fn is responsible for supplying both feature and label data to the model, and replaces both the x and y parameters in fit. If you pass an input_fn that is not None to fit together with an x or y argument that is not None, fit raises a ValueError.

Also note that the input_fn parameter must receive a function object (i.e., input_fn=my_input_fn), not the return value of a function call (input_fn=my_input_fn()). This means that if you try to pass parameters to the input function in your fit call, as in the following code, it will result in a **TypeError:**

**classifier.fit(input_fn=my_input_fn(training_set), steps=2000)**
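
The difference between the two forms can be reproduced without TensorFlow. In the sketch below, fake_fit is a hypothetical stand-in for an estimator's fit method that does only what matters here: it calls whatever it receives as input_fn. It is an assumption for illustration, not the library's implementation.

```python
def my_input_fn(data_set=None):
    # Placeholder body: returns a (features, labels) pair of plain Python values.
    return {"x": [1.0, 2.0]}, [0, 1]

def fake_fit(input_fn):
    # Hypothetical stand-in for fit: it invokes input_fn with no arguments.
    features, labels = input_fn()
    return features, labels

# Passing the function object works: fake_fit calls it when it needs data.
fake_fit(input_fn=my_input_fn)

# Passing the result of a call hands fake_fit a tuple, which is not callable.
try:
    fake_fit(input_fn=my_input_fn("training_set"))
    error_name = None
except TypeError as err:
    error_name = type(err).__name__
    print("TypeError:", err)
```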

However, if you'd like to be able to parameterize your input function, there are other methods for doing so. You can employ a wrapper function that takes no arguments as your input_fn and use it to invoke your input function with the desired parameters. For example:

def my_input_fn_training_set():
  return my_input_fn(training_set)

classifier.fit(input_fn=my_input_fn_training_set, steps=2000)

Alternatively, you can use Python's functools.partial function to construct a new function object with all parameter values fixed:

import functools

classifier.fit(input_fn=functools.partial(my_input_fn,
                                          data_set=training_set), steps=2000)
                                          
**A third option is to wrap your input_fn invocation in a lambda and pass it to the input_fn parameter:**

**classifier.fit(input_fn=lambda: my_input_fn(training_set), steps=2000)**

**One big advantage of architecting your input pipeline as shown above—to accept a parameter for data set—is that you can pass the same input_fn to evaluate and predict operations by just changing the data set argument, e.g.:**

**classifier.evaluate(input_fn=lambda: my_input_fn(test_set), steps=2000)**

This approach enhances code maintainability: no need to capture x and y values in separate variables (e.g., x_train, x_test, y_train, y_test) for each type of operation.
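
The three parameterization techniques are interchangeable, since each one produces a callable that takes no arguments. A minimal sketch in plain Python, where my_input_fn, training_set, and test_set are small placeholders rather than real Tensors or data:

```python
import functools

# Placeholder data sets (assumptions for this sketch, not the real data).
training_set = {"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]}
test_set = {"x": [4.0], "y": [40.0]}

def my_input_fn(data_set):
    # A real input_fn would return Tensors; plain lists keep the sketch runnable.
    return {"x": data_set["x"]}, data_set["y"]

# 1) Wrapper function that takes no arguments.
def my_input_fn_training_set():
    return my_input_fn(training_set)

via_wrapper = my_input_fn_training_set
# 2) functools.partial with the data set fixed.
via_partial = functools.partial(my_input_fn, data_set=training_set)
# 3) Lambda that defers the call.
via_lambda = lambda: my_input_fn(training_set)

# All three yield the same (features, labels) pair when called.
assert via_wrapper() == via_partial() == via_lambda()

# The same function serves evaluation by swapping the data set argument.
eval_input = lambda: my_input_fn(test_set)
print(eval_input())
```

Which form to use is largely a matter of taste: the lambda is the shortest, while a named wrapper or partial can be easier to reuse and to read in stack traces.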

---
A Neural Network Model for Boston House Values
---

In [5]:
COLUMNS = ["crim", "zn", "indus", "nox", "rm", "age",
           "dis", "tax", "ptratio", "medv"]
FEATURES = ["crim", "zn", "indus", "nox", "rm",
            "age", "dis", "tax", "ptratio"]
LABEL = "medv"
path = "/home/antonio/Desktop/TensorFlow_Tutorials/Data/BostonHouseValues/"

training_set = pd.read_csv(path + "boston_train.csv", skipinitialspace=True,
                           skiprows=1, names=COLUMNS)
test_set = pd.read_csv(path + "boston_test.csv", skipinitialspace=True,
                       skiprows=1, names=COLUMNS)
prediction_set = pd.read_csv(path + "boston_predict.csv", skipinitialspace=True,
                             skiprows=1, names=COLUMNS)

### Defining FeatureColumns and Creating the Regressor

Next, create a list of FeatureColumns for the input data, which formally specify the set of features to use for training. Because all features in the housing data set contain continuous values, you can create their FeatureColumns using the tf.contrib.layers.real_valued_column() function:

In [6]:
feature_cols = [tf.contrib.layers.real_valued_column(k)
                for k in FEATURES]

Now, instantiate a DNNRegressor for the neural network regression model. You'll need to provide two arguments here: hidden_units, a hyperparameter specifying the number of nodes in each hidden layer (here, two hidden layers with 10 nodes each), and feature_columns, containing the list of FeatureColumns you just defined. The code below also sets model_dir, the directory where checkpoints are written:

In [7]:
regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_cols,
                                          hidden_units=[10, 10],
                                          model_dir="/tmp/boston_model")

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_summary_steps': 100, '_keep_checkpoint_max': 5, '_master': '', '_task_type': None, '_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_steps': None, '_environment': 'local', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7efeabc7ffd0>, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_task_id': 0, '_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_is_chief': True, '_evaluation_master': '', '_tf_random_seed': None}
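
For a rough sense of model size, the sketch below counts trainable parameters for this architecture: 9 input features, two hidden layers of 10 nodes, and one output. It assumes every layer is fully connected with one bias per unit and a single output unit for the regression value; the function name count_dense_params is made up for this example.

```python
def count_dense_params(n_inputs, hidden_units, n_outputs=1):
    """Count weights and biases in a stack of fully connected layers."""
    sizes = [n_inputs] + list(hidden_units) + [n_outputs]
    # Each layer has fan_in * fan_out weights plus fan_out biases.
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))

n_params = count_dense_params(n_inputs=9, hidden_units=[10, 10])
print(n_params)
```

Under these assumptions the layers contribute 100, 110, and 11 parameters respectively.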


### Building the input_fn

To pass input data into the regressor, create an input function that accepts a pandas DataFrame and returns feature column and label values as Tensors:

In [8]:
def input_fn(data_set):
    feature_cols = {k: tf.constant(data_set[k].values)
                    for k in FEATURES}
    labels = tf.constant(data_set[LABEL].values)
    return feature_cols, labels

In [9]:
regressor.fit(input_fn=lambda: input_fn(training_set), steps=20000)

Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 5001 into /tmp/boston_model/model.ckpt.
INFO:tensorflow:step = 5001, loss = 45.0254
INFO:tensorflow:global_step/sec: 1157.93
INFO:tensorflow:step = 5101, loss = 37.8412
INFO:tensorflow:global_step/sec: 1163.11
INFO:tensorflow:step = 5201, loss = 37.5019
INFO:tensorflow:global_step/sec: 1113.11
INFO:tensorflow:step = 5301, loss = 37.3055
INFO:tensorflow:global_step/sec: 1033.45
INFO:tensorflow:step = 5401, loss = 37.1244
INFO:tensorflow:global_step/sec: 1302.18
INFO:tensorflow:step = 5501, loss = 36.9966
INFO:tensorflow:global_step/sec: 1118.48
INFO:tensorflow:step = 5601, loss = 47

INFO:tensorflow:step = 11701, loss = 34.8166
INFO:tensorflow:global_step/sec: 1145.26
INFO:tensorflow:step = 11801, loss = 34.8127
INFO:tensorflow:global_step/sec: 1295.43
INFO:tensorflow:step = 11901, loss = 34.8108
INFO:tensorflow:global_step/sec: 1166.51
INFO:tensorflow:step = 12001, loss = 34.8267
INFO:tensorflow:global_step/sec: 1023.19
INFO:tensorflow:step = 12101, loss = 35.0115
INFO:tensorflow:global_step/sec: 1225.17
INFO:tensorflow:step = 12201, loss = 36.7807
INFO:tensorflow:global_step/sec: 1244.67
INFO:tensorflow:step = 12301, loss = 35.1419
INFO:tensorflow:global_step/sec: 1126.57
INFO:tensorflow:step = 12401, loss = 34.8101
INFO:tensorflow:global_step/sec: 1166.41
INFO:tensorflow:step = 12501, loss = 34.7874
INFO:tensorflow:global_step/sec: 1037.44
INFO:tensorflow:step = 12601, loss = 34.7826
INFO:tensorflow:global_step/sec: 1133.68
INFO:tensorflow:step = 12701, loss = 34.7789
INFO:tensorflow:global_step/sec: 1172.94
INFO:tensorflow:step = 12801, loss = 34.7754
INFO:tens

INFO:tensorflow:global_step/sec: 1070.56
INFO:tensorflow:step = 21301, loss = 34.5663
INFO:tensorflow:global_step/sec: 1124.48
INFO:tensorflow:step = 21401, loss = 34.5653
INFO:tensorflow:global_step/sec: 1309.5
INFO:tensorflow:step = 21501, loss = 34.5622
INFO:tensorflow:global_step/sec: 1297.53
INFO:tensorflow:step = 21601, loss = 34.5591
INFO:tensorflow:global_step/sec: 1286
INFO:tensorflow:step = 21701, loss = 34.5583
INFO:tensorflow:global_step/sec: 654.985
INFO:tensorflow:step = 21801, loss = 34.5552
INFO:tensorflow:global_step/sec: 785.68
INFO:tensorflow:step = 21901, loss = 34.5545
INFO:tensorflow:global_step/sec: 927.273
INFO:tensorflow:step = 22001, loss = 34.5514
INFO:tensorflow:global_step/sec: 1113.81
INFO:tensorflow:step = 22101, loss = 34.5484
INFO:tensorflow:global_step/sec: 1184.76
INFO:tensorflow:step = 22201, loss = 34.5453
INFO:tensorflow:global_step/sec: 1348.93
INFO:tensorflow:step = 22301, loss = 34.5446
INFO:tensorflow:global_step/sec: 1202.65
INFO:tensorflow:st

DNNRegressor(params={'dropout': None, 'gradient_clip_norm': None, 'embedding_lr_multipliers': None, 'input_layer_min_slice_size': None, 'hidden_units': [10, 10], 'optimizer': None, 'head': <tensorflow.contrib.learn.python.learn.estimators.head._RegressionHead object at 0x7efeaed6d8d0>, 'activation_fn': <function relu at 0x7efeaf27d9d8>, 'feature_columns': (_RealValuedColumn(column_name='crim', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='zn', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='indus', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='nox', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='rm', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='age', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _Rea

In [10]:
ev = regressor.evaluate(input_fn=lambda: input_fn(test_set), steps=1)

Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.
INFO:tensorflow:Starting evaluation at 2017-04-06-21:50:50
INFO:tensorflow:Evaluation [1/1]
INFO:tensorflow:Finished evaluation at 2017-04-06-21:50:50
INFO:tensorflow:Saving dict for global step 25000: global_step = 25000, loss = 17.1096


In [11]:
loss_score = ev["loss"]
print("Loss: {0:f}".format(loss_score))

Loss: 17.109570
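
If the reported loss is a mean squared error (an assumption here; the output above does not name the loss function), its square root gives an error in the same units as medv, which can be easier to interpret. A small sketch using the value printed above:

```python
import math

loss_score = 17.109570  # value printed by the cell above
# Root of a mean squared error, assuming that is what the loss represents.
rmse = math.sqrt(loss_score)
print("RMSE: {0:.3f}".format(rmse))
```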


In [14]:
y = regressor.predict(input_fn=lambda: input_fn(prediction_set))
# .predict() returns an iterator; convert to a list and print predictions
predictions = list(itertools.islice(y, 6))
print("Predictions: {}".format(str(predictions)))

Predictions: [34.484119, 19.360765, 23.620646, 34.523815, 13.196375, 19.57361]
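
itertools.islice pulls only the first n items from an iterator and leaves the rest unconsumed, which is why it suits an iterator of predictions. A small sketch with a plain generator standing in for the iterator returned by predict (fake_predictions is hypothetical):

```python
import itertools

def fake_predictions():
    # Hypothetical stand-in for the iterator returned by predict().
    for i in range(10):
        yield i * 1.5

y = fake_predictions()
# Take the first six items only.
first_six = list(itertools.islice(y, 6))
print(first_six)
# The iterator resumes where islice stopped.
remaining = list(y)
print(remaining)
```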
