# [Datasets Introduction for TensorFlow v.1.4](https://developers.googleblog.com/2017/09/introducing-tensorflow-datasets.html)  

I updated the code examples from an official Google/TensorFlow blog post on the Dataset API (September 2017) to TensorFlow v.1.4.  

cf. blog post [Introduction to TensorFlow Datasets and Estimators, Tuesday, September 12, 2017](https://developers.googleblog.com/2017/09/introducing-tensorflow-datasets.html), for the [Google Developers Blog](https://developers.googleblog.com/), https://goo.gl/Ujm2Ep  

The original Python code for v. 1.3 is [here](https://github.com/mhyttsten/Misc/blob/master/Blog_Estimators_DataSet.py).

In [1]:
import tensorflow as tf

In [2]:
import os, sys
import urllib

In [3]:
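# Check that we have the correct TensorFlow version installed
# (tf.data, used below, is part of core TensorFlow from r1.4 on)
from distutils.version import LooseVersion
tf_version = tf.__version__
print("TensorFlow version: {}".format(tf_version))
# Compare versions numerically; plain string comparison gets "1.10" < "1.4" wrong
assert LooseVersion(tf_version) >= LooseVersion("1.4"), "TensorFlow r1.4 or later is needed"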

TensorFlow version: 1.4.0


In [4]:
# Windows users: you only need to change PATH.
# I changed PATH from the original "/tmp/tf_dataset_and_estimator_apis"
# to the local relative path "./"
print(os.sep)
PATH = '.'

/


# `.csv` $\to $ `tf.data.Dataset`

Fetch and store the training and test dataset files.

In [5]:
PATH_DATASET = PATH + os.sep + "dataset"
print(PATH_DATASET)
FILE_TRAIN = PATH_DATASET + os.sep + "iris_training.csv"
FILE_TEST  = PATH_DATASET + os.sep + "iris_test.csv"
URL_TRAIN  = "http://download.tensorflow.org/data/iris_training.csv"
URL_TEST   = "http://download.tensorflow.org/data/iris_test.csv"

./dataset


In [6]:
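try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2


def downloadDataset(url, file):
    """Download url to file unless the file already exists."""
    if not os.path.exists(PATH_DATASET):
        os.makedirs(PATH_DATASET)
    if not os.path.exists(file):
        data = urlopen(url).read()
        # Write in binary mode: urlopen returns bytes under Python 3
        with open(file, "wb") as f:
            f.write(data)  # the with-block closes the file for us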

In [7]:
downloadDataset(URL_TRAIN,FILE_TRAIN)
downloadDataset(URL_TEST,FILE_TEST)

In [8]:
os.listdir( PATH_DATASET )

['iris_training.csv', 'iris_test.csv']

In [17]:
# tf.logging.set_verbosity(tf.logging.INFO)  

In [9]:
# The CSV features in our training & test data
feature_names = [
    'SepalLength',
    'SepalWidth',
    'PetalLength',
    'PetalWidth'
]

## Introducing the Datasets   

cf. https://developers.googleblog.com/2017/09/introducing-tensorflow-datasets.html  

Datasets is a new way to create input pipelines for TensorFlow models.  This API is much more performant than using `feed_dict` or the queue-based pipelines, and it's cleaner and easier to use.  

In TensorFlow 1.3, Datasets resided in `tf.contrib.data`; as of 1.4 the API has moved to core as `tf.data`, which is what the code in this notebook uses.  

At a high-level, Datasets consists of the following classes and relations:

$$
\text{TextLineDataset},\ \text{TFRecordDataset},\ \text{FixedLengthRecordDataset} \xrightarrow{\text{subclass of}} \text{Dataset} \xrightarrow{\text{instantiates}} \text{Iterator}
$$  

where  
* Dataset : Base class containing methods to create and transform datasets.  It also allows you to initialize a dataset from data in memory, or from a Python generator.  
* TextLineDataset : Reads lines from text files.  
* TFRecordDataset : Reads records from TFRecord files.
* FixedLengthRecordDataset : Reads fixed-size records from binary files. 
* Iterator : Provides a way to access one dataset element at a time. 
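To make the Dataset/Iterator split concrete, here is a toy pure-Python analogy. It is not TensorFlow code: the `ToyDataset` and `ToyIterator` classes are hypothetical and only borrow method names from the real API. A dataset is a reusable description of a pipeline, each transformation returns a new dataset, and an iterator hands out one element at a time until the data runs out.

```python
class ToyDataset(object):
    """Hypothetical stand-in for tf.data.Dataset: a re-iterable pipeline."""

    def __init__(self, make_iter):
        # make_iter is a zero-argument function returning a fresh iterator,
        # so the same dataset can be iterated more than once
        self._make_iter = make_iter

    @staticmethod
    def from_list(items):
        return ToyDataset(lambda: iter(list(items)))

    def skip(self, n):
        def gen():
            it = self._make_iter()
            for _ in range(n):
                next(it, None)  # drop the first n elements
            for x in it:
                yield x
        return ToyDataset(gen)

    def map(self, fn):
        return ToyDataset(lambda: (fn(x) for x in self._make_iter()))

    def batch(self, size):
        def gen():
            buf = []
            for x in self._make_iter():
                buf.append(x)
                if len(buf) == size:
                    yield buf
                    buf = []
            if buf:
                yield buf  # final, possibly smaller batch
        return ToyDataset(gen)

    def make_one_shot_iterator(self):
        return ToyIterator(self._make_iter())


class ToyIterator(object):
    """Hypothetical stand-in for an Iterator: one element per get_next()."""

    def __init__(self, it):
        self._it = it

    def get_next(self):
        # Real TF raises tf.errors.OutOfRangeError at the end; we raise StopIteration
        return next(self._it)


lines = ["header", "1", "2", "3", "4", "5"]
ds = ToyDataset.from_list(lines).skip(1).map(int).batch(2)
it = ds.make_one_shot_iterator()
print(it.get_next())  # [1, 2]
print(it.get_next())  # [3, 4]
print(it.get_next())  # [5]
```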

Next, create an input function that reads a file using the Dataset API.  
Then provide its results to the Estimator API.  

Arguments for this function are:
* `file_path` : data file to read
* `perform_shuffle` : whether record order should be randomized 
* `repeat_count` : number of times to iterate over records in the dataset.  e.g. if `repeat_count=1`, then each record is read once.  If `repeat_count=None`, iteration will continue forever.  
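The effect of `repeat_count` can be sketched in plain Python with `itertools` (the helper name `repeat_records` is hypothetical and only mimics the semantics of `Dataset.repeat`): a count of `n` replays the records `n` times, and `None` replays them indefinitely, so the consumer decides when to stop.

```python
import itertools


def repeat_records(records, repeat_count=1):
    """Mimic Dataset.repeat(count): None means repeat forever."""
    if repeat_count is None:
        return itertools.cycle(records)
    # Yield the whole record list repeat_count times, flattened into one stream
    return itertools.chain.from_iterable(
        itertools.repeat(records, repeat_count))


records = ["r1", "r2", "r3"]
print(list(repeat_records(records, 2)))
# ['r1', 'r2', 'r3', 'r1', 'r2', 'r3']

# With None the stream never ends, so take a finite slice of it
print(list(itertools.islice(repeat_records(records, None), 7)))
# ['r1', 'r2', 'r3', 'r1', 'r2', 'r3', 'r1']
```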

In [10]:
def my_input_fn(file_path, perform_shuffle=False, repeat_count=1):
    def decode_csv(line):
        parsed_line = tf.decode_csv(line, [[0.],[0.,],[0.,],[0.,], [0]])
        label = parsed_line[-1:] # Last element is the label
        del parsed_line[-1] # Delete last element
        features = parsed_line # Everything but last elements are the features 
        d = dict(zip(feature_names, features)), label
        return d  
    
    dataset = (tf.data.TextLineDataset(file_path) # Read text file 
              .skip(1) # Skip header row
              .map(decode_csv)) # Transform each elem by applying decode_csv fn
    if perform_shuffle:
        # Randomizes input using a window of 256 elements (read into memory)
        dataset = dataset.shuffle(buffer_size=256)
    dataset = dataset.repeat(repeat_count) # Repeats dataset this # times 
    dataset = dataset.batch(32) # Batch size to use 
    iterator = dataset.make_one_shot_iterator()
    batch_features, batch_labels= iterator.get_next()
    return batch_features, batch_labels
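For a single line, the `decode_csv` function above behaves roughly like the following pure-Python sketch (standard library only; `decode_line` and `RECORD_DEFAULTS` are hypothetical names, and this is not how `tf.decode_csv` is implemented). The record defaults play two roles: their types (float for the four measurements, int for the label) decide how each field is parsed, and their values fill in empty fields.

```python
FEATURE_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
# Mirrors [[0.], [0.], [0.], [0.], [0]]: four float features, one int label
RECORD_DEFAULTS = [0., 0., 0., 0., 0]


def decode_line(line, record_defaults=RECORD_DEFAULTS):
    fields = line.strip().split(',')
    parsed = []
    for field, default in zip(fields, record_defaults):
        # Empty fields fall back to the default; otherwise cast to its type
        parsed.append(type(default)(field) if field != '' else default)
    label = parsed[-1:]      # last column is the label (a 1-element list)
    features = parsed[:-1]   # everything else is a feature
    return dict(zip(FEATURE_NAMES, features)), label


features, label = decode_line("6.4,2.8,5.6,2.2,2")
print(features)  # maps each feature name to its float value
print(label)     # [2]
```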

When we train our model, we'll need a function that reads the input file and returns the feature and label data.  Estimators expect this input function to return a tuple.  

The return value must be a 2-element tuple organized as follows:
* The 1st element must be a `dict` in which each input feature name is a key, and the value is a list of values for the training batch.  
* The 2nd element is a list of labels for the training batch.  

Since we're returning a batch of input features and training labels, all lists in the return statement will have equal lengths.  Technically speaking, whenever we say "list" here, we actually mean a TensorFlow tensor whose first dimension is the batch size.  
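In plain Python terms, batching turns a list of per-record `(features_dict, label)` pairs into one `(dict_of_columns, labels)` pair, which is why every value list in a batch has the same length. The sketch below uses a hypothetical helper and no TensorFlow; for brevity it keeps only two features, with values taken from the first rows of `iris_training.csv`.

```python
def to_columnar_batch(records):
    """Turn [(features_dict, label), ...] into (dict of lists, list of labels)."""
    features = {}
    labels = []
    for feature_dict, label in records:
        for name, value in feature_dict.items():
            features.setdefault(name, []).append(value)
        labels.append(label)
    return features, labels


records = [
    ({'SepalLength': 6.4, 'PetalWidth': 2.2}, 2),
    ({'SepalLength': 5.0, 'PetalWidth': 1.0}, 1),
    ({'SepalLength': 4.9, 'PetalWidth': 1.7}, 2),
]
batch_features, batch_labels = to_columnar_batch(records)
print(batch_features['SepalLength'])  # [6.4, 5.0, 4.9]
print(batch_labels)                   # [2, 1, 2]
# Every column has one entry per record in the batch
assert all(len(v) == len(batch_labels) for v in batch_features.values())
```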

Note the following: 
* `TextLineDataset` : The Dataset API does a lot of memory management for you when you use its file-based datasets.  You can, for example, read in dataset files much larger than memory, or read in multiple files by passing a list of file names as the argument.
* `shuffle` : fills a buffer with `buffer_size` records and randomly picks the next element from that buffer, refilling it as records are consumed.  A buffer at least as large as the dataset gives a fully uniform shuffle.
* `map` : calls the `decode_csv` function on each element in the dataset (since we're using `TextLineDataset`, each element is one line of `.csv` text).  
* `decode_csv` : splits each line into fields, providing the default values if necessary, then returns a tuple of a `dict` (feature names to field values) and the label.  `map` replaces each element (line) in the dataset with this tuple.  
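The buffer-based shuffling described for `shuffle` can be sketched in pure Python. This is an illustrative reimplementation of the idea, not TensorFlow's source, and `buffered_shuffle` is a hypothetical name: fill a buffer, emit a random element from it, and let the next incoming record take its place. With a small buffer, a record can only move forward by less than `buffer_size` positions.

```python
import random


def buffered_shuffle(records, buffer_size, seed=None):
    """Yield records in a locally randomized order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) >= buffer_size:
            # Emit a random element; its slot is refilled by the next record
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Input exhausted: drain whatever is left in random order
    rng.shuffle(buffer)
    for record in buffer:
        yield record


data = list(range(10))
shuffled = list(buffered_shuffle(data, buffer_size=4, seed=0))
print(sorted(shuffled) == data)  # True: same records, new order
```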

That's an introduction to Datasets!  For fun, use this function to print the first batch:

In [17]:
next_batch = my_input_fn(FILE_TRAIN, True) # Will return 32 random elements 

In [18]:
# Now let's try it out, retrieving and printing 1 batch of data.
# Although this code looks strange, you don't need to understand
# the details
with tf.Session() as sess:
    first_batch = sess.run(next_batch)
print(type(first_batch)) # tuple
print(len(first_batch)) # 2
print(type(first_batch[0])) # dict
print(first_batch[0].keys())
print(type(first_batch[0]['SepalLength'])) # numpy.ndarray
print(type(first_batch[0][feature_names[1]])) # numpy.ndarray
print(type(first_batch[0][feature_names[2]])) # numpy.ndarray
print(type(first_batch[0][feature_names[3]])) # numpy.ndarray
print(first_batch[0]['SepalLength'].shape) # (32,)
print(first_batch[0][feature_names[1]].shape) # (32,)
print(first_batch[0][feature_names[2]].shape) # (32,)
print(first_batch[0][feature_names[3]].shape) # (32,)
print(type(first_batch[1])) # numpy.ndarray
print(first_batch[1].shape) # (32,1)
print(first_batch)

<type 'tuple'>
2
<type 'dict'>
['SepalLength', 'PetalWidth', 'PetalLength', 'SepalWidth']
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
(32,)
(32,)
(32,)
(32,)
<type 'numpy.ndarray'>
(32, 1)
({'SepalLength': array([ 6.30000019,  5.        ,  6.80000019,  6.4000001 ,  5.69999981,
        6.30000019,  4.9000001 ,  6.69999981,  4.5999999 ,  5.69999981,
        4.80000019,  5.19999981,  7.5999999 ,  6.69999981,  5.4000001 ,
        6.9000001 ,  7.9000001 ,  6.        ,  4.4000001 ,  6.69999981,
        5.        ,  6.4000001 ,  6.30000019,  5.69999981,  7.69999981,
        4.9000001 ,  5.        ,  5.19999981,  5.0999999 ,  5.0999999 ,
        7.        ,  5.        ], dtype=float32), 'PetalWidth': array([ 1.79999995,  0.2       ,  1.39999998,  2.20000005,  1.29999995,
        1.29999995,  0.1       ,  2.4000001 ,  0.2       ,  0.40000001,
        0.2       ,  0.2       ,  2.0999999 ,  2.29999995,  0.40000001,
        1.5       ,  2.        ,  

Indeed, as a sanity check, we can look at the `.csv` files in `pandas`:  

In [13]:
import pandas as pd

In [14]:
first_batch_DF = pd.read_csv(FILE_TRAIN)
print(first_batch_DF.head() )
first_batch_DF.describe()

   120    4  setosa  versicolor  virginica
0  6.4  2.8     5.6         2.2          2
1  5.0  2.3     3.3         1.0          1
2  4.9  2.5     4.5         1.7          2
3  4.9  3.1     1.5         0.1          0
4  5.7  3.8     1.7         0.3          0


| | 120 | 4 | setosa | versicolor | virginica |
|---|---|---|---|---|---|
| count | 120.0 | 120.0 | 120.0 | 120.0 | 120.0 |
| mean | 5.845 | 3.065 | 3.739167 | 1.196667 | 1.0 |
| std | 0.868578 | 0.427156 | 1.8221 | 0.782039 | 0.840168 |
| min | 4.4 | 2.0 | 1.0 | 0.1 | 0.0 |
| 25% | 5.075 | 2.8 | 1.5 | 0.3 | 0.0 |
| 50% | 5.8 | 3.0 | 4.4 | 1.3 | 1.0 |
| 75% | 6.425 | 3.3 | 5.1 | 1.8 | 2.0 |
| max | 7.9 | 4.4 | 6.9 | 2.5 | 2.0 |
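The odd column names come from the file itself: the first line of `iris_training.csv` is not a column header but metadata, namely the number of examples (120), the number of features (4), and the three class names. That is why the input functions call `.skip(1)`. Below is a standard-library sketch of reading that metadata line and mapping label ids back to class names; `read_iris_csv` is a hypothetical helper, and `SAMPLE` reuses the header and rows shown in the pandas output above.

```python
import csv
import io

SAMPLE = u"""120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,3.1,1.5,0.1,0
"""


def read_iris_csv(text):
    rows = csv.reader(io.StringIO(text))
    header = next(rows)
    # First line: number of examples, number of features, then class names
    n_examples, n_features = int(header[0]), int(header[1])
    class_names = header[2:]
    examples = []
    for row in rows:
        features = [float(v) for v in row[:n_features]]
        label = int(row[n_features])  # integer class id indexes class_names
        examples.append((features, class_names[label]))
    return n_examples, n_features, class_names, examples


n_examples, n_features, class_names, examples = read_iris_csv(SAMPLE)
print(n_examples, n_features, class_names)
for features, name in examples:
    print(features, name)
```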


In [12]:
# The CSV features in our training & test data
feature_names = [
    'SepalLength',
    'SepalWidth',
    'PetalLength',
    'PetalWidth'
]

In [13]:
def from_csv_to_ds_input_fn(file_path, m_i, feature_names, 
                            repeat_count=None, perform_shuffle=False, 
                               shuffle_buffer_size=4096):
    """
    @fn from_csv_to_ds_input_fn
    @param file_path
    @param m_i, a positive integer, number of examples in a batch
    @param feature_names, list of strings to name your d features 
    @param repeat_count, a positive integer or None, number of times to repeat, 
                None is for indefinitely
    @param perform_shuffle = False , performs shuffling of data or not 
    @param shuffle_buffer_size = 4096, a positive integer
    """
        
    def decode_csv(line):
        parsed_line = tf.decode_csv(line, record_defaults=[[0.],[0.,],[0.,],[0.,], [0]],
                                        field_delim=',')
        label = parsed_line[-1:] # Last element is the label
        del parsed_line[-1] # Delete last element
        features = parsed_line # Everything but last elements are the features 

        # X_i, y_i, only the last value in a line is the output value, y
        # prior values, associated with a "feature_name", are input values  
        d = dict(zip(feature_names, features)), label  
        return d 
    
    dataset = (tf.data.TextLineDataset(file_path) # Read text file 
                .skip(1) # Skip header row
                .map(decode_csv, num_parallel_calls=m_i) # Transform each elem by applying decode_csv fn
                .batch(m_i) # Batch size to use
              )
    if repeat_count is None:
        dataset = dataset.repeat() # repeat indefinitely
    else:
        dataset = dataset.repeat(repeat_count) # Repeats dataset this # times 

    if perform_shuffle:
        # Randomizes input using a window of shuffle_buffer_size elements (read into memory).
        # Note: this runs after .batch(), so it permutes whole batches, not individual records
        dataset = dataset.shuffle(buffer_size=shuffle_buffer_size)
    
    # create iterator
    iterator = dataset.make_one_shot_iterator()
#    iterator = dataset.make_initializable_iterator()

    # Separate the input X data from the output y data
    batch_features, batch_labels= iterator.get_next() 

    return batch_features, batch_labels
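Note that `from_csv_to_ds_input_fn` applies `shuffle` after `batch`, so the shuffle buffer holds whole batches: it reorders batches but never changes which records end up together in a batch. (With 120 training examples and a batch size of 128, every epoch is a single batch, so here the shuffle has no visible effect.) Shuffling before batching is the usual order for record-level shuffling. The pure-Python illustration below uses `random.shuffle` as a stand-in for a full-buffer shuffle; the helper names are hypothetical.

```python
import random


def batches(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_then_shuffle(items, size, rng):
    out = batches(items, size)
    rng.shuffle(out)    # permutes whole batches only
    return out


def shuffle_then_batch(items, size, rng):
    items = list(items)
    rng.shuffle(items)  # permutes individual records
    return batches(items, size)


data = list(range(8))
rng = random.Random(0)
print(batch_then_shuffle(data, 4, rng))  # the batches [0, 1, 2, 3] and [4, 5, 6, 7], in some order
print(shuffle_then_batch(data, 4, rng))  # records mixed across batches
```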

## Introducing Estimators  

In [14]:
# Create the feature_columns, which specifies the input to our model
# All our input features are numeric, so use numeric_column for each one 
feature_columns = [tf.feature_column.numeric_column(k) for k in feature_names]

For convenience (and education), let's use a `DNNClassifier`, and reiterate what the [API guide](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier) says:  

### `tf.estimator.DNNClassifier`  

Class `DNNClassifier`  

#### `__init__`  
```  
__init__(
    hidden_units,
    feature_columns,
    model_dir=None,
    n_classes=2,
    weight_column=None,
    label_vocabulary=None,
    optimizer='Adagrad',  # an instance of `tf.Optimizer` used to train the model
    activation_fn=tf.nn.relu,
    dropout=None,  # when not None, the probability of dropping out a given coordinate
    input_layer_partitioner=None,  # optional partitioner for the input layer
    config=None  # `RunConfig` object to configure the runtime settings
)
```  

For instance, among the Args: 

`label_vocabulary`: a list of strings representing possible label values.  If given, labels must be of string type and have values in `label_vocabulary`.  If not given, labels must already be encoded as integers or floats within [0, 1] for `n_classes=2`, and as integer values in {0, 1, ..., n_classes-1} for `n_classes > 2`.  
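In other words, without `label_vocabulary` the labels must already be integer class ids. A minimal plain-Python sketch of that encoding step follows; the vocabulary order and the `encode_labels` helper are illustrative assumptions, chosen to match the 0/1/2 ids used in the Iris files. Passing `label_vocabulary` asks the Estimator to perform an equivalent string-to-id lookup for you.

```python
IRIS_VOCABULARY = ['setosa', 'versicolor', 'virginica']


def encode_labels(labels, vocabulary=IRIS_VOCABULARY):
    """Map string labels to integer ids in {0, ..., n_classes - 1}."""
    index = {name: i for i, name in enumerate(vocabulary)}
    ids = []
    for label in labels:
        if label not in index:
            raise ValueError("label %r is not in the vocabulary" % (label,))
        ids.append(index[label])
    return ids


print(encode_labels(['virginica', 'setosa', 'versicolor']))  # [2, 0, 1]
```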

#### Find a place to store our model  
We need a new directory to store our model, graph, and checkpoints; otherwise, we'll have to keep deleting all the files that training creates.  

In [35]:
DNNClassPATH = PATH + os.sep + 'DNNClass_iris'
if not os.path.exists(DNNClassPATH):
    os.makedirs(DNNClassPATH)


In [37]:
# Create a deep neural network classifier 
# Use the DNNClassifier pre-made estimator  
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns, # The input features to our model
    hidden_units=[10,10], # Two layers, each with 10 neurons
    n_classes=3,
    model_dir=DNNClassPATH) # Path to where checkpoints etc. are stored  

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd3a4550a50>, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': './DNNClass_iris', '_save_summary_steps': 100}


## Training the model 

In [None]:
# originally
#classifier.train(
#    input_fn=lambda: my_input_fn(FILE_TRAIN,  True, 8))

INFO:tensorflow:Create CheckpointSaverHook.


In [38]:
# Train our model, using the function from_csv_to_ds_input_fn defined above
# The input to training is a file with training examples
# Stop training after 8 passes (epochs) over the training data

# Notice the parameter input_fn: we pass the function that will return our X_i, y_i,
# NOT X_i, y_i themselves

classifier.train(
    input_fn=lambda: from_csv_to_ds_input_fn(FILE_TRAIN,
                                                128, 
                                                feature_names,
                                                 8, 
                                             True))

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into ./DNNClass_iris/model.ckpt.
INFO:tensorflow:loss = 192.796, step = 1
INFO:tensorflow:Saving checkpoints for 8 into ./DNNClass_iris/model.ckpt.
INFO:tensorflow:Loss for final step: 65.1756.


<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x7fd3a4550bd0>

`lambda: from_csv_to_ds_input_fn(FILE_TRAIN, 128, feature_names, 8, True)` (in the original blog post, `lambda: my_input_fn(FILE_TRAIN, True, 8)`) is where we hook up Datasets with the Estimators!  

Estimators need data to perform training, evaluation, and prediction, and they use the `input_fn` to fetch it.  

Estimators call `input_fn` without our custom arguments, so we use `lambda` to create a function with no arguments that calls our input function with the desired arguments.  

In our case, we call `from_csv_to_ds_input_fn`, passing it  
* `FILE_TRAIN`, the training data file.  
* `128`, the batch size `m_i`.  
* `feature_names`, the names of the four input features.  
* `8`, which tells the input function to repeat the dataset 8 times (8 epochs).  
* `True`, which tells the input function to shuffle the data.  
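The `lambda` trick is ordinary Python closure behavior. In the sketch below, `fake_input_fn` is a hypothetical stand-in for `from_csv_to_ds_input_fn` that just echoes its arguments, so it runs without TensorFlow. In plain Python, `functools.partial` builds the same kind of zero-argument callable as the `lambda`.

```python
import functools


def fake_input_fn(file_path, batch_size, repeat_count=None, perform_shuffle=False):
    # Hypothetical stand-in: returns its arguments instead of tensors
    return file_path, batch_size, repeat_count, perform_shuffle


# Both produce a callable that needs no arguments when invoked
train_input_fn = lambda: fake_input_fn("iris_training.csv", 128, 8, True)
train_input_fn_partial = functools.partial(
    fake_input_fn, "iris_training.csv", 128, repeat_count=8, perform_shuffle=True)

print(train_input_fn())          # ('iris_training.csv', 128, 8, True)
print(train_input_fn_partial())  # ('iris_training.csv', 128, 8, True)
```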

## Evaluating Our Trained Model  

How can we evaluate how well it's performing?  Fortunately, every Estimator contains an `evaluate` method.

In [39]:
# original implementation

# Evaluate our model using the examples contained in FILE_TEST
# Return value will contain evaluation_metrics such as : loss & average_loss
#evaluate_result = classifier.evaluate(
#    input_fn=lambda: my_input_fn(FILE_TEST, False, 4))
#print("Evaluation results")
#for key in evaluate_result:
#    print("   {}, was: {}".format(key, evaluate_result[key]))

INFO:tensorflow:Starting evaluation at 2017-11-04-05:49:18
INFO:tensorflow:Restoring parameters from ./DNNClass_iris/model.ckpt-8
INFO:tensorflow:Finished evaluation at 2017-11-04-05:49:18
INFO:tensorflow:Saving dict for global step 8: accuracy = 0.8, average_loss = 0.574866, global_step = 8, loss = 17.246
Evaluation results
   average_loss, was: 0.574865758419
   accuracy, was: 0.800000011921
   global_step, was: 8
   loss, was: 17.245973587


In [40]:
# Evaluate our model using the examples contained in FILE_TEST
# Return value will contain evaluation_metrics such as : loss & average_loss
evaluate_result = classifier.evaluate(
    input_fn=lambda: from_csv_to_ds_input_fn(FILE_TEST,
                                                128, 
                                                feature_names,
                                                 4, 
                                             False))
print("Evaluation results")
for key in evaluate_result:
    print("   {}, was: {}".format(key, evaluate_result[key]))

INFO:tensorflow:Starting evaluation at 2017-11-04-05:49:36
INFO:tensorflow:Restoring parameters from ./DNNClass_iris/model.ckpt-8
INFO:tensorflow:Finished evaluation at 2017-11-04-05:49:36
INFO:tensorflow:Saving dict for global step 8: accuracy = 0.8, average_loss = 0.574866, global_step = 8, loss = 17.246
Evaluation results
   average_loss, was: 0.574865877628
   accuracy, was: 0.800000011921
   global_step, was: 8
   loss, was: 17.2459754944


If we'd like to train our model further:  

In [41]:
classifier.train(
    input_fn=lambda: from_csv_to_ds_input_fn(FILE_TRAIN,
                                                128, 
                                                feature_names,
                                                 1000, 
                                             True))

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from ./DNNClass_iris/model.ckpt-8
INFO:tensorflow:Saving checkpoints for 9 into ./DNNClass_iris/model.ckpt.
INFO:tensorflow:loss = 62.9349, step = 9
INFO:tensorflow:global_step/sec: 680.328
INFO:tensorflow:loss = 16.9185, step = 109 (0.148 sec)
INFO:tensorflow:global_step/sec: 669.819
INFO:tensorflow:loss = 10.5667, step = 209 (0.149 sec)
INFO:tensorflow:global_step/sec: 766.319
INFO:tensorflow:loss = 8.75861, step = 309 (0.130 sec)
INFO:tensorflow:global_step/sec: 706.323
INFO:tensorflow:loss = 7.76904, step = 409 (0.142 sec)
INFO:tensorflow:global_step/sec: 654.528
INFO:tensorflow:loss = 7.32616, step = 509 (0.153 sec)
INFO:tensorflow:global_step/sec: 652.018
INFO:tensorflow:loss = 6.71862, step = 609 (0.155 sec)
INFO:tensorflow:global_step/sec: 573.217
INFO:tensorflow:loss = 6.46066, step = 709 (0.174 sec)
INFO:tensorflow:global_step/sec: 627.762
INFO:tensorflow:loss = 6.14133, step = 809 (0.161 sec)
IN

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x7fd3a4550bd0>

In [42]:
evaluate_result = classifier.evaluate(
    input_fn=lambda: from_csv_to_ds_input_fn(FILE_TEST,
                                                128, 
                                                feature_names,
                                                 4, 
                                             False))
print("Evaluation results")
for key in evaluate_result:
    print("   {}, was: {}".format(key, evaluate_result[key]))

INFO:tensorflow:Starting evaluation at 2017-11-04-05:50:09
INFO:tensorflow:Restoring parameters from ./DNNClass_iris/model.ckpt-1008
INFO:tensorflow:Finished evaluation at 2017-11-04-05:50:09
INFO:tensorflow:Saving dict for global step 1008: accuracy = 0.966667, average_loss = 0.0595646, global_step = 1008, loss = 1.78694
Evaluation results
   average_loss, was: 0.0595646351576
   accuracy, was: 0.966666638851
   global_step, was: 1008
   loss, was: 1.78693902493


## Making Predictions Using Our Trained Model  

In [18]:
# originally
# Predict the type of some Iris flowers. 
# Let's predict the examples in FILE_TEST, repeat only once.  
#predict_results = classifier.predict(
#    input_fn=lambda: my_input_fn(FILE_TEST,False,1))
#print("Predictions on test file")

Predictions on test file


In [20]:
predict_results = classifier.predict(
    input_fn=lambda: from_csv_to_ds_input_fn(FILE_TEST,
                                                128, 
                                                feature_names,
                                                 1, 
                                             False))
print("Predictions on test file")

Predictions on test file


In [21]:
for prediction in predict_results:
    # Will print the predicted class, i.e. 0, 1, or 2 if the prediction 
    # is Iris Setosa, Versicolor, or Virginica, respectively.  
    print(prediction["class_ids"][0])

INFO:tensorflow:Restoring parameters from ./model.ckpt-1008
1
2
0
1
1
1
0
1
1
2
2
0
2
1
1
0
1
0
0
2
0
1
2
1
1
1
0
1
2
1


### Making Predictions on Data in Memory  

How could we make predictions on data residing in other sources, for example, in memory?  

In [22]:
# Let's create an in-memory dataset for prediction.  
# We've taken the first 3 examples in FILE_TEST.  
prediction_input = [[5.9, 3.0, 4.2, 1.5], # -> 1, Iris Versicolor 
                    [6.9, 3.1, 5.4, 2.1], # -> 2, Iris Virginica
                    [5.1, 3.3, 1.7, 0.5]] # -> 0, Iris Setosa 

def new_input_fn():
    def decode(x):
        x = tf.split(x, 4) # Need to split into our 4 features
        # When predicting, we don't need (or have) any labels
        return dict(zip(feature_names, x)) # To build a dict of them
    
    # The from_tensor_slices function will use a memory structure as input
    dataset = tf.data.Dataset.from_tensor_slices(prediction_input)
    dataset = dataset.map(decode)
    iterator = dataset.make_one_shot_iterator()
    next_feature_batch = iterator.get_next()
    return next_feature_batch, None # In prediction, we have no labels
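What `from_tensor_slices` followed by `decode` produces can be mimicked in plain Python. This sketch stands in for the TensorFlow ops and `slice_and_decode` is a hypothetical name: the input is sliced along its first dimension (one row per element), and each row of 4 values is split into 4 one-element pieces keyed by feature name, which is why each feature reaches the model with shape `(1,)`.

```python
FEATURE_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']


def slice_and_decode(rows, feature_names=FEATURE_NAMES):
    """Mimic Dataset.from_tensor_slices(rows).map(decode)."""
    for row in rows:                         # from_tensor_slices: one element per row
        pieces = [[value] for value in row]  # tf.split(x, 4): four 1-element pieces
        yield dict(zip(feature_names, pieces))


prediction_input = [[5.9, 3.0, 4.2, 1.5],
                    [6.9, 3.1, 5.4, 2.1],
                    [5.1, 3.3, 1.7, 0.5]]
for element in slice_and_decode(prediction_input):
    print(element['PetalLength'])  # [4.2], then [5.4], then [1.7]
```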


In [23]:
# Predict all our prediction_input
predict_results = classifier.predict(input_fn=new_input_fn)

In [24]:
# Print results
print("Predictions on memory data")
for idx, prediction in enumerate(predict_results):
    class_id = prediction["class_ids"][0] # Get the predicted class (index)
    if class_id == 0:
        print("I think: {}, is Iris Setosa".format(prediction_input[idx]))
    elif class_id == 1:
        print("I think: {}, is Iris Versicolor".format(prediction_input[idx]))
    else:
        print("I think: {}, is Iris Virginica".format(prediction_input[idx]))

Predictions on memory data
Instructions for updating:
Use `tf.data.Dataset.from_tensor_slices()`.
INFO:tensorflow:Restoring parameters from ./model.ckpt-1008
I think: [5.9, 3.0, 4.2, 1.5], is Iris Versicolor
I think: [6.9, 3.1, 5.4, 2.1], is Iris Virginica
I think: [5.1, 3.3, 1.7, 0.5], is Iris Setosa


In [50]:
# Now we can use TensorBoard
# from the command line:
# tensorboard --logdir=PATH
# where PATH is DNNClassPATH in this case
# make sure your virtualenv is active
DNNClassPATH

'./DNNClass_iris'

### Other estimators: LinearClassifier, DNNLinearCombinedClassifier

#### `tf.estimator.LinearClassifier`  

Class `LinearClassifier`  

`__init__`  

```  
__init__(
    feature_columns, 
    model_dir=None, 
    n_classes=2, 
    weight_column=None,
    label_vocabulary=None,
    optimizer='Ftrl', 
    config=None,
    partitioner=None
)  
```  



In [27]:
feature_columns

[_NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

In [43]:
LinClassPATH = PATH + os.sep + 'LinClass_iris'
if not os.path.exists(LinClassPATH):
    os.makedirs(LinClassPATH)

In [44]:
Linclassifier = tf.estimator.LinearClassifier(
    feature_columns=feature_columns, # The input features to our model
    n_classes=3,
    model_dir=LinClassPATH) # Path to where checkpoints etc. are stored  

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd3a44ecbd0>, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': './LinClass_iris', '_save_summary_steps': 100}


If you run into trouble at this point, delete the graphs and checkpoints manually and start over, or change `model_dir` accordingly.  

In [45]:
Linclassifier.train(
    input_fn=lambda: from_csv_to_ds_input_fn(FILE_TRAIN,
                                                128, 
                                                feature_names,
                                                 8, 
                                             True))

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into ./LinClass_iris/model.ckpt.
INFO:tensorflow:loss = 131.833, step = 1
INFO:tensorflow:Saving checkpoints for 8 into ./LinClass_iris/model.ckpt.
INFO:tensorflow:Loss for final step: 80.3571.


<tensorflow.python.estimator.canned.linear.LinearClassifier at 0x7fd3a44ece90>

In [47]:
evaluate_result = Linclassifier.evaluate(
    input_fn=lambda: from_csv_to_ds_input_fn(FILE_TEST,
                                                128, 
                                                feature_names,
                                                 4, 
                                             False))
print("Evaluation results")
for key in evaluate_result:
    print("   {}, was: {}".format(key, evaluate_result[key]))

INFO:tensorflow:Starting evaluation at 2017-11-04-05:53:06
INFO:tensorflow:Restoring parameters from ./LinClass_iris/model.ckpt-8
INFO:tensorflow:Finished evaluation at 2017-11-04-05:53:07
INFO:tensorflow:Saving dict for global step 8: accuracy = 0.733333, average_loss = 0.67093, global_step = 8, loss = 20.1279
Evaluation results
   average_loss, was: 0.670929729939
   accuracy, was: 0.733333349228
   global_step, was: 8
   loss, was: 20.1278915405


If we'd like to train further:

In [48]:
Linclassifier.train(
    input_fn=lambda: from_csv_to_ds_input_fn(FILE_TRAIN,
                                                128, 
                                                feature_names,
                                                 1000, 
                                             True))

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from ./LinClass_iris/model.ckpt-8
INFO:tensorflow:Saving checkpoints for 9 into ./LinClass_iris/model.ckpt.
INFO:tensorflow:loss = 76.4313, step = 9
INFO:tensorflow:global_step/sec: 519.646
INFO:tensorflow:loss = 36.0779, step = 109 (0.193 sec)
INFO:tensorflow:global_step/sec: 761.858
INFO:tensorflow:loss = 27.3667, step = 209 (0.131 sec)
INFO:tensorflow:global_step/sec: 826.194
INFO:tensorflow:loss = 22.7559, step = 309 (0.121 sec)
INFO:tensorflow:global_step/sec: 844.759
INFO:tensorflow:loss = 19.8666, step = 409 (0.118 sec)
INFO:tensorflow:global_step/sec: 800.698
INFO:tensorflow:loss = 17.8713, step = 509 (0.125 sec)
INFO:tensorflow:global_step/sec: 701.843
INFO:tensorflow:loss = 16.4017, step = 609 (0.146 sec)
INFO:tensorflow:global_step/sec: 558.791
INFO:tensorflow:loss = 15.2685, step = 709 (0.176 sec)
INFO:tensorflow:global_step/sec: 610.422
INFO:tensorflow:loss = 14.3644, step = 809 (0.164 sec)
IN

<tensorflow.python.estimator.canned.linear.LinearClassifier at 0x7fd3a44ece90>

In [49]:
evaluate_result = Linclassifier.evaluate(
    input_fn=lambda: from_csv_to_ds_input_fn(FILE_TEST,
                                                128, 
                                                feature_names,
                                                 4, 
                                             False))
print("Evaluation results")
for key in evaluate_result:
    print("   {}, was: {}".format(key, evaluate_result[key]))

INFO:tensorflow:Starting evaluation at 2017-11-04-05:54:29
INFO:tensorflow:Restoring parameters from ./LinClass_iris/model.ckpt-1008
INFO:tensorflow:Finished evaluation at 2017-11-04-05:54:30
INFO:tensorflow:Saving dict for global step 1008: accuracy = 0.966667, average_loss = 0.120493, global_step = 1008, loss = 3.6148
Evaluation results
   average_loss, was: 0.120493300259
   accuracy, was: 0.966666638851
   global_step, was: 1008
   loss, was: 3.61479902267


In [51]:
# Now we can use TensorBoard
# from the command line:
# tensorboard --logdir=PATH
# where PATH is LinClassPATH in this case
# make sure your virtualenv is active
LinClassPATH

'./LinClass_iris'

#### `tf.estimator.DNNLinearCombinedClassifier`  

Class `DNNLinearCombinedClassifier`  

`__init__`  

```  
__init__(
    model_dir=None,
    linear_feature_columns=None,
    linear_optimizer='Ftrl',
    dnn_feature_columns=None,
    dnn_optimizer='Adagrad',
    dnn_hidden_units=None,
    dnn_activation_fn=tf.nn.relu,
    dnn_dropout=None,
    n_classes=2,
    weight_column=None,
    label_vocabulary=None,
    input_layer_partitioner=None,
    config=None
)  
```  

In [53]:
DNNLinClassPATH = PATH + os.sep + 'DNNLinClass_iris'
if not os.path.exists(DNNLinClassPATH):
    os.makedirs(DNNLinClassPATH)

In [61]:
DNNLinclassifier = tf.estimator.DNNLinearCombinedClassifier(
    dnn_feature_columns=feature_columns, # The input features to our model
    dnn_hidden_units=[8,16,8,4],
    n_classes=3,
    model_dir=DNNLinClassPATH) # Path to where checkpoints etc. are stored  

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd3a450e2d0>, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': './DNNLinClass_iris', '_save_summary_steps': 100}


In [63]:
DNNLinclassifier.train(
    input_fn=lambda: from_csv_to_ds_input_fn(FILE_TRAIN,
                                                128, 
                                                feature_names,
                                                 10000, 
                                             True))

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from ./DNNLinClass_iris/model.ckpt-1000
INFO:tensorflow:Saving checkpoints for 1001 into ./DNNLinClass_iris/model.ckpt.
INFO:tensorflow:loss = 122.452, step = 1001
INFO:tensorflow:global_step/sec: 328.806
INFO:tensorflow:loss = 120.885, step = 1101 (0.306 sec)
INFO:tensorflow:global_step/sec: 352.901
INFO:tensorflow:loss = 119.035, step = 1201 (0.289 sec)
INFO:tensorflow:global_step/sec: 358.743
INFO:tensorflow:loss = 117.388, step = 1301 (0.273 sec)
INFO:tensorflow:global_step/sec: 351.008
INFO:tensorflow:loss = 115.876, step = 1401 (0.288 sec)
INFO:tensorflow:global_step/sec: 305.584
INFO:tensorflow:loss = 114.455, step = 1501 (0.324 sec)
INFO:tensorflow:global_step/sec: 306.192
INFO:tensorflow:loss = 113.115, step = 1601 (0.327 sec)
INFO:tensorflow:global_step/sec: 362.636
INFO:tensorflow:loss = 111.813, step = 1701 (0.276 sec)
INFO:tensorflow:global_step/sec: 376.261
INFO:tensorflow:loss = 110.536, ste

<tensorflow.python.estimator.canned.dnn_linear_combined.DNNLinearCombinedClassifier at 0x7fd3edc7f090>

In [64]:
evaluate_result = DNNLinclassifier.evaluate(
    input_fn=lambda: from_csv_to_ds_input_fn(FILE_TEST,
                                                128, 
                                                feature_names,
                                                 4, 
                                             False))
print("Evaluation results")
for key in evaluate_result:
    print("   {}, was: {}".format(key, evaluate_result[key]))

INFO:tensorflow:Starting evaluation at 2017-11-04-07:36:24
INFO:tensorflow:Restoring parameters from ./DNNLinClass_iris/model.ckpt-11000
INFO:tensorflow:Finished evaluation at 2017-11-04-07:36:24
INFO:tensorflow:Saving dict for global step 11000: accuracy = 0.533333, average_loss = 0.577451, global_step = 11000, loss = 17.3235
Evaluation results
   average_loss, was: 0.577450931072
   accuracy, was: 0.533333361149
   global_step, was: 11000
   loss, was: 17.3235282898
