## Getting started with Graph Execution
link: https://www.tensorflow.org/get_started/get_started_for_beginners

In [1]:
import tensorflow as tf
import pandas as pd



## Create the train/test feature and labels

In [2]:
TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

In [3]:
def load_data(label_name = 'Species'):
    """Parses the csv file in TRAIN_URL and TEST_URL."""

    # Create a local copy of the training set.
    train_path = tf.keras.utils.get_file(fname = TRAIN_URL.split('/')[-1], origin = TRAIN_URL)
    print(train_path)
    # train_path now holds the pathname: ~/.keras/datasets/iris_training.csv

    # Parse the local CSV file.
    train = pd.read_csv(filepath_or_buffer = train_path, names = CSV_COLUMN_NAMES,  # list of column names
                        header = 0  # ignore the first row of the CSV file.
                       )

    # train now holds a pandas DataFrame, which is a data structure
    # analogous to a table.

    # 1. Assign the DataFrame's labels (the right-most column) to train_label.
    # 2. Delete (pop) the labels from the DataFrame.
    # 3. Assign the remainder of the DataFrame to train_features
    train_features, train_label = train, train.pop(label_name)
    
    # Apply the preceding logic to the test set.
    test_path = tf.keras.utils.get_file(TEST_URL.split('/')[-1], TEST_URL)
    test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
    test_features, test_label = test, test.pop(label_name)

    # Return two (features, label) pairs: features are DataFrames, labels are Series.
    return (train_features, train_label), (test_features, test_label)

In [4]:
# Call load_data() to parse the CSV file.
(train_feature, train_label), (test_feature, test_label) = load_data()

/Users/bgolkar/.keras/datasets/iris_training.csv


In [83]:
type(train_feature)

pandas.core.frame.DataFrame

In [141]:
train_feature.shape

(120, 4)

In [135]:
train_feature[0:4]

   SepalLength  SepalWidth  PetalLength  PetalWidth
0          6.4         2.8          5.6         2.2
1          5.0         2.3          3.3         1.0
2          4.9         2.5          4.5         1.7
3          4.9         3.1          1.5         0.1


In [122]:
a = dict(train_feature)

In [134]:
b = a['SepalLength']
b[0:3]

0    6.4
1    5.0
2    4.9
Name: SepalLength, dtype: float64

In [133]:
b = a['SepalWidth']
b[0:3]

0    2.8
1    2.3
2    2.5
Name: SepalWidth, dtype: float64

In [85]:
type(train_label)

pandas.core.series.Series

In [86]:
print(train_label[0:4])

0    2
1    1
2    2
3    0
Name: Species, dtype: int64


In [7]:
train_feature.keys()

Index(['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'], dtype='object')

In [8]:
train_label.keys()

RangeIndex(start=0, stop=120, step=1)

## Define a tensorflow estimator class
A feature column is a data structure that tells your model how to interpret the data in each feature. In the Iris problem, we want the model to interpret the data in each feature as its literal floating-point value; that is, we want the model to interpret an input value like 5.4 as, well, 5.4. However, in other machine learning problems, it is often desirable to interpret data less literally.

Before a TensorFlow estimator can be defined, feature columns must be created to match the dataset. From a code perspective, you build a list of feature column objects by calling functions from the tf.feature_column module. Each object describes one input to the model. To tell the model to interpret data as a floating-point value, call tf.feature_column.numeric_column.

For more on feature columns visit: https://www.tensorflow.org/get_started/feature_columns

Here's a snapshot:

tf.feature_column.numeric_column takes several optional arguments. Calling it with only the key, and no optional arguments, is a fine way to specify a numerical value with the default data type (tf.float32) as input to your model. Otherwise, the data type can be set explicitly:

***numeric_feature_column = tf.feature_column.numeric_column(key = "SepalLength", dtype = tf.float64)***

By default, a numeric column creates a single value (scalar). Use the shape argument to specify another shape. For example:

Represent a 10-element vector in which each cell contains a tf.float32.

***vector_feature_column = tf.feature_column.numeric_column(key = "Bowling", shape = 10)***

Represent a 10x5 matrix in which each cell contains a tf.float32.

***matrix_feature_column = tf.feature_column.numeric_column(key = "MyMatrix", shape = [10,5])***


In [9]:
# Create feature columns for all features
my_feature_columns = []
for key in train_feature.keys():
    
    my_feature_columns.append(tf.feature_column.numeric_column(key = key))

# Above is equivalent to:
# my_feature_columns = [
#     tf.feature_column.numeric_column(key='SepalLength'),
#     tf.feature_column.numeric_column(key='SepalWidth'),
#     tf.feature_column.numeric_column(key='PetalLength'),
#     tf.feature_column.numeric_column(key='PetalWidth')
# ]

In [87]:
type(my_feature_columns)

list

In [137]:
len(my_feature_columns)

4

In [95]:
my_feature_columns

[_NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

tf.estimator.DNNClassifier is a pre-made estimator:

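The hidden_units parameter defines the hidden layers of the network; [10, 10] creates two hidden layers with 10 nodes each. The n_classes parameter specifies the number of possible values that the neural network can predict. Since the Iris problem classifies 3 Iris species, we set n_classes to 3.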

The constructor for tf.estimator.DNNClassifier takes an optional argument named optimizer, which our sample code chose not to specify. The optimizer controls how the model will train. As you develop more expertise in machine learning, the choice of optimizer and learning rate will become very important.

Once the estimator class is defined, the next step is to construct the feature and label vectors to train the estimator.

In [11]:
classifier = tf.estimator.DNNClassifier(feature_columns = my_feature_columns, hidden_units = [10, 10], n_classes = 3)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/dt/hdvc539j4zv3fjr1sw5dbqm00000gq/T/tmpk8qyn_jl', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x18218a8630>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [98]:
type(classifier)

tensorflow.python.estimator.canned.dnn.DNNClassifier
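
To get a feel for the size of the network configured above (four numeric features, hidden_units = [10, 10], n_classes = 3), the following sketch counts its trainable weights. It assumes plain fully connected layers with one bias per unit; the helper name count_dense_parameters is made up for this illustration and is not part of TensorFlow.

```python
def count_dense_parameters(n_inputs, hidden_units, n_classes):
    """Count weights and biases in a stack of fully connected layers.

    Hypothetical helper for illustration; assumes every layer is dense
    with one bias term per output unit.
    """
    layer_sizes = [n_inputs] + list(hidden_units) + [n_classes]
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total


# Four Iris features, hidden_units = [10, 10], n_classes = 3.
n_params = count_dense_parameters(4, [10, 10], 3)
print(n_params)  # 193
```

The total breaks down as 50 + 110 + 33 for the three layers, so this is a very small model by neural-network standards.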

In [99]:
classifier

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x18218a8320>

tf.data.Dataset.from_tensor_slices((dict(features), labels)) creates the dataset for training/evaluation from the features and labels. Note that the features DataFrame must be converted to a dictionary that maps each column name to its values.

In [121]:
def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    # The dataset API (tf.data.Dataset) is a high-level TensorFlow API for reading data and transforming it into a form that the train method requires.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    
    # Shuffle, repeat, and batch the examples.
    # dataset.shuffle(x): Training works best if the training examples are in random order. To randomize the examples, 
    # call tf.data.Dataset.shuffle. Setting the buffer_size to a value larger than the number of 
    # examples (120) ensures that the data will be well shuffled.
    # dataset.repeat(): During training, the train method typically processes the examples multiple times. 
    # Calling the tf.data.Dataset.repeat method without any arguments ensures that the 
    # train method has an infinite supply of (now shuffled) training set examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    
    # Return the next batch of (features, labels) tensors from a one-shot iterator.
    return dataset.make_one_shot_iterator().get_next()
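
The comments above say that a buffer larger than the dataset gives a well-shuffled result. The plain-Python sketch below illustrates why buffer size matters. It is a simplified model of a shuffle buffer, not TensorFlow's implementation, and buffered_shuffle is a hypothetical name used only here.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in an order produced by a fixed-size shuffle buffer.

    Plain-Python sketch of the idea only, not TensorFlow's implementation.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element and replace it with the new item.
        index = rng.randrange(len(buffer))
        yield buffer[index]
        buffer[index] = item
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


examples = list(range(120))  # stand-in for the 120 training rows

tiny = list(buffered_shuffle(examples, buffer_size=1, seed=0))
large = list(buffered_shuffle(examples, buffer_size=1000, seed=0))
first_small = next(buffered_shuffle(examples, buffer_size=10, seed=0))

print(tiny == examples)           # True: a buffer of 1 leaves the order unchanged
print(sorted(large) == examples)  # True: every example still appears exactly once
print(first_small < 10)           # True: the first output comes from the first 10 items
```

With a buffer of 10, early outputs can only come from the first few examples, so the result is only locally shuffled. A buffer that holds the whole dataset, as with shuffle(1000) on 120 rows, removes that restriction.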

In [21]:
a = train_input_fn(train_feature, train_label, 100)

<class 'tensorflow.python.data.ops.dataset_ops.TensorSliceDataset'>
<TensorSliceDataset shapes: ({SepalLength: (), SepalWidth: (), PetalLength: (), PetalWidth: ()}, ()), types: ({SepalLength: tf.float64, SepalWidth: tf.float64, PetalLength: tf.float64, PetalWidth: tf.float64}, tf.int64)>
<class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'>
<BatchDataset shapes: ({SepalLength: (?,), SepalWidth: (?,), PetalLength: (?,), PetalWidth: (?,)}, (?,)), types: ({SepalLength: tf.float64, SepalWidth: tf.float64, PetalLength: tf.float64, PetalWidth: tf.float64}, tf.int64)>


In [100]:
type(a)

tuple

In [108]:
type(a[0])

dict

In [109]:
type(a[1])

tensorflow.python.framework.ops.Tensor

In [111]:
a

({'PetalLength': <tf.Tensor 'IteratorGetNext_4:0' shape=(?,) dtype=float64>,
  'PetalWidth': <tf.Tensor 'IteratorGetNext_4:1' shape=(?,) dtype=float64>,
  'SepalLength': <tf.Tensor 'IteratorGetNext_4:2' shape=(?,) dtype=float64>,
  'SepalWidth': <tf.Tensor 'IteratorGetNext_4:3' shape=(?,) dtype=float64>},
 <tf.Tensor 'IteratorGetNext_4:4' shape=(?,) dtype=int64>)

## One-shot iterator example
A one-shot iterator is the simplest form of iterator, which only supports iterating once through a dataset, with no need for explicit initialization.

In [113]:
dataset = tf.data.Dataset.range(100)

In [115]:
type(dataset)

tensorflow.python.data.ops.dataset_ops.RangeDataset

In [118]:
iterator = dataset.shuffle(1000).repeat().batch(10).make_one_shot_iterator()

In [119]:
type(iterator)

tensorflow.python.data.ops.iterator_ops.Iterator

In [120]:
# example to understand iterator
sess = tf.Session()
next_element = iterator.get_next()

for i in range(20):
    value = sess.run(next_element)
    # value type: <class 'numpy.ndarray'>
    
    print(value)

[39 12 80  2 26  4 92  1 56 65]
[95 61 17 91 90 63 44 99 30 69]
[70 68 53 64 23 21 97 74 88 15]
[72  7 50 52 98 86 59 87  5 29]
[48 94 79 25 43 13 34 78 35 22]
[60 58 28 76 16 10 96 82 42 71]
[41 85 33 54 51 67 40  0 32 84]
[62 14 31  6 38 27 57 20 36 89]
[55 19 49 11  8 77  3 46 18 24]
[47 37 75 83 45 81 93 73 66  9]
[63  5 67 68 13 36 16 58 19 76]
[94 31 23 18 49 44  9 56 29 25]
[70 84 43 26 22 97  3 73 21 42]
[64 15 90  6 92 87 14  2 99 53]
[95 47 77 12 46 78 59 50 41  7]
[65 62 82 71 83 69 60 27 98 81]
[ 1 74 10 40  4 45 96 75 33 88]
[51 48 85 28 54 17 37 30 39 20]
[34 32 93  0  8 11 55 57 66 52]
[91 72 35 61 79 38 80 89 86 24]
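
In the output above, every run of ten batches contains each number from 0 to 99 exactly once, because the batch size (10) divides the dataset size (100). With the 120 training examples and a batch size of 100, batches instead straddle the boundary between passes over the data. The sketch below models repeat().batch(...) in plain Python with shuffling left out so the result is deterministic; repeat_then_batch is a hypothetical helper, not a TensorFlow function.

```python
from itertools import cycle, islice


def repeat_then_batch(items, batch_size):
    """Yield fixed-size batches from an endlessly repeated sequence.

    Plain-Python stand-in for repeat().batch(batch_size); shuffling is
    left out so the result is deterministic.
    """
    stream = cycle(items)  # endless repetition of the examples
    while True:
        yield list(islice(stream, batch_size))


batches = repeat_then_batch(range(120), batch_size=100)
first, second = next(batches), next(batches)

print(first[0], first[-1])  # 0 99
print(second[19])           # 119: the tail of the first pass
print(second[20])           # 0: the second pass starts inside this batch
```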


## Train the estimator

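A Python lambda is a small anonymous function. Here it wraps the call to train_input_fn, together with its arguments, in a function that takes no arguments.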

In the estimator's train method, the first argument, input_fn, is a function that provides input data for training as minibatches. The function should construct and return one of the following:

A 'tf.data.Dataset' object: Outputs of Dataset object must be a tuple (features, labels) with same constraints as below.

A tuple (features, labels): Here features is a Tensor or a dictionary mapping string feature names to Tensors, and labels is a Tensor or a dictionary mapping string label names to Tensors. Both features and labels are consumed by model_fn, so they must match what model_fn expects as inputs.

Note that input_fn must be a function that takes no arguments. Because train_input_fn is defined with three parameters and is called as train_input_fn(train_feature, train_label, 100), a lambda is used to wrap the call in an anonymous function that takes no arguments. Alternatively, train_input_fn could be given default values for all of its parameters and then passed directly as input_fn:

def train_input_fn(features = train_feature, labels = train_label, batch_size = 100):

...

classifier.train(input_fn = train_input_fn, steps = 1000)

...

In other words, here we wrap up our input_fn call in a lambda to capture the arguments while providing an input function that takes no arguments, as expected by the Estimator.
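
The wrapping can be tried out without TensorFlow. In the sketch below, train_input_fn is a stand-in that only echoes its arguments, and call_without_arguments is a hypothetical function that mimics a caller invoking input_fn() with no arguments. functools.partial is shown as an equivalent alternative to the lambda.

```python
from functools import partial


def train_input_fn(features, labels, batch_size):
    """Stand-in input function that simply echoes its arguments."""
    return features, labels, batch_size


def call_without_arguments(input_fn):
    """Mimic a caller that, like the Estimator, invokes input_fn()."""
    return input_fn()


features = {"SepalLength": [6.4, 5.0]}
labels = [2, 1]

# Option 1: a lambda that captures the arguments.
via_lambda = call_without_arguments(lambda: train_input_fn(features, labels, 100))

# Option 2: functools.partial binds the same arguments ahead of time.
via_partial = call_without_arguments(partial(train_input_fn, features, labels, 100))

print(via_lambda == via_partial)  # True
```

Passing train_input_fn(features, labels, 100) directly would not work, because that expression calls the function immediately and hands over its result rather than a callable.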

In [28]:
classifier.train(input_fn = lambda:train_input_fn(train_feature, train_label, 100), steps = 1000)

<class 'tensorflow.python.data.ops.dataset_ops.TensorSliceDataset'>
<TensorSliceDataset shapes: ({SepalLength: (), SepalWidth: (), PetalLength: (), PetalWidth: ()}, ()), types: ({SepalLength: tf.float64, SepalWidth: tf.float64, PetalLength: tf.float64, PetalWidth: tf.float64}, tf.int64)>
<class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'>
<BatchDataset shapes: ({SepalLength: (?,), SepalWidth: (?,), PetalLength: (?,), PetalWidth: (?,)}, (?,)), types: ({SepalLength: tf.float64, SepalWidth: tf.float64, PetalLength: tf.float64, PetalWidth: tf.float64}, tf.int64)>
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/dt/hdvc539j4zv3fjr1sw5dbqm00000gq/T/tmpk8qyn_jl/model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 4001 into /var/folders/dt/hdv

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x18218a8320>

## Evaluate the estimator

In [30]:
def eval_input_fn(features, labels = None, batch_size = None):
    """An input function for evaluation or prediction"""
    if labels is None:
        # No labels, use only features.
        inputs = dict(features)
    else:
        inputs = (dict(features), labels)

    # Convert inputs to a tf.data.Dataset object.
    dataset = tf.data.Dataset.from_tensor_slices(inputs)
  
    # Batch the examples
    assert batch_size is not None, "batch_size must not be None"
    dataset = dataset.batch(batch_size)

    # Return the read end of the pipeline.
    return dataset.make_one_shot_iterator().get_next()

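Recall that training was called as classifier.train(input_fn = ..., steps = ...), whereas evaluation is called as classifier.evaluate(input_fn = ...). Unlike our call to the train method, we did not pass the steps argument to evaluate. Because eval_input_fn does not call repeat(), it yields only a single epoch of data, so evaluation ends once the test set has been read.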
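
For comparison, the steps argument of the earlier train call determines how much data the model sees. A quick calculation, using the 120 training rows, the batch size of 100 and the 1000 steps from this notebook, shows how many passes over the training set one train call amounts to.

```python
n_examples = 120   # rows in the training set (train_feature.shape[0])
batch_size = 100   # passed to train_input_fn
steps = 1000       # passed to classifier.train

examples_seen = steps * batch_size
passes_over_data = examples_seen / n_examples

print(examples_seen)               # 100000
print(round(passes_over_data, 1))  # 833.3
```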

In [32]:
# Evaluate the model.
eval_result = classifier.evaluate(input_fn = lambda:eval_input_fn(test_feature, test_label, 100))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-07-09-21:38:56
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/dt/hdvc539j4zv3fjr1sw5dbqm00000gq/T/tmpk8qyn_jl/model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-07-09-21:38:56
INFO:tensorflow:Saving dict for global step 5000: accuracy = 0.96666664, average_loss = 0.047930546, global_step = 5000, loss = 1.4379164

Test set accuracy: 0.967
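
The logged metrics can be cross-checked with a little arithmetic. The sketch below assumes that the test set has 30 rows, all evaluated in a single batch, and that loss is the sum over that batch while average_loss is the mean per example. Both are assumptions, but the logged numbers are consistent with them.

```python
# Values copied from the evaluation log above.
accuracy = 0.96666664
average_loss = 0.047930546
loss = 1.4379164

# Assumption: the test set has 30 rows, all evaluated in a single batch.
n_test_examples = 30

implied_batch_size = loss / average_loss
implied_correct = accuracy * n_test_examples

print(round(implied_batch_size, 2))  # 30.0
print(round(implied_correct, 2))     # 29.0
```

Under these assumptions, the model classified 29 of the 30 test examples correctly.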



## Predict with the estimator

In [80]:
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

In [81]:
type(predict_x)

dict

In [82]:
type(expected)

list

Recall:

classifier.train( input_fn = ..., steps = ...)

classifier.evaluate( input_fn = ...)

classifier.predict( input_fn = ...)

Note that batch_size is an argument of eval_input_fn, not of predict.

In [73]:
predictions = classifier.predict(input_fn=lambda:eval_input_fn(predict_x,labels = None, batch_size=100))

The predict method returns a python iterable, yielding a dictionary of prediction results for each example. This dictionary contains several keys. The probabilities key holds a list of three floating-point values, each representing the probability that the input example is a particular Iris species.

The class_ids key holds a one-element array that identifies the most probable species.


In [76]:
type(predictions)

generator
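
Because predictions is a generator, its results can be consumed only once. The sketch below shows the effect with a stand-in generator; fake_predict is a hypothetical function that only imitates the shape of the results, not the real classifier.predict.

```python
def fake_predict(rows):
    """Stand-in for classifier.predict: yields one result dict per row."""
    for row in rows:
        yield {"class_ids": [row]}


predictions = fake_predict([0, 1, 2])

first_pass = [p["class_ids"][0] for p in predictions]
second_pass = [p["class_ids"][0] for p in predictions]  # generator is now empty

print(first_pass)   # [0, 1, 2]
print(second_pass)  # []
```

To loop over the results more than once, either call predict again or store the results in a list first.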

In [70]:
for pred_dict, exp in zip(predictions, expected):
    # predictions is a generator; each pred_dict holds the prediction results for one example
    print(pred_dict, exp)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/dt/hdvc539j4zv3fjr1sw5dbqm00000gq/T/tmpk8qyn_jl/model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
{'logits': array([ 18.27759 ,  10.631209, -23.601736], dtype=float32), 'probabilities': array([9.9952245e-01, 4.7754217e-04, 6.4838418e-19], dtype=float32), 'class_ids': array([0]), 'classes': array([b'0'], dtype=object)} Setosa
{'logits': array([-6.087348 ,  3.3560655, -5.6560473], dtype=float32), 'probabilities': array([7.91936181e-05, 9.99798954e-01, 1.21899495e-04], dtype=float32), 'class_ids': array([1]), 'classes': array([b'1'], dtype=object)} Versicolor
{'logits': array([-12.167952 ,  -2.9199467,   2.3405356], dtype=float32), 'probabilities': array([4.9750133e-07, 5.1659709e-03, 9.9483347e-01], dtype=float32), 'class_ids': array([2]), 'classes': array([b'2'], dtype=object)
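
The probabilities in this output are consistent with applying a softmax to the logits. The sketch below recomputes them in plain Python for the second example; the softmax helper is written out here for illustration and is not taken from the notebook.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Logits of the second example (Versicolor) in the output above.
logits = [-6.087348, 3.3560655, -5.6560473]
probabilities = softmax(logits)
class_id = probabilities.index(max(probabilities))

print(class_id)                    # 1
print(round(probabilities[1], 4))  # 0.9998
```

The index of the largest probability matches the class_ids entry, which is how the predicted species is chosen.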

In [20]:
template = ('\nPrediction is "{}" ({:.1f}%), expected "{}"')
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# predictions is a generator; call classifier.predict again if it has already been consumed.
for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]
    print(template.format(SPECIES[class_id], 100 * probability, expec))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/dt/hdvc539j4zv3fjr1sw5dbqm00000gq/T/tmp_q3rj0xr/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.

Prediction is "Setosa" (99.7%), expected "Setosa"

Prediction is "Versicolor" (99.7%), expected "Versicolor"

Prediction is "Virginica" (97.8%), expected "Virginica"
