## Classifier using tf.estimator.DNNClassifier
https://towardsai.net/p/machine-learning/tf-estimator-a-tensorflow-high-level-api


In [1]:
import tensorflow as tf
import pandas as pd
import tempfile

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
tf.get_logger().setLevel('ERROR')

## Get Training Data

In [4]:
# Preprocessing the data
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

train_path = tf.keras.utils.get_file("iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file("iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

df_train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
df_test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)

In [5]:
df_train

Unnamed: 0,SepalLength,SepalWidth,PetalLength,PetalWidth,Species
0,6.4,2.8,5.6,2.2,2
1,5.0,2.3,3.3,1.0,1
2,4.9,2.5,4.5,1.7,2
3,4.9,3.1,1.5,0.1,0
4,5.7,3.8,1.7,0.3,0
...,...,...,...,...,...
115,5.5,2.6,4.4,1.2,1
116,5.7,3.0,4.2,1.2,1
117,4.4,2.9,1.4,0.2,0
118,4.8,3.0,1.4,0.1,0


In [6]:
df_test

Unnamed: 0,SepalLength,SepalWidth,PetalLength,PetalWidth,Species
0,5.9,3.0,4.2,1.5,1
1,6.9,3.1,5.4,2.1,2
2,5.1,3.3,1.7,0.5,0
3,6.0,3.4,4.5,1.6,1
4,5.5,2.5,4.0,1.3,1
5,6.2,2.9,4.3,1.3,1
6,5.5,4.2,1.4,0.2,0
7,6.3,2.8,5.1,1.5,2
8,5.6,3.0,4.1,1.3,1
9,6.7,2.5,5.8,1.8,2


In [7]:
# Split each DataFrame into features (X) and labels (y)
y_train = df_train['Species']
X_train = df_train.drop('Species', axis=1)

y_test = df_test['Species']
X_test = df_test.drop('Species', axis=1)

In [8]:
X_train

Unnamed: 0,SepalLength,SepalWidth,PetalLength,PetalWidth
0,6.4,2.8,5.6,2.2
1,5.0,2.3,3.3,1.0
2,4.9,2.5,4.5,1.7
3,4.9,3.1,1.5,0.1
4,5.7,3.8,1.7,0.3
...,...,...,...,...
115,5.5,2.6,4.4,1.2
116,5.7,3.0,4.2,1.2
117,4.4,2.9,1.4,0.2
118,4.8,3.0,1.4,0.1


## Step 1: Create an input function
An input function is a function that returns a tf.data.Dataset object, which yields two-element tuples of the form (features, label):

features: A Python dictionary in which:
(a) Each key is the name of a feature.
(b) Each value is an array containing all of that feature’s values.

label: An array containing the label value for every example.

Here we build the input pipeline from pandas DataFrames.

In [9]:
def input_fn(df_features, df_labels, batch_size=256, training_mode=True):
    # Convert the input DataFrames to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(df_features), df_labels))
    # Shuffle and repeat if you are in training mode.
    if training_mode:
        dataset = dataset.shuffle(1000).repeat()

    return dataset.batch(batch_size)
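
To make the (features, label) structure concrete, the following is a minimal pure-Python sketch, without TensorFlow, of how column-oriented data can be sliced into batches. The helper `batch_slices` is hypothetical and only mimics the shape of what the Dataset yields; it does not shuffle or repeat.

```python
def batch_slices(features, labels, batch_size):
    """Yield (features_dict, labels_list) batches from column-oriented data.

    Hypothetical helper for illustration only: it groups consecutive rows
    into batches, keeping the features as a dict of columns.
    """
    n = len(labels)
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        batch_features = {
            name: column[start:stop] for name, column in features.items()
        }
        yield batch_features, labels[start:stop]


# First three rows of the training table shown earlier.
features = {
    'SepalLength': [6.4, 5.0, 4.9],
    'SepalWidth': [2.8, 2.3, 2.5],
    'PetalLength': [5.6, 3.3, 4.5],
    'PetalWidth': [2.2, 1.0, 1.7],
}
labels = [2, 1, 2]

batches = list(batch_slices(features, labels, batch_size=2))
for batch_features, batch_labels in batches:
    print(batch_features, batch_labels)
```

With three rows and a batch size of 2, this produces one full batch and one partial batch, which is also what happens to the last batch in evaluation mode when the data does not divide evenly.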

## Step 2: Define the model’s feature columns
A feature column is an object describing how the model should use raw input data from the features dictionary. When you build an Estimator model, you pass it a list of feature columns that describe each of the features you want the model to use. The tf.feature_column module provides many options for representing data to the model.

For Iris, the four raw features are numeric values, so we’ll build a list of feature columns that tells the Estimator model to represent each of them as a 32-bit floating-point value. The code to create the feature columns is:

In [10]:
# Feature columns describe how to use the input.
feature_columns = []
for key in X_train.keys():
    feature_columns.append(tf.feature_column.numeric_column(key=key))
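
Conceptually, numeric feature columns tell the model which keys of the features dictionary to read and to treat each value as a float. The sketch below is a rough pure-Python picture of that idea, not TensorFlow's implementation; the helper `to_dense_rows` and the fixed column order are assumptions made for illustration.

```python
def to_dense_rows(features, column_keys):
    """Turn a dict of columns into one list of floats per example.

    Hypothetical helper: picks the listed keys, in the given order,
    and casts every value to float.
    """
    n_examples = len(features[column_keys[0]])
    return [
        [float(features[key][i]) for key in column_keys]
        for i in range(n_examples)
    ]


column_keys = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']

# First two rows of the training table shown earlier.
features = {
    'SepalLength': [6.4, 5.0],
    'SepalWidth': [2.8, 2.3],
    'PetalLength': [5.6, 3.3],
    'PetalWidth': [2.2, 1.0],
}

rows = to_dense_rows(features, column_keys)
print(rows)
```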

## Step 3: Instantiate the Estimator
The Iris problem is a classic classification problem. Fortunately, TensorFlow provides several pre-made classifier Estimators, including:

a. tf.estimator.DNNClassifier for deep models that perform multi-class classification.
b. tf.estimator.DNNLinearCombinedClassifier for wide & deep models.
c. tf.estimator.LinearClassifier for classifiers based on linear models.

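For the Iris problem, a deep model is a reasonable choice. Note that the cell below instantiates tf.estimator.DNNLinearCombinedClassifier with only the deep (dnn_*) arguments set and no linear feature columns, so only its DNN part is used: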

In [40]:
# Build a DNN with 3 hidden layers with 30 nodes each.
classifier_dir = tempfile.mkdtemp()
classifier = tf.estimator.DNNLinearCombinedClassifier(
    model_dir=classifier_dir,
    dnn_feature_columns=feature_columns,
    dnn_activation_fn=tf.nn.relu,
    # Optimizer for the DNN part, given by name:
    # one of 'Adagrad', 'Adam', 'Ftrl', 'RMSProp', 'SGD'.
    dnn_optimizer='Adagrad',
    # Three hidden layers of 30 nodes each.
    dnn_hidden_units=[30, 30, 30],
    # The model must choose between 3 classes.
    n_classes=3)
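
To get a feel for the size of this network, the sketch below counts trainable parameters for 4 inputs, hidden layers of [30, 30, 30], and 3 output classes. It assumes plain fully connected layers with one bias per unit; the helper `count_dense_parameters` is hypothetical, and the Estimator's internal variable layout is not inspected here.

```python
def count_dense_parameters(n_inputs, hidden_units, n_classes):
    """Count weights and biases of a stack of fully connected layers."""
    layer_sizes = [n_inputs] + list(hidden_units) + [n_classes]
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # fan_in * fan_out weights plus one bias per output unit.
        total += fan_in * fan_out + fan_out
    return total


n_params = count_dense_parameters(n_inputs=4, hidden_units=[30, 30, 30], n_classes=3)
print(n_params)  # 150 + 930 + 930 + 93 = 2103
```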


## Step 4: Train, Evaluate, and Predict

In [38]:
# Train the Model.
classifier.train(
    input_fn=lambda: input_fn(X_train, y_train, training_mode=True),
    steps=5000)

<tensorflow_estimator.python.estimator.canned.dnn_linear_combined.DNNLinearCombinedClassifierV2 at 0x173344ac0>

In [39]:
# Evaluate the accuracy of the trained model on the test data.
eval_result = classifier.evaluate(input_fn=lambda: input_fn(X_test, y_test, training_mode=False))
print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
for key, value in eval_result.items():
    print(key, ":", value)


Test set accuracy: 0.967

accuracy : 0.96666664
average_loss : 0.13513869
loss : 0.13513869
global_step : 5000
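
The accuracy metric is the fraction of test examples classified correctly. A value of 0.967 would be consistent with, for instance, 29 correct predictions out of 30. The test-set size of 30 and the labels below are assumptions for illustration, not values taken from the run above.

```python
def accuracy(predicted, actual):
    """Return the fraction of positions where predicted equals actual."""
    if len(predicted) != len(actual):
        raise ValueError('predicted and actual must have the same length')
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels: 30 examples with exactly one misclassification.
actual = [0, 1, 2] * 10
predicted = list(actual)
predicted[4] = 2  # actual[4] is 1, so this is the single mistake

acc = accuracy(predicted, actual)
print('accuracy: {:0.3f}'.format(acc))
```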


In [14]:
# Generate predictions from the model
expected = ['Setosa', 'Versicolor', 'Virginica']

X_predict = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

def prediction_input_fn(features, batch_size=256):
    # Convert the inputs to a Dataset without labels.
    dataset = tf.data.Dataset.from_tensor_slices(dict(features))

    return dataset.batch(batch_size)

predictions = classifier.predict(
        input_fn=lambda: prediction_input_fn(X_predict))

In [15]:
# Get predictions and their probabilities
for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]
    print('Prediction is "{}" ({:.1f}%), expected "{}"'.format(SPECIES[class_id], 100 * probability, expec))

Prediction is "Setosa" (90.9%), expected "Setosa"
Prediction is "Versicolor" (61.0%), expected "Versicolor"
Prediction is "Virginica" (67.1%), expected "Virginica"
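
For a multi-class classifier, the `probabilities` entry is typically the softmax of the model's raw scores (logits), and `class_ids` holds the index of the largest probability. The sketch below reproduces that relationship in pure Python; the logit values are hypothetical and are not taken from the trained model.

```python
import math

SPECIES = ['Setosa', 'Versicolor', 'Virginica']


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.5, -1.0]  # hypothetical scores for one flower
probabilities = softmax(logits)

# The predicted class is the index with the highest probability.
class_id = max(range(len(probabilities)), key=probabilities.__getitem__)
print('Prediction is "{}" ({:.1f}%)'.format(
    SPECIES[class_id], 100 * probabilities[class_id]))
```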
