## Classifier using tf.estimator.DNNClassifier
https://www.tensorflow.org/tutorials/estimator/linear
https://towardsai.net/p/machine-learning/tf-estimator-a-tensorflow-high-level-api


In [1]:
import tensorflow as tf
import pandas as pd
import tempfile

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
tf.get_logger().setLevel('ERROR')

## Get Training Data

In [4]:
train_path = tf.keras.utils.get_file("train.csv", "https://storage.googleapis.com/tf-datasets/titanic/train.csv")
test_path = tf.keras.utils.get_file("test.csv", "https://storage.googleapis.com/tf-datasets/titanic/eval.csv")

df_train = pd.read_csv(train_path)
df_test = pd.read_csv(test_path)

In [5]:
df_train

Unnamed: 0,survived,sex,age,n_siblings_spouses,parch,fare,class,deck,embark_town,alone
0,0,male,22.0,1,0,7.2500,Third,unknown,Southampton,n
1,1,female,38.0,1,0,71.2833,First,C,Cherbourg,n
2,1,female,26.0,0,0,7.9250,Third,unknown,Southampton,y
3,1,female,35.0,1,0,53.1000,First,C,Southampton,n
4,0,male,28.0,0,0,8.4583,Third,unknown,Queenstown,y
...,...,...,...,...,...,...,...,...,...,...
622,0,male,28.0,0,0,10.5000,Second,unknown,Southampton,y
623,0,male,25.0,0,0,7.0500,Third,unknown,Southampton,y
624,1,female,19.0,0,0,30.0000,First,B,Southampton,y
625,0,female,28.0,1,2,23.4500,Third,unknown,Southampton,n


In [6]:
df_test

Unnamed: 0,survived,sex,age,n_siblings_spouses,parch,fare,class,deck,embark_town,alone
0,0,male,35.0,0,0,8.0500,Third,unknown,Southampton,y
1,0,male,54.0,0,0,51.8625,First,E,Southampton,y
2,1,female,58.0,0,0,26.5500,First,C,Southampton,y
3,1,female,55.0,0,0,16.0000,Second,unknown,Southampton,y
4,1,male,34.0,0,0,13.0000,Second,D,Southampton,y
...,...,...,...,...,...,...,...,...,...,...
259,1,female,25.0,0,1,26.0000,Second,unknown,Southampton,n
260,0,male,33.0,0,0,7.8958,Third,unknown,Southampton,y
261,0,female,39.0,0,5,29.1250,Third,unknown,Queenstown,n
262,0,male,27.0,0,0,13.0000,Second,unknown,Southampton,y


In [7]:
# Get the features and the label (target) variable
y_train = df_train['survived']
X_train = df_train.drop('survived', axis=1)

y_test = df_test['survived']
X_test = df_test.drop('survived', axis=1)

In [8]:
X_train

Unnamed: 0,sex,age,n_siblings_spouses,parch,fare,class,deck,embark_town,alone
0,male,22.0,1,0,7.2500,Third,unknown,Southampton,n
1,female,38.0,1,0,71.2833,First,C,Cherbourg,n
2,female,26.0,0,0,7.9250,Third,unknown,Southampton,y
3,female,35.0,1,0,53.1000,First,C,Southampton,n
4,male,28.0,0,0,8.4583,Third,unknown,Queenstown,y
...,...,...,...,...,...,...,...,...,...
622,male,28.0,0,0,10.5000,Second,unknown,Southampton,y
623,male,25.0,0,0,7.0500,Third,unknown,Southampton,y
624,female,19.0,0,0,30.0000,First,B,Southampton,y
625,female,28.0,1,2,23.4500,Third,unknown,Southampton,n


## Step 1: Create an input function
An input function returns a tf.data.Dataset object that yields two-element tuples of the form (features, label):

features: a Python dictionary in which:
(a) Each key is the name of a feature.
(b) Each value is an array containing all of that feature’s values.

label: an array containing the value of the label for every example.

Here the input pipeline is built from pandas DataFrames.

In [9]:
def input_fn(df_features, df_labels, batch_size=256, training_mode=True):
    # Convert the input DataFrames to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(df_features), df_labels))
    # Shuffle and repeat if you are in training mode.
    if training_mode:
        dataset = dataset.shuffle(1000).repeat()

    return dataset.batch(batch_size)
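
To make the (features, label) structure concrete, the sketch below mimics in plain Python what a batched dataset yields. It is only an illustration, not tf.data itself: the `iter_batches` helper and the five toy rows are hypothetical, and shuffling and repeating are left out.

```python
# Toy stand-in for a DataFrame: column name -> list of values (hypothetical rows).
features = {
    "sex": ["male", "female", "female", "male", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 28.0],
}
labels = [0, 1, 1, 1, 0]


def iter_batches(features, labels, batch_size):
    """Yield (feature_dict, label_list) pairs, one pair per batch."""
    for start in range(0, len(labels), batch_size):
        stop = start + batch_size
        # Slice every feature column the same way so rows stay aligned.
        batch_features = {name: values[start:stop] for name, values in features.items()}
        yield batch_features, labels[start:stop]


batches = list(iter_batches(features, labels, batch_size=2))
for batch_features, batch_labels in batches:
    print(batch_features, batch_labels)
```

Each yielded pair has the same shape as the tuple described above: a dictionary keyed by feature name, plus the labels for the same rows. The last batch may be smaller than `batch_size`.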

## Step 2: Define the model’s feature columns
A feature column is an object describing how the model should use raw input data from the features dictionary. When you build an Estimator model, you pass it a list of feature columns that describe each of the features you want the model to use. The tf.feature_column module provides many options for representing data to the model.

For the Titanic data, the features are a mix of categorical and numeric values. Each categorical feature is mapped to a one-hot vector through a vocabulary of its unique values, and each numeric feature (age and fare) is represented as a 32-bit floating-point value. The code to create the feature columns is:

In [10]:
# Feature columns describe how to use the input.
CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck','embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = X_train[feature_name].unique()
    categorical_column = tf.feature_column.categorical_column_with_vocabulary_list(key=feature_name, vocabulary_list=vocabulary)
    # Map categorical column to numeric values - one-hot encoding/vector
    feature_columns.append(tf.feature_column.indicator_column(categorical_column))

for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))
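
As a rough illustration of the one-hot mapping applied to a vocabulary-based categorical feature, here is a plain-Python sketch. The `one_hot` helper is hypothetical, the vocabulary order is assumed for the example, and returning an all-zero vector for a value outside the vocabulary is a choice made in this sketch.

```python
def one_hot(value, vocabulary):
    """Return a list with 1.0 at the position of value in vocabulary, 0.0 elsewhere."""
    return [1.0 if value == entry else 0.0 for entry in vocabulary]


# Assumed vocabulary for the 'class' feature, in order of first appearance.
class_vocabulary = ["Third", "First", "Second"]

encoded_first = one_hot("First", class_vocabulary)
# A value that is not in the vocabulary matches no entry in this sketch.
encoded_unknown = one_hot("Crew", class_vocabulary)

print(encoded_first)
print(encoded_unknown)
```

The vector length equals the vocabulary size, so a feature with many distinct values produces a wide, sparse input.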

## Step 3: Instantiate the Estimator
Predicting survival on the Titanic is a binary classification problem. Fortunately, TensorFlow provides several pre-made classifier Estimators, including:

a. tf.estimator.DNNClassifier for deep models that perform multi-class classification.
b. tf.estimator.DNNLinearCombinedClassifier for wide & deep models.
c. tf.estimator.LinearClassifier for classifiers based on linear models.

For this problem, tf.estimator.DNNClassifier seems like a reasonable choice. Here’s how the Estimator is instantiated:

In [11]:
# Build a DNN with 3 hidden layers with 30 nodes each.
classifier_dir = tempfile.mkdtemp()
classifier = tf.estimator.DNNClassifier(
    model_dir=classifier_dir,
    feature_columns=feature_columns,
    optimizer='Adagrad',  # options: 'Adagrad', 'Adam', 'Ftrl', 'RMSProp', 'SGD'
    activation_fn=tf.nn.relu,
    loss_reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE,
    # Three hidden layers of 30 nodes each.
    hidden_units=[30, 30, 30],
    # 'survived' is binary (0 or 1), so n_classes=2 would suffice;
    # the run recorded below used n_classes=3.
    n_classes=3)

## Step 4: Train and Evaluate

In [12]:
# Train the Model.
classifier.train(
    input_fn=lambda: input_fn(X_train, y_train, training_mode=True),
    steps=5000)

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x171fdab30>

In [13]:
# Evaluate the accuracy of the trained model on the test data.
eval_result = classifier.evaluate(input_fn=lambda: input_fn(X_test, y_test, training_mode=False))
print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
for key, value in eval_result.items():
    print(key, ":", value)


Test set accuracy: 0.754

accuracy : 0.7537879
average_loss : 0.4937388
loss : 0.45646796
global_step : 5000


## Step 5: Prediction

In [14]:
# Define Prediction input data function
def prediction_input_fn(features, batch_size=256):
    # Convert the inputs to a Dataset without labels.
    dataset = tf.data.Dataset.from_tensor_slices(dict(features))

    return dataset.batch(batch_size)

In [15]:
X_predict = X_test.sample(10)
# Generate predictions from the model
predictions = classifier.predict(input_fn=lambda: prediction_input_fn(X_predict))

In [16]:
# Get predictions and their probabilities
for pred_dict in predictions:
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]
    print('Prediction is "{}" ({:.1f}%)'.format(class_id, 100 * probability))

Prediction is "0" (87.5%)
Prediction is "0" (77.9%)
Prediction is "1" (64.3%)
Prediction is "0" (82.4%)
Prediction is "1" (76.7%)
Prediction is "0" (58.9%)
Prediction is "0" (50.9%)
Prediction is "0" (55.6%)
Prediction is "0" (83.5%)
Prediction is "0" (84.1%)
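
The loop above reads the predicted class and then looks up its probability. Assuming the predicted class is the index of the largest probability, the relationship can be sketched in plain Python. The `summarize` helper and the probability vector are made up for illustration and are not model output.

```python
def summarize(probabilities):
    """Return (index of the largest probability, that probability as a percentage)."""
    class_id = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_id, 100 * probabilities[class_id]


# Hypothetical per-class probabilities for a single example.
class_id, percent = summarize([0.125, 0.875])
print('Prediction is "{}" ({:.1f}%)'.format(class_id, percent))
```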
