### What-If Tool in colab and jupyter notebooks

This notebook shows how to use the [What-If Tool](https://pair-code.github.io/what-if-tool) inside a Colab or Jupyter notebook.

If running in Colab, you can use this notebook out of the box.

If running in Jupyter, follow the What-If Tool [widget installation instructions](https://github.com/tensorflow/tensorboard/tree/master/tensorboard/plugins/interactive_inference#how-do-i-use-it-in-a-jupyter-notebook) before using this notebook.
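If the same notebook has to run in both environments, one common way to tell them apart at runtime is to check whether the `google.colab` package can be imported, since it is normally only installed in Colab runtimes. A minimal sketch; the helper name `running_in_colab` is hypothetical and not part of the What-If Tool:

```python
def running_in_colab() -> bool:
    """Return True when google.colab is importable (i.e. most likely a Colab runtime)."""
    try:
        import google.colab  # noqa: F401  (normally only installed in Colab)
    except ImportError:
        return False
    return True


if not running_in_colab():
    # Hypothetical reminder; the actual setup steps are in the installation instructions.
    print("Not running in Colab: install the witwidget Jupyter extension first.")
```

A setup cell could use this check to remind Jupyter users about the widget installation step.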

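This notebook trains a linear classifier on a loan dataset read from `../data/train.csv`, predicting the label column `m13` from the remaining columns. (The original What-If Tool demo used the [UCI census problem](https://archive.ics.uci.edu/ml/datasets/census+income) instead, predicting whether a person earns more than $50K from their census information.)

It then uses the What-If Tool to visualize the trained classifier's predictions on a sample of the data.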


In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
import functools
import path

In [2]:
#@title Define helper functions {display-mode: "form"}
# Creates a tf feature spec from the dataframe and columns specified.
def create_feature_spec(df, columns=None):
    feature_spec = {}
    if columns is None:
        columns = df.columns.values.tolist()
    for f in columns:
        if df[f].dtype == np.int64:
            feature_spec[f] = tf.io.FixedLenFeature(shape=(), dtype=tf.int64)
        elif df[f].dtype == np.float64:
            feature_spec[f] = tf.io.FixedLenFeature(shape=(), dtype=tf.float32)
        else:
            feature_spec[f] = tf.io.FixedLenFeature(shape=(), dtype=tf.string)
    return feature_spec

# Creates simple numeric and categorical feature columns from a feature spec and a
# list of columns from that spec to use.
#
# NOTE: Models might perform better with some feature engineering such as bucketed
# numeric columns and hash-bucket/embedding columns for categorical features.
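# NOTE: categorical vocabularies are read from the notebook-level global `df`,
# which must be set (e.g. `df = train`) before this function is called.
def create_feature_columns(columns, feature_spec):
    ret = []
    for col in columns:
        if feature_spec[col].dtype in (tf.int64, tf.float32):
            ret.append(tf.feature_column.numeric_column(col))
        else:
            # Drop missing values so the vocabulary contains only real categories
            ret.append(tf.feature_column.indicator_column(
                tf.feature_column.categorical_column_with_vocabulary_list(
                    col, list(df[col].dropna().unique()))))
    return ret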

# An input function for providing input to a model from tf.Examples.
# The Estimator passes its current mode to input functions that accept a `mode`
# argument, so shuffling is applied only during training.
def tfexamples_input_fn(examples, feature_spec, label, mode=tf.estimator.ModeKeys.EVAL,
                        num_epochs=None,
                        batch_size=64):
    def ex_generator():
        for example in examples:
            yield example.SerializeToString()
    dataset = tf.data.Dataset.from_generator(
        ex_generator, tf.dtypes.string, tf.TensorShape([]))
    if mode == tf.estimator.ModeKeys.TRAIN:
        dataset = dataset.shuffle(buffer_size=2 * batch_size + 1)
    dataset = dataset.batch(batch_size)
    dataset = dataset.map(lambda tf_example: parse_tf_example(tf_example, label, feature_spec))
    dataset = dataset.repeat(num_epochs)
    return dataset

# Parses a batch of serialized tf.Example protos into features and a label for the input function.
def parse_tf_example(example_proto, label, feature_spec):
    parsed_features = tf.io.parse_example(serialized=example_proto, features=feature_spec)
    target = parsed_features.pop(label)
    return parsed_features, target

# Converts a dataframe into a list of tf.Example protos.
def df_to_examples(df, columns=None):
    examples = []
    if columns is None:
        columns = df.columns.values.tolist()
    for _, row in df.iterrows():
        example = tf.train.Example()
        for col in columns:
            if df[col].dtype == np.int64:
                example.features.feature[col].int64_list.value.append(int(row[col]))
            elif df[col].dtype == np.float64:
                example.features.feature[col].float_list.value.append(row[col])
            elif row[col] == row[col]:  # False only for NaN, so missing values are skipped
                example.features.feature[col].bytes_list.value.append(str(row[col]).encode('utf-8'))
        examples.append(example)
    return examples

# Converts a dataframe column into a column of 0's and 1's based on the provided test.
# Used to force label columns to be numeric for binary classification using a TF estimator.
def make_label_column_numeric(df, label_column, test):
    df[label_column] = np.where(test(df[label_column]), 1, 0)
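The `row[col] == row[col]` test in `df_to_examples` works because NaN, which pandas uses for missing cells, is the only value a DataFrame cell normally holds that compares unequal to itself. A standalone sketch of that check (the `is_present` helper and the sample values are illustrative only):

```python
import math


def is_present(value) -> bool:
    """Mirror of the `row[col] == row[col]` test: False exactly when value is NaN."""
    # IEEE-754 NaN compares unequal to everything, including itself.
    return value == value


values = ["Z", math.nan, "OTHER", float("nan")]
kept = [v for v in values if is_present(v)]
print(kept)  # ['Z', 'OTHER']
```

`pandas.notna(value)` expresses the same check more explicitly.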

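`make_label_column_numeric` relies on `test` being vectorized: it receives the whole column and must return a boolean array for `np.where` to turn into 1s and 0s. The same idea on a plain NumPy array (the `to_binary_labels` helper and the census-style label strings are illustrative only):

```python
import numpy as np


def to_binary_labels(values, test):
    """Return an int array with 1 where `test` holds and 0 elsewhere."""
    # `test` is applied to the whole array at once and must return booleans.
    return np.where(test(np.asarray(values)), 1, 0)


labels = to_binary_labels([">50K.", "<=50K.", ">50K."], lambda col: col == ">50K.")
print(labels)  # [1 0 1]
```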
In [3]:
#@title Read training and test datasets from CSV {display-mode: "form"}
DATA_DIR = path.Path("../data/")
ARTIFACT_DIR = path.Path("../artifacts/")
train = pd.read_csv(DATA_DIR / "train.csv")
test = pd.read_csv(DATA_DIR / "test.csv")
train.head()

Unnamed: 0,loan_id,source,financial_institution,interest_rate,unpaid_principal_bal,loan_term,origination_date,first_payment_date,loan_to_value,number_of_borrowers,...,m4,m5,m6,m7,m8,m9,m10,m11,m12,m13
0,268055008619,Z,"Turner, Baldwin and Rhodes",4.25,214000,360,2012-03-01,05/2012,95,1.0,...,0,0,0,1,0,0,0,0,0,1
1,672831657627,Y,"Swanson, Newton and Miller",4.875,144000,360,2012-01-01,03/2012,72,1.0,...,0,0,0,0,0,0,0,1,0,1
2,742515242108,Z,Thornton-Davis,3.25,366000,180,2012-01-01,03/2012,49,1.0,...,0,0,0,0,0,0,0,0,0,1
3,601385667462,X,OTHER,4.75,135000,360,2012-02-01,04/2012,46,2.0,...,0,0,0,0,0,1,1,1,1,1
4,273870029961,X,OTHER,4.75,124000,360,2012-02-01,04/2012,80,1.0,...,3,4,5,6,7,8,9,10,11,1


In [4]:
#@title Specify input columns and column to predict {display-mode: "form"}
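# Set the column in the dataset you wish the model to predict
label_column = 'm13'

# Use every other column as an input feature, selected by name so the label
# never ends up in the feature list regardless of column order
input_features = [col for col in train.columns if col != label_column]
# Create a list containing all input features and the label column
features_and_labels = input_features + [label_column]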
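Selecting the input features by name, rather than by position, keeps the label out of the feature list even if the column order changes. It also yields a plain Python list, which matters because `+` between a pandas `Index` and a list is an element-wise operation, not list concatenation. A stdlib sketch (the `split_features` helper is hypothetical; the sample names are a subset of this dataset's columns):

```python
def split_features(columns, label_column):
    """Return (feature_names, label_column), selecting features by name, not position."""
    if label_column not in columns:
        raise KeyError(f"label column {label_column!r} not found")
    features = [c for c in columns if c != label_column]
    return features, label_column


cols = ["loan_id", "interest_rate", "m12", "m13"]
features, label = split_features(cols, "m13")
print(features + [label])  # ['loan_id', 'interest_rate', 'm12', 'm13']
```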

In [5]:
#@title Convert dataset to tf.Example protos {display-mode: "form"}

examples = df_to_examples(train)
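`df_to_examples` produces one `tf.train.Example` per row. In the proto, an `Example` holds a `Features` message whose `feature` map sends each column name to a `Feature` carrying exactly one of `int64_list`, `float_list` or `bytes_list`. The hypothetical helper below mirrors that layout with plain dictionaries so the structure can be inspected without TensorFlow; it is a sketch, not a replacement for the proto:

```python
import math


def row_to_feature_dict(row, int_cols, float_cols):
    """Plain-dict mirror of the tf.train.Example that df_to_examples builds for one row."""
    feature = {}
    for col, value in row.items():
        if col in int_cols:
            feature[col] = {"int64_list": {"value": [int(value)]}}
        elif col in float_cols:
            feature[col] = {"float_list": {"value": [float(value)]}}
        elif value == value:  # skip NaN (missing) string cells, as df_to_examples does
            feature[col] = {"bytes_list": {"value": [str(value).encode("utf-8")]}}
    return {"features": {"feature": feature}}


# Illustrative row: a missing financial_institution value is dropped from the example.
row = {"loan_term": 360, "interest_rate": 4.25, "source": "Z", "financial_institution": math.nan}
ex = row_to_feature_dict(row, int_cols={"loan_term"}, float_cols={"interest_rate"})
```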

In [6]:
#@title Create and train the classifier {display-mode: "form"}

num_steps = 5000  #@param {type: "number"}

# Create a feature spec for the classifier
feature_spec = create_feature_spec(train)
# create_feature_columns reads categorical vocabularies from the global `df`
df = train
# Define and train the classifier
train_inpf = functools.partial(tfexamples_input_fn, examples, feature_spec, label_column)
classifier = tf.estimator.LinearClassifier(
    feature_columns=create_feature_columns(input_features, feature_spec))
classifier.train(train_inpf, steps=num_steps)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\gurunath.lv\\AppData\\Local\\Temp\\tmplauykei2', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x0000024E81EEB518>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Instructions for updating:
Colocations handled automatically by placer.
Instructions for updatin

INFO:tensorflow:global_step/sec: 28.681
INFO:tensorflow:loss = 0.0, step = 4801 (3.488 sec)
INFO:tensorflow:global_step/sec: 29.1306
INFO:tensorflow:loss = 0.0, step = 4901 (3.434 sec)
INFO:tensorflow:Saving checkpoints for 5000 into C:\Users\gurunath.lv\AppData\Local\Temp\tmplauykei2\model.ckpt.
INFO:tensorflow:Loss for final step: 0.0.


<tensorflow_estimator.python.estimator.canned.linear.LinearClassifier at 0x24e81948160>
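Because `tfexamples_input_fn` repeats the data indefinitely (`num_epochs=None`), training length is controlled only by `steps`, and each step consumes one batch (64 examples by default). A rough conversion from steps to passes over the data, as a sketch (the 100,000-row count below is a hypothetical example, not the size of this dataset):

```python
def epochs_covered(num_steps: int, batch_size: int, num_rows: int) -> float:
    """Approximate number of passes over the data: each training step consumes one batch."""
    if num_rows <= 0:
        raise ValueError("num_rows must be positive")
    return num_steps * batch_size / num_rows


# 5000 steps x 64 examples = 320,000 examples seen during training.
print(epochs_covered(5000, 64, 100_000))  # 3.2
```

The figure is approximate: batching happens before `repeat`, so each pass can end with a smaller final batch.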

In [7]:
#@title Invoke What-If Tool for test data and the trained model {display-mode: "form"}

num_datapoints = 20000  #@param {type: "number"}
tool_height_in_px = 1000  #@param {type: "number"}

from witwidget.notebook.visualization import WitConfigBuilder
from witwidget.notebook.visualization import WitWidget

# Load the test dataset (not used below)
test_df = test
# NOTE: the examples shown in the tool are drawn from the training data, so the
# tool displays in-sample predictions
test_examples = df_to_examples(train[0:num_datapoints])

# Set up the tool with these examples and the trained classifier
config_builder = WitConfigBuilder(test_examples).set_estimator_and_feature_spec(
    classifier, feature_spec).set_label_vocab(['0', '1'])
WitWidget(config_builder, height=tool_height_in_px)

WitWidget(config={'model_type': 'classification', 'label_vocab': ['0', '1'], 'are_sequence_examples': False, '…

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
W0815 19:52:40.241649  7940 deprecation.py:323] From C:\Users\gurunath.lv\AppData\Local\Continuum\anaconda3\lib\site-packages\tensorflow\python\training\saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from C:\Users\gurunath.lv\AppData\Local\Temp\tmplauykei2\model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
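`train[0:num_datapoints]` hands the tool the first rows of the file; if the CSV is ordered (for example by origination date), those rows may not be representative. One alternative is to pick row positions uniformly at random with a fixed seed. A stdlib sketch (`sample_indices` is a hypothetical helper):

```python
import random


def sample_indices(num_rows: int, num_datapoints: int, seed: int = 0) -> list:
    """Pick up to num_datapoints distinct row positions uniformly at random, reproducibly."""
    rng = random.Random(seed)  # local generator so the global random state is untouched
    k = min(num_datapoints, num_rows)
    return sorted(rng.sample(range(num_rows), k))


idx = sample_indices(num_rows=10, num_datapoints=4, seed=42)
```

The positions could then be selected with `train.iloc[idx]` before calling `df_to_examples`; pandas' `DataFrame.sample(n=..., random_state=...)` does the same in one call when `n` does not exceed the row count.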


In [14]:
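# NOTE: 'Under 50K'/'Over 50K' are the census demo's class names; they do not
# describe the 0/1 values of the `m13` label used here.
config_builder = WitConfigBuilder(examples).set_estimator_and_feature_spec(
    classifier, feature_spec).set_label_vocab(['Under 50K', 'Over 50K'])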
WitWidget(config_builder, height=tool_height_in_px)

WitWidget(config={'model_type': 'classification', 'label_vocab': ['Under 50K', 'Over 50K'], 'are_sequence_exam…

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\gurunath.lv\AppData\Local\Temp\tmp4dm5lz_v\model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
