## What-If Tool and SHAP on a COMPAS Keras model

This notebook shows:
- Training a Keras model on the [COMPAS](https://www.kaggle.com/danofer/compass) dataset.
- Explaining inference results using [SHAP](https://github.com/slundberg/shap).
- Using the What-If Tool on the trained model, including SHAP values.

For ML fairness background on COMPAS, see:

- https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
- https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
- http://www.crj.org/assets/2017/07/9_Machine_bias_rejoinder.pdf

This notebook trains a model to mimic the behavior of the COMPAS recidivism classifier and uses the SHAP library to provide feature importance for each prediction by the model. We can then analyze our COMPAS proxy model for fairness using the What-If Tool, and explore how important each feature was to each prediction through the SHAP values.

The specific binary classification task for this model is to determine if a person belongs in the "Low" risk class according to COMPAS (negative class), or the "Medium" or "High" risk class (positive class). We then analyze it with the What-If Tool for its ability to predict recidivism within two years of arrest.
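
The labeling rule can be sketched in a few lines. This is only an illustration: the helper name `to_binary_label` is hypothetical and the sample categories are made up; the notebook itself applies the same rule with `np.where` on the `score_text` column.

```python
def to_binary_label(score_text):
    """Map a COMPAS risk category to the binary label used in this notebook.

    "Low" is the negative class (0); "Medium" and "High" are the positive class (1).
    """
    return 0 if score_text == "Low" else 1


# Toy categories, not taken from the dataset.
example_labels = [to_binary_label(s) for s in ["Low", "Medium", "High", "Low"]]
print(example_labels)
```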

A simpler version of this notebook that doesn't make use of the SHAP explainer can be found [here](https://colab.research.google.com/github/pair-code/what-if-tool/blob/master/WIT_COMPAS.ipynb).

Copyright 2019 Google LLC.
SPDX-License-Identifier: Apache-2.0

In [1]:
# @title Install What-If Tool Widget and SHAP library
!pip install --upgrade --quiet witwidget shap

In [1]:
# @title Read training dataset from CSV {display-mode: "form"}
import pandas as pd
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_v2_behavior()
import witwidget
import os
import pickle

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

from sklearn.utils import shuffle

df = pd.read_csv("https://storage.googleapis.com/what-if-tool-resources/computefest2019/cox-violent-parsed_filt.csv")



In [2]:
# Preprocess the data

# Filter out entries with no indication of recidivism or no COMPAS score
df = df[df["is_recid"] != -1]
df = df[df["decile_score"] != -1]

# Rename recidivism column
df["recidivism_within_2_years"] = df["is_recid"]

# Make the COMPAS label column numeric (0 and 1), for use in our model
df["COMPASS_determination"] = np.where(df["score_text"] == "Low", 0, 1)

df = pd.get_dummies(df, columns=["sex", "race"])

# Get list of all columns from the dataset we will use for model input or output.
input_features = [
    "sex_Female",
    "sex_Male",
    "age",
    "race_African-American",
    "race_Caucasian",
    "race_Hispanic",
    "race_Native American",
    "race_Other",
    "priors_count",
    "juv_fel_count",
    "juv_misd_count",
    "juv_other_count",
]

to_keep = input_features + ["recidivism_within_2_years", "COMPASS_determination"]

to_remove = [col for col in df.columns if col not in to_keep]
df = df.drop(columns=to_remove)

input_columns = df.columns.tolist()
labels = df["COMPASS_determination"]
df.head()

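| | age | juv_fel_count | juv_misd_count | juv_other_count | priors_count | recidivism_within_2_years | COMPASS_determination | sex_Female | sex_Male | race_African-American | race_Caucasian | race_Hispanic | race_Native American | race_Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 69 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 69 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 3 | 34 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 4 | 24 | 0 | 0 | 1 | 4 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 5 | 24 | 0 | 0 | 1 | 4 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |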


In [3]:
# Create data structures needed for training and testing.
# The training data doesn't contain the column we are predicting,
# 'COMPASS_determination', or the column we are using for evaluation of our
# trained model, 'recidivism_within_2_years'.
df_for_training = df.drop(columns=["COMPASS_determination", "recidivism_within_2_years"])
train_size = int(len(df_for_training) * 0.8)

train_data = df_for_training[:train_size]
train_labels = labels[:train_size]

test_data_with_labels = df[train_size:]
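
The split above takes the first 80% of rows in file order; `sklearn.utils.shuffle` is imported earlier but never called. Whether the CSV is ordered in a way that matters is not verified here, so the following is only a sketch of shuffling before splitting, written with the standard library on a toy list. The function name `shuffled_split` is hypothetical. On the actual DataFrame, something like `shuffle(df, random_state=0)` before slicing would play the same role.

```python
import random


def shuffled_split(rows, train_fraction=0.8, seed=0):
    """Shuffle row indices with a fixed seed, then split into train and test."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    train_size = int(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:train_size]]
    test = [rows[i] for i in indices[train_size:]]
    return train, test


# Toy rows standing in for dataset records.
toy_rows = list(range(10))
train_rows, test_rows = shuffled_split(toy_rows)
print(len(train_rows), len(test_rows))
```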

In [4]:
# Create the model

# This is the size of the array we'll be feeding into our model for each example
input_size = len(train_data.iloc[0])

model = Sequential()
model.add(Dense(200, input_shape=(input_size,), activation="relu"))
model.add(Dense(50, activation="relu"))
model.add(Dense(25, activation="relu"))
model.add(Dense(1, activation="sigmoid"))

model.compile(loss="mean_squared_error", optimizer="adam")

In [5]:
# Train the model
model.fit(train_data.values, train_labels.values, epochs=10, batch_size=32, validation_split=0.1)

In [22]:
model.evaluate(train_data.values, train_labels.values, verbose=True)
print(np.mean((model.predict(train_data.values) > 0.5).reshape(-1) == train_labels.values))

0.7526997067868125
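
The accuracy line above thresholds the sigmoid outputs at 0.5 and flattens them with `.reshape(-1)`, because `model.predict` returns one column per example while the labels are one-dimensional. A dependency-free sketch of the same computation, on made-up probabilities (the helper name `accuracy_at_threshold` is hypothetical):

```python
def accuracy_at_threshold(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the label."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return correct / len(labels)


# Toy values, not model output.
toy_probs = [0.9, 0.2, 0.6, 0.4]
toy_labels = [1, 0, 0, 1]
print(accuracy_at_threshold(toy_probs, toy_labels))
print(accuracy_at_threshold(toy_probs, toy_labels, threshold=0.3))
```

Note that this figure is measured on the training data, so it says nothing about generalization to the held-out 20%.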


In [15]:
# Evaluate the model directly on the TF1 graph. A brand-new
# tf.compat.v1.Session() has no initialized variables, so evaluating the model
# in it raises FailedPreconditionError ("Could not find variable dense_1/bias").
# Reuse the session in which Keras trained the model instead.
session = tf.compat.v1.keras.backend.get_session()
probs = model(train_data.values).eval(session=session)
# probs has shape (N, 1); flatten it before comparing with the 1-D labels.
print(np.mean((probs > 0.5).reshape(-1) == train_labels.values))


In [None]:
# Create a SHAP explainer by passing a subset of our training data
import shap

explainer = shap.DeepExplainer(model, train_data.values[:200])

In [None]:
# Explain predictions of the model on the first 5 examples from our training set
# to test the SHAP explainer.
shap_values = explainer.shap_values(train_data.values[:5])
shap_values
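
One property worth keeping in mind when reading SHAP values is additivity: for a single example, the attributions sum to the difference between the model output and a baseline (the expected output over the background data). The sketch below illustrates this on a toy linear model, where the attribution of feature i reduces to weight times (value minus background mean) under a feature-independence assumption. This is not how `DeepExplainer` computes values for the neural network above; the weights, data, and function names are all made up for illustration.

```python
# Toy linear model: f(x) = w . x + b
weights = [0.5, -1.0, 2.0]
bias = 0.1
background = [[1.0, 2.0, 0.0], [3.0, 0.0, 1.0]]  # stand-in for the background sample


def predict(x):
    return sum(w * v for w, v in zip(weights, x)) + bias


n = len(background)
feature_means = [sum(row[i] for row in background) / n for i in range(len(weights))]
# Baseline: average model output over the background data.
base_value = sum(predict(row) for row in background) / n


def linear_shap(x):
    """Attribution of each feature for a linear model with independent features."""
    return [w * (v - m) for w, v, m in zip(weights, x, feature_means)]


example = [2.0, 1.0, 3.0]
phi = linear_shap(example)
# Additivity: baseline plus attributions recovers the prediction.
print(base_value + sum(phi), predict(example))
```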

In [None]:
# @title Show model results and SHAP values in WIT
from witwidget.notebook.visualization import WitWidget, WitConfigBuilder

num_datapoints = 1000  # @param {type: "number"}

# Indices of columns to strip from the WIT examples before passing them to the model.
columns_not_for_model_input = [
    test_data_with_labels.columns.get_loc("recidivism_within_2_years"),
    test_data_with_labels.columns.get_loc("COMPASS_determination"),
]

# Return model predictions and SHAP values for each inference.
def custom_predict_with_shap(examples_to_infer):
    # Delete columns not used by model
    model_inputs = np.delete(np.array(examples_to_infer), columns_not_for_model_input, axis=1)

    # Get the class predictions from the model.
    preds = model.predict(model_inputs)
    preds = [[1 - pred[0], pred[0]] for pred in preds]

    # Get the SHAP values from the explainer and create a map of feature name
    # to SHAP value for each example passed to the model.
    shap_output = explainer.shap_values(model_inputs)[0]
    attributions = []
    for example_shap_values in shap_output:
        # The loop variable is not named `shap`, to avoid shadowing the module.
        attrs = dict(zip(df_for_training.columns, example_shap_values))
        attributions.append(attrs)
    return {"predictions": preds, "attributions": attributions}


examples_for_shap_wit = test_data_with_labels.values.tolist()
column_names = test_data_with_labels.columns.tolist()

config_builder = (
    WitConfigBuilder(examples_for_shap_wit[:num_datapoints], feature_names=column_names)
    .set_custom_predict_fn(custom_predict_with_shap)
    .set_target_feature("recidivism_within_2_years")
)

ww = WitWidget(config_builder, height=800)
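
The custom predict function above returns a dictionary with two parallel lists: per-example class scores under `"predictions"` and per-example feature-to-attribution maps under `"attributions"`. The sketch below builds that structure from plain Python values so its shape is easy to inspect without a model or explainer. The helper name `build_wit_response` and the numbers are hypothetical.

```python
def build_wit_response(positive_scores, shap_rows, feature_names):
    """Assemble the dictionary shape returned by the custom predict function.

    positive_scores: one score in [0, 1] per example (positive class).
    shap_rows: one list of attribution values per example, aligned with feature_names.
    """
    # Two-class scores: [negative, positive], matching the notebook's predictions.
    predictions = [[1 - score, score] for score in positive_scores]
    attributions = [dict(zip(feature_names, row)) for row in shap_rows]
    return {"predictions": predictions, "attributions": attributions}


# Made-up values for two examples and two features.
response = build_wit_response(
    positive_scores=[0.75, 0.25],
    shap_rows=[[0.1, -0.2], [0.0, 0.3]],
    feature_names=["age", "priors_count"],
)
print(response)
```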

#### What-If Tool exploration ideas

- Organize datapoints by "inference score" (through binning or a scatter plot) to see points ordered by the model's estimate of how likely each person is to re-offend.
  - Select a point near the boundary line (where red points turn to blue points).
  - Find the nearest counterfactual to see a similar person with a different decision. What is different?
  - Look at the partial dependence plots for the selected person. Which changes to which features would change the decision for this person?
- Explore the attribution values provided by SHAP.
  - For a variety of selected datapoints, look at which features have the highest positive attribution values. These are making the model predict higher risk for this person.
  - Look at which features have the lowest negative attribution values as well. These are making the model predict lower risk for this person.
  - How well do these attribution scores line up with the partial dependence plots for those datapoints?
  - Use the attribution scores in the datapoint visualizations to look for interesting patterns. For example, set the scatter X-axis to "attributions__age" and the scatter Y-axis to "attributions__priors_count", and color the points by "Inference score", to investigate how the importance of those two features relates to the score the model gives each datapoint for being "High risk".
- In the "Performance & Fairness" tab, slice the dataset by different features (such as race or sex).
  - Look at the confusion matrices for each slice. How does performance compare across slices? What in the training data may have caused the difference in performance between the slices? What root causes could exist?
  - Use the threshold optimization buttons to optimize positive classification thresholds for each slice based on any of the possible fairness constraints. How different do the thresholds have to be to achieve that constraint? How much do the thresholds vary depending on the fairness constraint chosen?

- In the "Performance & Fairness" tab, change the cost ratio so that you can optimize the threshold based on a non-symmetric cost of false positives vs. false negatives. Then click the "optimize threshold" button and see the effect on the confusion matrix.
  - Slice the dataset by a feature, such as sex or race. How has the new cost ratio affected the disparity in performance between slices? Click the different threshold optimization buttons to see how the changed cost ratio affects the disparity given different fairness constraints.
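
The slicing and cost-ratio ideas above can be tried outside the tool on a small scale. The sketch below counts confusion-matrix cells per slice and picks the threshold that minimizes `cost_ratio * FP + FN`. It assumes the cost ratio expresses the cost of a false positive relative to a false negative, and it is not the What-If Tool's own optimization code; the function names and toy data are made up.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP/FP/TN/FN when scores >= threshold are classified positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1
        elif predicted == 0 and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def counts_by_slice(scores, labels, groups, threshold):
    """Confusion counts computed separately for each group value."""
    sliced = {}
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        sliced[group] = confusion_counts(
            [scores[i] for i in idx], [labels[i] for i in idx], threshold
        )
    return sliced


def best_threshold(scores, labels, cost_ratio, candidates):
    """Candidate threshold with the lowest cost_ratio * FP + FN."""
    def cost(threshold):
        c = confusion_counts(scores, labels, threshold)
        return cost_ratio * c["fp"] + c["fn"]
    return min(candidates, key=cost)


# Toy scores, labels, and group membership.
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
toy_truth = [0, 0, 1, 1, 0, 1]
toy_groups = ["a", "a", "a", "b", "b", "b"]

print(counts_by_slice(toy_scores, toy_truth, toy_groups, threshold=0.5))
print(best_threshold(toy_scores, toy_truth, cost_ratio=1.0, candidates=[0.3, 0.5, 0.7]))
print(best_threshold(toy_scores, toy_truth, cost_ratio=0.25, candidates=[0.3, 0.5, 0.7]))
```

Making false positives cheaper (a lower ratio) pushes the chosen threshold down in this toy data, which is the kind of shift to look for in the tool when the cost ratio changes.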



#### Further exploration ideas

- Edit the training data so that race fields are not included as a feature and train a new model with this data as input (make sure to create a new explainer and a new custom prediction function that filters race out of model input and uses the right explainer and model).
- Load the new model with set_compare_custom_predict_fn and compare it with the original model.
  - HINT: You'll need to make edits in 3 separate code cells.
  - Is there still a racial disparity in model results? If so, what could be the causes?
  - How did the SHAP attributions change?
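
As a starting point for the first idea, the sketch below derives a race-free feature list and the column indices a new custom prediction function would need to strip from each WIT example. The feature list is copied from the preprocessing cell, and the column order is the one shown in the `df.head()` output; the variable names `features_without_race` and `columns_not_for_race_free_model` are hypothetical. Training the second model, building its explainer, and wiring it in with `set_compare_custom_predict_fn` are left as in the exercise.

```python
input_features = [
    "sex_Female", "sex_Male", "age",
    "race_African-American", "race_Caucasian", "race_Hispanic",
    "race_Native American", "race_Other",
    "priors_count", "juv_fel_count", "juv_misd_count", "juv_other_count",
]

# Drop every one-hot race column from the model inputs.
features_without_race = [f for f in input_features if not f.startswith("race_")]

# Column order of the examples passed to WIT, as shown by df.head() above.
column_names = [
    "age", "juv_fel_count", "juv_misd_count", "juv_other_count", "priors_count",
    "recidivism_within_2_years", "COMPASS_determination",
    "sex_Female", "sex_Male",
    "race_African-American", "race_Caucasian", "race_Hispanic",
    "race_Native American", "race_Other",
]

# Indices to delete before calling the race-free model: labels plus race columns.
columns_not_for_race_free_model = [
    i for i, name in enumerate(column_names) if name not in features_without_race
]

print(features_without_race)
print(columns_not_for_race_free_model)
```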
