#**Using SHAP to Understand Text Tokens' Effects in a Classifier**

We will train a simple text classifier on our fake-review detection data. Then, for any individual review, we can see which terms contributed most to its predicted class.

#*Load TripAdvisor Reviews from GitHub*

In [3]:
import tensorflow as tf
tf.compat.v1.disable_v2_behavior()
#tf.compat.v1.enable_eager_execution()

from tensorflow import keras
from tensorflow.keras import layers  # use tf.keras throughout rather than standalone keras
import pandas as pd
import numpy as np

# Just load the data from the Week 3 folder again.
trip_advisor = pd.read_csv('https://raw.githubusercontent.com/gburtch/BA865-2023/main/Lecture%20Materials/C/dataset/deceptive-opinion.csv')
trip_advisor = trip_advisor.sample(frac=1) # Shuffle the data since I'll eventually just use a simple validation split.

trip_advisor.describe(include='all')

# Let's shuffle things... 
shuffled_indices= np.arange(trip_advisor.shape[0])
np.random.shuffle(shuffled_indices)

trip_advisor_text = trip_advisor['text'].to_numpy()
label = np.where(trip_advisor['deceptive']=='deceptive',1,0)

print(trip_advisor_text)
trip_advisor_text = trip_advisor_text[shuffled_indices]
label = label[shuffled_indices]
print(trip_advisor_text)

["I arrived here at about 8pm after a flight from California. I didn't know about the renovation when I made the reservations, but that didn't bother me very much. I was helped by a man at the front desk named Ben, who was friendly enough, but literally just handed me my room key and told me to have a nice stay. He gave me NO information about anything. I had no idea where breakfast was served, or even if there WAS breakfast. There was also an EXTREMELY unpleasant woman working behind the desk(she wasn't wearing a name tag but was African American, slim with short hair) who seemed to be actually radiating hostility. It made me very uncomfortable and needless to say, made me feel unwelcome. I actually felt like I was putting her out by staying in the hotel. I encountered her again the next morning when I had to go downstairs to ask a question and again tried to smile at her a bit, and again was met with utter disdain and rudeness. I also watched her interact with another guest and saw h

#*Define / Train Our Fake Review Detector*

In [4]:
# Convert strings to sequences of words.
review_seq = []
for review in trip_advisor_text:
  seq = keras.preprocessing.text.text_to_word_sequence(review)
  review_seq.append(seq)

# Make our dictionary of term frequencies
word_freq = {}
for review in review_seq:
  for term in review:
    # Increment the count, starting from 0 the first time a term is seen.
    word_freq[term] = word_freq.get(term, 0) + 1

unique_terms = {term for review in review_seq for term in review}
print(f'We have {len(unique_terms)} unique tokens in our dataset.')

# We can then easily make a term-integer dictionary and an integer-term dictionary (for reverse lookup)
word_index = {term: number for number, term in enumerate(unique_terms)}
reverse_index = {number: term for number, term in enumerate(unique_terms)}

We have 10275 unique tokens in our dataset.
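
The index above depends on the iteration order of a Python set. That order is stable within one session, but for strings it can differ between sessions, so the same term may land in a different column on the next run. A minimal sketch, assuming you want a reproducible term-to-column mapping, sorts the vocabulary first. The helper name `build_vocab` and the toy reviews are hypothetical and not part of the notebook.

```python
def build_vocab(tokenized_reviews):
    """Return (word_index, reverse_index) with a deterministic ordering."""
    # Sorting the unique terms makes the mapping reproducible across sessions.
    vocab = sorted({term for review in tokenized_reviews for term in review})
    word_index = {term: i for i, term in enumerate(vocab)}
    reverse_index = {i: term for term, i in word_index.items()}
    return word_index, reverse_index


# Toy data, for illustration only.
toy_reviews = [["great", "hotel", "great", "staff"], ["rude", "staff"]]
word_index, reverse_index = build_vocab(toy_reviews)
print(word_index)
print(reverse_index[2])
```

Building `reverse_index` from `word_index` also guarantees the two dictionaries stay consistent with each other.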


In [5]:
def vectorize_sequences(sequences, dimension=len(unique_terms)): 
    
    # Make a blank matrix of 0's to hold the multi-hot encodings.
    results = np.zeros((len(sequences), dimension))

    # For each observation and element in that observation,
    # Update the blank matrix to a 1 at row obs, column element value.
    for i, sequence in enumerate(sequences):
        for term in sequence:
            j = word_index[term]
            results[i, j] = 1
    return results

ta_vectorized = vectorize_sequences(review_seq)

Note that SHAP requires numeric input features (it can't work with raw strings). That is why each review is fed to the model as a multi-hot vector: one column per unique term, set to 1 if the term appears in the review and 0 otherwise.
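
To make the encoding concrete, here is a minimal sketch on a hypothetical three-term vocabulary (the names `toy_index` and `toy_reviews` are made up for illustration). It shows that the encoding records only whether a term is present, not how often or where it appears.

```python
import numpy as np

# Hypothetical three-term vocabulary, for illustration only.
toy_index = {"clean": 0, "rude": 1, "staff": 2}
toy_reviews = [["rude", "staff", "rude"], ["clean"]]

encoded = np.zeros((len(toy_reviews), len(toy_index)))
for row, review in enumerate(toy_reviews):
    for term in review:
        # Presence only: a repeated term still yields a single 1.
        encoded[row, toy_index[term]] = 1

print(encoded)
```

Each row is one review and each column is one term, so a SHAP value computed for a column can be read directly as the contribution of that term.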

In [8]:
def build_model():
    model = keras.Sequential([
        layers.Dense(250, activation="linear"),
        layers.Dense(50, activation="relu",kernel_regularizer="l2"),
        layers.Dense(5, activation="relu"),
        layers.Dense(1, activation="sigmoid")
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])
    return model

model = build_model()

history = model.fit(ta_vectorized[:1200], label[:1200], validation_split=0.2, epochs=10, batch_size=25)

Train on 960 samples, validate on 240 samples
Epoch 1/10
Epoch 2/10

Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Next, we evaluate the model on the held-out reviews (everything after the first 1,200).

In [9]:
test_perf = model.evaluate(ta_vectorized[1200:], label[1200:])
print(f'Accuracy in the test set is {test_perf[1]*100:.2f}%.')

Accuracy in the test set is 89.50%.


#*Create Our SHAP Explainer*

In [10]:
try:
  import shap 
except ImportError as error:
  !pip install shap 
  import shap

# Use the first 1200 reviews as the basis of calculating shap values for any given prediction instance.
background = ta_vectorized[:1200]

# 'Adapt' the explainer to those reference samples, given our trained predictive model. 
explainer = shap.DeepExplainer(model, background)

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting shap
  Downloading shap-0.41.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (572 kB)
Collecting slicer==0.0.7
  Downloading slicer-0.0.7-py3-none-any.whl (14 kB)
Installing collected packages: slicer, shap
Successfully installed shap-0.41.0 slicer-0.0.7


keras is no longer supported, please use tf.keras instead.
Your TensorFlow version is newer than 2.4.0 and so graph support has been removed in eager mode and some static graphs may not be supported. See PR #1483 for discussion.



In [11]:
# We will produce SHAP values for the following observations.
test_obs = ta_vectorized[1250:1260]

# Predicted probability that each review is deceptive (label 1).
# The third of these (index 2) is the one shown in the force plot later on.
predictions = model.predict(test_obs)
print(f'Our predictions for these test observations are as follows:\n{predictions}')

shap_values = explainer.shap_values(test_obs)
print(f'We have {len(shap_values[0])} sets of SHAP values.')
print(f'The SHAP values for the first prediction instance are:\n {shap_values[0][0]}.')
print(f'Any given prediction yields {len(shap_values[0][0])} SHAP values; one for each of our {len(unique_terms)} unique terms.')

`Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.


Our predictions for these test observations are as follows:
[[0.01542159]
 [0.00802678]
 [0.02036036]
 [0.59654427]
 [0.00810784]
 [0.33936542]
 [0.8961891 ]
 [0.5733524 ]
 [0.99994826]
 [0.9920656 ]]
We have 10 sets of SHAP values.
The SHAP values for the first prediction instance are:
 [-7.50034970e-06  2.86909935e-06 -3.21382967e-05 ...  1.14492711e-07
  2.04214950e-04  8.79059371e-08].
Any given prediction yields 10275 SHAP values; one for each of our 10275 unique terms.
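
One property helps when reading these numbers: for a single prediction, the SHAP values plus the explainer's expected value add up (approximately, in the case of DeepExplainer) to the model's output. The sketch below illustrates this on a hypothetical linear model rather than our network, because a linear model with independent features has a closed-form SHAP value, `w_i * (x_i - mean(x_i))`. The weights, background data and variable names are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "model" over four binary features.
weights = np.array([1.5, -2.0, 0.5, 0.0])
bias = 0.25


def predict(x):
    return x @ weights + bias


# Background sample used as the reference distribution.
background = rng.integers(0, 2, size=(100, 4)).astype(float)
x = np.array([1.0, 0.0, 1.0, 1.0])

# Expected value: the average prediction over the background sample.
expected_value = predict(background).mean()

# Closed-form SHAP values for a linear model with independent features.
phi = weights * (x - background.mean(axis=0))

print(phi)
print(np.isclose(expected_value + phi.sum(), predict(x)))
```

The fourth feature has zero weight, so its SHAP value is zero regardless of its value, which matches the intuition that a term the model ignores contributes nothing.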


#*Make a SHAP Force Plot*

Now, let's create the arrays of SHAP values and terms to pass into the plotting function.



In [12]:
# Build an array of terms aligned, by index, with the columns of SHAP values.
terms = np.array(list(unique_terms))

# explainer.shap_values returns one array per model output; our model has a
# single output, so take element 0 to get an (observations x terms) array.
shap_values = np.asarray(shap_values[0])

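Finally, let's create a force plot for the third test observation. Terms with positive SHAP values push the prediction toward "deceptive" (label 1), and terms with negative values push it toward "genuine".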

In [18]:
# initialize the JS visualization code
shap.initjs()

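# Plot the third test observation (index 2), labelling each SHAP value with its term.
shap.force_plot(explainer.expected_value[0], shap_values[2], feature_names=terms)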
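
A force plot over roughly ten thousand terms can be hard to read. As a complementary view, the sketch below lists the terms with the largest absolute SHAP values for one prediction. The helper `top_terms` and the toy arrays are hypothetical; in this notebook you would pass `shap_values[2]` and `terms` instead.

```python
import numpy as np


def top_terms(shap_row, terms, k=3):
    """Return the k (term, SHAP value) pairs with the largest magnitude."""
    order = np.argsort(-np.abs(shap_row))[:k]
    return [(str(terms[i]), float(shap_row[i])) for i in order]


# Toy values, for illustration only.
toy_terms = np.array(["chicago", "luxury", "husband", "location", "the"])
toy_shap = np.array([0.12, 0.30, 0.21, -0.25, 0.001])

for term, value in top_terms(toy_shap, toy_terms):
    # Label 1 is "deceptive", so positive values push toward that class.
    direction = "deceptive" if value > 0 else "genuine"
    print(f"{term:>10s}  {value:+.3f}  pushes toward {direction}")
```

Ranking by absolute value keeps strong evidence in either direction; sort by the signed value instead if you only care about evidence for one class.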