#**Using SHAP to Understand Text Tokens' Effects in a Classifier**

We will train a simple text classifier on our fake-review detection data. Then, for any individual prediction, we can use SHAP to see which terms contributed most to the resulting classification.

#*Load TripAdvisor Reviews from Git*

In [1]:
import tensorflow as tf
tf.compat.v1.disable_v2_behavior()
#tf.compat.v1.enable_eager_execution()

from tensorflow import keras
from tensorflow.keras import layers  # Use tf.keras throughout rather than mixing in standalone keras.
from google.colab import files
import pandas as pd
import io
import numpy as np

# Just load the data from the Week 3 folder again.
trip_advisor = pd.read_csv('https://raw.githubusercontent.com/gburtch/BA865-2022/main/Week%203/datasets/deceptive-opinion.csv')
trip_advisor = trip_advisor.sample(frac=1) # Shuffle the data since I'll eventually just use a simple validation split.

trip_advisor.describe(include='all')

# Let's shuffle things... 
shuffled_indices= np.arange(trip_advisor.shape[0])
np.random.shuffle(shuffled_indices)

trip_advisor_text = trip_advisor['text'].to_numpy()
label = np.where(trip_advisor['deceptive']=='deceptive',1,0)

print(trip_advisor_text)
trip_advisor_text = trip_advisor_text[shuffled_indices]
label = label[shuffled_indices]
print(trip_advisor_text)

Instructions for updating:
non-resource variables are not supported in the long term
["I went to the Homewood Suites in Chicago which is part of Hilton's famous hotels. I gotta say that this is the worst hotel that I have ever been to. In fact, Homewood Suites is the worst hotel on the face of this planet. I checked out the rooms and their rooms look like the hotel was built in bad shape. In fact, the rooms looked so bad that I wanted to leave the hotel early. I also got to check out the employees who worked for this hotel. I saw one of the employees and when I rung the bell for my room, they ignored me. How could a hotel have the worst service when they are supposed to not ignore you? One more thing that I hate about this hotel is that the logo didn't look like the actual logo. It looked more like the logo was changed to look more like something a baby could draw. I'll never go to this hotel again and it won't even be in a million years for that matter. This is a hotel that I will nev

#*Define / Train Our Fake Review Detector*

In [2]:
# Convert strings to sequences of words.
review_seq = []
for review in trip_advisor_text:
  seq = keras.preprocessing.text.text_to_word_sequence(review)
  review_seq.append(seq)

# Make our dictionary of term frequencies
word_freq = {}
for review in review_seq:
  for term in review:
    try:
        word_freq[term] = word_freq[term]+1
    except KeyError:
        word_freq[term] = 1

unique_terms = {term for review in review_seq for term in review}
print(f'We have {len(unique_terms)} unique tokens in our dataset.')

# We can then easily make a term-integer dictionary and an integer-term dictionary (for reverse lookup)
word_index = {term: number for number, term in enumerate(unique_terms)}
reverse_index = {number: term for number, term in enumerate(unique_terms)}

We have 10275 unique tokens in our dataset.


In [3]:
def vectorize_sequences(sequences, dimension=len(unique_terms)): 
    
    # Make our blank matrix of 0's to store the multi-hot encodings.
    results = np.zeros((len(sequences), dimension))

    # For each observation and element in that observation,
    # Update the blank matrix to a 1 at row obs, column element value.
    for i, sequence in enumerate(sequences):
        for term in sequence:
            j = word_index[term]
            results[i, j] = 1
    return results

ta_vectorized = vectorize_sequences(review_seq)

Note that SHAP requires numeric input features (it can't work with raw strings). So, the input to our model is the multi-hot matrix built above: one row per review and one column per unique term, with a 1 wherever that term appears in the review.
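To make the encoding concrete, here is a minimal sketch of the same multi-hot idea on a toy vocabulary. The names `toy_index` and `multi_hot` are hypothetical and are not part of the notebook; the notebook's own `vectorize_sequences` does the equivalent over the full vocabulary.

```python
import numpy as np

# Hypothetical toy vocabulary (stands in for the notebook's word_index).
toy_index = {"great": 0, "hotel": 1, "terrible": 2, "room": 3}

def multi_hot(tokens, index):
    """Return a 0/1 vector with a 1 for every known token that is present."""
    vec = np.zeros(len(index))
    for tok in tokens:
        if tok in index:  # skip out-of-vocabulary tokens
            vec[index[tok]] = 1
    return vec

encoded = multi_hot(["great", "hotel", "great"], toy_index)
print(encoded)  # a repeated token still yields a single 1
```

Because the encoding only records presence, each SHAP value computed later describes the effect of a term appearing in a review, not how often it appears.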

In [4]:
def build_model():
    model = keras.Sequential([
        layers.Dense(250, activation="linear"),
        layers.Dense(50, activation="relu",kernel_regularizer="l2"),
        layers.Dense(5, activation="relu"),
        layers.Dense(1, activation="sigmoid")
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])
    return model

model = build_model()

history = model.fit(ta_vectorized[:1200], label[:1200], validation_split=0.2, epochs=10, batch_size=25)

Train on 960 samples, validate on 240 samples
Epoch 1/10
Epoch 2/10

Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Evaluate performance on the held-out test reviews (everything after the first 1,200).

In [5]:
test_perf = model.evaluate(ta_vectorized[1200:], label[1200:])
print(f'Accuracy in the test set is {test_perf[1]*100:.2f}%.')

Accuracy in the test set is 80.00%.


#*Create Our SHAP Explainer*

In [6]:
try:
  import shap 
except ImportError as error:
  !pip install shap 
  import shap

# Use the first 1200 reviews as the basis of calculating shap values for any given prediction instance.
background = ta_vectorized[:1200]

# 'Adapt' the explainer to those reference samples, given our trained predictive model. 
explainer = shap.DeepExplainer(model, background)

Collecting shap
  Downloading shap-0.40.0-cp37-cp37m-manylinux2010_x86_64.whl (564 kB)
[?25l[K     |▋                               | 10 kB 33.7 MB/s eta 0:00:01[K     |█▏                              | 20 kB 38.6 MB/s eta 0:00:01[K     |█▊                              | 30 kB 20.6 MB/s eta 0:00:01[K     |██▎                             | 40 kB 17.3 MB/s eta 0:00:01[K     |███                             | 51 kB 12.5 MB/s eta 0:00:01[K     |███▌                            | 61 kB 14.6 MB/s eta 0:00:01[K     |████                            | 71 kB 13.7 MB/s eta 0:00:01[K     |████▋                           | 81 kB 15.1 MB/s eta 0:00:01[K     |█████▏                          | 92 kB 16.8 MB/s eta 0:00:01[K     |█████▉                          | 102 kB 15.4 MB/s eta 0:00:01[K     |██████▍                         | 112 kB 15.4 MB/s eta 0:00:01[K     |███████                         | 122 kB 15.4 MB/s eta 0:00:01[K     |███████▌                        | 133 kB 1

keras is no longer supported, please use tf.keras instead.
Your TensorFlow version is newer than 2.4.0 and so graph support has been removed in eager mode and some static graphs may not be supported. See PR #1483 for discussion.





In [7]:
# We will produce SHAP values for the following observations.
test_obs = ta_vectorized[1250:1260]

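# Predicted probability that each review is deceptive. The third review (index 2) is the one we plot below.
# Exact values vary between runs because the data are shuffled without a fixed seed.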
predictions = model.predict(test_obs)
print(f'Our predictions for these test observations are as follows:\n{predictions}')

shap_values = explainer.shap_values(test_obs)
print(f'We have {len(shap_values[0])} sets of SHAP values.')
print(f'The SHAP values for the first prediction instance are:\n {shap_values[0][0]}.')
print(f'Any given prediction yields {len(shap_values[0][0])} SHAP values; one for each of our {len(unique_terms)} unique terms.')

`Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.


Our predictions for these test observations are as follows:
[[0.00556971]
 [0.59646267]
 [0.59646267]
 [0.04567406]
 [0.14188546]
 [0.01103771]
 [0.46926147]
 [0.59646267]
 [0.59646267]
 [0.59646267]]
We have 10 sets of SHAP values.
The SHAP values for the first prediction instance are:
 [-9.72642058e-06  3.66172908e-06 -7.01089933e-07 ...  6.74340760e-06
 -5.26123991e-06 -3.27279520e-05].
Any given prediction yields 10275 SHAP values; one for each of our 10275 unique terms.
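One property worth keeping in mind when reading these numbers is that SHAP values are additive: for each prediction, the base (expected) value plus the sum of that prediction's SHAP values should approximately equal the model's output. The sketch below illustrates this on a toy linear model, where (assuming independent features) the SHAP value of feature i is w_i * (x_i - E[x_i]). All names and data here are made up for illustration, and this is not the algorithm DeepExplainer uses internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model f(x) = x @ w + b over 5 binary features.
w = np.array([0.5, -1.0, 2.0, 0.0, 0.3])
b = 0.1
background = rng.integers(0, 2, size=(200, 5)).astype(float)

def f(x):
    return x @ w + b

# Base value: the average model output over the background sample.
base_value = f(background).mean()

# SHAP values for one instance under the linear, independent-features assumption.
x = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
phi = w * (x - background.mean(axis=0))

# Additivity: base value + sum of SHAP values recovers the prediction.
reconstructed = base_value + phi.sum()
print(np.isclose(reconstructed, f(x)))
```

In the notebook, the analogous check would compare `explainer.expected_value[0] + shap_values[0][i].sum()` with `predictions[i]`; with DeepExplainer the match is approximate rather than exact.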


#*Make a SHAP Force Plot*

Now, let's create the arrays of SHAP values and terms to pass into the plotting function.



In [None]:
# Make an array of terms aligned, by index, with the columns of SHAP values.
terms = np.array(list(unique_terms))

# The explainer returns a list with one entry per model output. Our model has a single output,
# so take element 0 and convert it to one NumPy array (rows = observations, columns = terms).
shap_values = np.asarray(shap_values[0])

Finally, let's create a force plot for the third review (index 2).

In [None]:
# initialize the JS visualization code
shap.initjs()

shap.force_plot(explainer.expected_value[0], shap_values[2], terms)
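
The force plot is interactive. For a quick textual summary, one option is to rank terms by the magnitude of their SHAP values for a single review. Below is a minimal sketch on made-up arrays; the helper `top_terms` is hypothetical, and in the notebook you would pass `shap_values[2]` and `terms` instead of the toy data.

```python
import numpy as np

def top_terms(shap_row, terms, k=3):
    """Return the k (term, SHAP value) pairs with the largest absolute SHAP value."""
    order = np.argsort(-np.abs(shap_row))[:k]
    return [(str(terms[i]), float(shap_row[i])) for i in order]

# Made-up values purely for illustration (not results from the notebook).
toy_terms = np.array(["chicago", "luxury", "my", "husband", "elevator"])
toy_shap = np.array([0.02, 0.15, 0.30, -0.22, -0.01])

for term, value in top_terms(toy_shap, toy_terms):
    # The label 1 means "deceptive", so positive values push toward that class.
    direction = "toward deceptive" if value > 0 else "toward truthful"
    print(f"{term:>10s} {value:+.2f} ({direction})")
```

Sorting by absolute value keeps terms that push strongly in either direction, which mirrors what the red and blue segments of the force plot show.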