# Interpretability on Hateful Twitter Datasets

In this demo, we apply saliency maps (with support for sparse tensors) to the task of detecting Twitter users who use a hateful lexicon, using graph machine learning with StellarGraph.

We consider the use case of identifying hateful users on Twitter, motivated by the work in [1] and using the dataset published there. Classification is based on the users' retweet graph and on node attributes derived from each user's account activity and the content of their tweets.

We pose identifying hateful users as a binary classification problem. We demonstrate the advantage of connected vs unconnected data in a semi-supervised setting with few training examples.

For connected data, we use Graph Convolutional Networks [2] as implemented in the `stellargraph` library. We pose the problem of identifying hateful Twitter users as node attribute inference in graphs.

We then use the interpretability tool (i.e., saliency maps) implemented in our library to demonstrate how to obtain the importance of the node features and links to gain insights into the model.

**References**

1. "Like Sheep Among Wolves": Characterizing Hateful Users on Twitter. M. H. Ribeiro, P. H. Calais, Y. A. Santos, V. A. F. Almeida, and W. Meira Jr.  arXiv preprint arXiv:1801.00317 (2017).


2. Semi-Supervised Classification with Graph Convolutional Networks. T. Kipf, M. Welling. ICLR 2017. arXiv:1609.02907 


In [1]:
import networkx as nx
import pandas as pd
import numpy as np
import seaborn as sns
import itertools
import os

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.linear_model import LogisticRegressionCV

import stellargraph as sg
from stellargraph.mapper import GraphSAGENodeGenerator, FullBatchNodeGenerator
from stellargraph.layer import GraphSAGE, GCN, GAT
from stellargraph import globalvar

# Keras' `metrics` module is deliberately not imported here: throughout this
# notebook the name `metrics` refers to sklearn.metrics.
from tensorflow.keras import layers, optimizers, losses, Model, models
from sklearn import preprocessing, feature_extraction
from sklearn.model_selection import train_test_split
from sklearn import metrics

import matplotlib.pyplot as plt
from scipy.sparse import csr_matrix, lil_matrix
%matplotlib inline

In [2]:
import matplotlib.pyplot as plt
%matplotlib inline

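def remove_prefix(text, prefix):
    # Strip `prefix` from the start of `text` when present
    return text[len(prefix):] if text.startswith(prefix) else text

def plot_history(history):
    # Metric names without the "val_" prefix, e.g. "loss" and "acc"
    metric_names = sorted(set([remove_prefix(m, "val_") for m in list(history.history.keys())]))
    for m in metric_names:
        # summarize history for metric m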
        plt.plot(history.history[m])
        plt.plot(history.history['val_' + m])
        plt.title(m, fontsize=18)
        plt.ylabel(m, fontsize=18)
        plt.xlabel('epoch', fontsize=18)
        plt.legend(['train', 'validation'], loc='best')
        plt.show()


### Train GCN model on the dataset

In [3]:
data_dir = os.path.expanduser("~/data/hateful-twitter-users")

### First load and prepare the node features

Each node in the graph is associated with a large number of features (also referred to as attributes). 

The list of features is given [here](https://www.kaggle.com/manoelribeiro/hateful-users-on-twitter). We repeat it here for convenience.

hate :("hateful"|"normal"|"other")
  if user was annotated as hateful, normal, or not annotated.
  
  (is_50|is_50_2) :bool
  whether user was deleted up to 12/12/17 or 14/01/18. 
  
  (is_63|is_63_2) :bool
  whether user was suspended up to 12/12/17 or 14/01/18. 
        
  (hate|normal)_neigh :bool
  is the user on the neighborhood of a (hateful|normal) user? 
  
  [c_] (statuses|follower|followees|favorites)_count :int
  number of (tweets|follower|followees|favorites) a user has.
  
  [c_] listed_count:int
  number of lists a user is in.

  [c_] (betweenness|eigenvector|in_degree|outdegree) :float
  centrality measurements for each user in the retweet graph.
  
  [c_] *_empath :float
  occurrences of empath categories in the users latest 200 tweets.

  [c_] *_glove :float          
  glove vector calculated for users latest 200 tweets.
  
  [c_] (sentiment|subjectivity) :float
  average sentiment and subjectivity of users tweets.
  
  [c_] (time_diff|time_diff_median) :float
  average and median time difference between tweets.
  
  [c_] (tweet|retweet|quote) number :float
  percentage of direct tweets, retweets and quotes of an user.
  
  [c_] (number urls|number hashtags|baddies|mentions) :float
  number of bad words|mentions|urls|hashtags per tweet in average.
  
  [c_] status length :float
  average status length.
  
  hashtags :string
  all hashtags employed by the user separated by spaces.
  
**Notice** that the `c_`-prefixed attributes are calculated over the 1-neighborhood of a user in the retweet network (averaged out).

First, we are going to load the user features and prepare them for machine learning.

In [4]:
users_feat = pd.read_csv(os.path.join(data_dir, 
                                      'users_neighborhood_anon.csv'))

### Data cleaning and preprocessing

The dataset as given includes a large number of graph-related features that were manually extracted. 

Since we are going to employ modern graph neural network methods for classification, we are going to drop these manually engineered features. 

The power of Graph Neural Networks stems from their ability to learn useful graph-related features, eliminating the need for manual feature engineering.

In [5]:
def data_cleaning(feat):
    feat = feat.drop(columns=["hate_neigh", "normal_neigh"])
    
    # Convert target values in hate column from strings to integers (0,1,2)
    feat['hate'] = np.where(feat['hate']=='hateful', 1, np.where(feat['hate']=='normal', 0, 2))
    
    # Inspect the number of missing values per column (the result is not used further)
    number_of_missing = feat.isnull().sum()
    number_of_missing[number_of_missing!=0]
    
    # Replace NA with 0
    feat.fillna(0, inplace=True)

    # Drop the suspension and deletion flags (is_*), which should not be used in the predictive model,
    # together with the columns matching _glove, c_ and sentiment
    feat.drop(feat.columns[feat.columns.str.contains("is_|_glove|c_|sentiment")], axis=1, inplace=True)

    # drop hashtag feature
    feat.drop(['hashtags'], axis=1, inplace=True)

    # Drop centrality based measures
    feat.drop(columns=['betweenness', 'eigenvector', 'in_degree', 'out_degree'], inplace=True)
    
    feat.drop(columns=['created_at'], inplace=True)
    
    return feat

In [6]:
node_data = data_cleaning(users_feat)

The continuous features in our dataset have distributions with very long tails. We apply a Yeo-Johnson power transform followed by standardization to correct for this.

In [7]:
# Ignore the first two columns because those are user_id and hate (the target variable)
df_values = node_data.iloc[:, 2:].values

In [8]:
pt = preprocessing.PowerTransformer(method='yeo-johnson', 
                                    standardize=True) 

In [9]:
df_values_log = pt.fit_transform(df_values)

In [10]:
node_data.iloc[:, 2:] = df_values_log
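
To make the transformation above more concrete, here is a minimal NumPy sketch of the Yeo-Johnson formula for a fixed exponent. `PowerTransformer` estimates that exponent separately for each feature and then standardizes the result; the helper name `yeo_johnson`, the fixed `lmbda`, and the toy values are illustrative assumptions and not part of the notebook's pipeline.

```python
import numpy as np

def yeo_johnson(x, lmbda):
    """Apply the Yeo-Johnson power transform with a fixed exponent lmbda."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative inputs: Box-Cox-like transform of (x + 1)
    if abs(lmbda) < 1e-12:
        out[pos] = np.log1p(x[pos])
    else:
        out[pos] = ((x[pos] + 1.0) ** lmbda - 1.0) / lmbda
    # Negative inputs: mirrored transform with exponent (2 - lmbda)
    if abs(lmbda - 2.0) < 1e-12:
        out[~pos] = -np.log1p(-x[~pos])
    else:
        out[~pos] = -(((1.0 - x[~pos]) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)
    return out

# A long-tailed toy feature (made-up values, e.g. follower counts)
counts = np.array([0.0, 3.0, 10.0, 250.0, 120000.0])
compressed = yeo_johnson(counts, lmbda=0.0)  # lmbda = 0 reduces to log(1 + x)
print(compressed)
```

With `lmbda = 1` the transform leaves the data unchanged, while smaller exponents compress the long right tail.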

In [11]:
# The row index is the same as user_id: convert it to strings so that it matches
# the node IDs read from the edge list, then drop the user_id column
node_data.index = node_data.index.map(str)
node_data.drop(columns=['user_id'], inplace=True)

### Next load the graph

Now that we have the node features prepared for machine learning, let us load the retweet graph.

In [12]:
g_nx = nx.read_edgelist(path=os.path.expanduser(os.path.join(data_dir,
                                                             "users.edges")))

In [13]:
g_nx.number_of_nodes(), g_nx.number_of_edges()

(100386, 2194979)

The graph has just over 100k nodes and approximately 2.2m edges.

We aim to train a graph neural network model that will predict the "hate" attribute on the nodes.

For computational convenience, we have mapped the target labels **normal**, **hateful**, and **other** to the numeric values **0**, **1**, and **2** respectively.

In [14]:
print(set(node_data["hate"]))

{0, 1, 2}


In [15]:
node_data = node_data.loc[list(g_nx.nodes())]
node_data.head()

|  | hate | statuses_count | followers_count | followees_count | favorites_count | listed_count | negotiate_empath | vehicle_empath | science_empath | timidity_empath | ... | number hashtags | tweet number | retweet number | quote number | status length | number urls | baddies | mentions | time_diff | time_diff_median |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10999 | 2 | 0.651057 | -0.22844 | 0.539018 | 1.468664 | 0.319936 | 0.060148 | -1.57304 | 0.468232 | -0.446347 | ... | -0.347727 | -0.087181 | 0.355153 | 1.19307 | 0.010627 | 0.31438 | 0.581937 | 0.017239 | -0.772738 | -0.713314 |
| 55317 | 2 | 0.52713 | 0.159289 | 0.603327 | 0.116831 | 0.400391 | -0.1706 | 0.731748 | -0.155481 | 0.487008 | ... | -0.159648 | 0.8634 | -0.628442 | 1.058797 | -0.400813 | -0.034034 | -0.02322 | 0.088925 | 0.209697 | 0.501357 |
| 44622 | 2 | -0.972049 | 0.513316 | 0.003403 | 0.041867 | 0.682879 | 0.398669 | -0.434141 | -0.439622 | 0.134869 | ... | 1.059839 | -0.068104 | 0.338591 | -0.254387 | 1.066497 | 1.200203 | 0.243681 | 0.661312 | 1.318291 | 1.403518 |
| 71821 | 2 | 1.003596 | 1.295017 | 0.21955 | 0.198376 | 1.810431 | -0.601582 | -1.187685 | 0.012743 | 0.684971 | ... | -1.705789 | 0.335796 | -0.035509 | -1.125292 | -0.736826 | -0.555163 | -0.4296 | 0.542465 | -0.675596 | -0.164192 |
| 57907 | 2 | 1.158887 | 1.763834 | 2.30295 | -0.60307 | 1.965467 | 1.635436 | -1.57304 | -1.285986 | -1.540435 | ... | 0.994608 | 1.001552 | -0.818391 | 0.511212 | 0.24945 | -0.184754 | 0.682368 | 1.253365 | -0.766926 | -0.781316 |


### Splitting the data

For machine learning we want to take a subset of the nodes for training, and use the rest for validation and testing. We'll use scikit-learn again to split our data into training and test sets.

The total number of annotated nodes is very small when compared to the total number of nodes in the graph. We are only going to use 15% of the annotated nodes for training and the remaining 85% of nodes for testing.

First, we are going to select the subset of nodes that are annotated as hateful or normal. These will be the nodes that have 'hate' values that are either 0 or 1.

In [16]:
# choose the nodes annotated with normal or hateful classes
annotated_users = node_data[node_data['hate']!=2]

In [17]:
annotated_user_features = annotated_users.drop(columns=['hate'])
annotated_user_targets = annotated_users[['hate']]

There are 4971 annotated nodes out of approximately 100k nodes in total.

In [18]:
print(annotated_user_targets.hate.value_counts())

0    4427
1     544
Name: hate, dtype: int64


In [19]:
# split the data
train_data, test_data, train_targets, test_targets = train_test_split(annotated_user_features,
                                         annotated_user_targets,
                                         test_size=0.85,
                                         random_state=101)
train_targets = train_targets.values
test_targets = test_targets.values
print("Sizes and class distributions for train/test data")
print("Shape train_data {}".format(train_data.shape))
print("Shape test_data {}".format(test_data.shape))
print("Train data number of 0s {} and 1s {}".format(np.sum(train_targets==0), 
                                                    np.sum(train_targets==1)))
print("Test data number of 0s {} and 1s {}".format(np.sum(test_targets==0), 
                                                   np.sum(test_targets==1)))

Sizes and class distributions for train/test data
Shape train_data (745, 204)
Shape test_data (4226, 204)
Train data number of 0s 667 and 1s 78
Test data number of 0s 3760 and 1s 466


In [20]:
train_targets.shape, test_targets.shape

((745, 1), (4226, 1))

In [21]:
train_data.shape, test_data.shape

((745, 204), (4226, 204))

We are going to use 745 nodes for training and 4226 nodes for testing.

In [22]:
# choosing features to assign to a graph, excluding target variable
node_features = node_data.drop(columns=['hate'])

### Dealing with imbalanced data

Because the training data exhibit high imbalance, we introduce class weights.

In [23]:
from sklearn.utils.class_weight import compute_class_weight
class_weights = compute_class_weight(class_weight='balanced', 
                                     classes=np.unique(train_targets), 
                                     y=train_targets[:,0])
train_class_weights = dict(zip(np.unique(train_targets), 
                               class_weights))
train_class_weights

{0: 0.5584707646176912, 1: 4.7756410256410255}
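
The weights above follow the "balanced" heuristic, in which each class receives `n_samples / (n_classes * class_count)`. The sketch below recomputes them with plain NumPy from the training class counts reported earlier (667 normal, 78 hateful); the helper name `balanced_class_weights` is hypothetical.

```python
import numpy as np

def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)}."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Rebuild a label vector with the training class counts reported above
labels = np.array([0] * 667 + [1] * 78)
weights = balanced_class_weights(labels)
print(weights)
```

The rarer a class is, the larger its weight, so errors on hateful users would count roughly 8.5 times as much as errors on normal users.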

Our data is now ready for machine learning.

Node features are stored in the Pandas DataFrame `node_features`.

The graph in networkx format is stored in the variable `g_nx`.

### Specify global parameters

Here we specify parameters that control model training. In this demo the only such parameter is the number of training epochs; the base model (GCN) and its model-specific parameters are set further below.

In [24]:
epochs = 20  

## Creating the base graph machine learning model in Keras

Now create a `StellarGraph` object from the `NetworkX` graph and the node features. `StellarGraph` objects are what this library uses to perform machine learning tasks.

In [25]:
G = sg.StellarGraph(g_nx, node_features=node_features)

To feed data from the graph to the Keras model we need a generator. The generators are specialized to the model and the learning task. 

For training, we pass the generator only the training nodes returned by our splitter, together with their target values.

In [26]:
generator = FullBatchNodeGenerator(G, method="gcn", sparse=True)
train_gen = generator.flow(train_data.index, train_targets)

Using GCN (local pooling) filters...
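
The message printed by the generator refers to the graph convolution filter of [2], in which the adjacency matrix is given self-loops and normalized symmetrically by node degree. As a rough illustration, the dense NumPy sketch below builds that normalized matrix for a three-node toy graph and applies one graph convolution with an ELU activation. The function names, the toy graph, and the random weights are assumptions made for this example; the library itself works with sparse matrices and trained weights.

```python
import numpy as np

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, features, weights):
    """One graph convolution: aggregate neighbours, project, apply ELU."""
    z = a_norm @ features @ weights
    return np.where(z > 0, z, np.expm1(np.minimum(z, 0.0)))

# Toy path graph 0 - 1 - 2 with 4 features per node
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
a_norm = normalized_adjacency(adj)

rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))
weights = rng.normal(size=(4, 2))
hidden = gcn_layer(a_norm, features, weights)
print(hidden.shape)
```

Stacking two such layers lets each node see information from nodes up to two hops away.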


In [27]:
base_model = GCN(
    layer_sizes=[32, 16],
    generator = generator,
    bias=True,
    dropout=0.5,
    activations=["elu", "elu"]
)
x_inp, x_out = base_model.node_model()
prediction = layers.Dense(units=1, activation="sigmoid")(x_out)

### Create a Keras model

Now let's create the actual Keras model with the graph inputs `x_inp` provided by the `base_model` and outputs being the predictions from the sigmoid layer.

In [28]:
model = Model(inputs=x_inp, outputs=prediction)

We compile our Keras model to use the `Adam` optimiser and the binary cross entropy loss.

In [29]:
model.compile(
    optimizer=optimizers.Adam(learning_rate=0.005),
    loss=losses.binary_crossentropy,
    metrics=["acc"],
)
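
For reference, binary cross-entropy averages `-[y*log(p) + (1-y)*log(1-p)]` over the labelled nodes. The NumPy sketch below evaluates it on made-up predictions; the helper name and the clipping constant `eps` are assumptions for this illustration, not the Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip to avoid log(0) for predictions that are exactly 0 or 1
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(losses))

uninformed = binary_crossentropy([1, 0], [0.5, 0.5])  # equals log(2)
confident = binary_crossentropy([1, 0], [0.9, 0.1])   # equals -log(0.9)
print(uninformed, confident)
```

A model that outputs 0.5 for every node scores log(2), about 0.693, which is a useful reference point when reading the training log.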

In [30]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(1, 100386, 204)]   0                                            
__________________________________________________________________________________________________
input_3 (InputLayer)            [(1, None, 2)]       0                                            
__________________________________________________________________________________________________
input_4 (InputLayer)            [(1, None)]          0                                            
__________________________________________________________________________________________________
dropout (Dropout)               (1, 100386, 204)     0           input_1[0][0]                    
______________________________________________________________________________________________

Train the model, keeping track of its loss and accuracy on the training set and on the test set, which is passed as validation data. The test set does not affect the weight updates; it is used only to measure the model's generalization performance.

In [31]:
test_gen = generator.flow(test_data.index, test_targets)
history = model.fit_generator(
    train_gen,
    epochs=epochs,
    validation_data=test_gen,
    verbose=2,
    shuffle=False,
    class_weight=None,  # the class weights computed above are not applied in this run
)

Epoch 1/20
1/1 - 5s - loss: 0.7502 - acc: 0.4027 - val_loss: 0.6020 - val_acc: 0.7556
Epoch 2/20
1/1 - 3s - loss: 0.6126 - acc: 0.7503 - val_loss: 0.5452 - val_acc: 0.7743
Epoch 3/20
1/1 - 3s - loss: 0.5385 - acc: 0.7799 - val_loss: 0.4923 - val_acc: 0.8053
Epoch 4/20
1/1 - 3s - loss: 0.4901 - acc: 0.7919 - val_loss: 0.4358 - val_acc: 0.8318
Epoch 5/20
1/1 - 3s - loss: 0.4202 - acc: 0.8295 - val_loss: 0.3860 - val_acc: 0.8538
Epoch 6/20
1/1 - 3s - loss: 0.3731 - acc: 0.8443 - val_loss: 0.3468 - val_acc: 0.8788
Epoch 7/20
1/1 - 2s - loss: 0.3280 - acc: 0.8711 - val_loss: 0.3167 - val_acc: 0.8949
Epoch 8/20
1/1 - 3s - loss: 0.2956 - acc: 0.9020 - val_loss: 0.2933 - val_acc: 0.9025
Epoch 9/20
1/1 - 4s - loss: 0.2655 - acc: 0.9128 - val_loss: 0.2754 - val_acc: 0.9103
Epoch 10/20
1/1 - 3s - loss: 0.2427 - acc: 0.9195 - val_loss: 0.2628 - val_acc: 0.9124
Epoch 11/20
1/1 - 2s - loss: 0.2255 - acc: 0.9235 - val_loss: 0.2552 - val_acc: 0.9155


### Model Evaluation

Now that we have trained the model, let's evaluate it on the test set.

We consider two evaluation metrics calculated on the test set: accuracy and the area under the ROC curve (AU-ROC).

#### Accuracy

In [32]:
test_metrics = model.evaluate_generator(test_gen)
print("\nTest Set Metrics:")
for name, val in zip(model.metrics_names, test_metrics):
    print("\t{}: {:0.4f}".format(name, val))


Test Set Metrics:
	loss: 0.2639
	acc: 0.9153
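
Because the test set is imbalanced, the accuracy above is best read against a majority-class baseline. The short calculation below uses the test class counts printed by the split (3760 normal, 466 hateful) to give the accuracy of a hypothetical classifier that labels every user as normal.

```python
# Class counts of the test set, taken from the split summary above
n_normal, n_hateful = 3760, 466

# Accuracy of a classifier that labels every user as "normal"
baseline_accuracy = n_normal / (n_normal + n_hateful)
print("majority-class baseline accuracy: {:.4f}".format(baseline_accuracy))
```

The baseline is close to 89%, so the reported accuracy of about 91.5% is only a modest improvement, which is one reason to also look at a ranking metric such as AU-ROC.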


In [33]:
all_nodes = node_data.index
all_gen = generator.flow(all_nodes)
all_predictions = model.predict_generator(all_gen).squeeze()[..., np.newaxis]

In [34]:
all_predictions.shape

(100386, 1)

In [35]:
all_predictions_df = pd.DataFrame(all_predictions, 
                                  index=node_data.index)

Let's extract the predictions for the test data only.

In [36]:
test_preds = all_predictions_df.loc[test_data.index, :]

In [37]:
test_preds.shape

(4226, 1)

The predictions are the probabilities of the positive class, which in this case is the probability of a user being hateful.

In [38]:
test_predictions = test_preds.values
test_predictions_class = ((test_predictions>0.5)*1).flatten()
test_df = pd.DataFrame({"Predicted_score": test_predictions.flatten(), 
                        "Predicted_class": test_predictions_class, 
                        "True": test_targets[:,0]})
roc_auc = metrics.roc_auc_score(test_df['True'].values, 
                                test_df['Predicted_score'].values)
print("The AUC on test set:\n")
print(roc_auc)

The AUC on test set:

0.875001426810337
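
The AUC can be read as the probability that a randomly chosen hateful user receives a higher score than a randomly chosen normal user. The sketch below computes it that way on a four-sample toy example; the helper name `pairwise_auc` and the toy data are assumptions, and the pairwise comparison is meant for illustration rather than as a replacement for `metrics.roc_auc_score`.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

toy_labels = np.array([0, 0, 1, 1])
toy_scores = np.array([0.1, 0.4, 0.35, 0.8])
toy_auc = pairwise_auc(toy_labels, toy_scores)
print(toy_auc)
```

Unlike accuracy, this quantity does not depend on the 0.5 decision threshold or on the class proportions.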


## Interpretability by Saliency Maps

To understand which features and edges the model is looking at while making the predictions, we use the interpretability tool in the StellarGraph library (i.e., saliency maps) to demonstrate the importance of node features and edges given a target user.

In [39]:
from stellargraph.utils.saliency_maps import IntegratedGradients
int_saliency = IntegratedGradients(model, all_gen)
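
As background, integrated gradients attribute a prediction to its inputs by averaging the model's gradient along a straight path from a baseline input to the actual input, then scaling by the difference between the two. The sketch below applies the idea to a toy logistic model with an analytic gradient. It is a generic illustration under assumed names (`model_fn`, `model_grad`, an all-zero baseline) and is not the StellarGraph implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model_fn(x, w):
    """Toy differentiable model: logistic score of a feature vector."""
    return sigmoid(x @ w)

def model_grad(x, w):
    """Analytic gradient of model_fn with respect to x."""
    p = model_fn(x, w)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, steps=200):
    # Midpoint Riemann sum over the straight path from baseline to x
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([model_grad(baseline + a * (x - baseline), w) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

w = np.array([1.5, -2.0, 0.5])
x = np.array([1.0, 0.5, 1.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(x, baseline, w)
print(attributions, attributions.sum())
```

A useful sanity check is that the attributions sum (approximately) to the difference between the model output at the input and at the baseline.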

In [40]:
# We first select the test nodes that are confidently classified as hateful.
# Row positions in all_predictions follow node_data.index, so the test node IDs
# must be converted to row positions before the two sets are intersected.
predicted_hateful_index = set(np.where(all_predictions > 0.9)[0].tolist())
test_indices_set = set(node_data.index.get_indexer(test_data.index).tolist())
hateful_in_test = sorted(predicted_hateful_index.intersection(test_indices_set))

# Let's pick one node from the predicted hateful users as an example.
idx = 2
target_idx = hateful_in_test[idx]
target_nid = list(G.nodes())[target_idx]
print('target_idx = {}, target_nid = {}'.format(target_idx, target_nid))
print('prediction score for node {} is {}'.format(target_idx, all_predictions[target_idx]))
print('ground truth score for node {} is {}'.format(target_idx, test_targets[test_data.index.tolist().index(str(target_nid))]))
[X, all_targets, A_index, A], y_true_all = all_gen[0]

For the prediction of the target node, we then calculate the importance of each feature of every node in the graph. Support for sparse tensors in the saliency maps keeps this computation efficient on a graph of this size.

In [None]:
#We set the target_idx which is our target node. 
node_feature_importance = int_saliency.get_integrated_node_masks(target_idx, 0)

As `node_feature_importance` is a matrix where `node_feature_importance[i][j]` indicates the importance of the j-th feature of node i to the prediction of the target node, we sum up the feature importance of each node to measure its node importance. 

In [None]:
node_importance = np.sum(node_feature_importance, axis=-1)
node_importance_rank = np.argsort(node_importance)[::-1]
print(node_importance[node_importance_rank])
print('node_importance has {} non-zero values'.format(np.where(node_importance != 0)[0].shape[0]))

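We expect the number of non-zero values of `node_importance` to match the number of nodes in the 2-hop ego graph of the target node, since a two-layer GCN only aggregates information from nodes at most two hops away.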

In [None]:
G_ego = nx.ego_graph(g_nx, target_nid, radius=2)
# The ego graph includes the target node itself
print('The ego graph of the target node has {} nodes'.format(len(G_ego.nodes())))
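
The ego graph used in this check is the set of nodes reachable within a fixed number of hops. A minimal breadth-first search over an adjacency dictionary makes that definition explicit; the helper `k_hop_nodes` and the toy path graph are hypothetical and stand in for `nx.ego_graph` only for illustration.

```python
from collections import deque

def k_hop_nodes(adjacency, source, radius):
    """Return all nodes within `radius` hops of `source` (source included)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the requested radius
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return set(distance)

# Toy path graph 0 - 1 - 2 - 3 - 4 stored as an adjacency dict
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
receptive_field = k_hop_nodes(path, source=0, radius=2)
print(sorted(receptive_field))
```

Nodes outside this set cannot influence the target's prediction in a two-layer model, so their importance should be exactly zero.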

We then analyze the feature importance of the 250 most important nodes. See the output for the five most important nodes. For each row, the features are sorted according to their importance.

In [None]:
feature_names = annotated_users.keys()[1:].values
feature_importance_rank = np.argsort(node_feature_importance[target_idx])[::-1]
df = pd.DataFrame([([k] + list(feature_names[np.argsort(node_feature_importance[k])[::-1]])) for k in node_importance_rank[:250]], columns = range(205)) 
df.head()

As a sanity check, we expect the target node itself to have a relatively high importance.

In [None]:
self_feature_importance_rank = np.argsort(node_feature_importance[target_idx])
print(np.sum(node_feature_importance[target_idx]))
print('The node itself is the {}-th important node'.format(1 + node_importance_rank.tolist().index(target_idx)))
df = pd.DataFrame([feature_names[self_feature_importance_rank][::-1]], columns = range(204)) 
df

For different nodes, the same features may have different ranks. To understand the overall importance of the features, we now analyze the average feature importance rank for the above selected nodes. Specifically, we obtain the average rank of each specific feature among the top-250 important nodes.

In [None]:
from collections import defaultdict
average_feature_rank = defaultdict(int)
for i in node_importance_rank[:250]:
    # Features of node i ordered from most to least important
    feature_rank = list(feature_names[np.argsort(node_feature_importance[i])[::-1]])
    # Accumulate each feature's rank (0 = most important) for this node
    for rank, feat in enumerate(feature_rank):
        average_feature_rank[feat] += rank
for k in average_feature_rank.keys():
    average_feature_rank[k] /= 250.0
sorted_avg_feature_rank = sorted(average_feature_rank.items(), key=lambda a:a[1])
for feat, avg_rank in sorted_avg_feature_rank:
    print(feat, avg_rank)
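
The same average rank can also be computed in a vectorized way with a double `argsort`, which may be convenient when more nodes are included. The sketch below runs on a small made-up importance matrix with one row per node and one column per feature; the function name `average_feature_rank_matrix` is an assumption for this example.

```python
import numpy as np

def average_feature_rank_matrix(importance):
    """Mean rank of each feature (0 = most important) across the rows."""
    order = np.argsort(-importance, axis=1)  # feature indices, most important first
    ranks = np.argsort(order, axis=1)        # rank of each feature within its row
    return ranks.mean(axis=0)

# Three nodes, three features (made-up importance values)
toy_importance = np.array([[3.0, 1.0, 2.0],
                           [5.0, 0.0, 4.0],
                           [1.0, 2.0, 3.0]])
avg_rank = average_feature_rank_matrix(toy_importance)
print(avg_rank)
```

The first `argsort` orders the features of each node by importance, and the second converts that ordering back into a per-feature rank.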

It seems that, for our target node, topics relevant to cleaning, hipster, etc. are important, while those such as leisure, ship, government, etc. are not.

We then calculate the link importance for the edges that are connected to the target node within k hops (k = 2 for our GCN model).

In [None]:
link_importance = int_saliency.get_integrated_link_masks(target_idx, 0, steps=2)

In [None]:
(x, y) = link_importance.nonzero()
[X,all_targets,A_index, A], y_true_all = all_gen[0]
print(A_index.shape, A.shape)
G_edge_indices = [(A_index[0, k, 0], A_index[0, k, 1]) for k in range(A_index.shape[1])]
link_dict = {(A_index[0, k, 0], A_index[0, k, 1]):k for k in range(A_index.shape[1])}

As a sanity check, we expect the most important edge to connect important nodes.

In [None]:
nonzero_importance_val = link_importance[(x,y)].flatten().tolist()[0]
link_importance_rank = np.argsort(nonzero_importance_val)[::-1]
edge_number_in_ego_graph = link_importance_rank.shape[0]
print('There are {} edges within the ego graph of the target node'.format(edge_number_in_ego_graph))
x_rank, y_rank = x[link_importance_rank], y[link_importance_rank]
print('The most important edge connects the {}-th and the {}-th most important nodes'.format(1 + node_importance_rank.tolist().index(x_rank[0]), 1 + node_importance_rank.tolist().index(y_rank[0])))

To ensure that we are getting the correct importance for edges, we then check what happens if we perturb the most important edges (the top 1% in the ego graph). Specifically, if we remove the top important edges according to the calculated edge importance scores, we should expect to see the prediction of the target node change. 

In [None]:
from copy import deepcopy
print(A_index.shape)
selected_nodes = np.array([[target_idx]], dtype='int32')
prediction_clean = model.predict([X, selected_nodes, A_index, A]).squeeze()
A_perturb = deepcopy(A)
print('A_perturb.shape = {}'.format(A_perturb.shape))
# Remove the top 1% most important edges in the ego graph and see how the prediction changes
topk = int(edge_number_in_ego_graph * 0.01)

for i in range(topk):
    edge_x, edge_y = x_rank[i], y_rank[i]
    edge_index = link_dict[(edge_x, edge_y)]
    A_perturb[0, edge_index] = 0


As expected, the prediction score drops after the perturbation. The target node is predicted as non-hateful now.

In [None]:
prediction = model.predict([X, selected_nodes, A_index, A_perturb]).squeeze()
print('The prediction score changes from {} to {} after the perturbation'.format(prediction_clean, prediction))
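
The logic of this perturbation test can be reproduced on a toy one-hop model in which the target's score is a sigmoid of the weighted sum of its neighbours' signals. Everything in the sketch below (the model, the edge weights, and the signals) is made up for illustration; it only shows why removing the edge with the largest contribution should lower the score.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def target_score(edge_weights, neighbour_signal):
    """Toy one-hop model: score of the target from weighted neighbour signals."""
    return float(sigmoid(np.dot(edge_weights, neighbour_signal)))

# Edge weights from the target to four neighbours and their (made-up) signals
edge_weights = np.ones(4)
neighbour_signal = np.array([2.0, 0.5, -0.3, 1.0])

# For this linear aggregation, an edge's contribution is weight * signal
edge_importance = edge_weights * neighbour_signal
top_edge = int(np.argmax(edge_importance))

perturbed = edge_weights.copy()
perturbed[top_edge] = 0.0  # remove the most important edge

score_clean = target_score(edge_weights, neighbour_signal)
score_perturbed = target_score(perturbed, neighbour_signal)
print(score_clean, score_perturbed)
```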

NOTES: For the UX team: the notebook above shows how we compute the importance of nodes and edges. However, the ego graph of a target node in the Twitter dataset is often very large, so a visualization may need to draw only the most important nodes and edges.