# Debugging 20Newsgroups
- **Task**: Classify "Christianity" vs "Atheism" documents from the 20 Newsgroups dataset.
- **Problem**: The 20 Newsgroups dataset contains many artifacts: tokens (e.g., person names, punctuation marks) that are irrelevant to the task but strongly co-occur with one of the classes. For evaluation, we therefore use the Religion dataset by [Ribeiro et al. (2016)](https://arxiv.org/pdf/1602.04938.pdf), which contains "Christianity" and "Atheism" web pages, as a target dataset.
- **Solution**: We use our framework to identify the features detecting irrelevant words (that do not capture the meaning of Christianity/Atheism and cannot generalize to the Religion dataset) and disable such features.
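To make "disabling a feature" concrete, the sketch below zeroes out the contribution of selected features before the final linear layer, so a disabled feature cannot influence any class score. It is a minimal pure-Python illustration under stated assumptions, not the `find` implementation: the function name `masked_logits` and the toy weights are hypothetical.

```python
def masked_logits(features, weights, bias, is_enabled):
    """Compute class logits while ignoring disabled features.

    features:   list of feature values for one document
    weights:    one row per feature, one column per class
    bias:       one value per class
    is_enabled: one boolean per feature (False = disabled)
    """
    logits = list(bias)
    for value, row, enabled in zip(features, weights, is_enabled):
        if not enabled:
            continue  # a disabled feature contributes nothing
        for c, w in enumerate(row):
            logits[c] += value * w
    return logits

# Toy example: 3 features, 2 classes (made-up numbers)
features = [1.0, 2.0, 3.0]
weights = [[0.5, -0.5],
           [1.0, 0.0],
           [0.0, 2.0]]
bias = [0.0, 0.0]

logits_all = masked_logits(features, weights, bias, [True, True, True])
logits_masked = masked_logits(features, weights, bias, [True, True, False])
print(logits_all, logits_masked)
```

In this toy case, disabling the third feature flips the predicted class, which is the kind of effect we hope for when the disabled feature only detects artifacts.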

In [1]:
# Notebook setup
%matplotlib inline

import pickle
import os
import datetime
import random
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
sess = InteractiveSession(config=config)

plt.rcParams['figure.figsize'] = [14, 7]
os.environ['PYTHONHASHSEED'] = '0'

# Set random seed to create reproducible results
the_seed = 1234
np.random.seed(the_seed)
random.seed(the_seed)
from keras import backend as K
tf.set_random_seed(the_seed)
K.set_session(sess)

Using TensorFlow backend.


In [2]:
import find

## Settings

- GloVe word embeddings: Replace the string assigned to `EMBEDDING_PATH` (second line of the next cell) with the path to your GloVe embeddings file, which can be downloaded [here](http://nlp.stanford.edu/data/glove.6B.zip).

In [3]:
EMBEDDING_DIM = 300
EMBEDDING_PATH = f"GLoVe/glove.6B.{EMBEDDING_DIM}d.txt" # Path to your glove embeddings

- Dataset

In [8]:
DATA_PATH = 'EMNLP2020-Data2Share/Data2Share/' 
MAIN_DATASET = 'Wikitoxic'
SECOND_DATASET = None
THIRD_DATASET = None
GENDER_BIAS = False

- Model

In [5]:
MODEL_PATH = 'trained_models/' # Path to save your trained models
MODEL_ARCH = 'CNN'
MAXLEN = 150
FILTERS = [(10, 2), (10, 3), (10, 4)] # Ten filters of each window size [2,3,4]
BATCH_SIZE = 128
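As a sanity check on these settings, the sketch below derives the output length and parameter count of each convolution from `FILTERS`, `MAXLEN`, and `EMBEDDING_DIM`. It assumes "valid" padding with stride 1, which is consistent with the shapes printed in the model summary further down; the helper name `conv1d_summary` is hypothetical.

```python
EMBEDDING_DIM = 300
MAXLEN = 150
FILTERS = [(10, 2), (10, 3), (10, 4)]  # (number of filters, window size)

def conv1d_summary(filters, maxlen, embedding_dim):
    """Return (window, output length, parameter count) for each Conv1D layer."""
    rows = []
    for n_filters, window in filters:
        out_len = maxlen - window + 1  # assumed: 'valid' padding, stride 1
        n_params = window * embedding_dim * n_filters + n_filters  # kernel + bias
        rows.append((window, out_len, n_params))
    return rows

summary = conv1d_summary(FILTERS, MAXLEN, EMBEDDING_DIM)
total_features = sum(n_filters for n_filters, _ in FILTERS)

for window, out_len, n_params in summary:
    print(f"window={window}: output length {out_len}, {n_params} parameters")
print(f"total features: {total_features}")
```

The 30 features counted here are the ones inspected and enabled or disabled in the debugging step.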

## Model creation and training

In [6]:
# 0. Load GloVe embeddings
embedding_matrix, vocab_size, index2word, word2index = find.get_embedding_matrix(EMBEDDING_PATH, EMBEDDING_DIM, pad_initialisation = "zeros")

Loading Glove Model


400000it [00:27, 14605.34it/s]


Done. 400000  words loaded!
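The internals of `find.get_embedding_matrix` are not shown in this notebook. As a rough illustration of what such a loader does, the sketch below parses GloVe-style text lines (`word v1 v2 ...`) into an embedding matrix and lookup tables. The two special tokens and their zero vectors are assumptions; they are suggested by the Embedding layer in the model summary, whose 120,000,600 parameters correspond to 400,002 rows of 300 dimensions, two more than the 400,000 loaded words.

```python
def load_glove_lines(lines, embedding_dim):
    """Parse GloVe text lines ('word v1 v2 ...') into lookup tables."""
    index2word = ["<PAD>", "<UNK>"]       # hypothetical special tokens
    matrix = [[0.0] * embedding_dim,      # zero vector for padding
              [0.0] * embedding_dim]      # placeholder vector for unknown words
    for line in lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != embedding_dim:
            continue  # skip malformed lines
        index2word.append(word)
        matrix.append([float(v) for v in values])
    word2index = {word: idx for idx, word in enumerate(index2word)}
    return matrix, len(index2word), index2word, word2index

# Two made-up 3-dimensional vectors stand in for the real file
sample = ["the 0.1 0.2 0.3", "cat -0.5 0.0 0.25"]
matrix, vocab_size, index2word, word2index = load_glove_lines(sample, 3)
print(vocab_size, word2index)
```

With the real file, you would pass an open file handle as `lines` so the 400,000 entries are streamed rather than read into memory at once.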


In [9]:
# 1. Load datasets and prepare inputs
# 1.1 Main dataset
with open(DATA_PATH + f'all_data_{MAIN_DATASET}.pickle', 'rb') as f:
    data_1 = pickle.load(f)
class_names = data_1['class_names']
X_train_1 = find.get_data_matrix(data_1['text_train'], word2index, MAXLEN)
X_validate_1 = find.get_data_matrix(data_1['text_validate'], word2index, MAXLEN)
X_test_1 = find.get_data_matrix(data_1['text_test'], word2index, MAXLEN)
y_test_1 = data_1['y_test']
gender_test_1 = data_1['gender_test'] if GENDER_BIAS else None

# 1.2 Second dataset
if SECOND_DATASET is not None:
    with open(DATA_PATH + f'all_data_{SECOND_DATASET}.pickle', 'rb') as f:
        data_2 = pickle.load(f)
    X_test_2, y_test_2 = find.get_data_matrix(data_2['text_test'], word2index, MAXLEN), data_2['y_test']
    gender_test_2 = data_2['gender_test'] if GENDER_BIAS else None
else:
    X_test_2, y_test_2, gender_test_2 = None, None, None

# 1.3 Third dataset
if THIRD_DATASET is not None:
    with open(DATA_PATH + f'all_data_{THIRD_DATASET}.pickle', 'rb') as f:
        data_3 = pickle.load(f)
    X_test_3, y_test_3 = find.get_data_matrix(data_3['text_test'], word2index, MAXLEN), data_3['y_test']
    gender_test_3 = data_3['gender_test'] if GENDER_BIAS else None
else:
    X_test_3, y_test_3, gender_test_3 = None, None, None

100%|██████████| 56958/56958 [00:24<00:00, 2352.46it/s]
100%|██████████| 19111/19111 [00:06<00:00, 3031.53it/s]
100%|██████████| 18965/18965 [00:06<00:00, 3019.15it/s]
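Each progress bar above corresponds to one call of `find.get_data_matrix`, which turns tokenized documents into fixed-length rows of word indices. Its implementation is not shown here, so the sketch below is only an approximation: it assumes index 0 for padding, index 1 for unknown words, truncation of long documents, and padding at the end. The real function may differ on any of these points.

```python
def to_index_row(tokens, word2index, maxlen, pad_idx=0, unk_idx=1):
    """Map tokens to indices, then truncate or pad the row to `maxlen`."""
    ids = [word2index.get(token, unk_idx) for token in tokens[:maxlen]]
    return ids + [pad_idx] * (maxlen - len(ids))

# Toy vocabulary; 'sat' is deliberately missing
toy_word2index = {"the": 2, "cat": 3}
short_row = to_index_row(["the", "cat", "sat"], toy_word2index, maxlen=5)
long_row = to_index_row(["the"] * 8, toy_word2index, maxlen=5)
print(short_row, long_row)
```

Fixed-length rows are what allow the Embedding layer to output a `(None, 150, 300)` tensor when `MAXLEN = 150`.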


In [10]:
# 2. Create the result directory
if not os.path.exists(MODEL_PATH):
    os.makedirs(MODEL_PATH)
result_folder = MAIN_DATASET + '_' + MODEL_ARCH + '_' + datetime.datetime.now().strftime("%Y%m%d%H%M%S") + '/'
result_path = MODEL_PATH + result_folder
os.mkdir(result_path)

In [11]:
# 3. Create a model
if MODEL_ARCH == 'CNN':
    model = find.get_CNN_model(vocab_size, EMBEDDING_DIM, embedding_matrix, MAXLEN, class_names, FILTERS)
else:
    assert False, f"Unsupported model architecture: {MODEL_ARCH}"

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, None)         0                                            
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 150, 300)     120000600   input_1[0][0]                    
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, 149, 10)      6010        embedding_1[0][0]                
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, 148, 10)      9010        embedding_1[0][0]                
__________________________________________________________________________________________________
c

In [12]:
# 4. Train the model
history = find.model_train(model, result_path + f'trained_{MODEL_ARCH}.h5', X_train_1, data_1['y_train'], X_validate_1, data_1['y_validate'], BATCH_SIZE, epochs = 300)

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where



Train on 56958 samples, validate on 19111 samples
Epoch 1/300
 - 4s - loss: 0.0963 - acc: 0.9671 - val_loss: 0.0269 - val_acc: 0.9916

Epoch 00001: val_loss improved from inf to 0.02690, saving model to trained_models/Wikitoxic_CNN_20220522194909/trained_CNN.h5
Epoch 2/300
 - 3s - loss: 0.0217 - acc: 0.9932 - val_loss: 0.0206 - val_acc: 0.9935

Epoch 00002: val_loss improved from 0.02690 to 0.02057, saving model to trained_models/Wikitoxic_CNN_20220522194909/trained_CNN.h5
Epoch 3/300
 - 3s - loss: 0.0151 - acc: 0.9955 - val_loss: 0.0174 - val_acc: 0.9944

Epoch 00003: val_loss improved from 0.02057 to 0.01738, saving model to trained_models/Wikitoxic_CNN_20220522194909/trained_CNN.h5
Epoch 4/300
 - 3s - loss: 0.0110 - acc: 0.9968 - val_loss: 0.0160 - val_acc: 0.9953

Epoch 00004: val_loss improved from 0.01738 to 0.01595, saving model to trained_models/Wikitoxic_CNN_20220522194909/trained_

In [13]:
# 5. Evaluate the model
if not GENDER_BIAS:
    find.evaluate_all(model, class_names, BATCH_SIZE, X_test_1, y_test_1, X_test_2, y_test_2, X_test_3, y_test_3, result_path = result_path, model_name = 'original')
else:
    find.evaluate_all_gender(model, class_names, BATCH_SIZE, X_test_1, y_test_1, gender_test_1, X_test_2, y_test_2, gender_test_2, result_path = result_path, model_name = 'original')

Evaluate with the original test set:
{'per_class': {0: {'all_positive': 18074,
                   'all_true': 18064,
                   'class_f1': 0.9971774863025071,
                   'class_name': 'Not abusive',
                   'class_precision': 0.9969016266460109,
                   'class_recall': 0.9974534986713907,
                   'true_positive': 18018},
               1: {'all_positive': 891,
                   'all_true': 901,
                   'class_f1': 0.9430803571428572,
                   'class_name': 'Abusive',
                   'class_precision': 0.9483726150392817,
                   'class_recall': 0.9378468368479467,
                   'true_positive': 845}},
 'total': {'accuracy': 0.9946216715001318,
           'macro_f1': 0.9701372355334461,
           'macro_precision': 0.9726371208426463,
           'macro_recall': 0.9676501677596687,
           'micro_f1': 0.9946216715001318,
           'micro_precision': 0.9946216715001318,
           'micro_recall
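The per-class scores in this report can be recomputed from the raw counts. The sketch below does so for the "Abusive" class, reading `all_positive` as the number of documents predicted as that class and `all_true` as the number that actually belong to it; that reading is an assumption, but it reproduces the reported values.

```python
def class_metrics(true_positive, all_positive, all_true):
    """Precision, recall and F1 for one class from its raw counts."""
    precision = true_positive / all_positive  # correct among predicted
    recall = true_positive / all_true         # correct among actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts copied from the report above
precision, recall, f1 = class_metrics(true_positive=845, all_positive=891, all_true=901)

# Accuracy: correct predictions of both classes over all test documents
accuracy = (18018 + 845) / (18064 + 901)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} accuracy={accuracy:.4f}")
```

Because the "Abusive" class makes up only 901 of 18,965 test documents, accuracy is dominated by the majority class, so the per-class and macro scores are more informative here.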

## Model understanding and debugging

In [14]:
# 6. Generate wordclouds
settings = {
    'model_arch': MODEL_ARCH,
    'filters': FILTERS,
    'maxlen': MAXLEN,
    'result_path': result_path,
    'index2word': index2word,
    'embedding_dim': EMBEDDING_DIM,
    'batch_size': BATCH_SIZE
}
all_wordclouds = find.generate_wordclouds(model, X_train_1, settings, max_examples = 2000)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
embedded_text_input (InputLayer (None, 150, 300)     0                                            
__________________________________________________________________________________________________
conv1d_4 (Conv1D)               (None, 149, 10)      6010        embedded_text_input[0][0]        
__________________________________________________________________________________________________
conv1d_5 (Conv1D)               (None, 148, 10)      9010        embedded_text_input[0][0]        
__________________________________________________________________________________________________
conv1d_6 (Conv1D)               (None, 147, 10)      12010       embedded_text_input[0][0]        
__________________________________________________________________________________________________
global_max

100%|██████████| 16/16 [00:00<00:00, 16.97it/s]
100%|██████████| 30/30 [00:08<00:00,  3.42it/s]
100%|██████████| 30/30 [00:06<00:00,  4.89it/s]
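Each word cloud summarizes the n-grams that a feature responds to. `find.generate_wordclouds` is not shown here and may use a different relevance method; as a simplified stand-in, the sketch below finds the n-gram that maximizes a single convolution filter's activation, which is the n-gram that global max pooling would select. The function name, the ReLU activation, and the toy embeddings are all assumptions.

```python
def top_ngram(tokens, embeddings, kernel, bias=0.0):
    """Return the n-gram with the highest activation for one Conv1D filter.

    embeddings: dict mapping token -> vector
    kernel:     list of `window` weight vectors, one per position in the window
    """
    window = len(kernel)
    best_score, best_ngram = float("-inf"), None
    for start in range(len(tokens) - window + 1):
        score = bias
        for offset in range(window):
            vector = embeddings[tokens[start + offset]]
            score += sum(w * x for w, x in zip(kernel[offset], vector))
        score = max(score, 0.0)  # ReLU activation (assumed)
        if score > best_score:
            best_score, best_ngram = score, tokens[start:start + window]
    return best_ngram, best_score

# Toy 2-dimensional embeddings and a bigram filter (made-up numbers)
toy_embeddings = {"you": [1.0, 0.0], "are": [0.0, 1.0], "great": [1.0, 1.0]}
toy_kernel = [[0.0, 1.0], [1.0, 1.0]]

ngram, score = top_ngram(["you", "are", "great"], toy_embeddings, toy_kernel)
print(ngram, score)
```

Collecting such n-grams over many training examples and weighting them by activation gives the raw material for a word cloud per feature.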


- Get input from a human

In [15]:
is_feature_enabled = [True] * find.num_all_filters(FILTERS)  # All features start enabled

In [16]:
# UI components from ipywidgets
import ipywidgets as wgt

def update_screen(feature_idx):
    show_action_panel(feature_idx)
    wordcloud = all_wordclouds[feature_idx]
    f, ax = plt.subplots(figsize=(14, 7))
    ax.imshow(wordcloud, interpolation='bilinear')
    ax.axis("off")
    
    W = model.layers[-1].get_weights()[0] # For the final layer
    weight_plot = find.visualize_weights(W, feature_idx, class_names, show = False)
    plt.show()

def update_action(action):
    global feature_radio_button, is_feature_enabled
    feature_idx = feature_radio_button.value
    if action == 'enabled':
        print('enable')
        is_feature_enabled[feature_idx] = True
    elif action == 'disabled':
        print('disable')
        is_feature_enabled[feature_idx] = False
    else:
        raise ValueError(f"Unknown action: {action}")
    
def show_action_panel(feature_idx):
    global action_radio_button
    action_radio_button.description = f'Current status of feature {feature_idx}:'
    action_radio_button.value = 'enabled' if is_feature_enabled[feature_idx] else 'disabled'
    
feature_radio_button = wgt.RadioButtons(options=list(range(len(is_feature_enabled))), value=0, description='Feature:', disabled=False)
action_radio_button = wgt.RadioButtons(options=['enabled', 'disabled'],
    value = 'enabled' if is_feature_enabled[feature_radio_button.value] else 'disabled',
    description = f'Current status of feature {feature_radio_button.value}:',
    style = {'description_width': 'initial'},
    disabled = False
)

wgt.interactive_output(update_action, {'action':action_radio_button})
out = wgt.interactive_output(update_screen, {'feature_idx':feature_radio_button})

In [17]:
# 7. Get input from a human 
# Please investigate word clouds of these features and disable some irrelevant features using the radio-buttons under the bar plot. 
# Once you are happy, please then proceed to the next cell.
display(wgt.HBox([feature_radio_button, wgt.VBox([out, action_radio_button])]))

HBox(children=(RadioButtons(description='Feature:', options=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…

In [18]:
print(f"Total: {len(is_feature_enabled)} features \nEnabled: {sum(is_feature_enabled)} features \nDisabled: {len(is_feature_enabled)-sum(is_feature_enabled)} features")
print(f"Disabled features: {[i for i,s in enumerate(is_feature_enabled) if not s]}")

Total: 30 features 
Enabled: 25 features 
Disabled: 5 features
Disabled features: [3, 5, 23, 25, 28]


## Creating and fine-tuning an improved classifier

In [19]:
# 8. Create an improved model
# 8.1 Copy the existing CNN features
model_improved = find.get_CNN_model(vocab_size, EMBEDDING_DIM, embedding_matrix, MAXLEN, class_names, 
                                    FILTERS, trainable_filters = False)
model_improved.set_weights(model.get_weights()) 

# 8.2 Apply human decisions to disable irrelevant features
for idx, enable in enumerate(is_feature_enabled):
    if not enable:
        model_improved.layers[-1].disable_mask(idx)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, None)         0                                            
__________________________________________________________________________________________________
embedding_2 (Embedding)         (None, 150, 300)     120000600   input_2[0][0]                    
__________________________________________________________________________________________________
conv1d_7 (Conv1D)               (None, 149, 10)      6010        embedding_2[0][0]                
__________________________________________________________________________________________________
conv1d_8 (Conv1D)               (None, 148, 10)      9010        embedding_2[0][0]                
__________________________________________________________________________________________________
conv1d_9 (

In [20]:
# 9. Fine-tune the improved model
history = find.model_train(model_improved, result_path + f'trained_{MODEL_ARCH}_improved.h5', X_train_1, data_1['y_train'], X_validate_1, data_1['y_validate'], BATCH_SIZE, epochs = 300)

Train on 56958 samples, validate on 19111 samples
Epoch 1/300
 - 2s - loss: 0.0035 - acc: 0.9994 - val_loss: 0.0158 - val_acc: 0.9955

Epoch 00001: val_loss improved from inf to 0.01578, saving model to trained_models/Wikitoxic_CNN_20220522194909/trained_CNN_improved.h5
Epoch 2/300
 - 2s - loss: 0.0026 - acc: 0.9996 - val_loss: 0.0175 - val_acc: 0.9952

Epoch 00002: val_loss did not improve from 0.01578
Epoch 3/300
 - 2s - loss: 0.0023 - acc: 0.9996 - val_loss: 0.0170 - val_acc: 0.9956

Epoch 00003: val_loss did not improve from 0.01578
Epoch 4/300
 - 2s - loss: 0.0021 - acc: 0.9996 - val_loss: 0.0180 - val_acc: 0.9952

Epoch 00004: val_loss did not improve from 0.01578
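Although `epochs = 300` was requested, training halted after epoch 4, following three epochs without improvement in `val_loss`. This is consistent with early stopping at a patience of three epochs, though the callback configuration inside `find.model_train` is not shown, so that value is an inference. The sketch below reproduces the stopping rule in plain Python; the function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (epoch at which training stops, epoch of the best val_loss)."""
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop: no improvement for `patience` epochs
    return len(val_losses), best_epoch

# First four values are rounded from the log above; the fifth is made up
# to show that later epochs are never reached once training has stopped.
stopped_at, best_epoch = early_stopping_epoch([0.0158, 0.0175, 0.0170, 0.0180, 0.0150])
print(stopped_at, best_epoch)
```

The checkpoint saved at the best epoch (epoch 1 in the log) is the one worth evaluating, since later epochs only lowered the training loss.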


In [21]:
# 10. Evaluate the improved model
if not GENDER_BIAS:
    find.evaluate_all(model_improved, class_names, BATCH_SIZE, X_test_1, y_test_1, X_test_2, y_test_2, X_test_3, y_test_3, result_path = result_path, model_name = 'debugged')
else:
    find.evaluate_all_gender(model_improved, class_names, BATCH_SIZE, X_test_1, y_test_1, gender_test_1, X_test_2, y_test_2, gender_test_2, result_path = result_path, model_name = 'debugged')

Evaluate with the original test set:
{'per_class': {0: {'all_positive': 18083,
                   'all_true': 18064,
                   'class_f1': 0.9972058538744571,
                   'class_name': 'Not abusive',
                   'class_precision': 0.9966819664878616,
                   'class_recall': 0.9977302922940655,
                   'true_positive': 18023},
               1: {'all_positive': 882,
                   'all_true': 901,
                   'class_f1': 0.9433538979248458,
                   'class_name': 'Abusive',
                   'class_precision': 0.953514739229025,
                   'class_recall': 0.9334073251942286,
                   'true_positive': 841}},
 'total': {'accuracy': 0.9946744002109148,
           'macro_f1': 0.970310183638499,
           'macro_precision': 0.9750983528584433,
           'macro_recall': 0.9655688087441471,
           'micro_f1': 0.9946744002109148,
           'micro_precision': 0.9946744002109148,
           'micro_recall':
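To compare the two reports, the sketch below computes the change in the "Abusive" class scores between the original and the debugged model, using values copied from the outputs above. Precision rises slightly while recall falls slightly, so the F1 score is almost unchanged on the original test set.

```python
# "Abusive" class scores copied from the two evaluation reports
original = {
    "precision": 0.9483726150392817,
    "recall": 0.9378468368479467,
    "f1": 0.9430803571428572,
}
debugged = {
    "precision": 0.953514739229025,
    "recall": 0.9334073251942286,
    "f1": 0.9433538979248458,
}

# Positive delta = the debugged model scores higher
deltas = {name: debugged[name] - original[name] for name in original}
for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f}")
```

Differences of this size come from a handful of test documents (845 vs. 841 true positives), so they should be read as roughly equal performance rather than a clear gain or loss.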