# Crafting Adversarial samples with text for LSTM

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tqdm import tqdm

from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Embedding
from tensorflow.keras.layers import LSTM, Activation, Input
from tensorflow.keras.utils import to_categorical
import tensorflow.keras.backend as K
from tensorflow.keras.preprocessing.text import Tokenizer

import tensorflow_datasets as tfds
tfds.disable_progress_bar()



## Dataset

We will be using IMDB review data set that can be classified as either a positive-negative review.

The data is available through Keras for retrieval. We can limit the total number of words in vocabulary.

In [2]:
tensor_train, tensor_test = tfds.load('imdb_reviews', split=['train','test'])
imdb_train = pd.DataFrame(list(tfds.as_numpy(tensor_train)))
imdb_test = pd.DataFrame(list(tfds.as_numpy(tensor_test)))
print("Shape of Train split: ", imdb_train.shape)
print("Shape of Test split: ", imdb_test.shape)
imdb_train.head()

[1mDownloading and preparing dataset imdb_reviews (80.23 MiB) to /home/jupyter/tensorflow_datasets/imdb_reviews/plain_text/0.1.0...[0m
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`


Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`


[1mDataset imdb_reviews downloaded and prepared to /home/jupyter/tensorflow_datasets/imdb_reviews/plain_text/0.1.0. Subsequent calls will reuse this data.[0m
Shape of Train split:  (25000, 2)
Shape of Test split:  (25000, 2)


Unnamed: 0,label,text
0,1,"b""As a lifelong fan of Dickens, I have invaria..."
1,1,"b""Oh yeah! Jenna Jameson did it again! Yeah Ba..."
2,1,"b""I saw this film on True Movies (which automa..."
3,1,b'This was a wonderfully clever and entertaini...
4,1,b'I have no idea what the other reviewer is ta...


In [3]:
imdb_train['text'] = imdb_train.text.apply(lambda x : x.decode('utf-8'))
imdb_test['text'] = imdb_test.text.apply(lambda x : x.decode('utf-8'))
imdb_train.head()

Unnamed: 0,label,text
0,1,"As a lifelong fan of Dickens, I have invariabl..."
1,1,Oh yeah! Jenna Jameson did it again! Yeah Baby...
2,1,I saw this film on True Movies (which automati...
3,1,This was a wonderfully clever and entertaining...
4,1,I have no idea what the other reviewer is talk...


In [23]:
### Decreasing the size of test set.
np.random.seed(10)
idx = np.random.choice(imdb_test.shape[0],2000)
imdb_test = imdb_test.iloc[idx]
imdb_test.shape

(2000, 2)

In [171]:
%%time

# Cut texts after this number of words (among top max_features most common words)
max_features = 40000 

# Define tokenizer 
tokenizer = Tokenizer(num_words=max_features,
                      lower=True,
                      oov_token="<unk>")

# Use the '0' index for the padding character
tokenizer.word_index['<pad>'] = 0
tokenizer.index_word[0] = '<pad>'

# Fit the 
tokenizer.fit_on_texts(imdb_train.text)

CPU times: user 5.13 s, sys: 0 ns, total: 5.13 s
Wall time: 5.12 s


In [172]:
x_train = tokenizer.texts_to_sequences(imdb_train.text)
y_train = imdb_train.label.values

x_test = tokenizer.texts_to_sequences(imdb_test.text)
y_test = imdb_test.label.values

## Data pre-processing

In [173]:
print("Train data review statistics:")
pdlen = pd.Series(np.array([len(x) for x in x_train]))
print(pdlen.describe())
print()
print("Test data review statistics:")
pdlen = pd.Series(np.array([len(x) for x in x_test]))
print(pdlen.describe())
# print("Average number of words in each review:", lens.max())

Train data review statistics:
count    25000.000000
mean       237.713640
std        176.497204
min         10.000000
25%        129.000000
50%        177.000000
75%        290.000000
max       2493.000000
dtype: float64

Test data review statistics:
count    2000.000000
mean      228.494500
std       170.110575
min        16.000000
25%       125.000000
50%       170.000000
75%       278.000000
max      1025.000000
dtype: float64


We need to one-hot encode the labels, to use probabilities/logits for different classes

In [174]:
print("One-hot encoding of labels")
y_train_oe = to_categorical(y_train, 2)
y_test_oe = to_categorical(y_test, 2)
print('train labels shape:',y_train.shape)
print('test labels shape:',y_test.shape)

One-hot encoding of labels
train labels shape: (25000,)
test labels shape: (2000,)


Keras Embedding layer expects the input to have similar length for each review.
So we either need to pad or truncate the reviews as necessary.

We are padding/truncating at the end of the review.

In [175]:
maxlen = 240

x_train = sequence.pad_sequences(x_train, padding='post', truncating='post', maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, padding='post', truncating='post', maxlen=maxlen)

print('train data shape:', x_train.shape)
print('test data shape:', x_test.shape)

train data shape: (25000, 240)
test data shape: (2000, 240)


In [176]:
tokenizer.sequences_to_texts(x_train[0:2])

["as a lifelong fan of dickens i have invariably been disappointed by adaptations of his novels br br although his works presented an extremely accurate re telling of human life at every level in victorian britain throughout them all was a pervasive thread of humour that could be both playful or sarcastic as the narrative dictated in a way he was a literary <unk> and cartoonist he could be serious and hilarious in the same sentence he <unk> pride lampooned arrogance celebrated modesty and <unk> with loneliness and poverty it may be a cliché but he was a people's writer br br and it is the comedy that is so often missing from his interpretations at the time of writing oliver twist is being dramatised in serial form on bbc television all of the misery and cruelty is their but non of the humour irony and savage <unk> the result is just a dark dismal experience the story penned by a journalist rather than a novelist it's not really dickens at all br br <unk> ' on the other hand is much clo

## Model Training & Evaluation

In [177]:
print("Setting up model-specific variables...")
K.clear_session()
batch_size = 64
embedding_size = 128
lstm_size = 128
val_split = 0.2
epochs = 12
num_classes = 2

Setting up model-specific variables...


In [178]:
seq_encode = Input(shape=(maxlen,))
embeddings = Embedding(max_features, embedding_size)(seq_encode)
lstm_out = LSTM(lstm_size, dropout=0.2, recurrent_dropout=0.2)(embeddings)
dense_out = Dense(num_classes)(lstm_out)
out = Activation('softmax')(dense_out)

In [179]:
imdb_clf = Model(inputs=seq_encode, outputs=out)
imdb_clf.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
imdb_clf.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 240)]             0         
_________________________________________________________________
embedding (Embedding)        (None, 240, 128)          5120000   
_________________________________________________________________
lstm (LSTM)                  (None, 128)               131584    
_________________________________________________________________
dense (Dense)                (None, 2)                 258       
_________________________________________________________________
activation (Activation)      (None, 2)                 0         
Total params: 5,251,842
Trainable params: 5,251,842
Non-trainable params: 0
_________________________________________________________________


In [33]:
train_history = imdb_clf.fit(x_train, y_train_oe,
                             validation_data=(x_test, y_test_oe),
#                              validation_split=val_split,
                             batch_size=batch_size,
                             epochs=epochs
                             )

Train on 25000 samples, validate on 2000 samples
Epoch 1/2

KeyboardInterrupt: 

In [35]:
print("Evaluate over Test data:")
loss, accuracy = imdb_clf.evaluate(x_test, y_test_oe)
print('Loss over Test data:', loss)
print('Accuracy over Test data:', accuracy)

Evaluate over Test data:
Loss over Test data: 0.6767035760879516
Accuracy over Test data: 0.541


## Retrieve Embeddings for all the words in the Vocabulary

In [68]:
vocab_embeddings = imdb_clf.layers[1].embeddings.numpy()
print("Shape of the generated embeddings: ",vocab_embeddings.shape)

Shape of the generated embeddings:  (10000, 128)


## Keras function to extract embeddings for samples

In [71]:
get_embeddings = K.function([imdb_clf.layers[0].input],
                                  imdb_clf.layers[1].output)

print("Testing the embedding function with a single sample...")
test_embed = get_embeddings(x_test[0])
print("Shape of generated embeddings:",test_embed.shape)

Testing the embedding function with a single sample...
Shape of generated embeddings: (150, 128)


## Defining Submodel - from Embeddings to logits

In [72]:
### Defining necessary layers
embed_input = Input(shape=test_embed.shape)
embed_lstm = LSTM(lstm_size, dropout=0.2, recurrent_dropout=0.2)(embed_input)
embed_dense = Dense(num_classes)(embed_lstm)

### Define model with Embedding inputs and Logit outputs
embed_model = Model(inputs=embed_input, outputs=embed_dense)

### Transferring the trained weights from our IMDB Classifier model (imdb_clf)
embed_model.layers[1].set_weights(imdb_clf.layers[2].get_weights())
embed_model.layers[2].set_weights(imdb_clf.layers[3].get_weights())
# embed_model.summary()

## Saving the model

In [73]:
imdb_clf.save("saved_models/imdb_compiled_clf_150dim.h5")

## Adversarial crafting

### Calculate Jacobian matrix for all the words in the input

In [74]:
def compute_input_jacobian(x, y, model):
    x_embed = get_embeddings(x)
    x_tensor = tf.convert_to_tensor(x_embed.reshape(-1,maxlen,embedding_size), tf.float32)
    x_var = tf.Variable(x_tensor, dtype=tf.float32)

    with tf.GradientTape(watch_accessed_variables=False) as tape:
        tape.watch(x_var)
        # Get logits
        pred_y = model(x_var)

    # Calculate gradients
    x_gradients = tape.batch_jacobian(pred_y, x_var).numpy()
    print("Shape of the Jacobian:", x_gradients.shape)

    return x_gradients

In [164]:
def craft_sample(x, y, x_gradient, max_changes=maxlen):

    x_copy = x.copy()

    for word in range(max_changes+1):
        
        pred = np.argmax(imdb_clf.predict_on_batch(x_copy.reshape(-1,maxlen)))
        if pred != y : 
            return x_copy, word

        word_grad = x_gradient[y, word]


        jac_sign = np.sign(word_grad)
        vocab_sign = np.sign(word_grad - vocab_embeddings)

        match_word = np.argmin(np.absolute(np.add.reduce(vocab_sign - jac_sign, axis=1)))
        x_copy[word] = match_word

    return  x, 0


In [165]:
np.random.seed(1)

num_samples_class = 10

crafted_x = []
num_changes = []

idx0 = np.random.choice(np.argwhere(y_train == 0).reshape(-1,), num_samples_class, replace=False)
idx1 = np.random.choice(np.argwhere(y_train == 1).reshape(-1,), num_samples_class, replace=False)
idx = np.concatenate((idx0,idx1))
np.random.shuffle(idx)

xs, ys, ys_oe = x_train[idx].copy(), y_train[idx].copy(), y_train_oe[idx].copy()

In [166]:
%%time

print("Calculating gradients...")
x_gradients = compute_input_jacobian(xs,ys,embed_model)

print("Loss and accuracy of selected samples:", imdb_clf.evaluate(xs, ys_oe, verbose=0))

Calculating gradients...
Shape of the Jacobian: (20, 2, 150, 128)
Loss and accuracy of selected samples: [0.6916537880897522, 0.7]
CPU times: user 1min 25s, sys: 0 ns, total: 1min 25s
Wall time: 1min 23s


In [167]:
print("Crafting adversarial samples...")

Crafting adversarial samples...


In [168]:
for x, y, grad in tqdm(zip(xs, ys, x_gradients), total=xs.shape[0]):
    new_x , changes = craft_sample(x, y, grad)
    crafted_x.append(new_x)
    num_changes.append(changes)

crafted_x = np.array(crafted_x)
num_changes = np.array(num_changes)

print("Average number of changes per sample:", num_changes.mean())

imdb_clf.evaluate(crafted_x, ys_oe)

100%|██████████| 20/20 [05:12<00:00, 15.65s/it]

Average number of changes per sample: 87.9





[0.7017133831977844, 0.1]

In [169]:
tokenizer.sequences_to_texts(xs[0:3])

["this is an astounding film as well as showing actual footage of key events in the failed coup to <unk> chavez we are given the background picture which describes a class divided society many of the rich it appears have a choice with the people's <unk> choice and are willing to use the military for regime change <unk> careful what you say in front of your <unk> is a revealing comment the head of the country's biggest oil company <unk> himself as the new president with us backing and these young irish film makers have it all on camera a great film to <unk> young people about democracy we see transparent <unk> of how media can be manipulated and force used in the interests of big business against the interests of the <unk> wishes of the people riveting stuff <unk> <unk> <unk> <unk> <unk> <unk> <unk> <unk> <unk> <unk> <unk>",
 'the ultimate homage to a great film actress the film is a masterpiece of poetry on the screen like great poetry it is timeless direction cast screenplay music lyr

In [170]:
tokenizer.sequences_to_texts(crafted_x[0:3])

['<unk> <unk> who who <unk> <unk> think who <unk> who think who who <unk> <unk> who <unk> <unk> she she but <unk> many she but but she <unk> <unk> <unk> <unk> who <unk> <unk> <unk> <unk> you think you who who <unk> <unk> and <unk> think <unk> and it it in a on this this in in a the on a the the br on on a the in this this it you it and and and you <unk> you <unk> who <unk> <unk> she <unk> who <unk> <unk> <unk> <unk> you you who she not many but <unk> who who <unk> <unk> think <unk> you <unk> who she <unk> <unk> think think <unk> and this this the a on br to to a a the a in it out <unk> who who you you you and <unk> <unk> <unk> this film film all in this this this it <unk>',
 'who she who who think who who who think think who think <unk> <unk> who who think <unk> she she but but not his many but she she she who <unk> <unk> and you you think <unk> <unk> <unk> <unk> <unk> and and it and you <unk> <unk> you you this in and a this in a a the the the the on br on a the on in in this it it it