# 1) Create a notebook like this one, but for NLP, and use it to find which words in a movie review are most significant in assessing the sentiment of a particular movie review.

In [43]:
from fastai.text.all import *
import pandas as pd

In [2]:
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')

In [3]:
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)

In [None]:
learn.fine_tune(4, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.458757,0.397854,0.8226,03:29


epoch,train_loss,valid_loss,accuracy,time
0,0.302143,0.24814,0.90368,07:09
1,0.232062,0.207345,0.91688,07:08
2,0.180036,0.186839,0.92852,07:07
3,0.144407,0.192711,0.93004,07:08


Read two review files. Two are needed because this LSTM works only with batch_size = 2.

In [87]:
file1 = '/root/.fastai/data/imdb/test/pos/0_10.txt'
file2 = '/root/.fastai/data/imdb/test/neg/9998_1.txt'

with open(file1, 'r') as file:
    text1 = file.read()

with open(file2, 'r') as file:
    text2 = file.read()

print(text1)
print(text2)

I went and saw this movie last night after being coaxed to by a few friends of mine. I'll admit that I was reluctant to see it because from what I knew of Ashton Kutcher he was only able to do comedy. I was wrong. Kutcher played the character of Jake Fischer very well, and Kevin Costner played Ben Randall with such professionalism. The sign of a good movie is that it can toy with our emotions. This one did exactly that. The entire theater (which was sold out) was overcome by laughter during the first half of the movie, and were moved to tears during the second half. While exiting the theater I not only saw many women in tears, but many full grown men as well, trying desperately not to let anyone see them crying. This movie was great, and I suggest that you go see it before you judge.
I occasionally let my kids watch this garbage so they will understand just how pathetic the show's "contestants" are. They are pathetic not because they are fat, but because they whore their dignity for a 

Tokenize, numericalize, pad and transform texts into tensors

In [88]:
tokens1 = dls.tokenizer(text1)
tokens2 = dls.tokenizer(text2)

indices1 = list(dls.numericalize(tokens1))
indices2 = list(dls.numericalize(tokens2))

# Right-pad the shorter sequence with index 1 (fastai's xxpad token) so both have the same length
if len(indices1) > len(indices2):
    diff = len(indices1) - len(indices2)
    indices2 += [1] * diff
else:
    diff = len(indices2) - len(indices1)
    indices1 += [1] * diff

x1 = torch.tensor(indices1).unsqueeze(0)
x2 = torch.tensor(indices2).unsqueeze(0)

x = torch.cat((x1, x2))

In [89]:
x.shape

torch.Size([2, 185])
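The manual padding above can also be written with PyTorch's `pad_sequence`. This is only a minimal sketch on made-up token ids: `pad_batch` and `PAD_IDX` are hypothetical names, and the padding index of 1 is taken from the `[1] * diff` used in the cell above.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_IDX = 1  # assumption: padding index, matching the `[1] * diff` in the cell above

def pad_batch(sequences, pad_idx=PAD_IDX):
    """Right-pad variable-length lists of token ids into one (batch, max_len) tensor."""
    tensors = [torch.as_tensor(seq, dtype=torch.long) for seq in sequences]
    return pad_sequence(tensors, batch_first=True, padding_value=pad_idx)

# Made-up token ids, only to show the shapes involved
batch = pad_batch([[2, 8, 15, 4, 9], [2, 7, 30]])
print(batch.shape)  # torch.Size([2, 5])
print(batch)
```

The shorter sequence is filled on the right, so the result has the same layout as `x` above and scales to more than two reviews without the `if`/`else`.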

Our model is very confident that the first review is positive and the second is negative.

In [90]:
F.softmax(learn.model.eval()(x.cuda())[0], dim=1)

tensor([[0.0023, 0.9977],
        [0.9916, 0.0084]], device='cuda:0', grad_fn=<SoftmaxBackward0>)

Collect information about activations and gradients with hooks

In [91]:
# I am using fastai's Hook because plain PyTorch hooks do not cooperate with fastai's custom sentence encoder
ghook = Hook(learn.model[0], lambda m, gi, go: go[0].detach().clone())
hook = Hook(learn.model[0], lambda m, i, o: o[0].detach().clone())

output = learn.model.eval()(x.cuda())
act1 = hook.stored[0]
act2 = hook.stored[1]
output[0][0,1].backward(retain_graph=True)

learn.model.zero_grad()

output[0][1,0].backward()
grad1 = ghook.stored[0]
grad2 = ghook.stored[1]

In [92]:
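# Per-channel weights: average grad1 over the token dimension
w1 = grad1.mean(dim=0, keepdim=True)
# Per-token relevance: weight the activations and sum over the hidden dimension
relevance1 = (w1 * act1).sum(1)

# Same computation for the second review
w2 = grad2.mean(dim=0, keepdim=True)
relevance2 = (w2 * act2).sum(1)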
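The same Grad-CAM-style computation can be checked in isolation on a tiny model. The sketch below is not the AWD_LSTM from this notebook: the embedding, LSTM, mean pooling and linear head are hypothetical stand-ins, and it runs on the CPU. It reads the gradient straight from the activation tensor with `retain_grad()` instead of using module hooks. When adapting it to fastai's `Hook`, two details seem worth checking: `Hook` registers a forward hook unless `is_forward=False` is passed, and `stored` is overwritten on every pass, so each gradient has to be read before the next `backward()` call.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy stand-in for encoder + classifier head (not the AWD_LSTM above)
vocab_size, emb_dim, hidden_dim, n_classes = 50, 8, 16, 2
embedding = nn.Embedding(vocab_size, emb_dim)
lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, n_classes)

def token_relevance(token_ids, target_class):
    """Grad-CAM-style relevance: one score per token of a single sequence."""
    acts, _ = lstm(embedding(token_ids.unsqueeze(0)))  # (1, seq_len, hidden_dim)
    acts.retain_grad()                                 # keep the gradient of this non-leaf tensor
    logits = head(acts.mean(dim=1))                    # mean-pool over tokens, then classify
    logits[0, target_class].backward()                 # gradient of the chosen class score
    grads = acts.grad[0]                               # (seq_len, hidden_dim)
    weights = grads.mean(dim=0, keepdim=True)          # (1, hidden_dim): average over tokens
    return (weights * acts[0].detach()).sum(dim=1)     # (seq_len,)

tokens = torch.randint(0, vocab_size, (12,))           # made-up token ids
relevance = token_relevance(tokens, target_class=1)
print(relevance.shape)  # torch.Size([12])
```

Each call runs its own forward and backward pass, so the gradient for one review cannot be overwritten by the backward pass for another.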

In [93]:
first = relevance1.cpu().numpy()
decoded_1 = [dls.vocab[0][idx] for idx in indices1]
second = relevance2.cpu().numpy()
decoded_2 = [dls.vocab[0][idx] for idx in indices2]

In [94]:
data = {
    'Token relevance in seq 1': first,
    'First review': decoded_1,
    'Token relevance in seq 2': second,
    'Second review': decoded_2
}

df = pd.DataFrame(data)
pd.set_option('display.max_rows', None)

top10 = df.nlargest(10, 'Token relevance in seq 1')[['Token relevance in seq 1', 'First review']]

top10

index,Token relevance in seq 1,First review
173,6.306051,and
67,6.231395,xxmaj
174,6.22699,i
80,5.799209,xxmaj
175,5.482003,suggest
81,5.415201,the
85,5.193146,good
172,5.173337,","
182,5.151177,you
183,5.041526,judge


In [95]:
top10 = df.nlargest(10, 'Token relevance in seq 2')[['Token relevance in seq 2', 'Second review']]

top10

index,Token relevance in seq 2,Second review
28,4.616793,not
8,4.136954,garbage
46,4.00453,and
64,3.690307,","
9,3.657064,so
65,3.650867,and
72,3.641071,their
44,3.614698,of
131,3.58168,'s
132,3.558535,weight


Most of the words with the biggest influence seem irrelevant. That is because an LSTM carries a hidden state, so the influence of a single token is carried over to the tokens that follow. That makes finding significant tokens quite challenging. I think it makes more sense to read through the entire sequence and see how the relevance of tokens fluctuates.

In [96]:
df

index,Token relevance in seq 1,First review,Token relevance in seq 2,Second review
0,0.015148,xxbos,0.261646,xxbos
1,0.594359,i,0.608518,i
2,1.30614,went,1.229835,occasionally
3,2.802287,and,2.058353,let
4,2.46223,saw,2.636405,my
5,2.589355,this,2.534678,kids
6,2.188641,movie,2.164536,watch
7,2.952161,last,2.21573,this
8,3.125645,night,4.136954,garbage
9,3.340211,after,3.657064,so


For the first review, the growth in sentiment shows up on the graph as a clear upward trend, which is quite fascinating. The second review, by contrast, stays equally negative from beginning to end, so its graph oscillates within a constant interval. Another thing worth pointing out is that it took 6 padding tokens for the LSTM to realise that the sequence had finished, which again shows the influence of the hidden state I mentioned before.
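Since relevance accumulates through the hidden state, one possible way to surface individual tokens is to look at the change in relevance from one token to the next rather than at its absolute value. This is only a sketch of that idea, not something the analysis above relied on. It uses the first ten rows of the second review from the table above, and the column names `token`, `relevance` and `delta` are mine.

```python
import pandas as pd

# First ten rows of the second review, copied from the table above
seq2 = pd.DataFrame({
    'token': ['xxbos', 'i', 'occasionally', 'let', 'my',
              'kids', 'watch', 'this', 'garbage', 'so'],
    'relevance': [0.261646, 0.608518, 1.229835, 2.058353, 2.636405,
                  2.534678, 2.164536, 2.215730, 4.136954, 3.657064],
})

# Step-to-step change in relevance; the first token has no predecessor, so use 0
seq2['delta'] = seq2['relevance'].diff().fillna(0)

# Tokens that raised the relevance the most compared with the previous token
top_jumps = seq2.nlargest(3, 'delta')
print(top_jumps)
```

On these ten rows the largest jump lands on `garbage`, which matches the intuition about the review better than the raw top-10 list does.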