<center><h2>ALTeGraD 2024<br>Lab Session 1: HAN</h2><h3>Hierarchical Attention Network Using GRU</h3> 8 / 10 / 2024<br> Dr. Guokan Shang, Yang Zhang<br><br>


<b>Student name:</b> Guillaume Pradel


</center>
In this lab, you will get familiar with recurrent neural networks (RNNs), self-attention, and the HAN architecture <b>(Yang et al. 2016)</b> using PyTorch. In this architecture, sentence embeddings are first individually produced, and a document embedding is then computed from the sentence embeddings.<br>
<b>The deadline for this lab is October 15, 2024 11:59 PM.</b> More details about the submission and the architecture for this lab can be found in the handout PDF.


### = = = = =  Attention Layer = = = = =
In this section, you will fill the gaps in the code to implement the self-attention layer. This layer will be used later to define the HAN architecture. The basic idea behind attention is that rather than considering the last annotation $h_T$ as a summary of the entire sequence, which is prone to information loss, the annotations at <i>all</i> time steps are used.
The self-attention mechanism computes a weighted sum of the annotations, where the weights are determined by trainable parameters. Refer to <b>section 2.2</b> in the handout for the theoretical part; it will be needed to finish the first task.
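
For reference, here is the attention mechanism in the notation of Yang et al. (2016); the handout's section 2.2 may use slightly different symbols. Each annotation $h_t$ is mapped to a hidden representation $u_t$, scored against a trainable context vector $u$, and the scores are normalized into weights $\alpha_t$:

$$
u_t = \tanh(W h_t + b), \qquad
\alpha_t = \frac{\exp(u_t^\top u)}{\sum_{t'=1}^{T} \exp(u_{t'}^\top u)}, \qquad
s = \sum_{t=1}^{T} \alpha_t h_t
$$

In the `AttentionWithContext` class of Task 1, $W$ and $b$ correspond to `self.W`, $u$ to `self.u`, and a small $\epsilon$ is added to the denominator for numerical stability.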

#### <b>Task 1:</b>

In [67]:
import torch
from torch import nn
from torch.utils.data import DataLoader

class AttentionWithContext(nn.Module):
    """
    Follows the work of Yang et al. [https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf]
    "Hierarchical Attention Networks for Document Classification"
    by using a context vector to assist the attention
    # Input shape
        3D tensor with shape: `(samples, steps, features)`.
    # Output shape
        2D tensor with shape: `(samples, features)`.
    """

    def __init__(self, input_shape, return_coefficients=False, bias=True):
        super(AttentionWithContext, self).__init__()
        self.return_coefficients = return_coefficients

        self.W = nn.Linear(input_shape, input_shape, bias=bias)
        self.tanh = nn.Tanh()
        self.u = nn.Linear(input_shape, 1, bias=False)

        self.init_weights()

    def init_weights(self):
        initrange = 0.1
        self.W.weight.data.uniform_(-initrange, initrange)
        self.W.bias.data.uniform_(-initrange, initrange)
        self.u.weight.data.uniform_(-initrange, initrange)

    def generate_square_subsequent_mask(self, sz):
        # do not pass the mask to the next layers
        mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
        mask = (
            mask.float()
            .masked_fill(mask == 0, float("-inf"))
            .masked_fill(mask == 1, float(0.0))
        )
        return mask

    def forward(self, x, mask=None):
        uit = self.W(x) # fill the gap # compute uit = W . x  where x represents ht
        uit = self.tanh(uit)
        ait = self.u(uit)
        a = torch.exp(ait)

        # apply mask after the exp. will be re-normalized next
        if mask is not None:
            a = a*mask.double()

        # in some cases especially in the early stages of training the sum may be almost zero
        # and this results in NaN's. A workaround is to add a very small positive number ε to the sum.
        eps = 1e-9
        a = a / (torch.sum(a, axis=1, keepdim=True) + eps)

        weighted_input = a * x ### fill the gap ### # compute the attentional vector

        if self.return_coefficients:
            return [torch.sum(weighted_input, dim = 1), a] ### [attentional vector, coefficients] ### use torch.sum to compute s
        else:
            return torch.sum(weighted_input, dim = 1) ### attentional vector only ###
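
As a sanity check on the layer above, the following minimal sketch recomputes the attention weights outside the class. The tensor sizes and the freshly initialized layers `W` and `u` are illustrative assumptions, not values from the lab, and the mask and the small ε are left out. It shows that exponentiating the scores and dividing by their sum over the time steps matches a softmax over that axis, and that the attentional vector has shape `(samples, features)`.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes (assumptions, not the lab's values).
batch, steps, features = 2, 5, 4
h = torch.randn(batch, steps, features)        # annotations h_t

W = torch.nn.Linear(features, features)        # plays the role of self.W
u = torch.nn.Linear(features, 1, bias=False)   # plays the role of self.u

scores = u(torch.tanh(W(h)))                   # (batch, steps, 1)

# Manual normalization, as in the layer (without mask and epsilon).
exp_scores = torch.exp(scores)
alpha_manual = exp_scores / exp_scores.sum(dim=1, keepdim=True)

# Same weights through a softmax over the time-step axis.
alpha_softmax = torch.softmax(scores, dim=1)

# Attentional vector: weighted sum of the annotations over time steps.
s = (alpha_softmax * h).sum(dim=1)             # (batch, features)
```

The weights broadcast over the feature axis, which is why `a * x` in the layer works without an explicit reshape.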

### = = = = = Parameters = = = = =
In this section, we define the parameters used for training, such as the data path, the embedding dimension <b>d</b>, the GRU layer dimensionality <b>n_units</b>, etc.<br>
The parameter <b>device</b> is used to train the model on GPU if one is available. For this purpose, if you are using Google Colab, switch your runtime to a GPU runtime to train the model at maximum speed.<br>
<b>Bonus question:</b> What is the purpose of the parameter <i>my_patience</i>?
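
One common reading of a patience parameter is sketched below. It assumes patience counts consecutive epochs without an improvement in validation accuracy; the function name `epoch_of_stop` and the accuracy values are hypothetical, and the sketch is independent of the training loop used later in this notebook.

```python
def epoch_of_stop(val_accuracies, patience):
    """Return the 1-based epoch at which training would stop.

    Training stops once `patience` consecutive epochs bring no improvement
    over the best validation accuracy seen so far. If that never happens,
    the last epoch is returned.
    """
    best = float("-inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best, stale = acc, 0
        else:
            stale += 1
            if stale == patience:
                return epoch
    return len(val_accuracies)


# Hypothetical validation accuracies, one per epoch.
history = [0.70, 0.75, 0.74, 0.73, 0.80]
stop_with_patience_2 = epoch_of_stop(history, patience=2)
stop_with_patience_3 = epoch_of_stop(history, patience=3)
```

A larger patience tolerates longer plateaus at the cost of more training time.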

In [68]:
import sys
import json
import operator
import numpy as np

path_root = ''
path_to_data = path_root + 'data/'

d = 30 # dimensionality of word embeddings
n_units = 50 # RNN layer dimensionality
drop_rate = 0.5 # dropout
mfw_idx = 2 # index of the most frequent words in the dictionary
            # 0 is for the special padding token
            # 1 is for the special out-of-vocabulary token

padding_idx = 0
oov_idx = 1
batch_size = 64
nb_epochs = 15
my_patience = 2 # for early stopping strategy
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

### = = = = = Data Loading = = = = =
In this section, we first use <b>wget</b> to download the data, then load it using numpy in the first cell. In the second cell, we use these data to define our PyTorch data loader. Note that the data is already preprocessed, tokenized and padded.<br><br>
<b>Note: if you are running your notebook on Windows or on MacOS, <i>wget</i> will probably not work if you did not install it manually. In this case, use the provided link to download the data and change the <i>path_to_data</i> in the <i>Parameters</i> section accordingly. Otherwise, you will face no problem on Ubuntu and Google Colab.</b>

#### <b>Task 2.1:</b>

In [None]:
import urllib.request
url = "https://onedrive.live.com/download?cid=AE69638675180117&resid=AE69638675180117%2199289&authkey=AHgxt3xmgG0Fu5A"
output_file = "data.zip"
urllib.request.urlretrieve(url, output_file)

!unzip data.zip

my_docs_array_train = np.load(path_to_data + 'docs_train.npy')
my_docs_array_test = np.load(path_to_data + 'docs_test.npy')

my_labels_array_train = np.load(path_to_data + 'labels_train.npy')
my_labels_array_test = np.load(path_to_data + 'labels_test.npy')

# load dictionary of word indexes (sorted by decreasing frequency across the corpus)
with open(path_to_data + 'word_to_index.json', 'r') as my_file:
    word_to_index = json.load(my_file)

# invert mapping
index_to_word = {index: word for word, index in word_to_index.items()} ### fill the gap (use a dict comprehension) ###
input_size = my_docs_array_train.shape

print(my_docs_array_train[0].shape)



In [5]:
import numpy
import torch
from torch.utils.data import DataLoader, Dataset


class Dataset_(Dataset):
    def __init__(self, x, y):
        self.documents = x
        self.labels = y

    def __len__(self):
        return len(self.documents)

    def __getitem__(self, index):
        document = self.documents[index]
        label = self.labels[index]
        sample = {
            "document": torch.tensor(document),
            "label": torch.tensor(label),
            }
        return sample


def get_loader(x, y, batch_size=32):
    dataset = Dataset_(x, y)
    data_loader = DataLoader(dataset=dataset,
                            batch_size=batch_size,
                            shuffle=True,
                            pin_memory=True,
                            drop_last=True,
                            )
    return data_loader

### = = = = = Defining Architecture = = = = =
In this section, we define the HAN architecture. We start with the <i>AttentionBiGRU</i> module, which defines the sentence encoder (check Figure 3 in the handout). Then, we define the <i>TimeDistributed</i> module, which lets us forward our input (a batch of documents) to the sentence encoder as a <b>batch of sentences</b>, where each sentence in the document is considered as a time step. This module also reshapes the output into a batch of per-document time-step representations. Finally, we define the <b>HAN</b> architecture using <i>TimeDistributed</i>, <i>AttentionWithContext</i> and <i>GRU</i>.
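
The reshaping idea behind <i>TimeDistributed</i> can be sketched in isolation. The sizes below are illustrative assumptions, and a random tensor stands in for the sentence encoder's output, so this only demonstrates the two `view` calls, not the actual module.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes (assumptions): 4 documents, 7 sentences each,
# 30 words per sentence, 100-dimensional sentence vectors.
batch, n_sents, n_words, feat = 4, 7, 30, 100

docs = torch.randint(0, 50, (batch, n_sents, n_words))  # integer word indices

# Squash documents and sentences into one axis: every sentence becomes a sample.
sents = docs.contiguous().view(-1, docs.size(-1))        # (batch * n_sents, n_words)

# Stand-in for the sentence encoder: one vector of size `feat` per sentence.
sent_vecs = torch.randn(sents.size(0), feat)             # (batch * n_sents, feat)

# Restore the document axis so the sentence-level GRU sees one sequence per document.
doc_seq = sent_vecs.view(docs.size(0), -1, feat)         # (batch, n_sents, feat)
```

Because `view` keeps row-major order, row `i * n_sents + j` of `sents` is sentence `j` of document `i`, so the second `view` puts each sentence vector back in its own document.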

#### <b>Task 2.2:</b>

In [6]:

class AttentionBiGRU(nn.Module):
    def __init__(self, input_shape, n_units, index_to_word, dropout=0):
        super(AttentionBiGRU, self).__init__()
        self.embedding = nn.Embedding( len(index_to_word) + 2,# fill the gap # vocab size
                                      d, # dimensionality of embedding space
                                      padding_idx=0)
        self.dropout = nn.Dropout(drop_rate)
        self.gru = nn.GRU(input_size=d,
                          hidden_size=n_units,
                          num_layers=1,
                          bias=True,
                          batch_first=True,
                          bidirectional=True)
        self.attention = AttentionWithContext(2 * n_units,   # fill the gap # the input shape for the attention layer
                                              return_coefficients=True)


    def forward(self, sent_ints):
        sent_wv = self.embedding(sent_ints)
        sent_wv_dr = self.dropout(sent_wv)
        sent_wa, _ = self.gru(sent_wv_dr) # fill the gap # RNN layer
        sent_att_vec, word_att_coeffs = self.attention(sent_wa) # fill the gap # attentional vector for the sent
        sent_att_vec_dr = self.dropout(sent_att_vec)
        return sent_att_vec_dr, word_att_coeffs

class TimeDistributed(nn.Module):
    def __init__(self, module, batch_first=False):
        super(TimeDistributed, self).__init__()
        self.module = module
        self.batch_first = batch_first

    def forward(self, x):
        if len(x.size()) <= 2:
            return self.module(x)
        # Squash samples and timesteps into a single axis
        x_reshape = x.contiguous().view(-1, x.size(-1))  # (samples * timesteps, input_size) (448, 30)
        sent_att_vec_dr, word_att_coeffs = self.module(x_reshape)
        # We have to reshape the output
        if self.batch_first:
            sent_att_vec_dr = sent_att_vec_dr.contiguous().view(x.size(0), -1, sent_att_vec_dr.size(-1))  # (samples, timesteps, output_size)
            word_att_coeffs = word_att_coeffs.contiguous().view(x.size(0), -1, word_att_coeffs.size(-1))  # (samples, timesteps, output_size)
        else:
            sent_att_vec_dr = sent_att_vec_dr.view(-1, x.size(1), sent_att_vec_dr.size(-1))  # (timesteps, samples, output_size)
            word_att_coeffs = word_att_coeffs.view(-1, x.size(1), word_att_coeffs.size(-1))  # (timesteps, samples, output_size)
        return sent_att_vec_dr, word_att_coeffs

class HAN(nn.Module):
    def __init__(self, input_shape, n_units, index_to_word, dropout=0):
        super(HAN, self).__init__()
        self.encoder = AttentionBiGRU(input_shape, n_units, index_to_word, dropout)
        self.timeDistributed = TimeDistributed(self.encoder, True)
        self.dropout = nn.Dropout(drop_rate)
        self.gru = nn.GRU(input_size=2 * n_units,# fill the gap # the input shape of GRU layer
                          hidden_size=n_units,
                          num_layers=1,
                          bias=True,
                          batch_first=True,
                          bidirectional=True)
        self.attention = AttentionWithContext(2 * n_units, # fill the gap # the input shape of between-sentence attention layer
                                              return_coefficients=True)
        self.lin_out = nn.Linear(2 * n_units,   # fill the gap # the input size of the last linear layer
                                 1)
        self.preds = nn.Sigmoid()

    def forward(self, doc_ints):
        sent_att_vecs_dr, word_att_coeffs = self.timeDistributed(doc_ints) # fill the gap # get sentence representation
        doc_sa, _ = self.gru(sent_att_vecs_dr)
        doc_att_vec, sent_att_coeffs = self.attention(doc_sa)
        doc_att_vec_dr = self.dropout(doc_att_vec)
        doc_att_vec_dr = self.lin_out(doc_att_vec_dr)
        return self.preds(doc_att_vec_dr), word_att_coeffs, sent_att_coeffs


### = = = = = Training = = = = =
In this section, we have two code cells. In the first one, we define our evaluation function to compute the training and validation accuracies. In the second one, we define our model, loss and optimizer and train the model over <i>nb_epochs</i>.<br>
<b>Bonus task:</b> use <a href="https://pytorch.org/tutorials/recipes/recipes/tensorboard_with_pytorch.html" target="_blank">tensorboard</a> to visualize the loss and the validation accuracy during training.

#### <b>Task 2.3:</b>

In [9]:
def evaluate_accuracy(data_loader, verbose=True):
    model.eval()
    total_loss = 0.0
    ncorrect = ntotal = 0
    with torch.no_grad():
        for idx, data in enumerate(data_loader):
            # inference
            output = model(data["document"].to(device))[0]
            output = output[:, -1] # only last vector
            # total number of examples
            ntotal +=  output.shape[0]
            # number of correct predictions
            predictions = torch.round(output)
            ncorrect += torch.sum(predictions == data['label'].to(device))#fill me # number of correct prediction - hint: use torch.sum
        acc = ncorrect.item() / ntotal
        if verbose:
          print("validation accuracy: {:3.2f}".format(acc*100))
        return acc

In [10]:
from tqdm import tqdm

model = HAN(input_size, n_units, index_to_word).to(device)
model = model.double()
lr = 0.001  # learning rate
criterion = nn.BCELoss() # fill the gap, use Binary cross entropy from torch.nn: https://pytorch.org/docs/stable/nn.html#loss-functions
optimizer = torch.optim.Adam(model.parameters(), lr=lr) #fill me

def train(x_train=my_docs_array_train,
          y_train=my_labels_array_train,
          x_test=my_docs_array_test,
          y_test=my_labels_array_test,
          word_dict=index_to_word,
          batch_size=batch_size):

    train_data = get_loader(x_train, y_train, batch_size)
    test_data = get_loader(x_test, y_test, batch_size)  # use the function arguments rather than the globals

    best_validation_acc = 0.0
    p = 0 # patience

    for epoch in range(1, nb_epochs + 1):
        losses = []
        accuracies = []
        with tqdm(train_data, unit="batch") as tepoch:
            for idx, data in enumerate(tepoch):
                tepoch.set_description(f"Epoch {epoch}")
                model.train()
                optimizer.zero_grad()
                input = data['document'].to(device)
                label = data['label'].to(device)
                label = label.double()
                output = model.forward(input)[0]
                output = output[:, -1]
                loss = criterion(output, label)# fill the gap # compute the loss
                loss.backward()
                torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5) # prevent exploding gradient
                optimizer.step()

                losses.append(loss.item())
                accuracy = torch.sum(torch.round(output) == label).item() / batch_size
                accuracies.append(accuracy)
                tepoch.set_postfix(loss=sum(losses)/len(losses), accuracy=100. * sum(accuracies)/len(accuracies))

        # train_acc = evaluate_accuracy(train_data, False)
        test_acc = evaluate_accuracy(test_data, False)
        print("===> Epoch {} Complete: Avg. Loss: {:.4f}, Validation Accuracy: {:3.2f}%"
              .format(epoch, sum(losses)/len(losses), 100.*test_acc))
        if test_acc >= best_validation_acc:
            best_validation_acc = test_acc
            print("Validation accuracy improved, saving model...")
            torch.save(model.state_dict(), './best_model.pt')
            p = 0
            print()
        else:
            p += 1
            if p==my_patience:
                print("Validation accuracy did not improve for {} epochs, stopping training...".format(my_patience))
                break  # actually stop training once patience is exhausted
    print("Loading best checkpoint...")
    model.load_state_dict(torch.load('./best_model.pt'))
    model.eval()
    print('done.')

train()

Epoch 1: 100%|██████████| 390/390 [00:17<00:00, 21.71batch/s, accuracy=58.7, loss=0.66]


===> Epoch 1 Complete: Avg. Loss: 0.6604, Validation Accuracy: 71.15%
Validation accuracy improved, saving model...



Epoch 2: 100%|██████████| 390/390 [00:18<00:00, 21.57batch/s, accuracy=70.3, loss=0.569]


===> Epoch 2 Complete: Avg. Loss: 0.5690, Validation Accuracy: 76.77%
Validation accuracy improved, saving model...



Epoch 3: 100%|██████████| 390/390 [00:17<00:00, 22.70batch/s, accuracy=75, loss=0.505]


===> Epoch 3 Complete: Avg. Loss: 0.5052, Validation Accuracy: 79.48%
Validation accuracy improved, saving model...



Epoch 4: 100%|██████████| 390/390 [00:17<00:00, 22.78batch/s, accuracy=78.5, loss=0.461]


===> Epoch 4 Complete: Avg. Loss: 0.4611, Validation Accuracy: 81.23%
Validation accuracy improved, saving model...



Epoch 5: 100%|██████████| 390/390 [00:17<00:00, 22.58batch/s, accuracy=80.1, loss=0.427]


===> Epoch 5 Complete: Avg. Loss: 0.4275, Validation Accuracy: 82.26%
Validation accuracy improved, saving model...



Epoch 6: 100%|██████████| 390/390 [00:17<00:00, 22.35batch/s, accuracy=81.5, loss=0.405]


===> Epoch 6 Complete: Avg. Loss: 0.4052, Validation Accuracy: 82.54%
Validation accuracy improved, saving model...



Epoch 7: 100%|██████████| 390/390 [00:17<00:00, 22.70batch/s, accuracy=83.4, loss=0.38]


===> Epoch 7 Complete: Avg. Loss: 0.3801, Validation Accuracy: 83.40%
Validation accuracy improved, saving model...



Epoch 8: 100%|██████████| 390/390 [00:17<00:00, 22.74batch/s, accuracy=84.1, loss=0.361]


===> Epoch 8 Complete: Avg. Loss: 0.3609, Validation Accuracy: 83.82%
Validation accuracy improved, saving model...



Epoch 9: 100%|██████████| 390/390 [00:17<00:00, 22.79batch/s, accuracy=85.3, loss=0.341]


===> Epoch 9 Complete: Avg. Loss: 0.3410, Validation Accuracy: 84.18%
Validation accuracy improved, saving model...



Epoch 10: 100%|██████████| 390/390 [00:17<00:00, 22.64batch/s, accuracy=86.1, loss=0.327]


===> Epoch 10 Complete: Avg. Loss: 0.3268, Validation Accuracy: 83.88%


Epoch 11: 100%|██████████| 390/390 [00:17<00:00, 22.70batch/s, accuracy=86.6, loss=0.316]


===> Epoch 11 Complete: Avg. Loss: 0.3157, Validation Accuracy: 84.19%
Validation accuracy improved, saving model...



Epoch 12: 100%|██████████| 390/390 [00:17<00:00, 22.49batch/s, accuracy=87.2, loss=0.304]


===> Epoch 12 Complete: Avg. Loss: 0.3044, Validation Accuracy: 84.58%
Validation accuracy improved, saving model...



Epoch 13: 100%|██████████| 390/390 [00:17<00:00, 22.78batch/s, accuracy=87.9, loss=0.292]


===> Epoch 13 Complete: Avg. Loss: 0.2917, Validation Accuracy: 84.65%
Validation accuracy improved, saving model...



Epoch 14: 100%|██████████| 390/390 [00:17<00:00, 22.77batch/s, accuracy=88.3, loss=0.279]


===> Epoch 14 Complete: Avg. Loss: 0.2794, Validation Accuracy: 84.76%
Validation accuracy improved, saving model...



Epoch 15: 100%|██████████| 390/390 [00:17<00:00, 22.73batch/s, accuracy=88.6, loss=0.274]


===> Epoch 15 Complete: Avg. Loss: 0.2745, Validation Accuracy: 83.92%
Loading best checkpoint...
done.




### = = = = = Extraction of Attention Coefficients = = = = =
In this section, we will extract and display the attention coefficients on two levels: sentence level and word level. To do so, we will extract the corresponding weights from our model.
#### <b>Task 3:</b>

In [11]:
# select last review:
my_review = my_docs_array_test[-1:,:,:]

# convert integer review to text:
index_to_word[1] = 'OOV'
my_review_text = [[index_to_word[idx] for idx in sent if idx in index_to_word] for sent in my_review.tolist()[0]]

print(my_review_text)

[['There', "'s", 'a', 'sign', 'on', 'The', 'Lost', 'Highway', 'that', 'says', ':', 'OOV', 'SPOILERS', 'OOV', '(', 'but', 'you', 'already', 'knew', 'that', ',', 'did', "n't", 'you', '?', ')'], ['Since', 'there', "'s", 'a', 'great', 'deal', 'of', 'people', 'that', 'apparently', 'did', 'not', 'get', 'the', 'point', 'of', 'this', 'movie', ',', 'I', "'d", 'like', 'to', 'contribute', 'my', 'interpretation', 'of', 'why', 'the', 'plot'], ['As', 'others', 'have', 'pointed', 'out', ',', 'one', 'single', 'viewing', 'of', 'this', 'movie', 'is', 'not', 'sufficient', '.'], ['If', 'you', 'have', 'the', 'DVD', 'of', 'MD', ',', 'you', 'can', 'OOV', "'", 'by', 'looking', 'at', 'David', 'Lynch', "'s", "'Top", '10', 'OOV', 'to', 'OOV', 'MD', "'", '(', 'but', 'only', 'upon', 'second'], [';', ')', 'First', 'of', 'all', ',', 'Mulholland', 'Drive', 'is', 'downright', 'brilliant', '.'], ['A', 'masterpiece', '.'], ['This', 'is', 'the', 'kind', 'of', 'movie', 'that', 'refuse', 'to', 'leave', 'your', 'head', '.']

###   &emsp;&emsp;  = = = = = Attention Over Sentences in the Document = = = = =

In [13]:
sent_coeffs = model.forward(torch.tensor(my_review).to(device))[2] # fill the gap # get sentence attention coeffs by passing the review to the model - (you need to convert the input to a torch tensor)
sent_coeffs = sent_coeffs[0,:,:]

print(sent_coeffs)

for elt in zip(sent_coeffs[:,0].tolist(),[' '.join(elt) for elt in my_review_text]):
    print(round(elt[0]*100,2),elt[1])

tensor([[0.1404],
        [0.0940],
        [0.0680],
        [0.0829],
        [0.2206],
        [0.2521],
        [0.1421]], device='cuda:0', dtype=torch.float64,
       grad_fn=<SliceBackward0>)
14.04 There 's a sign on The Lost Highway that says : OOV SPOILERS OOV ( but you already knew that , did n't you ? )
9.4 Since there 's a great deal of people that apparently did not get the point of this movie , I 'd like to contribute my interpretation of why the plot
6.8 As others have pointed out , one single viewing of this movie is not sufficient .
8.29 If you have the DVD of MD , you can OOV ' by looking at David Lynch 's 'Top 10 OOV to OOV MD ' ( but only upon second
22.06 ; ) First of all , Mulholland Drive is downright brilliant .
25.21 A masterpiece .
14.21 This is the kind of movie that refuse to leave your head .


### &emsp;&emsp; = = = = = Attention Over Words in Each Sentence = = = = =

In [15]:
word_coeffs = model.forward(torch.tensor(my_review).to(device))[1] # fill the gap # get word attention coeffs by passing the review to the model - (you need to convert the input to a torch tensor)

word_coeffs_list = word_coeffs.reshape(7,30).tolist()

# match text and coefficients:
text_word_coeffs = [list(zip(words,word_coeffs_list[idx][:len(words)])) for idx,words in enumerate(my_review_text)]

for sent in text_word_coeffs:
    [print(elt) for elt in sent]
    print('= = = =')

# sort words by importance within each sentence:
text_word_coeffs_sorted = [sorted(elt,key=operator.itemgetter(1),reverse=True) for elt in text_word_coeffs]

for sent in text_word_coeffs_sorted:
    [print(elt) for elt in sent]
    print('= = = =')

('There', 0.059852276741799354)
("'s", 0.03236277800649831)
('a', 0.040338003010560175)
('sign', 0.05842316887211175)
('on', 0.036862177575213324)
('The', 0.03155204205335215)
('Lost', 0.07593742273902614)
('Highway', 0.04660469833795657)
('that', 0.024970221701626664)
('says', 0.02764636061395581)
(':', 0.03992425621186618)
('OOV', 0.02757158561272476)
('SPOILERS', 0.030608007791879388)
('OOV', 0.020547707902276127)
('(', 0.021750134577769552)
('but', 0.025718899764797)
('you', 0.03310348321719794)
('already', 0.032208298759630986)
('knew', 0.03270982431991902)
('that', 0.02352025580732963)
(',', 0.022986227664432934)
('did', 0.02493269858177485)
("n't", 0.023132462948204893)
('you', 0.028890517791250295)
('?', 0.03557280606198252)
(')', 0.030874751586929802)
= = = =
('Since', 0.04058908361982018)
('there', 0.043266429030238494)
("'s", 0.03572600522342823)
('a', 0.03321978102995169)
('great', 0.11742270181094812)
('deal', 0.04194714558424096)
('of', 0.02418881264185183)
('people', 0.0

In [62]:
# Let's see how it goes for the very famous scene from fight club : https://www.youtube.com/watch?v=chyRpj-971o

script_fight_club = """
Shut up! Which means a lot of you have been breaking the first two rules of fight club. A glum silence falls. Guys look at each other. I see in fight club the strongest and smartest men who have ever lived -- an entire generation pumping gas and waiting tables; or they're slaves with white collars. Advertisements have them chasing cars and clothes, working jobs they hate so they can buy shit they don't need. We are the middle children of history, with no purpose or place. We have no great war, or great depression. The great war is a spiritual war. The great depression is our lives. We were raised by television to believe that we'd be millionaires and movie gods and rock stars -- but we won't. And we're learning that fact. And we're very, very pissed-off. The crowd erupts into a DEAFENING CHORUS of agreement. Jack looks at the blazing excitement in the eyes of the crowd. We are the quiet young men who listen until it's time to decide. A fat, MIDDLE-AGED MAN stomps down the stairs, pushing into the crowd, followed by a TALL, HEFTY THUG who holds a GUN.
"""

In [63]:

fight_club_text = script_fight_club.split('.')
tmp = []
for e in fight_club_text:
    tmp.append(e.split(' '))

# Delete blank space at the beginning of sentences

for i in range(len(tmp)):
    if tmp[i][0] == '':
        del(tmp[i][0])

fight_club_text = tmp

# Convert the matrix of strings to a matrix of integers using the dictionary. If a word is unknown, we use 1 (the OOV index).
# If a sentence is too long, we truncate it at the 30th word.

fight_club_int = []

for i in range(len(fight_club_text)):
    tmp = []
    for j in range(min(len(fight_club_text[i]),30)):
        if fight_club_text[i][j] not in word_to_index:
            tmp.append(1)
        else:
            tmp.append(word_to_index[fight_club_text[i][j]])
    fight_club_int.append(tmp)

# Padding

for i in range(len(fight_club_int)):
    fight_club_int[i] = fight_club_int[i] + [0] * (30 - len(fight_club_int[i]))

print(fight_club_int[8])

[22, 100, 5694, 9, 293, 471, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
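
The preprocessing above splits only on `.` and on spaces, so tokens such as `'\nShut'` or `'up!'` keep their newline or punctuation and are likely to fall back to the OOV index. Below is a minimal sketch of an alternative based on regular expressions. The function name `encode_document` and the toy vocabulary are hypothetical, and this tokenization is an assumption that may not match how the lab's data was tokenized (for instance, contractions such as `n't` are handled differently).

```python
import re


def encode_document(text, word_to_index, max_words=30, pad_idx=0, oov_idx=1):
    """Split text into sentences, map tokens to integers, truncate and pad."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    encoded = []
    for sent in sentences:
        # Words and punctuation marks become separate tokens.
        tokens = re.findall(r"\w+|[^\w\s]", sent)[:max_words]
        ids = [word_to_index.get(tok, oov_idx) for tok in tokens]
        # Right-pad every sentence to exactly max_words entries.
        encoded.append(ids + [pad_idx] * (max_words - len(ids)))
    return encoded


# Toy vocabulary for illustration only; indices 0 and 1 are reserved
# for padding and out-of-vocabulary tokens, as in the Parameters section.
toy_vocab = {"The": 2, "great": 3, "war": 4, ".": 5}
toy_doc = encode_document("Shut up! The great war.", toy_vocab, max_words=6)
```

With the real `word_to_index`, the returned list of lists can be wrapped in `torch.tensor([...])` in the same way as `fight_club_int`.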


In [64]:
sent_coeffs = model.forward(torch.tensor([fight_club_int]).to(device))[2] # fill the gap # get sentence attention coeffs by passing the review to the model - (you need to convert the input to a torch tensor)
sent_coeffs = sent_coeffs[0,:,:]

print(sent_coeffs)

for elt in zip(sent_coeffs[:,0].tolist(),[' '.join(elt) for elt in fight_club_text]):
    print(round(elt[0]*100,2),elt[1])

tensor([[0.0490],
        [0.0200],
        [0.0222],
        [0.0297],
        [0.0193],
        [0.0212],
        [0.0881],
        [0.1621],
        [0.2193],
        [0.0496],
        [0.0344],
        [0.0357],
        [0.0370],
        [0.0529],
        [0.0745],
        [0.0474],
        [0.0375]], device='cuda:0', dtype=torch.float64,
       grad_fn=<SliceBackward0>)
4.9 
Shut up! Which means a lot of you have been breaking the first two rules of fight club
2.0 A glum silence falls
2.22 Guys look at each other
2.97 I see in fight club the strongest and smartest men who have ever lived -- an entire generation pumping gas and waiting tables; or they're slaves with white collars
1.93 Advertisements have them chasing cars and clothes, working jobs they hate so they can buy shit they don't need
2.12 We are the middle children of history, with no purpose or place
8.81 We have no great war, or great depression
16.21 The great war is a spiritual war
21.93 The great depression is our li

In [66]:
word_coeffs = model.forward(torch.tensor([fight_club_int]).to(device))[1] # fill the gap # get word attention coeffs by passing the review to the model - (you need to convert the input to a torch tensor)

word_coeffs_list = word_coeffs.reshape(17,30).tolist()

# match text and coefficients:
text_word_coeffs = [list(zip(words,word_coeffs_list[idx][:len(words)])) for idx,words in enumerate(fight_club_text)]

for sent in text_word_coeffs:
    [print(elt) for elt in sent]
    print('= = = =')

# sort words by importance within each sentence:
text_word_coeffs_sorted = [sorted(elt,key=operator.itemgetter(1),reverse=True) for elt in text_word_coeffs]

for sent in text_word_coeffs_sorted:
    [print(elt) for elt in sent]
    print('= = = =')

('\nShut', 0.039654390586203264)
('up!', 0.0387574473931018)
('Which', 0.05462806539163697)
('means', 0.033901664974382434)
('a', 0.026300317102465654)
('lot', 0.05144392633680989)
('of', 0.032563383002414105)
('you', 0.03821418680131774)
('have', 0.028540078746404643)
('been', 0.04255398714703459)
('breaking', 0.02187527063241988)
('the', 0.020350649668455897)
('first', 0.031427901939257936)
('two', 0.035621766132065115)
('rules', 0.034955132256013086)
('of', 0.03241405065978023)
('fight', 0.0387404323451)
('club', 0.029943024718693877)
= = = =
('A', 0.06269438365581852)
('glum', 0.03791599218348594)
('silence', 0.04046383547745745)
('falls', 0.07837126286861724)
= = = =
('Guys', 0.04934129878966752)
('look', 0.03929457339658094)
('at', 0.03171703747490719)
('each', 0.04236098804175803)
('other', 0.03641152359783713)
= = = =
('I', 0.030398194547336062)
('see', 0.05201383065191272)
('in', 0.035490721649114244)
('fight', 0.04298276660209084)
('club', 0.03014174827057732)
('the', 0.02956