# Extractive Summarisation With BERT
_By: Ling Li Ya_

References:
1. [BERT Extractive Summarizer](https://github.com/dmmiller612/bert-extractive-summarizer)
2. [How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train)

Summarise sample text with the pretrained BERT extractive summarizer

In [2]:
from summarizer import Summarizer

In [1]:
body = [
    'In church construction, especially in the southern German-Austrian region, gigantic spatial creations are sometimes created for practical reasons alone, which, however, do not appear monumental, but are characterized by a unique fusion of architecture, painting, stucco, etc., often completely eliminating the boundaries between the art genres, and are characterized by a light-filled weightlessness, festive cheerfulness and movement. The Rococo decorative style reached its summit in southern Germany and Austria from the 1730s until the 1770s. There it dominates the church landscape to this day and is deeply anchored there in popular culture. It was first introduced from France through the publications and works of French architects and decorators, including the sculptor Claude III Audran, the interior designer Gilles-Marie Oppenordt, the architect Germain Boffrand, the sculptor Jean Mondon, and the draftsman and engraver Pierre Lepautre. Their work had an important influence on the German Rococo style, but does not reach the level of buildings in southern Germany.[25]',

'German architects adapted the Rococo style but made it far more asymmetric and loaded with more ornate decoration than the French original. The German style was characterized by an explosion of forms that cascaded down the walls. It featured molding formed into curves and counter-curves, twisting and turning patterns, ceilings and walls with no right angles, and stucco foliage which seemed to be creeping up the walls and across the ceiling. The decoration was often gilded or silvered to give it contrast with the white or pale pastel walls.[26]',

'The Belgian-born architect and designer François de Cuvilliés was one of the first to create a Rococo building in Germany, with the pavilion of Amalienburg in Munich, (1734-1739), inspired by the pavilions of the Trianon and Marly in France. It was built as a hunting lodge, with a platform on the roof for shooting pheasants. The Hall of Mirrors in the interior, by the painter and stucco sculptor Johann Baptist Zimmermann, was far more exuberant than any French Rococo.[27]',

'Another notable example of the early German Rococo is Würzburg Residence (1737–1744) constructed for the Prince-Bishop of Würzburg by Balthasar Neumann. Neumann had traveled to Paris and consulted with the French rocaille decorative artists Germain Boffrand and Robert de Cotte. While the exterior was in more sober Baroque style, the interior, particularly the stairways and ceilings, was much lighter and decorative. The Prince-Bishop imported the Italian Rococo painter Giovanni Battista Tiepolo in 1750–53 to create a mural over the top of the three-level ceremonial stairway.[28][29] Neumann described the interior of the residence as "a theater of light". The stairway was also the central element in a residence Neumann built at the Augustusburg Palace in Brühl (1743–1748). In that building the stairway led the visitors up through a stucco fantasy of paintings, sculpture, ironwork and decoration, with surprising views at every turn.[28]',

'In the 1740s and 1750s, a number of notable pilgrimage churches were constructed in Bavaria, with interiors decorated in a distinctive variant of the rococo style. One of the most notable examples is the Wieskirche (1745–1754) designed by Dominikus Zimmermann. Like most of the Bavarian pilgrimage churches, the exterior is very simple, with pastel walls, and little ornament. Entering the church the visitor encounters an astonishing theater of movement and light. It features an oval-shaped sanctuary, and a deambulatory in the same form, filling in the church with light from all sides. The white walls contrasted with columns of blue and pink stucco in the choir, and the domed ceiling surrounded by plaster angels below a dome representing the heavens crowded with colorful Biblical figures. Other notable pilgrimage churches include the Basilica of the Fourteen Holy Helpers by Balthasar Neumann (1743–1772).[30][31]'
]

In [3]:
model = Summarizer()

Some weights of the model checkpoint at bert-large-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [4]:
import pickle

file = '../models/bert-ext.pkl'

# persist the summarizer to disk; the context manager closes the file handle
with open(file, 'wb') as f:
    pickle.dump(model, f)

In [5]:
# reload the pickled summarizer; the context manager closes the file handle
with open(file, 'rb') as f:
    loaded_model = pickle.load(f)

In [19]:
res = []
for chunk in body:
    dict_ = {'summary_text': model(chunk, ratio=0.05)}
    res.append(dict_)

res

[{'summary_text': 'In church construction, especially in the southern German-Austrian region, gigantic spatial creations are sometimes created for practical reasons alone, which, however, do not appear monumental, but are characterized by a unique fusion of architecture, painting, stucco, etc., It was first introduced from France through the publications and works of French architects and decorators, including the sculptor Claude III Audran, the interior designer Gilles-Marie Oppenordt, the architect Germain Boffrand, the sculptor Jean Mondon, and the draftsman and engraver Pierre Lepautre.'},
 {'summary_text': 'German architects adapted the Rococo style but made it far more asymmetric and loaded with more ornate decoration than the French original. The German style was characterized by an explosion of forms that cascaded down the walls.'},
 {'summary_text': 'The Belgian-born architect and designer François de Cuvilliés was one of the first to create a Rococo building in Germany, with 
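
The README of the BERT Extractive Summarizer (reference 1) describes its approach as embedding the sentences, clustering the embeddings, and keeping the sentences closest to the cluster centroids. The sketch below illustrates only that last selection step. The 2-D vectors and centroids are made-up toy values, and `closest_to_centroids` is a hypothetical helper, not part of the library. It assumes there are at least as many sentences as centroids.

```python
import math


def closest_to_centroids(embeddings, centroids):
    """Return the index of the nearest unused sentence for each centroid, in document order."""
    chosen = set()
    for centroid in centroids:
        # Pick the not-yet-chosen sentence with the smallest Euclidean distance
        best = min(
            (i for i in range(len(embeddings)) if i not in chosen),
            key=lambda i: math.dist(embeddings[i], centroid),
        )
        chosen.add(best)
    return sorted(chosen)


# Toy data: four sentences with hand-picked 2-D "embeddings" (hypothetical values)
sentences = [
    "Rococo was introduced from France.",
    "French decorators published their designs.",
    "German churches used stucco foliage.",
    "The stucco was often gilded.",
]
embeddings = [(0.0, 0.1), (0.2, 0.0), (1.0, 1.0), (0.9, 1.1)]
centroids = [(0.05, 0.1), (1.0, 0.95)]  # assumed output of a clustering step

picked = closest_to_centroids(embeddings, centroids)
summary = " ".join(sentences[i] for i in picked)
print(summary)
```

Returning the indices in sorted order keeps the selected sentences in their original document order, which is why the summaries above read as excerpts of the input.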

In [None]:
# the summarizer expects a single string, so join the paragraphs first
loaded_model('\n\n'.join(body))

'In church construction, especially in the southern German-Austrian region, gigantic spatial creations are sometimes created for practical reasons alone, which, however, do not appear monumental, but are characterized by a unique fusion of architecture, painting, stucco, etc., The German style was characterized by an explosion of forms that cascaded down the walls. The decoration was often gilded or silvered to give it contrast with the white or pale pastel walls.[26]\n\nThe Belgian-born architect and designer François de Cuvilliés was one of the first to create a Rococo building in Germany, with the pavilion of Amalienburg in Munich, (1734-1739), inspired by the pavilions of the Trianon and Marly in France. It was built as a hunting lodge, with a platform on the roof for shooting pheasants. While the exterior was in more sober Baroque style, the interior, particularly the stairways and ceilings, was much lighter and decorative.'

Read the JSON Lines dataset files

In [1]:
import pandas as pd

train_json = pd.read_json('C:/Users/liana/Documents/Projects/fyp/dataset/train.jsonl', lines=True)
train_json.head()

|   | source | source_labels | rouge_scores | paper_id | target |
|---|---|---|---|---|---|
| 0 | [Due to the success of deep learning to solvin... | [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.30188678746885, 0.37209301838831804, 0.6037... | SysEexbRb | [We provide necessary and sufficient analytica... |
| 1 | [The backpropagation (BP) algorithm is often t... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.0, 0.0, 0.130434779206049, 0.14285713922902... | SygvZ209F7 | [Biologically plausible learning algorithms, p... |
| 2 | [We introduce the 2-simplicial Transformer, an... | [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.333333328395061, 0.8888888839111111, 0.1142... | rkecJ6VFvr | [We introduce the 2-simplicial Transformer and... |
| 3 | [We present Tensor-Train RNN (TT-RNN), a novel... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.066666662222222, 0.06451612466181, 0.060606... | HJJ0w--0W | [Accurate forecasting over very long time hori... |
| 4 | [Recent efforts on combining deep models with ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.27777777279320903, 0.571428566658163, 0.095... | HyH9lbZAW | [We propose a variational message-passing algo... |


In [2]:
test_json = pd.read_json('C:/Users/liana/Documents/Projects/fyp/dataset/test.jsonl', lines=True)
test_json.head()

|   | source | source_labels | rouge_scores | paper_id | target | title |
|---|---|---|---|---|---|---|
| 0 | [Incremental class learning involves sequentia... | [0, 0, 0, 0, 1, 0, 0, 0, 0] | [0.28571428104489804, 0.18181817681818102, 0.2... | SJ1Xmf-Rb | [FearNet is a memory efficient neural-network,... | FearNet: Brain-Inspired Model for Incremental ... |
| 1 | [Multi-view learning can provide self-supervis... | [1, 0, 0, 0, 0, 0] | [0.19999999580000002, 0.0, 0.15789473418282501... | S1xzyhR9Y7 | [Multi-view learning improves unsupervised sen... | Improving Sentence Representations with Multi-... |
| 2 | [We show how discrete objects can be learnt in... | [1, 0, 0, 0, 0] | [0.9787233992575821, 0.33333332860555503, 0.41... | HJDUjKeA- | [We show how discrete objects can be learnt in... | Learning objects from pixels |
| 3 | [Most recent gains in visual recognition have ... | [0, 0, 1, 0, 0, 0] | [0.11764705384083, 0.146341458655562, 0.199999... | BJgLg3R9KQ | [A large-scale dataset for training attention ... | Learning what and where to attend |
| 4 | [In recent years, deep neural networks have de... | [0, 0, 1, 0, 0, 0, 0, 0] | [0.0, 0.05882352484429101, 0.270270265887509, ... | BklpOo09tQ | [We proposed a time-efficient defense method a... | EFFICIENT TWO-STEP ADVERSARIAL DEFENSE FOR DEE... |


In [3]:
dev_json = pd.read_json('C:/Users/liana/Documents/Projects/fyp/dataset/dev.jsonl', lines=True)
dev_json.head()

|   | source | source_labels | rouge_scores | paper_id | target | title |
|---|---|---|---|---|---|---|
| 0 | [Mixed precision training (MPT) is becoming a ... | [0, 0, 0, 1, 0, 0] | [0.23999999580000003, 0.260869560822306, 0.199... | rJlnfaNYvB | [We devise adaptive loss scaling to improve mi... | Adaptive Loss Scaling for Mixed Precision Trai... |
| 1 | [Many real-world problems, e.g. object detecti... | [0, 0, 1, 0, 0] | [0.054054049086925, 0.29268292183224204, 0.974... | rJVoEiCqKQ | [We present a novel approach for learning to p... | Deep Perm-Set Net: Learn to predict sets with ... |
| 2 | [Foveation is an important part of human visio... | [0, 0, 1, 0, 0] | [0.11764705382352901, 0.11764705382352901, 0.3... | rkldVXKU8H | [We compare object recognition performance on ... | Foveated Downsampling Techniques |
| 3 | [We explore the concept of co-design in the co... | [0, 1, 0, 0, 0, 0] | [0.12499999548828102, 0.488888883911111, 0.204... | BJfIVjAcKm | [We develop methods to train deep neural model... | Training for Faster Adversarial Robustness Ver... |
| 4 | [Batch Normalization (BatchNorm) has shown to ... | [0, 0, 1, 0, 0, 0] | [0.1999999952, 0.239999995008, 0.3999999950080... | BJlEEaEFDS | [Investigation of how BatchNorm causes adversa... | Towards an Adversarially Robust Normalization ... |


Convert JSON to CSV

In [4]:
# train_csv = train_json.to_csv()
# test_csv = test_json.to_csv()
# dev_csv = dev_json.to_csv()
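
The conversion step is commented out above. As a rough illustration of what it involves, here is a dependency-free sketch using only the standard library. One thing to watch for: fields such as `source` hold lists of sentences, which have no native CSV representation, so this sketch stores them as JSON strings. The helper name `jsonl_to_csv` and the sample record are hypothetical, chosen only to mimic the dataset's shape.

```python
import csv
import io
import json


def jsonl_to_csv(jsonl_text, fieldnames):
    """Convert JSON Lines text to CSV text, encoding list/dict values as JSON strings."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        row = {}
        for name in fieldnames:
            value = record.get(name, "")
            # Lists (e.g. the sentences in `source`) are serialised as JSON
            # so they can be recovered with json.loads after reading the CSV.
            row[name] = json.dumps(value) if isinstance(value, (list, dict)) else value
        writer.writerow(row)
    return out.getvalue()


# Hypothetical record with the same kind of fields as the dataset
sample = '{"paper_id": "abc", "source": ["First sentence.", "Second sentence."]}\n'
csv_text = jsonl_to_csv(sample, ["paper_id", "source"])
print(csv_text)
```

Reading the file back with `csv.DictReader` and applying `json.loads` to the list-valued columns restores the original structure.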

In [5]:
train_json.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1992 entries, 0 to 1991
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   source         1992 non-null   object
 1   source_labels  1992 non-null   object
 2   rouge_scores   1992 non-null   object
 3   paper_id       1992 non-null   object
 4   target         1992 non-null   object
dtypes: object(5)
memory usage: 77.9+ KB


In [6]:
train_json.isnull().sum()

source           0
source_labels    0
rouge_scores     0
paper_id         0
target           0
dtype: int64

In [7]:
test_json.isnull().sum()

source           0
source_labels    0
rouge_scores     0
paper_id         0
target           0
title            0
dtype: int64

In [8]:
dev_json.isnull().sum()

source           0
source_labels    0
rouge_scores     0
paper_id         0
target           0
title            0
dtype: int64

In [9]:
Xtrain_full = train_json.drop('target', axis=1)
Xtrain = Xtrain_full[:100]
Xtrain.head()

|   | source | source_labels | rouge_scores | paper_id |
|---|---|---|---|---|
| 0 | [Due to the success of deep learning to solvin... | [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.30188678746885, 0.37209301838831804, 0.6037... | SysEexbRb |
| 1 | [The backpropagation (BP) algorithm is often t... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.0, 0.0, 0.130434779206049, 0.14285713922902... | SygvZ209F7 |
| 2 | [We introduce the 2-simplicial Transformer, an... | [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.333333328395061, 0.8888888839111111, 0.1142... | rkecJ6VFvr |
| 3 | [We present Tensor-Train RNN (TT-RNN), a novel... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.066666662222222, 0.06451612466181, 0.060606... | HJJ0w--0W |
| 4 | [Recent efforts on combining deep models with ... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [0.27777777279320903, 0.571428566658163, 0.095... | HyH9lbZAW |


In [10]:
ytrain_full = train_json['target']
ytrain = ytrain_full[:100]
ytrain.head()

0    [We provide necessary and sufficient analytica...
1    [Biologically plausible learning algorithms, p...
2    [We introduce the 2-simplicial Transformer and...
3    [Accurate forecasting over very long time hori...
4    [We propose a variational message-passing algo...
Name: target, dtype: object

In [11]:
Xtest = test_json.drop('target', axis=1)
Xtest.head()
ytest = test_json['target']
ytest.head()

0    [FearNet is a memory efficient neural-network,...
1    [Multi-view learning improves unsupervised sen...
2    [We show how discrete objects can be learnt in...
3    [A large-scale dataset for training attention ...
4    [We proposed a time-efficient defense method a...
Name: target, dtype: object

## Fine-tuning BERT using Trainer

## Fine-tuning BERT using PyTorch Adam

In [12]:
from transformers import BertTokenizer, BertForPreTraining
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForPreTraining.from_pretrained('bert-base-uncased')

Some weights of BertForPreTraining were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['cls.predictions.decoder.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [13]:
bag = [sentence for article in Xtrain['source'] for sentence in article]
bag_size = len(bag)
bag


['Due to the success of deep learning to solving a variety of challenging machine learning tasks, there is a rising interest in understanding loss functions for training neural networks from a theoretical aspect.',
 'Particularly, the properties of critical points and the landscape around them are of importance to determine the convergence performance of optimization algorithms.',
 'In this paper, we provide a necessary and sufficient characterization of the analytical forms for the critical points (as well as global minimizers) of the square loss functions for linear neural networks.',
 'We show that the analytical forms of the critical points characterize the values of the corresponding loss functions as well as the necessary and sufficient conditions to achieve global minimum.',
 'Furthermore, we exploit the analytical forms of the critical points to characterize the landscape properties for the loss functions of linear neural networks and shallow ReLU networks.',
 'One particular c

In [14]:
import random

sentence_a = []
sentence_b = []
label = []

for paragraph in Xtrain['source']:
    sentences = [
        sentence for sentence in paragraph if sentence != ''
    ]
    num_sentences = len(sentences)
    if num_sentences > 1:
        start = random.randint(0, num_sentences-2)
        # 50/50 chance of building an IsNextSentence or a NotNextSentence pair
        if random.random() >= 0.5:
            # IsNextSentence: pair the sentence with its true successor (label 0)
            sentence_a.append(sentences[start])
            sentence_b.append(sentences[start+1])
            label.append(0)
        else:
            # NotNextSentence: pair the sentence with a random one from the bag (label 1)
            index = random.randint(0, bag_size-1)
            sentence_a.append(sentences[start])
            sentence_b.append(bag[index])
            label.append(1)
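
The pair-building loop above draws the negative sentence uniformly from the whole bag, so it can occasionally pick the true next sentence and label it NotNextSentence. Below is a minimal sketch of a variant that resamples in that case and uses a seeded generator for reproducibility. The function `make_nsp_pairs` and the toy documents are hypothetical, and the sketch assumes the bag holds at least one sentence that differs from the true successor.

```python
import random


def make_nsp_pairs(documents, seed=0):
    """Build (sentence_a, sentence_b, label) triples; label 0 = IsNext, 1 = NotNext."""
    rng = random.Random(seed)
    bag = [s for doc in documents for s in doc if s]
    pairs = []
    for doc in documents:
        sentences = [s for s in doc if s]
        if len(sentences) < 2:
            continue  # a pair needs at least two sentences
        start = rng.randrange(len(sentences) - 1)
        if rng.random() >= 0.5:
            pairs.append((sentences[start], sentences[start + 1], 0))
        else:
            # Resample until the random sentence is not the true successor,
            # so the NotNext label is never attached to an actual next sentence.
            candidate = rng.choice(bag)
            while candidate == sentences[start + 1]:
                candidate = rng.choice(bag)
            pairs.append((sentences[start], candidate, 1))
    return pairs


# Toy documents (hypothetical); the single-sentence document is skipped
docs = [["A1", "A2", "A3"], ["B1", "B2"], ["C1"]]
pairs = make_nsp_pairs(docs, seed=42)
for sentence_a, sentence_b, label in pairs:
    print(label, sentence_a, "->", sentence_b)
```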

In [15]:
for i in range(22, 25):
    print(label[i])
    print(sentence_a[i] + '\n---')
    print(sentence_b[i] + '\n')

1
Spectral methods, stemming from a CNN formulation for graphs BID11 , include different approximations of spectral graph convolutions BID16 BID28 adaptive convolution filters BID34 , or attention mechanisms BID45 Embedding approaches have been suggesting for handling bag-of-words representations BID48 and general node attributes such as text or continuous features BID18 .
---
Method ScoreReal data 11.24 ± 0.16 WGAN 3.82 ± 0.06 MIX + WGAN BID2 4.04 ± 0.07 Improved-GAN (Salimans et al., 2016) 4.36 ± 0.04 ALI BID7 5.34 ± 0.05 DCGAN (Radford et al., 2015) 6.40 ± 0.05Proposed method 4.53 ± 0.04 Table 2 : Inception scores on CIFAR-10 dataset.that uses similar network architecture and training method.

1
We find that although the datasets are from different domains, the architecture searched on one performs comparable on the other.
---
In order to provide results on future video prediction problems we describe a simple modification to DVD-GAN to facilitate the added conditioning.

0
In addit

In [16]:
inputs = tokenizer(sentence_a, sentence_b, return_tensors='pt',
                   max_length=512, truncation=True, padding='max_length')

In [17]:
inputs.keys()

dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])

In [18]:
inputs

{'input_ids': tensor([[  101,  1016,  1012,  ...,     0,     0,     0],
        [  101,  6897, 20228,  ...,     0,     0,     0],
        [  101,  2057,  2191,  ...,     0,     0,     0],
        ...,
        [  101, 19090,  1996,  ...,     0,     0,     0],
        [  101,  3602,  2008,  ...,     0,     0,     0],
        [  101,  2925,  2147,  ...,     0,     0,     0]]), 'token_type_ids': tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        ...,
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]])}

In [19]:
# store the NSP labels as a column vector of shape (num_pairs, 1)
inputs['next_sentence_label'] = torch.LongTensor([label]).T

In [20]:
inputs.next_sentence_label[:10]

tensor([[0],
        [1],
        [0],
        [1],
        [1],
        [1],
        [1],
        [1],
        [1],
        [1]])

In [21]:
# keep an unmasked copy of input_ids to serve as the masked-language-model labels
inputs['labels'] = inputs.input_ids.detach().clone()

In [22]:
inputs.keys()

dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'next_sentence_label', 'labels'])

In [23]:
# create random array of floats with the same shape as the input_ids tensor
rand = torch.rand(inputs.input_ids.shape)
# select ~15% of tokens for masking, excluding [CLS] (101), [SEP] (102) and [PAD] (0)
mask_arr = (rand < 0.15) * (inputs.input_ids != 101) * \
           (inputs.input_ids != 102) * (inputs.input_ids != 0)

In [24]:
selection = []

for i in range(inputs.input_ids.shape[0]):
    selection.append(
        torch.flatten(mask_arr[i].nonzero()).tolist()
    )

In [25]:
selection[:2]

[[8, 15, 19, 21, 26],
 [1, 13, 14, 17, 18, 19, 21, 23, 31, 38, 44, 47, 49, 51, 57, 58, 61, 74]]

In [26]:
# replace the selected positions in each row with the [MASK] token id (103)
for i in range(inputs.input_ids.shape[0]):
    inputs.input_ids[i, selection[i]] = 103
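
The cells above use a full copy of `input_ids` as `labels`, so the masked-language-model loss is computed at every position, padding included. A common variant restricts the loss to the masked positions by setting all other labels to -100, the default `ignore_index` of PyTorch's cross-entropy loss. The sketch below shows that idea on plain Python lists so it runs without a model. `mask_tokens` is a hypothetical helper, and the token ids are the ones already used in this notebook.

```python
import random

# Token ids as used in the cells above for bert-base-uncased
CLS_ID, SEP_ID, PAD_ID, MASK_ID = 101, 102, 0, 103
IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def mask_tokens(input_ids, mask_prob=0.15, seed=0):
    """Return (masked_ids, labels) where labels are IGNORE_INDEX except at masked positions."""
    rng = random.Random(seed)
    special = {CLS_ID, SEP_ID, PAD_ID}
    masked, labels = [], []
    for token in input_ids:
        if token not in special and rng.random() < mask_prob:
            masked.append(MASK_ID)
            labels.append(token)  # the model must predict the original token here
        else:
            masked.append(token)
            labels.append(IGNORE_INDEX)  # no loss at this position
    return masked, labels


# Toy sequence: [CLS] w w [SEP] w w [SEP] [PAD] [PAD]
ids = [101, 2057, 2191, 102, 2925, 2147, 102, 0, 0]
masked_ids, labels = mask_tokens(ids, mask_prob=0.5, seed=1)
print(masked_ids)
print(labels)
```

Whether this changes results for the small sample used here is untested; it is shown only to make the role of the `labels` tensor explicit.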

In [27]:
inputs.keys()

dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'next_sentence_label', 'labels'])

In [28]:
inputs.input_ids

tensor([[  101,  1016,  1012,  ...,     0,     0,     0],
        [  101,   103, 20228,  ...,     0,     0,     0],
        [  101,  2057,  2191,  ...,     0,     0,     0],
        ...,
        [  101, 19090,   103,  ...,     0,     0,     0],
        [  101,   103,  2008,  ...,     0,     0,     0],
        [  101,  2925,  2147,  ...,     0,     0,     0]])

In [29]:
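class OurDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings
    def __getitem__(self, idx):
        # the encodings already hold tensors, so copy the row instead of calling
        # torch.tensor() on a tensor (which raises a UserWarning)
        return {key: val[idx].clone().detach() for key, val in self.encodings.items()}
    def __len__(self):
        return len(self.encodings.input_ids)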

In [30]:
dataset = OurDataset(inputs)

In [31]:
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

In [32]:
device = torch.device('cpu')
print('Using device:', device)
print()

# Additional info when using CUDA
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')

Using device: cpu



In [33]:
from tqdm import tqdm  # for our progress bar

epochs = 2

# move the model to the chosen device and enable training mode
model.to(device)
model.train()

# NOTE: lr=0.001 is high for fine-tuning BERT (values around 5e-5 are more common);
# the final-batch loss reported below rises from 2.68 to 10.4 between epochs
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0)

for epoch in range(epochs):
    # set up the loop with tqdm and the dataloader
    loop = tqdm(loader, leave=True)
    for batch in loop:
        # reset gradients accumulated in the previous step
        optimizer.zero_grad()
        # pull all tensor batches required for training
        input_ids = batch['input_ids'].to(device)
        token_type_ids = batch['token_type_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        next_sentence_label = batch['next_sentence_label'].to(device)
        labels = batch['labels'].to(device)
        # forward pass: returns the combined MLM + NSP loss
        outputs = model(input_ids, attention_mask=attention_mask,
                        token_type_ids=token_type_ids,
                        next_sentence_label=next_sentence_label,
                        labels=labels)
        # extract loss
        loss = outputs.loss
        # backpropagate: compute gradients for every parameter that requires them
        loss.backward()
        # update parameters
        optimizer.step()
        # print relevant info to progress bar
        loop.set_description(f'Epoch {epoch}')
        loop.set_postfix(loss=loss.item())

Epoch 0: 100%|██████████| 7/7 [06:08<00:00, 52.68s/it, loss=2.68]
Epoch 1: 100%|██████████| 7/7 [05:40<00:00, 48.68s/it, loss=10.4]


Test training with KNN

In [35]:
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors = 5).fit(Xtrain, ytrain)

knn_pred = knn.predict(Xtest)

from sklearn.metrics import accuracy_score

accuracy_score(ytest, knn_pred)

ValueError: could not convert string to float: 'SysEexbRb'
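
The error arises because `KNeighborsClassifier` needs a numeric feature matrix, while `Xtrain` still contains string identifiers such as `paper_id` and lists of sentences. The text has to be turned into numbers first; with scikit-learn this would typically be done with a vectoriser such as `TfidfVectorizer` before fitting. The dependency-free sketch below illustrates the same idea with bag-of-words counts and cosine similarity. The sentence-level rows and their 0/1 labels are hypothetical stand-ins for the dataset's `source` and `source_labels` fields, and `knn_predict` is not a scikit-learn function.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Turn a sentence into word counts, a simple numeric representation."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train_sentences, train_labels, query, k=3):
    """Majority vote over the k training sentences most similar to the query."""
    q = bag_of_words(query)
    scored = sorted(
        zip(train_sentences, train_labels),
        key=lambda pair: cosine(bag_of_words(pair[0]), q),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical sentence-level rows: 1 = belongs in the summary, 0 = does not
train_sentences = [
    "we propose a new method for training",
    "we introduce a novel model",
    "related work is discussed in section two",
    "the table lists hyperparameters",
]
train_labels = [1, 1, 0, 0]

prediction = knn_predict(train_sentences, train_labels, "we propose a novel method", k=3)
print(prediction)
```

Note that this frames the task as per-sentence classification, which is a different setup from passing whole DataFrame rows to the classifier as in the cell above.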