# Extractive Summarisation with BERT, GPT-2, Text Rank

An average worker spends approximately 28% of their working hours reading and answering emails, adding up to 11 hours a week (McKinsey, 2012). To free up this valuable time, we will apply different models to summarise the key message of an email.

To summarise the incoming emails, we will make use of extractive summarisation. There are two main forms of text summarisation, extractive and abstractive. While extractive summarisation seeks to find the most informative sentences within a large body of text and combines them into a summary, an abstractive summarisation model generates new, concise phrases that are semantically consistent with the original text.

For simplicity, we compare various extractive summarisation techniques. To do this, we have manually summarised roughly 100 emails, so that we can compare each model's output with the manual summaries in terms of accuracy.

### Importing Libraries

In [2]:
# Installing relevant libraries
!pip install langdetect
!pip install bert-extractive-summarizer
!pip install torch
!pip install rouge_score
!pip install transformers==2.2.0
!pip install spacy==2.0.12

In [3]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import torch
import langdetect
import re
import math
from rouge_score import rouge_scorer

# for BERT
from summarizer import Summarizer

# for GPT-2
from summarizer import TransformerSummarizer

# for Text Rank
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize, RegexpTokenizer
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from sklearn.metrics.pairwise import cosine_similarity
import networkx as nx

# Ignoring Warnings
pd.set_option('mode.chained_assignment', None)

In [4]:
# Read manually labelled data
emails_labelled = pd.read_csv('/project/emails_labelled.csv',index_col=[0])

# Formatting the dataframe 
emails_labelled = emails_labelled.reset_index()
emails_labelled = emails_labelled.drop(["index","from","to"], axis = 1)


# Copying the email body into the Bert and GPT2 columns; the summarisers are applied to these columns later
emails_labelled["Bert"] = emails_labelled["body"]
emails_labelled["GPT2"] = emails_labelled["body"]

In [5]:
emails_labelled.head()

Unnamed: 0,date,subject,body,sentiment_label,summary_label,Bert,GPT2
0,"Fri, 14 Jul 2000 08:44:00 -0700 (PDT)",New Gas Transportation Product on EnronOnline,The Global Gas Pipeline group is looking to tr...,0.0,Global Gas Pipeline group is looking to trade ...,The Global Gas Pipeline group is looking to tr...,The Global Gas Pipeline group is looking to tr...
1,"Wed, 6 Dec 2000 07:26:00 -0800 (PST)",Revised CalJournal Ad,"IEP Team,\nAttached is a revised January CalJo...",0.0,Revised January CalJournal ad for review,"IEP Team,\nAttached is a revised January CalJo...","IEP Team,\nAttached is a revised January CalJo..."
2,"Thu, 13 Dec 2001 16:29:38 -0800 (PST)",Staci Holtzman,"\n\tAt your earliest convenience, please send ...",0.0,comments on Staci's performance,"\n\tAt your earliest convenience, please send ...","\n\tAt your earliest convenience, please send ..."
3,"Tue, 30 Jan 2001 05:10:00 -0800 (PST)",Shaeffer redline,I ran a redline from the last version I had el...,0.0,Forwarded the current version to Herman for hi...,I ran a redline from the last version I had el...,I ran a redline from the last version I had el...
4,"Thu, 30 Dec 1999 05:46:00 -0800 (PST)",Plan update,---------------------- Forwarded by Daren J Fa...,0.0,"Forwarded, get copy of plan and review",---------------------- Forwarded by Daren J Fa...,---------------------- Forwarded by Daren J Fa...


# Pre-trained BERT

We make use of the BERT summarizer from the bert-extractive-summarizer package, a wrapper around a pre-trained BERT model that can be used for extractive summarisation without any further training. The tool uses the HuggingFace PyTorch transformers library to run extractive summarisations. It works by first embedding the sentences, then running a clustering algorithm on the embeddings, and finally selecting the sentences that are closest to the cluster centroids.
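
The embed, cluster and select idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 2-D vectors below are made-up stand-ins for real BERT sentence embeddings, and `pick_representatives` is a hypothetical helper with a naive k-means, not the package's own implementation.

```python
import math

def pick_representatives(embeddings, k, iterations=10):
    """Run a tiny k-means and return the index of the embedding nearest to each centroid."""
    # Naive initialisation (an assumption of this sketch): use the first k embeddings
    centroids = [list(e) for e in embeddings[:k]]
    for _ in range(iterations):
        # Assignment step: group every embedding with its nearest centroid
        clusters = [[] for _ in range(k)]
        for e in embeddings:
            nearest = min(range(k), key=lambda c: math.dist(e, centroids[c]))
            clusters[nearest].append(e)
        # Update step: move each centroid to the mean of its cluster
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    # Selection step: the sentence closest to each centroid enters the summary
    chosen = {
        min(range(len(embeddings)), key=lambda i: math.dist(embeddings[i], centroids[c]))
        for c in range(k)
    }
    return sorted(chosen)

# Made-up 2-D "sentence embeddings" covering two topics
toy_embeddings = [
    [0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [0.1, 0.2],
    [0.7, 0.9], [0.1, 0.1], [0.8, 0.8],
]
selected = pick_representatives(toy_embeddings, k=2)
print(selected)  # indices of the sentences that would form the summary
```

With two clusters, one sentence per topic is returned, which is why this kind of summariser tends to favour coverage of distinct themes over sentence position.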

In [6]:
# Setting up BERT model
bert_model = Summarizer()

# Defining function to apply to emails
def bert_summary(text):
    bert_text = ''.join(bert_model(text))
    return bert_text

# Applying function to emails
emails_labelled["Bert"] = emails_labelled["Bert"].apply(lambda x: bert_summary(x))

# Pre-trained GPT-2

As with BERT, we apply a pre-trained GPT-2 model to the emails. Within the TransformerSummarizer() wrapper, we simply set 'GPT2' as the transformer_type parameter.

In [7]:
# Setting up GPT-2 model
GPT2_model = TransformerSummarizer(transformer_type="GPT2",transformer_model_key="gpt2-medium")

# Defining function to apply to emails
def GPT2_summary(text):
    GPT2_text = ''.join(GPT2_model(text, min_length=50))
    return GPT2_text

# Applying function to emails
emails_labelled["GPT2"] = emails_labelled["GPT2"].apply(lambda x: GPT2_summary(x))

# Data Cleaning for Text Rank

With the pre-trained BERT and GPT-2 models, no major data cleaning or pre-processing was necessary. These steps are therefore performed now, to prepare the data for the next extractive summarisation model, Text Rank.

In [8]:
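# Defining function to remove lines containing a colon followed by whitespace (such as header fields),
# forwarded-message separators, quote markers and line breaks
def clean(email):
    return re.sub(r"^.*:\s.*|^-.*\n?.*-$|\n|^>", '', email, flags=re.MULTILINE)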

# Defining function to remove stop words
def removeStopwords(sentence):
    newSentence = " ".join([i for i in sentence if i not in stopWords])
    return newSentence

# Defining function to present each sentence separately
def prettySentences(sentence):
    for s in sentence:
        print(s)
        print()
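
To get a feel for what the cleaning pattern does, it can be run on a small made-up email. The snippet below repeats the pattern from the cell above so that it runs on its own; `toy_email` is invented for illustration and is not taken from the dataset.

```python
import re

def clean(email):
    # Same pattern as in the cell above, repeated here so the snippet is self-contained
    return re.sub(r"^.*:\s.*|^-.*\n?.*-$|\n|^>", "", email, flags=re.MULTILINE)

# Made-up email with a header line, a quoted reply and two body lines
toy_email = "To: someone@example.com\nHello team.\n> quoted reply\nSee you soon."
cleaned = clean(toy_email)
print(cleaned)  # Hello team. quoted replySee you soon.
```

The header line and the quote marker are dropped. Line breaks are removed without inserting a space, so adjacent lines are joined directly, which is worth keeping in mind when reading the extracted sentences later on.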

In [9]:
# Applying the cleaning function and splitting each email into sentences

# 1. Applying clean function
mydata = []
for email in emails_labelled.body:
    mydata.append(clean(email))

# 2. Splitting each cleaned email into sentences
sentences = []
for email in mydata:
    sentences.append(sent_tokenize(email))

In [10]:
# Using the 100-dimensional version of the GloVe embeddings
wordEmbeddings = {}
with open("/project/glove.6B.100d.txt", encoding='utf-8') as f:
    for line in f:
        values = line.split()
        key = values[0]
        wordEmbeddings[key] = np.asarray(values[1:], dtype='float32')

cleanSentences = []

# Sentence formatting: keep letters only, collapse whitespace, lowercase
for email in sentences:
    email = [re.sub(r"[^a-zA-Z]", " ", s) for s in email]
    email = [re.sub(r"\s+", " ", s) for s in email]
    cleanSentences.append([s.lower() for s in email])

# Removal of stop words
stopWords = stopwords.words('english')

for i in range(len(cleanSentences)):
    cleanSentences[i] = [removeStopwords(r.split()) for r in cleanSentences[i]]

# Text Rank

TextRank is an extractive and unsupervised text summarisation technique. It ranks sentences by importance, based on pairwise similarity scores that are stored in a square matrix.
Put differently, a vector representation (built from word embeddings) is established for each sentence. Similarities between sentence vectors are then calculated and stored in a matrix. The similarity matrix is then converted into a graph, with sentences as vertices and similarity scores as edge weights, on which the sentence ranks are calculated. Finally, a certain number of top-ranked sentences form the final summary.
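
The ranking step can be illustrated with a small power-iteration version of PageRank on a made-up similarity matrix. This is a minimal sketch, not the networkx implementation used in the cells below; the function name `pagerank`, the damping factor of 0.85 and the toy matrix are assumptions chosen for illustration.

```python
def pagerank(similarity, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on a square similarity matrix."""
    n = len(similarity)
    scores = [1.0 / n] * n
    # Total outgoing weight of every sentence (row sums)
    out_weight = [sum(row) for row in similarity]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Rank flowing into sentence i from every other sentence j,
            # in proportion to the share of j's similarity that points at i
            incoming = sum(
                similarity[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores

# Made-up similarity matrix for three sentences (zero diagonal, symmetric)
toy_similarity = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.3],
    [0.1, 0.3, 0.0],
]
scores = pagerank(toy_similarity)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # sentence indices from most to least central
```

Sentence 1 is the most similar to the other two and therefore ranks first, while sentence 2, which resembles neither of the others much, ranks last.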

In [11]:
# Creating vector representations
sentenceVectors = []
for email in cleanSentences:
    temp = []
    for s in email:
        if len(s) != 0:
            v = sum([ wordEmbeddings.get(w, np.zeros((100,))) for w in s.split()])/(len(s.split()) + 0.001)
        else:
            v = np.zeros((100,))
        temp.append(v)
    sentenceVectors.append(temp)

In [12]:
# Calculating similarity scores and storing them in a matrix
similarityMatrix = []
for i in range(len(cleanSentences)):
    email = cleanSentences[i]
    temp = np.zeros((len(email), len(email)))
    j_range = temp.shape[0]
    k_range = temp.shape[1]
    for j in range(j_range):
        for k in range(k_range):
            if j != k:
                temp[j][k] = cosine_similarity(sentenceVectors[i][j].reshape(1, 100),
                                                             sentenceVectors[i][k].reshape(1, 100))[0][0]
    similarityMatrix.append(temp)

In [13]:
# Similarity matrix converted into a graph, with sentences as vertices and similarity scores as edge weights
# (nx.pagerank is used because nx.pagerank_numpy was removed in networkx 3.0)
scores = []
for i in similarityMatrix:
    nxGraph = nx.from_numpy_array(i)
    scores.append(nx.pagerank(nxGraph))

In [14]:
# Ranking top sentences
rankedSentences = []

for i in range(len(scores)):
    rankedSentences.append(sorted(((scores[i][j],s) for j,s in enumerate(sentences[i])), reverse=True))

In [15]:
# From the ranked sentences, we keep at most the 2 top-ranked sentences of each email
textrank_summarised = []
for ranked in rankedSentences:
    sentence_1 = ranked[0]
    if len(ranked) > 1:
        sentence_2 = ranked[1]
    else:
        # Placeholder (score, sentence) pair for emails with a single sentence
        sentence_2 = ("0", "0")
    textrank_summarised.append([sentence_1, sentence_2])

textrank_table = pd.DataFrame(textrank_summarised)

# Rouge Score

ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation and is a set of metrics for evaluating automatic text summarisation as well as machine translation. We use it to compare the summaries automatically produced by BERT, GPT-2 and TextRank against our set of manually produced reference summaries. Simply put, recall (in the context of ROUGE) refers to how much of the reference summary the system summary recovers or captures (freeCodeCamp, 2017). <br>

Due to the short nature of our emails, we make use of ROUGE-1, which refers to the overlap of unigrams (single words) between the system summary and the reference summary. <br>

In the context of ROUGE, precision measures how much of the system summary was in fact relevant or needed, while recall refers to how much of the reference summary the system summary recovers or captures. Note that scorer.score(target, prediction) treats its first argument as the reference. The cells below pass the model summary first and the manual summary second, so the column labelled precision is the share of the manual summary's words found in the model summary, and the column labelled recall is the share of the model summary's words found in the manual summary.
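
As a worked example of these definitions, ROUGE-1 can be computed by hand from unigram counts. This is a simplified sketch: it splits on whitespace and does no stemming, unlike the rouge_score package used below, and the function `rouge1` and both example texts are made up for illustration.

```python
from collections import Counter

def rouge1(reference, candidate):
    """Unigram precision, recall and F-measure of a candidate against a reference."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    fmeasure = 2 * precision * recall / (precision + recall)
    return precision, recall, fmeasure

reference = "please review the revised ad"                 # 5 words
candidate = "attached is the revised ad for your review"   # 8 words, 4 shared
precision, recall, fmeasure = rouge1(reference, candidate)
print(precision, recall, round(fmeasure, 3))
```

Four of the eight candidate words appear in the reference (precision 0.5) and four of the five reference words are recovered (recall 0.8), giving an F-measure of roughly 0.615.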

In [16]:
# Setting up ROUGE scorer
scorer = rouge_scorer.RougeScorer(['rouge1'], use_stemmer=True)

# Defining helper function that returns the ROUGE-1 score of two texts
def rouge_score(text1, text2):
    scores = scorer.score(text1, text2)
    return scores.get("rouge1")

In [17]:
# Applying ROUGE score to BERT
bert_score_list = []
for x in range(0,len(emails_labelled)):
    bert_scores = scorer.score(emails_labelled["Bert"][x],emails_labelled["summary_label"][x])
    rouge_s = bert_scores.get("rouge1")
    bert_score_list.append(rouge_s)
    
    
# create a data frame with all the rouge score generated 
bert_score_list = pd.DataFrame(bert_score_list)
bert_score_list = bert_score_list.rename(columns = {"precision":"bert precision", "recall":"bert recall", "fmeasure":"bert fmeasure"})

bert_score_list = bert_score_list.join(emails_labelled["body"])
bert_score_list = bert_score_list.join(emails_labelled["Bert"])

bert_score_list = bert_score_list[["body","Bert","bert precision","bert recall","bert fmeasure"]]
bert_score_list.head(10)

Unnamed: 0,body,Bert,bert precision,bert recall,bert fmeasure
0,The Global Gas Pipeline group is looking to tr...,The Global Gas Pipeline group is looking to tr...,0.583333,0.482759,0.528302
1,"IEP Team,\nAttached is a revised January CalJo...","IEP Team,\nAttached is a revised January CalJo...",1.0,0.206897,0.342857
2,"\n\tAt your earliest convenience, please send ...","At your earliest convenience, please send me c...",1.0,0.333333,0.5
3,I ran a redline from the last version I had el...,I ran a redline from the last version I had el...,1.0,0.458333,0.628571
4,---------------------- Forwarded by Daren J Fa...,---------------------- Forwarded by Daren J Fa...,1.0,0.12069,0.215385
5,"Hey John & Angie, Have you heard the news from...","Hey John & Angie, Have you heard the news from...",0.0,0.0,0.0
6,You have received this e-mail from South Texas...,You have received this e-mail from South Texas...,0.0,0.0,0.0
7,Start Date: 4/22/01; HourAhead hour: 11; No a...,Start Date: 4/22/01; HourAhead hour: 11; No a...,0.0,0.0,0.0
8,Jeff - per your instructions here's the list o...,Jeff - per your instructions here's the list o...,1.0,0.075,0.139535
9,----- Forwarded by Gayla E Seiter/ENRON_DEVELO...,----- Forwarded by Gayla E Seiter/ENRON_DEVELO...,1.0,0.04878,0.093023


In [18]:
# count the number of BERT summaries with an f-measure above 0.6
bert_count = bert_score_list[bert_score_list["bert fmeasure"]>0.6].count()
bert_accept = bert_count["bert fmeasure"]
bert_reject = len(emails_labelled) - bert_count["bert fmeasure"]

In [19]:
# Applying ROUGE score to GPT-2
GPT2_score_list = []
for x in range(0,len(emails_labelled)):
    GPT_scores = scorer.score(emails_labelled["GPT2"][x],emails_labelled["summary_label"][x])
    rouge_s = GPT_scores.get("rouge1")
    GPT2_score_list.append(rouge_s)
    
# create a data frame with all the rouge score generated     
GPT2_score_list = pd.DataFrame(GPT2_score_list)
GPT2_score_list = GPT2_score_list.rename(columns = {"precision":"GPT2 precision", "recall":"GPT2 recall", "fmeasure":"GPT2 fmeasure"})

GPT2_score_list = GPT2_score_list.join(emails_labelled["body"])
GPT2_score_list = GPT2_score_list.join(emails_labelled["GPT2"])
GPT2_score_list = GPT2_score_list[["body","GPT2","GPT2 precision","GPT2 recall","GPT2 fmeasure"]]
GPT2_score_list.head(10)

Unnamed: 0,body,GPT2,GPT2 precision,GPT2 recall,GPT2 fmeasure
0,The Global Gas Pipeline group is looking to tr...,The Global Gas Pipeline group is looking to tr...,0.583333,0.5,0.538462
1,"IEP Team,\nAttached is a revised January CalJo...","IEP Team,\nAttached is a revised January CalJo...",1.0,0.214286,0.352941
2,"\n\tAt your earliest convenience, please send ...","At your earliest convenience, please send me c...",1.0,0.333333,0.5
3,I ran a redline from the last version I had el...,I ran a redline from the last version I had el...,1.0,0.458333,0.628571
4,---------------------- Forwarded by Daren J Fa...,---------------------- Forwarded by Daren J Fa...,1.0,0.12069,0.215385
5,"Hey John & Angie, Have you heard the news from...","Hey John & Angie, Have you heard the news from...",0.0,0.0,0.0
6,You have received this e-mail from South Texas...,You have received this e-mail from South Texas...,0.0,0.0,0.0
7,Start Date: 4/22/01; HourAhead hour: 11; No a...,Start Date: 4/22/01; HourAhead hour: 11; No a...,0.0,0.0,0.0
8,Jeff - per your instructions here's the list o...,Jeff - per your instructions here's the list o...,1.0,0.075,0.139535
9,----- Forwarded by Gayla E Seiter/ENRON_DEVELO...,----- Forwarded by Gayla E Seiter/ENRON_DEVELO...,1.0,0.04878,0.093023


In [20]:
# count the number of GPT-2 summaries with an f-measure above 0.6
GPT2_count = GPT2_score_list[GPT2_score_list["GPT2 fmeasure"]>0.6].count()
GPT2_accept = GPT2_count["GPT2 fmeasure"]
GPT2_reject = len(emails_labelled) - GPT2_count["GPT2 fmeasure"]

In [21]:
# Applying ROUGE score to TextRank
# Each row holds two (score, sentence) pairs, so we concatenate the two extracted sentences into a single summary column
textrank_table["textrank_summary"] = [
    textrank_table[0][i][1] + textrank_table[1][i][1]
    for i in range(len(textrank_table))
]
    
#text rank rouge score 
textrank_score_list = []
for x in range(0,len(emails_labelled)):
    textrank_scores = scorer.score(textrank_table["textrank_summary"][x],emails_labelled["summary_label"][x])
    rouge_s = textrank_scores.get("rouge1")
    textrank_score_list.append(rouge_s)
    
textrank_score_list = pd.DataFrame(textrank_score_list)
textrank_score_list = textrank_score_list.rename(columns = {"precision":"textrank precision", "recall":"textrank recall", "fmeasure":"textrank fmeasure"})
textrank_score_list = textrank_score_list.join(emails_labelled["body"])
textrank_score_list = textrank_score_list.join(textrank_table["textrank_summary"])
textrank_score_list = textrank_score_list[["body","textrank_summary","textrank precision","textrank recall","textrank fmeasure"]]
textrank_score_list.head(10)

Unnamed: 0,body,textrank_summary,textrank precision,textrank recall,textrank fmeasure
0,The Global Gas Pipeline group is looking to tr...,The Initial products are expected to be day ah...,0.583333,0.341463,0.430769
1,"IEP Team,\nAttached is a revised January CalJo...","However, since the deadline has not changed (a...",0.166667,0.033333,0.055556
2,"\n\tAt your earliest convenience, please send ...","\tAt your earliest convenience, please send me...",1.0,0.3125,0.47619
3,I ran a redline from the last version I had el...,I've forwarded the current version to Herman f...,1.0,0.407407,0.578947
4,---------------------- Forwarded by Daren J Fa...,Thanks............Mary Solmonson12/15/99 06:55...,0.857143,0.133333,0.230769
5,"Hey John & Angie, Have you heard the news from...",We're pretty excited about the whole thing.Ker...,0.75,0.032967,0.063158
6,You have received this e-mail from South Texas...,"Friday, January 26, 2001\t\t\t7:30\t\t\tRegist...",0.111111,0.00339,0.006579
7,Start Date: 4/22/01; HourAhead hour: 11; No a...,Variances detected.Variances detected in Energ...,0.0,0.0,0.0
8,Jeff - per your instructions here's the list o...,AirTrans AirwaysCeladon TruckingChitaqua Airli...,0.0,0.0,0.0
9,----- Forwarded by Gayla E Seiter/ENRON_DEVELO...,\tGayla E Seiter\t11/15/2000 08:21 AM\t\t SEIT...,0.0,0.0,0.0


In [22]:
# count the number of TextRank summaries with an f-measure above 0.6
textrank_count = textrank_score_list[textrank_score_list["textrank fmeasure"]>0.6].count()
textrank_accept = textrank_count["textrank fmeasure"]
textrank_reject = len(emails_labelled) - textrank_count["textrank fmeasure"]

# Extractive Summarisation Overview and Limitation

In [23]:
print("Average BERT precision: {:.2f}%".format((bert_score_list["bert precision"].mean())*100))
print("Average BERT recall: {:.2f}%".format((bert_score_list["bert recall"].mean())*100))
print("Average BERT f-measure: {:.2f}%".format((bert_score_list["bert fmeasure"].mean())*100))
print("\n")
print("Average GPT-2 precision: {:.2f}%".format((GPT2_score_list["GPT2 precision"].mean())*100))
print("Average GPT-2 recall: {:.2f}%".format((GPT2_score_list["GPT2 recall"].mean())*100))
print("Average GPT-2 f-measure: {:.2f}%".format((GPT2_score_list["GPT2 fmeasure"].mean())*100))
print("\n")
print("Average TextRank precision: {:.2f}%".format((textrank_score_list["textrank precision"].mean())*100))
print("Average TextRank recall: {:.2f}%".format((textrank_score_list["textrank recall"].mean())*100))
print("Average TextRank f-measure: {:.2f}%".format((textrank_score_list["textrank fmeasure"].mean())*100))

Average BERT precision: 72.48%
Average BERT recall: 26.11%
Average BERT f-measure: 32.77%


Average GPT-2 precision: 71.50%
Average GPT-2 recall: 26.69%
Average GPT-2 f-measure: 33.31%


Average TextRank precision: 64.81%
Average TextRank recall: 20.84%
Average TextRank f-measure: 27.77%


As we can see from the outputs above, the reported precision is fairly high for all models, in particular BERT and GPT-2. Looking at recall and the overall f-measure, however, we see that the models' summaries overlap only partially with the manual summaries. Because pre-trained wrappers such as bert-extractive-summarizer offer little room for fine-tuning, we recognise the limitations of the pre-trained extractive models. However, a model's lack of overlap with the manually generated summaries does not necessarily imply that its summaries are of inferior quality.

In [24]:
summary_table = pd.DataFrame()
bert_list = [bert_accept, bert_reject]
GPT2_list = [GPT2_accept, GPT2_reject]
textrank_list = [textrank_accept, textrank_reject]

In [25]:
summary_table["Bert"] = ""
summary_table["GPT2"] = ""
summary_table["Textrank"] = ""

In [26]:
summary_table["Bert"] = bert_list
summary_table["GPT2"] = GPT2_list
summary_table["Textrank"] = textrank_list

In [27]:
summary_table.rename(index = {0:"Overlap > 60%", 1:"Overlap < 60%"})

Unnamed: 0,Bert,GPT2,Textrank
Overlap > 60%,15,18,12
Overlap < 60%,80,77,83


We can see that GPT-2 does the best job of summarising in terms of overlap with the manually generated summaries. <br>
Looking at overlap scores alone, however, fails to take the length of the summaries into account: for example, a summary that simply returned the entire email body would contain every word of the manual summary and thus achieve a score of 1 on the measure reported as precision above.
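
The effect of summary length can be made concrete with a toy example that grows a summary one sentence at a time. The helper `overlap_shares`, the reference and the three sentences are all made up for illustration; the two shares are deliberately named in plain words to avoid any ambiguity about which one is called precision or recall.

```python
def overlap_shares(reference, summary):
    """Return (share of reference words found in the summary,
    share of summary words found in the reference)."""
    ref_tokens = reference.lower().split()
    sum_tokens = summary.lower().split()
    common = sum(min(ref_tokens.count(w), sum_tokens.count(w)) for w in set(ref_tokens))
    return common / len(ref_tokens), common / len(sum_tokens)

reference = "send comments on performance"
email_sentences = [
    "please send comments on performance",
    "the review meeting is next week",
    "thanks a lot",
]

results = []
for n in range(1, len(email_sentences) + 1):
    # Summary made of the first n sentences of the made-up email
    summary = " ".join(email_sentences[:n])
    results.append(overlap_shares(reference, summary))
    print(n, results[-1])
```

The share of the reference that is covered stays at 1.0 however many sentences are added, while the share of the summary that is actually relevant falls from 0.8 with one sentence to below 0.3 with all three.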

Hence, we next compare the original body of the first email with each summary generated by the respective model.

In [28]:
print("Original Sentence: {}".format(emails_labelled["body"][0]))
print("\n")
print("Manual Summary: {}".format(emails_labelled["summary_label"][0]))
print("\n")
print("BERT Summary: {}".format(emails_labelled["Bert"][0]))
print("\n")
print("GPT2 Summary: {}".format(emails_labelled["GPT2"][0]))
print("\n")
print("Textrank Summary: {}".format(textrank_table["textrank_summary"][0]))

Original Sentence: The Global Gas Pipeline group is looking to trade gas transportation on 
EnronOnline.  The Initial products are expected to be day ahead and rest of 
the month transportation. Houston PipeLine and Northern Natural Gas Pipeline 
will be the initial pipelines posting bids and offers. The expected date of 
launch  is 8/23/00. We do not have the GTC or Product descriptions yet.  I 
will forward those to you'll as soon as I have them. In the meantime if 
you'll could please let me know if there are any legal, credit , tax, risk, 
and other concerns we should be addressing. Thanks a lot.

Savita


Manual Summary: Global Gas Pipeline group is looking to trade gas transportation on 
EnronOnline, any legal, credit , tax, risk, 
and other concerns we should be addressing


BERT Summary: The Global Gas Pipeline group is looking to trade gas transportation on 
EnronOnline. Houston PipeLine and Northern Natural Gas Pipeline 
will be the initial pipelines posting bids and offers.


Upon inspection of the example above, we recognise that despite the rather low ROUGE scores, the extracted summaries seem adequate in terms of context and length. While BERT and GPT-2 do a good job of giving context, TextRank captures the key action point of the message by specifying that input on legal, credit, tax, risk and other concerns is required.

Looking at the performance of the pre-trained models and the TextRank model, the ROUGE scores highlight some of the limitations of extractive summarisation. In general, all models were able to extract information relevant to the manual summary (high reported precision). However, the majority of the extracted summaries are much longer than the manually written ones. Longer summaries score better on reported precision but lower on reported recall. Although this is a precision-recall trade-off, we believe that increasing the recall score should be the focus of future modelling. In addition, shorter summaries would be more efficient from Enron's perspective, as they speed up communication.

Another limitation of the pre-trained models is that they do not recognise the low relevance of short sentences such as "Thanks a lot" from the example above. Including such sentences not only increases the length of the summary but also reduces its accuracy. To address this problem, we believe an abstractive approach is worth exploring, as it would allow us to rephrase and restructure the emails' content.