### Assignment: Evaluate OpenLLMs for Text Summarization
#### Objective:
Compare at least three OpenLLMs on a text summarization task and recommend the best model based on performance metrics.

Steps:
* Task Definition: Define the text summarization context (e.g., news articles).
* Dataset: Use a dataset with articles and their summaries.
* Model Selection: Pick at least three OpenLLMs.
* Implementation: Implement text summarization for each model.
* Metrics: Use ROUGE scores for performance evaluation.
* Evaluation: Apply metrics to each model.
* Analysis: Summarize performance in a table.
* Recommendation: Pick the best model and justify your choice.

Deliverables:
* Python scripts for each OpenLLM implementation.
* Table summarizing performance metrics.
* Final recommendation report.
* Evaluation Criteria:
    * Efficiency and quality of code
    * Accuracy of analysis
    * Proper use of metrics
* Clarity in the final report

In [30]:
# !pip install rouge
# !pip install sentencepiece
# !pip install summarizer
# !pip install transformers spacy
# !python -m spacy download en_core_web_sm


Collecting spacy
  Obtaining dependency information for spacy from https://files.pythonhosted.org/packages/ca/f3/609bb7512cad1f02af13daa23aa433b931da34c502211f29fd47dceff624/spacy-3.7.2-cp311-cp311-macosx_11_0_arm64.whl.metadata
  Downloading spacy-3.7.2-cp311-cp311-macosx_11_0_arm64.whl.metadata (25 kB)
Collecting spacy-legacy<3.1.0,>=3.0.11 (from spacy)
  Downloading spacy_legacy-3.0.12-py2.py3-none-any.whl (29 kB)
Collecting spacy-loggers<2.0.0,>=1.0.0 (from spacy)
  Obtaining dependency information for spacy-loggers<2.0.0,>=1.0.0 from https://files.pythonhosted.org/packages/33/78/d1a1a026ef3af911159398c939b1509d5c36fe524c7b644f34a5146c4e16/spacy_loggers-1.0.5-py3-none-any.whl.metadata
  Downloading spacy_loggers-1.0.5-py3-none-any.whl.metadata (23 kB)
Collecting murmurhash<1.1.0,>=0.28.0 (from spacy)
  Obtaining dependency information for murmurhash<1.1.0,>=0.28.0 from https://files.pythonhosted.org/packages/7a/05/4a3b5c3043c6d84c00bf0f574d326660702b1c10174fe6b44cef3c3dff08/murmurh

In [10]:
import numpy as np
import pandas as pd 
import os 

from rouge import Rouge

In [11]:
dir = './BBC_News_Summary/News_Articles/'
for i in os.listdir(dir):
    sub_dir = dir + i + '/'
    print(f"{i.capitalize()} contains {len(sub_dir)} news artricles.")

Entertainment contains 47 news artricles.
Business contains 42 news artricles.
Sport contains 39 news artricles.
Politics contains 42 news artricles.
Tech contains 38 news artricles.


In [12]:
dir = './BBC_News_Summary/Summaries/'
for i in os.listdir(dir):
    sub_dir = dir + i + '/'
    print(f"{i.capitalize()} contains {len(sub_dir)} summaries.")

Entertainment contains 43 summaries.
Business contains 38 summaries.
Sport contains 35 summaries.
Politics contains 38 summaries.
Tech contains 34 summaries.


In [13]:
import torch 
from transformers import GPT2Tokenizer, GPT2LMHeadModel

model_name = 'gpt2-medium'
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model.eval()

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 1024)
    (wpe): Embedding(1024, 1024)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-23): 24 x GPT2Block(
        (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=1024, out_features=50257, bias=False)
)

In [14]:
dir = './BBC_News_Summary/Summaries/tech/001.txt'
with open(dir, "r") as f:
    ok = f.readlines()
    cleaned_text_list = [s.replace('\n', '').strip() for s in ok if s.strip() != '']
    cleaned_news_articles = ' '.join(cleaned_text_list)
cleaned_news_articles

'The other common type of ink in elections is indelible visible ink - but as the elections in Afghanistan showed, improper use of this type of ink can cause additional problems.The use of ink and readers by itself is not a panacea for election ills.The use of "invisible" ink is not without its own problems.The use of ink is only one part of a general effort to show commitment towards more open elections - the German Embassy, the Soros Foundation and the Kyrgyz government have all contributed to purchase transparent ballot boxes.The author of one such article began a petition drive against the use of the ink.The use of ink has been controversial - especially among groups perceived to be pro-government.In an effort to live up to its reputation in the 1990s as "an island of democracy", the Kyrgyz President, Askar Akaev, pushed through the law requiring the use of ink during the upcoming Parliamentary and Presidential elections.At the entrance to each polling station, one election official

## 1. GPT-2 Implementation
GPT-2 can be fine-tuned for various tasks, including summarization. However, for the sake of simplicity and demonstration, we'll use a prompt-based approach to generate a summary.

In [15]:
def summarize_gpt2(text):
    input_ids = tokenizer.encode("Summarize: " + text, return_tensors="pt")
    summary_ids = model.generate(input_ids, max_length = 100, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id, no_repeat_ngram_size=2, early_stopping=True)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

# Generate summary
article = cleaned_news_articles
gpt2_summary = summarize_gpt2(article)
print(gpt2_summary)

Input length of input_ids is 350, but `max_length` is set to 100. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.


Summarize: The other common type of ink in elections is indelible visible ink - but as the elections in Afghanistan showed, improper use of this type of ink can cause additional problems.The use of ink and readers by itself is not a panacea for election ills.The use of "invisible" ink is not without its own problems.The use of ink is only one part of a general effort to show commitment towards more open elections - the German Embassy, the Soros Foundation and the Kyrgyz government have all contributed to purchase transparent ballot boxes.The author of one such article began a petition drive against the use of the ink.The use of ink has been controversial - especially among groups perceived to be pro-government.In an effort to live up to its reputation in the 1990s as "an island of democracy", the Kyrgyz President, Askar Akaev, pushed through the law requiring the use of ink during the upcoming Parliamentary and Presidential elections.At the entrance to each polling station, one electio

In [16]:
gpt2_summary = gpt2_summary.replace('Summarize: ', "")
gpt2_summary

'The other common type of ink in elections is indelible visible ink - but as the elections in Afghanistan showed, improper use of this type of ink can cause additional problems.The use of ink and readers by itself is not a panacea for election ills.The use of "invisible" ink is not without its own problems.The use of ink is only one part of a general effort to show commitment towards more open elections - the German Embassy, the Soros Foundation and the Kyrgyz government have all contributed to purchase transparent ballot boxes.The author of one such article began a petition drive against the use of the ink.The use of ink has been controversial - especially among groups perceived to be pro-government.In an effort to live up to its reputation in the 1990s as "an island of democracy", the Kyrgyz President, Askar Akaev, pushed through the law requiring the use of ink during the upcoming Parliamentary and Presidential elections.At the entrance to each polling station, one election official

In [17]:
from rouge import Rouge

def calculate_rouge_scores(reference, generated):
    rouge = Rouge()
    scores = rouge.get_scores(generated, reference, avg=True)
    return scores

# Assuming you have generated summaries using the provided code for GPT-2, BERT, and T5
reference_summary = cleaned_news_articles
gpt2_summary = gpt2_summary
# bert_summary = "YOUR BERT GENERATED SUMMARY HERE"
# t5_summary = "YOUR T5 GENERATED SUMMARY HERE"

gpt2_scores = calculate_rouge_scores(reference_summary, gpt2_summary)
gpt2_scores
# bert_scores = calculate_rouge_scores(reference_summary, bert_summary)
# t5_scores = calculate_rouge_scores(reference_summary, t5_summary)

{'rouge-1': {'r': 1.0, 'p': 1.0, 'f': 0.999999995},
 'rouge-2': {'r': 1.0, 'p': 0.99609375, 'f': 0.9980430478375926},
 'rouge-l': {'r': 1.0, 'p': 1.0, 'f': 0.999999995}}

In [18]:
dir = './BBC_News_Summary/News_Articles/politics/'
sub_dir = os.listdir(dir)
sub_dir.remove('.DS_Store')
sub_dir.sort()

gpt2_summaries = []

for i in sub_dir:
    with open(dir + i, "r") as f:
        Lines = f.readlines()
        cleaned_text_list = [s.replace('\n', '').strip() for s in Lines if s.strip() != '']
        cleaned_news_articles = ' '.join(cleaned_text_list)
        
    gpt2_summary = summarize_gpt2(cleaned_news_articles)
    gpt2_summaries.append(gpt2_summary)

gpt2_summaries

Input length of input_ids is 535, but `max_length` is set to 100. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
Input length of input_ids is 454, but `max_length` is set to 100. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
Input length of input_ids is 639, but `max_length` is set to 100. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
Input length of input_ids is 297, but `max_length` is set to 100. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
Input length of input_ids is 591, but `max_length` is set to 100. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
Input length of input_ids is 403, but `max_length` is set to 100. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
Input length of input_ids is 633, but `max_length` is set to 100. This can lead to

['Summarize: Labour plans maternity pay rise Maternity pay for new mothers is to rise by £1,400 as part of new proposals announced by the Trade and Industry Secretary Patricia Hewitt. It would mean paid leave would be increased to nine months by 2007, Ms Hewitt told GMTV\'s Sunday programme. Other plans include letting maternity pay be given to fathers and extending rights to parents of older children. The Tories dismissed the maternity pay plan as "desperate", while the Liberal Democrats said it was misdirected. Ms Hewitt said: "We have already doubled the length of maternity pay, it was 13 weeks when we were elected, we have already taken it up to 26 weeks. "We are going to extend the pay to nine months by 2007 and the aim is to get it right up to the full 12 months by the end of the next Parliament." She said new mothers were already entitled to 12 months leave, but that many women could not take it as only six of those months were paid. "We have made a firm commitment. We will defi

In [19]:
len(gpt2_summaries)

50

In [20]:
summary_dir = './BBC_News_Summary/Summaries/politics/'
summary_sub_dir = os.listdir(summary_dir)
summary_sub_dir.remove('.DS_Store')
summary_sub_dir.sort()

In [21]:
def calculate_rouge_scores(reference, generated):
    rouge = Rouge()
    scores = rouge.get_scores(generated, reference, avg=True)
    return scores

In [22]:
# Directories for reference and generated summaries
reference_dir = gpt2_summaries
generated_dir = './BBC_News_Summary/Summaries/politics/'

results = {}

# Iterate over each file and compute ROUGE scores
for i, filename in enumerate(summary_sub_dir):
    generated_summary = gpt2_summaries[i]
    generated_summary.replace('Summarize: ', "")
    filename_sub_dir = summary_dir + filename
    with open(filename_sub_dir, 'r') as f:
        ok = f.readlines()
        cleaned_text_list = [s.replace('\n', '').strip() for s in ok if s.strip() != '']
        reference_summary = ' '.join(cleaned_text_list)
        reference_summary = reference_summary
    
    # Calculate ROUGE scores
    scores = calculate_rouge_scores(reference_summary, generated_summary)
    results[filename] = scores

# Print the results
for filename, score in results.items():
    print(f"File: {filename}")
    print(f"ROUGE-1: {score['rouge-1']['f']:.2%}")
    print(f"ROUGE-2: {score['rouge-2']['f']:.2%}")
    print(f"ROUGE-L: {score['rouge-l']['f']:.2%}")
    print("-----------------------")

File: 001.txt
ROUGE-1: 67.43%
ROUGE-2: 59.63%
ROUGE-L: 67.43%
-----------------------
File: 002.txt
ROUGE-1: 65.45%
ROUGE-2: 61.18%
ROUGE-L: 65.45%
-----------------------
File: 003.txt
ROUGE-1: 69.53%
ROUGE-2: 61.13%
ROUGE-L: 69.53%
-----------------------
File: 004.txt
ROUGE-1: 67.24%
ROUGE-2: 59.00%
ROUGE-L: 67.24%
-----------------------
File: 005.txt
ROUGE-1: 76.86%
ROUGE-2: 70.52%
ROUGE-L: 76.86%
-----------------------
File: 006.txt
ROUGE-1: 61.05%
ROUGE-2: 53.60%
ROUGE-L: 61.05%
-----------------------
File: 007.txt
ROUGE-1: 67.29%
ROUGE-2: 59.77%
ROUGE-L: 67.29%
-----------------------
File: 008.txt
ROUGE-1: 69.49%
ROUGE-2: 63.97%
ROUGE-L: 69.49%
-----------------------
File: 009.txt
ROUGE-1: 66.53%
ROUGE-2: 60.53%
ROUGE-L: 66.53%
-----------------------
File: 010.txt
ROUGE-1: 60.24%
ROUGE-2: 55.12%
ROUGE-L: 60.24%
-----------------------
File: 011.txt
ROUGE-1: 72.73%
ROUGE-2: 66.17%
ROUGE-L: 72.73%
-----------------------
File: 012.txt
ROUGE-1: 66.49%
ROUGE-2: 58.60%
ROUGE-L:

## T5 Implementation
T5 (Text-to-Text Transfer Transformer) is a versatile model that can be used for various tasks, including summarization, just by changing the prompt.

In [23]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load pre-trained T5 model and tokenizer
model_name = "t5-small"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)
model.eval()

# Summarization function using T5
def summarize_t5(text, max_length=100):
    input_ids = tokenizer("summarize: " + text, return_tensors="pt").input_ids
    summary_ids = model.generate(input_ids, max_length=max_length, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id, early_stopping=True)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

In [24]:
dir = './BBC_News_Summary/News_Articles/politics/'
sub_dir = os.listdir(dir)
sub_dir.remove('.DS_Store')
sub_dir.sort()

T_5_summaries = []

for i in sub_dir:
    with open(dir + i, "r") as f:
        Lines = f.readlines()
        cleaned_text_list = [s.replace('\n', '').strip() for s in Lines if s.strip() != '']
        cleaned_news_articles = ' '.join(cleaned_text_list)
        
    T_5_summary = summarize_t5(cleaned_news_articles)
    T_5_summaries.append(T_5_summary)

T_5_summaries

Token indices sequence length is longer than the specified maximum sequence length for this model (574 > 512). Running this sequence through the model will result in indexing errors


['new mothers are already entitled to 12 months leave, but many women cannot take it. the tories dismissed the maternity pay plan as "desperate" the tories said it was misdirected.',
 'the information commissioner is asking for details of cabinet office orders. he says he is urgently asking for details of orders telling staff to delete e-mails. the tories and the Lib Dems have questioned the timing of the new rules.',
 'minister says sexism at work is preventing women reaching full potential. sexism at work is vital to closing gender pay gap. women in full-time work earn 19% less than men, according to the EOC.',
 'the labour party will hold its 2006 autumn conference in Manchester. it will be the first time since 1917 that the party has chosen Manchester. the party will get the much smaller spring conference instead in what will be seen as a placatory move.',
 'ally says there will be no spending spree before polling day. but he says he is confident the chancellor will meet his fiscal

In [25]:
# Directories for reference and generated summaries
reference_dir = T_5_summaries
generated_dir = './BBC_News_Summary/Summaries/politics/'

results = {}

# Iterate over each file and compute ROUGE scores
for i, filename in enumerate(summary_sub_dir):
    generated_summary = T_5_summaries[i]
    filename_sub_dir = summary_dir + filename
    with open(filename_sub_dir, 'r') as f:
        ok = f.readlines()
        cleaned_text_list = [s.replace('\n', '').strip() for s in ok if s.strip() != '']
        reference_summary = ' '.join(cleaned_text_list)
        reference_summary = reference_summary
    
    # Calculate ROUGE scores
    scores = calculate_rouge_scores(reference_summary, generated_summary)
    results[filename] = scores

# Print the results
for filename, score in results.items():
    print(f"File: {filename}")
    print(f"ROUGE-1: {score['rouge-1']['f']:.2%}")
    print(f"ROUGE-2: {score['rouge-2']['f']:.2%}")
    print(f"ROUGE-L: {score['rouge-l']['f']:.2%}")
    print("-----------------------")

File: 001.txt
ROUGE-1: 30.34%
ROUGE-2: 16.04%
ROUGE-L: 30.34%
-----------------------
File: 002.txt
ROUGE-1: 14.60%
ROUGE-2: 6.06%
ROUGE-L: 13.14%
-----------------------
File: 003.txt
ROUGE-1: 27.62%
ROUGE-2: 15.75%
ROUGE-L: 27.62%
-----------------------
File: 004.txt
ROUGE-1: 42.20%
ROUGE-2: 31.72%
ROUGE-L: 42.20%
-----------------------
File: 005.txt
ROUGE-1: 19.32%
ROUGE-2: 8.64%
ROUGE-L: 19.32%
-----------------------
File: 006.txt
ROUGE-1: 41.67%
ROUGE-2: 28.05%
ROUGE-L: 41.67%
-----------------------
File: 007.txt
ROUGE-1: 19.54%
ROUGE-2: 4.90%
ROUGE-L: 19.54%
-----------------------
File: 008.txt
ROUGE-1: 21.00%
ROUGE-2: 9.56%
ROUGE-L: 21.00%
-----------------------
File: 009.txt
ROUGE-1: 23.04%
ROUGE-2: 14.29%
ROUGE-L: 23.04%
-----------------------
File: 010.txt
ROUGE-1: 24.39%
ROUGE-2: 1.83%
ROUGE-L: 24.39%
-----------------------
File: 011.txt
ROUGE-1: 38.85%
ROUGE-2: 24.04%
ROUGE-L: 38.85%
-----------------------
File: 012.txt
ROUGE-1: 32.68%
ROUGE-2: 21.10%
ROUGE-L: 32.6

## BART for Summarization
BART is particularly interesting because, during pre-training, it learns to reconstruct sentences by randomly masking out chunks of text. This makes it a strong candidate for summarization tasks.

In [38]:
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

def summarize_with_bart(text):
    inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=1024, truncation=True)
    summary_ids = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

Downloading (…)olve/main/vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 1.76MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 2.27MB/s]
Downloading (…)lve/main/config.json: 100%|██████████| 1.58k/1.58k [00:00<00:00, 13.7MB/s]
Downloading pytorch_model.bin: 100%|██████████| 1.63G/1.63G [12:30<00:00, 2.17MB/s]
Downloading (…)neration_config.json: 100%|██████████| 363/363 [00:00<00:00, 213kB/s]


summarize: Your long text goes here. Your long message goes here... Your long sentence goes here... Your long letter goes here.... Your long word goes here...... Your long words go here...


In [40]:
dir = './BBC_News_Summary/News_Articles/politics/'
sub_dir = os.listdir(dir)
sub_dir.remove('.DS_Store')
sub_dir.sort()

Bart_summaries = []

for i in sub_dir:
    with open(dir + i, "r") as f:
        Lines = f.readlines()
        cleaned_text_list = [s.replace('\n', '').strip() for s in Lines if s.strip() != '']
        cleaned_news_articles = ' '.join(cleaned_text_list)
        
    Bart_summary = summarize_with_bart(cleaned_news_articles)
    Bart_summaries.append(Bart_summary)

Bart_summaries

['Maternity pay for new mothers to rise by £1,400 as part of new proposals. Paid leave would be increased to nine months by 2007, Trade and Industry Secretary Patricia Hewitt says. Other plans include letting maternity pay be given to fathers and extending rights to parents of older children. Tories dismiss the maternity pay plan as "desperate", while Liberal Democrats say it is misdirected.',
 'Richard Thomas says e-mails should only be deleted if they serve "no current purpose" Tory leader Michael Howard has written to Tony Blair demanding an explanation of the new rules on e-mail retention. Lib Dem constitutional affairs committee chairman Alan Beith warned that the deletion of millions of government e- emails could harm key probes like the Hutton Inquiry.',
 'Trade and Industry Secretary Patricia Hewitt calls for paid maternity leave to be extended beyond six months. She also announces a new drive to help women who want to work in male dominated sectors. Women in full-time work ear

In [41]:
# Directories for reference and generated summaries
reference_dir = T_5_summaries
generated_dir = './BBC_News_Summary/Summaries/politics/'

results = {}

# Iterate over each file and compute ROUGE scores
for i, filename in enumerate(summary_sub_dir):
    generated_summary = Bart_summaries[i]
    filename_sub_dir = summary_dir + filename
    with open(filename_sub_dir, 'r') as f:
        ok = f.readlines()
        cleaned_text_list = [s.replace('\n', '').strip() for s in ok if s.strip() != '']
        reference_summary = ' '.join(cleaned_text_list)
        reference_summary = reference_summary
    
    # Calculate ROUGE scores
    scores = calculate_rouge_scores(reference_summary, generated_summary)
    results[filename] = scores

# Print the results
for filename, score in results.items():
    print(f"File: {filename}")
    print(f"ROUGE-1: {score['rouge-1']['f']:.2%}")
    print(f"ROUGE-2: {score['rouge-2']['f']:.2%}")
    print(f"ROUGE-L: {score['rouge-l']['f']:.2%}")
    print("-----------------------")

File: 001.txt
ROUGE-1: 44.44%
ROUGE-2: 20.33%
ROUGE-L: 42.11%
-----------------------
File: 002.txt
ROUGE-1: 26.83%
ROUGE-2: 11.82%
ROUGE-L: 25.61%
-----------------------
File: 003.txt
ROUGE-1: 38.38%
ROUGE-2: 24.26%
ROUGE-L: 34.34%
-----------------------
File: 004.txt
ROUGE-1: 51.33%
ROUGE-2: 40.28%
ROUGE-L: 49.56%
-----------------------
File: 005.txt
ROUGE-1: 33.48%
ROUGE-2: 23.34%
ROUGE-L: 33.48%
-----------------------
File: 006.txt
ROUGE-1: 13.45%
ROUGE-2: 2.48%
ROUGE-L: 13.45%
-----------------------
File: 007.txt
ROUGE-1: 22.60%
ROUGE-2: 11.34%
ROUGE-L: 20.34%
-----------------------
File: 008.txt
ROUGE-1: 19.10%
ROUGE-2: 9.62%
ROUGE-L: 19.10%
-----------------------
File: 009.txt
ROUGE-1: 35.00%
ROUGE-2: 27.44%
ROUGE-L: 35.00%
-----------------------
File: 010.txt
ROUGE-1: 50.57%
ROUGE-2: 33.33%
ROUGE-L: 50.57%
-----------------------
File: 011.txt
ROUGE-1: 52.94%
ROUGE-2: 40.91%
ROUGE-L: 52.94%
-----------------------
File: 012.txt
ROUGE-1: 24.52%
ROUGE-2: 13.70%
ROUGE-L: 2