## **Problem Statement**
How can we automatically generate accurate and concise summaries of informal, multi-turn chat conversations to help users quickly understand key discussion points without reading the entire dialogue?

## **Target Audience**
The target audience for this summarisation application includes individuals and organizations that frequently take part in, or must process, informal chat-based communication and need quick, coherent summaries to work efficiently; for example, **journalists**, **customer support** teams and **call center** teams.

## **Libraries**

In [1]:
!pip install evaluate


In [12]:
!pip install rouge_score



In [2]:
!pip install py7zr


In [3]:
import pandas as pd
import torch
from transformers import BartTokenizer, BartForConditionalGeneration, Trainer, TrainingArguments, DataCollatorForSeq2Seq
from evaluate import load as load_evaluator
from datasets import load_dataset

## **Dataset**

In [4]:
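# Load the SAMSum dataset of chat-style dialogues paired with human-written summaries.
# The dataset ships custom loading code; trust_remote_code=True runs it without an interactive prompt.
dataset = load_dataset("samsum", trust_remote_code=True)
dataset["train"] = dataset["train"].select(range(200))  # keep only the first 200 training dialogues
dataset["validation"] = dataset["validation"].select(range(50))  # keep only the first 50 validation dialogues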

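Taking the first 200 training rows with `select(range(200))` keeps the run fast, but it ties the subset to the order of the file. If a random but reproducible subset is preferred (an assumption, not what the notebook does), the indices can be drawn with a seeded RNG in plain Python. `Dataset.select` accepts any list of row indices, and `Dataset.shuffle(seed=...)` followed by `select(range(n))` gives a similar effect. The split sizes 14,732 (train) and 818 (validation) are SAMSum's; the seed value 42 is arbitrary.

```python
import random


def sample_indices(split_size: int, n: int, seed: int = 42) -> list[int]:
    """Return n distinct row indices drawn uniformly from range(split_size).

    A fixed seed makes the subset reproducible across runs.
    """
    rng = random.Random(seed)  # local RNG, leaves the global random state untouched
    return sorted(rng.sample(range(split_size), n))  # sorted keeps the original row order


train_idx = sample_indices(14732, 200)
val_idx = sample_indices(818, 50)
print(len(train_idx), len(val_idx), train_idx[:5])
```

In the notebook, these would replace the `range(...)` arguments, e.g. `dataset["train"] = dataset["train"].select(train_idx)`.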

## **Preprocessing**

In [5]:
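# Tokenize inputs and targets with the BART tokenizer.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")

# "summarize: " is a T5-style task prefix. BART does not require it, but the same
# prefix is prepended again at inference time in the test loop below.
prefix = "summarize: "
max_input_length = 512   # dialogue tokens kept per example (BART accepts up to 1024)
max_target_length = 128  # summary tokens kept per example

def preprocess_function(examples):
    # With batched=True, `examples` maps each column name to a list of values.
    inputs = [prefix + dialogue for dialogue in examples["dialogue"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)

    # text_target tokenizes the summaries as labels (replaces the deprecated as_target_tokenizer()).
    labels = tokenizer(text_target=examples["summary"], max_length=max_target_length, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs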

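With `batched=True`, `Dataset.map` passes `preprocess_function` a dict that maps each column name to a list of values and expects a dict of equal-length lists back. The sketch below mimics that contract with a hypothetical whitespace tokenizer (`toy_tokenize` and `toy_preprocess` are illustrative names, not library functions), so the shape of `input_ids`, `attention_mask` and `labels`, the prefix and the truncation can be inspected without downloading BART. The real BART tokenizer uses subword tokens and also adds `<s>`/`</s>` special tokens; the toy limits of 8 and 4 tokens only make truncation visible.

```python
# Shared toy vocabulary so the same word always gets the same id.
VOCAB: dict[str, int] = {}
PREFIX = "summarize: "


def toy_tokenize(texts: list[str], max_length: int) -> dict[str, list[list[int]]]:
    """Hypothetical stand-in for a tokenizer: one id per whitespace-separated token."""
    input_ids = []
    for text in texts:
        ids = [VOCAB.setdefault(tok, len(VOCAB)) for tok in text.split()]
        input_ids.append(ids[:max_length])  # analogue of truncation=True
    attention_mask = [[1] * len(ids) for ids in input_ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def toy_preprocess(examples: dict, max_input_length: int = 8, max_target_length: int = 4) -> dict:
    """Same structure as preprocess_function, using the toy tokenizer."""
    inputs = [PREFIX + dialogue for dialogue in examples["dialogue"]]
    model_inputs = toy_tokenize(inputs, max_input_length)
    labels = toy_tokenize(examples["summary"], max_target_length)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


# A batch as Dataset.map(batched=True) would pass it: one list per column.
batch = {
    "id": ["a", "b"],
    "dialogue": ["Sam: Need anything? Liz: Milk and eggs.", "John: Free at 4? Sarah: Yes."],
    "summary": ["Liz wants milk and eggs.", "They meet at 4."],
}
out = toy_preprocess(batch)
print({key: [len(row) for row in rows] for key, rows in out.items()})
```

The returned dict no longer contains `id`, `dialogue` or `summary`; in the real pipeline, `remove_columns` is what drops those original columns from the mapped dataset.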

In [6]:
# Tokenize every split; remove_columns drops the raw columns (id, dialogue, summary) so only model inputs remain
tokenized_datasets = dataset.map(preprocess_function, batched=True, remove_columns=dataset["train"].column_names)

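Because tokenized dialogues and summaries differ in length, `DataCollatorForSeq2Seq` (created in the training cell below) pads each batch at run time. The sketch below reproduces the essential behaviour in plain Python as an illustration, not the library's code: inputs are right-padded with BART's pad id and an attention mask of 0, labels are padded with -100 so the cross-entropy loss ignores those positions, and `decoder_input_ids` are the labels shifted one step right behind the decoder start token, with any -100 replaced by the pad id. The ids 1 (pad) and 2 (decoder start) are taken from the `facebook/bart-large-cnn` configuration; the token ids in the example batch are made up.

```python
PAD_ID = 1            # BART <pad> id
DECODER_START_ID = 2  # BART decoder_start_token_id (</s>)
LABEL_PAD_ID = -100   # positions with this label are ignored by the loss


def pad(seqs: list[list[int]], value: int) -> list[list[int]]:
    """Right-pad every sequence to the longest one in the batch."""
    width = max(len(s) for s in seqs)
    return [s + [value] * (width - len(s)) for s in seqs]


def shift_right(labels: list[list[int]], pad_id: int, start_id: int) -> list[list[int]]:
    """Build decoder inputs: prepend the start token, drop the last label,
    and replace label padding (-100) with the real pad id."""
    shifted = [[start_id] + row[:-1] for row in labels]
    return [[pad_id if t == LABEL_PAD_ID else t for t in row] for row in shifted]


def collate(features: list[dict]) -> dict:
    input_ids = pad([f["input_ids"] for f in features], PAD_ID)
    attention_mask = pad([f["attention_mask"] for f in features], 0)
    labels = pad([f["labels"] for f in features], LABEL_PAD_ID)
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
        "decoder_input_ids": shift_right(labels, PAD_ID, DECODER_START_ID),
    }


features = [
    {"input_ids": [0, 11, 12, 13, 2], "attention_mask": [1] * 5, "labels": [0, 21, 22, 24, 2]},
    {"input_ids": [0, 14, 2], "attention_mask": [1] * 3, "labels": [0, 23, 2]},
]
batch = collate(features)
for key, rows in batch.items():
    print(key, rows)
```

Inputs and labels are padded independently, each to the longest sequence of its own kind in the batch.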

## **Model Training**

In [7]:
# Set up the model, data collator, training arguments and Trainer
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
# Pads each batch dynamically; labels are padded with -100 so the loss ignores padding
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

# predict_with_generate is not a TrainingArguments option: it belongs to
# Seq2SeqTrainingArguments (used with Seq2SeqTrainer). Summaries are instead
# generated with model.generate() after training, in the evaluation cell below.
training_args = TrainingArguments(
    output_dir="./bart-samsum-finetuned",
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    save_total_limit=2,
    num_train_epochs=2,
    logging_dir="./logs",
    logging_steps=10,
    report_to=[]  # disable external experiment trackers
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,  # newer transformers releases name this argument processing_class
    data_collator=data_collator,
)

trainer.train()


| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 1.4167 | 1.436421 |
| 2 | 0.9089 | 1.399986 |




TrainOutput(global_step=100, training_loss=1.2860404872894287, metrics={'train_runtime': 101.8578, 'train_samples_per_second': 3.927, 'train_steps_per_second': 0.982, 'total_flos': 232836765941760.0, 'train_loss': 1.2860404872894287, 'epoch': 2.0})
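The figures in `TrainOutput` can be cross-checked against the settings above: 200 training examples at `per_device_train_batch_size=4` give 50 optimizer steps per epoch (assuming a single device and no gradient accumulation), so 2 epochs give the reported `global_step=100`, and both throughput figures follow from `train_runtime`. The sketch also reads the validation losses as perplexities (e raised to the loss), assuming the loss is approximately the mean per-token cross-entropy in nats over non-padded label tokens, which is how BART computes it when padded labels are -100.

```python
import math

num_examples = 200        # training subset size
batch_size = 4            # per_device_train_batch_size
epochs = 2                # num_train_epochs
train_runtime = 101.8578  # seconds, from TrainOutput

steps_per_epoch = math.ceil(num_examples / batch_size)
global_steps = steps_per_epoch * epochs
samples_per_second = num_examples * epochs / train_runtime
steps_per_second = global_steps / train_runtime

# Validation loss per epoch, from the training table
val_losses = {1: 1.436421, 2: 1.399986}
perplexities = {epoch: math.exp(loss) for epoch, loss in val_losses.items()}

print(steps_per_epoch, global_steps)
print(round(samples_per_second, 3), round(steps_per_second, 3))
print({epoch: round(ppl, 2) for epoch, ppl in perplexities.items()})
```

A perplexity of roughly 4 on the validation summaries means that, on average, the model is about as uncertain as a uniform choice among four tokens at each step.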

In [8]:
# Generate a summary for every dialogue in the full test split with the fine-tuned model
model.eval()  # disable dropout for inference
results = []
for sample in dataset["test"]:
    input_text = prefix + sample['dialogue']
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
    inputs = {key: val.to(model.device) for key, val in inputs.items()}
    summary_ids = model.generate(**inputs, max_length=128, min_length=30, do_sample=False)
    generated_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    results.append({
        'dialogue': sample['dialogue'],
        'reference_summary': sample['summary'],
        'generated_summary': generated_summary
    })
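The loop above calls `model.generate` once per dialogue, which is slow for the 819 test dialogues. A batched variant is sketched below. To keep it self-contained, the model call is abstracted as a `summarize_batch` callable (a hypothetical name); in the notebook it could wrap `tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)`, `model.generate(...)` with the same generation settings, and `tokenizer.batch_decode(summary_ids, skip_special_tokens=True)`. The stand-in summarizer exists only to make the sketch runnable.

```python
from typing import Callable, Iterator


def batched(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def summarize_samples(
    samples: list[dict],
    summarize_batch: Callable[[list[str]], list[str]],
    prefix: str = "summarize: ",
    batch_size: int = 16,
) -> list[dict]:
    """Generate one summary per sample, batch_size dialogues at a time,
    returning records with the same keys as the notebook's `results`."""
    results = []
    for chunk in batched(samples, batch_size):
        texts = [prefix + s["dialogue"] for s in chunk]
        summaries = summarize_batch(texts)
        if len(summaries) != len(chunk):
            raise ValueError("summarize_batch must return one summary per input")
        for sample, generated in zip(chunk, summaries):
            results.append({
                "dialogue": sample["dialogue"],
                "reference_summary": sample["summary"],
                "generated_summary": generated,
            })
    return results


def fake_summarize_batch(texts: list[str]) -> list[str]:
    """Stand-in summarizer for illustration only: returns the text before the first '?'."""
    return [t.removeprefix("summarize: ").split("?")[0] for t in texts]


samples = [{"dialogue": f"A{i}: hi? B: ok", "summary": f"s{i}"} for i in range(5)]
out = summarize_samples(samples, fake_summarize_batch, batch_size=2)
print(out[0])
```

With the real model, `samples` would be `list(dataset["test"])`, since iterating a `Dataset` yields one dict per row. Right padding is fine here because BART is an encoder-decoder model and the attention mask hides the padded encoder positions.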


In [9]:
# Store the results in a DataFrame and export them to a CSV file
df = pd.DataFrame(results)
df.to_csv("bart_finetuned_samsum_results.csv", index=False)

## **Testing**

In [17]:
# 1) Qualitative check: summarize a few hand-written dialogues.
#    These inputs are passed without the "summarize: " prefix used during fine-tuning.

print("Manual test examples:")
test_examples = [
    "John: Are you free tomorrow? Sarah: Yes, after 3 PM. John: Let's meet at 4 then!",
    "Alice: The report is due Friday. Bob: I thought it was Monday. Alice: No, it was moved up.",
    "Sam: I'm picking up groceries now. Need anything? Liz: Yes, some milk and eggs."
]

for text in test_examples:
    input_tensor = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    input_tensor = {key: val.to(model.device) for key, val in input_tensor.items()}
    summary_ids = model.generate(**input_tensor, max_length=60, min_length=20, do_sample=False)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    print(f"Input: {text}\nSummary: {summary}\n")

# 2) Quantitative check: compare the generated test summaries with the reference summaries using ROUGE
rouge = load_evaluator("rouge")

predictions = [x['generated_summary'] for x in results]
references = [x['reference_summary'] for x in results]

print("\nROUGE scores for Fine-tuned BART:")
rouge_output = rouge.compute(predictions=predictions, references=references)
print(rouge_output)

Manual test examples:
Input: John: Are you free tomorrow? Sarah: Yes, after 3 PM. John: Let's meet at 4 then!
Summary: Sarah is free tomorrow after 3 PM. John will meet with Sarah at 4 PM.

Input: Alice: The report is due Friday. Bob: I thought it was Monday. Alice: No, it was moved up.
Summary: Alice and Bob are due a report on Friday. Alice thought it was Monday.

Input: Sam: I'm picking up groceries now. Need anything? Liz: Yes, some milk and eggs.
Summary: Sam is picking up milk and eggs now. Liz will pick up the eggs and milk for Sam.


ROUGE scores for Fine-tuned BART:
{'rouge1': np.float64(0.44457673591948), 'rouge2': np.float64(0.20878116360141485), 'rougeL': np.float64(0.3477666440934998), 'rougeLsum': np.float64(0.34778876553748395)}
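ROUGE-1 measures unigram overlap between a generated summary and its reference; the scores above are F-measures aggregated over the test pairs by `evaluate`. The simplified sketch below computes ROUGE-1 precision, recall and F1 for a single pair, assuming lowercasing and splitting into runs of ASCII letters and digits (an approximation of the `rouge_score` package's default tokenizer) and no stemming. It illustrates what the number measures and is not guaranteed to reproduce the library's scores exactly. The reference sentence in the example is hypothetical; SAMSum has no reference for this hand-written dialogue.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of ASCII letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(prediction: str, reference: str) -> dict[str, float]:
    """ROUGE-1 precision, recall and F1 from clipped unigram matches."""
    pred_counts = Counter(tokenize(prediction))
    ref_counts = Counter(tokenize(reference))
    overlap = sum((pred_counts & ref_counts).values())  # each word counted at most min(pred, ref) times
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "fmeasure": 0.0}
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    fmeasure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


scores = rouge1(
    "Sarah is free tomorrow after 3 PM. John will meet with Sarah at 4 PM.",  # model output above
    "Sarah and John will meet tomorrow at 4 PM.",  # hypothetical reference
)
print({name: round(value, 3) for name, value in scores.items()})
```

Here 8 of the 15 generated unigrams match the 9-word reference, so precision is 8/15, recall is 8/9 and F1 is 2/3: the long, repetitive output is penalized on precision even though it covers most of the reference.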
