# Part 1 - Creating a baseline

In this notebook we create a simple but important baseline, so that we have a reference point for how much our deep learning model improves the summaries. We use the ROUGE metric to score the baseline.

In [None]:
import pandas as pd
df_test = pd.read_csv('data/test.csv')

In [None]:
df_test.head()

In [None]:
from datasets import load_metric
metric = load_metric("rouge")

We copy this function from https://github.com/huggingface/transformers/blob/v4.6.1/examples/pytorch/summarization/run_summarization.py so that we always use the same metric calculation. It reports the mid F-measure of each ROUGE variant as a percentage, rounded to one decimal place.

In [None]:
def calc_rouge_scores(candidates, references):
    result = metric.compute(predictions=candidates, references=references, use_stemmer=True)
    result = {key: round(value.mid.fmeasure * 100, 1) for key, value in result.items()}
    return result
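To build some intuition for what the F-measure above reports, here is a minimal sketch of a ROUGE-1-style score computed by hand. It is a simplification, not the library's implementation: the helper name `rouge1_f` is hypothetical, tokens are lowercased and split on whitespace, and no stemming or bootstrap aggregation is applied, so its numbers will not match `metric.compute` exactly.

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F-measure based on clipped unigram overlap (assumption: whitespace tokens, no stemming)."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap)
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())  # share of candidate tokens found in the reference
    recall = overlap / sum(ref_counts.values())      # share of reference tokens found in the candidate
    return 2 * precision * recall / (precision + recall)

score = rouge1_f("the cat sat on the mat", "the cat lay on the mat")
print(round(score * 100, 1))  # 5 of 6 tokens overlap in both directions -> 83.3
```

A long candidate can reach high recall at the cost of precision, which is worth keeping in mind when comparing baselines of different lengths.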

The summaries in the test dataset serve as the references.

In [None]:
ref_summaries = list(df_test['summary'])

Now we create three baselines by comparing the reference summaries with the first sentence, the first two sentences, and the first three sentences of the abstract (the `text` column).

In [None]:
import re

# Split on whitespace that follows '.', ':' or ';' and keep the first i+1 pieces as the candidate summary
for i in range(3):
    candidate_summaries = list(df_test['text'].apply(lambda x: ' '.join(re.split(r'(?<=[.:;])\s', x)[:i+1])))
    print(f"First {i+1} sentences: Scores {calc_rouge_scores(candidate_summaries, ref_summaries)}")
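The splitting step can be tried on its own with a toy string. The sketch below reuses the same regular expression; the helper name `lead_n` and the sample text are made up for illustration. Note that the pattern also splits after colons and semicolons, so a "sentence" here is a loose notion.

```python
import re

def lead_n(text: str, n: int) -> str:
    """Return the first n pieces of text, split on whitespace that follows '.', ':' or ';'."""
    parts = re.split(r'(?<=[.:;])\s', text)
    return ' '.join(parts[:n])

# Hypothetical sample text, not taken from the dataset
sample = "We study X. Our method: it works; results follow. The end."

print(lead_n(sample, 1))  # We study X.
print(lead_n(sample, 2))  # We study X. Our method:
```

If `n` exceeds the number of pieces, slicing simply returns the whole text, so short abstracts need no special handling.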