### Perform PEGASUS summarization

I used this notebook to summarize every movie storyline in my dataset.
The full run took almost 34 hours.

In [None]:
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

In [None]:
# Load model and tokenizer
model_name = "google/pegasus-cnn_dailymail"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

In [None]:
def summarize(text: str) -> str:
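    """Return a PEGASUS-generated summary of `text`."""
    # Tokenize the text, truncating anything beyond the model's maximum input length
    inputs = tokenizer(text, truncation=True, padding="longest", return_tensors="pt")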

    # Generate summary
    summary_ids = model.generate(**inputs, early_stopping=True)

    # Decode the generated token IDs back into text
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary
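
Calling `summarize` once per text runs the model on a single input at a time. Below is a minimal sketch of a batched variant, assuming the same `tokenizer` and `model` objects are passed in explicitly. The names `chunked`, `summarize_batch`, and the default `batch_size` of 8 are illustrative assumptions, not part of the original notebook, and the sketch was not benchmarked against the 34-hour run.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def summarize_batch(texts: Sequence[str], tokenizer, model, batch_size: int = 8) -> List[str]:
    """Summarize `texts` several at a time (hypothetical helper, not from the notebook)."""
    summaries: List[str] = []
    for batch in chunked(list(texts), batch_size):
        # Pad to the longest text in this batch; truncate inputs over the model limit
        inputs = tokenizer(list(batch), truncation=True, padding="longest", return_tensors="pt")
        summary_ids = model.generate(**inputs, early_stopping=True)
        # batch_decode turns every generated sequence in the batch back into text
        summaries.extend(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
    return summaries
```

With the objects loaded above, a call would look like `summarize_batch([text_a, text_b], tokenizer, model)`, where `text_a` and `text_b` are any two strings. A suitable batch size depends on available memory, so treat 8 as a starting point only.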

In [None]:
# Example 1: a short passage about climate change
text = """
Climate change is affecting every country on every continent. It is disrupting national economies 
and affecting lives, costing people, communities, and countries dearly today and even more tomorrow. 
Weather patterns are changing, sea levels are rising, and weather events are becoming more extreme.
"""

print(summarize(text))

In [None]:
example2 = """Temporarily restored to his normal size, Shinichi joins Ran and their friends on a trip to Kyoto.
There, he meets Keiko, an old actress friend of his mother, who has received a strange code from a friend who committed suicide years ago.
"""

print(summarize(example2))

In [None]:
example3 = "Five gay men from different walks of life are confronted with important choices that could change everything for them."
print(summarize(example3))

In [None]:
import pandas as pd

# Read the CSV file
df = pd.read_csv('clean_db.csv')

# Display the first few rows of the dataframe
df.head()

In [None]:
# Add an empty column that will hold the generated summaries
df['pegasusSummary'] = ''
df.head()

In [None]:
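# Summarize every storyline, one row at a time
df['pegasusSummary'] = df['storyLine'].apply(summarize)

# Save the dataframe, including the new summaries, to a new CSV file
df.to_csv('final_db_with_pegasus_summaries.csv', index=False)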
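
Applying `summarize` to every row passes missing values to the tokenizer and repeats the work for duplicated storylines. The sketch below is one possible guard, assuming that an empty summary is acceptable for rows without text. `summarize_all` is a hypothetical helper name, and whether `clean_db.csv` actually contains missing or duplicated storylines is not known from the notebook.

```python
from typing import Callable, Dict, Iterable, List


def summarize_all(texts: Iterable[object], summarize_fn: Callable[[str], str]) -> List[str]:
    """Summarize each text once, returning "" for missing or blank entries."""
    cache: Dict[str, str] = {}
    results: List[str] = []
    for text in texts:
        # pandas represents missing cells as float NaN, so anything that is
        # not a non-blank string is skipped (assumption: "" is a fine placeholder)
        if not isinstance(text, str) or not text.strip():
            results.append("")
            continue
        # Reuse the earlier summary when the same storyline appears again
        if text not in cache:
            cache[text] = summarize_fn(text)
        results.append(cache[text])
    return results
```

It could replace the `apply` call as `df['pegasusSummary'] = summarize_all(df['storyLine'].tolist(), summarize)`. The cache only helps if storylines repeat, and it keeps every distinct summary in memory for the duration of the run.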
