## Overview
Text summarization produces a concise, fluent summary of a longer text document. There are two main approaches: extractive and abstractive. Extractive summarization selects the most important sentences from the original text and combines them into a summary. Abstractive summarization generates new sentences that capture the main points of the original text. This notebook walks through several examples of how large language models can generate summaries from text.

Learn more about text summarization in the [official documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/text/summarization-prompts).
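
To make the extractive approach concrete, here is a minimal sketch that scores each sentence by the average frequency of its words and keeps the top-scoring ones in their original order. The function name `extractive_summary`, the naive sentence splitter, and the scoring rule are illustrative assumptions, not part of Vertex AI or of the rest of this notebook, which uses the abstractive approach.

```python
import re
from collections import Counter


def extractive_summary(text: str, num_sentences: int = 2) -> str:
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    # Word frequencies over the whole text (lowercased word tokens).
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"\w+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


sample = "Qubits are sensitive. Qubits need error correction. The weather is nice."
print(extractive_summary(sample, num_sentences=2))
```

Every sentence in the result is copied verbatim from the input, which is the defining property of an extractive summary.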

## Objective
In this tutorial, you will learn how to use generative models to summarize information from text by working through the following examples:

* Transcript summarization
* Summarizing text into bullet points
* Dialogue summarization with to-dos
* Hashtag tokenization
* Title & heading generation

You will also learn how to evaluate model-generated summaries by comparing them to human-written summaries, using ROUGE as an evaluation framework.

In [1]:
# import libraries
import os
import vertexai
from IPython.display import Markdown, display
from google.oauth2 import service_account
from dotenv import load_dotenv
from vertexai.language_models import TextGenerationModel

In [2]:
# initiate service account (authentication)
json_path = '../llm-ai.json' # replace with the path to your own service account key file
credentials = service_account.Credentials.from_service_account_file(json_path)

In [3]:
# start Vertex AI
load_dotenv()
vertexai.init(project=os.environ["PROJECT_ID"], # replace with your own project
              credentials=credentials)

In [4]:
# load the model
generation_model = TextGenerationModel.from_pretrained("text-bison@001")

## Text Summarization

### 1. Transcript summarization

In this first example, we summarize a piece of text about Gemini, Google's new LLM.

In [7]:
prompt = """
Provide a very short summary, no more than three sentences, for the following article:

AI has been the focus of my life's work, as for many of my research colleagues. 
Ever since programming AI for computer games as a teenager, and throughout my years as a neuroscience researcher trying to understand the workings of the brain, 
I’ve always believed that if we could build smarter machines, we could harness them to benefit humanity in incredible ways.

This promise of a world responsibly empowered by AI continues to drive our work at Google DeepMind. 
For a long time, we’ve wanted to build a new generation of AI models, inspired by the way people understand and interact with the world. 
AI that feels less like a smart piece of software and more like something useful and intuitive — an expert helper or assistant.


Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. 
It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, 
operate across and combine different types of information including text, code, audio, image and video.

Summary:

"""

summary = generation_model.predict(
        prompt, temperature=0.2, max_output_tokens=1024, top_k=40, top_p=0.8
    ).text

# print(summary) also works if you do not need Markdown rendering
display(Markdown(
    summary
))



Google DeepMind has built a new generation of AI models, inspired by the way people understand and interact with the world. 
This multimodal AI model, called Gemini, can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video.

Instead of a summary, we can ask for a `TL;DR` ("too long; didn't read") and compare the two outputs.

In [8]:
prompt = """
Provide a TL;DR for the following article:

AI has been the focus of my life's work, as for many of my research colleagues. 
Ever since programming AI for computer games as a teenager, and throughout my years as a neuroscience researcher trying to understand the workings of the brain, 
I’ve always believed that if we could build smarter machines, we could harness them to benefit humanity in incredible ways.

This promise of a world responsibly empowered by AI continues to drive our work at Google DeepMind. 
For a long time, we’ve wanted to build a new generation of AI models, inspired by the way people understand and interact with the world. 
AI that feels less like a smart piece of software and more like something useful and intuitive — an expert helper or assistant.


Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. 
It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, 
operate across and combine different types of information including text, code, audio, image and video.

TL;DR:
"""

tldr = generation_model.predict(
        prompt, temperature=0.2, max_output_tokens=1024, top_k=40, top_p=0.8
    ).text

display(Markdown(
    tldr
))

AI has been the focus of my life's work. I believe that if we could build smarter machines, we could harness them to benefit humanity in incredible ways.

We've wanted to build a new generation of AI models, inspired by the way people understand and interact with the world. AI that feels less like a smart piece of software and more like something useful and intuitive — an expert helper or assistant.

Gemini is the result of large-scale collaborative efforts by teams across Google. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video.
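
The two prompts above share the same article and differ only in the instruction and the closing label. A small helper can make that explicit and keep the article text in one place. This is a sketch: `build_prompt` is a hypothetical helper, not part of the Vertex AI SDK, and the one-sentence `article` below is a stand-in for the full text.

```python
def build_prompt(instruction: str, article: str, label: str) -> str:
    """Assemble a prompt: instruction, blank line, article, blank line, output label."""
    return f"\n{instruction}\n\n{article.strip()}\n\n{label}\n"


# Stand-in for the full article used in the cells above.
article = "Gemini was built from the ground up to be multimodal."

summary_prompt = build_prompt(
    "Provide a very short summary, no more than three sentences, for the following article:",
    article,
    "Summary:",
)
tldr_prompt = build_prompt("Provide a TL;DR for the following article:", article, "TL;DR:")

print(tldr_prompt)
```

Either string can then be passed to `generation_model.predict(...)` exactly as in the cells above.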

### 2. Summarize text into bullet points

In the following example, we use the same text on Gemini, but ask the model to summarize it as bullet points. Feel free to change the prompt.

In [9]:
prompt = """
Provide a very short summary in four bullet points for the following article:

AI has been the focus of my life's work, as for many of my research colleagues. 
Ever since programming AI for computer games as a teenager, and throughout my years as a neuroscience researcher trying to understand the workings of the brain, 
I’ve always believed that if we could build smarter machines, we could harness them to benefit humanity in incredible ways.
This promise of a world responsibly empowered by AI continues to drive our work at Google DeepMind. 
For a long time, we’ve wanted to build a new generation of AI models, inspired by the way people understand and interact with the world. 
AI that feels less like a smart piece of software and more like something useful and intuitive — an expert helper or assistant.
Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. 
It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, 
operate across and combine different types of information including text, code, audio, image and video.
Bulletpoints:

"""

points = generation_model.predict(
        prompt, temperature=0.2, max_output_tokens=1024, top_k=40, top_p=0.8
    ).text

display(Markdown(
    points
))


- AI has been the focus of my life's work.
- I believe that AI can benefit humanity in incredible ways.
- We've wanted to build a new generation of AI models.
- Gemini is the result of large-scale collaborative efforts.

### 3. Dialogue summarization with to-dos
Dialogue summarization condenses a conversation into a shorter form, so that you can read a summary instead of the whole discussion. In this example, we ask the model to summarize a conversation between an online retail customer and a support agent, and to list the agent's to-dos at the end.

In [10]:
prompt = """
Please generate a summary of the following conversation and at the end summarize the to-do's for the support Agent:

Customer: Hi, I'm Larry, and I received the wrong item.

Support Agent: Hi, Larry. How would you like to see this resolved?

Customer: That's alright. I want to return the item and get a refund, please.

Support Agent: Of course. I can process the refund for you now. Can I have your order number, please?

Customer: It's [ORDER NUMBER].

Support Agent: Thank you. I've processed the refund, and you will receive your money back within 14 days.

Customer: Thank you very much.

Support Agent: You're welcome, Larry. Have a good day!

Summary:
"""

todos = generation_model.predict(
        prompt, temperature=0.2, max_output_tokens=256, top_k=40, top_p=0.8
    ).text

display(
    Markdown(todos)
)

Larry received the wrong item and wants to return it for a refund.
The support agent processed the refund and Larry will receive his money back within 14 days.
To-do's for the support agent:
- Process the refund.
- Send an email to Larry confirming the refund.
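
If the to-dos need to feed into another system, such as a ticketing tool, the free-text output can be split into a summary and a list. The sketch below assumes the output follows the shape shown above: summary lines first, then a line starting with "To-do", then dash-prefixed items. Model output is not guaranteed to follow that shape, and `split_summary_and_todos` is a hypothetical helper.

```python
from __future__ import annotations


def split_summary_and_todos(output: str) -> tuple[str, list[str]]:
    """Split model output into (summary text, list of to-do items)."""
    summary_lines: list[str] = []
    todos: list[str] = []
    in_todos = False
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # Assumed marker: a heading line that starts with "To-do".
        if stripped.lower().startswith("to-do"):
            in_todos = True
            continue
        if in_todos:
            # Keep only bullet lines; anything else after the heading is ignored.
            if stripped.startswith(("-", "*")):
                todos.append(stripped.lstrip("-* ").strip())
        else:
            summary_lines.append(stripped)
    return " ".join(summary_lines), todos


sample_output = """Larry received the wrong item and wants to return it for a refund.
The support agent processed the refund and Larry will receive his money back within 14 days.
To-do's for the support agent:
- Process the refund.
- Send an email to Larry confirming the refund.
"""

summary_text, todo_items = split_summary_and_todos(sample_output)
print(todo_items)
```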

### 4. Hashtag tokenization

Hashtag tokenization is the process of taking a piece of text and getting the hashtag "tokens" out of it. We can use this, for example, if we want to generate hashtags for our social media campaigns. In this example, we take this tweet from [Google Cloud](https://twitter.com/googlecloud/status/1649127992348606469) and generate some hashtags we can use.

In [11]:
prompt = """
Tokenize the hashtags of this tweet:

Google Cloud
@googlecloud
How can data help our changing planet? 🌎

In honor of #EarthDay this weekend, we’re proud to share how we’re partnering with
@ClimateEngine
 to harness the power of geospatial data and drive toward a more sustainable future.

Check out how → https://goo.gle/3mOUfts
"""
hashtag = generation_model.predict(
        prompt, temperature=0.2, max_output_tokens=256, top_k=40, top_p=0.8
    ).text

display(
    Markdown(hashtag)
)

EarthDay
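
For hashtags that already appear in the text, a regular expression gives a deterministic baseline to compare against the model's answer. This is a sketch, not part of the original notebook: `extract_hashtags` is a hypothetical helper, the tweet is shortened, and the pattern would also match fragments such as `#section` inside URLs.

```python
from __future__ import annotations

import re


def extract_hashtags(text: str) -> list[str]:
    """Return hashtag tokens (without the leading '#') in order of appearance."""
    return re.findall(r"#(\w+)", text)


# Shortened version of the tweet used in the prompt above.
tweet = (
    "In honor of #EarthDay this weekend, we're proud to share how we're "
    "partnering with @ClimateEngine to harness the power of geospatial data."
)
print(extract_hashtags(tweet))  # ['EarthDay']
```

Unlike the model, this baseline cannot suggest new hashtags; it only recovers the ones already present.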

### 5. Title & heading generation

Below, we ask the model to generate five candidate titles for a given piece of text.

In [12]:
prompt = """
Write a title for this text, give me five options:
Whether helping physicians identify disease or finding photos of “hugs,” AI is behind a lot of the work we do at Google. And at our Arts & Culture Lab in Paris, we’ve been experimenting with how AI can be used for the benefit of culture.
Today, we’re sharing our latest experiments—prototypes that build on seven years of work in partnership with the 1,500 cultural institutions around the world.
Each of these experimental applications runs AI algorithms in the background to let you unearth cultural connections hidden in archives—and even find artworks that match your home decor.
"""
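# Note: top_k=1 keeps only the most likely token at each step (greedy decoding),
# so temperature=0.8 has little effect; raise top_k for more varied titles.
title = generation_model.predict(
        prompt, temperature=0.8, max_output_tokens=256, top_k=1, top_p=0.8
    ).text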

display(
    Markdown(title)
)

1. How AI is used for the benefit of culture
2. Google Arts & Culture Lab experiments with AI
3. AI in the Arts & Culture Lab
4. AI for culture
5. AI in the Arts

## Evaluation

We can evaluate the outputs of summarization tasks using [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)) as an evaluation framework. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of measures that automatically assess the quality of a summary by comparing it to reference (ideal) summaries written by humans. The measures count overlapping units, such as n-grams, word sequences, and word pairs, between the model-generated summary and the human-written references.

The first step is to install the ROUGE library.

`pip install rouge`

In [13]:
from rouge import Rouge

ROUGE = Rouge()

prompt = """
Provide a very short, maximum four sentences, summary for the following article:

Our quantum computers work by manipulating qubits in an orchestrated fashion that we call quantum algorithms.
The challenge is that qubits are so sensitive that even stray light can cause calculation errors — and the problem worsens as quantum computers grow.
This has significant consequences, since the best quantum algorithms that we know for running useful applications require the error rates of our qubits to be far lower than we have today.
To bridge this gap, we will need quantum error correction.
Quantum error correction protects information by encoding it across multiple physical qubits to form a “logical qubit,” and is believed to be the only way to produce a large-scale quantum computer with error rates low enough for useful calculations.
Instead of computing on the individual qubits themselves, we will then compute on logical qubits. By encoding larger numbers of physical qubits on our quantum processor into one logical qubit, we hope to reduce the error rates to enable useful quantum algorithms.

Summary:

"""

candidate = generation_model.predict(
    prompt, temperature=0.1, max_output_tokens=1024, top_k=40, top_p=0.9
).text

display(
    Markdown(candidate)
    )

Quantum computers work by manipulating qubits.
Qubits are sensitive to errors, and the problem worsens as quantum computers grow.
Quantum error correction protects information by encoding it across multiple physical qubits to form a “logical qubit”.
By encoding larger numbers of physical qubits on our quantum processor into one logical qubit, we hope to reduce the error rates to enable useful quantum algorithms.

We also need a human-written summary to compare against the candidate generated by the model. We will call this `reference`.

In [14]:
reference = """
Quantum computers are sensitive to noise and errors. 
To bridge this gap, we will need quantum error correction.
"""

Now we can take the candidate and reference to evaluate the performance. In this case, ROUGE will give us:

* `rouge-1`, which measures unigram overlap
* `rouge-2`, which measures bigram overlap
* `rouge-l`, which measures the longest common subsequence
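
To show what these three measures count, here is a simplified sketch written from the definitions above. It tokenizes on whitespace and uses clipped n-gram counts. The `rouge` package may tokenize and count differently, so these helpers (`ngrams`, `rouge_n`, `lcs_length`, all hypothetical names) are not expected to reproduce its numbers exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Recall, precision and F-score of n-gram overlap (simplified)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = recall + precision
    f = 2 * recall * precision / total if total else 0.0
    return {"r": recall, "p": precision, "f": f}


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (basis of ROUGE-L)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            table[i][j] = table[i - 1][j - 1] + 1 if x == y else max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


cand_text = "the cat sat"
ref_text = "the cat sat on the mat"
print(rouge_n(cand_text, ref_text, n=1))  # unigram overlap
print(rouge_n(cand_text, ref_text, n=2))  # bigram overlap
print(lcs_length(cand_text.split(), ref_text.split()))  # 3
```

In this toy example every candidate unigram appears in the reference, so precision is 1.0, while recall is 0.5 because only three of the six reference tokens are covered.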

In [15]:
ROUGE.get_scores(candidate, reference)

[{'rouge-1': {'r': 0.5263157894736842,
   'p': 0.20408163265306123,
   'f': 0.29411764303200694},
  'rouge-2': {'r': 0.2222222222222222,
   'p': 0.06557377049180328,
   'f': 0.10126581926614336},
  'rouge-l': {'r': 0.47368421052631576,
   'p': 0.1836734693877551,
   'f': 0.26470587832612463}}]
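
In each entry, `r` is recall, `p` is precision, and `f` is the F-score. Recall is much higher than precision here because the four-sentence candidate is considerably longer than the two-sentence reference: it covers about half of the reference's words, but most of its own words do not appear in the reference. The reported `f` values are consistent with the harmonic mean of `r` and `p`, which the short check below reproduces for `rouge-1` (the helper name `f_score` is an assumption for illustration).

```python
def f_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (F1)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# rouge-1 values copied from the output above.
rouge_1 = {"r": 0.5263157894736842, "p": 0.20408163265306123}
print(round(f_score(rouge_1["r"], rouge_1["p"]), 4))  # 0.2941
```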