Ready to try text summarization with FLAN-T5? It’s an instruction-tuned language model that is well suited to condensing lengthy texts into concise summaries. In this casual and friendly guide, we’ll walk through loading FLAN-T5 and a dataset, then crafting prompt templates for zero-shot, one-shot, and few-shot inference. Don’t worry if you’re new to this: by the end of the article, you’ll be generating summaries on your own. So, grab your favorite snack and let’s dive in.

Loading the FLAN-T5 Model and Dataset:
Let’s kick things off by loading the FLAN-T5 model and the dataset we’ll use for text summarization. FLAN-T5 is an instruction-tuned version of Google’s T5 encoder-decoder model, so it can follow plain-language prompts such as a request to summarize a conversation. We’ll be using it for our text summarization adventure!

In [None]:
%pip install --disable-pip-version-check --quiet torch torchdata

%pip install transformers datasets

In [None]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig
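
With those classes imported, loading the model and its tokenizer could look roughly like the sketch below. The checkpoint name `google/flan-t5-base` and the helper `load_flan_t5` are assumptions made for illustration; this section does not say which FLAN-T5 size is used, so swap in whichever checkpoint fits your hardware.

```python
# Assumption: the checkpoint is not named in this section; "google/flan-t5-base"
# is one of the publicly available FLAN-T5 sizes and is used here as a placeholder.
MODEL_NAME = "google/flan-t5-base"


def load_flan_t5(model_name=MODEL_NAME):
    """Download (or read from cache) the tokenizer and seq2seq model.

    Hypothetical helper, not part of the original notebook.
    """
    # Imported here so that defining the helper does not require the
    # library to be loaded until the model is actually requested.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model


# Typical use in a notebook cell (triggers a download on first run):
# tokenizer, model = load_flan_t5()
```

Keeping the checkpoint name in one variable makes it easy to try a smaller or larger FLAN-T5 variant later without touching the rest of the notebook.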

In [None]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

In [None]:
example_indices = [40, 200]

dash_line = '-' * 99  # separator line, 99 dashes wide

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------
Exa
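
Dialogue and summary pairs like the one above are the raw material for the zero-shot, one-shot, and few-shot prompts mentioned in the introduction. Here is a minimal sketch of how such a prompt might be assembled. The instruction wording, the `Summary:` label, and the helper `make_prompt` are all illustrative assumptions, not the article's actual template.

```python
# Hypothetical instruction text; the wording is an assumption.
INSTRUCTION = "Summarize the following conversation."


def make_prompt(dialogue, examples=()):
    """Build a summarization prompt.

    `examples` is a sequence of (dialogue, summary) pairs placed before the
    dialogue to summarize: no pairs gives a zero-shot prompt, one pair a
    one-shot prompt, and several pairs a few-shot prompt.
    """
    blocks = []
    for example_dialogue, example_summary in examples:
        # Each worked example shows the model a complete input/output pair.
        blocks.append(
            f"{INSTRUCTION}\n\n{example_dialogue}\n\nSummary: {example_summary}"
        )
    # The final block ends with an open "Summary:" for the model to complete.
    blocks.append(f"{INSTRUCTION}\n\n{dialogue}\n\nSummary:")
    return "\n\n\n".join(blocks)


# One-shot prompt using the train-station dialogue printed above as the example.
example = (
    "#Person1#: What time is it, Tom?\n"
    "#Person2#: Just a minute. It's ten to nine by my watch.",
    "#Person1# is in a hurry to catch a train. "
    "Tom tells #Person1# there is plenty of time.",
)
print(make_prompt("#Person1#: Is the library open today?", [example]))
```

The more examples you pass in, the longer the prompt gets, so keep an eye on the model's input length limit when moving from one-shot to few-shot.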

In [None]:

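# Load a second dataset. Note that this rebinds `dataset`, so the DialogSum
# data loaded earlier is no longer reachable under that name.
huggingface_dataset_name = "emrgnt-cmplxty/sciphi-textbooks-are-all-you-need"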

dataset = load_dataset(huggingface_dataset_name)

In [None]:

# Print out the first example from the training set
print(dataset['train'][0])

{'formatted_prompt': 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n### Instruction:\nImagine you are a prolific author tasked with writing a textbook. You are working on writing a textbook chapter titled "Marine Biology: Unveiling the Ocean\'s Depths - A Detailed Study of Marine Biodiversity using Satellite Tags".\n\nYour **first task** is to write an outline for the chapter. Your **second task** is to compose a eighteen paragraph page from the chapter. Your **last task** is to create a set of assessment questions that test the reader\'s understanding of the chapter.\n\nNotes:\n- Provide historical context or background information where relevant.\nAim for a well-rounded and insightful response, keeping in mind the diversity in audience knowledge.\n\n\n### Response:', 'completion': "**Chapter Outline:**\n1. Introduction\n    - Brief history of marine biology research\n    - Importance of understanding marine biodiversity\n  

In [None]:
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['formatted_prompt', 'completion', 'first_task', 'second_task', 'last_task', 'notes', 'title', 'model', 'temperature'],
        num_rows: 681845
    })
})


In [None]:
num_examples_to_print = 5

print(f"\nFirst {num_examples_to_print} examples from 'train' split:")
for i in range(num_examples_to_print):
    print(f"\nExample {i+1}:")
    print(dataset['train'][i]['completion'])


First 5 examples from 'train' split:

Example 1:
**Chapter Outline:**
1. Introduction
    - Brief history of marine biology research
    - Importance of understanding marine biodiversity
    - Overview of satellite tagging in marine research

2. Satellite Tagging Techniques
    - Types of satellite tags
    - Tag design and materials
    - Attachment methods for marine animals
    - Tagging protocols and ethical considerations

3. Tracking and Data Analysis
    - Satellite communication technology
    - Data transmission and storage
    - Data analysis techniques for tracking animal movements and behavior
    - Limitations and accuracy of satellite tag data

4. Applications of Satellite Tagging
    - Population dynamics and migration studies
    - Ecosystem modeling and management
    - Human impacts on marine ecosystems
    - Conservation and management strategies

5. Case Studies
    - Maritime mammals (e.g., whales, dolphins)
    - Seabirds
    - Coastal and open-ocean fish species