### Text Summarization with Prompt Engineering
Prompt engineering improves text summarization by giving a language model explicit instructions about what to extract and how to phrase it. Prompts are refined iteratively, so the same model can be steered toward a particular style or emphasis without retraining.

In [36]:
# Upgrade pip
%pip install --upgrade pip

# Install torch and torchdata with specified versions
%pip install --disable-pip-version-check torch==1.13.1 torchdata==0.5.1 --quiet

# Install transformers and datasets
%pip install transformers==4.27.2 datasets==2.11.0 --quiet



In [2]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer

In [3]:
huggingface_dataset_name = "knkarthick/dialogsum"

# Load the dataset
dataset = load_dataset(huggingface_dataset_name)

### Dialogues

In [10]:
def print_example(index, dash_length=80):
    dash_line = '-' * dash_length

    print(dash_line)
    print(f'Example {index + 1}')
    print(dash_line)


    input_dialogue = dataset['test'][index]['dialogue']
    baseline_summary = dataset['test'][index]['summary']

    print('INPUT DIALOGUE:')
    print(input_dialogue)
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(baseline_summary)
    print(dash_line)
    print()



example_indices = [30]
custom_dash_length = 120

for index in example_indices:
    print_example(index, custom_dash_length)


------------------------------------------------------------------------------------------------------------------------
Example 31
------------------------------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Where are you going for your trip?
#Person2#: I think Hebei is a good place.
#Person1#: But I heard the north of China are experiencing severe sandstorms!
#Person2#: Really?
#Person1#: Yes, it's said that Hebes was experiencing six degree strong winds.
#Person2#: How do these storms affect the people who live in these areas?
#Person1#: The report said the number of people with respiratory tract infections tended to rise after sandstorms. The sand gets into people's noses and throats and creates irritation.
#Person2#: It sounds that sandstorms are trouble for everybody!
#Person1#: You are quite right.
----------------------------------------------------------------------------------------------------------------

### FLAN-T5
Google released FLAN-T5, a large language model that has already been fine-tuned on a variety of tasks yet can be fine-tuned further for specific applications.

In [11]:
model_name = 'google/flan-t5-base'

#Load pre-trained model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

### Summarizing with Prompt Engineering: Zero-Shot Inference


In [30]:
line_separator = '-' * 99  # same 99-dash separator the original join expression produced
for example_index, dataset_index in enumerate(example_indices):
    dialogue = dataset['test'][dataset_index]['dialogue']
    baseline_summary = dataset['test'][dataset_index]['summary']

    task_prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

    # Tokenize the full task_prompt (instruction plus dialogue), not the raw dialogue.
    input_tokens = tokenizer(task_prompt, return_tensors='pt')
    generated_output = tokenizer.decode(
        model.generate(
            input_tokens["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(line_separator)
    print(f'Example {example_index + 1}')
    print(line_separator)
    print(f'DIALOGUE:\n{dialogue}')
    print(line_separator)
    print(f'BASELINE HUMAN SUMMARY:\n{baseline_summary}')
    print(line_separator)
    print(f'MODEL GENERATED SUMMARY:\n{generated_output}\n')


---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
DIALOGUE:
#Person1#: Where are you going for your trip?
#Person2#: I think Hebei is a good place.
#Person1#: But I heard the north of China are experiencing severe sandstorms!
#Person2#: Really?
#Person1#: Yes, it's said that Hebes was experiencing six degree strong winds.
#Person2#: How do these storms affect the people who live in these areas?
#Person1#: The report said the number of people with respiratory tract infections tended to rise after sandstorms. The sand gets into people's noses and throats and creates irritation.
#Person2#: It sounds that sandstorms are trouble for everybody!
#Person1#: You are quite right.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person2# plans to have a trip in Heb
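
The zero-shot cell above uses one fixed instruction template. Since prompt engineering is iterative, it can help to keep several candidate templates side by side and render each against the same dialogue. The sketch below is a minimal illustration in plain Python: the template names, the two alternative wordings, and the `render_prompts` helper are hypothetical and not part of the notebook; only the first template mirrors the wording used above.

```python
# Hypothetical candidate templates for zero-shot summarization.
# Only "instruction_first" mirrors the wording used in the notebook;
# the other two are illustrative alternatives.
TEMPLATES = {
    "instruction_first": "Summarize the following conversation.\n\n{dialogue}\n\nSummary:\n",
    "question_last": "Dialogue:\n\n{dialogue}\n\nWhat was going on?\n",
    "brief": "{dialogue}\n\nBriefly summarize that dialogue.\n",
}


def render_prompts(dialogue, templates=TEMPLATES):
    """Return a dict mapping each template name to its rendered prompt string."""
    return {
        name: template.format(dialogue=dialogue)
        for name, template in templates.items()
    }


# Toy dialogue, made up for illustration only.
toy_dialogue = "#Person1#: Is the shop open?\n#Person2#: Yes, until six."

for name, prompt in render_prompts(toy_dialogue).items():
    print(f"[{name}]")
    print(prompt)
```

Each rendered prompt could then be passed through `tokenizer(...)` and `model.generate(...)` exactly as in the cell above, so the generated summaries can be compared against the same human baseline.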

### One-Shot and Few-Shot Inference
In one-shot and few-shot inference, you give a large language model (LLM) a small number of prompt-response pairs as examples before presenting the actual task prompt. This is known as "in-context learning": the model sees worked examples in its input and uses them to infer the desired task and output format. One-shot inference uses a single example, while few-shot inference uses a small set. Because the examples live only in the prompt, the model's weights are not updated.
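
The difference between zero-, one-, and few-shot prompting comes down to how many solved examples precede the query. A minimal sketch with made-up toy dialogues follows; the helper name `build_in_context_prompt` and the "Summary:" cue are illustrative assumptions, not taken from the notebook.

```python
def build_in_context_prompt(examples, query, question="Summary:"):
    """Assemble a prompt from (dialogue, summary) pairs followed by an unanswered query."""
    parts = []
    for dialogue, summary in examples:
        # A solved example: the dialogue, the cue, and the reference summary.
        parts.append(f"Dialogue:\n{dialogue}\n\n{question}\n{summary}\n")
    # The final block leaves the answer empty for the model to complete.
    parts.append(f"Dialogue:\n{query}\n\n{question}\n")
    return "\n".join(parts)


# Toy data, invented purely for illustration.
toy_examples = [
    ("#Person1#: Is the library open?\n#Person2#: Yes, until nine.",
     "#Person1# asks whether the library is open."),
    ("#Person1#: Can you fix my bike?\n#Person2#: Sure, bring it over.",
     "#Person2# agrees to fix #Person1#'s bike."),
]
toy_query = "#Person1#: Want some tea?\n#Person2#: No, thanks."

zero_shot = build_in_context_prompt([], toy_query)
one_shot = build_in_context_prompt(toy_examples[:1], toy_query)
few_shot = build_in_context_prompt(toy_examples, toy_query)

for name, prompt in [("zero-shot", zero_shot), ("one-shot", one_shot), ("few-shot", few_shot)]:
    print(name, "->", prompt.count("Dialogue:"), "dialogue block(s)")
```

The only thing that changes between the three prompts is the number of solved examples placed before the final, unanswered dialogue.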

In [33]:
def make_prompt(example_indices_full, example_index_to_summarize):
    prompt = ''
    for index in example_indices_full:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']

        # Each example ends with '{summary}\n\n\n', which serves as the stop sequence for FLAN-T5.
        prompt += f"""
Dialogue:

{dialogue}

What is it?
{summary}


"""

    dialogue = dataset['test'][example_index_to_summarize]['dialogue']

    prompt += f"""
Dialogue:

{dialogue}

What is it?
"""

    return prompt
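
Every in-context example lengthens the prompt, so a few-shot prompt can outgrow the input length a model is normally run with. The sketch below shows one way to keep only as many example blocks as fit within a budget. The helper name, the `count_tokens` callable, the whitespace-based `rough_count`, and the 512-token default are all assumptions for illustration rather than values taken from the notebook.

```python
def fit_examples_to_budget(example_blocks, query_block, count_tokens, max_tokens=512):
    """Keep as many leading example blocks as fit alongside the query block.

    Returns the assembled prompt and the number of example blocks kept.
    max_tokens=512 is an assumed budget, not a value from the notebook.
    """
    kept = []
    used = count_tokens(query_block)  # the query must always be included
    for block in example_blocks:
        cost = count_tokens(block)
        if used + cost > max_tokens:
            break  # stop at the first example that would exceed the budget
        kept.append(block)
        used += cost
    return "".join(kept) + query_block, len(kept)


def rough_count(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


# Toy blocks, invented for illustration.
demo_examples = ["one two three\n", "four five six seven\n", "eight nine\n"]
demo_query = "final query\n"

demo_prompt, demo_kept = fit_examples_to_budget(
    demo_examples, demo_query, rough_count, max_tokens=6
)
print(f"kept {demo_kept} example block(s)")
print(demo_prompt)
```

With the notebook's tokenizer, `count_tokens` could be something like `lambda text: len(tokenizer(text)["input_ids"])`. Counting blocks separately only approximates the length of the concatenated prompt, so treat the result as an estimate.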

In [34]:
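# Two in-context examples are supplied here, so this is strictly a few-shot
# (two-shot) prompt even though the variable and labels below say "one shot".
example_indices_full = [30, 50]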
example_index_to_summarize = 100

one_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(one_shot_prompt)


Dialogue:

#Person1#: Where are you going for your trip?
#Person2#: I think Hebei is a good place.
#Person1#: But I heard the north of China are experiencing severe sandstorms!
#Person2#: Really?
#Person1#: Yes, it's said that Hebes was experiencing six degree strong winds.
#Person2#: How do these storms affect the people who live in these areas?
#Person1#: The report said the number of people with respiratory tract infections tended to rise after sandstorms. The sand gets into people's noses and throats and creates irritation.
#Person2#: It sounds that sandstorms are trouble for everybody!
#Person1#: You are quite right.

What is it?
#Person2# plans to have a trip in Hebei but #Person1# says there are sandstorms in there.



Dialogue:

#Person1#: Yeah. Just pull on this strip. Then peel off the back.
#Person2#: You might make a few enemies this way.
#Person1#: If they don't think this is fun, they're not meant to be our friends.
#Person2#: You mean your friends. I think it's cruel.
#

In [35]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(one_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

# dash_line exists only inside print_example, so use the global line_separator.
print(line_separator)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(line_separator)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# and Mike have a disagreement on how to act out a scene. #Person1# proposes that Mike can try to act in #Person1#'s way.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
The two men are trying to figure out how to react to a cut.


In [29]:
from transformers import GenerationConfig
generation_config = GenerationConfig(max_new_tokens=50)
inputs = tokenizer(one_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        generation_config=generation_config,
    )[0],
    skip_special_tokens=True
)

# dash_line exists only inside print_example, so use the global line_separator.
print(line_separator)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')
print(line_separator)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')


---------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
The two men are trying to figure out how to react to a cut.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# and Mike have a disagreement on how to act out a scene. #Person1# proposes that Mike can try to act in #Person1#'s way.

