# Run Inference on the FLAN-T5 Model Before Fine-Tuning

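In this lab, we will use generative AI for a dialogue summarization task. Prompt engineering is an important concept when generating text with foundation models, and in this notebook we explore how the input text directly affects the model's output. Check out [this blog](https://www.amazon.science/blog/emnlp-prompt-engineering-is-the-new-feature-engineering) from Amazon Science for a quick introduction to prompt engineering.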

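For our specific use case, we can use the FLAN-T5 model to generate summaries zero-shot, which shows how well the base LLM performs without any fine-tuning. Check out [this blog from AWS](https://aws.amazon.com/blogs/machine-learning/zero-shot-prompting-for-the-flan-t5-foundation-model-in-amazon-sagemaker-jumpstart/) for a quick description of what zero-shot learning is and why it is an important concept for FLAN models, among other topics.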

<a name='1'></a>
## Set up Kernel and Required Dependencies

First, check that the correct kernel is chosen.

<img src="img/kernel_set_up.png" width="300"/>

Click it to check the details of the image, kernel, and instance type.

<img src="img/w3_kernel_and_instance_type.png" width="600"/>

In [None]:
import psutil

notebook_memory = psutil.virtual_memory()
print(notebook_memory)

if notebook_memory.total < 32 * 1000 * 1000 * 1000:
    print('*******************************************')    
    print('YOU ARE NOT USING THE CORRECT INSTANCE TYPE')
    print('PLEASE CHANGE INSTANCE TYPE TO  m5.2xlarge ')
    print('*******************************************')
else:
    correct_instance_type=True
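
The memory check compares against `32 * 1000 * 1000 * 1000` bytes, which is 32 decimal gigabytes (GB), whereas instance memory is usually quoted in binary gibibytes (GiB). Assuming m5.2xlarge provides 32 GiB (the figure AWS lists), the small calculation below shows that the threshold sits a bit below the nominal size, presumably leaving headroom for memory the operating system does not report as total.

```python
# Compare the decimal threshold used in the cell above with a binary (GiB) size.
# Assumption: m5.2xlarge is advertised with 32 GiB of memory.
THRESHOLD_BYTES = 32 * 1000 * 1000 * 1000  # 32 GB (decimal), as in the check above
NOMINAL_BYTES = 32 * 1024 ** 3             # 32 GiB (binary)


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 1024 ** 3


headroom = NOMINAL_BYTES - THRESHOLD_BYTES
print(f"threshold = {to_gib(THRESHOLD_BYTES):.2f} GiB")
print(f"nominal   = {to_gib(NOMINAL_BYTES):.2f} GiB")
print(f"headroom  = {headroom:,} bytes")
```

In other words, an instance needs roughly 29.8 GiB of reported memory to pass the check.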

In [2]:
%store -r setup_dependencies_passed

In [3]:
try:
    setup_dependencies_passed
except NameError:
    print("++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOU HAVE TO RUN NOTEBOOK #01.         ")
    print("You did not install the required libraries.   ")
    print("++++++++++++++++++++++++++++++++++++++++++++++")

In [4]:
model_checkpoint='google/flan-t5-base'

In [5]:
%store model_checkpoint

Stored 'model_checkpoint' (str)


In [6]:
%store -r model_checkpoint

In [7]:
huggingface_dataset_name = "knkarthick/dialogsum"

# Load the Summarization Dataset

In [8]:
# Import the "load_dataset" function from the "datasets" library
from datasets import load_dataset

# Load the specified Hugging Face dataset with "load_dataset"
# Here huggingface_dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(huggingface_dataset_name)


Downloading and preparing dataset csv/knkarthick--dialogsum to /root/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-391706c81424fc80/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1...


Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-391706c81424fc80/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1. Subsequent calls will reuse this data.



In [9]:
"""
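Pick samples at specific indices of the test split and print the dialogue of the first one
"""

# List of the indices of the samples we want to extract
example_indices = [40, 80, 160]

# Print a header for the example input dialogue
print('Example Input Dialogue:')

# Access a sample of the test split by index and print its dialogue
# 'dialogue' is the dataset field that holds the conversation text
print(dataset['test'][example_indices[0]]['dialogue'])

# Print an empty line to make the output easier to read
print()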

Example Input Dialogue:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
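
The printed sample shows the DialogSum format: one turn per line, each prefixed with a speaker tag such as `#Person1#`. The sketch below makes that structure explicit with a hypothetical helper, `parse_turns` (not part of the `datasets` library), assuming every turn starts with a `#PersonN#:` tag on its own line.

```python
import re
from collections import Counter

# Assumption: each turn is on its own line and starts with a "#PersonN#:" tag,
# as in the DialogSum sample printed above.
TURN_PATTERN = re.compile(r"^#(Person\d+)#:\s*(.*)$")


def parse_turns(dialogue: str) -> list[tuple[str, str]]:
    """Split a DialogSum-style dialogue into (speaker, utterance) pairs."""
    turns = []
    for line in dialogue.splitlines():
        match = TURN_PATTERN.match(line.strip())
        if match:  # skip lines that do not start with a speaker tag
            turns.append((match.group(1), match.group(2)))
    return turns


sample = (
    "#Person1#: What time is it, Tom?\n"
    "#Person2#: Just a minute. It's ten to nine by my watch.\n"
    "#Person1#: Is it? I had no idea it was so late. I must be off now."
)
turns = parse_turns(sample)
speaker_counts = Counter(speaker for speaker, _ in turns)
print(speaker_counts)
```

These speaker tags are also why, without an instruction, the model tends to answer in the same `Person1: ...` style: it treats the input as a dialogue to continue.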



# Load the LLM and Its Tokenizer

In [11]:
# Import the "AutoTokenizer" class from the "transformers" library
from transformers import AutoTokenizer

# Load the tokenizer of the pretrained model with "AutoTokenizer.from_pretrained"
# Here model_checkpoint = 'google/flan-t5-base'
# "use_fast=True" selects the fast tokenizer, which is backed by a Rust implementation and runs faster than the pure-Python tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)
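
Calling the tokenizer on text, as the cells below do, returns a dictionary with `input_ids` and `attention_mask`. The toy sketch below is *not* the FLAN-T5 SentencePiece tokenizer: it uses a made-up whitespace vocabulary purely to illustrate the shape of that output, including the end-of-sequence token and how padding is masked out in a batch. `PAD_ID = 0` and `EOS_ID = 1` mirror T5's convention; every other ID is invented.

```python
# Toy illustration only: a whitespace "tokenizer" with an invented vocabulary.
PAD_ID, EOS_ID = 0, 1  # T5 uses pad=0 and eos=1; all other IDs here are made up


def toy_tokenize(texts: list[str]) -> dict[str, list[list[int]]]:
    vocab: dict[str, int] = {}
    encoded = []
    for text in texts:
        ids = []
        for word in text.split():
            # Give each new word the next free ID, starting at 2.
            ids.append(vocab.setdefault(word, len(vocab) + 2))
        encoded.append(ids + [EOS_ID])  # T5-style: end every sequence with </s>
    max_len = max(len(ids) for ids in encoded)
    # Pad shorter sequences on the right so the batch is rectangular.
    input_ids = [ids + [PAD_ID] * (max_len - len(ids)) for ids in encoded]
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in encoded]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = toy_tokenize(["What time is it", "Just a minute"])
print(batch)
```

The notebook tokenizes one string at a time, so no padding occurs there and the attention mask is all ones.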

In [12]:
# Import the "AutoModelForSeq2SeqLM" class from the "transformers" library
from transformers import AutoModelForSeq2SeqLM

# Load the pretrained sequence-to-sequence language model with "AutoModelForSeq2SeqLM.from_pretrained"
# Here model_checkpoint = 'google/flan-t5-base'
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)


# Explore What Happens Without Prompt Engineering

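Without any prompt engineering on the dialogue text, you can see below that the model is unsure which task it should perform, so it tries to make up the next line of the dialogue. The model's guess is not bad, but it is not the task we want the model to perform here.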

In [13]:
# Get a dialogue and its summary from the test split
dialogue = dataset['test'][example_indices[0]]['dialogue']
summary = dataset['test'][example_indices[0]]['summary']

# Encode the dialogue with the tokenizer; the result is a dict that includes 'input_ids' and 'attention_mask'
# return_tensors='pt' returns PyTorch tensors
inputs = tokenizer(dialogue, return_tensors='pt')

# Generate new text with the model
# 'model.generate' takes 'input_ids' as input
# 'max_new_tokens' caps the number of newly generated tokens
# 'tokenizer.decode' turns the generated token IDs back into text
# 'skip_special_tokens=True' drops special tokens during decoding (for T5, e.g. <pad> and </s>)
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"], 
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

# Print the input and the model's generated text
print(f'INPUT PROMPT:\n{dialogue}\n')
print(f'MODEL GENERATION:\n{output}')

INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

MODEL GENERATION:
Person1: It's ten to nine.
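
Each generation cell in this notebook repeats the same tokenize, generate, and decode steps. The sketch below folds them into one function, `generate_text`. It is a hypothetical helper introduced here, not part of `transformers`, and it only calls the methods the cells already use, so it should work with the `model` and `tokenizer` loaded above.

```python
from typing import Any


def generate_text(model: Any, tokenizer: Any, prompt: str, max_new_tokens: int = 50) -> str:
    """Tokenize a prompt, generate with the model, and decode the first sequence.

    Follows the same pattern as the notebook cells:
    tokenizer(...) -> model.generate(input_ids, max_new_tokens=...) -> tokenizer.decode(...).
    """
    # Encode the prompt; the result holds 'input_ids' (and 'attention_mask').
    inputs = tokenizer(prompt, return_tensors="pt")
    # Generate at most max_new_tokens new tokens from the encoded prompt.
    output_ids = model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)
    # Decode the first (and only) generated sequence, dropping special tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `print(generate_text(model, tokenizer, dialogue))` should reproduce the generation printed above, because it passes the same arguments to `generate`. Keeping the model and tokenizer as parameters also lets you swap in another checkpoint without editing the function.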


In [14]:
# Get a dialogue and its summary from the test split
# Note that this time we take the sample at index `example_indices[1]`
dialogue = dataset['test'][example_indices[1]]['dialogue']
summary = dataset['test'][example_indices[1]]['summary']

# Encode the dialogue with the tokenizer; the result is a dict that includes 'input_ids' and 'attention_mask'
# return_tensors='pt' returns PyTorch tensors
inputs = tokenizer(dialogue, return_tensors='pt')

# Generate new text with the model
# 'model.generate' takes 'input_ids' as input; 'max_new_tokens' caps the number of newly generated tokens
# 'tokenizer.decode' turns the generated token IDs back into text; 'skip_special_tokens=True' drops special tokens (for T5, e.g. <pad> and </s>)
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"], 
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

# Print the input and the model's generated text
print(f'INPUT PROMPT:\n{dialogue}\n')
print(f'MODEL GENERATION:\n{output}')

INPUT PROMPT:
#Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napkins disposable plates, cups and picnic blanket.
#Person1#: All set. May, can you help me take all these things to the living room?
#Person2#: Yes, madam.
#Person1#: Ask Daniel to give you a hand?
#Person2#: No, mom, I can manage it by myself. His help just causes more trouble.

MODEL GENERATION:
#Person1#: May, can you help me prepare the picnic?


# Add an Instruction Prompt to the Model Input

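Adding an instruction ("Summarize the following conversation.") before the dialogue and a "Summary:" cue at the end seems to really help the model understand what it should do. Note, however, that the model still does not capture the nuances of the dialogue. This is what we hope to address with fine-tuning.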
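
The prompt used in the next cell is just string concatenation: an instruction, the dialogue, and a cue that marks where the answer should begin. Because no solved examples are included, this is still zero-shot prompting. A minimal sketch, with `build_instruction_prompt` as a hypothetical helper name and the notebook's strings as defaults:

```python
def build_instruction_prompt(
    dialogue: str,
    start_prompt: str = "Summarize the following conversation.\n",
    end_prompt: str = "\n\nSummary: ",
) -> str:
    """Wrap a dialogue with an instruction before it and an answer cue after it."""
    return f"{start_prompt}{dialogue}{end_prompt}"


example_prompt = build_instruction_prompt("#Person1#: Hi.\n#Person2#: Hello.")
print(repr(example_prompt))
```

Printing with `repr` makes the blank line before `Summary:` and the trailing space visible; small details like these are part of what the model conditions on.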

In [16]:
"""
This cell is similar to the previous ones: it takes a dialogue and its summary from the dataset, encodes the dialogue with the tokenizer, and lets the model generate new text.
This time, however, it builds a complete prompt that starts with "Summarize the following conversation." and ends with "\n\nSummary: ".
"""

# Define the start and end prompts
start_prompt = 'Summarize the following conversation.\n'
end_prompt = '\n\nSummary: '

# Get a dialogue and its summary from the test split
dialogue = dataset['test'][example_indices[0]]['dialogue']
summary = dataset['test'][example_indices[0]]['summary']

# Build the complete prompt
prompt = f'{start_prompt}{dialogue}{end_prompt}'

# Encode the prompt with the tokenizer; the result is a dict that includes 'input_ids' and 'attention_mask'
# return_tensors='pt' returns PyTorch tensors
inputs = tokenizer(prompt, return_tensors='pt')

# Generate new text with the model
# 'model.generate' takes 'input_ids' as input; 'max_new_tokens' caps the number of newly generated tokens
# 'tokenizer.decode' turns the generated token IDs back into text; 'skip_special_tokens=True' drops special tokens (for T5, e.g. <pad> and </s>)
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"], 
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

# Print the input prompt, the model generation, and the baseline summary
print(f'INPUT PROMPT:\n{prompt}\n')
print(f'MODEL GENERATION:\n{output}\n')
print(f'BASELINE SUMMARY:\n{summary}')

INPUT PROMPT:
Summarize the following conversation.
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

Summary: 

MODEL GENERATION:
The train is about to leave.

BASELINE SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.


In [17]:
start_prompt = 'Summarize the following conversation.\n'
end_prompt = '\n\nSummary: '
dialogue = dataset['test'][example_indices[1]]['dialogue']
summary = dataset['test'][example_indices[1]]['summary']
prompt = f'{start_prompt}{dialogue}{end_prompt}'

inputs = tokenizer(prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"], 
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)
print(f'INPUT PROMPT:\n{prompt}\n')
print(f'MODEL GENERATION:\n{output}\n')
print(f'BASELINE SUMMARY:\n{summary}')

INPUT PROMPT:
Summarize the following conversation.
#Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napkins disposable plates, cups and picnic blanket.
#Person1#: All set. May, can you help me take all these things to the living room?
#Person2#: Yes, madam.
#Person1#: Ask Daniel to give you a hand?
#Person2#: No, mom, I can manage it by myself. His help just causes more trouble.

Summary: 

MODEL GENERATION:
The weather report says it will be sunny all day.

BASELINE SUMMARY:
Mom asks May to help to prepare for the picnic and May agrees.


# Try Different FLAN Prompt Templates

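The next natural thing to check is whether the text at the start and end of the dialogue is well suited to the task at hand. FLAN has published many [prompt templates](https://github.com/google-research/FLAN/tree/main/flan/v2) for specific tasks. As you can see below, choosing one of the prebuilt FLAN prompts does help with the second example, but the model still struggles to understand the nuances of the first dialogue.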
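
Since each template is just a (start, end) pair of strings, it can help to keep the candidates in one place and apply them uniformly when comparing. The sketch below holds only the two templates used in this notebook; the dictionary keys and the `apply_template` helper are hypothetical names, not identifiers from the FLAN repository.

```python
# The (start_prompt, end_prompt) pairs used in this notebook.
# The keys are illustrative labels, not names from the FLAN repository.
TEMPLATES: dict[str, tuple[str, str]] = {
    "summarize_instruction": ("Summarize the following conversation.\n", "\n\nSummary: "),
    "what_was_going_on": ("Dialogue:\n", "\nWhat was going on?"),
}


def apply_template(name: str, dialogue: str) -> str:
    """Wrap a dialogue with the start and end text of the named template."""
    start_prompt, end_prompt = TEMPLATES[name]
    return f"{start_prompt}{dialogue}{end_prompt}"


sample_dialogue = "#Person1#: Hi.\n#Person2#: Hello."
for template_name in TEMPLATES:
    print(f"--- {template_name} ---")
    print(apply_template(template_name, sample_dialogue))
```

Each resulting string can then go through the same tokenize, generate, and decode steps as the cells in this notebook, so only the template changes between runs.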

In [18]:
start_prompt = 'Dialogue:\n'
end_prompt = '\nWhat was going on?'
dialogue = dataset['test'][example_indices[0]]['dialogue']
summary = dataset['test'][example_indices[0]]['summary']
prompt = f'{start_prompt}{dialogue}{end_prompt}'

inputs = tokenizer(prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"], 
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)
print(f'INPUT PROMPT:\n{prompt}\n')
print(f'MODEL GENERATION:\n{output}\n')
print(f'BASELINE SUMMARY:\n{summary}')

INPUT PROMPT:
Dialogue:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
What was going on?

MODEL GENERATION:
Tom is late for the train.

BASELINE SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.


In [19]:
start_prompt = 'Dialogue:\n'
end_prompt = '\nWhat was going on?'
dialogue = dataset['test'][example_indices[1]]['dialogue']
summary = dataset['test'][example_indices[1]]['summary']
prompt = f'{start_prompt}{dialogue}{end_prompt}'

inputs = tokenizer(prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"], 
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)
print(f'INPUT PROMPT:\n{prompt}\n')
print(f'MODEL GENERATION:\n{output}\n')
print(f'BASELINE SUMMARY:\n{summary}')

INPUT PROMPT:
Dialogue:
#Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napkins disposable plates, cups and picnic blanket.
#Person1#: All set. May, can you help me take all these things to the living room?
#Person2#: Yes, madam.
#Person1#: Ask Daniel to give you a hand?
#Person2#: No, mom, I can manage it by myself. His help just causes more trouble.
What was going on?

MODEL GENERATION:
Person1 wants to prepare for the picnic. May will help her prepare the food.

BASELINE SUMMARY:
Mom asks May to help to prepare for the picnic and May agrees.


# Use Few-Shot Inference

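Few-shot inference means giving a large language model (LLM) a few examples of the output it should generate for a specific task. You can read more in this [blog post](https://huggingface.co/blog/few-shot-learning-gpt-neo-and-inference-api) from Hugging Face, which explains why few-shot inference is a useful tool and how to use it.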

In the example, the model is given 2 examples, which provides it with more information and qualitatively improves the summary it produces below.
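
The `make_prompt` function in this notebook reads its examples from the global `dataset` and `example_indices`. As a sketch under that same layout, the hypothetical `make_few_shot_prompt` below takes the solved examples and the target dialogue explicitly, so it can be tried on any strings. Passing an empty example list yields a zero-shot prompt.

```python
def make_few_shot_prompt(
    examples: list[tuple[str, str]],
    target_dialogue: str,
    start_prompt: str = "Dialogue:\n",
    end_prompt: str = "\nWhat was going on? ",
    stop_sequence: str = "\n\n\n",
) -> str:
    """Concatenate solved (dialogue, summary) examples, then the unsolved target.

    Each example ends with its summary followed by the separator; the target
    ends with end_prompt only, leaving the summary for the model to generate.
    """
    parts = [
        f"{start_prompt}{dialogue}{end_prompt}{summary}\n{stop_sequence}\n"
        for dialogue, summary in examples
    ]
    parts.append(f"{start_prompt}{target_dialogue}{end_prompt}")
    return "".join(parts)


shots = [("#Person1#: Hi.", "They greet."), ("#Person1#: Bye.", "They part.")]
demo_prompt = make_few_shot_prompt(shots, "#Person1#: Thanks.")
print(demo_prompt)
```

Every extra example lengthens the input, so the number of shots is limited in practice by how much context the model handles well.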

In [20]:
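# Prompt text placed before each dialogue
start_prompt = 'Dialogue:\n'
# Prompt text placed after each dialogue
end_prompt = '\nWhat was going on? '
# Separator appended after each completed example to mark where it ends
stop_sequence = '\n\n\n'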

In [21]:
"""
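Build the few-shot prompt string. For each of the num_shots examples, it appends a dialogue and its
summary from the test split (example_indices[1], example_indices[2], ...). Finally, it appends the
target dialogue (example_indices[0]) followed only by the end prompt, without a summary or stop
sequence, so that the model has to generate the summary.
"""

def make_prompt(num_shots):
    # Start from an empty prompt
    prompt = ''
    # Loop num_shots + 1 times: num_shots examples plus the target dialogue
    for i in range(num_shots + 1):
        # On the last iteration, add the target dialogue
        if i == num_shots:
            # Dialogue to be summarized
            dialogue = dataset['test'][example_indices[0]]['dialogue']
            # Reference summary (not included in the prompt)
            summary = dataset['test'][example_indices[0]]['summary']
            # Append only the dialogue and the end prompt
            prompt = prompt + f'{start_prompt}{dialogue}{end_prompt}'
        else:
            # Otherwise, add a solved example
            # Example dialogue from the test split
            dialogue = dataset['test'][example_indices[i+1]]['dialogue']
            # Example summary from the test split
            summary = dataset['test'][example_indices[i+1]]['summary']
            # Append the dialogue, end prompt, summary, and stop sequence
            prompt = prompt + f'{start_prompt}{dialogue}{end_prompt}{summary}\n{stop_sequence}\n'
    # Return the assembled prompt
    return prompt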

In [22]:
# Call make_prompt with 2 to build a prompt containing 2 examples (2-shot)
few_shot_prompt = make_prompt(2)
# Print the generated prompt
print(few_shot_prompt)

Dialogue:
#Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to take napkins disposable plates, cups and picnic blanket.
#Person1#: All set. May, can you help me take all these things to the living room?
#Person2#: Yes, madam.
#Person1#: Ask Daniel to give you a hand?
#Person2#: No, mom, I can manage it by myself. His help just causes more trouble.
What was going on? Mom asks May to help to prepare for the picnic and May agrees.




Dialogue:
#Person1#: Did you hear about Lulu?
#Person2#: No, what?
#Person1#: She and Vic broke up and now she ' s asked for a transfer.
#Person2#: Get out of here! I didn ' t even know

In [23]:
# Encode the prompt with the tokenizer and return PyTorch tensors
inputs = tokenizer(few_shot_prompt, return_tensors='pt')

# Generate a response with the model
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],  # encoded input IDs
        max_new_tokens=50,  # maximum number of newly generated tokens
    )[0], 
    skip_special_tokens=True  # skip special tokens when decoding
)

# Print the model's response
print(f'FEW SHOT RESPONSE: {output}')

# Get the expected response (the reference summary) from the test split
summary = dataset['test'][example_indices[0]]['summary']

# Print the expected response
print(f'EXPECTED RESPONSE: {summary}')

FEW SHOT RESPONSE: Tom is late for the train. He has to catch it at 9:30.
EXPECTED RESPONSE: #Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.


# Conclusion

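As you can see, for this use case prompt engineering takes us a long way toward the goal, but it has limitations. Next, we will start exploring how fine-tuning can help your LLM understand a specific use case more deeply.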

# Release Resources

In [None]:
%%html

<p><b>Shutting down your kernel for this notebook to release resources.</b></p>
<button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>
        
<script>
try {
    els = document.getElementsByClassName("sm-command-button");
    els[0].click();
}
catch(err) {
    // NoOp
}    
</script>