<table style="width:100%">
<tr>
<td style="vertical-align:middle; text-align:left;">
<font size="2">
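Supplementary code for the book <a href="http://mng.bz/orYv">Build a Large Language Model From Scratch</a> by <a href="https://sebastianraschka.com">Sebastian Raschka</a><br>
<br>Chinese translation and detailed code annotations compiled by Lux; GitHub: <a href="https://github.com/luxianyu">https://github.com/luxianyu</a>
<br>Lux's GitHub also hosts study notes for Andrew Ng's deep learning course (PyTorch version) and code with detailed Chinese annotations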
</font>
</td>
<td style="vertical-align:middle; text-align:left;">
<a href="http://mng.bz/orYv"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp" width="100px"></a>
</td>
</tr>
</table>


# Generating a Preference Dataset with Llama 3.1 70B and Ollama


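- Preference finetuning is the process of aligning an instruction-finetuned LLM with human preferences
- There are multiple ways to create a dataset for preference finetuning an LLM:
  1. Use an instruction-finetuned LLM to generate multiple responses and have humans rank them based on their preference and/or given preference criteria
  2. Use an instruction-finetuned LLM to generate multiple responses and have an LLM rank them based on given preference criteria
  3. Use an LLM to generate preferred and dispreferred responses given certain preference criteria
- In this notebook, we consider approach 3
- This notebook uses a 70-billion-parameter Llama 3.1-Instruct model through Ollama to generate preference labels for an instruction dataset
- The expected format of the instruction dataset is as follows:

### Input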

```json
[
    {
        "instruction": "What is the state capital of California?",
        "input": "",
        "output": "The state capital of California is Sacramento."
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'."
    },
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens."
    },
...
]
```
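A quick structural check can catch malformed entries before any model calls are made. The following is a minimal sketch, assuming every entry is a dictionary with the three string fields shown above; `validate_entries` and `REQUIRED_KEYS` are hypothetical names introduced for illustration and are not part of the book's code.

```python
REQUIRED_KEYS = ("instruction", "input", "output")


def validate_entries(entries):
    """Return a list of (index, problem) tuples; an empty list means no problems were found."""
    problems = []
    for i, entry in enumerate(entries):
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append((i, f"missing key '{key}'"))
            elif not isinstance(entry[key], str):
                problems.append((i, f"'{key}' is not a string"))
        # 'input' may legitimately be empty; 'instruction' and 'output' should not be
        for key in ("instruction", "output"):
            if isinstance(entry.get(key), str) and not entry[key].strip():
                problems.append((i, f"'{key}' is empty"))
    return problems


sample = [
    {"instruction": "What is the capital of Greece?", "input": "",
     "output": "The capital of Greece is Athens."},
    {"instruction": "Provide a synonym for 'fast'.", "input": ""},  # 'output' left out on purpose
]
print(validate_entries(sample))  # [(1, "missing key 'output'")]
```

Returning a list of problems instead of raising on the first one makes it easier to see how widespread an issue is in a large file.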

### Output

The output dataset will look as follows, where more polite responses are preferred (`'chosen'`) and more impolite responses are dispreferred (`'rejected'`):

```json
[
    {
        "instruction": "What is the state capital of California?",
        "input": "",
        "output": "The state capital of California is Sacramento.",
        "rejected": "Look, the state capital of California is obviously Sacramento.",
        "chosen": "The state capital of California is Sacramento."
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'.",
        "chosen": "A suitable alternative to 'fast' would be 'quick'.",
        "rejected": "A synonym for 'fast' is 'quick'."
    },
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens.",
        "chosen": "I'd be happy to help! The capital of Greece is indeed Athens.",
        "rejected": "The capital of Greece is Athens."
    },
...
]
```

- The code doesn't require a GPU and runs on a laptop given enough RAM


In [1]:
from importlib.metadata import version

pkgs = ["tqdm",    # Progress bar
        ]

for p in pkgs:
    print(f"{p} version: {version(p)}")

tqdm version: 4.66.4


## Installing Ollama and Downloading Llama 3.1


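- Ollama is an application for running large language models (LLMs) efficiently
- It is a wrapper around [llama.cpp](https://github.com/ggerganov/llama.cpp), which implements LLMs in pure C/C++ to maximize efficiency
- Note that it is a tool for generating text with LLMs (inference), not for training or finetuning them
- Before running the code below, install Ollama by visiting [https://ollama.com](https://ollama.com) and following the instructions (for instance, clicking the "Download" button and downloading the Ollama application for your operating system)


- For macOS and Windows users: click on the Ollama application you downloaded; if it prompts you to install the command-line usage, say "yes"
- Linux users can use the installation command provided on the Ollama website

- In general, before we can use Ollama from the command line, we have to either start the Ollama application or run `ollama serve` in a separate terminal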

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/ollama-eval/ollama-serve.webp?1">
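Whether the server is reachable can also be checked from Python before any requests are sent. The sketch below is a minimal, assumption-laden example: it assumes Ollama listens on the default address `http://localhost:11434` (the same host and port used by the API calls later in this notebook) and that its root path answers with HTTP 200 while the server is up. `is_ollama_running` is a hypothetical helper name, and only the standard library is used.

```python
import urllib.request


def is_ollama_running(base_url="http://localhost:11434", timeout=2):
    """Return True if an HTTP server answers with status 200 at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP errors
        # (urllib's URLError and HTTPError are subclasses of OSError)
        return False


if not is_ollama_running():
    print("Ollama does not seem to be running; start the app or run `ollama serve` first.")
```

Note that this only confirms that something responds on that port; it does not verify that a particular model has been downloaded.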

- With the Ollama application or `ollama serve` running, execute the following command in a different terminal to try out the 70-billion-parameter Llama 3.1 model


```bash
# 70B model
ollama run llama3.1:70b
```


The output looks like the following:


```
$ ollama run llama3.1:70b
pulling manifest
pulling aa81b541aae6... 100% ▕████████████████▏ 39 GB
pulling 8cf247399e57... 100% ▕████████████████▏ 1.7 KB
pulling f1cd752815fc... 100% ▕████████████████▏ 12 KB
pulling 56bb8bd477a5... 100% ▕████████████████▏ 96 B
pulling 3c1c2d3df5b3... 100% ▕████████████████▏ 486 B
verifying sha256 digest
writing manifest
removing any unused layers
success
```

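- Note that `llama3.1:70b` refers to the instruction-finetuned 70-billion-parameter Llama 3.1 model.

- Alternatively, you can use the smaller, more resource-efficient 8-billion-parameter Llama 3.1 model by replacing `llama3.1:70b` with `llama3.1`.

- After the download has completed, you will see a command-line prompt that lets you chat with the model.

- Try a prompt like "What do llamas eat?", which should return an output similar to the following: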


```
>>> What do llamas eat?
Llamas are ruminant animals, which means they have a four-chambered 
stomach and eat plants that are high in fiber. In the wild, llamas 
typically feed on:
1. Grasses: They love to graze on various types of grasses, including tall 
grasses, wheat, oats, and barley.
```

- You can end this session by typing `/bye`.


## Using Ollama's REST API


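- Another way to interact with the model is through its REST API in Python, via the function below
- Before running the next cells in this notebook, make sure Ollama is still running, as described above, either by:
  - running `ollama serve` in a terminal
  - or opening the Ollama application
- Next, run the following code cell to query the model


- First, let's try the API with a simple example to make sure it works as intended: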


In [2]:
import json
import requests


def query_model(prompt, model="llama3.1:70b", url="http://localhost:11434/api/chat"):
    # Create the data payload as a dictionary
    data = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "options": {
            "seed": 123,
            "temperature": 0,
        }
    }

    # Send the POST request
    with requests.post(url, json=data, stream=True, timeout=30) as r:
        r.raise_for_status()
        response_data = ""
        for line in r.iter_lines(decode_unicode=True):
            if not line:
                continue
            response_json = json.loads(line)
            if "message" in response_json:
                response_data += response_json["message"]["content"]

    return response_data


result = query_model("What do Llamas eat?")
print(result)

Llamas are herbivores, which means they primarily eat plants and plant-based foods. Their diet consists of:

1. **Grasses**: Various types of grasses, including timothy grass, orchard grass, and brome grass.
2. **Hay**: High-quality hay, such as alfalfa or clover hay, is a staple in a llama's diet.
3. **Leaves**: Leaves from trees and shrubs, like willow, cottonwood, and mesquite, are also eaten.
4. **Fruits and vegetables**: Llamas enjoy fruits like apples, carrots, and sweet potatoes, as well as leafy greens like kale and spinach.
5. **Grains**: In moderation, llamas can eat grains like oats, barley, and corn.

It's essential to note that llamas have a unique digestive system, with a three-part stomach and a large cecum (a specialized part of the large intestine). This allows them to break down and extract nutrients from plant material more efficiently than many other animals.

A typical llama diet might consist of:

* 1-2% of their body weight in hay per day
* 0.5-1% of their body w
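The loop inside `query_model` assembles the reply from a stream of newline-delimited JSON objects, each carrying a small piece of the answer under `message.content`. The sketch below isolates that step so it can be run without a server. The `fake_stream` lines are hand-written stand-ins, not captured server output, and `collect_stream_content` is a hypothetical helper name.

```python
import json


def collect_stream_content(lines):
    """Concatenate the 'content' pieces from newline-delimited JSON chat chunks."""
    pieces = []
    for line in lines:
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        if "message" in chunk:
            pieces.append(chunk["message"]["content"])
    return "".join(pieces)


# Hand-written example chunks (assumed shape: one JSON object per line)
fake_stream = [
    '{"message": {"role": "assistant", "content": "Llamas "}, "done": false}',
    '',
    '{"message": {"role": "assistant", "content": "eat grass."}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(collect_stream_content(fake_stream))  # Llamas eat grass.
```

Collecting the pieces in a list and joining once at the end avoids repeated string concatenation, which matters little here but scales better for long responses.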

## Loading the JSON Entries


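- Now, let's get to the data generation part
- Here, as a hands-on example, we use the `instruction-data.json` file that we originally used to instruction-finetune the model in chapter 7: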


In [3]:
from pathlib import Path

json_file = Path("..", "01_main-chapter-code", "instruction-data.json")

with open(json_file, "r") as file:
    json_data = json.load(file)

print("Number of entries:", len(json_data))

Number of entries: 1100


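- The file has the following structure: each entry contains the reference response (`'output'`) that the model was trained to generate via instruction finetuning, based on the `'instruction'` and `'input'`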


In [4]:
json_data[0]

{'instruction': 'Evaluate the following phrase by transforming it into the spelling given.',
 'input': 'freind --> friend',
 'output': 'The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".'}

- Below is a small utility function that formats the instruction and input:


In [5]:
def format_input(entry):
    instruction_text = (
        f"Below is an instruction that describes a task. Write a response that "
        f"appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )

    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""

    return instruction_text + input_text
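To see what the helper produces, it can be applied to one entry with an `'input'` field and one without. The snippet below repeats the function so that it runs on its own; `with_input` and `without_input` are illustrative entries written for this example.

```python
def format_input(entry):
    # Same logic as the notebook's helper, repeated so this snippet is self-contained
    instruction_text = (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    # The "### Input" section is only added when the entry has a non-empty input
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""
    return instruction_text + input_text


with_input = {
    "instruction": "Evaluate the following phrase by transforming it into the spelling given.",
    "input": "freind --> friend",
}
without_input = {"instruction": "What is the capital of Greece?", "input": ""}

print(format_input(with_input))
print("-" * 40)
print(format_input(without_input))
```

The first printout ends with an `### Input:` section, while the second stops after the instruction because its `'input'` is an empty string.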

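- Now, let's try the Ollama API to generate a `'chosen'` and a `'rejected'` response for preference finetuning a model
- Here, for demonstration purposes, we create responses that are more or less polite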


In [6]:
import random


for entry in json_data[:5]:
    
    politeness = random.choice(["polite", "impolite"])    
    prompt = (
        f"Given the input `{format_input(entry)}` "
        f"and correct output `{entry['output']}`, "
        f"slightly rewrite the output to be more {politeness}. "
        "Keep the modification minimal. "
        "Only return the generated response and nothing else."
    )
    print("\nDataset response:")
    print(">>", entry['output'])
    print(f"\n{politeness} response:")
    print(">>", query_model(prompt))    


Dataset response:
>> The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".

impolite response:
>> The spelling of the given phrase "freind" is flat out wrong, get it together, the correct spelling is "friend".

Dataset response:
>> He goes to the park every day.

polite response:
>> He goes to the park daily, if I'm not mistaken.

Dataset response:
>> 45 kilometers is 45000 meters.

polite response:
>> 45 kilometers is equivalent to 45000 meters.

Dataset response:
>> Although it was raining, they went for a walk.

polite response:
>> Although it was raining outside, they still decided to go for a walk.

Dataset response:
>> 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.

impolite response:
>> Here are your precious square numbers: 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.


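- If the generated responses above look reasonable, we can move on and apply the prompt to the whole dataset
- Here, we add a `'chosen'` key for the preferred response and a `'rejected'` key for the dispreferred response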


In [7]:
import random
from tqdm import tqdm

def generate_model_responses(json_data):

    for i, entry in enumerate(tqdm(json_data, desc="Writing entries")):
        politeness = random.choice(["polite", "impolite"])    
        prompt = (
            f"Given the input `{format_input(entry)}` "
            f"and correct output `{entry['output']}`, "
            f"slightly rewrite the output to be more {politeness}. "
            "Keep the modification minimal. "
            "Only return the generated response and nothing else."
        )
        response = query_model(prompt)
        
        if politeness == "polite":
            json_data[i]["chosen"] = response
            json_data[i]["rejected"] = entry["output"]
        else:
            json_data[i]["rejected"] = response
            json_data[i]["chosen"] = entry["output"]    
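Because `random.choice` in `generate_model_responses` draws from the unseeded global generator, the polite/impolite assignment changes from run to run. If a repeatable split is preferred, one option is to draw the labels from a dedicated, seeded generator. The following is a minimal sketch of that idea; `assign_politeness` and the seed value are hypothetical choices, not part of the original notebook.

```python
import random


def assign_politeness(num_entries, seed=123):
    """Draw one 'polite'/'impolite' label per entry from a seeded, local generator."""
    rng = random.Random(seed)  # local RNG, so the global random state is left untouched
    return [rng.choice(["polite", "impolite"]) for _ in range(num_entries)]


labels = assign_politeness(5)
print(labels)
print(labels == assign_politeness(5))  # True: the same seed gives the same sequence
```

The labels could then be paired with the entries, for example via `zip(json_data, labels)`, instead of calling `random.choice` inside the loop. This makes the label assignment repeatable, but not necessarily the model's responses.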

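- Now, let's apply this generation step to the whole dataset (in the run recorded below, the 1,100 entries took about 17 minutes)
- Note that Ollama is not fully deterministic across operating systems (as of this writing), so the responses you get may differ slightly from the ones shown above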


In [8]:
generate_model_responses(json_data)

Writing entries: 100%|██████████| 1100/1100 [17:20<00:00,  1.06it/s]


In [10]:
with open("instruction-data-with-preference.json", "w") as file:
    json.dump(json_data, file, indent=4)
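Before the saved file is used for preference finetuning, it can be worth checking that every entry actually carries a usable pair. The sketch below flags entries where a response is missing or where the model returned the original text unchanged, in which case `'chosen'` and `'rejected'` are identical and provide no preference signal. `check_preference_entries` is a hypothetical helper, and the `sample` entries are made up for illustration.

```python
def check_preference_entries(entries):
    """Return the indices of entries whose chosen/rejected pair looks unusable."""
    bad = []
    for i, entry in enumerate(entries):
        chosen = entry.get("chosen", "").strip()
        rejected = entry.get("rejected", "").strip()
        if not chosen or not rejected:
            bad.append(i)  # one of the two responses is missing or empty
        elif chosen == rejected:
            bad.append(i)  # identical responses carry no preference signal
    return bad


sample = [
    {"output": "Athens.", "chosen": "The capital is Athens.", "rejected": "Athens."},
    {"output": "Quick.", "chosen": "Quick.", "rejected": "Quick."},  # rewrite identical to original
    {"output": "Sacramento.", "chosen": "Sacramento."},               # 'rejected' missing
]
print(check_preference_entries(sample))  # [1, 2]
```

Flagged entries could be regenerated or dropped, depending on how many there are.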