<table style="width:100%">
<tr>
<td style="vertical-align:middle; text-align:left;">
<font size="2">
Supplementary code for the <a href="http://mng.bz/orYv">Build a Large Language Model From Scratch</a> book by <a href="https://sebastianraschka.com">Sebastian Raschka</a><br>
<br>Code repository: <a href="https://github.com/rasbt/LLMs-from-scratch">https://github.com/rasbt/LLMs-from-scratch</a>
</font>
</td>
<td style="vertical-align:middle; text-align:left;">
<a href="http://mng.bz/orYv"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp" width="100px"></a>
</td>
</tr>
</table>


# 7.4 Generating a Preference Dataset with Llama 3.1 70B and Ollama


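- Preference finetuning is a process that aligns an instruction-finetuned LLM with human preferences
- There are multiple ways to create a dataset for preference finetuning an LLM
  1. We use the instruction-finetuned LLM to generate multiple responses and have humans rank them based on their preference and/or given preference criteria
  2. We use the instruction-finetuned LLM to generate multiple responses and have LLMs rank them based on given preference criteria
  3. We use an LLM to generate preferred and dispreferred responses given certain preference criteria

- In this section, we consider approach 3
- This notebook uses a Llama 3.1-Instruct model via ollama to generate preference labels for an instruction dataset
- The expected format of the instruction dataset is as follows: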


### Input

```json
[
    {
        "instruction": "What is the state capital of California?",
        "input": "",
        "output": "The state capital of California is Sacramento."
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'."
    },
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens."
    },
...
]
```

### Output

The output dataset will look as follows, where more polite responses are preferred (`'chosen'`) and less polite responses are dispreferred (`'rejected'`):

```json
[
    {
        "instruction": "What is the state capital of California?",
        "input": "",
        "output": "The state capital of California is Sacramento.",
        "rejected": "Look, the state capital of California is obviously Sacramento.",
        "chosen": "The state capital of California is Sacramento."
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'.",
        "chosen": "A suitable alternative to 'fast' would be 'quick'.",
        "rejected": "A synonym for 'fast' is 'quick'."
    },
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens.",
        "chosen": "I'd be happy to help! The capital of Greece is indeed Athens.",
        "rejected": "The capital of Greece is Athens."
    },
...
]
```

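
As a quick sanity check of this format, the following sketch verifies that an entry carries all five keys and that exactly one of `'chosen'` and `'rejected'` is the unmodified `'output'`, which is how the pairs in this notebook are constructed. The helper name `is_valid_preference_entry` is hypothetical and not part of the book's code.

```python
REQUIRED_KEYS = {"instruction", "input", "output", "chosen", "rejected"}


def is_valid_preference_entry(entry):
    # Hypothetical helper: checks that all keys are present and that exactly
    # one of 'chosen'/'rejected' is the unmodified dataset response
    if not REQUIRED_KEYS.issubset(entry):
        return False
    chosen_is_original = entry["chosen"] == entry["output"]
    rejected_is_original = entry["rejected"] == entry["output"]
    return chosen_is_original != rejected_is_original


example = {
    "instruction": "What is the capital of Greece?",
    "input": "",
    "output": "The capital of Greece is Athens.",
    "chosen": "I'd be happy to help! The capital of Greece is indeed Athens.",
    "rejected": "The capital of Greece is Athens.",
}
print(is_valid_preference_entry(example))
```

Note that an entry would fail this check if the model happened to return the dataset response unchanged, so a failed check is a prompt to inspect the entry rather than proof of an error.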




- This code doesn't require a GPU and runs on a laptop, provided it has enough RAM

In [1]:
from importlib.metadata import version

pkgs = ["tqdm",    # Progress bar
        ]

for p in pkgs:
    print(f"{p} version: {version(p)}")

tqdm version: 4.66.4


## Installing Ollama and Downloading Llama 3.1

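- Ollama is an application for running LLMs efficiently
- It is a wrapper around [llama.cpp](https://github.com/ggerganov/llama.cpp), which implements LLMs in pure C/C++ to maximize efficiency
- Note that it is a tool for using LLMs to generate text (inference), not for training or finetuning LLMs
- Before running the code below, install ollama by visiting [https://ollama.com](https://ollama.com) and following the instructions (for instance, clicking on the "Download" button and downloading the ollama application for your operating system)
- For macOS and Windows users, click on the ollama application you downloaded; if it prompts you to install the command line usage, say "yes"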


- Linux users can use the installation command provided on the ollama website

- Before we can use ollama from the command line, we have to either start the ollama application or run `ollama serve` in a separate terminal

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/ollama-eval/ollama-serve.webp?1">
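
Whether the server is up can also be checked from Python before running the later cells. The sketch below is not part of the original notebook; it assumes the server listens on port 11434 (the port used by the REST API call later in this notebook) and that the root URL answers with HTTP status 200 while ollama is running. The function name `is_ollama_reachable` is hypothetical.

```python
import urllib.error
import urllib.request


def is_ollama_reachable(base_url="http://localhost:11434", timeout=2.0):
    # Assumption: a running ollama server answers a plain GET on its
    # root URL with status 200; any connection problem counts as "not running"
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


print("Ollama reachable:", is_ollama_reachable())
```

If this prints `False`, start the ollama application or run `ollama serve` before proceeding.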


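- With the ollama application or `ollama serve` running in a separate terminal, execute the following command on the command line to try out the 70-billion-parameter Llama 3.1 model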

```bash
# 70B model
ollama run llama3.1:70b
```


The output looks as follows:

```
$ ollama run llama3.1:70b
pulling manifest
pulling aa81b541aae6... 100% ▕████████████████▏ 39 GB
pulling 8cf247399e57... 100% ▕████████████████▏ 1.7 KB
pulling f1cd752815fc... 100% ▕████████████████▏ 12 KB
pulling 56bb8bd477a5... 100% ▕████████████████▏ 96 B
pulling 3c1c2d3df5b3... 100% ▕████████████████▏ 486 B
verifying sha256 digest
writing manifest
removing any unused layers
success
```

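- Note that `llama3.1:70b` refers to the instruction-finetuned 70-billion-parameter Llama 3.1 model

- Alternatively, you can use the smaller, more resource-efficient 8-billion-parameter Llama 3.1 model by replacing `llama3.1:70b` with `llama3.1`

- After the download has completed, you will see a command line prompt that allows you to chat with the model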

- Try a prompt like "What do llamas eat?", which should return an output similar to the following:

```
>>> What do llamas eat?
Llamas are ruminant animals, which means they have a four-chambered 
stomach and eat plants that are high in fiber. In the wild, llamas 
typically feed on:
1. Grasses: They love to graze on various types of grasses, including tall 
grasses, wheat, oats, and barley.
```


- You can end this session using the input `/bye`


## Using Ollama's REST API


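- Another way to interact with the model is via its REST API, using the Python function below
- Before running the next cell, make sure that ollama is still running, as described above, either via the ollama application or via `ollama serve` in a terminal
- Then, run the following code cell to query the model; it starts with a simple example to make sure the API works as intended: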

In [2]:
import urllib.request
import json


def query_model(prompt, model="llama3.1:70b", url="http://localhost:11434/api/chat"):
    # Create the data payload as a dictionary
    data = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "options": {
            "seed": 123,
            "temperature": 0,
        }
    }

    # Convert the dictionary to a JSON formatted string and encode it to bytes
    payload = json.dumps(data).encode("utf-8")

    # Create a request object, setting the method to POST and adding necessary headers
    request = urllib.request.Request(url, data=payload, method="POST")
    request.add_header("Content-Type", "application/json")

    # Send the request and capture the response
    response_data = ""
    with urllib.request.urlopen(request) as response:
        # Read and decode the response
        while True:
            line = response.readline().decode("utf-8")
            if not line:
                break
            response_json = json.loads(line)
            response_data += response_json["message"]["content"]

    return response_data


result = query_model("What do Llamas eat?")
print(result)

Llamas are herbivores, which means they primarily eat plants and plant-based foods. Their diet consists of:

1. **Grasses**: Various types of grasses, including timothy grass, orchard grass, and brome grass.
2. **Hay**: High-quality hay, such as alfalfa or clover hay, is a staple in a llama's diet.
3. **Leaves**: Leaves from trees and shrubs, like willow, cottonwood, and mesquite, are also eaten.
4. **Fruits and vegetables**: Llamas enjoy fruits like apples, carrots, and sweet potatoes, as well as leafy greens like kale and spinach.
5. **Grains**: In moderation, llamas can eat grains like oats, barley, and corn.

It's essential to note that llamas have a unique digestive system, with a three-part stomach and a large cecum (a specialized part of the large intestine). This allows them to break down and extract nutrients from plant material more efficiently than many other animals.

A typical llama diet might consist of:

* 1-2% of their body weight in hay per day
* 0.5-1% of their body w
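
The `query_model` function above reads the response line by line because the chat endpoint streams its answer as newline-delimited JSON, with each line carrying a piece of the message. The following sketch replays that parsing step on a few hand-written lines so it can be run without a server. The sample lines are illustrative, not real model output, and the `done` field is an assumption about the streaming format that the original function does not rely on.

```python
import json


def join_streamed_chunks(lines):
    # Concatenate the "content" pieces from newline-delimited JSON chunks
    parts = []
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line:
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        # Assumption: the final chunk is flagged with "done": true
        if chunk.get("done"):
            break
    return "".join(parts)


# Hand-written sample lines standing in for a streamed response
sample_lines = [
    b'{"message": {"role": "assistant", "content": "Llamas "}, "done": false}\n',
    b'{"message": {"role": "assistant", "content": "eat grass."}, "done": false}\n',
    b'{"message": {"role": "assistant", "content": ""}, "done": true}\n',
]
print(join_streamed_chunks(sample_lines))
```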

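## Load JSON Entries

- Now, let's get to the data generation part
- Here, for a hands-on example, we use the `instruction-data.json` file that we originally used to instruction-finetune the model in chapter 7: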

In [3]:
from pathlib import Path

json_file = Path("..", "01_main-chapter-code", "instruction-data.json")

with open(json_file, "r") as file:
    json_data = json.load(file)

print("Number of entries:", len(json_data))

Number of entries: 1100



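- The structure of this file is as follows, where `'output'` is the given dataset response that the model was trained to generate via instruction finetuning, based on the `'input'` and `'instruction'`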

In [4]:
json_data[0]

{'instruction': 'Evaluate the following phrase by transforming it into the spelling given.',
 'input': 'freind --> friend',
 'output': 'The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".'}


- Below is a small utility function that formats the instruction and input:

In [5]:
def format_input(entry):
    # Build the instruction part of the prompt
    instruction_text = (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )

    # Append the input section only if the entry has a non-empty input
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""

    return instruction_text + input_text
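
To see what the formatted prompt looks like, the sketch below repeats the function so that it runs on its own and applies it to one entry with an input and one without. Both entries are made up for illustration; only their structure matches the dataset.

```python
def format_input(entry):
    # Same logic as the utility function above, repeated for a standalone sketch
    instruction_text = (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""
    return instruction_text + input_text


# Made-up entries for illustration
with_input = {"instruction": "Correct the spelling.", "input": "freind", "output": ""}
without_input = {"instruction": "Provide a synonym for 'fast'.", "input": "", "output": ""}

print(format_input(with_input))
print("-" * 20)
print(format_input(without_input))
```

The `### Input:` section appears only for the first entry, because an empty `'input'` string is skipped.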

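- Now, let's try the ollama API to generate a `'chosen'` and a `'rejected'` response for preference tuning a model
- Here, for illustration purposes, we create answers that are more or less polite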


In [6]:
import random


for entry in json_data[:5]:

    # Randomly decide whether to ask for a more polite or more impolite rewrite
    politeness = random.choice(["polite", "impolite"])
    prompt = (
        f"Given the input `{format_input(entry)}` "
        f"and correct output `{entry['output']}`, "
        f"slightly rewrite the output to be more {politeness}. "
        "Keep the modification minimal. "
        "Only return the generated response and nothing else."
    )
    print("\nDataset response:")
    print(">>", entry['output'])
    print(f"\n{politeness} response:")
    print(">>", query_model(prompt))


Dataset response:
>> The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".

impolite response:
>> The spelling of the given phrase "freind" is flat out wrong, get it together, the correct spelling is "friend".

Dataset response:
>> He goes to the park every day.

polite response:
>> He goes to the park daily, if I'm not mistaken.

Dataset response:
>> 45 kilometers is 45000 meters.

polite response:
>> 45 kilometers is equivalent to 45000 meters.

Dataset response:
>> Although it was raining, they went for a walk.

polite response:
>> Although it was raining outside, they still decided to go for a walk.

Dataset response:
>> 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.

impolite response:
>> Here are your precious square numbers: 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.
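
Because `random.choice` is unseeded in the loop above, each run assigns the polite and impolite labels differently. If reproducible assignments are desired, one option (not used in the original code) is a dedicated, seeded `random.Random` instance. The function name `sample_politeness_labels` is hypothetical, and the seed value 123 is an arbitrary choice.

```python
import random


def sample_politeness_labels(num_entries, seed=123):
    # A dedicated generator keeps the global random state untouched
    rng = random.Random(seed)
    return [rng.choice(["polite", "impolite"]) for _ in range(num_entries)]


first_run = sample_politeness_labels(5)
second_run = sample_politeness_labels(5)
print(first_run == second_run)
```

Seeding only fixes which entries receive which label; the generated text itself still depends on the model and the ollama setup.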



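- If the generated responses above look reasonable, we can go to the next step and apply the prompt to the whole dataset
- Here, we add a `'chosen'` key for the preferred response and a `'rejected'` key for the dispreferred response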

In [7]:
import random
from tqdm import tqdm

def generate_model_responses(json_data):
    # Adds 'chosen' and 'rejected' keys to each entry in place

    for i, entry in enumerate(tqdm(json_data, desc="Writing entries")):
        politeness = random.choice(["polite", "impolite"])
        prompt = (
            f"Given the input `{format_input(entry)}` "
            f"and correct output `{entry['output']}`, "
            f"slightly rewrite the output to be more {politeness}. "
            "Keep the modification minimal. "
            "Only return the generated response and nothing else."
        )
        response = query_model(prompt)

        # A polite rewrite becomes the preferred response;
        # an impolite rewrite becomes the dispreferred one
        if politeness == "polite":
            json_data[i]["chosen"] = response
            json_data[i]["rejected"] = entry["output"]
        else:
            json_data[i]["rejected"] = response
            json_data[i]["chosen"] = entry["output"]
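
The assignment logic can be tried without a running model by passing in a fixed response. In the sketch below, `add_preference_pair` is a hypothetical helper that mirrors the branching of the function above. The dataset response and the polite rewrite are taken from the sample run earlier in this notebook, while the instruction and the impolite rewrite are made up.

```python
def add_preference_pair(entry, response, politeness):
    # A polite rewrite is preferred over the dataset response;
    # an impolite rewrite is dispreferred relative to it
    if politeness == "polite":
        entry["chosen"], entry["rejected"] = response, entry["output"]
    else:
        entry["chosen"], entry["rejected"] = entry["output"], response
    return entry


entry = {
    "instruction": "Rewrite the sentence.",  # made-up instruction
    "input": "",
    "output": "He goes to the park every day.",
}
polite = add_preference_pair(
    dict(entry), "He goes to the park daily, if I'm not mistaken.", "polite"
)
impolite = add_preference_pair(
    dict(entry), "He goes to the park every day, obviously.", "impolite"
)
print(polite["chosen"])
print(impolite["chosen"])
```

In both cases the more polite of the two texts ends up under `'chosen'`, which is the property the preference dataset relies on.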

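- Now, let's apply this generation step to the whole dataset (the run shown below took roughly 17 minutes for the 1,100 entries)
- Note that ollama is not fully deterministic across operating systems (as of this writing), so the responses you get may differ slightly from the ones shown here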

In [8]:
generate_model_responses(json_data)

Writing entries: 100%|██████████| 1100/1100 [17:20<00:00,  1.06it/s]


In [10]:
with open("instruction-data-with-preference.json", "w") as file:
    json.dump(json_data, file, indent=4)
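
After saving, it can be useful to reload the file and count how many entries received a polite rewrite versus an impolite one. The sketch below writes two made-up entries to a temporary directory instead of reading the real file, so it runs on its own; `count_generated_as_chosen` is a hypothetical helper name.

```python
import json
import tempfile
from pathlib import Path


def count_generated_as_chosen(entries):
    # Entries whose 'chosen' differs from 'output' received a polite rewrite
    return sum(entry["chosen"] != entry["output"] for entry in entries)


# Two made-up entries stand in for the real dataset
sample = [
    {"instruction": "a", "input": "", "output": "x", "chosen": "x, kindly", "rejected": "x"},
    {"instruction": "b", "input": "", "output": "y", "chosen": "y", "rejected": "y, obviously"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "instruction-data-with-preference.json"
    with open(path, "w") as file:
        json.dump(sample, file, indent=4)
    with open(path, "r") as file:
        reloaded = json.load(file)

num_polite = count_generated_as_chosen(reloaded)
print(f"Polite rewrites: {num_polite}, impolite rewrites: {len(reloaded) - num_polite}")
```

With `random.choice` picking between two options, the two counts on the full dataset should come out roughly balanced.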