# Building with Mistral Models

## Introduction

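This lesson covers:
- Exploring the different Mistral models
- Understanding the use cases and scenarios each model is suited for
- Code samples that show the unique features of each model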


## The Mistral Models

In this lesson, we will explore three different Mistral models:
**Mistral Large**, **Mistral Small** and **Mistral NeMo**.

Each of these models is available for free on the GitHub Models marketplace, and the code in this notebook uses them to run the examples. You can read more about using GitHub Models to [prototype with AI models](https://docs.github.com/en/github-models/prototyping-with-ai-models?WT.mc_id=academic-105485-koreyst).
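Every example in this notebook reads a GitHub personal access token from the `GITHUB_TOKEN` environment variable via `os.environ["GITHUB_TOKEN"]`, which fails with a bare `KeyError` when the variable is missing. A small helper like the one below fails early with a readable message instead. The name `get_github_token` is hypothetical and not part of any SDK; this is only a sketch.

```python
import os


def get_github_token(var_name: str = "GITHUB_TOKEN") -> str:
    """Return the token stored in `var_name`, or raise a descriptive error."""
    # Strip whitespace so a token pasted with a trailing newline still works
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {var_name} is not set. Create a GitHub "
            "personal access token and export it before running this notebook."
        )
    return token
```

With this helper, `token = get_github_token()` could stand in for `token = os.environ["GITHUB_TOKEN"]` in the cells below.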


## Mistral Large 2 (2407)
Mistral Large 2 is currently Mistral's flagship model, designed for enterprise use.

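This model is an upgrade to the original Mistral Large and offers:
- A larger context window: 128k tokens vs. 32k
- Better performance on math and coding tasks: 76.9% average accuracy vs. 60.4%
- Improved multilingual performance, with supported languages including English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic and Hindi.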
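To get a feel for what the larger context window means in practice, the sketch below estimates whether a document fits in a given window. It assumes a rough rule of thumb of about 4 characters of English text per token; that ratio is an assumption, not a Mistral figure, and real counts depend on the tokenizer and the language. For simplicity, "128k" and "32k" are treated as 128,000 and 32,000 tokens.

```python
# Rough heuristic (assumption): ~4 characters of English text per token.
# Use the model's own tokenizer when you need exact counts.
CHARS_PER_TOKEN = 4

# Context window sizes from the comparison above, rounded to thousands
CONTEXT_WINDOWS = {
    "Mistral Large (original)": 32_000,
    "Mistral Large 2": 128_000,
}


def estimated_tokens(text: str) -> int:
    """Estimate the token count of `text` from its character length."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(text: str, window: int, reserved_for_output: int = 1000) -> bool:
    """Check whether `text` plus room for the reply fits in `window` tokens."""
    return estimated_tokens(text) + reserved_for_output <= window


document = "x" * 200_000  # stand-in for a ~200,000-character document
for name, window in CONTEXT_WINDOWS.items():
    print(f"{name}: fits = {fits_in_context(document, window)}")
```

Under this heuristic the 200,000-character stand-in needs about 50,000 tokens, which exceeds a 32k window but fits comfortably in 128k.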

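With these features, Mistral Large is particularly well suited for:
- *Retrieval Augmented Generation (RAG)*: thanks to the larger context window
- *Function Calling*: the model supports function calling natively, allowing integration with external tools and APIs. These calls can be made in parallel, or sequentially one after another.
- *Code Generation*: the model excels at generating Python, Java, TypeScript and C++ code.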
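On the application side, function calling means your code receives the name and JSON arguments of a tool the model wants to call, runs it, and sends the result back. The sketch below shows only that local dispatch step, for both parallel and sequential execution. Several things are assumptions: the tool call is simplified to a plain dict `{"name": ..., "arguments": <JSON string>}` (real SDK responses wrap this in their own objects), and `get_current_weather` is a hypothetical tool returning canned data rather than calling a real weather API.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def get_current_weather(location: str, format: str) -> dict:
    """Hypothetical local tool: returns canned data instead of calling an API.

    The parameter is named `format` to match the JSON schema the model sees.
    """
    return {"location": location, "temperature": 22, "unit": format}


# Registry mapping tool names (as declared to the model) to Python callables
TOOLS = {"get_current_weather": get_current_weather}


def run_tool_call(call: dict) -> dict:
    """Execute one simplified tool call of the form {"name", "arguments"}."""
    func = TOOLS[call["name"]]
    kwargs = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return func(**kwargs)


def run_sequentially(calls: list[dict]) -> list[dict]:
    """Run calls one after another, e.g. when one call depends on another."""
    return [run_tool_call(call) for call in calls]


def run_in_parallel(calls: list[dict]) -> list[dict]:
    """Run independent calls concurrently; results keep the order of `calls`."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_tool_call, calls))


calls = [
    {"name": "get_current_weather",
     "arguments": json.dumps({"location": "Paris", "format": "celsius"})},
    {"name": "get_current_weather",
     "arguments": json.dumps({"location": "Tokyo", "format": "celsius"})},
]
print(run_in_parallel(calls))
```

Parallel execution only makes sense when the calls are independent; if one call needs another's result, the calls have to be made one after another.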


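In this example, we use Mistral Large 2 to run a RAG pattern over a text document. The question is written in Korean and asks what the author mainly did before going to college.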

We use the Cohere Embeddings model to create embeddings for both the text document and the question. This example uses the `faiss` Python package as the vector store.
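The two retrieval building blocks used in the RAG cell can be sketched without faiss: fixed-size character chunking, and an exact nearest-neighbour search by Euclidean (L2) distance, which is what `faiss.IndexFlatL2` computes by brute force. This is a plain-Python illustration, not how faiss is implemented; the toy 2-D vectors are assumptions standing in for real embeddings, which have many more dimensions.

```python
import math


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive, non-overlapping chunks of chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Return indices of the k vectors closest to `query` (exact brute-force search)."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]


# Toy "embeddings" for four chunks and one query
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
print(nearest_chunks([1.0, 1.0], vectors, k=2))
```

faiss reports squared L2 distances in its `D` output, but squaring does not change the ranking, so the returned indices are the same.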

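The prompt sent to the Mistral model includes the question together with the retrieved chunks that are most similar to it. The model then answers in natural language.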
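The prompt assembly step can be isolated in a small function. This sketch reuses the same context-then-query template as the RAG cell, and joins the retrieved chunks with blank lines so the model sees plain text rather than a Python list. The name `build_rag_prompt` is hypothetical.

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Fill the context-then-query template with retrieved chunks and a question."""
    context = "\n\n".join(retrieved_chunks)  # blank line between chunks
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question}\n"
        "Answer:\n"
    )


print(build_rag_prompt("What did the author do before college?", ["chunk one", "chunk two"]))
```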


```bash
pip install faiss-cpu
```



```python
import os

import faiss
import numpy as np
import requests

from azure.ai.inference import ChatCompletionsClient, EmbeddingsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-large"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

# Download the sample document (an essay by Paul Graham)
response = requests.get('https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt')
response.raise_for_status()
text = response.text

# Split the document into non-overlapping chunks of 2048 characters
chunk_size = 2048
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
print(f"Number of chunks: {len(chunks)}")

# Embed every chunk with the multilingual Cohere embeddings model
embed_model_name = "cohere-embed-v3-multilingual"

embed_client = EmbeddingsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

embed_response = embed_client.embed(
    input=chunks,
    model=embed_model_name,
)

# faiss expects a 2-D float32 array of shape (number of chunks, embedding size)
text_embeddings = np.array(
    [item.embedding for item in embed_response.data], dtype="float32"
)

# Build an exact (brute-force) L2 index over the chunk embeddings
d = text_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embeddings)

# "What were the two main things the author did before college?"
question = "저자가 대학에 오기 전에 주로 했던 두 가지 일은 무엇이었나요?"

question_embedding = embed_client.embed(
    input=[question],
    model=embed_model_name,
)
question_embeddings = np.array(question_embedding.data[0].embedding, dtype="float32")

# Retrieve the 2 chunks closest to the question (D = distances, I = indices)
D, I = index.search(question_embeddings.reshape(1, -1), k=2)
retrieved_chunks = [chunks[i] for i in I.tolist()[0]]

# Join the chunks as plain text instead of inserting the Python list repr
context = "\n\n".join(retrieved_chunks)

prompt = f"""
Context information is below.
---------------------
{context}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

chat_response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content=prompt),
    ],
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name,
)

print(chat_response.choices[0].message.content)
```

The author primarily engaged in two activities before college: writing and programming. In terms of writing, they wrote short stories, albeit not very good ones, with minimal plot and characters expressing strong feelings. For programming, they started writing programs on the IBM 1401 used for data processing during their 9th grade, at the age of 13 or 14. They used an early version of Fortran and typed programs on punch cards, later loading them into the card reader to run the program.


## Mistral Small
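Mistral Small is another model in Mistral's premier/enterprise family. As the name implies, it is a Small Language Model (SLM). Using Mistral Small has the following advantages:
- Cost savings compared with Mistral LLMs such as Mistral Large and NeMo: an 80% lower price
- Low latency: faster responses than Mistral's LLMs
- Flexibility: it can be deployed across different environments with fewer restrictions on required resources

Mistral Small is well suited for:
- Text-based tasks such as summarization, sentiment analysis and translation
- Applications that make frequent requests, thanks to its cost effectiveness
- Low-latency code tasks such as code review and code suggestions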
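The effect of an 80% lower price on a high-volume workload is simple arithmetic, sketched below. The per-token price and the workload are hypothetical placeholders, not real Mistral or GitHub Models prices; the only figure taken from the text above is that Mistral Small costs about 20% of the comparison price. Real pricing usually distinguishes input and output tokens, which this sketch folds into a single number.

```python
# Placeholder price for illustration only: NOT a real Mistral or GitHub price.
HYPOTHETICAL_LARGE_PRICE_PER_M_TOKENS = 10.0
# "80% lower price" means Mistral Small costs 20% of the comparison price.
SMALL_PRICE_FRACTION = 0.2


def daily_cost(requests_per_day: int, tokens_per_request: int, price_per_million: float) -> float:
    """Cost per day for a workload, given a price per million tokens."""
    total_tokens = requests_per_day * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million


large_price = HYPOTHETICAL_LARGE_PRICE_PER_M_TOKENS
small_price = large_price * SMALL_PRICE_FRACTION

# Hypothetical workload: 10,000 requests per day of about 500 tokens each
large_cost = daily_cost(10_000, 500, large_price)
small_cost = daily_cost(10_000, 500, small_price)
print(f"Large: ${large_cost:.2f}/day, Small: ${small_cost:.2f}/day")
```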


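## Comparing Mistral Small and Mistral Large

To see the difference in latency between Mistral Small and Mistral Large, run the two cells below.

You should see a difference of roughly 3 to 5 seconds between the response times, although the exact figure depends on network conditions and service load. Also notice how the length and style of the two responses differ for the same prompt.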


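```python
import os
import time

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-small"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

# Time the request so its latency can be compared with Mistral Large
start = time.perf_counter()
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful coding assistant."),
        UserMessage(content="Can you write a Python function to the fizz buzz test?"),
    ],
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name,
)
elapsed = time.perf_counter() - start

print(response.choices[0].message.content)
print(f"\n{model_name} responded in {elapsed:.2f} s")
```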

```python
import os
import time

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-large"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

# Time the request so its latency can be compared with Mistral Small
start = time.perf_counter()
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful coding assistant."),
        UserMessage(content="Can you write a Python function to the fizz buzz test?"),
    ],
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name,
)
elapsed = time.perf_counter() - start

print(response.choices[0].message.content)
print(f"\n{model_name} responded in {elapsed:.2f} s")
```
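To compare the length and style of the two replies more systematically, you could save each one (for example `small_text = response.choices[0].message.content` right after the Mistral Small cell, and `large_text = ...` after the Mistral Large cell) and pass both strings to a helper like the one below. The function names are hypothetical, and the statistics are simple proxies for "length and style", not a standard metric.

```python
FENCE = "`" * 3  # a Markdown code fence, built here to keep this block well-formed


def describe_response(text: str) -> dict:
    """Return simple length and style statistics for a model reply."""
    lines = text.splitlines()
    fence_lines = sum(1 for line in lines if line.strip().startswith(FENCE))
    return {
        "characters": len(text),
        "words": len(text.split()),
        "lines": len(lines),
        "code_blocks": fence_lines // 2,  # each block opens and closes with a fence
    }


def compare_responses(small_text: str, large_text: str) -> None:
    """Print the statistics for both replies side by side."""
    small_stats = describe_response(small_text)
    large_stats = describe_response(large_text)
    print(f"{'metric':<12}{'small':>8}{'large':>8}")
    for key in small_stats:
        print(f"{key:<12}{small_stats[key]:>8}{large_stats[key]:>8}")


compare_responses("Short answer.", "A longer answer\n" + FENCE + "python\nx = 1\n" + FENCE)
```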

## Mistral NeMo

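Compared with the other two models discussed in this lesson, Mistral NeMo is the only one that is freely licensed under Apache 2.0.

It is considered an upgrade to Mistral 7B, Mistral's earlier open-source LLM.

The NeMo model also has the following features:

- *More efficient tokenization:* the model uses the Tekken tokenizer (which is based on tiktoken) instead of the SentencePiece tokenizer used by earlier Mistral models. This lets it handle more languages and code more efficiently.

- *Fine-tunable:* the base model is available for fine-tuning, which gives more flexibility for use cases that require it.

- *Native Function Calling:* like Mistral Large, this model has been trained on function calling, which makes it notable as one of the first open-source models to support it.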


### Comparing Tokenizers

In this example, we look at how Mistral NeMo and Mistral Large differ in tokenization.

Both examples use the same prompt, but you should see that NeMo returns fewer tokens than Mistral Large.
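The two cells below print 128 tokens for NeMo and 135 tokens for Mistral Large on the same request. The relative difference is a one-line calculation, sketched here with a hypothetical helper name. Note that these counts include the tool definition and special tokens, so the saving on other inputs may be larger or smaller.

```python
def relative_saving(baseline_tokens: int, new_tokens: int) -> float:
    """Fraction of tokens saved by `new_tokens` relative to `baseline_tokens`."""
    return (baseline_tokens - new_tokens) / baseline_tokens


# Token counts printed by the NeMo (128) and Mistral Large (135) cells
saving = relative_saving(baseline_tokens=135, new_tokens=128)
print(f"NeMo uses {saving:.1%} fewer tokens than Mistral Large on this prompt")
# prints "NeMo uses 5.2% fewer tokens than Mistral Large on this prompt"
```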


```bash
pip install mistral-common
```


```python
# Import needed packages:
from mistral_common.protocol.instruct.messages import (
    UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import (
    Function,
    Tool,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load the Mistral NeMo tokenizer
model_name = "open-mistral-nemo"

tokenizer = MistralTokenizer.from_model(model_name)

# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model=model_name,
    )
)
tokens, text = tokenized.tokens, tokenized.text

# Count the number of tokens
print(len(tokens))
```

128


```python
# Import needed packages:
from mistral_common.protocol.instruct.messages import (
    UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import (
    Function,
    Tool,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load the Mistral Large tokenizer
model_name = "mistral-large-latest"

tokenizer = MistralTokenizer.from_model(model_name)

# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model=model_name,
    )
)
tokens, text = tokenized.tokens, tokenized.text

# Count the number of tokens
print(len(tokens))
```

135


## Continue Your Learning Journey

After completing this lesson, check out our [Generative AI Learning collection](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) to keep building your Generative AI knowledge!



---

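**Disclaimer**:  
This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.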
