# How to Build with Mistral Models

## Introduction

This lesson covers:
- Exploring the different Mistral models
- Understanding the use cases and scenarios suited to each model
- Code samples that show the unique features of each model


## The Mistral Models

In this lesson, we look at three Mistral models:
**Mistral Large**, **Mistral Small** and **Mistral NeMo**.

All of these models are available for free on the GitHub Models marketplace, and the code in this notebook uses them. Here are more details on using GitHub Models to [prototype with AI models](https://docs.github.com/en/github-models/prototyping-with-ai-models?WT.mc_id=academic-105485-koreyst).


## Mistral Large 2 (2407)
Mistral Large 2 is currently Mistral's flagship model and is designed for enterprise use.

The model is an upgrade to the original Mistral Large, offering:
- Larger context window - 128k vs 32k
- Better performance on math and coding tasks - 76.9% average accuracy vs 60.4%
- Better multilingual performance - supported languages include English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, and Hindi.

With these features, Mistral Large excels at:
- *Retrieval Augmented Generation (RAG)* - thanks to the larger context window
- *Function Calling* - this model has native function calling, which lets it integrate with external tools and APIs. The calls can be made in parallel or one after another in sequential order.
- *Code Generation* - this model excels at Python, Java, TypeScript and C++ generation.


### RAG Example using Mistral Large 2


In this example, we use Mistral Large 2 to run a RAG pattern over a text document. The question is written in Korean and asks what the author did before college.

It uses the Cohere Embeddings Model to create embeddings of the text document as well as the question. For this sample, it uses the faiss Python package as a vector store.

The prompt sent to the Mistral model includes both the question and the retrieved chunks that are most similar to the question. The model then provides a natural language response.
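
The retrieval steps described above (chunk, embed, search by distance, build the prompt) can be sketched with the standard library alone. This is a toy illustration, not the notebook's method: `embed` is a hypothetical bag-of-words stand-in for the Cohere embeddings model, and `retrieve` ranks chunks by squared L2 distance in plain Python instead of using a faiss index.

```python
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into fixed-size character chunks (the last one may be shorter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embeddings model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def squared_l2(a: Counter, b: Counter) -> int:
    """Squared L2 distance between two sparse word-count vectors."""
    return sum((a[word] - b[word]) ** 2 for word in set(a) | set(b))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are closest to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda chunk: squared_l2(query, embed(chunk)))
    return ranked[:k]


def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Place the retrieved context ahead of the question, as in the lesson's prompt."""
    context = "\n".join(retrieved_chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question}\n"
        "Answer:\n"
    )


# Tiny made-up corpus, used only to exercise the functions.
chunks = [
    "I wrote short stories",
    "I programmed the IBM 1401",
    "We ate lunch at noon",
]
question = "What stories did I write"

top_chunks = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

A real embeddings model captures meaning rather than exact word overlap, which is why the notebook can match a Korean question against English text.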


In [50]:
pip install faiss-cpu

Note: you may need to restart the kernel to use updated packages.


In [51]:
import requests
import numpy as np
import faiss
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential
from azure.ai.inference import EmbeddingsClient

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-large"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = requests.get('https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt')
text = response.text

chunk_size = 2048
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
len(chunks)

embed_model_name = "cohere-embed-v3-multilingual" 

embed_client = EmbeddingsClient(
        endpoint=endpoint,
        credential=AzureKeyCredential(token)
)

embed_response = embed_client.embed(
    input=chunks,
    model=embed_model_name
)



# Collect the embedding vectors; faiss works with float32 arrays.
text_embeddings = np.array(
    [item.embedding for item in embed_response.data], dtype="float32"
)


d = text_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embeddings)

question = "저자가 대학에 오기 전에 주로 했던 두 가지 일은 무엇이었나요?"

question_embedding = embed_client.embed(
    input=[question],
    model=embed_model_name
)

question_embeddings = np.array(question_embedding.data[0].embedding, dtype="float32")


D, I = index.search(question_embeddings.reshape(1, -1), k=2) # distance, index
retrieved_chunks = [chunks[i] for i in I.tolist()[0]]

prompt = f"""
Context information is below.
---------------------
{retrieved_chunks}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""


chat_response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content=prompt),
    ],
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name
)

print(chat_response.choices[0].message.content)

The author primarily engaged in two activities before college: writing and programming. In terms of writing, they wrote short stories, albeit not very good ones, with minimal plot and characters expressing strong feelings. For programming, they started writing programs on the IBM 1401 used for data processing during their 9th grade, at the age of 13 or 14. They used an early version of Fortran and typed programs on punch cards, later loading them into the card reader to run the program.


## Mistral Small
Mistral Small is another model in the Mistral family, in the premier/enterprise category. As the name implies, this model is a Small Language Model (SLM). The advantages of using Mistral Small are:
- Cost savings compared to Mistral LLMs like Mistral Large and NeMo - an 80% price drop
- Low latency - faster responses than Mistral's LLMs
- Flexibility - it can be deployed across different environments with fewer restrictions on required resources.

Mistral Small is great for:
- Text-based tasks such as summarization, sentiment analysis and translation.
- Applications that make frequent requests, because of its cost effectiveness
- Low-latency code tasks such as code review and code suggestions
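
To see what an 80% price drop means for a high-volume application, here is a rough arithmetic sketch. Only the 80% figure comes from this lesson; the base price, request volume and tokens per request are hypothetical placeholders, not published pricing.

```python
def discounted_price(base_price: float, price_drop: float) -> float:
    """Price after applying a fractional price drop (0.80 means 80% cheaper)."""
    return base_price * (1.0 - price_drop)


def monthly_cost(requests: int, tokens_per_request: int, price_per_million_tokens: float) -> float:
    """Total cost for a number of requests at a given price per million tokens."""
    total_tokens = requests * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens


BASE_PRICE = 10.0        # hypothetical price per million tokens for the larger model
PRICE_DROP = 0.80        # the 80% price drop mentioned in the lesson
REQUESTS = 1_000_000     # hypothetical monthly request volume
TOKENS_PER_REQUEST = 500 # hypothetical average tokens per request

small_price = discounted_price(BASE_PRICE, PRICE_DROP)
large_cost = monthly_cost(REQUESTS, TOKENS_PER_REQUEST, BASE_PRICE)
small_cost = monthly_cost(REQUESTS, TOKENS_PER_REQUEST, small_price)

print(f"larger model: {large_cost:.2f}")
print(f"smaller model: {small_cost:.2f}")
print(f"saved: {large_cost - small_cost:.2f}")
```

Substitute real prices and traffic figures before drawing conclusions from numbers like these.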


## Comparing Mistral Small and Mistral Large

To see the difference in latency between Mistral Small and Mistral Large, run the cells below.

You should see a difference in response times of 3-5 seconds. Also note the response length and style for the same prompt.
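
Response times can be measured with a small timing helper. The sketch below is an illustration only: `fake_small_model` and `fake_large_model` are hypothetical stand-ins that sleep briefly instead of calling an API, and in the notebook you would pass a function that wraps `client.complete(...)` instead.

```python
import time
from typing import Callable, Tuple


def time_call(fn: Callable[[], str]) -> Tuple[str, float]:
    """Run fn once and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical stand-ins for model calls; the sleeps only simulate latency.
def fake_small_model() -> str:
    time.sleep(0.01)
    return "short answer"


def fake_large_model() -> str:
    time.sleep(0.05)
    return "a longer, more detailed answer"


small_text, small_seconds = time_call(fake_small_model)
large_text, large_seconds = time_call(fake_large_model)

print(f"small: {small_seconds:.3f}s, {len(small_text)} characters")
print(f"large: {large_seconds:.3f}s, {len(large_text)} characters")
```

A single run is noisy, so repeating each call a few times and averaging gives a fairer comparison.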


In [None]:
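import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential
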
endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-small"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful coding assistant."),
        UserMessage(content="Can you write a Python function to the fizz buzz test?"),
    ],
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name
)

print(response.choices[0].message.content)

In [None]:
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-large"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful coding assistant."),
        UserMessage(content="Can you write a Python function to the fizz buzz test?"),
    ],
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name
)

print(response.choices[0].message.content)

## Mistral NeMo

Compared to the other two models discussed in this lesson, Mistral NeMo is the only free model with an Apache 2.0 License.

It is viewed as an upgrade to Mistral's earlier open source LLM, Mistral 7B.

Some other features of the NeMo model are:

- *More efficient tokenization:* This model uses the Tekken tokenizer, which is based on tiktoken, rather than the SentencePiece tokenizer used by earlier Mistral models. This gives better performance across more languages and code.

- *Finetuning:* The base model is available for finetuning. This gives more flexibility for use cases where finetuning is needed.

- *Native Function Calling* - Like Mistral Large, this model has been trained on function calling. This makes it unique as one of the first open source models to do so.


### Comparing Tokenizers

In this example, we look at how Mistral NeMo handles tokenization compared to Mistral Large.

Both examples use the same prompt, but you should see that NeMo returns fewer tokens than Mistral Large.


In [11]:
pip install mistral-common


In [12]:
# Import needed packages:
from mistral_common.protocol.instruct.messages import (
    UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import (
    Function,
    Tool,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load Mistral tokenizer

model_name = "open-mistral-nemo"

tokenizer = MistralTokenizer.from_model(model_name)

# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model=model_name,
    )
)
tokens, text = tokenized.tokens, tokenized.text

# Count the number of tokens
print(len(tokens))

128


In [13]:
# Import needed packages:
from mistral_common.protocol.instruct.messages import (
    UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import (
    Function,
    Tool,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load Mistral tokenizer

model_name = "mistral-large-latest"

tokenizer = MistralTokenizer.from_model(model_name)

# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model=model_name,
    )
)
tokens, text = tokenized.tokens, tokenized.text

# Count the number of tokens
print(len(tokens))

135
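
The two counts printed above (128 for NeMo, 135 for Mistral Large) can be compared with a little arithmetic. The helper name `token_savings` is made up for this illustration.

```python
def token_savings(baseline_tokens: int, new_tokens: int) -> float:
    """Percentage of tokens saved relative to the baseline count."""
    return (baseline_tokens - new_tokens) / baseline_tokens * 100


large_tokens = 135  # count printed for mistral-large-latest
nemo_tokens = 128   # count printed for open-mistral-nemo

saved = large_tokens - nemo_tokens
percent = token_savings(large_tokens, nemo_tokens)
print(f"NeMo uses {saved} fewer tokens ({percent:.1f}% fewer)")
# prints "NeMo uses 7 fewer tokens (5.2% fewer)"
```

The gap for this short English prompt is small. The difference between tokenizers varies with the language and content of the text.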


## Learning does not stop here, continue the Journey

After completing this lesson, check out our [Generative AI Learning collection](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) to continue leveling up your Generative AI knowledge!


<!-- CO-OP TRANSLATOR DISCLAIMER START -->
**Disclaimer**:  
This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.
<!-- CO-OP TRANSLATOR DISCLAIMER END -->
