# Mixtral 8x7B

In this notebook we'll get started with Mistral AI's new model, **Mixtral 8x7B**. We'll see how to get set up with Mixtral, the prompt format it requires, and how it performs when used as an agent.

As a bit of a spoiler, Mixtral is probably the first open source LLM that is genuinely _very_ good. We say this considering the following key points:

* Published benchmarks show it matching or outperforming GPT-3.5 on most tasks.
* Our own testing shows Mixtral to be the first open source model we can reliably use as an agent.
* Due to its mixture-of-experts (MoE) architecture it is _very_ fast given its size. If you can afford to run on 2x A100s, latency is good enough for chatbot use cases.

That said, Mixtral is still a large model. The eight experts share the attention layers, so the total is ~47B parameters rather than 8 x 7B = 56B, but we do still need plenty of space to store the model. The amount of space required drops considerably with quantized versions of the model (update: these are now available, thanks to [TheBloke](https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF)).
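To see why a single GPU is not enough at full precision, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. It assumes the roughly 46.7B total parameters that Mistral AI reports for Mixtral 8x7B. `weight_memory_gb` is a hypothetical helper, and activations and the KV cache come on top of these figures.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GB (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumed figure: roughly 46.7B total parameters for Mixtral 8x7B.
N_PARAMS = 46.7e9

# Bytes per parameter for a few common storage formats.
for name, nbytes in [("float32", 4), ("bfloat16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{name:>8}: ~{weight_memory_gb(N_PARAMS, nbytes):.0f} GB")
```

In `bfloat16` the weights alone exceed the 80 GB available on the largest A100 or H100 cards, which is why we spread the model over two GPUs, and why quantized versions make such a difference.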

## Finding Somewhere to Run Mixtral

Unless you have two A100s or H100s lying around you'll need to find a service to run Mixtral. We'll demonstrate how using [RunPod] TK here; we found this to be one of the easiest and cheapest compute providers to set up with Mixtral.

* First, you'll need to [sign up for an account on RunPod] TK link.
* Navigate to **Home** > click **Start Building**.
* Set up a GPU instance; you can use 2x A100 or 2x H100.
* Customize deployment to use *Container Size: 120GB* and *Disk Volume: 600GB*.
* Make sure *Jupyter Notebook* is checked and deploy!

Once deployed, click on the instance and click *Open Jupyter Server*. This will take you to a JupyterLab instance running on the container, where you can upload _this notebook_ and follow along.

## Installing Prerequisites

There are a few prerequisites. To run Mixtral 8x7B we need `transformers` and `accelerate`. We also install `duckduckgo_search` to use in our agent testing later.

In [1]:
!pip install -qU \
    transformers==4.36.1 \
    accelerate==0.25.0 \
    duckduckgo_search==4.1.0


_**IMPORTANT: You may need to restart the kernel before continuing for the above library installs to be recognized by the remainder of the code!**_

## Download and Initialize Mixtral

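With the kernel restarted, we can download the model weights from Hugging Face and load them in `bfloat16`. Setting `device_map='auto'` lets `accelerate` distribute the layers across the available GPUs.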

In [2]:
from torch import bfloat16
import transformers

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=bfloat16,
    device_map='auto'
)
model.eval()

Loading checkpoint shards:   0%|          | 0/19 [00:00<?, ?it/s]

MixtralForCausalLM(
  (model): MixtralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MixtralDecoderLayer(
        (self_attn): MixtralAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MixtralRotaryEmbedding()
        )
        (block_sparse_moe): MixtralSparseMoeBlock(
          (gate): Linear(in_features=4096, out_features=8, bias=False)
          (experts): ModuleList(
            (0-7): 8 x MixtralBLockSparseTop2MLP(
              (w1): Linear(in_features=4096, out_features=14336, bias=False)
              (w2): Linear(in_features=14336, out_features=4096, bias=False)
              (w3): Linear(in_features=4096, out_features=14336, bias=False)
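The `gate` layer in the printout above maps each token's hidden state to 8 router scores, and the `Top2` in the expert class name indicates that only two experts are used per token. The following is a minimal pure-Python sketch of that routing idea, not the actual `transformers` implementation: `top2_moe` and the toy experts are hypothetical, and the weights here are a softmax over the two selected scores.

```python
import math


def top2_moe(x, router_logits, experts):
    """Route one token vector `x` through its two highest-scoring experts."""
    # indices of the two largest router scores
    top2 = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:2]
    # softmax over just the two selected scores
    peak = max(router_logits[i] for i in top2)
    exps = [math.exp(router_logits[i] - peak) for i in top2]
    total = sum(exps)
    weights = [e / total for e in exps]
    # only the two selected experts are evaluated, the other six are skipped
    outputs = [experts[i](x) for i in top2]
    # weighted sum of the two expert outputs
    mixed = [sum(w * out[d] for w, out in zip(weights, outputs))
             for d in range(len(x))]
    return top2, weights, mixed


# Toy experts: expert k simply scales its input by k.
toy_experts = [lambda v, k=k: [k * a for a in v] for k in range(8)]
toy_logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]
print(top2_moe([1.0, 2.0], toy_logits, toy_experts))
```

Because only two of the eight expert MLPs run for any given token, the compute per token is far lower than the total parameter count suggests, which is where the speed advantage comes from.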
        

As with all LLMs/transformer models we need to initialize a `tokenizer` that will take our plaintext inputs and transform them into lists of tokens that are consumed by the first layer of the LLM/transformer. We initialize the Mixtral tokenizer like so:

In [3]:
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

Now we set up a `text-generation` pipeline using `transformers`. There are a lot of generation parameters we can adjust here. We'd recommend leaving them as is for now and returning to them if you feel like your generated outputs need improvement.

In [4]:
generate_text = transformers.pipeline(
    model=model, tokenizer=tokenizer,
    return_full_text=False,  # if using langchain set True
    task="text-generation",
    # we pass model parameters here too
    # note: in transformers these sampling parameters only take effect with do_sample=True
    temperature=0.1,  # 'randomness' of outputs; lower values are more deterministic
    top_p=0.15,  # select from top tokens whose probabilities add up to 15%
    top_k=0,  # 0 disables top-k filtering, so we rely on top_p
    max_new_tokens=512,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # if output begins repeating increase
)
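To make the `top_p` comment above more concrete, here is a simplified sketch of nucleus (top-p) filtering over a toy next-token distribution. `top_p_filter` is a hypothetical helper written for illustration, not the `transformers` implementation, and the token probabilities are made up.

```python
def top_p_filter(probs: dict, top_p: float) -> dict:
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches `top_p`, then renormalize what is left."""
    kept, cumulative = {}, 0.0
    # walk through tokens from most to least likely
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Made-up distribution over three candidate tokens.
toy_probs = {"Paris": 0.5, "London": 0.3, "Rome": 0.2}
print(top_p_filter(toy_probs, 0.15))
print(top_p_filter(toy_probs, 0.6))
```

With a value as low as `top_p=0.15`, the most likely token alone usually covers the threshold, so sampling behaves almost like greedy decoding.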

Now we can generate text by calling `generate_text`:

In [5]:
res = generate_text("hello there")
print(res[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


, #TeamInternet!

I’m so excited to be here today as a guest blogger for the lovely ladies of The Blogcademy. I’ve been following their adventures since they first started teaching classes in London and New York City last year, and it was such an honor when Kat asked me if I would like to write a post about my experience with social media.

As you may know, I am the founder of TweetHearts, a social media management company that specializes in Twitter marketing. We help businesses grow their online presence by creating customized strategies that are tailored to meet their specific needs. Our clients range from small start-ups to large corporations, but no matter what size or industry, we always focus on building relationships through authentic engagement.

When it comes to social media, many people think that all you have to do is set up an account and then just start posting updates whenever you feel like it. However, this couldn’t be further from the truth! In order to truly succeed i

### Instruction Format

We can see a very generic generated output here. There are two primary reasons for that:

1. We haven't provided any instructions to the model.
2. We have not used the recommended instruction format.

The instruction format for Mixtral 8x7B looks like this:

```
<s> [INST] Some instructions [/INST] Primer text [generated output] </s>
```

We put our instructions to the model in place of `"Some instructions"` and place a primer like `"Assistant: "` in place of `"Primer text"`. The `<s>` and `</s>` are special tokens used by Mixtral to signify the **B**eginning **O**f **S**tring (BOS) and **E**nd **O**f **S**tring (EOS), i.e. the beginning and end of our text. The `[INST]` and `[/INST]` strings tell the model that anything between them is an _instruction_ that the model should follow.

We can add some follow up instructions like so:

```
<s> [INST] Some instructions [/INST] Primer text [generated output] </s> [INST] Further instructions [/INST]
```
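As a small illustration of the two templates above, the sketch below assembles a prompt from completed turns plus a new instruction. `build_prompt` is a hypothetical helper, and the spacing simply mirrors the templates as written here.

```python
def build_prompt(turns, next_instruction):
    """Assemble a multi-turn prompt.

    turns: list of (instruction, response) pairs that are already complete.
    next_instruction: the instruction we want the model to answer next.
    """
    prompt = "<s>"
    for instruction, response in turns:
        # each completed turn is closed with the EOS token
        prompt += f" [INST] {instruction} [/INST] {response} </s>"
    # the final instruction is left open for the model to complete
    prompt += f" [INST] {next_instruction} [/INST]"
    return prompt


print(build_prompt([], "Some instructions"))
print(build_prompt(
    [("Some instructions", "Primer text [generated output]")],
    "Further instructions",
))
```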

Let's begin by adding some _instructions_; we'll add instruction formatting later. In these instructions we want to set up the guidelines for an agent that can use two tools (calculator and search) and also return answers to the user. The agent selects any of these three options via a JSON output containing `"tool_name"`, which specifies the tool to use (one of [`Calculator`, `Search`, `Final Answer`]), and `"input"`, which specifies the input to the chosen tool.

In [6]:
agent_template = """
You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:

- Calculator: the calculator should be used whenever you need to perform a calculation, no matter how simple. It uses Python so make sure to write complete Python code required to perform the calculation required and make sure the Python returns your answer to the `output` variable.
- Search: the search tool should be used whenever you need to find information. It can be used to find information about everything
- Final Answer: the final answer tool must be used to respond to the user. You must use this when you have decided on an answer.

To use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs. For example, to answer the question, "what is the square root of 51?" you must use the calculator tool like so:

```json
{
    "tool_name": "Calculator",
    "input": "from math import sqrt; output = sqrt(51)"
}
```

Or to answer the question "who is the current president of the USA?" you must respond:

```json
{
    "tool_name": "Search",
    "input": "current president of USA"
}
```

Remember, even when answering to the user, you must still use this JSON format! If you'd like to ask how the user is doing you must write:

```json
{
    "tool_name": "Final Answer",
    "input": "How are you today?"
}
```

Let's get started. The users query is as follows.

User: Hi there, I'm stuck on a math problem, can you help? My question is what is the square root of 512 multiplied by 7?

Assistant: ```json
{
    "tool_name": """

Using these instructions alone we actually get great performance:

In [7]:
res = generate_text(agent_template)
print(res[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 "Calculator",
    "input":  "from math import sqrt; output = sqrt(512) * 7"
}
```


Before continuing, let's add the instruction formatting too. We'll do this via a function called `instruction_format` that consumes a `sys_message` (i.e. the instructions) and a user's `query`, and outputs the string with the required tokens.

In [8]:
def instruction_format(sys_message: str, query: str):
    # note, don't add "</s>" to the end
    return f'<s> [INST] {sys_message} [/INST]\nUser: {query}\nAssistant: ```json\n{{\n"tool_name": '

Let's see what this looks like:

In [9]:
sys_msg = """You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:

- Calculator: the calculator should be used whenever you need to perform a calculation, no matter how simple. It uses Python so make sure to write complete Python code required to perform the calculation required and make sure the Python returns your answer to the `output` variable.
- Search: the search tool should be used whenever you need to find information. It can be used to find information about everything
- Final Answer: the final answer tool must be used to respond to the user. You must use this when you have decided on an answer.

To use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs. For example, to answer the question, "what is the square root of 51?" you must use the calculator tool like so:

```json
{
    "tool_name": "Calculator",
    "input": "from math import sqrt; output = sqrt(51)"
}
```

Or to answer the question "who is the current president of the USA?" you must respond:

```json
{
    "tool_name": "Search",
    "input": "current president of USA"
}
```

Remember, even when answering to the user, you must still use this JSON format! If you'd like to ask how the user is doing you must write:

```json
{
    "tool_name": "Final Answer",
    "input": "How are you today?"
}
```

Let's get started. The users query is as follows.
"""
query = "Hi there, I'm stuck on a math problem, can you help? My question is what is the square root of 512 multiplied by 7?"

input_prompt = instruction_format(sys_msg, query)

In [11]:
print(input_prompt)

<s> [INST] You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:

- Calculator: the calculator should be used whenever you need to perform a calculation, no matter how simple. It uses Python so make sure to write complete Python code required to perform the calculation required and make sure the Python returns your answer to the `output` variable.
- Search: the search tool should be used whenever you need to find information. It can be used to find information about everything
- Final Answer: the final answer tool must be used to respond to the user. You must use this when you have decided on an answer.

To use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs. For example, to answer the question, "what is the square root of 51?" you must use the calculator tool like so:

```json
{
    "tool_name": "Calculator",
    "input": "from ma

Now let's generate output from our LLM again:

In [12]:
res = generate_text(input_prompt)
print(res[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 "Calculator",
"input":  "from math import sqrt; output = sqrt(512) * 7"
}
```


Let's parse this output into a Python dictionary so that we can execute the tool call.

In [13]:
import json

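def format_output(text: str):
    # re-attach the opening of the JSON object that we primed the model with
    full_json_str = '{\n"tool_name": '+text
    full_json_str = full_json_str.strip()
    # remove the closing code fence, if present
    if full_json_str.endswith("```"):
        full_json_str = full_json_str[:-3]
    return json.loads(full_json_str)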

In [14]:
action = format_output(res[0]["generated_text"])
action

{'tool_name': 'Calculator',
 'input': 'from math import sqrt; output = sqrt(512) * 7'}
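`format_output` above assumes the generation ends with the closing fence and is valid JSON. A slightly more defensive variant might look like the sketch below. `parse_tool_call` is a hypothetical name, and falling back to a `Final Answer` on unparseable text is an assumption made for illustration rather than something the notebook does.

```python
import json

FENCE = "`" * 3  # the three-backtick code fence used in the prompt


def parse_tool_call(text: str, prefix: str = '{\n"tool_name": ') -> dict:
    """Rebuild the JSON object from a completion that continues `prefix`."""
    candidate = prefix + text
    # keep only what comes before the closing fence, dropping any extra text
    candidate = candidate.split(FENCE)[0].strip()
    try:
        action = json.loads(candidate)
    except json.JSONDecodeError:
        # assumed fallback: treat unparseable text as a plain answer
        return {"tool_name": "Final Answer", "input": text.strip()}
    if not isinstance(action, dict) or not {"tool_name", "input"} <= action.keys():
        return {"tool_name": "Final Answer", "input": text.strip()}
    return action


generated = (' "Calculator",\n"input":  "from math import sqrt; '
             'output = sqrt(512) * 7"\n}\n' + FENCE + "\n\nsome trailing text")
print(parse_tool_call(generated))
```

One limitation of this sketch is that a tool input that itself contains a code fence would be cut short and end up in the fallback branch.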

In [15]:
if action["tool_name"] == "Calculator":
    exec(action["input"])
output

158.39191898578665
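Calling `exec` at the top level of a notebook writes `output` straight into the global namespace. A sketch of a more contained approach is to execute the generated code in its own dictionary and read `output` back from it. `run_calculator` is a hypothetical helper, and this only isolates variable names: it is not a security sandbox, so model-written code should still be treated with care.

```python
def run_calculator(code: str):
    """Execute generated Python in a fresh namespace and return its `output`."""
    namespace = {}
    # names defined by the generated code end up in `namespace`, not in globals
    exec(code, namespace)
    if "output" not in namespace:
        raise ValueError("calculator code did not assign `output`")
    return namespace["output"]


print(run_calculator("from math import sqrt; output = sqrt(512) * 7"))
```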

Now we add this info to our prompt:

In [16]:
new_prompt = f"""
{agent_template}{res[0]["generated_text"]}
Tool Output: {output}
Assistant: ```json
{{
    "tool_name": """

In [17]:
print(new_prompt)



You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:

- Calculator: the calculator should be used whenever you need to perform a calculation, no matter how simple. It uses Python so make sure to write complete Python code required to perform the calculation required and make sure the Python returns your answer to the `output` variable.
- Search: the search tool should be used whenever you need to find information. It can be used to find information about everything
- Final Answer: the final answer tool must be used to respond to the user. You must use this when you have decided on an answer.

To use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs. For example, to answer the question, "what is the square root of 51?" you must use the calculator tool like so:

```json
{
    "tool_name": "Calculator",
    "input": "from math import

And return this info back to the agent:

In [18]:
res2 = generate_text(new_prompt)
print(res2[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 "Final Answer",
"input":  "The square root of 512 multiplied by 7 is approximately 158.4."
}
```


Convert to dictionary:

In [19]:
response = format_output(res2[0]["generated_text"])
response

{'tool_name': 'Final Answer',
 'input': 'The square root of 512 multiplied by 7 is approximately 158.4.'}

We add some handling for `Final Answer` outputs.

In [20]:
if response["tool_name"] == "Final Answer":
    print("Assistant: "+response["input"])

Assistant: The square root of 512 multiplied by 7 is approximately 158.4.


Let's integrate this tool-parsing logic into a single function...

In [21]:
from duckduckgo_search import DDGS

def use_tool(action: dict):
    tool_name = action["tool_name"]
    if tool_name == "Final Answer":
        return "Assistant: "+action["input"]
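    elif tool_name == "Calculator":
        # run the generated code in its own namespace; inside a function a bare
        # exec() would not create a local `output`, so we read it from the dict
        namespace = {}
        exec(action["input"], namespace)
        return f"Tool Output: {namespace['output']}"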
    elif tool_name == "Search":
        contexts = []
        with DDGS() as ddgs:
            results = ddgs.text(
                action["input"],
                region="wt-wt", safesearch="on",
                max_results=3
            )
            for r in results:
                contexts.append(r['body'])
        info = "\n---\n".join(contexts)
        return f"Tool Output: {info}"
    else:
        # otherwise just assume final answer
        return "Assistant: "+action["input"]

In [37]:
query = "who is the current prime minister of the UK?"

input_prompt = instruction_format(sys_msg, query)

In [38]:
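def run(query: str):
    # `query` is the full formatted prompt; we return the tool/assistant response
    # plus the prompt extended with the generation and that response
    res = generate_text(query)
    action_dict = format_output(res[0]["generated_text"])
    response = use_tool(action_dict)
    full_text = f"{query}{res[0]['generated_text']}\n{response}"
    return response, full_text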

In [39]:
out = run(input_prompt)
out[0]

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


'Tool Output: Rishi Sunak became Prime Minister on 25 October 2022. He was previously appointed Chancellor of the Exchequer from 13 February 2020 to 5 July 2022.\n---\nRishi Sunak Political positions Electoral history MP for Richmond (Yorks) Prime Minister of the United Kingdom Premiership Minister for the Union Ministry 2023 February reshuffle 2023 November reshuffle Industrial disputes postal workers strikes railway strikes NHS strikes 2022 autumn statement Russian invasion of Ukraine economic impact\n---\nRishi Sunak has been the prime minister since 25 October 2022. [7] History Sir Robert Walpole is generally considered to have been the first person to hold the position of Prime Minister.'

In [40]:
print(out[0])

Tool Output: Rishi Sunak became Prime Minister on 25 October 2022. He was previously appointed Chancellor of the Exchequer from 13 February 2020 to 5 July 2022.
---
Rishi Sunak Political positions Electoral history MP for Richmond (Yorks) Prime Minister of the United Kingdom Premiership Minister for the Union Ministry 2023 February reshuffle 2023 November reshuffle Industrial disputes postal workers strikes railway strikes NHS strikes 2022 autumn statement Russian invasion of Ukraine economic impact
---
Rishi Sunak has been the prime minister since 25 October 2022. [7] History Sir Robert Walpole is generally considered to have been the first person to hold the position of Prime Minister.


In [41]:
print(out[1])

<s> [INST] You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:

- Calculator: the calculator should be used whenever you need to perform a calculation, no matter how simple. It uses Python so make sure to write complete Python code required to perform the calculation required and make sure the Python returns your answer to the `output` variable.
- Search: the search tool should be used whenever you need to find information. It can be used to find information about everything
- Final Answer: the final answer tool must be used to respond to the user. You must use this when you have decided on an answer.

To use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs. For example, to answer the question, "what is the square root of 51?" you must use the calculator tool like so:

```json
{
    "tool_name": "Calculator",
    "input": "from ma

We're not handling the logic of iterating through multiple agent steps yet, so we need to do that manually...

In [42]:
second_step = out[1]+"""
Assistant: ```json
{
    "tool_name": """

out = run(second_step)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


In [43]:
out[0]

'Assistant: Rishi Sunak has been the Prime Minister of the United Kingdom since 25 October 2022.'
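The manual second step could be wrapped in a loop. The sketch below is one possible shape for that, under the assumption that a step function behaves like `run` above: it takes a prompt and returns `(response, full_text)`, with final answers starting with `"Assistant: "`. `run_agent` and `max_steps` are hypothetical names introduced for illustration.

```python
FENCE = "`" * 3  # the three-backtick code fence used in the prompt


def run_agent(prompt: str, run_step, max_steps: int = 5) -> str:
    """Call `run_step` repeatedly until it yields a final answer.

    `run_step(prompt)` is assumed to return (response, full_text), where a
    final answer begins with "Assistant: " and anything else is tool output.
    """
    # same primer that was appended by hand for the second step
    primer = f'\nAssistant: {FENCE}json\n{{\n    "tool_name": '
    response = ""
    for _ in range(max_steps):
        response, full_text = run_step(prompt)
        if response.startswith("Assistant: "):
            return response
        # feed the tool output back in and prime the next JSON action
        prompt = full_text + primer
    # step limit reached: return whatever the last step produced
    return response
```

With the functions defined in this notebook, the call would be `run_agent(input_prompt, run)`. The step limit is there so that a model that never chooses `Final Answer` cannot loop forever.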

---