# Initial Setup of Environment
We will use uv to install and manage the dependencies for the project, so uv must be installed on the system first. Use the following command to install it:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

To set up a new project, run the following commands:

```bash
mkdir my_project
cd my_project
uv init
```

This will create a new directory called `my_project` and initialize a new uv project in it.  You can then add the dependencies for the project.  We will need mlx-lm.  

```bash
uv add mlx-lm
```

If we look inside the `uv.lock` file after adding the dependency above, we will see that mlx and mlx-metal have been installed without us having to add them individually. This is one of the nice features of uv: it takes care of all the transitive dependencies for us.

To execute cells in the notebook, you must also add ipykernel to the project dependencies, and to avoid some warnings, we will also add ipywidgets. These can be added as part of the development environment:

```bash
uv add --dev ipykernel ipywidgets
```




# Load the Model

Let's import the load function from the mlx-lm package, and then load the model.  We will use the `Qwen3-4B-Instruct-2507-4bit` model, which is a 4-bit quantized model that will fit in 32GB of RAM. This is a "normal" model;  later we will try to use `gpt-oss-20b-MXFP4-Q4` which uses a completely different prompting structure.

In [1]:
from mlx_lm import load
# MODEL_ID = "mlx-community/gpt-oss-20b-MXFP4-Q4"  # 4-bit quantization for 32GB RAM
MODEL_ID = "mlx-community/Qwen3-4B-Instruct-2507-4bit" 
print("Loading model... (first time may take a while)")
model, tokenizer = load(MODEL_ID)

Loading model... (first time may take a while)


Fetching 11 files:   0%|          | 0/11 [00:00<?, ?it/s]

# Chat with the Model

Let's chat with the model. We will use the `generate` function to generate a response to the question "What is the capital of the United States?" We are not setting up a specific prompt template; the question goes straight in. You will notice that the model wanders around and adds information beyond what we asked.




In [2]:
from mlx_lm import generate
response = generate(model, tokenizer, "What is the capital of the United States?")
print(response)

The capital of the United States is Washington, D.C. (Washington, District of Columbia). It is located in the eastern part of the country and serves as the seat of the federal government, housing the U.S. Congress, the White House, and other key government institutions. 

Note: While Washington, D.C. is the capital, it is not one of the 50 U.S. states, but rather a federal district established in 1790. The city is named after George Washington, the first U.S. president. 

✅ Correct answer: **Washington, D.C.**.


## Adding a Prompt Template
Let's see what happens if we wrap the question in the model's prompt template. The tokenizer's `chat_template` attribute holds the template, and its `apply_chat_template` method applies that template to a list of messages. Note that we are not tokenizing the prompt! With `tokenize=False`, the method returns the formatted string rather than token IDs.

In [3]:
prompt = "What is the capital of the United States?"
if tokenizer.chat_template is not None:
    messages = [{"role":"user", "content":prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True,
        tokenize=False,
    )
print(prompt)
response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)

<|im_start|>user
What is the capital of the United States?<|im_end|>
<|im_start|>assistant

The capital of the United States is Washington, D.C.
Prompt: 17 tokens, 136.393 tokens-per-sec
Generation: 13 tokens, 92.263 tokens-per-sec
Peak memory: 2.338 GB
The capital of the United States is Washington, D.C.
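
To make the printed prompt above less mysterious, here is a rough pure-Python imitation of what the template does for this simple case. It is only a sketch based on the output shown above: the real template is stored with the tokenizer and handles more (system prompts, tools, and so on), and the function name `format_chatml` is made up for this example.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Format messages with the <|im_start|>/<|im_end|> markers seen in the output above."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What is the capital of the United States?"}]
print(format_chatml(messages))
```

The trailing `<|im_start|>assistant` line is what `add_generation_prompt=True` contributes: it tells the model that the next thing to write is the assistant's reply.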


Let's get rid of the print statements to see what would be normal behavior.

We will leave the verbose output on, which will show us the final response as well as some performance measures.

In [4]:
prompt = "What is the capital of the United States?"
if tokenizer.chat_template is not None:
    messages = [{"role":"user", "content":prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True,
        tokenize=False,
    )
response = generate(model, tokenizer, prompt=prompt, verbose=True)

The capital of the United States is Washington, D.C.
Prompt: 17 tokens, 137.108 tokens-per-sec
Generation: 13 tokens, 92.284 tokens-per-sec
Peak memory: 2.338 GB
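
Since the same template-wrapping lines appear in every cell, they could be pulled into a small helper. This is a sketch, not part of mlx-lm: the name `build_prompt` is hypothetical, and it assumes a tokenizer exposing `chat_template` and `apply_chat_template` as used above.

```python
def build_prompt(tokenizer, question):
    """Wrap `question` in the tokenizer's chat template when one is defined."""
    if getattr(tokenizer, "chat_template", None) is None:
        # No template available: send the raw question, as in the first chat cell.
        return question
    messages = [{"role": "user", "content": question}]
    return tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=False,  # return a string, not token IDs
    )


# Usage in the notebook, assuming `model`, `tokenizer`, and `generate` from the cells above:
# response = generate(model, tokenizer, prompt=build_prompt(tokenizer, "..."), verbose=True)
```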


## Is this model smart?

Let's ask the model for the capital of the United Moons, and see if it can handle it.

In [5]:
prompt = "What is the capital of the United Moons?"
if tokenizer.chat_template is not None:
    messages = [{"role":"user", "content":prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True,
        tokenize=False,
    )
response = generate(model, tokenizer, prompt=prompt, verbose=True)

The United Moons does not exist as a real country or political entity. Therefore, it does not have a capital.

The term "United Moons" is likely a fictional or humorous reference, possibly inspired by the idea of a union of moon-like bodies or a satirical concept. In reality, moons (such as those of Jupiter or Saturn) are natural satellites and not independent nations.

If you're referring to a fictional universe, game, or creative work (like a sci-fi story or a joke), feel free to provide more context - I’d be happy to help with a fun or imaginative answer!

For now, the capital of the United Moons: **Does not exist.** 😊

(But if we're being playful - maybe *Luna Prime*? 🌕)
Prompt: 18 tokens, 142.693 tokens-per-sec
Generation: 162 tokens, 84.931 tokens-per-sec
Peak memory: 2.344 GB


## Changing the default values of the generate function

We can change the sampling parameters by passing a sampler object to the `generate` function. Building one by hand is a bit of a pain, so we will use the `make_sampler` function to create a sampler object, here with the default values (copied from the source code in the mlx-lm package).


In [19]:
from mlx_lm.sample_utils import make_sampler

# Here is a sampler with the default values
sampler = make_sampler(
    temp=0.0,
    top_p=0.0,
    min_p=0.0,
    min_tokens_to_keep=1,
    top_k=0,
    xtc_probability=0.0,
    xtc_threshold=0.0,
    xtc_special_tokens=[],
)

# Pass the sampler to generate
response = generate(
    model, 
    tokenizer, 
    prompt=prompt, 
    verbose=True,
    max_tokens=256,
    sampler=sampler,  # Pass the sampler object here
)

The United Moons does not exist as a real country or political entity. Therefore, it does not have a capital.

The term "United Moons" is likely a fictional or humorous reference, possibly inspired by the idea of a union of moon-like bodies or a satirical concept. In reality, moons (such as those of Jupiter or Saturn) are natural satellites and not independent nations.

If you're referring to a fictional universe, game, or creative work (like a sci-fi story or a joke), feel free to provide more context - I’d be happy to help with a fun or imaginative answer!

For now, the capital of the United Moons: **Does not exist.** 😊

(But if we're being playful - maybe *Luna Prime*? 🌕)
Prompt: 18 tokens, 82.939 tokens-per-sec
Generation: 162 tokens, 82.476 tokens-per-sec
Peak memory: 2.344 GB
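
The response above is identical to the earlier one, which is what we would expect: as far as I can tell from the mlx-lm source, a temperature of 0.0 means the highest-scoring token is picked at every step (greedy decoding). The toy function below illustrates the idea in plain Python. It is not mlx-lm's implementation, and `sample_token` is a made-up name.

```python
import math
import random


def sample_token(logits, temp=0.0, rng=None):
    """Pick an index from `logits`; temp == 0 is greedy, higher temp is more random."""
    if temp == 0.0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [value / temp for value in logits]
    top = max(scaled)
    # Unnormalized softmax weights; subtracting the max keeps exp() stable.
    weights = [math.exp(value - top) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, -1.0]
print(sample_token(toy_logits))            # greedy: always index 1
print(sample_token(toy_logits, temp=1.0))  # random: usually, but not always, index 1
```

Passing a sampler built with a non-zero temperature, for example `make_sampler(temp=0.7)`, to `generate` should therefore give answers that vary from run to run.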


## Migrating to scripts
Jupyter notebooks are great for interactive development, but we will need to migrate to scripts when we are ready to deploy.

We will illustrate this by reproducing the earlier code, importing from scripts that I have saved, and then we will create an app.py file that will run everything.


In [23]:
from setup_scripts import get_model, get_response

Loading model... (first time may take a while)


Fetching 11 files:   0%|          | 0/11 [00:00<?, ?it/s]

The United Moons does not exist as a real country or political entity. Therefore, it does not have a capital.

The term "United Moons" is likely a fictional or humorous reference, possibly inspired by the idea of a union of moon-like bodies or a satirical concept. In reality, moons (such as those of Jupiter or Saturn) are natural satellites and not independent nations.

If you're referring to a fictional universe, game, or creative work (like a sci-fi story or a joke), feel free to provide more context - I’d be happy to help with a fun or imaginative answer!

For now, the capital of the United Moons: **Does not exist.** 😊

(But if we're being playful - maybe *Luna Prime*? 🌕)
Prompt: 18 tokens, 12.093 tokens-per-sec
Generation: 162 tokens, 84.401 tokens-per-sec
Peak memory: 6.874 GB
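
The contents of `setup_scripts.py` are not shown here, so the following is only a guess at what such a module could look like, built from the cells earlier in this notebook. The function names `get_model` and `get_response` come from the import above; their parameters and bodies are assumptions. Unlike the saved script, this sketch does nothing at import time.

```python
"""setup_scripts.py: a hypothetical sketch, not the saved script itself."""

MODEL_ID = "mlx-community/Qwen3-4B-Instruct-2507-4bit"


def get_model(model_id=MODEL_ID):
    """Load the model and return (model, tokenizer)."""
    from mlx_lm import load  # imported here so the module itself imports quickly

    print("Loading model... (first time may take a while)")
    return load(model_id)


def get_response(model, tokenizer, question, max_tokens=256, verbose=True):
    """Wrap `question` in the chat template (if any) and generate a response."""
    from mlx_lm import generate

    prompt = question
    if tokenizer.chat_template is not None:
        messages = [{"role": "user", "content": question}]
        prompt = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=False,
        )
    return generate(
        model, tokenizer, prompt=prompt, max_tokens=max_tokens, verbose=verbose
    )
```

With a module like this, the notebook cell would call `model, tokenizer = get_model()` once and then `get_response(model, tokenizer, "What is the capital of the United Moons?")` for each question.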
