In [27]:
from mlx_lm import load, stream_generate


In [28]:
model_path = "../../huggingface-models/qwen3-8b-4bit/"
model, tokenizer = load(model_path)

In [35]:
prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]

In [55]:
def raw_prompt_without_generation(tokenizer, messages):
    # Template only: no assistant header is appended after the user turn.
    return tokenizer.apply_chat_template(
        messages, add_generation_prompt=False, enable_thinking=True
    )

def raw_prompt_with_generation(tokenizer, messages):
    # Appends the assistant header so the model starts its own turn.
    return tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, enable_thinking=True
    )

def raw_prompt_without_thinking(tokenizer, messages):
    # enable_thinking=False pre-fills an empty <think></think> block.
    return tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, enable_thinking=False
    )

def raw_prompt_with_system(tokenizer, messages):
    # Prepend a system message that repeats the same line ten times.
    updated_messages = [{"role": "system", "content": 10 * "You are an AI!\n"}] + messages
    return tokenizer.apply_chat_template(
        updated_messages, add_generation_prompt=True, enable_thinking=True
    )

In [56]:
tok_without_generation = raw_prompt_without_generation(tokenizer, messages)
tok_with_generation = raw_prompt_with_generation(tokenizer, messages)
tok_without_thinking = raw_prompt_without_thinking(tokenizer, messages)
tok_with_system = raw_prompt_with_system(tokenizer, messages)

print(100 * "-")
print("Without Generation:\n" + 100 * "-" + "\n", tokenizer.decode(tok_without_generation))
print(100 * "-")
print("With Generation:\n" + 100 * "-" + "\n", tokenizer.decode(tok_with_generation))
print(100 * "-")
print("Without Thinking:\n" + 100 * "-" + "\n", tokenizer.decode(tok_without_thinking))
print(100 * "-")
print("With System:\n" + 100 * "-" + "\n", tokenizer.decode(tok_with_system))
print(100 * "-")

----------------------------------------------------------------------------------------------------
Without Generation:
----------------------------------------------------------------------------------------------------
 <|im_start|>user
Write a story about Einstein<|im_end|>

----------------------------------------------------------------------------------------------------
With Generation:
----------------------------------------------------------------------------------------------------
 <|im_start|>user
Write a story about Einstein<|im_end|>
<|im_start|>assistant

----------------------------------------------------------------------------------------------------
Without Thinking:
----------------------------------------------------------------------------------------------------
 <|im_start|>user
Write a story about Einstein<|im_end|>
<|im_start|>assistant
<think>

</think>


----------------------------------------------------------------------------------------------------
W
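
The decoded prompts above differ only in what follows the user turn. A minimal sketch of that layout, assuming the ChatML-style markers seen in the output: `render_chatml` is a hypothetical helper written for illustration, not the tokenizer's actual template, and it ignores system-prompt handling and anything else the real template may do.

```python
def render_chatml(messages, add_generation_prompt=False, enable_thinking=True):
    """Hand-rolled approximation of the prompt text decoded above (illustrative only)."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from there.
        parts.append("<|im_start|>assistant\n")
        if not enable_thinking:
            # An empty think block is pre-filled, so the answer starts right away.
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


example_messages = [{"role": "user", "content": "Write a story about Einstein"}]
print(render_chatml(example_messages, add_generation_prompt=True, enable_thinking=False))
```

Without the assistant header the prompt ends at the user's `<|im_end|>`, so the model has to produce the next turn marker itself before it can answer.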

In [51]:
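def next_token_preds(model, tokenizer, tokenized_prompt):
    # Stream the continuation of an already tokenized prompt, printing text as it arrives.
    # max_tokens=2**15 caps generation at 32,768 new tokens.
    for response in stream_generate(model, tokenizer, tokenized_prompt, max_tokens=2**15):
        print(response.text, end="", flush=True)
    print()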


In [52]:
next_token_preds(model, tokenizer, tok_without_generation)

<|im_start|>
Okay, I need to write a story about Einstein. Let me think about how to approach this. First, I should decide on the angle. Maybe a fictional story that's not just a biography. Maybe a short story that highlights a particular aspect of his life or a hypothetical scenario. 

I remember that Einstein was known for his theory of relativity, his work on the photoelectric effect, and his role in the atomic bomb. Also, his personality—his quirks, like the way he spoke, his love for music, and his pacifist views. Maybe I can create a story that shows a day in his life, or perhaps a hypothetical situation where he faces a moral dilemma.

Wait, the user might want something engaging and not just a dry account. Maybe a story that's a bit imaginative. Let me think of a scenario. What if Einstein is in a situation where he has to make a choice between scientific discovery and ethical considerations? That could be a good conflict.

Alternatively, maybe a story where he interacts with s

In [53]:
next_token_preds(model, tokenizer, tok_with_generation)

<think>
Okay, the user wants a story about Einstein. Let me think about how to approach this. First, I need to decide on the angle. Should it be a biographical story, a fictionalized account, or maybe a fable? The user didn't specify, so maybe a biographical story would be safe. But I should make it engaging, not just a list of facts.

Einstein's life has many interesting points: his early life, the theory of relativity, his work in Switzerland, his pacifism, and his later years in the US. Maybe focus on a specific event or period. The user might want a narrative that highlights his personality and achievements.

I should start with a hook. Maybe a scene from his childhood or a pivotal moment. The famous story about him not talking until age four? Or his thought experiments as a child. That could be a good way to show his curiosity.

Then, structure the story chronologically. Early life, education, breakthroughs, personal life, and legacy. But need to keep it concise. Maybe highlight a

In [54]:
next_token_preds(model, tokenizer, tok_without_thinking)

**The Curious Mind of Albert Einstein**

In the quiet town of Ulm, Germany, nestled between the Danube River and the rolling hills of the Swabian Alps, a boy named Albert Einstein was born on March 14, 1879. From an early age, Albert showed a peculiar fascination with the world around him. While other children played with toys, he would spend hours staring at the stars, trying to understand the invisible forces that held the universe together.

His parents, Hermann and Pauline Einstein, were not particularly interested in science, but they recognized their son's unusual curiosity. Albert's early education was not without its challenges. He was often described as a quiet, introspective child who preferred the company of books to people. His teachers found him difficult to manage, and he often asked questions that seemed to defy the conventional wisdom of the time.

At the age of 12, Albert was given a compass for his birthday. He was mesmerized by the needle that always pointed north, n