In [2]:
!pip install transformers



In [4]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

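# parse the thinking content
try:
    # emulate rindex: position just past the last </think> token (id 151668)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> token found: treat the whole output as the final answer
    index = 0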

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)


thinking content: <think>
Okay, the user wants a short introduction to a large language model. Let me start by recalling what I know. Large language models are AI systems designed to understand and generate human language. They have a huge dataset, so they can learn from a lot of text.

I should mention their ability to understand and generate natural language, which makes them useful in various fields like customer service, content creation, and research. Also, their training data is vast, so they can handle complex tasks. 

Wait, maybe I should highlight their adaptability and how they can improve with more data. Should I include examples of their applications? Like helping with writing, answering questions, or even creative tasks. 

I need to keep it concise but informative. Avoid jargon, make sure it's clear and straightforward. Let me check if I'm missing any key points. Oh, also mention that they're trained on massive amounts of text, which enhances their performance. 

Putting i
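The parsing step in the cell above can be isolated into a small helper that works on plain lists of token ids, which makes it easy to check without loading the model. This is a minimal sketch: the function name `split_thinking` and the parameter `end_think_id` are illustrative and not part of transformers; 151668 is the `</think>` token id used in the cell above.

```python
def split_thinking(output_ids, end_think_id=151668):
    """Split generated token ids at the last end-of-thinking token.

    Returns (thinking_ids, answer_ids). If end_think_id never occurs,
    thinking_ids is empty and every token is treated as the answer.
    """
    try:
        # Search the reversed list to find the LAST occurrence, then
        # convert back to a forward index just past that token.
        index = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]


# Toy ids standing in for real model output (assumed values, not real tokens).
thinking_ids, answer_ids = split_thinking([10, 11, 151668, 20, 21])
print("thinking ids:", thinking_ids)
print("answer ids:", answer_ids)
```

Each half can then be passed to `tokenizer.decode(..., skip_special_tokens=True)` as in the cell above.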

In [5]:
from transformers import AutoModelForCausalLM, AutoTokenizer

class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-0.6B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.history = []

    def generate_response(self, user_input):
        messages = self.history + [{"role": "user", "content": user_input}]

        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )

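        # Move the inputs to the model's device so generate() does not hit a device mismatch
        inputs = self.tokenizer(text, return_tensors="pt").to(self.model.device)
        generated_ids = self.model.generate(**inputs, max_new_tokens=32768)
        # Keep only the newly generated tokens (drop the prompt tokens)
        response_ids = generated_ids[0][len(inputs.input_ids[0]):].tolist()
        response = self.tokenizer.decode(response_ids, skip_special_tokens=True)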

        # Update history
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})

        return response

# Example Usage
if __name__ == "__main__":
    chatbot = QwenChatbot()

    # First input (without /think or /no_think tags, thinking mode is enabled by default)
    user_input_1 = "How many r's in strawberries?"
    print(f"User: {user_input_1}")
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")
    print("----------------------")

    # Second input with /no_think
    user_input_2 = "Then, how many r's in blueberries? /no_think"
    print(f"User: {user_input_2}")
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}")
    print("----------------------")

    # Third input with /think
    user_input_3 = "Really? /think"
    print(f"User: {user_input_3}")
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")



User: How many r's in strawberries?
Bot: <think>
Okay, the user is asking how many r's are in the word "strawberries". Let me break this down. First, I need to check each letter in the word and count the number of times 'r' appears. Let me write out the word: S-T-R-A-W-B-E-R-R-I-N-G-S. Now, looking at each letter:

1. S - no r
2. T - no r
3. R - one r
4. A - no r
5. W - no r
6. B - no r
7. E - no r
8. R - another r
9. R - third r
10. I - no r
11. N - no r
12. G - no r

So that's three 'r's. Wait, let me double-check to make sure I didn't miss any. The word is "strawberries". Let me spell it again: S-T-R-A-W-B-E-R-R-I-N-G-S. Yep, that's three 'r's. So the answer should be three. I should present this clearly to the user.
</think>

There are **3 r's** in the word "strawberries".  

**Answer:** 3
----------------------
User: Then, how many r's in blueberries? /no_think
Bot: <think>

</think>

There are **3 r's** in the word **blueberries**.  

**Answer:** 3
----------------------
User: Re
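
A note on the history kept by `QwenChatbot`: each stored assistant turn includes its `<think>...</think>` block, as the output above shows. To my understanding the Qwen3 chat template already drops thinking content from earlier turns when it builds the prompt, so this may be handled for you. If you would rather store only the final answer, a sketch like the following could be applied to `response` before it is appended to `self.history`. The helper `strip_thinking` is hypothetical and not part of transformers.

```python
import re

# Matches one complete <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_thinking(response: str) -> str:
    """Remove complete <think>...</think> blocks and surrounding whitespace.

    An unterminated <think> (for example, from a truncated generation)
    is left untouched, since there is no closing tag to match.
    """
    return THINK_BLOCK.sub("", response).strip()


# Example with the empty thinking block produced under /no_think.
print(strip_thinking("<think>\n\n</think>\n\nFinal answer text."))
```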