> https://hf-mirror.com/Qwen/Qwen3-0.6B

In [1]:
%pip install transformers ipykernel ipywidgets iprogress sacremoses tqdm hf_xet

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Note: you may need to restart the kernel to use updated packages.


In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    # device_map="auto",
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)


thinking content: <think>
Okay, the user wants a short introduction to a large language model. Let me start by recalling what I know. Large language models, like BERT or GPT, are AI systems designed to understand and generate human language. They're trained on massive datasets to learn patterns and improve performance.

I should mention their ability to understand context and generate coherent responses. It's important to highlight their use cases, such as writing, translation, and customer support. Also, emphasize that they're not human but are developed with extensive training data. Keep it concise but informative. Avoid technical jargon if possible, but ensure clarity. Make sure to include key points like training data, capabilities, and real-world applications. Alright, that should cover it without being too lengthy.
</think>
content: A large language model (LLM) is an AI system designed to understand and generate human language. They are trained on vast datasets to learn patterns 
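
The split between thinking and final content in the cell above depends only on the position of the last `</think>` token (id 151668). As a minimal sketch, the same logic can be isolated in a small helper and checked without loading the model. The name `split_at_last` is hypothetical and the token ids in the example are made up.

```python
from typing import List, Tuple

END_THINK_ID = 151668  # id of </think>, as used in the cell above


def split_at_last(ids: List[int], marker: int = END_THINK_ID) -> Tuple[List[int], List[int]]:
    """Split ids just after the last occurrence of marker.

    If the marker is absent, everything is treated as final content.
    """
    try:
        # Searching the reversed list gives the distance from the end
        # to the last marker, which converts to a cut position.
        cut = len(ids) - ids[::-1].index(marker)
    except ValueError:
        cut = 0
    return ids[:cut], ids[cut:]


# Made-up ids: two thinking tokens, the marker, then two answer tokens.
thinking_ids, content_ids = split_at_last([1, 2, 151668, 7, 8])
print(thinking_ids, content_ids)
```

The marker stays at the end of the thinking part, which is why the notebook decodes with `skip_special_tokens=True` and strips newlines afterwards.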

In [4]:
from transformers import AutoModelForCausalLM, AutoTokenizer

class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-0.6B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.history = []

    def generate_response(self, user_input):
        messages = self.history + [{"role": "user", "content": user_input}]

        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )

        inputs = self.tokenizer(text, return_tensors="pt")
        response_ids = self.model.generate(**inputs, max_new_tokens=32768)[0][len(inputs.input_ids[0]):].tolist()
        response = self.tokenizer.decode(response_ids, skip_special_tokens=True)

        # Update history
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})

        return response

# Example Usage
if __name__ == "__main__":
    chatbot = QwenChatbot()

    # First input (without /think or /no_think tags, thinking mode is enabled by default)
    user_input_1 = "How many r's in strawberries?"
    print(f"User: {user_input_1}")
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")
    print("----------------------")

    # Second input with /no_think
    user_input_2 = "Then, how many r's in blueberries? /no_think"
    print(f"User: {user_input_2}")
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}") 
    print("----------------------")

    # Third input with /think
    user_input_3 = "Really? /think"
    print(f"User: {user_input_3}")
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")


User: How many r's in strawberries?
Bot: <think>
Okay, the user is asking, "How many r's in strawberries?" Let me break this down.

First, I need to count the number of 'r's in the word "strawberries". Let me write it out: S-T-R-A-W-B-E-R-R-I-N-G-S. Wait, let me double-check. The word is spelled S-T-R-A-W-B-E-R-R-I-N-G-S. So, let's count each 'r' one by one.

Starting from the beginning: S, T, R, A, W, B, E, R, R, I, N, G, S. Now, the letters are: S, T, R, A, W, B, E, R, R, I, N, G, S. So the letters 'r' appear at positions 3, 8, and 9. That's three 'r's. Wait, let me confirm again. Position 3 is the third letter, which is R. Then position 8 is the third 'r' after the first two. Then position 9 is the third 'r' after the second. So yes, three 'r's in "strawberries". 

But wait, sometimes people might miscount. Let me spell it again: S-T-R-A-W-B-E-R-R-I-N-G-S. So the letters are S, T, R, A, W, B, E, R, R, I, N, G, S. So the 'r's are at the third letter, then the third 'r' after that, an
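
The `Bot:` output above shows that the decoded response still contains the `<think>...</think>` block, and `QwenChatbot` appends that whole string to `self.history`. Assuming you want the stored history to hold only the final answer, a rough sketch of a cleanup step could look like this. The helper name `strip_thinking` and the sample reply are hypothetical, not part of the notebook.

```python
import re

# Non-greedy match so that several think blocks are removed one by one.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> blocks and the surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


reply = "<think>\nCount the letters one by one.\n</think>\n\nThere are three r's."
print(strip_thinking(reply))
```

One possible use would be to pass `strip_thinking(response)` instead of `response` when appending the assistant turn to the history.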

In [35]:
%pip install qwen_agent dotenv mcp uv "qwen-agent[code_interpreter]"

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting fastapi>=0.103.1 (from qwen-agent[code_interpreter])
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/50/b3/b51f09c2ba432a576fe63758bddc81f78f0c6309d9e5c10d194313bf021e/fastapi-0.115.12-py3-none-any.whl (95 kB)
Collecting jupyter>=1.0.0 (from qwen-agent[code_interpreter])
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/38/64/285f20a31679bf547b75602702f7800e74dbabae36ef324f716c02804753/jupyter-1.1.1-py2.py3-none-any.whl (2.7 kB)
Collecting matplotlib (from qwen-agent[code_interpreter])
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7a/81/dae7e14042e74da658c3336ab9799128e09a1ee03964f2d89630b5d12106/matplotlib-3.10.1-cp312-cp312-macosx_11_0_arm64.whl (8.0 MB)
Collecting pandas (from qwen-agent[code_interpreter])
  Downloading https://pypi.tuna.tsinghua.edu.

In [5]:
!uv tool install mcp-server-time

Ignoring existing environment for `mcp-server-time`: the requested Python interpreter does not match the environment interpreter
Resolved 23 packages in 3ms
Installed 23 packages in 12ms
 + annotated-types==0.7.0
 + anyio==4.9.0
 + certifi==2025.4.26
 + click==8.1.8
 + h11==0.16.0
 + httpcore==1.0.9
 + httpx==0.28.1
 + httpx-sse==0.4.0
 + idna==3.10
 + mcp==1.7.1
 + mcp-server-time==0.6.2
 + pydantic==2.11.4
 + pydantic-core==2.33.2
 + pydantic-settings==2.9.1
 + python-

In [13]:
!uv init

Initialized project `ai-prompt-research`


In [38]:
%env UV_DEFAULT_INDEX=https://pypi.tuna.tsinghua.edu.cn/simple
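
A variable exported inside a `!` shell command lives only in that short-lived subshell, so it has to be set on the kernel process itself to reach later `!uv` calls. A minimal sketch in plain Python, assuming child processes inherit the kernel's environment (the usual behavior):

```python
import os
import subprocess
import sys

# Set the variable on the current (kernel) process; child processes inherit it.
os.environ["UV_DEFAULT_INDEX"] = "https://pypi.tuna.tsinghua.edu.cn/simple"

# Spawn a child process and read the variable back to confirm inheritance.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('UV_DEFAULT_INDEX'))"],
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = child.stdout.strip()
print(seen_by_child)
```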

In [24]:
!uv add --default-index https://pypi.tuna.tsinghua.edu.cn/simple requests mcp-server-fetch mcp-server-time

Resolved 39 packages in 701ms
Preparing packages... (0/2)

In [22]:
!uv run mcp-server-fetch https://qwenlm.github.io/blog/

usage: mcp-server-fetch [-h] [--user-agent USER_AGENT] [--ignore-robots-txt]
                        [--proxy-url PROXY_URL]
mcp-server-fetch: error: unrecognized arguments: https://qwenlm.github.io/blog/
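
The error above arises because `mcp-server-fetch` does not accept a URL on the command line. It is started as a server, and a client then passes the URL as a tool argument. As a rough sketch of what such a request might look like on the wire, the snippet below assumes MCP's JSON-RPC 2.0 `tools/call` method and a tool named `fetch` taking `url` and `max_length`, matching the arguments the agent sends later in this notebook. The helper `build_tool_call` is hypothetical, and a real session would perform an initialization handshake first.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # assumed MCP method name for tool invocation
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


message = build_tool_call(
    1,
    "fetch",
    {"url": "https://qwenlm.github.io/blog/", "max_length": 5000},
)
print(message)
```

In practice an MCP client library or an agent framework builds and sends these messages, so they rarely need to be written by hand.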


In [None]:
!uv run mcp-server-time --local-timezone=Asia/Shanghai

In [42]:
from qwen_agent.agents import Assistant

# Define LLM
llm_cfg = {
    'model': 'qwen3:0.6b',

    # Use the endpoint provided by Alibaba Model Studio:
    # 'model_type': 'qwen_dashscope',
    # 'api_key': os.getenv('DASHSCOPE_API_KEY'),

    # Use a custom endpoint compatible with OpenAI API:
    'model_server': 'http://localhost:11434/v1',  # api_base
    'api_key': 'EMPTY',

    # Other parameters:
    # 'generate_cfg': {
    #     # Set this when the response content has the form
    #     # `<think>this is the thought</think>this is the answer`.
    #     # Leave it out when the response is already split into
    #     # reasoning_content and content.
    #     'thought_in_content': True,
    # },
}

# Define Tools
tools = [
    {'mcpServers': {  # You can specify the MCP configuration file
            # 'time': {
            #     'command': 'uv',
            #     'args': ['run', 'mcp-server-time', '--local-timezone=Asia/Shanghai']
            # },
            "fetch": {
                "command": "uv",
                "args": ["run", "mcp-server-fetch"]
            }
        }
    },
    'code_interpreter',  # Built-in tools
]

# Define Agent
bot = Assistant(llm=llm_cfg, function_list=tools)

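# Streaming generation: the loop drains the stream, so `responses`
# holds the last yielded value once it finishes.
# English prompt:
# messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
# Chinese prompt ("Introduce the latest developments of Qwen"):
messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/zh/blog/ 介绍 Qwen 最新的开发情况'}]
responses = []  # stays empty if the stream yields nothing
for responses in bot.run(messages=messages):
    pass
print(responses)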


2025-05-07 20:23:45,424 - mcp_manager.py - 109 - INFO - Initializing MCP tools from mcpservers config: {'mcpServers': {'fetch': {'command': 'uv', 'args': ['run', 'mcp-server-fetch']}}}
2025-05-07 20:23:45,435 - mcp_manager.py - 246 - INFO - Will initialize a MCP stdio_client, if this takes forever, please check whether the mcp config is correct: {'command': 'uv', 'args': ['run', 'mcp-server-fetch']}


[{'role': 'assistant', 'content': '<think>\n好的，用户问的是Qwen最新的开发情况。我需要先看看Qwen的官方网站或者相关的博客内容。用户提供的链接是https://qwenlm.github.io/zh/blog/，看起来像是一个博客页面。\n\n首先，我应该使用fetch-fetch工具来访问这个URL。参数里需要包含url，也就是这个链接，还有可以设置max_length和raw参数。用户可能希望得到较短的总结或者具体的代码信息，所以需要根据用户的需求来决定参数。不过用户的问题只是询问介绍，所以可能不需要太详细的内容，但为了提供完整信息，可能需要设置max_length为5000，确保内容不超过限制。此外，raw参数设为false，获取HTML内容，如果需要的话。但用户的问题只是介绍，可能不需要进一步的渲染，所以暂时先设置raw为false，然后调用fetch-fetch函数获取数据。\n</think>\n\n', 'reasoning_content': ''}, {'role': 'assistant', 'content': '', 'reasoning_content': '', 'function_call': {'name': 'fetch-fetch', 'arguments': '{"url": "https://qwenlm.github.io/zh/blog/", "max_length": 5000, "raw": false}'}}, {'role': 'function', 'content': 'Contents of https://qwenlm.github.io/zh/blog/:\nQWEN CHAT GitHub Hugging Face ModelScope Kaggle DEMO DISCORD\n引言 今天，我们宣布推出 Qwen3，这是 Qwen 系列大型语言模型的最新成员。我们的旗舰模型 Qwen3-235B-A22B 在代码、数学、通用能力等基准测试中，与 DeepSeek-R1、o1、o3-mini、Grok-3 和 Gemini-2.5-Pro 等顶级模型相比，表现出极具竞争力的结果。此外，小型 MoE 模型 Qwen3-30B-A3B 的激活参数数量是 QwQ
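
The printed result is a list of message dicts in which a tool invocation appears as a `function_call` entry whose `arguments` field is a JSON string. A minimal sketch for pulling those calls out of such a list is shown below. The helper `extract_function_calls` is hypothetical, and `sample` is a shortened copy of the structure printed above, not real output.

```python
import json
from typing import Any, Dict, List


def extract_function_calls(responses: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect tool calls and decode their JSON-encoded arguments."""
    calls = []
    for message in responses:
        call = message.get("function_call")
        if call:
            calls.append(
                {"name": call["name"], "arguments": json.loads(call["arguments"])}
            )
    return calls


# Shortened stand-in for the list printed by the cell above.
sample = [
    {
        "role": "assistant",
        "content": "",
        "reasoning_content": "",
        "function_call": {
            "name": "fetch-fetch",
            "arguments": '{"url": "https://qwenlm.github.io/zh/blog/", "max_length": 5000, "raw": false}',
        },
    },
    {"role": "function", "content": "Contents of https://qwenlm.github.io/zh/blog/: ..."},
]

calls = extract_function_calls(sample)
print(calls[0]["name"], calls[0]["arguments"]["url"])
```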