

# Hugging Face Transformers Qwen 2.5 Math Model

In this notebook, I will illustrate the capabilities of the [`Qwen/Qwen2.5-Math-1.5B-Instruct` model](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct).

The Qwen 2.5 Math model is a large language model (LLM) specialized for solving mathematical problems. This notebook uses its 1.5B-parameter instruction-tuned variant.

### Key Features of the Qwen 2.5 Math Model

- **Step-by-Step Reasoning**: The model excels at breaking down complex mathematical problems into simpler, manageable steps, making it easier for users to follow the logic behind each solution.
- **Natural Language Understanding**: Users can interact with the model using natural language prompts, allowing for a conversational approach to asking questions and receiving answers.
- **Formatted Output**: Answers are presented in a clear and structured format, often utilizing LaTeX for mathematical expressions, which significantly enhances readability and comprehension.

In the code section below, I'll show how to load and run the model with Hugging Face Transformers. Specifically, I'll provide two prompt examples that highlight the model's problem-solving capabilities on simple linear equations.



**Set up Hugging Face Access Token:**



In [2]:
import os
from google.colab import userdata

# Retrieve the Hugging Face token from the Colab secrets
hf_token = userdata.get('HF_Key')  # Make sure 'HF_Key' matches the name you set in the Secrets panel

# Check if the token was retrieved successfully
if hf_token is None:
    raise ValueError("Hugging Face token not found. Please ensure it is set in the Secrets panel.")

# Expose the token under the name huggingface_hub looks for (optional for public models)
os.environ['HF_TOKEN'] = hf_token

# Libraries that read os.environ['HF_TOKEN'] can now authenticate with the Hub
print("Hugging Face token retrieved and set as an environment variable.")


Hugging Face token retrieved and set as an environment variable.
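
The cell above only works inside Google Colab. As a minimal sketch for running the notebook elsewhere, the helper below tries the Colab secret first and falls back to an environment variable. The function name `get_hf_token` and its parameters are hypothetical; `HF_TOKEN` is the environment variable that `huggingface_hub` reads.

```python
import os


def get_hf_token(secret_name="HF_Key", env_var="HF_TOKEN"):
    """Return a Hugging Face token from Colab secrets, else from the environment.

    Hypothetical helper for illustration; returns None if no token is found.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            token = userdata.get(secret_name)
            if token:
                return token
        except Exception:
            # Secret missing or notebook access not granted: fall through.
            pass

    # Fallback for local runs: read the environment variable instead.
    return os.environ.get(env_var)
```

Because the model used here is public, a `None` result is not fatal; authentication is recommended but optional.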


In [9]:
# Install required packages
!pip install huggingface_hub[hf_xet] transformers

In [11]:
# Qwen2.5-Math models require transformers>=4.37.0.
# The quotes stop the shell from treating ">" as an output redirect.
!pip install --upgrade "transformers>=4.37.0"


In [10]:
!pip install torch


In [1]:
import transformers
print(transformers.__version__)


4.52.2
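
Rather than comparing the printed version by eye, the requirement can be checked in code. The sketch below uses only the standard library; `parse_version` and `meets_minimum` are hypothetical helper names, and the parsing is deliberately simple (it ignores suffixes such as `.dev0`).

```python
def parse_version(version):
    """Turn a string like '4.52.2' into a tuple of ints, e.g. (4, 52, 2)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component (e.g. 'dev0').
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="4.37.0"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("4.52.2"))  # True: the version printed above is new enough
```

In the notebook, `meets_minimum(transformers.__version__)` would perform the same check on the installed package.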


In [3]:
#Import necessary libraries
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

### Running the model following the Hugging Face quick start guide


In [4]:
model_name = "Qwen/Qwen2.5-Math-1.5B-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"  # device for the input tensors

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."

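# Option 1: Chain-of-Thought (CoT) system prompt
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]

# Option 2: Tool-Integrated Reasoning (TIR) system prompt.
# This second assignment overwrites the CoT messages above, so this cell
# runs in TIR mode; comment out this block to use CoT instead.
messages = [
    {"role": "system", "content": "Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]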

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



## Experimenting with Prompts: Solving Mathematical Equations

In [5]:
# Define the prompt
prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."

# Step a) Define messages for CoT
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]

# Step b) Prepare the input for the model
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Print the formatted input text for debugging
print("Formatted Input Text:")
print(text)

model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Step c) Generate the response
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Print the full output IDs (prompt tokens followed by the generated tokens)
print("Generated IDs:")
print(generated_ids)

# Step d) Strip the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Step e) Decode the generated tokens and print the response
if generated_ids:
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print("Model Response:")
    print(response)
else:
    print("No response generated.")


Formatted Input Text:
<|im_start|>system
Please reason step by step, and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$.<|im_end|>
<|im_start|>assistant

Generated IDs:
tensor([[151644,   8948,    198,   5501,   2874,   3019,    553,   3019,     11,
            323,   2182,    697,   1590,   4226,   2878,   1124,  79075,  46391,
         151645,    198, 151644,    872,    198,   9885,    279,    897,    315,
            400,     87,      3,    429,  67901,    279,  23606,    400,     19,
             87,     10,     20,    284,    220,     21,     87,     10,     22,
          12947, 151645,    198, 151644,  77091,    198,   1249,  11625,    279,
          23606,  17767,     19,     87,    488,    220,     20,    284,    220,
             21,     87,    488,    220,     22,     59,    701,    582,    686,
           1795,    264,   3019,  14319,  29208,   5486,    311,  42123,    279,
           3890,  1

**Another prompt:**

In [8]:

# Define the new prompt
prompt = "What is the value of $ y $ in the equation $ 3y - 4 = 11 $?"

# Define messages for CoT
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]

# Prepare the input for the model
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Print the formatted input text for debugging
print("Formatted Input Text:")
print(text)

model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Generate the response
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Print the generated IDs to check if any output was produced
print("Generated IDs:")
print(generated_ids)

# Strip the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Check if there are any generated IDs before decoding
if generated_ids:
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print("Model Response:")
    print(response)
else:
    print("No response generated.")


Formatted Input Text:
<|im_start|>system
Please reason step by step, and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
What is the value of $ y $ in the equation $ 3y - 4 = 11 $?<|im_end|>
<|im_start|>assistant

Generated IDs:
tensor([[151644,   8948,    198,   5501,   2874,   3019,    553,   3019,     11,
            323,   2182,    697,   1590,   4226,   2878,   1124,  79075,  46391,
         151645,    198, 151644,    872,    198,   3838,    374,    279,    897,
            315,    400,    379,    400,    304,    279,  23606,    400,    220,
             18,     88,    481,    220,     19,    284,    220,     16,     16,
          50061, 151645,    198, 151644,  77091,    198,   1249,   1477,    279,
            897,    315,  17767,    379,   1124,      8,    304,    279,  23606,
          17767,    220,     18,     88,    481,    220,     19,    284,    220,
             16,     16,   1124,    701,    582,    686,   1795,    264,   3019,
          14319,  29208,



### Explanation of the Prompt Code

The prompt code defines the chat messages, formats them into the model's input, generates a response, and decodes the output into readable text. The same pattern applies to other math problems.

The steps below describe each stage in detail:

**Step a) Define Messages for Chain of Thought (CoT)**

```python
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]
```

**Explanation**:
- In this step, we define a list of messages that will be used to interact with the model.
- The first message is from the "system," instructing the model to reason through the problem step by step and to format the final answer within a LaTeX box (using `\boxed{}`), which is a common way to present mathematical answers.
- The second message is from the "user," containing the actual prompt (the mathematical equation) that you want the model to solve. This structure helps the model understand the context and the task it needs to perform.
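
The quick-start cell earlier in this notebook defines both a CoT and a TIR system prompt, with the second silently overwriting the first. One way to make the choice explicit is a small helper that builds the message list for a given mode. This is a sketch: `build_messages` and the `mode` parameter are hypothetical names, while the two system prompts are copied from the cells above.

```python
COT_SYSTEM = "Please reason step by step, and put your final answer within \\boxed{}."
TIR_SYSTEM = (
    "Please integrate natural language reasoning with programs to solve the "
    "problem above, and put your final answer within \\boxed{}."
)


def build_messages(prompt, mode="cot"):
    """Return the system + user message list for the chosen prompting mode."""
    system_prompts = {"cot": COT_SYSTEM, "tir": TIR_SYSTEM}
    if mode not in system_prompts:
        raise ValueError(f"Unknown mode {mode!r}; expected 'cot' or 'tir'.")
    return [
        {"role": "system", "content": system_prompts[mode]},
        {"role": "user", "content": prompt},
    ]


messages = build_messages("What is the value of $ y $ in the equation $ 3y - 4 = 11 $?")
print(messages[0]["content"])
```

The returned list has the same shape as the `messages` variable in step a, so it can be passed straight to `tokenizer.apply_chat_template`.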

**Step b) Prepare the Input for the Model**

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
```

**Explanation**:
- Here, the tokenizer's `apply_chat_template` method formats the messages into the chat layout the model expects, wrapping each message in `<|im_start|>` and `<|im_end|>` markers as shown in the "Formatted Input Text" output above.
- The `tokenize=False` argument returns the result as a plain string instead of token IDs, while `add_generation_prompt=True` appends the opening of the assistant turn (`<|im_start|>assistant`) so that the model continues with its answer.
- The resulting `text` variable contains the formatted input that combines the system and user messages; it is tokenized in a separate call before generation.
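
To make the template less of a black box, here is a plain-Python approximation that reproduces the "Formatted Input Text" printed earlier in this notebook. It is only a sketch: the real template ships with the tokenizer and may handle cases this version ignores, and `render_chatml` is a hypothetical name.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate the chat layout seen in the notebook's formatted input."""
    text = ""
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open the assistant turn so the model writes the answer next.
        text += "<|im_start|>assistant\n"
    return text


messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."},
]
print(render_chatml(messages))
```

With `add_generation_prompt=False`, the text ends after the user turn, which leaves the model without a cue to begin the assistant's reply.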

**Step c) Generate the Response**

```python
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
```

**Explanation**:
- In this step, you call the model's `generate` method to produce a response from the prepared input.
- The `model_inputs` variable, created earlier with `tokenizer([text], return_tensors="pt").to(device)`, contains the tokenized input tensors.
- The `max_new_tokens=512` argument caps the number of new tokens the model can generate, so a long solution may be cut off at that limit.
- The result, `generated_ids`, contains the prompt token IDs followed by the newly generated token IDs, which is why the printed tensor begins with the tokens of the system and user messages.
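
The toy loop below illustrates the contract of this step without loading a model: new tokens are appended to the prompt until either an end-of-sequence token appears or `max_new_tokens` is reached. It is not how `model.generate` is implemented; `toy_generate` and `count_up` are hypothetical stand-ins.

```python
def toy_generate(input_ids, next_token_fn, max_new_tokens, eos_token_id):
    """Append tokens to the prompt until EOS or the new-token budget is hit."""
    output_ids = list(input_ids)  # the output starts with the prompt tokens
    for _ in range(max_new_tokens):
        token = next_token_fn(output_ids)
        output_ids.append(token)
        if token == eos_token_id:
            break
    return output_ids


def count_up(ids):
    """Stand-in for a model: always predicts the last token plus one."""
    return ids[-1] + 1


# Budget reached before EOS: exactly 4 new tokens are added.
print(toy_generate([1, 2, 3], count_up, max_new_tokens=4, eos_token_id=99))
# EOS (token 5) reached before the budget runs out.
print(toy_generate([1, 2, 3], count_up, max_new_tokens=10, eos_token_id=5))
```

In both cases the returned list still begins with the prompt `[1, 2, 3]`, which mirrors why the prompt tokens have to be sliced off before decoding.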

**Step d) Decode the Generated Response**

```python
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
```

**Explanation**:
- This step prepares the output for decoding by removing the prompt tokens; the conversion to text happens in step e.
- The list comprehension iterates over pairs of `input_ids` (the prompt tokens) and `output_ids` (the prompt tokens followed by the generated tokens). It slices `output_ids` from `len(input_ids)` onward, which isolates the newly generated tokens.
- The resulting `generated_ids` list contains one entry per sequence in the batch, each holding only the tokens of the model's response.
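
The slicing can be tried on plain Python lists. In this sketch the short ID lists are illustrative (a few values borrowed from the tensor printed earlier), and `strip_prompt_tokens` is a hypothetical wrapper around the same list comprehension.

```python
def strip_prompt_tokens(batch_input_ids, batch_output_ids):
    """Drop the leading prompt tokens from each output sequence in a batch."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(batch_input_ids, batch_output_ids)
    ]


# A batch of one sequence: a 3-token prompt followed by 3 generated tokens.
prompt_ids = [[151644, 77091, 198]]
full_ids = [[151644, 77091, 198, 1249, 11625, 279]]

print(strip_prompt_tokens(prompt_ids, full_ids))  # [[1249, 11625, 279]]
```

The same slice works on the tensors in the notebook because indexing a tensor with `[n:]` behaves like list slicing along the first dimension.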

**Step e) Final Decoding and Output**

```python
if generated_ids:
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print("Model Response:")
    print(response)
else:
    print("No response generated.")
```

**Explanation**:
- This step checks that `generated_ids` is non-empty and then decodes it with the tokenizer's `batch_decode` method, which converts the token IDs back into text. Here, `skip_special_tokens=True` drops markers such as `<|im_end|>`, and `[0]` selects the first (and here only) sequence in the batch.
- The decoded response is then printed. If nothing was generated, a message saying so is printed instead.
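
Because the system prompt asks for the final answer inside `\boxed{}`, the answer can be pulled out of the decoded text programmatically. The helper below is a sketch using simple brace matching. `extract_boxed` is a hypothetical name, and the sample string is made up for illustration rather than taken from the model's output.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    content_start = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    for i in range(content_start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
    return None  # braces never balanced, e.g. a truncated response


sample = "Subtracting and dividing gives the result \\boxed{\\frac{1}{2}}."
print(extract_boxed(sample))  # \frac{1}{2}
```

Counting braces instead of using a simple regular expression keeps nested LaTeX such as `\frac{1}{2}` intact.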



## Summary

In this notebook, I showed how to use the Qwen 2.5 Math model with Hugging Face Transformers to solve simple linear equations. First, I defined chat messages whose system prompt asks the model to reason step by step. Next, I formatted those messages with the tokenizer's chat template. Finally, I generated a response, stripped the prompt tokens from the output, and decoded the remaining tokens into readable text.
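
As a sanity check on the model's answers, the two equations used as prompts can be solved exactly with the standard library. This is a sketch that is independent of the model; `solve_linear` is a hypothetical helper for equations of the form `a*x + b = c*x + d`.

```python
from fractions import Fraction


def solve_linear(a, b, c, d):
    """Solve a*x + b = c*x + d exactly; raise if there is no unique solution."""
    if a == c:
        raise ValueError("The equation has no unique solution.")
    # Rearranging gives (a - c) * x = d - b.
    return Fraction(d - b, a - c)


print(solve_linear(4, 5, 6, 7))    # 4x + 5 = 6x + 7  ->  -1
print(solve_linear(3, -4, 0, 11))  # 3y - 4 = 11      ->  5
```

Comparing these values with the contents of `\boxed{}` in each response gives a quick correctness check for prompts of this kind.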