In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/mnt/petrelfs/songmingyang/songmingyang/model/reasoning/policy_models/QwQ-32B-Preview"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

  from .autonotebook import tqdm as notebook_tqdm
Loading checkpoint shards: 100%|██████████| 17/17 [01:35<00:00,  5.62s/it]


In [2]:
## Building Fewshots
from mr_eval.utils.utils import *
prm_test = "/mnt/petrelfs/songmingyang/code/reasoning/MR_Hallucination/mr_annotate/build_data/selection_of_data/prm_correct_data/prm_test_p2.jsonl"
prm_test_data = process_jsonl(prm_test)

def prepare_model_input(instruction, response, tokenizer):
    template = "Human: {q}\nAssistant: {r}"
    # if gsm8k
    # assistance_response = re.sub(r'\n#### .*\n', '\n', response, flags=re.DOTALL)
    assistance_response = response
    inputs = template.format(q=instruction, r=assistance_response)
    tokenized_inputs = tokenizer(inputs, return_tensors='pt')
    return tokenized_inputs
def answer_sequence_to_str(answer_sequence):
    res = []
    for idx,step in enumerate(answer_sequence):
        res.append(f"Step {idx+1}. {step['text']}\n\n")
    res_str = "".join(res)
    return res_str

def answer_sequence_to_default_str(answer_sequence,step_tag = 'и'):
    res = []
    for idx,step in enumerate(answer_sequence):
        res.append(f"Step {idx+1}: {step['text']} {step_tag}\n")
    res_str = "".join(res)
    return res_str
    
def answer_sequence_to_shepherd_str(answer_sequence,step_tag = 'ки'):
    res = []
    for idx,step in enumerate(answer_sequence):
        res.append(f"Step {idx+1}: {step['text']} {step_tag}\n")
    res_str = "".join(res)
    return res_str

def answer_sequence_to_reasoneval_list(answer_sequence):
    res = []
    for idx,step in enumerate(answer_sequence):
        res.append(f"{idx+1}. {step['text']}")
    return res
    

def get_best_answer_by_item(item,return_type="shepherd"):
    steps = prm_item["label"]["steps"]
    best_answers = []
    for step in steps:
        if step["human_completion"] is not None and step["chosen_completion"] is None:
            best_answers.append(step["human_completion"])
        elif step["chosen_completion"] is not None:
            best_answers.append(step["completions"][step["chosen_completion"]])
        else:
            print(f"skipped one step")
    if return_type == "shepherd":
        answer_str = answer_sequence_to_shepherd_str(best_answers)
    elif return_type == "str":
        answer_str = answer_sequence_to_str(best_answers)
    elif return_type == "reasoneval":
        answer_str = answer_sequence_to_reasoneval_list(best_answers)
    elif return_type == "default":
        answer_str = answer_sequence_to_default_str(best_answers)
    else:
        answer_str =  best_answers
    return answer_str

def get_latex_str(question,answer):
    res = f"Question:\n\n{question}\n\nAnswer:\n\n{answer}"
    return res



In [5]:
idx = 12
prm_item = prm_test_data[idx]

question = prm_item['question']["problem"]
ground_truth = prm_item['question']["ground_truth_answer"]

steps = prm_item["label"]["steps"]
best_answer = get_best_answer_by_item(prm_item,return_type="reasoneval")
best_steps = get_best_answer_by_item(prm_item,return_type="step")
best_latex = get_best_answer_by_item(prm_item,return_type="str")
latex_str = get_latex_str(question,best_latex)
best_default = get_best_answer_by_item(prm_item,return_type="default")
print(best_latex)

Step 1. To solve an equation involving absolute value, I need to consider two cases: one where the expression inside the absolute value is positive, and one where it is negative.

Step 2. For the first case, I assume that both $x+5$ and $3x-6$ are positive, so I can drop the absolute value signs and get $x+5=3x-6$.

Step 3. Solving for $x$ in this case, I subtract $x$ from both sides and add $6$ to both sides, and get $11=2x$.

Step 4. Dividing both sides by $2$, I get $x=\frac{11}{2}$.

Step 5. For the second case, I assume that both $x+5$ and $3x-6$ are negative, so I can change the signs of both expressions and get $-x-5=-3x+6$.

Step 6. Solving for $x$ in this case, I add $3x$ to both sides and subtract $6$ from both sides, and get $-11=2x$.

Step 7. Dividing both sides by $2$, I get $x=-\frac{11}{2}$.

Step 8. Now I have two possible values for $x$, but the problem asks for the largest one, so I compare them and see that $\frac{11}{2}$ is larger than $-\frac{11}{2}$.

Step 9. Ther

In [6]:
question

'If $|x+5|-|3x-6|=0$, find the largest possible value of $x$. Express your answer as an improper fraction.'

In [7]:
fewshot_q1 ="Compute $\arcsin \left( -\frac{1}{2} \right).$  Express your answer in radians."
fewshot_a1="""
Step 1. I know that the arcsine function is the inverse of the sine function, so I want to find an angle $\theta$ such that $\sin(\theta) = -\frac{1}{2}.$

Step 2. I also know that the range of the arcsine function is $[-\frac{\pi}{2}, \frac{\pi}{2}]$, so I only need to consider angles in the fourth and first quadrants, where sine is negative and positive respectively.

Step 3. I recall that the sine function is periodic with a period of $2\pi$, so any angle that satisfies $\sin(\theta) = -\frac{1}{2}$ must be of the form $\theta = -\frac{\pi}{6} + 2k\pi$ or $\theta = \frac{7\pi}{6} + 2k\pi$, where $k$ is an integer.

Step 4. However, since I want $\theta$ to be in the range of the arcsine function, I need to choose $k$ such that $-\frac{\pi}{2} \leq \theta \leq \frac{\pi}{2}.$

Step 5. This means that $k$ can only be 0 or -1, and the only possible values of $\theta$ are $-\frac{\pi}{6}$ or $\frac{7\pi}{6}.$

Step 6. To decide which one is the correct answer, I can use the fact that the arcsine function is an odd function, meaning that $\arcsin(-x) = -\arcsin(x)$ for any $x$ in the domain.

Step 7. Therefore, since I have $\arcsin \left( -\frac{1}{2} \right)$, I need to take the negative of the angle that gives $\sin(\theta) = \frac{1}{2}$, which is $\frac{\pi}{6}.$

Step 8. So, the final answer is $\theta = -\frac{\pi}{6}.$

# Answer

-\frac{\pi}{6}
"""

fewshot_q2="If $|x+5|-|3x-6|=0$, find the largest possible value of $x$. Express your answer as an improper fraction."

fewshot_a2="""
Step 1. To solve an equation involving absolute value, I need to consider two cases: one where the expression inside the absolute value is positive, and one where it is negative.

Step 2. For the first case, I assume that both $x+5$ and $3x-6$ are positive, so I can drop the absolute value signs and get $x+5=3x-6$.

Step 3. Solving for $x$ in this case, I subtract $x$ from both sides and add $6$ to both sides, and get $11=2x$.

Step 4. Dividing both sides by $2$, I get $x=\frac{11}{2}$.

Step 5. For the second case, I assume that both $x+5$ and $3x-6$ are negative, so I can change the signs of both expressions and get $-x-5=-3x+6$.

Step 6. Solving for $x$ in this case, I add $3x$ to both sides and subtract $6$ from both sides, and get $-11=2x$.

Step 7. Dividing both sides by $2$, I get $x=-\frac{11}{2}$.

Step 8. Now I have two possible values for $x$, but the problem asks for the largest one, so I compare them and see that $\frac{11}{2}$ is larger than $-\frac{11}{2}$.

Step 9. Therefore, the largest possible value of $x$ that satisfies the equation is $\frac{11}{2}$.

# Answer

\frac{11}{2}
"""


In [6]:
prompt = "Three pencils and a jumbo eraser cost $\\$1.24$. Five pencils and a jumbo eraser cost $\\$1.82$. No prices include tax. In cents, what is the cost of a pencil?"
messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step. And return answers as the following format: Step 1. xxx \n Step 2. xxx \n ...... Step n. xxx \n "},
    {"role": "user", "content": fewshot_q1},
    {"role": "assistant", "content": fewshot_a1},
    {"role": "user", "content": fewshot_q2},
    {"role": "assistant", "content": fewshot_a2},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Step 1. So I have this problem here: three pencils and a jumbo eraser cost $1.24, and five pencils and a jumbo eraser cost $1.82. I need to find the cost of a pencil in cents.

Step 2. First, I should probably convert the dollars to cents to make it easier since the question asks for the answer in cents. So, $1.24 is 124 cents, and $1.82 is 182 cents.

Step 3. Now, I need to set up equations based on the information given. Let's say the cost of one pencil is P cents, and the cost of one jumbo eraser is E cents.

Step 4. From the first statement, "three pencils and a jumbo eraser cost 124 cents," I can write the equation: 3P + E = 124.

Step 5. From the second statement, "five pencils and a jumbo eraser cost 182 cents," I can write: 5P + E = 182.

Step 6. So now I have a system of two equations:

Equation 1: 3P + E = 124

Equation 2: 5P + E = 182

Step 7. I need to solve for P, the cost of a pencil. To do this, I can eliminate E by subtracting Equation 1 from Equation 2.

Step 8. Subtra

In [29]:
import re

def extract_steps(text):
    """
    从文本中提取每个 Step 的内容，并按顺序返回一个列表。
    """
    # 正则表达式：匹配 "Step X." 开头，捕获其后的内容
    pattern = r"(Step \d+\..*?)(?=Step \d+\.|\Z)"  # 匹配 Step 开头到下一个 Step 或文本结束
    steps = re.findall(pattern, text, re.DOTALL)  # 使用 re.DOTALL 允许匹配换行符
    return steps

# 输入文本
text = """
Step 1. Let's define variables for the prices of the items. Let's say the cost of one pencil is p cents and the cost of one jumbo eraser is e cents.

Step 2. Now, we can translate the given information into equations. The first statement says that three pencils and one jumbo eraser cost $1.24. Since 1 dollar is 100 cents, $1.24 is 124 cents. So, the equation is:

3p + e = 124

Step 3. The second statement says that five pencils and one jumbo eraser cost $1.82, which is 182 cents. So, the equation is:

5p + e = 182

Step 4. Now, we have a system of two equations with two variables:

3p + e = 124

5p + e = 182

Step 5. To solve for p and e, we can use the elimination method. Let's subtract the first equation from the second equation to eliminate e:

(5p + e) - (3p + e) = 182 - 124

Step 6. Simplifying this, we get:

5p + e - 3p - e = 58

Which reduces to:

2p = 58

Step 7. Dividing both sides by 2:

p = 29

Step 8. Now that we have the value of p, which is 29 cents, we can substitute it back into one of the original equations to find e. Let's use the first equation:

3(29) + e = 124

Step 9. Calculating 3 times 29:

87 + e = 124

Step 10. Subtracting 87 from both sides:

e = 124 - 87

e = 37

Step 11. So, the cost of one pencil is 29 cents and the cost of one jumbo eraser is 37 cents.

Step 12. But the question is asking specifically for the cost of a pencil in cents, which is 29.

# Answer

\[ \boxed{29} \]
"""

# 提取步骤
steps = extract_steps(text)

# 打印每一步内容
len(steps)

12

In [8]:
## Test batch inference
from copy import deepcopy
prompt = "Three pencils and a jumbo eraser cost $\\$1.24$. Five pencils and a jumbo eraser cost $\\$1.82$. No prices include tax. In cents, what is the cost of a pencil?"
messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step. And return answers as the following format: Step 1. xxx \n Step 2. xxx \n ...... Step n. xxx \n "},
    {"role": "user", "content": fewshot_q1},
    {"role": "assistant", "content": fewshot_a1},
    {"role": "user", "content": fewshot_q2},
    {"role": "assistant", "content": fewshot_a2},
    {"role": "user", "content": prompt},
]
messages2 = [
    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. "},
    {"role": "user", "content": fewshot_q1},
    {"role": "assistant", "content": fewshot_a1},
    {"role": "user", "content": fewshot_q2},
    {"role": "assistant", "content": fewshot_a2},
    {"role": "user", "content": "If $|x+5|-|3x-6|=0$, find the largest possible value of $x$. Express your answer as an improper fraction."},
]


text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
text2 = tokenizer.apply_chat_template(
    messages2,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text,text2], return_tensors="pt",padding=True).to(model.device)



In [11]:
text = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)
text.shape

torch.Size([1, 1027])

In [11]:
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Step 1. So I have this problem here: three pencils and a jumbo eraser cost $1.24, and five pencils and a jumbo eraser cost $1.82. I need to find the cost of a pencil in cents.

Step 2. First, I should probably set up some equations based on the information given. Let's denote the cost of one pencil as p cents and the cost of one jumbo eraser as e cents.

Step 3. Since the prices are given in dollars, I need to convert them to cents to make calculations easier. So, $1.24 is 124 cents, and $1.82 is 182 cents.

Step 4. Now, translating the statements into equations:

- Three pencils and one eraser cost 124 cents: 3p + e = 124

- Five pencils and one eraser cost 182 cents: 5p + e = 182

Step 5. I have a system of two equations with two variables:

1) 3p + e = 124

2) 5p + e = 182

Step 6. To solve for p, I can subtract equation 1 from equation 2 to eliminate e:

(5p + e) - (3p + e) = 182 - 124

Step 7. Simplifying that:

5p + e - 3p - e = 58

Which reduces to:

2p = 58

Step 8. So, p = 58 

In [14]:
len(generated_ids)


2

In [15]:
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[1]
print(response)

Step 1. I have two equations based on the given information:

3 pencils + 1 eraser = $1.24

5 pencils + 1 eraser = $1.82

Step 2. I need to find the cost of one pencil in cents. So, I should solve for the price of a pencil.

Step 3. Let's denote the price of one pencil as p (in dollars) and the price of one eraser as e (in dollars).

So, the equations become:

3p + e = 1.24 ...(1)

5p + e = 1.82 ...(2)

Step 4. I can subtract equation (1) from equation (2) to eliminate e:

(5p + e) - (3p + e) = 1.82 - 1.24

Which simplifies to:

2p = 0.58

Step 5. Solving for p:

p = 0.58 / 2

p = 0.29 dollars

Step 6. Since the question asks for the cost in cents, I need to convert dollars to cents. There are 100 cents in a dollar.

So, p = 0.29 * 100 = 29 cents

Step 7. To verify, I can plug the value of p back into one of the original equations to find e.

Using equation (1):

3*(0.29) + e = 1.24

0.87 + e = 1.24

e = 1.24 - 0.87

e = 0.37 dollars or 37 cents

Step 8. Check with equation (2):

5*(0.