# SFT Ablation Experiment
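- Load the model and the evaluation dataset
- Run the model on the evaluation dataset to obtain baseline metrics
- Fine-tune the model on a thinking dataset to obtain model-R1
- Run model-R1 on the evaluation dataset to obtain quantitative results
- Compare the model's performance before and after fine-tuning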

# 1. Load datasets

In [19]:
from datasets import load_dataset, Dataset

# eval_ds_name = "openai/gsm8k"  # original OpenAI GSM8K dataset
# eval_ds_name = "swulling/gsm8k_chinese"  # GSM8K in Chinese
eval_ds_name = "HuggingFaceH4/MATH-500"  # MATH-500

eval_ds = load_dataset(eval_ds_name)
eval_ds

DatasetDict({
    test: Dataset({
        features: ['problem', 'solution', 'answer', 'subject', 'level', 'unique_id'],
        num_rows: 500
    })
})

In [13]:
eval_ds['test'][0]

{'problem': 'Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$',
 'solution': 'We have that $r = \\sqrt{0^2 + 3^2} = 3.$  Also, if we draw the line connecting the origin and $(0,3),$ this line makes an angle of $\\frac{\\pi}{2}$ with the positive $x$-axis.\n\n[asy]\nunitsize(0.8 cm);\n\ndraw((-0.5,0)--(3.5,0));\ndraw((0,-0.5)--(0,3.5));\ndraw(arc((0,0),3,0,90),red,Arrow(6));\n\ndot((0,3), red);\nlabel("$(0,3)$", (0,3), W);\ndot((3,0), red);\n[/asy]\n\nTherefore, the polar coordinates are $\\boxed{\\left( 3, \\frac{\\pi}{2} \\right)}.$',
 'answer': '\\left( 3, \\frac{\\pi}{2} \\right)',
 'subject': 'Precalculus',
 'level': 2,
 'unique_id': 'test/precalculus/807.json'}

# 2. Load Model

In [15]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() else "cpu"
)

# Load the model and tokenizer
model_name = "Qwen/Qwen2.5-0.5B-Instruct"

eval_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_name
).to(device)

eval_model.eval()

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)
tokenizer.padding_side = 'left'

In [20]:
prefix_prompt = "Please reason step by step, and put your final answer within \\boxed{}.\n"


def process_data(sample):
    messages = [
        {"role": "user", "content": prefix_prompt + sample["problem"]},
    ]
    content = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return {"content": content}

eval_ds = eval_ds.map(
    process_data,
    num_proc=8,
    remove_columns=['problem', 'solution', 'answer', 'subject', 'level', 'unique_id'])
eval_ds['test']

Map (num_proc=8):   0%|          | 0/500 [00:00<?, ? examples/s]

Dataset({
    features: ['content'],
    num_rows: 500
})

In [21]:
# eval_ds['train'][0]
print(eval_ds['test'][0]['content'])

<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
Please reason step by step, and put your final answer within \boxed{}.
Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\theta),$ where $r > 0$ and $0 \le \theta < 2 \pi.$<|im_end|>
<|im_start|>assistant
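
The printed prompt above follows the ChatML layout: each turn is wrapped in `<|im_start|>role ... <|im_end|>`, and `add_generation_prompt=True` leaves an open assistant turn at the end. As a rough illustration only, the same string can be assembled by hand. `build_chatml_prompt` is a hypothetical helper, not the tokenizer's actual template, and it assumes the default system message shown in the output.

```python
# Hypothetical helper that mimics the ChatML layout printed above.
# It is NOT the tokenizer's real chat template, only an illustration.
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def build_chatml_prompt(messages, add_generation_prompt=True, system=DEFAULT_SYSTEM):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    # Prepend the default system turn only if the caller did not supply one.
    if not messages or messages[0]["role"] != "system":
        messages = [{"role": "system", "content": system}] + list(messages)
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open assistant turn: the model continues generating from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prefix_prompt = "Please reason step by step, and put your final answer within \\boxed{}.\n"
prompt = build_chatml_prompt(
    [{"role": "user", "content": prefix_prompt + "What is 1 + 1?"}]
)
print(prompt)
```

In the notebook itself, `tokenizer.apply_chat_template` remains the right call, since it stays in sync with whatever template the model ships with.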



# 3. Inference with `model.generate`

In [12]:
from tqdm.auto import tqdm
from torch.utils.data import DataLoader
from typing import List

In [14]:
def batch_inference(
    dataset: Dataset,
    model: AutoModelForCausalLM,
    tokenizer: AutoTokenizer,
    col: str,
    max_length: int = 1024,
    batch_size: int = 8,
    max_new_tokens: int = 512,
) -> List[str]:
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False, num_workers=8)

    results = []

    for batch in tqdm(data_loader, total=len(data_loader), desc='Batch inference'):
        texts = batch[col]

        inputs = tokenizer(
            texts,
            padding=True,
            truncation=True,
            max_length=max_length,
            return_tensors="pt"
        ).to(model.device)

        # Generate text
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=max_new_tokens
            )

        # Strip the prompt tokens and decode only the newly generated part
        generated_ids = outputs[:, inputs.input_ids.shape[1]:].detach().cpu()
        batch_outputs = tokenizer.batch_decode(
            generated_ids,
            skip_special_tokens=True,
            clean_up_tokenization_spaces=True
        )
        results.extend(batch_outputs)
    return results
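
The slice `outputs[:, inputs.input_ids.shape[1]:]` relies on `tokenizer.padding_side = 'left'`: with left padding every prompt ends at the same column, so one slice removes the prompt from every row. The toy sketch below shows the idea with plain Python lists. `PAD`, `left_pad`, and `fake_generate` are hypothetical stand-ins, and no real model or tokenizer is involved.

```python
PAD = 0  # hypothetical pad token id


def left_pad(seqs, pad_id=PAD):
    """Pad every sequence on the left so all rows share the same length."""
    width = max(len(s) for s in seqs)
    return [[pad_id] * (width - len(s)) + s for s in seqs]


def fake_generate(padded, new_tokens):
    """Stand-in for model.generate: appends new tokens after each padded prompt."""
    return [row + new for row, new in zip(padded, new_tokens)]


prompts = [[11, 12, 13], [21]]          # two prompts of different lengths
padded = left_pad(prompts)              # prompts now end at the same column
outputs = fake_generate(padded, [[101, 102], [201, 202]])

prompt_len = len(padded[0])             # plays the role of input_ids.shape[1]
generated = [row[prompt_len:] for row in outputs]
print(padded)
print(generated)
```

With right padding, the pad tokens would sit between the shorter prompt and its continuation, so the same slice would no longer separate prompt from output cleanly.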

In [None]:
responses = batch_inference(
    dataset=eval_ds['test'],
    model=eval_model,
    tokenizer=tokenizer,
    col='content',
    batch_size=32,
    max_new_tokens=512)
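
The metrics section later reads a saved file that contains a `qwen_answer` column, but the step that writes it is not shown. A minimal sketch of one way to persist responses as JSON Lines follows, using only the standard library. The helper names, the toy records, and the temporary path are all assumptions for illustration.

```python
import json
import os
import tempfile


def save_jsonl(records, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_jsonl(path):
    """Read a JSON Lines file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Toy stand-ins for the dataset rows and the model responses.
samples = [{"problem": "1 + 1 = ?", "answer": "2"}]
toy_responses = ["1 + 1 = 2, so the answer is \\boxed{2}."]

# Attach each response to its source row under the `qwen_answer` key.
records = [{**s, "qwen_answer": r} for s, r in zip(samples, toy_responses)]

path = os.path.join(tempfile.mkdtemp(), "toy_eval.jsonl")
save_jsonl(records, path)
loaded = load_jsonl(path)
print(loaded[0]["qwen_answer"])
```

Because `batch_inference` uses `shuffle=False`, responses come back in dataset order, which is what makes the simple `zip` pairing valid.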


# 4. Metrics

In [22]:
import pandas as pd

In [31]:
data_path = '../../data/math500_eval.jsonl'
df = pd.read_json(data_path)  # pass lines=True if the file is in JSON Lines format
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   problem      500 non-null    object
 1   solution     500 non-null    object
 2   answer       500 non-null    object
 3   subject      500 non-null    object
 4   level        500 non-null    int64 
 5   unique_id    500 non-null    object
 6   content      500 non-null    object
 7   qwen_answer  500 non-null    object
 8   only_answer  500 non-null    object
dtypes: int64(1), object(8)
memory usage: 35.3+ KB


| | problem | solution | answer | subject | level | unique_id | content | qwen_answer | only_answer |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Convert the point $(0,3)$ in rectangular coord... | We have that $r = \sqrt{0^2 + 3^2} = 3.$ Also... | \left( 3, \frac{\pi}{2} \right) | Precalculus | 2 | test/precalculus/807.json | <\|im_start\|>system\nYou are Qwen, created by A... | To convert the point \((0,3)\) from rectangula... | [] |
| 1 | Define\n\[p = \sum_{k = 1}^\infty \frac{1}{k^2... | We count the number of times $\frac{1}{n^3}$ a... | p - q | Intermediate Algebra | 5 | test/intermediate_algebra/1994.json | <\|im_start\|>system\nYou are Qwen, created by A... | To find the expression for \(\sum_{j=1}^\infty... | [\sum_{m=1] |
| 2 | If $f(x) = \frac{3x-2}{x-2}$, what is the valu... | $f(-2)+f(-1)+f(0)=\frac{3(-2)-2}{-2-2}+\frac{3... | \frac{14}{3} | Algebra | 3 | test/algebra/2584.json | <\|im_start\|>system\nYou are Qwen, created by A... | To find the value of \( f(-2) + f(-1) + f(0) \... | [\frac{14] |
| 3 | How many positive whole-number divisors does 1... | First prime factorize $196=2^2\cdot7^2$. The ... | 9 | Number Theory | 3 | test/number_theory/572.json | <\|im_start\|>system\nYou are Qwen, created by A... | To determine the number of positive whole-numb... | [12] |
| 4 | The results of a cross-country team's training... | Evelyn covered more distance in less time than... | \text{Evelyn} | Algebra | 2 | test/algebra/1349.json | <\|im_start\|>system\nYou are Qwen, created by A... | To determine which student has the greatest av... | [Evelyn] |
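
Some `only_answer` values above, such as `[\frac{14]` and `[\sum_{m=1]`, stop at the first closing brace. That is what a non-greedy pattern like `\\boxed\{(.*?)\}` would produce, although the extraction code is not shown, so this is a guess. The sketch below is one possible brace-aware alternative, paired with a simple exact-match accuracy. `extract_boxed`, `normalize`, and `exact_match_accuracy` are hypothetical helpers, and the responses are toy strings rather than real model output.

```python
def extract_boxed(text):
    """Return the contents of every \\boxed{...} in text, respecting nested braces."""
    marker = "\\boxed{"
    results = []
    start = text.find(marker)
    while start != -1:
        i = start + len(marker)
        depth = 1  # we are inside the opening brace of \boxed{
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth == 0:  # skip unbalanced (e.g. truncated) expressions
            results.append(text[start + len(marker): i - 1])
        start = text.find(marker, i)
    return results


def normalize(answer):
    """Very light normalization: drop all whitespace."""
    return "".join(answer.split())


def exact_match_accuracy(golds, responses):
    """Fraction of responses whose last boxed answer equals the gold answer."""
    hits = 0
    for gold, response in zip(golds, responses):
        boxed = extract_boxed(response)
        if boxed and normalize(boxed[-1]) == normalize(gold):
            hits += 1
    return hits / len(golds) if golds else 0.0


golds = ["\\frac{14}{3}", "9"]
toy_responses = [
    "Adding the three values gives \\boxed{\\frac{14}{3}}.",
    "So the number of divisors is \\boxed{12}.",
]
print(extract_boxed(toy_responses[0]))
print(exact_match_accuracy(golds, toy_responses))
```

String equality is a strict criterion: it would mark `Evelyn` as wrong against the gold answer `\text{Evelyn}`, so a real evaluation would likely need stronger normalization or a symbolic equivalence check.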


In [38]:
for idx, row in df.iterrows():
    print(row.answer)
    print("-"*100)
    print(row.qwen_answer)
    print("#"*100)

\left( 3, \frac{\pi}{2} \right)
----------------------------------------------------------------------------------------------------
To convert the point \((0,3)\) from rectangular coordinates to polar coordinates, we need to determine two things: the radial distance \( r \) (the radius from the origin) and the angle \(\theta\) (the angle in the positive direction of the x-axis).

### Step 1: Calculate the Radial Distance \( r \)
The radial distance \( r \) is the Euclidean distance from the origin to the point \((x, y)\). For the point \((0, 3)\), this distance is given by:
\[
r = \sqrt{x^2 + y^2}
\]
Substituting \( x = 0 \) and \( y = 3 \):
\[
r = \sqrt{0^2 + 3^2} = \sqrt{9} = 3
\]

### Step 2: Calculate the Angle \( \theta \)
The angle \(\theta\) is the angle that the line segment from the origin to the point makes with the positive x-axis. The tangent of \(\theta\) is given by the ratio of \( y \) to \( x \):
\[
\tan(\theta) = \frac{y}{x} = \frac{3}{0}
\]
Since \(\tan(\theta)\) app