# A Look at the Dataset for Training Qwen 0.5B with GRPO

---
This notebook adapts the [Original Notebook](https://colab.research.google.com/drive/1bfhs1FMLW3FGa8ydvkOZyBNxLYOu0Hev?usp=sharing#scrollTo=Q7qTZbUcg5VD), which trains Qwen-0.5b on the gsm8k math dataset, to run on SageMaker.
Before actually training with [GRPO](https://medium.com/data-science-in-your-pocket/what-is-grpo-the-rl-algorithm-used-to-train-deepseek-12acc19798d3), this notebook inspects the data used for training.

### Notebook environment
- This notebook was run on an ml.g5.12xlarge instance in the SageMaker Studio [Code Editor](https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/code-editor.html).

---


## 1. Environment Setup
- To get started, follow the [Setup Guide](../setup/README.md).
- Then run the cell below to confirm that the required packages are installed.

In [1]:
! pip list | grep -E "sagemaker|vllm|trl|datasets"

datasets                           3.2.0
sagemaker                          2.232.2
sagemaker-core                     1.0.21
sagemaker-mlflow                   0.1.0
trl                                0.14.0
vllm                               0.7.2


In [2]:
%load_ext autoreload
%autoreload 2

## 2. Loading the Data and Formatting It for Training


In [3]:
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
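
The system prompt asks the model to wrap its reasoning and its final answer in XML-style tags. As a rough illustration of why this format is convenient, the sketch below pulls the text between the `<answer>` tags out of a completion. The helper name `extract_xml_answer` and the sample completion are assumptions made for illustration; neither is defined in this notebook.

```python
from typing import Optional

def extract_xml_answer(text: str) -> Optional[str]:
    """Return the text between <answer> and </answer>, or None if a tag is missing."""
    if "<answer>" not in text or "</answer>" not in text:
        return None
    # Keep what follows the last opening tag, then cut at the closing tag.
    answer = text.split("<answer>")[-1]
    return answer.split("</answer>")[0].strip()

# Hypothetical completion that follows the format requested by SYSTEM_PROMPT.
sample_completion = """<reasoning>
48 clips in April, half as many (24) in May, so 48 + 24 = 72.
</reasoning>
<answer>
72
</answer>
"""

extracted = extract_xml_answer(sample_completion)
print(extracted)
```

A string extracted this way could then be compared with the `answer` column built in the next cell.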

In [4]:
from datasets import load_dataset, Dataset

# uncomment middle messages for 1-shot prompting

def extract_hash_answer(text: str) -> str | None:
    # GSM8K solutions end with "#### <final answer>"; keep only that final answer.
    if "####" not in text:
        return None
    return text.split("####")[1].strip()

def get_gsm8k_questions(split = "train") -> Dataset:
    data = load_dataset('openai/gsm8k', 'main')[split] # type: ignore
    # Add a chat-style 'prompt' column and overwrite 'answer' with the extracted final answer.
    data = data.map(lambda x: { # type: ignore
        'prompt': [
            {'role': 'system', 'content': SYSTEM_PROMPT},
            {'role': 'user', 'content': x['question']}
        ],
        'answer': extract_hash_answer(x['answer'])
    }) # type: ignore
    return data # type: ignore
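
`extract_hash_answer` relies on the GSM8K convention that each reference solution ends with a line of the form `#### <final answer>`. The sketch below repeats the function so that it runs on its own and applies it to a hand-written string in that style. The sample string is illustrative and is not a verbatim dataset row.

```python
from typing import Optional

def extract_hash_answer(text: str) -> Optional[str]:
    # Same logic as the notebook cell: keep only what follows the "####" marker.
    if "####" not in text:
        return None
    return text.split("####")[1].strip()

# Illustrative solution text in the GSM8K style (an assumption, not copied from the dataset).
raw_answer = (
    "Natalia sold 48 / 2 = 24 clips in May.\n"
    "Altogether she sold 48 + 24 = 72 clips.\n"
    "#### 72"
)

final_answer = extract_hash_answer(raw_answer)
missing = extract_hash_answer("a solution without the marker")

print(final_answer)
print(missing)
```

Rows without the marker would end up with `None` in the `answer` column, so it may be worth checking for such rows before training.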



In [5]:
dataset = get_gsm8k_questions()
print("## dataset: ", dataset)

Using the latest cached version of the dataset since openai/gsm8k couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'main' at /home/sagemaker-user/.cache/huggingface/datasets/openai___gsm8k/main/0.0.0/e53f048856ff4f594e959d75785d2c2d37b678ee (last modified on Sun Feb  9 09:52:08 2025).


## dataset:  Dataset({
    features: ['question', 'answer', 'prompt'],
    num_rows: 7473
})


## Examples from the Formatted Dataset

In [6]:
import json
pretty_json = json.dumps(dataset[0], indent=2, ensure_ascii=False)
print(pretty_json)



{
  "question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
  "answer": "72",
  "prompt": [
    {
      "content": "\nRespond in the following format:\n<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>\n",
      "role": "system"
    },
    {
      "content": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
      "role": "user"
    }
  ]
}


In [7]:
pretty_json = json.dumps(dataset[1], indent=2, ensure_ascii=False)
print(pretty_json)

{
  "question": "Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?",
  "answer": "10",
  "prompt": [
    {
      "content": "\nRespond in the following format:\n<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>\n",
      "role": "system"
    },
    {
      "content": "Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?",
      "role": "user"
    }
  ]
}


In [9]:
pretty_json = json.dumps(dataset[7472], indent=2, ensure_ascii=False)
print(pretty_json)

{
  "question": "At 30, Anika is 4/3 the age of Maddie. What would be their average age in 15 years?",
  "answer": "50",
  "prompt": [
    {
      "content": "\nRespond in the following format:\n<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>\n",
      "role": "system"
    },
    {
      "content": "At 30, Anika is 4/3 the age of Maddie. What would be their average age in 15 years?",
      "role": "user"
    }
  ]
}
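
Every record printed above has the same shape: a `question`, an extracted `answer`, and a two-message `prompt` (system, then user). A quick structural check could look like the sketch below. `check_record` is a hypothetical helper, not part of the notebook, and the sample record is copied from the `dataset[7472]` output.

```python
def check_record(record: dict) -> bool:
    """Return True if the record has a system+user prompt, a matching question, and an answer."""
    prompt = record.get("prompt", [])
    roles = [message.get("role") for message in prompt]
    return (
        roles == ["system", "user"]
        and prompt[1]["content"] == record.get("question")
        and record.get("answer") is not None
    )

question = "At 30, Anika is 4/3 the age of Maddie. What would be their average age in 15 years?"
record = {
    "question": question,
    "answer": "50",
    "prompt": [
        {
            "content": "\nRespond in the following format:\n<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>\n",
            "role": "system",
        },
        {"content": question, "role": "user"},
    ],
}

is_valid = check_record(record)
print(is_valid)
```

On the loaded dataset, the same check could be applied to every row, for example with `all(check_record(r) for r in dataset)`.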
