<a href="https://colab.research.google.com/github/joo9906/AI_study/blob/main/coding_challenge(internship).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Coding Challenge: LLM Performance Optimization

### Overview
Your task is to enhance the performance of a small language model (e.g. Qwen2.5-0.5B) on mathematical reasoning tasks. You'll have the freedom to explore various optimization techniques while maintaining reproducibility and providing clear documentation of your methodology.

### Challenge Requirements
* Improve the model's performance on the GSM8K benchmark using Qwen2.5-0.5B as your foundation model
* Document your experimental process and methodology thoroughly
* Ensure your solution is fully reproducible
* Submit all necessary code, model checkpoints, and documentation

### Available Optimization Approaches
You have flexibility in your approach and can explore techniques such as:
* Fine-tuning strategy optimization
* Custom architecture modifications
* Dataset curation and synthesis
* Hyperparameter optimization
* Template and tokenizer configuration adjustments

### Technical Guidelines
* While our baseline implementation uses liger-kernel, you're welcome to explore alternative optimization methods (e.g., PEFT, spectrum)
* You can implement custom components such as:
  * Custom dataset classes
  * Specialized data collators
  * Modified training loops
* You may leverage larger models (>7B) for data synthesis or knowledge distillation

### Evaluation Criteria
* Primary metric: GSM8K benchmark performance
  * Baseline score (Qwen2.5-0.5B-Instruct): 41.6
  * Evaluation using EleutherAI's lm-evaluation-harness

Note: Even if significant score improvements aren't achieved, strong technical analysis and well-reasoned experimentation will be valued highly.
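
The reference table at the end of this notebook reports `exact_match` under two filters, `strict-match` and `flexible-extract`. The sketch below approximates the idea behind them: the strict variant only accepts a number after GSM8K's `#### ` answer marker, while the flexible variant takes the last number-like token anywhere in the completion. The regular expressions and helper names here are simplified assumptions for illustration, not the patterns that lm-evaluation-harness itself ships.

```python
import re

# Simplified patterns (assumptions for illustration; lm-evaluation-harness
# defines its own filters in the gsm8k task configuration).
STRICT_PATTERN = re.compile(r"#### (-?[0-9.,]+)")
FLEXIBLE_PATTERN = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def normalize(number_text):
    """Drop thousands separators and a trailing period so '1,200.' == '1200'."""
    return number_text.replace(",", "").rstrip(".")


def strict_extract(completion):
    """Return the number after the first '#### ' marker, or None if absent."""
    match = STRICT_PATTERN.search(completion)
    return normalize(match.group(1)) if match else None


def flexible_extract(completion):
    """Return the last number-like token anywhere in the completion."""
    matches = FLEXIBLE_PATTERN.findall(completion)
    return normalize(matches[-1]) if matches else None


def exact_match(prediction, reference):
    """Compare an extracted prediction with a reference answer string."""
    return prediction is not None and prediction == normalize(reference)


completion = "She sells 16 - 3 - 4 = 9 eggs, earning 9 * 2 = 18 dollars.\n#### 18"
print(strict_extract(completion), flexible_extract(completion))
```

A model that reasons correctly but omits the `#### ` marker scores under the flexible filter only, which is one reason the two numbers can diverge after fine-tuning on data with a different answer format.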

### Deliverables
Required:
* Complete notebook (ipynb or Google Colab format)
* Final model weights and tokenizer (shared via HuggingFace Hub)

Optional:
* Supplementary analysis report (PDF)
* Additional experimental results and ablation studies

In [None]:
!python -m pip install -q --upgrade pip
!pip install -q -U datasets
!pip install -q -U transformers
!pip install -q -U trl
!pip install -q -U bitsandbytes
!pip install -q -U accelerate
!pip install -q -U flash-attn
!pip install -q -U liger-kernel
!pip install -q -U huggingface_hub
!pip install -q -U vllm

In [None]:
from huggingface_hub import notebook_login

notebook_login()


In [None]:
# (Optional) Mount Google Drive. If you are not using Colab, comment out the code below.
from google.colab import drive
drive.mount('/gdrive', force_remount=True)
drive.mount('/content/drive')

Mounted at /gdrive
Mounted at /content/drive


In [None]:
# (Optional) If you use Google Drive, the code below caches the model there to save time and lets you store training data on Drive.
# If you are running Jupyter Notebook locally, set `cache_dir` to your local caching directory.
import locale
def getpreferredencoding(do_setlocale = True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding

import os
cache_dir = "/content/drive/MyDrive/huggingface_cache"
os.makedirs(cache_dir, exist_ok=True) # Ensure the directory exists

In [None]:
model_id = "Qwen/Qwen2.5-0.5B"

local_path = model_id
local_save_path = os.path.join(cache_dir, local_path)

In [None]:
from huggingface_hub import snapshot_download
import os

def download_model_repo(repo_id, local_dir):
    # Download the whole repository to the specified local directory
    repo_path = snapshot_download(repo_id=repo_id,
                                  cache_dir=local_dir,
                                  local_dir=local_dir,
                                  local_dir_use_symlinks=False)

    print(f"Repository is saved to: {repo_path}")

def main():
    download_model_repo(model_id, local_save_path)
    print()

if __name__ == "__main__":
    main()

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.


Fetching 10 files:   0%|          | 0/10 [00:00<?, ?it/s]

Repository is saved to: /content/drive/MyDrive/huggingface_cache/Qwen/Qwen2.5-0.5B



In [None]:
from datasets import load_dataset

ds = load_dataset("AI-MO/NuminaMath-CoT", split="train")
ds[0]

In [None]:
# randomly sample 20000 examples
sampled_ds = ds.shuffle(seed=42).select(range(20000))

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    local_save_path,
    device_map='auto',
    torch_dtype=torch.float16,
    cache_dir=cache_dir)

model.config.use_cache = False
model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_id,
                                          padding_side='left',
                                          truncation_side='left')

In [None]:
chat_template = "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Alli, created by Allganize. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n"

tokenizer.chat_template = chat_template
tokenizer.eos_token = "<|im_end|>"
tokenizer.eos_token_id = 151645

model.config.eos_token_id = tokenizer.eos_token_id
model.generation_config.eos_token_id = tokenizer.eos_token_id
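
The Jinja template above is hard to read in its escaped form. As a rough guide to what it produces for ordinary conversations, here is a plain-Python sketch of its non-tool branch. `render_chatml` is a hypothetical helper written for this illustration; it assumes messages with only `system`, `user`, and `assistant` roles and no tool calls, and it is not a replacement for `tokenizer.apply_chat_template`.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=False):
    """Approximate the template's non-tool path for plain chat messages."""
    # A leading system message replaces the default system prompt.
    if messages[0]["role"] == "system":
        system_text = messages[0]["content"]
        turns = messages[1:]
    else:
        system_text = DEFAULT_SYSTEM
        turns = messages

    parts = [f"<|im_start|>system\n{system_text}<|im_end|>\n"]
    for message in turns:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # At inference time the template opens an assistant turn for the model to fill.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "What is 2 + 3?"},
    {"role": "assistant", "content": "2 + 3 = 5"},
]
print(render_chatml(messages))
```

Every turn ends with `<|im_end|>`, which is why the cell above sets that token as the EOS token for both the tokenizer and the generation config.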

In [None]:
def format_prompt_func(sample):
    # Render one example's chat messages into a single training string.
    sample['text'] = tokenizer.apply_chat_template(sample['messages'], tokenize=False, add_generation_prompt=False)
    return sample['text']

In [None]:
# sampled_ds = sampled_ds.map(format_prompt_func, num_proc=os.cpu_count())
sampled_ds = sampled_ds.train_test_split(test_size=0.1, seed=42)

In [None]:
from transformers import TrainingArguments, Trainer
from trl import SFTTrainer, SFTConfig

training_arguments = SFTConfig(
    dataset_text_field='text',
    output_dir=os.path.join(cache_dir, "results"),
    num_train_epochs=1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    save_strategy='epoch',
    eval_steps=0.1,  # only takes effect when eval_strategy="steps" is also set
    logging_steps=10,
    learning_rate=5e-6,
    weight_decay=0.01,
    max_seq_length=2048,
    max_grad_norm=1,
    max_steps=-1,
    warmup_ratio=0.05,
    packing=True,
    lr_scheduler_type="cosine",
    use_liger=True,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=sampled_ds['train'],
    eval_dataset=sampled_ds['test'],
    formatting_func=format_prompt_func,
    args=training_arguments,
)

Generating train split: 0 examples [00:00, ? examples/s]

Generating train split: 0 examples [00:00, ? examples/s]
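
The configuration above combines `learning_rate=5e-6`, `warmup_ratio=0.05`, and `lr_scheduler_type="cosine"`. The sketch below shows the general shape of such a schedule: linear warmup to the peak rate, then cosine decay to zero. `lr_at_step` is a hypothetical helper and `total_steps = 1000` is an illustrative value, not the step count of this run; the trainer's own scheduler may differ in rounding details.

```python
import math


def lr_at_step(step, total_steps, base_lr=5e-6, warmup_ratio=0.05):
    """Linear warmup followed by cosine decay to zero (illustrative sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0 to 1 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # illustrative assumption, not taken from the run
for step in (0, 25, 50, 525, 1000):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

With a peak of 5e-6 the model moves slowly, so plotting this curve next to the training loss is a cheap first check when tuning the learning rate.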

In [None]:
trainer.train()

wandb: Currently logged in as: kuotient (allganize-research). Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


Step,Training Loss
10,3.8648
20,3.7294
30,3.2216
40,3.5323
50,3.3975
60,3.0995
70,3.1154
80,3.0739
90,2.9546
100,3.0899


TrainOutput(global_step=1189, training_loss=2.714443173099707, metrics={'train_runtime': 2815.3423, 'train_samples_per_second': 1.689, 'train_steps_per_second': 0.422, 'total_flos': 2.0916184113217536e+16, 'train_loss': 2.714443173099707, 'epoch': 1.0})
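
The numbers in `TrainOutput` can be cross-checked against the training configuration. The sketch below assumes a single device and that every optimizer step consumed a full accumulated batch, so the totals are approximate. Because `packing=True`, each "sample" is a packed sequence of up to `max_seq_length` tokens, and the token count is an upper bound.

```python
# Values copied from the SFTConfig and the TrainOutput above.
global_step = 1189
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_seq_length = 2048
train_runtime_s = 2815.3423

# Assumption: one device, so the effective batch is batch size x accumulation.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
packed_sequences = global_step * effective_batch
max_tokens = packed_sequences * max_seq_length

print(f"effective batch size: {effective_batch}")
print(f"packed sequences seen: {packed_sequences}")
print(f"token upper bound: {max_tokens:,}")
print(f"steps/s: {global_step / train_runtime_s:.3f}")
print(f"samples/s: {packed_sequences / train_runtime_s:.3f}")
```

The derived rates agree with the reported `train_steps_per_second` (0.422) and `train_samples_per_second` (1.689), which suggests the run processed roughly 4,756 packed sequences in its single epoch.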

In [None]:
new_model_name = "allganize/qwen0.5b-tech-interview-test" # please specify your own repo/model id
output_dir = os.path.join(cache_dir, new_model_name)

model.config.use_cache = True
model.save_pretrained(output_dir, safe_serialization=True)
tokenizer.save_pretrained(output_dir)

('/content/drive/MyDrive/huggingface_cache/allganize/qwen0.5b-tech-interview-test/tokenizer_config.json',
 '/content/drive/MyDrive/huggingface_cache/allganize/qwen0.5b-tech-interview-test/special_tokens_map.json',
 '/content/drive/MyDrive/huggingface_cache/allganize/qwen0.5b-tech-interview-test/vocab.json',
 '/content/drive/MyDrive/huggingface_cache/allganize/qwen0.5b-tech-interview-test/merges.txt',
 '/content/drive/MyDrive/huggingface_cache/allganize/qwen0.5b-tech-interview-test/added_tokens.json',
 '/content/drive/MyDrive/huggingface_cache/allganize/qwen0.5b-tech-interview-test/tokenizer.json')

In [None]:
model.push_to_hub(repo_id=new_model_name, token=True, max_shard_size="5GB", safe_serialization=True)
tokenizer.push_to_hub(repo_id=new_model_name, token=True)

model.safetensors:   0%|          | 0.00/988M [00:00<?, ?B/s]

README.md:   0%|          | 0.00/5.18k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/allganize/qwen0.5b-tech-interview-test/commit/cf56dd41d4433c4693fbd885e5a83e88153be1fa', commit_message='Upload tokenizer', commit_description='', oid='cf56dd41d4433c4693fbd885e5a83e88153be1fa', pr_url=None, repo_url=RepoUrl('https://huggingface.co/allganize/qwen0.5b-tech-interview-test', endpoint='https://huggingface.co', repo_type='model', repo_id='allganize/qwen0.5b-tech-interview-test'), pr_revision=None, pr_num=None)

In [None]:
!git clone https://github.com/EleutherAI/lm-evaluation-harness
!cd lm-evaluation-harness && pip install -e .

fatal: destination path 'lm-evaluation-harness' already exists and is not an empty directory.
Obtaining file:///content/lm-evaluation-harness
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: lm_eval
  Building editable for lm_eval (pyproject.toml) ... done
  Created wheel for lm_eval: filename=lm_eval-0.4.7-0.editable-py3-none-any.whl size=20337 sha256=037e05a32da16e3360f658621612eb2dc04499f4e1cdc7435fb735917ebff222
  Stored in directory: /tmp/pip-ephem-wheel-cache-ffouvyxv/wheels/84/1c/11/502a8926c958091ff989c1ae74d66aade33728f4ab83f77d87
Successfully built lm_eval
Installing collected packages: lm_eval
  Attempting uninstall: lm_eval
    Found existing installation: lm_eval 0.4.7
    Uninstalling lm_eval-0.4.7:
      Successfully

In [None]:
eval_output_path = os.path.join(cache_dir, "results", "gsm8k")
os.makedirs(eval_output_path, exist_ok=True)

# It takes about 11 minutes on a single A100 40GB GPU (about 100 minutes on a single T4 GPU)
eval_output_path = os.path.join(eval_output_path, "result-original.json")
tasks = "gsm8k"

eval_cmd = f"""
lm_eval --model vllm \
    --model_args pretrained={new_model_name},trust_remote_code=True,dtype=float16 \
    --tasks {tasks} \
    --device cuda:0 \
    --batch_size auto:4 \
    --output_path {eval_output_path}
"""

In [None]:
# run an evaluation command
!{eval_cmd}

2025-01-16 07:15:25.650449: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-16 07:15:25.678823: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-16 07:15:25.691715: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-16 07:15:25.859726: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-16:07:15:33,570 INFO     [__main__.py
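
The evaluation command is assembled by interpolating values into a shell string, which can break if a path contains spaces. One possible alternative is to build the arguments as a list and quote them with `shlex.join`. `build_eval_command` is a hypothetical helper for this illustration; the flags are the same ones used in the cell above.

```python
import shlex


def build_eval_command(model_name, tasks, output_path, dtype="float16"):
    """Build the lm_eval command line with shell-safe quoting."""
    model_args = f"pretrained={model_name},trust_remote_code=True,dtype={dtype}"
    args = [
        "lm_eval",
        "--model", "vllm",
        "--model_args", model_args,
        "--tasks", tasks,
        "--device", "cuda:0",
        "--batch_size", "auto:4",
        "--output_path", output_path,
    ]
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(args)


print(build_eval_command(
    "allganize/qwen0.5b-tech-interview-test",
    "gsm8k",
    "/content/drive/MyDrive/huggingface_cache/results/gsm8k/result-original.json",
))
```

The resulting string can be run the same way as before, with `!{eval_cmd}` in a notebook cell.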

In [None]:
# Qwen2.5-0.5B-Instruct results for reference
#vllm (pretrained=Qwen/Qwen2.5-0.5B-Instruct,trust_remote_code=True,dtype=float16), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto:4
#|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
#|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
#|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.3442|±  |0.0131|
#|     |       |strict-match    |     5|exact_match|↑  |0.3169|±  |0.0128|
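
When comparing a fine-tuned model against the reference scores, the `Stderr` column indicates how much of a difference is likely to be noise. The sketch below recomputes the standard error from the accuracy with a simple binomial approximation, using the 1,319 problems in the GSM8K test split, and derives an approximate 95% interval. The helper names are hypothetical, and the harness's own estimator may differ slightly from this formula.

```python
import math

GSM8K_TEST_SIZE = 1319  # number of problems in the GSM8K test split


def binomial_stderr(accuracy, n):
    """Standard error of a proportion under a binomial approximation."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


def confidence_interval(accuracy, stderr, z=1.96):
    """Approximate 95% interval: accuracy +/- z * stderr."""
    return accuracy - z * stderr, accuracy + z * stderr


# (filter name, reported exact_match, reported stderr) from the table above.
reference_rows = [
    ("flexible-extract", 0.3442, 0.0131),
    ("strict-match", 0.3169, 0.0128),
]

for name, accuracy, reported_stderr in reference_rows:
    estimated = binomial_stderr(accuracy, GSM8K_TEST_SIZE)
    low, high = confidence_interval(accuracy, reported_stderr)
    print(f"{name}: stderr ~ {estimated:.4f}, 95% CI ~ [{low:.4f}, {high:.4f}]")
```

Under this approximation, a change of one or two points on GSM8K sits inside the interval, so small gains are worth confirming with a repeated run or a different seed before being reported as improvements.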