This repository contains the code for reproducing the experiments in our paper "SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation".
Experimental Setup:
| Role | Model |
|---|---|
| Teacher | Qwen2.5-7B-Instruct |
| Teacher | Phi-4-mini-instruct |
| Student | Gemma-2-2B-IT |
| Student | Phi-4-mini-instruct |
Evaluation Benchmarks: GSM8K, MATH-500, MBPP, LiveCodeBench-v6
Our experiments are built on top of KDFlow. Please install the dependencies:
```bash
git clone https://github.com/sunjie279/SimCT-.git
cd SimCT-
pip install -e ./
pip install flash_attn==2.8.3 --no-build-isolation
```

Then set the following environment variables:
```bash
export MODEL_PATH="./models"        # Directory containing model weights
export DATA_PATH="./data"           # Directory for datasets
export OUTPUT_PATH="./output/ckpts" # Directory for checkpoints
```

Required model weights (download from HuggingFace):

- `$MODEL_PATH/Qwen2.5-7B-Instruct`
- `$MODEL_PATH/Phi-4-mini-instruct`
- `$MODEL_PATH/gemma-2-2b-it`
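Before launching any step it can help to sanity-check that the expected directory layout is in place. A minimal sketch (the helper `missing_models` is our convenience function, not part of the repo):

```python
import os

# Model directories the pipeline expects under $MODEL_PATH
# (names taken from the layout above).
EXPECTED_MODELS = [
    "Qwen2.5-7B-Instruct",
    "Phi-4-mini-instruct",
    "gemma-2-2b-it",
]

def missing_models(model_path: str) -> list[str]:
    """Return the expected model directories that are not present yet."""
    return [
        name for name in EXPECTED_MODELS
        if not os.path.isdir(os.path.join(model_path, name))
    ]

if __name__ == "__main__":
    missing = missing_models(os.environ.get("MODEL_PATH", "./models"))
    if missing:
        print("Missing model weights:", ", ".join(missing))
    else:
        print("All model weights found.")
```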
The full pipeline consists of 5 steps:
Data Preparation → Generate Teacher Responses → SFT Warmup → Distillation Training → Evaluation
We construct a 10K mixed math+code training dataset from multiple sources:
```bash
# Download raw datasets from HuggingFace
python scripts/data/download_datasets.py

# Prepare individual datasets
python scripts/data/prepare_gsm8k.py
python scripts/data/prepare_orca_math.py

# Build the 10K mixed dataset
python scripts/data/prepare_mixed_math_code.py
```

This produces:

- `$DATA_PATH/mixed_math_code_10k/`: training prompts
- `$DATA_PATH/mixed_math_code_10k_with_source/`: training prompts with source labels
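The core of the mixing step can be pictured as shuffling labeled examples from both pools into one fixed-size set. This is a sketch of the plausible logic only; `build_mixed_dataset` is our illustration, and the real script's sampling ratios and record format may differ:

```python
import random

def build_mixed_dataset(math_prompts, code_prompts, total=10_000, seed=0):
    """Interleave math and code prompts into one shuffled training set,
    tagging each example with its source so the *_with_source variant
    can be derived from the same pool."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    pool = (
        [{"prompt": p, "source": "math"} for p in math_prompts]
        + [{"prompt": p, "source": "code"} for p in code_prompts]
    )
    rng.shuffle(pool)
    return pool[:total]
```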
Generate 8 trajectories per question for each teacher model using SGLang:
```bash
# Qwen2.5-7B-Instruct
bash scripts/sft/run_generate_responses_10k_qwen.sh

# Phi-4-mini-instruct
bash scripts/sft/run_generate_responses_10k_phi4.sh
```

Each script starts an SGLang server (DP=8), generates responses (temperature=0.6, top_p=0.95), and saves them to `$DATA_PATH/teacher_responses_10k_<model_tag>/`.
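SGLang exposes an OpenAI-compatible chat endpoint, so the per-question sampling settings used above translate into a request body like the following. `generation_request` is our illustrative helper, not a function from the repo:

```python
def generation_request(prompt: str, model: str) -> dict:
    """Request body for an OpenAI-compatible /v1/chat/completions endpoint,
    mirroring the decoding settings in the generation scripts:
    8 trajectories per question at temperature 0.6, top_p 0.95."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "n": 8,              # 8 trajectories per question
        "temperature": 0.6,
        "top_p": 0.95,
    }
```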
Before distillation, the student needs an SFT warmup to establish basic instruction-following capability.
Select the shortest correct response per question from teacher trajectories:
```bash
# From Qwen responses
bash scripts/sft/run_build_sft_10k_qwen.sh

# From Phi-4 responses
bash scripts/sft/run_build_sft_10k_phi4.sh
```

We use LLaMA-Factory for SFT:
```bash
# Gemma-2-2B-IT with Qwen teacher data
bash scripts/sft/run_gemma2_sft_warmup_10k_qwen.sh

# Gemma-2-2B-IT with Phi-4 teacher data
bash scripts/sft/run_gemma2_sft_warmup_10k_phi4.sh

# Phi-4-mini with Qwen teacher data
bash scripts/sft/run_phi4_sft_warmup_10k_qwen.sh
```

Run distillation training for each teacher→student pair:
```bash
# Qwen2.5-7B → Gemma-2-2B
bash scripts/ctopd/qwen25_gemma2_span_mix10k_lr5e-7.sh

# Qwen2.5-7B → Phi-4-mini
bash scripts/ctopd/qwen25_phi4_span_mix10k_lr5e-7.sh

# Phi-4-mini → Gemma-2-2B
bash scripts/ctopd/phi4_gemma2_span_mix10k_lr5e-7.sh
```

Evaluate on GSM8K, MATH-500, MBPP, and LiveCodeBench-v6:
```bash
# Prepare LiveCodeBench data (one-time)
python scripts/evaluation/prepare_lcb_data.py

# Run all evaluations
bash scripts/evaluation/eval_all_monitor.sh
```

Or evaluate a single checkpoint:
```bash
python scripts/evaluation/evaluation.py \
    --model_path $MODEL_PATH/your-checkpoint \
    --dataset gsm8k \
    --base_url http://127.0.0.1:30000 \
    --temperature 0.6 \
    --top_p 0.95 \
    --n 1 \
    --max_tokens 4096
```

Supported datasets: `gsm8k`, `math500`, `mbpp`, `live-code-bench-v6`
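To sweep all four benchmarks over one checkpoint, the invocation above can be generated programmatically and run via `subprocess`. A minimal sketch; `eval_command` is our convenience wrapper, not part of the repo:

```python
def eval_command(model_path: str, dataset: str, n: int = 1) -> list[str]:
    """Build the evaluation.py argument list for one checkpoint/dataset
    pair, mirroring the flags shown above."""
    supported = {"gsm8k", "math500", "mbpp", "live-code-bench-v6"}
    if dataset not in supported:
        raise ValueError(f"unknown dataset: {dataset}")
    return [
        "python", "scripts/evaluation/evaluation.py",
        "--model_path", model_path,
        "--dataset", dataset,
        "--base_url", "http://127.0.0.1:30000",
        "--temperature", "0.6",
        "--top_p", "0.95",
        "--n", str(n),
        "--max_tokens", "4096",
    ]
```

Each command list can then be passed to `subprocess.run(cmd, check=True)` in a loop over the supported datasets.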
This codebase is built on top of KDFlow, a user-friendly and efficient framework for LLM knowledge distillation. We sincerely thank the KDFlow team for their excellent work.
```bibtex
@article{simct2026,
  title={SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation},
  author={TODO},
  year={2026},
}
```