ncarmont/Tinker-RFT-Qwen-Quickstart

Tinker-RFT-Qwen-Quickstart

A minimal, self-contained implementation of Reinforcement Learning from Verifiable Rewards (RLVR) on Qwen3-4B-Instruct for grade-school math reasoning, powered by the Tinker training API.

Results

[Figure: Training results]

| Model | Accuracy (GSM8K, 250 examples) |
| --- | --- |
| Qwen3-4B-Instruct (base) | 89.2% (223/250) |
| + RLVR fine-tune (this repo) | 90.0% (225/250) |
| Delta | +0.8 percentage points (+2 examples) |

Best validation checkpoint reached 91.7% at RL iteration 8.
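
The percentages in the table follow directly from the raw counts. A minimal sketch of that arithmetic (the helper name `accuracy_pct` is hypothetical and not taken from the repo's code):

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage of `total` examples."""
    return 100.0 * correct / total


# Counts taken from the results table above.
base = accuracy_pct(223, 250)
tuned = accuracy_pct(225, 250)

# The difference between two percentages is in percentage points.
delta_pp = tuned - base

print(f"base={base:.1f}%  tuned={tuned:.1f}%  delta={delta_pp:+.1f} pp")
```

With 250 test examples, each example is worth 0.4 percentage points, so the reported delta corresponds to two additional correct answers.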

What this does

  1. Warm-start SFT (40 steps): teaches the model the Final Answer: <number> output format using 768 GSM8K training examples.
  2. GRPO-style RL (10 iterations): samples 8 completions per question, scores each with an exact-match reward, standardizes the rewards within each group of 8 (subtract the group mean, divide by the group standard deviation) to get advantages, and updates the policy with a PPO-style clipped objective.
  3. Final eval: compares the best checkpoint against the frozen base model on 250 held-out test examples.
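
The PPO clipping in step 2 can be illustrated with a small, framework-free sketch. It assumes the standard clipped surrogate applied to a single sample, with the clip range [0.9, 1.1] from the hyperparameter table; the function name and the single-sample granularity are assumptions, and the repo may apply the objective per token through the Tinker API instead.

```python
import math


def clipped_surrogate(logp_new: float, logp_old: float, advantage: float,
                      clip_low: float = 0.9, clip_high: float = 1.1) -> float:
    """Standard PPO clipped objective for one sample (to be maximized).

    logp_new / logp_old are log-probabilities under the current policy and
    under the policy that generated the sample.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, clip_low), clip_high)
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# Example: the new policy makes a rewarded completion 1.5x more likely.
# The clip caps how much credit that move can earn.
objective = clipped_surrogate(math.log(1.5), 0.0, advantage=1.0)
loss = -objective  # optimizers minimize, so the loss is the negated objective
```

For a positive advantage the objective stops growing once the ratio exceeds the upper bound, and for a negative advantage it stops improving once the ratio drops below the lower bound, which keeps each update close to the sampling policy.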

Setup

pip install tinker datasets transformers tqdm pandas torch

You will need a Tinker API key. The script prompts for it securely via getpass, and the key is never written to disk.
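
A minimal sketch of this kind of prompt, using only the standard library. The helper name `ensure_api_key` is hypothetical, and storing the key in the `TINKER_API_KEY` environment variable for the current process is an assumption about how the client picks it up; the repo's script may handle this differently.

```python
import getpass
import os


def ensure_api_key(env_var: str = "TINKER_API_KEY", prompt_fn=getpass.getpass) -> str:
    """Return the API key, prompting without echo if it is not already set.

    The key is kept only in this process's environment, never on disk.
    """
    key = os.environ.get(env_var)
    if not key:
        key = prompt_fn(f"Enter {env_var}: ")
        os.environ[env_var] = key
    return key


# Typical use at the top of a script or notebook cell:
#   ensure_api_key()
```

Checking the environment first means the prompt is skipped when the key was already exported in the shell.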

Usage

Notebook (recommended):

Open Tinker_RFT.ipynb in Google Colab and run all cells. The notebook auto-prompts for your TINKER_API_KEY.

Script:

python Tinker_RFT.py

Key hyperparameters

| Parameter | Value |
| --- | --- |
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| LoRA rank | 32 |
| SFT steps | 40 |
| RL iterations | 10 |
| Questions/iter | 12 |
| Group size (samples/question) | 8 |
| RL learning rate | 1e-5 |
| Max new tokens (RL) | 512 |
| PPO clip range | [0.9, 1.1] |
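
To make the sampling budget implied by these values concrete, the sketch below collects them in a dataclass and derives the number of completions per iteration and per run. The class and field names are hypothetical and are not the variable names used in Tinker_RFT.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RLConfig:
    # Values copied from the hyperparameter table; names are illustrative.
    base_model: str = "Qwen/Qwen3-4B-Instruct-2507"
    lora_rank: int = 32
    sft_steps: int = 40
    rl_iterations: int = 10
    questions_per_iter: int = 12
    group_size: int = 8
    rl_learning_rate: float = 1e-5
    max_new_tokens: int = 512
    clip_low: float = 0.9
    clip_high: float = 1.1

    @property
    def completions_per_iter(self) -> int:
        """Completions sampled in one RL iteration."""
        return self.questions_per_iter * self.group_size

    @property
    def total_completions(self) -> int:
        """Completions sampled over the whole RL phase."""
        return self.completions_per_iter * self.rl_iterations


cfg = RLConfig()
print(cfg.completions_per_iter, cfg.total_completions)
```

With the listed values, each iteration samples 96 completions, for 960 over the 10 RL iterations.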

Reward function

  • +1.0 for exact numeric match
  • +0.10 for a clean Final Answer: <number> on the last line
  • -0.20 for missing the required format
  • Penalties for prompt-leaking artifacts (Question:, repeated Final Answer:, etc.)
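
The reward terms above could be combined roughly as follows. This is a sketch under several assumptions that the list does not settle: the terms are additive, the answer is read only from the last line, and each artifact costs a flat penalty. `ARTIFACT_PENALTY`, the regular expression, and the function names are all hypothetical; the repo's actual values and parsing may differ.

```python
import re

# Last line must look like "Final Answer: 1,234.5" (assumed format check).
FINAL_RE = re.compile(r"^Final Answer:\s*(-?[\d,]*\.?\d+)\s*$")
ARTIFACT_PENALTY = 0.05  # hypothetical value; the source does not state it


def _to_number(text: str):
    """Parse a numeric string, ignoring thousands separators."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def reward(completion: str, gold: str) -> float:
    lines = completion.strip().splitlines()
    last = lines[-1].strip() if lines else ""
    match = FINAL_RE.match(last)

    score = 0.0
    if match is None:
        score -= 0.20  # missing the required format
    else:
        score += 0.10  # clean final-answer line
        pred, target = _to_number(match.group(1)), _to_number(gold)
        if pred is not None and target is not None and pred == target:
            score += 1.0  # exact numeric match

    # Prompt-leaking artifacts named in the list above.
    if "Question:" in completion:
        score -= ARTIFACT_PENALTY
    if completion.count("Final Answer:") > 1:
        score -= ARTIFACT_PENALTY
    return score
```

Under these assumptions a correct, well-formatted completion scores 1.1, a well-formatted but wrong one scores 0.1, and one without the final-answer line scores -0.2.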

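Advantages are standardized within each group of completions sampled for the same question: adv = (r - mean) / std, where mean and std are taken over that group's rewards.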
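
A minimal sketch of this per-group standardization using only the standard library. Two details are assumptions rather than facts from the repo: the population standard deviation is used, and a group whose rewards are all identical gets zero advantages instead of a division by zero.

```python
from statistics import fmean, pstdev


def group_advantages(rewards, eps: float = 1e-6):
    """Standardize the rewards of one question's group: (r - mean) / std."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std (assumed choice)
    if std < eps:
        # All completions scored the same, so there is no learning signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


# One group of 8 completions: 4 correct (1.1) and 4 without the format (-0.2).
advantages = group_advantages([1.1, 1.1, -0.2, 1.1, -0.2, -0.2, 1.1, -0.2])
```

Because the advantages are centered within the group, completions are only rewarded relative to other answers to the same question, so questions the model always or never solves contribute no gradient.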

Files

| File | Description |
| --- | --- |
| Tinker_RFT.ipynb | Colab notebook (recommended entry point) |
| Tinker_RFT.py | Equivalent standalone Python script |
| results.png | Training log and final eval output |
