A minimal, self-contained implementation of Reinforcement Learning from Verifiable Rewards (RLVR) on Qwen3-4B-Instruct for grade-school math reasoning, powered by the Tinker training API.
| Model | Accuracy (GSM8K, 250 examples) |
|---|---|
| Qwen3-4B-Instruct (base) | 89.2% (223/250) |
| + RLVR fine-tune (this repo) | 90.0% (225/250) |
| Delta | +0.8% |
Best validation checkpoint reached 91.7% at RL iteration 8.
- Warm-start SFT (40 steps) — teaches the model the `Final Answer: <number>` output format using 768 GSM8K training examples
- GRPO-style RL (10 iterations) — samples 8 completions per question, computes exact-match rewards, normalizes advantages by standard deviation, and updates via PPO clipping
- Final eval — compares the best checkpoint against the frozen base model on 250 held-out test examples
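The final-eval step reduces to exact-match accuracy over the held-out set. A minimal sketch (the function name is illustrative, not taken from the repo):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predicted answers exactly equal to the gold answers.

    Both lists are assumed to hold normalized numeric strings.
    """
    assert len(predictions) == len(references)
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```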
```
pip install tinker datasets transformers tqdm pandas torch
```

You will need a Tinker API key. The script prompts for it securely via `getpass` — it is never written to disk.
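A minimal sketch of the secure key prompt, assuming the key may also be supplied via a `TINKER_API_KEY` environment variable (the helper name is hypothetical):

```python
import getpass
import os

def get_api_key() -> str:
    """Read the Tinker API key from the environment, else prompt securely.

    The key is held only in memory; nothing is written to disk.
    """
    key = os.environ.get("TINKER_API_KEY")
    if not key:
        key = getpass.getpass("Enter your TINKER_API_KEY: ")
    return key
```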
Notebook (recommended):
Open Tinker_RFT.ipynb in Google Colab and run all cells. The notebook auto-prompts for your TINKER_API_KEY.
Script:
```
python Tinker_RFT.py
```

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| LoRA rank | 32 |
| SFT steps | 40 |
| RL iterations | 10 |
| Questions/iter | 12 |
| Group size (samples/question) | 8 |
| RL learning rate | 1e-5 |
| Max new tokens (RL) | 512 |
| PPO clip range | [0.9, 1.1] |
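The clip range in the table corresponds to the standard PPO clipped surrogate objective. A per-token sketch in plain Python (illustrative, not the repo's actual training code):

```python
import math

def ppo_clip_loss(logprob, old_logprob, advantage, clip_low=0.9, clip_high=1.1):
    """PPO clipped surrogate loss for one token (illustrative sketch).

    The probability ratio is clamped to [clip_low, clip_high], matching the
    [0.9, 1.1] range above, and the pessimistic (min) objective is used.
    """
    ratio = math.exp(logprob - old_logprob)              # pi_new / pi_old
    clipped = min(max(ratio, clip_low), clip_high)       # clamp the ratio
    return -min(ratio * advantage, clipped * advantage)  # negated for descent
```

With a positive advantage the objective stops improving once the ratio exceeds 1.1, which bounds how far a single update can move the policy away from the sampling policy.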
- `+1.0` for exact numeric match
- `+0.10` for clean `Final Answer: <number>` on the last line
- `-0.20` for missing the required format
- Penalties for prompt-leaking artifacts (`Question:`, repeated `Final Answer:`, etc.)
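A sketch of how such a reward function might look (the regex, constants, and penalty list are simplified relative to the repo's actual scorer):

```python
import re

def score_completion(completion: str, gold: str) -> float:
    """Score one sampled completion against the gold numeric answer (illustrative)."""
    reward = 0.0
    lines = completion.strip().splitlines()
    last_line = lines[-1] if lines else ""
    m = re.match(r"^Final Answer:\s*(-?[\d,]+(?:\.\d+)?)\s*$", last_line)
    if m:
        reward += 0.10                      # clean format bonus
        pred = m.group(1).replace(",", "")
        if pred == gold:
            reward += 1.0                   # exact numeric match
    else:
        reward -= 0.20                      # missing required format
    if "Question:" in completion:           # prompt-leak penalty (value assumed)
        reward -= 0.20
    return reward
```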
Advantages are standardized per group: `adv = (r - mean) / std`.
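In Python, the per-group standardization is a few lines; a degenerate group where every sample earns the same reward yields zero advantages, since the standard deviation would otherwise be zero:

```python
def group_advantages(rewards):
    """Standardize one group's rewards: adv = (r - mean) / std."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    if std == 0.0:
        return [0.0] * n  # identical rewards carry no learning signal
    return [(r - mean) / std for r in rewards]
```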
| File | Description |
|---|---|
| `Tinker_RFT.ipynb` | Colab notebook (recommended entry point) |
| `Tinker_RFT.py` | Equivalent standalone Python script |
| `results.png` | Training log and final eval output |
