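I created an LLM post-training method called Regressive Plasticity Schedule (RPS). Preliminary results show that, compared with an equal-learning-rate control, RPS improved Qwen3-8b's ARC-AGI-1 public eval accuracy and its program-synthesis reliability. Neither schedule solved any ARC-AGI-2 public eval tasks.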
RPS is inspired by neuroscience. As humans, we learn basic skills as kids, when neuroplasticity is high. We then learn advanced skills as teens and adults, when neuroplasticity is lower. RPS trains a model in 2 stages. In stage 1, the model is trained on easy data with a high learning rate. In stage 2, the model is trained on hard data with 10% of the stage 1 learning rate. RPS is basically a combination of existing ideas: curriculum learning + learning rate decay.
Training setup:
RPS and EPS Training Setup
For these experiments, I used qwen3-8b as the base model and trained it with Alibaba Model Studio managed DPO fine-tuning using LoRA. The goal was to test whether a staged “plasticity schedule” can improve ARC-style reasoning behavior, especially the model’s tendency to produce usable program-synthesis solutions.
I compared two schedules:
RPS, or Regressive Plasticity Schedule, uses higher plasticity in Stage 1 and lower plasticity in Stage 2. Operationally, I implemented this by using the same DPO/LoRA setup in both stages, but reducing the Stage 2 learning rate to 10% of the Stage 1 learning rate.
EPS, or Equal Plasticity Schedule, is the control condition. It uses the same Stage 1 model and the same Stage 2 dataset, but Stage 2 keeps the same learning rate as Stage 1.
All other settings were kept as similar as possible between RPS and EPS.
Dataset
The ARC DPO dataset had 1,000 total preference pairs.
Stage 1 used 400 DPO pairs from easier ARC-style training tasks. It contained 400 unique tasks. Most examples came from ARC-AGI-1 or ARC-AGI-1/2-overlap tasks, with a small number of ARC-AGI-2 training examples used to fill the quota.
Stage 2 used 600 DPO pairs from ARC-AGI-2 training tasks only. It contained 250 unique tasks, with some duplicate task IDs allowed to fill the 600-pair quota. No task IDs overlapped between Stage 1 and Stage 2.
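The split properties described above (pair counts, unique task counts, and no task overlap between stages) can be checked mechanically. The following is a minimal sketch, assuming each pair is a dict with a `task_id` key; that field name and the toy task IDs are hypothetical and not taken from the actual dataset.

```python
def summarize_split(stage1_pairs, stage2_pairs):
    """Count pairs, unique tasks, and cross-stage task overlap."""
    stage1_tasks = {pair["task_id"] for pair in stage1_pairs}
    stage2_tasks = {pair["task_id"] for pair in stage2_pairs}
    return {
        "stage1_pairs": len(stage1_pairs),
        "stage1_unique_tasks": len(stage1_tasks),
        "stage2_pairs": len(stage2_pairs),
        "stage2_unique_tasks": len(stage2_tasks),
        "overlapping_tasks": len(stage1_tasks & stage2_tasks),
    }


# Toy data with made-up task IDs. Stage 2 repeats one task, which the
# real split also allows in order to fill its pair quota.
toy_stage1 = [{"task_id": "a1"}, {"task_id": "a2"}]
toy_stage2 = [{"task_id": "b1"}, {"task_id": "b1"}, {"task_id": "b2"}]

summary = summarize_split(toy_stage1, toy_stage2)
print(summary)
```

On the real data, the expected summary would be 400 pairs over 400 unique tasks for Stage 1, 600 pairs over 250 unique tasks for Stage 2, and zero overlapping tasks.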
Each DPO pair had:
- A prompt containing ARC training examples and test inputs.
- A chosen response containing reasoning, a Python transform program, and predicted test outputs.
- A rejected response generated by qwen3-0.6b under normal solving conditions, not by asking it to be wrong.
The chosen responses came from Trelis/arc-agi-2-reasoning-5, a dataset of ARC reasoning traces and Python program-synthesis solutions. I used those traces as the preferred responses, then generated rejected responses with qwen3-0.6b under normal solving conditions. The underlying ARC tasks were restricted to official ARC-AGI training tasks: Stage 1 used mostly ARC-AGI-1 or ARC-AGI-1/2-overlap tasks, while Stage 2 used ARC-AGI-2 training tasks only.
Both chosen and rejected responses were normalized into the same labeled structure:
...
Python program:
python
def transform(grid):
...
Predicted test outputs:
...
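To make the pair structure concrete, here is a minimal sketch of how one record could be assembled. The `prompt` / `chosen` / `rejected` field names follow a common DPO convention and are an assumption, not the managed service's confirmed schema. The helper names and the example strings are hypothetical, and the sketch leaves out any code-fence markup around the program.

```python
import json


def format_response(reasoning: str, program: str, predicted_outputs: list) -> str:
    """Render a response with the labeled sections used for both
    chosen and rejected responses."""
    return (
        f"{reasoning.strip()}\n"
        "Python program:\n"
        f"{program.strip()}\n"
        "Predicted test outputs:\n"
        f"{json.dumps(predicted_outputs)}"
    )


def make_pair(prompt: str, chosen: str, rejected: str) -> dict:
    # Field names are an assumed convention, not a documented schema.
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Made-up example content, for illustration only.
chosen = format_response(
    "Each row is mirrored left to right.",
    "def transform(grid):\n    return [row[::-1] for row in grid]",
    [[[2, 1]]],
)
rejected = format_response(
    "The grid is copied unchanged.",
    "def transform(grid):\n    return grid",
    [[[1, 2]]],
)
pair = make_pair("Train examples: ... Test input: [[1, 2]]", chosen, rejected)

# One JSON object per line is a typical on-disk layout for preference data.
print(json.dumps(pair))
```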
Shared Fine-Tuning Settings
Both RPS and EPS used the same base model, data format, and fine-tuning method:
- Base model: qwen3-8b
- Fine-tuning method: DPO
- Adapter method: LoRA
- Validation split: 1%
- Max sequence length: 32k
- Scheduler: cosine
- RPO alpha: 1
- Stage 1 dataset size: 400 pairs
- Stage 2 dataset size: 600 pairs
- No replay
- No ARC-AGI public evaluation data was used for training
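For readers unfamiliar with the cosine scheduler listed above, here is a minimal sketch of a cosine decay curve. It assumes no warmup and a decay to zero. The managed service's actual warmup, floor, and step count are not stated in this post, so `total_steps` and `min_lr` are hypothetical parameters.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 down to min_lr at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical 100-step stage with a peak learning rate of 1e-5.
for step in (0, 25, 50, 75, 100):
    print(step, cosine_lr(step, total_steps=100, peak_lr=1e-5))
```

Under this sketch, a stage's configured learning rate is the peak of the curve, so lowering the peak by 10x scales every step's rate by the same factor.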
RPS Schedule
RPS trained in two stages:
Stage 1: higher-plasticity DPO on easier ARC-style examples
Stage 2: lower-plasticity DPO on ARC-AGI-2 examples
The key plasticity change was the learning rate:
Stage 2 learning rate = 10% of Stage 1 learning rate
RPS:
Stage 1 learning rate: 1e-5
Stage 2 learning rate: 1e-6
EPS:
Stage 1 learning rate: 1e-5
Stage 2 learning rate: 1e-5
This was intended to mimic a developmental pattern: first learn broad ARC-style reasoning and program-synthesis behavior with higher plasticity, then adapt to harder ARC-AGI-2 examples with reduced plasticity.
EPS Schedule
EPS used the same two-stage dataset structure, but did not reduce plasticity in Stage 2:
Stage 1 learning rate = Stage 2 learning rate
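The two schedules differ in a single number, which the following sketch makes explicit. The `Stage` class, `build_schedule` helper, and stage names are hypothetical illustrations and not part of the actual training pipeline, which ran through a managed fine-tuning service. The pair counts and learning rates are the ones reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    num_pairs: int
    learning_rate: float


def build_schedule(stage1_lr: float, stage2_ratio: float) -> list:
    """Two-stage curriculum: easy data first, then hard data at
    stage1_lr * stage2_ratio."""
    return [
        Stage("stage1_easy", 400, stage1_lr),
        Stage("stage2_hard", 600, stage1_lr * stage2_ratio),
    ]


# RPS cuts the Stage 2 learning rate to 10% of Stage 1; EPS keeps it equal.
# In the experiment, both conditions start Stage 2 from the same Stage 1 model.
RPS = build_schedule(stage1_lr=1e-5, stage2_ratio=0.1)
EPS = build_schedule(stage1_lr=1e-5, stage2_ratio=1.0)

for label, schedule in (("RPS", RPS), ("EPS", EPS)):
    for stage in schedule:
        print(f"{label} {stage.name}: {stage.num_pairs} pairs, lr={stage.learning_rate:.0e}")
```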
Eval results:
On ARC-AGI-2 public evaluation, neither RPS nor EPS solved any full tasks. Both models scored 0/120 official task solves and 0/167 exact test-output matches.
However, RPS showed a large improvement in program-synthesis reliability.
Both the RPS model and the EPS model were told that program synthesis was allowed.
ARC-AGI-2 public eval results:
- RPS official task solves: 0/120
- EPS official task solves: 0/120
- RPS exact test-output accuracy: 0/167
- EPS exact test-output accuracy: 0/167
- RPS valid output attempts: 319/334, or 95.5%
- EPS valid output attempts: 268/334, or 80.2%
- RPS token-limit hits: 1
- EPS token-limit hits: 1
- RPS API errors: 0
- EPS API errors: 0
Program-synthesis statistics:
- RPS attempts scored from executed programs: 214/240, or 89.2%
- EPS attempts scored from executed programs: 176/240, or 73.3%
- RPS program executions without error: 234/240
- EPS program executions without error: 188/240
- RPS parsed-JSON fallback attempts: 18
- EPS parsed-JSON fallback attempts: 19
- RPS invalid attempts: 8
- EPS invalid attempts: 45
- RPS tasks with both attempts scored from executed programs: 100/120, or 83.3%
- EPS tasks with both attempts scored from executed programs: 60/120, or 50.0%
- RPS tasks with at least one executed-program attempt: 114/120, or 95.0%
- EPS tasks with at least one executed-program attempt: 116/120, or 96.7%
- RPS tasks with zero executed-program attempts: 6/120
- EPS tasks with zero executed-program attempts: 4/120
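The attempt-level and task-level counts above can be cross-checked against each other. The sketch below assumes that every attempt falls into exactly one of three outcomes (executed program, parsed-JSON fallback, invalid). That partition is my reading of the statistics, not something stated explicitly, and the dictionary keys are made-up labels.

```python
TASKS = 120
ATTEMPTS = 240  # two attempts per task

counts = {
    "RPS": {"executed": 214, "json_fallback": 18, "invalid": 8,
            "both": 100, "at_least_one": 114, "zero": 6},
    "EPS": {"executed": 176, "json_fallback": 19, "invalid": 45,
            "both": 60, "at_least_one": 116, "zero": 4},
}


def tasks_with_exactly_one(c: dict) -> int:
    """Validate the counts and return the number of tasks where exactly
    one of the two attempts was scored from an executed program."""
    # Assumed partition of attempt outcomes.
    assert c["executed"] + c["json_fallback"] + c["invalid"] == ATTEMPTS
    # Every task has either at least one executed attempt or none.
    assert c["at_least_one"] + c["zero"] == TASKS
    exactly_one = c["at_least_one"] - c["both"]
    # Executed attempts = 2 per "both" task + 1 per "exactly one" task.
    assert 2 * c["both"] + exactly_one == c["executed"]
    return exactly_one


exactly_one = {name: tasks_with_exactly_one(c) for name, c in counts.items()}
print(exactly_one)  # {'RPS': 14, 'EPS': 56}
```

The derived figure shows where the schedules differ: EPS reached an executed program on at least one attempt for slightly more tasks, but it did so on only one of the two attempts for 56 tasks, versus 14 for RPS.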
Paired attempt-level comparison:
- Both RPS and EPS produced executed-program outputs: 160/240 attempts
- RPS only produced an executed-program output: 54/240 attempts
- EPS only produced an executed-program output: 16/240 attempts
- Neither produced an executed-program output: 10/240 attempts
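One way to gauge whether the 54 vs 16 split could be chance is an exact McNemar test on the discordant attempts. This test is not part of the original analysis. It is a rough sketch that treats the 240 paired attempts as independent, which is an assumption: the two attempts on a task share a prompt, and the results come from one training run per schedule.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis, each discordant pair is equally likely
    to favor either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# b: attempts where only RPS produced an executed-program output
# c: attempts where only EPS produced an executed-program output
p_value = mcnemar_exact(54, 16)
print(f"exact McNemar p-value: {p_value:.2e}")
```

A small p-value here would only say that the gap in executed-program reliability is unlikely under the independence assumption. It says nothing about task accuracy, which was zero for both schedules on this eval.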
Interpretation:
RPS did not improve ARC-AGI-2 task accuracy in this run, but it substantially improved program-synthesis reliability. The clearest signal is that RPS produced usable executed-program outputs on both attempts for 100/120 tasks, compared with 60/120 for EPS. This suggests that reduced-plasticity Stage 2 training made the model more consistent at staying in the intended reasoning/program-synthesis mode, even though that behavioral improvement did not yet translate into correct ARC-AGI-2 solutions.
ARC-AGI-1 public eval results:
Base model: qwen3-8b
Exact test-output accuracy ("output_exact_accuracy"):
- RPS: 0.0405727923627685, or about 4.1%
- EPS: 0.02386634844868735, or about 2.4%
Program-synthesis statistics:
- RPS program executions without error: 1145/1200, or 95.4%
- EPS program executions without error: 870/1200, or 72.5%
I came up with the RPS idea myself, but I used Codex to help me with the training.