nanochat AquaRat: RL version tuned on Algebra multiple choice reasoning has been pushed to Prime Intellect #196

HarleyCoops · 2025-10-28T20:19:27Z


nanochatAquaRat

Training Language Models with Reinforcement Learning on Mathematical Reasoning

A modified version of nanochat trained with reinforcement learning on the DeepMind AQuA-RAT dataset for algebraic reasoning and multiple-choice problem solving.

Quick Start • Dataset • Modifications • Training • Results

I am trying to figure out how to use the environments framework at Prime Intellect (PI), and I have published the AquaRat training environment there.

https://app.primeintellect.ai/dashboard/environments/harleycooper/nanochataquarat

https://github.com/HarleyCoops/nanochatAquaRat.git
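
For anyone unfamiliar with RL on multiple-choice data, here is a minimal sketch of the kind of reward such a setup might use. It is not taken from the repository or from the Prime Intellect environment: the function names, the answer-format regex, and the 0/1 scoring are all assumptions for illustration. The only thing borrowed from the dataset is that each AQuA-RAT record labels its correct option with a letter from A to E.

```python
import re
from typing import Optional

# Hypothetical pattern: looks for phrases like "answer is (C)" or "Answer: B".
# The word "answer" is matched case-insensitively; the option letter must be
# an uppercase A-E so ordinary words such as "a" are not picked up.
ANSWER_PATTERN = re.compile(r"(?i:answer)\s*(?:is|:)\s*\(?([A-E])\b")


def extract_choice(completion: str) -> Optional[str]:
    """Return the last option letter the model committed to, or None."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1] if matches else None


def multiple_choice_reward(completion: str, correct: str) -> float:
    """Binary reward: 1.0 if the extracted letter equals the gold letter."""
    choice = extract_choice(completion)
    return 1.0 if choice == correct.strip().upper() else 0.0


if __name__ == "__main__":
    sample = "Half of 12 is 6, so the answer is (C)."
    print(multiple_choice_reward(sample, "C"))
```

Taking the last match rather than the first is a design choice: it rewards the final committed answer when the model revises itself mid-rationale. A completion with no recognizable answer scores 0.0, the same as a wrong answer.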
