nanochatAquaRat: an RL version of nanochat tuned on algebra multiple-choice reasoning has been pushed to Prime Intellect #196
HarleyCoops started this conversation in General
nanochatAquaRat
Training Language Models with Reinforcement Learning on Mathematical Reasoning
A modified version of nanochat trained with reinforcement learning on the DeepMind AQuA-RAT dataset for algebraic reasoning and multiple-choice problem solving.
I am working out how to use the environments framework at Prime Intellect (PI) and have published the AquaRat training environment there. Links to the environment and the source repository:
https://app.primeintellect.ai/dashboard/environments/harleycooper/nanochataquarat
https://github.com/HarleyCoops/nanochatAquaRat.git
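For readers new to this kind of setup, here is a minimal sketch of how a reward for RL on multiple-choice problems can be computed. It is an illustration only, not the code in the repository or the Prime Intellect environment. The function names `extract_choice` and `reward`, the "answer is (X)" output format, and the binary 1.0/0.0 scoring are all assumptions; the only detail taken from the dataset is that AQuA-RAT questions have five options labelled A to E.

```python
import re
from typing import Optional

# Assumed output format: the model states its pick as "Answer: C",
# "the answer is (C)", or similar. The lookahead rejects words that
# merely start with A-E (e.g. "the answer is Either ...").
_CHOICE_RE = re.compile(r"[Aa]nswer\s*(?:is\s*)?:?\s*\(?([A-E])\)?(?![A-Za-z])")


def extract_choice(completion: str) -> Optional[str]:
    """Return the last answer letter (A-E) stated in the completion, or None."""
    matches = _CHOICE_RE.findall(completion)
    return matches[-1] if matches else None


def reward(completion: str, correct: str) -> float:
    """Hypothetical binary reward: 1.0 for the correct letter, otherwise 0.0."""
    return 1.0 if extract_choice(completion) == correct else 0.0


if __name__ == "__main__":
    print(reward("x = 4, so the answer is (C).", "C"))  # 1.0
    print(reward("Answer: B", "D"))                     # 0.0
    print(reward("I am not sure.", "A"))                # 0.0
```

Taking the last match rather than the first lets the model reason through several candidate options before committing to one. A completion with no parsable answer scores 0.0, the same as a wrong answer.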