RealTapeL/GRPO_code


Reproduction: GRPO, the core reinforcement learning algorithm of DeepSeek-R1

Dependency: trl version 0.14.0

Steps

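  1. Go to the trl repository: https://github.com/huggingface/trl

  2. Select the branch "0.14-release".

  3. Copy the trl folder from that branch into this directory and rename it to trl_main.

  4. Run the download scripts to fetch the model and the dataset: sh dl_model.sh, then sh dl_dataset.sh

  5. Run train_grpo.sh to start training.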
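The steps above can be sketched as a single shell script. This is a minimal sketch, not the author's tooling. It assumes git is installed and that the script is run from the root of this repository. The scratch directory name trl_upstream and the DRY_RUN switch are hypothetical additions for illustration. The branch name is copied verbatim from step 2 and should be checked against the upstream branch list before cloning. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the setup steps; prints commands unless DRY_RUN=0 is set.
set -eu

TRL_REPO="https://github.com/huggingface/trl"
BRANCH="0.14-release"      # as written in step 2; verify against upstream
CLONE_DIR="trl_upstream"   # hypothetical scratch directory for the clone
DEST="trl_main"            # directory name required by step 3
DRY_RUN="${DRY_RUN:-1}"    # hypothetical switch: 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1-2: fetch only the release branch of trl.
run git clone --depth 1 --branch "$BRANCH" "$TRL_REPO" "$CLONE_DIR"

# Step 3: copy the inner trl package here under the name trl_main
# (assumes trl_main does not exist yet).
run cp -r "$CLONE_DIR/trl" "$DEST"

# Step 4: download the model and the dataset.
run sh dl_model.sh
run sh dl_dataset.sh

# Step 5: start GRPO training.
run sh train_grpo.sh
```

To execute the commands for real rather than print them, save the script under any name and run it with DRY_RUN=0 set in the environment.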
