This repository contains the code release accompanying the work: Multi-Task GRPO: Reliable LLM Reasoning Across Tasks.
- `verl/`: Core library (adapted for this work from the upstream verl framework)
- `scripts/exp-1/`: Scripts to reproduce Experiment 1
- `scripts/exp-2/`: Scripts to reproduce Experiment 2
- `installation.sh`: Lightweight installation (assumes CUDA is available)
- `installation_micromamba_cuda.sh`: Full installation (local CUDA + micromamba)
We provide two installation options.
The full installation creates an isolated environment, installing micromamba, a local CUDA 12.2 toolkit, and all Python dependencies. It requires a Linux machine with an NVIDIA GPU and a working driver.

```shell
bash installation_micromamba_cuda.sh
```
If you already have a working CUDA and Python environment, you can instead install the dependencies directly with the lightweight option:

```shell
bash installation.sh
```

All reproduction scripts are located in the `scripts/` directory.
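To see the available entry points for each experiment before choosing one, you can simply list the script directories (this assumes the layout shown at the top of this README):

```shell
ls scripts/exp-1/ scripts/exp-2/
```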
Example usage:

```shell
bash scripts/exp-1/run_mt_grpo_0.2.sh
```

By default, the provided scripts may attempt to log to Weights & Biases (wandb). To run experiments without external logging, set:

```shell
export WANDB_MODE=offline
```

If you prefer to log runs to your own account during the review process, set your API key and entity:
```shell
export WANDB_API_KEY=your_key_here
export WANDB_ENTITY=your_entity_here
```

Notes:

- Base Library: This codebase builds upon the open-source VeRL framework; the files modified for this submission are included in the `verl/` directory.
- Model Downloads: Pretrained models (e.g., Qwen2.5) are configured to download automatically from Hugging Face.
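Putting the pieces together, a fully offline reproduction run might look like the sketch below. `WANDB_MODE` and the script path come from the sections above; `HF_HOME` is the standard Hugging Face cache-location variable and is optional, and the cache path shown is only a placeholder:

```shell
export WANDB_MODE=offline               # disable external wandb logging
export HF_HOME=/path/to/hf_cache        # optional: where Hugging Face weights are cached
bash scripts/exp-1/run_mt_grpo_0.2.sh   # launch an Experiment 1 run
```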