14 changes: 7 additions & 7 deletions docs/source/Instruction/GRPO.md
@@ -219,20 +219,20 @@ A conversation between User and Assistant. The user asks a question, and the Ass
 - vllm_limit_mm_per_prompt: vLLM passthrough parameter. Default is None.
 - vllm_enable_prefix_caching: vLLM passthrough parameter. Default is True.
 - sleep_level: release vLLM GPU memory while the model is training. Options are [0, 1]; default is 0 (no release).
+- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM, how many batches to split the layers into. Default is None, meaning the whole model is not split; otherwise it is split into move_model_batches + 1 (non-layer parameters) + 1 (multimodal component parameters) batches.
+- offload_optimizer: Whether to offload optimizer parameters during vLLM inference. Default is False.
+- offload_model: Whether to offload the model itself during vLLM inference. Default is False.
+- gc_collect_after_offload: Whether to run garbage collection (both Python GC and GPU GC) after offloading. Default is False.
+- completion_length_limit_scope: The scope of the `max_completion_length` limit in multi-turn conversations.
+  `total` limits the total output length across all turns to at most `max_completion_length`; `per_round` limits each turn's output length.
+  Defaults to `per_round`. Currently only takes effect in colocate mode.
 - num_iterations: number of update iterations per batch. Default is 1.
 - epsilon: clip coefficient. Default is 0.2.
 - epsilon_high: upper clip coefficient. Default is None; when set, it forms the clipping range [epsilon, epsilon_high] together with epsilon.
 - sync_ref_model: whether to periodically synchronize the ref_model. Default is False.
 - ref_model_mixup_alpha: controls the mix between the model and the previous ref_model during updates, following $π_{ref} = α * π_θ + (1 - α) * π_{ref_{prev}}$. Default is 0.6 (a minimal sketch of this update follows this file's diff).
 - ref_model_sync_steps: synchronization frequency. Default is 512.
-- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM, how many batches to split the layers into. Default is None, meaning the whole model is not split; otherwise it is split into move_model_batches + 1 (non-layer parameters) + 1 (multimodal component parameters) batches.
-- offload_optimizer: Whether to offload optimizer parameters during vLLM inference. Default is False.
-- offload_model: Whether to offload the model itself during vLLM inference. Default is False.
-- gc_collect_after_offload: Whether to run garbage collection (both Python GC and GPU GC) after offloading. Default is False.
 - multi_turn_func: multi-turn GRPO parameter. Pass the corresponding plugin name, and add the matching implementation in plugin/multi_turn.py.
-- completion_length_limit_scope: The scope of the `max_completion_length` limit in multi-turn conversations.
-  `total` limits the total output length across all turns to at most `max_completion_length`; `per_round` limits each turn's output length.
-  Defaults to `per_round`. Currently only takes effect in colocate mode.
 - dynamic_sample: filter out groups whose reward standard deviation is 0 and sample additional new data. Default is False.
 - max_resample_times: caps the number of resampling attempts when dynamic_sample is enabled. Default is 3.
 - overlong_filter: skip overlong truncated samples, excluding them from loss computation. Default is False.
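The `ref_model_mixup_alpha` / `ref_model_sync_steps` pair above describes a soft, periodic reference-policy sync. A minimal sketch of that update rule, assuming plain PyTorch modules (illustrative only, not the framework's actual implementation):

```python
import torch

@torch.no_grad()
def sync_ref_model(model: torch.nn.Module, ref_model: torch.nn.Module,
                   mixup_alpha: float = 0.6) -> None:
    """Blend current policy weights into the reference model:
    pi_ref = alpha * pi_theta + (1 - alpha) * pi_ref_prev."""
    for p_ref, p in zip(ref_model.parameters(), model.parameters()):
        p_ref.mul_(1.0 - mixup_alpha).add_(p, alpha=mixup_alpha)

# Invoked every `ref_model_sync_steps` optimizer steps (default 512).
```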
13 changes: 6 additions & 7 deletions docs/source/Instruction/命令行参数.md
@@ -439,20 +439,19 @@ The reward model parameters are used in PPO and GRPO.
 - vllm_limit_mm_per_prompt: vLLM passthrough parameter. Default is None.
 - vllm_enable_prefix_caching: vLLM passthrough parameter. Default is True.
 - sleep_level: release vLLM GPU memory while the model is training. Options are [0, 1]; default is 0 (no release).
+- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM, how many batches to split the layers into. Default is None, meaning the whole model is not split; otherwise it is split into move_model_batches + 1 (non-layer parameters) + 1 (multimodal component parameters) batches.
+- offload_optimizer: Whether to offload optimizer parameters during vLLM inference. Default is False.
+- offload_model: Whether to offload the model itself during vLLM inference. Default is False.
+- gc_collect_after_offload: Whether to run garbage collection (both Python GC and GPU GC) after offloading. Default is False.
+- completion_length_limit_scope: The scope of the `max_completion_length` limit in multi-turn conversations.
+  `total` limits the total output length across all turns to at most `max_completion_length`; `per_round` limits each turn's output length.
 - num_iterations: number of update iterations per batch. Default is 1.
 - epsilon: clip coefficient. Default is 0.2.
 - epsilon_high: upper clip coefficient. Default is None; when set, it forms the clipping range [epsilon, epsilon_high] together with epsilon.
 - sync_ref_model: whether to periodically synchronize the ref_model. Default is False.
 - ref_model_mixup_alpha: controls the mix between the model and the previous ref_model during updates, following $π_{ref} = α * π_θ + (1 - α) * π_{ref_{prev}}$. Default is 0.6.
 - ref_model_sync_steps: synchronization frequency. Default is 512.
-- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM/LMDeploy, how many batches to split the layers into. Default is None, meaning the whole model is not split; otherwise it is split into move_model_batches + 1 (non-layer parameters) + 1 (multimodal component parameters) batches.
-- offload_optimizer: Whether to offload optimizer parameters during vLLM/LMDeploy inference. Default is False.
-- offload_model: Whether to offload the model itself during vLLM/LMDeploy inference. Default is False.
-- gc_collect_after_offload: Whether to run garbage collection (both Python GC and GPU GC) after offloading. Default is False.
 - multi_turn_func: multi-turn GRPO parameter. Pass the corresponding plugin name, and add the matching implementation in plugin/multi_turn.py (a hypothetical plugin sketch follows this file's diff).
-- completion_length_limit_scope: The scope of the `max_completion_length` limit in multi-turn conversations.
-  `total` limits the total output length across all turns to at most `max_completion_length`; `per_round` limits each turn's output length.
-  Defaults to `per_round`. Currently only takes effect in colocate mode.
 - dynamic_sample: filter out groups whose reward standard deviation is 0 and sample additional new data. Default is False.
 - max_resample_times: caps the number of resampling attempts when dynamic_sample is enabled. Default is 3.
 - overlong_filter: skip overlong truncated samples, excluding them from loss computation. Default is False.
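To illustrate the `multi_turn_func` plugin hook referenced above, here is a hypothetical sketch; the function name, message format, and return convention are assumptions, and the real contract is defined by plugin/multi_turn.py in the repository:

```python
# Hypothetical multi-turn GRPO plugin; the actual interface expected by
# plugin/multi_turn.py may differ from this sketch.
from typing import Any

def boxed_math_retry(messages: list[dict[str, Any]], completion: str):
    """After each rollout turn, decide whether the conversation continues.

    Returns (updated_messages, finished)."""
    messages = messages + [{'role': 'assistant', 'content': completion}]
    if '\\boxed{' in completion:  # final answer produced -> stop rolling out
        return messages, True
    # No final answer yet: ask for one more turn (subject to the
    # completion_length_limit_scope / max_completion_length budget).
    messages.append({'role': 'user',
                     'content': 'Continue, and put the final answer in \\boxed{}.'})
    return messages, False
```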
16 changes: 8 additions & 8 deletions docs/source_en/Instruction/Command-line-parameters.md
@@ -451,6 +451,14 @@ The meanings of the following parameters can be referenced [here](https://huggin
 - vllm_limit_mm_per_prompt: vLLM passthrough parameter, default is None.
 - vllm_tensor_parallel_size: the tensor parallel size of the vLLM engine, default is 1.
 - sleep_level: make vLLM sleep while the model is training. Options are 0 or 1; default is 0 (no sleep).
+- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM/LMDeploy, determines how many batches to divide the layers into. The default is `None`, which means the entire model is not split. Otherwise, the model is split into `move_model_batches + 1` (non-layer parameters) + `1` (multi-modal component parameters) batches.
+- offload_optimizer: Whether to offload optimizer parameters during inference with vLLM/LMDeploy. The default is `False`.
+- offload_model: Whether to offload the model itself during inference with vLLM/LMDeploy. The default is `False`.
+- gc_collect_after_offload: Whether to perform garbage collection (both Python GC and GPU GC) after offloading. The default is `False`.
+- completion_length_limit_scope: Specifies the scope of the `max_completion_length` limit in multi-turn conversations.
+  When set to `total`, the total output length across all turns must not exceed `max_completion_length`.
+  When set to `per_round`, each individual turn's output length is limited separately.
+  Defaults to `per_round`. Currently only takes effect in colocate mode.
 - top_k: Default is 50.
 - top_p: Default is 0.9.
 - repetition_penalty: Repetition penalty term. Default is 1.
@@ -460,15 +468,7 @@ The meanings of the following parameters can be referenced [here](https://huggin
 - sync_ref_model: Whether to synchronize the reference model. Default is False.
 - ref_model_mixup_alpha: This parameter controls the mix between the current policy and the previous reference policy during updates. The reference policy is updated according to the equation: $π_{ref} = α * π_θ + (1 - α) * π_{ref_{prev}}$. Default is 0.6.
 - ref_model_sync_steps: This parameter determines how frequently the current policy is synchronized with the reference policy. Default is 512.
-- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM/LMDeploy, determines how many batches to divide the layers into. The default is `None`, which means the entire model is not split. Otherwise, the model is split into `move_model_batches + 1` (non-layer parameters) + `1` (multi-modal component parameters) batches.
-- offload_optimizer: Whether to offload optimizer parameters during inference with vLLM/LMDeploy. The default is `False`.
-- offload_model: Whether to offload the model itself during inference with vLLM/LMDeploy. The default is `False`.
-- gc_collect_after_offload: Whether to perform garbage collection (both Python GC and GPU GC) after offloading. The default is `False`.
 - multi_turn_func: The multi-turn GRPO plugin name. Add your multi-turn implementation in plugin/multi_turn.py.
-- completion_length_limit_scope: Specifies the scope of the `max_completion_length` limit in multi-turn conversations.
-  When set to `total`, the total output length across all turns must not exceed `max_completion_length`.
-  When set to `per_round`, each individual turn's output length is limited separately.
-  Defaults to `per_round`. Currently only takes effect in colocate mode.
 - dynamic_sample: Exclude data within the group where the reward standard deviation is 0, and additionally sample new data. Default is False (see the sketch after this file's diff).
 - max_resample_times: Under the dynamic_sample setting, limits the number of resampling attempts. Default is 3.
 - overlong_filter: Skip overlong truncated samples, which are not included in loss calculation. Default is False.
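The `dynamic_sample` / `max_resample_times` behavior documented above amounts to rejecting zero-signal groups and topping the batch back up. A schematic sketch under assumed tensor shapes (one reward per completion, `num_generations` completions per prompt); the resampling callback is hypothetical:

```python
import torch

def dynamic_sample(rewards: torch.Tensor, num_generations: int,
                   resample_fn, max_resample_times: int = 3) -> torch.Tensor:
    """Drop prompt groups whose reward std is 0 (they yield zero GRPO
    advantage) and refill the batch with freshly sampled groups."""
    groups = rewards.view(-1, num_generations)         # one row per prompt group
    target = groups.shape[0]
    kept = groups[groups.std(dim=1) > 0]
    for _ in range(max_resample_times):
        if kept.shape[0] >= target:
            break
        new = resample_fn().view(-1, num_generations)  # generate new completions
        kept = torch.cat([kept, new[new.std(dim=1) > 0]])
    return kept[:target]
```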
16 changes: 8 additions & 8 deletions docs/source_en/Instruction/GRPO.md
@@ -229,21 +229,21 @@ Arguments
 - vllm_limit_mm_per_prompt: vLLM passthrough parameter, default is None.
 - vllm_tensor_parallel_size: the tensor parallel size of the vLLM engine, default is 1.
 - sleep_level: make vLLM sleep while the model is training. Options are 0 or 1; default is 0 (no sleep).
+- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM, determines how many batches to divide the layers into. The default is `None`, which means the entire model is not split. Otherwise, the model is split into `move_model_batches + 1` (non-layer parameters) + `1` (multi-modal component parameters) batches.
+- offload_optimizer: Whether to offload optimizer parameters during inference with vLLM. The default is `False`.
+- offload_model: Whether to offload the model itself during inference with vLLM. The default is `False`.
+- gc_collect_after_offload: Whether to perform garbage collection (both Python GC and GPU GC) after offloading. The default is `False`.
+- completion_length_limit_scope: Specifies the scope of the `max_completion_length` limit in multi-turn conversations.
+  When set to `total`, the total output length across all turns must not exceed `max_completion_length`.
+  When set to `per_round`, each individual turn's output length is limited separately.
+  Defaults to `per_round`. Currently only takes effect in colocate mode.
 - num_iterations: number of iterations per batch. Default is 1.
 - epsilon: epsilon value for clipping. Default is 0.2.
 - epsilon_high: Upper clip coefficient, default is None. When set, it forms a clipping range of [epsilon, epsilon_high] together with epsilon (see the sketch after this file's diff).
 - sync_ref_model: Whether to synchronize the reference model. Default is False.
 - ref_model_mixup_alpha: This parameter controls the mix between the current policy and the previous reference policy during updates. The reference policy is updated according to the equation: $π_{ref} = α * π_θ + (1 - α) * π_{ref_{prev}}$. Default is 0.6.
 - ref_model_sync_steps: This parameter determines how frequently the current policy is synchronized with the reference policy. Default is 512.
-- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM, determines how many batches to divide the layers into. The default is `None`, which means the entire model is not split. Otherwise, the model is split into `move_model_batches + 1` (non-layer parameters) + `1` (multi-modal component parameters) batches.
-- offload_optimizer: Whether to offload optimizer parameters during inference with vLLM. The default is `False`.
-- offload_model: Whether to offload the model itself during inference with vLLM. The default is `False`.
-- gc_collect_after_offload: Whether to perform garbage collection (both Python GC and GPU GC) after offloading. The default is `False`.
 - multi_turn_func: The multi-turn GRPO plugin name. Add your multi-turn implementation in plugin/multi_turn.py.
-- completion_length_limit_scope: Specifies the scope of the `max_completion_length` limit in multi-turn conversations.
-  When set to `total`, the total output length across all turns must not exceed `max_completion_length`.
-  When set to `per_round`, each individual turn's output length is limited separately.
-  Defaults to `per_round`. Currently only takes effect in colocate mode.
 - dynamic_sample: Exclude data within the group where the reward standard deviation is 0, and additionally sample new data. Default is False.
 - max_resample_times: Under the dynamic_sample setting, limits the number of resampling attempts. Default is 3.
 - overlong_filter: Skip overlong truncated samples, which are not included in loss calculation. Default is False.
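For the `epsilon` / `epsilon_high` parameters above, the asymmetric clipping corresponds to the standard PPO/GRPO-style surrogate shown below — a sketch, not the framework's exact code; in the usual formulation the range bounds the importance ratio around 1:

```python
import torch

def clipped_surrogate_loss(ratio: torch.Tensor, advantages: torch.Tensor,
                           epsilon: float = 0.2,
                           epsilon_high: float | None = None) -> torch.Tensor:
    """PPO/GRPO-style clipped objective. With epsilon_high unset the
    clipping range is symmetric: [1 - epsilon, 1 + epsilon]."""
    high = epsilon_high if epsilon_high is not None else epsilon
    clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + high)
    # Maximize the surrogate => minimize its negation.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```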
2 changes: 1 addition & 1 deletion requirements/framework.txt
@@ -33,6 +33,6 @@ tiktoken
 tqdm
 transformers>=4.33,<4.53
 transformers_stream_generator
-trl>=0.15,<0.19
+trl>=0.15,<0.20
 uvicorn
 zstandard
2 changes: 1 addition & 1 deletion requirements/install_all.sh
@@ -1,6 +1,6 @@
 # please use python=3.10, cuda12.*
 # sh requirements/install_all.sh
-pip install "vllm>=0.5.1" -U
+pip install "vllm>=0.5.1,<0.9" -U
 pip install "lmdeploy>=0.5" -U --no-deps
 pip install autoawq -U --no-deps
 pip install auto_gptq optimum bitsandbytes -U