diff --git a/docs/source/Instruction/GRPO.md b/docs/source/Instruction/GRPO.md
index ab4c3580bb..842ec39171 100644
--- a/docs/source/Instruction/GRPO.md
+++ b/docs/source/Instruction/GRPO.md
@@ -219,20 +219,20 @@ A conversation between User and Assistant. The user asks a question, and the Ass
- vllm_limit_mm_per_prompt: vllm透传参数,默认为None.
- vllm_enable_prefix_caching: vllm透传参数,默认为True.
- sleep_level: 训练时释放 vLLM 显存,可选项为[0, 1], 默认为0,不释放.
+ - move_model_batches: 在模型向vLLM等快速推理框架移动参数时,将layers分为多少个batch。默认为None,代表整个模型不进行拆分,否则拆分为move_model_batches+1(非layer参数)+1(多模态部分参数)个。
+ - offload_optimizer: 是否在vLLM推理时offload optimizer参数,默认为False。
+ - offload_model: 是否在vLLM推理时offload 模型本身,默认为False。
+ - gc_collect_after_offload: 是否在offload结束时进行gc(python gc和GPU gc),默认为False。
+ - completion_length_limit_scope: 在多轮对话中,`max_completion_length` 的限制范围。
+ `total`限制所有对话轮次的总输出长度不超过`max_completion_length`, `per_round`限制每一轮的输出长度。
+ 默认为`per_round`, 当前仅对 colocate mode 生效。
- num_iterations: 每个批次的更新迭代次数,默认为1。
- epsilon: clip 系数,默认为0.2。
- epsilon_high: upper clip 系数,默认为None,设置后与epsilon共同构成[epsilon, epsilon_high]裁剪范围。
- sync_ref_model: 是否定期同步ref_model,默认为False。
- ref_model_mixup_alpha: 控制在更新过程中model和先前ref_model之间的混合。更新公式为 $π_{ref} = α * π_θ + (1 - α) * π_{ref_{prev}}$。默认为0.6。
- ref_model_sync_steps:同步频率,默认为512。
-- move_model_batches: 在模型向vLLM等快速推理框架移动参数时,将layers分为多少个batch. 默认为None, 代表整个模型不进行拆分,否则拆分为move_model_batches+1(非layer参数)+1(多模态部分参数)个。
-- offload_optimizer: 是否在vLLM推理时offload optimizer参数,默认为False。
-- offload_model: 是否在vLLM推理时offload 模型本身,默认为False。
-- gc_collect_after_offload: 是否在offload结束时进行gc(python gc和GPU gc),默认为False。
- multi_turn_func: 多轮GRPO参数, 传入对应的plugin名称, 同时在plugin/multi_turn.py中添加好对应的实现。
-- completion_length_limit_scope: 在多轮对话中,`max_completion_length` 的限制范围。
-`total`限制所有对话轮次的总输出长度不超过`max_completion_length`, `per_round`限制每一轮的输出长度。
-默认为`per_round`, 当前仅对 colocate mode 生效。
- dynamic_sample:筛除group内奖励标准差为0的数据,额外采样新数据,默认为False。
- max_resample_times:dynamic_sample设置下限制重采样次数,默认3次。
- overlong_filter:跳过超长截断的样本,不参与loss计算,默认为False。
diff --git "a/docs/source/Instruction/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md" "b/docs/source/Instruction/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md"
index d4acc13309..9df40acce8 100644
--- "a/docs/source/Instruction/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md"
+++ "b/docs/source/Instruction/\345\221\275\344\273\244\350\241\214\345\217\202\346\225\260.md"
@@ -439,20 +439,19 @@ reward模型参数将在PPO、GRPO中使用。
- vllm_limit_mm_per_prompt: vllm透传参数,默认为None。
- vllm_enable_prefix_caching: vllm透传参数,默认为True。
- sleep_level: 训练时释放 vLLM 显存,可选项为[0, 1], 默认为0,不释放
+ - move_model_batches: 在模型向vLLM等快速推理框架移动参数时,将layers分为多少个batch。默认为None,代表整个模型不进行拆分,否则拆分为move_model_batches+1(非layer参数)+1(多模态部分参数)个。
+ - offload_optimizer: 是否在vLLM推理时offload optimizer参数,默认为False。
+ - offload_model: 是否在vLLM推理时offload 模型本身,默认为False。
+ - gc_collect_after_offload: 是否在offload结束时进行gc(python gc和GPU gc),默认为False。
+ - completion_length_limit_scope: 在多轮对话中,`max_completion_length` 的限制范围。
+ `total`限制所有对话轮次的总输出长度不超过`max_completion_length`, `per_round`限制每一轮的输出长度。
- num_iterations: 每个批次的更新迭代次数,默认为1。
- epsilon: clip 系数,默认为0.2。
- epsilon_high: upper clip 系数,默认为None,设置后与epsilon共同构成[epsilon, epsilon_high]裁剪范围。
- sync_ref_model: 是否定期同步ref_model,默认为False。
- ref_model_mixup_alpha: 控制在更新过程中model和先前ref_model之间的混合。更新公式为 $π_{ref} = α * π_θ + (1 - α) * π_{ref_{prev}}$。默认为0.6。
- ref_model_sync_steps:同步频率,默认为512。
-- move_model_batches: 在模型向vLLM/LMDeploy等快速推理框架移动参数时,将layers分为多少个batch. 默认为None, 代表整个模型不进行拆分,否则拆分为move_model_batches+1(非layer参数)+1(多模态部分参数)个。
-- offload_optimizer: 是否在vLLM/LMDeploy推理时offload optimizer参数,默认为False。
-- offload_model: 是否在vLLM/LMDeploy推理时offload 模型本身,默认为False。
-- gc_collect_after_offload: 是否在offload结束时进行gc(python gc和GPU gc),默认为False。
- multi_turn_func: 多轮GRPO参数, 传入对应的plugin名称, 同时在plugin/multi_turn.py中添加好对应的实现。
-- completion_length_limit_scope: 在多轮对话中,`max_completion_length` 的限制范围。
-`total`限制所有对话轮次的总输出长度不超过`max_completion_length`, `per_round`限制每一轮的输出长度。
-默认为`per_round`, 当前仅对 colocate mode 生效。
- dynamic_sample:筛除group内奖励标准差为0的数据,额外采样新数据,默认为False。
- max_resample_times:dynamic_sample设置下限制重采样次数,默认3次。
- overlong_filter:跳过超长截断的样本,不参与loss计算,默认为False。
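For reference, the memory-saving switches documented above are meant to be combined in a colocated GRPO run. The following launch sketch illustrates this under stated assumptions: the model and dataset IDs are placeholders (not from this diff), while the flag names come directly from the parameter list above.

```sh
# Sketch of a colocated GRPO run using the offload/sleep parameters above.
# Model and dataset are placeholders; substitute your own.
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset AI-MO/NuminaMath-TIR \
    --use_vllm true \
    --sleep_level 1 \
    --offload_model true \
    --offload_optimizer true \
    --gc_collect_after_offload true \
    --move_model_batches 4
```

Here `sleep_level 1` releases vLLM memory while the policy trains, the two offload flags free optimizer and model state during rollouts, and `move_model_batches 4` moves weights to vLLM in smaller chunks to cap peak memory.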
diff --git a/docs/source_en/Instruction/Command-line-parameters.md b/docs/source_en/Instruction/Command-line-parameters.md
index 50a86b2bc8..028861b6e8 100644
--- a/docs/source_en/Instruction/Command-line-parameters.md
+++ b/docs/source_en/Instruction/Command-line-parameters.md
@@ -451,6 +451,14 @@ The meanings of the following parameters can be referenced [here](https://huggin
- vllm_limit_mm_per_prompt: vLLM passthrough parameter, default is None.
- vllm_tensor_parallel_size: the tensor parallel size of vLLM engine, default is 1.
- sleep_level: make vLLM sleep while the model is training. Options are 0 or 1; default is 0 (no sleep).
+ - move_model_batches: When moving model parameters to fast inference frameworks such as vLLM/LMDeploy, determines how many batches to divide the layers into. The default is `None`, which means the entire model is not split. Otherwise, the model is split into `move_model_batches + 1` (non-layer parameters) + `1` (multi-modal component parameters) batches.
+ - offload_optimizer: Whether to offload optimizer parameters during inference with vLLM/LMDeploy. The default is `False`.
+ - offload_model: Whether to offload the model itself during inference with vLLM/LMDeploy. The default is `False`.
+ - gc_collect_after_offload: Whether to perform garbage collection (both Python GC and GPU GC) after offloading. The default is `False`.
+ - completion_length_limit_scope: Specifies the scope of the `max_completion_length` limit in multi-turn conversations.
+ When set to `total`, the total output length across all turns must not exceed `max_completion_length`.
+ When set to `per_round`, each individual turn's output length is limited separately.
+ Defaults to `per_round`. Currently only takes effect in colocate mode.
- top_k: Default is 50.
- top_p: Default is 0.9.
- repetition_penalty: Repetition penalty term. Default is 1.
@@ -460,15 +468,7 @@ The meanings of the following parameters can be referenced [here](https://huggin
- sync_ref_model: Whether to synchronize the reference model. Default is False.
- ref_model_mixup_alpha: The parameter controls the mix between the current policy and the previous reference policy during updates. The reference policy is updated according to the equation: $π_{ref} = α * π_θ + (1 - α) * π_{ref_{prev}}$. Default is 0.6.
- ref_model_sync_steps: The parameter determines how frequently the current policy is synchronized with the reference policy. Default is 512.
-- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM/LMDeploy, determines how many batches to divide the layers into. The default is `None`, which means the entire model is not split. Otherwise, the model is split into `move_model_batches + 1` (non-layer parameters) + `1` (multi-modal component parameters) batches.
-- offload_optimizer: Whether to offload optimizer parameters during inference with vLLM/LMDeploy. The default is `False`.
-- offload_model: Whether to offload the model itself during inference with vLLM/LMDeploy. The default is `False`.
-- gc_collect_after_offload: Whether to perform garbage collection (both Python GC and GPU GC) after offloading. The default is `False`.
- multi_turn_func: The multi-turn GRPO plugin name. Add your multi-turn implementation in plugin/multi_turn.py.
-- completion_length_limit_scope: Specifies the scope of the `max_completion_length` limit in multi-turn conversations.
-When set to `total`, the total output length across all turns must not exceed `max_completion_length`.
-When set to `per_round`, each individual turn's output length is limited separately.
-Defaults to `per_round`. Currently only takes effect in colocate mode.
- dynamic_sample: Exclude data within the group where the reward standard deviation is 0, and additionally sample new data. Default is False.
- max_resample_times: Under the dynamic_sample setting, limits the number of resampling attempts. Default is 3.
- overlong_filter: Skip overlong truncated samples, which will not be included in loss calculation. Default is False.
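A usage sketch for the multi-turn settings documented above: the plugin name `my_tool_caller` is hypothetical and stands for an implementation you have registered in plugin/multi_turn.py, as the parameter description requires; model and dataset IDs are placeholders.

```sh
# Hypothetical multi-turn GRPO launch; "my_tool_caller" is a stand-in for
# a plugin registered in plugin/multi_turn.py.
swift rlhf \
    --rlhf_type grpo \
    --model <model-id> \
    --dataset <dataset-id> \
    --multi_turn_func my_tool_caller \
    --max_completion_length 2048 \
    --completion_length_limit_scope total
```

With `total`, the 2048-token budget is shared across all turns of a rollout; with the default `per_round`, each turn may emit up to 2048 tokens on its own.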
diff --git a/docs/source_en/Instruction/GRPO.md b/docs/source_en/Instruction/GRPO.md
index a4c30ad7ee..e790c70ef8 100644
--- a/docs/source_en/Instruction/GRPO.md
+++ b/docs/source_en/Instruction/GRPO.md
@@ -229,21 +229,21 @@ Arguments
- vllm_limit_mm_per_prompt: vLLM passthrough parameter, default is None.
- vllm_tensor_parallel_size: the tensor parallel size of vLLM engine, default is 1.
- sleep_level: make vLLM sleep while the model is training. Options are 0 or 1; default is 0 (no sleep).
+ - move_model_batches: When moving model parameters to fast inference frameworks such as vLLM, determines how many batches to divide the layers into. The default is `None`, which means the entire model is not split. Otherwise, the model is split into `move_model_batches + 1` (non-layer parameters) + `1` (multi-modal component parameters) batches.
+ - offload_optimizer: Whether to offload optimizer parameters during inference with vLLM. The default is `False`.
+ - offload_model: Whether to offload the model itself during inference with vLLM. The default is `False`.
+ - gc_collect_after_offload: Whether to perform garbage collection (both Python GC and GPU GC) after offloading. The default is `False`.
+ - completion_length_limit_scope: Specifies the scope of the `max_completion_length` limit in multi-turn conversations.
+ When set to `total`, the total output length across all turns must not exceed `max_completion_length`.
+ When set to `per_round`, each individual turn's output length is limited separately.
+ Defaults to `per_round`. Currently only takes effect in colocate mode.
- num_iterations: number of iterations per batch. Default is 1.
- epsilon: epsilon value for clipping. Default is 0.2.
- epsilon_high: Upper clip coefficient, default is None. When set, it forms a clipping range of [epsilon, epsilon_high] together with epsilon.
- sync_ref_model: Whether to synchronize the reference model. Default is False.
- ref_model_mixup_alpha: The parameter controls the mix between the current policy and the previous reference policy during updates. The reference policy is updated according to the equation: $π_{ref} = α * π_θ + (1 - α) * π_{ref_{prev}}$. Default is 0.6.
- ref_model_sync_steps: The parameter determines how frequently the current policy is synchronized with the reference policy. Default is 512.
-- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM, determines how many batches to divide the layers into. The default is `None`, which means the entire model is not split. Otherwise, the model is split into `move_model_batches + 1` (non-layer parameters) + `1` (multi-modal component parameters) batches.
-- offload_optimizer: Whether to offload optimizer parameters during inference with vLLM. The default is `False`.
-- offload_model: Whether to offload the model itself during inference with vLLM. The default is `False`.
-- gc_collect_after_offload: Whether to perform garbage collection (both Python GC and GPU GC) after offloading. The default is `False`.
- multi_turn_func: The multi-turn GRPO plugin name. Add your multi-turn implementation in plugin/multi_turn.py.
-- completion_length_limit_scope: Specifies the scope of the `max_completion_length` limit in multi-turn conversations.
-When set to `total`, the total output length across all turns must not exceed `max_completion_length`.
-When set to `per_round`, each individual turn's output length is limited separately.
-Defaults to `per_round`. Currently only takes effect in colocate mode.
- dynamic_sample: Exclude data within the group where the reward standard deviation is 0, and additionally sample new data. Default is False.
- max_resample_times: Under the dynamic_sample setting, limits the number of resampling attempts. Default is 3.
- overlong_filter: Skip overlong truncated samples, which will not be included in loss calculation. Default is False.
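The epsilon/epsilon_high pair above is easier to read as the clipped surrogate it parameterizes. The following is a sketch assuming TRL-style asymmetric clipping (DAPO's "clip-higher"), where the token-level importance ratio is bounded below by $1-\epsilon$ and above by $1+\epsilon_{high}$ (when epsilon_high is unset, $\epsilon_{high}=\epsilon$):

$$
\mathcal{L}(\theta) = -\,\mathbb{E}\left[\min\Big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon_{high}\big)\,\hat{A}_t\Big)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(o_t \mid q, o_{<t})}{\pi_{\theta_{old}}(o_t \mid q, o_{<t})}
$$

Raising $\epsilon_{high}$ above $\epsilon$ loosens only the upper bound, letting low-probability tokens with positive advantage gain probability faster while keeping the downside clip unchanged.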
diff --git a/requirements/framework.txt b/requirements/framework.txt
index 8849a8f7c4..3ed95c55b2 100644
--- a/requirements/framework.txt
+++ b/requirements/framework.txt
@@ -33,6 +33,6 @@ tiktoken
tqdm
transformers>=4.33,<4.53
transformers_stream_generator
-trl>=0.15,<0.19
+trl>=0.15,<0.20
uvicorn
zstandard
diff --git a/requirements/install_all.sh b/requirements/install_all.sh
index 8b6fed4dcc..a3f32cb342 100644
--- a/requirements/install_all.sh
+++ b/requirements/install_all.sh
@@ -1,6 +1,6 @@
# please use python=3.10, cuda12.*
# sh requirements/install_all.sh
-pip install "vllm>=0.5.1" -U
+pip install "vllm>=0.5.1,<0.9" -U
pip install "lmdeploy>=0.5" -U --no-deps
pip install autoawq -U --no-deps
pip install auto_gptq optimum bitsandbytes -U
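Since the pins above now admit trl 0.19.x but exclude 0.20, and cap vllm below 0.9, a quick post-install sanity check can confirm what actually resolved. A minimal sketch (output depends on your environment):

```sh
# Install within the new bounds, then print the resolved versions.
pip install "trl>=0.15,<0.20" "vllm>=0.5.1,<0.9" -U
python -c "import trl, vllm; print('trl', trl.__version__, '| vllm', vllm.__version__)"
```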