
Support deploying trained RM models with inference engines (sglang/vllm) #3610

@Xu-Chen

Describe the feature
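Please describe the feature requested here.
I would like trained RM (reward model) checkpoints to be deployable with an inference framework such as sglang or vllm. That way the RM can be decoupled from the trainer and run as a standalone service, and GRPO/PPO training can point to it via reward_url.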
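To make the request concrete, here is a minimal sketch of what the trainer-side call could look like once the RM runs as a separate service. Only the name reward_url comes from this issue. The JSON schema (`items`, `prompt`, `completion`, `scores`) and the helper names are hypothetical and do not describe an existing sglang, vllm, or trainer API.

```python
import json
from urllib import request


def build_payload(prompts, completions):
    """Pair each prompt with its completion (hypothetical request schema)."""
    if len(prompts) != len(completions):
        raise ValueError("prompts and completions must have the same length")
    return {
        "items": [
            {"prompt": p, "completion": c} for p, c in zip(prompts, completions)
        ]
    }


def parse_scores(body, expected):
    """Read one float reward per item (hypothetical response schema)."""
    scores = [float(s) for s in json.loads(body)["scores"]]
    if len(scores) != expected:
        raise ValueError(f"expected {expected} scores, got {len(scores)}")
    return scores


def fetch_rewards(reward_url, prompts, completions, timeout=30.0):
    """POST a batch to the RM service at reward_url and return its scores."""
    data = json.dumps(build_payload(prompts, completions)).encode("utf-8")
    req = request.Request(
        reward_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return parse_scores(resp.read().decode("utf-8"), expected=len(prompts))
```

In a GRPO/PPO loop, something like `fetch_rewards` would be called once per rollout batch, so the RM could be scaled and restarted independently of the policy trainer.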
Paste any useful information
Paste any useful information, including papers, GitHub links, etc.

Additional context
Add any other context or information here.


Labels: enhancement
