Support deploying trained reward models (RM) with the SGLang/vLLM inference engines

**Describe the feature**
Please describe the feature requested here
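After training a reward model (RM), I would like to be able to deploy it with an inference framework such as SGLang or vLLM. That would decouple the RM from the training process: when training with GRPO/PPO, the RM service could then be specified via `reward_url`.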
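
To make the request concrete, here is a minimal sketch of how a trainer might call an externally deployed RM through `reward_url`. Everything beyond the `reward_url` idea is an assumption for illustration: the function names, the JSON request schema (`prompts`, `responses`), and the response schema (`rewards`) are hypothetical and do not reflect an existing API of this project, SGLang, or vLLM.

```python
import json
import urllib.request
from typing import List


def build_reward_request(
    reward_url: str, prompts: List[str], responses: List[str]
) -> urllib.request.Request:
    """Build a POST request for a remote reward-model service.

    The payload schema {"prompts": [...], "responses": [...]} is hypothetical.
    """
    if len(prompts) != len(responses):
        raise ValueError("prompts and responses must have the same length")
    payload = json.dumps({"prompts": prompts, "responses": responses}).encode("utf-8")
    return urllib.request.Request(
        reward_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_rewards(body: bytes, expected: int) -> List[float]:
    """Parse a hypothetical {"rewards": [...]} response into a list of floats."""
    data = json.loads(body.decode("utf-8"))
    rewards = [float(r) for r in data["rewards"]]
    if len(rewards) != expected:
        raise ValueError(f"expected {expected} rewards, got {len(rewards)}")
    return rewards


def fetch_rewards(
    reward_url: str, prompts: List[str], responses: List[str], timeout: float = 30.0
) -> List[float]:
    """Send one batch to the RM service and return one scalar reward per sample."""
    request = build_reward_request(reward_url, prompts, responses)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_rewards(resp.read(), expected=len(prompts))
```

With something along these lines, the GRPO/PPO training loop would only need the URL of the RM service, and the RM itself could run on separate hardware behind whichever inference engine is used.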

**Paste any useful information**
Paste any useful information, including papers, GitHub links, etc.

**Additional context**
Add any other context or information here