-
Notifications
You must be signed in to change notification settings - Fork 903
[docs] update rejected_tools #5878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[docs] update rejected_tools #5878
Conversation
Summary of ChangesHello @Jintao-Huang, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the documentation related to custom dataset formats. It specifically addresses the "rejected_messages" field, providing crucial clarification for developers working with Agent scenarios by explicitly stating the requirement to include "rejected_tools" when using this flexible message structure. This update ensures that users have a clearer understanding of data formatting for advanced use cases. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request updates the documentation for custom datasets in both Chinese and English to include information about using rejected_tools
in Agent scenarios for RLHF. The changes are accurate, but to improve clarity, I've suggested adding an example for the Agent scenario, as currently only a multimodal example is provided. This will help users better understand how to format their data.
docs/source/Customization/自定义数据集.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation now mentions rejected_tools
for Agent scenarios, which is great. However, it only provides a multimodal example. To improve clarity, it would be beneficial to add an example for an Agent scenario using rejected_tools
as well.
Also, you could change "多模态数据格式例子如下" (An example of the multimodal data format is as follows) to "数据格式例子如下" (An example of the data format is as follows).
Here is a sample DPO data format for an Agent scenario that you can use for reference:
{"tools": "[{\"type\": \"function\", \"function\": {\"name\": \"get_weather\", \"description\": \"获取天气信息\"}}]", "messages": [{"role": "user", "content": "北京的天气怎么样?"}, {"role": "tool_call", "content": "{\"name\": \"get_weather\", \"arguments\": {\"city\": \"北京\"}}"}], "rejected_messages": [{"role": "user", "content": "北京的天气怎么样?"}, {"role": "tool_call", "content": "{\"name\": \"get_stock_price\", \"arguments\": {\"company\": \"Alibaba\"}}"}], "rejected_tools": "[{\"type\": \"function\", \"function\": {\"name\": \"get_stock_price\", \"description\": \"获取股票价格\"}}]"}
当然,你也可以直接使用`rejected_messages`,而不是只提供`rejected_response`/`rejected_images`(需ms-swift>=3.8),这将提供更大的灵活度(例如多模态/agent场景)。若使用rejected_messages,在多模态场景下,你需要额外传入"rejected_images","rejected_audios","rejected_videos"等内容;在Agent场景下,你需要额外传入"rejected_tools"等内容。多模态数据格式例子如下: | |
当然,你也可以直接使用`rejected_messages`,而不是只提供`rejected_response`/`rejected_images`(需ms-swift>=3.8),这将提供更大的灵活度(例如多模态/agent场景)。若使用rejected_messages,在多模态场景下,你需要额外传入"rejected_images","rejected_audios","rejected_videos"等内容;在Agent场景下,你需要额外传入"rejected_tools"等内容。数据格式例子如下: |
> Note: RM additionally supports the margin column. For details, refer to the [RM documentation](../Instruction/RLHF.md#rm). | ||
Sure, you can also directly use `rejected_messages` instead of only providing `rejected_response` / `rejected_images` (requires ms-swift>=3.8), which offers greater flexibility (e.g., for multimodal or agent scenarios). In multimodal cases, if you use `rejected_messages`, you need to additionally provide fields such as `"rejected_images"`, `"rejected_audios"`, `"rejected_videos"`, etc. An example of the data format is as follows: | ||
Sure, you can also directly use `rejected_messages` instead of only providing `rejected_response` / `rejected_images` (requires ms-swift>=3.8), which offers greater flexibility (e.g., for multimodal or agent scenarios). If you use "rejected_messages", then in multimodal scenarios you must also provide "rejected_images", "rejected_audios", "rejected_videos", etc.; in Agent scenarios you must also provide "rejected_tools", etc. An example of the multimodal data format is as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation now mentions rejected_tools
for Agent scenarios, which is great. However, it only provides a multimodal example. To improve clarity, it would be beneficial to add an example for an Agent scenario using rejected_tools
as well.
Also, you could change "An example of the multimodal data format is as follows:" to "An example of the data format is as follows:".
Here is a sample DPO data format for an Agent scenario that you can use for reference:
{"tools": "[{\"type\": \"function\", \"function\": {\"name\": \"get_weather\", \"description\": \"Get weather information\"}}]", "messages": [{"role": "user", "content": "What's the weather in Beijing?"}, {"role": "tool_call", "content": "{\"name\": \"get_weather\", \"arguments\": {\"city\": \"Beijing\"}}"}], "rejected_messages": [{"role": "user", "content": "What's the weather in Beijing?"}, {"role": "tool_call", "content": "{\"name\": \"get_stock_price\", \"arguments\": {\"company\": \"Google\"}}"}], "rejected_tools": "[{\"type\": \"function\", \"function\": {\"name\": \"get_stock_price\", \"description\": \"Get stock price\"}}]"}
Sure, you can also directly use `rejected_messages` instead of only providing `rejected_response` / `rejected_images` (requires ms-swift>=3.8), which offers greater flexibility (e.g., for multimodal or agent scenarios). If you use "rejected_messages", then in multimodal scenarios you must also provide "rejected_images", "rejected_audios", "rejected_videos", etc.; in Agent scenarios you must also provide "rejected_tools", etc. An example of the multimodal data format is as follows: | |
Sure, you can also directly use `rejected_messages` instead of only providing `rejected_response` / `rejected_images` (requires ms-swift>=3.8), which offers greater flexibility (e.g., for multimodal or agent scenarios). If you use "rejected_messages", then in multimodal scenarios you must also provide "rejected_images", "rejected_audios", "rejected_videos", etc.; in Agent scenarios you must also provide "rejected_tools", etc. An example of the data format is as follows: |
No description provided.