What format was used for the final fine-tuning of LLaMA2-7B in RAFT? #444

Closed
Leo4zhou opened this issue May 24, 2024 · 1 comment

Comments

@Leo4zhou

In RAFT, the jsonl file format after conversion using "format.py" is as follows:
{"messages":[{"content":"When was the University of California, Berkeley established?","role":"user"},{"content":"To answer the question, we need to find the information related to the establishment date of the University of California, Berkeley in the provided context. \n\nThe relevant information is found in the line ##begin_quote##\"Established March 23, 1868[1]\"##end_quote##. This line directly states the date when the University of California, Berkeley was established.\n\n<ANSWER>: The University of California, Berkeley was established on March 23, 1868.","role":"assistant"}]}
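To make the question concrete, here is a minimal Python sketch for inspecting one record in this chat-style layout: it lists the roles, pulls out any `##begin_quote##...##end_quote##` spans from the assistant turn, and checks whether the user turn carries any context. The sample record is a shortened, hypothetical copy of the one above, the helper name `inspect_record` is made up for illustration, and the `<DOCUMENT>` marker used to detect context is only an assumption about how context might be embedded, not something confirmed in this thread.

```python
import json
import re

# Hypothetical, shortened sample in the same "messages" layout as the record above.
record_line = json.dumps({
    "messages": [
        {"content": "When was the University of California, Berkeley established?",
         "role": "user"},
        {"content": 'The relevant information is found in the line '
                    '##begin_quote##"Established March 23, 1868[1]"##end_quote##. '
                    'The university was established on March 23, 1868.',
         "role": "assistant"},
    ]
})


def inspect_record(line, context_marker="<DOCUMENT>"):
    """Summarize one JSONL record.

    context_marker is an assumed tag for embedded context; adjust it to
    whatever your converted data actually uses.
    """
    record = json.loads(line)
    messages = record["messages"]
    by_role = {m["role"]: m["content"] for m in messages}
    # Collect every quoted span the assistant cites from the context.
    quotes = re.findall(r"##begin_quote##(.*?)##end_quote##",
                        by_role.get("assistant", ""), flags=re.DOTALL)
    return {
        "roles": [m["role"] for m in messages],
        "question": by_role.get("user", ""),
        "quotes": quotes,
        "has_context": context_marker in by_role.get("user", ""),
    }


info = inspect_record(record_line)
print(info["roles"], info["has_context"], info["quotes"])
```

In this sample the user turn holds only the question, which is exactly the mismatch with the paper's figure that the question below is about.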

But the fine-tuning data format shown in the paper is as follows; it mainly contains the question, context, instruction, and CoT answer:
[image: screenshot of the fine-tuning example format from the paper]

What format was used for the final fine-tuning of LLaMA2-7B in RAFT? Does it include context or other content?

@ShishirPatil
Owner

Hey @Leo4zhou, if you want an end-to-end example, you can check out this tutorial: https://build.microsoft.com/en-US/sessions/78e4ef86-4443-4cb6-a0b4-157a6623fe87. The code base is at aka.ms/raft-build24. Please reopen or start a new issue if you have any follow-ups.
