Fine-tuning Falcon #1588
Comments
You can use the Falcon special token >>SUFFIX<< (token ID: 9) as the padding token; that has worked well in my experience. I am also eagerly waiting for someone to add a Flash Attention monkey-patch script for Falcon.
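A minimal sketch of what that looks like with the Hugging Face tokenizer (the checkpoint name here is just an example, and you should verify on your copy of the tokenizer that the token really resolves to ID 9 as reported above):

```python
from transformers import AutoTokenizer

# Load the Falcon tokenizer (example checkpoint name).
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")

# Register the existing >>SUFFIX<< special token as the pad token,
# instead of adding a brand-new token and resizing embeddings.
tokenizer.pad_token = ">>SUFFIX<<"

# Sanity check: this should print ">>SUFFIX<<" and its ID (expected to be 9).
print(tokenizer.pad_token, tokenizer.pad_token_id)
```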
Hi guys. I notice you're talking about padding.
If you could please provide a gist, or at least a pointer to where in the code I would make this change (the padding change, not the flash attention monkey patch), I would greatly appreciate it. I have tried, and failed, twice (and I tried hard, and I am good at this kind of thing) to modify the FastChat trainer to work with Falcon's padding, or rather its lack thereof, and I would need further guidance to move forward. I would much rather use FastChat than other tuning solutions, because the quality I get from FastChat is very high. Awesome new API, by the way! Much needed and much appreciated.
I have already written a Falcon version that includes training, inference, and conversation capabilities. Additionally, I have trained a Falcon 7B model compatible with Vicuna-13B (not fully tested) using the Wizard ShareGPT dataset with my Falcon code. I will open a pull request later tonight.
When I try to train Vicuna against Falcon, it fails.
The big issue is padding, since Falcon has no padding token. (That is the part I can't figure out myself.)
Falcon also doesn't support flash attention, so there needs to be a new monkey-patch script to swap in flash attention for Falcon. (I could probably figure this part out if the padding part were working.)
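For reference, the general shape of such a patch follows what FastChat already does for LLaMA in llama_flash_attn_monkey_patch.py: replace the attention module's forward with one that calls the flash-attention kernels. The sketch below only shows the swapping mechanism, not a working flash-attention kernel, and the class name check is an assumption, since Falcon's modeling code is loaded via trust_remote_code rather than shipped in transformers:

```python
def make_patched_forward(original_forward):
    """Wrap an attention module's forward so it can be replaced in place."""
    def patched_forward(*args, **kwargs):
        # A real patch would compute q/k/v here and call the flash-attn
        # kernels; as a placeholder this simply delegates to the original
        # implementation so the model still runs.
        return original_forward(*args, **kwargs)
    return patched_forward


def replace_falcon_attn_forward(model):
    """Swap the forward of every attention module on an already-loaded model."""
    for module in model.modules():
        # "Attention" is the assumed class name in Falcon's remote-code file;
        # adjust the check to match the actual loaded module.
        if type(module).__name__ == "Attention":
            module.forward = make_patched_forward(module.forward)
```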