Fine-tuning Falcon #1588

Open
ehartford opened this issue Jun 2, 2023 · 4 comments

Comments

@ehartford
Contributor

ehartford commented Jun 2, 2023

When I try to train Vicuna against Falcon, it fails.

The big issue is padding: Falcon's tokenizer has no pad token, and that is the part I can't figure out myself.

Also, Falcon doesn't have flash attention, so there needs to be a new monkey patch to swap in flash attention for Falcon. (I could probably figure this part out if the padding part were working.)

@ericzhou571
Contributor

You can use the Falcon special token >>SUFFIX<< (token ID 9) as a padding token, which has proven effective in my experience.

Furthermore, I am eagerly awaiting the addition of a Flash Attention monkey patch by someone.
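A minimal sketch of that pad-token suggestion, assuming the standard Hugging Face tokenizer API; the Hub id below is the usual one for Falcon-7B, and the expected id of 9 comes from the comment above rather than being verified here:

```python
# Hedged sketch: reuse Falcon's existing >>SUFFIX<< special token as the pad token.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
tokenizer.pad_token = ">>SUFFIX<<"  # already in Falcon's vocab, so no embedding resize is needed

# Per the comment above, this should resolve to token id 9.
print(tokenizer.pad_token, tokenizer.pad_token_id)
```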

@jercas

jercas commented Jun 9, 2023

Hi guys, I notice you're talking about padding.
I ran into a problem while doing Vicuna batch inference. Tensor padding is required when inference is batched, but no matter which token I use to pad, the generated output is very bad.
I have tried bos_token, "", and "[pad]" as the pad_token.
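Not from this thread, but a hedged sketch of the usual fix for that symptom with decoder-only models: pad on the left and let the attention mask hide the pad positions, so the specific pad token stops affecting generation. The checkpoint id and prompts are illustrative assumptions:

```python
# Hedged sketch of batched generation with padding (illustrative, not repo code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.3"  # assumption: any Vicuna/LLaMA-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
tokenizer.pad_token = tokenizer.unk_token  # masked out anyway, so the choice is free
tokenizer.padding_side = "left"            # pad on the left for generation

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).cuda()

prompts = ["Hello, how are you?", "Explain what padding does."]
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

# With left padding plus the attention mask, generation starts right after each
# prompt instead of after a run of pad tokens.
out = model.generate(**batch, max_new_tokens=64, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```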

@ehartford
Contributor Author

ehartford commented Jun 9, 2023

You can use the Falcon special token >>SUFFIX<< (token ID 9) as a padding token, which has proven effective in my experience.

Furthermore, I am eagerly awaiting the addition of a Flash Attention monkey patch by someone.

If you could please provide a gist, or at least a pointer to where in the code I would make this change (the padding change, not the flash attention monkey patch), I would greatly appreciate it.

I have tried, and failed, twice (and I tried hard, and I am good at this kind of thing) to modify the FastChat trainer to work with Falcon's padding (or rather the lack thereof), and I would need further guidance to move forward. I greatly prefer FastChat over other tuning solutions, because the quality I get from FastChat is very high.

AWESOME new API by the way!! Much needed, and much appreciated.
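Not an authoritative answer, but a minimal sketch of where such a change would typically live: FastChat-style training scripts generally assign a pad token right after loading the tokenizer, so a Falcon-specific branch would go there. The helper name and fallback order below are assumptions for illustration, not code from the repo:

```python
# Hedged sketch of a tokenizer-loading helper with a Falcon-aware pad-token fallback.
import transformers

def load_tokenizer_with_pad(model_name_or_path: str, model_max_length: int = 2048):
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        model_name_or_path,
        model_max_length=model_max_length,
        padding_side="right",
    )
    if tokenizer.pad_token is None:
        if tokenizer.unk_token is not None:
            # LLaMA/Vicuna-style tokenizers: reuse the unk token as padding.
            tokenizer.pad_token = tokenizer.unk_token
        elif ">>SUFFIX<<" in tokenizer.get_vocab():
            # Falcon: reuse an existing special token, as suggested above.
            tokenizer.pad_token = ">>SUFFIX<<"
        else:
            # Last resort: fall back to the eos token.
            tokenizer.pad_token = tokenizer.eos_token
    return tokenizer
```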

@merrymercy changed the title from "Falcon" to "Fine-tuning Falcon" on Jun 10, 2023
@ericzhou571
Contributor

ericzhou571 commented Jun 14, 2023

I have already written a Falcon version that includes training, inference, and conversation capabilities. Additionally, I have trained a Falcon 7B model compatible with Vicuna-13B (not fully tested) using the Wizard ShareGPT dataset with my Falcon code. I will be making a pull request later tonight.
