Fine-tuning Falcon #1588

Open
ehartford opened this issue Jun 2, 2023 · 4 comments

Comments

@ehartford
Contributor

ehartford commented Jun 2, 2023

When I try to train Vicuna against Falcon, it fails.

The big issue is padding: Falcon's tokenizer has no pad token, and that is the part I can't figure out myself.

Also, Falcon doesn't have flash attention, so there needs to be a new monkey patch to swap in flash attention for Falcon. (I could probably figure this part out if the padding part were working.)

@ericzhou571
Contributor

You can use the Falcon special token >>SUFFIX<< (token ID 9) as a padding token, which has proven effective in my experience.

Furthermore, I am eagerly awaiting the addition of a Flash Attention monkey patch by someone.
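A minimal sketch of that pad-token suggestion, assuming the standard Hugging Face tokenizer API; the Hub id below is the usual one for Falcon-7B, and the expected id of 9 comes from the comment above rather than being verified here:

```python
# Hedged sketch: reuse Falcon's existing >>SUFFIX<< special token as the pad token.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
tokenizer.pad_token = ">>SUFFIX<<"  # already in Falcon's vocab, so no embedding resize is needed

# Per the comment above, this should resolve to token id 9.
print(tokenizer.pad_token, tokenizer.pad_token_id)
```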

@jercas

jercas commented Jun 9, 2023

Hi guys, I notice you're talking about padding.
I ran into a problem while doing Vicuna batch inference. Tensor padding is required when inference is batched, but no matter which token I use to pad, the generated output is very bad.
I have tried bos_token, "", and "[pad]" as the pad_token.
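Not from this thread, but a hedged sketch of the usual fix for that symptom with decoder-only models: pad on the left and let the attention mask hide the pad positions, so the specific pad token stops affecting generation. The checkpoint id and prompts are illustrative assumptions:

```python
# Hedged sketch of batched generation with padding (illustrative, not repo code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.3"  # assumption: any Vicuna/LLaMA-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
tokenizer.pad_token = tokenizer.unk_token  # masked out anyway, so the choice is free
tokenizer.padding_side = "left"            # pad on the left for generation

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).cuda()

prompts = ["Hello, how are you?", "Explain what padding does."]
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

# With left padding plus the attention mask, generation starts right after each
# prompt instead of after a run of pad tokens.
out = model.generate(**batch, max_new_tokens=64, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```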

@ehartford
Contributor Author

ehartford commented Jun 9, 2023

You can use the Falcon special token >>SUFFIX<< (token ID 9) as a padding token, which has proven effective in my experience.

Furthermore, I am eagerly awaiting the addition of a Flash Attention monkey patch by someone.

If you could please provide a gist, or at least a pointer to where in the code I would make this change (the padding change, not the flash attention monkey patch), I would greatly appreciate it.

I have tried, and failed, twice (and I tried hard, and I am good at this kind of thing) to modify the FastChat trainer to work with Falcon's padding (or rather the lack thereof), and I would need further guidance to move forward. I greatly prefer FastChat over other tuning solutions, because the quality I get from FastChat is very high.

AWESOME new API by the way!! Much needed, and much appreciated.
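Not an authoritative answer, but a minimal sketch of where such a change would typically live: FastChat-style training scripts generally assign a pad token right after loading the tokenizer, so a Falcon-specific branch would go there. The helper name and fallback order below are assumptions for illustration, not code from the repo:

```python
# Hedged sketch of a tokenizer-loading helper with a Falcon-aware pad-token fallback.
import transformers

def load_tokenizer_with_pad(model_name_or_path: str, model_max_length: int = 2048):
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        model_name_or_path,
        model_max_length=model_max_length,
        padding_side="right",
    )
    if tokenizer.pad_token is None:
        if tokenizer.unk_token is not None:
            # LLaMA/Vicuna-style tokenizers: reuse the unk token as padding.
            tokenizer.pad_token = tokenizer.unk_token
        elif ">>SUFFIX<<" in tokenizer.get_vocab():
            # Falcon: reuse an existing special token, as suggested above.
            tokenizer.pad_token = ">>SUFFIX<<"
        else:
            # Last resort: fall back to the eos token.
            tokenizer.pad_token = tokenizer.eos_token
    return tokenizer
```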

@merrymercy changed the title from "Falcon" to "Fine-tuning Falcon" on Jun 10, 2023
@ericzhou571
Contributor

ericzhou571 commented Jun 14, 2023

I have already written a Falcon version that includes training, inference, and conversation capabilities. Additionally, I have trained a Falcon 7B model compatible with Vicuna-13B (not fully tested) using the Wizard ShareGPT dataset with my Falcon code. I will be making a pull request later tonight.
