Why does fine-tuning perform worse on the same data? #23

Open
Tracked by #4
eubinecto opened this issue May 2, 2022 · 4 comments

Comments


eubinecto commented May 2, 2022

Why?

As you can see below, my first attempt is not as good as I expected it to be.
(screenshots of the first attempt's outputs omitted)

But as we know, with few-shot prompt design the performance is generally great with only a few examples (~10).

Why is this?
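
For reference, here is a minimal sketch of the kind of few-shot call I mean (openai-python 0.x, the 2022-era API). The prompt format, examples, and engine below are hypothetical placeholders; only the `<idiom>` … `</idiom>` wrapping follows what is described in this issue.

```python
# A minimal sketch of the few-shot setup (openai-python 0.x, 2022-era API).
# The task framing and examples below are hypothetical placeholders.
import openai

FEW_SHOT_PROMPT = """\
Paraphrase the sentence so that it uses an idiom, and wrap the idiom in tags.

Sentence: He finally revealed the secret.
Idiomatic: He finally <idiom>let the cat out of the bag</idiom>.

Sentence: She decided to start over completely.
Idiomatic: She decided to go <idiom>back to square one</idiom>.

Sentence: {sentence}
Idiomatic:"""

def idiomify_few_shot(sentence: str) -> str:
    response = openai.Completion.create(
        engine="davinci",              # or whichever engine is used in the Playground
        prompt=FEW_SHOT_PROMPT.format(sentence=sentence),
        max_tokens=60,
        temperature=0.7,
        stop=["\n"],                   # stop at the end of the completion line
    )
    return response["choices"][0]["text"].strip()
```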

@eubinecto mentioned this issue May 2, 2022
@eubinecto changed the title from "fine-tune danvinci with the revised data" to "Why does fine-tuning perform worse on the same data?" May 2, 2022

eubinecto commented May 2, 2022

Hypothesis - is this because I did not include the <idiom> & </idiom> tokens?

The few-shot approaches already included those tokens. It may be the case that, without them, the model does not get the gist of the task at hand.

Experiment

Let's amend the few-shot prompt to drop the <idiom> & </idiom> tokens, try it in the Playground right now, and see whether this change degrades the performance (see the sketch below).
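
Concretely, the ablation is just the same few-shot prompt with the tags stripped out before sending it to the Playground; a tiny sketch:

```python
# Ablation: the same hypothetical few-shot prompt, with the special
# tokens removed before it is pasted into the Playground.
def strip_idiom_tags(prompt: str) -> str:
    return prompt.replace("<idiom>", "").replace("</idiom>", "")

# e.g. prompt_without_tags = strip_idiom_tags(FEW_SHOT_PROMPT)
```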

Results

The performance is still pretty good even in the absence of the special tokens.
(screenshots of the Playground outputs omitted)

Analysis

Given this result, the tokens are probably not the problem. It is fine to either include or exclude them in the dataset.

What should I do next?

It might simply be that I don't have enough data. Let's pump the data up to 500 examples with the PIE dataset and see if the performance increases.
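
For that, the data needs to be in the JSONL format that the 2022-era fine-tuning endpoint expects (one prompt/completion object per line); a sketch, where the example pairs and the layout of the PIE data are assumptions:

```python
# A sketch of writing prompt/completion pairs in the JSONL format that the
# 2022-era OpenAI fine-tuning endpoint expects (one JSON object per line).
import json

def to_finetune_jsonl(pairs: list[tuple[str, str]], out_path: str) -> None:
    """pairs: (literal sentence, idiomatic paraphrase) tuples."""
    with open(out_path, "w") as fh:
        for literal, idiomatic in pairs:
            record = {
                # a fixed separator and stop token help mark the boundary
                # between prompt and completion
                "prompt": f"{literal}\n\n###\n\n",
                "completion": f" {idiomatic} END",
            }
            fh.write(json.dumps(record) + "\n")

# e.g. to_finetune_jsonl(pie_train_pairs, "pie_train.jsonl")
```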

@eubinecto
Copy link
Owner Author

eubinecto commented May 2, 2022

Hypothesis - is it because there are only 40 examples?

Experiment

Include the training-set portion of the PIE dataset in the fine-tuning data.

I've added 4,000+ more examples.
(screenshot of the enlarged dataset omitted)

Well, let's see how it goes.
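
For the record, launching the job with the 2022-era openai-python client looks roughly like this; the file name and base model below are placeholders:

```python
# A sketch of launching the fine-tune with openai-python 0.x (2022-era API).
import openai

# upload the prepared JSONL training file
train_file = openai.File.create(
    file=open("pie_train.jsonl", "rb"),
    purpose="fine-tune",
)

# create the fine-tuning job on top of a base model
job = openai.FineTune.create(
    training_file=train_file["id"],
    model="davinci",
)
print(job["id"], job["status"])
```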

Results

(screenshot of the estimated fine-tuning cost omitted)

Wait, what? Training that costs 20 dollars? That is a lot of money for the uncertainties I have. All I want is a simple experiment. Let's run the experiment with smaller models, then.
Let's stick to Curie for experimentation - the price seems reasonable.
(screenshot of Curie's pricing omitted)
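
As a sanity check on the price gap, a rough back-of-envelope estimate; the token counts and per-1K-token training prices below are assumptions from memory of the 2022 pricing page, not quoted figures:

```python
# Rough fine-tuning cost estimate. Every number here is an assumption:
# ~4,000 examples, ~50 tokens each, 4 training epochs (the 2022 default),
# and from-memory 2022 training prices per 1K tokens.
n_examples = 4_000
tokens_per_example = 50
epochs = 4
trained_tokens = n_examples * tokens_per_example * epochs  # ~800K tokens

price_per_1k_usd = {"davinci": 0.0300, "curie": 0.0030}  # assumed
for model, price in price_per_1k_usd.items():
    print(f"{model}: ~${trained_tokens / 1000 * price:.2f}")
# davinci lands in the ~$20 ballpark; curie is roughly 10x cheaper.
```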

@eubinecto
Owner Author

We should wait around 3 hours until this finishes. Let's take a break for now.
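
In the meantime the job can be polled from the client; a minimal sketch, where "ft-XXXXXXXX" is a placeholder for the id printed by the `FineTune.create` sketch above:

```python
# Poll a fine-tune job until it reaches a terminal state.
import time

import openai

job_id = "ft-XXXXXXXX"  # placeholder job id
while True:
    status = openai.FineTune.retrieve(id=job_id)["status"]
    print(status)
    if status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)  # check once a minute
```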

@eubinecto
Owner Author

The results?

Quite disappointing.
(screenshot of the fine-tuned Curie outputs omitted)

Not a significant change. Few-shot prompting would probably be better than this.
