Why does fine-tuning perform worse on the same data? #23

Open
Tracked by #4
eubinecto opened this issue May 2, 2022 · 4 comments

Comments


eubinecto commented May 2, 2022

Why?

As you can see below, my first attempt is not as good as I expected it to be.
(screenshots of the first attempt's outputs omitted)

But as we know, with few-shot prompt design the performance is generally great with only a few examples (~10).

Why is this?
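
For reference, here is a minimal sketch of the kind of few-shot call I mean (openai-python 0.x, the 2022-era API). The prompt format, examples, and engine below are hypothetical placeholders; only the `<idiom>` … `</idiom>` wrapping follows what is described in this issue.

```python
# A minimal sketch of the few-shot setup (openai-python 0.x, 2022-era API).
# The task framing and examples below are hypothetical placeholders.
import openai

FEW_SHOT_PROMPT = """\
Paraphrase the sentence so that it uses an idiom, and wrap the idiom in tags.

Sentence: He finally revealed the secret.
Idiomatic: He finally <idiom>let the cat out of the bag</idiom>.

Sentence: She decided to start over completely.
Idiomatic: She decided to go <idiom>back to square one</idiom>.

Sentence: {sentence}
Idiomatic:"""

def idiomify_few_shot(sentence: str) -> str:
    response = openai.Completion.create(
        engine="davinci",              # or whichever engine is used in the Playground
        prompt=FEW_SHOT_PROMPT.format(sentence=sentence),
        max_tokens=60,
        temperature=0.7,
        stop=["\n"],                   # stop at the end of the completion line
    )
    return response["choices"][0]["text"].strip()
```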

@eubinecto mentioned this issue May 2, 2022
@eubinecto changed the title from "fine-tune danvinci with the revised data" to "Why does fine-tuning perform worse on the same data?" May 2, 2022

eubinecto commented May 2, 2022

Hypothesis - is this because I did not include the <idiom> & </idiom> tokens?

The few-shot approaches already included those tokens. It may be the case that, without them, the model does not get the gist of the task at hand.

Experiment

Let's amend the few-shot prompt to drop the <idiom> & </idiom> tokens, try it in the Playground right now, and see whether this change degrades the performance (see the sketch below).
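
Concretely, the ablation is just the same few-shot prompt with the tags stripped out before sending it to the Playground; a tiny sketch:

```python
# Ablation: the same hypothetical few-shot prompt, with the special
# tokens removed before it is pasted into the Playground.
def strip_idiom_tags(prompt: str) -> str:
    return prompt.replace("<idiom>", "").replace("</idiom>", "")

# e.g. prompt_without_tags = strip_idiom_tags(FEW_SHOT_PROMPT)
```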

Results

The performance is still pretty good even in the absence of the special tokens.
(screenshots of the Playground outputs omitted)

Analysis

Given this result, the tokens are probably not the problem. It is fine to either include or exclude them in the dataset.

What should I do next?

It might simply be that I don't have enough data. Let's pump the data up to 500 examples with the PIE dataset and see if the performance increases.
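
For that, the data needs to be in the JSONL format that the 2022-era fine-tuning endpoint expects (one prompt/completion object per line); a sketch, where the example pairs and the layout of the PIE data are assumptions:

```python
# A sketch of writing prompt/completion pairs in the JSONL format that the
# 2022-era OpenAI fine-tuning endpoint expects (one JSON object per line).
import json

def to_finetune_jsonl(pairs: list[tuple[str, str]], out_path: str) -> None:
    """pairs: (literal sentence, idiomatic paraphrase) tuples."""
    with open(out_path, "w") as fh:
        for literal, idiomatic in pairs:
            record = {
                # a fixed separator and stop token help mark the boundary
                # between prompt and completion
                "prompt": f"{literal}\n\n###\n\n",
                "completion": f" {idiomatic} END",
            }
            fh.write(json.dumps(record) + "\n")

# e.g. to_finetune_jsonl(pie_train_pairs, "pie_train.jsonl")
```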

@eubinecto
Copy link
Owner Author

eubinecto commented May 2, 2022

Hypothesis - is it because there are only 40 examples?

Experiment

Include the training-set portion of the PIE dataset in the fine-tuning data.

I've added 4,000+ more examples.
(screenshot of the enlarged dataset omitted)

Well, let's see how it goes.
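
For the record, launching the job with the 2022-era openai-python client looks roughly like this; the file name and base model below are placeholders:

```python
# A sketch of launching the fine-tune with openai-python 0.x (2022-era API).
import openai

# upload the prepared JSONL training file
train_file = openai.File.create(
    file=open("pie_train.jsonl", "rb"),
    purpose="fine-tune",
)

# create the fine-tuning job on top of a base model
job = openai.FineTune.create(
    training_file=train_file["id"],
    model="davinci",
)
print(job["id"], job["status"])
```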

Results

(screenshot of the estimated fine-tuning cost omitted)

Wait, what? Training that costs 20 dollars? That is a lot of money for the uncertainties I have. All I want is a simple experiment. Let's run the experiment with smaller models, then.
Let's stick to Curie for experimentation - the price seems reasonable.
(screenshot of Curie's pricing omitted)
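
As a sanity check on the price gap, a rough back-of-envelope estimate; the token counts and per-1K-token training prices below are assumptions from memory of the 2022 pricing page, not quoted figures:

```python
# Rough fine-tuning cost estimate. Every number here is an assumption:
# ~4,000 examples, ~50 tokens each, 4 training epochs (the 2022 default),
# and from-memory 2022 training prices per 1K tokens.
n_examples = 4_000
tokens_per_example = 50
epochs = 4
trained_tokens = n_examples * tokens_per_example * epochs  # ~800K tokens

price_per_1k_usd = {"davinci": 0.0300, "curie": 0.0030}  # assumed
for model, price in price_per_1k_usd.items():
    print(f"{model}: ~${trained_tokens / 1000 * price:.2f}")
# davinci lands in the ~$20 ballpark; curie is roughly 10x cheaper.
```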

@eubinecto
Owner Author

We should wait around 3 hours until this finishes. Let's take a break for now.
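
In the meantime the job can be polled from the client; a minimal sketch, where "ft-XXXXXXXX" is a placeholder for the id printed by the `FineTune.create` sketch above:

```python
# Poll a fine-tune job until it reaches a terminal state.
import time

import openai

job_id = "ft-XXXXXXXX"  # placeholder job id
while True:
    status = openai.FineTune.retrieve(id=job_id)["status"]
    print(status)
    if status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)  # check once a minute
```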

@eubinecto
Owner Author

The results?

Quite disappointing.
(screenshot of the fine-tuned Curie outputs omitted)

Not a significant change. Few-shot prompting would probably be better than this.
