higher accuracies for experiment two with modified prompt #4

sradc · 2023-09-24T18:16:30Z

Hi there, super interesting work.

Looks like the accuracy of gpt-4 might be higher with this prompt.

lukasberglund · 2023-09-24T18:30:48Z

Hi @sradc, thanks for pointing this out and glad you enjoyed our paper. What prompt did you use to achieve this accuracy?

sradc · 2023-09-24T18:34:07Z

Hey, the notebook is included in this MR, (and the predictions themselves). Will include below for convenience:

PROMPT_TEMPLATE = """
This is a quiz related to celebrities, and their families.
Here are some example question and answers:

Q: A parent of X is Fahimeh Rahim Nia. Who is X?
Golshifteh Farahani

Q: A parent of X is Timothy Christopher Mara. Who is X?
Kate Mara

Q: A parent of X is Samira Calle. Who is X?
Sasha Calle

Q: A parent of X is Fiona Biggar. Who is X?
Daniel Portman

Now answer (response with just the name):
Q: A parent of X is {parent}. Who is X?
""".strip()

(Note that the example is removed from the prompt, if it's for the celebrity being tested.)

Also running this on gpt3.5 currently.

Edit: also, used this for the system prompt:

You are a helpful assistant, being quizzed on celebrities. If you are not sure, you **must** guess a name.

…fied)

sradc · 2023-09-24T20:17:19Z

gtp-3.5turbo seems to get around 45% accuracy with this prompt (included results in previous commit)

…ase, not sure)

sradc · 2023-09-25T11:01:35Z

Pushed updates. Best results so far:

gpt-4 - 0.561 accuracy (not in latest commit)
gpt-3.5-turbo-0613 - 0.484 accuracy

sradc · 2023-09-25T14:54:07Z

Best results are now in the latest commit:

gpt-4: 0.565 accuracy
gpt-3.5-turbo-0613: 0.516 accuracy

(Probably going to stop now because it's expensive.)

…the parent either) and _slightly_ better caching...

…d a _bad_ timeout implementation...

lukasberglund · 2023-09-26T11:17:33Z

Thanks for pointing these out! I'm not going to merge for now, since your change doesn't really integrate with the existing codebase, but it's cool to see that there are better prompts out there.

sradc · 2023-09-26T21:46:50Z

No prob, this PR was just to share and track the work. Let me know if you might want to integrate the prompt stuff.

higher accuracy with gpt-4 and this prompt (from ~0.3 to ~0.5)

5e9ae46

ran again, with gpt3.5, which gets 45% accuracy (prompt slightly modi…

6adc5ae

…fied)

sradc changed the title ~~higher accuracy with gpt-4 and this prompt (from ~0.3 to ~0.5)~~ higher accuracy with for experiment two with this prompt (from ~0.3 to ~0.5 gpt4) Sep 24, 2023

sradc changed the title ~~higher accuracy with for experiment two with this prompt (from ~0.3 to ~0.5 gpt4)~~ higher accuracy with for experiment two with this prompt (from ~0.3 to ~0.5) Sep 24, 2023

sradc added 3 commits September 25, 2023 09:44

0.56 accuracy with latest prompt

9c1f73c

gpt4 worse with lower case

e3a1687

gpt-3.5 accuracy .48 with latest prompt (may be higher without lowerc…

18be8b3

…ase, not sure)

highest accuracies so far, with this prompt

ded1d0f

sradc changed the title ~~higher accuracy with for experiment two with this prompt (from ~0.3 to ~0.5)~~ higher accuracies for experiment two with modified prompt Sep 25, 2023

sradc added 2 commits September 25, 2023 19:31

improve filtering of prompt (don't include name of celeb if they are …

3222fbd

…the parent either) and _slightly_ better caching...

57% gpt-4, 51 gpt-3.5-turbo. stricter check for leakage in prompt, an…

7b8f92b

…d a _bad_ timeout implementation...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

higher accuracies for experiment two with modified prompt #4

higher accuracies for experiment two with modified prompt #4

sradc commented Sep 24, 2023 •

edited

lukasberglund commented Sep 24, 2023 •

edited

sradc commented Sep 24, 2023 •

edited

sradc commented Sep 24, 2023 •

edited

sradc commented Sep 25, 2023

sradc commented Sep 25, 2023 •

edited

lukasberglund commented Sep 26, 2023

sradc commented Sep 26, 2023

higher accuracies for experiment two with modified prompt #4

Are you sure you want to change the base?

higher accuracies for experiment two with modified prompt #4

Conversation

sradc commented Sep 24, 2023 • edited

lukasberglund commented Sep 24, 2023 • edited

sradc commented Sep 24, 2023 • edited

sradc commented Sep 24, 2023 • edited

sradc commented Sep 25, 2023

sradc commented Sep 25, 2023 • edited

lukasberglund commented Sep 26, 2023

sradc commented Sep 26, 2023

sradc commented Sep 24, 2023 •

edited

lukasberglund commented Sep 24, 2023 •

edited

sradc commented Sep 24, 2023 •

edited

sradc commented Sep 24, 2023 •

edited

sradc commented Sep 25, 2023 •

edited