Fix prompt format german rag community task #171

Merged: 2 commits, Apr 29, 2024

@jphme (Contributor) commented Apr 23, 2024

The prompts ended right after the answer options and were accidentally missing the trailing "Answer:" part (see e.g. here).
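A minimal sketch of the difference, for illustration only. The helper `build_prompt`, its parameters, and the "Question:" / "A)" layout are hypothetical and not taken from the task code; only the trailing "Answer:" cue comes from the description above.

```python
from string import ascii_uppercase


def build_prompt(question: str, options: list[str], with_answer_cue: bool = True) -> str:
    """Hypothetical prompt builder: question, lettered options, optional answer cue."""
    lines = [f"Question: {question}"]
    lines += [f"{letter}) {text}" for letter, text in zip(ascii_uppercase, options)]
    if with_answer_cue:
        # The fix: end the prompt with a cue so the model continues with an answer.
        lines.append("Answer:")
    return "\n".join(lines)


# Old behaviour: the prompt stops after the last option.
old_prompt = build_prompt("Which context fits?", ["first", "second"], with_answer_cue=False)
# New behaviour: the prompt ends with the answer cue.
new_prompt = build_prompt("Which context fits?", ["first", "second"])

print(new_prompt)
```

Without the cue, a model that is not wrapped in a chat template has no signal that an answer should come next, which is one plausible reason the scores shift so much.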

With this minor fix, the results differ significantly and are closer to what's expected (ignore acc_norm; the source for the "old" results is here), e.g.:

DiscoLM 7b German (original ChatML)

New

|                         Task                         |Version| Metric |Value |   |Stderr|
|------------------------------------------------------|------:|--------|-----:|---|-----:|
|all                                                   |       |acc     |0.8737|±  |0.0094|
|                                                      |       |acc_norm|0.8737|±  |0.0094|
|community:german_rag_eval:_average:0                  |       |acc     |0.8737|±  |0.0094|
|                                                      |       |acc_norm|0.8737|±  |0.0094|
|community:german_rag_eval:choose_context_by_question:0|      0|acc     |0.7290|±  |0.0141|
|                                                      |       |acc_norm|0.7290|±  |0.0141|
|community:german_rag_eval:choose_question_by_context:0|      0|acc     |0.8490|±  |0.0113|
|                                                      |       |acc_norm|0.8490|±  |0.0113|
|community:german_rag_eval:context_question_match:0    |      0|acc     |0.9770|±  |0.0047|
|                                                      |       |acc_norm|0.9770|±  |0.0047|
|community:german_rag_eval:question_answer_match:0     |      0|acc     |0.9400|±  |0.0075|
|                                                      |       |acc_norm|0.9400|±  |0.0075|

Old

|                         Task                         |Version|Metric|Value |   |Stderr|
|------------------------------------------------------|------:|------|-----:|---|-----:|
|all                                                   |       |acc   |0.7388|±  |0.0121|
|community:german_rag_eval:_average:0                  |       |acc   |0.7388|±  |0.0121|
|community:german_rag_eval:choose_context_by_question:0|      0|acc   |0.5940|±  |0.0155|
|community:german_rag_eval:choose_question_by_context:0|      0|acc   |0.9660|±  |0.0057|
|community:german_rag_eval:context_question_match:0    |      0|acc   |0.8430|±  |0.0115|
|community:german_rag_eval:question_answer_match:0     |      0|acc   |0.5520|±  |0.0157|

Llama 3 8b Instruct

New

|                         Task                         |Version| Metric |Value |   |Stderr|
|------------------------------------------------------|------:|--------|-----:|---|-----:|
|all                                                   |       |acc     |0.8633|±  |0.0089|
|                                                      |       |acc_norm|0.8633|±  |0.0089|
|community:german_rag_eval:_average:0                  |       |acc     |0.8633|±  |0.0089|
|                                                      |       |acc_norm|0.8633|±  |0.0089|
|community:german_rag_eval:choose_context_by_question:0|      0|acc     |0.6210|±  |0.0153|
|                                                      |       |acc_norm|0.6210|±  |0.0153|
|community:german_rag_eval:choose_question_by_context:0|      0|acc     |0.9910|±  |0.0030|
|                                                      |       |acc_norm|0.9910|±  |0.0030|
|community:german_rag_eval:context_question_match:0    |      0|acc     |0.9130|±  |0.0089|
|                                                      |       |acc_norm|0.9130|±  |0.0089|
|community:german_rag_eval:question_answer_match:0     |      0|acc     |0.9280|±  |0.0082|
|                                                      |       |acc_norm|0.9280|±  |0.0082|

Old

|                         Task                         |Version|Metric|Value |   |Stderr|
|------------------------------------------------------|------:|------|-----:|---|-----:|
|all                                                   |       |acc   |0.7443|±  |0.0103|
|community:german_rag_eval:_average:0                  |       |acc   |0.7443|±  |0.0103|
|community:german_rag_eval:choose_context_by_question:0|      0|acc   |0.3230|±  |0.0148|
|community:german_rag_eval:choose_question_by_context:0|      0|acc   |0.7510|±  |0.0137|
|community:german_rag_eval:context_question_match:0    |      0|acc   |0.9810|±  |0.0043|
|community:german_rag_eval:question_answer_match:0     |      0|acc   |0.9220|±  |0.0085|
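To make the old/new comparison easier to read, the per-subtask changes can be computed from the tables above. This is a small sketch: the `acc` values are copied from the tables, while `RESULTS` and `summarize` are names made up for illustration. It assumes the reported `_average` is the unweighted mean of the four subtasks, which is consistent with the numbers shown.

```python
# acc values copied from the tables in this PR as (old, new) pairs.
RESULTS = {
    "DiscoLM 7b German": {
        "choose_context_by_question": (0.5940, 0.7290),
        "choose_question_by_context": (0.9660, 0.8490),
        "context_question_match": (0.8430, 0.9770),
        "question_answer_match": (0.5520, 0.9400),
    },
    "Llama 3 8b Instruct": {
        "choose_context_by_question": (0.3230, 0.6210),
        "choose_question_by_context": (0.7510, 0.9910),
        "context_question_match": (0.9810, 0.9130),
        "question_answer_match": (0.9220, 0.9280),
    },
}


def summarize(scores):
    """Return per-subtask deltas plus the unweighted old and new means."""
    deltas = {task: new - old for task, (old, new) in scores.items()}
    old_mean = sum(old for old, _ in scores.values()) / len(scores)
    new_mean = sum(new for _, new in scores.values()) / len(scores)
    return deltas, old_mean, new_mean


for model, scores in RESULTS.items():
    deltas, old_mean, new_mean = summarize(scores)
    print(f"{model}: mean acc {old_mean:.4f} -> {new_mean:.4f}")
    for task, delta in deltas.items():
        print(f"  {task}: {delta:+.4f}")
```

Note that not every subtask improves: `choose_question_by_context` drops for DiscoLM and `context_question_match` drops for Llama 3, even though both averages rise.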

@PhilipMay (Contributor) commented Apr 24, 2024

If it is common practice to add this postfix to the evaluation prompts and the community wants it, I agree with the change. I will have to redo my evaluations, but that is ok.

PS: Should we change the version number of the tests / tasks? I see no option to change it, though: here it is hard-wired. Also see #172.

@clefourrier can you help please? @jphme what do you think?

@PhilipMay (Contributor)

I have thought about the subject again. I think the change is absolutely necessary, especially for models without a template.

@clefourrier can you please merge this as soon as possible and, if time allows, comment on #172? Thank you very much.

@clefourrier (Member) left a comment

Change looks good to me!

@clefourrier (Member)

I'll merge it if the tests pass properly! I'm answering the other question in the related issue.

@PhilipMay (Contributor)

Ok. Thanks.
Tests are green. :-)

@clefourrier merged commit af35e88 into huggingface:main on Apr 29, 2024 (2 checks passed).
@jphme deleted the fix-prompt-format-german-rag branch on May 2, 2024 09:17.