
My results in open-domain QA are much lower using the given checkpoint for CEPE-LLaMA-2-7B. Could you provide some insights into the potential causes for this decline? #1

Closed
sunnynexus opened this issue Mar 5, 2024 · 3 comments

Comments

@sunnynexus
I'm curious about the discrepancies between my results (in red) and the results reported in your paper (in black), both obtained with the default parameters via the run_qa.sh script.

[image: table comparing the reproduced results with the paper's reported results]

Could there be any potential errors on my end that could explain these differences?

@howard-yen (Collaborator)

Hi, thanks for your interest in our work.
For CEPE at k = 10, we put all the passages in the decoder model only, so the results should match those of LLaMA-2. There may be a mistake in the config file, which I will look into.
Are you also using the QA files from the Google Drive?

@sunnynexus (Author)

> Hi, thanks for your interest in our work. For CEPE at k = 10, we only use and put all the passages in the decoder model, which should match the results for LLaMA-2. There might have been a mistake in the config file, which I will look into. Are you also using the QA files from the google drive?

Thank you for your reply. Yes, I used the QA files from the Google Drive.

@sunnynexus (Author)

I have tried running it multiple times, but the results are still no better than those of the base LLaMA-2-7B model.
