Training configuration and hardware spec #8
Comments
Hi, thank you for your questions. All the models were trained for 1 epoch using full fine-tuning on […].

Thanks for the reply. For the 3B model on the same dataset, the results were close to those of the paper, with a score of 0.90. Oddly, the data distribution obtained from the supervised identification strategy (Figure 6) seemed correct for the 3B model but slightly off for the 7B model. For 7B-ParaRel, I obtained 40.4% certain data, which is slightly lower than the 42% reported in the figure. To estimate the confidence, the paper mentions a weighted average of the "{sure, unsure}" token probability and the token probability of the answer prediction. The […]. Can you guess what might be the cause?
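As a point of reference for the confidence estimate mentioned above, here is a minimal sketch of one way such a weighted average could be computed. The function name, the mixing weight `alpha`, and the assumption that the "sure" probability is renormalized over just the {sure, unsure} pair are my own illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def weighted_confidence(sure_unsure_logits, answer_logits, answer_token_ids, alpha=0.5):
    """Sketch: combine P("sure") over the {sure, unsure} pair with the mean
    probability of the predicted answer tokens. `alpha` is a hypothetical weight."""
    # Probability mass on the "sure" token, renormalized over {sure, unsure}.
    # Assumes index 0 of the pair corresponds to "sure".
    p_sure = F.softmax(sure_unsure_logits, dim=-1)[0]

    # Mean probability assigned to the tokens that make up the answer prediction.
    probs = F.softmax(answer_logits, dim=-1)                      # (seq_len, vocab_size)
    p_answer = probs.gather(-1, answer_token_ids.unsqueeze(-1)).squeeze(-1).mean()

    # Weighted average of the two confidence signals.
    return alpha * p_sure + (1.0 - alpha) * p_answer

# Toy usage with random logits (vocab size chosen arbitrarily for illustration):
sure_unsure = torch.tensor([2.0, 0.5])       # logits for ["sure", "unsure"]
ans_logits = torch.randn(4, 32000)           # 4 answer tokens
ans_ids = torch.randint(0, 32000, (4,))
print(weighted_confidence(sure_unsure, ans_logits, ans_ids))
```

Whether the actual weighting matches this (and what value of `alpha` the paper uses) would need to be confirmed by the authors.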
Hi!
Congratulations on the very interesting work, and thank you for releasing the code :)
I am running some experiments and would like to reproduce some results.
I had some questions regarding the training configurations.
1. From reading the instructions, I assume you did full fine-tuning. Could you confirm this?
2. When training the 7B model using LMFlow, I run into a CPU OOM on a server with 220 GB of RAM. I believe this is abnormal and may be a problem on my side, but if you recall how much CPU memory was required, could you tell me? (See also the loading sketch after this comment.)
3. Which LLaMA weights did you use? If you used the ones on Hugging Face, could you tell me the repo id?
Thanks.
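Regarding question 2, this is not from the thread, but one common cause of host-RAM blow-ups when fine-tuning 7B models is that each process materializes the full checkpoint in 32-bit precision on the CPU before moving it to the GPU. Below is a minimal sketch of a more frugal load with `transformers`; the repo id is a placeholder (not the weights the authors used), and whether this alone resolves the LMFlow OOM is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "path/or/hub-id-of-llama-7b"  # placeholder: point this at whichever LLaMA weights you have

# low_cpu_mem_usage avoids building a second full copy of the weights on the host,
# and loading in fp16 halves the footprint of the copy that is kept.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
```

If the OOM persists, the DeepSpeed/offload settings in the training config are the next thing to check, since CPU offload deliberately trades host RAM for GPU memory.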