Hello, thanks for sharing this great work!
We tried to reproduce the codellama-13b-pal results on the GSM-Hard dataset with the following script:
The output is as follows:
Num samples: 1319
Num scores: 1319
Timeout samples: 10
Empty samples: 293
Mean score: [0.1]
Time use: 7540.60s
Time use: 125:40
The resulting mean score is 0.1, which is far below the reported one. Could you please share the baseline script for CodeLLaMA with the PAL strategy? Thanks a lot.
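For reference, here is a quick sketch of what the counts in our log imply. The variable and function names are ours for illustration and are not taken from the repository's evaluation code. It only computes the share of empty and timed-out samples, which might account for part of the gap.

```python
# Counts copied from the evaluation log reported in this issue.
num_samples = 1319
timeout_samples = 10
empty_samples = 293


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Hypothetical helper names; the arithmetic uses only the logged numbers.
empty_pct = share(empty_samples, num_samples)
timeout_pct = share(timeout_samples, num_samples)

print(f"empty: {empty_pct}%, timeout: {timeout_pct}%")
```

Roughly one in five samples produced no output at all, so we suspect our prompt or answer-extraction setup differs from the one used for the reported numbers.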