Poor Benchmark Results (Needs Addressed) #30
Comments
We're well aware of this (I was one of the core devs of …)
Okay, I made the issue title less alarming since you've chimed in. Open communication about the issue, and about what is being done to address it, would be appreciated by many. This thread/issue may be a good place to reach the more technical users/devs who are keeping tabs.
I dropped a line to the lm@stability address mentioned in the announcement to ask whether there was anything I was doing wrong with the benchmarks. I was curious why evals weren't included with the model card even as an alpha release (or at least a note that the low benchmark scores were a known issue), but I'll be following with interest. Since this is a foundational model, what's going on with dialog prompt formatting? I grepped through the tasks and `Question:` is used by the QA tasks, so that would impact piqa, but how about hellaswag (completions) or winogrande (its own format)?
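For context, scores like these typically come from EleutherAI's lm-evaluation-harness, where each task hard-codes its own prompt template: piqa is rendered as a `Question: ... Answer:` pair, hellaswag is scored as a raw text completion, and winogrande substitutes each option directly into the sentence. A minimal sketch of reproducing such numbers, assuming the v0.3-era `simple_evaluate` API and the `stabilityai/stablelm-base-alpha-7b` checkpoint on the Hugging Face Hub (the exact settings behind the spreadsheet may differ):

```python
# Sketch: run the benchmarks with EleutherAI's lm-evaluation-harness.
# Assumes the v0.3-era API; newer versions renamed the CLI and task configs.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",  # generic HF causal-LM wrapper, no chat formatting
    model_args="pretrained=stabilityai/stablelm-base-alpha-7b",
    tasks=["piqa", "hellaswag", "winogrande"],
    num_fewshot=0,      # zero-shot here; settings may differ elsewhere
    batch_size=8,
)
# Per-task metrics, e.g. results["results"]["piqa"]["acc"]
print(results["results"])
```

Because the base model runs through the generic `hf-causal` wrapper, no dialogue tokens are injected; the only formatting applied is whatever each task's template provides.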
Not gonna lie, I chatted with it and it's pretty bad. The longer context does work, though; I've never gone OOM on a 7B before.
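The OOM is plausible even at 7B once the 4096-token context is actually used, since naive attention materializes a seq_len × seq_len score matrix per head. A rough back-of-the-envelope sketch; the layer/width/head counts below are assumptions for illustration, not confirmed StableLM-Alpha-7B hyperparameters:

```python
# Rough fp16 inference memory estimate for a long context.
# NOTE: n_layers / d_model / n_heads are assumed values for illustration,
# not confirmed StableLM-Alpha-7B hyperparameters.
n_layers, d_model, n_heads = 16, 6144, 48
seq_len, bytes_fp16 = 4096, 2

# KV cache: a K and a V tensor per layer, each seq_len x d_model.
kv_cache = 2 * n_layers * seq_len * d_model * bytes_fp16

# Naive attention scores: n_heads x seq_len x seq_len, materialized
# one layer at a time during the forward pass.
scores_per_layer = n_heads * seq_len * seq_len * bytes_fp16

print(f"KV cache:           {kv_cache / 2**30:.2f} GiB")
print(f"Scores (per layer): {scores_per_layer / 2**30:.2f} GiB")
```

On top of roughly 13 GiB of fp16 weights for 7B parameters, the extra gigabytes for the KV cache and score matrices at full context are enough to tip a consumer card over the edge.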
Any updates on this?
@jon-tow Will using that prompt format help with the base model? Or perhaps you are talking about the tuned model?
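The distinction matters here: the base checkpoints are plain completion models with no expected prompt format, while the tuned checkpoints were trained on the special `<|SYSTEM|>`/`<|USER|>`/`<|ASSISTANT|>` tokens described in this repo's README. A minimal sketch (the system text is abbreviated, and the user query is a made-up example):

```python
# Prompt format for the *tuned* StableLM-Alpha checkpoints, following
# the repo README; the base models take untemplated plain text.
system_prompt = (
    "<|SYSTEM|># StableLM Tuned (Alpha version)\n"
    "- StableLM is a helpful and harmless open-source AI language model "
    "developed by StabilityAI.\n"  # README lists further rules, omitted here
)
user_query = "What is your context length?"  # hypothetical example input
prompt = f"{system_prompt}<|USER|>{user_query}<|ASSISTANT|>"
print(prompt)
```

Feeding this template to the base model would not be expected to help, since the base model never saw those special tokens during pretraining.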
As seen in this popular spreadsheet by @lhl, StableLM-Alpha-7B currently scores below 5-year-old, ~1 GB models with 700M parameters, and well below its architectural cousin GPT-J-6B, which was trained on only 300B tokens.
This is a serious issue that needs to be addressed.
Edit:
@abacaj on Twitter posted these 3B results: [benchmark screenshot]