--------- ATTENTION --------- This repo is a work in progress!
The Model Score is a proposal for an open-source rating system that evaluates LLMs (large language models) against a variety of criteria. It aims to provide accurate quantitative measures for comparing different language models and assessing their capabilities. It also aims to help with the legal compliance of an AI model, based on regulatory frameworks such as the EU AI Act.
Contributions to improve the Model Score are welcome! Commit your ideas to the Model Score repository on GitHub.
Here, we are currently collecting ideas on which factors of an LLM to evaluate. Once this list is complete, each evaluation point needs to be examined in more detail.
Capability | Range | Type |
---|---|---|
Natural Language Understanding | 0 to 100 | Score (%) |
Code Generation | 0 to 100 | Score (%) |
...and more | 0 to 100 | Score (%) |

Category | Range | Type |
---|---|---|
Reasoning | 0 to 100 | Score (%) |
Logic | 0 to 100 | Score (%) |
Math | 0 to 100 | Score (%) |
...more | 0 to 100 | Score (%) |

Consideration | Example Score | Type |
---|---|---|
Bias | 9 | Score (0 to 10) |
Transparency | 8 | Score (0 to 10) |

Regulation | Compliance Score (%) |
---|---|
EU AI Act | 60 |

Aspect | Rating (1-10) |
---|---|
Documentation | 7 |
Compatibility | 6 |

Aspect | Rating (1-10) |
---|---|
Language Support | 8 |
Domain Adaptation | 7 |

Specification | Rating (1-10) |
---|---|
Model Size | 6 |
Compute Efficiency | 7 |

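The tables above mix 0-100 percentage scores and 1-10 ratings. One open question is how these could be combined into a single Model Score. The sketch below is an illustrative assumption only: the repository has not yet defined an aggregation method, and the `model_score` function, its equal-weight default, and the example values (taken from the tables above) are all hypothetical.

```python
# Hypothetical aggregation sketch for the Model Score.
# Assumption: each category is normalized to 0-100 and combined
# as a weighted average. This is NOT an official formula.

def normalize(value, scale_max):
    """Map a raw rating onto a 0-100 scale."""
    return value / scale_max * 100


def model_score(ratings, weights=None):
    """Weighted average of normalized category scores (0-100).

    ratings: dict of category -> (value, scale_max)
    weights: dict of category -> weight; defaults to equal weights.
    """
    if weights is None:
        weights = {name: 1.0 for name in ratings}
    total_weight = sum(weights[name] for name in ratings)
    weighted = sum(
        weights[name] * normalize(value, scale_max)
        for name, (value, scale_max) in ratings.items()
    )
    return weighted / total_weight


# Example with illustrative values from the tables above:
score = model_score({
    "Bias": (9, 10),            # -> 90
    "Transparency": (8, 10),    # -> 80
    "EU AI Act": (60, 100),     # -> 60
    "Documentation": (7, 10),   # -> 70
})
print(round(score, 1))  # -> 75.0
```

Per-category weights would let the final score reflect priorities such as regulatory compliance more heavily; how to choose them is one of the points still to be examined.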
The Model Score rating system is proposed by Localmind and open to contributions.