added models to run #953
Conversation
Looks great! However, I think there is an issue where the script does not provide the revision:

mteb run \
    -m $model \
    -t MindSmallReranking SemRel24STS AJGT SummEval NusaTranslationBitextMining \
    --output_folder $results_folder \
    --co2_tracker true

which then fails due to this check (Line 62 in 279c5e4). Maybe we should just default to the revision we have if none is provided by the user?

bash scripts/mmteb/running_model/check_run.sh
Running model on a sample set of tasks
Running model: sentence-transformers/all-MiniLM-L6-v2
INFO:mteb.cli:Running with parameters: Namespace(model='sentence-transformers/all-MiniLM-L6-v2', task_types=None, categories=None, tasks=['MindSmallReranking', 'SemRel24STS', 'AJGT', 'SummEval', 'NusaTranslationBitextMining'], languages=None, device=None, output_folder='results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, func=<function run at 0x7fdd72b85a20>)
Traceback (most recent call last):
File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
sys.exit(main())
File "/data/niklas/mteb/mteb/cli.py", line 362, in main
args.func(args)
File "/data/niklas/mteb/mteb/cli.py", line 108, in run
model = mteb.get_model(args.model, args.model_revision, device=args.device)
File "/data/niklas/mteb/mteb/models/__init__.py", line 37, in get_model
meta = get_model_meta(model_name, revision)
File "/data/niklas/mteb/mteb/models/__init__.py", line 61, in get_model_meta
raise ValueError(
ValueError: Model revision None not found for model sentence-transformers/all-MiniLM-L6-v2
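For reference, something along these lines is roughly what the suggested default-revision behaviour could look like; the registry, dataclass and revision string below are illustrative stand-ins rather than mteb's actual internals, and this is not necessarily the fix that was later pushed:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelMeta:  # simplified stand-in for mteb.model_meta.ModelMeta
    name: str
    revision: Optional[str]


# Hypothetical registry; mteb's real lookup lives in mteb/models/__init__.py.
MODEL_REGISTRY = {
    "sentence-transformers/all-MiniLM-L6-v2": ModelMeta(
        name="sentence-transformers/all-MiniLM-L6-v2",
        revision="placeholder-revision-sha",  # illustrative, not the real sha
    ),
}


def get_model_meta(model_name: str, revision: Optional[str]) -> ModelMeta:
    meta = MODEL_REGISTRY.get(model_name)
    if meta is None:
        raise ValueError(f"Model {model_name} not found")
    # Default to the registered revision when the user passes none,
    # instead of raising "Model revision None not found".
    if revision is None or revision == meta.revision:
        return meta
    raise ValueError(f"Model revision {revision} not found for model {model_name}")
```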
Hmm, odd. I feel like I fixed that bug a while ago. There might have been a merge conflict somewhere. Anyway, I have pushed a fix for it to this branch along with one dataset run just to make sure it works.
Thanks! I think some of the model loaders are lacking the func attr? I can look into it, but I guess you might already know what the problem is?

Traceback (most recent call last):
File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
sys.exit(main())
File "/data/niklas/mteb/mteb/cli.py", line 362, in main
args.func(args)
File "/data/niklas/mteb/mteb/cli.py", line 118, in run
eval.run(
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 301, in run
self._save_model_metadata(meta, output_path)
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 457, in _save_model_metadata
json.dump(model_meta.to_dict(), f)
File "/data/niklas/mteb/mteb/model_meta.py", line 80, in to_dict
dict_repr["loader"] = loader.func.__name__ if loader is not None else None
AttributeError: 'function' object has no attribute 'func'
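The traceback suggests loader is a functools.partial for some models (which has a .func attribute) and a plain function for others. A defensive variant of that line could look roughly like the sketch below; this is an illustration, not necessarily the fix that was committed:

```python
from functools import partial
from typing import Callable, Optional


def loader_name(loader: Optional[Callable]) -> Optional[str]:
    """Readable name whether loader is a functools.partial, a plain callable, or None."""
    if loader is None:
        return None
    if isinstance(loader, partial):
        return loader.func.__name__
    return getattr(loader, "__name__", repr(loader))


# Inside ModelMeta.to_dict() this would replace the failing line with:
#     dict_repr["loader"] = loader_name(self.loader)
```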
Fixed!
Running [screenshot]
I sadly don't have too much experience with slurm arrays, but I have added a Python script (alternatively, creating the array script should be possible from a simple Python script around the mteb.get_tasks and model_names objects defined).
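As a rough illustration of that idea (the model names, task selection and scheduler details below are placeholders, not the script that was added), a small Python wrapper could write one mteb run command per (model, task) pair for a slurm array to pick up:

```python
from itertools import product

import mteb

# Placeholder selections; the real script defines its own model_names and tasks.
model_names = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "intfloat/multilingual-e5-large",
]
tasks = mteb.get_tasks(tasks=["AJGT", "SummEval", "SemRel24STS"])

commands = [
    f"mteb run -m {model} -t {task.metadata.name} "
    "--output_folder results --co2_tracker true"
    for model, task in product(model_names, tasks)
]

with open("commands.txt", "w") as f:
    f.write("\n".join(commands) + "\n")

# A slurm array job (e.g. sbatch --array=0-N) can then select its line with
# $SLURM_ARRAY_TASK_ID, for example:
#   sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" commands.txt | bash
print(f"wrote {len(commands)} commands")
```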
@Muennighoff merging this as it contains some fixes for issues that seem to cause problems elsewhere. We can continue the discussion either here or on Slack.
Where should I push the results once done, @KennethEnevoldsen? Maybe to https://huggingface.co/datasets/mteb/results?
@orionw what are your thoughts on this?
@Muennighoff raises a good point. I only see three options: (1) we use this repo and allow it to grow larger as we get more results (~100 MB); (2) we keep our separate HF results dataset that people merge to; or (3) we create a separate GitHub repo and add it as an optional submodule to mteb. This assumes we can't make those files noticeably smaller (I don't think we had much redundant info) as a way to fit them all in this repo. If we do (2), we can at least separate out the leaderboard calculation functionality, and I can put that on GitHub Actions in some GitHub repo; but users would still have to commit to the dataset. Of these I might prefer (3), and then we can sync that repo to the HF dataset so it can be easily read from.
I think I am leaning towards (1) or (3). While it does lead to a notably larger GitHub repo, it does not influence the package size, and when working with the repo you are only uploading diffs anyway. It is only the first download that will take slightly longer. (1) is simpler, so I would probably recommend that. If we go for either (2) or (3), I think we should streamline the method for pushing results (both for our and others' sake).
Pushed them here preliminarily: https://github.com/embeddings-benchmark/results. Alpaca does solution (1) (https://github.com/tatsu-lab/alpaca_eval/tree/main/results), but they have a few orders of magnitude less data. But if you feel strongly about it, I guess it is fine too 🤔
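If the GitHub results repo ends up as the source of truth, syncing it to the HF dataset could be a small scheduled job around huggingface_hub. The sketch below assumes a local clone of the results repo and an available HF token; the repo ids are taken from the links above, but the job itself is hypothetical:

```python
from huggingface_hub import HfApi

api = HfApi()  # assumes a token is available, e.g. via the HF_TOKEN env variable

# Mirror a local clone of github.com/embeddings-benchmark/results to the
# mteb/results dataset so the leaderboard can read from it.
api.upload_folder(
    folder_path="results",  # path to the local clone of the GitHub repo
    repo_id="mteb/results",
    repo_type="dataset",
    commit_message="Sync results from GitHub",
)
```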
Added models to run. It still lacks retrieval, clustering and information retrieval. It also lacks some models.
To test that things run, I have added "check_run.sh" and have run it locally.
Additionally, I have made some bugfixes: ensured that e5-instruct can take the device argument and that prompt_name is not called task_name.
@Muennighoff I think the script in this PR can be run as is (or alternatively, you might want to split the gritLM up into two separate scripts).
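For reference, the same sanity check can also be expressed through the Python API instead of the shell script; this sketch mirrors the sample tasks from the log above rather than the actual contents of check_run.sh:

```python
import mteb

# Same sample tasks as in the check_run.sh log above.
tasks = mteb.get_tasks(
    tasks=[
        "MindSmallReranking",
        "SemRel24STS",
        "AJGT",
        "SummEval",
        "NusaTranslationBitextMining",
    ]
)
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
```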
Checklist
- make test
- make lint