
added models to run #953

Merged (15 commits, Jun 20, 2024)

Conversation

@KennethEnevoldsen (Contributor) commented Jun 18, 2024

Added models to run. It still lacks retrieval, clustering, and information retrieval, and it is also missing some models.

To test that things run, I have added "check_run.sh" and have also run it locally.

Additionally, I have made some bugfixes: ensured that e5-instruct can take the device argument, and ensured that prompt_name is not called task_name.

@Muennighoff I think the script in this PR can be run as is (or alternatively, you might want to split the gritLM runs up into two separate scripts).

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

@Muennighoff (Contributor)

Looks great! However, I think there is an issue where the script does not provide the revision:

    mteb run \
    -m $model \
    -t MindSmallReranking SemRel24STS AJGT SummEval NusaTranslationBitextMining \
    --output_folder $results_folder \
    --co2_tracker true

which then fails due to this check:

f"Model revision {revision} not found for model {model_name}"

Maybe we should just default to the revision we have if none is provided by the user?

bash scripts/mmteb/running_model/check_run.sh
Running model on a sample set of tasks
Running model: sentence-transformers/all-MiniLM-L6-v2
INFO:mteb.cli:Running with parameters: Namespace(model='sentence-transformers/all-MiniLM-L6-v2', task_types=None, categories=None, tasks=['MindSmallReranking', 'SemRel24STS', 'AJGT', 'SummEval', 'NusaTranslationBitextMining'], languages=None, device=None, output_folder='results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, func=<function run at 0x7fdd72b85a20>)
Traceback (most recent call last): 
  File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mteb/mteb/cli.py", line 362, in main
    args.func(args)
  File "/data/niklas/mteb/mteb/cli.py", line 108, in run
    model = mteb.get_model(args.model, args.model_revision, device=args.device)
  File "/data/niklas/mteb/mteb/models/__init__.py", line 37, in get_model
    meta = get_model_meta(model_name, revision)
  File "/data/niklas/mteb/mteb/models/__init__.py", line 61, in get_model_meta
    raise ValueError(
ValueError: Model revision None not found for model sentence-transformers/all-MiniLM-L6-v2
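
A minimal sketch of that fallback, assuming a registry keyed by model name (MODEL_REGISTRY and the single-revision-per-entry layout below are illustrative assumptions, not the actual mteb internals):

    # Illustrative sketch only: fall back to the registered revision when the
    # caller does not pass one, instead of raising immediately.
    def get_model_meta(model_name: str, revision: str | None = None):
        meta = MODEL_REGISTRY.get(model_name)  # hypothetical lookup table
        if meta is None:
            raise ValueError(f"Model {model_name} not found")
        if revision is None or revision == meta.revision:
            # default to the revision we have on record
            return meta
        raise ValueError(
            f"Model revision {revision} not found for model {model_name}"
        )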

@KennethEnevoldsen (Contributor, Author)

Hmm, odd. I feel like I fixed that bug a while ago; there might have been a merge conflict somewhere. Anyway, I have pushed a fix for it to this branch, along with one dataset run just to make sure it works.

@Muennighoff (Contributor)

Thanks! I think some of the model loaders are missing the func attribute? I can look into it, but I guess you might already know what the problem is?

Traceback (most recent call last):
  File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mteb/mteb/cli.py", line 362, in main
    args.func(args)
  File "/data/niklas/mteb/mteb/cli.py", line 118, in run
    eval.run(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 301, in run
    self._save_model_metadata(meta, output_path)
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 457, in _save_model_metadata
    json.dump(model_meta.to_dict(), f)
  File "/data/niklas/mteb/mteb/model_meta.py", line 80, in to_dict
    dict_repr["loader"] = loader.func.__name__ if loader is not None else None
AttributeError: 'function' object has no attribute 'func'

@KennethEnevoldsen (Contributor, Author)

Fixed!

loader.func.__name__ should be used when the loader is a partial(loader, ...), while if the loader is just a plain function, we should use loader.__name__.
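
For reference, a minimal sketch of that branch (illustrative, not the exact diff in this PR):

    # Sketch: functools.partial exposes the wrapped callable via .func,
    # while a plain function carries its own __name__.
    from functools import partial

    def loader_name(loader):
        if loader is None:
            return None
        if isinstance(loader, partial):
            return loader.func.__name__
        return loader.__name__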

@Muennighoff (Contributor)

Running bash scripts/mmteb/running_model/run_baseline_models.sh right now, but the utilization is poor. Can we turn it into an sbatch script that submits an array, like this one: https://github.com/ContextualAI/gritlm/blob/main/scripts/eval_mteb.sh

[Screenshot: GPU utilization, 2024-06-19 1:36 PM]

@KennethEnevoldsen (Contributor, Author)

I sadly don't have much experience with slurm arrays, but I have added a Python script (create_slurm_jobs.py) for running the models as single jobs per task (you will probably need to change the slurm prefix, though).

Alternatively, building the array script should be possible with a simple Python script around the mteb.get_tasks call and the model_names list already defined; a rough sketch of that approach follows below.
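
A rough sketch, reusing the mteb run flags shown earlier (the SBATCH header, model list, and output paths are placeholders to adapt, not the contents of the actual create_slurm_jobs.py):

    # Sketch: write one sbatch script per (model, task) pair and submit it.
    import os
    import subprocess
    import mteb

    model_names = ["sentence-transformers/all-MiniLM-L6-v2"]  # placeholder list
    tasks = mteb.get_tasks(task_types=["STS"])  # narrowed selection for illustration

    SLURM_PREFIX = (
        "#!/bin/bash\n"
        "#SBATCH --job-name=mteb\n"
        "#SBATCH --gres=gpu:1\n"
        "#SBATCH --time=24:00:00\n"
    )  # adjust to your cluster

    os.makedirs("jobs", exist_ok=True)
    for model in model_names:
        for task in tasks:
            task_name = task.metadata.name
            job = SLURM_PREFIX + f"mteb run -m {model} -t {task_name} --output_folder results\n"
            path = f"jobs/{model.replace('/', '__')}__{task_name}.sh"
            with open(path, "w") as f:
                f.write(job)
            subprocess.run(["sbatch", path], check=True)

This submits one job per (model, task) pair rather than a true slurm array, but the same loop could just as well emit a single task list plus one #SBATCH --array script over it.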

@KennethEnevoldsen (Contributor, Author)

@Muennighoff, merging this as it contains some fixes for problems that seem to be causing issues elsewhere. We can continue the discussion either here or on Slack.

@KennethEnevoldsen enabled auto-merge (squash) June 20, 2024 08:57
@Muennighoff (Contributor)

Where should I push the results once done, @KennethEnevoldsen? Maybe to https://huggingface.co/datasets/mteb/results?
I think it is a bit confusing that we now have lots of results in this repo, too. I'm not sure we can put all results in this repo, as it would make cloning inconvenient due to the size, I think 🤔 https://huggingface.co/datasets/mteb/results is 34 MB, and I assume that what we run here could easily add another 60 MB+.

@KennethEnevoldsen (Contributor, Author)

@orionw what are your thoughts on this?

@orionw (Contributor) commented Jun 21, 2024

@Muennighoff raises a good point.

I only see three options: (1) we use this repo and allow it to grow as we get more results (~100 MB); (2) we keep our separate HF results dataset that people merge into; or (3) we create a separate GitHub repo and add it as an optional submodule to mteb.

This assumes we can't make those files noticeably smaller (I don't think they contain much redundant info) as a way to fit them all in this repo.

If we do (2), we can at least separate out the leaderboard calculation functionality, and I can put that on GitHub Actions in some GitHub repo. But users would still have to commit to the dataset.

Of these, I might prefer (3): we can then sync that repo to the HF dataset so it can be easily read with datasets, while people contribute via GitHub PRs and it is still available to check out in mteb.

@KennethEnevoldsen (Contributor, Author)

I think I am leaning towards (1) or (3). While it does lead to a notably larger GitHub repo, it does not influence the package size, and when working with the repo you are only uploading diffs anyway; it is only the first download that will take slightly longer. (1) is simpler, so I would probably recommend that. If we go for either (2) or (3), I think we should streamline the method for pushing results (both for our sake and others').

@Muennighoff (Contributor)

Pushed them here preliminarily: https://github.com/embeddings-benchmark/results
I'd agree with @orionw on (3): ideally we have only one results repo, but if HF datasets repos do not interface nicely with GitHub Actions and we need an HF datasets repo to make the results loadable, then I guess we need two.

Alpaca uses solution (1) (https://github.com/tatsu-lab/alpaca_eval/tree/main/results), but they have a few orders of magnitude less data. If you feel strongly about it, though, I guess that is fine too 🤔
