# Task Selection: An Example

This is an example of how to perform task selection using MTEB when creating a benchmark. The goal is to subsample a potentially large number of tasks down to the ones that carry the most information. We treat this as a feature selection problem, where a task is removed if its performance is predictable from the performance on the other tasks. See the paper for more information.

For this example we will use Danish (dan), as it has relatively few tasks, but there is no reason to limit the selection to Danish tasks only.

In [1]:
import mteb


## Loading in data
We will start by loading the relevant data for the models and tasks of interest.

In [2]:
def get_models():
    model_names = [
        "sentence-transformers/all-MiniLM-L6-v2",
        "sentence-transformers/all-MiniLM-L12-v2",
        "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        "sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
        "sentence-transformers/all-mpnet-base-v2",
        "sentence-transformers/LaBSE",
        "intfloat/multilingual-e5-large-instruct",
        "intfloat/e5-mistral-7b-instruct",
        "GritLM/GritLM-7B",
        "GritLM/GritLM-8x7B",
        "intfloat/multilingual-e5-small",
        "intfloat/multilingual-e5-base",
        "intfloat/multilingual-e5-large",
    ]
    models: list[mteb.ModelMeta] = [mteb.get_model_meta(name) for name in model_names]

    # get missing revisions - Assuming we are using the latest revision
    for model in models:
        if model.revision is None:
            print(f"Getting revision for {model.name}")
            encoder = model.load_model()
            model.revision = encoder.model_card_data.base_model_revision  # type: ignore

    return models

models = get_models()

danish_tasks = mteb.get_tasks(languages=["dan"]) 


Getting revision for sentence-transformers/all-MiniLM-L12-v2
Getting revision for sentence-transformers/all-mpnet-base-v2


In [3]:
# just to see what tasks we are working with
for task in danish_tasks:
    print(task.metadata.name)

BornholmBitextMining
BibleNLPBitextMining
FloresBitextMining
NTREXBitextMining
Tatoeba
AngryTweetsClassification
DanishPoliticalCommentsClassification
DKHateClassification
LccSentimentClassification
MassiveIntentClassification
MassiveScenarioClassification
NordicLangClassification
ScalaClassification
SIB200Classification
DanFeverRetrieval
TV2Nordretrieval
TwitterHjerneRetrieval
BelebeleRetrieval
WikipediaRetrievalMultilingual
MultiEURLEXMultilabelClassification
WikipediaRerankingMultilingual
SIB200ClusteringS2S
WikiClusteringP2P.v2


In [5]:
# load results from mteb/results repository
mteb_results = mteb.load_results(models=models) 

# note: this will produce a number of warnings, mainly because the function loads ALL results, including historic MTEB results

Already up to date.


KeyError: 'mteb_dataset_name'

### Filtering and preprocessing result files

As multiple result files contain performance metrics on splits other than the ones specified by the task, we 