Update results for Russian models #19
Conversation
@KennethEnevoldsen @Muennighoff Can you merge this?
Thanks for the ping! Looked at a few samples; probably worth discussing these before merging.
...e5-mistral-7b-instruct/07163b72af1488142a360786df853f237b1a3ca1/KinopoiskClassification.json
Quite a large change in accuracy here; why is that?
I was able to reproduce the results for 1.12.75, so I guess it's due to some changes between these versions (1.12.75 -> 1.14.12). And it's not only for me5-small and this task (Georeview) but for almost all models and Classification/Clustering tasks.
Maybe you have a hypothesis about what changes could affect the results?
Found that running MultiLabelClassification tasks first causes the problem. Code to reproduce:
```bash
# mteb==1.14.12
MODEL_NAME="cointegrated/rubert-tiny2"
mteb run \
  -m $MODEL_NAME -l rus --output_folder results \
  --co2_tracker true --verbosity 2 --batch_size 16 \
  -t \
  "SensitiveTopicsClassification" \
  "GeoreviewClassification"
```
With this order, GeoreviewClassification gives 0.408935546875, while a single run gives 0.39638671875. Also checked Retrieval and Reranking tasks and they are OK.
So running the MultiLabelClassification first changes the result? It seems to be an issue with a seed being manipulated.
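A minimal, self-contained sketch of that hypothesis (illustrative only; this is not mteb code, and every name in it is made up): if any step of the evaluation draws from the global NumPy RNG, a task's score depends on whatever consumed that RNG earlier in the same process.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)   # fixed dataset, independent of the global RNG
X = rng.rand(500, 16)
y = (X[:, 0] > 0.5).astype(int)

def evaluate() -> float:
    # Subsample the training set via the *global* RNG -- the problematic part:
    # this draw depends on whatever ran earlier in the same process.
    idx = np.random.choice(400, size=64, replace=False)
    clf = LogisticRegression(max_iter=200).fit(X[idx], y[idx])
    return clf.score(X[400:], y[400:])

np.random.seed(42)
score_alone = evaluate()          # "single run"

np.random.seed(42)
_ = np.random.rand(10)            # a preceding task consumes global RNG state
score_after_other = evaluate()    # same task, same seed, different score

print(score_alone, score_after_other)
```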
Yes, I will run these tasks separately and update the results then.
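For reference, a sketch of that workaround, reusing the model name and flags from the snippet above: launch one fresh process per task, so no global RNG state is shared between tasks.

```python
# Run each task in its own mteb CLI invocation (one process per task).
import subprocess

MODEL_NAME = "cointegrated/rubert-tiny2"
TASKS = ["SensitiveTopicsClassification", "GeoreviewClassification"]

for task in TASKS:
    subprocess.run(
        ["mteb", "run",
         "-m", MODEL_NAME, "-l", "rus",
         "--output_folder", "results",
         "--co2_tracker", "true", "--verbosity", "2", "--batch_size", "16",
         "-t", task],
        check=True,  # stop if any task fails
    )
```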
I will look into this in a PR as well - see if we can get it fixed.
More generally, we should probably try to fix this issue here.
PR here: embeddings-benchmark/mteb#1193
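For anyone following along, the usual shape of such a fix (a generic pattern, not necessarily what that PR implements) is to reset every relevant RNG at the start of each task:

```python
import random

import numpy as np

def set_all_seeds(seed: int = 42) -> None:
    """Reset the global RNGs so each task starts from the same state."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch is optional here; random/numpy seeding still applies

# Call set_all_seeds() immediately before evaluating each task.
```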
My results for bge-m3 for comparison: new_results.tar.gz
@KennethEnevoldsen I've updated the affected tasks. It seems Clustering also has to be run separately. Also found that clustering results for sbert/rubert models in v1.12.25 are better, and I reproduced this difference.
Right, so order also matters here. Well, we knew it was a problem given this, but we should definitely get that patched up (though it might be a major version bump).
We've run several experiments (@Samoed) with the bge-m3 model and could not achieve the same results. As mentioned in embeddings-benchmark/mteb#942, this is not a big problem. For now we would like to go with these results and will update them in new versions. Waiting for #25 to be merged.
That works for me as well.
@KennethEnevoldsen I think this PR can be merged. After that, I will update the paths in my branch (or whatever else we decide) |
Merged a3326d8 into embeddings-benchmark:main
This PR updates results for several models to refresh the Russian leaderboard (embeddings-benchmark/leaderboard#26).
Models updated:
Most of the update brings minor changes, but all results now use one version (1.14.12) and kg_co2_emissions is computed (it wasn't before). Results for multilingual MassiveIntent, MassiveScenario and STS22 are not changed. Instruct models now use detailed instructions from embeddings-benchmark/mteb#1163.