Model Request for BAAI/bge-m3 (XLMRoberta-based Multilingual Embedding Model) #6007

mofanke · 2024-03-12T06:25:08Z

Prerequisites

Please answer the following questions for yourself before submitting an issue.

I am running the latest code. Development is very rapid so there are no tagged versions as of now.
I carefully followed the README.md.
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

Support for a multilingual embedding model:
https://huggingface.co/BAAI/bge-m3

Motivation

There are some differences between multilingual embeddings and BERT

Possible Implementation

Sorry, no idea. I tried it, and the model architecture seems to be the same as BERT, but the tokenizer is XLMRobertaTokenizer, not BertTokenizer.
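One way to see the tokenizer difference described above is to look at the `tokenizer_class` field in a model's `tokenizer_config.json`. The snippet below is only a sketch: the helper name `tokenizer_family`, the lookup table, and the sample JSON string are illustrative assumptions, not part of llama.cpp or of the model repository.

```python
import json

# Tokenizer classes mentioned in this issue and the subword scheme each uses.
TOKENIZER_FAMILY = {
    "BertTokenizer": "WordPiece",
    "XLMRobertaTokenizer": "SentencePiece",
}

def tokenizer_family(config_text):
    """Return the subword scheme implied by a tokenizer_config.json string."""
    config = json.loads(config_text)
    name = config.get("tokenizer_class", "")
    # Fast variants carry a "Fast" suffix, e.g. "BertTokenizerFast".
    if name.endswith("Fast"):
        name = name[: -len("Fast")]
    return TOKENIZER_FAMILY.get(name, "unknown")

# Hypothetical, minimal config used only for illustration.
sample = '{"tokenizer_class": "XLMRobertaTokenizer"}'
print(tokenizer_family(sample))
```

A conversion path that assumes a WordPiece vocabulary would therefore need a different vocabulary loader for a SentencePiece-based model, even when the transformer layers themselves match BERT.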

RoggeOhta · 2024-04-23T01:56:02Z

I would also like this model to be supported.

vonjackustc · 2024-05-04T03:25:40Z

I tried to support it using BertModel and the SPM tokenizer:
https://huggingface.co/vonjack/bge-m3-gguf

Tested cosine similarity between "中国" and "中华人民共和国":
bge-m3-f16: 0.9993230772798457
mxbai-embed-large-v1-f16: 0.7287733321223814
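For anyone who wants to repeat this kind of check, here is a minimal sketch of the cosine similarity computation. How the embeddings above were produced is not stated in this thread, so the vectors below are toy placeholders, and the function name is an assumption for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (hypothetical values).
emb_a = [0.1, 0.3, 0.5]
emb_b = [0.2, 0.6, 1.0]   # same direction as emb_a
emb_c = [0.5, -0.3, 0.1]  # different direction

print(cosine_similarity(emb_a, emb_b))
print(cosine_similarity(emb_a, emb_c))
```

A score close to 1 on a single related pair does not confirm on its own that the embeddings are correct. Comparing against an unrelated pair as well is a useful sanity check.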

vuminhquang · 2024-05-12T12:21:30Z

I got an error when using it with LangChain:
"terminate called after throwing an instance of 'std::out_of_range'"

ciekawy · 2024-05-21T14:45:17Z

Same here with llama.cpp. The full error:

libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found

ciekawy · 2024-05-21T14:54:58Z

The _bert version does not crash, but the embeddings do not seem to make any sense...

ciekawy · 2024-05-21T15:54:33Z

I also tried to follow the instructions at https://github.com/PrithivirajDamodaran/blitz-embed, but after converting to GGUF I get this error:

llama_model_quantize: failed to quantize: key not found in model: bert.context_length
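The message suggests that a metadata key named `bert.context_length` was looked up in the converted file and not found. Below is a minimal sketch of that kind of check. It assumes the list of metadata keys has already been read from the file by some means; the helper name `missing_keys` and the sample key list are made up for illustration.

```python
def missing_keys(metadata_keys, arch, suffixes=("context_length",)):
    """Return the '<arch>.<suffix>' keys that are absent from metadata_keys."""
    present = set(metadata_keys)
    expected = [f"{arch}.{suffix}" for suffix in suffixes]
    return [key for key in expected if key not in present]

# Hypothetical key list; a real file would contain many more entries.
sample_keys = ["general.architecture", "general.name", "bert.block_count"]

print(missing_keys(sample_keys, "bert"))
```

If the key is reported missing, the conversion step most likely did not write it, or wrote it under a different architecture prefix.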

ciekawy · 2024-05-22T17:41:45Z

@vonjackustc, can you share the parameters you used with llama.cpp?

mofanke added the enhancement label on Mar 12, 2024

mofanke mentioned this issue on Mar 14, 2024: Add support for BERT embedding models #5423 (Merged)

github-actions bot added the stale label Apr 12, 2024

github-actions bot removed the stale label Apr 24, 2024

Mimicvat mentioned this issue on May 9, 2024: bge-m3 ollama/ollama#4276 (Open)
