Model Request for BAAI/bge-m3 (XLMRoberta-based Multilingual Embedding Model) #6007
Comments
Also requesting support for this model.
Tried to support it using BertModel with an SPM tokenizer. Tested cosine similarity between "中国" and "中华人民共和国":
I got an error when using it with LangChain.
Same here with llama.cpp; the full error: libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found
The _bert version does not crash, but the embeddings do not seem to make any sense...
Also tried to follow the instructions at https://github.com/PrithivirajDamodaran/blitz-embed, but after converting to GGUF I get the error: llama_model_quantize: failed to quantize: key not found in model: bert.context_length
@vonjackustc can you share the params you used with llama.cpp?
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Feature Description
Support for a multilingual embedding model:
https://huggingface.co/BAAI/bge-m3
Motivation
There are some differences between multilingual embedding models and BERT.
Possible Implementation
Sorry, no idea. I tried; the model architecture seems to be the same as BERT, but the tokenizer is XLMRobertaTokenizer, not BertTokenizer.