Embedding model support #327
Comments
Yes, please! Any of these embedding models above would help. I've tried the Llama 2 and Mistral models. Anyway, in comparison, I've tried Xenova/gte-small with transformers and it is much faster and yields better results.
Hi, is there an update on this issue? I would love to contribute.
I've been playing with https://github.com/nlpodyssey/cybertron, which is pure Go (but I guess CPU only?) and at least supports embedding models. I did some testing with the STS-2016 dataset and got the below accuracies in comparison.
So I agree with the previous comment that the embeddings generated by the completion models are pretty bad!
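For context, benchmarks like STS typically score sentence pairs by cosine similarity between their embeddings and compare the ranking against human labels. A minimal sketch of the scoring step (the vectors here are stand-ins; in practice they come from whatever embedding model is under test):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in embeddings, not real model output.
emb_similar = ([0.9, 0.1, 0.0], [0.8, 0.2, 0.0])
emb_unrelated = ([0.9, 0.1, 0.0], [0.0, 0.1, 0.9])

print(cosine_similarity(*emb_similar))    # high, near 1.0
print(cosine_similarity(*emb_unrelated))  # low
```

A good embedding model should give semantically similar pairs a consistently higher score than unrelated pairs; completion-model embeddings often fail at exactly this.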
That is interesting. For GPU support, I guess we will need to use https://github.com/skeskinen/bert.cpp?
I found this issue: ggerganov/llama.cpp#2872
I also found https://github.com/ml-explore/mlx-examples/blob/main/bert/README.md, which we can use to run inference on an M1 Mac. Is it possible to support this?
Any update on this, or plans to allow BERT models?
Any update on this issue?
Do you have any updates so far? Very interested to contribute.
Any updates here?
Plans to support BERT models in llama.cpp stalled when the dev who had taken on the task ended up focusing on something else. In the last few days it looks like the project management artifacts were updated to acknowledge this, so maybe there will be some action soon. Actually, it looks like there has been some activity. Maybe there will be working code soon:
BERT support was merged 3 days ago into llama.cpp.
Looks like there are still kinks being worked out.
Link to check the progress: ggerganov/llama.cpp#5500
It is merged now.
@jmorganca just wanted to follow up and see if this topic is on your roadmap. Since llama.cpp added support for BERT models, this seems like great low-hanging fruit, no? Initial support for BERT models was merged with ggerganov/llama.cpp#5423 and released with b2127. Some kinks related to embedding pooling were fixed with ggerganov/llama.cpp#5500. Batch embedding is supported as well. There has been a new bug related to the tokenizer implementation, but that's it as far as I can tell.
@AndreBerzun it absolutely is – working on it!
Original issue: Add embedding models to use primarily with /api/embeddings, e.g.:
- instructor-xl
- bge-large
- all-MiniLM-L6-v2

See the full leaderboard.
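For illustration, here is a hedged sketch of what a client call to the endpoint discussed above might look like. The field names ("model", "prompt", and an "embedding" array in the response) follow the Ollama API as documented around this time; the model name is just an example:

```python
import json
import urllib.request

# Default local Ollama address (an assumption; adjust as needed).
OLLAMA_URL = "http://localhost:11434/api/embeddings"

def embeddings_request(model: str, prompt: str) -> bytes:
    """Build the JSON body for a POST to /api/embeddings."""
    return json.dumps({"model": model, "prompt": prompt}).encode()

body = embeddings_request("all-minilm", "The sky is blue")

# Actually sending it requires a running Ollama server, so it is left
# commented out here:
# req = urllib.request.Request(
#     OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
# embedding = json.load(urllib.request.urlopen(req))["embedding"]
```

The returned "embedding" is a flat list of floats whose length depends on the model (e.g. 384 for all-MiniLM-L6-v2).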