
Embedding model support #327

Closed
jmorganca opened this issue Aug 11, 2023 · 18 comments · Fixed by #2604
Assignees
Labels
feature request New feature or request model request Model requests

Comments

@jmorganca
Member

jmorganca commented Aug 11, 2023

Add embedding models to use primarily with /api/embeddings

  • instructor-xl
  • bge-large
  • all-MiniLM-L6-v2

See the full leaderboard
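For reference, the endpoint mentioned above takes a model name and a prompt and returns a single vector. A minimal client sketch in Python (the default model name is illustrative, and the request/response shape reflects the /api/embeddings API as discussed in this thread):

```python
import json
from urllib import request

def build_payload(model: str, prompt: str) -> bytes:
    """Build the JSON body for Ollama's /api/embeddings endpoint."""
    return json.dumps({"model": model, "prompt": prompt}).encode()

def get_embedding(prompt: str,
                  model: str = "all-minilm",           # illustrative model name
                  host: str = "http://localhost:11434") -> list:
    """POST the prompt and return the embedding vector from the response."""
    req = request.Request(f"{host}/api/embeddings",
                          data=build_payload(model, prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```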

@jmorganca jmorganca added model request Model requests feature request New feature or request labels Aug 11, 2023
@brunnolou

Yes, please! Any of these embedding models above text-embedding-ada-002 would be a great addition.

I've tried the Llama 2 and Mistral models with /api/embeddings as is, and I'm getting poor-quality similarity scores.
Even with almost identical queries, it fails to retrieve results. Are there prompting techniques to improve the embedding quality?

Anyway, for comparison, I've tried Xenova/gte-small with transformers and it is much faster and yields better results.
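The similarity comparison described above boils down to cosine similarity between two embedding vectors; a minimal sketch (the vectors here are toy stand-ins for real embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Near-identical queries should score close to 1.0 with a good embedder;
# unrelated ones should score noticeably lower.
```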

@jmorganca jmorganca changed the title from Embedding models to BERT model support Nov 14, 2023
@corani

corani commented Nov 17, 2023

jinaai/jina-embeddings-v2-base-en (and other variants) also look promising.

@sandangel

Hi, is there an update on this issue? I would love to contribute.

@corani

corani commented Dec 8, 2023

I've been playing with https://github.com/nlpodyssey/cybertron which is pure Go (but I guess CPU only?) and at least supports all-MiniLM-L6-v2, e5-*-v2, bge-*-en-v1.5 and ember-v1.

I did some testing with the STS-2016 dataset and got the accuracies below compared to llama2 and mistral:instruct (Pearson correlation with the gold answers):

  • Ollama
    • llama2: 0.23431
    • mistral:instruct: 0.5656
  • Cybertron
    • all-MiniLM-L6-v2: 0.80344
    • e5-small-v2: 0.82318
    • e5-base-v2: 0.83845
    • bge-small-en-v1.5: 0.84514
    • bge-base-en-v1.5: 0.85297

So I agree with the previous comment that the embeddings generated by the completion models are pretty bad!
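The Pearson correlations reported above can be reproduced with a few lines of pure Python; a minimal sketch, where xs would be the model's similarity scores and ys the gold answers:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```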

@sandangel

That is interesting. For GPU support, I guess we will need to use: https://github.com/skeskinen/bert.cpp ?
I think the implementation would be something similar to llama.cpp?

@sandangel

I found this issue: ggerganov/llama.cpp#2872
I think they plan to implement it in llama.cpp. So maybe we will just need to update llama.cpp when it's done?

@sandangel

I also found this https://github.com/ml-explore/mlx-examples/blob/main/bert/README.md, which we can use to run inference on an M1 Mac. Is it possible to support MLX with Ollama?

@CodeWithKyrian

Any update on this, or plans to allow BERT models?

@tjohnson4

Any update on this issue?

@ymohamed08

Do you have any updates so far? Very interested in contributing.

@ill-yes

ill-yes commented Feb 2, 2024

Any updates here?

@easp
Contributor

easp commented Feb 3, 2024

Plans to support BERT models in llama.cpp stalled out when the dev who had assumed the task ended up focusing on something else. In the last few days it looks like the project management artifacts were updated to acknowledge this, so maybe there will be some action soon. Actually, it looks like there has been some activity. Maybe there will be working code soon:
ggerganov/llama.cpp#2872

@AndreBerzun

BERT support was merged into llama.cpp 3 days ago.

@easp
Contributor

easp commented Feb 15, 2024

Looks like there are still kinks being worked out.

@s-kostyaev

Looks like there are still kinks being worked out.

Link to check the progress ggerganov/llama.cpp#5500

@s-kostyaev

Looks like there are still kinks being worked out.

Link to check the progress ggerganov/llama.cpp#5500

It is merged now

@AndreBerzun

@jmorganca just wanted to follow up and see if this topic is on your roadmap. Since llama.cpp added support for BERT models, this seems like a great low-hanging fruit, no?

Initial support for BERT models has been merged with ggerganov/llama.cpp#5423 and released with b2127. Some kinks related to embedding pooling were fixed with ggerganov/llama.cpp#5500. Batch embedding is supported as well.

There has been a new bug related to the tokenizer implementation, but that's it as far as I can tell.
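For context, the pooling fix referenced above concerns how per-token model outputs are collapsed into a single sentence vector; mean pooling, the common default for BERT-style embedders, can be sketched as:

```python
def mean_pool(token_embeddings):
    """Average per-token vectors into one sentence embedding.

    token_embeddings: list of equal-length float lists, one per token.
    """
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]
```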

@jmorganca jmorganca self-assigned this Feb 20, 2024
@jmorganca
Member Author

@AndreBerzun it absolutely is – working on it!

@jmorganca jmorganca changed the title from BERT model support to Embedding model support Feb 20, 2024