Bring back the EMBED feature in the Modelfile #834

Open
vividfog opened this issue Oct 18, 2023 · 17 comments

@vividfog

I appreciate the effort to keep the codebase simple; Ollama is second to none in its elegance. But removing the feature within a week was quick work, without much debate about whether and how people use it, whether it really isn't valuable, or whether on second thought it's a fantastic feature. I am going to miss this feature a lot and was highlighting it to others as an Ollama special treat. It was in daily use.

Related: #759 (feature removal), #501 (bug), #502 (documentation)

I'd like to bring some more viewpoints to this, as a heavy user who's tried everything I've gotten my hands on:

  1. User experience in comparison to alternatives was great. Ollama comes with an ecosystem of APIs and chatbots. With nothing else to install, Ollama was a one-liner RAG chatbot with multi-line support. Upstream clients needed zero configuration to get these benefits for free.
  2. The alternatives are not good without plenty of developer effort that regular people can't put in. Now users need to ramp up a client for this, and every one of them is poor in its user experience in its own way. No match for Ollama out of the box. UX doesn't happen in a vacuum; it's judged in comparison to others. Ollama + any chatbot GUI + a dropdown to select a RAG model was all that was needed, but that's no longer possible.
  3. The PrivateGPT example is not even close to a match. I tried it, and I've tried them all, and have built my own RAG routines at some scale for others. All else being equal, Ollama was actually the best no-bells-and-whistles RAG routine out there, ready to run in minutes with zero extra things to install and very few to learn. "Don't make me install new things" is an important UX perspective here.
  4. Creating embeddings was a bit of extra work, but that's unavoidable if it's generic. Again, compared to the alternatives, all other methods need some work to create the embeddings too. Ollama's was easy, even if one can argue that "one line per embedding isn't elegant". Well, it is elegant in its simplicity; the rest is string manipulation. (A rough sketch of the removed syntax follows this list.)
  5. It was instantly fast at runtime. Embeddings took a while to create, but at runtime there is no delay; it's just as instant as without embeddings.
  6. Turns out LLMs create totally usable embeddings. Even if Llama2 or Mistral aren't embedding models on paper, they worked great in practice. I was using it daily with esoteric documents and it was fine. This was an issue in theory only.
  7. Instead of outright deletion, it really needed just some cleanup, and not immediately: finding the root cause of why longer ingestions didn't work as a single batch, and writing better documentation. That's it. Then it would have been fine to park it for a long time. Even without changes it was usable, and there are always issues in a sufficiently large codebase.
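
For anyone who didn't see the feature before it was removed, this is roughly what the removed Modelfile syntax looked like (from memory; the file paths are illustrative):

```
FROM llama2
# One line per source document to embed (this worked up to v0.1.3)
EMBED ./docs/notes.txt
EMBED ./docs/faq.txt
```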

I'll write this as a new issue so it can be tracked; maybe there's more feedback. Please consider bringing it back. I'm going to stay parked at the v0.1.3 tag until new killer features come along. Thanks a lot for the great work! Please ask for community opinion with a clear issue headline before deprecating powerful capabilities in a breaking change, and give it a few weeks if it's not urgent.

Other thoughts and viewpoints welcome.

@BruceMacD
Contributor

Thanks for the great feedback here. I'm going to make sure this gets seen by the rest of the maintainers as well.

@jmorganca
Member

Wanted to echo @BruceMacD's comment! Thank you for opening this discussion (and for the thoughtful and heartwarming writeup). This is definitely something Ollama should make easy. Let's see how this feature can be brought back as the primitives improve (embedding models, GPU acceleration, etc.).

@CyrilPeponnet

Especially with proper embedding model support coming "soon" (ggerganov/llama.cpp#2872), this would make the feature really useful.

@CyrilPeponnet

Or we could just use https://github.com/go-skynet/go-bert.cpp for the embedding part.

@jtoy

jtoy commented Nov 10, 2023

I would love to see this back as well :)

@snowyu

snowyu commented Nov 28, 2023

In fact, go-bert.cpp is just a wrapper around the incomplete bert.cpp.

Recommended: tokenizers-cpp is a better wrapper for HF's tokenizers.

@kjp-souza

kjp-souza commented Dec 8, 2023

@jmorganca, @BruceMacD, could you please explain what needs to be done to use this /embed API endpoint? I get this error now, and I could not find how to use the endpoint in the documentation:

2023/12/08 21:57:34 parser.go:59: WARNING: Unknown command: ​

Error: deprecated command: EMBED is no longer supported, use the /embed API endpoint instead

Is there a similar command that substitutes for EMBED?
Thanks!!

@sandangel

sandangel commented Dec 11, 2023

Hi, I found this: https://github.com/ml-explore/mlx-examples/blob/main/bert/README.md. I think this has native support for Apple Silicon. Is it possible to replace the current llama.cpp with MLX for Mac M1?

@jmorganca
Member

@sandangel thanks for the pointer. We are looking at ways to support BERT models and the MLX framework seems like a great fit for that.

@sampriti026

Hey, if I want to use the generate embeddings API with other embedding models from MTEB, is there any way I can do that? If yes, then how?

@BruceMacD
Contributor

@sampriti026 Ollama has an endpoint to generate embeddings:
https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-embeddings

It sounds like you may be looking for embedding-specific models, which Ollama doesn't support yet. Support for BERT embedding models is tracked in #327
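
For reference, here is a minimal sketch of calling that endpoint from Python. It assumes a local Ollama server on the default port and a model that has already been pulled; the model name is just an example:

```python
import requests

# Ask a locally running Ollama server for an embedding.
# "llama2" is only an example; use whatever model you have pulled.
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "llama2", "prompt": "Why is the sky blue?"},
)
resp.raise_for_status()
embedding = resp.json()["embedding"]  # a list of floats
print(len(embedding))
```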

@sampriti026

sampriti026 commented Dec 27, 2023

@BruceMacD Unrelated to Ollama, what is the alternative to Ollama for running the desired embedding models? Any experience? Also, I was wondering if I can take an embedding model of my choice, create it, and then run that model to generate embeddings.

@sandangel

If you're using Apple Silicon, a good alternative would be adding an API endpoint to https://github.com/ml-explore/mlx-examples/blob/main/bert/README.md. The endpoint can be similar to the OpenAI or Ollama endpoint, depending on the framework you're using (LangChain, LlamaIndex, Haystack, etc.).
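
Something like the following shape, roughly. This is only a sketch: the FastAPI layout is an assumption, and embed_with_mlx is a hypothetical placeholder you would wire up to the bert example yourself; it is not part of mlx-examples.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class EmbedRequest(BaseModel):
    model: str
    prompt: str

def embed_with_mlx(text: str) -> list[float]:
    # Hypothetical placeholder: load the BERT model from ml-explore/mlx-examples
    # here and return its embedding for `text`.
    raise NotImplementedError

@app.post("/api/embeddings")
def embeddings(req: EmbedRequest) -> dict:
    # Mirrors the request/response shape of Ollama's embeddings endpoint.
    return {"embedding": embed_with_mlx(req.prompt)}
```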

@BruceMacD BruceMacD changed the title Bring back the EMBED feature Bring back the EMBED feature in the Modelfile Dec 29, 2023
@espipj

espipj commented Dec 31, 2023

This would be super useful

@chigkim

chigkim commented Feb 10, 2024

Does Ollama support any embedding models yet? If so, which ones, and where can I get them?

@sublimator
Contributor

@chigkim
ICYMI:
https://ollama.com/library/nomic-embed-text
https://ollama.com/library/all-minilm
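
With one of those pulled (e.g. `ollama pull nomic-embed-text`), a tiny retrieval loop in the spirit of the original EMBED workflow might look like this. A rough sketch, assuming the server is running on the default port; the documents are illustrative:

```python
import requests

def embed(text, model="nomic-embed-text"):
    # Assumes `ollama pull nomic-embed-text` has been run and the server is local.
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
    )
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

docs = ["Ollama runs large language models locally.",
        "Bananas are rich in potassium."]
query_vec = embed("How do I run a model on my own machine?")
best = max(docs, key=lambda d: cosine(query_vec, embed(d)))
print(best)  # the most relevant document
```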

@vividfog
Author

Nice, this is an excellent feature done well. Thank you to all contributors.
