Add Ollama as an embeddings provider #4456
Conversation
- Initial prototype of Ollama embeddings actually working; error handling / retries still missing
- Allow model to be any String and require dimensions parameter
- Fixed rustfmt formatting issues: there were some formatting issues in the initial PR, and this should now make the changes comply with the Rust style guidelines
- Because I accidentally didn't follow the style guide for commits in my commit messages, I squashed them into one to comply
Force-pushed from b43505c to d3004d8
Hello, thank you for the PR! I'll spend some time reviewing it in the future. Like you said, there is some duplication with OpenAI code, so I'm thinking we might want to factor the implementations at some point. I was wondering how the code would behave if the input text contains too many tokens? The current behavior for the OpenAI embedder is the following:
Meanwhile, the Hugging Face embedder in Meilisearch is implemented such that any tokens after the maximum value are removed (since we're doing the work locally for this embedder, we're always embedding the documents). We really don't want to error and cancel an entire indexing operation because a document went above the token limits. Would there be a way to implement windowing like OpenAI, or at least cutting any tokens beyond the maximum like the Hugging Face embedder?
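The windowing idea described above can be sketched roughly as follows. This is a hypothetical helper operating on already-tokenized input, not Meilisearch's actual OpenAI embedder code; the `max_tokens` and `overlap` parameters are made-up names for illustration:

```rust
/// Split a token sequence into overlapping windows so that each window
/// fits under `max_tokens`; the embeddings of all windows could then be
/// averaged into a single document embedding. Hypothetical sketch, not
/// the real Meilisearch implementation.
fn windows(tokens: &[u32], max_tokens: usize, overlap: usize) -> Vec<&[u32]> {
    assert!(overlap < max_tokens);
    if tokens.len() <= max_tokens {
        return vec![tokens];
    }
    let stride = max_tokens - overlap;
    let mut out = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + max_tokens).min(tokens.len());
        out.push(&tokens[start..end]);
        if end == tokens.len() {
            break;
        }
        start += stride;
    }
    out
}

fn main() {
    let tokens: Vec<u32> = (0..10).collect();
    let w = windows(&tokens, 4, 1);
    // 10 tokens, window 4, overlap 1 (stride 3): [0..4], [3..7], [6..10]
    assert_eq!(w.len(), 3);
    assert_eq!(w[0], &[0, 1, 2, 3]);
    assert_eq!(w[2], &[6, 7, 8, 9]);
    println!("{} windows", w.len());
}
```

Truncating like the Hugging Face embedder would be the degenerate case of keeping only the first window.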
I started a notion page listing some of the planned improvements for v1.8. I added the |
Regarding the code structure:
Too many token issues:
Hello, thank you for that great answer, and also again for contributing to Meilisearch, this is really appreciated ❤️.

Regarding the max tokens issue
No problem that you didn't think of addressing it, that's what reviews are for :-). I'd like more details about your tests: is the behavior consistent when testing with multiple distinct models? Maybe we should open an issue on ollama, or at least look a bit at the code, to ascertain that the behavior is consistent?
Yes, exactly, this is what we're doing for models downloaded from the Hugging Face Hub. Here, it means the user would need a way to provide the correct tokenizer to us, which would be very inconvenient. An alternative would be an ollama endpoint that would just tokenize, but I don't think this exists.

Regarding code structure
You don't need to change the code structure of your PR. If we can ascertain the behavior in the max token case, then I think we'll accept your PR after I do a more detailed review, and land it without structural changes. This will be useful to people running Meilisearch on main, or perhaps even as a patch release (depending on products), and I'll take care of the refactoring by Meilisearch v1.8
After some experimentation with my local Ollama instance and asking about this on the Ollama Discord server, it looks like Ollama will never completely fail to embed an input, no matter how long it is, but I'm not entirely sure how it does that. It doesn't look like it's just truncating the inputs to fit the length, because adding a single word at the end of a very long prompt results in a slightly different output. But as far as I can tell, the output is still useful and correct. I first tried getting the embeddings of a really long document and then did it again with the document appended to itself. The two embeddings were not identical, but the cosine distance between them was about

One additional edge case I only just discovered is what happens if the text to be embedded is completely blank. In this case, Ollama will immediately return JSON containing just "null" instead of the embeddings array. Is this a situation that should be handled, or can it be ignored because it would mean the document or the template is not configured correctly anyway?
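The comparison described above relies on cosine distance between two embedding vectors. A minimal sketch of that computation (the vectors here are made-up toy values, not real Ollama outputs):

```rust
/// Cosine distance = 1 - cosine similarity. Values near 0 mean the two
/// embeddings point in almost the same direction, i.e. the texts are
/// considered semantically very close.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

fn main() {
    // Two nearly-parallel toy vectors: distance should be close to 0.
    let a = [1.0, 0.0, 1.0];
    let b = [1.0, 0.1, 1.0];
    let d = cosine_distance(&a, &b);
    assert!(d > 0.0 && d < 0.01);
    println!("distance = {d:.4}");
}
```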
Thanks for your findings. It is possible that ollama splits the text, produces multiple embeddings, and then averages them; doing that would yield the kind of results you're observing. About empty input: we could have you check for that, but since there are multiple embedders that might have variously erratic behaviours when given an empty input, I think it is probably best if I check for the condition prior to calling embed. In other words, you can assume that the input is never the empty string.
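The split-and-average hypothesis above (which is speculative, not confirmed from Ollama's source) amounts to mean-pooling the chunk embeddings. A sketch of that pooling step:

```rust
/// Mean-pool several chunk embeddings into one vector: one plausible
/// explanation for why appending text to an over-long input only nudges
/// the resulting embedding slightly. Speculative sketch, not Ollama's
/// confirmed behaviour.
fn mean_pool(chunks: &[Vec<f32>]) -> Vec<f32> {
    let dim = chunks[0].len();
    let mut out = vec![0.0f32; dim];
    for chunk in chunks {
        for (acc, v) in out.iter_mut().zip(chunk) {
            *acc += v;
        }
    }
    let n = chunks.len() as f32;
    out.iter_mut().for_each(|v| *v /= n);
    out
}

fn main() {
    let pooled = mean_pool(&[vec![1.0, 0.0], vec![0.0, 1.0]]);
    assert_eq!(pooled, vec![0.5, 0.5]);
    println!("{pooled:?}");
}
```

Under this model, embedding a document appended to itself averages two near-identical chunk sets, which matches the tiny cosine distance observed.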
Hello, I have been testing your PR. I got an ollama server running, and I have a few questions:

    "error": {
        "message": "internal: Error while generating embeddings: coding error: received unhandled HTTP status code 404 from Ollama.",
        "code": "internal",
        "type": "internal",
        "link": "https://docs.meilisearch.com/errors#internal"
    },

Could we handle the 404 return code so that this isn't a coding error but a user error, and it displays a message along the lines of "model xxx wasn't pulled on the ollama server"?
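The requested status-code handling could be sketched like this. `EmbedError` and `map_status` are stand-in names for illustration, not Meilisearch's real error types:

```rust
/// Map Ollama HTTP status codes to a user-facing error for a missing
/// model versus an internal error for anything unexpected. Hypothetical
/// sketch of the behaviour requested in the review.
#[derive(Debug)]
enum EmbedError {
    /// Fault assigned to the user: the model was never pulled.
    ModelNotFound(String),
    /// Any other unexpected status stays an internal/coding error.
    Internal(u16),
}

fn map_status(status: u16, model: &str) -> Result<(), EmbedError> {
    match status {
        200 => Ok(()),
        404 => Err(EmbedError::ModelNotFound(format!(
            "model `{model}` wasn't pulled on the ollama server"
        ))),
        other => Err(EmbedError::Internal(other)),
    }
}

fn main() {
    assert!(map_status(200, "nomic-embed-text").is_ok());
    match map_status(404, "nomic-embed-text") {
        Err(EmbedError::ModelNotFound(msg)) => println!("user error: {msg}"),
        _ => unreachable!(),
    }
}
```

Assigning the fault to the user keeps a missing model from surfacing as a misleading internal error link.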
- Instead of the user manually specifying the model dimensions, they are now determined automatically
- Just like with hf.rs, the word "test" gets embedded to determine the dimensions of the output
- Add a dedicated error type for when the model doesn't exist (don't automatically pull it, though) and set the fault of that error to be the user
The dimension inference now works the same way it does in hf.rs, by simply getting the embeddings for "test" when the embedder is created. I completely removed the option to manually set the dimensions; I'm not sure if there is a use case where specifying it manually would still be required.
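The probe-based inference described above can be sketched as follows. The `embed` closure stands in for a real call to Ollama's /api/embeddings endpoint, and `infer_dimensions` is a hypothetical name, not the actual function in the PR:

```rust
/// Infer the embedding dimensions at embedder-creation time by
/// embedding a short probe string and measuring the output length,
/// mirroring the hf.rs approach. Sketch only; the real code calls the
/// Ollama HTTP API instead of a closure.
fn infer_dimensions<F>(embed: F) -> usize
where
    F: Fn(&str) -> Vec<f32>,
{
    embed("test").len()
}

fn main() {
    // Fake embedder returning a fixed-size vector for illustration
    // (768 is a made-up dimension count).
    let fake = |_: &str| vec![0.0f32; 768];
    assert_eq!(infer_dimensions(fake), 768);
    println!("dimensions = {}", infer_dimensions(fake));
}
```

The trade-off is one extra embedding request at creation time in exchange for never letting a user-supplied dimension drift out of sync with the model.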
Fix Meilisearch capitalization
This all looks good to me.
Thank you for your contribution, and your patience in listening to my feedback 👍
bors merge
Build succeeded:
Pull Request
Related issue
Related Discord Thread
What does this PR do?
- Adds ollama as a new embedder source
- Adds the MEILI_OLLAMA_URL environment variable to set the embeddings URL of an Ollama instance, with a default value of http://localhost:11434/api/embeddings if no variable is set
- Changes some structs in openai.rs to be public so that they can be shared
- Uses nomic-embed-text as the default model, but any string value is allowed; however, it won't automatically check if the model actually exists or is an embedding model

Tested against Ollama version v0.1.27 and the nomic-embed-text model.

PR checklist
Please check if your PR fulfills the following requirements: