Hybrid search with experimental features using Custom Embedding API as an embedder #748
Hello, @miguelisidoro, 👋 Your setup does not seem 100% clear to me. To be clear, you would implement a custom REST server that would serve as a proxy to Azure OpenAI, is that correct?
Correct
I'm not sure I understand this statement. Under the assumption that Meilisearch calls your custom API, which in turn calls the Azure OpenAI API, why do you expect that to be faster than calling the Azure OpenAI API directly? It feels like I'm missing a piece here.
Correct, the Azure OpenAI embedding API will be called by your REST proxy, which will in turn be called by Meilisearch, if I understand your setup correctly.
Why?
Looking at a sample response from Azure OpenAI in their documentation:

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        0.018990106880664825,
        -0.0073809814639389515,
        // .... (1024 floats total for ada)
        0.021276434883475304
      ],
      "index": 0
    }
  ],
  "model": "text-similarity-babbage:001"
}
```

then you should have:

```json
{
  "inputType": "textArray",
  "pathToEmbeddings": ["data"],
  "embeddingObject": ["embedding"]
  // other parameters elided
}
```

for correctly parsing the response.
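To make the role of these two settings concrete, here is a rough Python sketch of how a client could use `pathToEmbeddings` and `embeddingObject` to pull the vectors out of such a response. The function name `extract_embeddings` is ours, purely illustrative; Meilisearch's actual implementation will differ:

```python
# Illustrative sketch: "pathToEmbeddings" walks down to the array that
# holds one entry per embedded input, and "embeddingObject" walks from
# each entry down to the float vector itself.
def extract_embeddings(response, path_to_embeddings, embedding_object):
    node = response
    for key in path_to_embeddings:
        node = node[key]
    vectors = []
    for entry in node:
        value = entry
        for key in embedding_object:
            value = value[key]
        vectors.append(value)
    return vectors

# Trimmed version of the sample response above.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.01, -0.007, 0.02], "index": 0}
    ],
    "model": "text-similarity-babbage:001",
}
print(extract_embeddings(sample, ["data"], ["embedding"]))
# [[0.01, -0.007, 0.02]]
```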
So, assuming that your REST proxy behaves exactly like Azure OpenAI, except that it would take its own API key, you would have:

```json
{
  "apiKey": "<API-key-of-your-custom-REST-embedder>",
  "inputField": ["input"],
  "inputType": "textArray",
  "query": {}
}
```

The request will look the same in both situations, except that the text to embed will be sourced from:
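Under that assumption, a minimal proxy could look like the following Python sketch. It assumes Meilisearch POSTs a JSON body shaped like `{"input": ["text 1", "text 2"]}` (matching `"inputField": ["input"]` and `"inputType": "textArray"`) and expects an OpenAI-style response back; `embed` is a placeholder where the real Azure OpenAI call would go:

```python
# Minimal sketch of the REST proxy, under the assumptions stated above.
# A real implementation would forward the texts to Azure OpenAI instead
# of returning the dummy vectors used here.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def embed(texts):
    # Placeholder: call the Azure OpenAI embeddings endpoint here and
    # return one float vector per input text.
    return [[0.0, 0.0, 0.0] for _ in texts]

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        texts = body["input"]  # matches "inputField": ["input"]
        payload = {
            "object": "list",
            "data": [
                {"object": "embedding", "embedding": vector, "index": i}
                for i, vector in enumerate(embed(texts))
            ],
        }
        out = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

# To actually serve requests:
# HTTPServer(("127.0.0.1", 8080), ProxyHandler).serve_forever()
```

The response shape mirrors the Azure OpenAI sample above, so the same `pathToEmbeddings`/`embeddingObject` settings apply unchanged.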
This is an unfortunate error in the current documentation.
I'm not sure about the current documentation, but there is a recent docs branch that adds an example, see here. This new branch also contains more explanation. So, to recap, the configuration of your embedder would look something like the following:

```json
{
  "source": "rest",
  "url": "url-to-your-REST-proxy",
  "apiKey": "api-key-to-your-REST-proxy",
  "documentTemplate": "something containing relevant {{doc.field}}s, truncated if necessary",
  "inputField": ["input"],
  "inputType": "textArray",
  "query": {},
  "pathToEmbeddings": ["data"],
  "embeddingObject": ["embedding"]
}
```
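For completeness, here is a hedged sketch of pushing an embedder configuration like the one above to Meilisearch. The `PATCH /indexes/<uid>/settings` route and the `embedders` settings key follow the experimental vector search docs, but verify them against your Meilisearch version; the host, index name, embedder name (`default`), and keys are placeholders:

```python
# Sketch: build the settings payload for a REST embedder and (commented
# out at the bottom) send it to a Meilisearch instance.
import json
import urllib.request

embedder_settings = {
    "embedders": {
        "default": {
            "source": "rest",
            "url": "url-to-your-REST-proxy",
            "apiKey": "api-key-to-your-REST-proxy",
            "documentTemplate": "something containing relevant {{doc.field}}s",
            "inputField": ["input"],
            "inputType": "textArray",
            "query": {},
            "pathToEmbeddings": ["data"],
            "embeddingObject": ["embedding"],
        }
    }
}

def build_request(host, index_uid, api_key):
    # PATCH the index settings with the embedders object.
    return urllib.request.Request(
        f"{host}/indexes/{index_uid}/settings",
        data=json.dumps(embedder_settings).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="PATCH",
    )

# urllib.request.urlopen(build_request("http://localhost:7700", "movies", "<key>"))
```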
Hello,
We are thinking of implementing a custom REST API that serves as an embedder, but unfortunately there is little documentation.
Can you confirm that with a custom REST API as an embedder:
One of our concerns is that we will need to rebuild our existing search index. If we use the user-provided vectors approach, then for each document we will need to call the Azure OpenAI embedding API ourselves to generate the embedding vector that is set in the `_vectors` field of the index.
If, on the other hand, we use a custom REST API that calls the Azure OpenAI embedding API as the embedder, we don't need to call the Azure OpenAI embedding API directly, and indexing times would be much faster.
Can you confirm what I said above?
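For context, the user-provided approach described above would look roughly like this sketch. `azure_embed` is a placeholder for a real Azure OpenAI call, and the exact shape of the `_vectors` field (plain array vs. keyed by embedder name) depends on the Meilisearch version, so check the docs for yours:

```python
# Sketch of the user-provided vectors approach: embed each document
# ourselves and store the result in "_vectors" before indexing.
def azure_embed(text):
    # Placeholder: a real implementation would call the Azure OpenAI
    # embeddings endpoint and return the resulting float vector.
    return [0.1, 0.2, 0.3]

def attach_vectors(documents, embedder_name="default"):
    # Assumes a Meilisearch version that keys "_vectors" by embedder
    # name; older experimental builds used a plain array instead.
    for doc in documents:
        doc["_vectors"] = {embedder_name: azure_embed(doc["description"])}
    return documents

docs = [{"id": 1, "description": "a blue T-shirt"}]
print(attach_vectors(docs))
```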
About the format of the response required by Meilisearch when using a custom REST API as an embedder: can you give us an example of a response we would have to return so that Meilisearch can process it correctly?
About the request itself: how does Meilisearch make a request to the custom REST API embedder? How do we know how the data is passed to the embedder? Is it a GET or a POST (I assume POST)? Can you give us an example of a request in the following situations?
Another thing: in https://www.meilisearch.com/docs/learn/experimental/vector_search#generate-auto-embeddings, under the REST option, the following is stated:
"model is a mandatory field indicating a compatible model.
documentTemplate is an optional field. Use it to customize the data you send to the embedder. It is highly recommended you configure a custom template for your documents."
What are we supposed to set in the model field?
documentTemplate, although optional, is recommended. Are there any examples of document templates in the documentation, or that you could supply?
Thanks