Replies: 3 comments 3 replies
-
I am also interested. +1
-
I've been looking for a way to do this, and I couldn't for the life of me find a clear answer for a very long time. People mostly write LLM programs in Python where everything is hidden behind a handful of functions; nobody bothers to check what those functions actually do, and they just assume that if there is a function for "loading" embeddings, it does just that. Nope.

As far as I'm aware, there is no way of "putting" things inside the LLM except the prompt, training, and LoRAs (which are a form of training too). Embeddings are an encoded representation of text, either your prompt or the LLM's response. You can use an embedding like an "array key" to fetch some text from a vector DB that is similar to the text the embedding represents. You can't "load" embeddings, or the text fetched from the vector store, into the model.

Tools like LangChain are simply using embeddings to fetch text from a vector store and appending that text to your original prompt, as if you had typed it from your keyboard yourself. Aside from LoRA, an LLM has only one possible "input interface" and one possible "output interface", and it's just text, the same text you give it normally. LangChain and similar libraries just hide that part of the prompt when presenting the results to you, so it looks like it magically gave the LLM some information.

I might be wrong, so feel free to correct me, but I'm fairly certain that's how it works, at least currently.
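If it helps, here is roughly what that flow looks like with the library magic stripped away. This is only a sketch of the pattern, not any particular library's API: `embed()` and `generate()` are dummy placeholders for whatever embedder and LLM you actually use.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: swap in your real embedder (Embed4All, sentence-transformers, ...).
    # Here it returns a deterministic dummy vector so the sketch runs on its own.
    rng = np.random.default_rng(sum(map(ord, text)))
    return rng.standard_normal(8)

def generate(prompt: str) -> str:
    # Placeholder: swap in your real LLM call. Here we just echo the prompt.
    return "[LLM would now complete this prompt]\n" + prompt

# The "vector store" is just chunk texts and their embeddings kept side by side.
chunks = ["Paris is the capital of France.", "The Moon orbits the Earth."]
chunk_vecs = np.stack([embed(c) for c in chunks])

def answer(question: str, top_k: int = 1) -> str:
    q = embed(question)
    # Cosine similarity between the question and every stored chunk.
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    best = [chunks[i] for i in np.argsort(sims)[::-1][:top_k]]
    # The whole trick: retrieved text is pasted into the prompt as plain text.
    prompt = "Context:\n" + "\n".join(best) + f"\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

print(answer("What is the capital of France?"))
```

The only thing the model ever sees is the final `prompt` string, which is exactly the point made above.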
-
I don't know about others, but I am using a tiny embedding model from Embed4All (GPT4All), which is very fast; the in-house llama.cpp embedder is very slow. I am using it to let the model search the internet and come up with correct answers.

At first I tried LangChain's web retrieval and tools, but I couldn't figure out how to use them; it was very complicated for someone like me. So I ditched LangChain altogether and wrote a Python library that uses cosine-similarity search to get the chunks: first the web content is divided into chunks of some size, then each chunk is converted into an embedding with the Embed4All embedder, and then the model queries those embeddings with a cosine-similarity search to retrieve the relevant chunks. The only problem is that some websites block access, which I have found a way to work around.
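For anyone trying to reproduce this, a minimal sketch of that chunk-embed-retrieve step might look like the following. It assumes the `gpt4all` Python package's `Embed4All.embed()` method and naive fixed-size chunking; the actual library described above may well do things differently.

```python
# Sketch of chunk -> embed -> cosine-similarity retrieval, as described above.
# Assumes the gpt4all Python package, where Embed4All.embed returns a list of
# floats; check the API of the version you have installed.
import numpy as np
from gpt4all import Embed4All

embedder = Embed4All()

def chunk_text(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; a real splitter would respect sentence boundaries.
    return [text[i:i + size] for i in range(0, len(text), size)]

def top_chunks(query: str, page_text: str, k: int = 3) -> list[str]:
    chunks = chunk_text(page_text)
    vecs = np.array([embedder.embed(c) for c in chunks])
    q = np.array(embedder.embed(query))
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
    # The highest-similarity chunks are what gets pasted into the model's prompt.
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]
```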
-
I want to use embeddings with my model. How can I use the embeddings provided to generate the vector store and load it when inferencing? Any examples would be much appreciated. Thanks.