Replies: 3 comments 3 replies
-
I am also interested. +1
-
I've been looking for a way to do this, and I couldn't for the life of me find a clear answer for a very long time. People mostly write LLM programs in Python where everything is hidden behind a handful of functions; nobody bothers to check what those functions actually do, and they just assume that if there is a function for "loading" embeddings, it does just that. Nope.

As far as I'm aware, there is no way of "putting" things inside the LLM except the prompt, training, and LoRAs (which are a form of training too). Embeddings are an encoded representation of text, either your prompt or the LLM's response. You can use an embedding like an "array key" to fetch some text from a vector DB that is similar to the text the embedding represents. You can't "load" embeddings, or the text fetched from the vector store, into the model.

Tools like LangChain are simply using embeddings to fetch text from a vector store and appending that text to your original prompt, as if you had typed it from your keyboard yourself. Aside from LoRA, an LLM has only one possible "input interface" and one possible "output interface", and it's just text, the same text you give it normally. LangChain and similar libraries just hide that part of the prompt when presenting the results to you, so it looks like it magically gave the LLM some information.

I might be wrong, so feel free to correct me, but I'm fairly certain that's how it works, at least currently.
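If it helps, here is roughly what that flow looks like with the library magic stripped away. This is only a sketch of the pattern, not any particular library's API: `embed()` and `generate()` are dummy placeholders for whatever embedder and LLM you actually use.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: swap in your real embedder (Embed4All, sentence-transformers, ...).
    # Here it returns a deterministic dummy vector so the sketch runs on its own.
    rng = np.random.default_rng(sum(map(ord, text)))
    return rng.standard_normal(8)

def generate(prompt: str) -> str:
    # Placeholder: swap in your real LLM call. Here we just echo the prompt.
    return "[LLM would now complete this prompt]\n" + prompt

# The "vector store" is just chunk texts and their embeddings kept side by side.
chunks = ["Paris is the capital of France.", "The Moon orbits the Earth."]
chunk_vecs = np.stack([embed(c) for c in chunks])

def answer(question: str, top_k: int = 1) -> str:
    q = embed(question)
    # Cosine similarity between the question and every stored chunk.
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    best = [chunks[i] for i in np.argsort(sims)[::-1][:top_k]]
    # The whole trick: retrieved text is pasted into the prompt as plain text.
    prompt = "Context:\n" + "\n".join(best) + f"\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

print(answer("What is the capital of France?"))
```

The only thing the model ever sees is the final `prompt` string, which is exactly the point made above.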
-
I don't know about others, but I am using a tiny embedding model from Embed4All (GPT4All), which is very fast; the in-house llama.cpp embedder is very slow. I am using it to let the model search the internet and come up with correct answers.

At first I tried LangChain's web retrieval and tools, but I couldn't figure out how to use them; it was very complicated for someone like me. So I ditched LangChain altogether and wrote a Python library that uses cosine-similarity search to get the chunks: first the web content is divided into chunks of some size, then each chunk is converted into an embedding with the Embed4All embedder, and then the model queries those embeddings with a cosine-similarity search to retrieve the relevant chunks. The only problem is that some websites block access, which I have found a way to work around.
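For anyone trying to reproduce this, a minimal sketch of that chunk-embed-retrieve step might look like the following. It assumes the `gpt4all` Python package's `Embed4All.embed()` method and naive fixed-size chunking; the actual library described above may well do things differently.

```python
# Sketch of chunk -> embed -> cosine-similarity retrieval, as described above.
# Assumes the gpt4all Python package, where Embed4All.embed returns a list of
# floats; check the API of the version you have installed.
import numpy as np
from gpt4all import Embed4All

embedder = Embed4All()

def chunk_text(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; a real splitter would respect sentence boundaries.
    return [text[i:i + size] for i in range(0, len(text), size)]

def top_chunks(query: str, page_text: str, k: int = 3) -> list[str]:
    chunks = chunk_text(page_text)
    vecs = np.array([embedder.embed(c) for c in chunks])
    q = np.array(embedder.embed(query))
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
    # The highest-similarity chunks are what gets pasted into the model's prompt.
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]
```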
-
I want to use embeddings with my model. How can I use the embeddings provided to generate the vector store and load it when inferencing? Any examples would be much appreciated. Thanks.