LlamaCppEmbeddings does not work #3626

Closed

lanluu opened this issue Dec 12, 2023 · 19 comments
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@lanluu

lanluu commented Dec 12, 2023

Currently, we are using llama.cpp via langchain as documented at https://js.langchain.com/docs/integrations/llms/llama_cpp.

(version: 0.0.204)

It looks like the embeddings via the LlamaCppEmbeddings class (located at langchain/src/embeddings/llama_cpp.ts) still do not work for us, as the resulting vector's length appears randomized, for instance:
[
1824, 349, 418,
274, 1402, 28709,
334, 8236, 28828,
28804
]

Note that different GGUF models, such as llama-2 or zephyr-β, produce the same result.

Any tips would help us a lot here.

Thank you in advance!


dosubot bot commented Dec 12, 2023

🤖

Hello @lanluu! 🖐️

I'm Dosu, a friendly bot here to help you out while we wait for a human maintainer. I specialize in solving bugs, answering questions, and even guiding you on your journey to becoming a contributor.

Remember, I'm here to assist you. Let's work together to sort this out! 😊

The vector length appears to be randomized because the LlamaCppEmbeddings class in the langchainjs framework generates embeddings based on the input text. The length of the resulting vector is dependent on the number of tokens in the input text, not on the specific model used (like llama-2 or zephyr-β).

In the embedDocuments and embedQuery methods, the input text is tokenized using the _context.encode(text) function. This function breaks down the input text into tokens, which are then converted into numerical representations. The number of tokens, and thus the length of the resulting vector, depends on the content and structure of the input text.

Here is the relevant code:

// Tokenize the input text via llama.cpp; the result is an array of token IDs,
// one entry per token in the text.
const encodings = await this.caller.call(
  () =>
    new Promise((resolve) => {
      resolve(this._context.encode(text));
    })
);

// Copy each token ID into the output array as a number.
for (let i = 0; i < encodings.length; i += 1) {
  const token: number = +encodings[i];
  tokens.push(token);
}

In this code, encodings.length determines the length of the resulting vector. This length is not fixed and will vary depending on the input text.

If you need more control over the length of the resulting vector, you might consider implementing a padding mechanism to ensure a consistent vector length, or truncating the vector to a fixed length. However, please note that both of these approaches may impact the quality of the embeddings.
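For illustration, a minimal sketch of that padding/truncation idea (toFixedLength is a hypothetical helper, not part of the LangChain API; both options discard or dilute information):

// Hypothetical helper: force every vector to a fixed length by either
// truncating the tail or padding with zeros.
function toFixedLength(vector: number[], targetLength: number): number[] {
  if (vector.length >= targetLength) {
    // Truncate: drop everything past the target length.
    return vector.slice(0, targetLength);
  }
  // Pad: fill the remainder with zeros.
  return [...vector, ...new Array(targetLength - vector.length).fill(0)];
}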

For more details, you can refer to the source code of the LlamaCppEmbeddings class in the langchainjs repository.

I hope this helps! If you have any more questions, feel free to ask.

Best regards,
Dosu


@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Dec 12, 2023
@jacoblee93
Collaborator

CC @nigel-daniels any ideas here?

@nigel-daniels
Contributor

@lanluu do you have some more context on what you are trying to do and how the failure manifests?

@AnirudhaGohokar

I too can confirm this issue. LlamaCppEmbeddings is generating random-length encodings, which in turn gives the error below when inserting into any DB:

Error processing the file: ClickHouseError: Constraint cons_vec_len for table default.vector_search (e161893e-d050-4cc8-b412-6f8051dc83df) is violated at row 2. Expression: (length(embedding) = 49). Column values: embedding = [512., 13715., 362., 7569., 3062., 259., 13., 30211., 29871., 29906., 29900., 29906., 29906., 10811., 1693., 616., 512., 1890., 4803., 9333., 363., 4831., 291., 29880., 6897., 259., 13., 29906., 259., 13., 3562., 310., 2866., 1237., 259., 13., 23869., 12267., 259., 13., 29909., 15837., 310., 1857., 319., 29902., 2693., 1860., 259., 13., 1576., 5434., 310., 319., 29902., 259., 13., 1168., 10085., 259., 13., 29896., 29899., 29941., 259., 13., 29941., 259., 13., 29946., 29899., 29953., 259., 13., 29947., 259., 13., 11403., 4251., 259., 13., 29947., 259., 13., 11096., 2708., 393., 674., 337., 7922., 319., 29902., 259., 13., 29896., 29899., 29941., 259., 13., 24445., 800., 310., 1857., 319., 29902., 26811., 335., 29885., 448., 21784., 29257., 259., 13., 29941.]

@nigel-daniels
Contributor

@AnirudhaGohokar the length of the token array is roughly proportional to the length of the input, so it will always be a variable-length array. The error you shared looks like it comes from a violated table constraint in ClickHouse. Can you share the code you are using? I don't see why the resulting embedding array would map to a database constraint.

@awijshoff

Hi, I can also confirm this issue. It happens with both Faiss and HNSWLib. The code to reproduce is basically the example from the docs:

import { FaissStore } from "langchain/vectorstores/faiss";
import { LlamaCppEmbeddings } from "langchain/embeddings/llama_cpp";

const llamaPath = "models/mistral-7b-v0.1.Q5_K_M.gguf"; 

const embedding = new LlamaCppEmbeddings({
  modelPath: llamaPath,
  embedding: true,
});

const vectorStore = await FaissStore.fromTexts(
  ["Hello world", "Bye bye", "hello nice world"],
  [{ id: 2 }, { id: 1 }, { id: 3 }],
  embedding
);

const resultOne = await vectorStore.similaritySearch("hello world", 1);

which gives:

For Faiss:

throw new Error(`Query vector must have the same length as the number of dimensions (2)`);
                  ^

Error: Query vector must have the same length as the number of dimensions (2)

or when changing to HNSWLib:

this.index.addPoint(vectors[i], docstoreSize + i);
                       ^

Error: Invalid the given array length (expected 2, but got 4).

@nigel-daniels
Contributor

OK, got it, so the error is about the dimension size, not the specific array length.

@awijshoff

awijshoff commented Dec 13, 2023

I think #3509 is related to this. When changing the order of the text (moving "Bye bye" to the first position) HNSWLib returns

this.index.addPoint(vectors[i], docstoreSize + i);
                       ^

Error: Invalid the given array length (expected 4, but got 2).

HNSWLib internally sets this.args.numDimensions to the length of the embeddings of the first element of the passed elements.

    async initIndex(vectors) {
        if (!this._index) {
            if (this.args.numDimensions === undefined) {
                this.args.numDimensions = vectors[0].length;
            }
            // ...
        }
    }

Therefore this never fails for the first element, but in this case it fails for the second, which has an array length of 2. I think the other vector stores work very similarly, for example Faiss:

const dv = vectors[0].length;
if (!this._index) {
    const { IndexFlatL2 } = await FaissStore.importFaiss();
    this._index = new IndexFlatL2(dv);
}
const d = this.index.getDimension();
if (dv !== d) {
    throw new Error(`Vectors must have the same length as the number of dimensions (${d})`);
}

Therefore I assume that the vectors/embeddings have to be of equal length for this to work? But that is just my wild guess.

This article supports my theory that every vector/embedding has to have the same length (MongoDB article).

If that is indeed the case, the issue seems to be in node-llama-cpp, because in the following code example (using node-llama-cpp directly), val1, val2, and val3 are arrays of unequal length, whereas they should all have the same length: the embedding dimension of the model.

import { LlamaModel, LlamaContext } from "node-llama-cpp";

const llamaPath = "models/mistral-7b-v0.1.Q5_K_M.gguf"; //english

const model = new LlamaModel({
  modelPath: llamaPath,
});

const context = new LlamaContext({ model });

const val1 = context.encode("hello nice world");
const val2 = context.encode("Hello world");
const val3 = context.encode("Bye bye");
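Logging the lengths makes the mismatch visible (the numbers in the comment are illustrative; the exact counts depend on the tokenizer):

// encode() returns token IDs, so each array's length tracks the text length
// rather than a fixed embedding dimension.
console.log(val1.length, val2.length, val3.length); // e.g. 4, 2, 3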

@nigel-daniels
Contributor

@awijshoff correct, I am just passing the results from node-llama-cpp through. I've started a discussion with node-llama-cpp's maintainer on where the fix is best applied.

@giladgd

giladgd commented Dec 16, 2023

I haven't added support for embedding in node-llama-cpp yet, but it is planned for the next version that I'm currently working on.

The current implementation of this in Langchain is wrong and won't work.

I've opened an issue for the embedding feature on node-llama-cpp so you can track its progress (withcatai/node-llama-cpp#123).
I'll also update this issue after I implement the embedding support so we can fix the code on Langchain.

@nigel-daniels
Contributor

@giladgd thanks for the update, I'll await the node-llama-cpp v3 release then update accordingly.

@iOnline247

@awijshoff Thanks for detailing this. The issue I created is exactly what you've described, and I am very excited about the update coming down the pipe for it. :)

@stewartoallen

This is not a bug. The context encode and decode functions are meant to turn text into the tokens actually sent to the LLM (it is not sent text), so their output will vary with the input length. If your examples above happened to match, it's because one or more words used more than one token; the mapping is not 1:1. encode is useful for determining whether your input text will exceed the model's context length. For embedding vectors, you will want a model designed for generating semantic search vectors; you can get these from Hugging Face. In JS you can do something like:

import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";
import { env } from "@xenova/transformers"; // optional: configure model caching, e.g. env.cacheDir

const modelName = "Xenova/all-MiniLM-L12-v2";
const model = new HuggingFaceTransformersEmbeddings({ modelName });
const prompt = "query or prompt to transform into a semantic vector";
// embedDocuments is async and returns one vector per input text
const [vector] = await model.embedDocuments([prompt]);

In the case of the Xenova model above, you will always get a vector of dimension 384. Then, for each of your document chunks being stored in a vector DB, you use the same embed call so that the vectors are comparable. The vector DB can take it from there.
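As a sketch of that end-to-end pattern (MemoryVectorStore is used here purely for illustration; any of the vector stores discussed above works the same way):

import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

// One embedding model for both documents and queries, so every vector has the
// same dimension (384 for Xenova/all-MiniLM-L12-v2).
const embeddings = new HuggingFaceTransformersEmbeddings({
  modelName: "Xenova/all-MiniLM-L12-v2",
});

const store = await MemoryVectorStore.fromTexts(
  ["Hello world", "Bye bye", "hello nice world"],
  [{ id: 2 }, { id: 1 }, { id: 3 }],
  embeddings
);

console.log(await store.similaritySearch("hello world", 1));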

@lanluu
Author

lanluu commented Jan 15, 2024

@giladgd Hi, we are very excited to be able to work with node-llama-cpp v3 👍. Thank you for your effort. Do you have a hint about the release date of v3 for us? :-)

@giladgd

giladgd commented Jan 18, 2024

@lanluu I predict it'll take about a month to finish the major features that contain breaking changes for the v3 beta, so the stable release should come a bit after that.

@giladgd

giladgd commented Jan 21, 2024

I've added support for embedding in the beta version of node-llama-cpp (3.0.0-beta.3); you can find a code example of how to use it here.

Note that since this is a beta version, the API may change before the stable release.
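For reference, a rough sketch of what using the beta embedding API looks like (based on the 3.0.0-beta documentation; since it is a beta, names such as getLlama, createEmbeddingContext, and getEmbeddingFor may change before the stable release):

import { getLlama } from "node-llama-cpp";

// Load the model, create a dedicated embedding context, and request an actual
// embedding vector (fixed dimension) rather than token IDs.
const llama = await getLlama();
const model = await llama.loadModel({ modelPath: "models/mistral-7b-v0.1.Q5_K_M.gguf" });
const embeddingContext = await model.createEmbeddingContext();

const embedding = await embeddingContext.getEmbeddingFor("Hello world");
console.log(embedding.vector.length); // the model's embedding dimension, the same for every input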

@nigel-daniels
Contributor

@giladgd that's great, thanks. Looking forward to the 3.0.0 release :)

@AllenPan03


I used node-llama-cpp (3.0.0-beta.3) with langchain (2.8.8), but still got TypeError: this._context.encode is not a function at file:///Users/panjunlin/work/github/chat-mind/node_modules/langchain/node_modules/@langchain/community/dist/embeddings/llama_cpp.js:48:39. Is the langchain version wrong?

@AllenPan03

I also tried node-llama-cpp (3.0.0-beta.14) earlier, but found a new error:

file:///Users/panjunlin/work/github/chat-mind/node_modules/node-llama-cpp/dist/evaluator/LlamaModel.js:22
constructor({ modelPath, gpuLayers, vocabOnly, useMmap, useMlock, onLoadProgress, loadSignal }, { _llama }) {
^

TypeError: Cannot destructure property '_llama' of 'undefined' as it is undefined.
    at LlamaModel (/Users/panjunlin/work/github/chat-mind/node_modules/node-llama-cpp/src/evaluator/LlamaModel.ts:63:9)

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Jun 18, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 25, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Jun 25, 2024