
HNSWLib TypeError: Cannot read properties of undefined (reading 'embedQuery') #2528

Closed
batjko opened this issue Sep 6, 2023 · 4 comments
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature



batjko commented Sep 6, 2023

In my CLI tool, I've been following the examples in the docs, using HNSWLib to persist my in-memory store to disk and reload it every time the CLI command is run (instead of loading and embedding the docs every time).

What works:

  1. Store gets created successfully.
  2. Documents get loaded, split, and added to it successfully.
  3. The vector store is persisted correctly (as far as I can see) to disk, using .save().
  4. On the next execution, the vector store gets loaded back into memory, seemingly successfully (no error yet).

But then, when I try to do anything with that store (e.g. run a similarity search or use it to execute a query in a chain call), it throws the error below.

I have checked that the query is being passed in properly, and the embeddings object is really so simple that I don't think it's possible to do anything wrong with it - at least new OpenAIEmbeddings() was passed in normally, as per the docs, but maybe I missed something?

Code:

export async function askLLM(question) {
  const modelName = 'gpt-4'
  const maxTokens = 4096
  const temperature = 0.33

  let vectorStore
  try {
    console.debug('Checking for existing vector store file...')
    vectorStore = await HNSWLib.load(storeFolder)
    console.debug(`Loaded from ${storeFolder}.`)
  } catch (error) {
    throw new Error('No vector store found. Please run "load" command first.')
  }

  const model = new OpenAI({ modelName, maxTokens, temperature })
  const chain = RetrievalQAChain.fromLLM(model, vectorStore.asRetriever(), true)

  const response = await chain.call({ // Here is where it dies!
    query: question,
  })

  return response.text
}

Output:

Checking for existing vector store file...
Loaded from hnswlib.

file:///home/pat/dev/myRepo/node_modules/langchain/dist/vectorstores/base.js:103
        const results = await this.similaritySearchVectorWithScore(await this.embeddings.embedQuery(query), k, filter);
                                                                                         ^

TypeError: Cannot read properties of undefined (reading 'embedQuery')
    at HNSWLib.similaritySearch (file:///home/pat/dev/code-grokker2/node_modules/langchain/dist/vectorstores/base.js:103:90)
    at VectorStoreRetriever._getRelevantDocuments (file:///home/pat/dev/code-grokker2/node_modules/langchain/dist/vectorstores/base.js:68:33)
    at VectorStoreRetriever.getRelevantDocuments (file:///home/pat/dev/code-grokker2/node_modules/langchain/dist/schema/retriever.js:66:40)
    at async RetrievalQAChain._call (file:///home/pat/dev/code-grokker2/node_modules/langchain/dist/chains/retrieval_qa.js:55:22)
    at async RetrievalQAChain.call (file:///home/pat/dev/code-grokker2/node_modules/langchain/dist/chains/base.js:101:28)
    at async askLLM (file:///home/pat/dev/code-grokker2/src/app.js:100:20)
    at async Command.<anonymous> (file:///home/pat/dev/code-grokker2/src/cli.js:30:5)

Note that it seems to complain, upon calling the chain, that there are no embeddings to use for embedding the query.
But I gave it the OpenAI embeddings, which did not complain when creating the store and embedding the documents in the first place, so I'm not sure why it has a problem here.
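
(For anyone hitting the same thing, the quickest check is probably to log the store's embeddings right after loading - that's the object base.js calls embedQuery() on in the trace above. A sketch, not the exact code from my repo:)

// sketch: inspect what the chain will call embedQuery() on
vectorStore = await HNSWLib.load(storeFolder)
console.debug(vectorStore.embeddings)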

Any advice would be appreciated.

Persisted store:
[screenshot]

Embedded Documents definitely present:
[screenshot]

For reference, this is how the store was saved down:

async function storeCodeInVectorDB(splitDocs) {
  // load into memory
  const embeddings = new OpenAIEmbeddings()
  const store = await HNSWLib.fromDocuments(splitDocs, embeddings)

  // save to file
  await store.save(storeFolder)
  console.debug(`Saved vector store to ${storeFolder}.`)
  return store
}
dosubot bot added the auto:bug label Sep 6, 2023

dosubot bot commented Sep 6, 2023

🤖

Hello,

From the error message you're seeing, it seems like the embedQuery method is being called on an undefined object. This method is part of the OpenAIEmbeddings object, which is used to convert a given text string into a numerical representation (embedding) using OpenAI's language model.

In your code, you're initializing OpenAIEmbeddings without any parameters. If this class requires any parameters for proper initialization, not providing them could lead to an undefined state, causing the error.

Also, it's possible that the fromDocuments or load method of HNSWLib is not properly initializing some internal properties, leading to the error. In your code, the HNSWLib class is initialized with embeddings and splitDocs parameters. If the embeddings parameter is not an instance of OpenAIEmbeddings or if it's not properly initialized, it could lead to the error.

Here are a few things you can try:

  1. Check if OpenAIEmbeddings requires any parameters for initialization and if so, provide them.
  2. Check if HNSWLib.fromDocuments and HNSWLib.load methods are properly initializing all the required properties.
  3. Check if embeddings is an instance of OpenAIEmbeddings and if it's properly initialized (see the sketch after this list).
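
For example, a minimal sketch of those checks (the import paths below assume the langchain package layout current at the time of this issue; splitDocs is the array from your storeCodeInVectorDB function):

import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { HNSWLib } from "langchain/vectorstores/hnswlib";

const embeddings = new OpenAIEmbeddings(); // picks up OPENAI_API_KEY from the environment by default

// Sanity checks: the object exists and exposes the method the vector store needs
console.debug(embeddings instanceof OpenAIEmbeddings); // should be true
console.debug(typeof embeddings.embedQuery); // should be "function"

// Pass the same embeddings instance wherever the store is created or loaded
const store = await HNSWLib.fromDocuments(splitDocs, embeddings);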

I hope this helps. If you're still facing issues, please provide more details about the implementation of OpenAIEmbeddings and HNSWLib, and I'll be happy to assist you further.

Best,
Dosu



batjko commented Sep 6, 2023

Bloody hell, @dosu-beta is right.

The typings on HNSWLib.load() show that I need to pass an embeddings object to it, and that's what I forgot.

So adding this immediately fixed it for me:

[screenshot]
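
In other words, roughly (a sketch, not the exact line from the screenshot):

const embeddings = new OpenAIEmbeddings()
vectorStore = await HNSWLib.load(storeFolder, embeddings)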

Thanks, @dosu-beta. 🙏

batjko closed this as completed Sep 6, 2023

dosubot bot commented Sep 6, 2023

🤖

Hello,

I'm glad to hear that you were able to resolve the issue. To provide some clarification on why the embeddings object is necessary when loading a vector store with HNSWLib.load(), it's because the embeddings object is used to convert raw text data into numerical vectors that can be processed by the HNSWLib vector store.

When you call HNSWLib.load(), it reads the HNSW index, the arguments, and the document store from a directory, then creates a new HNSWLib instance with these values. The embeddings object you pass to this function is used by the new HNSWLib instance for converting text data into numerical vectors. Here's the relevant code:

static async load(directory: string, embeddings: Embeddings) {
    const fs = await import("node:fs/promises");
    const path = await import("node:path");
    const args = JSON.parse(
      await fs.readFile(path.join(directory, "args.json"), "utf8")
    );
    const index = await HNSWLib.getHierarchicalNSW(args);
    const [docstoreFiles] = await Promise.all([
      fs
        .readFile(path.join(directory, "docstore.json"), "utf8")
        .then(JSON.parse),
      index.readIndex(path.join(directory, "hnswlib.index")),
    ]);
    args.docstore = new SynchronousInMemoryDocstore(new Map(docstoreFiles));

    args.index = index;

    return new HNSWLib(embeddings, args);
}

In this code, the embeddings object is passed to the new HNSWLib instance, which is then returned by the function. This ensures that the loaded vector store can embed new queries into the same vector space as the stored document vectors. You can find this code in the HNSWLib source file.
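
In practice, loading and querying the persisted store looks roughly like this (a sketch based on the code earlier in this issue; storeFolder is your existing constant, and the import paths assume the langchain package layout at the time of this issue):

import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { HNSWLib } from "langchain/vectorstores/hnswlib";

// The embeddings instance is not serialized with the index, so it must be supplied at load time.
const vectorStore = await HNSWLib.load(storeFolder, new OpenAIEmbeddings());
const docs = await vectorStore.similaritySearch("your question here", 4);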

The underlyingEmbeddings object in the CacheBackedEmbeddings class is an instance of the Embeddings class. It is used to generate embeddings for text documents that are not already present in the cache. If the embeddings for certain documents are not found in the cache, the underlyingEmbeddings object is used to generate these embeddings. The generated embeddings are then stored in the cache for future use. You can find this code in the CacheBackedEmbeddings source file.
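
As a rough illustration only (a sketch; the class and store names here reflect the cache-backed embeddings API as I understand it and may differ in your langchain version):

import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { InMemoryStore } from "langchain/storage/in_memory";

// underlyingEmbeddings generates vectors only for texts that are not already cached.
const underlyingEmbeddings = new OpenAIEmbeddings();
const cachedEmbeddings = CacheBackedEmbeddings.fromBytesStore(
  underlyingEmbeddings,
  new InMemoryStore(), // cache store; swap for a persistent store in real use
  { namespace: underlyingEmbeddings.modelName }
);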

I hope this clarifies your question. If you have any more questions, feel free to ask.

Best,
Dosu



batjko commented Sep 6, 2023

Useful explanation, thank you.
I guess I assumed the persisted store would have embeddings information in it, but didn't really think about it.

Whoever created this bot, make money with it! It's awesome!
