Notion vectorStore integration breaks when importing a database with an empty row #2807
Comments
🤖 Hello,
Thank you for bringing this issue to our attention. It seems the error occurs because replace is called on an undefined entry of texts. Here's the relevant part of the code:
const batches = chunkArray(this.stripNewLines ? texts.map((t) => t.replace(/\n/g, " ")) : texts, this.batchSize);
In this line, every t is assumed to be a string, so an undefined t throws. To fix this issue, we could add a check to ensure that t is defined before calling replace:
const batches = chunkArray(this.stripNewLines ? texts.map((t) => t ? t.replace(/\n/g, " ") : "") : texts, this.batchSize);
In this modified code, a missing t is replaced with an empty string. Please try this modification and let us know if it resolves the issue. If it doesn't, or if you have any other questions, please don't hesitate to ask.
Best,
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. If you want to continue the conversation, start your reply with @dosu-bot. |
@skarard can you have a look at this one? |
Sure. |
I was able to reproduce the bug, and it raises a pretty solid question. First the bug:
/**
 * Interface for interacting with a document.
 */
export class Document<
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    Metadata extends Record<string, any> = Record<string, any>
  > implements DocumentInput
{
  pageContent: string;
  metadata: Metadata;
  constructor(fields: DocumentInput<Metadata>) {
    this.pageContent = fields.pageContent
      ? fields.pageContent.toString()
      : this.pageContent;
    this.metadata = fields.metadata ?? ({} as Metadata);
  }
}
The constructor for the Document class will set pageContent to undefined when fields.pageContent is empty, because the fallback branch assigns the still-unset this.pageContent to itself. A Notion page contains properties that the […]
@natalie-bb a workaround to get the script working for now is to replace the missing pageContent with an empty string:
const workaroundDocs = docs.map(
  (doc) => ({ ...doc, pageContent: doc.pageContent ?? "" } as Document)
);
Your code may want to look something like this:
const loader = new NotionAPILoader({...});
const docs = await loader.load();
const workaroundDocs = docs.map(
  (doc) => ({ ...doc, pageContent: doc.pageContent ?? "" } as Document)
);
const vectorStore = await HNSWLib.fromDocuments(workaroundDocs, embeddings);
@jacoblee93 Should the […] |
IMO no - it would never get retrieved |
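If empty documents would never be retrieved anyway, one option is to drop them before indexing instead of substituting an empty string. The following is a minimal sketch of that idea, assuming a simplified document shape; DocLike and dropEmptyDocs are hypothetical names used for illustration and are not part of LangChain.

```typescript
// Simplified stand-in for a loaded document; NOT the LangChain Document class.
interface DocLike {
  pageContent?: string | null;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: keep only documents whose pageContent has non-whitespace text.
function dropEmptyDocs<T extends DocLike>(docs: T[]): T[] {
  return docs.filter(
    (doc) =>
      typeof doc.pageContent === "string" && doc.pageContent.trim().length > 0
  );
}

const loaded: DocLike[] = [
  { pageContent: "Ramen bar, open late", metadata: { id: 1 } },
  { pageContent: undefined, metadata: { id: 2 } }, // stands in for the empty Notion row
  { pageContent: "   ", metadata: { id: 3 } }, // whitespace only
];

const indexable = dropEmptyDocs(loaded);
console.log(indexable.map((d) => d.metadata.id));
```

In the script from this thread, the filtered array would be passed to HNSWLib.fromDocuments in place of docs. Treating whitespace-only content as empty is a choice made for this sketch, not something the thread settles.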
@natalie-bb I'm going to add a little more context which might not be obvious. The database properties are not used when running the embeddings; that is to say, the only searchable data is pageContent. Something that may come closer to making the Notion properties searchable is to add the properties as a header to pageContent:
const workaroundDocs = docs.map(
  (doc) =>
    ({
      ...doc,
      pageContent: `---\n${yaml.dump(doc.metadata.properties)}---\n${doc.pageContent ?? ""}`,
    } as Document)
);
There is a YAML parsing/writing library already in LangChain; this is the import you'd need:
import yaml from "js-yaml"; |
@jacoblee93 Ok, if I understand you correctly, this would mean that HNSWLib will need to support |
As a workaround, I simply deleted the empty row. The error was somewhat obtuse, and it wasn't immediately obvious to me that I needed to do that. |
What I mean is it shouldn't be created at all - https://github.com/langchain-ai/langchainjs/blob/main/langchain/src/document.ts#L18 |
Thank you for your clarification. The typing for the […] Therefore the correct behaviour is for the Document class constructor to throw an error when page contents are […] I'll also update the NotionAPILoader to check if […] |
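A constructor that throws on missing page contents, as proposed in the comment above, could look roughly like the following. This is only a sketch under assumptions: StrictDocument, StrictDocumentInput, and the error message are hypothetical and do not reflect the actual LangChain implementation. The sketch rejects only non-string content; whether empty strings should also be rejected is left open.

```typescript
// Hypothetical input shape for illustration; not the LangChain DocumentInput type.
interface StrictDocumentInput {
  pageContent?: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical class: fails fast instead of silently storing undefined content.
class StrictDocument {
  pageContent: string;
  metadata: Record<string, unknown>;

  constructor(fields: StrictDocumentInput) {
    if (typeof fields.pageContent !== "string") {
      throw new Error("pageContent must be a string");
    }
    this.pageContent = fields.pageContent;
    this.metadata = fields.metadata ?? {};
  }
}

// A row without content now fails at construction time, close to the cause,
// rather than later inside the embeddings call.
let message = "";
try {
  new StrictDocument({ metadata: { source: "notion" } });
} catch (err) {
  message = err instanceof Error ? err.message : String(err);
}
console.log(message);
```

Failing at construction keeps the error next to the loader that produced the bad row, which is easier to diagnose than a TypeError raised deep in the embeddings code.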
file:///Users/natalie/chatbot/node_modules/langchain/dist/embeddings/openai.js:139
const batches = chunkArray(this.stripNewLines ? texts.map((t) => t.replace(/\n/g, " ")) : texts, this.batchSize);
^
TypeError: Cannot read properties of undefined (reading 'replace')
at file:///Users/natalie/chatbot/node_modules/langchain/dist/embeddings/openai.js:139:76
at Array.map (<anonymous>)
at OpenAIEmbeddings.embedDocuments (file:///Users/natalie/chatbot/node_modules/langchain/dist/embeddings/openai.js:139:63)
at HNSWLib.addDocuments (file:///Users/natalie/chatbot/node_modules/langchain/dist/vectorstores/hnswlib.js:48:54)
at Function.fromDocuments (file:///Users/natalie/chatbot/node_modules/langchain/dist/vectorstores/hnswlib.js:258:24)
at file:///Users/natalie/chatbot/tools/vectorStores/restaurants.ts:27:46
at processTicksAndRejections (node:internal/process/task_queues:95:5)
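The failing expression in the trace can be reproduced in isolation. The sketch below assumes the array of texts contains an undefined entry, as it would for the empty row; stripNewLinesUnsafe and stripNewLinesSafe are hypothetical names that mirror the map call in the trace and are not LangChain functions.

```typescript
// Mirrors the map call from the trace: assumes every entry is a string.
function stripNewLinesUnsafe(texts: string[]): string[] {
  return texts.map((t) => t.replace(/\n/g, " "));
}

// Guarded variant: entries that are not strings become empty strings.
function stripNewLinesSafe(texts: Array<string | undefined>): string[] {
  return texts.map((t) => (typeof t === "string" ? t.replace(/\n/g, " ") : ""));
}

// An undefined entry typed as string, which is what an unset pageContent amounts to.
const texts = ["line one\nline two", undefined] as unknown as string[];

let threwTypeError = false;
try {
  stripNewLinesUnsafe(texts);
} catch (err) {
  threwTypeError = err instanceof TypeError;
}

const cleaned = stripNewLinesSafe(texts);
console.log(threwTypeError, cleaned);
```

The type system does not catch this because the value is declared as a string while being undefined at runtime, which is why the error only surfaces when replace is called.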