Fail to ingest data #468
Comments
The error "TypeError: Cannot read properties of undefined (reading 'text')" usually occurs when the documents being processed do not have the expected structure.

Steps to Troubleshoot

Here’s the modified code with additional logging:

```typescript
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '@/utils/pinecone-client';
import { PDFLoader } from 'langchain/document_loaders/fs/pdf';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';

// Directory that holds the PDF files to ingest
const filePath = 'docs';

export const run = async () => {
  try {
    // Load every PDF found in the directory
    const directoryLoader = new DirectoryLoader(filePath, {
      '.pdf': (path) => new PDFLoader(path),
    });
    const rawDocs = await directoryLoader.load();
    console.log('rawDocs', rawDocs); // Log rawDocs to inspect structure

    // Split the loaded text into overlapping chunks
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });
    const docs = await textSplitter.splitDocuments(rawDocs);
    console.log('split docs', docs); // Log split docs to inspect structure

    // Embed the chunks and upload them to the Pinecone index
    console.log('creating vector store...');
    const embeddings = new OpenAIEmbeddings();
    const index = pinecone.Index(PINECONE_INDEX_NAME);
    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index,
      namespace: PINECONE_NAME_SPACE,
      textKey: 'text',
    });
  } catch (error) {
    // Log the underlying error before rethrowing a generic one
    console.log('error', error);
    throw new Error('Failed to ingest your data');
  }
};

(async () => {
  await run();
  console.log('ingestion complete');
})();
```

Additional Checks
By following these steps, you should be able to identify and resolve the issue. If you continue to encounter problems, please provide the output of the logged structures for further assistance.
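As a complement to reading the logged output by eye, the structure check can be automated. The sketch below is only an illustration: `DocLike` and `findInvalidDocs` are hypothetical names, and it assumes that LangChain JS documents carry their text in `pageContent` with `metadata` alongside (as far as I know, `textKey: 'text'` only names the metadata field under which that text is stored in Pinecone).

```typescript
// Local stand-in for the shape of a LangChain document (assumption: text lives
// in `pageContent`). Declared here so the sketch runs without importing langchain.
interface DocLike {
  pageContent?: unknown;
  metadata?: Record<string, unknown>;
}

// Returns the indexes of entries that are missing, or whose pageContent is
// not a string or contains only whitespace.
function findInvalidDocs(docs: Array<DocLike | undefined | null>): number[] {
  const invalid: number[] = [];
  docs.forEach((doc, i) => {
    const content = doc?.pageContent;
    if (typeof content !== 'string' || content.trim().length === 0) {
      invalid.push(i);
    }
  });
  return invalid;
}
```

Calling `findInvalidDocs(docs)` right after the split step and logging the result would show whether any chunk is empty before anything is sent to Pinecone. An empty array means every chunk has usable text.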
Apologies for being so dumb, but which part of the program should I use this code in?
You should place the provided code in the ingestion script:

```typescript
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '@/utils/pinecone-client';
import { PDFLoader } from 'langchain/document_loaders/fs/pdf';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';

// Directory that holds the PDF files to ingest
const filePath = 'docs';

export const run = async () => {
  try {
    // Load every PDF found in the directory
    const directoryLoader = new DirectoryLoader(filePath, {
      '.pdf': (path) => new PDFLoader(path),
    });
    const rawDocs = await directoryLoader.load();
    console.log('rawDocs', rawDocs); // Log rawDocs to inspect structure

    // Split the loaded text into overlapping chunks
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });
    const docs = await textSplitter.splitDocuments(rawDocs);
    console.log('split docs', docs); // Log split docs to inspect structure

    // Embed the chunks and upload them to the Pinecone index
    console.log('creating vector store...');
    const embeddings = new OpenAIEmbeddings();
    const index = pinecone.Index(PINECONE_INDEX_NAME);
    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index,
      namespace: PINECONE_NAME_SPACE,
      textKey: 'text',
    });
  } catch (error) {
    // Log the underlying error before rethrowing a generic one
    console.log('error', error);
    throw new Error('Failed to ingest your data');
  }
};

(async () => {
  await run();
  console.log('ingestion complete');
})();
```

Explanation of the Code
Running the Script

To run this script, use the following command in your terminal: `node scripts/ingest-data.ts`. Note that plain `node` may not be able to execute a TypeScript file directly; the log in this issue shows the script being launched through a yarn script instead. Ensure you have all the necessary environment variables set up.

Troubleshooting

If you encounter the error, check that the documents being processed have a structure like this:

```
[
  {
    "text": "chunk of text",
    "metadata": {
      "source": "source file path or identifier"
    }
  },
  ...
]
```

If the structure is correct and the error persists, double-check the configuration of the Pinecone index.
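One way to rule out configuration problems early is to check the environment variables before the script touches OpenAI or Pinecone. This is a minimal sketch under stated assumptions: the variable names in `REQUIRED_VARS` are hypothetical placeholders (the thread does not show the actual `.env` contents), and `findMissingEnv` is an illustrative helper, not part of any library.

```typescript
// Hypothetical variable names: replace them with the ones your project reads.
const REQUIRED_VARS: string[] = [
  'OPENAI_API_KEY',
  'PINECONE_API_KEY',
  'PINECONE_ENVIRONMENT',
  'PINECONE_INDEX_NAME',
];

// Returns the names of required variables that are unset or blank.
function findMissingEnv(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}
```

In the ingestion script this could be called as `findMissingEnv(process.env, REQUIRED_VARS)` at the top of `run`, stopping with a clear message if the returned list is not empty. A blank or misspelled value is then reported by name instead of surfacing later as an unrelated error.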
I have tried it, but it is still not working. I get the following error. My PDF is a very simple document of a few pages, which I converted from Word to PDF, so it should have the text property. It looks like OpenAI works fine but Pinecone does not. Any help is greatly appreciated.
Document {
file:///C:/Python/gpt4-pdf/scripts/ingest-data.ts:39
Also, this is how I set up my .env file
The error "TypeError: Cannot read properties of undefined (reading 'text')" typically occurs when the code is trying to access the 'text' property of an object that is undefined.

Steps to Resolve the Issue
By following these steps, you should be able to identify and resolve the issue causing the "TypeError: Cannot read properties of undefined (reading 'text')" error.
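To make the error class concrete, here is a small illustration of how reading `text` from a missing object fails and how a guarded read avoids it. It is a generic sketch, not the Pinecone client's code; `ResponseLike`, `readText`, and `readTextUnsafe` are hypothetical names.

```typescript
// Hypothetical shape of an object that may carry a `text` field.
interface ResponseLike {
  text?: string;
}

// Guarded read: returns undefined instead of throwing when the object is missing.
function readText(response: ResponseLike | undefined): string | undefined {
  return response?.text;
}

// Unguarded read: the cast silences the compiler, so an undefined argument
// causes a TypeError at runtime when `.text` is accessed.
function readTextUnsafe(response: ResponseLike | undefined): string | undefined {
  return (response as ResponseLike).text;
}
```

The unguarded version is the pattern that produces this kind of message whenever the value it receives turns out to be undefined, which is why the fix usually lies in finding out why the value is missing rather than in the property access itself.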
I have followed the readme instructions and created an index in Pinecone. However, the data still fails to ingest and I keep getting this error. I don't know what I am doing wrong. I am fairly new to this and trying to learn along the way. Any help is much appreciated.
```
creating vector store...
error TypeError: Cannot read properties of undefined (reading 'text')
    at C:\Python\gpt4-pdf\node_modules\@pinecone-database\pinecone\dist\errors\utils.js:44:57
    at step (C:\Python\gpt4-pdf\node_modules\@pinecone-database\pinecone\dist\errors\utils.js:33:23)
    at Object.next (C:\Python\gpt4-pdf\node_modules\@pinecone-database\pinecone\dist\errors\utils.js:14:53)
    at C:\Python\gpt4-pdf\node_modules\@pinecone-database\pinecone\dist\errors\utils.js:8:71
    at new Promise (<anonymous>)
    at __awaiter (C:\Python\gpt4-pdf\node_modules\@pinecone-database\pinecone\dist\errors\utils.js:4:12)
    at extractMessage (C:\Python\gpt4-pdf\node_modules\@pinecone-database\pinecone\dist\errors\utils.js:40:48)
    at C:\Python\gpt4-pdf\node_modules\@pinecone-database\pinecone\dist\errors\handling.js:66:70
    at step (C:\Python\gpt4-pdf\node_modules\@pinecone-database\pinecone\dist\errors\handling.js:33:23)
    at Object.next (C:\Python\gpt4-pdf\node_modules\@pinecone-database\pinecone\dist\errors\handling.js:14:53)
file:///C:/Python/gpt4-pdf/scripts/ingest-data.ts:46
throw new Error('Failed to ingest your data');
^
Error: Failed to ingest your data
    at run (file:///C:/Python/gpt4-pdf/scripts/ingest-data.ts:46:11)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at file:///C:/Python/gpt4-pdf/scripts/ingest-data.ts:51:3
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
```
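Two details in this trace are worth noting. The TypeError is raised from `extractMessage` in the Pinecone client's `errors/utils.js`, called from `errors/handling.js`, which suggests (but does not prove) that it was thrown while the client was trying to describe an earlier request failure. The script then replaces it with the generic "Failed to ingest your data". The sketch below shows one way to keep the underlying details in the rethrown error; `describeError` and `wrapIngestError` are hypothetical helpers, and the frame filter assumes Node-style stack lines that start with `at`.

```typescript
// Summarises an unknown thrown value, keeping the original name, message, and
// the first few stack frames instead of discarding them.
function describeError(err: unknown, maxFrames = 3): string {
  if (err instanceof Error) {
    const frames = (err.stack ?? '')
      .split('\n')
      .map((line) => line.trim())
      .filter((line) => line.startsWith('at '))
      .slice(0, maxFrames);
    return [`${err.name}: ${err.message}`, ...frames].join('\n');
  }
  return `Non-Error value thrown: ${String(err)}`;
}

// Wraps a failure with context while keeping the underlying details readable.
function wrapIngestError(err: unknown): Error {
  return new Error(`Failed to ingest your data\nCaused by: ${describeError(err)}`);
}
```

Inside the script's `catch` block, `throw wrapIngestError(error)` would then surface the original message and its first frames in the final error, so a report like this one points at the failing call rather than only at the rethrow.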