Error: Failed to ingest your data #318

Closed
umerarif01 opened this issue May 21, 2023 · 13 comments

Comments

@umerarif01

umerarif01 commented May 21, 2023

Hi! I am getting this error when I run the ingest script. I have set up all of my env variables correctly, and I don't know why it is not working.

PS C:\Users\UMER ARIF\Desktop\Projects\gpt4-pdf-chatbot-langchain> npm run ingest

> gpt4-langchain-pdf-chatbot@0.1.0 ingest    
> tsx -r dotenv/config scripts/ingest-data.ts

[WARN] Importing from 'langchain/document_loaders' is deprecated. Import from eg. 'langchain/document_loaders/fs/text' or 'langchain/document_loaders/web/cheerio' instead. See https://js.langchain.com/docs/getting-started/install#updating-from-0052 for upgrade instructions.
error TypeError: Object.hasOwn is not a function
    at null.DirectoryLoader (c:/Users/UMER%20ARIF/Desktop/Projects/gpt4-pdf-chatbot-langchain/node_modules/langchain/dist/document_loaders/fs/directory.js:41:24)
    at null.run (c:\Users\UMER ARIF\Desktop\Projects\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:15:29)
    at null.<anonymous> (c:\Users\UMER ARIF\Desktop\Projects\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:49:9)
    at null.<anonymous> (c:\Users\UMER ARIF\Desktop\Projects\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:51:1)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
c:\Users\UMER ARIF\Desktop\Projects\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:44
    throw new Error('Failed to ingest your data');
          ^

Error: Failed to ingest your data
    at null.run (c:\Users\UMER ARIF\Desktop\Projects\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:44:11)
    at null.<anonymous> (c:\Users\UMER ARIF\Desktop\Projects\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:49:9)
    at null.<anonymous> (c:\Users\UMER ARIF\Desktop\Projects\gpt4-pdf-chatbot-langchain\scripts\ingest-data.ts:51:1)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
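
A note on the trace above: the TypeError is raised inside LangChain's DirectoryLoader, before the script talks to Pinecone, and the final "Failed to ingest your data" error is only the script rethrowing it. `Object.hasOwn` is available in Node.js 16.9 and later, so an older Node.js runtime is one plausible cause, though the thread never confirms it. The sketch below is a hypothetical pre-flight check, not code from the repository; all function names are made up for illustration.

```typescript
// Hypothetical pre-flight check (not part of the repository).

/** Parse "v16.8.0" or "16.8.0" into [major, minor]. */
export function parseMajorMinor(version: string): [number, number] {
  const match = /^v?(\d+)\.(\d+)/.exec(version);
  if (match === null) {
    throw new Error(`Unrecognised version string: ${version}`);
  }
  return [Number(match[1]), Number(match[2])];
}

/** True when the given Node.js version ships Object.hasOwn (16.9 or newer). */
export function nodeHasObjectHasOwn(version: string): boolean {
  const [major, minor] = parseMajorMinor(version);
  return major > 16 || (major === 16 && minor >= 9);
}

/** Feature-detect on the running engine instead of trusting a version string. */
export function runtimeHasObjectHasOwn(): boolean {
  return typeof (Object as unknown as { hasOwn?: unknown }).hasOwn === "function";
}
```

In the ingest script this could be called as `nodeHasObjectHasOwn(process.version)` before any loader is constructed; running `node -v` in the terminal gives the same information.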
@khalidfarooq

facing the same issue

@umerarif01
Author

> facing the same issue

Let me know bro if you find a solution.

@nexty5870

Make sure you have your .env set up right.

PINECONE_INDEX_NAME is the name of your index. I got it confused at first and ran into the same issue you had; I changed it and it worked.
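
The advice above can be turned into a small check that runs before ingestion. This is a sketch under assumptions: only PINECONE_INDEX_NAME is named in this thread, and the other variable names in the list are guesses at what the project's .env contains, so adjust them to match your own file. `findMissingEnvVars` is a hypothetical helper.

```typescript
// Only PINECONE_INDEX_NAME comes from the thread; the rest are assumed names.
export const REQUIRED_ENV_VARS: readonly string[] = [
  "PINECONE_INDEX_NAME",
  "PINECONE_ENVIRONMENT", // assumption
  "PINECONE_API_KEY", // assumption
  "OPENAI_API_KEY", // assumption
];

/** Return the names of required variables that are unset or blank. */
export function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

Calling `findMissingEnvVars(process.env)` at the top of the ingest script and throwing when the result is non-empty would surface a missing or blank value. It cannot tell whether a value is wrong, for example an index name that does not exist in your Pinecone project.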

@EgyptianBrince

Facing the same issue.

@wail-asad

@umerarif01 @khalidfarooq @EgyptianBrince
When you create the index in Pinecone, make sure its Dimensions value is set to 1536.
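
The value 1536 matches the vector length produced by OpenAI's text-embedding-ada-002 model, which is presumably the embedding model this project uses. A minimal sketch of a guard that catches a mismatch before anything is sent to the index; `assertVectorDimension` is a hypothetical helper, not part of LangChain or the Pinecone client.

```typescript
// Dimension taken from the comment above; the helper itself is illustrative.
export const EXPECTED_DIMENSION = 1536;

/** Throw a descriptive error when an embedding has the wrong length. */
export function assertVectorDimension(
  vector: readonly number[],
  expected: number = EXPECTED_DIMENSION,
): void {
  if (vector.length !== expected) {
    throw new Error(
      `Embedding has ${vector.length} dimensions but the index expects ${expected}`,
    );
  }
}
```

Running this on the first embedding returned by the model gives a readable message if the index was created with a different dimension.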

@khalidfarooq

The configuration is correct (index name, dimensions, environment), but it's still not working.

@bookofbash

I ended up using PDFLoader:
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
Then set up the first part of the code like this:

/* Path of the PDF file to load */
const filePath = 'docs/LLM3.pdf'; // use your own filename

export const run = async () => {
  try {
    /* Load the raw docs from the single PDF file */
    const loader = new PDFLoader(filePath, {
      // `.then(m => m.default)` at the end of the import may be needed, depending on your setup
      pdfjs: () => import("pdfjs-dist/legacy/build/pdf.js").then(m => m.default),
    });
    // Alternative without a custom pdfjs build:
    // const loader = new PDFLoader(filePath);
    const rawDocs = await loader.load();

@umerarif01
Author

> @umerarif01 @khalidfarooq @EgyptianBrince
> When you create the index in Pinecone, make sure its Dimensions value is set to 1536.

Despite setting the dimensions to 1536, the script was still giving an error. However, I was able to resolve the issue by using the following Python script to ingest the document. As a result, the chatbot is now working smoothly.

You can access the Python script at the following link: https://github.com/ucl98/pinecone_ingest_python_implementation

Make sure to follow all of the instructions properly if you are going to use it.

@shiruken1

Same here. Trying to figure out what Pinecone's error says but I can't make heads or tails of the error's structure.

data: { error: [Object] }

When I try to log the actual error object, I get undefined 🤷‍♂️
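
The `[Object]` in that output is Node's console cutting off nested objects beyond its default depth, not the error itself. A minimal sketch of one way to print the whole structure, assuming the caught value is either an `Error` or a plain nested object; `describeError` and `safeStringify` are hypothetical helper names, not part of any library mentioned here.

```typescript
/** JSON-serialise a value, replacing already-seen objects to survive cycles. */
function safeStringify(value: unknown): string {
  const seen = new WeakSet<object>();
  const out: string | undefined = JSON.stringify(
    value,
    (_key: string, v: unknown) => {
      if (typeof v === "object" && v !== null) {
        if (seen.has(v)) return "[Circular]";
        seen.add(v);
      }
      return v;
    },
    2,
  );
  // JSON.stringify yields undefined for values such as undefined or functions.
  return out === undefined ? String(value) : out;
}

/** Render any caught value, including nested fields, as a readable string. */
export function describeError(err: unknown): string {
  if (err instanceof Error) {
    // message and stack are non-enumerable, so copy every own property by name.
    const fields: Record<string, unknown> = {};
    for (const key of Object.getOwnPropertyNames(err)) {
      fields[key] = (err as unknown as Record<string, unknown>)[key];
    }
    return safeStringify(fields);
  }
  return safeStringify(err);
}
```

Inside the script's `catch (error)` block, `console.log(describeError(error))` would then show the nested `data.error` fields instead of `[Object]`.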

@EgyptianBrince

It's just a formatting error caused by a recent LangChain update. Replace line 13 with this:

export const run = async () => {
  try {
    /* Load raw docs from all the files in the directory */
    const directoryLoader = new DirectoryLoader(filePath, {
      '.pdf': (path) => new CustomPDFLoader(path, '/pdf'),
    });

It'll work fine afterwards (remember to save the file).

Essentially, all you're doing is adding the ", '/pdf'" argument to the CustomPDFLoader call inside the new DirectoryLoader.

@dosubot

dosubot bot commented Sep 23, 2023

Hi, @umerarif01! I'm Dosu, and I'm helping the gpt4-pdf-chatbot-langchain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you were encountering an error while trying to ingest data, and other users like "khalidfarooq", "nexty5870", and "EgyptianBrince" have faced the same issue. User "nexty5870" suggested checking the .env setup, while user "wail-asad" recommended ensuring that the Max Dimensions is set to 1536 in Pinecone. User "bookofbash" also shared a workaround using a different code snippet. Eventually, you were able to resolve the issue by using a Python script for ingestion.

Before we close this issue, we wanted to check if it is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the project!

dosubot bot added the stale label Sep 23, 2023
dosubot bot closed this as not planned Sep 30, 2023
dosubot bot removed the stale label Sep 30, 2023