PineconeError: Error, message length too large: found 5453452 bytes, the limit is: 4194304 bytes #440
🤖 Hello @YIN-Renlong! Good to see you here again. I hope you're having a great day.

The error you're encountering is due to the size of the data being ingested exceeding the limit set by Pinecone. This is likely happening because of the `chunkSize` used by the text splitter:

```typescript
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
```

You can try reducing the `chunkSize`:

```typescript
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500, // reduce chunkSize from 1000 to 500
  chunkOverlap: 200,
});
```

Please note that reducing the `chunkSize` will produce more, smaller chunks, which may affect retrieval quality and increase the number of embedding calls. You can find the text splitter configuration in `scripts/ingest-data.ts`.

Let me know if this helps or if you need further assistance.
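Before changing `chunkSize`, it can help to measure how large the split chunks actually are in bytes. Below is a minimal sketch using only Node's built-in `Buffer`; the `chunks` array and the `maxChunkBytes` helper are illustrative assumptions standing in for the `pageContent` strings your splitter returns, not part of the repository's code.

```typescript
// Sketch: estimate the largest chunk's UTF-8 byte size before upserting.
// `chunks` is a stand-in for the pageContent of your split documents.
function maxChunkBytes(chunks: string[]): number {
  return chunks.reduce(
    (max, chunk) => Math.max(max, Buffer.byteLength(chunk, 'utf8')),
    0,
  );
}

const chunks = ['a'.repeat(500), 'b'.repeat(1200)];
console.log(maxChunkBytes(chunks)); // 1200 (each char here is 1 byte in UTF-8)
```

Logging this after `splitDocuments` makes it easy to see whether a smaller `chunkSize` actually reduced the payload.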
@dosubot thanks. According to your plan, I changed the value from 1000 to 500 (`chunkSize: 1000` to `chunkSize: 500`). However, the problem still arises and I got this error:

This means that your solution may not have solved the problem. So, do you have a better suggestion?
🤖 Based on the error message you're encountering, it seems that the size of the data you're trying to upsert into Pinecone is exceeding its limit. Even though you've reduced the `chunkSize`, a single upsert call can still carry too many chunks at once.

One solution could be to further reduce the `chunkSize`. Another solution could be to modify the ingest script to upsert the documents in smaller groups. A third solution could be to implement a check in `scripts/ingest-data.ts` that splits the documents into batches before upserting:

```typescript
const BATCH_SIZE = 100; // Adjust this value as needed
// ...
const docs = await textSplitter.splitDocuments(rawDocs);
// ...
const embeddings = new OpenAIEmbeddings();
const index = pinecone.Index(PINECONE_INDEX_NAME);

// Split docs into batches
for (let i = 0; i < docs.length; i += BATCH_SIZE) {
  const batch = docs.slice(i, i + BATCH_SIZE);

  // Embed and upsert each batch separately
  await PineconeStore.fromDocuments(batch, embeddings, {
    pineconeIndex: index,
    namespace: PINECONE_NAME_SPACE,
    textKey: 'text',
  });
}
```

This code splits the documents into batches and upserts each batch separately, which should help to avoid exceeding Pinecone's data size limit. Please adjust `BATCH_SIZE` as needed.

In addition to the above, you might want to check similar issues in the repository for more details: npm run ingest / Pinecone error, Pinecone Ingest Error, and Error: Failed to ingest your data. I hope this helps! If you have any other questions or need further clarification, feel free to ask.
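A fixed `BATCH_SIZE` can still exceed the limit if individual chunks are unusually large. An alternative is to group items by a byte budget instead of a count. The sketch below uses only plain TypeScript and Node's `Buffer` (no Pinecone dependency); the 4,000,000-byte budget is an assumed safety margin under the 4,194,304-byte limit from the error message, and the string items stand in for serialized vectors.

```typescript
// Sketch: group items into batches whose combined UTF-8 size stays
// under a byte budget, so no single upsert payload exceeds the limit.
function batchByBytes(items: string[], maxBytes: number): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentBytes = 0;

  for (const item of items) {
    const size = Buffer.byteLength(item, 'utf8');
    // Start a new batch when adding this item would blow the budget.
    if (current.length > 0 && currentBytes + size > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(item);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Example: three 3-byte items with a 6-byte budget -> two batches.
console.log(batchByBytes(['aaa', 'bbb', 'ccc'], 6).length); // 2
```

In the ingest script you would call something like `batchByBytes(serializedDocs, 4_000_000)` and upsert each resulting batch in its own `PineconeStore.fromDocuments` call (exact wiring is an assumption, since the serialized size per document depends on embeddings and metadata).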
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
thanks, I followed your third solution, however, the error still happens. The following is the full code of `ingest-data.ts` after modifying it:

```typescript
/* Name of directory to retrieve your files from */
export const run = async () => {
  // const loader = new PDFLoader(filePath);
  /* Split text into chunks */
  const BATCH_SIZE = 100; // Adjust this value as needed
  const docs = await textSplitter.splitDocuments(rawDocs);
  console.log('creating vector store...');
  // Split docs into batches
  // Embed and upsert each batch separately
} catch (error) {
(async () => {
```

Please let me know what I should debug now, thanks.
Hi, @YIN-Renlong I'm helping the gpt4-pdf-chatbot-langchain team manage their backlog and am marking this issue as stale. It looks like you encountered the "PineconeError: Error, message length too large" issue during the ingest process for certain PDF files, and you were seeking a solution to fix this issue when running `npm run ingest`. Could you please confirm if this issue is still relevant to the latest version of the gpt4-pdf-chatbot-langchain repository? If it is, please let the gpt4-pdf-chatbot-langchain team know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
Hi everyone, for some PDF files I get the following error during the ingest process (`npm run ingest`). Is there any way to fix it? Thanks.

```
creating vector store...
error [PineconeError: PineconeClient: Error calling upsert: PineconeError: Error, message length too large: found 5453452 bytes, the limit is: 4194304 bytes]
/Applications/gpt4chatbot/gpt4-pdf-chatbot-langchain-cnr/scripts/ingest-data.ts:46
    throw new Error('Failed to ingest your data');
          ^
Error: Failed to ingest your data
    at run (/Applications/gpt4chatbot/gpt4-pdf-chatbot-langchain-cnr/scripts/ingest-data.ts:46:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at <anonymous> (/Applications/gpt4chatbot/gpt4-pdf-chatbot-langchain-cnr/scripts/ingest-data.ts:51:3)
```