File "d:\langchain\pdfqa-app.py", line 46, in _upload_data
Pinecone.from_texts(self.doc_chunk,embeddings,batch_size=16,index_name=self.index_name)
File "E:\anaconda\envs\langchain\lib\site-packages\langchain\vectorstores\pinecone.py", line 232, in from_texts
embeds = embedding.embed_documents(lines_batch)
File "E:\anaconda\envs\langchain\lib\site-packages\langchain\embeddings\openai.py", line 297, in embed_documents
return self._get_len_safe_embeddings(texts, engine=self.deployment)
File "E:\anaconda\envs\langchain\lib\site-packages\langchain\embeddings\openai.py", line 221, in _get_len_safe_embeddings
token = encoding.encode(
File "E:\anaconda\envs\langchain\lib\site-packages\tiktoken\core.py", line 117, in encode
if match := _special_token_regex(disallowed_special).search(text):
TypeError: expected string or buffer
Use Pinecone.from_documents(self.doc_chunk, embeddings, batch_size=16, index_name=self.index_name) instead. text_splitter.split_documents splits the file into a list of Document objects rather than plain strings, and Pinecone.from_texts expects strings.
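To illustrate the difference, here is a minimal sketch of what each method expects. The `Document` class below is a hypothetical stand-in for LangChain's own class (assumed to carry `page_content` and `metadata`), the helper name `to_texts_and_metadatas` and the sample chunks are made up for illustration, and the Pinecone calls are left as comments because they need a live index and API keys.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_texts_and_metadatas(
    docs: List[Document],
) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Unpack Document objects into plain strings plus their metadata dicts."""
    texts = [d.page_content for d in docs]
    metadatas = [d.metadata for d in docs]
    return texts, metadatas


# Chunks shaped like the output of text_splitter.split_documents (sample data, made up).
doc_chunk = [
    Document("Attention is all you need.", {"page": 0}),
    Document("The Transformer uses self-attention.", {"page": 1}),
]

texts, metadatas = to_texts_and_metadatas(doc_chunk)

# The tokenizer runs a regex search over each item, so every item must be a str.
assert all(isinstance(t, str) for t in texts)

# With real LangChain objects, either call should avoid the TypeError (not run here):
#   Pinecone.from_documents(doc_chunk, embeddings, batch_size=16, index_name=index_name)
#   Pinecone.from_texts(texts, embeddings, metadatas=metadatas, batch_size=16, index_name=index_name)
```

Passing the Document list straight to from_texts is what hands a non-string to tiktoken's encode and triggers the error in the traceback.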
Hi, @catchlui! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, you raised an issue regarding a TypeError raised in the tiktoken library while encoding text for embeddings. User YLFxGen suggested using Pinecone.from_documents instead of Pinecone.from_texts as a potential solution, since text_splitter.split_documents returns documents rather than text, and user Rinisha160391 has approved this suggestion.
Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your contribution to the LangChain repository!
dosubot (bot) added the stale label ("Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed") on Nov 11, 2023
System Info
Who can help?
No response
Information
Related Components
Reproduction
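from langchain.document_loaders import PyPDFLoader

def _load_docs(self):
    # Raw string: otherwise "\170" in "\1706" is read as an octal escape and corrupts the path
    loader = PyPDFLoader(r"D:\langchain\data_source\1706.03762.pdf")
    self.doc = loader.load()  # list of Document objects, one per page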
Expected behavior
Please help with the solution