RuntimeError: Failed to tokenize (LlamaCpp and QAWithSourcesChain) #2645
Comments
I am facing the same issue. I am getting embeddings from LlamaCppEmbeddings and using Chroma for storing the embeddings. I also noticed it's very slow, and after hours of running, I got that error!
After a bit more experimenting, I think it's potentially linked to the number of tokens being returned and then stuffed into the LLM. It errors out with the prompt plus multiple documents from the VectorStore, but works if I constrain it to a single document. Changing models has similar issues: max_token and ctx_size limits seem to be at play, but you then start to end up with error messages that are rather opaque.
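To make the two limits concrete, here is a minimal sketch of where they are set when initialising a LlamaCpp LLM in LangChain (the model path and values are illustrative, not taken from the comment above):

```python
from langchain.llms import LlamaCpp

# n_ctx is the model's context window (prompt + completion must fit inside it);
# max_tokens caps how many tokens the model may generate in its reply.
llm = LlamaCpp(
    model_path="models/ggml-model-q4_0.bin",  # placeholder path
    n_ctx=2048,
    max_tokens=256,
)
```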
For others hitting this issue: using the `stuff` chain type tries to fit the prompt and every retrieved document into a single LLM call. Swapping this for something like `map_reduce` keeps each individual call within the token limit.
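As an illustration of that swap (a minimal sketch, not code from this thread; `llm` is assumed to be an already-initialised local model):

```python
from langchain.chains import QAWithSourcesChain

# map_reduce runs the LLM over each retrieved document separately and then
# combines the partial answers, instead of stuffing every document into one prompt.
chain = QAWithSourcesChain.from_chain_type(llm=llm, chain_type="map_reduce")
```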
Same issue, but only with a file of about 400 words. If I use a smaller file with about 200 words, the following runs without problems.

```python
from langchain.embeddings import LlamaCppEmbeddings

llama = LlamaCppEmbeddings(model_path="ggml-model-q4_0.bin")

# file merger.txt has 400 words, file mini.txt 7
with open('pdfs/mini.txt', 'r') as file:
    text = file.read().replace('\n', ' ')

query_result = llama.embed_query(text)
doc_result = llama.embed_documents([text])
```

@darth-veitcher Where could I put the map_reduce? I tried after model_path, but this seems very wrong.
Try passing in `n_ctx=2048` as a parameter.
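A minimal sketch of where that parameter goes, reusing the model path from the snippet above (the 512-token default is an assumption based on the errors quoted later in this thread):

```python
from langchain.embeddings import LlamaCppEmbeddings

# Raise the context window above the default so larger chunks can be embedded in one call.
llama = LlamaCppEmbeddings(model_path="ggml-model-q4_0.bin", n_ctx=2048)
```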
EDIT: I've just seen your code, and it looks like you're at the stage of loading a document and attempting to generate embeddings from it. My issue stemmed from returning results from a VectorStore (in my case OpenSearch) and then pushing them into the language model for summarisation. You seem to be loading the document in its entirety and then trying to generate an embedding for the whole thing. Whilst you could increase the token size, I'd recommend you look into document loaders and text splitters as a strategy (see the docs), as your current approach won't scale. At a high level you want to:

1. Load the document.
2. Split it into chunks that fit within the token limits.
3. Generate embeddings for each chunk.
4. Persist the embeddings into a VectorStore.
An implementation set of stubs might look like this (imports and the `settings`, `db`, `app`, and `logger` objects come from the wider project).

`utils.py`

```python
def load_unstructured_document(document: str) -> list[Document]:
    """Loads a given local unstructured document and returns the contained data synchronously.

    Args:
        document: string local path location of the document to load.

    Returns:
        A list of langchain `Document` objects. These contain primarily a `page_content`
        string and a `metadata` dictionary of fields.
    """
    data: list[Document] = None
    try:
        loader: UnstructuredFileLoader = UnstructuredFileLoader(os.path.expanduser(str(document)))
        data = loader.load()
        logger.debug(f"Loaded {document}")
    except Exception as e:
        logger.exception(e)
    return data


def split_documents(documents: list[Document], chunk_size: int = 100, chunk_overlap: int = 0) -> list[Document]:
    """As documents can be large we need to split them down for models to interpret,
    otherwise we run the risk of breaching token limits. Returns a list of split Documents.

    Args:
        documents: a list of langchain documents to split. This can be obtained through a loader.
        chunk_size: integer size of the window we are creating to split the document by. Default: 100
        chunk_overlap: integer size of the amount that each chunk should overlap the previous by. Default: 0

    Returns:
        A list of langchain `Document` objects. These contain primarily a `page_content`
        string and a `metadata` dictionary of fields.
    """
    docs: list[Document] = Document(page_content="", metadata={})
    try:
        text_splitter = TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
        docs = text_splitter.split_documents(documents)
    except Exception as e:
        logger.exception(e)
    return docs


def store_embeddings_from_docs(
    documents: list[Document],
    opensearch_url: str = settings.opensearch.url,
    index_name: str = settings.opensearch.default_index,
    embeddings: Optional[Embeddings] = None,
) -> OpenSearchVectorSearch:
    """Sends the text from loaded documents to a LlamaCpp embeddings model.

    We then store the embeddings in OpenSearch, our vector search engine, and persist them on the file system.

    Args:
        documents: A list of langchain `Document` objects. These contain primarily a `page_content`
            string and a `metadata` dictionary of fields.
        opensearch_url: connection string for elastic/opensearch.
        index_name: index/collection within the backend to store embeddings in. Defaults to settings.opensearch.default_index.
        embeddings: optional `langchain.embeddings.base.Embeddings` model to use. Defaults to the model specified in settings.

    Returns:
        A langchain OpenSearchVectorSearch `VectorStore` object acting as a wrapper around the OpenSearch embeddings platform.
    """
    vectordb: OpenSearchVectorSearch
    if not embeddings:
        embeddings = LlamaCppEmbeddings(
            model_path=os.path.join(settings.models.DIRECTORY, settings.models.EMBEDDING), n_batch=8192
        )
    try:
        vectordb = db(opensearch_url=opensearch_url, index_name=index_name)
        vectordb.embedding_function = embeddings
        vectordb.add_documents(documents)
    except Exception as e:
        if isinstance(e, OpenSearchException):
            logger.error(f"Error persisting {documents} to OpenSearch.")
        logger.exception(e)
        vectordb = db(opensearch_url=opensearch_url, index_name=index_name)
    return vectordb
```

You'd chain the above together in a CLI command.

`cli.py`

```python
@app.command()
def index_unstructured_document(
    document: str,
    chunk_size: int = 100,
    chunk_overlap: int = 0,
) -> OpenSearchVectorSearch:
    """Loads an unstructured document from disk, then indexes and persists embeddings into an OpenSearch VectorStore.

    Command-line wrapper around multiple calls to: Load, Split, Embeddings, Persist.

    Args:
        document: string local path location of the document to load.
        chunk_size: integer size of the window we are creating to split the document by. Default: 100
        chunk_overlap: integer size of the amount that each chunk should overlap the previous by. Default: 0

    Returns:
        A langchain OpenSearch `VectorStore` object acting as a wrapper around OpenSearch as a datastore for our embeddings.
    """
    logger.info(f"Indexing {document}")
    try:
        return utils.store_embeddings_from_docs(
            documents=utils.split_documents(
                documents=utils.load_unstructured_document(document),
                chunk_size=chunk_size,
                chunk_overlap=chunk_overlap,
            ),
        )
    except (IndexError, ValueError) as e:  # usually means no embeddings returned or unsupported filetype
        logger.error(f"Unable to index {document}. {e}")
        pass
    except PythonDocxError as e:  # usually means it's a weird OneDrive symlink and hasn't been downloaded locally
        logger.error(f"Unable to index {document}. {e}")
        pass
```
Original response (specifically addressing the question on `map_reduce` above):
That is a parameter for the LangChain chain itself.

```python
chain: QAWithSourcesChain = QAWithSourcesChain.from_chain_type(llm=llm, chain_type="stuff")
```

I linked the official docs as well, but in the above you're passing in your `chain_type`.
Following. I'm experiencing the same error with GGML models on text-generation_webui. Not entirely the same setup as you guys, but this is the only post about this error.
For the past couple of days I have been sweating over a similar issue in my attempts to create a clean index over a single document using a chain with `chain_type="stuff"`:

```python
from langchain.embeddings import LlamaCppEmbeddings
embeddings = LlamaCppEmbeddings(model_path='./models/gpt4all-lora-quantized.bin')

# read txt file into an array of string chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter
with open("./text_data.txt") as f:
    text_data = f.read()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_text(text_data)

# create index
from langchain.vectorstores import Chroma
docsearch = Chroma.from_texts(texts, embeddings)

# get language model for chain
from langchain.llms import LlamaCpp
llm = LlamaCpp(model_path='./models/gpt4all-lora-quantized.bin')

# create chain
from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(llm, chain_type="stuff")

# query
query = "What is the typical Elasticity Modulus values for hard clay?"
docs = docsearch.similarity_search(query)
chain.run(input_documents=docs, question=query)
```

The code above works perfectly on Google Colab with OpenAI Embeddings and LLM, but with offline copies of GPT4All or Facebook LLaMA it ends up with `RuntimeError: Failed to tokenize`.

I can get an answer if I replace the chain with something that does not stuff multiple retrieved documents into a single prompt:

```python
from langchain.chains import RetrievalQA

MIN_DOCS = 1  # more than 1 results in "Requested tokens exceed context window of 512"

print(query)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff",
                                 retriever=docsearch.as_retriever(search_kwargs={"k": MIN_DOCS}))
```

It is really bizarre that I can't use publicly available LLMs for a very simple use case of interrogating a text file. Am I missing something?
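A chain built this way would then be queried along these lines (a minimal sketch, using the `qa` and `query` objects defined above):

```python
# Retrieve the single most relevant chunk and answer the question from it.
answer = qa.run(query)
print(answer)
```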
You'll have the same issue - read my responses above (particularly the second section). Either change the chain type from `stuff` to something like `map_reduce`, or limit what gets stuffed into the prompt. As you can see, if you pass only a single document it'll work fine; your use of `MIN_DOCS = 1` is doing exactly that.
Thanks for the response, mate. I had already read everything here and just forgot to write that I had already tried that as well. Now I'm guessing the next step, as you suggested, is to increase the context tokens to combat the `Requested tokens exceed context window of 512` error. I've spent days on this now (as a hobbyist, after day-job hours) and this issue post was the best source of info for this error. Thanks for opening it.
I haven't got access currently, but it's something you should be able to change when initialising the model with `LlamaCpp`. Something along the lines of this could be worth a try:

```python
llm = LlamaCpp(model_path="models/my-model.bin", n_ctx=2048)
```
I get this too. In short, I am trying to do Q&A over a document, and I get an error containing what looks like sample text from the source code of `map_reduce_prompt.py`.
I can report back that GPT4All even makes up answers, with some sort of logical interpolation of the available information, rather than trying to find the correct answer by having a better understanding of the text. It is hard to explain, but I am fairly impressed and rather disappointed at the same time. I suppose and hope that future versions of GPT4All will come closer to OpenAI.

Finally, I highly recommend watching this video by @karpathy, one of the lead AI brains of our time: https://www.youtube.com/watch?v=kCc8FmEb1nY It goes over the nitty-gritty details of the architecture behind GPTs (e.g. what a token is) and training strategies (e.g. chunks), with very clear examples and even code samples.

Here is the final full code for those who are interested: https://github.com/boraoku/jupyter-notebooks/blob/ec52acfba0b7dda75782d56a04b3df2b4bf62b27/GeoAI_Trials_01_GPT4AllmakesUpFormulas_fromBowles.ipynb

Cheers!
I could successfully run an open-source model with LangChain for document retrieval. However, the answer of the model only gets written to the console (via the callback manager), and after a while the model returns `answer: ''` in the console output. Anybody got an idea how to fix it? Besides that, great work!
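I can't see the code in question, but for context, a typical streaming setup for a local LlamaCpp model in LangChain looks roughly like the sketch below (the model path is a placeholder). The tokens printed by the handler are only a streaming side effect; the final answer string still has to come from the chain's return value:

```python
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Stream each generated token to stdout while the chain is running.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llm = LlamaCpp(
    model_path="models/my-model.bin",  # placeholder path
    n_ctx=2048,
    callback_manager=callback_manager,
    verbose=True,
)
```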
Hi, @darth-veitcher! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale. From what I understand, the issue you reported was a `RuntimeError: Failed to tokenize` when running a `QAWithSourcesChain` with a local model. Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you for your contribution to LangChain!
Hi there, getting the following error when attempting to run a `QAWithSourcesChain` using a local GPT4All model. The code works fine with OpenAI, but seems to break if I swap in a local LLM model for the response. Embeddings work fine in the VectorStore (using OpenSearch). Exception as below.
From what I can tell the model is struggling to interpret the prompt template that's being passed to it?
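For readers reproducing this, a minimal sketch of the kind of setup being described, mirroring the chain construction quoted earlier in the thread (the model path is illustrative; the original report used a GPT4All model):

```python
from langchain.llms import LlamaCpp
from langchain.chains import QAWithSourcesChain

# A local llama.cpp model stands in for the OpenAI LLM that works fine.
llm = LlamaCpp(model_path="models/ggml-model-q4_0.bin")  # placeholder path

# Stuffing several retrieved documents plus the prompt template into one call
# is where the "Failed to tokenize" / context-window errors tend to surface.
chain = QAWithSourcesChain.from_chain_type(llm=llm, chain_type="stuff")
```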