I tried to run the demo code for testing, but I get the following error.
```python
from raptor import RetrievalAugmentation

RA = RetrievalAugmentation()
with open('demo/sample.txt', 'r') as file:
    text = file.read()
RA.add_documents(text)

question = "How did Cinderella reach her happy ending?"
answer = RA.answer_question(question=question)
print("Answer: ", answer)
```
```
Traceback (most recent call last):
  File "D:\Code\Python\20240531\RAPTOR\raptor\demotest.py", line 13, in <module>
    RA.add_documents(text)
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\RetrievalAugmentation.py", line 219, in add_documents
    self.tree = self.tree_builder.build_from_text(text=docs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\tree_builder.py", line 291, in build_from_text
    root_nodes = self.construct_tree(all_nodes, all_nodes, layer_to_nodes)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\cluster_tree_builder.py", line 130, in construct_tree
    process_cluster(
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\cluster_tree_builder.py", line 77, in process_cluster
    f"Node Texts Length: {len(self.tokenizer.encode(node_texts))}, Summarized Text Length: {len(self.tokenizer.encode(summarized_text))}"
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Code\Python\20240531\RAPTOR\venv\Lib\site-packages\tiktoken\core.py", line 116, in encode
    if match := _special_token_regex(disallowed_special).search(text):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or buffer
```
How can I fix it?
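For context, the failure mode can be reproduced in isolation: tiktoken's `encode()` runs a regex search over `text`, and a regex search over a non-string raises this kind of `TypeError`. A minimal sketch using the stdlib `re` module (tiktoken actually uses the third-party `regex` package, so the message wording differs slightly):

```python
import re

# tiktoken's encode() performs a regex search over `text`; if `text` is not
# a string (e.g. a dict returned by a summarization model), the search
# raises a TypeError much like the one in the traceback above.
pattern = re.compile(r"<\|endoftext\|>")  # stand-in for tiktoken's special-token regex

try:
    pattern.search({"output": "a summary"})  # non-string input
except TypeError as exc:
    print(type(exc).__name__, exc)
```

The stdlib message reads "expected string or bytes-like object" rather than "expected string or buffer", but the cause is the same: `encode()` received something that is not a string.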
Hey! I am not able to reproduce the bug above. Can you print out `text` before `RA.add_documents()`, and also print `RA.tree_builder.summarization_model`, to make sure these are set correctly?
@LeonMing30 Hi, I encountered the same issue before realizing the mistake was on my side: I used a custom summarization model whose output is not a plain string but a dictionary containing both the output string and some other metadata. I suggest you also call the `summarize()` method of the model you are using directly and check whether the return value is actually the chat output string of the LLM.
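A minimal, self-contained sketch of that failure mode and one way around it. It assumes raptor's `summarize(context, max_tokens)` interface; the class names here are hypothetical stand-ins, not part of the library:

```python
class CustomModel:
    """Hypothetical stand-in for a custom model whose summarize() returns
    a dict with metadata instead of a plain string (the bug)."""

    def summarize(self, context, max_tokens=150):
        return {"output": "A short summary.", "usage": {"tokens": 5}}


class StringOutputWrapper:
    """Unwraps the dict so downstream code (e.g. tiktoken's encode())
    always receives a plain string."""

    def __init__(self, model):
        self.model = model

    def summarize(self, context, max_tokens=150):
        result = self.model.summarize(context, max_tokens=max_tokens)
        if isinstance(result, dict):
            # Extract the actual chat output; the key name depends on
            # your model and is assumed here.
            return result["output"]
        return result


model = StringOutputWrapper(CustomModel())
summary = model.summarize("Once upon a time ...")
assert isinstance(summary, str)  # now safe to pass to tiktoken
```

Passing the wrapped model as the `summarization_model` keeps the rest of the pipeline unchanged, since every caller still sees a plain string.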