TypeError: expected string or buffer #43

Open
LeonMing30 opened this issue May 31, 2024 · 3 comments

LeonMing30 commented May 31, 2024

I tried to run the demo code for testing, but got the following error:

```python
from raptor import RetrievalAugmentation

RA = RetrievalAugmentation()

with open('demo/sample.txt', 'r') as file:
    text = file.read()
RA.add_documents(text)
question = "How did Cinderella reach her happy ending?"
answer = RA.answer_question(question=question)
print("Answer: ", answer)
```
```
Traceback (most recent call last):
  File "D:\Code\Python\20240531\RAPTOR\raptor\demotest.py", line 13, in <module>
    RA.add_documents(text)
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\RetrievalAugmentation.py", line 219, in add_documents
    self.tree = self.tree_builder.build_from_text(text=docs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\tree_builder.py", line 291, in build_from_text
    root_nodes = self.construct_tree(all_nodes, all_nodes, layer_to_nodes)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\cluster_tree_builder.py", line 130, in construct_tree
    process_cluster(
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\cluster_tree_builder.py", line 77, in process_cluster
    f"Node Texts Length: {len(self.tokenizer.encode(node_texts))}, Summarized Text Length: {len(self.tokenizer.encode(summarized_text))}"
                                                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Code\Python\20240531\RAPTOR\venv\Lib\site-packages\tiktoken\core.py", line 116, in encode
    if match := _special_token_regex(disallowed_special).search(text):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or buffer
```

How can I fix it?

parthsarthi03 (Owner) commented

Hey! I'm not able to reproduce this bug. Can you print `text` before calling `RA.add_documents()`, and also print `RA.tree_builder.summarization_model`, to make sure both are set correctly?
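A minimal sketch of the two checks suggested above, written as a plain helper that can be run just before `RA.add_documents(text)`. The function name `find_input_problems` and the sample sentence are hypothetical. The only assumption about the model is that it exposes a `summarize(context, max_tokens=...)` method returning the summary; check the summarization model class in your RAPTOR version if it differs.

```python
from typing import Any, List


def find_input_problems(text: Any, model: Any) -> List[str]:
    """Return human-readable problems; an empty list means both checks passed.

    Hypothetical helper. Assumes `model.summarize(context, max_tokens=...)`
    exists and is supposed to return a plain str.
    """
    problems: List[str] = []

    # 1. The document text passed to add_documents() should be a non-empty str.
    if not isinstance(text, str):
        problems.append(f"text is {type(text).__name__}, expected str")
    elif not text.strip():
        problems.append("text is empty or whitespace only")

    # 2. The summary is later passed to tokenizer.encode() (see the traceback),
    #    which fails on non-str input, so check its type on a tiny sample.
    try:
        summary = model.summarize("Cinderella went to the royal ball.", max_tokens=50)
    except Exception as exc:  # surface API/auth errors instead of hiding them
        problems.append(f"summarize() raised {exc!r}")
    else:
        if not isinstance(summary, str):
            problems.append(
                f"summarize() returned {type(summary).__name__}, expected str"
            )
    return problems
```

In the demo script this would be called as `print(find_input_problems(text, RA.tree_builder.summarization_model))` right before `RA.add_documents(text)`.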

theta-lin commented

@LeonMing30 Hi, I ran into the same issue before I realized the mistake was on my side: I was using a custom summarization model whose output was not a plain string but a dictionary containing the output string along with some other metadata. So I'd also suggest calling the `summarize()` method of the model you are using and checking that its return value is actually the LLM's chat output as a string.
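If the custom model does return a dictionary, one possible fix, sketched here under assumptions, is to wrap it so that `summarize()` always hands back a plain string. Per the traceback, the summary goes straight into `self.tokenizer.encode(...)`, and tiktoken's `encode` raises this `TypeError` when given a dictionary. The class name `StringOutputSummarizer` and the default key `"text"` are hypothetical: use whatever key your model actually stores the chat output under. In real use the wrapper should probably also subclass RAPTOR's summarization base class (assumed here to be `BaseSummarizationModel`; check your version) so the config accepts it.

```python
from typing import Any


class StringOutputSummarizer:
    """Adapter that guarantees summarize() returns a plain str.

    Hypothetical wrapper: `inner` is any object with a
    summarize(context, max_tokens=...) method whose result is either a str
    or a dict holding the text under `key`.
    """

    def __init__(self, inner: Any, key: str = "text") -> None:
        self.inner = inner
        self.key = key

    def summarize(self, context: str, max_tokens: int = 150) -> str:
        result = self.inner.summarize(context, max_tokens=max_tokens)
        if isinstance(result, str):
            return result  # already the shape the tokenizer expects
        if isinstance(result, dict) and isinstance(result.get(self.key), str):
            return result[self.key]  # drop the metadata, keep only the chat output
        # Fail early with a clear message instead of deep inside tiktoken.
        raise TypeError(
            f"summarize() returned {type(result).__name__}; "
            f"expected str or dict with a str under {self.key!r}"
        )
```

The wrapper would then be passed wherever the original model was, e.g. `RetrievalAugmentationConfig(summarization_model=StringOutputSummarizer(my_model))`, assuming your RAPTOR version's config takes a `summarization_model` argument as the README's custom-model example does (`my_model` stands for your own model).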

yyyf-g commented Jul 14, 2024

I encountered the same problem.
