TypeError: expected string or buffer #43

LeonMing30 · 2024-05-31T09:43:46Z

I tried to run the demo code for testing, but I got the error below.

```python
from raptor import RetrievalAugmentation

RA = RetrievalAugmentation()

with open('demo/sample.txt', 'r') as file:
    text = file.read()
RA.add_documents(text)
question = "How did Cinderella reach her happy ending?"
answer = RA.answer_question(question=question)
print("Answer: ", answer)
```

Traceback (most recent call last):
  File "D:\Code\Python\20240531\RAPTOR\raptor\demotest.py", line 13, in <module>
    RA.add_documents(text)
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\RetrievalAugmentation.py", line 219, in add_documents
    self.tree = self.tree_builder.build_from_text(text=docs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\tree_builder.py", line 291, in build_from_text
    root_nodes = self.construct_tree(all_nodes, all_nodes, layer_to_nodes)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\cluster_tree_builder.py", line 130, in construct_tree
    process_cluster(
  File "D:\Code\Python\20240531\RAPTOR\raptor\raptor\cluster_tree_builder.py", line 77, in process_cluster
    f"Node Texts Length: {len(self.tokenizer.encode(node_texts))}, Summarized Text Length: {len(self.tokenizer.encode(summarized_text))}"
                                                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Code\Python\20240531\RAPTOR\venv\Lib\site-packages\tiktoken\core.py", line 116, in encode
    if match := _special_token_regex(disallowed_special).search(text):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or buffer

How can I fix it?

parthsarthi03 · 2024-05-31T18:51:15Z

Hey! I am not able to reproduce the above bug. Can you print out the text before calling RA.add_documents(), and also print out RA.tree_builder.summarization_model, to make sure that the text and the models are set correctly?
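
A minimal sketch of the checks requested above, in case it helps. The helper name `describe_inputs` is hypothetical and not part of RAPTOR; it only reports the type and size of the document text and the class of whatever summarization model is configured.

```python
def describe_inputs(text, summarization_model):
    """Return a short report on the document text and the summarization model.

    Hypothetical helper for debugging; it is not part of the raptor package.
    """
    is_str = isinstance(text, str)
    return {
        "text_type": type(text).__name__,
        "text_length": len(text) if is_str else None,
        "text_preview": text[:80] if is_str else None,
        "model_class": type(summarization_model).__name__,
    }


# In the script from the report this would be called just before
# RA.add_documents(text), for example:
#   print(describe_inputs(text, RA.tree_builder.summarization_model))
```

If `text_type` is not `str`, or `text_length` is 0, the problem is likely in how the file is read; otherwise the summarization model is the next thing to look at.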

theta-lin · 2024-06-20T17:23:21Z

@LeonMing30 Hi, I encountered the same issue before I realized that the mistake was on my side. I used a custom summarization model whose output is not a plain string but a dictionary containing both the output string and some other metadata. The traceback fails at self.tokenizer.encode(summarized_text), which matches a summary that is not a string. I therefore suggest calling the summarize() method of the model you are using and checking whether the return value is actually the chat output of the LLM as a string.
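
A rough sketch of one way to guard against this, assuming the custom model exposes `summarize(context, max_tokens)` and returns either a string or a dictionary. The classes `StringSummarizer` and `DictSummarizer` and the `"text"` key are all hypothetical stand-ins, not RAPTOR code; adjust the key to whatever your model actually returns.

```python
class StringSummarizer:
    """Wrap a summarizer so that summarize() always returns a plain str."""

    def __init__(self, inner, text_key="text"):
        self.inner = inner
        self.text_key = text_key  # assumed name of the dict field holding the summary

    def summarize(self, context, max_tokens=150):
        result = self.inner.summarize(context, max_tokens)
        if isinstance(result, str):
            return result
        # Unwrap {"text": "...", <metadata>} style outputs.
        if isinstance(result, dict) and isinstance(result.get(self.text_key), str):
            return result[self.text_key]
        # Fail early with a clear message instead of deep inside the tokenizer.
        raise TypeError(
            f"summarize() returned {type(result).__name__}, expected str"
        )


class DictSummarizer:
    """Stand-in for a custom model that returns text plus metadata."""

    def summarize(self, context, max_tokens=150):
        return {"text": context[:max_tokens], "tokens_used": 42}


wrapped = StringSummarizer(DictSummarizer())
summary = wrapped.summarize("Cinderella went to the ball.")
print(type(summary).__name__, summary)
```

Calling `summarize()` directly on your own model and checking `isinstance(result, str)` is the quickest way to confirm whether this is the cause.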

yyyf-g · 2024-07-14T06:35:07Z

I encountered the same problem.
