# RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

In [None]:
# NOTE: An OpenAI API key must be set here for application initialization, even if not in use.
# If you're not utilizing OpenAI models, assign a placeholder string (e.g., "not_used").
import os
os.environ["OPENAI_API_KEY"] = "not_used"
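
Assigning the placeholder unconditionally overwrites any real key that is already exported in the shell. A minimal alternative, using only the standard library, sets the placeholder only when the variable is missing:

```python
import os

# Set a placeholder only if no key is present, so a real key
# exported in the environment is never overwritten.
os.environ.setdefault("OPENAI_API_KEY", "not_used")
print("OPENAI_API_KEY set:", "OPENAI_API_KEY" in os.environ)
```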

In [None]:
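# Load the source text (a Korean document on Park Chung-hee) and split it into paragraphs on blank lines.
# encoding='utf-8' is set explicitly so the Korean text decodes the same way on every platform (assumes a UTF-8 file).
with open('/home/ubuntu/workspace/juni/2024-flagship-llm/data/raw/박정희.txt', 'r', encoding='utf-8') as file:
    document = file.read().split("\n\n")
# print(document[:2])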
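
Note that `str.split("\n\n")` keeps empty strings when the file contains runs of blank lines or ends with a blank line, and each of those would become an empty chunk. A small stdlib-only helper (hypothetical, not part of RAPTOR) that strips each paragraph and drops the empty ones:

```python
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split text on blank lines, dropping empty or whitespace-only pieces."""
    # A raw split("\n\n") leaves "" entries for extra blank lines and a trailing
    # blank line; strip each piece and keep only the non-empty ones.
    return [p.strip() for p in text.split("\n\n") if p.strip()]


sample = "First paragraph.\n\n\n\nSecond paragraph.\n\n"
print(sample.split("\n\n"))      # raw split keeps empty strings
print(split_paragraphs(sample))  # ['First paragraph.', 'Second paragraph.']
```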

1) **Building**: RAPTOR recursively embeds, clusters, and summarizes chunks of text to construct a tree with varying levels of summarization from the bottom up. You can build a tree from the loaded text with `RA.add_documents(document)`.

2) **Querying**: At inference time, RAPTOR retrieves information from this tree, integrating content across lengthy documents at different levels of abstraction. Query the tree with `RA.answer_question`.
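
The bottom-up building loop described above can be sketched in plain Python. This is a toy illustration of the control flow, not RAPTOR's implementation: the RAPTOR paper uses a neural embedding model, UMAP dimensionality reduction with Gaussian-mixture (soft) clustering, and an LLM summarizer, whereas the hypothetical `embed`, `cluster`, and `summarize` helpers below swap in word counts, a greedy cosine-threshold clustering, and word truncation so the sketch runs without any model.

```python
import math
from collections import Counter
from typing import Dict, List

# Toy stand-ins (hypothetical): real RAPTOR uses an embedding model,
# UMAP + Gaussian mixture clustering, and an LLM summarizer.


def embed(text: str) -> Counter:
    """Bag-of-words 'embedding': lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(texts: List[str], threshold: float) -> List[List[int]]:
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters: List[List[int]] = []
    embs = [embed(t) for t in texts]
    for i, e in enumerate(embs):
        for c in clusters:
            if cosine(embs[c[0]], e) >= threshold:
                c.append(i)
                break
        else:  # no cluster was similar enough: start a new one
            clusters.append([i])
    return clusters


def summarize(texts: List[str], max_words: int = 12) -> str:
    """Toy 'summary': the first max_words words of the concatenated cluster."""
    return " ".join(" ".join(texts).split()[:max_words])


def build_tree(chunks: List[str], threshold: float = 0.2,
               max_layers: int = 5) -> Dict[int, List[str]]:
    """Return {layer: node texts}; layer 0 holds the original chunks."""
    layers = {0: list(chunks)}
    current = list(chunks)
    for layer in range(1, max_layers + 1):
        groups = cluster(current, threshold)
        if len(groups) >= len(current):  # clustering no longer reduces nodes
            break
        current = [summarize([current[i] for i in g]) for g in groups]
        layers[layer] = current
        if len(current) == 1:  # reached a single root summary
            break
    return layers


chunks = [
    "the economy grew under state-led planning",
    "state-led planning directed credit to exporters",
    "the constitution was amended in 1972",
    "the 1972 constitution expanded executive power",
]
tree = build_tree(chunks)
for layer, nodes in tree.items():
    print(layer, nodes)
```

Here the four leaves form two clusters (economy and constitution), each summarized into one layer-1 node; the two summaries are too dissimilar to merge, so recursion stops at layer 1.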

## Building the tree

In [None]:
from raptor import RetrievalAugmentation, RetrievalAugmentationConfig
from raptor.SummarizationModels import ClovaSummarizationModel
from raptor.QAModels import ClovaQAModel
from raptor.EmbeddingModels import BGEM3EmbeddingModel

# Initialize your custom models
custom_summarizer = ClovaSummarizationModel()
custom_qa = ClovaQAModel()
custom_embedding = BGEM3EmbeddingModel()

# Create a config with your custom models
custom_config = RetrievalAugmentationConfig(
    summarization_model=custom_summarizer,
    qa_model=custom_qa,
    embedding_model=custom_embedding
)
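
Before calling the Clova and BGE-M3 models, it can help to smoke-test the pipeline with cheap offline stand-ins. The upstream RAPTOR README describes custom models as subclasses of `BaseSummarizationModel`, `BaseQAModel`, and `BaseEmbeddingModel` implementing `summarize(context, max_tokens)`, `answer_question(context, question)`, and `create_embedding(text)`; whether this fork's Clova and BGE-M3 classes follow the same interface is an assumption. To keep the sketch runnable without `raptor` installed, it defines local abstract bases with those method names, and the stub classes (`EchoSummarizer`, `EchoQA`, `HashEmbedding`) are hypothetical.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


# Local stand-ins mirroring the method names of raptor's base classes (assumption).
class BaseSummarizationModel(ABC):
    @abstractmethod
    def summarize(self, context: str, max_tokens: int = 150) -> str: ...


class BaseQAModel(ABC):
    @abstractmethod
    def answer_question(self, context: str, question: str) -> str: ...


class BaseEmbeddingModel(ABC):
    @abstractmethod
    def create_embedding(self, text: str) -> List[float]: ...


class EchoSummarizer(BaseSummarizationModel):
    """Hypothetical stub: keeps the first max_tokens words (words stand in for tokens)."""
    def summarize(self, context: str, max_tokens: int = 150) -> str:
        return " ".join(context.split()[:max_tokens])


class EchoQA(BaseQAModel):
    """Hypothetical stub: echoes the question and retrieved context instead of answering."""
    def answer_question(self, context: str, question: str) -> str:
        return f"Q: {question}\nContext: {context[:200]}"


class HashEmbedding(BaseEmbeddingModel):
    """Hypothetical stub: hashed bag-of-words vector, L2-normalized."""
    def __init__(self, dim: int = 64):
        self.dim = dim

    def create_embedding(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for word in text.split():
            # md5 gives a stable bucket across runs (unlike the salted built-in hash()).
            bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % self.dim
            vec[bucket] += 1.0
        norm = sum(v * v for v in vec) ** 0.5
        return [v / norm for v in vec] if norm else vec
```

In the notebook, subclassing the `raptor` base classes instead of these local ones would let the stubs be passed as `summarization_model`, `qa_model`, and `embedding_model` to `RetrievalAugmentationConfig` for a fast offline dry run; a tree built this way is not compatible with BGE-M3 queries.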

In [None]:
# Initialize RAPTOR with your custom config
RA = RetrievalAugmentation(config=custom_config)

In [None]:
# Construct the tree
RA.add_documents(document, use_multithreading=True)
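
`use_multithreading=True` is passed to this fork's `add_documents`; upstream RAPTOR's tree builder exposes the same flag on `build_from_text`, where it creates leaf nodes in a thread pool. Because embedding calls mostly wait on the network or GPU, threads can overlap them. A minimal sketch of the idea with `concurrent.futures.ThreadPoolExecutor`, where the hypothetical `fake_embed` stands in for a model call and results are keyed by chunk index so leaf order is preserved:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a slow embedding call."""
    time.sleep(0.01)  # simulate I/O latency
    return [float(len(text)), float(text.count(" "))]


def embed_leaves(chunks: List[str], max_workers: int = 8) -> Dict[int, List[float]]:
    """Embed chunks concurrently; keys are chunk indices, so order is recoverable."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {i: pool.submit(fake_embed, chunk) for i, chunk in enumerate(chunks)}
        return {i: fut.result() for i, fut in futures.items()}


leaves = embed_leaves(["first paragraph", "second one here", "third"])
print(leaves)  # {0: [15.0, 1.0], 1: [15.0, 2.0], 2: [5.0, 0.0]}
```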

## Querying from the tree

### Without RAPTOR (Only Layer 0 Nodes)

In [None]:
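# "Explain the 'political-economic statism' revealed in the state-led economic development pursued under the Park Chung-hee regime."
question = "박정희 정권에서 추진된 국가주도의 경제발전에서 드러난 '정치경제적 국가주의'에 대해서 설명해줘"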
answer = RA.answer_question(question=question, start_layer=0, num_layers=1, collapse_tree=False, reranking=False)
print(answer)
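
With `start_layer=0, num_layers=1, collapse_tree=False`, retrieval is restricted to the layer-0 leaf chunks, which amounts to a plain chunk-level retrieval baseline. Its core is ranking chunks by cosine similarity to the query embedding and keeping the top k. A stdlib-only sketch, with hand-written 3-d vectors standing in for BGE-M3 embeddings (the vectors, chunk labels, and `k` are illustrative assumptions):

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_leaves(query_vec: Sequence[float],
                 leaves: List[Tuple[str, Sequence[float]]],
                 k: int = 2) -> List[str]:
    """Return the texts of the k leaf chunks most similar to the query."""
    ranked = sorted(leaves, key=lambda leaf: cosine_similarity(query_vec, leaf[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


# Illustrative vectors; a real run would use BGE-M3 embeddings of the chunks.
leaves = [
    ("chunk about export-led growth", [0.9, 0.1, 0.0]),
    ("chunk about the 1972 constitution", [0.0, 0.2, 0.9]),
    ("chunk about state-directed credit", [0.8, 0.3, 0.1]),
]
print(top_k_leaves([1.0, 0.2, 0.0], leaves, k=2))
# ['chunk about export-led growth', 'chunk about state-directed credit']
```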

### With RAPTOR

In [None]:
answer = RA.answer_question(question=question, reranking=False)
print(answer)
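
Called without layer arguments, `answer_question` falls back to its defaults; in upstream RAPTOR the default is `collapse_tree=True`, which flattens every layer (leaves and summaries) into one candidate pool, ranks all nodes by similarity to the query, and adds them in that order until a token budget (`max_tokens`) would be exceeded. This lets a broad question draw on high-level summaries alongside specific leaf chunks. A minimal sketch of that selection step, assuming this fork keeps the same default; token counts are approximated by whitespace-separated words, and the node texts and scores are illustrative:

```python
from typing import List, Tuple


def collapsed_tree_select(scored_nodes: List[Tuple[str, int, float]],
                          max_tokens: int,
                          top_k: int = 10) -> List[str]:
    """Pick nodes from all layers by descending score within a token budget.

    scored_nodes: (text, layer, similarity-to-query) for every node in the tree.
    Token counts are approximated by word counts (an assumption).
    """
    ranked = sorted(scored_nodes, key=lambda n: n[2], reverse=True)[:top_k]
    selected, used = [], 0
    for text, _layer, _score in ranked:
        cost = len(text.split())
        if used + cost > max_tokens:
            break  # stop at the first node that would overflow the budget
        selected.append(text)
        used += cost
    return selected


nodes = [
    ("leaf: heavy and chemical industry drive", 0, 0.71),
    ("summary: state-led industrial policy and its political justification", 1, 0.88),
    ("leaf: unrelated chunk about culture", 0, 0.12),
    ("root: overview of the regime's political economy", 2, 0.80),
]
print(collapsed_tree_select(nodes, max_tokens=20))
# ['summary: state-led industrial policy and its political justification',
#  "root: overview of the regime's political economy"]
```

The layer-1 summary and the layer-2 root outrank the best leaf, so they fill the budget first; the leaf would push the total to 21 words and is left out.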

## Saving the tree

In [None]:
# Save the tree by calling RA.save("path/to/save")
SAVE_PATH = "demo/pjh-bge"
RA.save(SAVE_PATH)
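
`SAVE_PATH` is used as a file path, so writing it fails if the `demo/` directory does not exist yet. Upstream RAPTOR's `save` pickles the tree object to that path; assuming this fork behaves the same, a defensive sketch creates the parent directory first. The `save_tree` helper and the dict standing in for the tree are hypothetical:

```python
import os
import pickle
import tempfile


def save_tree(tree: object, path: str) -> None:
    """Pickle `tree` to `path`, creating missing parent directories first."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(tree, f)


# Demo in a temporary directory with a dict standing in for a RAPTOR tree.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo", "pjh-bge")
save_tree({"layers": {0: ["chunk a", "chunk b"], 1: ["summary"]}}, demo_path)
print(os.path.exists(demo_path))  # True
```

In the notebook itself, the equivalent guard is `os.makedirs(os.path.dirname(SAVE_PATH), exist_ok=True)` before `RA.save(SAVE_PATH)`.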

## Loading the saved tree

In [None]:
# Load the tree back by passing its path to RetrievalAugmentation.
# Reuse the same config so queries are embedded with the model that built the tree (BGE-M3).
SAVE_PATH = "demo/pjh-bge"
RA = RetrievalAugmentation(tree=SAVE_PATH, config=custom_config)
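
Upstream RAPTOR loads a tree given as a path by unpickling the file, which surfaces a bare `FileNotFoundError` if the path is wrong. A hedged sketch of a loader (`load_tree`, hypothetical) that checks the path first with a clearer message; note that `pickle.load` can execute arbitrary code, so only load tree files you created yourself:

```python
import os
import pickle
import tempfile


def load_tree(path: str) -> object:
    """Unpickle a saved tree, with a clearer error if the file is missing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No saved tree at {path!r}; run the saving cell first.")
    # Only unpickle trusted files: pickle can run arbitrary code on load.
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip demo with a dict standing in for a RAPTOR tree.
path = os.path.join(tempfile.mkdtemp(), "tree.pkl")
with open(path, "wb") as f:
    pickle.dump({"layers": {0: ["chunk"]}}, f)
loaded = load_tree(path)
print(loaded)  # {'layers': {0: ['chunk']}}
```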