# RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

This notebook shows how to use an implementation of RAPTOR with llama-index, leveraging the RAPTOR llama-pack.

RAPTOR recursively clusters text chunks and summarizes each cluster, building a tree of summary layers that is then used for retrieval.
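
To make the layering concrete, here is a minimal toy sketch of the recursive build. It is not the pack's implementation: RAPTOR clusters on embeddings and summarizes with an LLM, whereas this sketch simply groups neighbouring nodes and joins their text. The names `build_tree` and `join_summary` are hypothetical helpers introduced only for illustration.

```python
from typing import Callable, List


def build_tree(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    cluster_size: int = 2,
) -> List[List[str]]:
    """Return the layers of a toy RAPTOR-style tree, leaves first."""
    if cluster_size < 2:
        raise ValueError("cluster_size must be at least 2, or the tree never shrinks")
    layers = [list(chunks)]
    # Keep adding layers until one of them collapses to a single root node.
    while len(layers[-1]) > 1:
        current = layers[-1]
        # Stand-in for clustering (assumption): group neighbouring nodes.
        clusters = [
            current[i : i + cluster_size]
            for i in range(0, len(current), cluster_size)
        ]
        # Each cluster becomes one summary node in the next layer up.
        layers.append([summarize(cluster) for cluster in clusters])
    return layers


def join_summary(texts: List[str]) -> str:
    # Hypothetical summarizer: a real pipeline would call an LLM here.
    return " + ".join(texts)


layers = build_tree(["a", "b", "c", "d", "e"], join_summary)
for level, nodes in enumerate(layers):
    print(level, nodes)
```

Every layer, not just the root, stays available for retrieval, which is what the two modes below operate on.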

There are two retrieval modes:
- `tree_traversal` -- traverse the tree of clusters, performing top-k at each level of the tree.
- `collapsed` -- treat the entire tree as one flat pool of nodes and perform a simple top-k over it.
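
The difference between the two modes can be sketched on a toy tree. This is an illustration under stated assumptions, not the pack's code: similarity scores are hard-coded instead of computed from embeddings, the `Node` class and both functions are hypothetical, and the traversal returns only the nodes selected at the last level (the paper describes also keeping the nodes selected at higher levels).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    node_id: str
    score: float  # pretend similarity to the query, hard-coded for the toy
    children: List[str] = field(default_factory=list)


def top_k(nodes: List[Node], k: int) -> List[Node]:
    return sorted(nodes, key=lambda n: n.score, reverse=True)[:k]


def collapsed(tree: Dict[str, Node], k: int) -> List[str]:
    # Ignore the hierarchy: rank every node, summaries and leaves alike.
    return [n.node_id for n in top_k(list(tree.values()), k)]


def tree_traversal(tree: Dict[str, Node], roots: List[str], k: int) -> List[str]:
    # Start at the top layer and keep only the k best nodes per level.
    selected = top_k([tree[r] for r in roots], k)
    while any(n.children for n in selected):
        # Only children of the currently selected nodes remain candidates.
        candidates = [tree[c] for n in selected for c in n.children]
        selected = top_k(candidates, k)
    return [n.node_id for n in selected]


tree = {
    "root": Node("root", 0.5, ["s1", "s2"]),
    "s1": Node("s1", 0.9, ["l1", "l2"]),
    "s2": Node("s2", 0.2, ["l3", "l4"]),
    "l1": Node("l1", 0.3),
    "l2": Node("l2", 0.6),
    "l3": Node("l3", 0.95),
    "l4": Node("l4", 0.1),
}

print(collapsed(tree, k=2))
print(tree_traversal(tree, roots=["root"], k=1))
```

With `k=1` the traversal never reaches `l3`, the best-scoring leaf, because its parent `s2` scored poorly; the collapsed mode finds it because it ranks all nodes together.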

See [the paper](https://arxiv.org/abs/2401.18059) for full algorithm details.

## Setup

In [30]:
!pip install llama-index llama-index-packs-raptor llama-index-vector-stores-qdrant chromadb llama-index-vector-stores-chroma



In [5]:
from llama_index.packs.raptor import RaptorPack

# optionally download the pack to inspect/modify it yourself!
# from llama_index.core.llama_pack import download_llama_pack
# RaptorPack = download_llama_pack("RaptorPack", "./raptor_pack")



In [1]:
import os

# Set your OpenAI API key before running. It is left blank here so that no secret is committed.
# os.environ["OPENAI_API_KEY"] = ""

## Constructing the Clusters/Hierarchy Tree

In [2]:
import nest_asyncio

nest_asyncio.apply()

In [3]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader(input_files=["./Form Master Services Agreement (Outsourcing).DOCX"]).load_data()

In [6]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore # type: ignore
import chromadb

client = chromadb.PersistentClient(path="./raptor_paper_db")
collection = client.get_or_create_collection("raptor")

vector_store = ChromaVectorStore(chroma_collection=collection)

raptor_pack = RaptorPack(
    documents,
    embed_model=OpenAIEmbedding(
        model="text-embedding-3-small"
    ),  # used for embedding clusters
    llm=OpenAI(model="gpt-4", temperature=0.1),  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=4,  # top k for each layer, or overall top-k for collapsed
    mode="collapsed",  # sets default mode
    transformations=[
        SentenceSplitter(chunk_size=400, chunk_overlap=50)
    ],  # transformations applied for ingestion
)

Generating embeddings for level 0.
Performing clustering for level 0.



## Retrieval

In [None]:
nodes = raptor_pack.run("What are the intellectual property rights of the vendor?", mode="collapsed")
print(len(nodes))
print(nodes[0].text)

2
The agreement outlines the terms related to changes in charges, intellectual property rights, ownership of work products, disclosure of inventions, and software provisions between the Provider and the Client. It specifies that any changes in charges must be agreed upon and adjusted accordingly. The agreement also addresses the ownership of software, work products, and modifications, with the Client retaining all rights and interests. It further discusses the use of Provider Proprietary Materials and the disclosure of inventions made during the agreement. Additionally, it covers the installation of upgrades and modifications to software, as well as the granting of rights and licenses related to Provider Patents within specific industries.



In [35]:
nodes = raptor_pack.run(
    "What are the intellectual property rights of the vendor?", mode="tree_traversal"
)
print(len(nodes))
print(nodes[0].text)

Retrieved parent IDs from level 2: ['0f0c386b-12f5-460c-9fb2-5d65b0024c7c']
Retrieved 2 from parents at level 2.
Retrieved parent IDs from level 1: ['750e2166-637d-4a6a-8b6c-86b69ba6ad97', 'd6ead990-a89c-420d-b1bb-8108516f4b8f']
Retrieved 4 from parents at level 1.
Retrieved parent IDs from level 0: ['a9cc1f53-70ae-4e05-bfb5-e409ccd914f6', '637bef95-1443-4125-8ae0-4a7793896c0a']
Retrieved 4 from parents at level 0.
4
The territorial extent of the rights in the Work Product assigned to [XXXXXXXX] by Provider and/or the Provider Personnel under this Agreement shall extend to all the countries in the world.  The assignment of the Intellectual Property Rights in the Work Product by Provider and/or the Provider Personnel to [XXXXXXXX] shall be royalty-free absolute, irrevocable and perpetual.  

With respect to any Services performed in India, the Parties agree that, without limitation of any other [XXXXXXXX] rights or remedies under the Agreement, the following provisions shall apply: (i) 

## Multiquery

Since the tree was saved to a vector store, it can be reused without rebuilding it. (For local vector stores, the retriever also provides `persist` and `from_persist_dir` methods.)

In [19]:
import asyncio
from typing import Any, List  # Any is used by a later cell

from tqdm.asyncio import tqdm

from llama_index.core import PromptTemplate, QueryBundle
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.packs.raptor import RaptorRetriever


class MultiQueriesRetriever(BaseRetriever):
    """Rewrites a question into several variants and retrieves with each of them."""

    def __init__(self, base_retriever: BaseRetriever, model: OpenAI):
        super().__init__()
        self.template = PromptTemplate("""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from a vector
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search.
    Provide these alternative questions separated by newlines.
    Original question: {question}""")
        self.base_retriever = base_retriever
        self.model = model

    def gen_queries(self, query: str) -> List[str]:
        # Ask the LLM for alternative phrasings, one per line.
        prompt = self.template.format(question=query)
        res = self.model.complete(prompt)
        # Drop blank lines so that empty strings are never sent to the retriever.
        return [line.strip() for line in res.text.split("\n") if line.strip()]

    async def run_gen_queries(self, generated_queries: List[str]) -> List[NodeWithScore]:
        # Retrieve for every generated query concurrently.
        tasks = [self.base_retriever.aretrieve(q) for q in generated_queries]
        results = await tqdm.gather(*tasks)
        # Merge the per-query results, keeping the best-scoring copy of each node.
        best = {}
        for nodes in results:
            for n in nodes:
                node_id = n.node.node_id
                if node_id not in best or (n.score or 0.0) > (best[node_id].score or 0.0):
                    best[node_id] = n
        return sorted(best.values(), key=lambda n: n.score or 0.0, reverse=True)

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Synchronous entry point; inside a notebook this relies on nest_asyncio.apply().
        return asyncio.run(self._aretrieve(query_bundle))

    async def _aretrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        query = query_bundle.query_str
        generated_queries = self.gen_queries(query)
        return await self.run_gen_queries(generated_queries)


retriever = RaptorRetriever(
    [],  # no new documents: reuse the tree already stored in the vector store
    embed_model=OpenAIEmbedding(
        model="text-embedding-3-small"
    ),  # used for embedding clusters
    llm=OpenAI(model="gpt-4", temperature=0.1),  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=8,  # top k for each layer, or overall top-k for collapsed
    mode="collapsed",  # sets default mode
)
mr = MultiQueriesRetriever(retriever, OpenAI(model="gpt-4", temperature=0.1))


In [None]:
# if using a default vector store
# retriever.persist("./persist")
# retriever = RaptorRetriever.from_persist_dir("./persist", ...)
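
When one question is fanned out into several query variants, the per-query hits need to be combined into a single ranked list. The following is a minimal sketch of one possible merge rule (keep each node once, with its best score). It uses plain `(node_id, score)` tuples rather than llama-index objects; `Hit` and `merge_hits` are hypothetical names, and other rules such as reciprocal rank fusion would be equally valid choices.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (node_id, similarity score)


def merge_hits(per_query_hits: List[List[Hit]], top_k: int) -> List[Hit]:
    best: Dict[str, float] = {}
    for hits in per_query_hits:
        for node_id, score in hits:
            # A node found by several query variants keeps its best score.
            if score > best.get(node_id, float("-inf")):
                best[node_id] = score
    # Rank the de-duplicated nodes by score, highest first.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


hits_a = [("n1", 0.82), ("n2", 0.75)]
hits_b = [("n2", 0.91), ("n3", 0.40)]
merged = merge_hits([hits_a, hits_b], top_k=2)
print(merged)
```

Scores from different query variants are only roughly comparable, so a rank-based rule may be preferable when the variants differ a lot in wording.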

## Query Engine

In [37]:
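from typing import Any, Optional

from rich.pretty import pprint


def pretty_print(title: Optional[str] = None, content: Any = None) -> None:
    # Print the content, preceded by a title when one is given.
    if title is None:
        print(content)
        return
    print(title)
    pprint(content)


# Generate the query variants, then retrieve with each of them.
ls = mr.gen_queries("Explain the responsibilities of the vendor for compliance.")
pretty_print("ls", ls)

rls = await mr.run_gen_queries(ls)
pretty_print("rls", rls)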


ls


  0%|          | 0/5 [00:00<?, ?it/s]



100%|██████████| 5/5 [00:00<00:00,  7.73it/s]

rls





In [11]:
from llama_index.core.query_engine import RetrieverQueryEngine

# query_engine = RetrieverQueryEngine.from_args(
#     mr, llm=OpenAI(model="gpt-4", temperature=0.1)
# )

In [38]:
query_text = "Explain the responsibilities of the vendor for compliance."
final_res = await RetrieverQueryEngine(mr).aquery(query_text)




  0%|          | 0/5 [00:00<?, ?it/s]



100%|██████████| 5/5 [00:00<00:00,  7.84it/s]




In [39]:
final_res.response

'The vendor is responsible for ensuring compliance with all applicable laws, including federal "anti-kickback" acts and the U.S. Foreign Corrupt Practices Act. They must promptly notify the client of any charges of noncompliance and remedy the situation at their own cost. The vendor is also accountable for any fines or penalties resulting from noncompliance. Additionally, the vendor must maintain all necessary controls, operations, and systems to enable the client to comply with its obligations.'

In [12]:
# response = query_engine.query("what are intellectual property rights of vendor?")



In [13]:
# print(str(response))

Empty Response
