# Retrieval

In baseline RAG, retrieval usually relies on plain vector search. As the field has developed, researchers have found several ways to enhance the retrieval step. This notebook covers the following retrieval methods:

- **Iterative Retrieval** - Use the reasoning capabilities of LLMs to run retrieval-generation cycles until the response passes an evaluation step.
- **Recursive-Retrieval** - Search over smaller documents, then use chunk references or metadata references to retrieve the full document.
- **Generator-Enhanced Retrieval** - Use an LLM to predict when and what to retrieve during generation, a process called [**F**orward-**L**ooking **A**ctive **RE**trieval augmented generation (FLARE)](https://arxiv.org/pdf/2305.06983).
- **GraphRAG** - Use existing or LLM-generated knowledge graphs to enhance retrieval by harnessing the relationships between entities.
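
For reference, the baseline that these methods build on can be sketched in a few lines of plain Python. This is only an illustration of top-k vector search by cosine similarity: the three-dimensional "embeddings", the document ids, and the helper names `cosine_similarity` and `top_k` are made up for the example and are not part of LlamaIndex.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (hypothetical values, chosen only for illustration).
toy_docs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
toy_query = [0.9, 0.1, 0.0]

baseline_hits = top_k(toy_query, toy_docs, k=2)
print(baseline_hits)
```

In a real pipeline the vectors come from an embedding model and the ranking is delegated to a vector store; the methods below change what is searched and how often, not this basic similarity step.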

### Setup libraries and environment

In [None]:
%pip install python-dotenv
%pip install llama-index==0.10.33
%pip install llama-index-llms-openai==0.1.16
%pip install llama-index-graph-stores-nebula==0.1.2
%pip install llama-index-readers-wikipedia==0.1.4
%pip install llama-index-readers-papers==0.1.4
%pip install llama-index-readers-web==0.1.12

In [None]:
import os
import copy
import json

from dotenv import load_dotenv
from util.helpers import (
    get_malazan_pages,
    create_and_save_md_files,
)

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding, OpenAIEmbeddingModelType
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    download_loader,
    KnowledgeGraphIndex,
    PromptTemplate,
)
from llama_index.core.query_engine import (
    FLAREInstructQueryEngine,
    RetrieverQueryEngine,
    RetryQueryEngine,
    RetryGuidelineQueryEngine,
)

from llama_index.readers.web import SimpleWebPageReader
from llama_index.readers.papers import ArxivReader
from llama_index.readers.wikipedia import WikipediaReader
from llama_index.graph_stores.nebula import NebulaGraphStore
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import IndexNode
from llama_index.core.extractors import (
    SummaryExtractor,
    QuestionsAnsweredExtractor,
)
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.evaluation import (
    RelevancyEvaluator,
    GuidelineEvaluator,
)

In [None]:
# NOTE: This is ONLY necessary in a Jupyter notebook.
# Details: Jupyter runs an event loop behind the scenes.
#          This results in nested event loops when we start an event loop to make async queries.
#          This is normally not allowed, so we use nest_asyncio to allow it for convenience.
import nest_asyncio
nest_asyncio.apply()

In [None]:
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
llm = OpenAI(api_key=OPENAI_API_KEY, model="gpt-4-turbo")
embed_model = OpenAIEmbedding(api_key=OPENAI_API_KEY, model=OpenAIEmbeddingModelType.TEXT_EMBED_3_SMALL)

Read papers from arXiv as documents

In [None]:
reader = ArxivReader()

papers = ["2404.10981", "2305.06983", "2312.10997"]
papers_dir = "./data/docs/arxiv"

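# Each call returns a (documents, abstracts) pair for one paper
arxiv_res = [
    reader.load_papers_and_abstracts(search_query=f"id:{paper}", max_results=1, papers_dir=papers_dir)
    for paper in papers
]
# Flatten the per-paper document lists into a single list
documents = [doc for docs, _ in arxiv_res for doc in docs]
# Keep the first abstract returned for each paper
abstracts = [a[0] for (_, a) in arxiv_res]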


In [None]:
papers_index = VectorStoreIndex.from_documents(documents=documents, show_progress=True)

## Iterative Retrieval
In **Iterative Retrieval**, the key part of the process is the evaluator or *judge* that helps self-correct the retrieval process. 

The pipeline performs the following steps:
1. Query the base query engine.
2. Use the evaluator to decide whether the response passes.
3. If the response passes, return it.
4. Otherwise, transform the original query and the evaluation result (query, response, and feedback) into a new query.
5. Repeat up to `max_retries` times.
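
The steps above can be sketched as a small loop in plain Python. This is a minimal illustration under stated assumptions, not how `RetryQueryEngine` is implemented: `retry_query`, `toy_answer`, and `toy_evaluate` are hypothetical names, and the "engine" and "judge" are simple stand-ins for LLM calls.

```python
def retry_query(query, answer_fn, evaluate_fn, max_retries=3):
    """Query, evaluate, and re-query with feedback until the evaluator passes."""
    current_query = query
    response = answer_fn(current_query)
    for _ in range(max_retries):
        passed, feedback = evaluate_fn(current_query, response)
        if passed:
            return response
        # Fold the failed response and the feedback into a new query.
        current_query = f"{query}\nPrevious response: {response}\nFeedback: {feedback}"
        response = answer_fn(current_query)
    # Retries exhausted: return the last response without further evaluation.
    return response


toy_calls = []  # records every query sent to the stand-in engine


def toy_answer(q):
    # Hypothetical stand-in for a query engine: it only uses bullets when asked.
    toy_calls.append(q)
    if "bullets" in q:
        return "- first\n- second"
    return "first, second"


def toy_evaluate(q, response):
    # Hypothetical stand-in for an LLM judge applying a single guideline.
    if response.startswith("- "):
        return True, ""
    return False, "List elements of the answer as bullets"


toy_result = retry_query("Name two items", toy_answer, toy_evaluate)
print(toy_result)
print(len(toy_calls), "queries sent")
```

Here the first response fails the guideline, the feedback is folded into the second query, and the second response passes, so the loop stops before reaching `max_retries`.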

There are different types of evaluators, and the right one depends on the use case. Sometimes it is useful to evaluate the answer itself, or the retrieved context. Other times you may need to write guidelines that the LLM uses to decide whether another retrieval step is needed.

In [None]:
## Prompt of RelevancyEvaluator
DEFAULT_EVAL_TEMPLATE = PromptTemplate(
    "Your task is to evaluate if the response for the query \
    is in line with the context information provided.\n"
    "You have two options to answer. Either YES/ NO.\n"
    "Answer - YES, if the response for the query \
    is in line with context information otherwise NO.\n"
    "Query and Response: \n {query_str}\n"
    "Context: \n {context_str}\n"
    "Answer: "
)

DEFAULT_REFINE_TEMPLATE = PromptTemplate(
    "We want to understand if the following query and response is"
    "in line with the context information: \n {query_str}\n"
    "We have provided an existing YES/NO answer: \n {existing_answer}\n"
    "We have the opportunity to refine the existing answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "If the existing answer was already YES, still answer YES. "
    "If the information is present in the new context, answer YES. "
    "Otherwise answer NO.\n"
)

In [None]:
# Judge whether the response is in line with the retrieved context; retry if it is not
evaluator = RelevancyEvaluator(llm=llm)
query_engine = RetryQueryEngine(query_engine=papers_index.as_query_engine(), evaluator=evaluator)


In [None]:
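# Alternative: judge the response against a written guideline.
# Running this cell replaces the query_engine defined in the previous cell.
evaluator = GuidelineEvaluator(llm=llm, guidelines="List elements of the answer as bullets")
query_engine = RetryGuidelineQueryEngine(query_engine=papers_index.as_query_engine(), resynthesize_query=True, guideline_evaluator=evaluator)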

In [None]:
query = "What are the phases of Advanced RAG?"
response = query_engine.query(str_or_query_bundle=query)


In [None]:
print(response.metadata)
print(str(response))


## Recursive-Retrieval

In **Recursive-Retrieval**, we search for relevant documents using references between smaller documents and their related larger documents. This narrows the search space for the first retrieval step, and the references are then followed recursively to expand the context.

We distinguish between two types of recursive-retrieval:
1. **Chunk-References** - References between smaller and larger documents.
2. **Metadata-References** - References between metadata like summaries or generated questions to documents.
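
The chunk-reference idea can be illustrated without any library. In the sketch below, small chunks are ranked against the query and the match is then swapped for the larger parent it points to. Everything here is hypothetical: the toy texts, the `parent_id` field, and the crude word-overlap score, which stands in for embedding similarity.

```python
def word_overlap(query, text):
    """Crude relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


# Larger parent documents, keyed by id (toy data).
toy_parents = {
    "parent-0": "Full section about apples. Apples grow on trees. Apples are harvested in autumn.",
    "parent-1": "Full section about boats. Boats float on water. Boats are moored in harbours.",
}

# Small chunks, each carrying a reference to its parent.
toy_chunks = [
    {"text": "Apples grow on trees.", "parent_id": "parent-0"},
    {"text": "Apples are harvested in autumn.", "parent_id": "parent-0"},
    {"text": "Boats float on water.", "parent_id": "parent-1"},
]


def retrieve_with_references(query, chunks, parents, top_k=1):
    """Rank the small chunks, then return the parents they reference."""
    ranked = sorted(chunks, key=lambda c: word_overlap(query, c["text"]), reverse=True)
    parent_ids = []
    for chunk in ranked[:top_k]:
        # Several chunks may share a parent; return each parent only once.
        if chunk["parent_id"] not in parent_ids:
            parent_ids.append(chunk["parent_id"])
    return [parents[pid] for pid in parent_ids]


toy_context = retrieve_with_references("where do boats float", toy_chunks, toy_parents)
print(toy_context)
```

The search runs over short, focused chunks, while the generator receives the full parent text as context.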

### Chunk References

In [None]:
reader = WikipediaReader()
wiki_documents = reader.load_data(pages=["Vincent Van Gogh"])
wiki_documents

In [None]:

node_parser = SentenceSplitter(chunk_size=1024)
base_nodes = node_parser.get_nodes_from_documents(documents=wiki_documents, show_progress=True)
# set node ids to be a constant
for idx, node in enumerate(base_nodes):
    node.id_ = f"node-{idx}"

In [None]:
sub_chunk_sizes = [256, 512]
sub_node_parsers = [
    SentenceSplitter(chunk_size=c, chunk_overlap=20) for c in sub_chunk_sizes
]

all_nodes = []
for base_node in base_nodes:
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
        sub_inodes = [
            IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes
        ]
        all_nodes.extend(sub_inodes)

    # also add the original node to the node list
    original_node = IndexNode.from_text_node(base_node, base_node.node_id)
    all_nodes.append(original_node)

To test the effectiveness of chunk references, go to the "**Test recursive-retrieval engine**" section.

### Metadata References
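
Before running the extractors, the mechanism can be sketched on toy data: each piece of metadata (a summary or a generated question) becomes a small searchable entry that points back to the node it describes. The field names `summary`, `questions`, and `index_id`, the texts, and the word-overlap score are assumptions made for this illustration; they are not the keys the LlamaIndex extractors emit.

```python
def shared_words(query, text):
    """Count words shared by query and text, ignoring case and punctuation."""
    punctuation = ".,?!"
    query_words = {w.strip(punctuation) for w in query.lower().split()}
    text_words = {w.strip(punctuation) for w in text.lower().split()}
    return len(query_words & text_words)


# Base nodes holding the full text (toy data).
toy_nodes = {
    "node-0": "Long passage describing how the harbour was built.",
    "node-1": "Long passage describing the orchard and its harvest.",
}

# Metadata generated per node, for example by an LLM (hypothetical values).
toy_metadata = {
    "node-0": {"summary": "History of the harbour.", "questions": "When was the harbour built?"},
    "node-1": {"summary": "Overview of the orchard.", "questions": "When is the orchard harvested?"},
}

# One small index entry per metadata value, each pointing back to its node.
toy_entries = [
    {"text": value, "index_id": node_id}
    for node_id, metadata in toy_metadata.items()
    for value in metadata.values()
]


def retrieve_via_metadata(query, entries, nodes):
    """Match the query against metadata, then follow the reference to the node."""
    best = max(entries, key=lambda e: shared_words(query, e["text"]))
    return nodes[best["index_id"]]


toy_hit = retrieve_via_metadata("When is the orchard harvested?", toy_entries, toy_nodes)
print(toy_hit)
```

Generated questions tend to resemble user queries more closely than the raw text does, which is the motivation for indexing them alongside the nodes.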

In [None]:
extractors = [
    SummaryExtractor(summaries=["self"], llm=OpenAI(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo"), show_progress=True),
    QuestionsAnsweredExtractor(questions=5, llm=OpenAI(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo"), show_progress=True),
]
# run metadata extractor across base nodes, get back dictionaries
node_to_metadata = {}
for extractor in extractors:
    metadata_dicts = extractor.extract(base_nodes)
    for node, metadata in zip(base_nodes, metadata_dicts):
        if node.node_id not in node_to_metadata:
            node_to_metadata[node.node_id] = metadata
        else:
            node_to_metadata[node.node_id].update(metadata)

In [None]:
def save_metadata_dicts(path, data):
    with open(path, "w") as fp:
        json.dump(data, fp)


def load_metadata_dicts(path):
    with open(path, "r") as fp:
        data = json.load(fp)
    return data


In [None]:

path = "./data/retrieval"
if not os.path.exists(path):
    print("Creating directory: ", path)
    os.makedirs(path)

save_metadata_dicts(path + "/metadata_dicts.json", node_to_metadata)


In [None]:
metadata_dicts = load_metadata_dicts("./data/retrieval/metadata_dicts.json")

In [None]:
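# Start from the base nodes, then add one IndexNode per metadata value
# (summary, generated questions) that points back to its source node.
# The metadata loaded from disk is used, so the extraction step can be skipped on reruns.
all_nodes = copy.deepcopy(base_nodes)
for node_id, metadata in metadata_dicts.items():
    print(node_id)
    for val in metadata.values():
        all_nodes.append(IndexNode(text=val, index_id=node_id))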


#### Test recursive-retrieval engine

In [None]:

all_nodes_dict = {n.node_id: n for n in all_nodes}
recursive_index = VectorStoreIndex(all_nodes, embed_model=embed_model, show_progress=True)
retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": recursive_index.as_retriever(similarity_top_k=2)},
    node_dict=all_nodes_dict,
    verbose=True,
)
recursive_query_engine = RetrieverQueryEngine.from_args(retriever=retriever, llm=llm)

In [None]:
response = recursive_query_engine.query("What are the phases of Advanced RAG?")
print(response.metadata)
print(str(response))

## Generator-Enhanced Retrieval
TODO

## GraphRAG
TODO