# Retrieval

In baseline RAG, retrieval is usually done with plain vector search. As the field has developed, researchers have found several ways to enhance the retrieval process. This notebook covers the following retrieval methods:

- **Iterative Retrieval** - Use the reasoning capabilities of LLMs to perform iterative retrieval-generation cycles until the response passes an evaluation step.
- **Recursive-Retrieval** - Search over smaller documents and use chunk references or metadata references to retrieve the full document.
- **Generator-Enhanced Retrieval** - Use an LLM to predict when and what to retrieve during generation with a process called [**F**orward-**L**ooking **A**ctive **RE**trieval augmented generation (FLARE)](https://arxiv.org/pdf/2305.06983).
- **GraphRAG** - Use existing or LLM-generated knowledge graphs to enhance retrieval by harnessing the relationships between entities.

### Setup libraries and environment

In [None]:
%pip install python-dotenv  ipython-ngql llama-index==0.10.33 llama-index-llms-openai==0.1.16 llama-index-readers-wikipedia==0.1.4 llama-index-readers-papers==0.1.4 llama-index-readers-web==0.1.12 llama-index-graph-stores-nebula==0.1.2

In [None]:
import os
import copy
import json


from IPython.display import Markdown, display

from util.helpers import (
    get_malazan_pages,
    create_and_save_md_files,
)

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding, OpenAIEmbeddingModelType
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    KnowledgeGraphIndex,
    PromptTemplate,
    StorageContext,
    Settings,
)
from llama_index.core.query_engine import (
    FLAREInstructQueryEngine,
    RetrieverQueryEngine,
    RetryQueryEngine,
    RetryGuidelineQueryEngine,
    KnowledgeGraphQueryEngine,
)
from llama_index.core.prompts import PromptType

from llama_index.legacy.query_engine import KnowledgeGraphQueryEngine as LegacyKnowledgeGraphQueryEngine

from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

from llama_index.readers.web import SimpleWebPageReader
from llama_index.readers.papers import ArxivReader
from llama_index.readers.wikipedia import WikipediaReader
from llama_index.graph_stores.nebula import NebulaGraphStore
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import IndexNode
from llama_index.core.extractors import (
    SummaryExtractor,
    QuestionsAnsweredExtractor,
)
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.evaluation import (
    RelevancyEvaluator,
    GuidelineEvaluator,
)

In [None]:
# NOTE: This is ONLY necessary in jupyter notebook.
# Details: Jupyter runs an event-loop behind the scenes.
#          This results in nested event-loops when we start an event-loop to make async queries.
#          This is normally not allowed, we use nest_asyncio to allow it for convenience.
import nest_asyncio
nest_asyncio.apply()

In [None]:
from dotenv import load_dotenv

load_dotenv()


In [None]:
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
llm = OpenAI(api_key=OPENAI_API_KEY, model="gpt-4-turbo")
embed_model = OpenAIEmbedding(api_key=OPENAI_API_KEY, model=OpenAIEmbeddingModelType.TEXT_EMBED_3_SMALL)

Read papers from arXiv as documents

In [None]:
reader = ArxivReader()

papers = ["2404.10981", "2305.06983", "2312.10997"]
papers_dir = "./data/docs/arxiv"

arxiv_res = [reader.load_papers_and_abstracts(search_query=f"id:{paper}", max_results=1, papers_dir=papers_dir) for paper in papers]
documents = [doc for sublist in [d for (d, _) in arxiv_res] for doc in sublist]
abstracts = [a[0] for (_, a) in arxiv_res]


In [None]:
papers_index = VectorStoreIndex.from_documents(documents=documents, show_progress=True)

## Iterative Retrieval
In **Iterative Retrieval**, the key part of the process is the evaluator or *judge* that helps self-correct the retrieval process. 

The pipeline performs the following steps:
1. Query the base query engine.
2. Use the evaluator to decide whether the response passes.
3. If the response passes, return it.
4. Otherwise, transform the original query, using the evaluation result (query, response, and feedback), into a new query.
5. Repeat up to `max_retries` times.

There are different types of evaluators to choose from, depending on the use case. Sometimes it is useful to evaluate the answer specifically, or the retrieved context. Other times you might need to write guidelines that the LLM uses to decide whether another retrieval step is needed.
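The steps above can be sketched as a plain Python loop. This is a minimal illustration, not the internals of `RetryQueryEngine`: the names `query_with_retries`, `Evaluation`, and the `toy_*` functions are hypothetical stand-ins for a query engine, an evaluator, and a query transformation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evaluation:
    passing: bool
    feedback: str


def query_with_retries(
    query: str,
    answer_fn: Callable[[str], str],
    evaluate_fn: Callable[[str, str], Evaluation],
    transform_fn: Callable[[str, str, Evaluation], str],
    max_retries: int = 3,
) -> str:
    """Query, evaluate, and retry with a transformed query until the evaluation passes."""
    response = answer_fn(query)
    for _ in range(max_retries):
        evaluation = evaluate_fn(query, response)
        if evaluation.passing:
            return response
        # Fold the feedback into a new query and ask again.
        query = transform_fn(query, response, evaluation)
        response = answer_fn(query)
    return response


# Hypothetical stand-ins: the "engine" only answers in bullets when asked to.
def toy_answer(query: str) -> str:
    return "- item one\n- item two" if "bullets" in query else "item one, item two"


def toy_evaluate(query: str, response: str) -> Evaluation:
    ok = response.lstrip().startswith("-")
    return Evaluation(passing=ok, feedback="" if ok else "List elements as bullets")


def toy_transform(query: str, response: str, evaluation: Evaluation) -> str:
    return f"{query} (answer in bullets: {evaluation.feedback})"


result = query_with_retries("What are the phases?", toy_answer, toy_evaluate, toy_transform)
print(result)
```

The evaluator is the only part that decides when to stop, which is why the choice of evaluator matters so much for this method.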

In [None]:
## Prompt of RelevancyEvaluator
DEFAULT_EVAL_TEMPLATE = PromptTemplate(
    "Your task is to evaluate if the response for the query \
    is in line with the context information provided.\n"
    "You have two options to answer. Either YES/ NO.\n"
    "Answer - YES, if the response for the query \
    is in line with context information otherwise NO.\n"
    "Query and Response: \n {query_str}\n"
    "Context: \n {context_str}\n"
    "Answer: "
)

DEFAULT_REFINE_TEMPLATE = PromptTemplate(
    "We want to understand if the following query and response is"
    "in line with the context information: \n {query_str}\n"
    "We have provided an existing YES/NO answer: \n {existing_answer}\n"
    "We have the opportunity to refine the existing answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "If the existing answer was already YES, still answer YES. "
    "If the information is present in the new context, answer YES. "
    "Otherwise answer NO.\n"
)

In [None]:
evaluator = RelevancyEvaluator(llm=llm)
query_engine = RetryQueryEngine(query_engine=papers_index.as_query_engine(), evaluator=evaluator)


In [None]:
evaluator = GuidelineEvaluator(llm=llm, guidelines="List elements of the answer as bullets")
query_engine = RetryGuidelineQueryEngine(query_engine=papers_index.as_query_engine(), resynthesize_query=True, guideline_evaluator=evaluator)

In [None]:
query = "What are the phases of Advanced RAG?"
response = query_engine.query(str_or_query_bundle=query)


In [None]:
print(response.metadata)
print(str(response))


## Recursive-Retrieval

In **Recursive-Retrieval**, we search for relevant documents using references between smaller documents and their related larger documents. This narrows the search space for the first retrieval step, and the references are then followed recursively to expand the context.

We distinguish between two types of recursive-retrieval:
1. **Chunk-References** - References between smaller and larger documents.
2. **Metadata-References** - References between metadata like summaries or generated questions to documents.
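To make the idea concrete before using LlamaIndex, here is a toy sketch of following a reference from a small chunk to its parent. Everything in it is an assumption for illustration: `SmallChunk`, `tokens`, and `recursive_retrieve` are hypothetical names, the texts are hand-written, and word overlap stands in for embedding similarity.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class SmallChunk:
    text: str
    parent_id: str  # reference to the larger chunk this text was cut from


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"\w+", text.lower()))


def recursive_retrieve(query: str, small_chunks: List[SmallChunk], parents: Dict[str, str]) -> str:
    # Step 1: search over the small chunks only.
    best = max(small_chunks, key=lambda c: len(tokens(query) & tokens(c.text)))
    # Step 2: follow the reference to return the larger context.
    return parents[best.parent_id]


# Hand-written toy data (hypothetical node ids).
parents = {
    "node-0": "Van Gogh was born in Zundert. He moved to Paris in 1886.",
    "node-1": "In Arles he painted sunflowers. He later stayed in Saint-Remy.",
}
small_chunks = [
    SmallChunk("Van Gogh was born in Zundert.", "node-0"),
    SmallChunk("He moved to Paris in 1886.", "node-0"),
    SmallChunk("In Arles he painted sunflowers.", "node-1"),
    SmallChunk("He later stayed in Saint-Remy.", "node-1"),
]

context = recursive_retrieve("Where did he paint sunflowers?", small_chunks, parents)
print(context)
```

Matching happens on the short, focused text, while the generator receives the longer parent text.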

### Chunk References

In [None]:
reader = WikipediaReader()
wiki_documents = reader.load_data(pages=["Vincent Van Gogh"])
wiki_documents

In [None]:

node_parser = SentenceSplitter(chunk_size=1024)
base_nodes = node_parser.get_nodes_from_documents(documents=wiki_documents, show_progress=True)
# set node ids to be a constant
for idx, node in enumerate(base_nodes):
    node.id_ = f"node-{idx}"

In [None]:
sub_chunk_sizes = [256, 512]
sub_node_parsers = [
    SentenceSplitter(chunk_size=c, chunk_overlap=20) for c in sub_chunk_sizes
]

all_nodes = []
for base_node in base_nodes:
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
        sub_inodes = [
            IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes
        ]
        all_nodes.extend(sub_inodes)

    # also add the original node to the list of all nodes
    original_node = IndexNode.from_text_node(base_node, base_node.node_id)
    all_nodes.append(original_node)

To test the effectiveness of chunk references, go to the "**Test recursive-retrieval engine**" section.

### Metadata References

In [None]:
extractors = [
    SummaryExtractor(summaries=["self"], llm=OpenAI(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo"), show_progress=True),
    QuestionsAnsweredExtractor(questions=5, llm=OpenAI(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo"), show_progress=True),
]
# run metadata extractor across base nodes, get back dictionaries
node_to_metadata = {}
for extractor in extractors:
    metadata_dicts = extractor.extract(base_nodes)
    for node, metadata in zip(base_nodes, metadata_dicts):
        if node.node_id not in node_to_metadata:
            node_to_metadata[node.node_id] = metadata
        else:
            node_to_metadata[node.node_id].update(metadata)

In [None]:
def save_metadata_dicts(path, data):
    with open(path, "w") as fp:
        json.dump(data, fp)


def load_metadata_dicts(path):
    with open(path, "r") as fp:
        data = json.load(fp)
    return data


In [None]:

path = "./data/retrieval"
if not os.path.exists(path):
    print("Creating directory: ", path)
    os.makedirs(path)

save_metadata_dicts(path + "/metadata_dicts.json", node_to_metadata)


In [None]:
# Reload the saved metadata so the next cell also works after a kernel restart.
node_to_metadata = load_metadata_dicts("./data/retrieval/metadata_dicts.json")

In [None]:
all_nodes = copy.deepcopy(base_nodes)
for node_id, metadata in node_to_metadata.items():
    print(node_id)
    for val in metadata.values():
        all_nodes.append(IndexNode(text=val, index_id=node_id))


#### Test recursive-retrieval engine

In [None]:

all_nodes_dict = {n.node_id: n for n in all_nodes}
recursive_index = VectorStoreIndex(all_nodes, embed_model=embed_model, show_progress=True)
retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": recursive_index.as_retriever(similarity_top_k=2)},
    node_dict=all_nodes_dict,
    verbose=True,
)
recursive_query_engine = RetrieverQueryEngine.from_args(retriever=retriever, llm=llm)

In [None]:
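# The recursive index is built from the Vincent van Gogh Wikipedia page, so ask about that page.
response = recursive_query_engine.query("What influenced Vincent van Gogh's painting style?")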
print(response.metadata)
print(str(response))

## Generator-Enhanced Retrieval
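In **Generator-Enhanced Retrieval**, an LLM predicts when and what to retrieve during generation. The method referenced in this notebook is [FLARE](https://arxiv.org/pdf/2305.06983) (Forward-Looking Active REtrieval augmented generation).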
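As a rough illustration of the FLARE idea, the sketch below drafts one sentence at a time and only retrieves when the draft looks uncertain, using the draft itself as the retrieval query. It is a toy under stated assumptions: `flare_generate`, `toy_draft`, and `toy_retrieve` are hypothetical names, and the confidence score is a stub rather than real token probabilities from an LLM.

```python
from typing import Callable, List, Optional, Tuple

# A draft function returns (next_sentence, confidence), or None when the answer is complete.
DraftFn = Callable[[str, str, List[str]], Optional[Tuple[str, float]]]


def flare_generate(
    question: str,
    draft_fn: DraftFn,
    retrieve_fn: Callable[[str], List[str]],
    confidence_threshold: float = 0.6,
    max_sentences: int = 10,
) -> str:
    answer = ""
    for _ in range(max_sentences):
        draft = draft_fn(question, answer, [])
        if draft is None:
            break
        sentence, confidence = draft
        if confidence < confidence_threshold:
            # Low confidence: look ahead with the draft as the query, then regenerate.
            context = retrieve_fn(sentence)
            redraft = draft_fn(question, answer, context)
            if redraft is not None:
                sentence, _ = redraft
        answer = f"{answer} {sentence}".strip()
    return answer


# Hypothetical stubs standing in for an LLM and a retriever.
FACTS = ["FLARE retrieves when the next sentence is uncertain."]


def toy_retrieve(query: str) -> List[str]:
    return FACTS


def toy_draft(question: str, answer_so_far: str, context: List[str]) -> Optional[Tuple[str, float]]:
    if answer_so_far:
        return None  # one sentence is enough for this toy
    if context:
        return (context[0], 0.9)
    return ("FLARE retrieves at random times.", 0.2)


answer = flare_generate("When does FLARE retrieve?", toy_draft, toy_retrieve)
print(answer)
```

In LlamaIndex, `FLAREInstructQueryEngine` (imported in the setup cell) is the class intended for this method.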

## GraphRAG
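In **GraphRAG**, we use existing or LLM-generated knowledge graphs to enhance retrieval by harnessing the relationships between entities. This section uses NebulaGraph as the graph store.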

If you don't already have NebulaGraph running locally, you can use [nebula-up](https://github.com/wey-gu/nebula-up) to start it up.

Run the following command in your terminal (On Windows you should use WSL):

```bash
curl -fsSL nebula-up.siwei.io/install.sh | bash
```



If your NebulaGraph is started with nebula-up, you can add the following to your .env file:

```
NEBULA_USER=root
NEBULA_PASSWORD=nebula
NEBULA_HOST=localhost
NEBULA_PORT=9669
```

Otherwise, set these values to match your own NebulaGraph configuration.
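Because `os.getenv` returns `None` for a missing variable, a typo in the `.env` file only surfaces later as a confusing connection error. A small check like the following can fail early instead. This is an optional sketch; `nebula_address` and `REQUIRED_SETTINGS` are hypothetical names, not part of any library.

```python
import os
from typing import Mapping

REQUIRED_SETTINGS = ("NEBULA_USER", "NEBULA_PASSWORD", "NEBULA_HOST", "NEBULA_PORT")


def nebula_address(env: Mapping[str, str] = os.environ) -> str:
    """Return "host:port", or raise if any required setting is missing or empty."""
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing NebulaGraph settings: {', '.join(missing)}")
    return f"{env['NEBULA_HOST']}:{env['NEBULA_PORT']}"


# Example with the nebula-up defaults passed in explicitly.
example_env = {
    "NEBULA_USER": "root",
    "NEBULA_PASSWORD": "nebula",
    "NEBULA_HOST": "localhost",
    "NEBULA_PORT": "9669",
}
print(nebula_address(example_env))
```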

In [None]:
%load_ext ngql


In [None]:
load_dotenv(override=True)
GRAPHD_HOST = os.getenv("NEBULA_HOST")
GRAPHD_PORT = os.getenv("NEBULA_PORT")
NEBULA_PASSWORD = os.getenv("NEBULA_PASSWORD")
NEBULA_USER = os.getenv("NEBULA_USER")
NEBULA_ADDRESS = f"{GRAPHD_HOST}:{GRAPHD_PORT}"
os.environ["NEBULA_ADDRESS"] = NEBULA_ADDRESS

In [None]:
connection_string = f"--address {GRAPHD_HOST} --port {GRAPHD_PORT} --user {NEBULA_USER} --password {NEBULA_PASSWORD}"
%ngql {connection_string}

Create a new knowledge graph (a **Graph Space** in Nebula terms) to use for retrieval. We'll create an entity-relationship graph with the following schema:
```
[entity:tag] - [relationship:edge] -> [entity:tag]
```

This will allow the LLM to automatically generate our knowledge graph with arbitrary relationships like:
```
[entity:David] - [relationship:has studied computer science at] -> [entity:Aarhus University]
[entity:David] - [relationship:currently works as a software pilot at] -> [entity:Trifork A/S]
```

We also create an index on the entity `name` property so that entities can be looked up by name.
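The value of this schema for retrieval is that every fact about an entity is one hop away. The in-memory sketch below shows the idea with the two example triplets from above; `build_graph` and `describe` are hypothetical helpers, and a real setup stores the triplets in NebulaGraph rather than a Python dictionary.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (subject entity, relationship, object entity)

triplets: List[Triplet] = [
    ("David", "has studied computer science at", "Aarhus University"),
    ("David", "currently works as a software pilot at", "Trifork A/S"),
]


def build_graph(items: List[Triplet]) -> Dict[str, List[Tuple[str, str]]]:
    """Group outgoing (relationship, object) pairs by subject entity."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, relationship, obj in items:
        graph[subject].append((relationship, obj))
    return graph


def describe(graph: Dict[str, List[Tuple[str, str]]], entity: str) -> List[str]:
    """Turn an entity's outgoing edges into sentences usable as LLM context."""
    return [f"{entity} {relationship} {obj}" for relationship, obj in graph.get(entity, [])]


graph = build_graph(triplets)
for fact in describe(graph, "David"):
    print(fact)
```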

In [None]:
%ngql CREATE SPACE IF NOT EXISTS graph_rag(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);
%ngql SHOW SPACES

In [None]:
%%ngql
USE graph_rag;
CREATE TAG IF NOT EXISTS entity(name string);
CREATE EDGE IF NOT EXISTS relationship(relationship string);
CREATE TAG INDEX IF NOT EXISTS entity_index ON entity(name(256));

In [None]:
#%ngql USE graph_rag; CLEAR SPACE graph_rag; # clean graph space

Load documents we want to use

In [None]:
pages = get_malazan_pages(articles=["Anomander Rake"])
docs_path = "./data/docs/graph_rag"
create_and_save_md_files(pages, path=docs_path + "/")

documents = SimpleDirectoryReader(input_dir=docs_path).load_data()

Now we can use LlamaIndex's `KnowledgeGraphIndex` to index the graph and use it for retrieval. It uses the following default prompt to generate triplets for the graph:

```
Some text is provided below. Given the text, extract up to {max_knowledge_triplets} knowledge triplets in the form of (subject, predicate, object). Avoid stopwords.
---------------------
Example:
Text: Alice is Bob's mother.
Triplets:
(Alice, is mother of, Bob)
Text: Philz is a coffee shop founded in Berkeley in 1982.
Triplets:
(Philz, is, coffee shop)
(Philz, founded in, Berkeley)
(Philz, founded in, 1982)
---------------------
Text: {text}
Triplets:
```
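The LLM answers this prompt with one `(subject, predicate, object)` line per triplet, which then has to be parsed before it can be written to the graph. The sketch below shows one way such output could be parsed. It is not LlamaIndex's own parser; `parse_triplets` and `TRIPLET_PATTERN` are hypothetical names, and the sketch assumes the subject and predicate contain no commas.

```python
import re
from typing import List, Tuple

# One triplet per line: "(subject, predicate, object)".
TRIPLET_PATTERN = re.compile(r"^\(([^,]+),([^,]+),(.+)\)$")


def parse_triplets(llm_output: str, max_triplets: int = 10) -> List[Tuple[str, str, str]]:
    triplets = []
    for line in llm_output.splitlines():
        match = TRIPLET_PATTERN.match(line.strip())
        if not match:
            continue  # skip anything that is not a well-formed triplet line
        subject, predicate, obj = (part.strip() for part in match.groups())
        if subject and predicate and obj:
            triplets.append((subject, predicate, obj))
    return triplets[:max_triplets]


sample_output = """(Philz, is, coffee shop)
(Philz, founded in, Berkeley)
not a triplet
(Philz, founded in, 1982)"""

parsed = parse_triplets(sample_output)
print(parsed)
```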

In [None]:
os.environ["NEBULA_ADDRESS"] = NEBULA_ADDRESS
os.environ["NEBULA_PASSWORD"] = NEBULA_PASSWORD
os.environ["NEBULA_USER"] = NEBULA_USER

space_name = "graph_rag"
edge_types, rel_prop_names = ["relationship"], ["relationship"]
tags = ["entity"]

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)

Settings.llm = OpenAI(api_key=OPENAI_API_KEY, model="gpt-4-turbo")
Settings.embed_model = OpenAIEmbedding(api_key=OPENAI_API_KEY, model=OpenAIEmbeddingModelType.TEXT_EMBED_3_SMALL)
Settings.chunk_size = 512
storage_context = StorageContext.from_defaults(graph_store=graph_store)

This may take a little while. Afterwards we save the index to disk so we don't have to do it again.

In [None]:
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    show_progress=True,
)

In [None]:
# Uncomment to save the index to disk after building it for the first time.
#kg_index.storage_context.persist(persist_dir='./data/storage_graph')

In [None]:
from llama_index.core import load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir='./data/storage_graph', graph_store=graph_store)
kg_index = load_index_from_storage(
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    verbose=True,
)

In [None]:
kg_index_query_engine = kg_index.as_query_engine(
    retriever_mode="keyword",
    verbose=True,
    response_mode="tree_summarize",
)

In [None]:
%ngql MATCH p=(n)-[e:relationship*1..2]-(m) WHERE id(n) in ['Anomander'] RETURN n.entity.name,e[0].relationship, m.entity.name LIMIT 10

In [None]:
%ngql MATCH p=(n)-[e:relationship*1..2]-(m) WHERE id(n) in ['Rake'] RETURN n.entity.name,e[0].relationship, m.entity.name LIMIT 10

In [None]:
kg_index_query_engine.query("Tell me about Anomander Rake's relationship with the Malazan Empire")

### NL2Cypher

We might also already have an existing knowledge graph with useful information that hasn't been indexed with embeddings, which prevents us from using a vector index to find the right entities and relationships. In this case it's possible to use an LLM to generate Cypher queries that retrieve information from the graph based on the user's query.

NL2Cypher generates Cypher queries from natural-language questions. We can use it to generate Cypher queries for our knowledge graph.

Currently this approach doesn't work very well with the graph space generated about "Anomander Rake", but it shows how you might use it with a different knowledge graph.
Some fine-tuning might be needed to get better results.
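NebulaGraph's Cypher dialect differs from standard Cypher in two ways that the LLM has to get right: comparisons use `==` instead of `=`, and node properties must be qualified with the tag, as in ``n.`entity`.`name` ``. To show what a correct query looks like for our `entity`/`relationship` schema, here is a hand-written sketch. `neighbours_query` is a hypothetical helper, not part of LlamaIndex or NebulaGraph, and it rejects quotes and backslashes rather than attempting to escape them.

```python
def neighbours_query(entity_name: str, limit: int = 10) -> str:
    """Build a NebulaGraph-dialect query for the direct neighbours of an entity."""
    if "'" in entity_name or "\\" in entity_name:
        # Keep the sketch simple and safe: do not attempt string escaping.
        raise ValueError("entity_name must not contain quotes or backslashes")
    return (
        "MATCH (n:`entity`)-[e:`relationship`]-(m:`entity`) "
        f"WHERE n.`entity`.`name` == '{entity_name}' "
        "RETURN n.`entity`.`name`, e.`relationship`, m.`entity`.`name` "
        f"LIMIT {int(limit)}"
    )


example_query = neighbours_query("Anomander")
print(example_query)
```

Comparing the LLM-generated query against a known-good query like this one is a quick way to spot dialect mistakes.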


In [None]:
DEFAULT_NEBULAGRAPH_NL2CYPHER_PROMPT_TMPL = """
Generate NebulaGraph query from natural language.
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
---
{schema}
---
Note: NebulaGraph speaks a dialect of Cypher, comparing to standard Cypher:

1. it uses double equals sign for comparison: `==` rather than `=`
2. it needs explicit label specification when referring to node properties, i.e.
v is a variable of a node, and we know its label is Foo, v.`foo`.name is correct
while v.name is not.

For example, see this diff between standard and NebulaGraph Cypher dialect:
```diff
< MATCH (p:person)-[:directed]->(m:movie) WHERE m.name = 'The Godfather'
< RETURN p.name;
---
> MATCH (p:`person`)-[:directed]->(m:`movie`) WHERE m.`movie`.`name` == 'The Godfather'
> RETURN p.`person`.`name`;
```

Question: {query_str}

NebulaGraph Cypher dialect query:
"""
DEFAULT_NEBULAGRAPH_NL2CYPHER_PROMPT = PromptTemplate(
    DEFAULT_NEBULAGRAPH_NL2CYPHER_PROMPT_TMPL,
    prompt_type=PromptType.TEXT_TO_GRAPH_QUERY,
)

query_engine_with_nl2graphquery = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    llm=OpenAI(api_key=OPENAI_API_KEY),
    verbose=True,
    graph_query_synthesis_prompt=DEFAULT_NEBULAGRAPH_NL2CYPHER_PROMPT
)

In [None]:
query_engine_with_nl2graphquery.generate_query("Tell me about Anomander's relationship with Andarist")

In [None]:
response = query_engine_with_nl2graphquery.query(
    "Tell me about Anomander's relationship with Andarist",
)
display(Markdown(f"<b>{response}</b>"))