# Hybrid RAG

This notebook shows an example of the Hybrid RAG methodology, as per the BlackRock and Nvidia paper
*[HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction](https://arxiv.org/abs/2408.04948)*
from August 2024.

This is similar to the intro Graph RAG pipeline, but with an important difference: the knowledge graph is constructed in *two* stages:
- First, the text is preprocessed with an LLM that rewrites the text in simple sentences, so that the entities and relationships are more
explicitly represented, and pronouns are replaced with the entities they refer to. This will help the extraction LLM in the second stage by providing a cleaner input.
- Next, the entity and relationship extraction stage uses the preprocessed text to explicitly extract the nodes
and edges that will represent the graph, as per a specified schema.

The rest of the pipeline is similar to the intro pipeline: we also store the chunk embeddings of the original text in a vector store. At retrieval time, we retrieve from both the vector store and the graph store, merge the results, and rerank them with a Cohere reranker before passing the combined context to the generation LLM.
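To make the retrieval flow concrete, here is a minimal, library-free sketch of the merge-then-rerank idea. The `Hit` class, the `hybrid_retrieve` function, and the token-overlap score are all hypothetical stand-ins invented for illustration; in particular, the overlap score is only a toy substitute for the Cohere reranker used later in this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    node_id: str
    text: str
    source: str  # "vector" or "graph"


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query tokens that appear in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def hybrid_retrieve(query, vector_hits, graph_hits, top_n=3):
    # Merge both result lists, keeping the first occurrence of each node_id
    merged = {}
    for hit in vector_hits + graph_hits:
        merged.setdefault(hit.node_id, hit)
    # Rerank the merged candidates against the query and keep the best top_n
    ranked = sorted(
        merged.values(),
        key=lambda h: overlap_score(query, h.text),
        reverse=True,
    )
    return ranked[:top_n]


vector_hits = [
    Hit("v1", "Kapito first met Larry Fink at First Boston", "vector"),
    Hit("v2", "BlackRock manages assets for investors", "vector"),
]
graph_hits = [
    Hit("g1", "Larry Fink -> IS_FOUNDER_OF -> BlackRock Inc.", "graph"),
    Hit("v1", "Kapito first met Larry Fink at First Boston", "vector"),  # duplicate
]

top = hybrid_retrieve("where did larry fink meet kapito", vector_hits, graph_hits, top_n=2)
for hit in top:
    print(hit.source, "|", hit.text)
```

The point of the sketch is the shape of the pipeline: both retrievers contribute candidates, duplicates are dropped, and a single scoring step decides what reaches the generation LLM.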

The dataset used is about BlackRock Inc., the world's largest asset manager, and its founders and executives.

In [1]:
# Load environment variables
import os
from dotenv import load_dotenv

load_dotenv()
SEED = 37

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

assert OPENAI_API_KEY is not None, "OPENAI_API_KEY is not set"
assert COHERE_API_KEY is not None, "COHERE_API_KEY is not set"

In [2]:
from llama_index.core import SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Set up the embedding model and LLM
embed_model = OpenAIEmbedding(model_name="text-embedding-3-small")
extraction_llm = OpenAI(model="gpt-4o-mini", temperature=0.0, seed=SEED)
generation_llm = OpenAI(model="gpt-4o-mini", temperature=0.3, seed=SEED)

# Load the BlackRock dataset
original_documents = SimpleDirectoryReader("../../data/blackrock").load_data()

## Part 1: Vector-only retrieval

This stage demonstrates how to extract information into a vector database and store it in [LanceDB](https://lancedb.com/), an open source, embedded vector database. The aim of this stage is to use the vector database to answer the questions using vector-only retrieval, commonly known as "naive RAG" or traditional RAG.
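As a rough illustration of what similarity search does under the hood, the sketch below ranks a few toy chunk vectors against a query vector. The three-dimensional vectors, the chunk names, and the choice of cosine similarity are assumptions made for the example; real embeddings have many more dimensions, and the distance metric actually used depends on how the vector store is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Return the ids of the k chunks most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Hypothetical toy embeddings for three chunks
chunk_vecs = {
    "fink_bio": [0.9, 0.1, 0.0],
    "kapito_bio": [0.7, 0.7, 0.0],
    "wagner_bio": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]

print(top_k(query_vec, chunk_vecs, k=2))
```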

In [3]:
import shutil

# We'll use LanceDB to perform vector similarity search
shutil.rmtree("./test_lancedb", ignore_errors=True)

from llama_index.core import StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.vector_stores.lancedb import LanceDBVectorStore

import openai

openai.api_key = OPENAI_API_KEY


In [4]:
vector_store = LanceDBVectorStore(
    uri="./test_lancedb",
    mode="overwrite",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

vector_index = VectorStoreIndex.from_documents(
    original_documents,
    storage_context=storage_context,
    embed_model=embed_model,  # reuse the embedding model defined above
    llm=generation_llm,  # reuse the generation LLM defined above
)

[2024-10-09T17:04:48Z WARN  lance::dataset] No existing dataset at /Users/prrao/code/graph-rag/src/02_hybrid_rag/test_lancedb/vectors.lance, it will be created


In [5]:
Settings.llm = generation_llm

vector_retriever = vector_index.as_retriever(similarity_top_k=4)
vector_query_engine = RetrieverQueryEngine(vector_retriever)

response = vector_query_engine.query("Who founded BlackRock?")
str(response)

'BlackRock was founded by Larry Fink and seven partners in 1988.'

In [6]:
response = vector_query_engine.query("Where did Larry Fink and Robert Kapito meet?")
print(str(response))

Larry Fink and Robert Kapito first met at First Boston, where Kapito worked in the Public Finance department.


## Part 2: Graph-only retrieval

Next, let's demonstrate how to extract information into a knowledge graph and store it in [Kùzu](https://kuzudb.com/), an open source, embedded graph database.
The aim of this stage is to use the graph to answer the same questions as in Part 1, but using graph-only retrieval.

In [7]:
import warnings
from typing import List, Literal, Optional
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.graph_stores.kuzu import KuzuPropertyGraphStore

import kuzu

shutil.rmtree("test_kuzudb", ignore_errors=True)
db = kuzu.Database("test_kuzudb")

warnings.filterwarnings("ignore")

In [8]:
import nest_asyncio
nest_asyncio.apply()

### Load processed documents

For graph construction, we will load the preprocessed documents created by running the script `preprocess.py`, which rewrote the text in simple sentences and replaced pronouns with the entities they refer to.
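The contents of `preprocess.py` are not shown in this notebook, so the following is only a hypothetical sketch of how a rewriting prompt for that stage might be assembled. The instruction wording and the `build_rewrite_prompt` helper are assumptions made for illustration, not the script's actual prompt.

```python
# Hypothetical instructions for the rewriting LLM (not taken from preprocess.py)
REWRITE_INSTRUCTIONS = (
    "Rewrite the text below as short, simple sentences. "
    "State each entity and relationship explicitly. "
    "Replace every pronoun with the entity it refers to. "
    "Do not add facts that are not in the text."
)


def build_rewrite_prompt(text: str) -> str:
    """Return the full prompt that would be sent to the rewriting LLM."""
    return f"{REWRITE_INSTRUCTIONS}\n\nText:\n{text.strip()}\n\nRewritten text:"


prompt = build_rewrite_prompt("He earned an MBA from UCLA in 1976.")
print(prompt)
```

The resulting string could then be passed to any chat or completion LLM; the rewritten output is what the extraction stage would consume.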


In [9]:
preprocessed_documents = SimpleDirectoryReader("../../data/blackrock/processed").load_data()

In [10]:
# Define the allowed entities and relationships
entities = Literal["PERSON", "CITY", "STATE", "UNIVERSITY", "ORGANIZATION"]
relations = Literal[
    "STUDIED_AT",
    "IS_FOUNDER_OF",
    "IS_CEO_OF",
    "BORN_IN",
    "IS_CITY_IN",
]

# Define explicit relationship directions as a list of triples
# The graph extraction process will be guided by this schema
validation_schema = [
    ("PERSON", "STUDIED_AT", "UNIVERSITY"),
    ("PERSON", "IS_CEO_OF", "ORGANIZATION"),
    ("PERSON", "IS_FOUNDER_OF", "ORGANIZATION"),
    ("PERSON", "BORN_IN", "CITY"),
    ("CITY", "IS_CITY_IN", "STATE"),
]

In [11]:
graph_store = KuzuPropertyGraphStore(
    db,
    has_structured_schema=True,
    relationship_schema=validation_schema,
)

In [12]:
schema_path_extractor = SchemaLLMPathExtractor(
    llm=extraction_llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=True,  # if false, will allow triples outside of the schema
)
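Conceptually, `strict=True` means that an extracted triple is kept only if its combination of subject type, relation, and object type appears in the validation schema. The sketch below illustrates that filtering idea in plain Python; it is not the library's implementation, and the `filter_triples` helper and the candidate triples are hypothetical.

```python
# Same triples as the validation schema above, stored as a set for fast lookup
VALIDATION_SCHEMA = {
    ("PERSON", "STUDIED_AT", "UNIVERSITY"),
    ("PERSON", "IS_CEO_OF", "ORGANIZATION"),
    ("PERSON", "IS_FOUNDER_OF", "ORGANIZATION"),
    ("PERSON", "BORN_IN", "CITY"),
    ("CITY", "IS_CITY_IN", "STATE"),
}


def filter_triples(triples, schema):
    """Keep only triples whose (subject type, relation, object type) is in the schema."""
    kept = []
    for subj, subj_type, rel, obj, obj_type in triples:
        if (subj_type, rel, obj_type) in schema:
            kept.append((subj, rel, obj))
    return kept


# Hypothetical candidate triples, as an extraction LLM might propose them
candidates = [
    ("Larry Fink", "PERSON", "STUDIED_AT", "UCLA", "UNIVERSITY"),
    ("Larry Fink", "PERSON", "BORN_IN", "California", "STATE"),  # BORN_IN must target a CITY
    ("Van Nuys", "CITY", "IS_CITY_IN", "California", "STATE"),
]

print(filter_triples(candidates, VALIDATION_SCHEMA))
```

Note that a check like this only validates the types the LLM assigned; it cannot catch an entity that was given the wrong type in the first place.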

In [13]:
# Set up the property graph index
kg_index = PropertyGraphIndex.from_documents(
    preprocessed_documents,
    embed_model=embed_model,
    kg_extractors=[schema_path_extractor],
    property_graph_store=graph_store,
    show_progress=True,
)

Parsing nodes: 100%|██████████| 3/3 [00:00<00:00, 2115.49it/s]
Extracting paths from text with schema: 100%|██████████| 3/3 [00:10<00:00,  3.64s/it]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  5.87it/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  1.08it/s]


Now that the graph is created, we can explore it in [Kùzu Explorer](https://github.com/kuzudb/explorer), a web-based UI, by running a Docker container that pulls the latest image of Kùzu Explorer as follows:

```bash
docker run -p 8000:8000 \
           -v ./test_kuzudb:/database \
           -e MODE=READ_ONLY \
           --rm kuzudb/explorer:latest
```

Then, launch the UI by visiting http://localhost:8000/.

The easiest way to see the entire graph is to use a Cypher query like `MATCH (a)-[b]->(c) RETURN * LIMIT 100`.
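The same query can also be run from Python instead of the Explorer UI. The helper below is a small sketch that assumes a connection object exposing `execute()`, and a result object exposing `has_next()` and `get_next()`, as the Kùzu Python client does; `fetch_rows` itself is a hypothetical name introduced here.

```python
def fetch_rows(conn, query, max_rows=100):
    """Run a Cypher query and collect up to max_rows result rows."""
    result = conn.execute(query)
    rows = []
    while result.has_next() and len(rows) < max_rows:
        rows.append(result.get_next())
    return rows


# Usage inside this notebook (assumes the `db` object created earlier):
#   conn = kuzu.Connection(db)
#   for row in fetch_rows(conn, "MATCH (a)-[b]->(c) RETURN * LIMIT 100"):
#       print(row)
```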

In [14]:
Settings.llm = generation_llm

kg_retriever = kg_index.as_retriever()
kg_query_engine = kg_index.as_query_engine(include_text=False)

response = kg_query_engine.query("Who founded BlackRock? Return the names as a numbered list.")
print(str(response))

1. Larry Fink  
2. Robert Kapito  
3. Susan Wagner  


In [15]:
response = kg_query_engine.query("Where did Larry Fink and Robert Kapito meet?")
print(str(response))

The information provided does not specify where Larry Fink and Robert Kapito met.


### Augmenting the graph with external knowledge

A quick Google search shows that the given text documents don't provide the full picture of the founders of BlackRock. However, acquiring detailed textual information about entities in the data might not always be possible or feasible. In such cases, we can use external knowledge to add nodes and relationships to the graph, which helps it provide more relevant answers to the questions.

In [16]:
from llama_index.core.graph_stores.types import Relation, EntityNode

# Say we have this knowledge obtained from other sources about additional founders of BlackRock
additional_founders = [
    "Ben Golub",
    "Barbara Novick",
    "Ralph Schlosstein",
    "Keith Anderson",
    "Hugh Frater",
]

# Add additional founder nodes of type PERSON to the graph store
for founder in additional_founders:
    graph_store.upsert_nodes(
        [
            EntityNode(label="PERSON", name=founder),
        ]
    )
    graph_store.upsert_relations(
        [
            Relation(
                label="IS_FOUNDER_OF",
                source_id=founder,
                target_id="BlackRock Inc.",
            ),
        ]
    )

The graph looks something like this:

![](../../assets/fink.png)

The new nodes and relationships are now included in the graph, and are accessible to the graph query engine.

In [17]:
response = kg_query_engine.query("Who founded BlackRock? Return the names as a numbered list.")
print(str(response))

1. Larry Fink  
2. Robert Kapito  
3. Susan Wagner  
4. Ben Golub  
5. Barbara Novick  
6. Ralph Schlosstein  
7. Keith Anderson  
8. Hugh Frater  


The graph query engine was able to provide the correct answer to the above question (8 co-founders, including Larry Fink himself).

In [18]:
response = kg_query_engine.query("Where did Larry Fink and Robert Kapito meet?")
print(str(response))

The information provided does not specify where Larry Fink and Robert Kapito met.


## Takeaways from graph-only retrieval

Inspecting the raw data shows that the LLM-extracted graph is incomplete. Once the missing nodes and relationships are added, graph-based retrieval answers the founders question completely. It still cannot say where Larry Fink and Robert Kapito met, most likely because the schema has no relationship that captures that fact. Adding the nodes required some manual curation, but we will demonstrate below that the effort is worth it by answering the **same** questions with combined graph and vector retrieval.

## Part 3: Combining graph and vector retrieval to build a Graph RAG system

In this stage, we will demonstrate how to combine graph and vector retrieval and rerank the results in order to provide better context to the LLM prior to generating the response. We will use the aforementioned Kùzu and LanceDB databases to achieve this.

In [19]:
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle
from llama_index.postprocessor.cohere_rerank.base import CohereRerank


class CustomRerankerRetriever(BaseRetriever):
    """Custom retriever that merges graph and vector results and reranks them with Cohere."""
    def __init__(
            self,
            kg_retriever,
            vector_retriever,
            cohere_api_key: Optional[str] = None,
            cohere_top_n: int = 10,
        ):
        self._kg_retriever = kg_retriever
        self._vector_retriever = vector_retriever
        self._reranker = CohereRerank(
            api_key=cohere_api_key, top_n=cohere_top_n
        )
        # Initialize the BaseRetriever state
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve from both stores, rerank the merged results, and return unique nodes."""
        vector_retrieval_nodes = self._vector_retriever.retrieve(query_bundle)
        kg_retrieval_nodes = self._kg_retriever.retrieve(query_bundle)
        combined_nodes = vector_retrieval_nodes + kg_retrieval_nodes
        reranked_nodes = self._reranker.postprocess_nodes(
            combined_nodes,
            query_str=query_bundle.query_str,
        )
        # Deduplicate by node ID, keeping the first (highest-ranked) occurrence
        unique_nodes = {}
        for n in reranked_nodes:
            unique_nodes.setdefault(n.node_id, n)
        return list(unique_nodes.values())

In [20]:
custom_reranker_retriever = CustomRerankerRetriever(
    kg_retriever,
    vector_retriever,
    cohere_api_key=COHERE_API_KEY,
    cohere_top_n=10,
)



In [21]:
# Set the LLM for generation in the CustomRerankerRetriever
Settings.llm = generation_llm

custom_reranker_query_engine = RetrieverQueryEngine(custom_reranker_retriever)

response = custom_reranker_query_engine.query("Who founded BlackRock? Return the names as a numbered list.")
print(str(response))

1. Larry Fink  
2. Susan Wagner  
3. Ralph Schlosstein  
4. Barbara Novick  
5. Robert Kapito  
6. Hugh Frater  
7. Ben Golub  
8. Keith Anderson  


In [22]:
response = custom_reranker_query_engine.query("Where did Larry Fink and Robert Kapito meet?")
print(str(response))

Larry Fink and Robert Kapito met while working at First Boston.


### Display the retrieved nodes

To inspect the reranked context, we can print the text of the nodes that were retrieved. It's clear that the reranker provides the most relevant context from *both* the vector store *and* the graph store.

In [23]:
nodes = custom_reranker_query_engine.retrieve("Who founded BlackRock? Return the names as a numbered list.")
for item in nodes:
    print(item.text)

Larry Fink is Founder, Chairman and CEO of BlackRock, Inc. He and seven partners founded BlackRock in 1988, and under his leadership, the firm has grown into a global leader in investment and technology solutions to help investors build better financial futures. Today, BlackRock is trusted to manage $10 trillion in assets, more than any other investment firm in the world.

Mr. Fink was was born on November 2, 1952, in Van Nuys, California. He earned an MBA from UCLA in 1976 and a BA in political science, also from UCLA, in 1974.
Susan Wagner is a cofounder and director of asset manager BlackRock, which she started with Larry Fink and others in 1988. She cofounded the company at age 26 and went on to serve as chief operating officer and vice chairman. Wagner oversaw BlackRock's 2009 merger with Barclay's Global Investors, which transformed the firm into the world's largest asset manager.

Prior to founding BlackRock, Wagner worked as a vice president of the mortgage finance group at Leh

In [24]:
nodes = custom_reranker_query_engine.retrieve("Where did Larry Fink and Robert Kapito meet?")
for item in nodes:
    print(item.text)

Robert Kapito is one of the most successful persons in the financial industry globally and well recognized as the co-founder and the President of BlackRock Inc. His career, full of innovation and top-class leadership, has changed and shaped the outlook of the world’s financial scene.

Robert Kapito was born on February 8, 1957, in Monticello, New York.

Mr. Kapito started his professional career in the First Boston in 1979 where he served in the Public Finance department. This work was the onset of his career, which was long and proved to be significant in finance. During his work at First Boston he first met Larry Fink, who would later be his partner at BlackRock.
Robert Kapito -> STUDIED_AT -> First Boston
Larry Fink -> STUDIED_AT -> UCLA
Robert Kapito -> IS_FOUNDER_OF -> BlackRock Inc.
Robert Kapito -> BORN_IN -> Monticello
Larry Fink -> BORN_IN -> Van Nuys
Robert Kapito -> IS_CEO_OF -> BlackRock Inc.
Robert Kapito -> IS_CEO_OF -> First Boston
Larry Fink is Founder, Chairman and CEO

## Conclusions

As can be seen from this demo, combining graph and vector retrieval can provide more accurate and contextually relevant answers than either method alone. In certain cases, vector retrieval surfaces relevant answers through the fuzzy relationships that are implicitly modelled via the embeddings, as with the question about where Larry Fink and Robert Kapito met. In other cases, graph retrieval provides more relevant, complete answers because it *explicitly* models the relationships between the entities in the text, as with the list of BlackRock's founders. The two types of retrieval complement each other rather than compete.

Combining vector and graph retrieval with a reranker that draws on the context from both gives the generation LLM more relevant and accurate context, and thus a more relevant answer than if either retrieval method were used on its own.

In practice, the Graph RAG system can be used to answer a wide range of questions, such as factual questions, definition questions, and reasoning questions. The key is to build a high-quality knowledge graph, and to combine it with vector search in a way that provides the most relevant and accurate answers to the questions.