# Comparing Retrieval Augmented Generation (RAG) Methods with Graphs

In this notebook we will explore and compare several RAG methods with a focus on knowledge graphs.

## Background on RAG:

The diagram below shows how RAG works:

```
                  RAG with Llama Index
                  ┌────┬────┬────┬────┐                  
                  │ 1  │ 2  │ 3  │ 4  │                  
                  ├────┴────┴────┴────┤                  
                  │  Docs/Knowledge   │                  
┌───────┐         │        ...        │       ┌─────────┐
│       │         ├────┬────┬────┬────┤       │         │
│       │         │ 95 │ 96 │    │    │       │         │
│       │         └────┴────┴────┴────┘       │         │
│ User  │─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─▶   LLM   │
│       │                                     │         │
│       │                                     │         │
└───────┘    ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  └─────────┘
    │          ┌──────────────────────────┐        ▲     
    └────────┼▶│  Tell me ....., please   │├───────┘     
               └──────────────────────────┘              
             │ ┌────┐ ┌────┐               │             
               │ 3  │ │ 96 │                             
             │ └────┘ └────┘               │             
              ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 
```

In VectorDB-based RAG, we create an embedding for each node (chunk) and, at query time, retrieve the top-k nodes most similar to the question. In the diagram above, nodes `3` and `96` were fetched as the top-k related nodes and used to answer the user query.
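
The top-k lookup described above can be sketched in plain Python. This is a minimal illustration, not LlamaIndex's implementation: the 3-dimensional embeddings are hand-written placeholders, and `cosine_similarity` and `top_k_nodes` are hypothetical helper names.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_nodes(query_embedding, node_embeddings, k=2):
    """Return the ids of the k nodes most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, emb), node_id)
        for node_id, emb in node_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [node_id for _, node_id in scored[:k]]

# Hypothetical embeddings, written by hand for illustration only.
node_embeddings = {
    1: [1.0, 0.0, 0.0],
    2: [0.0, 1.0, 0.0],
    3: [0.9, 0.1, 0.4],
    96: [0.8, 0.0, 0.6],
}
query_embedding = [0.85, 0.05, 0.5]

top_nodes = top_k_nodes(query_embedding, node_embeddings, k=2)
print(top_nodes)  # the two node ids closest to the query
```

With these toy vectors, nodes `3` and `96` are the two closest to the query, mirroring the diagram.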

## Background on Graph RAG

In Graph RAG, we extract relationships between entities from each node; each relationship represents a concise fact. It would look something like this:

```
Node Split and Embedding

┌────┬────┬────┬────┐
│ 1  │ 2  │ 3  │ 4  │
├────┴────┴────┴────┤
│  Docs/Knowledge   │
│        ...        │
├────┬────┬────┬────┤
│ 95 │ 96 │    │    │
└────┴────┴────┴────┘
```

If we zoomed in:

```
       Node Split and Embedding, with Knowledge Graph being extracted

┌──────────────────┬──────────────────┬──────────────────┬──────────────────┐
│ .─.       .─.    │  .─.       .─.   │            .─.   │  .─.       .─.   │
│( x )─────▶ y )   │ ( x )─────▶ a )  │           ( j )  │ ( m )◀────( x )  │
│ `▲'       `─'    │  `─'       `─'   │            `─'   │  `─'       `─'   │
│  │     1         │        2         │        3    │    │        4         │
│ .─.              │                  │            .▼.   │                  │
│( z )─────────────┼──────────────────┼──────────▶( i )─┐│                  │
│ `◀────┐          │                  │            `─'  ││                  │
├───────┼──────────┴──────────────────┴─────────────────┼┴──────────────────┤
│       │                      Docs/Knowledge           │                   │
│       │                            ...                │                   │
│       │                                               │                   │
├───────┼──────────┬──────────────────┬─────────────────┼┬──────────────────┤
│  .─.  └──────.   │  .─.             │                 ││  .─.             │
│ ( x ◀─────( b )  │ ( x )            │                 └┼▶( n )            │
│  `─'       `─'   │  `─'             │                  │  `─'             │
│        95   │    │   │    96        │                  │   │    98        │
│            .▼.   │  .▼.             │                  │   ▼              │
│           ( c )  │ ( d )            │                  │  .─.             │
│            `─'   │  `─'             │                  │ ( x )            │
└──────────────────┴──────────────────┴──────────────────┴──`─'─────────────┘
```

In theory, knowledge graphs should help balance granularity and density. They also allow optional multi-hop traversal, such as `x -> y` or `i -> j -> z -> x`, across many more nodes (chunks) than a top-k search allows.
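
To make the multi-hop idea concrete, here is a small sketch of a breadth-first walk over triplets. The triplets, the `relates_to` predicate, the chunk ids, and the `multi_hop` helper are all assumptions made for illustration; they are not extracted from the policy document.

```python
from collections import deque

# Hypothetical (subject, predicate, object, chunk_id) facts; names echo the diagram.
TRIPLETS = [
    ("x", "relates_to", "y", 1),
    ("i", "relates_to", "j", 3),
    ("j", "relates_to", "z", 3),
    ("z", "relates_to", "x", 1),
    ("b", "relates_to", "c", 95),
]

def multi_hop(start, triplets, max_depth):
    """Breadth-first walk along outgoing edges, up to max_depth hops."""
    outgoing = {}
    for subj, pred, obj, chunk in triplets:
        outgoing.setdefault(subj, []).append((pred, obj, chunk))

    visited = {start}
    facts = []
    queue = deque([(start, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for pred, obj, chunk in outgoing.get(entity, []):
            facts.append((entity, pred, obj, chunk))
            if obj not in visited:
                visited.add(obj)
                queue.append((obj, depth + 1))
    return facts

facts = multi_hop("i", TRIPLETS, max_depth=4)
chunks_touched = sorted({chunk for *_, chunk in facts})
print(facts)
print(chunks_touched)
```

Starting from `i`, the walk follows `i -> j -> z -> x -> y` and collects facts from more than one chunk, which a small top-k over chunks might miss.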

# 1. Preparation

## 1.1 Prepare for LLM

In [2]:
import os
from dotenv import load_dotenv

import logging
import sys

logging.basicConfig(
    stream=sys.stdout, level=logging.INFO
)  # logging.DEBUG for more verbose output
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from IPython.display import Markdown, display, HTML

In [3]:
from llama_index import (
    ServiceContext,
    VectorStoreIndex,
    KnowledgeGraphIndex,
)
from llama_index import set_global_service_context

from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores.simple import SimpleGraphStore

INFO:numexpr.utils:Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
NumExpr defaulting to 8 threads.


## 1.2 Prepare Graph Store


In [6]:
graph_store = SimpleGraphStore.from_persist_path(persist_path='../ho3_storage_kg_graph')

No existing llama_index.graph_stores.simple found at ../ho3_storage_kg_graph. Initializing a new graph_store from scratch. 


In [7]:
storage_context = StorageContext.from_defaults(graph_store=graph_store)

## 2. Build the Knowledge Graph

The Knowledge Graph is created with the `KnowledgeGraphIndex` from Llama Index. This extracts "Triplets" that will be persisted in `SimpleGraphStore`.

### 2.1 Preprocess Data

We will download and preprocess data from:
    https://www.iii.org/sites/default/files/docs/pdf/HO3_sample.pdf

In [8]:
from llama_index import SimpleDirectoryReader
from src_index.clean_sample_ho3 import clean_sample_ho3_pages

documents = SimpleDirectoryReader(input_files=['./data/HO3_sample.pdf']).load_data()

for i, _ in enumerate(documents):
    documents[i].text = clean_sample_ho3_pages(documents[i].text)


* PDF page count for the HO3 Homeowner's policy document

In [9]:
len(documents)

22

In [10]:
print(documents[0].text)

AGREEMENT
We will provide the insurance described in this policy
in return for the premium and compliance with allapplicable provisions of this policy.
DEFINITIONS
A. In this policy, "you" and "your" refer to the "named
insured" shown in the Declarations and the spouseif a resident of the same household. "We", "us"and "our" refer to the Company providing this insurance.
B. In addition, certain words and phrases are definedas follows:
1. "Aircraft Liability", "Hovercraft Liability", "Motor
Vehicle Liability" and "Watercraft Liability",subject to the provisions in b. below, mean the
following:
a. Liability for "bodily injury" or "property damage" arising out of the:
(1) Ownership of such vehicle or craft by an"insured";
(2) Maintenance, occupancy, operation,use, loading or unloading of such vehicle or craft by any person;
(3) Entrustment of such vehicle or craft byan "insured" to any person;
(4) Failure to supervise or negligent supervision of any person involving such vehicle or craft b

* Prompt to build the graph:

In [11]:
from llama_index.prompts.base import Prompt
from llama_index.prompts.prompt_type import PromptType


KG_TRIPLET_EXTRACT_TMPL = (
    "Some text is provided below. Given the text, extract up to "
    "{max_knowledge_triplets} "
    "knowledge triplets in the form of (subject, predicate, object). Avoid stopwords, 'page_label', and page numbers.\n"
    "---------------------\n"
    "Example:\n"
    "Text: Alice is Bob's mother.\n"
    "Triplets:\n(Alice, is mother of, Bob)\n"
    "Text: Philz is a coffee shop founded in Berkeley in 1982.\n"
    "Triplets:\n"
    "(Philz, is, coffee shop)\n"
    "(Philz, founded in, Berkeley)\n"
    "(Philz, founded in, 1982)\n"
    "---------------------\n"
    "Text: {text}\n"
    "Triplets:\n"
)
KG_TRIPLET_EXTRACT_PROMPT = Prompt(
    KG_TRIPLET_EXTRACT_TMPL, prompt_type=PromptType.KNOWLEDGE_TRIPLET_EXTRACT
)
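
The prompt asks the model to answer with one `(subject, predicate, object)` row per line. As a rough sketch of how such a completion could be turned into tuples, consider the parser below. It is not LlamaIndex's own parsing code; `parse_triplets` and the sample completion are hypothetical.

```python
import re

def parse_triplets(response, max_triplets=10):
    """Parse '(subject, predicate, object)' lines from a completion string."""
    triplets = []
    for line in response.splitlines():
        match = re.fullmatch(r"\((.+)\)", line.strip())
        if not match:
            continue  # skip blank lines and anything not wrapped in parentheses
        parts = [part.strip() for part in match.group(1).split(",")]
        if len(parts) != 3 or not all(parts):
            continue  # skip malformed rows rather than guessing
        triplets.append(tuple(parts))
    return triplets[:max_triplets]

# Hypothetical completion, modeled on the few-shot example in the prompt.
sample_completion = """(Philz, is, coffee shop)
(Philz, founded in, Berkeley)
not a triplet
(Philz, founded in, 1982)
(too, many, fields, here)"""

parsed = parse_triplets(sample_completion)
print(parsed)
```

Rows that do not have exactly three fields are dropped, so an object that itself contains a comma would be lost in this simplified version.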

### 2.2 Extract Triplets and Save to SimpleGraphStore

This call will take some time: it extracts entities and relationships and stores them in `SimpleGraphStore`.

In [12]:
from src_index.token_catcher import Usage
usage = Usage()

In [13]:
from llama_index.llms import OpenAI
from llama_index import OpenAIEmbedding, ServiceContext, KnowledgeGraphIndex

embedding_llm = OpenAIEmbedding()

llm = OpenAI(temperature=0,
             model="gpt-4",
             )

service_context = ServiceContext.from_defaults(llm=llm, 
                                               embed_model=embedding_llm,
                                               chunk_size=512,
                                               chunk_overlap=100
                                               )

set_global_service_context(service_context)

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=10,
    storage_context=storage_context,
    include_embeddings=True,
    kg_triple_extract_template=KG_TRIPLET_EXTRACT_PROMPT,
    show_progress=True)

Parsing documents into nodes: 100%|██████████| 22/22 [00:00<00:00, 40.44it/s]
Generating embeddings: 100%|██████████| 11/11 [00:00<00:00, 42.32it/s]
Generating embeddings: 100%|██████████| 10/10 [00:00<00:00, 61.55it/s]
Generating embeddings: 100%|██████████| 6/6 [00:00<00:00, 41.70it/s]
Generating embeddings: 100%|██████████| 18/18 [00:00<00:00, 79.55it/s]
Generating embeddings: 100%|██████████| 8/8 [00:00<00:00, 69.51it/s]
Generating embeddings: 100%|██████████| 5/5 [00:00<00:00, 42.16it/s]
Generating embeddings: 100%|██████████| 7/7 [00:00<00:00, 48.53it/s]
Generating embeddings: 100%|██████████| 4/4 [00:00<00:00, 32.26it/s]
Generating embeddings: 100%|██████████| 11/11 [00:00<00:00, 51.00it/s]
Generating embeddings: 100%|██████████| 11/11 [00:00<00:00, 58.20it/s]
Generating embeddings: 100%|██████████| 12/12 [00:00<00:00, 56.23it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00, 23.76it/s]
Generating embeddings: 100%|████

In [15]:
usage.total_tokens()

48063

## 3. Create VectorStoreIndex for RAG

To compare graph-based querying with vector similarity search, we also create a `VectorStoreIndex`.

At build time, the same data source is split into chunks and each chunk is converted to an embedding. At query time, the question is embedded and the top-k most similar chunks are retrieved by vector search.
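
The splitting step is controlled by the `chunk_size` and `chunk_overlap` values set on the service context earlier. The sketch below shows the sliding-window idea on a plain list of tokens. It is a simplification under stated assumptions: `split_with_overlap` is a hypothetical helper, and LlamaIndex's own splitter is more careful about token counts and text boundaries.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks

# Toy input: ten placeholder tokens instead of real policy text.
tokens = [f"t{i}" for i in range(10)]
chunks = split_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window repeats the last token of the previous one, so a fact that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.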

```
                  RAG with Llama Index
                  ┌────┬────┬────┬────┐                  
                  │ 1  │ 2  │ 3  │ 4  │                  
                  ├────┴────┴────┴────┤                  
                  │  Docs/Knowledge   │                  
┌───────┐         │        ...        │       ┌─────────┐
│       │         ├────┬────┬────┬────┤       │         │
│       │         │ 95 │ 96 │    │    │       │         │
│       │         └────┴────┴────┴────┘       │         │
│ User  │─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─▶   LLM   │
│       │                                     │         │
│       │                                     │         │
└───────┘    ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  └─────────┘
    │          ┌──────────────────────────┐        ▲     
    └────────┼▶│  Tell me ....., please   │├───────┘     
               └──────────────────────────┘              
             │ ┌────┐ ┌────┐               │             
               │ 3  │ │ 96 │                             
             │ └────┘ └────┘               │             
              ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 
```

The next line of code does this:

In [16]:
from llama_index import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(
    documents,
)

In [17]:
# persist KG Index
kg_index.storage_context.persist(persist_dir='../ho3_storage_kg_graph')

# persist Vector Index
vector_index.storage_context.persist(persist_dir='../ho3_storage_kg_vector')

## 4. Persist and Load index objects from disk

Both the `KnowledgeGraphIndex` and the `VectorStoreIndex` only need to be built once. After persisting them to disk, we can reload and reuse them. Passing index IDs is optional and only needed when multiple index objects are stored in the same directory.
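
The build-once, reload-later pattern can be sketched independently of LlamaIndex. The example below stores a toy index as JSON; `load_or_build`, `expensive_build`, and the file name are hypothetical stand-ins, not part of the notebook's code.

```python
import json
import tempfile
from pathlib import Path

def load_or_build(persist_dir, build_fn, filename="toy_index.json"):
    """Load a JSON-serializable index from persist_dir, building it only if absent."""
    path = Path(persist_dir) / filename
    if path.exists():
        return json.loads(path.read_text()), "loaded"
    index = build_fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(index))
    return index, "built"

def expensive_build():
    # Stand-in for the LLM-backed triplet extraction.
    return {"triplets": [["Philz", "founded in", "Berkeley"]]}

with tempfile.TemporaryDirectory() as tmp:
    first, first_status = load_or_build(tmp, expensive_build)
    second, second_status = load_or_build(tmp, expensive_build)

print(first_status, second_status)
```

The second call finds the persisted file and skips the build, which is the behavior we want from the persisted index directories.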

In [14]:
from llama_index.llms import OpenAI
from llama_index import (
    OpenAIEmbedding, 
    ServiceContext, 
    load_index_from_storage,
    set_global_service_context
)

embedding_llm = OpenAIEmbedding()

llm = OpenAI(temperature=0,
             model="gpt-4",
             )

service_context = ServiceContext.from_defaults(llm=llm, 
                                               embed_model=embedding_llm,
                                               chunk_size=512,
                                               chunk_overlap=100
                                               )

set_global_service_context(service_context)
# graph_store = SimpleGraphStore()

storage_context_kg = StorageContext.from_defaults(persist_dir="../ho3_storage_kg_graph")
kg_index = load_index_from_storage(
    storage_context=storage_context_kg,
    max_triplets_per_chunk=10,
    include_embeddings=True, 
    )

storage_context_vector = StorageContext.from_defaults(persist_dir='../ho3_storage_kg_vector')
vector_index = load_index_from_storage(
    storage_context=storage_context_vector,
)

INFO:llama_index.indices.loading:Loading all indices.
Loading all indices.
INFO:llama_index.indices.loading:Loading all indices.
Loading all indices.


## 5. Prepare for different query approaches

We will compare four query approaches that combine the LLM with the knowledge graph (KG), the vector store, or both:

| QueryEngine | Knowledge Graph Only                                 | Graph RAG query engine                                       | Vector RAG query engine                                      | Graph Vector RAG query engine                                |
| ----------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Mechanism   | 1. **Graph Only** based on triplets<br />2. Query KG with the result<br />3. Answer synthesis based on query result | 1. Get related entities of the question<br />2. Get n-depth **SubGraphs** of related entities from KG<br />3. Answer synthesis based on related SubGraphs | 1. Create embedding of question<br />2. Semantic search **top-k related doc chunks**<br />3. Answer synthesis based on related doc chunks | 1. Do retrieval as Vector and Graph RAG <br />2. Answer synthesis based on **both related chunks and SubGraphs** |


### 5.1 Knowledge Graph Only

This approach strictly follows the graph.

In [18]:
kg_query_engine = kg_index.as_query_engine(
    # include_text=True also passes the source text of the matched nodes to the LLM;
    # set it to False to answer from the raw triplets alone
    include_text=True,
    retriever_mode="keyword",
    response_mode="tree_summarize",
    similarity_top_k=5,
)

### 5.1.1 Knowledge Graph Hybrid Search

Here, we add flexibility from embedding representations.

In [19]:
# query using triplet keywords plus embedding similarity
hybrid_query_engine = kg_index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    retriever_mode="hybrid",
)

### 5.2 Graph RAG query engine

Graph RAG takes SubGraphs related to entities of the task/question as Context.

```
               Graph RAG with Llama Index
                  ┌────┬────┬────┬────┐                  
                  │ 1  │ 2  │ 3  │ 4  │                  
                  ├────┴────┴────┴────┤                  
                  │  Docs/Knowledge   │                  
┌───────┐         │        ...        │       ┌─────────┐
│       │         ├────┬────┬────┬────┤       │         │
│       │         │ 95 │ 96 │    │    │       │         │
│       │         └────┴────┴────┴────┘       │         │
│ User  │─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─▶   LLM   │
│       │                                     │         │
│       │                                     │         │
└───────┘    ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  └─────────┘
    │          ┌──────────────────────────┐        ▲     
    └────────┼▶│  Tell me about x, please │├───────┘     
               └──────────────────────────┘              
             │ Below are knowledge about x │             
               x->y<-z,x->h->i, m<-n,...                            
             │ Please answer based on them │             
              ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 
```

In [20]:
kg_rag_query_engine = kg_index.as_query_engine(
    include_text=False,
    retriever_mode="keyword",
    response_mode="tree_summarize",
)
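
With `include_text=False`, the context handed to the LLM consists of graph facts rather than chunk text. The sketch below shows how such a context could be assembled, following the wording in the diagram. The prompt text and the `build_graph_prompt` helper are assumptions; LlamaIndex uses its own templates. The two sample facts are taken from the retriever log shown in section 6.

```python
def build_graph_prompt(question, knowledge_sequences):
    """Render subgraph facts and the question into one prompt string."""
    # Each fact is rendered as: subject [predicate, object, ...]
    fact_lines = [f"{subject} {path}" for subject, path in knowledge_sequences]
    return (
        "Below is knowledge extracted from the graph:\n"
        + "\n".join(fact_lines)
        + "\nPlease answer based on it.\n"
        + f"Question: {question}\n"
    )

knowledge_sequences = [
    ("Liability", ["excluded from", "Coverage E"]),
    ("Liability", ["pays", "damages"]),
]
prompt = build_graph_prompt("Tell me about the limit of liability.", knowledge_sequences)
print(prompt)
```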

### 5.3 Standard vector query engine

Vector RAG finds the top-k semantically related document chunks and uses them as context for the answer.

In [21]:
vector_rag_query_engine = vector_index.as_query_engine(
)

### 5.4 Graph+Vector RAG query engine

This is a combined Graph+Vector RAG: we retrieve both the top-k chunks from the vector index and the related SubGraphs from the KG, and use both as context for the answer.

```
           Graph + Vector RAG with Llama Index
                  ┌────┬────┬────┬────┐                  
                  │ 1  │ 2  │ 3  │ 4  │                  
                  ├────┴────┴────┴────┤                  
                  │  Docs/Knowledge   │                  
┌───────┐         │        ...        │       ┌─────────┐
│       │         ├────┬────┬────┬────┤       │         │
│       │         │ 95 │ 96 │    │    │       │         │
│       │         └────┴────┴────┴────┘       │         │
│ User  │─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─▶   LLM   │
│       │                                     │         │
│       │                                     │         │
└───────┘    ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  └─────────┘
    │          ┌──────────────────────────┐        ▲     
    └────────┼▶│  Tell me ....., please   │├───────┘     
               └──────────────────────────┘              
             │ ┌────┐┌────┐               │             
               │ 3  ││ 96 │ x->y<-z,x->h...                            
             │ └────┘└────┘               │             
              ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 
```

To implement that in Llama Index, we create a `CustomRetriever` to combine the two:

In [22]:
# import QueryBundle
from llama_index import QueryBundle

# import NodeWithScore
from llama_index.schema import NodeWithScore

# Retrievers
from llama_index.retrievers import BaseRetriever, VectorIndexRetriever, KGTableRetriever

from typing import List


class CustomRetriever(BaseRetriever):
    """Custom retriever that performs both Vector search and Knowledge Graph search"""

    def __init__(
        self,
        vector_retriever: VectorIndexRetriever,
        kg_retriever: KGTableRetriever,
        mode: str = "OR",
    ) -> None:
        """Init params."""

        self._vector_retriever = vector_retriever
        self._kg_retriever = kg_retriever
        if mode not in ("AND", "OR"):
            raise ValueError("Invalid mode.")
        self._mode = mode

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve nodes given query."""

        vector_nodes = self._vector_retriever.retrieve(query_bundle)
        kg_nodes = self._kg_retriever.retrieve(query_bundle)

        vector_ids = {n.node.node_id for n in vector_nodes}
        kg_ids = {n.node.node_id for n in kg_nodes}

        combined_dict = {n.node.node_id: n for n in vector_nodes}
        combined_dict.update({n.node.node_id: n for n in kg_nodes})

        if self._mode == "AND":
            retrieve_ids = vector_ids.intersection(kg_ids)
        else:
            retrieve_ids = vector_ids.union(kg_ids)

        retrieve_nodes = [combined_dict[rid] for rid in retrieve_ids]
        return retrieve_nodes
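
The merge logic in `_retrieve` can be tried out without any index. The sketch below replays the same `AND`/`OR` set operations on plain dictionaries. The node ids and scores are hypothetical, and `combine_ids` is an illustrative helper rather than part of the retriever.

```python
def combine_ids(vector_hits, kg_hits, mode="OR"):
    """Merge two {node_id: score} dicts the way CustomRetriever merges nodes."""
    if mode not in ("AND", "OR"):
        raise ValueError("Invalid mode.")
    combined = dict(vector_hits)
    combined.update(kg_hits)  # on overlap, the KG entry replaces the vector entry
    if mode == "AND":
        keep = set(vector_hits) & set(kg_hits)
    else:
        keep = set(vector_hits) | set(kg_hits)
    # Sort for a stable order; plain set iteration order is not guaranteed.
    return {node_id: combined[node_id] for node_id in sorted(keep)}

# Hypothetical ids and scores for illustration.
vector_hits = {"node-3": 0.82, "node-96": 0.79}
kg_hits = {"node-96": 1.0, "node-12": 1.0}

either = combine_ids(vector_hits, kg_hits, mode="OR")
both = combine_ids(vector_hits, kg_hits, mode="AND")
print(either)
print(both)
```

`OR` maximizes recall by keeping every node that either retriever found, while `AND` keeps only nodes that both agree on. For a node found by both, the score from the KG retriever wins because it is written last.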

Next, we will create instances of the Vector and KG retrievers, which will be used in the instantiation of the Custom Retriever.

In [23]:
from llama_index import get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine

# create custom retriever
vector_retriever = VectorIndexRetriever(index=vector_index)

kg_retriever = KGTableRetriever(
    index=kg_index, 
    retriever_mode="keyword", 
    include_text=True
)
custom_retriever = CustomRetriever(vector_retriever, kg_retriever)

# create response synthesizer
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",
)

And the query engine:

In [24]:
graph_vector_rag_query_engine = RetrieverQueryEngine(
    retriever=custom_retriever,
    response_synthesizer=response_synthesizer,
)
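
The engines above use `response_mode="tree_summarize"`. As a rough mental model, this mode summarizes the retrieved chunks bottom-up until a single answer remains. The sketch below illustrates that idea with a fixed fan-in and a stand-in for the LLM call. Both `tree_summarize` and `toy_summarize` are hypothetical; the library groups chunks by what fits in the prompt rather than by a fixed count.

```python
def tree_summarize(chunks, summarize, fan_in=2):
    """Repeatedly merge groups of fan_in texts until one summary remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    while len(level) > 1:
        level = [
            summarize(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]

def toy_summarize(texts):
    # Stand-in for an LLM call: join the pieces and wrap them in brackets.
    return "[" + " + ".join(texts) + "]"

answer = tree_summarize(["c1", "c2", "c3"], toy_summarize)
print(answer)  # [[c1 + c2] + [c3]]
```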

## 6. Query with all the Engines
* We will ask a purposefully vague and open-ended query

### 6.1 Keyword Knowledge Graph Query

In [33]:
response_kg = kg_query_engine.query("Tell me about the limit of liability.")

INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me about the limit of liability.
> Starting query: Tell me about the limit of liability.
INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Liability', 'Limit', 'Responsibility']
> Query keywords: ['Liability', 'Limit', 'Responsibility']
INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: e050c490-1e37-4358-9107-9ff60b6ec688: Corporal Punishment
Or Physical Or Mental Abuse
"Bodily injury" or "property ...
> Querying with idx: e050c490-1e37-4358-9107-9ff60b6ec688: Corporal Punishment
Or Physical Or Mental Abuse
"Bodily injury" or "property ...
INFO:llama_index.indices.knowledge_graph.retriever:> Extracted relationships: The following are knowledge sequence in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]`
Liability ['excluded from', 'Coverage E']
Liability ['pays', 'damages']
Liability ['provides defense', 'counsel of choice

In [34]:
display(Markdown(f"<b>{response_kg}</b>"))

<b>
The limit of liability is specified in the Declarations section of the policy. It is the maximum amount of money that the policy will pay out in the event of a claim. The limit of liability may vary depending on the type of coverage and the specific circumstances of the claim.</b>

### 6.2 Graph Keyword Embeddings Hybrid

In [25]:
response_hybrid = hybrid_query_engine.query("Tell me more about the limit of liability.")

display(Markdown(f"<b>{response_hybrid}</b>"))

INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me more about the limit of liability.
> Starting query: Tell me more about the limit of liability.
INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Limit', 'Liability']
> Query keywords: ['Limit', 'Liability']
INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: e050c490-1e37-4358-9107-9ff60b6ec688: Corporal Punishment
Or Physical Or Mental Abuse
"Bodily injury" or "property ...
> Querying with idx: e050c490-1e37-4358-9107-9ff60b6ec688: Corporal Punishment
Or Physical Or Mental Abuse
"Bodily injury" or "property ...
INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 19601d73-42c6-4d0d-9048-5c948c777711: DEDUCTIBLE
Unless otherwise noted in this policy, the following
deductible pr...
> Querying with idx: 19601d73-42c6-4d0d-9048-5c948c777711: DEDUCTIBLE
Unless otherwise noted in this policy, the following
deductible pr...
INFO:llama_index.indices.knowled

<b>
The limit of liability is the maximum amount of money that an insurance company will pay out for a claim. It is specified in the policy's Declarations and is usually based on the value of the property or the amount of coverage purchased. For Coverage B, the limit of liability is typically 10% of the limit of liability that applies to Coverage A. For Coverage C, the limit of liability is usually the total limit for each loss for all property in that category. There are also special limits of liability for certain categories of property, such as money, bank notes, bullion, gold, silver, platinum, coins, medals, scrip, stored value cards, and smart cards, which is $200.</b>

In [26]:
response_hybrid = hybrid_query_engine.query("Tell me more about personal property coverage")

display(Markdown(f"<b>{response_hybrid}</b>"))

INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me more about personal property coverage
> Starting query: Tell me more about personal property coverage
INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Personal', 'Coverage', 'Property']
> Query keywords: ['Personal', 'Coverage', 'Property']
INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: cf4c7da6-c5e9-43d8-822b-bcf2d8567441: or liquid fuel,other than fuel in a permanently installedfuel tank of a vehic...
> Querying with idx: cf4c7da6-c5e9-43d8-822b-bcf2d8567441: or liquid fuel,other than fuel in a permanently installedfuel tank of a vehic...
INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 19601d73-42c6-4d0d-9048-5c948c777711: DEDUCTIBLE
Unless otherwise noted in this policy, the following
deductible pr...
> Querying with idx: 19601d73-42c6-4d0d-9048-5c948c777711: DEDUCTIBLE
Unless otherwise noted in this policy, the following
deductible pr..

<b>
Personal property coverage under Coverage C of the HO3 sample policy covers personal property owned or used by an "insured" while it is anywhere in the world. After a loss and at the request of the insured, the policy will cover personal property owned by others while the property is on the part of the "residence premises" occupied by an "insured", or a guest or a "residence employee" while the property is in any residence occupied by an "insured". The limit of liability for personal property usually located at an "insured's" residence, other than the "residence premises", is 10% of the limit of liability for Coverage C, or $1,000, whichever is greater. There are also special limits of liability for certain categories of personal property, such as money, bank notes, bullion, gold other than goldware, silver other than silverware, platinum other than platinumware, coins, medals, scrip, stored value cards and smart cards, and securities, accounts, deeds, evidences of debt, letters of credit, notes other than bank notes, manuscripts, personal records, passports, tickets and stamps. The policy also covers personal property while it is being removed from a premises endangered by a Peril Insured Against and for no more than 30 days while removed. There is also coverage for loss resulting from the theft or unauthorized use of credit cards, electronic fund transfer cards or access devices, forgery or alteration of any check or negotiable instrument, and acceptance in good faith of counterfeit United States or Canadian paper currency. Losses are settled at actual cash value at the time of loss but not more than the amount required to repair or replace, or at replacement cost without deduction for depreciation for buildings covered under Coverage A or B.</b>

### 6.3 Vector RAG

In [27]:
response_vector_rag = vector_rag_query_engine.query("Tell me about the limit of liability")

display(Markdown(f"<b>{response_vector_rag}</b>"))

<b>
The limit of liability for Coverage E is shown in the Declarations and is the same regardless of the number of "insureds", claims made or persons injured. The limit of liability for Coverage F is also shown in the Declarations and is the total limit for medical expense payable for "bodily injury" to one person as the result of one accident. The limit of liability for personal property usually located at an "insured's" residence, other than the "residence premises", is 10% of the limit of liability for Coverage C, or $1,000, whichever is greater. There are also special limits of liability for certain categories of property, such as $200 on money, bank notes, bullion, gold other than goldware, silver other than silverware, platinum other than platinumware, coins, medals, scrip, stored value cards and smart cards, and $1,500 on securities, accounts, deeds, evidences of debt, letters of credit, notes other than bank notes, manuscripts, personal records, passports, tickets and stamps.</b>

In [28]:
response_vector_rag = vector_rag_query_engine.query("Tell me more about personal property coverage")

display(Markdown(f"<b>{response_vector_rag}</b>"))

<b>
Personal Property coverage under Coverage C of the HO3_sample.pdf policy covers personal property owned or used by an insured while it is anywhere in the world. After a loss and at the request of the insured, the policy will cover personal property owned by others while the property is on the part of the residence premises occupied by an insured, or a guest or residence employee while the property is in any residence occupied by an insured. The limit of liability for personal property usually located at an insured's residence, other than the residence premises, is 10% of the limit of liability for Coverage C, or $1,000, whichever is greater. There are also special limits of liability for certain categories of property, such as money, bank notes, bullion, gold, silver, platinum, coins, medals, scrip, stored value cards, and smart cards ($200 limit) and securities, accounts, deeds, evidences of debt, letters of credit, notes, manuscripts, personal records, passports, tickets, and stamps ($1,500 limit).</b>

### 6.4 Graph + Vector RAG

In [29]:
response_graph_vector_rag = graph_vector_rag_query_engine.query("Tell me about the limit of liability.")

Markdown(f"<b>{response_graph_vector_rag}</b>")

INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me about the limit of liability.
> Starting query: Tell me about the limit of liability.
INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Liability', 'Limit', 'Responsibility']
> Query keywords: ['Liability', 'Limit', 'Responsibility']
INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: e050c490-1e37-4358-9107-9ff60b6ec688: Corporal Punishment
Or Physical Or Mental Abuse
"Bodily injury" or "property ...
> Querying with idx: e050c490-1e37-4358-9107-9ff60b6ec688: Corporal Punishment
Or Physical Or Mental Abuse
"Bodily injury" or "property ...
INFO:llama_index.indices.knowledge_graph.retriever:> Extracted relationships: The following are knowledge sequence in max depth 2 in the form of `subject [predicate, object, predicate_next_hop, object_next_hop ...]`
Liability ['excluded from', 'Coverage E']
Liability ['pays', 'damages']
Liability ['provides defense', 'counsel of choice

<b>
The limit of liability for Liability coverage is stated in the Declarations section of the policy. It is the maximum amount that the policy will pay out for damages resulting from any one "occurrence". The limit of liability for Coverage E is the same regardless of the number of "insureds", claims made or persons injured. For Coverage F, the limit of liability for medical expenses payable for "bodily injury" to one person as the result of one accident will not be more than the Coverage F limit of liability shown in the Declarations. Liability is excluded from Loss Assessment and is not covered by any governmental body.</b>

In [30]:
response_graph_vector_rag = graph_vector_rag_query_engine.query("Tell me more about personal property coverage")

Markdown(f"<b>{response_graph_vector_rag}</b>")

INFO:llama_index.indices.knowledge_graph.retriever:> Starting query: Tell me more about personal property coverage
> Starting query: Tell me more about personal property coverage
INFO:llama_index.indices.knowledge_graph.retriever:> Query keywords: ['Personal', 'Coverage', 'Property']
> Query keywords: ['Personal', 'Coverage', 'Property']
INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 7d6e812a-1b83-4313-bc19-5fea56de82ac: district furnishing thefire department response.
This coverage is additional ...
> Querying with idx: 7d6e812a-1b83-4313-bc19-5fea56de82ac: district furnishing thefire department response.
This coverage is additional ...
INFO:llama_index.indices.knowledge_graph.retriever:> Querying with idx: 06487ca2-ad03-4ad7-bcd4-3d543258b6f2: equipment and outboardengines or motors; or
(c) Property while at any other r...
> Querying with idx: 06487ca2-ad03-4ad7-bcd4-3d543258b6f2: equipment and outboardengines or motors; or
(c) Property while at any other r..

<b>
Personal property coverage covers personal property owned or used by an "insured" while it is anywhere in the world. After a loss and at the request of the insured, the coverage will cover personal property owned by others while the property is on the part of the "residence premises" occupied by an "insured" or a guest or a "residence employee" while the property is in any residence occupied by an "insured". The limit of liability for personal property usually located at an "insured's" residence, other than the "residence premises", is 10% of the limit of liability for Coverage C, or $1,000, whichever is greater. There are also special limits of liability for certain categories of personal property, such as money, bank notes, bullion, gold other than goldware, silver other than silverware, platinum other than platinumware, coins, medals, scrip, stored value cards and smart cards. Covered property losses are settled at actual cash value at the time of loss but not more than the amount required to repair or replace. Buildings covered under Coverage A or B are settled at replacement cost without deduction for depreciation.</b>

## 7. Summary

### 7.1 Overall Comparison

In [37]:
# Set up a new model for QA
from llama_index.llms import OpenAI

qa_llm = OpenAI(temperature=0, model="gpt-4")

* Asking GPT-4 to compare query results

In [40]:

res = qa_llm.complete(f"""
Compare the QA results on "Explain Limit of Liability", list the knowledge facts between them to help evaluate them. Output in markdown table.

Result GraphQuery: {response_kg}
---
Result GraphHybrid: {response_hybrid}
---
Result Vector: {response_vector_rag}
---
Result Graph+Vector: {response_graph_vector_rag}
---

"""
)


In [41]:
Markdown(f"{res}")

| Knowledge Fact | Result GraphQuery | Result GraphHybrid | Result Vector | Result Graph+Vector |
| --- | --- | --- | --- | --- |
| The limit of liability is specified in the Declarations section of the policy | Yes | No | No | No |
| The limit of liability is the maximum amount of money that the policy will pay out in the event of a claim | Yes | No | No | No |
| The limit of liability may vary depending on the type of coverage and the specific circumstances of the claim | Yes | No | No | No |
| Personal property coverage covers personal property owned or used by an "insured" while it is anywhere in the world | No | Yes | Yes | Yes |
| The policy will cover personal property owned by others while the property is on the part of the "residence premises" occupied by an "insured" | No | Yes | Yes | Yes |
| The limit of liability for personal property usually located at an "insured's" residence, other than the "residence premises", is 10% of the limit of liability for Coverage C, or $1,000, whichever is greater | No | Yes | Yes | Yes |
| There are special limits of liability for certain categories of personal property | No | Yes | Yes | Yes |
| Covered property losses are settled at actual cash value at the time of loss but not more than the amount required to repair or replace | No | Yes | No | Yes |
| Buildings covered under Coverage A or B are settled at replacement cost without deduction for depreciation | No | Yes | No | Yes |
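
The comparison prompts in this section all follow the same shape: a one-line instruction naming the question, then each engine's answer separated by `---`. A small helper could assemble that text so the label and answer pairs stay in one place. This is only a sketch: `build_comparison_prompt` is a hypothetical name, and the answers in the example are placeholders rather than real query results.

```python
def build_comparison_prompt(question, results):
    """Assemble a comparison prompt from {label: answer} pairs.

    Hypothetical helper that mirrors the layout of the prompts used in
    this notebook: an instruction line, then 'Result <label>: <answer>'
    blocks, each followed by a '---' separator.
    """
    header = (
        f'Compare the QA results on "{question}", list the knowledge facts '
        "between them to help evaluate them. Output in markdown table."
    )
    sections = "".join(
        f"Result {label}: {answer}\n---\n" for label, answer in results.items()
    )
    return f"\n{header}\n\n{sections}\n"


# Placeholder answers for illustration only; in the notebook these would be
# the response objects returned by the four query engines.
example_prompt = build_comparison_prompt(
    "Explain Limit of Liability",
    {
        "GraphQuery": "<answer from the graph query engine>",
        "Vector": "<answer from the vector query engine>",
    },
)
print(example_prompt)
```

The resulting string could then be passed to `qa_llm.complete(...)` in the same way as the handwritten prompts.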

In [42]:
res = qa_llm.complete(f"""
Compare the QA results on "Explain Personal Property Coverage", list the knowledge facts between them to help evaluate them. Output in markdown table.

Result GraphQuery: {response_kg}
---
Result GraphHybrid: {response_hybrid}
---
Result Vector: {response_vector_rag}
---
Result Graph+Vector: {response_graph_vector_rag}
---

"""
)

In [43]:
Markdown(f"{res}")

| Knowledge Facts | GraphQuery | GraphHybrid | Vector | Graph+Vector |
| --- | --- | --- | --- | --- |
| Personal property coverage covers personal property owned or used by an insured | No | Yes | Yes | Yes |
| Personal property coverage covers personal property owned by others while the property is on the part of the residence premises occupied by an insured | No | Yes | Yes | Yes |
| The limit of liability for personal property usually located at an insured's residence, other than the residence premises, is 10% of the limit of liability for Coverage C, or $1,000, whichever is greater | No | Yes | Yes | Yes |
| There are special limits of liability for certain categories of personal property | No | Yes | Yes | Yes |
| Covered property losses are settled at actual cash value at the time of loss but not more than the amount required to repair or replace | No | Yes | No | Yes |
| Buildings covered under Coverage A or B are settled at replacement cost without deduction for depreciation | No | Yes | No | Yes |
| The limit of liability is specified in the Declarations section of the policy | Yes | No | No | No |
| The limit of liability may vary depending on the type of coverage and the specific circumstances of the claim | Yes | No | No | No |
| Personal property coverage covers personal property while it is being removed from a premises endangered by a Peril Insured Against and for no more than 30 days while removed | No | Yes | No | No |
| There is coverage for loss resulting from the theft or unauthorized use of credit cards, electronic fund transfer cards or access devices, forgery or alteration of any check or negotiable instrument, and acceptance in good faith of counterfeit United States or Canadian paper currency | No | Yes | No | No |
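
To read a Yes/No table like the one above at a glance, the number of facts each engine covered can be tallied. The sketch below parses a markdown table and counts the `Yes` cells per column. `tally_yes` is a hypothetical helper, the sample rows are abbreviated versions of three rows from the table, and the parser assumes that no cell contains a `|` character.

```python
def tally_yes(markdown_table):
    """Count 'Yes' cells per engine column in a Yes/No markdown table.

    Assumes the first row is the header, the second row is the '---'
    separator, and the first column holds the fact description.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in markdown_table.strip().splitlines()
        if line.strip()
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the separator row
    counts = {name: 0 for name in header[1:]}
    for row in body:
        for name, cell in zip(header[1:], row[1:]):
            if cell.lower() == "yes":
                counts[name] += 1
    return counts


# Abbreviated sample rows for illustration.
sample_table = """
| Knowledge Facts | GraphQuery | GraphHybrid | Vector | Graph+Vector |
| --- | --- | --- | --- | --- |
| Owned or used by an insured | No | Yes | Yes | Yes |
| Limit specified in Declarations | Yes | No | No | No |
| Settled at actual cash value | No | Yes | No | Yes |
"""
print(tally_yes(sample_table))
```

A higher count is only a rough signal: the facts and the Yes/No judgments were themselves produced by GPT-4, so the tally inherits any mistakes in that table.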

In [44]:
res = qa_llm.complete(f"""
Compare the two QA results on "Explain Limit of Liability", list the differences between them to help evaluate them. Output in markdown table.

Result from Graph: {response_kg}
---
Result from Graph_hybrid: {response_hybrid}

"""
)


In [45]:
Markdown(f"{res}")

| Result from Graph | Result from Graph_hybrid |
| --- | --- |
| The answer is general and applies to any insurance policy. It explains that the limit of liability is the maximum amount the policy will pay out in the event of a claim. This limit is specified in the Declarations section of the policy and may vary depending on the type of coverage and the specific circumstances of the claim. | The answer is specific to a Homeowners 3 (HO3) insurance policy. It provides a detailed explanation of the limit of liability for personal property coverage under Coverage C. It mentions that the limit is 10% of the limit of liability for Coverage C, or $1,000, whichever is greater. It also lists special limits for certain categories of personal property and explains how losses are settled. |
| The answer does not provide specific examples or details about different types of coverage or circumstances that might affect the limit of liability. | The answer provides specific examples of personal property that might be covered and circumstances that might affect the limit of liability, such as the location of the property and whether it is being removed from a premises endangered by a Peril Insured Against. |
| The answer does not mention any special limits of liability for certain categories of property. | The answer mentions special limits of liability for certain categories of personal property, such as money, bank notes, bullion, gold other than goldware, silver other than silverware, platinum other than platinumware, coins, medals, scrip, stored value cards and smart cards, and securities, accounts, deeds, evidences of debt, letters of credit, notes other than bank notes, manuscripts, personal records, passports, tickets and stamps. |
| The answer does not mention any coverage for loss resulting from theft or unauthorized use of credit cards, electronic fund transfer cards or access devices, forgery or alteration of any check or negotiable instrument, and acceptance in good faith of counterfeit United States or Canadian paper currency. | The answer mentions coverage for loss resulting from the theft or unauthorized use of credit cards, electronic fund transfer cards or access devices, forgery or alteration of any check or negotiable instrument, and acceptance in good faith of counterfeit United States or Canadian paper currency. |
| The answer does not mention how losses are settled. | The answer explains that losses are settled at actual cash value at the time of loss but not more than the amount required to repair or replace, or at replacement cost without deduction for depreciation for buildings covered under Coverage A or B. |

In [46]:
res = qa_llm.complete(f"""
Compare the two QA results on "Explain Limit of Liability", list the differences between them to help evaluate them. Output in markdown table.

Result from Topk Vecs: {response_vector_rag}
---
Result from Graph+Vec: {response_graph_vector_rag}

"""
)


In [47]:
Markdown(f"{res}")

| Differences | Topk Vecs | Graph+Vec |
| --- | --- | --- |
| Use of quotation marks | No quotation marks used | Quotation marks used around terms like "insured" and "residence premises" |
| Mention of gold, silver, platinum | Does not specify that it excludes goldware, silverware, platinumware | Specifies that it covers gold, silver, platinum other than goldware, silverware, platinumware |
| Mention of certain categories of property | Mentions securities, accounts, deeds, evidences of debt, letters of credit, notes, manuscripts, personal records, passports, tickets, and stamps with a limit of $1,500 | Does not mention these categories of property |
| Settlement of covered property losses | Does not mention how covered property losses are settled | States that covered property losses are settled at actual cash value at the time of loss but not more than the amount required to repair or replace |
| Settlement of buildings covered under Coverage A or B | Does not mention how buildings covered under Coverage A or B are settled | States that buildings covered under Coverage A or B are settled at replacement cost without deduction for depreciation |

In [48]:
# Generate and export network graph

from pyvis.network import Network

g = kg_index.get_networkx_graph()
# Configure the pyvis canvas for the exported HTML page
net = Network(
    cdn_resources="local",
    directed=True,          # directed graph
    bgcolor="#222222",      # dark background color of the graph
    font_color="white",     # white node labels
    height="1000px",        # height of the chart
    width="100%",           # fill the entire width
)

net.repulsion(
    node_distance=200,
    central_gravity=0.2,
    spring_length=250,
    spring_strength=0.08,
    damping=0.1,
)
net.toggle_physics(True)
net.from_nx(g)
net.write_html("ho3_kg_index_gpt4.html")