# RAG Bootcamp

In [1]:
import nest_asyncio

nest_asyncio.apply()

In [9]:
!mkdir -p data
!curl -L "https://arxiv.org/pdf/2402.09353.pdf" -o "./data/dorav1.pdf"


In [1]:
from dotenv import load_dotenv, find_dotenv
import os
import openai

_ = load_dotenv(find_dotenv())
openai.api_key = os.getenv('OPENAI_API_KEY')

In [13]:
# query an LLM and ask it about DoRA
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4")
response = llm.complete("What is DoRA?")

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [14]:
print(response.text)

Without specific context, it's hard to determine what DoRA refers to as it could mean different things in different fields. However, in general, it could refer to:

1. Division of Research and Analysis: In some organizations, this is a department responsible for conducting research and analyzing data.

2. Department of Regulatory Agencies: In some U.S. states, this is a government agency responsible for consumer protection and regulation of businesses.

3. Declaration of Research Assessment: In academia, this could refer to a formal statement assessing the impact and value of a research project.

4. Digital Operational Resilience Act (DORA): A proposed regulation by the European Union aimed at consolidating and upgrading ICT risk requirements across the financial sector.

Please provide more context for a more accurate definition.


### Basic RAG in 3 Steps

- Build external knowledge (i.e., updated data sources)
- Retrieve
- Augment and Generate

### 1. Build External Knowledge

In [5]:
"""Load the data.

With llama-index, before any transformations are applied,
data is loaded into the `Document` abstraction, which is
a container that holds the text of the document.
"""

from llama_index.core import SimpleDirectoryReader

loader = SimpleDirectoryReader(input_dir="./data")
documents = loader.load_data()

In [6]:
"""Chunk, Encode, and Store into a Vector Store.

To streamline the process, we can use the IngestionPipeline
class, which applies the specified transformations to the
Documents and stores the resulting nodes in the vector store.
"""

from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

client = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="test_store")

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        OpenAIEmbedding(),
    ],
    vector_store=vector_store,
)
_nodes = pipeline.run(documents=documents, num_workers=4)
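`SentenceSplitter()` cuts each Document into overlapping chunks before `OpenAIEmbedding()` encodes them. The function below is a simplified sketch of the chunk-with-overlap idea only, not SentenceSplitter's algorithm: as far as we know, SentenceSplitter measures size in tokens and tries to keep sentences together, whereas this sketch counts whitespace-separated words and ignores sentence boundaries. `chunk_words`, `chunk_size`, and `overlap` are hypothetical names used for illustration.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into chunks of at most `chunk_size` words; each chunk
    repeats the last `overlap` words of the previous chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = "DoRA decomposes the pre-trained weight into magnitude and direction components"
for chunk in chunk_words(text, chunk_size=5, overlap=2):
    print(chunk)
```

The overlap means a fact that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some words twice.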

In [7]:
# if you want to see the nodes
len(_nodes)
_nodes[0].text

'DoRA: Weight-Decomposed Low-Rank Adaptation\nShih-Yang Liu* 1 2Chien-Yi Wang1Hongxu Yin1Pavlo Molchanov1Yu-Chiang Frank Wang1\nKwang-Ting Cheng2Min-Hung Chen1\nAbstract\nAmong the widely used parameter-efficient fine-\ntuning (PEFT) methods, LoRA and its variants\nhave gained considerable popularity because of\navoiding additional inference costs. However,\nthere still often exists an accuracy gap between\nthese methods and full fine-tuning (FT). In this\nwork, we first introduce a novel weight decom-\nposition analysis to investigate the inherent dif-\nferences between FT and LoRA. Aiming to re-\nsemble the learning capacity of FT from the\nfindings, we propose Weight- Decomposed L ow-\nRankAdaptation ( DoRA ). DoRA decomposes\nthe pre-trained weight into two components, mag-\nnitude anddirection , for fine-tuning, specifically\nemploying LoRA for directional updates to effi-\nciently minimize the number of trainable param-\neters. By employing DoRA, we enhance both\nthe learning cap

In [8]:
"""Create a llama-index... wait for it... Index.

After uploading your encoded documents into your vector
store of choice, you can connect to it with a VectorStoreIndex
which then gives you access to all of the llama-index functionality.
"""

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

### 2. Retrieve Against A Query

In [9]:
"""Retrieve relevant documents against a query.

With our Index ready, we can now query it to
retrieve the most relevant document chunks.
"""

retriever = index.as_retriever(similarity_top_k=2)
retrieved_nodes = retriever.retrieve("What is DoRA?")
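`similarity_top_k=2` asks for the two stored chunks whose embeddings are closest to the query's embedding. The sketch below illustrates that ranking step with cosine similarity over plain Python lists; the 3-dimensional toy vectors stand in for real embeddings, and `cosine` / `top_k` are hypothetical helpers, not the Qdrant or llama-index implementation.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], store: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query vector."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# toy embeddings standing in for OpenAIEmbedding output
store = {
    "dora_abstract": [0.9, 0.1, 0.0],
    "vera_section": [0.6, 0.4, 0.1],
    "unrelated": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], store, k=2))
```

A real vector store avoids scoring every stored vector (approximate nearest-neighbour indexes), but the result it aims for is this same top-k ranking.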

In [10]:
# to view the retrieved node
print(retrieved_nodes[0].text)

(1) that ∆Wcan be adapted by differ-
ent LoRA variants. With DoRA, the concept of incremental
directional update ∆Vintroduced in Equation.(5) can like-
wise be replaced with alternative LoRA variants. In this
section, we select VeRA (Kopiczko et al., 2024) as a case
study to explore DoRA’s compatibility with other LoRA
variants. VeRA suggests freezing a unique pair of random
low-rank matrices to be shared across all layers, employ-
ing only minimal layer-specific trainable scaling vectors to
capture each layer’s incremental updates. This approach
allows VeRA to reduce trainable parameters significantly
by 10x compared to LoRA, with only a minimal impact
on accuracy. We apply VeRA for the directional update in
DoRA and name such combination DV oRA. We assess the
effectiveness of both DV oRA and DoRA compared to VeRA
and LoRA across LLaMA-7B and LLaMA2-7B, focusing
on instruction tuning with the 10K subset of cleaned Alpaca
dataset (Taori et al., 2023). We utilize the official imple-
men

### 3. Generate Final Response

In [11]:
"""Context-Augmented Generation.

With our Index ready, we can create a QueryEngine
that handles the retrieval and context augmentation
in order to get the final response.
"""

query_engine = index.as_query_engine()

In [12]:
# to inspect the default prompt being used
print(
    query_engine.get_prompts()[
        "response_synthesizer:text_qa_template"
    ].default_template.template
)

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: 
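The `{context_str}` and `{query_str}` placeholders are filled with the retrieved chunk texts and the user's question before the prompt is sent to the LLM. A minimal sketch of that augmentation step with Python's `str.format`, reusing the template text printed above; joining chunks with blank lines is an assumption, and llama-index may use a different separator and include node metadata.

```python
from typing import List

# template text copied from the printed default text_qa_template
TEXT_QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def build_prompt(chunks: List[str], question: str) -> str:
    """Join retrieved chunks into one context block and fill the template."""
    context_str = "\n\n".join(chunks)  # assumed separator between chunks
    return TEXT_QA_TEMPLATE.format(context_str=context_str, query_str=question)


prompt = build_prompt(
    [
        "DoRA decomposes the pre-trained weight into magnitude and direction.",
        "DoRA employs LoRA for directional updates.",
    ],
    "What is DoRA?",
)
print(prompt)
```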


In [13]:
response = query_engine.query("What is DoRA?")
print(response)

DoRA is a method that introduces incremental directional updates to adapt ∆W using various LoRA variants. It can be combined with other LoRA variants like VeRA, which reduces trainable parameters significantly while maintaining accuracy. DoRA, when combined with VeRA as DV oRA, shows consistent improvements over VeRA and LoRA in tasks like instruction tuning and fine-tuning models. Additionally, DoRA is shown to remain competitive with varying amounts of training data and can achieve better accuracy than LoRA with fewer trainable parameters by selectively updating only the directional components of certain modules.


### In Summary
- LLMs, as powerful as they are, often struggle with knowledge-intensive tasks (domain-specific, recent, or long-tail information).
- Context augmentation has been shown (in a few studies) to outperform LLMs without augmentation on such tasks.
- This notebook showed one example of that pattern: without context, GPT-4 could only list unrelated expansions of "DoRA", while the query engine answered from retrieved passages of the DoRA paper.

# Customization Tutorial for LlamaIndex

https://docs.llamaindex.ai/en/stable/getting_started/customization.html

In [1]:
from dotenv import load_dotenv, find_dotenv
import openai
import os

_ = load_dotenv(find_dotenv())
openai.api_key = os.getenv('OPENAI_API_KEY')

In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

In [3]:
from datasets import load_dataset
xsum_dataset = load_dataset(
    "xsum", version="1.2.0"
)
documents = xsum_dataset["train"].select(range(1000)).to_pandas()

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


In [23]:
!mkdir -p 'document/'

In [4]:
joined_documents = '\n'.join(documents['document'])
with open('document/documents.txt', 'w', encoding='utf-8') as file:
    file.write(joined_documents)
documents = SimpleDirectoryReader("document").load_data()

In [5]:
index = VectorStoreIndex.from_documents(documents)

In [6]:
query_engine = index.as_query_engine()
response = query_engine.query("I'm looking for the information of Harry Potter. What could you suggest to me?")

In [7]:
response

Response(response='I would suggest looking into the play "Harry Potter and the Cursed Child." It is a theatrical production set 19 years after the last book in the Harry Potter series by JK Rowling. The play has received critical acclaim and is presented in two parts, offering a unique and immersive experience for fans of the wizarding world.', source_nodes=[NodeWithScore(node=TextNode(id_='28cb8331-4c3d-4063-8aac-3bd60404bc76', embedding=None, metadata={'file_path': '/Users/linghuang/Git/NLP/notebook/document/documents.txt', 'file_name': 'documents.txt', 'file_type': 'text/plain', 'file_size': 2132434, 'creation_date': '2024-03-11', 'last_modified_date': '2024-03-11'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInf

In [8]:
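from llama_index.core import Settings

# Global settings: the default chunk size for indexes built from here on
Settings.chunk_size = 512

# Local settings: passed directly to this index, overriding the global default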
from llama_index.core.node_parser import SentenceSplitter

index = VectorStoreIndex.from_documents(
    documents, transformations=[SentenceSplitter(chunk_size=512)]
)

In [11]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("I'm looking for the information of Harry Potter. What could you suggest to me?")
print(response)

You may want to explore the play "Harry Potter and the Cursed Child," which is set 19 years after the last book in the Harry Potter series by JK Rowling. The play has received positive reviews and is presented in two parts, showcasing the characters as adults with children of their own. It has been described as a magical and thrilling theatrical experience, offering a new and original story within the Harry Potter universe.


In [12]:
query_engine = index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query("I'm looking for the information of Harry Potter. What could you suggest to me?")
print(response)

You might be interested in learning about the play "Harry Potter and the Cursed Child", which is set 19 years after the seventh book in the series by JK Rowling. The play has received positive reviews from critics, with praise for its magical effects, moments of comedy, and the relationship between characters. It is presented in two parts and has been described as a "truly game-changing production" with a "Dickensian sweep and momentum to the storytelling". The play is set in the wizarding world, featuring adult versions of the characters as their own children head off to school.
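With `response_mode="tree_summarize"`, the retrieved chunks are summarized in groups and the partial summaries are summarized again until one answer remains. The sketch below shows only that recursive shape. `summarize` is a hypothetical stand-in for an LLM call (the toy `bracket` just wraps its inputs so the tree is visible), and `fan_in` is an illustrative parameter; llama-index, as we understand it, packs as many chunks per call as fit the context window rather than using a fixed group size.

```python
from typing import Callable, List


def tree_summarize(chunks: List[str], summarize: Callable[[List[str]], str], fan_in: int = 2) -> str:
    """Summarize groups of `fan_in` texts, then summarize those summaries, until one text remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level gets smaller")
    level = list(chunks)
    while True:
        # one summarizer call per group of up to `fan_in` texts
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]


def bracket(texts: List[str]) -> str:
    """Toy summarizer: brackets its inputs so the tree shape stays visible."""
    return "(" + " + ".join(texts) + ")"


print(tree_summarize(["c1", "c2", "c3", "c4"], bracket))  # ((c1 + c2) + (c3 + c4))
```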


In [13]:
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("I'm looking for the information of Harry Potter. What could you suggest to me?")
response.print_response_stream()

I would suggest looking into the play "Harry Potter and the Cursed Child", which is set 19 years after the seventh book in the series by JK Rowling. The play has received positive reviews and is presented in two parts, showcasing the characters as adults with their own children. It has been described as a "truly game-changing production" with thrilling theatrical elements and a unique storytelling approach.
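With `streaming=True`, the answer arrives incrementally and `print_response_stream()` writes each piece as it comes in. The generator below is a hypothetical stand-in for the LLM's token stream, used only to illustrate the consume-as-you-go pattern; it does not call llama-index or OpenAI.

```python
import sys
from typing import Iterator


def fake_token_stream(text: str) -> Iterator[str]:
    """Hypothetical LLM stream: yields one word at a time, keeping the spaces between words."""
    words = text.split(" ")
    for i, word in enumerate(words):
        yield word if i == len(words) - 1 else word + " "


def print_stream(tokens: Iterator[str]) -> str:
    """Write each token as soon as it arrives and return the full text."""
    pieces = []
    for token in tokens:
        sys.stdout.write(token)
        sys.stdout.flush()  # show partial output immediately instead of buffering it
        pieces.append(token)
    sys.stdout.write("\n")
    return "".join(pieces)


full = print_stream(fake_token_stream("I would suggest looking into the play."))
```

The total generation time is unchanged; streaming only lets the reader start reading before the last token is produced.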

In [14]:
chat_engine = index.as_chat_engine()
response = chat_engine.chat("I'm looking for the information of Harry Potter. What could you suggest to me?")
print(response)

Harry Potter is a popular fictional character created by author JK Rowling. The Harry Potter series consists of seven books that follow the life and adventures of a young wizard named Harry Potter as he attends Hogwarts School of Witchcraft and Wizardry. The series has been highly successful, with millions of copies sold worldwide and adapted into eight films. Additionally, a play called "Harry Potter and the Cursed Child" written by Jack Thorne is set 19 years after the events of the final book in the original series. The play has received positive reviews and is considered a significant theatrical production.


# Knowledge Graph RAG Query Engine

https://docs.llamaindex.ai/en/latest/examples/query_engine/knowledge_graph_query_engine.html#optional-build-the-knowledge-graph-with-llamaindex

https://docs.llamaindex.ai/en/stable/examples/query_engine/knowledge_graph_rag_query_engine.html

In [3]:
# import logging
# import sys

# logging.basicConfig(
#     stream=sys.stdout, level=logging.INFO
# )  # logging.DEBUG for more verbose output


# # define LLM
# from llama_index.llms.openai import OpenAI
# from llama_index.embeddings.openai import OpenAIEmbedding
# from llama_index.core import Settings


# Settings.llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
# Settings.chunk_size = 512
# embed_model = OpenAIEmbedding(
#     model="text-embedding-3-small",
# )
# Settings.embed_model = embed_model

### Prepare for NebulaGraph

In [3]:
import os

os.environ["GRAPHD_HOST"] = "127.0.0.1"
os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula" 
os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"  

%reload_ext ngql
connection_string = f"--address {os.environ['GRAPHD_HOST']} --port 9669 --user root --password {os.environ['NEBULA_PASSWORD']}"
%ngql {connection_string}

Connection Pool Created


|   | Name |
|---|------|
| 0 | default |
| 1 | llamaindex |
| 2 | phillies_rag |


In [4]:
space_name = "llamaindex"
edge_types, rel_prop_names = ["relationship"], ["relationship"]  # defaults; can be omitted when creating from an empty KG
tags = ["entity"]  # default; can be omitted when creating from an empty KG

In [5]:
from llama_index.core import StorageContext
from llama_index.graph_stores.nebula import NebulaGraphStore

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

### Step 1, load data from the local `document` directory

In [6]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("document").load_data()

### Step 2, Generate a KnowledgeGraphIndex with NebulaGraph as graph_store

In [7]:
from llama_index.core import KnowledgeGraphIndex

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    include_embeddings=True,
)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:openai._base_client:Retrying request to /chat/completions in 0.879648 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST h
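`max_triplets_per_chunk=10` caps how many (subject, relation, object) triplets the LLM extracts from each chunk; in this setup each triplet is stored as two `entity` vertices joined by a `relationship` edge. The sketch below parses LLM output of the form `(subject, relation, object)`, one per line (the shape visible in the extraction output of the Phillies section), and applies the cap. The line format, the no-commas-inside-parts rule, and the `parse_triplets` name are assumptions for illustration, not llama-index's parser.

```python
import re
from typing import List, Tuple

# assumed format: one "(subject, relation, object)" per line, no commas inside the parts
TRIPLET_RE = re.compile(r"^\(([^,]+),([^,]+),([^)]+)\)$")


def parse_triplets(llm_output: str, max_triplets: int) -> List[Tuple[str, str, str]]:
    """Return up to `max_triplets` (subject, relation, object) tuples, skipping malformed lines."""
    if max_triplets <= 0:
        return []
    triplets: List[Tuple[str, str, str]] = []
    for line in llm_output.splitlines():
        match = TRIPLET_RE.match(line.strip())
        if match is None:
            continue  # not a triplet, e.g. extra chatter from the LLM
        subj, rel, obj = (part.strip() for part in match.groups())
        if subj and rel and obj:
            triplets.append((subj, rel, obj))
        if len(triplets) == max_triplets:
            break  # per-chunk cap, analogous to max_triplets_per_chunk
    return triplets


sample = """(Philadelphia Phillies, are, American professional baseball team)
(Philadelphia Phillies, won, two World Series championships)
not a triplet
(Bryce Harper, won, NL Most Valuable Player Award)"""
print(parse_triplets(sample, max_triplets=2))
```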

In [6]:
%ngql USE llamaindex;

UsageError: Line magic function `%ngql` not found.


In [21]:
%ngql MATCH ()-[e]->() RETURN e LIMIT 10

INFO:nebula3.logger:Get connection to ('127.0.0.1', 9669)


|   | e |
|---|---|
| 0 | ("'misuse of private information'")-[:relation... |
| 1 | ("1 kg")-[:relationship@3086906548626974739{re... |
| 2 | ("10 campuses")-[:relationship@-41627020308679... |
| 3 | ("10%")-[:relationship@-3139965656072287462{re... |
| 4 | ("10-15 positions")-[:relationship@-6544047668... |
| 5 | ("100 women")-[:relationship@-5343174954560565... |
| 6 | ("100 women season")-[:relationship@2203100665... |
| 7 | ("100 young people")-[:relationship@1285022070... |
| 8 | ("100bn-euro bailout")-[:relationship@-8557117... |
| 9 | ("103 factories")-[:relationship@-449693159216... |


### Perform Graph RAG Query

In [10]:
from llama_index.core import StorageContext
from llama_index.graph_stores.nebula import NebulaGraphStore

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

In [11]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    verbose=True,
)

query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)
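KnowledgeGraphRAGRetriever, roughly, finds the entities mentioned in the question, pulls the surrounding subgraph from the graph store, and passes those triplets to the LLM as context. The sketch below mimics that flow on an in-memory list of triplets: entities are matched by case-insensitive substring and expanded along outgoing edges for `depth` hops. Substring matching, outgoing-only traversal, the example triplets, and the `graph_rag_context` name are simplifying assumptions; as we understand it, the llama-index retriever uses the LLM to extract (and optionally expand) entity names instead.

```python
from collections import deque
from typing import Deque, List, Set, Tuple

Triplet = Tuple[str, str, str]


def graph_rag_context(question: str, triplets: List[Triplet], depth: int = 2) -> List[str]:
    """Return facts reachable within `depth` outgoing hops of entities named in the question."""
    q = question.lower()
    # naive entity extraction: any subject whose name appears in the question
    seeds = sorted({s for s, _, _ in triplets if s.lower() in q})
    seen: Set[str] = set(seeds)
    frontier: Deque[Tuple[str, int]] = deque((entity, 0) for entity in seeds)
    facts: List[str] = []
    while frontier:
        entity, hops = frontier.popleft()
        if hops == depth:
            continue  # do not expand beyond the hop limit
        for s, r, o in triplets:
            if s == entity:
                facts.append(f"{s} -[{r}]-> {o}")
                if o not in seen:
                    seen.add(o)
                    frontier.append((o, hops + 1))
    return facts


# illustrative triplets, not taken from the graph built above
triplets = [
    ("Bryce Harper", "won", "NL Most Valuable Player Award"),
    ("Bryce Harper", "plays for", "Philadelphia Phillies"),
    ("Philadelphia Phillies", "play in", "Philadelphia"),
    ("Philadelphia", "is in", "Pennsylvania"),
    ("Trey Turner", "plays for", "Philadelphia Phillies"),
]
for fact in graph_rag_context("Tell me about Bryce Harper.", triplets):
    print(fact)
```

The hop limit is the main knob: one more hop can pull in far more loosely related facts than the question needs.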

In [4]:
# from IPython.display import display, Markdown

# response = query_engine.query(
#     "Tell me about Peter Quill?",
# )

In [5]:
# response = await query_engine.aquery(
#     "Tell me about Peter Quill?",
# )
# display(Markdown(f"<b>{response}</b>"))

In [6]:
# from typing import List

# from llama_index.core import QueryBundle
# from llama_index.core.retrievers import (
#     BaseRetriever,
#     KGTableRetriever,
#     VectorIndexRetriever,
# )
# from llama_index.core.schema import NodeWithScore


# class CustomRetriever(BaseRetriever):
#     """Custom retriever that performs both vector search and knowledge graph search."""

#     def __init__(
#         self,
#         vector_retriever: VectorIndexRetriever,
#         kg_retriever: KGTableRetriever,
#         mode: str = "OR",
#     ) -> None:
#         """Init params."""
#         self._vector_retriever = vector_retriever
#         self._kg_retriever = kg_retriever
#         if mode not in ("AND", "OR"):
#             raise ValueError("Invalid mode.")
#         self._mode = mode
#         super().__init__()

#     def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
#         """Retrieve nodes given query."""
#         vector_nodes = self._vector_retriever.retrieve(query_bundle)
#         kg_nodes = self._kg_retriever.retrieve(query_bundle)

#         # key by node_id so a chunk returned by both retrievers is kept once
#         vector_ids = {n.node.node_id for n in vector_nodes}
#         kg_ids = {n.node.node_id for n in kg_nodes}

#         combined_dict = {n.node.node_id: n for n in vector_nodes}
#         combined_dict.update({n.node.node_id: n for n in kg_nodes})

#         if self._mode == "AND":
#             retrieve_ids = vector_ids.intersection(kg_ids)
#         else:
#             retrieve_ids = vector_ids.union(kg_ids)

#         return [combined_dict[rid] for rid in retrieve_ids]
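The commented-out CustomRetriever merges vector hits and knowledge-graph hits by node ID: `"AND"` keeps only nodes that both retrievers returned, `"OR"` keeps nodes that either returned. The same set logic without llama-index types is sketched below; `(node_id, score)` tuples stand in for NodeWithScore, and the final sort is an added assumption for deterministic output (the scores of two different retrievers are not necessarily comparable, so the order is not a relevance ranking).

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (node_id, score): a stand-in for NodeWithScore


def merge_hits(vector_hits: List[Hit], kg_hits: List[Hit], mode: str = "OR") -> List[Hit]:
    """Combine two retrievers' hits by node ID: AND = intersection, OR = union."""
    if mode not in ("AND", "OR"):
        raise ValueError("Invalid mode.")
    vector_ids = {node_id for node_id, _ in vector_hits}
    kg_ids = {node_id for node_id, _ in kg_hits}
    combined: Dict[str, Hit] = {hit[0]: hit for hit in vector_hits}
    combined.update({hit[0]: hit for hit in kg_hits})  # KG hit wins on duplicates, as in CustomRetriever
    keep = vector_ids & kg_ids if mode == "AND" else vector_ids | kg_ids
    # sets are unordered, so sort by score (then ID) for a deterministic result
    return sorted((combined[node_id] for node_id in keep), key=lambda hit: (-hit[1], hit[0]))


vector_hits = [("n1", 0.82), ("n2", 0.75)]
kg_hits = [("n2", 0.9), ("n3", 0.6)]
print(merge_hits(vector_hits, kg_hits, mode="AND"))  # [('n2', 0.9)]
print(merge_hits(vector_hits, kg_hits, mode="OR"))   # [('n2', 0.9), ('n1', 0.82), ('n3', 0.6)]
```

`"AND"` favours precision (only chunks both signals agree on), while `"OR"` favours recall.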

In [15]:
# graph_rag_retriever_with_nl2graphquery = KnowledgeGraphRAGRetriever(
#     storage_context=storage_context,
#     verbose=True,
#     with_nl2graphquery=True,
# )

# query_engine_with_nl2graphquery = RetrieverQueryEngine.from_args(
#     graph_rag_retriever_with_nl2graphquery,
# )

In [7]:
# response = query_engine_with_nl2graphquery.query(
#     "What do you know about Peter Quill?",
# )
# display(Markdown(f"<b>{response}</b>"))

### Perform Graph RAG Query

In [8]:
# response = await query_engine.aquery(
#     "Tell me about Harry Potter?",
# )
# display(Markdown(f"<b>{response}</b>"))

In [9]:
# graph_rag_retriever_with_nl2graphquery = KnowledgeGraphRAGRetriever(
#     storage_context=storage_context,
#     verbose=True,
#     with_nl2graphquery=True,
# )

# query_engine_with_nl2graphquery = RetrieverQueryEngine.from_args(
#     graph_rag_retriever_with_nl2graphquery,
# )

In [10]:
# response = query_engine_with_nl2graphquery.query(
#     "What do you know about Harry Potter?",
# )
# display(Markdown(f"<b>{response}</b>"))

# WARNING:llama_index.core.indices.knowledge_graph.retrievers:Error in retrieving from nl2graphquery: 'NoneType' object has no attribute 'kwargs'

# Asking the Knowledge Graph

In [9]:
from llama_index.core.query_engine import KnowledgeGraphQueryEngine
from llama_index.llms.openai import OpenAI

from llama_index.core import StorageContext
from llama_index.graph_stores.nebula import NebulaGraphStore


llm = OpenAI(model="gpt-4")
query_engine = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    llm=llm,
    verbose=True,
)

In [2]:
# response = query_engine.query(
#     "I'm looking for the information of Harry Potter. What could you suggest to me?",
# )

### Perform Graph RAG Query

In [6]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    verbose=True,
)

query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

In [3]:
# from IPython.display import display, Markdown

# response = query_engine.query(
#     "Tell me about Peter Quill?",
# )
# display(Markdown(f"<b>{response}</b>"))

# Connect to Nebula Graph and set up a new space

In [1]:
from dotenv import load_dotenv, find_dotenv
import openai
import os

_ = load_dotenv(find_dotenv())
openai.api_key = os.getenv('OPENAI_API_KEY')

In [2]:
os.environ["GRAPHD_HOST"] = "127.0.0.1"
os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula" 
os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"  

In [3]:
%reload_ext ngql
connection_string = f"--address {os.environ['GRAPHD_HOST']} --port 9669 --user root --password {os.environ['NEBULA_PASSWORD']}"
%ngql {connection_string}

Connection Pool Created


|   | Name |
|---|------|
| 0 | default |
| 1 | llamaindex |
| 2 | phillies_rag |


In [4]:
%ngql CREATE SPACE IF NOT EXISTS phillies_rag(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);

In [5]:
%ngql SHOW SPACES;

|   | Name |
|---|------|
| 0 | default |
| 1 | llamaindex |
| 2 | phillies_rag |


In [6]:
%ngql USE phillies_rag;

In [7]:
%ngql SHOW TAGS;

|   | Name |
|---|------|
| 0 | entity |


In [8]:
%ngql SHOW EDGES;

|   | Name |
|---|------|
| 0 | relationship |


In [9]:
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore

space_name = "phillies_rag"
edge_types, rel_prop_names = ["relationship"], ["relationship"]
tags = ["entity"]

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

## Load data and create KG index

In [10]:
from llama_index import (
    LLMPredictor,
    ServiceContext,
    KnowledgeGraphIndex,
)
from llama_index.graph_stores import SimpleGraphStore
from llama_index import download_loader
from llama_index.llms import OpenAI

# define LLM
llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=512)

In [11]:
from llama_index import load_index_from_storage
from llama_hub.youtube_transcript import YoutubeTranscriptReader

try:

    storage_context = StorageContext.from_defaults(persist_dir='./storage_graph', graph_store=graph_store)
    kg_index = load_index_from_storage(
        storage_context=storage_context,
        service_context=service_context,
        max_triplets_per_chunk=15,
        space_name=space_name,
        edge_types=edge_types,
        rel_prop_names=rel_prop_names,
        tags=tags,
        verbose=True,
    )
    index_loaded = True
except Exception:  # e.g. nothing has been persisted to ./storage_graph yet
    index_loaded = False

if not index_loaded:
    
    WikipediaReader = download_loader("WikipediaReader")
    loader = WikipediaReader()
    wiki_documents = loader.load_data(pages=['Philadelphia Phillies'], auto_suggest=False)
    print(f'Loaded {len(wiki_documents)} documents')

    youtube_loader = YoutubeTranscriptReader()
    youtube_documents = youtube_loader.load_data(ytlinks=['https://www.youtube.com/watch?v=k-HTQ8T7oVw'])    
    print(f'Loaded {len(youtube_documents)} YouTube documents')

    kg_index = KnowledgeGraphIndex.from_documents(
        documents=wiki_documents + youtube_documents,
        storage_context=storage_context,
        max_triplets_per_chunk=15,
        service_context=service_context,
        space_name=space_name,
        edge_types=edge_types,
        rel_prop_names=rel_prop_names,
        tags=tags,
        include_embeddings=True,
    )
    
    kg_index.storage_context.persist(persist_dir='./storage_graph')

Collecting wikipedia~=1.4 (from -r /Users/linghuang/miniconda3/envs/llamaindex-nebulagrpah2/lib/python3.10/site-packages/llama_index/readers/llamahub_modules/wikipedia/requirements.txt (line 1))
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py): started
  Building wheel for wikipedia (setup.py): finished with status 'done'
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11678 sha256=45fdf0c179913ece2a74ffdd057864f58546156b2332048802f87a2a344f48f5
  Stored in directory: /Users/linghuang/Library/Caches/pip/wheels/5e/b6/c5/93f3dec388ae76edc830cb42901bb0232504dfc0df02fc50de
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0
Loaded 1 documents
Loaded 1 YouTube documents
(Philadelphia Phillies, are, American professiona
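The try/except cell implements a load-or-build cache: reuse `./storage_graph` when a previous run persisted it, otherwise run the expensive LLM triplet extraction once and persist the result. The same pattern in plain Python, with a JSON file standing in for the persisted index and a hypothetical `build_index` callback standing in for `KnowledgeGraphIndex.from_documents`:

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def load_or_build(path: str, build: Callable[[], Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Return the cached data at `path` if it exists; otherwise build it once and save it."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    data = build()  # the expensive step, e.g. LLM triplet extraction
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data


calls = []


def build_index() -> Dict[str, List[str]]:
    """Hypothetical builder; records each call so we can see it runs only once."""
    calls.append(1)
    return {"Bryce Harper": ["MVP of NLCS"]}


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "kg.json")
    first = load_or_build(cache, build_index)
    second = load_or_build(cache, build_index)  # served from the cache file
    print(first == second, len(calls))
```

Checking for the cache explicitly, rather than catching every exception, avoids silently paying for a full rebuild when loading fails for an unrelated reason such as a version mismatch.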

## Query the KG index (Text2Cypher engine left commented out)

In [13]:
from llama_index.query_engine import KnowledgeGraphQueryEngine
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore
from IPython.display import Markdown, display

# query_engine = KnowledgeGraphQueryEngine(
#     storage_context=storage_context,
#     service_context=service_context,
#     llm=llm,
#     verbose=True,
# )

hybrid_query_engine = kg_index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=3,
    explore_global_knowledge=True,
)

In [14]:
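# NOTE: this reassignment discards the hybrid configuration above and uses the default KG query engine;
# drop this line to query with the hybrid settings instead
hybrid_query_engine = kg_index.as_query_engine()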
response = hybrid_query_engine.query("Tell me about Bryce Harper.")
display(Markdown(f"<b>{response}</b>"))

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/linghuang/Library/Caches/llama_index...
[nltk_data]   Unzipping corpora/stopwords.zip.


<b>Bryce Harper was named MVP of the NLCS and also won the NL Most Valuable Player Award in 2021.</b>

In [15]:
query_engine = kg_index.as_query_engine()
response = query_engine.query("Tell me about Bryce Harper.")
display(Markdown(f"<b>{response}</b>"))

<b>Bryce Harper was named MVP of the NLCS and also won the NL Most Valuable Player Award in 2021.</b>

In [17]:
%%ngql
MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
  WHERE p.`entity`.`name` == 'Bryce Harper'
RETURN p, e, m;

|   | p | e | m |
|---|---|---|---|
| 0 | ("Bryce Harper" :entity{name: "Bryce Harper"}) | ("Bryce Harper")-[:relationship@-8274288782670... | ("MVP in 2021" :entity{name: "MVP in 2021"}) |
| 1 | ("Bryce Harper" :entity{name: "Bryce Harper"}) | ("Bryce Harper")-[:relationship@-8274288782670... | ("NL Most Valuable Player Award" :entity{name:... |
| 2 | ("Bryce Harper" :entity{name: "Bryce Harper"}) | ("Bryce Harper")-[:relationship@87212673184855... | ("MVP" :entity{name: "MVP"}) |
| 3 | ("Bryce Harper" :entity{name: "Bryce Harper"}) | ("Bryce Harper")-[:relationship@87212673184855... | ("MVP of NLCS" :entity{name: "MVP of NLCS"}) |
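The nGQL query above returns every outgoing `relationship` edge whose source `entity` is named exactly 'Bryce Harper'. On an in-memory list of triplets the equivalent is a simple filter; the triplets and the `outgoing` helper below are illustrative, and the relation labels are hypothetical because the table truncates them.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]


def outgoing(triplets: List[Triplet], name: str) -> List[Triplet]:
    """Mimic MATCH (p)-[e]->(m) WHERE p.name == name RETURN p, e, m on plain tuples."""
    return [t for t in triplets if t[0] == name]


# illustrative triplets; relation labels are hypothetical
triplets = [
    ("Bryce Harper", "named", "MVP of NLCS"),
    ("Bryce Harper", "won", "NL Most Valuable Player Award"),
    ("Trey Turner", "received", "standing ovation"),
]
for subj, rel, obj in outgoing(triplets, "Bryce Harper"):
    print(f"({subj})-[{rel}]->({obj})")
```

Because the match is exact string equality, near-duplicate entities such as "MVP", "MVP in 2021", and "MVP of NLCS" in the table stay separate vertices unless names are normalized at extraction time.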


In [19]:
response = query_engine.query("How did the standing ovation Trey Turner received change his season?")
display(Markdown(f"<b>{response}</b>"))

<b>The standing ovation Trey Turner received appeared to positively impact his season. It seemed to boost his morale and confidence, resulting in improved performance on the field. This support from the fans translated into better gameplay, with Trey Turner showcasing one of his best performances in weeks. The encouragement and appreciation from the city of Philadelphia seemed to have a significant effect on Trey Turner's spirits, leading to enhancements in his batting average, home runs, and overall game performance.</b>

In [20]:
%%ngql 
MATCH (p:`entity`)-[r:`relationship`]->(q:`entity`)
WHERE p.`entity`.`name` == 'Trey Turner' 
RETURN p, r, q;

|   | p | r | q |
|---|---|---|---|
| 0 | ("Trey Turner" :entity{name: "Trey Turner"}) | ("Trey Turner")-[:relationship@-88708975167964... | ("team" :entity{name: "team"}) |
| 1 | ("Trey Turner" :entity{name: "Trey Turner"}) | ("Trey Turner")-[:relationship@-86686905492772... | ("everyone confused" :entity{name: "everyone c... |
| 2 | ("Trey Turner" :entity{name: "Trey Turner"}) | ("Trey Turner")-[:relationship@-86506552461924... | ("answers" :entity{name: "answers"}) |
| 3 | ("Trey Turner" :entity{name: "Trey Turner"}) | ("Trey Turner")-[:relationship@-86506552461924... | ("pitches" :entity{name: "pitches"}) |
| 4 | ("Trey Turner" :entity{name: "Trey Turner"}) | ("Trey Turner")-[:relationship@-86146775363823... | ("RBI total" :entity{name: "RBI total"}) |
| ... | ... | ... | ... |
| 66 | ("Trey Turner" :entity{name: "Trey Turner"}) | ("Trey Turner")-[:relationship@774143091564343... | ("game of baseball" :entity{name: "game of bas... |
| 67 | ("Trey Turner" :entity{name: "Trey Turner"}) | ("Trey Turner")-[:relationship@782759718195839... | ("what happened" :entity{name: "what happened"}) |
| 68 | ("Trey Turner" :entity{name: "Trey Turner"}) | ("Trey Turner")-[:relationship@784316126070734... | ("WBC home run total" :entity{name: "WBC home ... |
| 69 | ("Trey Turner" :entity{name: "Trey Turner"}) | ("Trey Turner")-[:relationship@874723966017160... | ("300 million dollars" :entity{name: "300 mill... |


In [22]:
response = query_engine.query("Tell me about some of the facts of Philadelphia Phillies.")
display(Markdown(f"<b>{response}</b>"))

<b>The Philadelphia Phillies have won two World Series championships, eight National League pennants, and have made 15 playoff appearances. They have played 21,486 games with a regular season record of 10,112–11,259–115. The Phillies are the oldest, continuous, one-name, one-city franchise in all of professional sports. The team has had standout players like Billy Hamilton, Sam Thompson, and Ed Delahanty in their history. The Phillies won their first pennant in 1915 and have had notable players like Grover Cleveland Alexander and Gavvy Cravath. The team has a long history dating back to 1883 and has played at various stadiums in Philadelphia.</b>

In [23]:
%%ngql 
MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
  WHERE p.`entity`.`name` == 'Philadelphia Phillies' 
RETURN p, e, m;

Unnamed: 0,p,e,m
0,"(""Philadelphia Phillies"" :entity{name: ""Philad...","(""Philadelphia Phillies"")-[:relationship@-4972...","(""world record for most ever losses"" :entity{n..."
1,"(""Philadelphia Phillies"" :entity{name: ""Philad...","(""Philadelphia Phillies"")-[:relationship@-3256...","(""several stadiums in the city"" :entity{name: ..."
2,"(""Philadelphia Phillies"" :entity{name: ""Philad...","(""Philadelphia Phillies"")-[:relationship@-2366...","(""American professional baseball team"" :entity..."
3,"(""Philadelphia Phillies"" :entity{name: ""Philad...","(""Philadelphia Phillies"")-[:relationship@-2366...","(""historically associated with futility"" :enti..."
4,"(""Philadelphia Phillies"" :entity{name: ""Philad...","(""Philadelphia Phillies"")-[:relationship@-2366...","(""oldest one-name one-city franchise"" :entity{..."
5,"(""Philadelphia Phillies"" :entity{name: ""Philad...","(""Philadelphia Phillies"")-[:relationship@-1237...","(""eight National League pennants"" :entity{name..."
6,"(""Philadelphia Phillies"" :entity{name: ""Philad...","(""Philadelphia Phillies"")-[:relationship@-1237...","(""two World Series championships"" :entity{name..."
7,"(""Philadelphia Phillies"" :entity{name: ""Philad...","(""Philadelphia Phillies"")-[:relationship@31737...","(""Major League Baseball"" :entity{name: ""Major ..."
8,"(""Philadelphia Phillies"" :entity{name: ""Philad...","(""Philadelphia Phillies"")-[:relationship@35386...","(""Philadelphia"" :entity{name: ""Philadelphia""})"
9,"(""Philadelphia Phillies"" :entity{name: ""Philad...","(""Philadelphia Phillies"")-[:relationship@79481...","(""official website"" :entity{name: ""official we..."
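
The `MATCH` pattern above is a one-hop lookup. It returns every outgoing `relationship` edge whose source entity is named 'Philadelphia Phillies'. Below is a minimal in-memory analogue over (subject, predicate, object) triplets. The `Triplet` type, the `one_hop` helper, and the predicates in the toy data are hypothetical, because the table above truncates the edge details.

```python
from typing import List, NamedTuple


class Triplet(NamedTuple):
    """One (subject)-[predicate]->(object) edge, like a p, e, m row above."""

    subject: str
    predicate: str
    obj: str


def one_hop(triplets: List[Triplet], name: str) -> List[Triplet]:
    """Return every outgoing edge whose source entity is named `name`.

    Rough in-memory analogue of:
        MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
          WHERE p.`entity`.`name` == <name>
        RETURN p, e, m;
    """
    return [t for t in triplets if t.subject == name]


# Hypothetical toy graph; predicates are invented because the table truncates them.
graph = [
    Triplet("Philadelphia Phillies", "is", "American professional baseball team"),
    Triplet("Philadelphia Phillies", "won", "two World Series championships"),
    Triplet("Trey Turner", "received", "standing ovation"),
    Triplet("Citizens Bank Park", "is home of", "Philadelphia Phillies"),
]

for t in one_hop(graph, "Philadelphia Phillies"):
    print(f"({t.subject})-[{t.predicate}]->({t.obj})")
```

Only outgoing edges are followed. The last toy triplet has the Phillies as its object, so it is not returned for 'Philadelphia Phillies'.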


## Query with vector index

In [25]:
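from IPython.display import Markdown, display
from llama_index import VectorStoreIndex, download_loader

# load the Phillies Wikipedia page through the LlamaHub Wikipedia loader
WikipediaReader = download_loader("WikipediaReader")
loader = WikipediaReader()
wiki_documents = loader.load_data(pages=['Philadelphia Phillies'], auto_suggest=False)
print(f'Loaded {len(wiki_documents)} documents')

# load the transcript of the YouTube video through the LlamaHub YouTube transcript loader
YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")
youtube_loader = YoutubeTranscriptReader()
youtube_documents = youtube_loader.load_data(ytlinks=['https://www.youtube.com/watch?v=k-HTQ8T7oVw'])
print(f'Loaded {len(youtube_documents)} YouTube documents')

# embed the same documents into a plain vector index for comparison with the KG index
vector_index = VectorStoreIndex.from_documents(wiki_documents + youtube_documents)
vector_query_engine = vector_index.as_query_engine()

response = vector_query_engine.query("Tell me about Bryce Harper.")
display(Markdown(f"<b>{response}</b>"))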

Loaded 1 documents
Loaded 1 YouTube documents


<b>Bryce Harper was signed by the Phillies to a 13-year, $330 million deal during the off-season.</b>
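
The vector query engine does not use the graph. Roughly, it embeds the question, ranks the stored chunks by similarity to that embedding, and passes only the top-ranked chunks to the LLM. The sketch below illustrates the ranking step with cosine similarity over hand-made 3-dimensional vectors. The chunk names and vectors are invented; a real index obtains its embeddings from an embedding model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], chunks: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Rank chunk ids by cosine similarity to the query and keep the best k."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings" for three chunks.
chunks = {
    "harper_contract": [0.9, 0.1, 0.0],
    "turner_ovation": [0.1, 0.9, 0.1],
    "stadium_history": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "Tell me about Bryce Harper."
print(top_k(query, chunks, k=1))
```

When only a few chunks survive the cut, the answer can be narrow. This may explain why the vector engine's Bryce Harper answer mentions only his contract.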

In [26]:
response = vector_query_engine.query("How did the standing ovation Trey Turner received change his season?")
display(Markdown(f"<b>{response}</b>"))

<b>The standing ovation Trey Turner received seemed to have a positive impact on his season. Following the ovation, Trey Turner's performance notably improved. He experienced a surge in his batting average, hitting five home runs and achieving a high OPS in the subsequent 18 games. Turner himself mentioned that the hit he got after the ovation felt like one of his best swings all year. The support and love shown to him by the city of Philadelphia appeared to have lifted his spirits, leading to a significant turnaround in his performance on the field.</b>

In [27]:
response = vector_query_engine.query("Tell me about some of the facts of Philadelphia Phillies.")
display(Markdown(f"<b>{response}</b>"))

<b>The Philadelphia Phillies have won two World Series championships, eight National League pennants, and have made 15 playoff appearances. They have played a total of 21,486 games with a regular season record of 10,112–11,259–115. The team has a history dating back to 1883 and is one of the oldest professional sports franchises in the United States. The Phillies have had notable players like Grover Cleveland Alexander, Gavvy Cravath, and Mike Schmidt, with 33 Phillies players being inducted into the Baseball Hall of Fame.</b>

## Create CustomRetriever to combine vector index and KG index

In [28]:
from typing import List

from llama_index import QueryBundle
from llama_index.retrievers import BaseRetriever, VectorIndexRetriever, KGTableRetriever
from llama_index.schema import NodeWithScore

class CustomRetriever(BaseRetriever):
    """Custom retriever that performs both vector search and knowledge graph search.

    mode="OR" returns the union of the two result sets;
    mode="AND" returns only nodes that both retrievers found.
    """

    def __init__(
        self,
        vector_retriever: VectorIndexRetriever,
        kg_retriever: KGTableRetriever,
        mode: str = "OR",
    ) -> None:
        """Init params."""

        self._vector_retriever = vector_retriever
        self._kg_retriever = kg_retriever
        if mode not in ("AND", "OR"):
            raise ValueError("Invalid mode.")
        self._mode = mode
        # let BaseRetriever set up its own state (if any) before retrieve() is called
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve nodes given query."""

        # run both retrievers on the same query
        vector_nodes = self._vector_retriever.retrieve(query_bundle)
        kg_nodes = self._kg_retriever.retrieve(query_bundle)

        vector_ids = {n.node.node_id for n in vector_nodes}
        kg_ids = {n.node.node_id for n in kg_nodes}

        # map node_id -> NodeWithScore; a KG node overwrites a vector node with the same id
        combined_dict = {n.node.node_id: n for n in vector_nodes}
        combined_dict.update({n.node.node_id: n for n in kg_nodes})

        if self._mode == "AND":
            retrieve_ids = vector_ids.intersection(kg_ids)
        else:
            retrieve_ids = vector_ids.union(kg_ids)

        retrieve_nodes = [combined_dict[rid] for rid in retrieve_ids]
        return retrieve_nodes
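
The `AND`/`OR` merge in `_retrieve` does not depend on llama_index, so it can be exercised in plain Python. In the sketch below, `ScoredNode` and `merge_results` are hypothetical stand-ins for `NodeWithScore` and the body of `_retrieve`. The one deliberate difference is that the ids are sorted, which makes the output order deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredNode:
    """Hypothetical stand-in for NodeWithScore: just an id and a score."""

    node_id: str
    score: float


def merge_results(
    vector_nodes: List[ScoredNode], kg_nodes: List[ScoredNode], mode: str = "OR"
) -> List[ScoredNode]:
    """Combine two result lists by node id, following CustomRetriever._retrieve."""
    if mode not in ("AND", "OR"):
        raise ValueError("Invalid mode.")
    vector_ids = {n.node_id for n in vector_nodes}
    kg_ids = {n.node_id for n in kg_nodes}
    # KG entries are written last, so they win when both lists share an id.
    combined: Dict[str, ScoredNode] = {n.node_id: n for n in vector_nodes}
    combined.update({n.node_id: n for n in kg_nodes})
    keep = vector_ids & kg_ids if mode == "AND" else vector_ids | kg_ids
    # Sorted for a deterministic order; the original iterates over a set.
    return [combined[node_id] for node_id in sorted(keep)]


vec = [ScoredNode("a", 0.9), ScoredNode("b", 0.7)]
kg = [ScoredNode("b", 1.0), ScoredNode("c", 1.0)]
print([n.node_id for n in merge_results(vec, kg, "OR")])   # ['a', 'b', 'c']
print([n.node_id for n in merge_results(vec, kg, "AND")])  # ['b']
```

Ids are compared literally, so `"AND"` keeps a node only if both retrievers return it under the same id. In the next cell the KG retriever uses `include_text=False`. As far as we can tell, that setting makes it return a newly created node holding the triplet text rather than the indexed chunks, so the intersection would likely be empty. The default `"OR"` avoids that problem.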

In [29]:
from llama_index import get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import VectorIndexRetriever, KGTableRetriever

# create custom retriever
vector_retriever = VectorIndexRetriever(index=vector_index)
kg_retriever = KGTableRetriever(
    index=kg_index, retriever_mode="keyword", include_text=False
)
custom_retriever = CustomRetriever(vector_retriever, kg_retriever)

# create response synthesizer
response_synthesizer = get_response_synthesizer(
    service_context=service_context,
    response_mode="tree_summarize",
)
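
`response_mode="tree_summarize"` has the synthesizer build the answer bottom-up. Retrieved chunks are summarized in groups, those summaries are summarized again, and so on until a single response remains. The sketch below shows that recursion with a stub `summarize` function in place of the LLM. `fan_in` is a hypothetical simplification of "how many chunks fit in one prompt"; llama_index packs chunks by token budget rather than by count.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str], summarize: Callable[[List[str]], str], fan_in: int = 2
) -> str:
    """Summarize groups of chunks, then groups of summaries, until one prompt's worth remains.

    `summarize` stands in for an LLM call over a list of texts; `fan_in` is a
    hypothetical "chunks per prompt" limit.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    level = list(chunks)
    # While the texts do not fit into a single prompt, summarize them in groups.
    while len(level) > fan_in:
        level = [summarize(level[i : i + fan_in]) for i in range(0, len(level), fan_in)]
    # One final call over what remains produces the answer (the root of the tree).
    return summarize(level)


def show(parts: List[str]) -> str:
    """Fake 'LLM' that just records which texts were combined."""
    return "(" + " + ".join(parts) + ")"


print(tree_summarize(["c1", "c2", "c3", "c4"], show))  # ((c1 + c2) + (c3 + c4))
```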

## Create 7 query engines and run queries

In [30]:
# KG vector-based entity retrieval
kg_query_engine = kg_index.as_query_engine()

# KG keyword-based entity retrieval
kg_keyword_query_engine = kg_index.as_query_engine(
    # setting to false uses the raw triplets instead of adding the text from the corresponding nodes
    include_text=False,
    retriever_mode="keyword",
    response_mode="tree_summarize",
)

# KG hybrid entity retrieval
kg_hybrid_query_engine = kg_index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=3,
    explore_global_knowledge=True,
)

# Raw vector index retrieval
vector_query_engine = vector_index.as_query_engine()

# Custom combo query engine
custom_query_engine = RetrieverQueryEngine(
    retriever=custom_retriever,
    response_synthesizer=response_synthesizer,
)

# using KnowledgeGraphQueryEngine
from llama_index.query_engine import KnowledgeGraphQueryEngine

kgqe_query_engine = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    service_context=service_context,
    llm=llm,
    verbose=True,
)

# using KnowledgeGraphRAGRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    service_context=service_context,
    llm=llm,
    verbose=True,
)

kg_rag_query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever, service_context=service_context
)
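
The cells below repeat the same `query`/`display` pair for every engine and question. A small helper can run one question against a dictionary of engines and collect the answers as text. It is sketched here as an assumption and is not part of the original notebook. Errors are captured per engine, so one failing engine does not stop the comparison; note that the `KnowledgeGraphQueryEngine` call for the Trey Turner question is commented out in this notebook. `QueryEngine`, `compare_engines`, and `EchoEngine` are hypothetical names.

```python
from typing import Any, Dict, Protocol


class QueryEngine(Protocol):
    """Anything with a `query(str)` method, such as the seven engines above."""

    def query(self, query_str: str) -> Any: ...


def compare_engines(engines: Dict[str, QueryEngine], question: str) -> Dict[str, str]:
    """Ask each engine the same question and return {label: answer text}."""
    answers: Dict[str, str] = {}
    for label, engine in engines.items():
        try:
            # str() is the same conversion the f-strings in this notebook perform
            answers[label] = str(engine.query(question))
        except Exception as exc:  # record the failure and keep comparing
            answers[label] = f"<error: {exc}>"
    return answers


class EchoEngine:
    """Hypothetical stand-in engine for trying the helper without an LLM."""

    def query(self, query_str: str) -> str:
        return f"echo: {query_str}"


print(compare_engines({"echo": EchoEngine()}, "Tell me about Bryce Harper."))
```

In the notebook, the dictionary could map labels such as "kg_keyword" or "custom" to `kg_keyword_query_engine`, `custom_query_engine`, and so on. Each answer would then be rendered with `display(Markdown(...))`.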

In [31]:
response = kg_query_engine.query("Tell me about Bryce Harper.")
display(Markdown(f"<b>{response}</b>"))

<b>Bryce Harper was named MVP of the NLCS and also won the NL Most Valuable Player Award in 2021.</b>

In [32]:
response = kg_keyword_query_engine.query("Tell me about Bryce Harper.")
display(Markdown(f"<b>{response}</b>"))

<b>Bryce Harper won the NL Most Valuable Player Award and was named MVP in 2021. He also won the NL Most Valuable Player Award previously and was named MVP of NLCS. Additionally, he has been involved in relationships with other individuals like Ryan Howard and Mike Schmidt in the context provided.</b>

In [33]:
response = kg_hybrid_query_engine.query("Tell me about Bryce Harper.")
display(Markdown(f"<b>{response}</b>"))

<b>Bryce Harper won the NL Most Valuable Player Award and was named MVP of the NLCS. He also won the MVP award in 2021.</b>

In [34]:
response = vector_query_engine.query("Tell me about Bryce Harper.")
display(Markdown(f"<b>{response}</b>"))

<b>Bryce Harper was signed by the Phillies to a 13-year, $330 million deal during the off-season.</b>

In [35]:
response = custom_query_engine.query("Tell me about Bryce Harper.")
display(Markdown(f"<b>{response}</b>"))

<b>Bryce Harper has been involved in various significant events and achievements throughout his career. He has won the NL Most Valuable Player Award, was named MVP in 2021, and won the NL Most Valuable Player Award multiple times. Harper has been a key player for the Phillies, contributing to their success in different seasons. Additionally, he has been part of trades and signings that have impacted the team's roster and performance.</b>

In [36]:
response = kgqe_query_engine.query("Tell me about Bryce Harper.")
display(Markdown(f"<b>{response}</b>"))

[33;1m[1;3mGraph Store Query:
MATCH (e1:`entity`)-[:relationship]->(e2:`entity`)
WHERE e1.`entity`.`name` == 'Bryce Harper'
RETURN e2.`entity`.`name`;
[0m[33;1m[1;3mGraph Store Response:
{'e2.entity.name': ['MVP in 2021', 'NL Most Valuable Player Award', 'MVP', 'MVP of NLCS']}
[0m[32;1m[1;3mFinal Response: Bryce Harper was named MVP in 2021, received the NL Most Valuable Player Award, and was also the MVP of the NLCS.
[0m

<b>Bryce Harper was named MVP in 2021, received the NL Most Valuable Player Award, and was also the MVP of the NLCS.</b>

In [37]:
response = kg_rag_query_engine.query("Tell me about Bryce Harper.")
display(Markdown(f"<b>{response}</b>"))

[32;1m[1;3mEntities processed: ['Bryce', 'Bryce Harper', 'Harper']
[0m[32;1m[1;3mEntities processed: ['Bryce', 'bryce', 'Bryce Harper', 'harper', 'Harper']
[0m[36;1m[1;3mGraph RAG context:
The following are knowledge sequence in max depth 2 in the form of directed graph like:
`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...` extracted based on key entities as subject:
Bryce Harper{name: Bryce Harper} -[relationship:{relationship: won}]-> NL Most Valuable Player Award{name: NL Most Valuable Player Award} <-[relationship:{relationship: won}]- Ryan Howard{name: Ryan Howard}
Bryce Harper{name: Bryce Harper} -[relationship:{relationship: won}]-> NL Most Valuable Player Award{name: NL Most Valuable Player Award}
Bryce Harper{name: Bryce Harper} -[relationship:{relationship: was named}]-> MVP{name: MVP}
Bryce Harper{name: Bryce Harper} -[relationship:{relationship: won}]-> MVP in 2021{name: MVP in 2021}
Bryce Harper{name: Bryce Harper} -[relationship:{rela

<b>Bryce Harper won the NL Most Valuable Player Award and was named MVP in 2021. He also won the NL Most Valuable Player Award and was named MVP of NLCS. Additionally, he was involved in a relationship where he won the award along with Ryan Howard and Mike Schmidt.</b>
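
The Graph RAG context printed above lists paths of at most two hops that start from the extracted entities. The second hop can point either way (`-[p]->` or `<-[p]-`). The sketch below reproduces that path shape over a small hypothetical list of triplets. It only illustrates the output format and is not how `KnowledgeGraphRAGRetriever` works internally; as far as we know, that class asks the graph store for the relationship map.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


def depth2_paths(triplets: List[Triplet], seed: str) -> List[str]:
    """Format paths of up to two hops that leave `seed` along an outgoing edge.

    The second hop may follow an edge in either direction, mirroring the
    `-[p]->` and `<-[p]-` notation in the Graph RAG context above.
    """
    paths: List[str] = []
    for first in triplets:
        s, p, o = first
        if s != seed:
            continue
        head = f"{s} -[{p}]-> {o}"
        paths.append(head)
        for second in triplets:
            if second == first:
                continue  # do not walk straight back over the first edge
            s2, p2, o2 = second
            if s2 == o:
                paths.append(f"{head} -[{p2}]-> {o2}")
            elif o2 == o:
                paths.append(f"{head} <-[{p2}]- {s2}")
    return paths


# Hypothetical toy graph modeled on the Bryce Harper context above.
kg = [
    ("Bryce Harper", "won", "NL Most Valuable Player Award"),
    ("Ryan Howard", "won", "NL Most Valuable Player Award"),
    ("Bryce Harper", "was named", "MVP"),
]
for line in depth2_paths(kg, "Bryce Harper"):
    print(line)
```

The second path shows how Ryan Howard enters the context for a question about Bryce Harper. It plausibly explains why the final answer says Harper won the award "along with" other players.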

In [38]:
response = kg_query_engine.query("How did the standing ovation Trey Turner received change his season?")
display(Markdown(f"<b>{response}</b>"))

<b>The standing ovation Trey Turner received appeared to positively impact his season. Following the ovation, Turner seemed to feel supported and appreciated, leading to improved performance on the field. He had one of his best games in weeks and his batting average significantly increased in the following 18 games. The love and support from the fans seemed to boost Turner's confidence and morale, resulting in a noticeable improvement in his gameplay and overall demeanor.</b>

In [39]:
response = kg_keyword_query_engine.query("How did the standing ovation Trey Turner received change his season?")
display(Markdown(f"<b>{response}</b>"))

<b>The standing ovation Trey Turner received seemed to have a positive impact on his season, as it was mentioned that he was in the worst slump of his career at that time. The ovation may have boosted his morale and confidence, potentially helping him to overcome his struggles and improve his performance on the field.</b>

In [40]:
response = kg_hybrid_query_engine.query("How did the standing ovation Trey Turner received change his season?")
display(Markdown(f"<b>{response}</b>"))

<b>The standing ovation Trey Turner received appeared to have a significant impact on his season, leading to a positive shift in his performance and mindset. The support and love he received from the city of Philadelphia seemed to boost his confidence and morale, resulting in improved gameplay. This newfound support translated into an increase in his batting average, more home runs, and an overall notable improvement in his performance on the field. The standing ovation seemed to lift Trey Turner out of his slump, helping him regain his form and play with a renewed sense of joy and focus.</b>

In [41]:
response = vector_query_engine.query("How did the standing ovation Trey Turner received change his season?")
display(Markdown(f"<b>{response}</b>"))

<b>The standing ovation Trey Turner received seemed to have a positive impact on his season. Following the ovation, Trey Turner's performance notably improved. He experienced one of his best games in weeks, mentioning that the hit he got felt like one of his best swings all year. Subsequently, in the 18 games after the standing ovation, Trey Turner displayed a significant improvement in his batting average, OPS, hits, home runs, and RBIs. This positive response from the fans and the support shown seemed to have lifted Trey Turner's spirits, leading to a noticeable change in his performance on the field.</b>

In [42]:
response = custom_query_engine.query("How did the standing ovation Trey Turner received change his season?")
display(Markdown(f"<b>{response}</b>"))

<b>The standing ovation Trey Turner received appeared to have a significant positive impact on his season. It boosted his confidence, morale, and seemed to help him break out of a slump. This translated into improved performance on the field, with notable increases in his batting average, OPS, hits, home runs, and RBIs in the games following the standing ovation.</b>

In [43]:
# response = kgqe_query_engine.query("How did the standing ovation Trey Turner received change his season?")
# display(Markdown(f"<b>{response}</b>"))

In [44]:
response = kg_rag_query_engine.query("How did the standing ovation Trey Turner received change his season?")
display(Markdown(f"<b>{response}</b>"))

[32;1m[1;3mEntities processed: ['Trey Turner', 'Turner', 'standing', 'season', 'standing ovation', 'change', 'ovation', 'Trey']
[0m[32;1m[1;3mEntities processed: ['Trey Turner', 'Change', 'Turner', 'Ovation', 'Standing', 'Standing Ovation', 'Season', 'Trey']
[0m[36;1m[1;3mGraph RAG context:
The following are knowledge sequence in max depth 2 in the form of directed graph like:
`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...` extracted based on key entities as subject:
Trey{name: Trey} -[relationship:{relationship: has been winning}]-> not{name: not}
Trey{name: Trey} -[relationship:{relationship: has been fighting}]-> demons{name: demons}
Trey{name: Trey} -[relationship:{relationship: looked}]-> defeated{name: defeated}
Trey{name: Trey} -[relationship:{relationship: has been stuck inside of}]-> head{name: head}
standing ovation{name: standing ovation} <-[relationship:{relationship: received}]- Trey Turner{name: Trey Turner} -[relationship:{relations

<b>The standing ovation Trey Turner received seemed to have positively impacted his season, as he was noted to be more relaxed, playing the game of baseball, and showing love. Additionally, he was observed to be out of his own head, seemingly gone mental, and seemingly gone powers. These changes suggest a shift towards a more positive and focused mindset during his season.</b>
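
The two `Entities processed` lines show two steps. The retriever first extracts key terms from the question, then expands them with variants such as lower-case forms and single words. By default both steps in llama_index appear to be LLM calls. The deterministic `expand_entities` function below is a hypothetical stand-in that makes the effect easy to see. Its optional `ignore` set is an added assumption, shown as one way to keep very generic terms out of the graph lookup.

```python
from typing import AbstractSet, Iterable, List


def expand_entities(
    phrases: Iterable[str], ignore: AbstractSet[str] = frozenset()
) -> List[str]:
    """Expand key phrases into single words plus lower/Title-case variants.

    First-seen order is kept and duplicates are dropped. Terms whose
    lower-case form is in `ignore` are skipped (a hypothetical filter).
    """
    result: List[str] = []
    for phrase in phrases:
        for term in [phrase, *phrase.split()]:
            if term.lower() in ignore:
                continue
            for variant in (term, term.lower(), term.title()):
                if variant not in result:
                    result.append(variant)
    return result


print(expand_entities(["Bryce Harper"]))
# ['Bryce Harper', 'bryce harper', 'Bryce', 'bryce', 'Harper', 'harper']
print(expand_entities(["Trey Turner", "season"], ignore={"season"}))
```

Generic terms such as 'Trey' or 'season' can match loosely extracted nodes. That is how triplets like `Trey -[has been winning]-> not` reach the context and, in turn, the oddly worded answer above.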

In [45]:
response = kg_query_engine.query("Tell me some facts about the current stadium of Philadelphia Phillies.")
display(Markdown(f"<b>{response}</b>"))

<b>The current stadium of the Philadelphia Phillies is Citizens Bank Park, situated in the South Philadelphia Sports Complex. The Phillies have been playing at Citizens Bank Park since 2004.</b>

In [46]:
response = kg_keyword_query_engine.query("Tell me some facts about the current stadium of Philadelphia Phillies.")
display(Markdown(f"<b>{response}</b>"))

<b>The current stadium of the Philadelphia Phillies is known for featuring cut-out cardboard figures.</b>

In [47]:
response = kg_hybrid_query_engine.query("Tell me some facts about the current stadium of Philadelphia Phillies.")
display(Markdown(f"<b>{response}</b>"))

<b>The current stadium of the Philadelphia Phillies is Citizens Bank Park, located in South Philadelphia. It replaced Veterans Stadium in 2004 and is part of the South Philadelphia Sports Complex, along with Lincoln Financial Field and the Wells Fargo Center. Citizens Bank Park has been the home of the Phillies since its opening and has hosted various events, providing a modern and fan-friendly experience for spectators.</b>

In [48]:
response = vector_query_engine.query("Tell me some facts about the current stadium of Philadelphia Phillies.")
display(Markdown(f"<b>{response}</b>"))

<b>The current stadium of the Philadelphia Phillies is Citizens Bank Park, located in the South Philadelphia Sports Complex. It has been the team's home stadium since 2004. The stadium features a restaurant named "Harry the K's" in honor of Harry Kalas, the Phillies' legendary broadcaster. Additionally, the Phillies spent $10 million in 2011 to upgrade the video system at Citizens Bank Park, including installing a new display screen in left field, making it the largest in the National League at 76 feet high and 97 feet wide.</b>

In [49]:
response = custom_query_engine.query("Tell me some facts about the current stadium of Philadelphia Phillies.")
display(Markdown(f"<b>{response}</b>"))

<b>The current stadium of the Philadelphia Phillies is Citizens Bank Park, which has been their home since 2004. It features a restaurant named "Harry the K's" in honor of Harry Kalas, the iconic Phillies broadcaster. The TV broadcast booth at the stadium is named "The Harry Kalas Broadcast Booth" in memory of him, while the radio-broadcast booth is named "The Richie 'Whitey' Ashburn Broadcast Booth" after another beloved Phillies broadcaster. Additionally, a statue of Harry Kalas was unveiled at Citizens Bank Park in 2011, funded and constructed by Phillies fans.</b>

In [50]:
response = kgqe_query_engine.query("Tell me some facts about the current stadium of Philadelphia Phillies.")
display(Markdown(f"<b>{response}</b>"))

[33;1m[1;3mGraph Store Query:
MATCH (s:`entity`)-[:relationship]->(stadium:`entity`)
WHERE s.`entity`.`name` == 'Philadelphia Phillies'
RETURN stadium.`entity`.`name`;
[0m[33;1m[1;3mGraph Store Response:
{'stadium.entity.name': ['world record for most ever losses', 'several stadiums in the city', 'American professional baseball team', 'historically associated with futility', 'oldest one-name one-city franchise', 'eight National League pennants', 'two World Series championships', 'Major League Baseball', 'Philadelphia', 'official website', '15 playoff appearances', 'Philadelphia']}
[0m[32;1m[1;3mFinal Response: The current stadium of Philadelphia Phillies has a world record for most ever losses, has been historically associated with futility, is the oldest one-name one-city franchise, has won eight National League pennants and two World Series championships. It is associated with Major League Baseball, located in Philadelphia, has an official website, and has made 15 playoff app

<b>The current stadium of Philadelphia Phillies has a world record for most ever losses, has been historically associated with futility, is the oldest one-name one-city franchise, has won eight National League pennants and two World Series championships. It is associated with Major League Baseball, located in Philadelphia, has an official website, and has made 15 playoff appearances.</b>

In [51]:
response = kg_rag_query_engine.query("Tell me some facts about the current stadium of Philadelphia Phillies.")
display(Markdown(f"<b>{response}</b>"))

[32;1m[1;3mEntities processed: ['current', 'stadium', 'Phillies', 'Philadelphia', 'facts', 'Philadelphia Phillies']
[0m[32;1m[1;3mEntities processed: ['current', 'Phillies', 'stadium', 'Philadelphia', 'facts', 'Philadelphia Phillies']
[0m[36;1m[1;3mGraph RAG context:
The following are knowledge sequence in max depth 2 in the form of directed graph like:
`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...` extracted based on key entities as subject:
Philadelphia Phillies{name: Philadelphia Phillies} -[relationship:{relationship: were founded in}]-> Philadelphia{name: Philadelphia} <-[relationship:{relationship: left}]- Philadelphia Athletics{name: Philadelphia Athletics}
Philadelphia Phillies{name: Philadelphia Phillies} -[relationship:{relationship: were founded in}]-> Philadelphia{name: Philadelphia} -[relationship:{relationship: radio hosts}]-> did something Unthinkable{name: did something Unthinkable}
Philadelphia Phillies{name: Philadelphia Phillie

<b>The current stadium of Philadelphia Phillies has been associated with being a fertile hitting ground and has hosted several stadiums in the city.</b>