<h2>Using Knowledge Graph Context and Vector Index</h2>

In [1]:
import os

from llama_index.core import VectorStoreIndex, ServiceContext
from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader
from llama_index.core import StorageContext
from llama_index.graph_stores.nebula import NebulaGraphStore
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from llama_index.core.graph_stores import SimpleGraphStore

from dotenv import load_dotenv



<h3>Setting up Environment</h3>

In [2]:
# Load variables from the .env file into environment variables
dotenv_path = '.env'
load_dotenv(dotenv_path)


# Setting up LLM for Llama_index
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = llm
Settings.chunk_size = 512
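
Both the LLM client and the graph store read their credentials from the environment, so it can help to fail early when `.env` did not supply them. The following is a minimal sketch, not part of the original notebook: `missing_env_vars` is a hypothetical helper, and the variable names (`OPENAI_API_KEY`, `NEBULA_USER`, `NEBULA_PASSWORD`, `NEBULA_ADDRESS`) are assumptions that should be adjusted to match your own `.env` file.

```python
import os

# Assumed variable names; adjust to match your .env file.
REQUIRED_VARS = ["OPENAI_API_KEY", "NEBULA_USER", "NEBULA_PASSWORD", "NEBULA_ADDRESS"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv` surfaces a misconfigured `.env` before any index is built.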

<h3>Loading documents from local</h3>

In [3]:
documents = SimpleDirectoryReader("data").load_data()

<h3>Setup Nebula Graph Store</h3>

In [4]:

space_name = "paul_graham_essay"
edge_types = ["relationship"]
rel_prop_names = ["relationship"]
tags = ["entity"]

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
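
The graph store above presumably expects the NebulaGraph space and its schema (one tag, one edge type with one string property) to exist already. The notebook does not show that step, so the sketch below is an assumption about the setup: `nebula_schema_statements` is a hypothetical helper that only builds nGQL strings, which would then be run in the NebulaGraph console. The partition and replica settings, the `name` property and the index name are illustrative choices, not values taken from the notebook.

```python
def nebula_schema_statements(space_name, tag="entity", edge_type="relationship",
                             prop_name="relationship", vid_length=256):
    """Build nGQL statements for a space whose schema matches the graph store config."""
    return [
        f"CREATE SPACE IF NOT EXISTS {space_name}"
        f"(vid_type=FIXED_STRING({vid_length}), partition_num=1, replica_factor=1);",
        f"USE {space_name};",
        f"CREATE TAG IF NOT EXISTS {tag}(name string);",
        f"CREATE EDGE IF NOT EXISTS {edge_type}({prop_name} string);",
        f"CREATE TAG INDEX IF NOT EXISTS {tag}_index ON {tag}(name({vid_length}));",
    ]


# Print the statements so they can be pasted into the NebulaGraph console.
for statement in nebula_schema_statements("paul_graham_essay"):
    print(statement)
```

Keeping the tag, edge type and property names as parameters makes it easier to keep them in sync with `tags`, `edge_types` and `rel_prop_names`.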


<h3>Creating index with documents and Graph store</h3>

In [5]:

storage_context = StorageContext.from_defaults(graph_store=graph_store)

nebula_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
    show_progress=True,
    include_embeddings=True,
)

Parsing nodes: 100%|██████████| 2/2 [00:00<00:00, 18.07it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,  3.05it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,  3.37it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,  3.25it/s]
Generating embeddings: 100%|██████████| 10/10 [00:00<00:00, 11.44it/s]
Generating embeddings: 100%|██████████| 3/3 [00:00<00:00,  4.34it/s]
Generating embeddings: 100%|██████████| 14/14 [00:01<00:00, 11.94it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,  2.91it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,  3.27it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,  3.22it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,  3.40it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,  3.18it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,  2.77it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,
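
To make `max_triplets_per_chunk=2` more concrete: for each chunk, the index asks the LLM for (subject, predicate, object) triplets and keeps at most that many. The sketch below is a simplified illustration of that idea, not the library's implementation. `parse_triplets` is a hypothetical helper, and the sample text is made up for the example.

```python
def parse_triplets(llm_output, max_triplets=2):
    """Parse lines shaped like '(subject, predicate, object)', keeping at most max_triplets."""
    triplets = []
    for line in llm_output.splitlines():
        if len(triplets) >= max_triplets:
            break
        line = line.strip()
        if not (line.startswith("(") and line.endswith(")")):
            continue  # skip anything that is not a parenthesised triplet
        parts = [part.strip() for part in line[1:-1].split(",")]
        if len(parts) != 3 or not all(parts):
            continue  # skip malformed triplets
        triplets.append(tuple(parts))
    return triplets


# Made-up LLM output for one chunk.
sample = """(Cricket, is, bat-and-ball game)
(Cricket, originated in, England)
(Cricket, regulated by, umpires)"""
print(parse_triplets(sample))
```

With the cap at 2, the third triplet is dropped, which is why a low `max_triplets_per_chunk` gives a sparser graph.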

<h3>Creating Graph Query Engine and Querying</h3>

In [6]:
graph_query_engine = nebula_index.as_query_engine(
    response_mode="tree_summarize",
    include_text=False,
)

response = graph_query_engine.query(
    "What is Cricket?",
)
print(response)

Cricket is a game that is perceived to be a bat-and-ball game, played in South East England, originated in England, played between two teams, played on modified fields, has historical ties, influenced lexicon, is the subject of works, suggests David Block, is played on a field, regulated by umpires, spread globally with the British Empire, stopped during the Second World War, influenced popular culture, dominated by Don Bradman, began to expand in 1888-89, and has a broad impact.
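
The answer above reads like a list of relations, which is what one would expect if, with `include_text=False`, only the matching triplets (and not the source chunks) reach the LLM. The sketch below illustrates that kind of context under this assumption. `build_subject_map` and `triplet_context` are hypothetical helpers, and the triplets are made-up examples.

```python
from collections import defaultdict


def build_subject_map(triplets):
    """Group (subject, predicate, object) triplets by lower-cased subject."""
    subject_map = defaultdict(list)
    for subject, predicate, obj in triplets:
        subject_map[subject.lower()].append((predicate, obj))
    return subject_map


def triplet_context(subject_map, keywords):
    """Return one 'subject predicate object' line per triplet whose subject matches a keyword."""
    lines = []
    for keyword in keywords:
        for predicate, obj in subject_map.get(keyword.lower(), []):
            lines.append(f"{keyword} {predicate} {obj}")
    return lines


# Made-up triplets for illustration.
triplets = [
    ("Cricket", "originated in", "England"),
    ("Cricket", "regulated by", "umpires"),
    ("Baseball", "is", "bat-and-ball game"),
]
print(triplet_context(build_subject_map(triplets), ["Cricket"]))
```

A context built this way is compact and relation-focused, but it carries none of the descriptive detail found in the original text.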


<h3>Creating Vector index from given documents</h3>

In [7]:

from llama_index.core.node_parser import SentenceSplitter

# ServiceContext is deprecated; Settings.llm (set above) supplies the LLM,
# and the chunk size is set per index through a node parser.
storage_context = StorageContext.from_defaults()
vector_index = VectorStoreIndex.from_documents(
    documents=documents,
    storage_context=storage_context,
    transformations=[SentenceSplitter(chunk_size=1024)],
)
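
For intuition about what the vector index does at query time, here is a toy sketch of similarity-based retrieval. It is not how `VectorStoreIndex` is implemented: `embed` uses a bag-of-words count vector as a stand-in for a learned embedding model, and `top_k` and the sample chunks are made up for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (a real index uses a learned model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine_similarity(query_vec, embed(chunk)), reverse=True)
    return ranked[:k]


chunks = [
    "cricket is a bat-and-ball game played between two teams",
    "the invoice lists the total amount due",
]
print(top_k("what is cricket", chunks))
```

The retrieved chunks, rather than extracted relations, become the LLM's context, which is why vector answers tend to be more descriptive.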


<h3>Querying with Vector index</h3>

In [8]:
vector_query_engine = vector_index.as_query_engine()

response1 = vector_query_engine.query("What is Cricket?")
response1.response

"Cricket is a bat-and-ball game played between two teams of eleven players on a field with a 22-yard pitch containing a wicket at each end. The game involves a bowler from the fielding team bowling the ball towards the striker's wicket, with the striker aiming to hit the ball and score runs by exchanging places with the nonstriker. The fielding team aims to dismiss batters to prevent runs from being scored. Cricket has various forms, ranging from Twenty20 to Test matches, and is governed by the International Cricket Council (ICC) with rules maintained by the Marylebone Cricket Club (MCC)."

<h3>Comparing query response of Graph and Vector index</h3>

In [9]:
response = graph_query_engine.query(
    "What is the relationship between cricket and baseball?",
)
print(response)

response1 = vector_query_engine.query("What is the relationship between cricket and baseball?")
response1.response

The relationship between cricket and baseball is that they are both bat-and-ball games.


'Cricket and baseball are both bat-and-ball sports played between two teams, each with a specific number of players. In cricket, the game is played with eleven players on each team, while in baseball, each team consists of nine players. Both sports involve scoring runs by hitting the ball and running between designated points on the field. Additionally, both sports have a defensive team that aims to prevent the opposing team from scoring runs by getting players out. Despite some differences in rules and gameplay, cricket and baseball share similarities in their fundamental structure and objectives as bat-and-ball games.'
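
When comparing engines like this, it can be convenient to collect the answers side by side. The sketch below assumes only that each engine has a `query` method and that `str()` of the result gives the answer text, as `print(response)` does above. `compare_engines` and `EchoEngine` are hypothetical names; the stub exists only so the example runs without an index.

```python
def compare_engines(engines, query):
    """Run the same query against each named engine and collect the answer text."""
    return {name: str(engine.query(query)) for name, engine in engines.items()}


class EchoEngine:
    """Stand-in for a query engine so the sketch runs without an index."""

    def __init__(self, label):
        self.label = label

    def query(self, query):
        return f"[{self.label}] {query}"


engines = {"graph": EchoEngine("graph"), "vector": EchoEngine("vector")}
for name, answer in compare_engines(engines, "How are cricket and baseball related?").items():
    print(f"{name}: {answer}")
```

With the real engines, the call would be `compare_engines({"graph": graph_query_engine, "vector": vector_query_engine}, query)`.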

<h3>Giving graph output as context for Vector Index</h3>

In [10]:
query = "What is the relationship between cricket and baseball?"
response = graph_query_engine.query(
    query
)
ext_query = (
    f"Context from knowledge graph: {response.response}\n"
    f"Query: {query}\n"
    "Note: Use the context from the knowledge graph and give an elaborate answer."
)
response1 = vector_query_engine.query(ext_query)
print(response1.response)

The relationship between cricket and baseball is that they are both bat-and-ball games. Both sports involve hitting a ball with a hand-held implement - a bat in this case. While cricket has a solid target structure called the wicket that the batter must defend, baseball has bases that the batter must run to. Additionally, both games have a pitcher (bowler in cricket) who delivers the ball to the batter. The objective in both sports is to score runs by hitting the ball and running between designated points. Despite some differences in rules and gameplay, the fundamental similarity lies in the concept of using a bat to hit a ball in a competitive setting.
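
The pattern in the cell above (query the graph first, then pass its answer as extra context to the vector engine) can be wrapped in a small function. This is a minimal sketch under the assumption that both engines expose `query` and return an object with a `response` attribute, as in the notebook. `build_augmented_query`, `query_with_graph_context` and `StubEngine` are hypothetical names, and the stubs only stand in for the real engines.

```python
from types import SimpleNamespace


def build_augmented_query(graph_context, query):
    """Combine the graph answer and the user query into one prompt for the vector engine."""
    return (
        f"Context from knowledge graph: {graph_context}\n"
        f"Query: {query}\n"
        "Note: Use the context from the knowledge graph and give an elaborate answer."
    )


def query_with_graph_context(graph_engine, vector_engine, query):
    """Query the graph engine first, then feed its answer to the vector engine as context."""
    graph_response = graph_engine.query(query)
    return vector_engine.query(build_augmented_query(graph_response.response, query))


class StubEngine:
    """Stand-in engine whose answers come from a plain function."""

    def __init__(self, answer_fn):
        self.answer_fn = answer_fn

    def query(self, query):
        return SimpleNamespace(response=self.answer_fn(query))


graph_stub = StubEngine(lambda q: "Both are bat-and-ball games.")
vector_stub = StubEngine(lambda q: f"(answer using prompt of {len(q.splitlines())} lines)")
result = query_with_graph_context(graph_stub, vector_stub, "How are cricket and baseball related?")
print(result.response)
```

Reusing the same `query` string for both engines avoids the two prompts drifting apart when the question is edited.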


<h3>Repeating the comparison with invoice documents</h3>

In [11]:
documents = SimpleDirectoryReader("invoices").load_data()

In [12]:
storage_context = StorageContext.from_defaults(graph_store=graph_store)

nebula_index1 = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
    show_progress=True,
    include_embeddings=True,
)

Parsing nodes: 100%|██████████| 7/7 [00:00<00:00, 2128.47it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,  3.10it/s]
Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.07s/it]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  2.69it/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  2.82it/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  3.13it/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  2.89it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,  3.37it/s]
Processing nodes: 100%|██████████| 7/7 [00:13<00:00,  1.92s/it]


In [22]:
graph_query_engine = nebula_index1.as_query_engine(
    response_mode="tree_summarize",
    include_text=False,
)

response = graph_query_engine.query(
    "Compare invoices of all companies given in the context by analyzing total amount due",
)
print(response)

To compare the invoices of all companies provided in the context, you would need to analyze the total amount due on each invoice. This analysis would involve examining the amount owed on each invoice issued by the different companies to determine any variations or similarities in the total amounts due.


In [23]:

from llama_index.core.node_parser import SentenceSplitter

# ServiceContext is deprecated; Settings.llm supplies the LLM and the
# node parser sets the chunk size for this index.
storage_context = StorageContext.from_defaults()
vector_index1 = VectorStoreIndex.from_documents(
    documents=documents,
    storage_context=storage_context,
    transformations=[SentenceSplitter(chunk_size=1024)],
)
vector_query_engine = vector_index1.as_query_engine()

response1 = vector_query_engine.query("Compare invoices of all companies given in the context by analyzing total amount due")
response1.response


'Smith Enterprises has a total amount due of $2180, while Johnson Ltd. has a total amount due of $1653.6. Therefore, Smith Enterprises has a higher total amount due compared to Johnson Ltd.'
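
Because the comparison above comes from an LLM, it is worth cross-checking the arithmetic in plain code. The sketch below takes the two totals quoted in the answer and ranks them. `compare_totals` is a hypothetical helper, and `Decimal` is used so the difference is not affected by floating-point rounding.

```python
from decimal import Decimal


def compare_totals(totals):
    """Return (highest company, lowest company, difference) for a company -> amount due mapping."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    (top_name, top_amount), (low_name, low_amount) = ranked[0], ranked[-1]
    return top_name, low_name, top_amount - low_amount


# Totals as quoted in the vector engine's answer.
totals = {"Smith Enterprises": Decimal("2180"), "Johnson Ltd.": Decimal("1653.6")}
top, low, difference = compare_totals(totals)
print(f"{top} owes {difference} more than {low}")
```

This confirms the direction of the comparison, though not that the extracted totals themselves match the invoices.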