# GraphRAG Library Tutorial

This notebook demonstrates how to set up and use GraphRAG, a graph-based approach to content interpretation and search that serves as an alternative to traditional RAG with vector stores. It covers the steps to initialize GraphRAG, load data, fine-tune prompts, index data, and perform both local and global searches.

> **Note:**
> Please ignore the warnings and errors printed while the cells execute, as graphrag is still new and under active development. These errors and warnings are non-breaking: they do not raise exceptions, so you can continue working without interruption.

### 1. Import Required Modules

First, we import the necessary modules and set up the environment for the notebook.

In [1]:
%load_ext autoreload
%autoreload 2

############# IMPORTS


import os
from utils.graphrag_helper import *

from IPython.display import display, Markdown, HTML

import warnings
warnings.filterwarnings('ignore', category=UserWarning, module='pandas', message=".*indexing.*") 
warnings.filterwarnings('ignore', category=UserWarning, module='pandas', message=".*SettingWithCopyWarning.*") 



### 2. Initialize GraphRagHelper

We create an instance of the `GraphRagHelper` class, pointing it to the project directory where our data and configurations are stored.

In [2]:
## To try search right away, pass 'sample_graphrag' here instead and skip steps 3, 4, and 5 below.

gr = GraphRagHelper('graphrag')

### 3. Initialize the Project

Initialize the GraphRAG project by running the setup command. This prepares the necessary directories and environment configurations.

In [3]:
gr.initialize()

Subprocessing command: ['python', '-m', 'graphrag.index', '--init', '--root', 'sample_graphrag']
Standard Output:
⠋ GraphRAG Indexer 
Initializing project at sample_graphrag
⠋ GraphRAG Indexer 
Standard Error:



{'stdout': '⠋ GraphRAG Indexer \nInitializing project at sample_graphrag\n⠋ GraphRAG Indexer ',
 'stderr': '',
 'status': True}
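The log above shows the helper shelling out to `python -m graphrag.index --init --root ...` and returning a dictionary with `stdout`, `stderr`, and `status`. The following is a minimal sketch of how such a wrapper could be written with the standard library. The function names `build_init_command` and `run_command` are hypothetical and are not taken from `GraphRagHelper`; the demonstration runs a harmless stand-in command so it works without GraphRAG installed.

```python
import subprocess
import sys


def build_init_command(root: str) -> list[str]:
    """Build the init command shown in the log above for a given project root."""
    return ["python", "-m", "graphrag.index", "--init", "--root", root]


def run_command(cmd: list[str]) -> dict:
    """Run a command and return its output in the same shape as the dict above."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        # Assumption: 'status' is True when the process exits with code 0.
        "status": proc.returncode == 0,
    }


# Harmless stand-in command, used here instead of the real GraphRAG call.
result = run_command([sys.executable, "-c", "print('hello')"])
print(result["status"], result["stdout"].strip())
```

Returning a plain dictionary keeps the result easy to inspect in a notebook cell, which matches how the output above is displayed.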

### 4. Load Data and Fine-Tune Prompts

Load the initial dataset and fine-tune the prompts based on this data. Index the data to prepare it for search operations.

In [None]:
gr.load_data('sample_data')
gr.prompt_fine_tune()
gr.index_data()
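How `load_data` works internally is not shown in this notebook. One plausible reading, given that GraphRAG by default reads text files from an `input` folder under the project root, is that it copies the source files there. The sketch below illustrates that idea only; `copy_text_files` is a hypothetical name, and the file names are made up for the demonstration.

```python
import shutil
import tempfile
from pathlib import Path


def copy_text_files(source_dir: str, project_root: str) -> list[str]:
    """Copy every .txt file from source_dir into <project_root>/input."""
    input_dir = Path(project_root) / "input"
    input_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).glob("*.txt")):
        shutil.copy2(path, input_dir / path.name)
        copied.append(path.name)
    return copied


# Demonstration on throwaway directories with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "sample_data"
    source.mkdir()
    (source / "a.txt").write_text("first document")
    (source / "b.txt").write_text("second document")
    (source / "notes.md").write_text("not a .txt file, so it is skipped")
    copied = copy_text_files(str(source), str(Path(tmp) / "graphrag"))
    print(copied)
```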

### 5. Load Additional Data and Re-index

If additional data is available, load it and repeat the fine-tuning and indexing steps to update the dataset. This second pass demonstrates that you can make incremental (delta) additions to the graph.

In [None]:
## 2nd pass
gr.load_data('sample_data_extra/')
gr.prompt_fine_tune()
gr.index_data()

### 6.A Perform a Local Search

Execute a local search query to retrieve relevant information from the indexed dataset.

In [11]:
r = gr.local_search("where is the Kensington Hotel?", community_level=1)

The most recent folder is: 20240723-143050
Most recent path:  sample_graphrag\output\20240723-143050\artifacts
Entity Table path:  sample_graphrag\output\20240723-143050\artifacts/create_final_nodes.parquet
## Location of The Kensington Hotel

The Kensington Hotel is a budget accommodation option located in London, specifically near Earl's Court. Earl's Court is a well-known area in London, making the hotel conveniently situated for visitors looking to explore the city [Data: Entities (93, 99, 103); Relationships (88, 103, 104)].

### Nearby Attractions

Being in close proximity to Earl's Court, The Kensington Hotel offers easy access to various attractions and amenities in the area. This makes it an ideal choice for budget-conscious travelers who want to stay in a central location without compromising on accessibility.

### Booking Information

For those interested in booking a stay at The Kensington Hotel, Margie's Travel offers accommodation options and comprehensive travel services
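The answer above cites its evidence inline in the form `[Data: Entities (...); Relationships (...)]`. If you want to trace an answer back to the context records, those markers can be pulled out with a regular expression. This is a minimal sketch that assumes the citation format is exactly what appears in the outputs of this notebook; `parse_citations` is a hypothetical helper, not part of the library.

```python
import re

# Matches one "[Data: ...]" block and captures its contents.
CITATION_BLOCK = re.compile(r"\[Data:\s*([^\]]+)\]")
# Matches one "Kind (id, id, ...)" part inside a block.
CITATION_PART = re.compile(r"(\w+)\s*\(([^)]*)\)")


def parse_citations(text: str) -> dict[str, list[int]]:
    """Collect the cited record ids per record kind, in order of appearance."""
    found: dict[str, list[int]] = {}
    for block in CITATION_BLOCK.findall(text):
        for kind, ids in CITATION_PART.findall(block):
            numbers = [int(token) for token in re.findall(r"\d+", ids)]
            found.setdefault(kind, []).extend(numbers)
    return found


answer = (
    "The Kensington Hotel is a budget accommodation option located in London "
    "[Data: Entities (93, 99, 103); Relationships (88, 103, 104)]."
)
citations = parse_citations(answer)
print(citations)
```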

### 6.B Perform a Local Search

Execute a second local search query, this time at community level 2, to retrieve relevant information from the indexed dataset.

In [5]:
r = gr.local_search("what are the safety measures in the Tesla S?", community_level=2)

The most recent folder is: 20240723-143050
Most recent path:  sample_graphrag\output\20240723-143050\artifacts
Entity Table path:  sample_graphrag\output\20240723-143050\artifacts/create_final_nodes.parquet
# Safety Measures in the Tesla Model S

The Tesla Model S is renowned for its comprehensive suite of safety features, which contribute to its high safety ratings and reputation as one of the safest vehicles on the road. Below, we explore the various safety measures integrated into the Tesla Model S.

## Structural Safety

The Tesla Model S features a lightweight aluminum body that enhances both efficiency and safety. The aluminum structure is designed to absorb impact energy more effectively than steel, providing superior protection in the event of a collision [Data: Entities (7, 69); Sources (15)].

## Advanced Driver-Assistance Systems

The Model S is equipped with advanced driver-assistance capabilities, including the Autopilot system and the optional Full Self-Driving package. T

### 7. Retrieve Context for a Specific Query

For a more detailed analysis, retrieve the context associated with a specific query without performing a full search.

In [14]:
r = gr.local_search("where is the Kensington Hotel?", community_level=1, context_only=True)


The most recent folder is: 20240723-143050
Most recent path:  sample_graphrag\output\20240723-143050\artifacts
Entity Table path:  sample_graphrag\output\20240723-143050\artifacts/create_final_nodes.parquet
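Each search first reports "The most recent folder" and then reads the artifacts from it. The run folders appear to be named with a `YYYYMMDD-HHMMSS` timestamp, so the newest one can be found by comparing names. The sketch below shows one way to do that, assuming the `output/<timestamp>/artifacts` layout seen in the log; `latest_artifacts_dir` is a hypothetical name, and the older folder name in the demonstration is made up.

```python
import re
import tempfile
from pathlib import Path

# Run folders look like 20240723-143050 in the log output.
RUN_NAME = re.compile(r"^\d{8}-\d{6}$")


def latest_artifacts_dir(project_root: str) -> Path:
    """Return <root>/output/<newest run>/artifacts, picking the newest by name."""
    output_dir = Path(project_root) / "output"
    runs = [p for p in output_dir.iterdir() if p.is_dir() and RUN_NAME.match(p.name)]
    if not runs:
        raise FileNotFoundError(f"no timestamped run folders under {output_dir}")
    # Fixed-width timestamps sort chronologically when compared as strings.
    newest = max(runs, key=lambda p: p.name)
    return newest / "artifacts"


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("20240722-091500", "20240723-143050", "notes"):
        (Path(tmp) / "output" / name / "artifacts").mkdir(parents=True)
    latest = latest_artifacts_dir(tmp)
    print(latest.parent.name)
```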


### 8. Inspect Context Records

Check the keys of the context records returned from the previous context-only search.

In [15]:
r['context_records'].keys()

dict_keys(['reports', 'relationships', 'claims', 'entities', 'sources'])
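Before opening individual tables, it can help to see how many rows each context record holds. The sketch below counts the entries per key. It assumes only that each value supports `len()`, which holds for pandas DataFrames as well as for the plain lists used here as stand-ins; the sizes are made up for the demonstration.

```python
from typing import Mapping, Sized


def summarize_context(records: Mapping[str, Sized]) -> dict[str, int]:
    """Return the number of rows (or items) stored under each context key."""
    return {name: len(table) for name, table in records.items()}


# Stand-in data: plain lists instead of DataFrames, with made-up sizes.
fake_records = {
    "reports": [{"id": 1}],
    "relationships": [{"id": 88}, {"id": 103}],
    "claims": [],
    "entities": [{"id": 93}, {"id": 99}, {"id": 103}],
    "sources": [{"id": 15}],
}
summary = summarize_context(fake_records)
print(summary)
```

With the real result, the equivalent call would be `summarize_context(r['context_records'])`.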

### 9. Display Entities from Context Records

Display the first few entities from the context records to understand the structure of the retrieved context.

In [16]:
Markdown(r['context_records']['entities'][:4].to_markdown())

|    |   id | entity               | description                                                                                                                              |   number of relationships | in_context   |
|---:|-----:|:---------------------|:-----------------------------------------------------------------------------------------------------------------------------------------|--------------------------:|:-------------|
|  0 |   93 | THE KENSINGTON HOTEL | The Kensington Hotel is a budget accommodation option in London, near Earl’s Court                                                       |                         4 | True         |
|  1 |   99 | EARL'S COURT         | Earl’s Court is an area in London, near The Kensington Hotel                                                                             |                         1 | True         |
|  2 |  103 | EARL’S COURT         | Earl’s Court is a neighborhood in London, near The Kensington Hotel                                                                      |                         1 | True         |
|  3 |   91 | THE BUCKINGHAM HOTEL | The Buckingham Hotel is a comfortable hotel in London, close to major sights like Buckingham Palace, Regent’s Park, and Trafalgar Square |                         6 | True         |
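Rows 1 and 2 of the table describe the same place: entity 99 uses a straight apostrophe (`EARL'S COURT`) and entity 103 a typographic one (`EARL’S COURT`), so they were extracted as separate entities. The sketch below shows one way to spot such near-duplicates by normalizing the apostrophe before grouping. It is an illustration only, with hypothetical function names, and not something GraphRAG or the helper does for you.

```python
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Replace the typographic apostrophe with a straight one and unify case."""
    return name.replace("\u2019", "'").strip().upper()


def group_duplicates(entities: list[tuple[int, str]]) -> dict[str, list[int]]:
    """Map each normalized name shared by several entities to their ids."""
    groups: dict[str, list[int]] = defaultdict(list)
    for entity_id, name in entities:
        groups[normalize_name(name)].append(entity_id)
    return {name: ids for name, ids in groups.items() if len(ids) > 1}


# (id, entity) pairs copied from the table above.
entities = [
    (93, "THE KENSINGTON HOTEL"),
    (99, "EARL'S COURT"),
    (103, "EARL\u2019S COURT"),
    (91, "THE BUCKINGHAM HOTEL"),
]
duplicates = group_duplicates(entities)
print(duplicates)
```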

### 10.A Perform a Global Search

Execute a global search query to retrieve relevant information from a broader dataset that includes community-level insights.

> **Note:**
> The query below demonstrates the strength of GraphRAG: this kind of dataset-wide question is not answered reliably by traditional RAG with vector stores.

> **Note:**
> Notice that the topics mostly cover the Tesla Model S and not Margie's Travel. There are far fewer nodes for Margie's Travel, so they likely did not form a higher-level community (level >= 2).

In [17]:
await gr.global_search("what are the topics in this database?")

The most recent folder is: 20240723-143050
Most recent path:  sample_graphrag\output\20240723-143050\artifacts
The database contains several topics related to the Tesla Model S, each focusing on different aspects of the vehicle. Below is a summary of the key topics covered:

### Tesla Model S Community
This topic encompasses various aspects of the Tesla Model S, including its advanced technology, high safety standards, sustainability, performance, customization options, influence on the automotive industry, versatility, charging solutions, recognition, and continuous updates [Data: Reports (3)].

### Tesla Model S and Warranty
This topic provides details about the comprehensive warranty offered by Tesla for the Model S. It includes information on the 8-year or unlimited mile battery and drive unit warranty, as well as the 4-year or 50,000-mile limited warranty [Data: Reports (4)].

### Tesla Model S Energy-Saving Features
This topic discusses the energy-saving features of the Tesla Mod

{'response': 'The database contains several topics related to the Tesla Model S, each focusing on different aspects of the vehicle. Below is a summary of the key topics covered:\n\n### Tesla Model S Community\nThis topic encompasses various aspects of the Tesla Model S, including its advanced technology, high safety standards, sustainability, performance, customization options, influence on the automotive industry, versatility, charging solutions, recognition, and continuous updates [Data: Reports (3)].\n\n### Tesla Model S and Warranty\nThis topic provides details about the comprehensive warranty offered by Tesla for the Model S. It includes information on the 8-year or unlimited mile battery and drive unit warranty, as well as the 4-year or 50,000-mile limited warranty [Data: Reports (4)].\n\n### Tesla Model S Energy-Saving Features\nThis topic discusses the energy-saving features of the Tesla Model S, particularly focusing on the standby mode designed to minimize battery drain durin
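The global search cell uses a top-level `await`, which works in a notebook because the kernel already runs an event loop. In a plain Python script there is no such loop, so the coroutine has to be driven explicitly, for example with `asyncio.run`. The sketch below shows the pattern with a hypothetical stand-in coroutine, `fake_global_search`, in place of the real helper method.

```python
import asyncio


async def fake_global_search(query: str) -> dict:
    """Hypothetical stand-in for an awaitable search call."""
    await asyncio.sleep(0)  # placeholder for the real asynchronous work
    return {"response": f"stub answer for: {query}"}


def main() -> dict:
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop.
    return asyncio.run(fake_global_search("what are the topics in this database?"))


result = main()
print(result["response"])
```

Inside a notebook, keep using `await` directly, since `asyncio.run` raises an error when an event loop is already running.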

### 10.B Perform a Global Search

> **Note:**
> Hotels appear in the graph but did not make it into the global clusters, most likely because the topic is small and occurs infrequently.

In [18]:
## Hotels appear in the graph but did not make it into the global clusters, most likely because the topic is small and occurs infrequently.
resp = await gr.global_search("what are all topics about hotels stored in the graph?", community_level=2)
print(resp['response'])

I am sorry but I am unable to answer this question given the provided data.
I am sorry but I am unable to answer this question given the provided data.
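When a global search comes back with this fixed refusal, one option is to retry the same question with a local search, which did find the hotel entities earlier in this notebook. The sketch below shows that fallback pattern. It assumes the refusal text is exactly the sentence printed above, and it uses hypothetical stub functions in place of the real search calls.

```python
NO_ANSWER = "I am sorry but I am unable to answer this question given the provided data."


def is_unanswered(response: str) -> bool:
    """Return True when the response is the fixed refusal sentence."""
    return response.strip() == NO_ANSWER


def answer_with_fallback(query, primary, fallback):
    """Try the primary search first and use the fallback only on a refusal."""
    response = primary(query)
    if is_unanswered(response):
        return fallback(query)
    return response


# Hypothetical stubs standing in for the global and local search calls.
def stub_global(query: str) -> str:
    return NO_ANSWER


def stub_local(query: str) -> str:
    return "stub local answer about hotels"


final = answer_with_fallback("what are all topics about hotels?", stub_global, stub_local)
print(final)
```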


### 11. Retrieve Global Context for a Specific Query

Retrieve the context associated with a specific global search query without performing a full search.

In [19]:
await gr.global_search("what is Tesla Model S about?", context_only=True)

 'context_records': {'reports':   id                                     title  occurrence weight  \
  0  3                   Tesla Model S Community           1.000000   
  1  4                Tesla Model S and Warranty           0.944444   
  2  2      Tesla Model S Energy-Saving Features           0.111111   
  3  1  Tesla Model S and Bioweapon Defense Mode           0.111111   
  
                                               content  rank  
  0  # Tesla Model S Community\n\nThe community rev...   8.5  
  1  # Tesla Model S and Warranty\n\nThe community ...   7.5  
  2  # Tesla Model S Energy-Saving Features\n\nThe ...   7.5  
  3  # Tesla Model S and Bioweapon Defense Mode\n\n...   7.5  }}
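The `occurrence weight` column runs from 1.0 down to 0.111111. One reading consistent with these values is that each report's raw occurrence count is divided by the largest count, so the most frequent report gets 1.0. This is an assumption about how the column is computed, and the raw counts in the sketch below are hypothetical values chosen only because they reproduce the weights shown.

```python
def normalize_by_max(counts: dict[int, int]) -> dict[int, float]:
    """Divide every count by the largest count so the top entry becomes 1.0."""
    top = max(counts.values())
    return {key: value / top for key, value in counts.items()}


# Hypothetical raw counts keyed by report id; not taken from the real index.
counts = {3: 18, 4: 17, 2: 2, 1: 2}
weights = normalize_by_max(counts)
rounded = {key: round(value, 6) for key, value in weights.items()}
print(rounded)
```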