In [1]:
# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License.

In [2]:
import os

import pandas as pd
import tiktoken

from graphrag.query.indexer_adapters import read_indexer_entities, read_indexer_reports
from graphrag.query.llm.oai.chat_openai import ChatOpenAI
from graphrag.query.llm.oai.typing import OpenaiApiType
from graphrag.query.structured_search.global_search.community_context import (
    GlobalCommunityContext,
)
from graphrag.query.structured_search.global_search.search import GlobalSearch

## Global Search example

The global search method generates answers by searching over all AI-generated community reports in a map-reduce fashion. It is resource-intensive, but it often gives good responses to questions that require an understanding of the dataset as a whole (e.g. What are the most significant values of the herbs mentioned in this notebook?).
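The map-reduce flow can be pictured with a toy sketch. Everything in it is a stand-in assumed for illustration: `batch_reports`, `map_step` and `reduce_step` are hypothetical functions that replace the LLM calls with keyword counting, and word counts stand in for tokens. It is not GraphRAG's implementation.

```python
# Toy map-reduce over "community reports". The scoring functions below are
# hypothetical stand-ins for LLM calls, not GraphRAG code.

def batch_reports(reports, max_words):
    """Group reports into batches whose total word count stays under max_words."""
    batches, current, used = [], [], 0
    for report in reports:
        size = len(report.split())
        if current and used + size > max_words:
            batches.append(current)
            current, used = [], 0
        current.append(report)
        used += size
    if current:
        batches.append(current)
    return batches


def map_step(batch, query):
    """Map: produce (score, report) pairs for one batch (stand-in for an LLM call)."""
    terms = query.lower().split()
    points = []
    for report in batch:
        score = sum(report.lower().count(term) for term in terms)
        if score > 0:
            points.append((score, report))
    return points


def reduce_step(all_points, top_k=2):
    """Reduce: keep the highest-scoring points across all batches."""
    ranked = sorted(all_points, key=lambda p: p[0], reverse=True)
    return [text for _, text in ranked[:top_k]]


reports = [
    "Scrooge is visited by the ghost of Marley",
    "The Cratchit family celebrates Christmas",
    "A ghost shows Scrooge his past",
]
batches = batch_reports(reports, max_words=10)
points = [p for batch in batches for p in map_step(batch, "ghost scrooge")]
answer = reduce_step(points)
print(answer)
```

Each batch is scored independently in the map step, which is what allows the real method to issue many LLM calls in parallel before a single reduce call combines the intermediate results.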

In [3]:
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

True

### LLM setup

In [4]:
api_key = os.getenv("GRAPHRAG_API_KEY")
# llm_model = os.getenv("GRAPHRAG_LLM_MODEL")
llm_model = os.getenv("GRAPHRAG_LLM_DEPLOYMENT")
llm_deployment = os.getenv("GRAPHRAG_LLM_DEPLOYMENT")
# embedding_model = os.getenv("GRAPHRAG_EMBEDDING_MODEL")
embedding_model = os.getenv("GRAPHRAG_EMBEDDING_DEPLOYMENT")
embedding_deployment = os.getenv("GRAPHRAG_EMBEDDING_DEPLOYMENT")
api_base = os.getenv("GRAPHRAG_API_BASE")
api_version = os.getenv("GRAPHRAG_API_VERSION")

llm = ChatOpenAI(
    api_key=api_key,
    model=llm_model,
    api_type=OpenaiApiType.AzureOpenAI,  # OpenaiApiType.OpenAI or OpenaiApiType.AzureOpenAI
    max_retries=20,
    api_base=api_base,
    api_version=api_version
)

token_encoder = tiktoken.get_encoding("cl100k_base")
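Because `os.getenv` returns `None` for unset variables, a misconfigured environment may only surface later, when the first request is sent. A small check can fail fast instead. The sketch below uses a hypothetical helper named `missing_env_vars`; the variable names are the ones read in the cell above.

```python
import os

REQUIRED_VARS = [
    "GRAPHRAG_API_KEY",
    "GRAPHRAG_LLM_DEPLOYMENT",
    "GRAPHRAG_API_BASE",
    "GRAPHRAG_API_VERSION",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# Example with an explicit mapping so the check is easy to try out.
sample_env = {"GRAPHRAG_API_KEY": "placeholder", "GRAPHRAG_API_BASE": ""}
missing = missing_env_vars(REQUIRED_VARS, sample_env)
print(missing)
```

Calling `missing_env_vars(REQUIRED_VARS)` without a mapping checks the real process environment.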

### Load community reports as context for global search

- Load all community reports in the `create_final_community_reports` table from the indexing engine, to be used as context data for global search.
- Load entities from the `create_final_nodes` and `create_final_entities` tables from the indexing engine, to be used for calculating community weights for context ranking. This step is optional: if no entities are provided, community weights are not calculated and only the `rank` attribute in the community reports table is used for context ranking.
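To make the idea of a community weight concrete, here is a rough sketch. It assumes a weight is the number of distinct text units in which a community's entities occur, divided by the largest such count when normalized. The record layout and the `community_weights` helper are hypothetical; the actual calculation happens inside GraphRAG and may differ.

```python
def community_weights(entities, normalize=True):
    """Count distinct text units per community, optionally scaled by the max."""
    text_units = {}
    for entity in entities:
        for community in entity["communities"]:
            text_units.setdefault(community, set()).update(entity["text_unit_ids"])
    weights = {c: len(units) for c, units in text_units.items()}
    if normalize and weights:
        largest = max(weights.values())
        weights = {c: w / largest for c, w in weights.items()}
    return weights


# Hypothetical toy entities, not taken from the indexer output.
toy_entities = [
    {"name": "SCROOGE", "communities": ["38", "41"], "text_unit_ids": ["t1", "t2", "t3"]},
    {"name": "MARLEY", "communities": ["41"], "text_unit_ids": ["t3", "t4"]},
    {"name": "TINY TIM", "communities": ["29"], "text_unit_ids": ["t5"]},
]
weights = community_weights(toy_entities)
print(weights)
```

Under this assumption, a community whose entities appear in many text units ranks ahead of one mentioned only in passing, independently of its `rank` attribute.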

In [5]:
# parquet files generated from indexing pipeline
INPUT_DIR = "./../sample-output/output/20240812-215728/artifacts"
COMMUNITY_REPORT_TABLE = "create_final_community_reports"
ENTITY_TABLE = "create_final_nodes"
ENTITY_EMBEDDING_TABLE = "create_final_entities"

# community level in the Leiden community hierarchy from which we will load the community reports
# a higher value uses reports from more fine-grained communities (at a higher computation cost)
COMMUNITY_LEVEL = 2
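As a rough picture of what filtering by `COMMUNITY_LEVEL` does, the sketch below keeps only rows whose `level` is at or below the chosen value. This is an assumption about the behaviour, not a description of `read_indexer_reports`; `filter_by_level` and the toy rows are hypothetical.

```python
def filter_by_level(rows, community_level):
    """Keep rows whose hierarchy level does not exceed community_level."""
    return [row for row in rows if row["level"] <= community_level]


# Hypothetical toy rows; higher levels are more fine-grained communities.
toy_reports = [
    {"community": 0, "level": 0},
    {"community": 5, "level": 1},
    {"community": 37, "level": 2},
    {"community": 49, "level": 3},
]
kept = filter_by_level(toy_reports, community_level=2)
kept_ids = [row["community"] for row in kept]
print(kept_ids)
```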

In [6]:
entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
report_df = pd.read_parquet(f"{INPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")
entity_embedding_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_EMBEDDING_TABLE}.parquet")

reports = read_indexer_reports(report_df, entity_df, COMMUNITY_LEVEL)
entities = read_indexer_entities(entity_df, entity_embedding_df, COMMUNITY_LEVEL)
print(f"Total report count: {len(report_df)}")
print(
    f"Report count after filtering by community level {COMMUNITY_LEVEL}: {len(reports)}"
)
report_df.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  entity_df["community"] = entity_df["community"].fillna(-1)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  entity_df["community"] = entity_df["community"].astype(int)


Total report count: 52
Report count after filtering by community level 2: 37


|  | community | full_content | level | rank | title | rank_explanation | summary | findings | full_content_json | id |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 49 | # Scrooge and the City\n\nThe community revolv... | 3 | 6.5 | Scrooge and the City | The impact severity rating is moderately high ... | The community revolves around the City, which ... | [{'explanation': 'The City is the central enti... | {\n "title": "Scrooge and the City",\n "... | ca456f65-a353-4f64-8dac-1186aef8cb2e |
| 1 | 50 | # Scrooge and His Transformative Journey\n\nTh... | 3 | 8.5 | Scrooge and His Transformative Journey | The impact severity rating is high due to the ... | The community centers around Scrooge, a charac... | [{'explanation': 'Scrooge is initially depicte... | {\n "title": "Scrooge and His Transformativ... | c585e980-f081-4317-9b3b-616fbde49b73 |
| 2 | 51 | # Scrooge and the Ghostly Visitations\n\nThe c... | 3 | 8.5 | Scrooge and the Ghostly Visitations | The impact severity rating is high due to the ... | The community centers around Ebenezer Scrooge ... | [{'explanation': 'Ebenezer Scrooge undergoes a... | {\n "title": "Scrooge and the Ghostly Visit... | 7f2c56ec-6083-4b2d-b1c9-482fb052f951 |
| 3 | 37 | # Ghost of Christmas Present and Festive Decor... | 2 | 7.5 | Ghost of Christmas Present and Festive Decorat... | The impact severity rating is high due to the ... | The community centers around the Ghost of Chri... | [{'explanation': 'The Ghost of Christmas Prese... | {\n "title": "Ghost of Christmas Present an... | f15da27e-cad7-4bc2-b37b-e0a5fbd5bcf3 |
| 4 | 38 | # Scrooge and the Spirit\n\nThe community revo... | 2 | 8.5 | Scrooge and the Spirit | The impact severity rating is high due to the ... | The community revolves around Scrooge and the ... | [{'explanation': 'The Spirit is a central figu... | {\n "title": "Scrooge and the Spirit",\n ... | 9c3a5d61-6e73-4e0a-9df9-a383b376de3e |


#### Build global context based on community reports

In [7]:
context_builder = GlobalCommunityContext(
    community_reports=reports,
    entities=entities,  # set to None (the default) if you don't want to use community weights for ranking
    token_encoder=token_encoder,
)

#### Perform global search

In [8]:
context_builder_params = {
    "use_community_summary": False,  # False uses the full community reports; True uses the short community summaries.
    "shuffle_data": True,
    "include_community_rank": True,
    "min_community_rank": 0,
    "community_rank_name": "rank",
    "include_community_weight": True,
    "community_weight_name": "occurrence weight",
    "normalize_community_weight": True,
    "max_tokens": 12_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)
    "context_name": "Reports",
}

map_llm_params = {
    "max_tokens": 1000,
    "temperature": 0.0,
    "response_format": {"type": "json_object"},
}

reduce_llm_params = {
    "max_tokens": 2000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 1000-1500)
    "temperature": 0.0,
}
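The comments above tie these limits to the model's context window. One way to sanity-check a configuration is simple arithmetic: the context tokens, an allowance for the prompt template, and the response tokens should together fit within the window. The `fits_context_window` helper, the 500-token prompt allowance, and the window sizes are assumptions for illustration, not values from GraphRAG.

```python
def fits_context_window(context_tokens, response_tokens, window, prompt_overhead=500):
    """Return True if context + prompt overhead + response fit in the window."""
    return context_tokens + prompt_overhead + response_tokens <= window


# The map settings from this notebook against a hypothetical 16k-token window.
roomy = fits_context_window(12_000, 1000, window=16_000)
# The same settings against an 8k window, then the smaller suggested context size.
tight = fits_context_window(12_000, 1000, window=8_000)
reduced = fits_context_window(5_000, 1000, window=8_000)
print(roomy, tight, reduced)
```

The same check applies to the reduce step by substituting its `max_tokens` for the response size.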

In [9]:
search_engine = GlobalSearch(
    llm=llm,
    context_builder=context_builder,
    token_encoder=token_encoder,
    max_data_tokens=12_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)
    map_llm_params=map_llm_params,
    reduce_llm_params=reduce_llm_params,
    allow_general_knowledge=False,  # setting this to True adds an instruction that encourages the LLM to incorporate general knowledge in the response, which may increase hallucinations but could be useful in some use cases.
    json_mode=True,  # set this to False if your LLM does not support JSON mode.
    context_builder_params=context_builder_params,
    concurrent_coroutines=32,
    response_type="multiple paragraphs",  # free form text describing the response type and format, can be anything, e.g. prioritized list, single paragraph, multiple paragraphs, multiple-page report
)
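The name `concurrent_coroutines` suggests a cap on how many map calls are in flight at once. A common way to implement such a cap is an `asyncio.Semaphore`. The sketch below shows that general pattern with a hypothetical `fake_map_call` standing in for an LLM request; it is not taken from GraphRAG's source.

```python
import asyncio


async def run_bounded(batches, limit):
    """Run one fake map call per batch with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_map_call(batch):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for network latency
            in_flight -= 1
            return len(batch)

    # gather preserves input order, so results line up with batches.
    results = await asyncio.gather(*(fake_map_call(b) for b in batches))
    return results, peak


batches = [["r1", "r2"], ["r3"], ["r4", "r5", "r6"], ["r7"], ["r8"]]
results, peak = asyncio.run(run_bounded(batches, limit=2))
print(results, peak)
```

Inside a notebook, where an event loop is already running, call `await run_bounded(batches, limit=2)` instead of `asyncio.run(...)`. A lower limit trades throughput for fewer rate-limit errors from the API.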

## Run global search on sample queries

In [10]:
result = await search_engine.asearch("what is the data you have")
print(result.response)

### Overview of the Data

The dataset provides a comprehensive analysis of Charles Dickens' "A Christmas Carol," focusing on various aspects such as characters, themes, settings, and significant events. Below is a detailed summary of the key elements covered in the data:

#### Key Characters and Relationships

The data includes detailed descriptions of various communities and characters, emphasizing their relationships, interactions, and significant events. Key characters such as Ebenezer Scrooge, Tiny Tim, the Cratchit family, and supernatural entities like the Ghosts of Christmas Past, Present, and Yet to Come are central to the narrative [Data: Reports (47, 25, 46, 39, 19, 44, 33, 45, 35, 29, 4, 30, 42, 7, 11, 48, 17, 26)].

#### Supernatural Elements

A significant portion of the data is dedicated to the role of supernatural entities, particularly the Ghosts of Christmas. The report titled "Scrooge and the Spirit" discusses the transformative journey guided by these spirits, coveri
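Responses cite their supporting reports inline as `[Data: Reports (...)]`. If those IDs are needed programmatically, they can be pulled out with a regular expression. The `cited_report_ids` helper is hypothetical and assumes the citation format seen in the output above; the sample string is made up for the example.

```python
import re

# Matches "[Data: Reports (1, 2, 3)]" and captures the comma-separated IDs.
CITATION = re.compile(r"\[Data: Reports \(([^)]*)\)\]")


def cited_report_ids(response_text):
    """Return the unique report IDs cited in a response, in first-seen order."""
    seen = []
    for group in CITATION.findall(response_text):
        for token in group.split(","):
            token = token.strip()
            if token.isdigit() and int(token) not in seen:
                seen.append(int(token))
    return seen


sample = (
    "Marley's ghost warns Scrooge [Data: Reports (19, 41, 45)]. "
    "The spirits guide him [Data: Reports (38, 39, 41, 47)]."
)
ids = cited_report_ids(sample)
print(ids)
```

With a real result, the same call would be `cited_report_ids(result.response)`.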

In [11]:
result = await search_engine.asearch("who is the main character of this story?")
print(result.response)

### Main Character of the Story

The main character of "A Christmas Carol" is Ebenezer Scrooge. He is initially depicted as a miserly, solitary, and uncharitable old man, particularly during the Christmas season [Data: Reports (46, 45, 39, 19, 33, 41, 38)].

### Character Transformation

Scrooge undergoes a profound transformation throughout the story. This change is driven by supernatural encounters with the Ghosts of Christmas Past, Present, and Yet to Come, as well as his former business partner, Jacob Marley [Data: Reports (41)]. These encounters lead him to reflect on his life and ultimately transform from a miserly, solitary individual to a generous and kind-hearted person [Data: Reports (41)].

### Impact on Other Characters

Scrooge's transformation significantly impacts other characters in the story, such as Bob Cratchit and Tiny Tim, highlighting his central role in the narrative [Data: Reports (41)].

In summary, Ebenezer Scrooge is the main character whose journey from a co

In [12]:
result = await search_engine.asearch("How is the main character connected to other characters in the story?")
print(result.response)

### Connections of the Main Character to Other Characters

Ebenezer Scrooge, the central figure in the story, is intricately connected to various characters, each playing a significant role in his transformation from a miserly, solitary old man to a generous and kind-hearted individual.

#### Jacob Marley
Scrooge's deceased business partner, Jacob Marley, continues to influence Scrooge's life as a ghost. Marley's ghost warns Scrooge about his impending fate and the forthcoming visits of The Three Spirits, setting the stage for Scrooge's transformative journey [Data: Reports (19, 41, 45)].

#### The Three Spirits
The Ghosts of Christmas Past, Present, and Yet to Come are pivotal in driving Scrooge's transformation. Each spirit guides him through reflections on his past, present, and potential future, showing him various scenes that help him reflect on his life choices and understand the importance of compassion [Data: Reports (38, 39, 41, 47)].

#### Bob Cratchit and Family
Scrooge's re

In [13]:
result = await search_engine.asearch("Did someone become richer by someone becoming poorer?")
print(result.response)

### Analysis of Wealth Redistribution

In the dataset, there are clear instances where individuals became richer at the expense of others becoming poorer. This dynamic is particularly evident in the interactions surrounding the deceased man's belongings and the character of Scrooge.

#### Theft and Appraisal of the Deceased Man's Belongings

Mrs. Dilber, a laundress, along with the Charwoman and the Undertaker's Man, stole various items from a deceased man's room. These items included sheets, towels, wearing apparel, silver teaspoons, sugar-tongs, and boots. They brought these stolen goods to Joe, a marine-store dealer and receiver of stolen goods, for appraisal and sale. As a result, Joe and the individuals who stole the items became richer through these transactions, while the deceased man was effectively made poorer by the loss of his possessions [Data: Reports (29, 30)].

#### Scrooge's Exploitation of Bob Cratchit

Initially, Scrooge is depicted as a miserly and solitary old man w