## Setup
The `settings.yaml` file configures the library, including which model types are used; in this case I've configured it to use gpt-4o.

Please refer to the [CLI docs](https://microsoft.github.io/graphrag/cli/#init) for more detailed information on how to generate the `settings.yaml` file.

#### Load `settings.yaml` configuration

In [1]:
import yaml
from graphrag.config.create_graphrag_config import create_graphrag_config

# Read the settings with a context manager so the file handle is closed
with open("settings.yaml") as f:
    settings = yaml.safe_load(f)

# Build the GraphRAG config object from the settings we loaded

config = create_graphrag_config(
    values=settings, root_dir="."
)



## Workflow and Document Processing
We're pre-processing the documents and building or populating the graph that will back the LLM.

In [2]:

from graphrag.index.run.run_workflows import run_workflows
from graphrag.index.typing import PipelineRunResult
import graphrag.api as api



workflows = [
        "create_base_text_units",
        "create_final_documents",
        "extract_graph",
        "compute_communities",
        "create_final_entities",
        "create_final_relationships",
        "create_final_nodes",
        "create_final_communities",
        "create_final_text_units",
        "create_final_community_reports",
        "generate_text_embeddings",
    ]




## Create Base Text Units
We've defined the workflow steps; now let's pull them apart to understand a bit more about each one and its outputs.

In [3]:
from graphrag.index.input.factory import create_input
input = await create_input(config.input, None, ".")
input


Unnamed: 0,text,id,title
0,﻿The Project Gutenberg eBook of A Christmas Ca...,77fd5668fcbeb8d240a7816bf00854bd31af91a84d0318...,book.txt


In [4]:
input['text']

0    ﻿The Project Gutenberg eBook of A Christmas Ca...
Name: text, dtype: object

In [5]:
config.chunks

ChunkingConfig(size=1200, overlap=100, group_by_columns=['id'], strategy="tokens", encoding_model='cl100k_base')
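The config above asks for 1200-token chunks that overlap by 100 tokens. As a rough illustration of what such a sliding window does, here is a minimal sketch on a toy token list. The `chunk_tokens` helper is hypothetical and is not GraphRAG's own splitter, which tokenizes with the configured `encoding_model` first.

```python
def chunk_tokens(tokens, size, overlap):
    """Split a token list into windows of `size` tokens sharing `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Toy example: 10 "tokens", windows of 4 with 1 token of overlap
toy_tokens = list(range(10))
toy_chunks = chunk_tokens(toy_tokens, size=4, overlap=1)
print(toy_chunks)
```

With `size=1200` and `overlap=100`, the window would advance 1100 tokens at a time, so the last 100 tokens of one chunk reappear at the start of the next.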

In [6]:
# Override the workflow callbacks so the steps can run; they are used for progress tracking, which we don't need here
from typing import Protocol
import sys
from graphrag.logger.progress import Progress

class WorkflowCallbacks(Protocol):
    def progress(Progress) -> None:
        pass
    def error(
        self,
        message: str,
        cause: BaseException | None = None,
        stack: str | None = None,
        details: dict | None = None,
    ) -> None:
        print(message)
        sys.exit(1)
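The cell above defines callbacks that print the message and exit the kernel on the first error. As an alternative for experimentation, here is a minimal sketch of an instance-based object with the same two methods that records events so they can be inspected afterwards. The `RecordingCallbacks` name and its attributes are hypothetical and not part of GraphRAG; whether a given GraphRAG function accepts it depends on which callback methods that function calls.

```python
class RecordingCallbacks:
    """Hypothetical stand-in: collects events instead of printing or exiting."""

    def __init__(self):
        self.progress_events = []
        self.errors = []

    def progress(self, progress):
        # Keep every progress update for later inspection
        self.progress_events.append(progress)

    def error(self, message, cause=None, stack=None, details=None):
        # Record the error instead of calling sys.exit
        self.errors.append((message, cause))


recorder = RecordingCallbacks()
recorder.progress({"completed_items": 1, "total_items": 3})
recorder.error("chunking failed", ValueError("bad input"))
print(len(recorder.progress_events), recorder.errors[0][0])
```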

In [7]:
from graphrag.index.flows.create_base_text_units import create_base_text_units


chunked_text = create_base_text_units(input,WorkflowCallbacks,
        config.chunks.group_by_columns,
        config.chunks.size,
        config.chunks.overlap,
        config.chunks.encoding_model,
        strategy=config.chunks.strategy,)

In [8]:
chunked_text.head()

Unnamed: 0,id,text,document_ids,n_tokens
0,336671e337e5f4539069473e8f8691b3ed696331aabe67...,﻿The Project Gutenberg eBook of A Christmas Ca...,[77fd5668fcbeb8d240a7816bf00854bd31af91a84d031...,1200
1,2160a0c64179a7920c578f3400ad64f77c22927e6ab8c7...,and thither in\n restless haste and moanin...,[77fd5668fcbeb8d240a7816bf00854bd31af91a84d031...,1200
2,d798befe565a9ed5b6b536fd8a95a1d396867b232ec308...,"-fisted hand at the grindstone, Scrooge! a\nsq...",[77fd5668fcbeb8d240a7816bf00854bd31af91a84d031...,1200
3,cc6a8a52ea673776c03f32442c2a05f75b59d30a0bf4c0...,'Bah!' again; and followed it up with 'Humbug!...,[77fd5668fcbeb8d240a7816bf00854bd31af91a84d031...,1200
4,1c129c3dd67b1761adbdb4186b2de1036b2e4ff3683e4d...,have no doubt his liberality is well represen...,[77fd5668fcbeb8d240a7816bf00854bd31af91a84d031...,1200


## Create Final Documents

In [9]:
from graphrag.index.flows.create_final_documents import create_final_documents
final_docs = create_final_documents(input,chunked_text, config.input.document_attribute_columns)

In [10]:
final_docs.head()

Unnamed: 0,id,human_readable_id,title,text,text_unit_ids
0,77fd5668fcbeb8d240a7816bf00854bd31af91a84d0318...,1,book.txt,﻿The Project Gutenberg eBook of A Christmas Ca...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...


## Extract Graph

In [11]:
from graphrag.index.operations.extract_entities import extract_entities


extraction_strategy = config.entity_extraction.resolved_strategy(
        config.root_dir, config.encoding_model
    )

extraction_num_threads = config.entity_extraction.parallelization.num_threads
extraction_async_mode = config.entity_extraction.async_mode
entity_types = config.entity_extraction.entity_types

summarization_strategy = config.summarize_descriptions.resolved_strategy(
        config.root_dir,
    )
summarization_num_threads = (
        config.summarize_descriptions.parallelization.num_threads
    )

In [12]:
entities, relationships = await extract_entities(
        chunked_text,
        WorkflowCallbacks,
        None,
        text_column="text",
        id_column="id",
        strategy=extraction_strategy,
        async_mode=extraction_async_mode,
        entity_types=entity_types,
        num_threads=extraction_num_threads,
    )

In [13]:
entities.head()

Unnamed: 0,title,type,description,text_unit_ids
0,CHARLES DICKENS,PERSON,[Charles Dickens is the author of 'A Christmas...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
1,ARTHUR RACKHAM,PERSON,[Arthur Rackham is the illustrator of 'A Chris...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
2,J. B. LIPPINCOTT COMPANY,ORGANIZATION,[J. B. Lippincott Company is the publisher of ...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
3,UNITED STATES,GEO,[The United States is a country where the Proj...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
4,A CHRISTMAS CAROL,EVENT,[A Christmas Carol is a book written by Charle...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...


In [14]:
relationships.head()

Unnamed: 0,source,target,description,text_unit_ids,weight
0,CHARLES DICKENS,A CHRISTMAS CAROL,[Charles Dickens is the author of 'A Christmas...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,10.0
1,ARTHUR RACKHAM,A CHRISTMAS CAROL,[Arthur Rackham is the illustrator of 'A Chris...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,8.0
2,J. B. LIPPINCOTT COMPANY,A CHRISTMAS CAROL,[J. B. Lippincott Company published 'A Christm...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,7.0
3,A CHRISTMAS CAROL,PROJECT GUTENBERG,[Project Gutenberg provides a free eBook versi...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,1.0
4,EBENEZER SCROOGE,BOB CRATCHIT,[Bob Cratchit works as a clerk for Ebenezer Sc...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,8.0


Next we generate summaries for the entities and relationships, condensing the list of descriptions in each dataframe's `description` field into a single string.

In [15]:
from graphrag.index.operations.summarize_descriptions import summarize_descriptions

entity_summaries, relationship_summaries = await summarize_descriptions(
        entities,
        relationships,
        WorkflowCallbacks,
        None,
        strategy=summarization_strategy,
        num_threads=summarization_num_threads,
    )
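To make the idea concrete, here is a minimal sketch of condensing a list of descriptions into one string. It assumes that an entity with a single description keeps it unchanged and that only longer lists need a model call. The `summarize` and `join_stub` helpers are hypothetical, and `join_stub` is a plain string join standing in for the LLM.

```python
def summarize(descriptions, llm_summarize):
    """Condense a list of descriptions into one string."""
    unique = sorted(set(descriptions))  # assumption: drop exact duplicates first
    if not unique:
        return ""
    if len(unique) == 1:
        return unique[0]  # nothing to condense
    return llm_summarize(unique)


def join_stub(descriptions):
    # Stand-in for the LLM: concatenates instead of summarizing
    return " ".join(descriptions)


single = summarize(["Arthur Rackham is the illustrator of 'A Christmas Carol'"], join_stub)
several = summarize(["b", "a", "a"], join_stub)
print(single)
print(several)
```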

In [16]:
entities.head().iloc[1]['description']

["Arthur Rackham is the illustrator of 'A Christmas Carol'"]

In [17]:
entity_summaries.head().iloc[1]['description']

"Arthur Rackham is the illustrator of 'A Christmas Carol'"

In [19]:
import pandas as pd
from uuid import uuid4

def _prep_nodes(entities, summaries) -> pd.DataFrame:
    # Non-mutating drop(): the caller's `entities` frame keeps its description column,
    # so this cell can be re-run safely
    nodes = entities.drop(columns=["description"]).merge(summaries, on="title", how="left").drop_duplicates(
        subset="title"
    )
    nodes = nodes.loc[nodes["title"].notna()].reset_index()
    nodes["human_readable_id"] = nodes.index
    nodes["id"] = nodes["human_readable_id"].apply(lambda _x: str(uuid4()))
    return nodes

base_entity_nodes = _prep_nodes(entities, entity_summaries)

In [20]:
base_entity_nodes

Unnamed: 0,index,title,type,text_unit_ids,description,human_readable_id,id
0,0,CHARLES DICKENS,PERSON,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,Charles Dickens is the author of 'A Christmas ...,0,d97af007-fc4c-4f1b-bdfc-fd2484f4db4e
1,1,ARTHUR RACKHAM,PERSON,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,Arthur Rackham is the illustrator of 'A Christ...,1,4c9b10b3-f17f-4947-b5bb-93120b606bcf
2,2,J. B. LIPPINCOTT COMPANY,ORGANIZATION,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,J. B. Lippincott Company is the publisher of '...,2,e59f14f9-c182-4cc9-b8ac-682f3588f135
3,3,UNITED STATES,GEO,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,The United States is a country where the Proje...,3,833ab652-b89e-4187-91fd-1ea4e9061ac8
4,5,A CHRISTMAS CAROL,EVENT,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,A Christmas Carol is a book written by Charles...,4,0dba2f5c-55bc-4024-8526-cd9b3c9f8487
...,...,...,...,...,...,...,...
61,87,PLUMP SISTER,PERSON,[990810cc42806e862ab06ba66acd3baea96a6fa5071b4...,The plump sister is another guest at Fred's di...,61,28db29e0-54c8-4c5c-947b-786a62125449
62,90,PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION,ORGANIZATION,[f579cd1eb2da1f2c52f7dbb9fce464032734b71c34686...,The Project Gutenberg Literary Archive Foundat...,62,efc47627-6e66-43c9-9a3f-83e35b1d1d9f
63,93,MICHAEL S. HART,PERSON,[aa68b6c1603744d1b2567f6300c9bcdf7a2c20d8ed31b...,Professor Michael S. Hart was the originator o...,63,b8abd2c2-5737-4e42-9cab-39f2f325ceea
64,94,SALT LAKE CITY,GEO,[aa68b6c1603744d1b2567f6300c9bcdf7a2c20d8ed31b...,Location of the business office of the Project...,64,ac416885-683b-4503-8271-deb7985bf8c4


In [21]:
def _prep_edges(relationships, summaries) -> pd.DataFrame:
    edges = (
        relationships.drop(columns=["description"])
        .drop_duplicates(subset=["source", "target"])
        .merge(summaries, on=["source", "target"], how="left")
    )
    edges["human_readable_id"] = edges.index
    edges["id"] = edges["human_readable_id"].apply(lambda _x: str(uuid4()))
    return edges

base_relationship_edges = _prep_edges(relationships, relationship_summaries)


In [22]:
base_relationship_edges

Unnamed: 0,source,target,text_unit_ids,weight,description,human_readable_id,id
0,CHARLES DICKENS,A CHRISTMAS CAROL,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,10.0,Charles Dickens is the author of 'A Christmas ...,0,96623900-0fab-433c-a2f0-f31633e9815e
1,ARTHUR RACKHAM,A CHRISTMAS CAROL,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,8.0,Arthur Rackham is the illustrator of 'A Christ...,1,5ea79b5a-e72d-437a-9e6e-0e9cb5862823
2,J. B. LIPPINCOTT COMPANY,A CHRISTMAS CAROL,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,7.0,J. B. Lippincott Company published 'A Christma...,2,888c2092-07d5-4879-808a-c49f18d27b97
3,A CHRISTMAS CAROL,PROJECT GUTENBERG,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,1.0,Project Gutenberg provides a free eBook versio...,3,38771a5f-fc37-439f-80db-3d89555bb98b
4,EBENEZER SCROOGE,BOB CRATCHIT,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,8.0,Bob Cratchit works as a clerk for Ebenezer Scr...,4,e9edc25b-d7b7-4198-8cbc-d8710275205f
...,...,...,...,...,...,...,...
83,PROJECT GUTENBERG,PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION,[aa68b6c1603744d1b2567f6300c9bcdf7a2c20d8ed31b...,9.0,The Project Gutenberg Literary Archive Foundat...,83,dd647c71-0fb9-4495-b5f4-a1ab198699ea
84,PROJECT GUTENBERG,MICHAEL S. HART,[aa68b6c1603744d1b2567f6300c9bcdf7a2c20d8ed31b...,9.0,Michael S. Hart was the originator of the Proj...,84,04566bb3-7e06-4000-b518-d54f062eb9e4
85,PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION,SALT LAKE CITY,[aa68b6c1603744d1b2567f6300c9bcdf7a2c20d8ed31b...,1.0,The business office of the Project Gutenberg L...,85,f3c1b121-8107-4f84-ac5c-04fbbf58d8cf
86,PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION,MISSISSIPPI,[aa68b6c1603744d1b2567f6300c9bcdf7a2c20d8ed31b...,7.0,The Project Gutenberg Literary Archive Foundat...,86,6bb2d004-4dc9-49b1-9e3f-9dcf4ed4046c


### Graph Extraction Details

In [23]:
from graphrag.index.operations.extract_entities.typing import Document
doc1 = chunked_text.iloc[0][['id','text']]
doc1 = Document(
    id=doc1['id'],
    text=doc1['text'])

In [24]:
print(doc1.text[:400])

The Project Gutenberg eBook of A Christmas Carol
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United Stat


In [25]:
from graphrag.index.operations.extract_entities.graph_extractor import GraphExtractor
from graphrag.index.llm.load_llm import load_llm, read_llm_params
import graphrag.config.defaults as defs

llm_config = read_llm_params(extraction_strategy.get("llm", {}))
llm = load_llm("entity_extraction", llm_config, callbacks=None, cache=None)

tuple_delimiter = extraction_strategy.get("tuple_delimiter", None)
record_delimiter = extraction_strategy.get("record_delimiter", None)
completion_delimiter = extraction_strategy.get("completion_delimiter", None)
extraction_prompt = extraction_strategy.get("extraction_prompt", None)
encoding_model = extraction_strategy.get("encoding_name", None)
max_gleanings = extraction_strategy.get("max_gleanings", defs.ENTITY_EXTRACTION_MAX_GLEANINGS)

extractor = GraphExtractor(
        llm_invoker=llm,
        prompt=extraction_prompt,
        encoding_model=encoding_model,
        max_gleanings=max_gleanings,
        # `callbacks` is never defined in this notebook, so report extraction errors directly
        on_error=lambda e, s, d: print("Entity Extraction Error", e, s, d),
    )

In [26]:
results = await extractor(
        [doc1],
        {
            "entity_types": entity_types,
            "tuple_delimiter": tuple_delimiter,
            "record_delimiter": record_delimiter,
            "completion_delimiter": completion_delimiter,
        },
    )

In [27]:
results.output.nodes

NodeView(('ARTHUR RACKHAM', 'J. B. LIPPINCOTT COMPANY', 'UNITED STATES', 'PROJECT GUTENBERG', 'A CHRISTMAS CAROL', 'CHARLES DICKENS', 'EBENEZER SCROOGE', 'GHOST OF JACOB MARLEY', 'GHOST OF CHRISTMAS PAST', 'GHOST OF CHRISTMAS PRESENT', 'GHOST OF CHRISTMAS YET TO COME', 'BOB CRATCHIT'))

In [28]:
results.output.edges

EdgeView([('ARTHUR RACKHAM', 'A CHRISTMAS CAROL'), ('J. B. LIPPINCOTT COMPANY', 'A CHRISTMAS CAROL'), ('UNITED STATES', 'PROJECT GUTENBERG'), ('PROJECT GUTENBERG', 'A CHRISTMAS CAROL'), ('A CHRISTMAS CAROL', 'CHARLES DICKENS'), ('EBENEZER SCROOGE', 'BOB CRATCHIT'), ('EBENEZER SCROOGE', 'GHOST OF JACOB MARLEY'), ('EBENEZER SCROOGE', 'GHOST OF CHRISTMAS PAST'), ('EBENEZER SCROOGE', 'GHOST OF CHRISTMAS PRESENT'), ('EBENEZER SCROOGE', 'GHOST OF CHRISTMAS YET TO COME')])
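The extractor asks the model for delimited records and then parses them into the nodes and edges shown above. Here is a minimal sketch of that parsing step. The delimiter values are assumed to match GraphRAG's defaults (in this notebook they come from `extraction_strategy`), the sample response is made up, and `parse_extraction` is a hypothetical helper rather than the library's parser.

```python
import re

TUPLE_DELIMITER = "<|>"               # assumed default
RECORD_DELIMITER = "##"               # assumed default
COMPLETION_DELIMITER = "<|COMPLETE|>"  # assumed default


def parse_extraction(raw):
    """Turn a delimited LLM response into node and edge dictionaries."""
    nodes, edges = {}, {}
    body = raw.replace(COMPLETION_DELIMITER, "")
    for record in body.split(RECORD_DELIMITER):
        # Strip the surrounding parentheses of each record
        record = re.sub(r"^\(|\)$", "", record.strip())
        fields = [f.strip().strip('"') for f in record.split(TUPLE_DELIMITER)]
        if fields[0] == "entity" and len(fields) >= 4:
            nodes[fields[1].upper()] = {
                "type": fields[2].upper(),
                "description": fields[3],
            }
        elif fields[0] == "relationship" and len(fields) >= 5:
            key = (fields[1].upper(), fields[2].upper())
            edges[key] = {"description": fields[3], "weight": float(fields[4])}
    return nodes, edges


# Made-up response in the assumed record format
sample = (
    '("entity"<|>Charles Dickens<|>person<|>Author of the book)##'
    '("entity"<|>A Christmas Carol<|>event<|>A book)##'
    '("relationship"<|>Charles Dickens<|>A Christmas Carol<|>Dickens wrote it<|>10)'
    "<|COMPLETE|>"
)
sample_nodes, sample_edges = parse_extraction(sample)
print(sorted(sample_nodes))
print(sample_edges)
```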

## Compute Communities
Here `graspologic_native` calculates communities, or groupings, within the network.

In [29]:
from graphrag.index.operations.cluster_graph import cluster_graph
from graphrag.index.operations.create_graph import create_graph
import pandas as pd

max_cluster_size = config.cluster_graph.max_cluster_size
use_lcc = config.cluster_graph.use_lcc
seed = config.cluster_graph.seed

def compute_communities(
    base_relationship_edges: pd.DataFrame,
    max_cluster_size: int,
    use_lcc: bool,
    seed: int | None = None,
) -> pd.DataFrame:
    """All the steps to create the base entity graph."""
    graph = create_graph(base_relationship_edges)

    communities = cluster_graph(
        graph,
        max_cluster_size,
        use_lcc,
        seed=seed,
    )

    base_communities = pd.DataFrame(
        communities, columns=pd.Index(["level", "community", "parent", "title"])
    ).explode("title")
    base_communities["community"] = base_communities["community"].astype(int)

    return base_communities

base_communities = compute_communities(
    base_relationship_edges,
    max_cluster_size=max_cluster_size,
    use_lcc=use_lcc,
    seed=seed,
)



In [30]:
base_communities

Unnamed: 0,level,community,parent,title
0,0,0,-1,BOB CRATCHIT
0,0,0,-1,CAMDEN TOWN
0,0,0,-1,COUNTING-HOUSE
0,0,0,-1,MR. SCROOGE'S NEPHEW
0,0,0,-1,MRS. CRATCHIT
...,...,...,...,...
8,1,8,4,MARLEY
8,1,8,4,MARLEY'S GHOST
8,1,8,4,SCROOGE AND MARLEY
9,1,9,4,SCROOGE'S FORMER FIANCÉE


## Create Final Entities
This step only selects and reorders columns from `base_entity_nodes`.

In [31]:
base_entity_nodes

Unnamed: 0,index,title,type,text_unit_ids,description,human_readable_id,id
0,0,CHARLES DICKENS,PERSON,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,Charles Dickens is the author of 'A Christmas ...,0,d97af007-fc4c-4f1b-bdfc-fd2484f4db4e
1,1,ARTHUR RACKHAM,PERSON,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,Arthur Rackham is the illustrator of 'A Christ...,1,4c9b10b3-f17f-4947-b5bb-93120b606bcf
2,2,J. B. LIPPINCOTT COMPANY,ORGANIZATION,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,J. B. Lippincott Company is the publisher of '...,2,e59f14f9-c182-4cc9-b8ac-682f3588f135
3,3,UNITED STATES,GEO,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,The United States is a country where the Proje...,3,833ab652-b89e-4187-91fd-1ea4e9061ac8
4,5,A CHRISTMAS CAROL,EVENT,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,A Christmas Carol is a book written by Charles...,4,0dba2f5c-55bc-4024-8526-cd9b3c9f8487
...,...,...,...,...,...,...,...
61,87,PLUMP SISTER,PERSON,[990810cc42806e862ab06ba66acd3baea96a6fa5071b4...,The plump sister is another guest at Fred's di...,61,28db29e0-54c8-4c5c-947b-786a62125449
62,90,PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION,ORGANIZATION,[f579cd1eb2da1f2c52f7dbb9fce464032734b71c34686...,The Project Gutenberg Literary Archive Foundat...,62,efc47627-6e66-43c9-9a3f-83e35b1d1d9f
63,93,MICHAEL S. HART,PERSON,[aa68b6c1603744d1b2567f6300c9bcdf7a2c20d8ed31b...,Professor Michael S. Hart was the originator o...,63,b8abd2c2-5737-4e42-9cab-39f2f325ceea
64,94,SALT LAKE CITY,GEO,[aa68b6c1603744d1b2567f6300c9bcdf7a2c20d8ed31b...,Location of the business office of the Project...,64,ac416885-683b-4503-8271-deb7985bf8c4


In [32]:
def create_final_entities(
    base_entity_nodes: pd.DataFrame,
) -> pd.DataFrame:
    """All the steps to transform final entities."""
    return base_entity_nodes.loc[
        :,
        [
            "id",
            "human_readable_id",
            "title",
            "type",
            "description",
            "text_unit_ids",
        ],
    ]

final_entities = create_final_entities(base_entity_nodes)

In [33]:
#reordered the columns and dropped the index
final_entities.head()

Unnamed: 0,id,human_readable_id,title,type,description,text_unit_ids
0,d97af007-fc4c-4f1b-bdfc-fd2484f4db4e,0,CHARLES DICKENS,PERSON,Charles Dickens is the author of 'A Christmas ...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
1,4c9b10b3-f17f-4947-b5bb-93120b606bcf,1,ARTHUR RACKHAM,PERSON,Arthur Rackham is the illustrator of 'A Christ...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
2,e59f14f9-c182-4cc9-b8ac-682f3588f135,2,J. B. LIPPINCOTT COMPANY,ORGANIZATION,J. B. Lippincott Company is the publisher of '...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
3,833ab652-b89e-4187-91fd-1ea4e9061ac8,3,UNITED STATES,GEO,The United States is a country where the Proje...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
4,0dba2f5c-55bc-4024-8526-cd9b3c9f8487,4,A CHRISTMAS CAROL,EVENT,A Christmas Carol is a book written by Charles...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...


## Create Final Relationships

In [34]:
base_relationship_edges

Unnamed: 0,source,target,text_unit_ids,weight,description,human_readable_id,id
0,CHARLES DICKENS,A CHRISTMAS CAROL,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,10.0,Charles Dickens is the author of 'A Christmas ...,0,96623900-0fab-433c-a2f0-f31633e9815e
1,ARTHUR RACKHAM,A CHRISTMAS CAROL,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,8.0,Arthur Rackham is the illustrator of 'A Christ...,1,5ea79b5a-e72d-437a-9e6e-0e9cb5862823
2,J. B. LIPPINCOTT COMPANY,A CHRISTMAS CAROL,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,7.0,J. B. Lippincott Company published 'A Christma...,2,888c2092-07d5-4879-808a-c49f18d27b97
3,A CHRISTMAS CAROL,PROJECT GUTENBERG,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,1.0,Project Gutenberg provides a free eBook versio...,3,38771a5f-fc37-439f-80db-3d89555bb98b
4,EBENEZER SCROOGE,BOB CRATCHIT,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,8.0,Bob Cratchit works as a clerk for Ebenezer Scr...,4,e9edc25b-d7b7-4198-8cbc-d8710275205f
...,...,...,...,...,...,...,...
83,PROJECT GUTENBERG,PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION,[aa68b6c1603744d1b2567f6300c9bcdf7a2c20d8ed31b...,9.0,The Project Gutenberg Literary Archive Foundat...,83,dd647c71-0fb9-4495-b5f4-a1ab198699ea
84,PROJECT GUTENBERG,MICHAEL S. HART,[aa68b6c1603744d1b2567f6300c9bcdf7a2c20d8ed31b...,9.0,Michael S. Hart was the originator of the Proj...,84,04566bb3-7e06-4000-b518-d54f062eb9e4
85,PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION,SALT LAKE CITY,[aa68b6c1603744d1b2567f6300c9bcdf7a2c20d8ed31b...,1.0,The business office of the Project Gutenberg L...,85,f3c1b121-8107-4f84-ac5c-04fbbf58d8cf
86,PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION,MISSISSIPPI,[aa68b6c1603744d1b2567f6300c9bcdf7a2c20d8ed31b...,7.0,The Project Gutenberg Literary Archive Foundat...,86,6bb2d004-4dc9-49b1-9e3f-9dcf4ed4046c


In [35]:
from graphrag.index.operations.compute_degree import compute_degree
from graphrag.index.operations.compute_edge_combined_degree import compute_edge_combined_degree

def create_final_relationships(
    base_relationship_edges: pd.DataFrame,
) -> pd.DataFrame:
    """All the steps to transform final relationships."""
    relationships = base_relationship_edges

    graph = create_graph(base_relationship_edges)
    degrees = compute_degree(graph)

    relationships["combined_degree"] = compute_edge_combined_degree(
        relationships,
        degrees,
        node_name_column="title",
        node_degree_column="degree",
        edge_source_column="source",
        edge_target_column="target",
    )

    return relationships.loc[
        :,
        [
            "id",
            "human_readable_id",
            "source",
            "target",
            "description",
            "weight",
            "combined_degree",
            "text_unit_ids",
        ],
    ]

We calculate degrees, where the degree of a node is the number of edges adjacent to it:
[NetworkX `Graph.degree`](https://networkx.org/documentation/stable/reference/classes/generated/networkx.Graph.degree.html)
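As a rough illustration of the definition, here is a minimal sketch that counts degrees from a plain edge list without NetworkX. It assumes each undirected edge appears only once in the list. The `degree_table` helper is hypothetical, and the toy edges are three of the relationships extracted earlier.

```python
from collections import Counter


def degree_table(edges):
    """Count how many edges touch each node in an undirected edge list."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return dict(counts)


toy_edges = [
    ("ARTHUR RACKHAM", "A CHRISTMAS CAROL"),
    ("J. B. LIPPINCOTT COMPANY", "A CHRISTMAS CAROL"),
    ("CHARLES DICKENS", "A CHRISTMAS CAROL"),
]
toy_degrees = degree_table(toy_edges)
print(toy_degrees)
```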

In [36]:
graph = create_graph(base_relationship_edges)
degrees = compute_degree(graph)

In [37]:
graph.degree

DegreeView({'CHARLES DICKENS': 1, 'A CHRISTMAS CAROL': 4, 'ARTHUR RACKHAM': 1, 'J. B. LIPPINCOTT COMPANY': 1, 'PROJECT GUTENBERG': 4, 'EBENEZER SCROOGE': 3, 'BOB CRATCHIT': 9, 'FRED': 4, 'GHOST OF JACOB MARLEY': 1, 'FEZZIWIG': 1, 'MRS. FEZZIWIG': 1, 'SCROOGE': 38, 'MARLEY': 3, "MARLEY'S GHOST": 2, "SCROOGE'S NEPHEW": 4, 'CHRISTMAS EVE': 1, 'COUNTING-HOUSE': 2, 'CLERK': 1, 'SCROOGE AND MARLEY': 2, 'CHRISTMAS': 4, 'LORD MAYOR': 2, 'MANSION HOUSE': 2, 'ST. DUNSTAN': 1, 'UNION WORKHOUSES': 1, 'TREADMILL': 1, 'POOR LAW': 1, 'LONDON': 1, 'CITY OF LONDON': 1, 'CAMDEN TOWN': 2, 'JACOB MARLEY': 2, 'GHOST': 3, "SCROOGE'S FORMER FIANCÉE": 2, "SCROOGE'S FORMER FIANCÉE'S FAMILY": 2, 'BELLE': 1, "SCROOGE'S NIECE": 3, 'TOPPER': 3, 'GHOST OF CHRISTMAS PAST': 1, 'GHOST OF CHRISTMAS PRESENT': 1, 'SPIRIT': 4, 'TWELFTH-NIGHT PARTY': 2, 'GHOST OF CHRISTMAS YET TO COME': 1, 'CITY': 1, 'OLD JOE': 2, 'MRS. DILBER': 1, 'THE WOMAN': 2, 'PHANTOM': 1, 'THE UNHAPPY MAN': 2, 'CAROLINE': 3, "CAROLINE'S HUSBAND": 2, 

In [38]:
degrees.sort_values(by="degree", ascending=False)

Unnamed: 0,title,degree
11,SCROOGE,38
6,BOB CRATCHIT,9
58,PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION,4
38,SPIRIT,4
14,SCROOGE'S NEPHEW,4
...,...,...
27,CITY OF LONDON,1
33,BELLE,1
36,GHOST OF CHRISTMAS PAST,1
37,GHOST OF CHRISTMAS PRESENT,1


In [39]:
degrees

Unnamed: 0,title,degree
0,CHARLES DICKENS,1
1,A CHRISTMAS CAROL,4
2,ARTHUR RACKHAM,1
3,J. B. LIPPINCOTT COMPANY,1
4,PROJECT GUTENBERG,4
...,...,...
58,PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION,4
59,UNITED STATES,2
60,MICHAEL S. HART,1
61,SALT LAKE CITY,1


In [40]:
# The degrees are joined back onto the edges: combined_degree = degree(source) + degree(target)
final_relationships = create_final_relationships(base_relationship_edges)

In [41]:
degrees[degrees['title']=='CHARLES DICKENS']

Unnamed: 0,title,degree
0,CHARLES DICKENS,1


In [42]:
degrees[degrees['title']=='A CHRISTMAS CAROL']

Unnamed: 0,title,degree
1,A CHRISTMAS CAROL,4


In [43]:
final_relationships.head()

Unnamed: 0,id,human_readable_id,source,target,description,weight,combined_degree,text_unit_ids
0,96623900-0fab-433c-a2f0-f31633e9815e,0,CHARLES DICKENS,A CHRISTMAS CAROL,Charles Dickens is the author of 'A Christmas ...,10.0,5,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
1,5ea79b5a-e72d-437a-9e6e-0e9cb5862823,1,ARTHUR RACKHAM,A CHRISTMAS CAROL,Arthur Rackham is the illustrator of 'A Christ...,8.0,5,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
2,888c2092-07d5-4879-808a-c49f18d27b97,2,J. B. LIPPINCOTT COMPANY,A CHRISTMAS CAROL,J. B. Lippincott Company published 'A Christma...,7.0,5,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
3,38771a5f-fc37-439f-80db-3d89555bb98b,3,A CHRISTMAS CAROL,PROJECT GUTENBERG,Project Gutenberg provides a free eBook versio...,1.0,8,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
4,e9edc25b-d7b7-4198-8cbc-d8710275205f,4,EBENEZER SCROOGE,BOB CRATCHIT,Bob Cratchit works as a clerk for Ebenezer Scr...,8.0,12,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
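The `combined_degree` values can be checked by hand against the degrees printed earlier: each one is the source degree plus the target degree. Here is a minimal sketch of that arithmetic using degrees copied from the notebook output. The `combined_degree` helper is hypothetical, not the library function.

```python
def combined_degree(edges, degrees):
    """Sum the degrees of both endpoints for every (source, target) edge."""
    return [degrees.get(source, 0) + degrees.get(target, 0) for source, target in edges]


# Degrees copied from the DegreeView output above
notebook_degrees = {
    "CHARLES DICKENS": 1,
    "A CHRISTMAS CAROL": 4,
    "PROJECT GUTENBERG": 4,
    "EBENEZER SCROOGE": 3,
    "BOB CRATCHIT": 9,
}
sample_edges = [
    ("CHARLES DICKENS", "A CHRISTMAS CAROL"),
    ("A CHRISTMAS CAROL", "PROJECT GUTENBERG"),
    ("EBENEZER SCROOGE", "BOB CRATCHIT"),
]
combined = combined_degree(sample_edges, notebook_degrees)
print(combined)
```

The three results match rows 0, 3, and 4 of `final_relationships`.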


## Create Final Nodes

In [44]:
from graphrag.config.models.embed_graph_config import EmbedGraphConfig
from graphrag.index.operations.layout_graph.layout_graph import layout_graph
from graphrag.index.operations.embed_graph.embed_graph import embed_graph


def create_final_nodes(
    base_entity_nodes: pd.DataFrame,
    base_relationship_edges: pd.DataFrame,
    base_communities: pd.DataFrame,
    callbacks: WorkflowCallbacks,
    embed_config: EmbedGraphConfig,
    layout_enabled: bool,
) -> pd.DataFrame:
    """All the steps to transform final nodes."""
    graph = create_graph(base_relationship_edges)
    graph_embeddings = None
    if embed_config.enabled:
        graph_embeddings = embed_graph(
            graph,
            embed_config,
        )
    layout = layout_graph(
        graph,
        callbacks,
        layout_enabled,
        embeddings=graph_embeddings,
    )

    degrees = compute_degree(graph)

    nodes = (
        base_entity_nodes.merge(layout, left_on="title", right_on="label", how="left")
        .merge(degrees, on="title", how="left")
        .merge(base_communities, on="title", how="left")
    )
    nodes["level"] = nodes["level"].fillna(0).astype(int)
    nodes["community"] = nodes["community"].fillna(-1).astype(int)
    # disconnected nodes and those with no community even at level 0 can be missing degree
    nodes["degree"] = nodes["degree"].fillna(0).astype(int)
    return nodes.loc[
        :,
        [
            "id",
            "human_readable_id",
            "title",
            "community",
            "level",
            "degree",
            "x",
            "y",
        ],
    ]


In [45]:
base_entity_nodes.head()

Unnamed: 0,index,title,type,text_unit_ids,description,human_readable_id,id
0,0,CHARLES DICKENS,PERSON,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,Charles Dickens is the author of 'A Christmas ...,0,d97af007-fc4c-4f1b-bdfc-fd2484f4db4e
1,1,ARTHUR RACKHAM,PERSON,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,Arthur Rackham is the illustrator of 'A Christ...,1,4c9b10b3-f17f-4947-b5bb-93120b606bcf
2,2,J. B. LIPPINCOTT COMPANY,ORGANIZATION,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,J. B. Lippincott Company is the publisher of '...,2,e59f14f9-c182-4cc9-b8ac-682f3588f135
3,3,UNITED STATES,GEO,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,The United States is a country where the Proje...,3,833ab652-b89e-4187-91fd-1ea4e9061ac8
4,5,A CHRISTMAS CAROL,EVENT,[336671e337e5f4539069473e8f8691b3ed696331aabe6...,A Christmas Carol is a book written by Charles...,4,0dba2f5c-55bc-4024-8526-cd9b3c9f8487


In [46]:
base_communities.head()

Unnamed: 0,level,community,parent,title
0,0,0,-1,BOB CRATCHIT
0,0,0,-1,CAMDEN TOWN
0,0,0,-1,COUNTING-HOUSE
0,0,0,-1,MR. SCROOGE'S NEPHEW
0,0,0,-1,MRS. CRATCHIT


In [47]:
final_nodes = create_final_nodes(
    base_entity_nodes,
    base_relationship_edges,
    base_communities,
    WorkflowCallbacks,
    config.embed_graph,
    False,
)

In [48]:
final_nodes.head()

Unnamed: 0,id,human_readable_id,title,community,level,degree,x,y
0,d97af007-fc4c-4f1b-bdfc-fd2484f4db4e,0,CHARLES DICKENS,-1,0,1,0.0,0.0
1,4c9b10b3-f17f-4947-b5bb-93120b606bcf,1,ARTHUR RACKHAM,-1,0,1,0.0,0.0
2,e59f14f9-c182-4cc9-b8ac-682f3588f135,2,J. B. LIPPINCOTT COMPANY,-1,0,1,0.0,0.0
3,833ab652-b89e-4187-91fd-1ea4e9061ac8,3,UNITED STATES,-1,0,2,0.0,0.0
4,0dba2f5c-55bc-4024-8526-cd9b3c9f8487,4,A CHRISTMAS CAROL,-1,0,4,0.0,0.0
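The first rows above show `community = -1` and `level = 0`, which are the fill values `create_final_nodes` applies when the left merge finds no community row for a node. Here is a minimal pure-Python sketch of that left join with defaults. The `attach_communities` helper is hypothetical, and the level-1 row for BOB CRATCHIT is a made-up value for illustration.

```python
def attach_communities(node_titles, community_rows):
    """Left-join nodes to community rows, filling defaults when there is no match."""
    by_title = {}
    for row in community_rows:
        by_title.setdefault(row["title"], []).append(row)

    nodes = []
    for title in node_titles:
        matches = by_title.get(title)
        if not matches:
            # Same defaults as the fillna calls: community -1, level 0
            nodes.append({"title": title, "community": -1, "level": 0})
            continue
        # A left merge yields one output row per matching community row
        for row in matches:
            nodes.append(
                {"title": title, "community": row["community"], "level": row["level"]}
            )
    return nodes


toy_communities = [
    {"title": "BOB CRATCHIT", "level": 0, "community": 0},
    {"title": "BOB CRATCHIT", "level": 1, "community": 5},  # made-up row
]
toy_nodes = attach_communities(["CHARLES DICKENS", "BOB CRATCHIT"], toy_communities)
for node in toy_nodes:
    print(node)
```

Note that a node belonging to communities at several levels appears once per level in the result.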


In [49]:
graph = create_graph(base_relationship_edges)

layout = layout_graph(
    graph,
    WorkflowCallbacks,
    False,
    embeddings=None,
)

layout.head()

Unnamed: 0,label,x,y,size
0,CHARLES DICKENS,0,0,0
1,A CHRISTMAS CAROL,0,0,0
2,ARTHUR RACKHAM,0,0,0
3,J. B. LIPPINCOTT COMPANY,0,0,0
4,PROJECT GUTENBERG,0,0,0


In [50]:
layout['x'].sort_values(ascending=False)

0     0
47    0
34    0
35    0
36    0
     ..
26    0
27    0
28    0
29    0
62    0
Name: x, Length: 63, dtype: int64

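When layout is enabled, `layout_graph` [calculates UMAP embedding vectors](https://umap-learn.readthedocs.io/en/latest/how_umap_works.html) to position the nodes; with it disabled, as here, every node is placed at `x = 0`, `y = 0`.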

## Create Final Communities

In [51]:
from graphrag.index.flows.create_final_communities import create_final_communities

final_communities = create_final_communities(
    base_entity_nodes,
    base_relationship_edges,
    base_communities
)

In [52]:
final_communities.head()

Unnamed: 0,id,human_readable_id,community,parent,level,title,entity_ids,relationship_ids,text_unit_ids,period,size
0,eeff2a69-66bd-4f61-8f09-5f272556b301,0,0,-1,0,Community 0,"[d5d4ba59-c88b-47ba-8b38-59097ad7aba4, c98db2a...","[3ec63306-9a4e-43e6-b855-4f47afd26635, 5a08bbd...",[28a097ac2ac91a0894f100b3def2d8b38beef06cf1704...,2025-05-07,7
1,9d0b0eb0-3dc1-4b30-9cee-e62e12814f97,1,1,-1,0,Community 1,"[d491e066-151e-44ed-a39d-f74f6ad46e09, 12b89c6...","[193853b0-127c-4c3d-b93a-481ed01302d9, 1a8be9e...",[16a44556cf88ce525c0a958d9957e089c92f4f2c2ca36...,2025-05-07,6
2,442e49d6-0583-4c01-8b98-39add3fdb20e,2,2,-1,0,Community 2,"[d9a0ac2c-a2a5-4d14-97a7-c6adfa5697b2, 56eb3db...","[13d0609b-d021-49b0-ad26-d50842918a93, 47a87df...",[00d55a11708224576f458fb8beaecc5b423f18b2f732e...,2025-05-07,5
3,235ac0df-b571-43b8-ae9a-56b1b3f38c8d,3,3,-1,0,Community 3,"[1ed82571-3430-440d-ba82-0c412bf040dd, ac99724...","[04c487a6-4118-4f9f-b454-8b74b3e657f6, 3e8b6d0...",[336671e337e5f4539069473e8f8691b3ed696331aabe6...,2025-05-07,5
4,4623d29b-d797-4802-a4c7-ad7fafe9c416,4,4,-1,0,Community 4,"[da242f9a-a3b7-4c0e-a89e-46e9649bfb32, ba62dec...","[04b33e3f-c0bb-4d48-85f0-e9b0a7ce9400, 0ff34af...",[12511d755f22193db859eae103d3e970b006a4b6bb233...,2025-05-07,24


## Final Text Units

In [53]:
config.claim_extraction.enabled

False

In [54]:
from graphrag.index.flows.create_final_text_units import create_final_text_units

final_text_units = create_final_text_units(
        chunked_text,
        final_entities,
        final_relationships,
        None,
    )

In [55]:
final_text_units.head()

Unnamed: 0,id,human_readable_id,text,n_tokens,document_ids,entity_ids,relationship_ids
0,336671e337e5f4539069473e8f8691b3ed696331aabe67...,1,﻿The Project Gutenberg eBook of A Christmas Ca...,1200,[77fd5668fcbeb8d240a7816bf00854bd31af91a84d031...,"[d97af007-fc4c-4f1b-bdfc-fd2484f4db4e, 4c9b10b...","[96623900-0fab-433c-a2f0-f31633e9815e, 5ea79b5..."
1,2160a0c64179a7920c578f3400ad64f77c22927e6ab8c7...,2,and thither in\n restless haste and moanin...,1200,[77fd5668fcbeb8d240a7816bf00854bd31af91a84d031...,"[992bb6c6-a655-4da7-8ec0-b419b4952e76, ef5bcc5...","[dd4bb1b5-3283-4061-ba47-eeae3bd342ca, f76fdb3..."
2,d798befe565a9ed5b6b536fd8a95a1d396867b232ec308...,3,"-fisted hand at the grindstone, Scrooge! a\nsq...",1200,[77fd5668fcbeb8d240a7816bf00854bd31af91a84d031...,"[ba62dec7-3e33-4884-b38f-24092af904ca, fbf85ea...","[097c5e07-37f3-4923-94e7-1d10815636d8, d838671..."
3,cc6a8a52ea673776c03f32442c2a05f75b59d30a0bf4c0...,4,'Bah!' again; and followed it up with 'Humbug!...,1200,[77fd5668fcbeb8d240a7816bf00854bd31af91a84d031...,"[ba62dec7-3e33-4884-b38f-24092af904ca, 3c9110b...","[f76fdb3f-9039-4f6c-a840-eb57084511e0, 097c5e0..."
4,1c129c3dd67b1761adbdb4186b2de1036b2e4ff3683e4d...,5,have no doubt his liberality is well represen...,1200,[77fd5668fcbeb8d240a7816bf00854bd31af91a84d031...,"[ba62dec7-3e33-4884-b38f-24092af904ca, 900d1d2...","[aad65924-0577-450a-aa9c-2b65836bec44, 0ff34af..."


## Final Community Reports

In [56]:
from graphrag.index.flows.create_final_community_reports import create_final_community_reports

final_community_reports = await create_final_community_reports(
        final_nodes,
        final_relationships,
        final_entities,
        final_communities,
        None,
        WorkflowCallbacks,
        None,
        config.community_reports.resolved_strategy(config.root_dir),
        async_mode=config.community_reports.async_mode,
        num_threads=4,
    )

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  input.loc[:, NODE_DETAILS] = input.loc[


In [57]:
final_community_reports.head()

Unnamed: 0,id,human_readable_id,community,parent,level,title,summary,full_content,rank,rank_explanation,findings,full_content_json,period,size
0,5f6d2b46a67a4615aa21d463025502a4,6,6,4,1,Ebenezer Scrooge and the Spirits of Christmas,"The community centers around Ebenezer Scrooge,...",# Ebenezer Scrooge and the Spirits of Christma...,8.5,The impact severity rating is high due to the ...,[{'explanation': 'Ebenezer Scrooge is initiall...,"{\n ""title"": ""Ebenezer Scrooge and the Spir...",2025-05-07,17
1,7162bb53122e4b57ba920afa994e62cf,7,7,4,1,Mansion House and the Lord Mayor,The community is centered around the Mansion H...,# Mansion House and the Lord Mayor\n\nThe comm...,4.0,The impact severity rating is moderate due to ...,[{'explanation': 'The Mansion House serves as ...,"{\n ""title"": ""Mansion House and the Lord Ma...",2025-05-07,2
2,30c5125848c74c7ca23ab3f47589e68d,8,8,4,1,Scrooge and Marley: A Haunting Legacy,The community centers around the business part...,# Scrooge and Marley: A Haunting Legacy\n\nThe...,7.5,The impact severity rating is high due to the ...,[{'explanation': 'Scrooge and Marley were part...,"{\n ""title"": ""Scrooge and Marley: A Hauntin...",2025-05-07,3
3,813379bd7dd04dd6a8f504a85251ac00,9,9,4,1,Scrooge's Former Fiancée and Her Family,The community centers around Scrooge's former ...,# Scrooge's Former Fiancée and Her Family\n\nT...,3.0,The impact severity rating is low due to the p...,[{'explanation': 'Scrooge observes the family ...,"{\n ""title"": ""Scrooge's Former Fiancée and ...",2025-05-07,2
4,992bdb5201184e76a8d04b8dd8421da3,0,0,-1,0,The Cratchit Family and Scrooge's Influence,The community centers around the Cratchit fami...,# The Cratchit Family and Scrooge's Influence\...,6.5,The impact severity rating is moderate due to ...,[{'explanation': 'Bob Cratchit is a pivotal fi...,"{\n ""title"": ""The Cratchit Family and Scroo...",2025-05-07,7


In [58]:
from graphrag.index.operations.summarize_communities import (
    prepare_community_reports,
    restore_community_hierarchy,
    summarize_communities,
)
from graphrag.index.flows.create_final_community_reports import _prep_nodes, _prep_edges, _prep_claims

entities_df = final_entities.loc[:, ["id", "description"]]
nodes_df = final_nodes.merge(entities_df, on="id")
nodes = _prep_nodes(nodes_df)
edges = _prep_edges(final_relationships)

local_contexts = prepare_community_reports(
        nodes,
        edges,
        None,
        WorkflowCallbacks,
        config.community_reports.resolved_strategy(config.root_dir).get("max_input_length", 16_000),
    )



In [59]:
local_contexts

Unnamed: 0,community,all_context,context_string,context_size,context_exceed_limit,level
0,6,"[{'title': 'BELLE', 'degree': 1, 'node_details...","-----Entities-----\nhuman_readable_id,title,de...",1551,False,1
1,7,"[{'title': 'LORD MAYOR', 'degree': 2, 'node_de...","-----Entities-----\nhuman_readable_id,title,de...",130,False,1
2,8,"[{'title': 'MARLEY', 'degree': 3, 'node_detail...","-----Entities-----\nhuman_readable_id,title,de...",303,False,1
3,9,"[{'title': 'SCROOGE'S FORMER FIANCÉE', 'degree...","-----Entities-----\nhuman_readable_id,title,de...",174,False,1
0,0,"[{'title': 'BOB CRATCHIT', 'degree': 9, 'node_...","-----Entities-----\nhuman_readable_id,title,de...",460,False,0
1,1,"[{'title': 'CAROLINE', 'degree': 3, 'node_deta...","-----Entities-----\nhuman_readable_id,title,de...",353,False,0
2,2,"[{'title': 'CHRISTMAS', 'degree': 4, 'node_det...","-----Entities-----\nhuman_readable_id,title,de...",524,False,0
3,3,"[{'title': 'EBENEZER SCROOGE', 'degree': 3, 'n...","-----Entities-----\nhuman_readable_id,title,de...",428,False,0
4,4,"[{'title': 'BELLE', 'degree': 1, 'node_details...","-----Entities-----\nhuman_readable_id,title,de...",2074,False,0
5,5,"[{'title': 'MRS. DILBER', 'degree': 1, 'node_d...","-----Entities-----\nhuman_readable_id,title,de...",223,False,0
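
The `context_size` and `context_exceed_limit` columns appear to record each community's context length in tokens and whether it is over the limit passed as the last argument to `prepare_community_reports`. A minimal sketch of that check, assuming the flag is a plain greater-than comparison against the 16,000-token fallback used above. The sizes are copied from the table; the comparison rule is an assumption, not taken from the library source.

```python
# Assumed limit: the fallback value passed to prepare_community_reports above.
MAX_INPUT_LENGTH = 16_000

# community id -> context_size, copied from the local_contexts output.
context_sizes = {
    6: 1551, 7: 130, 8: 303, 9: 174,
    0: 460, 1: 353, 2: 524, 3: 428, 4: 2074, 5: 223,
}

# Assumption: a context exceeds the limit when its size is strictly greater.
exceeds_limit = {
    community: size > MAX_INPUT_LENGTH for community, size in context_sizes.items()
}

largest = max(context_sizes, key=context_sizes.get)
print(any(exceeds_limit.values()))      # False
print(largest, context_sizes[largest])  # 4 2074
```

Under this assumption even the largest context (community 4) is far below the limit, which matches the `False` values in every row.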


In [60]:
print(config.community_reports.resolved_strategy(config.root_dir).get("extraction_prompt")[:2000])

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License
"""A file containing prompts definition."""

COMMUNITY_REPORT_PROMPT = """
You are an AI assistant that helps a human analyst to perform general information discovery. Information discovery is the process of identifying and assessing relevant information associated with certain entities (e.g., organizations and individuals) within a network.

# Goal
Write a comprehensive report of a community, given a list of entities that belong to the community as well as their relationships and optional associated claims. The report will be used to inform decision-makers about information associated with the community and their potential impact. The content of this report includes an overview of the community's key entities, their legal compliance, technical capabilities, reputation, and noteworthy claims.

# Report Structure

The report should include the following sections:

- TITLE: community's name that represents its k

In [61]:
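community_hierarchy = restore_community_hierarchy(nodes)

# Same strategy dict that was passed to create_final_community_reports above.
summarization_strategy = config.community_reports.resolved_strategy(config.root_dir)

community_reports = await summarize_communities(
    local_contexts,
    nodes,
    community_hierarchy,
    WorkflowCallbacks,
    None,
    summarization_strategy,
    async_mode=config.community_reports.async_mode,
    num_threads=4,
)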

In [62]:
community_reports.head()

Unnamed: 0,community,full_content,level,rank,title,rank_explanation,summary,findings,full_content_json
0,6,# Ebenezer Scrooge and the Spirits of Christma...,1,8.5,Ebenezer Scrooge and the Spirits of Christmas,The impact severity rating is high due to the ...,"The community centers around Ebenezer Scrooge,...",[{'explanation': 'Ebenezer Scrooge is initiall...,"{\n ""title"": ""Ebenezer Scrooge and the Spir..."
1,7,# Mansion House and the Lord Mayor\n\nThe comm...,1,4.0,Mansion House and the Lord Mayor,The impact severity rating is moderate due to ...,The community is centered around the Mansion H...,[{'explanation': 'The Mansion House serves as ...,"{\n ""title"": ""Mansion House and the Lord Ma..."
2,8,# Scrooge and Marley Business Community\n\nThe...,1,7.5,Scrooge and Marley Business Community,The impact severity rating is high due to the ...,The community centers around the business part...,[{'explanation': 'Scrooge and Marley is the bu...,"{\n ""title"": ""Scrooge and Marley Business C..."
3,9,# Scrooge and His Former Fiancée's Family\n\nT...,1,3.0,Scrooge and His Former Fiancée's Family,The impact severity rating is low due to the p...,The community centers around Scrooge and the f...,[{'explanation': 'Scrooge's observation of his...,"{\n ""title"": ""Scrooge and His Former Fiancé..."
4,0,# The Cratchit Family and Scrooge's Influence\...,0,7.5,The Cratchit Family and Scrooge's Influence,The impact severity rating is high due to the ...,This community centers around the Cratchit fam...,[{'explanation': 'Bob Cratchit is a central fi...,"{\n ""title"": ""The Cratchit Family and Scroo..."


In [63]:
print(community_reports.iloc[0]['full_content'])

# Ebenezer Scrooge and the Spirits of Christmas

The community centers around Ebenezer Scrooge, a character known for his initial miserly nature, and his transformative journey through interactions with various spirits during the Christmas season. Key entities include the Ghosts of Christmas Past, Present, and Yet to Come, who guide Scrooge through visions that lead to his profound change of heart. The narrative unfolds in London, with significant events occurring on Christmas Eve and Christmas Day, highlighting Scrooge's relationships and his eventual embrace of generosity and kindness.

## Ebenezer Scrooge's initial character and transformation

Ebenezer Scrooge is initially depicted as a miserly and cold-hearted individual, particularly during the Christmas season. His solitary lifestyle and wealth are contrasted with his strong dislike for Christmas. However, through a series of ghostly visitations, Scrooge undergoes a profound transformation, becoming generous and kind, especially

In [64]:
print(community_reports.iloc[0]['summary'])

The community centers around Ebenezer Scrooge, a character known for his initial miserly nature, and his transformative journey through interactions with various spirits during the Christmas season. Key entities include the Ghosts of Christmas Past, Present, and Yet to Come, who guide Scrooge through visions that lead to his profound change of heart. The narrative unfolds in London, with significant events occurring on Christmas Eve and Christmas Day, highlighting Scrooge's relationships and his eventual embrace of generosity and kindness.


## Generate Text Embeddings

In [65]:
from graphrag.index.config.embeddings import get_embedded_fields, get_embedding_settings

embedded_fields = get_embedded_fields(config)
text_embed = get_embedding_settings(config.embeddings)

print(embedded_fields)


{'entity.description', 'text_unit.text', 'community.full_content'}


In [66]:
from graphrag.index.operations.embed_text import embed_text

async def _run_and_snapshot_embeddings(
    name: str,
    data: pd.DataFrame,
    embed_column: str,
    callbacks: WorkflowCallbacks,
    cache: None,
    storage: None,
    text_embed_config: dict,
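    snapshot_embeddings_enabled: bool,
) -> None:
    """All the steps to generate a single embedding."""
    if text_embed_config:
        # Adds the column in place, so the frames held in embedding_param_map
        # gain an "embedding" column after this runs.
        data["embedding"] = await embed_text(
            data,
            callbacks,
            cache,
            embed_column=embed_column,
            embedding_name=name,
            strategy=text_embed_config["strategy"],
        )

        if snapshot_embeddings_enabled is True:
            # No snapshot is written in this walkthrough (storage is None);
            # this line only rebinds the local name.
            data = data.loc[:, ["id", "embedding"]]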

In [67]:
from graphrag.index.config.embeddings import (
    community_full_content_embedding,
    community_summary_embedding,
    community_title_embedding,
    document_text_embedding,
    entity_description_embedding,
    entity_title_embedding,
    relationship_description_embedding,
    text_unit_text_embedding,
)

embedding_param_map = {
    document_text_embedding: {
        "data": final_docs.loc[:, ["id", "text"]]
        if final_docs is not None
        else None,
        "embed_column": "text",
    },
    relationship_description_embedding: {
        "data": final_relationships.loc[:, ["id", "description"]]
        if final_relationships is not None
        else None,
        "embed_column": "description",
    },
    text_unit_text_embedding: {
        "data": final_text_units.loc[:, ["id", "text"]]
        if final_text_units is not None
        else None,
        "embed_column": "text",
    },
    entity_title_embedding: {
        "data": final_entities.loc[:, ["id", "title"]]
        if final_entities is not None
        else None,
        "embed_column": "title",
    },
    entity_description_embedding: {
        "data": final_entities.loc[:, ["id", "title", "description"]].assign(
            title_description=lambda df: df["title"] + ":" + df["description"]
        )
        if final_entities is not None
        else None,
        "embed_column": "title_description",
    },
    community_title_embedding: {
        "data": final_community_reports.loc[:, ["id", "title"]]
        if final_community_reports is not None
        else None,
        "embed_column": "title",
    },
    community_summary_embedding: {
        "data": final_community_reports.loc[:, ["id", "summary"]]
        if final_community_reports is not None
        else None,
        "embed_column": "summary",
    },
    community_full_content_embedding: {
        "data": final_community_reports.loc[:, ["id", "full_content"]]
        if final_community_reports is not None
        else None,
        "embed_column": "full_content",
    },
}


In [68]:
embedding_param_map.keys()

dict_keys(['document.text', 'relationship.description', 'text_unit.text', 'entity.title', 'entity.description', 'community.title', 'community.summary', 'community.full_content'])
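
The map holds eight candidate fields, but the loop further down only embeds the ones returned by `get_embedded_fields(config)`. A small sketch of that relationship, using the field names printed in this notebook; the grouping by table is only an illustration and assumes each name follows a `<table>.<column>` pattern.

```python
# All keys of embedding_param_map, copied from the output above.
candidate_fields = {
    "document.text", "relationship.description", "text_unit.text",
    "entity.title", "entity.description", "community.title",
    "community.summary", "community.full_content",
}
# The fields reported by get_embedded_fields(config) earlier in this notebook.
embedded_fields = {"entity.description", "text_unit.text", "community.full_content"}

# Candidates that are defined in the map but never embedded with this config.
skipped = sorted(candidate_fields - embedded_fields)

# Assumed "<table>.<column>" naming: split each embedded field for grouping.
by_table: dict[str, list[str]] = {}
for field in sorted(embedded_fields):
    table, column = field.split(".", 1)
    by_table.setdefault(table, []).append(column)

print(skipped)
print(by_table)
```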

In [69]:
embedding_param_map['community.full_content']['data']

Unnamed: 0,id,full_content
0,5f6d2b46a67a4615aa21d463025502a4,# Ebenezer Scrooge and the Spirits of Christma...
1,7162bb53122e4b57ba920afa994e62cf,# Mansion House and the Lord Mayor\n\nThe comm...
2,30c5125848c74c7ca23ab3f47589e68d,# Scrooge and Marley: A Haunting Legacy\n\nThe...
3,813379bd7dd04dd6a8f504a85251ac00,# Scrooge's Former Fiancée and Her Family\n\nT...
4,992bdb5201184e76a8d04b8dd8421da3,# The Cratchit Family and Scrooge's Influence\...
5,cbd3f1243bc34c7cb26e68bdd8f3dfcf,# Spirit's Journey with Mr. Scrooge and Caroli...
6,58c137d30ae549558554c8d11ef72c3c,# Scrooge and the Spirit of Christmas\n\nThe c...
7,b1d5f1c4feee48ea902bf40be78ec2dd,# Ebenezer Scrooge and Fred's Dinner Party\n\n...
8,7078a271b13b4769a17b138ac36c1424,# Ebenezer Scrooge and His Transformative Jour...
9,1e6c0e742ce64dbdbeae4dd25c980783,# Old Joe and the Unhappy Man's Estate\n\nThe ...


In [70]:
for field in embedded_fields:
    await _run_and_snapshot_embeddings(
        name=field,
        callbacks=WorkflowCallbacks,
        cache=None,
        storage=None,
        text_embed_config=text_embed,
        snapshot_embeddings_enabled=False,
        **embedding_param_map[field],
    )

In [71]:
embedding_param_map['community.full_content']['data']

Unnamed: 0,id,full_content,embedding
0,5f6d2b46a67a4615aa21d463025502a4,# Ebenezer Scrooge and the Spirits of Christma...,"[-0.014818704687058926, 0.008277944289147854, ..."
1,7162bb53122e4b57ba920afa994e62cf,# Mansion House and the Lord Mayor\n\nThe comm...,"[0.017740551382303238, 0.005728313699364662, 0..."
2,30c5125848c74c7ca23ab3f47589e68d,# Scrooge and Marley: A Haunting Legacy\n\nThe...,"[0.00380644085817039, 0.02471626177430153, 0.0..."
3,813379bd7dd04dd6a8f504a85251ac00,# Scrooge's Former Fiancée and Her Family\n\nT...,"[0.0033682140056043863, 0.019343266263604164, ..."
4,992bdb5201184e76a8d04b8dd8421da3,# The Cratchit Family and Scrooge's Influence\...,"[0.010107767768204212, 0.02027149498462677, 0...."
5,cbd3f1243bc34c7cb26e68bdd8f3dfcf,# Spirit's Journey with Mr. Scrooge and Caroli...,"[0.02352629229426384, 0.029753072187304497, 0...."
6,58c137d30ae549558554c8d11ef72c3c,# Scrooge and the Spirit of Christmas\n\nThe c...,"[-0.0022218171507120132, 0.02098696306347847, ..."
7,b1d5f1c4feee48ea902bf40be78ec2dd,# Ebenezer Scrooge and Fred's Dinner Party\n\n...,"[-0.010625563561916351, 0.013428626582026482, ..."
8,7078a271b13b4769a17b138ac36c1424,# Ebenezer Scrooge and His Transformative Jour...,"[-0.00757161621004343, 0.007154949475079775, 0..."
9,1e6c0e742ce64dbdbeae4dd25c980783,# Old Joe and the Unhappy Man's Estate\n\nThe ...,"[0.010569276288151741, 0.02559269405901432, -0..."


## Query an index

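To query an index, several index tables must first be read into memory and passed to the query API. Here they are already in memory as the DataFrames built in the previous steps, so nothing is read from disk.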

In [72]:
from pathlib import Path

def _load_search_prompt(root_dir: str, prompt_config: str | None) -> str | None:
    """
    Load the search prompt from disk if configured.

    If not, leave it empty - the search functions will load their defaults.

    """
    if prompt_config:
        prompt_file = Path(root_dir) / prompt_config
        if prompt_file.exists():
            return prompt_file.read_bytes().decode(encoding="utf-8")
    return None

In [73]:
from graphrag.query.indexer_adapters import (
    read_indexer_communities,
    read_indexer_entities,
    read_indexer_reports,
) 

community_level = 2

communities_ = read_indexer_communities(final_communities, final_nodes, final_community_reports)
reports = read_indexer_reports(
        final_community_reports,
        final_nodes,
        community_level=community_level,
        dynamic_community_selection=False,
    )
entities_ = read_indexer_entities(final_nodes, final_entities, community_level=community_level)
map_prompt = _load_search_prompt(config.root_dir, config.global_search.map_prompt)
reduce_prompt = _load_search_prompt(
        config.root_dir, config.global_search.reduce_prompt
    )
knowledge_prompt = _load_search_prompt(
        config.root_dir, config.global_search.knowledge_prompt
    )

In [74]:
final_entities.head()

Unnamed: 0,id,human_readable_id,title,type,description,text_unit_ids
0,d97af007-fc4c-4f1b-bdfc-fd2484f4db4e,0,CHARLES DICKENS,PERSON,Charles Dickens is the author of 'A Christmas ...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
1,4c9b10b3-f17f-4947-b5bb-93120b606bcf,1,ARTHUR RACKHAM,PERSON,Arthur Rackham is the illustrator of 'A Christ...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
2,e59f14f9-c182-4cc9-b8ac-682f3588f135,2,J. B. LIPPINCOTT COMPANY,ORGANIZATION,J. B. Lippincott Company is the publisher of '...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
3,833ab652-b89e-4187-91fd-1ea4e9061ac8,3,UNITED STATES,GEO,The United States is a country where the Proje...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...
4,0dba2f5c-55bc-4024-8526-cd9b3c9f8487,4,A CHRISTMAS CAROL,EVENT,A Christmas Carol is a book written by Charles...,[336671e337e5f4539069473e8f8691b3ed696331aabe6...


In [75]:
entities_[:3]

[Entity(id='0ad90724-a842-44dd-81b2-29f5b2cf351d', short_id='42', title='TOPPER', type='PERSON', description="Topper is a character known for his playful nature, particularly during the Christmas games, where he is an enthusiastic participant. He is especially noted for his involvement in the game of blind man's-buff. Additionally, Topper is interested in one of Scrooge's niece's sisters and often comments on his status as a bachelor.", description_embedding=None, name_embedding=None, community_ids=['3'], text_unit_ids=['00d55a11708224576f458fb8beaecc5b423f18b2f732ea164ffd2bf14b3aa6aa0429abfcf832770eaa8517384dc1c61e03b841d0189e8d02c2352117ab187919', '23b16a51cf44ece4fa8cf953c9d5b9cd6ec5fc1769d5a8ae7b8f6659a883f4c971d2e22f716e984a51b7e87bef1e3c3bd4c2d2d30d3750ce547205d48f0a764e'], rank=3, attributes=None),
 Entity(id='0dba2f5c-55bc-4024-8526-cd9b3c9f8487', short_id='4', title='A CHRISTMAS CAROL', type='EVENT', description='A Christmas Carol is a book written by Charles Dickens, first pu

In [76]:
reports

[CommunityReport(id='5f6d2b46a67a4615aa21d463025502a4', short_id='6', title='Ebenezer Scrooge and the Spirits of Christmas', community_id='6', summary='The community centers around Ebenezer Scrooge, a character known for his initial miserly nature, and his transformative journey through interactions with various spirits. Key entities include the Ghosts of Christmas Past, Present, and Yet to Come, who guide Scrooge through visions that lead to his change of heart. The narrative unfolds in London, with significant events occurring on Christmas Eve and Christmas Day, highlighting themes of redemption and the spirit of Christmas.', full_content="# Ebenezer Scrooge and the Spirits of Christmas\n\nThe community centers around Ebenezer Scrooge, a character known for his initial miserly nature, and his transformative journey through interactions with various spirits. Key entities include the Ghosts of Christmas Past, Present, and Yet to Come, who guide Scrooge through visions that lead to his 

In [77]:
communities_

[Community(id='eeff2a69-66bd-4f61-8f09-5f272556b301', short_id='0', title='Community 0', level='0', entity_ids=None, relationship_ids=None, covariate_ids=None, sub_community_ids=[], attributes=None, size=None, period=None),
 Community(id='9d0b0eb0-3dc1-4b30-9cee-e62e12814f97', short_id='1', title='Community 1', level='0', entity_ids=None, relationship_ids=None, covariate_ids=None, sub_community_ids=[], attributes=None, size=None, period=None),
 Community(id='442e49d6-0583-4c01-8b98-39add3fdb20e', short_id='2', title='Community 2', level='0', entity_ids=None, relationship_ids=None, covariate_ids=None, sub_community_ids=[], attributes=None, size=None, period=None),
 Community(id='235ac0df-b571-43b8-ae9a-56b1b3f38c8d', short_id='3', title='Community 3', level='0', entity_ids=None, relationship_ids=None, covariate_ids=None, sub_community_ids=[], attributes=None, size=None, period=None),
 Community(id='4623d29b-d797-4802-a4c7-ad7fafe9c416', short_id='4', title='Community 4', level='0', enti

In [78]:
from graphrag.query.factory import get_global_search_engine
from graphrag.query.structured_search.base import SearchResult

search_engine = get_global_search_engine(
        config,
        reports=reports,
        entities=entities_,
        communities=communities_,
        response_type="Multiple Paragraphs",
        dynamic_community_selection=False,
        map_system_prompt=map_prompt,
        reduce_system_prompt=reduce_prompt,
        general_knowledge_inclusion_prompt=knowledge_prompt,
    )
result: SearchResult = await search_engine.asearch(query="Who is Scrooge and what are his main relationships?")
response = result.response

creating llm client with {'api_key': 'REDACTED,len=164', 'type': "openai_chat", 'encoding_model': 'cl100k_base', 'model': 'gpt-4o', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'audience': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 50000, 'requests_per_minute': 1000, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25, 'responses': None}


In [79]:
print(response)

### Ebenezer Scrooge: Character Overview

Ebenezer Scrooge is a central character known for his initial miserly and cold-hearted nature, particularly during the Christmas season. He is depicted as a character who prioritizes wealth over the festive spirit, working in his counting-house on Christmas Eve [Data: Reports (6, 2)]. However, Scrooge undergoes a significant transformation through interactions with the Ghosts of Christmas Past, Present, and Yet to Come. These spirits guide him through visions that lead to his change of heart, highlighting themes of redemption and the spirit of Christmas [Data: Reports (6)].

### Key Relationships

#### Jacob Marley

Scrooge's relationship with Jacob Marley, his deceased business partner, is significant. Marley's ghostly visit serves as a catalyst for Scrooge's journey of reflection and transformation, emphasizing the depth of their relationship beyond just business [Data: Reports (6, 8)].

#### Fred

Fred, Scrooge's nephew, plays an important r

### Examining the Query

In [81]:
import tiktoken
#default_llm_settings = config.get_language_model_config("default_chat_model")
from graphrag.query.structured_search.global_search.community_context import (
    GlobalCommunityContext,
)

# Here we get encoding based on specified encoding name
token_encoder = tiktoken.get_encoding(config.encoding_model)

# `cb` is the name used in the cells below; `context_builder` is kept as an alias.
cb = context_builder = GlobalCommunityContext(
    community_reports=reports,
    communities=communities_,
    entities=entities_,
    token_encoder=token_encoder,
    dynamic_community_selection=False,
)

In [None]:
llm_calls, prompt_tokens, output_tokens = {}, {}, {}
query="Who is Scrooge and what are his main relationships?"


context_result = await cb.build_context(
    query=query,
    conversation_history=None,
)
llm_calls["build_context"] = context_result.llm_calls
prompt_tokens["build_context"] = context_result.prompt_tokens
output_tokens["build_context"] = context_result.output_tokens

In [95]:
print(len(context_result.context_chunks))

1
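
A single chunk means all the selected community reports fit into one context window. As a rough illustration of how reports could be packed into chunks under a token budget, here is a dependency-free sketch. The function name `pack_into_chunks`, the whitespace-based token count, and the budget values are all hypothetical; `GlobalCommunityContext` has its own logic and counts tokens with the tiktoken encoder.

```python
def pack_into_chunks(texts: list[str], max_tokens: int) -> list[list[str]]:
    """Greedily group texts so each chunk stays within max_tokens.

    Tokens are approximated by whitespace-separated words (an assumption
    made to keep the sketch dependency-free).
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for text in texts:
        cost = len(text.split())
        # Start a new chunk when adding this text would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        chunks.append(current)
    return chunks


# Hypothetical report snippets standing in for full community reports.
reports_text = [
    "Scrooge and the spirits",
    "Mansion House and the Lord Mayor",
    "The Cratchit family",
]
print(len(pack_into_chunks(reports_text, max_tokens=100)))  # 1
print(len(pack_into_chunks(reports_text, max_tokens=6)))    # 3
```

With a generous budget everything lands in one chunk, as in the output above; a tight budget forces one map call per chunk.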


In [86]:
from graphrag.prompts.query.global_search_map_system_prompt import (
    MAP_SYSTEM_PROMPT,
)
from graphrag.query.llm.get_client import get_llm

DEFAULT_MAP_LLM_PARAMS = {
    "max_tokens": 1000,
    "temperature": 0.0,
}

sllm = get_llm(config)

# Each iteration overwrites search_response, so only the last chunk's answer is kept.
# That is fine here because build_context returned a single chunk.
for data in context_result.context_chunks:
    search_prompt = MAP_SYSTEM_PROMPT.format(context_data=data)
    search_messages = [
        {"role": "system", "content": search_prompt},
        {"role": "user", "content": query},
    ]
    search_response = await sllm.agenerate(
        messages=search_messages, streaming=False, **DEFAULT_MAP_LLM_PARAMS
    )

creating llm client with {'api_key': 'REDACTED,len=164', 'type': "openai_chat", 'encoding_model': 'cl100k_base', 'model': 'gpt-4o', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'audience': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 50000, 'requests_per_minute': 1000, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25, 'responses': None}


In [87]:
search_response

'{\n    "points": [\n        {\n            "description": "Ebenezer Scrooge is a central character known for his initial miserly nature and his transformative journey through interactions with various spirits. His story unfolds in London, with significant events occurring on Christmas Eve and Christmas Day, highlighting themes of redemption and the spirit of Christmas [Data: Reports (6)].",\n            "score": 90\n        },\n        {\n            "description": "Scrooge\'s main relationships include interactions with the Ghosts of Christmas Past, Present, and Yet to Come, who guide him through visions that lead to his change of heart [Data: Reports (6)].",\n            "score": 85\n        },\n        {\n            "description": "Scrooge has a significant relationship with his nephew and niece, as well as Jacob Marley, his former business partner. These relationships highlight themes of redemption, familial bonds, and the celebration of Christmas [Data: Reports (2)].",\n        

In [89]:
import json
from graphrag.query.llm.text_utils import num_tokens, try_parse_json_object
from typing import Any


def parse_search_response(search_response: str) -> list[dict[str, Any]]:
    """Parse the search response json and return a list of key points.

    Parameters
    ----------
    search_response: str
        The search response json string

    Returns
    -------
    list[dict[str, Any]]
        A list of key points, each key point is a dictionary with "answer" and "score" keys
    """
    search_response, j = try_parse_json_object(search_response)
    if j == {}:
        return [{"answer": "", "score": 0}]

    parsed_elements = json.loads(search_response).get("points")
    if not parsed_elements or not isinstance(parsed_elements, list):
        return [{"answer": "", "score": 0}]

    return [
        {
            "answer": element["description"],
            "score": int(element["score"]),
        }
        for element in parsed_elements
        if "description" in element and "score" in element
    ]

In [90]:
processed_response = parse_search_response(search_response)

search_result = SearchResult(
    response=processed_response,
    context_data=data,
    context_text=data,
    completion_time=1,  # hard-coded placeholder; the call was not timed
    llm_calls=1,
    prompt_tokens=num_tokens(search_prompt, token_encoder),
    output_tokens=num_tokens(search_response, token_encoder),
)

In [91]:
search_result.response

[{'answer': 'Ebenezer Scrooge is a central character known for his initial miserly nature and his transformative journey through interactions with various spirits. His story unfolds in London, with significant events occurring on Christmas Eve and Christmas Day, highlighting themes of redemption and the spirit of Christmas [Data: Reports (6)].',
  'score': 90},
 {'answer': "Scrooge's main relationships include interactions with the Ghosts of Christmas Past, Present, and Yet to Come, who guide him through visions that lead to his change of heart [Data: Reports (6)].",
  'score': 85},
 {'answer': 'Scrooge has a significant relationship with his nephew and niece, as well as Jacob Marley, his former business partner. These relationships highlight themes of redemption, familial bonds, and the celebration of Christmas [Data: Reports (2)].',
  'score': 80},
 {'answer': "Scrooge's relationship with the Cratchit family, particularly Bob Cratchit, is important. The Cratchit family is influenced 
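
Each key point from the map step carries a score, which is what a later reduce step can use to decide what goes into the final answer. A minimal sketch of such a ranking, assuming zero-score points are dropped and the rest are ordered from highest to lowest score. The helper `rank_key_points` and the sample points are hypothetical and are not graphrag's own reduce implementation.

```python
from typing import Any, Optional


def rank_key_points(
    points: list[dict[str, Any]], max_points: Optional[int] = None
) -> list[dict[str, Any]]:
    """Drop zero-score points and order the rest by descending score."""
    useful = [p for p in points if p["score"] > 0]
    ranked = sorted(useful, key=lambda p: p["score"], reverse=True)
    return ranked if max_points is None else ranked[:max_points]


# Hypothetical points in the same {"answer", "score"} shape as search_result.response.
example_points = [
    {"answer": "Point about Marley", "score": 80},
    {"answer": "", "score": 0},
    {"answer": "Point about Scrooge", "score": 90},
]
for point in rank_key_points(example_points):
    print(point["score"], point["answer"])
```

The empty fallback entry that `parse_search_response` returns on a parse failure has a score of 0, so a filter like this would discard it.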