# Visualizing the knowledge graph with `yfiles-jupyter-graphs`

This notebook shows how to use `yfiles-jupyter-graphs` to add interactive graph visualizations of the parquet files, and how to visualize the result context of `graphrag` queries (covered at the end of this notebook).

In [30]:
# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License.

In [31]:
import os

import pandas as pd
import tiktoken

from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
from graphrag.query.indexer_adapters import (
    read_indexer_covariates,
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)
from graphrag.query.input.loaders.dfs import (
    store_entity_semantic_embeddings,
)
from graphrag.query.llm.oai.chat_openai import ChatOpenAI
from graphrag.query.llm.oai.embedding import OpenAIEmbedding
from graphrag.query.llm.oai.typing import OpenaiApiType
from graphrag.query.structured_search.local_search.mixed_context import (
    LocalSearchMixedContext,
)
from graphrag.query.structured_search.local_search.search import LocalSearch
from graphrag.vector_stores.lancedb import LanceDBVectorStore

## Local Search Example

The local search method generates answers by combining relevant data from the AI-extracted knowledge graph with text chunks of the raw documents. This method is suitable for questions that require an understanding of specific entities mentioned in the documents (e.g., What are the healing properties of chamomile?).

### Load text units and graph data tables as context for local search

- In this test we first load indexing outputs from parquet files into dataframes, then convert these dataframes into collections of data objects that align with the knowledge model.

### Load tables to dataframes

In [32]:
INPUT_DIR = "./../sample-output/output/20240812-215728/artifacts"
LANCEDB_URI = f"{INPUT_DIR}/lancedb"

COMMUNITY_REPORT_TABLE = "create_final_community_reports"
ENTITY_TABLE = "create_final_nodes"
ENTITY_EMBEDDING_TABLE = "create_final_entities"
RELATIONSHIP_TABLE = "create_final_relationships"
COVARIATE_TABLE = "create_final_covariates"
TEXT_UNIT_TABLE = "create_final_text_units"
COMMUNITY_LEVEL = 2
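
Before reading the tables, it can help to confirm that the expected parquet files are actually present in `INPUT_DIR`. The following is a minimal sketch, not part of `graphrag`; the helper name `missing_tables` is hypothetical and introduced only for illustration.

```python
from pathlib import Path


def missing_tables(input_dir, table_names):
    """Return the table names whose parquet file is absent from input_dir."""
    base = Path(input_dir)
    return [
        name for name in table_names if not (base / f"{name}.parquet").is_file()
    ]
```

Calling `missing_tables(INPUT_DIR, [ENTITY_TABLE, RELATIONSHIP_TABLE])` returns an empty list when both files exist, so a non-empty result points at a wrong path or an incomplete indexing run.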

#### Read entities

In [33]:
# read nodes table to get community and degree data
entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
entity_embedding_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_EMBEDDING_TABLE}.parquet")

#### Read relationships

In [34]:
relationship_df = pd.read_parquet(f"{INPUT_DIR}/{RELATIONSHIP_TABLE}.parquet")
relationships = read_indexer_relationships(relationship_df)

# Visualizing nodes and relationships with `yfiles-jupyter-graphs`

`yfiles-jupyter-graphs` is a graph visualization extension that provides interactive and customizable visualizations for structured node and relationship data.

In this case, we use it to provide an interactive visualization for the knowledge graph of the Local Search sample by passing node and relationship lists converted from the given parquet files. The input data requires an `id` attribute for each node and `start`/`end` properties for each relationship that correspond to the node ids. Additional attributes can be added in the `properties` of each node/relationship dict:

In [35]:
from yfiles_jupyter_graphs import GraphWidget


# converts the entities dataframe to a list of dicts for yfiles-jupyter-graphs
def convert_entities_to_dicts(df):
    """Convert the entities dataframe to a list of dicts for yfiles-jupyter-graphs."""

    # check the dataframe that was passed in, not the global entity_df
    if 'graph_embedding' in df.columns:
        df = df.drop('graph_embedding', axis=1)

    nodes_dict = {}
    for _, row in df.iterrows():
        # Create a dictionary for each row and collect unique nodes
        node_id = row["title"]
        if node_id not in nodes_dict:
            nodes_dict[node_id] = {
                "id": node_id,
                "properties": row.to_dict(),
            }
    return list(nodes_dict.values())


# converts the relationships dataframe to a list of dicts for yfiles-jupyter-graphs
def convert_relationships_to_dicts(df):
    """Convert the relationships dataframe to a list of dicts for yfiles-jupyter-graphs."""
    relationships = []
    for _, row in df.iterrows():
        # Create a dictionary for each row
        relationships.append({
            "start": row["source"],
            "end": row["target"],
            "properties": row.to_dict(),
        })
    return relationships


w = GraphWidget()
w.directed = True
w.nodes = convert_entities_to_dicts(entity_df)
w.edges = convert_relationships_to_dicts(relationship_df)
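
Since every relationship's `start`/`end` has to match a node `id`, a quick consistency check on the converted lists can catch edges that point at missing nodes. This is a minimal sketch using plain dicts in the shape described above; `find_dangling_edges` and the sample data are hypothetical and not part of `yfiles-jupyter-graphs`.

```python
def find_dangling_edges(nodes, edges):
    """Return edges whose 'start' or 'end' does not match any node 'id'."""
    node_ids = {node["id"] for node in nodes}
    return [
        edge
        for edge in edges
        if edge["start"] not in node_ids or edge["end"] not in node_ids
    ]


# Hypothetical sample data in the node/relationship dict format.
sample_nodes = [
    {"id": "SCROOGE", "properties": {"title": "SCROOGE"}},
    {"id": "FRED", "properties": {"title": "FRED"}},
]
sample_edges = [
    {"start": "SCROOGE", "end": "FRED", "properties": {"weight": 1.0}},
    {"start": "SCROOGE", "end": "MARLEY", "properties": {"weight": 2.0}},
]

# Only the SCROOGE -> MARLEY edge is reported, because MARLEY has no node.
dangling = find_dangling_edges(sample_nodes, sample_edges)
```

The same call works on the real data, e.g. `find_dangling_edges(w.nodes, w.edges)`.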

## Configure data-driven visualization

The additional properties can be used to configure the visualization for different use cases.

In [36]:
# show title on the node
w.node_label_mapping = "title"


# map community to a color
def community_to_color(community):
    """Map a community to a color."""
    colors = [
        "crimson",
        "darkorange",
        "indigo",
        "cornflowerblue",
        "cyan",
        "teal",
        "green",
    ]
    return (
        colors[int(community) % len(colors)] if community is not None else "lightgray"
    )


def edge_to_source_community(edge):
    """Get the community of the source node of an edge."""
    source_node = next(
        (entry for entry in w.nodes if entry["properties"]["title"] == edge["start"]),
        None,
    )
    # an edge may reference a node that is missing from the node list
    if source_node is None:
        return None
    return source_node["properties"]["community"]


w.node_color_mapping = lambda node: community_to_color(node["properties"]["community"])
w.edge_color_mapping = lambda edge: community_to_color(edge_to_source_community(edge))
# map size data to a reasonable factor
w.node_scale_factor_mapping = lambda node: 0.5 + node["properties"]["size"] * 1.5 / 20
# use weight for edge thickness
w.edge_thickness_factor_mapping = "weight"
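
The size mapping above is a linear rescaling of the `size` property. As a sketch, the same idea can be written as a named function that also clamps unexpected values; the function name, the parameter defaults, and the assumption that sizes fall roughly in the range 0 to 20 are illustrative, not taken from the library.

```python
def size_to_scale(size, max_size=20, min_scale=0.5, max_scale=2.0):
    """Map a node size linearly onto [min_scale, max_scale].

    Missing sizes fall back to min_scale; sizes outside [0, max_size] are clamped.
    """
    if size is None:
        return min_scale
    clamped = min(max(float(size), 0.0), float(max_size))
    return min_scale + clamped * (max_scale - min_scale) / max_size
```

For sizes within the assumed range this yields the same values as the lambda above, and it could be plugged in as `w.node_scale_factor_mapping = lambda node: size_to_scale(node["properties"]["size"])`.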

## Automatic layouts

The widget provides different automatic layouts that serve different purposes: `Circular`, `Hierarchic`, `Organic` (interactive or static), `Orthogonal`, `Radial`, `Tree`, `Geo-spatial`.

For the knowledge graph, this sample uses the `Circular` layout, though `Hierarchic` or `Organic` are also suitable choices.

In [37]:
# Use the circular layout for this visualization. For larger graphs, the default organic layout is often preferable.
w.circular_layout()

## Display the graph

In [38]:
display(w)

GraphWidget(layout=Layout(height='800px', width='100%'))

# Visualizing the result context of `graphrag` queries

The result context of `graphrag` queries makes it possible to inspect the context graph of the request. This data can likewise be visualized as a graph with `yfiles-jupyter-graphs`.

## Making the request

The following cell sets up Local Search.

In [39]:
# setup (see also ../../local_search.ipynb)
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

entities = read_indexer_entities(entity_df, entity_embedding_df, COMMUNITY_LEVEL)

description_embedding_store = LanceDBVectorStore(
    collection_name="entity_description_embeddings",
)
description_embedding_store.connect(db_uri=LANCEDB_URI)
entity_description_embeddings = store_entity_semantic_embeddings(
    entities=entities, vectorstore=description_embedding_store
)
covariate_df = pd.read_parquet(f"{INPUT_DIR}/{COVARIATE_TABLE}.parquet")
claims = read_indexer_covariates(covariate_df)
covariates = {"claims": claims}
report_df = pd.read_parquet(f"{INPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")
reports = read_indexer_reports(report_df, entity_df, COMMUNITY_LEVEL)
text_unit_df = pd.read_parquet(f"{INPUT_DIR}/{TEXT_UNIT_TABLE}.parquet")
text_units = read_indexer_text_units(text_unit_df)

api_key = os.getenv("GRAPHRAG_API_KEY")
llm_model = os.getenv("GRAPHRAG_LLM_MODEL")
llm_deployment = os.getenv("GRAPHRAG_LLM_DEPLOYMENT")
embedding_model = os.getenv("GRAPHRAG_EMBEDDING_MODEL")
embedding_deployment = os.getenv("GRAPHRAG_EMBEDDING_DEPLOYMENT")
api_base = os.getenv("GRAPHRAG_API_BASE")
api_version = os.getenv("GRAPHRAG_API_VERSION")

llm = ChatOpenAI(
    api_key=api_key,
    model=llm_model,
    deployment_name=llm_deployment,
    api_version=api_version,
    api_type=OpenaiApiType.AzureOpenAI,  # OpenaiApiType.OpenAI or OpenaiApiType.AzureOpenAI
    api_base=api_base,
    max_retries=20,
)

token_encoder = tiktoken.get_encoding("cl100k_base")

text_embedder = OpenAIEmbedding(
    api_key=api_key,
    api_base=api_base,
    api_version=api_version,
    api_type=OpenaiApiType.AzureOpenAI,
    model=embedding_model,
    deployment_name=embedding_deployment,
    max_retries=20,
)

context_builder = LocalSearchMixedContext(
    community_reports=reports,
    text_units=text_units,
    entities=entities,
    relationships=relationships,
    covariates=covariates,
    entity_text_embeddings=description_embedding_store,
    embedding_vectorstore_key=EntityVectorStoreKey.ID,  # if the vectorstore uses entity title as ids, set this to EntityVectorStoreKey.TITLE
    text_embedder=text_embedder,
    token_encoder=token_encoder,
)

local_context_params = {
    "text_unit_prop": 0.5,
    "community_prop": 0.1,
    "conversation_history_max_turns": 5,
    "conversation_history_user_turns_only": True,
    "top_k_mapped_entities": 10,
    "top_k_relationships": 10,
    "include_entity_rank": True,
    "include_relationship_weight": True,
    "include_community_rank": False,
    "return_candidate_context": False,
    "embedding_vectorstore_key": EntityVectorStoreKey.ID,  # set this to EntityVectorStoreKey.TITLE if the vectorstore uses entity title as ids
    "max_tokens": 12_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)
}

llm_params = {
    "max_tokens": 2_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 1000-1500)
    "temperature": 0.0,
}

search_engine = LocalSearch(
    llm=llm,
    context_builder=context_builder,
    token_encoder=token_encoder,
    llm_params=llm_params,
    context_builder_params=local_context_params,
    response_type="multiple paragraphs",  # free form text describing the response type and format, can be anything, e.g. prioritized list, single paragraph, multiple paragraphs, multiple-page report
)
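
The `text_unit_prop` and `community_prop` values in `local_context_params` can be read as shares of the `max_tokens` context budget. The sketch below illustrates that arithmetic only. It assumes that whatever is left over goes to entity and relationship context, which is an assumption about the context builder and is not confirmed in this notebook; `split_token_budget` is a hypothetical helper.

```python
def split_token_budget(max_tokens, text_unit_prop, community_prop):
    """Split a context token budget by proportion (illustrative only).

    Assumption: the remainder after text units and community reports is
    available for entity/relationship context.
    """
    if text_unit_prop + community_prop > 1:
        msg = "text_unit_prop and community_prop must not sum to more than 1"
        raise ValueError(msg)
    text_unit_tokens = int(max_tokens * text_unit_prop)
    community_tokens = int(max_tokens * community_prop)
    return {
        "text_units": text_unit_tokens,
        "community_reports": community_tokens,
        "entities_and_relationships": max_tokens
        - text_unit_tokens
        - community_tokens,
    }


# Values taken from local_context_params in this notebook.
budget = split_token_budget(12_000, text_unit_prop=0.5, community_prop=0.1)
```

With the settings used here, half of the budget goes to text units and a tenth to community reports, which helps when scaling `max_tokens` down for a model with a smaller context window.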


## Run local search on sample queries

In [40]:
result = await search_engine.asearch("Tell me about Scrooge?")
print(result.response)

# Scrooge: A Character Study

## Introduction

Ebenezer Scrooge is the central character in the narrative, known for his initial miserly and cynical disposition towards Christmas and life in general. His journey from a cold-hearted miser to a warm, generous individual is a pivotal element of the story, showcasing themes of redemption, transformation, and the true spirit of Christmas.

## Scrooge's Initial Cynicism

At the beginning of the narrative, Scrooge is depicted as a miserly and uncharitable man who detests Christmas, often referring to it as a "humbug." His disdain for the holiday is deeply rooted in his past experiences and personal reflections, many of which are centered around Christmas. This negative outlook is evident in his interactions with others, including his refusal to donate to the poor and his harsh treatment of his clerk, Bob Cratchit [Data: Relationships (158); Sources (3)].

## Transformation Through Supernatural Encounters

Scrooge's transformation begins with 

## Inspecting the context data used to generate the response

In [41]:
result.context_data["entities"].head()

|  | id | entity | description | number of relationships | in_context |
|---|---|---|---|---|---|
| 0 | 176 | GROCER |  | 1 | True |
| 1 | 170 | FRUITERER'S | The Fruiterer's is a shop that is radiant with... | 1 | True |
| 2 | 75 | POULTERERS' AND GROCERS' TRADES | Shops that are part of a splendid joke and a g... | 1 | True |
| 3 | 163 | PLENTY'S HORN | Plenty's Horn is a symbolic reference to abund... | 1 | True |
| 4 | 87 | APOSTLES | Figures depicted on the Dutch tiles in Scrooge... | 0 | True |


In [42]:
result.context_data["relationships"].head()

|  | id | source | target | description | weight | rank | links | in_context |
|---|---|---|---|---|---|---|---|---|
| 0 | 163 | SCROOGE | GENTLEMAN | Scrooge and the gentleman have a notable inter... | 2.0 | 118 | 3 | True |
| 1 | 223 | SCROOGE | INFAMOUS RESORT | Scrooge visits an infamous part of the town | 1.0 | 118 | 3 | True |
| 2 | 205 | SCROOGE | FOUNDER OF THE FEAST | Scrooge is referred to as the Founder of the F... | 1.0 | 117 | 3 | True |
| 3 | 261 | CHRISTMAS | GROCER'S | The Grocer's is nearly closed but filled with ... | 1.0 | 32 | 2 | True |
| 4 | 263 | CHRISTMAS | FRUITERER'S | The Fruiterer's is radiant with fruits and oth... | 1.0 | 32 | 2 | True |
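
To get a feel for how the context relationships connect entities before drawing them, the rows can be grouped by source. This is a minimal sketch on plain dicts; `neighbors_by_entity` is a hypothetical helper, and the rows are copied from the relationships output above with the other columns omitted.

```python
from collections import defaultdict


def neighbors_by_entity(rows):
    """Group relationship rows into a mapping of source -> [(target, weight)]."""
    adjacency = defaultdict(list)
    for row in rows:
        adjacency[row["source"]].append((row["target"], row["weight"]))
    return dict(adjacency)


rows = [
    {"source": "SCROOGE", "target": "GENTLEMAN", "weight": 2.0},
    {"source": "SCROOGE", "target": "INFAMOUS RESORT", "weight": 1.0},
    {"source": "SCROOGE", "target": "FOUNDER OF THE FEAST", "weight": 1.0},
    {"source": "CHRISTMAS", "target": "GROCER'S", "weight": 1.0},
    {"source": "CHRISTMAS", "target": "FRUITERER'S", "weight": 1.0},
]

adjacency = neighbors_by_entity(rows)
```

On the real result, rows in this shape can be obtained with `result.context_data["relationships"].to_dict("records")`.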


In [43]:
result = await search_engine.asearch("what is the data you have")
print(result.response)



### Data Overview

I have access to several datasets related to "A Christmas Carol" and Project Gutenberg. Below is a summary of the available data:

#### Entities
This dataset includes various characters and entities from "A Christmas Carol" and Project Gutenberg. Each entity has a unique ID, a description, and the number of relationships it has with other entities.

- **Characters from "A Christmas Carol"**: Includes main characters like Ebenezer Scrooge, Bob Cratchit, and the Three Spirits, as well as minor characters like the Charwoman and Peter.
- **Project Gutenberg Entities**: Includes Project Gutenberg itself, its trademark, license, and key figures like Michael S. Hart and Suzanne Shell.

#### Relationships
This dataset outlines the connections between different entities. Each relationship has a source, target, description, weight, rank, and links.

- **Character Interactions**: Details how characters in "A Christmas Carol" interact with each other, such as Scrooge's interacti

In [44]:
result = await search_engine.asearch("who is the main character of this story?")
print(result.response)



The main character of "A Christmas Carol" is Ebenezer Scrooge. Scrooge is a miserly and cold-hearted old man who undergoes a profound transformation over the course of the story. His journey from a life of greed and isolation to one of generosity and warmth is central to the narrative.

### Scrooge's Relationships and Interactions

Scrooge's interactions with other characters highlight his initial disdain for Christmas and human connection. For instance, his nephew Fred consistently invites him to Christmas dinner, embodying the spirit of generosity and familial love, despite Scrooge's initial rejections [Data: Relationships (74, 155); Entities (16, 52)]. Scrooge's relationship with his sister Fan, who brings him home for Christmas, also underscores the familial bonds that he has neglected [Data: Relationships (129); Entities (30)].

### Transformation on Christmas Day

The pivotal moment in Scrooge's transformation occurs on Christmas Day, following his encounters with the spirits. Th

In [45]:
result = await search_engine.asearch("How is the main character connected to other characters in the story?")
print(result.response)

# Connections of the Main Character in "A Christmas Carol"

## Ebenezer Scrooge's Relationships

Ebenezer Scrooge, the main character in "A Christmas Carol," is intricately connected to various other characters, each relationship highlighting different aspects of his personality and journey.

### Scrooge and Fezziwig

One of the most significant relationships in Scrooge's past is with Fezziwig, his former employer during his apprenticeship. Fezziwig is depicted as a jovial and benevolent figure who hosts lively Christmas Eve celebrations, contrasting sharply with Scrooge's later disdain for the holiday. This relationship is crucial in understanding Scrooge's initial exposure to the festive and generous spirit, which he later rejects [Data: Entities (37); Relationships (137, 139, 195, 326)].

### Scrooge and Fred

Fred, Scrooge's nephew, represents the warmth and familial love that Scrooge has distanced himself from. Despite Scrooge's unpleasant nature and his disdain for Christmas, Fre

In [52]:
result = await search_engine.asearch("Did someone become richer by someone becoming poorer?")
print(result.response)



The concept of someone becoming richer at the expense of someone else becoming poorer is often discussed in economic and social contexts. In the data provided, there is a specific instance that illustrates this dynamic.

### The Case of the Woman and the Deceased Man

In the provided data, there is a claim involving a woman who admitted to stealing bed-curtains, blankets, and a shirt from a deceased man's room [Data: Claims (77)]. This act of theft directly benefits the woman at the expense of the deceased man's estate, illustrating a situation where one party becomes richer (in terms of material possessions) by making another party poorer.

### Relationships and Interactions

The woman has interactions with two other entities, Joe and an unhappy man. Joe appraises and buys items brought by the woman, which likely includes the stolen goods [Data: Relationships (93)]. This transaction further enriches Joe, who profits from reselling the items, while the original owner (the deceased man)

## Visualizing the result context as a graph

In [46]:
"""
Helper function to visualize the result context with `yfiles-jupyter-graphs`.

The dataframes are converted into supported nodes and relationships lists and then passed to yfiles-jupyter-graphs.
Additionally, some values are mapped to visualization properties.
"""


def show_graph(result):
    """Visualize the result context with yfiles-jupyter-graphs."""
    from yfiles_jupyter_graphs import GraphWidget

    if (
        "entities" not in result.context_data
        or "relationships" not in result.context_data
    ):
        msg = "The passed results do not contain 'entities' or 'relationships'"
        raise ValueError(msg)

    # converts the entities dataframe to a list of dicts for yfiles-jupyter-graphs
    def convert_entities_to_dicts(df):
        """Convert the entities dataframe to a list of dicts for yfiles-jupyter-graphs."""
        nodes_dict = {}
        for _, row in df.iterrows():
            # Create a dictionary for each row and collect unique nodes
            node_id = row["entity"]
            if node_id not in nodes_dict:
                nodes_dict[node_id] = {
                    "id": node_id,
                    "properties": row.to_dict(),
                }
        return list(nodes_dict.values())

    # converts the relationships dataframe to a list of dicts for yfiles-jupyter-graphs
    def convert_relationships_to_dicts(df):
        """Convert the relationships dataframe to a list of dicts for yfiles-jupyter-graphs."""
        relationships = []
        for _, row in df.iterrows():
            # Create a dictionary for each row
            relationships.append({
                "start": row["source"],
                "end": row["target"],
                "properties": row.to_dict(),
            })
        return relationships

    w = GraphWidget()
    # use the converted data to visualize the graph
    w.nodes = convert_entities_to_dicts(result.context_data["entities"])
    w.edges = convert_relationships_to_dicts(result.context_data["relationships"])
    w.directed = True
    # show title on the node
    w.node_label_mapping = "entity"
    # use weight for edge thickness
    w.edge_thickness_factor_mapping = "weight"
    display(w)



In [47]:
# show_graph(result)