# Entity Relationship Extraction

This guide explains the default implementation of Entity Relationship Extraction.

The component can be customized in several ways, including full replacement by any
implementation that follows the same protocol.
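
The guide does not spell out that protocol. The only evidence is how the extractor is used later: `extractor.invoke(df_text_units)` returns one graph per text unit. The sketch below therefore assumes the essential contract is an `invoke` method that takes the text units and returns a list of graphs. `GraphExtractor`, `KeywordExtractor` and `run` are hypothetical names. The toy extractor uses keyword co-occurrence instead of an LLM, takes plain strings instead of a DataFrame, and returns dicts rather than real graph objects.

```python
from typing import Any, Protocol


class GraphExtractor(Protocol):
    """Hypothetical shape of the extractor protocol (an assumption, not the library's definition)."""

    def invoke(self, text_units: Any) -> list[Any]:
        ...


class KeywordExtractor:
    """Toy replacement: one 'graph' (a dict) per text unit, no LLM involved."""

    def __init__(self, keywords: list[str]) -> None:
        self.keywords = keywords

    def invoke(self, text_units: list[str]) -> list[dict[str, list]]:
        graphs = []
        for text in text_units:
            # Keep keywords in their configured order if they appear in the text.
            found = [k for k in self.keywords if k in text.upper()]
            # Connect every pair of keywords that co-occur in the same text unit.
            edges = [(a, b) for i, a in enumerate(found) for b in found[i + 1:]]
            graphs.append({"nodes": found, "edges": edges})
        return graphs


def run(extractor: GraphExtractor, text_units: list[str]) -> list[Any]:
    # Any object with a compatible `invoke` method can be passed here.
    return extractor.invoke(text_units)


graphs = run(
    KeywordExtractor(["SCROOGE", "MARLEY", "FRED"]),
    ["Scrooge and Marley", "Fred visits Scrooge"],
)
print(graphs)
```

Because `Protocol` uses structural typing, `KeywordExtractor` never has to inherit from `GraphExtractor`; matching the method shape is enough.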

In [1]:
import os

from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

True

## Load Sample TextUnits DataFrame

In [2]:
import pandas as pd

df_text_units = pd.read_parquet("sample-data/base_text_units.parquet")

# let's work only with a subset of the data
# for this guide to avoid any unnecessary LLM cost

df_text_units = df_text_units[0:3]

df_text_units.head()

|   | id | document_id | text_unit |
|---|----|-------------|-----------|
| 0 | f28e49bc-5b67-46b3-b971-6d6cb2832790 | a0192baf-d76a-40d4-bcd3-437127eef568 | A CHRISTMAS CAROL\n\n [Illustration: _"How... |
| 1 | 6fae26d7-9b26-4f79-ac78-970e69fcab95 | a0192baf-d76a-40d4-bcd3-437127eef568 | at the grindstone, Scrooge! a\nsqueezing, wre... |
| 2 | c93ae0c0-c8c3-49a9-beb0-a1e3b74efa0a | a0192baf-d76a-40d4-bcd3-437127eef568 | dismal? What reason have you to be morose? You... |


## The default implementation

In [3]:
from langchain_graphrag.indexing.graph_generation import EntityRelationshipExtractor

We first need to create an LLM to pass to `EntityRelationshipExtractor`.

In [4]:
from langchain_openai import ChatOpenAI
from langchain_community.cache import SQLiteCache

openai_api_key = os.getenv("LANGCHAIN_GRAPHRAG_OPENAI_CHAT_API_KEY", None)

if openai_api_key is None:
    raise ValueError("Please set the LANGCHAIN_GRAPHRAG_OPENAI_CHAT_API_KEY environment variable")

er_llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.0,
    api_key=openai_api_key,
    cache=SQLiteCache("openai_cache.db"),  # cache responses so re-runs don't repeat paid API calls
)

# `build_default` is a static method that builds the default extractor
extractor = EntityRelationshipExtractor.build_default(llm=er_llm)
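
The `SQLiteCache` above stores LLM responses in a local SQLite file, so re-running the notebook with identical prompts is answered from disk instead of the API. The stdlib sketch below only illustrates that idea; it is not LangChain's implementation (which keys entries on the prompt together with a string describing the model settings). `cached_call` and `fake_llm` are hypothetical names, and an in-memory database stands in for the file.

```python
import hashlib
import sqlite3
from typing import Callable


def cached_call(db: sqlite3.Connection, llm: Callable[[str], str], prompt: str) -> str:
    """Return the stored response for `prompt`, calling `llm` only on a cache miss."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    row = db.execute("SELECT response FROM cache WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return row[0]  # cache hit: no model call
    response = llm(prompt)
    db.execute("INSERT INTO cache (key, response) VALUES (?, ?)", (key, response))
    db.commit()
    return response


# ":memory:" keeps the demo self-contained; a file path would persist across runs.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, response TEXT)")

calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)  # record every real "model" call
    return prompt.upper()


first = cached_call(db, fake_llm, "extract entities from: Scrooge")
second = cached_call(db, fake_llm, "extract entities from: Scrooge")
print(len(calls))  # 1
```

The second call is served from the table, so `fake_llm` runs only once.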

We now run the extractor on the DataFrame.

In [5]:
text_unit_graphs = extractor.invoke(df_text_units)

Extracting entities and relationships ...: 100%|██████████| 3/3 [00:00<00:00, 20.16it/s]


Let's see how many nodes and edges we got for each text unit.

In [6]:
for index, g in enumerate(text_unit_graphs):
    print("---------------------------------")
    print(f"Graph: {index}")
    print(f"Number of nodes - {len(g.nodes)}")
    print(f"Number of edges - {len(g.edges)}")
    print(g.nodes())
    print(g.edges())
    print("---------------------------------")

---------------------------------
Graph: 0
Number of nodes - 16
Number of edges - 9
['A CHRISTMAS CAROL', 'CHARLES DICKENS', 'EBENEZER SCROOGE', 'MARLEY', 'BOB CRATCHIT', 'TIM CRATCHIT', 'MR. FEZZIWIG', 'FRED', 'GHOST OF CHRISTMAS PAST', 'GHOST OF CHRISTMAS PRESENT', 'GHOST OF CHRISTMAS YET TO COME', 'JACOB MARLEY', 'MRS. CRATCHIT', 'BELLE', 'DICK WILKINS', 'MRS. FEZZIWIG']
[('EBENEZER SCROOGE', 'MARLEY'), ('EBENEZER SCROOGE', 'FRED'), ('EBENEZER SCROOGE', 'BOB CRATCHIT'), ('EBENEZER SCROOGE', 'GHOST OF CHRISTMAS PAST'), ('EBENEZER SCROOGE', 'GHOST OF CHRISTMAS PRESENT'), ('EBENEZER SCROOGE', 'GHOST OF CHRISTMAS YET TO COME'), ('EBENEZER SCROOGE', 'MR. FEZZIWIG'), ('EBENEZER SCROOGE', 'BELLE'), ('BOB CRATCHIT', 'TIM CRATCHIT')]
---------------------------------
---------------------------------
Graph: 1
Number of nodes - 4
Number of edges - 4
['SCROOGE', "SCROOGE'S NEPHEW", 'CHRISTMAS', 'COUNTING-HOUSE']
[('SCROOGE', "SCROOGE'S NEPHEW"), ('SCROOGE', 'CHRISTMAS'), ('SCROOGE', 'COUNTING-

Let's look at the attributes of one node and one edge.

In [7]:
# Nodes carry `type`, `description` and `text_unit_ids` as attributes
text_unit_graphs[0].nodes["EBENEZER SCROOGE"]

{'type': 'PERSON',
 'description': ['Ebenezer Scrooge is the main character in A Christmas Carol, depicted as a miserly old man who undergoes a profound transformation.'],
 'text_unit_ids': ['f28e49bc-5b67-46b3-b971-6d6cb2832790']}
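
Both `description` and `text_unit_ids` are lists even though this node comes from a single text unit, which suggests they are meant to accumulate when graphs from several text units are combined. This guide does not cover that combining step; the sketch below is an assumption showing one way to do it, with plain dicts standing in for the per-text-unit graphs. `merge_nodes` and the sample data are hypothetical.

```python
from collections import defaultdict


def merge_nodes(graphs: list[dict[str, dict]]) -> dict[str, dict]:
    """Combine node attributes from several per-text-unit graphs (hypothetical helper)."""
    merged: dict[str, dict] = defaultdict(
        lambda: {"type": None, "description": [], "text_unit_ids": []}
    )
    for nodes in graphs:
        for name, attrs in nodes.items():
            entry = merged[name]
            entry["type"] = entry["type"] or attrs["type"]  # keep the first type seen
            entry["description"].extend(attrs["description"])
            entry["text_unit_ids"].extend(attrs["text_unit_ids"])
    return dict(merged)


# Toy data shaped like the node attributes above.
graph_a = {"SCROOGE": {"type": "PERSON", "description": ["A miserly old man."], "text_unit_ids": ["tu-1"]}}
graph_b = {
    "SCROOGE": {"type": "PERSON", "description": ["Dismisses Christmas as humbug."], "text_unit_ids": ["tu-2"]},
    "MARLEY": {"type": "PERSON", "description": ["Scrooge's late partner."], "text_unit_ids": ["tu-2"]},
}
merged = merge_nodes([graph_a, graph_b])
print(merged["SCROOGE"]["text_unit_ids"])  # ['tu-1', 'tu-2']
```

Merging by exact name does not unify name variants such as `EBENEZER SCROOGE` in graph 0 and `SCROOGE` in graph 1 above; resolving those would need a separate step.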

In [8]:
# You will see that every edge has `weight`, `description` and `text_unit_ids` as attributes
text_unit_graphs[0].edges[('EBENEZER SCROOGE', 'MARLEY')]

{'weight': 2.0,
 'description': ["Marley is the ghost of Scrooge's former business partner, who warns him about his selfish ways and the consequences of his actions",
  'Marley warns Scrooge about the chains he will wear if he does not change his ways, establishing a direct connection between their fates'],
 'text_unit_ids': ['f28e49bc-5b67-46b3-b971-6d6cb2832790']}
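
This edge has a `weight` of 2.0 and two descriptions, all from a single text unit. The guide does not explain how that arises. One plausible reading, which is an assumption, is that the LLM reported the Scrooge/Marley relationship twice and the duplicates were folded into one edge by appending descriptions and adding weights. The stdlib sketch below shows that folding on plain dicts; `collapse_relationships` is a hypothetical helper and the per-record weights of 1.0 are invented.

```python
def collapse_relationships(
    records: list[tuple[str, str, str, float]], text_unit_id: str
) -> dict[tuple[str, ...], dict]:
    """Fold duplicate relationship records into one edge per node pair (hypothetical helper)."""
    edges: dict[tuple[str, ...], dict] = {}
    for source, target, description, weight in records:
        # Treat edges as undirected: (A, B) and (B, A) map to the same key.
        key = tuple(sorted((source, target)))
        edge = edges.setdefault(
            key, {"weight": 0.0, "description": [], "text_unit_ids": [text_unit_id]}
        )
        edge["weight"] += weight  # assumption: repeated mentions add up
        edge["description"].append(description)
    return edges


# Toy records; the weights are made up for illustration.
records = [
    ("EBENEZER SCROOGE", "MARLEY", "Marley is the ghost of Scrooge's former business partner", 1.0),
    ("MARLEY", "EBENEZER SCROOGE", "Marley warns Scrooge about the chains he will wear", 1.0),
]
edges = collapse_relationships(records, "tu-1")
print(edges[("EBENEZER SCROOGE", "MARLEY")]["weight"])  # 2.0
```

If the returned graphs are undirected NetworkX graphs, as the `nodes`/`edges` API used here suggests, the sorted key is only needed for plain dicts: an undirected `Graph` resolves `edges[(u, v)]` and `edges[(v, u)]` to the same edge.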