# Convert JSON to GraphDocument and insert into Neo4j -- Default System Prompt

- The graph is extracted with the help of an LLM, so we limit ourselves to a small amount of data.
- We let the LLM create the graph schema using the `default system prompt`.
- We only show an example of creating the schema for `EntityType="Individual"`; the reader is encouraged to repeat the example for `EntityType="Entity"` as an exercise.



**Important Note:**

```python
# Allowed nodes and relationships
allowed_nodes = ["Person", "Alias", "Address", "Program", "IdentityDocument"]
allowed_relationships = ["HAS_ALIAS", "HAS_ADDRESS", "SANCTIONED_BY", "HAS_DOCUMENT"]

# LLM setup
llm = ChatOpenAI(temperature=0, model_name="gpt-4o")

# LLMGraphTransformer
llm_transformer = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=allowed_nodes,
    allowed_relationships=allowed_relationships,
    node_properties=True,
    relationship_properties=True
)
```


- **Allowed Nodes & Relationships:** We explicitly define which node types (`Person`, `Alias`, `Address`, etc.) and relationship types (`HAS_ALIAS`, `HAS_ADDRESS`, etc.) can be extracted.
- **Property Extraction:** Setting `node_properties=True` and `relationship_properties=True` lets the LLM extract whatever relevant attributes it finds for each node and relationship (passing a list of property names instead restricts extraction to those properties).
- **Controlled Graph Generation:** By restricting the structure, we prevent unwanted or irrelevant node types and relationships from being created.
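To make the effect of the allowed lists concrete, here is a small sketch of schema filtering applied after extraction: nodes whose type is not allowed are dropped, and so is any relationship with a disallowed type or a dropped endpoint. LangChain's `LLMGraphTransformer` applies a comparable filter itself when `strict_mode=True` (its default); this sketch only illustrates the idea and is not its implementation. The `Node`, `Relationship`, and `GraphDoc` classes are hypothetical, simplified stand-ins for LangChain's graph-document types, and the toy data is made up.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for LangChain's Node / Relationship / GraphDocument.
@dataclass(frozen=True)
class Node:
    id: str
    type: str

@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str

@dataclass
class GraphDoc:
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

def restrict_to_schema(doc, allowed_nodes, allowed_relationships):
    """Keep only nodes and relationships whose types are in the allowed lists.

    A relationship is also dropped when either endpoint was filtered out,
    so the result never contains dangling edges.
    """
    nodes = [n for n in doc.nodes if n.type in allowed_nodes]
    kept = set(nodes)  # frozen dataclasses are hashable
    relationships = [
        r for r in doc.relationships
        if r.type in allowed_relationships and r.source in kept and r.target in kept
    ]
    return GraphDoc(nodes=nodes, relationships=relationships)

def count_types(doc):
    """Count node labels and relationship types as a quick sanity check."""
    return Counter(n.type for n in doc.nodes), Counter(r.type for r in doc.relationships)

# Toy input (hypothetical data): "Vessel" and "OWNS" are outside the allowed schema.
allowed_nodes = ["Person", "Alias", "Address", "Program", "IdentityDocument"]
allowed_relationships = ["HAS_ALIAS", "HAS_ADDRESS", "SANCTIONED_BY", "HAS_DOCUMENT"]
person = Node("John Doe", "Person")
alias = Node("J. Doe", "Alias")
vessel = Node("MV Example", "Vessel")
doc = GraphDoc(
    nodes=[person, alias, vessel],
    relationships=[Relationship(person, alias, "HAS_ALIAS"), Relationship(person, vessel, "OWNS")],
)

clean = restrict_to_schema(doc, allowed_nodes, allowed_relationships)
print(count_types(clean))
```

Counting types like this is also a cheap way to inspect what the LLM produced before anything is written to Neo4j.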


```python
# Step 2: Insert new graph data
graph.add_graph_documents(graph_documents, baseEntityLabel=False, include_source=False)
```
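Conceptually, `add_graph_documents` writes each extracted node with a Cypher `MERGE` keyed on the node's `id`, and `baseEntityLabel` decides which label that `MERGE` matches on. The sketch below builds simplified versions of both variants as strings. The names `ENTITY_CONSTRAINT` and `node_merge_cypher` are hypothetical, and the queries LangChain actually generates differ (for example, they batch many rows per query).

```python
# Simplified, illustrative Cypher builders; not the exact queries LangChain generates.

# With a shared __Entity__ label, a single uniqueness constraint on __Entity__.id
# (which Neo4j backs with an index) can serve every MERGE, whatever the node type.
ENTITY_CONSTRAINT = "CREATE CONSTRAINT IF NOT EXISTS FOR (e:`__Entity__`) REQUIRE e.id IS UNIQUE"

def node_merge_cypher(node_type: str, base_entity_label: bool) -> str:
    """Return a MERGE statement for one node of type `node_type`, keyed on $id."""
    if base_entity_label:
        # Match on the shared, constrained label, then attach the specific type label.
        return (
            "MERGE (n:`__Entity__` {id: $id}) "
            f"SET n:`{node_type}` "
            "SET n += $properties"
        )
    # Match directly on the type label: no extra label and no extra index.
    return f"MERGE (n:`{node_type}` {{id: $id}}) SET n += $properties"

print(node_merge_cypher("Person", base_entity_label=False))
print(node_merge_cypher("Person", base_entity_label=True))
```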

- **`baseEntityLabel=False`** (to prevent unnecessary indexing):  
  - If `True`, adds a secondary `__Entity__` label to every node.  
  - This label is indexed, which speeds up the import.  
  - **We set it to `False` to keep our database cleaner and avoid extra indexing.** Since we only load a small amount of data, the slower import without this index is not a concern here.

- **`include_source=False`** (to avoid `MENTIONS` relationships in our graph):  
  - If `True`, stores the original source document and links it to the created nodes via `MENTIONS` relationships.  
  - This helps trace extracted information back to its origin.  
  - If no explicit `id` is available in the source metadata, an MD5 hash of `page_content` is used as the id for merging.  
  - **Since we do not want `MENTIONS` in our graph, we explicitly set it to `False`.**
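The id fallback described for `include_source=True` can be sketched in a few lines: use `metadata["id"]` when it is set, otherwise the MD5 hex digest of `page_content`, and link that Document to each extracted node with a `MENTIONS` edge. The helper names `source_document_id` and `mentions_edges` are hypothetical, and the edges are plain tuples rather than anything LangChain returns.

```python
import hashlib

def source_document_id(page_content: str, metadata: dict) -> str:
    """Id for the source Document node: metadata["id"] if set, else MD5 of page_content."""
    if metadata.get("id"):
        return str(metadata["id"])
    return hashlib.md5(page_content.encode("utf-8")).hexdigest()

def mentions_edges(doc_id: str, node_ids: list) -> list:
    """(Document)-[:MENTIONS]->(node) edges that include_source=True would add, as tuples."""
    return [(doc_id, "MENTIONS", node_id) for node_id in node_ids]

record_text = '{"name": "John Doe"}'  # hypothetical JSON record, as produced by json.dumps
doc_id = source_document_id(record_text, {})
print(doc_id)
print(mentions_edges(doc_id, ["John Doe"]))
```

The `Document` objects built in this notebook carry no metadata, so with `include_source=True` each record's hash would become its id, and two byte-identical records would merge into a single Document node.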




```python
import sys
import os

from dotenv import load_dotenv

# Make the parent directory importable and load credentials from ../.env
sys.path.append(os.path.abspath('..'))
load_dotenv('../.env', override=True)
```

```
True
```
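The `True` printed by `load_dotenv` only tells us that variables were loaded from `../.env`; it does not confirm that the keys the next cell relies on are present. In the next cell, only `OPENAI_API_KEY` lacks a fallback (the Neo4j settings default to local values), so a small fail-fast check for it can save a confusing error later. `REQUIRED_VARS` and `missing_env_vars` are hypothetical names used only for this sketch.

```python
import os

# Only OPENAI_API_KEY lacks a fallback in the next cell; the Neo4j settings default to local values.
REQUIRED_VARS = ("OPENAI_API_KEY",)

def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Explicit dicts keep this example runnable anywhere.
print(missing_env_vars(env={}))
print(missing_env_vars(env={"OPENAI_API_KEY": "sk-placeholder"}))
```

In the notebook itself you would call `missing_env_vars()` with no arguments and raise an error if the returned list is non-empty.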

```python
import os
import json
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI
from langchain_core.documents import Document
from langchain_neo4j import Neo4jGraph
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

# Load environment variables (ChatOpenAI also reads OPENAI_API_KEY from the environment itself)
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
NEO4J_URI = os.getenv("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.getenv("NEO4J_USER", "neo4j")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD", "password")

# Initialize Neo4jGraph (LangChain handles the connection)
graph = Neo4jGraph(
    url=NEO4J_URI,
    username=NEO4J_USER,
    password=NEO4J_PASSWORD,
    enhanced_schema=True,
)

# Step 1: Delete the old graph before inserting new data
# (warning: this removes every node and relationship in the database)
graph.query("MATCH (n) DETACH DELETE n")
print("Old graph deleted.")

# Allowed nodes and relationships
allowed_nodes = ["Person", "Alias", "Address", "Program", "IdentityDocument"]
allowed_relationships = ["HAS_ALIAS", "HAS_ADDRESS", "SANCTIONED_BY", "HAS_DOCUMENT"]

# LLM setup (temperature=0 for more repeatable extractions)
llm = ChatOpenAI(temperature=0, model_name="gpt-4o")

# LLMGraphTransformer
llm_transformer = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=allowed_nodes,
    allowed_relationships=allowed_relationships,
    node_properties=True,
    relationship_properties=True
)

# Load JSON data: the list of individual records
with open("ofac_data_small.json", "r", encoding="utf-8") as f:
    data = json.load(f)["individuals"]

# Convert one text snippet into a list of GraphDocuments (one LLM call)
def process_text(text: str):
    doc = Document(page_content=text)
    return llm_transformer.convert_to_graph_documents([doc])

# Transform each JSON record with LLMGraphTransformer, running up to 10 calls in parallel
graph_documents = []
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(process_text, json.dumps(entity)) for entity in data]
    for future in tqdm(as_completed(futures), total=len(futures), desc="Processing documents"):
        graph_documents.extend(future.result())

# Step 2: Insert new graph data
graph.add_graph_documents(graph_documents, baseEntityLabel=False, include_source=False)

print("New graph data successfully added to Neo4j!")
```

```
Old graph deleted.
Processing documents: 100%|██████████| 50/50 [01:26<00:00,  1.74s/it]
New graph data successfully added to Neo4j!
```
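In the loop above, `future.result()` re-raises the first exception from any worker, so a single failed LLM call (a rate limit, say) aborts the cell before `add_graph_documents` runs, and `graph_documents` ends up in completion order rather than input order. A more forgiving variant is sketched below. `process_all` and `fake_process` are hypothetical names; `fake_process` only stands in for `process_text` so the example runs without an API key.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_all(items, process, max_workers=10):
    """Run `process` on every item in parallel.

    Returns (results, failures): results keeps the input order (None where a
    call failed), failures maps item index -> exception, so one bad call does
    not discard the work already done.
    """
    results = [None] * len(items)
    failures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_index = {executor.submit(process, item): i for i, item in enumerate(items)}
        for future in as_completed(future_to_index):
            i = future_to_index[future]
            try:
                results[i] = future.result()
            except Exception as exc:  # e.g. rate limits or malformed LLM output
                failures[i] = exc
    return results, failures

def fake_process(text):
    """Stand-in for process_text: returns a one-element list, fails on 'bad' input."""
    if "bad" in text:
        raise ValueError("could not parse")
    return [text.upper()]

results, failures = process_all(["a", "bad", "c"], fake_process, max_workers=2)
# Flatten the per-record lists, skipping failed records.
flattened = [doc for r in results if r is not None for doc in r]
print(flattened)
print(sorted(failures))
```

In the notebook, the equivalent call would be `process_all([json.dumps(e) for e in data], process_text)`; the indices in `failures` point at the records worth retrying before calling `add_graph_documents`.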
