- In this notebook, we walk through building a property graph with Neo4j. The setup installs the Ollama and HuggingFace integrations, but the cells below use OpenAI for both the LLM and the embedding model
- Specifically, we will be using the `SchemaLLMPathExtractor`, which lets us specify an exact schema: the possible entity types, the possible relation types, and how they can be connected
- This is useful when you have a specific graph you want to build and want to limit what the LLM is allowed to predict

### Setup

In [1]:
%pip install llama-index
%pip install llama-index-llms-ollama
%pip install llama-index-embeddings-huggingface
# Graph store integrations (this notebook only uses Neo4j)
%pip install llama-index-graph-stores-neo4j
%pip install llama-index-graph-stores-nebula

### Load Data:

In [2]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2025-06-23 17:59:39--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’


2025-06-23 17:59:40 (2,05 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



In [15]:
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

### Graph Construction:
- To construct our graph, we are going to use the `SchemaLLMPathExtractor`
- Given a schema for the graph, it extracts only entities and relations that follow that schema, rather than letting the LLM choose entity and relation types freely
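
To make the idea of schema-constrained extraction concrete, here is a minimal pure-Python sketch of filtering candidate triples against a schema. It is an illustration only, not the logic `SchemaLLMPathExtractor` actually runs: the `Triple` type, the `keep_valid` helper, and the sample triples are all hypothetical, and the check (entity types must be known, and the relation must be allowed for the subject's type) is a simplifying assumption.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical stand-in for one extracted (subject, relation, object) path."""
    subject: str
    subject_type: str
    relation: str
    obj: str
    obj_type: str


# Same shape as the validation schema used in this notebook:
# entity type -> relations that entity type may take part in.
VALIDATION_SCHEMA = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}


def keep_valid(triples, schema):
    """Keep triples whose entity types are in the schema and whose
    relation is allowed for the subject's entity type (assumed rule)."""
    valid = []
    for t in triples:
        if t.subject_type not in schema or t.obj_type not in schema:
            continue  # entity type outside the schema
        if t.relation not in schema[t.subject_type]:
            continue  # relation not allowed for this subject type
        valid.append(t)
    return valid


# Made-up candidates, as an LLM might propose them before validation.
candidates = [
    Triple("Paul Graham", "PERSON", "WORKED_AT", "Interleaf", "ORGANIZATION"),
    Triple("Interleaf", "ORGANIZATION", "WORKED_ON", "Cambridge", "PLACE"),
    Triple("Paul Graham", "PERSON", "LIKES", "Lisp", "LANGUAGE"),
]

valid_triples = keep_valid(candidates, VALIDATION_SCHEMA)
for t in valid_triples:
    print(f"{t.subject} -> {t.relation} -> {t.obj}")
```

Only the first candidate survives: the second uses a relation that `ORGANIZATION` does not allow, and the third uses an entity type and a relation that are not in the schema at all.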

In [16]:
import nest_asyncio

nest_asyncio.apply()

In [17]:
from typing import Literal
from llama_index.llms.ollama import Ollama
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# best practice to use upper-case
entities = Literal['PERSON', 'PLACE', 'ORGANIZATION']
relations = Literal['HAS', 'PART_OF', 'WORKED_ON', 'WORKED_WITH', 'WORKED_AT']

# define which entities can have which relations
validation_schema = {
    'PERSON': ['HAS', 'PART_OF', 'WORKED_ON', 'WORKED_WITH', 'WORKED_AT'],
    'PLACE': ['HAS', 'PART_OF', 'WORKED_AT'],
    'ORGANIZATION': ['HAS', 'PART_OF', 'WORKED_WITH'],
}

kg_extractor = SchemaLLMPathExtractor(
    # `llm` is created in the OpenAI setup cell further down; run that cell first
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    # if False, allows for values outside of the schema,
    # which is useful for using the schema as a suggestion
    strict=True,
)
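
Because the entity types, relation types, and validation schema are written out by hand, a typo in one of them is easy to miss. The following sketch checks that the three definitions agree with each other before they are handed to the extractor. The `schema_problems` helper is hypothetical and not part of LlamaIndex; it relies only on `typing.get_args` from the standard library.

```python
from typing import Literal, get_args

entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}


def schema_problems(entities, relations, schema):
    """Return a list of human-readable inconsistencies (empty if none)."""
    known_entities = set(get_args(entities))
    known_relations = set(get_args(relations))
    problems = []
    for entity, allowed in schema.items():
        if entity not in known_entities:
            problems.append(f"unknown entity type: {entity}")
        for rel in allowed:
            if rel not in known_relations:
                problems.append(f"unknown relation for {entity}: {rel}")
    # Entity types that were declared but never given any relations.
    for entity in sorted(known_entities - set(schema)):
        problems.append(f"entity type has no allowed relations: {entity}")
    return problems


problems = schema_problems(entities, relations, validation_schema)
print(problems or "schema is consistent")
```

Running a check like this once, right after defining the schema, costs nothing and catches mismatches before any LLM calls are made.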

In [6]:
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

pg_store = Neo4jPropertyGraphStore(
    username='neo4j',
    password='llamaindex',
    url="bolt://localhost:7689",
    database='nd168.ver1'
)
vec_store = None


In [7]:
import os
import sys
from pathlib import Path
sys.path.append(str(Path(os.getcwd()).parent.parent))

from dotenv import load_dotenv
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

load_dotenv(override=True)

llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.getenv('OPENAI_API_KEY'),  # uses OPENAI_API_KEY env var by default
)
embed_model = OpenAIEmbedding(
    model='text-embedding-3-large',
    api_key=os.getenv('OPENAI_API_KEY')
)
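
`os.getenv` returns `None` when a variable is missing, so a forgotten `.env` entry only surfaces later as an authentication error. A small guard can fail earlier with a clearer message. This is a sketch under assumptions: `require_env` is a hypothetical helper, and `EXAMPLE_API_KEY` is a placeholder name used only so the snippet runs on its own.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "add it to your .env file or export it in your shell"
        )
    return value


# Placeholder variable so this example is self-contained.
os.environ.setdefault("EXAMPLE_API_KEY", "sk-demo")
api_key = require_env("EXAMPLE_API_KEY")
print(f"loaded a key of length {len(api_key)}")
```

In the cell above, the same helper could be called as `require_env("OPENAI_API_KEY")` after `load_dotenv(override=True)` has run.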


In [8]:
from llama_index.core import PropertyGraphIndex

index = PropertyGraphIndex.from_documents(
    documents, 
    kg_extractors=[kg_extractor],
    embed_model=embed_model,
    property_graph_store=pg_store,
    vector_store=vec_store
)


### Querying:
- Now that our graph is created, we can query it
- As is the theme with this notebook, we will be using a lower-level API and constructing all our retrievers ourselves

In [5]:
from llama_index.core.indices.property_graph import (
    LLMSynonymRetriever,
    VectorContextRetriever,
)

llm_synonym = LLMSynonymRetriever(
    index.property_graph_store,
    llm=llm,
    include_text=False,
)

vector_context = VectorContextRetriever(
    index.property_graph_store,
    embed_model=embed_model,
    include_text=False
)

In [22]:
retriever = index.as_retriever(
    sub_retrievers=[
        llm_synonym,
        vector_context,
    ]
)
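
When several sub-retrievers are combined, the same triple can come back from more than one of them. The sketch below shows one plausible way to merge such results: keep each distinct triple once, with its highest score. It does not reproduce how the retriever returned by `index.as_retriever` merges results internally; `merge_results` is a hypothetical helper and the hit lists and scores are made up for illustration.

```python
def merge_results(*result_lists):
    """Merge (text, score) pairs, keeping the highest score per distinct text,
    and return them sorted by score in descending order."""
    best = {}
    for results in result_lists:
        for text, score in results:
            if text not in best or score > best[text]:
                best[text] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Illustrative hits: one list per sub-retriever.
synonym_hits = [
    ("Paul Graham -> WORKED_AT -> Interleaf", 1.0),
    ("Interleaf -> HAS -> HTML", 1.0),
]
vector_hits = [
    ("Paul Graham -> WORKED_AT -> Interleaf", 0.7),
    ("Paul Graham -> WORKED_ON -> Apple II", 0.6),
]

merged = merge_results(synonym_hits, vector_hits)
for text, score in merged:
    print(f"{score:.3f}  {text}")
```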

In [24]:
nodes = retriever.retrieve("What happened at Interleaf?")

for node in nodes:
    print(node)

Node ID: ad875c75-54aa-4edb-b739-e693e940fb8f
Text: Interleaf -> HAS -> Microsoft Word
Score:  1.000

Node ID: 8fd4805a-9524-4f11-a7eb-0ebbf3fff8e6
Text: Interleaf -> WORKED_WITH -> smart people
Score:  1.000

Node ID: 08cab4b8-cc7c-41b1-931b-bb1bfa495e44
Text: Interleaf -> HAS -> Lisp hacker
Score:  1.000

Node ID: 9db5bd1c-28cc-469c-884a-57e8ef6fe120
Text: Interleaf -> HAS -> HTML
Score:  1.000

Node ID: df9c77fc-b91a-4864-92f7-39a8437f45e8
Text: Paul Graham -> WORKED_AT -> Interleaf
Score:  1.000

Node ID: e974e3fb-8c28-4e1f-b18b-d806d70b05bd
Text: Jessica Livingston -> WORKED_AT -> Boston VC firm
Score:  0.613

Node ID: e5fae6af-c90f-4650-acd3-787b058a9f4f
Text: Paul Graham -> WORKED_WITH -> Yahoo's boss
Score:  0.600

Node ID: 09c6e51a-8607-495c-b146-51d1ea549ca3
Text: Paul Graham -> WORKED_ON -> Apple II
Score:  0.600
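
The retrieved nodes are plain `subject -> relation -> object` strings with a score, so they are easy to post-process. As a sketch, the snippet below parses a few of the results printed above into structured tuples and keeps only the high-scoring ones that mention Interleaf. `ScoredTriple` and `parse_triple` are hypothetical helpers, the 0.9 threshold is an arbitrary choice, and splitting on `->` assumes entity names never contain that sequence.

```python
from typing import NamedTuple


class ScoredTriple(NamedTuple):
    subject: str
    relation: str
    obj: str
    score: float


def parse_triple(text: str, score: float) -> ScoredTriple:
    """Parse 'subject -> relation -> object' into a ScoredTriple."""
    parts = [part.strip() for part in text.split("->")]
    if len(parts) != 3:
        raise ValueError(f"expected 'subject -> relation -> object', got: {text!r}")
    return ScoredTriple(parts[0], parts[1], parts[2], score)


# A few (text, score) pairs copied from the retriever output above.
results = [
    ("Interleaf -> HAS -> Microsoft Word", 1.000),
    ("Paul Graham -> WORKED_AT -> Interleaf", 1.000),
    ("Jessica Livingston -> WORKED_AT -> Boston VC firm", 0.613),
    ("Paul Graham -> WORKED_ON -> Apple II", 0.600),
]

triples = [parse_triple(text, score) for text, score in results]

# Keep confident triples where Interleaf is the subject or the object.
about_interleaf = [
    t for t in triples
    if "Interleaf" in (t.subject, t.obj) and t.score >= 0.9
]
for t in about_interleaf:
    print(t)
```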



#### Load the index from an existing graph

In [2]:
from llama_index.core import PropertyGraphIndex

index = PropertyGraphIndex.from_existing(
    property_graph_store=pg_store,
    # reuse the same models that were used to build the graph
    llm=llm,
    embed_model=embed_model,
)

In [6]:
query_engine = index.as_query_engine(
    sub_retrievers=[
        llm_synonym,
        vector_context,
    ],
    llm=llm
)

response = query_engine.query("What happened at Interleaf?")

print(str(response))

Interleaf was associated with Microsoft Word and had a team that included smart individuals and a Lisp hacker. Paul Graham worked there and was involved in projects related to the Apple II.
