# Neo4j Hello World (Notebook) - Friends Use Case

This notebook connects to a local Neo4j **Community** instance (via Docker), creates a tiny graph, and queries it.

**Assumes**

- A Neo4j service is running at `bolt://localhost:${URI_PORT}`, with the user and password set in the `.env` file. **Run `docker compose up -d`**.
- An Ollama service is up on `http://localhost:11434` (the Ollama default). **Run `ollama serve` and pull the model with `ollama pull nomic-embed-text`** (if not pulled yet).
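
Before running the cells, it can help to confirm that both services are reachable. The following is a minimal sketch using only the standard library; the helper name `is_port_open` is hypothetical, and port 7687 is assumed to be the value of `URI_PORT` (it is Neo4j's default Bolt port). It only checks that something accepts TCP connections on the port, not that credentials or models are correct.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed ports: 7687 (Neo4j Bolt default) and 11434 (Ollama default).
    for name, port in {"Neo4j (Bolt)": 7687, "Ollama": 11434}.items():
        status = "reachable" if is_port_open("localhost", port) else "NOT reachable"
        print(f"{name} on port {port}: {status}")
```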



In [None]:
# Dependencies

import os
from dotenv import load_dotenv  
import yaml
from pathlib import Path
from pprint import pprint
from termcolor import cprint
from langchain_neo4j import Neo4jGraph

from helper_neo4j import vectorize_property
from helper_neo4j import neo4j_KGRAG_search


In [2]:
# Environment variables

load_dotenv()  # Load local environment variables

URI = "bolt://localhost:" + os.environ.get("URI_PORT", "7687")  # fall back to Neo4j's default Bolt port if unset
NEO4J_USER = os.environ.get("NEO4J_USER")
NEO4J_PWD = os.environ.get("NEO4J_PASSWORD")
NEO4J_DB = os.getenv("NEO4J_DATABASE", "neo4j")    # 👈 choose DB here

cprint(f"Connecting to Neo4j at {URI} with user {NEO4J_USER} and password {NEO4J_PWD}", "green")

Connecting to Neo4j at bolt://localhost:7687 with user neo4j and password test1234


In [3]:
# Load cypher queries

queries = yaml.safe_load(Path("queries_friends.yaml").read_text())
queries.keys()  # list available queries

dict_keys(['constraints', 'create_seed', 'show_people', 'show_companies', 'match_adjacency', 'show_text', 'add_text', 'create_vector_indexes', 'delete_all'])

In [4]:
# Neo4j Langchain wrapper instance

kg = Neo4jGraph(url=URI, username=NEO4J_USER, password=NEO4J_PWD, database=NEO4J_DB)

### 1. Create data

<p align="center">
  <img src="media/KG_step1_populate_graph.svg" width="550">
</p>


- **Entities**: Person and Company nodes with unique constraints (unique name and uuid)
- **Relationships**: KNOWS (person-to-person) and WORKS_AT (person-to-company)
- **Properties**: Basic attributes (name, age, education, industry)
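
The actual Cypher statements live in `queries_friends.yaml`, which is not shown here. As a rough illustration of the schema described above, the sketch below shows what constraint and seed statements of this kind could look like. The constraint names, the `CONSTRAINTS` and `CREATE_SEED` variables, and the `run_all` helper are all hypothetical; the sample values are taken from the query output shown later in this notebook.

```python
# Hypothetical Cypher for the schema above; the real statements are in
# queries_friends.yaml and may differ.
CONSTRAINTS = [
    "CREATE CONSTRAINT person_name_unique IF NOT EXISTS "
    "FOR (p:Person) REQUIRE p.name IS UNIQUE",
    "CREATE CONSTRAINT company_name_unique IF NOT EXISTS "
    "FOR (c:Company) REQUIRE c.name IS UNIQUE",
]

# MERGE keeps the statement idempotent: re-running it does not duplicate data.
CREATE_SEED = """
MERGE (iria:Person {name: 'Iria'})
  SET iria.age = 27, iria.education = 'Physics'
MERGE (paula:Person {name: 'Paula'})
  SET paula.age = 25, paula.education = 'Computer Engineering'
MERGE (indra:Company {name: 'Indra'})
  SET indra.industry = 'Engineering'
MERGE (iria)-[:KNOWS]->(paula)
MERGE (iria)-[:WORKS_AT]->(indra)
"""


def run_all(runner, statements):
    """Run one statement (str) or several (list of str) through `runner`."""
    if isinstance(statements, str):
        statements = [statements]
    return [runner(statement) for statement in statements]
```

With the notebook's `kg` object, this would be used as `run_all(kg.query, CONSTRAINTS)` followed by `run_all(kg.query, CREATE_SEED)`. Accepting both a string and a list mirrors the fact that some YAML keys here are iterated over while others are passed directly.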


In [5]:
# Populate graph

cprint(f"\nConnected to Neo4j database: {NEO4J_DB}", "green")

cprint("\nCreating constraints (if not exist)", "green")
for q in queries["constraints"]:
    kg.query(q)

cprint("\nInit Cleanup.", "green")
for q in queries["delete_all"]:
    kg.query(q)
    
cprint("\nCreate data", "green")
kg.query(queries["create_seed"])




Connected to Neo4j database: neo4j

Creating constraints (if not exist)

Init Cleanup.

Create data


[]

In [6]:
# Example cypher queries

cprint("\nQuery: list all people", "green")
records = kg.query(queries["show_people"]) # <class 'list'>
for r in records:
    print(r)
    
cprint("\nQuery: list all companies", "green")
records = kg.query(queries["show_companies"]) # <class 'list'>
for r in records:
    print(r)

cprint("\nQuery: adjacency (who knows whom)", "green")
records = kg.query(queries["match_adjacency"]) # <class 'list'>
for r in records:
    print(r)

Query: list all people
{'name': 'Paula', 'age': 25, 'p.gender': 'female', 'education': 'Computer Engineering'}
{'name': 'Guillermo', 'age': 26, 'p.gender': 'male', 'education': 'Industrial Engineering'}
{'name': 'Gabriela', 'age': 26, 'p.gender': 'female', 'education': 'Physics'}
{'name': 'Iria', 'age': 27, 'p.gender': 'female', 'education': 'Physics'}
{'name': 'Cristina', 'age': 27, 'p.gender': 'female', 'education': 'Physics'}

Query: list all companies
{'name': 'Indra', 'industry': 'Engineering'}
{'name': 'CIEMAT', 'industry': 'Scientific Research'}
{'name': 'CBM', 'industry': 'Scientific Research'}

Query: adjacency (who knows whom)
{'person': 'Cristina', 'knows': ['Gabriela', 'Iria'], 'works_at': 'CBM'}
{'person': 'Gabriela', 'knows': ['Cristina', 'Iria'], 'works_at': 'CIEMAT'}
{'person': 'Guillermo', 'knows': ['Iria', 'Paula'], 'works_at': 'Indra'}
{'person': 'Iria', 'knows': ['Paula', 'Gabriela', 'Cristina', 'Guillermo'], 'works_at': 'Indra'}
{'person

### 2. Add rich text info

<p align="center">
  <img src="media/KG_step2_generate_rich_descriptions.svg" width="750">
</p>


In [7]:
# Add rich text descriptions 

cprint("\nQuery: Adding descriptions, appearance and summaries", "green")
for q in queries["add_text"]:
    kg.query(q)

records = kg.query(queries["show_text"])
for r in records:
    print(r)


Query: Adding descriptions, appearance and summaries
{'labels(n)': ['Person'], 'n.name': 'Iria', 'n.text': 'Iria is a female of 27 years old and studied Physics.Iria has blue eyes and long brunette and wavy hair. She likes to paint her nails in red or purple colours. She usually wears long earrings.'}
{'labels(n)': ['Person'], 'n.name': 'Guillermo', 'n.text': 'Guillermo is a male of 26 years old and studied Industrial Engineering.Guillermo has brown eyes and short hair. He has a very fancy shirt that he takes to all important events. He shaved his head this summer.'}
{'labels(n)': ['Person'], 'n.name': 'Gabriela', 'n.text': "Gabriela is a female of 26 years old and studied Physics.Gabriela has long curly hair with babylights. She's petite and likes to wear hippie-style clothes."}
{'labels(n)': ['Person'], 'n.name': 'Paula', 'n.text': 'Paula is a female of 25 years old and studied Computer Engineering.Paula short hair in a wolfcut style. She wears long and wide pants and sneak

### 3. Create property embeddings (first step into RAG) 

<p align="center">
  <img src="media/KG_step3_generate_property_embeddings.svg" width="750">
</p>

Implementing **RAG** requires selecting a **property to embed and use for similarity searches**.

Description properties containing **rich text** work well for this purpose, as they provide richer semantic information. In our example, we'll use *text*.

To do so, we create vector indexes in Neo4j on the ***embedding*** property:

- **Vector index *person_node_text_idx***: for nodes labeled "Person"
- **Vector index *company_node_text_idx***: for nodes labeled "Company"
- **Vector index *knows_relationship_text_idx***: for relationships of type "KNOWS"

After that, we create the embeddings (for Person nodes, Company nodes, and KNOWS relationships):

- **Property *text*** ---`nomic-embed-text`---> **Property *embedding***
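
The index definitions themselves come from the `create_vector_indexes` entry in `queries_friends.yaml`, which is not shown. As a sketch of what such statements could look like, the hypothetical helper below builds a `CREATE VECTOR INDEX` statement for either a node label or a relationship type. It assumes 768-dimensional vectors (the output size of `nomic-embed-text`; verify this against the model you use) and cosine similarity; both are assumptions, not values read from the notebook's YAML.

```python
def vector_index_cypher(name, element, label_or_type,
                        prop="embedding", dims=768, similarity="cosine"):
    """Build a Cypher CREATE VECTOR INDEX statement (illustrative only)."""
    if element == "node":
        pattern = f"(e:{label_or_type})"
    elif element == "relationship":
        pattern = f"()-[e:{label_or_type}]-()"
    else:
        raise ValueError("element must be 'node' or 'relationship'")
    return (
        f"CREATE VECTOR INDEX {name} IF NOT EXISTS\n"
        f"FOR {pattern} ON (e.{prop})\n"
        "OPTIONS {indexConfig: {\n"
        f"  `vector.dimensions`: {dims},\n"
        f"  `vector.similarity_function`: '{similarity}'\n"
        "}}"
    )


if __name__ == "__main__":
    print(vector_index_cypher("person_node_text_idx", "node", "Person"))
    print(vector_index_cypher("knows_relationship_text_idx", "relationship", "KNOWS"))
```

The resulting strings could be passed to `kg.query(...)` in the same way as the YAML-defined queries. The index dimension must match the length of the stored embedding vectors, otherwise the vectors will not be indexed.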


In [8]:
# Create vector indexes

for q in queries["create_vector_indexes"]:
    kg.query(q)

# Show created vector indexes
results = kg.query("SHOW VECTOR INDEXES")
idx = list(results)
cprint(f"\nFound {len(idx)} vector index entries.", "green")
for r in idx:
    cprint("-"*20,"green")
    pprint(r)

Found 4 vector index entries.
--------------------
{'entityType': 'NODE',
 'id': 5,
 'indexProvider': 'vector-2.0',
 'labelsOrTypes': ['Chunk'],
 'lastRead': neo4j.time.DateTime(2025, 9, 23, 12, 39, 1, 589000000, tzinfo=<UTC>),
 'name': 'chunks_node_text_idx',
 'owningConstraint': None,
 'populationPercent': 100.0,
 'properties': ['embedding'],
 'readCount': 1,
 'state': 'ONLINE',
 'type': 'VECTOR'}
--------------------
{'entityType': 'NODE',
 'id': 16,
 'indexProvider': 'vector-2.0',
 'labelsOrTypes': ['Company'],
 'lastRead': None,
 'name': 'company_node_text_idx',
 'owningConstraint': None,
 'populationPercent': 100.0,
 'properties': ['embedding'],
 'readCount': None,
 'state': 'ONLINE',
 'type': 'VECTOR'}
--------------------
{'entityType': 'RELATIONSHIP',
 'id': 15,
 'indexProvider': 'vector-2.0',
 'labelsOrTypes': ['KNOWS'],
 'lastRead': None,
 'name': 'knows_relationship_text_idx',
 'owningConstraint': None,
 'populationPercent': 100.0,
 'pro

In [None]:
# Create property embeddings 

# (p:Person): create embeddings only for nodes missing them
vectorize_property(runner = kg.query,
                   element = "node", 
                   node_label = "Person",
                   source_property = "text"
                   )

# (c:Company): create embeddings only for nodes missing them
vectorize_property(runner = kg.query,
                   element = "node", 
                   node_label = "Company", 
                   source_property = "text",
                   )

# [r:KNOWS]: create embeddings only for relationships missing them
vectorize_property(runner = kg.query,
                   element = "relationship",
                   rel_type = "KNOWS",
                   source_property = "text"
                   )

Generating embeddings for (n:Person) on n.text

Generating embeddings
  input text: 'Iria is a female of 27 years old and studied Physi'...
  emb vec: [0.031421136, 0.032104157, -0.15766615, -0.02647035, -0.007289977, 0.053833045, 0.028187405, -0.04293681, 0.020300867, 0.038549334]

Generating embeddings
  input text: 'Guillermo is a male of 26 years old and studied In'...
  emb vec: [-0.023262369, 0.05322417, -0.15030223, 0.0006475813, -0.024272997, 0.068250276, 0.06251268, -0.028424125, -0.00060151465, 0.016591795]

Generating embeddings
  input text: 'Gabriela is a female of 26 years old and studied P'...
  emb vec: [0.028215451, 0.02993047, -0.18000375, 0.032270715, -0.040491913, 0.081315376, 0.0066461083, -0.050384577, -0.04074791, -0.018135754]

Generating embeddings
  input text: 'Paula is a female of 25 years old and studied Comp'...
  emb vec: [0.011417965, 0.040603343, -0.16852917, -0.01312405, -0.036922883, 0.08387193, -0.0141

### 4. Search 

Whenever we query this graph, we can use two different but complementary search techniques:

1. **KG Retrieval**: with the **Neo4j Cypher Query Language (CQL)**, we can query precise entities and relationships. The input question must first be translated into CQL to get the desired results.

2. **Vector Retrieval**: by **embedding the input query**, we can run a similarity search against the vector indexes defined above.

The results will be a combination of both searches.
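
To make the vector retrieval step concrete, here is a minimal pure-Python sketch of what a similarity search does conceptually: score every stored embedding against the query embedding with cosine similarity and keep the `top_k` best matches. This is an illustration only. The implementation of `neo4j_KGRAG_search` is not shown in this notebook, and inside Neo4j the ranking is performed by the vector index (for example through the `db.index.vector.queryNodes` procedure), not by Python code like this. The function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, items, k=5):
    """Return the k (score, text) pairs most similar to query_vec.

    `items` is an iterable of (text, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-d vectors standing in for real embeddings.
    docs = [("doc A", [1.0, 0.0]), ("doc B", [0.0, 1.0]), ("doc C", [1.0, 1.0])]
    for score, text in top_k([1.0, 0.0], docs, k=2):
        print(f"{score:.3f}  {text}")
```

In a KG-RAG setup, the texts returned by this kind of ranking are typically concatenated into a context string for the language model, which matches the `combined_context` key visible in the search output.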

<p align="center">
  <img src="media/KGRAG_schema.svg">
</p>


In [None]:
# KG RAG Search

# Query Nodes
result = neo4j_KGRAG_search(runner = kg.query,
                             element = "node",
                             query = "Who shaved its head this summer?", 
                             index = "person_node_text_idx",
                             source_property = "text",
                             top_k = 5
                             )
pprint(result)


result  = neo4j_KGRAG_search(runner = kg.query,
                                element = "node",
                                query = "Which company investigates Cancer?",
                                index = "company_node_text_idx",
                                source_property = "text",
                                top_k = 5
                              )
pprint(result)


# Query Relationships
result  = neo4j_KGRAG_search(runner = kg.query,
                              element = "relationship",
                              query = "Who is helping Iria at work?",
                              index = "knows_relationship_text_idx",
                              source_property = "text",
                              top_k = 5
                              )
pprint(result)


Generating embeddings
  input text: 'Who shaved its head this summer?'...
  emb vec: [0.022573026, -0.015476549, -0.17212495, 0.0018683294, -0.038688686, 0.044817436, 0.01842398, 0.013745231, 0.050375726, 0.008345599]

Running vector search query
{'combined_context': '\n'
                     '\n'
                     ' Guillermo is a male of 26 years old and studied '
                     'Industrial Engineering.Guillermo has brown eyes and '
                     'short hair. He has a very fancy shirt that he takes to '
                     'all important events. He shaved his head this summer.\n'
                     '\n'
                     ' Gabriela is a female of 26 years old and studied '
                     'Physics.Gabriela has long curly hair with babylights. '
                     "She's petite and likes to wear hippie-style clothes.\n"
                     '\n'
                     ' Cristina is a female of 27 years old and studied '
                  