# Neo4j Hello World (Notebook) - Friends Use Case

This notebook connects to a local Neo4j **Community** instance (via Docker), creates a tiny graph, enriches it with text and location properties, embeds the text with Ollama, and queries it.

**Assumes**

- A Neo4j service is running at `bolt://localhost:${URI_PORT}`, with the user and password set in the `.env` file. **Run `docker compose up -d`** to start it.
- An Ollama service is running at `http://localhost:11434` (the Ollama default). **Run `ollama serve`**, then **pull the embedding model with `ollama pull nomic-embed-text`** if you have not pulled it yet.
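
Before connecting, it can help to confirm that the `.env` values this notebook relies on are actually present. The following is a minimal sketch, not part of the notebook's helpers: the function name `build_bolt_uri` and the error message are hypothetical, and it assumes the variable names `URI_PORT`, `NEO4J_USER`, and `NEO4J_PASSWORD`.

```python
import os

# Variable names assumed to be defined in the .env file
REQUIRED_VARS = ("URI_PORT", "NEO4J_USER", "NEO4J_PASSWORD")


def build_bolt_uri(env=os.environ):
    """Return the Bolt URI, failing early if a required variable is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return f"bolt://localhost:{env['URI_PORT']}"


# Example with explicit values instead of a real .env file
print(build_bolt_uri({"URI_PORT": "7687", "NEO4J_USER": "neo4j", "NEO4J_PASSWORD": "secret"}))
```

Failing early with a named variable tends to be easier to debug than a connection error raised later by the driver.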



In [None]:
# Dependencies

import os
from dotenv import load_dotenv  
import yaml
from pathlib import Path
from pprint import pprint
from termcolor import cprint
from langchain_neo4j import Neo4jGraph
from neo4j import GraphDatabase

from helpers import helper_folium, helper_leaflet, helper_neo4j


In [None]:
# Environment variables

load_dotenv()  # Load local environment variables

URI = f"bolt://localhost:{os.environ['URI_PORT']}"  # raises KeyError if URI_PORT is not set
NEO4J_USER = os.environ.get("NEO4J_USER")
NEO4J_PWD = os.environ.get("NEO4J_PASSWORD")
NEO4J_DB = os.getenv("NEO4J_DATABASE", "neo4j")    # 👈 choose DB here

cprint(f"Connecting to Neo4j at {URI}.", "green")

In [None]:
# Load cypher queries

queries = yaml.safe_load(Path("data/friends/queries_friends.yaml").read_text())
queries.keys()  # list available queries

In [None]:
# Neo4j Langchain wrapper instance
kg = Neo4jGraph(url=URI, username=NEO4J_USER, password=NEO4J_PWD, database=NEO4J_DB)

# Alternative: Neo4j Driver instance
driver = GraphDatabase.driver(uri=URI, auth=(NEO4J_USER, NEO4J_PWD))
with driver.session(database=NEO4J_DB) as session:
    # Fetch basic metadata about the selected database
    dbinfo = session.run("CALL db.info()").single()

### 1. Create data

<p align="center">
  <img src="media/KG_step1_populate_graph.svg" width="550">
</p>


- **Entities**: Person and Company nodes with uniqueness constraints on name and uuid
- **Relationships**: KNOWS (person-to-person) and WORKS_AT (person-to-company)
- **Properties**: Basic attributes (name, age, education, industry)
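
The Cypher statements themselves live in `data/friends/queries_friends.yaml`, which is not shown here. As a rough illustration of the kind of statements such a file could hold, the sketch below uses a plain Python dict with the same top-level keys that the populate cell reads (`constraints`, `delete_all`, `create_seed`). The constraint names, the sample people and company, and the `run_setup` helper are all hypothetical and are not the notebook's actual seed data.

```python
# Hypothetical stand-in for the YAML file: same keys, illustrative Cypher only.
example_queries = {
    "constraints": [
        "CREATE CONSTRAINT person_name_unique IF NOT EXISTS "
        "FOR (p:Person) REQUIRE p.name IS UNIQUE",
        "CREATE CONSTRAINT company_name_unique IF NOT EXISTS "
        "FOR (c:Company) REQUIRE c.name IS UNIQUE",
    ],
    "delete_all": ["MATCH (n) DETACH DELETE n"],
    "create_seed": (
        "MERGE (a:Person {name: 'Alice', age: 30}) "
        "MERGE (b:Person {name: 'Bob', age: 32}) "
        "MERGE (c:Company {name: 'Acme', industry: 'Research'}) "
        "MERGE (a)-[:KNOWS]->(b) "
        "MERGE (a)-[:WORKS_AT]->(c)"
    ),
}


def run_setup(runner, queries):
    """Run constraints, cleanup, and seed statements in that order.

    `runner` is any callable that takes a Cypher string, e.g. `kg.query`.
    """
    for statement in queries["constraints"]:
        runner(statement)
    for statement in queries["delete_all"]:
        runner(statement)
    runner(queries["create_seed"])


# Dry run: collect the statements instead of sending them to Neo4j.
executed = []
run_setup(executed.append, example_queries)
print(len(executed))
```

Passing the runner as an argument makes it possible to inspect the statements without a running database.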


In [None]:
# Populate graph

cprint(f"\nConnected to Neo4j database: {NEO4J_DB}", "green")

cprint("\nCreating constraints (if not exist)", "green")
for q in queries["constraints"]:
    kg.query(q)

cprint("\nInit Cleanup.", "green")
for q in queries["delete_all"]:
    kg.query(q)
    
cprint("\nCreate data", "green")
kg.query(queries["create_seed"])



In [None]:
# Example cypher queries

cprint("\nQuery: list all people", "green")
records = kg.query(queries["show_people"]) # <class 'list'>
for r in records:
    print(r)
    
cprint("\nQuery: list all companies", "green")
records = kg.query(queries["show_companies"]) # <class 'list'>
for r in records:
    print(r)

cprint("\nQuery: adjacency (who knows whom)", "green")
records = kg.query(queries["match_adjacency"]) # <class 'list'>
for r in records:
    print(r)

### 2.A Add rich text info

<p align="center">
  <img src="media/KG_step2_generate_rich_descriptions.svg" width="750">
</p>


In [None]:
# Add rich text descriptions 

cprint("\nQuery: Adding descriptions, appearance and summaries", "green")
for q in queries["add_text"]:
    kg.query(q)


In [None]:
# Show descriptions
records = kg.query(queries["show_text"])
for r in records:
    print(r)

### 2.B Add location info

In [None]:
# Add location property 

cprint("\nQuery: Adding location property", "green")
for q in queries["add_locations"]:
    kg.query(q)

In [None]:
# Show locations and plot maps
records = kg.query(queries["show_locations"])
# Expected row shape (uncomment to test the maps without querying the database)
# records = [
#     {"name":"Iria","lat":40.437596,"lon":-3.711223,"labels":["Person"]},
#     {"name":"Guillermo","lat":40.455022,"lon":-3.692355,"labels":["Person"]},
#     {"name":"Gabriela","lat":40.475721,"lon":-3.711451,"labels":["Person"]},
#     {"name":"Paula","lat":40.490170,"lon":-3.654654,"labels":["Person"]},
#     {"name":"Cristina","lat":40.367462,"lon":-3.597745,"labels":["Person"]},
#     {"name":"Indra","lat":40.396648,"lon":-3.624635,"labels":["Company"]},
#     {"name":"CIEMAT","lat":40.453938,"lon":-3.728925,"labels":["Company"]},
#     {"name":"CBM","lat":40.549613,"lon":-3.690136,"labels":["Company"]},
# ]
for r in records:
    print(r)

# Folium map
helper_folium.create_map_from_rows(records)

# Leaflet map
helper_leaflet.create_map_from_rows(records)

### 3. Create property embeddings (first step into RAG) 

<p align="center">
  <img src="media/KG_step3_generate_property_embeddings.svg" width="750">
</p>

A **RAG** implementation requires selecting a **property to embed and use for similarity searches**.

Properties that hold **rich text** work well for this purpose because they carry more semantic information than short attributes such as a name. In our example, we use the *text* property.

To do so, we create two vector indexes in Neo4j:

- **Vector index *person_node_info_idx***: based on property ***info_emb*** for nodes of type "Person"
- **Vector index *company_node_info_idx***: based on property ***info_emb*** for nodes of type "Company"

After that, we create the embeddings for Person nodes and Company nodes (the cell below also embeds `KNOWS` relationships):

- **Property *text*** ---`nomic-embed-text`---> **Property *embedding***
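
As a rough sketch of what this step involves, the snippet below builds a `CREATE VECTOR INDEX` statement and requests an embedding from Ollama's `/api/embeddings` endpoint. It is not the implementation of `helper_neo4j.vectorize_property`, nor the statements stored under `create_vector_indexes`, neither of which is shown here. The function names are hypothetical, the 768 dimensions assume `nomic-embed-text`, cosine similarity is an assumed choice, and the index syntax assumes a recent Neo4j 5 release.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama default host and port


def vector_index_statement(index_name, label, embedding_property, dimensions=768):
    """Build a Cypher statement that creates a node vector index."""
    return (
        f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{embedding_property}) "
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dimensions}, "
        "`vector.similarity_function`: 'cosine'}}"
    )


def embed_text(text, model="nomic-embed-text"):
    """Ask a running Ollama server for the embedding of `text`."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embedding"]


# Only builds the statement; nothing is sent to Neo4j or Ollama here.
print(vector_index_statement("person_node_info_idx", "Person", "info_emb"))
```

The index dimension has to match the length of the vectors the embedding model returns, which is why it is exposed as a parameter.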


In [None]:
# Create vector indexes

for q in queries["create_vector_indexes"]:
    kg.query(q)

# Show created vector indexes
results = kg.query("SHOW VECTOR INDEXES")
idx = list(results)
cprint(f"\nFound {len(idx)} vector index entries.", "green")
for r in idx:
    cprint("-"*20,"green")
    pprint(r)

In [None]:
# Create property embeddings

# (p:Person): create embeddings only for nodes missing them
helper_neo4j.vectorize_property(
    runner=kg.query,
    element="node",
    node_label="Person",
    source_property="text",
)

# (c:Company): create embeddings only for nodes missing them
helper_neo4j.vectorize_property(
    runner=kg.query,
    element="node",
    node_label="Company",
    source_property="text",
)

# [r:KNOWS]: create embeddings only for relationships missing them
helper_neo4j.vectorize_property(
    runner=kg.query,
    element="relationship",
    rel_type="KNOWS",
    source_property="text",
)

### 4. Search 

Whenever we query this graph, we can use two different but complementary search techniques:

1. **KG Retrieval**: with the **Neo4j Cypher query language**, we can query precise entities and relationships. The input question must first be translated into Cypher to get the desired results.

2. **Vector Retrieval**: by **embedding the input query**, we can run a similarity search against the vector indexes defined above.

The final result combines both searches.

<p align="center">
  <img src="media/KGRAG_schema.svg">
</p>
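
The vector half of this search can be sketched with Neo4j's built-in `db.index.vector.queryNodes` procedure. This is an illustration under assumptions, not the code of `helper_neo4j.neo4j_KGRAG_search`: the function name `vector_search` is hypothetical, the question embedding is assumed to be computed beforehand (for example with `nomic-embed-text`), and `runner` is assumed to accept a Cypher string plus a parameter dict, as `kg.query` does.

```python
def vector_search(runner, index, question_embedding, main_property="name",
                  source_property="text", top_k=5):
    """Return the top_k nodes closest to the question embedding.

    Property names cannot be passed as Cypher parameters, so they are
    interpolated into the query text; only use trusted values for them.
    """
    cypher = (
        "CALL db.index.vector.queryNodes($index, $k, $embedding) "
        "YIELD node, score "
        f"RETURN node.{main_property} AS name, "
        f"node.{source_property} AS text, score "
        "ORDER BY score DESC"
    )
    params = {"index": index, "k": top_k, "embedding": question_embedding}
    return runner(cypher, params)


# Dry run with a fake runner that returns canned rows instead of calling Neo4j.
def fake_runner(cypher, params):
    canned = [{"name": "Alice", "text": "Example description", "score": 0.91}]
    return canned[: params["k"]]


rows = vector_search(fake_runner, "person_node_idx", [0.1, 0.2, 0.3], top_k=1)
context = "\n".join(f"{row['name']}: {row['text']}" for row in rows)
print(context)
```

For relationship indexes, recent Neo4j 5 releases offer the analogous `db.index.vector.queryRelationships` procedure, which yields `relationship` and `score` instead of `node` and `score`.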


In [None]:
# KG RAG Search

# Query Nodes
result = helper_neo4j.neo4j_KGRAG_search(runner = kg.query,
                             query = "Who shaved its head this summer?", 
                             index = "person_node_idx",
                             source_property = "text",
                             main_property = "name",
                             top_k = 5
                             )
pprint(result, width = 200, sort_dicts=False, indent=2)
file = "data/friends/friends_context_1.txt"
with open(file, 'w', encoding='utf-8') as f:
    f.write(result.get("combined_context", ""))

result  = helper_neo4j.neo4j_KGRAG_search(runner = kg.query,
                                query = "Which company investigates Cancer?",
                                index = "company_node_idx",
                                source_property = "text",
                                main_property = "name",
                                top_k = 5
                              )
pprint(result, width = 200, sort_dicts=False, indent=2)
file = "data/friends/friends_context_2.txt"
with open(file, 'w', encoding='utf-8') as f:
    f.write(result.get("combined_context", ""))

In [None]:
# Query Relationships
result  = helper_neo4j.neo4j_KGRAG_search(runner = kg.query,
                              query = "Who is helping Iria at work?",
                              index = "know_relationship_idx",
                              source_property = "text",
                              main_property = "name",
                              top_k = 5
                              )
pprint(result, width = 200, sort_dicts=False)

file = "data/friends/friends_context_3.txt"
with open(file, 'w', encoding='utf-8') as f:
    f.write(result.get("combined_context", ""))
