# Neo4j Semantic Search Pipeline

This notebook implements a semantic search pipeline using Neo4j and Sentence Transformers. It covers:
1.  **Data Loading & Embedding:** Constructing semantic text from flight data and generating embeddings using three different models.
2.  **Index Creation:** Creating vector indices in Neo4j.
3.  **Search:** Executing a similarity search.

In [2]:
# Install necessary dependencies
!pip install neo4j sentence-transformers python-dotenv


In [None]:
import os
os.environ['NEO4J_URI'] = '---'
os.environ['NEO4J_USERNAME'] = '---'
os.environ['NEO4J_PASSWORD'] = '---'

## Part 1: Load Models & Process Data
This step fetches journey data, creates descriptive text, generates embeddings (MiniLM, MPNet, BGE-M3), and stores them in `JourneyVector` nodes.

In [5]:
from neo4j import GraphDatabase
from sentence_transformers import SentenceTransformer
import os

# ------------------------------
# 1. Load Embedding Models
# ------------------------------
print("Loading embedding models...")
# Load all three models so their embeddings can be compared later
model_minilm = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
model_mpnet = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2")
model_bge_m3 = SentenceTransformer("BAAI/bge-m3")

# ------------------------------
# 2. Neo4j Connection
# ------------------------------
uri = os.getenv('NEO4J_URI')
username = os.getenv('NEO4J_USERNAME')
password = os.getenv('NEO4J_PASSWORD')

driver = GraphDatabase.driver(uri, auth=(username, password))

# ------------------------------
# Helper: Semantic Text Builder
# ------------------------------
def build_semantic_text(record):
    """
    Constructs a qualitative narrative using the provided airline metrics.
    """
    # --- Delay Context ---
    delay = record['arrival_delay_minutes']
    # Metrics: avg ~ -1.4, max ~ 880. >60 is definitely severe.
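    if delay < 0:
        delay_desc = f"arrived early by {abs(delay)} minutes"
        punctuality = "highly punctual"
    elif delay == 0:
        # Avoid the odd phrasing "arrived early by 0 minutes"
        delay_desc = "arrived exactly on time"
        punctuality = "highly punctual"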
    elif delay <= 15:
        delay_desc = f"was roughly on time ({delay} min delay)"
        punctuality = "punctual"
    elif delay <= 60:
        delay_desc = f"had a moderate delay of {delay} minutes"
        punctuality = "delayed"
    else:
        delay_desc = f"suffered a severe delay of {delay} minutes"
        punctuality = "severely delayed"

    # --- Food Context ---
    # Metrics: avg ~ 2.8, max 5.
    score = record['food_satisfaction_score']
    if score <= 2:
        food_desc = "poor dining experience"
    elif score <= 3:  # also covers fractional scores between 2 and 3
        food_desc = "average dining experience"
    else:
        food_desc = "excellent dining experience"

    # --- Distance Context ---
    # Metrics: avg ~ 2237, max ~ 8440.
    miles = record['actual_flown_miles']
    if miles < 1000:
        haul = "short-haul"
    elif miles < 4000:
        haul = "medium-haul"
    else:
        haul = "long-haul"

    # --- Construct the Sentence ---
    # We embed "Concept" + "Details"
    text = (
        f"A {punctuality} {haul} flight. "
        f"The {record['passenger_class']} journey covered {miles} miles on a {record['fleet_type_description']} aircraft. "
        f"It {delay_desc}. "
        f"The passenger (Generation: {record['generation']}, Status: {record['loyalty_program_level']}) "
        f"reported a {food_desc} with a rating of {score}/5. "
        f"Route: {record['origin']} to {record['destination']}."
    )
    return text

# ------------------------------
# 3. Processing Pipeline
# ------------------------------
def process(tx):
    print("Fetching journey data...")
    # We grab the ID to link specifically to this Journey
    result = tx.run("""
        MATCH (p:Passenger)-[:TOOK]->(j:Journey)-[:ON]->(f:Flight)
        MATCH (f)-[:DEPARTS_FROM]->(dep:Airport)
        MATCH (f)-[:ARRIVES_AT]->(arr:Airport)
        RETURN
            j.feedback_ID AS feedback_ID,
            p.record_locator AS record_locator,
            p.generation AS generation,
            p.loyalty_program_level AS loyalty_program_level,
            j.food_satisfaction_score AS food_satisfaction_score,
            j.arrival_delay_minutes AS arrival_delay_minutes,
            j.actual_flown_miles AS actual_flown_miles,
            j.passenger_class AS passenger_class,
            f.fleet_type_description AS fleet_type_description,
            dep.station_code AS origin,
            arr.station_code AS destination
    """)
    
    records = list(result)
    print(f"Found {len(records)} journeys to embed.")

    for i, row in enumerate(records):
        # 1. Build rich text
        text = build_semantic_text(row)

        # 2. Generate embeddings
        emb_minilm = model_minilm.encode(text).tolist()
        emb_mpnet = model_mpnet.encode(text).tolist()
        emb_bge_m3 = model_bge_m3.encode(text).tolist()

        # 3. Store in SEPARATE Node (:JourneyVector)
        # We link via feedback_ID.
        tx.run("""
            MATCH (j:Journey {feedback_ID: $fid})
            
            // Create/Merge the Vector Node
            // We ID it by the journey ID + suffix to ensure 1:1 mapping
            MERGE (jv:JourneyVector {id: $fid + '_vec'})
            // SET applies on both create and match, so re-runs refresh the stored values
            SET
                jv.text = $text,
                jv.minilm_embedding = $e1,
                jv.mpnet_embedding = $e2,
                jv.bgem3_embedding = $e3
            
            // Link it
            MERGE (j)-[:HAS_VECTOR]->(jv)
        """, 
        fid=row['feedback_ID'],
        text=text,
        e1=emb_minilm, 
        e2=emb_mpnet, 
        e3=emb_bge_m3)

        if i % 50 == 0:
            print(f"Processed {i}/{len(records)}...")

with driver.session() as session:
    session.execute_write(process)

print("Done! Vectors stored in 'JourneyVector' nodes.")

Loading embedding models...
Fetching journey data...
Found 3132 journeys to embed.
Processed 0/3132...
Processed 50/3132...
Processed 100/3132...
Processed 150/3132...
Processed 200/3132...
Processed 250/3132...
Processed 300/3132...
Processed 350/3132...
Processed 400/3132...
Processed 450/3132...
Processed 500/3132...
Processed 550/3132...
Processed 600/3132...
Processed 650/3132...
Processed 700/3132...
Processed 750/3132...
Processed 800/3132...
Processed 850/3132...
Processed 900/3132...
Processed 950/3132...
Processed 1000/3132...
Processed 1050/3132...
Processed 1100/3132...
Processed 1150/3132...
Processed 1200/3132...
Processed 1250/3132...
Processed 1300/3132...
Processed 1350/3132...
Processed 1400/3132...
Processed 1450/3132...
Processed 1500/3132...
Processed 1550/3132...
Processed 1600/3132...
Processed 1650/3132...
Processed 1700/3132...
Processed 1750/3132...
Processed 1800/3132...
Processed 1850/3132...
Processed 1900/3132...
Processed 1950/3132...
Processed 2000/3132.
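
The loop in the cell above issues one write query per journey. A minimal sketch of a batched alternative follows, assuming the rows (feedback ID, text, and the three embeddings) have already been collected into dictionaries. `BATCH_QUERY`, `chunked`, and `write_batches` are hypothetical names that do not appear in the original notebook; the Cypher mirrors the per-row query but receives a list parameter and expands it with `UNWIND`, which usually reduces round trips.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical batched form of the per-row write query used in the cell above.
BATCH_QUERY = """
UNWIND $rows AS row
MATCH (j:Journey {feedback_ID: row.fid})
MERGE (jv:JourneyVector {id: row.fid + '_vec'})
SET jv.text = row.text,
    jv.minilm_embedding = row.e1,
    jv.mpnet_embedding = row.e2,
    jv.bgem3_embedding = row.e3
MERGE (j)-[:HAS_VECTOR]->(jv)
"""


def chunked(rows: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `rows` with at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def write_batches(tx, rows: List[Dict[str, Any]], batch_size: int = 100) -> int:
    """Run BATCH_QUERY once per chunk and return the number of rows sent.

    `tx` is any object with a run(query, **params) method, such as the
    transaction passed to a Neo4j transaction function.
    """
    sent = 0
    for batch in chunked(rows, batch_size):
        tx.run(BATCH_QUERY, rows=batch)
        sent += len(batch)
    return sent
```

With the notebook's driver this could be invoked as `session.execute_write(write_batches, rows)`, where `rows` holds dictionaries with the keys `fid`, `text`, `e1`, `e2`, and `e3`. The batch size of 100 is an arbitrary starting point, not a tuned value.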

## Part 2: Create Indices
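Creates one vector index in Neo4j per embedding model. Each index's `vector.dimensions` must match the output size of its model (384 for MiniLM, 768 for MPNet, 1024 for BGE-M3).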

In [6]:
from neo4j import GraphDatabase
import os

uri = os.getenv('NEO4J_URI')
username = os.getenv('NEO4J_USERNAME')
password = os.getenv('NEO4J_PASSWORD')

driver = GraphDatabase.driver(uri, auth=(username, password))

def create_indices():
    with driver.session() as session:
        print("Creating indices on :JourneyVector...")

        # 1. MiniLM
        session.run("""
            CREATE VECTOR INDEX minilm_vec_index IF NOT EXISTS
            FOR (n:JourneyVector) ON (n.minilm_embedding)
            OPTIONS {indexConfig: {`vector.dimensions`: 384, `vector.similarity_function`: 'cosine'}}
        """)
        
        # 2. MPNet
        session.run("""
            CREATE VECTOR INDEX mpnet_vec_index IF NOT EXISTS
            FOR (n:JourneyVector) ON (n.mpnet_embedding)
            OPTIONS {indexConfig: {`vector.dimensions`: 768, `vector.similarity_function`: 'cosine'}}
        """)

        # 3. BGE-M3
        session.run("""
            CREATE VECTOR INDEX bgem3_vec_index IF NOT EXISTS
            FOR (n:JourneyVector) ON (n.bgem3_embedding)
            OPTIONS {indexConfig: {`vector.dimensions`: 1024, `vector.similarity_function`: 'cosine'}}
        """)

        print("Indices created successfully.")

if __name__ == "__main__":
    create_indices()

Creating indices on :JourneyVector...
Indices created successfully.
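
The three statements in the cell above differ only in index name, property, and dimension. The sketch below generates them from one table and adds a length check that can be run before a vector is written or queried. `INDEX_SPECS`, `build_index_statement`, and `check_dimension` are hypothetical helpers; the names and dimensions in the table are copied from the cell above.

```python
# Index name -> (node property, embedding dimension), copied from the cell above.
INDEX_SPECS = {
    "minilm_vec_index": ("minilm_embedding", 384),
    "mpnet_vec_index": ("mpnet_embedding", 768),
    "bgem3_vec_index": ("bgem3_embedding", 1024),
}


def build_index_statement(name: str, prop: str, dims: int) -> str:
    """Return a CREATE VECTOR INDEX statement in the same form used above."""
    return (
        f"CREATE VECTOR INDEX {name} IF NOT EXISTS\n"
        f"FOR (n:JourneyVector) ON (n.{prop})\n"
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dims}, "
        "`vector.similarity_function`: 'cosine'}}"
    )


def check_dimension(index_name: str, vector: list) -> None:
    """Raise ValueError if `vector` does not have the length the index expects."""
    expected = INDEX_SPECS[index_name][1]
    if len(vector) != expected:
        raise ValueError(
            f"{index_name} expects {expected} dimensions, got {len(vector)}"
        )


for name, (prop, dims) in INDEX_SPECS.items():
    print(build_index_statement(name, prop, dims))
    print()
```

Keeping the dimensions in one place makes it harder for a query embedding from one model to be sent to another model's index by mistake.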


## Part 3: Semantic Search Test
Performs a test search using the BGE-M3 model and its `bgem3_vec_index`.

In [15]:
from neo4j import GraphDatabase
from sentence_transformers import SentenceTransformer
import os 
from dotenv import load_dotenv

load_dotenv()

# Test with BGE-M3; the query model must match the model behind the index being searched
model = SentenceTransformer("BAAI/bge-m3")

uri = os.getenv('NEO4J_URI', 'neo4j://localhost:7687')
username = os.getenv('NEO4J_USERNAME', 'neo4j')
password = os.getenv('NEO4J_PASSWORD', 'password')
driver = GraphDatabase.driver(uri, auth=(username, password))

def search(query, top_k=3):
    embedding = model.encode(query).tolist()
    
    cypher = """
    CALL db.index.vector.queryNodes('bgem3_vec_index', $k, $vec)
    YIELD node, score
    
    MATCH (j:Journey)-[:HAS_VECTOR]->(node)
    
    RETURN 
        score,
        node.text AS semantic_text,
        j.feedback_ID AS feedback_id,
        j.arrival_delay_minutes AS actual_delay,
        j.food_satisfaction_score AS actual_food
    """
    
    with driver.session() as session:
        result = session.run(cypher, k=top_k, vec=embedding)
        return [dict(r) for r in result]

if __name__ == "__main__":
    print("--- Testing Semantic Search ---")
    q = "big delays and bad food"
    print(f"Query: '{q}'")
    
    results = search(q)
    for r in results:
        print(f"\nScore: {r['score']:.4f}")
        print(f"Text: {r['semantic_text']}")
        print(f"DB Check -> Delay: {r['actual_delay']}, Food: {r['actual_food']}")

--- Testing Semantic Search ---
Query: 'big delays and bad food'

Score: 0.8155
Text: A severely delayed short-haul flight. The Economy journey covered 126 miles on a CRJ-700 aircraft. It suffered a severe delay of 76 minutes. The passenger (Generation: Boomer, Status: non-elite) reported a poor dining experience with a rating of 1/5. Route: ASX to DEX.
DB Check -> Delay: 76, Food: 1

Score: 0.8137
Text: A severely delayed short-haul flight. The Economy journey covered 190 miles on a ERJ-175 aircraft. It suffered a severe delay of 125 minutes. The passenger (Generation: Boomer, Status: non-elite) reported a poor dining experience with a rating of 2/5. Route: SBX to SFX.
DB Check -> Delay: 125, Food: 2

Score: 0.8128
Text: A severely delayed short-haul flight. The Economy journey covered 372 miles on a A320-200 aircraft. It suffered a severe delay of 90 minutes. The passenger (Generation: Boomer, Status: non-elite) reported a poor dining experience with a rating of 1/5. Route: SFX to SN
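
The indices are configured with the `cosine` similarity function, so ranking depends on the angle between the query vector and each stored vector rather than on their lengths. The sketch below computes plain cosine similarity on toy three-dimensional vectors, which are illustrative only and not real embeddings. Neo4j's documentation describes the reported `score` as a normalized form of this value in the range 0 to 1, so the raw cosine here is not expected to equal the scores printed above; the exact mapping is worth confirming for the Neo4j version in use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only; the real embeddings have 384 to 1024 dimensions.
query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]  # longer, but pointing the same way
orthogonal = [0.0, 1.0, 0.0]      # shares no component with the query

print(cosine_similarity(query, same_direction))  # close to 1: same direction
print(cosine_similarity(query, orthogonal))      # 0: unrelated direction
```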

## Comparative Analysis

In [21]:
from neo4j import GraphDatabase
from sentence_transformers import SentenceTransformer
import os
from dotenv import load_dotenv

load_dotenv()

# ---------------------------------------------------------
# 1. SETUP: Load All 3 Models
# ---------------------------------------------------------
print("Loading models... (This might take a minute)")
models = {
    "minilm": {
        "model": SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2"),
        "index": "minilm_vec_index"
    },
    "mpnet": {
        "model": SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2"),
        "index": "mpnet_vec_index"
    },
    "bge-m3": {
        "model": SentenceTransformer("BAAI/bge-m3"),
        "index": "bgem3_vec_index"
    }
}

uri = os.getenv('NEO4J_URI', 'neo4j://localhost:7687')
username = os.getenv('NEO4J_USERNAME', 'neo4j')
password = os.getenv('NEO4J_PASSWORD', 'password')
driver = GraphDatabase.driver(uri, auth=(username, password))


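# ---------------------------------------------------------
# 2. TEST QUERIES
#    Note: the "Qn:" label is part of each string, so it is
#    embedded together with the query text.
# ---------------------------------------------------------
QUESTIONS = [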
    "Q1: flights with severe delays and terrible food",
    "Q2: excellent dining experience on a short flight",
    "Q3: unhappy passengers on Boeing aircraft",
    "Q4: Millennial generation complaining about delays",
    "Q5: Premier Gold members with poor satisfaction",
    "Q6: The food was great but the flight was late",
    "Q7: long haul flights that arrived early",
    "Q8: Economy class passengers who had a good time",
    "Q9: nightmare journey with huge delay over 3 hours",
    "Q10: smooth trip with no issues",
    "Q11: Show me the delay for a flight out of JNX", 
    "Q12: Boomers who had excellent food", 
    "Q13: Satisfied Premier Gold members in Economy", 
    "Q14: Find all flights from JNX to EWX longer than 2000 miles.", 
    "Q15: Gen X passengers flying on the A320-200", 
    "Q16: Flights out of SEX", 
    "Q17: Flights from SEX to IAX"
    
]

# ---------------------------------------------------------
# 3. SEARCH LOGIC
# ---------------------------------------------------------
def search(query, model_key, top_k=3):
    model_obj = models[model_key]["model"]
    index_name = models[model_key]["index"]
    
    embedding = model_obj.encode(query).tolist()
    
    cypher = f"""
    CALL db.index.vector.queryNodes('{index_name}', $k, $vec)
    YIELD node, score
    
    MATCH (j:Journey)-[:HAS_VECTOR]->(node)
    
    RETURN 
        score,
        node.text AS semantic_text,
        j.arrival_delay_minutes AS actual_delay,
        j.food_satisfaction_score AS actual_food
    """
    
    with driver.session() as session:
        result = session.run(cypher, k=top_k, vec=embedding)
        return [dict(r) for r in result]

# ---------------------------------------------------------
# 4. EXECUTION LOOP
# ---------------------------------------------------------
if __name__ == "__main__":
    print("\n=== STARTING 3-MODEL COMPARISON (Top 3 Results) ===\n")

    for q in QUESTIONS:
        print("_" * 80)
        print(f"QUERY: {q}")
        print("_" * 80)
        
        for model_name in ["minilm", "mpnet", "bge-m3"]:
            print(f"\n--- MODEL: {model_name.upper()} ---")
            try:
                results = search(q, model_name, top_k=3)
                
                if not results:
                    print("  No results found.")
                    continue

                for i, r in enumerate(results):
                    print(f"  #{i+1} [Score: {r['score']:.4f}]")
                    # Print the full text (no truncation) so results can be compared by eye
                    clean_text = r['semantic_text']
                    print(f"     Text: \"{clean_text}\"")
                    print(f"     Stats: Delay={r['actual_delay']}min | Food={r['actual_food']}/5")
                
            except Exception as e:
                print(f"  Error: {e}")
        
        print("\n")

Loading models... (This might take a minute)

=== STARTING 3-MODEL COMPARISON (Top 3 Results) ===

________________________________________________________________________________
QUERY: Q1: flights with severe delays and terrible food
________________________________________________________________________________

--- MODEL: MINILM ---
  #1 [Score: 0.7914]
     Text: "A severely delayed medium-haul flight. The Economy journey covered 1745 miles on a B757-300 aircraft. It suffered a severe delay of 162 minutes. The passenger (Generation: Gen Z, Status: non-elite) reported a poor dining experience with a rating of 1/5. Route: LAX to ORX."
     Stats: Delay=162min | Food=1/5
  #2 [Score: 0.7859]
     Text: "A severely delayed medium-haul flight. The Economy journey covered 1754 miles on a B737-900 aircraft. It suffered a severe delay of 61 minutes. The passenger (Generation: Boomer, Status: non-elite) reported a poor dining experience with a rating of 1/5. Route: DEX to BOX."
     Stats

### Executive Summary
After testing three embedding models against 17 airline-related queries, **BAAI/bge-m3 (BGE-M3)** emerged as the strongest model for this use case. **MPNet** showed excellent semantic understanding of natural-language concepts (such as "short-haul" versus "long-haul"), but **BGE-M3** was the only model that reliably captured specific entities such as airport codes (`SEX`, `JNX`) while maintaining high semantic accuracy. **MiniLM**, while likely the fastest (speed was not benchmarked in this notebook), struggled significantly with numerical context and entity linking.

### Model Leaderboard

| Rank | Model | Strength | Weakness | Recommendation |
| :--- | :--- | :--- | :--- | :--- |
| 🥇 | **BGE-M3** | **Hybrid Capabilities:** Excellent at both semantic nuances ("bad food") and exact keywords (Airport Codes). | Slower inference than MiniLM. | **Primary Choice** |
| 🥈 | **MPNet** | **Semantic Depth:** Best understanding of length ("short-haul") and strong sentiment correlation. | **Keyword Blindness:** Completely failed to recognize specific airport codes (e.g., Q16). | Backup Choice |
| 🥉 | **MiniLM** | **Speed:** Likely the fastest inference. | **Low Fidelity:** Confused "medium-haul" with "short-haul" (Q2); frequently missed sentiment in complex queries. | Not Recommended |

---

### Detailed Analysis by Query Category

#### 1. Semantic Understanding (Sentiment & Length)
*Tests: Q1, Q2, Q4, Q7, Q8, Q9*

* **The Task:** Interpret "short flight", "long haul", "nightmare", and "terrible food".
* **MPNet & BGE-M3 (Tie):** Both models correctly distinguished between "short-haul" (under 1,000 miles) and "long-haul" (4,000 miles or more). For Q2 ("short flight"), MPNet found flights under 900 miles, whereas MiniLM retrieved "medium-haul" flights over 1,700 miles, missing the user's intent.
* **MiniLM (Fail):** In Q8 ("Economy... good time"), MiniLM returned a flight with a **2/5** food rating as a top match. It struggled to separate the general concept of "flight" from the specific sentiment "good".
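
The judgments in this section come from reading the top three results per model. A minimal sketch of turning such a reading into a number follows, assuming each query is given a hand-written predicate over the `actual_delay` and `actual_food` fields returned by `search`. `hit_rate` and `q1_expectation` are hypothetical helpers, and the thresholds reuse the text builder's cut-offs (a delay over 60 minutes is "severe", a food score of 2 or less is "poor").

```python
def hit_rate(results, predicate):
    """Fraction of retrieved rows that satisfy `predicate` (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(1 for row in results if predicate(row)) / len(results)


def q1_expectation(row):
    # Hypothetical reading of Q1 ("severe delays and terrible food"), reusing the
    # text builder's thresholds: delay over 60 minutes and food score of 2 or less.
    return row["actual_delay"] > 60 and row["actual_food"] <= 2


# The two MiniLM rows for Q1 that are visible in the output above.
minilm_q1 = [
    {"actual_delay": 162, "actual_food": 1},
    {"actual_delay": 61, "actual_food": 1},
]

print(hit_rate(minilm_q1, q1_expectation))  # 1.0
```

A per-query predicate like this only covers attributes stored on the journey; queries about intent or tone would still need manual review.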

#### 2. Entity & Keyword Recognition (Airport Codes)
*Tests: Q11, Q16, Q17*

* **The Task:** "Flights out of SEX" (a station code). Exact identifiers like this are typically difficult for purely dense retrievers, which have no explicit keyword-matching step.
* **BGE-M3 (Winner):** It was the **only** model to successfully map "out of SEX" to `Origin: SEX` (Q16).
* **MPNet & MiniLM (Fail):** Both models treated "SEX" and "JNX" as generic noise, retrieving flights from random airports like `PHX`, `LHX`, or `BOX`.
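* **Why this matters:** Users frequently search by airport code. Only BGE-M3 demonstrated the "Lexical + Semantic" behavior needed for this. Note that this pipeline uses only BGE-M3's dense embeddings (via `SentenceTransformer.encode`), so the keyword sensitivity comes from the dense vectors themselves rather than from a separate lexical index.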
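
One possible complement to the vector search for code-based queries is to detect station codes in the query text and apply them as exact filters. The sketch below assumes station codes are exactly three uppercase letters, as in the examples `SEX`, `JNX`, and `IAX`; that pattern is an assumption, and `extract_station_codes` is a hypothetical helper that is not part of the notebook.

```python
import re

# Assumption: station codes are exactly three capital letters.
STATION_CODE = re.compile(r"\b[A-Z]{3}\b")


def extract_station_codes(query, known_codes=None):
    """Return three-letter uppercase tokens in order of appearance.

    If `known_codes` is given, only tokens in that collection are kept, which
    avoids treating other capitalised words as airports.
    """
    codes = STATION_CODE.findall(query)
    if known_codes is not None:
        codes = [code for code in codes if code in known_codes]
    return codes


print(extract_station_codes("Q17: Flights from SEX to IAX"))     # ['SEX', 'IAX']
print(extract_station_codes("Q16: Flights out of SEX"))          # ['SEX']
print(extract_station_codes("Q10: smooth trip with no issues"))  # []
```

The extracted codes could then be passed as parameters to a Cypher condition on `station_code`, the property read in Part 1. Deciding which code is the origin and which is the destination is left out of this sketch.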

#### 3. Complex Logic & Edge Cases
*Tests: Q6, Q9, Q14*

* **The Task:** "Food was great but flight was late" (Mixed Sentiment) and "Delay > 3 hours" (Numerical reasoning).
* **Mixed Sentiment (Universal Fail):** All three models struggled with Q6. Their results were dominated by the general "flight" content of the text, and they returned either punctual flights or flights with bad food. This suggests that **embeddings cannot replace Cypher** for structured filtering (e.g., `WHERE j.food_satisfaction_score >= 4 AND j.arrival_delay_minutes > 0`).
* **Numerical nuances:**
    * In Q14 ("> 2000 miles"), **MiniLM** performed surprisingly well, finding the `JNX -> EWX` route.
    * In Q9 ("Huge delay > 3 hours"), **BGE-M3** found a massive 620-minute delay, showing better sensitivity to "magnitude" words like "huge" compared to MiniLM's 85-minute result.
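
For a mixed-sentiment query such as Q6, one option is to over-fetch candidates from the vector index and then apply the structured conditions in Cypher. The sketch below reuses the index and property names from the notebook; the parameters `$min_food` and `$min_delay`, the over-fetch factor, and the helper `filtered_search_params` are hypothetical, and the query has not been run against the database in this notebook.

```python
# Hypothetical combination of the vector lookup with a structured filter.
FILTERED_SEARCH = """
CALL db.index.vector.queryNodes('bgem3_vec_index', $candidates, $vec)
YIELD node, score
MATCH (j:Journey)-[:HAS_VECTOR]->(node)
WHERE j.food_satisfaction_score >= $min_food
  AND j.arrival_delay_minutes >= $min_delay
RETURN score, node.text AS semantic_text,
       j.arrival_delay_minutes AS actual_delay,
       j.food_satisfaction_score AS actual_food
ORDER BY score DESC
LIMIT $k
"""


def filtered_search_params(vec, k=3, overfetch=20, min_food=4, min_delay=16):
    """Build the parameter dict for FILTERED_SEARCH.

    The index is asked for k * overfetch candidates because the WHERE clause
    runs after the vector lookup and may discard most of them. The defaults
    follow the text builder: a score of 4 or more is "excellent" food and a
    delay above 15 minutes counts as "delayed".
    """
    if k <= 0 or overfetch <= 0:
        raise ValueError("k and overfetch must be positive")
    return {
        "vec": list(vec),
        "candidates": k * overfetch,
        "k": k,
        "min_food": min_food,
        "min_delay": min_delay,
    }


# Toy 3-dimensional vector for illustration; bgem3_vec_index expects 1024 dimensions.
params = filtered_search_params([0.1, 0.2, 0.3])
print(params["candidates"], params["k"])  # 60 3
```

With a real BGE-M3 query embedding, the call would look like `session.run(FILTERED_SEARCH, **params)`. If too few rows survive the filter, the over-fetch factor can be raised.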

#### 4. Demographic & Attribute Matching
*Tests: Q4, Q5, Q12, Q15*

* **The Task:** Match specific generations ("Millennial", "Boomer") or aircraft ("A320-200").
* **Consistency:** All models performed adequately here, generally finding the correct Generation or Status when explicitly requested.
* **Nuance:** **MPNet** seemed slightly better at pairing the demographic with the correct *sentiment* (e.g., finding an unhappy Millennial rather than just *any* Millennial).

---

### Conclusion 

For the Airline Graph-RAG system, **we will proceed with BGE-M3.**

While MPNet offers slightly better "literary" understanding of sentiment, an airline system **must** respect airport codes and other specific entities. BGE-M3's ability to catch the keyword "SEX" (an airport code) while still understanding "severe delay" makes it the more robust choice: retrieving the right journeys gives the downstream generator less room to hallucinate.