# Text-to-Cypher Demo with Medha

This notebook demonstrates Medha with **Cypher** (Neo4j) queries, showing that
the library is query-language agnostic.

**No live Neo4j instance is required.** Medha caches the query strings themselves —
it never executes them. The demo shows cache behavior across all four tiers:

- **Tier 0** — L1 in-memory cache (microseconds)
- **Tier 1** — Template matching with parameter extraction
- **Tier 2** — Exact vector match
- **Tier 3** — Semantic similarity match
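To make the lookup order concrete, here is a minimal pure-Python sketch of a waterfall: each tier is tried in turn and the first one that returns a query wins. It is not Medha's implementation. The `waterfall` function, the `Lookup` type, and the stand-in tiers are hypothetical, and "exact match" is approximated by dictionary equality rather than vector search. Only the tier order mirrors the list above.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A tier takes the question and returns a cached query string, or None on a miss.
Lookup = Callable[[str], Optional[str]]


def waterfall(question: str, tiers: List[Tuple[str, Lookup]],
              l1: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Hypothetical sketch: try tiers in order and return the first answer."""
    if question in l1:                  # Tier 0: plain dictionary hit
        return "L1_CACHE", l1[question]
    for strategy, lookup in tiers:      # Tiers 1-3, in the order listed above
        query = lookup(question)
        if query is not None:
            l1[question] = query        # remember the answer for the next call
            return strategy, query
    return "MISS", None                 # caller would generate the query another way


# Usage with stand-in tiers (assumed data, mirroring one pair stored later).
stored = {"Who works at Acme?":
          "MATCH (p:Person)-[:WORKS_AT]->(c:Company {name:'Acme'}) RETURN p.name"}
tiers = [
    ("TEMPLATE_MATCH", lambda q: None),   # stand-in: no template applies
    ("EXACT_MATCH", stored.get),          # identical question already stored
    ("SEMANTIC_MATCH", lambda q: None),   # stand-in: no similar question
]
l1 = {}
print(waterfall("Who works at Acme?", tiers, l1)[0])  # EXACT_MATCH
print(waterfall("Who works at Acme?", tiers, l1)[0])  # L1_CACHE
```

Putting the cheapest check first means repeated questions never pay for the later, more expensive tiers.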

**Requirements:** `pip install "medha[fastembed]"` (the quotes stop shells such as zsh from treating the brackets as a glob)

## Cell 1: Setup & Imports

In [None]:
import time

from medha import Medha, Settings
from medha.embeddings.fastembed_adapter import FastEmbedAdapter
from medha.types import QueryTemplate

## Cell 2: Conceptual Graph Schema

We imagine a graph database with the following schema (no Neo4j needed):

```
(:Person {name, age})
(:Company {name, industry})
(:Project {name, status})

(:Person)-[:FRIEND]->(:Person)
(:Person)-[:WORKS_AT]->(:Company)
(:Person)-[:WORKS_ON]->(:Project)
(:Person)-[:MANAGES]->(:Person)
```

The Cypher queries below are valid against this schema. Since Medha caches
query strings without executing them, we provide conceptual response summaries.

In [None]:
# Conceptual sample data the graph would contain:
graph_description = {
    "persons": ["John", "Alice", "Bob", "Charlie", "Diana"],
    "companies": ["Acme", "Globex", "Initech"],
    "projects": ["Alpha", "Beta", "Gamma"],
    "relationships": [
        "John -[:FRIEND]-> Alice",
        "John -[:FRIEND]-> Bob",
        "Alice -[:WORKS_AT]-> Acme",
        "Bob -[:WORKS_AT]-> Acme",
        "Charlie -[:WORKS_AT]-> Globex",
        "Diana -[:WORKS_AT]-> Initech",
        "Alice -[:WORKS_ON]-> Alpha",
        "Alice -[:WORKS_ON]-> Beta",
        "Bob -[:MANAGES]-> Charlie",
    ],
}
print(f"Conceptual graph: {len(graph_description['persons'])} persons, "
      f"{len(graph_description['companies'])} companies, "
      f"{len(graph_description['projects'])} projects")
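Since nothing executes the Cypher, the response summaries stored later are written by hand. One way to sanity-check them is to answer the same questions in plain Python over the relationship list above. This is a stand-in for the graph, not a Cypher engine; the `edges`, `neighbours`, and `sources` names are hypothetical. Because the friends query uses the undirected pattern `-[:FRIEND]-`, the helper can follow edges in both directions.

```python
import re

relationships = [
    "John -[:FRIEND]-> Alice",
    "John -[:FRIEND]-> Bob",
    "Alice -[:WORKS_AT]-> Acme",
    "Bob -[:WORKS_AT]-> Acme",
    "Charlie -[:WORKS_AT]-> Globex",
    "Diana -[:WORKS_AT]-> Initech",
    "Alice -[:WORKS_ON]-> Alpha",
    "Alice -[:WORKS_ON]-> Beta",
    "Bob -[:MANAGES]-> Charlie",
]

# Parse "A -[:TYPE]-> B" strings into (source, type, target) triples.
EDGE = re.compile(r"^(\w+) -\[:(\w+)\]-> (\w+)$")
edges = [EDGE.match(r).groups() for r in relationships]


def neighbours(name, rel, undirected=False):
    """Targets of `name` via `rel`; undirected=True also follows edges backwards."""
    out = [t for s, r, t in edges if s == name and r == rel]
    if undirected:
        out += [s for s, r, t in edges if t == name and r == rel]
    return out


def sources(rel, target):
    """Nodes with a `rel` edge pointing at `target`."""
    return [s for s, r, t in edges if r == rel and t == target]


print(neighbours("John", "FRIEND", undirected=True))     # ['Alice', 'Bob']
print(sources("WORKS_AT", "Acme"))                       # ['Alice', 'Bob']
print(neighbours("Alice", "WORKS_ON"))                   # ['Alpha', 'Beta']
print(sorted({s for s, r, t in edges if r == "MANAGES"}))  # ['Bob']
```

The same helper shows that Diana has no FRIEND edges in this data, so the templated query generated for her later would return an empty result if it were ever run.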

## Cell 3: Initialize Medha

We use:
- **FastEmbedAdapter** — local embeddings, no API key needed
- **memory mode** — Qdrant runs in-process, no external server required
- **query_language="cypher"** — labels the cache as Cypher-oriented

In [None]:
embedder = FastEmbedAdapter()
settings = Settings(qdrant_mode="memory", query_language="cypher")

medha = Medha(collection_name="cypher_demo", embedder=embedder, settings=settings)
await medha.start()
print("Medha initialized (collection='cypher_demo', mode='memory', language='cypher')")

## Cell 4: Store Question-Cypher Pairs

We store 6 question-Cypher pairs with conceptual response summaries.
Since there is no live Neo4j, the `response_summary` represents what the
query *would* return against our conceptual graph.

In [None]:
pairs = [
    (
        "Find John's friends",
        "MATCH (p:Person {name:'John'})-[:FRIEND]-(f) RETURN f.name",
        "[{'f.name': 'Alice'}, {'f.name': 'Bob'}]",
    ),
    (
        "Who works at Acme?",
        "MATCH (p:Person)-[:WORKS_AT]->(c:Company {name:'Acme'}) RETURN p.name",
        "[{'p.name': 'Alice'}, {'p.name': 'Bob'}]",
    ),
    (
        "How many people are in the database?",
        "MATCH (p:Person) RETURN COUNT(p)",
        "[{'COUNT(p)': 5}]",
    ),
    (
        "What projects does Alice work on?",
        "MATCH (p:Person {name:'Alice'})-[:WORKS_ON]->(proj:Project) RETURN proj.name",
        "[{'proj.name': 'Alpha'}, {'proj.name': 'Beta'}]",
    ),
    (
        "Find all companies",
        "MATCH (c:Company) RETURN c.name",
        "[{'c.name': 'Acme'}, {'c.name': 'Globex'}, {'c.name': 'Initech'}]",
    ),
    (
        "Who are the managers?",
        "MATCH (p:Person)-[:MANAGES]->(other) RETURN DISTINCT p.name",
        "[{'p.name': 'Bob'}]",
    ),
]

for question, cypher, summary in pairs:
    await medha.store(question, cypher, response_summary=summary)
    print(f"Stored: {question!r}")
    print(f"        {cypher}\n")

## Cell 5: Tier 2 — Exact Vector Match

Searching with the **exact same question** that was stored yields a high-confidence
exact vector match. The returned query happens to be Cypher, but Medha does not care
about the query language: it matches on the semantics of the *question*.
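Why does an identical question land at the top of the scale? The same text produces the same embedding, and the cosine similarity of a vector with itself is 1. The sketch below shows this with a toy hashing-trick embedding standing in for FastEmbed; `toy_embed` and `cosine` are hypothetical helpers, not Medha APIs.

```python
import math
import re
import zlib


def toy_embed(text, dim=64):
    """Hashing-trick bag of words: a stand-in for a real embedding model."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z]+", text.lower()):
        vec[zlib.crc32(word.encode()) % dim] += 1.0  # crc32 is stable across runs
    return vec


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


stored = toy_embed("How many people are in the database?")
same = toy_embed("How many people are in the database?")
print(round(cosine(stored, same), 4))                        # 1.0
print(round(cosine(stored, toy_embed("Who works at Acme?")), 4))  # lower
```

A real model yields a score just below or at 1.0 for the identical string, which is why the cell's comment expects roughly 0.99 or higher.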

In [None]:
start = time.perf_counter()
hit = await medha.search("How many people are in the database?")
elapsed = (time.perf_counter() - start) * 1000

print(f"Strategy: {hit.strategy}")        # EXACT_MATCH or L1_CACHE
print(f"Query:    {hit.generated_query}")  # MATCH (p:Person) RETURN COUNT(p)
print(f"Score:    {hit.confidence:.4f}")   # ~0.99+
print(f"Time:     {elapsed:.2f}ms")

## Cell 6: Tier 3 — Semantic Similarity Match

Now we search with a **rephrased question** that was never stored.
Medha recognizes the semantic similarity and returns the matching Cypher query.
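A semantic tier amounts to a nearest-neighbour search with a cut-off. The sketch below uses three made-up 3-dimensional vectors in place of real FastEmbed embeddings, a hypothetical `semantic_lookup` helper, and an assumed threshold of 0.90 (chosen to echo the `~0.90+` comment in the cell, not read from Medha's settings). A vector store such as Qdrant performs the same kind of search at scale using an index rather than a linear scan.

```python
import math

# Made-up 3-d "embeddings"; a real model would compute these from the question text.
cache = {
    "Find John's friends": (
        [0.9, 0.1, 0.1], "MATCH (p:Person {name:'John'})-[:FRIEND]-(f) RETURN f.name"),
    "Who works at Acme?": (
        [0.1, 0.9, 0.2], "MATCH (p:Person)-[:WORKS_AT]->(c:Company {name:'Acme'}) RETURN p.name"),
    "How many people are in the database?": (
        [0.1, 0.2, 0.9], "MATCH (p:Person) RETURN COUNT(p)"),
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def semantic_lookup(query_vec, threshold=0.90):
    """Return (question, query, score) for the nearest cached question, or None."""
    best_q, (best_vec, best_query) = max(
        cache.items(), key=lambda kv: cosine(query_vec, kv[1][0]))
    score = cosine(query_vec, best_vec)
    return (best_q, best_query, score) if score >= threshold else None


rephrased = [0.85, 0.15, 0.05]   # pretend embedding of the rephrased friends question
print(semantic_lookup(rephrased))  # nearest: "Find John's friends", score ~0.997
print(semantic_lookup([0.5, 0.5, 0.5]))  # None: nothing is close enough
```

The threshold is the key design knob: set it too low and unrelated questions reuse the wrong query; set it too high and genuine paraphrases fall through to a miss.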

In [None]:
start = time.perf_counter()
hit = await medha.search("List all of John's connections and buddies")
elapsed = (time.perf_counter() - start) * 1000

print(f"Strategy: {hit.strategy}")        # SEMANTIC_MATCH
print(f"Query:    {hit.generated_query}")  # MATCH (p:Person {name:'John'})-[:FRIEND]-(f) RETURN f.name
print(f"Score:    {hit.confidence:.4f}")   # ~0.90+
print(f"Time:     {elapsed:.2f}ms")

## Cell 7: Tier 1 — Template Matching with Parameter Extraction

We load a Cypher template with a `{name}` parameter. The `parameter_patterns` regex
extracts one of five known person names (John, Alice, Bob, Charlie, or Diana) from the
input question.

When we ask about **Diana's** friends, a name that appears in no stored question,
the template generates a new Cypher query with the extracted parameter.
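Assuming the query template is filled with Python's `str.format` (which the doubled `{{ }}` braces suggest, though this notebook does not confirm it), the extraction step works roughly like the hypothetical `fill_template` helper below. The values mirror the `QueryTemplate` in the next cell.

```python
import re

query_template = "MATCH (p:Person {{name:'{name}'}})-[:FRIEND]-(f) RETURN f.name"
parameter_patterns = {"name": r"\b(John|Alice|Bob|Charlie|Diana)\b"}


def fill_template(question):
    """Extract each parameter with its regex; return the filled query, or None."""
    params = {}
    for param, pattern in parameter_patterns.items():
        match = re.search(pattern, question)
        if match is None:
            return None          # a required parameter is missing: no template hit
        params[param] = match.group(1)
    # str.format turns the escaped {{ }} into the literal Cypher map braces.
    return query_template.format(**params)


print(fill_template("Show me Diana's friends"))
# MATCH (p:Person {name:'Diana'})-[:FRIEND]-(f) RETURN f.name
print(fill_template("Show me Eve's friends"))  # None: Eve is not in the pattern
```

The closed alternation in the regex also acts as an allowlist: only known names can be captured, so nothing arbitrary from the question is spliced into the Cypher string.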

In [None]:
templates = [
    QueryTemplate(
        intent="person_friends",
        template_text="Find {name}'s friends",
        query_template="MATCH (p:Person {{name:'{name}'}})-[:FRIEND]-(f) RETURN f.name",
        parameters=["name"],
        parameter_patterns={"name": r"\b(John|Alice|Bob|Charlie|Diana)\b"},
    ),
]

await medha.load_templates(templates)
print(f"Loaded {len(templates)} template(s)\n")

start = time.perf_counter()
hit = await medha.search("Show me Diana's friends")
elapsed = (time.perf_counter() - start) * 1000

print(f"Strategy: {hit.strategy}")        # TEMPLATE_MATCH
print(f"Query:    {hit.generated_query}")  # MATCH (p:Person {name:'Diana'})-[:FRIEND]-(f) RETURN f.name
print(f"Time:     {elapsed:.2f}ms")

## Cell 8: Tier 0 — L1 In-Memory Cache

The L1 cache keeps recent search results in memory. The first call for a question goes
through the full waterfall, including the vector backend; a second call for the **same
question** is answered from the L1 cache, often more than 100x faster, although the
exact speedup depends on the machine.
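This notebook does not show how Medha's L1 cache is built or sized. One common design for such a tier is a bounded least-recently-used (LRU) map; the sketch below uses that technique with a hypothetical `LRUCache` class and an arbitrary capacity of 2 so that eviction is visible. It does not describe Medha's actual eviction policy.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical bounded L1 cache: evicts the least recently used question."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, question):
        if question not in self._data:
            return None
        self._data.move_to_end(question)     # mark as most recently used
        return self._data[question]

    def put(self, question, query):
        self._data[question] = query
        self._data.move_to_end(question)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # drop the least recently used entry


l1 = LRUCache(capacity=2)
l1.put("Who works at Acme?",
       "MATCH (p:Person)-[:WORKS_AT]->(c:Company {name:'Acme'}) RETURN p.name")
l1.put("How many people are in the database?", "MATCH (p:Person) RETURN COUNT(p)")
l1.get("Who works at Acme?")                 # touch: Acme becomes most recent
l1.put("Who are the managers?",
       "MATCH (p:Person)-[:MANAGES]->(other) RETURN DISTINCT p.name")
print(l1.get("How many people are in the database?"))  # None: evicted
print(l1.get("Who works at Acme?"))                    # still cached
```

A lookup here is a single hash-table access, which is why an L1 hit can be orders of magnitude faster than embedding the question and querying a vector store.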

In [None]:
# First call — goes through the vector backend
start = time.perf_counter()
hit1 = await medha.search("Who works at Acme?")
t1 = (time.perf_counter() - start) * 1000

# Second call — served from L1 cache
start = time.perf_counter()
hit2 = await medha.search("Who works at Acme?")
t2 = (time.perf_counter() - start) * 1000

print(f"First call:  {t1:.2f}ms ({hit1.strategy})")
print(f"Second call: {t2:.2f}ms ({hit2.strategy})")
if t2 > 0:
    print(f"Speedup:     {t1/t2:.0f}x")

## Cell 9: Stats & Cleanup

In [None]:
print("Cache Statistics:")
print(medha.stats)

await medha.close()
print("\nCleaned up: Medha closed.")