# 💡**Knowledge Graph and GraphDB with LangChain**

* Knowledge Graphs
* `Neo4j` Graph Database
* Cypher Queries
* Graph DB apps with LangChain

### ⭕1. **Knowledge Graphs: Concepts and Examples**




📘 **Knowledge Graph**


![knowgraph](https://raw.githubusercontent.com/mohd-faizy/Developing_LLMs_Applications_with_LangChain/refs/heads/main/GraphDB/_img_db/knowledgeGraph.jpg)

* **Definition**:
  * A semantic network representing real-world **entities** and **relationships**, enabling machines to understand and reason about the data contextually.

* 🔹 **Key Components:**
  * **Nodes** – Represent entities (e.g., *Person*, *Place*, *Organization*)
  * **Edges** – Represent relationships between entities (e.g., *bornIn*, *worksAt*)
  * **Labels** – Define types or categories of nodes (e.g., *Scientist*, *City*)
  * **Properties** – Attributes of nodes/edges in key-value format (e.g., *birthDate: 1879-03-14*)

* ⭕Example Use-Case: **Academic Knowledge Graph**
  * 🧠 Nodes:
    * Albert Einstein (Person)
    * Germany (Place)
    * Theoretical Physicist, Physicist (Occupation)
    * Theory of Relativity (Concept)
    * Physics (Discipline)

  * **Edges**:
    * ***Einstein → bornIn → Germany***
    * ***Einstein → occupation → Theoretical Physicist***
    * ***Theoretical Physicist → kindOf → Physicist***
    * ***Einstein → developed → Theory of Relativity***
    * ***Theory of Relativity → branchOf → Physics***
    * ***Physicist → practices → Physics***
  
  * 📌 Labels & Properties (example):
    * `birthDate`: "1879-03-14"
    * `nationality`: "German"

* 🔹 **NER (Named Entity Recognition)**
  * Extracts structured entities (like names, places, dates) from unstructured text
  * Helps in automatic **graph construction** from text sources like articles or documents
    * `Person`: Albert Einstein
    * `Concept`: Theory of Relativity
    * `Place`: Germany
    * `Occupation`: Theoretical Physicist 

* 🔹 **Real-World Applications:**
  * **Google Search** – Displays knowledge panels and suggestions for queries like *"Albert Einstein"*
  * **YouTube** – Enhances content and ad recommendations by understanding entities in videos
  * **LinkedIn & Facebook** – Build social and professional graphs for recommendation and discovery
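
To make the node/edge/label/property vocabulary concrete, here is a minimal sketch of the Einstein example as plain Python objects. The `Node` and `Edge` classes and the helper names are illustrative assumptions, not part of any graph library.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    label: str                                  # category, e.g. "Person"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                                 # name of the start node
    relation: str                               # relationship type, e.g. "bornIn"
    target: str                                 # name of the end node


nodes = {
    n.name: n
    for n in [
        Node("Albert Einstein", "Person",
             {"birthDate": "1879-03-14", "nationality": "German"}),
        Node("Germany", "Place"),
        Node("Theoretical Physicist", "Occupation"),
        Node("Physicist", "Occupation"),
        Node("Theory of Relativity", "Concept"),
        Node("Physics", "Discipline"),
    ]
}

edges = [
    Edge("Albert Einstein", "bornIn", "Germany"),
    Edge("Albert Einstein", "occupation", "Theoretical Physicist"),
    Edge("Theoretical Physicist", "kindOf", "Physicist"),
    Edge("Albert Einstein", "developed", "Theory of Relativity"),
    Edge("Theory of Relativity", "branchOf", "Physics"),
    Edge("Physicist", "practices", "Physics"),
]


def neighbors(name, relation=None):
    """Targets of outgoing edges from `name`, optionally filtered by relation."""
    return [e.target for e in edges
            if e.source == name and (relation is None or e.relation == relation)]


def reachable(start):
    """Every node reachable from `start` by following edges (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for target in neighbors(current):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


print(neighbors("Albert Einstein", "bornIn"))
print(sorted(reachable("Albert Einstein")))
```

Following edges from one node to everything it connects to is the kind of traversal a graph database is built to do natively.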

### ⭕2. **Retrieval Augmented Generation (RAG) & Search Techniques**



* **Traditional RAG Pipeline**:
  * Ingest documents → chunk them → generate embeddings → store in vector DB → perform similarity search
  

* 🔍 **Types of Search**
  
  * 🟢1. **Keyword Search**
      * Based on: **Bag of Words (BoW), TF-IDF**
    
      * ***Characteristics:***
        * Sparse vector representation
        * Matches exact words or phrases
        * Less contextual understanding

      * `Pros`: Fast and interpretable
      * `Cons`: Fails to capture semantic similarity

  * 🟢2. **Semantic Search $\Rightarrow$ Dense Vector Search**

      * Based on: **Dense embeddings (e.g., via transformers)**

      * ***Characteristics:***
        * Captures **meaning and context**
        * Uses vector similarity (e.g., cosine similarity)

      * `Pros`: Finds conceptually similar content, even with different words
      * `Cons`: Requires more computation and pre-trained models

  * 🟢3. **Hybrid Search**

      * Combines: **Keyword search + Semantic search**, optionally with **Knowledge Graph** lookups

      * ***Characteristics:***
        * Merges lexical accuracy with semantic depth
        * Often uses weighted scoring from both methods
    
      * `Pros`: More robust and accurate results
      * Common in: Modern search engines and RAG (Retrieval-Augmented Generation) systems
  
      * **Knowledge Graph + RAG**:
        * Adds structured semantic understanding to RAG
        * Enhances response quality and context awareness
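
The three search styles can be compared in a small sketch. It is a toy under stated assumptions: the documents are made up, the "dense" vectors are hand-written 2-d stand-ins for a real embedding model, and `alpha` is a hypothetical weight between the keyword and semantic scores.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bow(text):
    """Bag-of-words vector: lowercase token -> count (keyword search)."""
    return Counter(text.lower().split())


docs = {
    "d1": "einstein developed the theory of relativity",
    "d2": "the physicist was born in germany",
    "d3": "neo4j stores nodes and relationships",
}

# Made-up dense vectors standing in for embeddings from a real model.
dense = {
    "d1": {"x": 0.9, "y": 0.1},
    "d2": {"x": 0.8, "y": 0.3},
    "d3": {"x": 0.1, "y": 0.9},
}


def hybrid_search(query, query_dense, alpha=0.5):
    """Rank documents by alpha * keyword score + (1 - alpha) * semantic score."""
    q = bow(query)
    scores = {}
    for doc_id, text in docs.items():
        keyword = cosine(q, bow(text))
        semantic = cosine(query_dense, dense[doc_id])
        scores[doc_id] = alpha * keyword + (1 - alpha) * semantic
    return sorted(scores, key=scores.get, reverse=True)


ranking = hybrid_search("who developed relativity", {"x": 1.0, "y": 0.0})
print(ranking)
```

Setting `alpha=1.0` reduces this to pure keyword search and `alpha=0.0` to pure semantic search. In this toy, `d2` shares no query words, so it gets a keyword score of zero and is ranked only through its dense vector.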



### ⭕3. **RDBMS vs. Graph Database**



| Feature            | **RDBMS**                                                              | **Graph Database (e.g., Neo4j)**                                              |
| ------------------ | ---------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| **Data Model**     | Tables (with rows & columns)                                           | Graph: Nodes (entities), Relationships (edges), and Properties (attributes)   |
| **Schema**         | Rigid, predefined schema                                               | Flexible or schema-optional                                                   |
| **Relationships**  | Represented using **foreign keys** & **joins**                         | Represented **natively** as direct connections (edges)                        |
| **Query Language** | SQL (Structured Query Language)                                        | Cypher (declarative graph query language)                                     |
| **Query Example**  | To find friends of friends: Multiple `JOIN` operations                 | Simple `MATCH` pattern: `(:Person)-[:FRIEND]->(:Person)-[:FRIEND]->(:Person)` |
| **Performance**    | Slower for complex joins and deep relations                            | Optimized for traversing complex relationships                                |
| **Storage**        | Normalized tables with keys                                            | Graph structure stored directly                                               |
| **Use Case Fit**   | Best for **structured**, transactional data (e.g., finance, inventory) | Best for **connected** data (e.g., social networks, recommendation systems)   |


#### 🔷 **Neo4j – Key Highlights**

1. **Real-time Insights**
   *Neo4j provides fast, real-time access to connected data.*

2. **Easy Retrieval**
   *Data can be easily retrieved using* **Cypher Query** *language.*

3. **Cypher Query Language**
   *Cypher is a* **declarative language** *used to visually and intuitively represent graph patterns.*

4. **No Joins Required**
   *Since relationships are stored natively, traversals need* **no joins**, *which typically makes relationship-heavy queries simpler and faster.*



#### 🔹 In **RDBMS** (SQL):

```sql
SELECT a2.name
FROM authors a
JOIN collaborations c1 ON a.id = c1.author_id
JOIN collaborations c2 ON c1.paper_id = c2.paper_id
JOIN authors a2 ON a2.id = c2.author_id
WHERE a.name = 'Albert Einstein' AND a2.name != 'Albert Einstein';
```
```
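
The relational version can be tried locally with Python's built-in `sqlite3`. This is a sketch only: the two-table layout (`authors`, `collaborations`) follows the query above, while the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE collaborations (author_id INTEGER, paper_id INTEGER);
-- Illustrative rows: authors 1 and 2 share paper 100, author 3 wrote paper 200 alone.
INSERT INTO authors VALUES (1, 'Albert Einstein'), (2, 'Nathan Rosen'), (3, 'Niels Bohr');
INSERT INTO collaborations VALUES (1, 100), (2, 100), (3, 200);
""")

# Each hop (author -> paper -> co-author) costs one more JOIN.
query = """
SELECT a2.name
FROM authors a
JOIN collaborations c1 ON a.id = c1.author_id
JOIN collaborations c2 ON c1.paper_id = c2.paper_id
JOIN authors a2 ON a2.id = c2.author_id
WHERE a.name = ? AND a2.name != ?
"""
coauthors = [row[0] for row in conn.execute(query, ("Albert Einstein", "Albert Einstein"))]
print(coauthors)
```

Note that the second join to `authors` needs its own alias (`a2` here); reusing an alias that is already bound to `collaborations` makes the query invalid.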

#### 🔹 In **Graph DB** (`Cypher` - `Neo4j`):

```cypher
MATCH (e:Author {name: 'Albert Einstein'})-[:WROTE]->(:Paper)<-[:WROTE]-(coauthor)
RETURN coauthor.name
```

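>⚡ **Result**: The Cypher version expresses the same question as a single path pattern in fewer lines, and the graph database answers it by following stored relationships instead of computing joins.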


#### 🧠 **When to Use What?**

| Use Case                                        | Best Fit         |
| ----------------------------------------------- | ---------------- |
| Banking, Invoicing, HR Systems                  | ✅ RDBMS          |
| Social Media, Fraud Detection, Knowledge Graphs | ✅ Graph Database |
| Product Inventory or Billing Systems            | ✅ RDBMS          |
| Recommendation Engines, Semantic Search         | ✅ Graph Database |

### ⭕4. **Neo4j Property Graph Model**

#### **Syntax**

- Nodes: Represent entities (e.g., Person, Movie).
  - Syntax: `(n:Label {property:value})`
  - Example: `(p:Person {name:"John"})`
- Relationships: Connect nodes and have types (e.g., :ACTED_IN).
  - Syntax: `(a)-[:RELATION_TYPE]->(b)`

---

```Cypher
// Run each step as its own statement (note the terminating semicolons).

// Step 1: Create a person node
CREATE (robert:Person {name: "Robert Downey Jr", born: 1965});

// Step 2: Create a movie node
CREATE (ironman:Movie {title: "Iron Man"});

// Step 3: Look up both nodes, then create the relationship between them
MATCH (robert:Person {name: "Robert Downey Jr"}),
      (ironman:Movie {title: "Iron Man"})
CREATE (robert)-[:ACTED_IN]->(ironman);
```
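
To see how the two syntax templates fit together, the sketch below renders node and relationship patterns as strings. The helper names `node_pattern` and `relationship_pattern` are hypothetical and exist only for illustration. In a real application, values are better passed as query parameters than interpolated into the query text.

```python
import json


def node_pattern(var, label, props=None):
    """Render a node pattern such as (p:Person {name: "John"})."""
    if not props:
        return f"({var}:{label})"
    # json.dumps gives double-quoted strings and bare numbers.
    body = ", ".join(f"{key}: {json.dumps(value)}" for key, value in props.items())
    return f"({var}:{label} {{{body}}})"


def relationship_pattern(start, rel_type, end):
    """Render a directed relationship pattern such as (a)-[:ACTED_IN]->(b)."""
    return f"({start})-[:{rel_type}]->({end})"


print(node_pattern("p", "Person", {"name": "John"}))
print(node_pattern("ironman", "Movie", {"title": "Iron Man"}))
print(relationship_pattern("robert", "ACTED_IN", "ironman"))
```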


# **👨‍💻CODE**

In [1]:
# ============================ #
#        IMPORTS & SETUP       #
# ============================ #

from langchain_neo4j import Neo4jGraph, GraphCypherQAChain
from langchain_groq import ChatGroq
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from dotenv import load_dotenv
import os

# Load environment variables (NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD, etc.)
load_dotenv()


# ============================ #
#     INITIALIZE LLM (Groq)    #
# ============================ #

llm = ChatGroq(model="llama-3.3-70b-versatile")

# ============================ #
#     NEO4J CONNECTION SETUP   #
# ============================ #
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD")
    )


# ============================ #
#        TEXT PREPARATION      #
# ============================ #

# Sample biography input about Elon Musk
text = """
Elon Reeve Musk (born June 28, 1971) is a businessman and investor known for his key roles in space
company SpaceX and automotive company Tesla, Inc. Other involvements include ownership of X Corp.?,
formerly Twitter, and his role in the founding of The Boring Company, xAI, Neuralink and OpenAI.
He is one of the wealthiest people in the world; as of July 2024, Forbes estimates his net worth to be
US$221 billion.Musk was born in Pretoria to Maye and engineer Errol Musk, and briefly attended
the University of Pretoria before immigrating to Canada at age 18, acquiring citizenship through
his Canadian-born mother. Two years later, he matriculated at Queen's University at Kingston in Canada.
Musk later transferred to the University of Pennsylvania and received bachelor's degrees in economics
and physics. He moved to California in 1995 to attend Stanford University, but dropped out after
two days and, with his brother Kimbal, co-founded online city guide software company Zip2.
"""


# Wrap the raw text inside a LangChain Document object
documents = [Document(page_content=text)]
documents

[Document(metadata={}, page_content="\nElon Reeve Musk (born June 28, 1971) is a businessman and investor known for his key roles in space\ncompany SpaceX and automotive company Tesla, Inc. Other involvements include ownership of X Corp.?,\nformerly Twitter, and his role in the founding of The Boring Company, xAI, Neuralink and OpenAI.\nHe is one of the wealthiest people in the world; as of July 2024, Forbes estimates his net worth to be\nUS$221 billion.Musk was born in Pretoria to Maye and engineer Errol Musk, and briefly attended\nthe University of Pretoria before immigrating to Canada at age 18, acquiring citizenship through\nhis Canadian-born mother. Two years later, he matriculated at Queen's University at Kingston in Canada.\nMusk later transferred to the University of Pennsylvania and received bachelor's degrees in economics\nand physics. He moved to California in 1995 to attend Stanford University, but dropped out after\ntwo days and, with his brother Kimbal, co-founded online 

In [2]:
# ============================ #
#     GRAPH TRANSFORMATION     #
# ============================ #

# Create a transformer to extract graph structure (nodes + relationships) using the LLM
llm_transformer = LLMGraphTransformer(llm=llm)

# Convert documents into graph data (nodes and relationships)
graph_documents = llm_transformer.convert_to_graph_documents(documents)

graph_documents

[GraphDocument(nodes=[Node(id='Elon Reeve Musk', type='Person', properties={}), Node(id='Spacex', type='Company', properties={}), Node(id='Tesla, Inc.', type='Company', properties={}), Node(id='X Corp.', type='Company', properties={}), Node(id='The Boring Company', type='Company', properties={}), Node(id='Xai', type='Company', properties={}), Node(id='Neuralink', type='Company', properties={}), Node(id='Openai', type='Company', properties={}), Node(id='Maye', type='Person', properties={}), Node(id='Errol Musk', type='Person', properties={}), Node(id='University Of Pretoria', type='University', properties={}), Node(id='Canada', type='Country', properties={}), Node(id="Queen'S University", type='University', properties={}), Node(id='University Of Pennsylvania', type='University', properties={}), Node(id='Stanford University', type='University', properties={}), Node(id='Kimbal Musk', type='Person', properties={}), Node(id='Zip2', type='Company', properties={})], relationships=[Relationshi

In [None]:
# ============================ #
#         DEBUG OUTPUT         #
# ============================ #

# View extracted graph structure
graph_documents[0].nodes           # List of nodes (entities)

[Node(id='Elon Reeve Musk', type='Person', properties={}),
 Node(id='Spacex', type='Company', properties={}),
 Node(id='Tesla, Inc.', type='Company', properties={}),
 Node(id='X Corp.', type='Company', properties={}),
 Node(id='The Boring Company', type='Company', properties={}),
 Node(id='Xai', type='Company', properties={}),
 Node(id='Neuralink', type='Company', properties={}),
 Node(id='Openai', type='Company', properties={}),
 Node(id='Maye', type='Person', properties={}),
 Node(id='Errol Musk', type='Person', properties={}),
 Node(id='University Of Pretoria', type='University', properties={}),
 Node(id='Canada', type='Country', properties={}),
 Node(id="Queen'S University", type='University', properties={}),
 Node(id='University Of Pennsylvania', type='University', properties={}),
 Node(id='Stanford University', type='University', properties={}),
 Node(id='Kimbal Musk', type='Person', properties={}),
 Node(id='Zip2', type='Company', properties={})]

In [4]:
graph_documents[0].relationships   # List of relationships (edges) between entities

[Relationship(source=Node(id='Elon Reeve Musk', type='Person', properties={}), target=Node(id='Spacex', type='Company', properties={}), type='FOUNDER', properties={}),
 Relationship(source=Node(id='Elon Reeve Musk', type='Person', properties={}), target=Node(id='Tesla, Inc.', type='Company', properties={}), type='FOUNDER', properties={}),
 Relationship(source=Node(id='Elon Reeve Musk', type='Person', properties={}), target=Node(id='X Corp.', type='Company', properties={}), type='OWNER', properties={}),
 Relationship(source=Node(id='Elon Reeve Musk', type='Person', properties={}), target=Node(id='The Boring Company', type='Company', properties={}), type='FOUNDER', properties={}),
 Relationship(source=Node(id='Elon Reeve Musk', type='Person', properties={}), target=Node(id='Xai', type='Company', properties={}), type='FOUNDER', properties={}),
 Relationship(source=Node(id='Elon Reeve Musk', type='Person', properties={}), target=Node(id='Neuralink', type='Company', properties={}), type='FO

```python
movie_query="""

LOAD CSV WITH HEADERS FROM
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv' AS row

/*
Load CSV data from a remote URL.
The CSV file contains headers, and each row corresponds to a movie with its metadata.
*/

// Create (or match if it already exists) a Movie node with a unique movieId
MERGE (m:Movie {id: row.movieId})

// Set properties on the Movie node: release date, title, and IMDb rating
SET m.released = date(row.released),
    m.title = row.title,
    m.imdbRating = toFloat(row.imdbRating)

/*
Handle directors:
- Split the director string by '|', in case there are multiple directors.
- For each director:
    - MERGE a Person node with the director's name (trimming whitespace).
    - Create a DIRECTED relationship from the person to the movie.
*/
FOREACH (director IN split(row.director, '|') |
    MERGE (p:Person {name: trim(director)})
    MERGE (p)-[:DIRECTED]->(m)
)

/*
Handle actors:
- Similar to directors, split by '|'.
- For each actor:
    - MERGE a Person node with the actor's name.
    - Create an ACTED_IN relationship from the person to the movie.
*/
FOREACH (actor IN split(row.actors, '|') |
    MERGE (p:Person {name: trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m)
)

/*
Handle genres:
- Split the genres string by '|'.
- For each genre:
    - MERGE a Genre node with the genre name.
    - Create an IN_GENRE relationship from the movie to the genre.
*/
FOREACH (genre IN split(row.genres, '|') |
    MERGE (g:Genre {name: trim(genre)})
    MERGE (m)-[:IN_GENRE]->(g)
)
"""

```

In [None]:
# Load the movie dataset

movie_query="""
LOAD CSV WITH HEADERS FROM
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv' as row

MERGE(m:Movie{id:row.movieId})
SET m.released = date(row.released),
    m.title = row.title,
    m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') |
    MERGE (p:Person {name:trim(director)})
    MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') |
    MERGE (p:Person {name:trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') |
    MERGE (g:Genre {name:trim(genre)})
    MERGE (m)-[:IN_GENRE]->(g))
"""

In [6]:
graph

<langchain_neo4j.graphs.neo4j_graph.Neo4jGraph at 0x2710d64d940>

In [7]:
graph.query(movie_query)

[]

In [8]:
graph.refresh_schema()
print(graph.schema)

Node properties:
Movie {id: STRING, released: DATE, title: STRING, imdbRating: FLOAT}
Person {name: STRING}
Genre {name: STRING}
Relationship properties:

The relationships:
(:Movie)-[:IN_GENRE]->(:Genre)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)


In [None]:
chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True
)

chain

GraphCypherQAChain(verbose=True, graph=<langchain_neo4j.graphs.neo4j_graph.Neo4jGraph object at 0x000002710D64D940>, cypher_generation_chain=PromptTemplate(input_variables=['question', 'schema'], input_types={}, partial_variables={}, template='Task:Generate Cypher statement to query a graph database.\nInstructions:\nUse only the provided relationship types and properties in the schema.\nDo not use any other relationship types or properties that are not provided.\nSchema:\n{schema}\nNote: Do not include any explanations or apologies in your responses.\nDo not respond to any questions that might ask anything else than for you to construct a Cypher statement.\nDo not include any text except the generated Cypher statement.\n\nThe question is:\n{question}')
| RunnableBinding(bound=ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x000002710D64CC20>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000002710D64D7F0>, model_name='llama-3.3-70b-v

In [10]:
response=chain.invoke({"query":"Who was the director of the moview GoldenEye"})

response



> Entering new GraphCypherQAChain chain...
Generated Cypher:
cypher
MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: 'GoldenEye'}) 
RETURN p.name
Full Context:
[{'p.name': 'Martin Campbell'}]

> Finished chain.


{'query': 'Who was the director of the moview GoldenEye',
 'result': 'Martin Campbell was the director of the movie GoldenEye.'}

In [11]:
response=chain.invoke({"query":"tell me the genre of th movie GoldenEye"})

response



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: 'GoldenEye'})-[:IN_GENRE]->(g:Genre) RETURN g.name
Full Context:
[{'g.name': 'Adventure'}, {'g.name': 'Action'}, {'g.name': 'Thriller'}]

> Finished chain.


{'query': 'tell me the genre of th movie GoldenEye',
 'result': 'The genres of the movie GoldenEye are Adventure, Action, and Thriller.'}

In [12]:
response=chain.invoke({"query":"Who was the director in movie Casino"})

response



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: 'Casino'}) RETURN p.name
Full Context:
[{'p.name': 'Martin Scorsese'}]

> Finished chain.


{'query': 'Who was the director in movie Casino',
 'result': 'Martin Scorsese was the director in the movie Casino.'}

In [14]:
response=chain.invoke({"query":"Which movie were released in 2008"})

response



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie) WHERE m.released = "2008" RETURN m.title
Full Context:
[]

> Finished chain.


{'query': 'Which movie were released in 2008',
 'result': "I don't know the answer."}
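
The empty result for the 2008 question is most likely a type mismatch rather than proof that no such movies exist. The schema lists `released` as a `DATE`, but the generated Cypher compares it to the string `"2008"`. The sketch below shows the same pitfall with Python's `datetime.date` and a hand-written Cypher query that compares the year component instead. The sample date is hypothetical, and whether any rows come back still depends on the dataset.

```python
from datetime import date

# Stand-in for a stored `released` value: a DATE, not a string.
released = date(2008, 5, 2)   # hypothetical release date

print(released == "2008")     # a date never equals a string
print(released.year == 2008)  # compare the year component instead

# Cypher that filters on the year component of the DATE property.
year_query = "MATCH (m:Movie) WHERE m.released.year = $year RETURN m.title"
params = {"year": 2008}

# With the `graph` object from the earlier cells this could be run as:
# graph.query(year_query, params=params)
```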

In [15]:
response=chain.invoke({"query":"Give me the list of movie having imdb rating more than 8"})
response



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie) WHERE m.imdbRating > 8 RETURN m.title, m.imdbRating
Full Context:
[{'m.title': 'Toy Story', 'm.imdbRating': 8.3}, {'m.title': 'Heat', 'm.imdbRating': 8.2}, {'m.title': 'Casino', 'm.imdbRating': 8.2}, {'m.title': 'Twelve Monkeys (a.k.a. 12 Monkeys)', 'm.imdbRating': 8.1}, {'m.title': 'Seven (a.k.a. Se7en)', 'm.imdbRating': 8.6}, {'m.title': 'Usual Suspects, The', 'm.imdbRating': 8.6}, {'m.title': 'Hate (Haine, La)', 'm.imdbRating': 8.1}, {'m.title': 'Braveheart', 'm.imdbRating': 8.4}, {'m.title': 'Taxi Driver', 'm.imdbRating': 8.3}, {'m.title': 'Anne Frank Remembered', 'm.imdbRating': 8.2}]

> Finished chain.


{'query': 'Give me the list of movie having imdb rating more than 8',
 'result': 'The movies with an IMDB rating more than 8 are: Seven (a.k.a. Se7en) with a rating of 8.6, Usual Suspects, The with a rating of 8.6, Braveheart with a rating of 8.4, Toy Story with a rating of 8.3, and Taxi Driver with a rating of 8.3.'}

In [16]:
examples = [
    {
        "question": "How many artists are there?",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)",
    },
    {
        "question": "Which actors played in the movie Casino?",
        "query": "MATCH (m:Movie {{title: 'Casino'}})<-[:ACTED_IN]-(a) RETURN a.name",
    },
    {
        "question": "How many movies has Tom Hanks acted in?",
        "query": "MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)",
    },
    {
        "question": "List all the genres of the movie Schindler's List",
        "query": "MATCH (m:Movie {{title: 'Schindler\\'s List'}})-[:IN_GENRE]->(g:Genre) RETURN g.name",
    },
    {
        "question": "Which actors have worked in movies from both the comedy and action genres?",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name",
    },
    {
        "question": "Which directors have made movies with at least three different actors named 'John'?",
        "query": "MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name",
    },
    {
        "question": "Identify movies where directors also played a role in the film.",
        "query": "MATCH (p:Person)-[:DIRECTED]->(m:Movie), (p)-[:ACTED_IN]->(m) RETURN m.title, p.name",
    },
    {
        "question": "Find the actor with the highest number of movies in the database.",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1",
    },
]
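
Examples like these are typically placed into a few-shot prompt, so the model sees question/Cypher pairs before the real question. The sketch below is a plain-Python stand-in for that templating step, not LangChain's own API; `build_prompt` and the prompt wording are assumptions. It also shows why literal Cypher braces are doubled (`{{ ... }}`): the assembled text passes through `str.format`, which would otherwise read `{name: ...}` as a placeholder.

```python
few_shot = [
    {
        "question": "Which actors played in the movie Casino?",
        "query": "MATCH (m:Movie {{title: 'Casino'}})<-[:ACTED_IN]-(a) RETURN a.name",
    },
    {
        "question": "How many movies has Tom Hanks acted in?",
        "query": "MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)",
    },
]


def build_prompt(examples, question):
    """Join the examples into one template, then fill in the user's question."""
    shots = "\n\n".join(
        f"User input: {ex['question']}\nCypher query: {ex['query']}" for ex in examples
    )
    template = shots + "\n\nUser input: {question}\nCypher query: "
    # str.format turns each doubled brace back into a single literal brace.
    return template.format(question=question)


prompt = build_prompt(few_shot, "Who directed Casino?")
print(prompt)

# A single-brace example breaks formatting: {name: ...} is parsed as a field.
bad = [{"question": "q", "query": "MATCH (a:Person {name: 'Tom Hanks'}) RETURN a"}]
try:
    build_prompt(bad, "anything")
    failed = False
except KeyError:
    failed = True
print(failed)
```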

# ⁉️**Quiz**

## 🧠 I. Short Answer Quiz

**1. What is a Knowledge Graph, and what three main components does it use to represent information?**
A Knowledge Graph is a semantic network representing relationships among real-world entities. Its three key components are:

* **Nodes** (entities)
* **Edges** (relationships)
* **Labels** (categories for nodes)

**2. Explain the purpose of "nodes" and "edges" within a Knowledge Graph.**
Nodes represent distinct entities such as people, places, or concepts. Edges connect these nodes and define how the entities are related to one another.

**3. How does Named Entity Recognition (NER) relate to the creation of Knowledge Graphs?**
NER identifies specific entities (like names, places, and organizations) from unstructured text. These entities become nodes, and their detected connections form the edges in a Knowledge Graph.

**4. Describe the primary difference between keyword search and semantic search in the context of retrieval.**
Keyword search looks for exact matches of words, often missing context. Semantic search interprets the meaning of words to find conceptually related information, even if exact terms differ.

**5. What is "hybrid search" and why is it considered more effective than keyword or semantic search alone?**
Hybrid search combines keyword and semantic search. It improves accuracy by using precise term matching and contextual understanding together for better information retrieval.

**6. List three key differences between a Relational Database Management System (RDBMS) and a Graph Database.**

* **Data Structure**: RDBMS uses tables; GraphDB uses nodes and edges.
* **Query Language**: RDBMS uses SQL; GraphDB uses Cypher.
* **Relationship Handling**: RDBMS uses joins; GraphDB natively models relationships.

**7. What is the main advantage of using Cypher Query Language in Neo4j compared to SQL in RDBMS for complex queries?**
Cypher allows intuitive and visual querying of data relationships, eliminating complex joins and nested queries typically required in SQL.

**8. Explain what "properties" and "labels" signify when defining elements within a Neo4j Property Graph Data Model.**

* **Properties**: Key-value pairs storing metadata about nodes or relationships (e.g., name, year).
* **Labels**: Tags that categorize nodes into types like "Movie" or "Person."

**9. How does LangChain's LLMGraphTransformer facilitate the creation of a graph document from raw text?**
It uses an LLM to process raw text, extract entities and relationships, and convert them into structured graph elements automatically.

**10. Describe the function of the GraphCypherQAChain in LangChain for querying a graph database.**
It converts natural language questions into Cypher queries, runs them on a graph database, and returns the result as a human-readable answer.

---

## ✍️ II. Quiz Answer Key

*(Answers are provided inline beneath each question in Section I.)*

---

## 📝 III. Essay Format Questions

1. **Information Retrieval Evolution**
   Discuss how search evolved from basic keyword methods to hybrid search powered by semantic understanding and Knowledge Graphs. Highlight LLMs’ role in enabling deep context awareness and enhancing response quality.

2. **RDBMS vs. GraphDB Models**
   Compare data structure (tables vs. nodes), relationship modeling, and query languages (SQL vs. Cypher). Provide examples like social networks or movie recommendation systems where GraphDB excels.

3. **Unstructured to Graph Pipeline**
   Explain the steps: document chunking → embedding/vectorization → entity extraction (NER) → relationship detection → graph creation (e.g., with Neo4j). Address challenges such as ambiguity, and the benefits of structured insights.

4. **LangChain’s Role in RAG Apps**
   Break down how LangChain components like `LLMGraphTransformer` and `GraphCypherQAChain` enable intelligent data interaction. Discuss how this enhances the performance and usability of RAG systems.

5. **Designing a Movie Recommendation System Using Knowledge Graphs**
   Define graph structure: nodes = movies, directors, actors; relationships = ACTED\_IN, DIRECTED. Use Cypher to find "similar movies" based on shared genre, cast, director, or user preferences.
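
As one possible starting point for question 5, the sketch below ranks "similar movies" by the number of shared genres. It is only an illustration: apart from GoldenEye's genres, which appear in the notebook output, the genre sets are assumed sample data, and the Cypher string is a hand-written suggestion that has not been run against the dataset.

```python
# In-memory stand-in for (:Movie)-[:IN_GENRE]->(:Genre); sample data is illustrative.
catalog = {
    "GoldenEye": {"Adventure", "Action", "Thriller"},
    "Heat": {"Action", "Crime", "Thriller"},
    "Toy Story": {"Animation", "Adventure", "Comedy"},
    "Casino": {"Crime", "Drama"},
}


def similar_movies(title, movies, top_n=3):
    """Rank other movies by how many genres they share with `title`."""
    target = movies[title]
    scored = []
    for other, genres in movies.items():
        if other == title:
            continue
        shared = len(target & genres)
        if shared:
            scored.append((shared, other))
    # Most shared genres first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


print(similar_movies("GoldenEye", catalog))

# A comparable graph query (an untested suggestion based on the schema above).
similar_query = """
MATCH (m:Movie {title: $title})-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(other:Movie)
WHERE other <> m
RETURN other.title AS title, count(g) AS shared
ORDER BY shared DESC
LIMIT $limit
"""
```

The same idea extends to shared cast or directors by matching through `ACTED_IN` or `DIRECTED` instead of `IN_GENRE`.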

---

## 📚 IV. Glossary of Key Terms

| Term                                     | Definition                                                              |
| ---------------------------------------- | ----------------------------------------------------------------------- |
| **Knowledge Graph**                      | Semantic structure representing entities and relationships.             |
| **Graph Database**                       | A NoSQL DB using nodes, edges, and properties to store relationships.   |
| **Neo4j**                                | A popular open-source GraphDB implementing the property graph model.    |
| **LangChain**                            | A framework for chaining together components to build LLM-powered apps. |
| **Nodes**                                | Entities in a graph like people, items, or locations.                   |
| **Edges (Relationships)**                | Connections between nodes (e.g., WORKS\_FOR, DIRECTED).                 |
| **Labels**                               | Types or categories of nodes (e.g., "Person", "Movie").                 |
| **Properties**                           | Attributes of nodes/relationships as key-value pairs.                   |
| **Cypher Query Language**                | Graph-specific query language used in Neo4j.                            |
| **RDBMS**                                | Relational DB that uses tables and SQL (e.g., MySQL, PostgreSQL).       |
| **Keyword Search**                       | Search technique matching exact words in documents.                     |
| **Semantic Search**                      | Finds content based on meaning/context using embeddings.                |
| **Hybrid Search**                        | Combines keyword and semantic search for accuracy.                      |
| **Retrieval Augmented Generation (RAG)** | Combines retrieval with generation by an LLM.                           |
| **Embedding Vectors**                    | Numeric representations of text for similarity comparison.              |
| **Vector Database**                      | Optimized DB for storing and searching embedding vectors.               |
| **LLMGraphTransformer**                  | LangChain tool that converts text into graph data using LLMs.           |
| **GraphCypherQAChain**                   | LangChain module that converts natural questions into Cypher queries.   |
| **Groq**                                 | Platform providing API access to open-source LLMs for fast querying.    |
| **Named Entity Recognition (NER)**       | NLP method for identifying entities like people, places, or companies.  |


