# RAG Graph Career Assistant Demo

This notebook demonstrates how a Retrieval-Augmented Generation (RAG) system combines a vector store, embeddings, and a graph database to provide career development assistance.

## Architecture Overview

```mermaid
graph TD
    subgraph "RAG Graph Career Assistant"
        Query[User Query]
        
        subgraph "Vector Store Component"
            Embeddings[HuggingFace Embeddings]
            VectorStore[FAISS Vector Store]
            Docs[Document Chunks]
        end
        
        subgraph "Graph Component"
            Neo4j[Neo4j Database]
            Graph[Career Graph]
            Roles[Role Nodes]
            Skills[Skill Nodes]
        end
        
        Response[Response Generator]
        Advice[Career Advice]
    end
    
    Query --> Embeddings
    Embeddings --> VectorStore
    VectorStore --> Docs
    Docs --> Response
    
    Query --> Neo4j
    Neo4j --> Graph
    Graph --> Roles
    Graph --> Skills
    Roles --> Response
    Skills --> Response
    
    Response --> Advice
```

The diagram above shows the main components of our RAG Graph Career Assistant:

1. **Vector Store Component**:
   - Uses HuggingFace embeddings to convert text into vectors
   - Uses a FAISS vector store for efficient similarity search
   - Stores and retrieves document chunks

2. **Graph Component**:
   - Uses a Neo4j database to store the career knowledge graph
   - Contains role and skill nodes connected by relationships
   - Enables structured career path analysis

3. **Integration**:
   - The user query is sent to both components
   - The response generator combines the retrieved document chunks with role and skill data from the graph
   - The combined result is returned as career advice

## 1. Setup and Imports

First, let's import the necessary libraries and set up our environment.

In [2]:
import os
from pathlib import Path
from dotenv import load_dotenv
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from neo4j import GraphDatabase

# Load environment variables
load_dotenv()

# Constants
# Assumes the notebook is run from a subdirectory of the project root
ROOT_DIR = Path.cwd().parent
VECTOR_STORE_DIR = ROOT_DIR / "rag" / "vector_store"
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

## 2. Vector Store and Embeddings

Let's explore how the vector store and embeddings work. We'll:
1. Initialize the embedding model
2. Load the vector store
3. Perform some similarity searches

In [12]:
# Initialize embeddings
embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)

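# Load vector store
# load_local unpickles the saved docstore, so LangChain requires an explicit opt-in;
# only enable it for index files you created yourself
vector_store = FAISS.load_local(
    str(VECTOR_STORE_DIR),
    embeddings,
    allow_dangerous_deserialization=True
)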

# Example: Search for relevant documents
query = "What skills are needed for a data engineer?"
docs = vector_store.similarity_search(query, k=2)

print("Top 2 relevant documents:")
for i, doc in enumerate(docs, 1):
    print(f"\nDocument {i}:")
    print(f"Source: {doc.metadata.get('source', 'unknown')}")
    print(f"Content: {doc.page_content[:200]}...")

Top 2 relevant documents:

Document 1:
Source: /Users/slunyakin/Projects/ragenv/graph_rag_career_assistant/data/learning_paths/bi_to_data_engineer.md
Content: This learning path outlines the steps and resources needed to transition from a BI Engineer role to a Data Engineer position, focusing on acquiring the necessary technical skills and knowledge....

Document 2:
Source: /Users/slunyakin/Projects/ragenv/graph_rag_career_assistant/data/roles/data_engineer.md
Content: Data Engineers are responsible for designing, building, and maintaining the infrastructure that enables data collection, processing, and analysis. They create robust data pipelines and ensure data qua...

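To build intuition for what `similarity_search` does, here is a minimal sketch of brute-force nearest-neighbour search over toy vectors. It assumes Euclidean (L2) distance, which may differ from how a given FAISS index is configured, and it uses hand-made 3-dimensional vectors in place of real embeddings. The names `toy_index`, `l2_distance`, and `top_k` are hypothetical and are not part of LangChain or FAISS.

```python
import math

# Hand-made 3-dimensional vectors standing in for real embeddings (assumption:
# a real embedding model produces much longer vectors, one per document chunk).
toy_index = {
    "roles/data_engineer.md": [0.9, 0.1, 0.0],
    "roles/bi_engineer.md": [0.2, 0.9, 0.1],
    "learning_paths/bi_to_data_engineer.md": [0.6, 0.5, 0.0],
}


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vector, index, k=2):
    """Return the k document names whose vectors are closest to the query."""
    ranked = sorted(index, key=lambda name: l2_distance(query_vector, index[name]))
    return ranked[:k]


# A toy query vector that points mostly along the first axis
print(top_k([0.8, 0.2, 0.0], toy_index, k=2))
```

A real FAISS index avoids the full scan shown here, but the idea is the same: the query is embedded into the same vector space as the chunks, and the closest chunks are returned.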

## 3. Graph Database Integration

Now let's explore the graph database component that stores structured career information:

In [13]:
# Initialize Neo4j connection
uri = os.getenv("NEO4J_URI", "bolt://localhost:7687")
user = os.getenv("NEO4J_USER", "neo4j")
password = os.getenv("NEO4J_PASSWORD")
database = os.getenv("NEO4J_DATABASE", "neo4j")

# One driver is shared by all cells below; call driver.close() when you are done
driver = GraphDatabase.driver(uri, auth=(user, password))

# Example: Query role information
with driver.session(database=database) as session:
    result = session.run("""
        MATCH (r:Role {name: 'Data Engineer'})-[:REQUIRES_SKILL]->(s:Skill)
        RETURN r.name as role, collect(s.name) as skills
    """)
    record = result.single()
    if record:
        print(f"Role: {record['role']}")
        print("Required Skills:")
        for skill in record['skills']:
            print(f"- {skill}")

Role: Data Engineer
Required Skills:
- Python
- SQL
- Data Warehousing

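`os.getenv("NEO4J_PASSWORD")` returns `None` when the variable is missing, and the problem may only surface later, when the driver connects. A minimal sketch of an early check follows. `require_env` is a hypothetical helper and is not part of `python-dotenv` or the Neo4j driver.

```python
import os


def require_env(name, default=None):
    """Return an environment variable's value, or raise a clear error.

    Hypothetical helper: falls back to `default` when one is given.
    """
    value = os.getenv(name, default)
    if value is None or value == "":
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


try:
    # Same variable names as the connection cell; only the password has no default
    uri = require_env("NEO4J_URI", "bolt://localhost:7687")
    user = require_env("NEO4J_USER", "neo4j")
    password = require_env("NEO4J_PASSWORD")
except RuntimeError as err:
    print(f"Configuration problem: {err}")
```

Failing at configuration time keeps the error message close to its cause instead of inside a later query cell.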

## 4. Combining Vector Store and Graph

Let's see how we can combine both components to provide comprehensive career advice:

In [14]:
def get_career_advice(query: str):
    # Get relevant documents from vector store
    docs = vector_store.similarity_search(query, k=2)
    
    # Role and skill extraction from the query is not implemented yet;
    # both lists stay empty, so the graph lookup below is skipped
    roles = []
    skills = []
    
    # Get graph context
    graph_context = ""
    with driver.session(database=database) as session:
        if roles:
            for role in roles:
                result = session.run("""
                    MATCH (r:Role {name: $role})-[:REQUIRES_SKILL]->(s:Skill)
                    RETURN collect(s.name) as skills
                """, role=role)
                record = result.single()
                if record and record['skills']:
                    graph_context += f"\nRequired skills for {role}: {', '.join(record['skills'])}"
    
    # Combine and format the response
    response = []
    
    if graph_context:
        response.append("Career Graph Information:")
        response.append(graph_context)
    
    if docs:
        response.append("\nRelevant Resources:")
        for i, doc in enumerate(docs, 1):
            response.append(f"\nResource {i}:")
            response.append(f"Source: {doc.metadata.get('source', 'unknown')}")
            response.append(f"Content: {doc.page_content[:200]}...")
    
    return "\n".join(response)

# Example usage
query = "How can I transition from a BI engineer to a data engineer?"
advice = get_career_advice(query)
print(advice)


Relevant Resources:

Resource 1:
Source: /Users/slunyakin/Projects/ragenv/graph_rag_career_assistant/data/learning_paths/bi_to_data_engineer.md
Content: This learning path outlines the steps and resources needed to transition from a BI Engineer role to a Data Engineer position, focusing on acquiring the necessary technical skills and knowledge....

Resource 2:
Source: /Users/slunyakin/Projects/ragenv/graph_rag_career_assistant/data/roles/bi_engineer.md
Content: BI Engineers are responsible for designing, developing, and maintaining business intelligence solutions that enable data-driven decision making. They create interactive dashboards, reports, and data v...

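In `get_career_advice`, `roles` is left as an empty list, so the graph lookup never runs and the output above contains only vector store results. One way to fill that gap is simple keyword matching against a list of role names. The sketch below assumes a hypothetical `KNOWN_ROLES` list and a hypothetical helper `extract_roles`; it is an illustration, not the notebook's actual extraction logic.

```python
import re

# Assumption: role names as they are spelled in the graph. All four appear in
# this notebook's queries; a real list could be read from the Role nodes.
KNOWN_ROLES = ["BI Engineer", "Data Engineer", "Data Scientist", "Software Engineer"]


def extract_roles(query, known_roles=KNOWN_ROLES):
    """Return the known role names mentioned in the query, in order of appearance."""
    lowered = query.lower()
    found = []
    for role in known_roles:
        # Word boundaries avoid matching a role name inside a longer word
        match = re.search(r"\b" + re.escape(role.lower()) + r"\b", lowered)
        if match:
            found.append((match.start(), role))
    # Sort by position so a "from X to Y" question keeps X before Y
    return [role for _, role in sorted(found)]


print(extract_roles("How can I transition from a BI engineer to a data engineer?"))
```

The returned list could be assigned to `roles` inside `get_career_advice`, which would let the existing Cypher query run once per detected role.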

## 5. Advanced Graph Queries

Let's explore some advanced graph queries for career path analysis:

In [15]:
def analyze_career_transition(from_role: str, to_role: str):
    with driver.session(database=database) as session:
        # Get transition path
        path_result = session.run("""
            MATCH path = shortestPath((r1:Role {name: $from_role})-[*..5]->(r2:Role {name: $to_role}))
            RETURN [node in nodes(path) | node.name] as path,
                   [rel in relationships(path) | type(rel)] as relationships
        """, from_role=from_role, to_role=to_role)
        
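        # Get skill differences
        # OPTIONAL MATCH keeps a role's row even when it has no skills, so one
        # empty skill set does not wipe out the other role's skills
        skills_result = session.run("""
            MATCH (r1:Role {name: $role1})
            OPTIONAL MATCH (r1)-[:REQUIRES_SKILL]->(s1:Skill)
            WITH collect(DISTINCT s1.name) as skills1
            MATCH (r2:Role {name: $role2})
            OPTIONAL MATCH (r2)-[:REQUIRES_SKILL]->(s2:Skill)
            RETURN skills1, collect(DISTINCT s2.name) as skills2
        """, role1=from_role, role2=to_role)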
        
        path_record = path_result.single()
        skills_record = skills_result.single()
        
        if path_record and skills_record:
            path = path_record['path']
            skills1 = set(skills_record['skills1']) if skills_record['skills1'] and skills_record['skills1'][0] is not None else set()
            skills2 = set(skills_record['skills2']) if skills_record['skills2'] and skills_record['skills2'][0] is not None else set()
            
            print(f"Career Transition Path: {' -> '.join(path)}")
            print("\nSkills Analysis:")
            print(f"Skills to Learn: {', '.join(skills2 - skills1)}")
            print(f"Skills to Maintain: {', '.join(skills1 & skills2)}")
            print(f"Skills to Phase Out: {', '.join(skills1 - skills2)}")

# Example usage
analyze_career_transition("BI Engineer", "Data Engineer")

Career Transition Path: BI Engineer -> Data Engineer

Skills Analysis:
Skills to Learn: Data Warehousing, Python
Skills to Maintain: SQL
Skills to Phase Out: Tableau, Power BI

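The three skill lists printed above are plain set operations on the two roles' skill sets. Here is a minimal sketch using the skill names that appear in this notebook's outputs. The helper name `skill_gap` is hypothetical. Results are sorted so the printed order is stable, which direct iteration over a Python set does not guarantee.

```python
def skill_gap(current_skills, target_skills):
    """Split skills into those to learn, to maintain, and to phase out."""
    current, target = set(current_skills), set(target_skills)
    return {
        "learn": sorted(target - current),      # required only by the target role
        "maintain": sorted(current & target),   # required by both roles
        "phase_out": sorted(current - target),  # required only by the current role
    }


# Skill names taken from the outputs shown in this notebook
bi_engineer = ["SQL", "Tableau", "Power BI"]
data_engineer = ["Python", "SQL", "Data Warehousing"]

for label, skills in skill_gap(bi_engineer, data_engineer).items():
    print(f"{label}: {', '.join(skills)}")
```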

## 6. Comparing Vector Store vs. Vector Store + Graph Responses

Let's compare responses to the same question with and without the graph component to see what the graph context adds to the career advice.

In [16]:
def get_vector_store_only_response(query: str):
    """Get response using only vector store (no graph context)"""
    docs = vector_store.similarity_search(query, k=2)
    
    response = ["Response using only Vector Store:"]
    for i, doc in enumerate(docs, 1):
        response.append(f"\nResource {i}:")
        response.append(f"Source: {doc.metadata.get('source', 'unknown')}")
        response.append(f"Content: {doc.page_content[:200]}...")
    
    return "\n".join(response)

def get_combined_response(query: str):
    """Get response using both vector store and graph"""
    # Get vector store results
    docs = vector_store.similarity_search(query, k=2)
    
    # Get graph context
    graph_context = ""
    with driver.session(database=database) as session:
        # Naive role detection: only the literal phrase "data scientist" is recognized
        if "data scientist" in query.lower():
            result = session.run("""
                MATCH (r:Role {name: 'Data Scientist'})-[:REQUIRES_SKILL]->(s:Skill)
                RETURN collect(s.name) as skills
            """)
            record = result.single()
            if record and record['skills']:
                graph_context += f"\nRequired skills for Data Scientist: {', '.join(record['skills'])}"
        
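        # Get career path information if relevant; the endpoints are hardcoded to
        # Software Engineer -> Data Scientist rather than taken from the query
        if "transition" in query.lower() or "path" in query.lower():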
            result = session.run("""
                MATCH path = shortestPath((r1:Role {name: 'Software Engineer'})-[*..5]->(r2:Role {name: 'Data Scientist'}))
                RETURN [node in nodes(path) | node.name] as path
            """)
            record = result.single()
            if record and record['path']:
                graph_context += f"\nCareer transition path: {' -> '.join(record['path'])}"
    
    # Combine and format the response
    response = ["Response using Vector Store + Graph:"]
    
    if graph_context:
        response.append("\nCareer Graph Information:")
        response.append(graph_context)
    
    if docs:
        response.append("\nRelevant Resources:")
        for i, doc in enumerate(docs, 1):
            response.append(f"\nResource {i}:")
            response.append(f"Source: {doc.metadata.get('source', 'unknown')}")
            response.append(f"Content: {doc.page_content[:200]}...")
    
    return "\n".join(response)

# Example 1: Career transition question
print("Example 1: Career Transition")
print("-" * 80)
query1 = "How can I transition from a BI Engineer to a Data Engineer?"
print(f"Query: {query1}\n")
print(get_vector_store_only_response(query1))
print("\n" + "=" * 80 + "\n")
print(get_combined_response(query1))

# Example 2: Skill requirements question
print("\n\nExample 2: Skill Requirements")
print("-" * 80)
query2 = "What skills do I need to become a Data Engineer?"
print(f"Query: {query2}\n")
print(get_vector_store_only_response(query2))
print("\n" + "=" * 80 + "\n")
print(get_combined_response(query2))

Example 1: Career Transition
--------------------------------------------------------------------------------
Query: How can I transition from a BI Engineer to a Data Engineer?

Response using only Vector Store:

Resource 1:
Source: /Users/slunyakin/Projects/ragenv/graph_rag_career_assistant/data/learning_paths/bi_to_data_engineer.md
Content: This learning path outlines the steps and resources needed to transition from a BI Engineer role to a Data Engineer position, focusing on acquiring the necessary technical skills and knowledge....

Resource 2:
Source: /Users/slunyakin/Projects/ragenv/graph_rag_career_assistant/data/roles/bi_engineer.md
Content: BI Engineers are responsible for designing, developing, and maintaining business intelligence solutions that enable data-driven decision making. They create interactive dashboards, reports, and data v...


Response using Vector Store + Graph:

Relevant Resources:

Resource 1:
Source: /Users/slunyakin/Projects/ragenv/graph_rag_career_assista