# Multi-Agent RAG System for HR Queries
## Ravikumar's Personal Data Retrieval System

This notebook implements a multi-agent system using the Groq API that:
1. **Validates** that each question is HR-appropriate
2. **Retrieves** relevant information from Ravi_Total.docx
3. **Orchestrates** the validation and retrieval agents in one workflow

---

## Section 1: Import Required Libraries and Load Environment Variables

In [4]:
# Install required libraries
import subprocess
import sys

# Install packages
packages = [
    'python-dotenv',
    'langchain',
    'langchain-groq',
    'langchain-community',
    'faiss-cpu',
    'python-docx',
    'pydantic',
    'langchain-text-splitters'
]

for package in packages:
    print(f"Installing {package}...")
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', package])

Installing python-dotenv...
Installing langchain...
Installing langchain-groq...
Installing langchain-community...
Installing faiss-cpu...
Installing python-docx...
Installing pydantic...
Installing langchain-text-splitters...


In [3]:
# Import required libraries
import os
from dotenv import load_dotenv
from pathlib import Path

# Load environment variables
load_dotenv()

# Get API key
GROQ_API_KEY = os.getenv('GROQ_API_KEY')

if not GROQ_API_KEY:
    raise ValueError("GROQ_API_KEY not found in .env file")

print("✓ GROQ_API_KEY loaded successfully")
# Show only the key's prefix so the secret is not leaked into notebook output
print(f"✓ API Key (first 4 chars): {GROQ_API_KEY[:4]}...")

# Set workspace path
WORKSPACE_PATH = Path('.')
DATA_FOLDER = WORKSPACE_PATH / 'data'
DOCX_FILE = DATA_FOLDER / 'Ravi_Total.docx'

print(f"✓ Data folder path: {DATA_FOLDER}")
print(f"✓ Document file: {DOCX_FILE}")
print(f"✓ File exists: {DOCX_FILE.exists()}")

✓ GROQ_API_KEY loaded successfully
✓ API Key (first 4 chars): gsk_...
✓ Data folder path: data
✓ Document file: data\Ravi_Total.docx
✓ File exists: True
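
Echoing any part of an API key into notebook output can leak it when the notebook is shared. A minimal sketch of a masking helper follows; `mask_secret` and its `visible` parameter are hypothetical names introduced here, not part of python-dotenv or the Groq SDK.

```python
def mask_secret(secret: str, visible: int = 4) -> str:
    """Return the secret with all but the first `visible` characters replaced by '*'."""
    if visible < 0:
        raise ValueError("visible must be non-negative")
    shown = secret[:visible]
    return shown + "*" * (len(secret) - len(shown))


# Hypothetical key used only for illustration
print(mask_secret("gsk_example_key_123"))
```

The masked string keeps its original length, which is usually enough to confirm that a key was loaded without revealing it.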


## Section 2: Load and Process Document Data

In [4]:
# Load and extract text from DOCX file
from docx import Document

def load_docx(file_path):
    """Load and extract text from a Word document"""
    doc = Document(file_path)
    full_text = []
    
    for paragraph in doc.paragraphs:
        if paragraph.text.strip():
            full_text.append(paragraph.text)
    
    # Also extract text from tables
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                if cell.text.strip():
                    full_text.append(cell.text)
    
    return "\n".join(full_text)

# Load the document
print("Loading document...")
document_text = load_docx(DOCX_FILE)

print(f"✓ Document loaded successfully")
print(f"✓ Total characters: {len(document_text)}")
print(f"✓ Preview (first 500 chars):\n{document_text[:500]}")
print("\n" + "="*80)

Loading document...
✓ Document loaded successfully
✓ Total characters: 24414
✓ Preview (first 500 chars):
PERSONAL PROFILE  
Name: RAVIKUMAR D  
Phone: +91 8825677072  
Email: rkumard777@gmail.com  
Portfolio: https://ravikumard.netlify.app/  
LinkedIn: https://www.linkedin.com/in/ravi-kumar-d-6426ba291  
GitHub: https://github.com/ravikumard0748  
Competitive Profiles:  
- LeetCode: https://leetcode.com/ravikumard/  
- CodeChef: https://www.codechef.com/users/ravikumard  
- HackerRank: https://www.hackerrank.com/profile/ravikumar_d20231  
- HackerEarth: https://www.hackerearth.com/@ravikumar.d2023a



## Section 3: Initialize Groq LLM and Embeddings

In [6]:

from langchain_groq import ChatGroq

llm = ChatGroq(
    model_name="llama-3.1-8b-instant",
    temperature=0.3,
    max_tokens=256
)





In [7]:

# llm = ChatGroq(
#     groq_api_key=GROQ_API_KEY,
#     model_name="llama-3.1-8b-instant",
#     temperature=0.7,
#     max_tokens=2048
# )


In [8]:

print("✓ Groq LLM initialized successfully")
print("✓ Model: llama-3.1-8b-instant")  # matches the model_name passed to ChatGroq

# Test the LLM
test_message = "Hello, can you introduce yourself as Ravikumar's HR assistant?"
test_response = llm.invoke(test_message)
print(f"\n✓ LLM Test Response:\n{test_response.content[:200]}...")

✓ Groq LLM initialized successfully
✓ Model: llama-3.1-8b-instant

✓ LLM Test Response:
Nice to meet you. I'm Rohan, Ravikumar's HR assistant. I'll be happy to assist you with any HR-related queries or concerns you may have. How can I help you today?...


In [10]:
# Initialize embeddings with a local HuggingFace sentence-transformers model (Groq is used only for the LLM)
from langchain_community.embeddings import HuggingFaceEmbeddings

print("\nInitializing Embeddings...")
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

print("✓ Embeddings initialized successfully")
print(f"✓ Model: all-MiniLM-L6-v2")

# Test embedding
test_embedding = embeddings.embed_query("Tell me about your experience")
print(f"✓ Embedding dimension: {len(test_embedding)}")


Initializing Embeddings...


✓ Embeddings initialized successfully
✓ Model: all-MiniLM-L6-v2
✓ Embedding dimension: 384
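
Each query or chunk is mapped to a 384-dimensional vector, and retrieval works by comparing such vectors. As a rough illustration of that comparison, here is a dependency-free cosine similarity sketch; `cosine_similarity` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real 384-dimensional embeddings
query = [1.0, 0.0, 1.0]
similar = [2.0, 0.0, 2.0]      # same direction as the query
unrelated = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query, similar))
print(cosine_similarity(query, unrelated))
```

Vectors pointing in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0.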


## Section 4: Build Vector Store from Document

In [11]:
# Split document into chunks
from langchain_text_splitters import RecursiveCharacterTextSplitter

print("Splitting document into chunks...")
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100,
    separators=["\n\n", "\n", " ", ""]
)

chunks = text_splitter.split_text(document_text)

print(f"✓ Document split into {len(chunks)} chunks")
print(f"✓ Average chunk size: {len(document_text) // len(chunks) if chunks else 0} characters")
print(f"\nFirst chunk preview:\n{chunks[0][:300]}...")

Splitting document into chunks...
✓ Document split into 65 chunks
✓ Average chunk size: 375 characters

First chunk preview:
PERSONAL PROFILE  
Name: RAVIKUMAR D  
Phone: +91 8825677072  
Email: rkumard777@gmail.com  
Portfolio: https://ravikumard.netlify.app/  
LinkedIn: https://www.linkedin.com/in/ravi-kumar-d-6426ba291  
GitHub: https://github.com/ravikumard0748  
Competitive Profiles:  
- LeetCode: https://leetcode.co...
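
To make `chunk_size` and `chunk_overlap` concrete, the sketch below uses a plain sliding window. This is a simplification and not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on the listed separators); `sliding_window_chunks` is a hypothetical name.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size windows where neighbours share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

With `chunk_size=500` and `chunk_overlap=100`, such a window would advance 400 characters at a time, so text near a chunk boundary appears in both neighbouring chunks.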


In [13]:
# Create vector store from chunks
from langchain_community.vectorstores import FAISS
# Alias the import so it does not shadow docx.Document used by load_docx()
from langchain_core.documents import Document as LCDocument

print("\nCreating vector store...")

# Convert chunks to Document objects
documents = [LCDocument(page_content=chunk) for chunk in chunks]

# Create FAISS vector store
vector_store = FAISS.from_documents(
    documents=documents,
    embedding=embeddings
)

print("✓ Vector store created successfully")
print(f"✓ Vector store type: FAISS")
print(f"✓ Number of vectors: {vector_store.index.ntotal if hasattr(vector_store.index, 'ntotal') else len(documents)}")

# Create retriever
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
print("✓ Retriever initialized (returns top 3 matches)")


Creating vector store...
✓ Vector store created successfully
✓ Vector store type: FAISS
✓ Number of vectors: 65
✓ Retriever initialized (returns top 3 matches)
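
The retriever returns the `k` stored chunks whose vectors are closest to the query vector. A brute-force sketch of that idea follows, assuming Euclidean distance as the metric (an assumption made here; the vector store's actual configuration should be checked) and using the hypothetical helper `top_k` with made-up 2-dimensional vectors.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def top_k(query: Sequence[float],
          vectors: List[Sequence[float]],
          texts: List[str],
          k: int = 3) -> List[Tuple[float, str]]:
    """Return the k (distance, text) pairs closest to the query, nearest first."""
    if len(vectors) != len(texts):
        raise ValueError("vectors and texts must align")
    scored = [(math.dist(query, vec), text) for vec, text in zip(vectors, texts)]
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


# Made-up labels and vectors for illustration only
texts = ["education", "internship", "hobbies", "skills"]
vectors = [[0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [0.9, 0.1]]
for distance, text in top_k([1.0, 0.0], vectors, texts, k=2):
    print(f"{distance:.3f}  {text}")
```

A library such as FAISS performs the same kind of nearest-neighbour lookup far more efficiently; the exhaustive scan here is only meant to show what "top 3 matches" refers to.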


## Section 5: Create Validation Agent

This agent checks whether a question is appropriate to answer from Ravikumar's personal data in an HR context.

In [15]:
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field
from typing import Dict, Any

# Define validation response structure
class ValidationResponse(BaseModel):
    is_hr_appropriate: bool = Field(description="Whether the question is appropriate for HR")
    confidence: float = Field(description="Confidence score 0-1")
    reason: str = Field(description="Reason for validation decision")
    category: str = Field(description="Category of the question")

# Create validation agent
validation_prompt = PromptTemplate(
    input_variables=["question"],
    template="""You are an HR-appropriate question validator for Ravikumar's personal data system.
    
Your task is to validate if the following question is appropriate to answer using Ravikumar's personal data in an HR context.

Question: {question}

Consider:
1. Is the question professional and HR-relevant?
2. Does it ask about work experience, skills, education, or professional achievements?
3. Does it respect privacy and professional boundaries?
4. Is it suitable for HR department discussion?

Respond with a JSON object containing:
- is_hr_appropriate: boolean
- confidence: float (0-1)
- reason: brief explanation
- category: "professional", "personal", "inappropriate", or "other"

Example response format:
{{"is_hr_appropriate": true, "confidence": 0.95, "reason": "Asks about work experience", "category": "professional"}}

Your validation response:"""
)

def validate_question(question: str) -> Dict[str, Any]:
    """Validate if a question is appropriate for HR"""
    print(f"\n🔍 Validating question: '{question}'")
    
    # Generate validation
    validation_chain = validation_prompt | llm
    response = validation_chain.invoke({"question": question})
    
    # Parse response
    import json
    try:
        response_text = response.content
        # Extract JSON from response
        json_start = response_text.find('{')
        json_end = response_text.rfind('}') + 1
        if json_start != -1 and json_end > json_start:
            json_str = response_text[json_start:json_end]
            validation_result = json.loads(json_str)
        else:
            validation_result = {
                "is_hr_appropriate": False,
                "confidence": 0.5,
                "reason": "Could not parse validation response",
                "category": "other"
            }
    except json.JSONDecodeError:
        validation_result = {
            "is_hr_appropriate": False,
            "confidence": 0.5,
            "reason": "Error parsing validation response",
            "category": "other"
        }
    
    return validation_result

print("✓ Validation agent created successfully")

✓ Validation agent created successfully
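
LLM replies do not always contain every requested key, and `confidence` may arrive as a string rather than a number. The sketch below shows one way to normalise a reply before the rest of the pipeline indexes into it. `parse_validation` and `DEFAULTS` are hypothetical names, and failing closed (treating unparseable replies as not appropriate) is a design assumption, not something the notebook prescribes.

```python
import json
from typing import Any, Dict

DEFAULTS: Dict[str, Any] = {
    "is_hr_appropriate": False,   # fail closed
    "confidence": 0.0,
    "reason": "No reason provided",
    "category": "other",
}


def parse_validation(reply: str) -> Dict[str, Any]:
    """Extract the outermost JSON object from an LLM reply and fill in missing keys."""
    start, end = reply.find("{"), reply.rfind("}") + 1
    if start == -1 or end <= start:
        return dict(DEFAULTS, reason="No JSON object found in reply")
    try:
        parsed = json.loads(reply[start:end])
    except json.JSONDecodeError:
        return dict(DEFAULTS, reason="Reply contained invalid JSON")

    # Keep only the expected keys, falling back to defaults for missing ones
    result = {**DEFAULTS, **{k: v for k, v in parsed.items() if k in DEFAULTS}}
    # Only a literal JSON true counts as approval
    result["is_hr_appropriate"] = result["is_hr_appropriate"] is True
    try:
        result["confidence"] = min(1.0, max(0.0, float(result["confidence"])))
    except (TypeError, ValueError):
        result["confidence"] = 0.0
    return result


print(parse_validation('Sure! {"is_hr_appropriate": true, "confidence": "0.95"} Done.'))
```

Because every returned dictionary has all four keys and a numeric `confidence`, later formatting such as `:.2%` cannot fail on a missing or string-valued field.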


## Section 6: Create RAG Retrieval Agent

This agent retrieves the chunks most relevant to a validated question and asks the LLM to answer using only that context.

In [16]:
# Create RAG retrieval agent
rag_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""You are Ravikumar's HR information assistant. Using the provided personal data, answer the HR's question accurately and professionally.

Context from Ravikumar's records:
{context}

HR's Question: {question}

Instructions:
1. Answer based ONLY on the provided context
2. Be professional and concise
3. If information is not in the context, clearly state "This information is not available in the records"
4. Cite specific sections when relevant
5. Maintain confidentiality and professionalism

Your response:"""
)

def retrieve_and_answer(question: str, validation_result: Dict[str, Any]) -> Dict[str, Any]:
    """Retrieve information and generate answer"""
    
    if not validation_result.get("is_hr_appropriate"):
        return {
            "success": False,
            "answer": "❌ This question is not appropriate to discuss with HR based on our guidelines.",
            "reason": validation_result.get("reason"),
            "context_retrieved": []
        }
    
    print(f"\n📚 Retrieving relevant information...")
    
    # Retrieve relevant documents
    retrieved_docs = retriever.invoke(question)
    context = "\n\n".join([doc.page_content for doc in retrieved_docs])
    
    print(f"✓ Retrieved {len(retrieved_docs)} relevant documents")
    
    # Generate answer
    rag_chain = rag_prompt | llm
    response = rag_chain.invoke({
        "context": context,
        "question": question
    })
    
    return {
        "success": True,
        "answer": response.content,
        "reason": validation_result.get("reason"),
        "context_retrieved": [doc.page_content[:200] + "..." for doc in retrieved_docs]
    }

print("✓ RAG retrieval agent created successfully")

✓ RAG retrieval agent created successfully
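
The prompt asks the model to cite specific sections, but the context is passed in as unlabelled text. One possible approach, sketched under the assumption that a simple numeric label per chunk is enough, is to tag each retrieved chunk before joining; `format_context` is a hypothetical helper and the chunk strings are placeholders.

```python
from typing import List


def format_context(chunks: List[str]) -> str:
    """Join retrieved chunks, prefixing each with a numbered source label."""
    labelled = [f"[Source {i}]\n{chunk.strip()}" for i, chunk in enumerate(chunks, start=1)]
    return "\n\n".join(labelled)


# Placeholder chunk texts for illustration
print(format_context(["first chunk text", "  second chunk text  "]))
```

The resulting string could be passed as the `context` variable of the prompt, giving the model stable labels to refer to.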


## Section 7: Set Up Multi-Agent Orchestrator

The orchestrator runs the validation agent first, passes its result to the retrieval agent, and records every query in a history list.

In [25]:
# Multi-Agent Orchestrator
class MultiAgentOrchestrator:
    """Orchestrates multiple agents to handle HR queries about Ravikumar's data"""
    
    def __init__(self, name: str = "Ravikumar HR Assistant"):
        self.name = name
        self.query_history = []
        self.agent_logs = []
    
    def process_query(self, question: str) -> Dict[str, Any]:
        """
        Process a query through the multi-agent system:
        1. Validate the question
        2. Retrieve relevant information
        3. Generate response
        """
        
        print("\n" + "="*80)
        print(f"🤖 {self.name} - Processing Query")
        print("="*80)
        
        # Step 1: Validate the question
        print("\n[Step 1/3] VALIDATION AGENT")
        print("-" * 40)
        validation_result = validate_question(question)
        
        print(f"✓ Is HR Appropriate: {validation_result['is_hr_appropriate']}")
        print(f"✓ Confidence: {validation_result['confidence']:.2%}")
        print(f"✓ Category: {validation_result['category']}")
        print(f"✓ Reason: {validation_result['reason']}")
        
        # Step 2: Retrieve and answer (if validated)
        print("\n[Step 2/3] RAG RETRIEVAL AGENT")
        print("-" * 40)
        rag_result = retrieve_and_answer(question, validation_result)
        
        # Step 3: Generate final response
        print("\n[Step 3/3] FINAL RESPONSE")
        print("-" * 40)
        
        final_response = {
            "query": question,
            "validation": validation_result,
            "retrieval": rag_result,
            "success": rag_result["success"],
            "timestamp": __import__('datetime').datetime.now().isoformat()
        }
        
        self.query_history.append(final_response)
        
        # Display answer
        print(f"\n📝 Answer:\n{rag_result['answer']}")
        print("\n" + "="*80)
        
        return final_response
    
    def get_history(self) -> list:
        """Get query history"""
        return self.query_history
    
    def print_summary(self):
        """Print summary of all queries"""
        print(f"\n📊 Summary Report - {len(self.query_history)} queries processed")
        for i, query in enumerate(self.query_history, 1):
            print(f"\n{i}. Query: {query['query']}")
            print(f"   Validated: {query['validation']['is_hr_appropriate']}")
            print(f"   Success: {query['success']}")

# Initialize orchestrator
orchestrator = MultiAgentOrchestrator()
print("✓ Multi-Agent Orchestrator initialized successfully")

✓ Multi-Agent Orchestrator initialized successfully
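
Because the orchestrator calls the module-level `validate_question` and `retrieve_and_answer` directly, exercising it always goes through the Groq API. A possible alternative, shown only as a sketch, is to pass the two agents in as callables so they can be swapped for stubs during testing; `Pipeline`, `fake_validator`, and `fake_answerer` are hypothetical names.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List

Validator = Callable[[str], Dict[str, Any]]
Answerer = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class Pipeline:
    """Validate first, then answer; both agents are injected as plain callables."""

    def __init__(self, validator: Validator, answerer: Answerer) -> None:
        self.validator = validator
        self.answerer = answerer
        self.history: List[Dict[str, Any]] = []

    def run(self, question: str) -> Dict[str, Any]:
        validation = self.validator(question)
        retrieval = self.answerer(question, validation)
        record = {
            "query": question,
            "validation": validation,
            "retrieval": retrieval,
            "success": retrieval["success"],
            "timestamp": datetime.now().isoformat(),
        }
        self.history.append(record)
        return record


# Stub agents: no network calls, deterministic behaviour
def fake_validator(question: str) -> Dict[str, Any]:
    return {"is_hr_appropriate": "address" not in question.lower()}


def fake_answerer(question: str, validation: Dict[str, Any]) -> Dict[str, Any]:
    if not validation["is_hr_appropriate"]:
        return {"success": False, "answer": "rejected"}
    return {"success": True, "answer": f"stub answer to: {question}"}


pipeline = Pipeline(fake_validator, fake_answerer)
print(pipeline.run("What are the candidate's skills?")["success"])
print(pipeline.run("What is the home address?")["success"])
```

In the real notebook the same class could be constructed with `validate_question` and `retrieve_and_answer`, since their signatures match the two callable types.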


## Section 8: Test the Multi-Agent System

Test with sample HR queries to demonstrate the complete workflow.

In [26]:
# Test Query 1: Professional/HR-Appropriate
query_1 = "Can you tell me about Ravikumar's work experience and professional background?"
result_1 = orchestrator.process_query(query_1)


🤖 Ravikumar HR Assistant - Processing Query

[Step 1/3] VALIDATION AGENT
----------------------------------------

🔍 Validating question: 'Can you tell me about Ravikumar's work experience and professional background?'
✓ Is HR Appropriate: True
✓ Confidence: 98.00%
✓ Category: professional
✓ Reason: Asks about work experience and professional background, which is relevant to HR context.

[Step 2/3] RAG RETRIEVAL AGENT
----------------------------------------

📚 Retrieving relevant information...
✓ Retrieved 3 relevant documents

[Step 3/3] FINAL RESPONSE
----------------------------------------

📝 Answer:
I have reviewed Ravikumar D's personal profile and provided information. 

Ravikumar D is an aspiring Machine Learning Engineer with a strong academic background in Computer Science and Engineering with a specialization in Artificial Intelligence and Machine Learning.

Regarding his work experience, Ravikumar has completed an internship at Lysa Solution in May 2

In [27]:
# Test Query 2: Technical Skills
query_2 = "What are Ravikumar's technical skills and expertise?"
result_2 = orchestrator.process_query(query_2)


🤖 Ravikumar HR Assistant - Processing Query

[Step 1/3] VALIDATION AGENT
----------------------------------------

🔍 Validating question: 'What are Ravikumar's technical skills and expertise?'
✓ Is HR Appropriate: True
✓ Confidence: 98.00%
✓ Category: professional
✓ Reason: Asks about technical skills and expertise, which is relevant to work performance and professional development.

[Step 2/3] RAG RETRIEVAL AGENT
----------------------------------------

📚 Retrieving relevant information...
✓ Retrieved 3 relevant documents

[Step 3/3] FINAL RESPONSE
----------------------------------------

📝 Answer:
Based on Ravikumar's personal profile and summary, I can provide the following information on his technical skills and expertise:

Ravikumar is an aspiring Machine Learning Engineer with expertise in Artificial Intelligence and Machine Learning. His technical skills and expertise include:

- Programming skills: Although not explicitly mentioned, his involvement in v

In [28]:
# Test Query 3: Education
query_3 = "What is Ravikumar's educational background?"
result_3 = orchestrator.process_query(query_3)


🤖 Ravikumar HR Assistant - Processing Query

[Step 1/3] VALIDATION AGENT
----------------------------------------

🔍 Validating question: 'What is Ravikumar's educational background?'
✓ Is HR Appropriate: True
✓ Confidence: 90.00%
✓ Category: professional
✓ Reason: Asks about educational background, which is relevant to HR for employee onboarding, training, and career development purposes.

[Step 2/3] RAG RETRIEVAL AGENT
----------------------------------------

📚 Retrieving relevant information...
✓ Retrieved 3 relevant documents

[Step 3/3] FINAL RESPONSE
----------------------------------------

📝 Answer:
Based on the provided personal data, Ravikumar's educational background is as follows:

Ravikumar is currently pursuing his B.E. in Computer Science and Engineering with a specialization in Artificial Intelligence and Machine Learning at Sri Eshwar College of Engineering, Coimbatore. 

Specifically, the records indicate that he is in his 4th semester (PERSONA

In [29]:
# Test Query 4: Certifications
query_4 = "What certifications and achievements does Ravikumar have?"
result_4 = orchestrator.process_query(query_4)


🤖 Ravikumar HR Assistant - Processing Query

[Step 1/3] VALIDATION AGENT
----------------------------------------

🔍 Validating question: 'What certifications and achievements does Ravikumar have?'
✓ Is HR Appropriate: True
✓ Confidence: 98.00%
✓ Category: professional
✓ Reason: Asks about professional achievements and certifications, relevant to work experience and skills.

[Step 2/3] RAG RETRIEVAL AGENT
----------------------------------------

📚 Retrieving relevant information...
✓ Retrieved 3 relevant documents

[Step 3/3] FINAL RESPONSE
----------------------------------------

📝 Answer:
Based on Ravikumar's records, I have found the following certifications and achievements:

- CGPA: 8.09 in B.E. (CSE-AIML) at Sri Eshwar College of Engineering (PERSONAL PROFILE, EDUCATION)
- Scored 90.6% in HSC (Class 12) at The Merit Higher Secondary School (PERSONAL PROFILE, EDUCATION)
- Passed in SSLC (Class 10) at The Merit Higher Secondary School (PERSONAL PROFILE, EDU

In [30]:
# Test Query 5: Inappropriate query (test validation agent)
query_5 = "What is Ravikumar's home address and personal phone number?"
result_5 = orchestrator.process_query(query_5)


🤖 Ravikumar HR Assistant - Processing Query

[Step 1/3] VALIDATION AGENT
----------------------------------------

🔍 Validating question: 'What is Ravikumar's home address and personal phone number?'
✓ Is HR Appropriate: False
✓ Confidence: 80.00%
✓ Category: personal
✓ Reason: Asks about personal contact information, which is not typically relevant to HR discussions or professional settings.

[Step 2/3] RAG RETRIEVAL AGENT
----------------------------------------

[Step 3/3] FINAL RESPONSE
----------------------------------------

📝 Answer:
❌ This question is not appropriate to discuss with HR based on our guidelines.
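
The validation step only screens the question; the retrieved chunks themselves can still contain contact details (the document preview in Section 2 includes a phone number and an email address). As a complementary safeguard, one could redact such patterns from the context before it reaches the LLM. The regular expressions below are deliberately simple assumptions that will miss many real-world formats, and `redact_contact_details` is a hypothetical helper.

```python
import re

# Simplified patterns: assumptions for illustration, not exhaustive
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")


def redact_contact_details(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[REDACTED EMAIL]", text)
    return PHONE_RE.sub("[REDACTED PHONE]", text)


# Made-up contact details for illustration
sample = "Phone: +00 1234567890\nEmail: someone@example.com"
print(redact_contact_details(sample))
```

Applying a filter like this to the joined context would reduce the chance that an approved question surfaces contact details by accident.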



## Section 9: System Summary and Statistics

In [31]:
# Print system summary
print("\n" + "="*80)
print("📊 MULTI-AGENT RAG SYSTEM - SUMMARY REPORT")
print("="*80)

print("\n✓ System Configuration:")
print(f"  - LLM: Groq (llama-3.1-8b-instant)")
print(f"  - Embeddings: HuggingFace (all-MiniLM-L6-v2)")
print(f"  - Vector Store: FAISS")
print(f"  - Document: Ravi_Total.docx")

print(f"\n✓ Document Statistics:")
print(f"  - Total characters: {len(document_text):,}")
print(f"  - Total chunks: {len(chunks)}")
print(f"  - Average chunk size: {len(document_text) // len(chunks) if chunks else 0} chars")

print(f"\n✓ Agents Deployed:")
print(f"  - Validation Agent: HR-appropriateness filter")
print(f"  - Retrieval Agent: Information retrieval (RAG)")
print(f"  - Orchestrator: Multi-agent coordination")

print(f"\n✓ Queries Processed: {len(orchestrator.query_history)}")

# Statistics
hr_appropriate_count = sum(1 for q in orchestrator.query_history if q['validation']['is_hr_appropriate'])
successful_count = sum(1 for q in orchestrator.query_history if q['success'])

print(f"  - HR Appropriate: {hr_appropriate_count}/{len(orchestrator.query_history)}")
print(f"  - Successfully Answered: {successful_count}/{len(orchestrator.query_history)}")

print(f"\n" + "="*80)

# Display individual query summary
orchestrator.print_summary()

print(f"\n" + "="*80)
print("✅ Multi-Agent RAG System Ready for Deployment!")
print("="*80)


📊 MULTI-AGENT RAG SYSTEM - SUMMARY REPORT

✓ System Configuration:
  - LLM: Groq (llama-3.1-8b-instant)
  - Embeddings: HuggingFace (all-MiniLM-L6-v2)
  - Vector Store: FAISS
  - Document: Ravi_Total.docx

✓ Document Statistics:
  - Total characters: 24,414
  - Total chunks: 65
  - Average chunk size: 375 chars

✓ Agents Deployed:
  - Validation Agent: HR-appropriateness filter
  - Retrieval Agent: Information retrieval (RAG)
  - Orchestrator: Multi-agent coordination

✓ Queries Processed: 5
  - HR Appropriate: 4/5
  - Successfully Answered: 4/5


📊 Summary Report - 5 queries processed

1. Query: Can you tell me about Ravikumar's work experience and professional background?
   Validated: True
   Success: True

2. Query: What are Ravikumar's technical skills and expertise?
   Validated: True
   Success: True

3. Query: What is Ravikumar's educational background?
   Validated: True
   Success: True

4. Query: What certifications and achievements does Ravikumar have?
   Valid
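
The same history records can be aggregated in other ways, for example by validation category. A small sketch follows, assuming each record carries the `validation` and `success` keys produced by `process_query`; `summarise_history` is a hypothetical helper and the sample records are made up for illustration.

```python
from collections import Counter
from typing import Any, Dict, List


def summarise_history(history: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count queries per validation category and compute the answered fraction."""
    categories = Counter(q["validation"].get("category", "other") for q in history)
    answered = sum(1 for q in history if q["success"])
    return {
        "total": len(history),
        "answered": answered,
        "answer_rate": answered / len(history) if history else 0.0,
        "by_category": dict(categories),
    }


# Made-up records shaped like the orchestrator's history entries
sample_history = [
    {"validation": {"category": "professional"}, "success": True},
    {"validation": {"category": "professional"}, "success": True},
    {"validation": {"category": "personal"}, "success": False},
]
print(summarise_history(sample_history))
```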

## Section 10: Custom Query Interface

Use this section to ask custom HR questions about Ravikumar's profile.

In [32]:
# Custom query function for HR
def ask_ravikumar_system(question: str):
    """
    Ask the multi-agent RAG system about Ravikumar's professional profile.
    
    Args:
        question: Your HR question about Ravikumar
    """
    return orchestrator.process_query(question)

# Example usage:
# result = ask_ravikumar_system("What are Ravikumar's key achievements?")

print("\n✅ Custom Query Interface Ready!")
print("\nUsage: result = ask_ravikumar_system('Your question here')")
print("\nExample questions to try:")
print("  1. 'Can you tell me about Ravikumar's key achievements?'")
print("  2. 'What programming languages does Ravikumar know?'")
print("  3. 'How many years of experience does Ravikumar have?'")
print("  4. 'What are Ravikumar's certifications?'")
print("\nYou can ask any professional/HR-related question!")


✅ Custom Query Interface Ready!

Usage: result = ask_ravikumar_system('Your question here')

Example questions to try:
  1. 'Can you tell me about Ravikumar's key achievements?'
  2. 'What programming languages does Ravikumar know?'
  3. 'How many years of experience does Ravikumar have?'
  4. 'What are Ravikumar's certifications?'

You can ask any professional/HR-related question!
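
Results returned by `ask_ravikumar_system` are plain dictionaries, so they can be persisted for later review. A minimal sketch that writes one JSON object per line is shown below; `write_history_jsonl` is a hypothetical helper, and it assumes every value in a record is JSON-serialisable.

```python
import io
import json
from typing import Any, Dict, Iterable, TextIO


def write_history_jsonl(history: Iterable[Dict[str, Any]], stream: TextIO) -> int:
    """Write each record as one JSON line; return the number of records written."""
    count = 0
    for record in history:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# In-memory stream for demonstration; a file opened in text mode would work the same way
buffer = io.StringIO()
written = write_history_jsonl([{"query": "example question", "success": True}], buffer)
print(written)
print(buffer.getvalue())
```

In the notebook, the list returned by `orchestrator.get_history()` could be passed as `history`.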
