# HR Assistant with RAG

**Project Date:** March 14, 2023 (Updated: August 2025)

**Developer:** Chris Johnson (kutyadog@gmail.com)

Built for The Washington Post hackathon 2023-ai-ml

---

## Project Overview

This project implements a production-ready HR chatbot for The Washington Post's internal HR portal. The system uses Retrieval-Augmented Generation (RAG) with OpenAI's GPT-4 to answer employee questions about benefits, policies, and procedures.

### Key Features:
- Semantic search using embeddings for accurate information retrieval
- Confidence scoring to prevent hallucinations
- Interactive Gradio interface for easy employee interaction
- Source attribution for transparency and verification
- Handles 2300+ HR articles with efficient vector embeddings

### Technical Implementation:
- Uses OpenAI's text-embedding-ada-002 for document embeddings
- Employs cosine similarity for semantic search
- Implements confidence thresholding to ensure accurate responses
- Includes a web-based interface using Gradio

---

## Problem Statement

HR departments constantly receive calls and emails asking basic questions that are already detailed on the HR website. This creates unnecessary workload for HR staff and delays in getting information to employees.

### Solution Approach

Using the HR website as the knowledge source, we built an AI chatbot that can:

1. **Ingest Content**: Pull and process 2300+ articles from the HR database
2. **Create Embeddings**: Generate vector embeddings for all article content
3. **Retrieve Relevant Information**: For each query, find the most relevant content using semantic search
4. **Generate Accurate Responses**: Use retrieved context to generate precise answers

---

## Challenges and Solutions

### Challenge 1: Model Hallucinations
GPT-3.5 Turbo had strong tendencies to hallucinate. Finding a prompt that could prevent this was difficult.

**Solution**: Switched to GPT-4, which reasons more reliably and is less prone to hallucination.

### Challenge 2: Outdated Content
The initial POC used a blind dump of all HR website content, regardless of age, causing confusion when policies changed.

**Solution**: Implemented content refinement strategies and additional validation logic.
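The source does not describe which refinement strategies were used. One plausible form of such filtering, shown here only as a sketch, is to drop articles that have not been updated since a cutoff date. The `last_updated` column and the sample rows are hypothetical and are not part of the notebook's dataset.

```python
import pandas as pd

def drop_stale_articles(articles, cutoff, date_column="last_updated"):
    """Keep only articles updated on or after `cutoff`.

    `date_column` is a hypothetical column name. Rows whose date cannot be
    parsed become NaT and are dropped rather than trusted.
    """
    updated = pd.to_datetime(articles[date_column], errors="coerce")
    return articles[updated >= pd.Timestamp(cutoff)].reset_index(drop=True)

# Illustrative rows only; the URLs and dates are made up.
articles = pd.DataFrame({
    "url": ["/benefits/401k", "/benefits/401k-2019", "/policies/leave"],
    "last_updated": ["2023-01-15", "2019-06-01", "not a date"],
})
fresh = drop_stale_articles(articles, cutoff="2022-01-01")
print(fresh["url"].tolist())
```

Filtering before embeddings are generated would also avoid paying to embed content that is never served.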

### Challenge 3: Context Management
Retrieved articles had to fit within the model's context window without dropping the passages needed to answer the question.

**Solution**: Implemented smart context truncation and prioritization algorithms.
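The notebook's own search function simply cuts the combined context at 10,000 characters. As a sketch of what prioritized truncation could look like (an assumption, not the author's algorithm), the helper below adds whole articles in ranked order and skips any article that would exceed the budget, so no article is cut mid-sentence. The name `pack_context` and its parameters are hypothetical.

```python
def pack_context(ranked_articles, max_chars=10000):
    """Join whole articles, best match first, within a character budget.

    `ranked_articles` is a list of (url, text) pairs already sorted by
    descending similarity. An article that does not fit in the remaining
    budget is skipped, and smaller lower-ranked articles may still be added.
    """
    parts, used = [], 0
    for url, text in ranked_articles:
        entry = f"Link: {url} - {text}"
        cost = len(entry) + (1 if parts else 0)  # +1 for the joining newline
        if used + cost > max_chars:
            continue
        parts.append(entry)
        used += cost
    return "\n".join(parts)

# Illustrative input: the second article is too large for a 100-character budget.
ranked = [("/a", "x" * 50), ("/b", "y" * 200), ("/c", "z" * 20)]
context = pack_context(ranked, max_chars=100)
print(len(context))
```

A character budget is only a proxy for tokens; counting tokens with the model's tokenizer would give a tighter bound.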

---

## Future Enhancements

1. **Content Management System**: React interface for HR staff to manage content
2. **Automated Content Updates**: Integration with HR website CMS
3. **Vector Database**: Implementation of Pinecone for efficient similarity search
4. **Feedback System**: Thumbs up/down functionality for continuous improvement
5. **Analytics**: Logging and analysis of user interactions for insights

---

## Setup and Installation

### Prerequisites
- Python 3.7+
- OpenAI API key
- Required Python packages (see imports below)

### Dependencies
```bash
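# The notebook uses openai.ChatCompletion and openai.embeddings_utils,
# which were removed in openai 1.0, so pin an earlier release.
pip install "openai<1.0" pandas numpy gradio gdown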
```

### Configuration
Set up your OpenAI API key:
```python
import os
import openai
# Read the key from the environment instead of hardcoding it
openai.api_key = os.environ["OPENAI_API_KEY"]
```

In [None]:
# STEP 1: Install Required Packages
# openai is pinned below 1.0 because the code relies on the pre-1.0 API
!pip install "openai<1.0" pandas numpy gradio gdown

In [None]:
# Import Required Libraries
import openai
import pandas as pd
import numpy as np
import gradio as gr
from google.colab import userdata

# Set up OpenAI API
try:
    openai.organization = userdata.get('OPENAI_ORG')
    openai.api_key = userdata.get('OPENAI_API_KEY')
    print("OpenAI API configured successfully")
except Exception as e:
    print(f"Error configuring OpenAI API: {e}")
    print("Please ensure you have set up your API keys in Colab secrets")

## Data Loading and Processing

For this demonstration, we'll use pre-processed embeddings. In a production environment, you would:

1. Pull content from the HR website
2. Clean and preprocess the text
3. Generate embeddings using OpenAI's embedding API
4. Store embeddings in a vector database

Generating embeddings is time-consuming and costly, so we pre-computed and saved them.
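Step 2 above (cleaning and preprocessing) is not shown in this notebook. A minimal sketch of what it might involve follows, assuming articles arrive as HTML and are split into overlapping word windows before embedding. The function names, window sizes, and sample page are hypothetical.

```python
import html
import re

def clean_text(raw_html):
    """Strip tags, decode HTML entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw_html)
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip()

def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping word windows, one embedding per window."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + max_words]) for i in range(0, last_start, step)]

# Illustrative input only.
page = "<p>Dental &amp; vision</p>\n<p>Enroll by   Friday.</p>"
text = clean_text(page)
chunks = chunk_words(text, max_words=4, overlap=1)
print(chunks)
```

Each chunk would then be passed to the embedding API and stored alongside its source URL. The regex tag stripper is only adequate for simple markup; a real HTML parser would be safer for complex pages.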

In [None]:
# Download pre-processed embeddings
!gdown 1HLyJJ7NciWvZaupfutqt5m_P6AsEcGsl -O question_embeddings.csv
print("Embeddings downloaded successfully")

In [None]:
# Load and Prepare Data
import ast

def load_data():
    """Load and preprocess the HR articles with embeddings."""
    try:
        # Load the CSV file containing embeddings
        theData = pd.read_csv('question_embeddings.csv')
        
        # Convert string embeddings back to numpy arrays.
        # ast.literal_eval only parses literals, so unlike eval it cannot run code from the file.
        theData['embedding'] = theData.embedding.apply(ast.literal_eval).apply(np.array)
        
        print(f"Loaded {len(theData)} HR articles with embeddings")
        return theData
        
    except Exception as e:
        print(f"Error loading data: {e}")
        return None

# Load the dataset
hr_data = load_data()
if hr_data is not None:
    display(hr_data.head())

## Core Functionality

### Semantic Search Engine

The core of our system is a semantic search engine that can find the most relevant HR articles based on a user's query. We use cosine similarity to measure the similarity between the query embedding and the article embeddings.
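Cosine similarity between a query vector q and an article vector a is their dot product divided by the product of their norms. The sketch below computes it for all articles at once with NumPy and applies the same top-k and threshold logic used in this notebook. The function name and toy vectors are illustrative, and it assumes no embedding is all zeros.

```python
import numpy as np

def top_k_by_cosine(query_vec, article_matrix, top_k=5, threshold=0.77):
    """Return (row_index, score) pairs for the best matches above `threshold`."""
    query = np.asarray(query_vec, dtype=float)
    matrix = np.asarray(article_matrix, dtype=float)
    # One matrix-vector product scores every article in a single pass.
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    scores = matrix @ query / norms
    # Highest scores first, limited to top_k candidates.
    best = np.argsort(scores)[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in best if scores[i] >= threshold]

# Toy 2-D vectors; real ada-002 embeddings have 1536 dimensions.
articles = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
hits = top_k_by_cosine([1.0, 0.0], articles, top_k=2, threshold=0.5)
print(hits)
```

Stacking the embeddings into one matrix avoids a Python-level loop over rows, which matters more as the article count grows.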

In [None]:
from openai.embeddings_utils import get_embedding, cosine_similarity

def find_relevant_articles(query, data, top_k=5, similarity_threshold=0.77):
    """
    Find the most relevant articles for a given query.
    
    Args:
        query (str): The user's question
        data (DataFrame): DataFrame containing articles and embeddings
        top_k (int): Number of articles to retrieve
        similarity_threshold (float): Minimum similarity score for relevance
        
    Returns:
        tuple: (context, urls) - Combined context and list of URLs
    """
    try:
        # Generate embedding for the query
        query_vector = get_embedding(query, engine='text-embedding-ada-002')
        
        # Calculate similarity scores on a new DataFrame so the caller's data is not modified
        scored = data.assign(
            similarities=data['embedding'].apply(lambda x: cosine_similarity(x, query_vector))
        )
        
        # Sort by similarity score and keep the top_k candidates
        relevant_articles = scored.sort_values("similarities", ascending=False).head(top_k)
        
        # Build context from the candidates that clear the similarity threshold
        context_parts = []
        urls = []
        
        for _, row in relevant_articles.iterrows():
            if row['similarities'] >= similarity_threshold:
                context_parts.append(f"Link: {row['url']} - {row['context']}")
                urls.append(row['url'])
        
        # Combine context parts
        context = "\n".join(context_parts)
        
        # Truncate to 10,000 characters as a rough guard against exceeding the model's token limit
        context = context[:10000]
        
        return context, urls
        
    except Exception as e:
        print(f"Error in semantic search: {e}")
        return "", []

### Prompt Engineering

Effective prompt engineering is crucial for getting accurate responses from the language model. Our prompt includes:

1. Clear instructions about the role and behavior
2. The retrieved context as reference material
3. Guidelines for handling unknown information
4. Formatting instructions for the response

In [None]:
def create_prompt(context, question):
    """
    Create a well-structured prompt for the language model.
    
    Args:
        context (str): Retrieved context from relevant articles
        question (str): User's question
        
    Returns:
        str: Formatted prompt
    """
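    # No usable context was retrieved: return an empty prompt so that
    # generate_response answers "I don't know" without calling the API
    if len(context) <= 5:
        return ""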
        
    prompt = f"""Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know". Offer the given link when appropriate.

Context:
{context}

Q: {question}
A:"""
    
    return prompt

### Response Generation

The response generation function makes the API call to OpenAI's chat model, with `temperature=0` to keep the output as deterministic as possible.

In [None]:
def generate_response(prompt, model="gpt-4", temperature=0, max_tokens=512):
    """
    Generate a response from the language model.
    
    Args:
        prompt (str): The prompt to send to the model
        model (str): Which model to use
        temperature (float): Controls randomness (0 = deterministic)
        max_tokens (int): Maximum length of the response
        
    Returns:
        str: Model's response
    """
    if prompt == "":
        return "I don't know"
        
    try:
        response = openai.ChatCompletion.create(
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful chatbot assistant for The Washington Post's HR website called Guidepost. Provide accurate information based on the given context."
                },
                {
                    "role": "user",
                    "content": prompt
                },
            ],
            temperature=temperature,
            max_tokens=max_tokens,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0,
            model=model,
        )
        
        return response["choices"][0]["message"]["content"]
        
    except Exception as e:
        print(f"Error generating response: {e}")
        return "I apologize, but I'm having trouble generating a response right now. Please try again later."

# Configuration constants
COMPLETIONS_MODEL = "gpt-4"
EMBEDDINGS_MODEL = "text-embedding-ada-002"

## Complete Query Pipeline

Now let's combine all the components into a complete pipeline that takes a user query and returns a helpful response with source attribution.

In [None]:
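def process_query(question, data=None):
    """
    Complete pipeline for processing a user query.
    
    Args:
        question (str): User's question
        data (DataFrame): HR articles data; defaults to the currently loaded hr_data
        
    Returns:
        tuple: (answer, url_string) - Answer and HTML-formatted links to relevant articles
    """
    # Resolve the default at call time so a reloaded hr_data is picked up
    if data is None:
        data = hr_data
        
    try:
        # Step 1: Find relevant articles
        context, urls = find_relevant_articles(question, data)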
        
        # Step 2: Create prompt
        prompt = create_prompt(context, question)
        
        # Step 3: Generate response
        answer = generate_response(prompt)
        
        # Format URLs for display
        url_string = ''
        if urls:
            for url in urls:
                url_string += f'<a href="{url}" target="_blank">{url}</a><BR>'
        
        return answer, url_string
        
    except Exception as e:
        print(f"Error processing query: {e}")
        return "I apologize, but I encountered an error processing your request. Please try again.", ""

## Testing the System

Let's test our HR chatbot with some sample questions to verify it's working correctly.

In [None]:
# Test with a sample question
test_question = "How do I change my 401(k) contributions?"
answer, urls = process_query(test_question)

print(f"Question: {test_question}")
print(f"\nAnswer: {answer}")
if urls:
    print(f"\nSources:\n{urls}")

In [None]:
# Test with another question
test_question2 = "What should I do if a coworker says something offensive to me?"
answer2, urls2 = process_query(test_question2)

print(f"Question: {test_question2}")
print(f"\nAnswer: {answer2}")
if urls2:
    print(f"\nSources:\n{urls2}")

## Interactive Interface

Now let's create a user-friendly interface using Gradio that lets employees interact with the chatbot easily.

In [None]:
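def chatbot_interface(question, history=None):
    """
    Standalone handler for a Gradio chatbot (create_interface below wires its own callbacks).
    
    Args:
        question (str): User's question
        history (list): Chat history as (question, response) pairs; a new list is created if omitted
        
    Returns:
        tuple: ("", updated_history) - Empty string for input and updated history
    """
    # Avoid a mutable default argument, which would be shared across calls
    if history is None:
        history = []
        
    if not question.strip():
        return "", history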
        
    # Get response and sources
    answer, urls = process_query(question)
    
    # Format the response with sources
    response = answer
    if urls:
        response += f"\n\n<details><summary>Sources</summary>{urls}</details>"
    
    # Add to history
    history.append((question, response))
    
    return "", history

def create_interface():
    """
    Create and configure the Gradio interface.
    
    Returns:
        gr.Blocks: Configured Gradio app
    """
    with gr.Blocks(theme=gr.themes.Soft()) as demo:
        gr.Markdown("""
        # HR Assistant with RAG
        
        Welcome to The Washington Post's HR Assistant! I'm here to help answer your questions about HR policies, benefits, and procedures.
        
        **How to use:**
        1. Type your question in the box below
        2. Press Enter or click Submit
        3. I'll provide an answer based on official HR documentation
        4. Check the 'Sources' section for links to relevant policies
        """)
        
        chatbot = gr.Chatbot(height=500, label="HR Assistant Chat")
        msg = gr.Textbox(label="Your Question", placeholder="Ask me about HR policies, benefits, etc.")
        clear = gr.Button("Clear Chat")
        
        def user_input(user_message, history):
            return "", history + [[user_message, None]]
        
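        def bot_response(history):
            if history:
                user_message = history[-1][0]
                answer, urls = process_query(user_message)
                # Append the source links so the 'Sources' section promised above is shown
                if urls:
                    answer += f"\n\n<details><summary>Sources</summary>{urls}</details>"
                history[-1][1] = answer
            return history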
        
        msg.submit(user_input, [msg, chatbot], [msg, chatbot]).then(
            bot_response, chatbot, chatbot
        )
        
        clear.click(lambda: None, None, chatbot, queue=False)
    
    return demo

# Create and launch the interface
demo = create_interface()
demo.launch(share=True)

## Conclusion

This HR Assistant with RAG demonstrates several key AI capabilities:

1. **Semantic Search**: Using embeddings to find relevant content based on meaning rather than keywords
2. **Retrieval-Augmented Generation**: Combining retrieved context with language models for accurate responses
3. **Confidence Scoring**: Implementing thresholds to ensure only relevant information is used
4. **User Experience**: Creating an intuitive interface for non-technical users

### Key Achievements:
- Successfully processes and searches through 2300+ HR articles
- Provides accurate, source-attributed responses to employee questions
- Reduces HR workload by automating responses to common queries
- Scales efficiently as more content is added

### Future Improvements:
- Implement a vector database (like Pinecone) for faster similarity search
- Add user authentication and personalization
- Implement feedback mechanisms for continuous improvement
- Add multi-language support for diverse workforce

This project showcases practical AI implementation that solves real business problems while maintaining accuracy and reliability.