# Task 1: RAG Model for Business QA Bot

This notebook demonstrates a complete Retrieval-Augmented Generation (RAG) system for business question answering, built on the OpenAI API and the Pinecone vector database.

## System Architecture

The RAG system consists of four main components:
1. **Document Processor**: Cleans and chunks business documents
2. **Vector Store**: Stores embeddings in Pinecone for semantic search
3. **RAG System**: Orchestrates the entire pipeline
4. **Streamlit Interface**: Provides a user-friendly web interface
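
The way these components hand data to each other can be sketched without any external service. The snippet below is only an illustration of the retrieve-then-generate flow: `answer_question`, `toy_retrieve`, and `toy_generate` are hypothetical names, and the toy functions stand in for the Pinecone lookup and the OpenAI call.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases: a retriever maps (question, top_k) to chunks,
# a generator maps (question, chunks) to an answer string.
Retriever = Callable[[str, int], List[Dict[str, str]]]
Generator = Callable[[str, List[Dict[str, str]]], str]


def answer_question(question: str, retrieve: Retriever, generate: Generator,
                    top_k: int = 5) -> Dict[str, object]:
    """Retrieve supporting chunks, then generate an answer grounded in them."""
    chunks = retrieve(question, top_k)
    if not chunks:
        return {"answer": "No relevant information found.", "sources": []}
    return {
        "answer": generate(question, chunks),
        "sources": [chunk["source"] for chunk in chunks],
    }


def toy_retrieve(question: str, top_k: int) -> List[Dict[str, str]]:
    # Stand-in for the vector store: always returns the same chunk.
    docs = [{"source": "employee_handbook",
             "content": "All employees receive 5 sick days annually."}]
    return docs[:top_k]


def toy_generate(question: str, chunks: List[Dict[str, str]]) -> str:
    # Stand-in for the language model: quotes the best chunk.
    return f"Based on {chunks[0]['source']}: {chunks[0]['content']}"


result = answer_question("How many sick days do I get?", toy_retrieve, toy_generate)
print(result["answer"])
```

Passing the retriever and generator in as callables keeps the orchestration logic testable without API keys.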

## Installation and Setup

In [None]:
# Install required packages
!pip install openai pinecone-client streamlit nltk numpy python-dotenv

# Download NLTK data
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

In [None]:
# Import required libraries
import os
import re
import logging
import time
from typing import List, Dict, Any, Optional
from datetime import datetime

# AI and ML libraries
import numpy as np
from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec

# NLP libraries
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import string

# Configuration
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

## API Configuration

Set up your OpenAI and Pinecone API keys. You can get these from:
- OpenAI: https://platform.openai.com/api-keys
- Pinecone: https://app.pinecone.io/
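
Hard-coded keys in a notebook are easy to leak, so one option is to read them from the environment and fail early when one is missing. A minimal sketch, assuming the variables are named `OPENAI_API_KEY` and `PINECONE_API_KEY`; `require_env` is a hypothetical helper, not part of either SDK:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example usage once the variables are exported in your shell:
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
# PINECONE_API_KEY = require_env("PINECONE_API_KEY")
```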

In [None]:
# Set your API keys here
OPENAI_API_KEY = "your-openai-api-key-here"
PINECONE_API_KEY = "your-pinecone-api-key-here"

# Or load from environment variables
# OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
# PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")

# Initialize clients
openai_client = OpenAI(api_key=OPENAI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)

## Document Processor

The Document Processor handles text cleaning, normalization, and chunking for optimal embedding generation.
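
The overlapping-window idea behind the chunker can be shown on character offsets alone. This is a simplified sketch, assuming the same defaults of 1000 characters per chunk and 200 characters of overlap; `chunk_spans` is a hypothetical helper and, unlike the class below, it does not try to break at sentence boundaries.

```python
from typing import List, Tuple


def chunk_spans(text_length: int, chunk_size: int = 1000,
                overlap: int = 200) -> List[Tuple[int, int]]:
    """Return (start, end) character spans for overlapping chunks."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    spans = []
    start = 0
    while start < text_length:
        end = min(start + chunk_size, text_length)
        spans.append((start, end))
        if end == text_length:
            # Last chunk emitted; stepping back by the overlap would loop forever.
            break
        # The next chunk re-reads the last `overlap` characters of this one.
        start = end - overlap
    return spans


print(chunk_spans(2500))
# [(0, 1000), (800, 1800), (1600, 2500)]
```

Each span starts 800 characters after the previous one, so a sentence that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.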

In [None]:
class DocumentProcessor:
    """
    Document processor for preparing business documents for RAG pipeline
    """
    
    def __init__(self, chunk_size: int = 1000, chunk_overlap: int = 200):
        """
        Initialize document processor
        
        Args:
            chunk_size: Maximum size of each chunk in characters
            chunk_overlap: Overlap between chunks in characters
        """
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.lemmatizer = WordNetLemmatizer()
        self.stop_words = set(stopwords.words('english'))
        
    def clean_text(self, text: str) -> str:
        """
        Clean and normalize text
        
        Args:
            text: Raw text to clean
            
        Returns:
            Cleaned text
        """
        # Remove extra whitespace and normalize
        text = re.sub(r'\s+', ' ', text).strip()
        
        # Remove special characters but keep sentence punctuation and symbols that
        # carry meaning in business text (currency, percentages, emails, quotes)
        text = re.sub(r'[^\w\s.,!?;:\-()$%@&/+#\'"]', '', text)
        
        # Remove multiple consecutive punctuation
        text = re.sub(r'([\.!?]){2,}', r'\1', text)
        
        return text
    
    def create_chunks(self, text: str, source: str) -> List[Dict[str, Any]]:
        """
        Create overlapping text chunks for better context preservation
        
        Args:
            text: Input text to chunk
            source: Source document name
            
        Returns:
            List of text chunks with metadata
        """
        chunks = []
        text_length = len(text)
        
        # If text is shorter than chunk size, return as single chunk
        if text_length <= self.chunk_size:
            return [{
                'content': text,
                'source': source,
                'start_char': 0,
                'end_char': text_length,
                'chunk_index': 0
            }]
        
        start = 0
        chunk_index = 0
        
        while start < text_length:
            # Calculate end position
            end = min(start + self.chunk_size, text_length)
            
            # Try to break at sentence boundary if possible
            if end < text_length:
                sentence_end = text.rfind('.', start, end)
                if sentence_end > start + self.chunk_size - 200:
                    end = sentence_end + 1
            
            chunk_text = text[start:end].strip()
            
            if chunk_text:
                chunks.append({
                    'content': chunk_text,
                    'source': source,
                    'start_char': start,
                    'end_char': end,
                    'chunk_index': chunk_index
                })
                chunk_index += 1
            
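            # Stop after the final chunk; stepping back by the overlap from the
            # end of the text would otherwise repeat the last chunk forever
            if end >= text_length:
                break
            
            # Move start position with overlap, always advancing past the previous start
            start = max(end - self.chunk_overlap, start + 1)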
        
        return chunks
    
    def process_document(self, content: str, source: str) -> List[Dict[str, Any]]:
        """
        Main document processing pipeline
        
        Args:
            content: Document content
            source: Document source/filename
            
        Returns:
            List of processed chunks ready for embedding
        """
        try:
            # Clean the text
            cleaned_content = self.clean_text(content)
            
            if not cleaned_content.strip():
                logger.warning(f"Empty content after cleaning for document: {source}")
                return []
            
            # Create chunks
            chunks = self.create_chunks(cleaned_content, source)
            
            logger.info(f"Processed document {source} into {len(chunks)} chunks")
            return chunks
            
        except Exception as e:
            logger.error(f"Error processing document {source}: {e}")
            return []

## Vector Store

The Vector Store manages Pinecone operations for storing and retrieving document embeddings.
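
To make the `cosine` metric concrete, the sketch below ranks a few toy vectors by cosine similarity in plain Python. It is a brute-force stand-in for illustration only, not how Pinecone implements search; `cosine_similarity`, `top_k`, and the three-dimensional `toy_index` are hypothetical (real embeddings here have 1536 dimensions).

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], vectors: Dict[str, List[float]],
          k: int = 5) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


toy_index = {
    "policy_0": [1.0, 0.0, 0.0],
    "handbook_0": [0.7, 0.7, 0.0],
    "faq_0": [0.0, 0.0, 1.0],
}
matches = top_k([1.0, 0.1, 0.0], toy_index, k=2)
print([vector_id for vector_id, _ in matches])
```

The query points mostly along the first axis, so `policy_0` ranks first and `handbook_0` second, while the orthogonal `faq_0` is dropped.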

In [None]:
class VectorStore:
    """
    Vector store implementation using Pinecone for semantic search
    """
    
    def __init__(self, api_key: str, environment: str = "us-east-1", index_name: str = "business-rag-index"):
        """
        Initialize Pinecone vector store
        
        Args:
            api_key: Pinecone API key
            environment: AWS region for the serverless index (passed to ServerlessSpec)
            index_name: Name of the index to use
        """
        self.api_key = api_key
        self.environment = environment
        self.index_name = index_name
        self.dimension = 1536  # OpenAI ada-002 embedding dimension
        self.metric = "cosine"
        self.pc = None
        self.index = None
        
    def initialize_pinecone(self):
        """Initialize Pinecone client"""
        try:
            self.pc = Pinecone(api_key=self.api_key)
            logger.info("Pinecone client initialized successfully")
        except Exception as e:
            logger.error(f"Failed to initialize Pinecone client: {e}")
            raise
    
    def create_index(self):
        """Create Pinecone index if it doesn't exist"""
        try:
            if not self.pc:
                self.initialize_pinecone()
            
            # Check if index exists
            index_names = self.pc.list_indexes().names()
            
            if self.index_name not in index_names:
                logger.info(f"Creating new index: {self.index_name}")
                
                # Create index with serverless spec
                self.pc.create_index(
                    name=self.index_name,
                    dimension=self.dimension,
                    metric=self.metric,
                    spec=ServerlessSpec(
                        cloud="aws",
                        region=self.environment
                    )
                )
                
                # Wait for index to be ready
                while not self.pc.describe_index(self.index_name).status['ready']:
                    time.sleep(1)
                
                logger.info(f"Index {self.index_name} created and ready")
            else:
                logger.info(f"Index {self.index_name} already exists")
                
        except Exception as e:
            logger.error(f"Error creating index: {e}")
            raise
    
    def connect_to_index(self):
        """Connect to existing index"""
        try:
            if not self.pc:
                self.initialize_pinecone()
            
            self.index = self.pc.Index(self.index_name)
            logger.info(f"Connected to index: {self.index_name}")
            
        except Exception as e:
            logger.error(f"Error connecting to index: {e}")
            raise
    
    def initialize_index(self):
        """Initialize the complete index setup"""
        try:
            self.initialize_pinecone()
            self.create_index()
            self.connect_to_index()
            logger.info("Vector store initialized successfully")
            
        except Exception as e:
            logger.error(f"Error initializing vector store: {e}")
            raise
    
    def upsert_vectors(self, vectors: List[Dict[str, Any]], batch_size: int = 100):
        """
        Upsert vectors to Pinecone index
        
        Args:
            vectors: List of vectors with id, values, and metadata
            batch_size: Number of vectors to process in each batch
        """
        try:
            if not self.index:
                raise ValueError("Index not initialized. Call initialize_index() first.")
            
            # Process in batches
            for i in range(0, len(vectors), batch_size):
                batch = vectors[i:i + batch_size]
                
                # Format vectors for Pinecone
                formatted_vectors = []
                for vector in batch:
                    formatted_vectors.append({
                        'id': vector['id'],
                        'values': vector['values'],
                        'metadata': vector['metadata']
                    })
                
                # Upsert batch
                self.index.upsert(vectors=formatted_vectors)
                logger.info(f"Upserted batch {i // batch_size + 1} with {len(batch)} vectors")
            
            logger.info(f"Successfully upserted {len(vectors)} vectors")
            
        except Exception as e:
            logger.error(f"Error upserting vectors: {e}")
            raise
    
    def query_vectors(self, query_vector: List[float], top_k: int = 5, 
                     include_metadata: bool = True, filter_dict: Optional[Dict] = None) -> Dict[str, Any]:
        """
        Query vectors from Pinecone index
        
        Args:
            query_vector: Query embedding vector
            top_k: Number of top results to return
            include_metadata: Whether to include metadata in results
            filter_dict: Optional metadata filter
            
        Returns:
            Query results from Pinecone
        """
        try:
            if not self.index:
                raise ValueError("Index not initialized. Call initialize_index() first.")
            
            # Perform query
            results = self.index.query(
                vector=query_vector,
                top_k=top_k,
                include_metadata=include_metadata,
                filter=filter_dict
            )
            
            logger.info(f"Query returned {len(results['matches'])} results")
            return results
            
        except Exception as e:
            logger.error(f"Error querying vectors: {e}")
            raise

## RAG System

The RAG system orchestrates document processing, embedding generation, vector storage, and query processing.
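
Before the model is called, the retrieved chunks have to be flattened into a single context string. A minimal sketch of that step, assuming each chunk is a dict with `source`, `content`, and `score` keys; `build_context` and its `max_chars` budget are hypothetical additions for illustration:

```python
from typing import Any, Dict, List


def build_context(chunks: List[Dict[str, Any]], max_chars: int = 6000) -> str:
    """Join retrieved chunks into one context string, best match first."""
    ordered = sorted(chunks, key=lambda c: c.get("score", 0.0), reverse=True)
    parts: List[str] = []
    used = 0
    for chunk in ordered:
        block = f"Source: {chunk['source']}\nContent: {chunk['content']}"
        if parts and used + len(block) > max_chars:
            # Keep the prompt bounded, but always keep at least one chunk.
            break
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)


sample_chunks = [
    {"source": "company_faq", "content": "Vacation requests need 2 weeks notice.", "score": 0.71},
    {"source": "employee_handbook", "content": "New employees: 10 days annually.", "score": 0.88},
]
print(build_context(sample_chunks))
```

Labelling every block with its source is what lets the model cite where an answer came from.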

In [None]:
class RAGSystem:
    """
    Main RAG system that orchestrates document processing, embedding generation,
    vector storage, and query processing for business QA
    """
    
    def __init__(self, openai_api_key: str, vector_store: VectorStore, document_processor: DocumentProcessor):
        """
        Initialize the RAG system
        
        Args:
            openai_api_key: OpenAI API key
            vector_store: Vector store instance
            document_processor: Document processor instance
        """
        self.openai_client = OpenAI(api_key=openai_api_key)
        self.vector_store = vector_store
        self.document_processor = document_processor
        self.metrics = {
            'total_documents': 0,
            'total_chunks': 0,
            'queries_processed': 0
        }
        
        # Initialize vector store
        self.vector_store.initialize_index()
        
    def generate_embeddings(self, texts: List[str]) -> List[List[float]]:
        """
        Generate embeddings for a list of texts using OpenAI's embedding model
        
        Args:
            texts: List of text strings to embed
            
        Returns:
            List of embedding vectors
        """
        try:
            # Use OpenAI's text-embedding-ada-002 model
            response = self.openai_client.embeddings.create(
                model="text-embedding-ada-002",
                input=texts
            )
            
            embeddings = [embedding.embedding for embedding in response.data]
            logger.info(f"Generated {len(embeddings)} embeddings")
            return embeddings
            
        except Exception as e:
            logger.error(f"Error generating embeddings: {e}")
            raise
    
    def add_document(self, content: str, source: str) -> None:
        """
        Add a document to the knowledge base
        
        Args:
            content: Document content
            source: Document source/filename
        """
        try:
            # Process document into chunks
            chunks = self.document_processor.process_document(content, source)
            
            if not chunks:
                logger.warning(f"No chunks generated for document: {source}")
                return
            
            # Generate embeddings for chunks
            chunk_texts = [chunk['content'] for chunk in chunks]
            embeddings = self.generate_embeddings(chunk_texts)
            
            # Store in vector database
            vectors = []
            for i, (chunk, embedding) in enumerate(zip(chunks, embeddings)):
                vector_id = f"{source}_{i}"
                vectors.append({
                    'id': vector_id,
                    'values': embedding,
                    'metadata': {
                        'content': chunk['content'],
                        'source': source,
                        'chunk_index': i,
                        'start_char': chunk['start_char'],
                        'end_char': chunk['end_char']
                    }
                })
            
            self.vector_store.upsert_vectors(vectors)
            
            # Update metrics
            self.metrics['total_documents'] += 1
            self.metrics['total_chunks'] += len(chunks)
            
            logger.info(f"Successfully added document: {source} with {len(chunks)} chunks")
            
        except Exception as e:
            logger.error(f"Error adding document {source}: {e}")
            raise
    
    def retrieve_relevant_chunks(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        """
        Retrieve relevant document chunks for a query
        
        Args:
            query: User query
            top_k: Number of top results to return
            
        Returns:
            List of relevant chunks with metadata
        """
        try:
            # Generate embedding for query
            query_embedding = self.generate_embeddings([query])[0]
            
            # Search in vector store
            results = self.vector_store.query_vectors(
                query_embedding,
                top_k=top_k,
                include_metadata=True
            )
            
            # Format results
            relevant_chunks = []
            for match in results['matches']:
                relevant_chunks.append({
                    'content': match['metadata']['content'],
                    'source': match['metadata']['source'],
                    'score': match['score'],
                    'metadata': match['metadata']
                })
            
            logger.info(f"Retrieved {len(relevant_chunks)} relevant chunks for query")
            return relevant_chunks
            
        except Exception as e:
            logger.error(f"Error retrieving relevant chunks: {e}")
            raise
    
    def generate_answer(self, query: str, context_chunks: List[Dict[str, Any]], temperature: float = 0.1) -> str:
        """
        Generate answer using retrieved context and OpenAI
        
        Args:
            query: User query
            context_chunks: Retrieved relevant chunks
            temperature: Response temperature
            
        Returns:
            Generated answer
        """
        try:
            # Prepare context from retrieved chunks
            context = "\n\n".join([
                f"Source: {chunk['source']}\nContent: {chunk['content']}"
                for chunk in context_chunks
            ])
            
            # Create prompt for answer generation
            system_prompt = f"""You are a helpful business assistant that answers questions based on the provided business knowledge base. 
            
            Instructions:
            1. Use only the information provided in the context to answer questions
            2. If the answer cannot be found in the context, clearly state that you don't have enough information
            3. Be concise but comprehensive in your responses
            4. Reference specific sources when possible
            5. Focus on business-relevant information and practical advice
            
            Context:
            {context}
            
            Question: {query}
            
            Please provide a helpful and accurate answer based on the context above."""
            
            # Generate the answer with the gpt-4o chat model
            response = self.openai_client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {
                        "role": "system",
                        "content": system_prompt
                    }
                ],
                temperature=temperature,
                max_tokens=1000
            )
            
            answer = response.choices[0].message.content
            logger.info("Generated answer using OpenAI")
            return answer
            
        except Exception as e:
            logger.error(f"Error generating answer: {e}")
            raise
    
    def query(self, query: str, top_k: int = 5, temperature: float = 0.1) -> Dict[str, Any]:
        """
        Main query method that retrieves relevant information and generates an answer
        
        Args:
            query: User query
            top_k: Number of top results to retrieve
            temperature: Response temperature
            
        Returns:
            Dictionary containing answer and sources
        """
        try:
            # Retrieve relevant chunks
            relevant_chunks = self.retrieve_relevant_chunks(query, top_k)
            
            if not relevant_chunks:
                return {
                    'answer': "I don't have enough information in the knowledge base to answer your question.",
                    'sources': []
                }
            
            # Generate answer
            answer = self.generate_answer(query, relevant_chunks, temperature)
            
            # Update metrics
            self.metrics['queries_processed'] += 1
            
            return {
                'answer': answer,
                'sources': relevant_chunks
            }
            
        except Exception as e:
            logger.error(f"Error processing query: {e}")
            return {
                'answer': f"Sorry, I encountered an error while processing your question: {str(e)}",
                'sources': []
            }
    
    def get_metrics(self) -> Dict[str, Any]:
        """
        Get system performance metrics
        
        Returns:
            Dictionary of metrics
        """
        return self.metrics.copy()

## Sample Business Documents

Let's create some sample business documents to demonstrate the RAG system.

In [None]:
# Sample business documents
business_policy = """
ACME Corporation Business Policy Manual

1. CODE OF CONDUCT

1.1 Professional Behavior
All employees must maintain professional conduct at all times. This includes:
- Treating colleagues, customers, and partners with respect and dignity
- Maintaining confidentiality of sensitive company information
- Avoiding conflicts of interest
- Reporting any unethical behavior to management

1.2 Dress Code
Business casual attire is required for all office employees. Remote employees should dress professionally for video calls and client meetings.

1.3 Communication Standards
- All business communications must be professional and courteous
- Use company email for business purposes only
- Social media posts should not reference company matters without approval

2. HUMAN RESOURCES POLICIES

2.1 Equal Employment Opportunity
ACME Corporation is an equal opportunity employer. We do not discriminate based on race, color, religion, gender, national origin, age, disability, or sexual orientation.

2.2 Harassment Prevention
We maintain a zero-tolerance policy for harassment of any kind. All employees have the right to work in an environment free from harassment, intimidation, and offensive behavior.

2.3 Performance Reviews
Annual performance reviews are conducted for all employees. Reviews assess job performance, goal achievement, and professional development needs.

3. WORKPLACE SAFETY

3.1 General Safety
- Report all accidents and injuries immediately
- Follow all safety protocols and procedures
- Use appropriate personal protective equipment when required
- Maintain clean and organized workspaces

3.2 Emergency Procedures
In case of emergency:
- Evacuate the building using designated exit routes
- Report to the designated assembly area
- Do not use elevators during emergencies
- Follow instructions from emergency personnel

4. INFORMATION TECHNOLOGY

4.1 Computer and Network Usage
- Use company computers and networks for business purposes only
- Do not install unauthorized software
- Report security incidents immediately
- Follow password security best practices

4.2 Data Protection
- Protect confidential and proprietary information
- Use secure methods for data transmission
- Back up important data regularly
- Comply with data privacy regulations

5. FINANCIAL POLICIES

5.1 Expense Reports
All business expenses must be documented and submitted within 30 days. Receipts are required for all expenses over $25.

5.2 Procurement
All purchases over $1,000 require management approval. Use approved vendors when possible.

5.3 Travel Policy
Business travel must be pre-approved by direct supervisor. Use cost-effective transportation and accommodation options.

6. DISCIPLINARY PROCEDURES

6.1 Progressive Discipline
ACME Corporation follows a progressive discipline policy:
1. Verbal warning
2. Written warning
3. Suspension
4. Termination

6.2 Termination
Employment may be terminated for cause, including but not limited to:
- Violation of company policies
- Poor performance
- Misconduct
- Attendance issues

This policy manual is effective as of January 1, 2024, and supersedes all previous versions.
"""

employee_handbook = """
ACME Corporation Employee Handbook

Welcome to ACME Corporation! This handbook provides essential information about our company policies, procedures, and benefits.

COMPANY OVERVIEW

Mission Statement
To provide innovative solutions that drive business success while maintaining the highest standards of integrity and customer service.

Core Values
- Innovation: We constantly seek new and better ways to serve our customers
- Integrity: We conduct business ethically and transparently
- Excellence: We strive for the highest quality in everything we do
- Teamwork: We work together to achieve common goals

EMPLOYMENT BASICS

Work Hours
- Standard business hours: 9:00 AM to 5:00 PM, Monday through Friday
- Lunch break: 1 hour between 12:00 PM and 2:00 PM
- Flexible work arrangements available with supervisor approval

Attendance Policy
- Regular attendance is essential for business operations
- Notify supervisor as soon as possible if unable to work
- Excessive absenteeism may result in disciplinary action

Time Off Policies

Vacation Time
- New employees: 10 days annually
- 2-5 years of service: 15 days annually
- 5+ years of service: 20 days annually
- Vacation requests must be submitted at least 2 weeks in advance

Sick Leave
- All employees receive 5 sick days annually
- Sick leave does not roll over to the next year
- Medical documentation may be required for extended absences

Personal Days
- 3 personal days per year for personal matters
- Cannot be used in conjunction with vacation time
- Must be approved by supervisor

Holidays
ACME Corporation observes the following holidays:
- New Year's Day
- Memorial Day
- Independence Day
- Labor Day
- Thanksgiving Day
- Christmas Day

BENEFITS

Health Insurance
- Company pays 80% of premium for employee coverage
- Family coverage available with employee contribution
- Enrollment period: First 30 days of employment

Retirement Plan
- 401(k) plan with company matching up to 4% of salary
- Immediate vesting for employee contributions
- Company match vests after 3 years of service

Life Insurance
- Company-provided life insurance equal to 2x annual salary
- Additional voluntary life insurance available for purchase

Disability Insurance
- Short-term disability: 60% of salary for up to 26 weeks
- Long-term disability: 60% of salary after 26 weeks

PROFESSIONAL DEVELOPMENT

Training Opportunities
- Annual training budget of $2,000 per employee
- Professional conference attendance encouraged
- Internal training programs available

Tuition Reimbursement
- Up to $5,000 annually for job-related education
- Minimum grade of "B" required for reimbursement
- Pre-approval required from HR and supervisor

Career Development
- Annual career development discussions
- Mentorship program available
- Internal job posting priority for existing employees

WORKPLACE POLICIES

Remote Work Policy
- Remote work available for eligible positions
- Requires supervisor approval and signed agreement
- Home office equipment provided by company

Technology Use
- Company equipment for business use only
- No personal software installation without approval
- Regular security updates required

Communication Guidelines
- Professional email etiquette required
- Confidential information must be protected
- Social media policy applies to all platforms

COMPENSATION

Payroll
- Bi-weekly pay schedule
- Direct deposit required
- Pay stubs available through employee portal

Performance Reviews
- Annual performance evaluations
- Merit increases based on performance and budget
- Promotion opportunities posted internally first

Overtime Policy
- Non-exempt employees eligible for overtime pay
- Overtime must be pre-approved by supervisor
- Time and a half for hours worked over 40 per week

EMPLOYEE RESOURCES

Human Resources
- HR office hours: 8:00 AM to 4:30 PM
- Employee assistance program available
- Confidential reporting hotline: 1-800-ETHICS

Employee Recognition
- Monthly employee spotlight program
- Annual service awards
- Peer recognition system

Wellness Programs
- On-site fitness facility
- Wellness seminars and health screenings
- Mental health resources available

CONTACT INFORMATION

Human Resources: hr@acmecorp.com | Extension 1001
IT Support: support@acmecorp.com | Extension 2000
Facilities: facilities@acmecorp.com | Extension 3000
Employee Hotline: 1-800-ETHICS

This handbook is effective as of January 1, 2024. Policies may be updated as needed with proper notice to employees.
"""

company_faq = """
ACME Corporation Frequently Asked Questions

GENERAL COMPANY INFORMATION

Q: When was ACME Corporation founded?
A: ACME Corporation was founded in 1985 and has been serving customers for over 35 years.

Q: What services does ACME Corporation provide?
A: We provide comprehensive business solutions including consulting, software development, project management, and technical support services.

Q: How many employees does ACME Corporation have?
A: We currently have approximately 500 employees across our three office locations.

Q: What are ACME Corporation's office locations?
A: Our offices are located in New York City (headquarters), Chicago, and San Francisco.

EMPLOYMENT QUESTIONS

Q: How do I apply for a job at ACME Corporation?
A: Job applications can be submitted through our careers page on the company website or by emailing your resume to careers@acmecorp.com.

Q: What is the hiring process like?
A: Our hiring process typically includes: initial application review, phone screening, in-person or video interview, skills assessment (if applicable), and reference checks.

Q: Does ACME Corporation offer internships?
A: Yes, we offer paid internships during summer and winter breaks. Applications are accepted from students in relevant degree programs.

Q: What is the company culture like?
A: ACME Corporation promotes a collaborative, innovative work environment that values work-life balance and professional growth.

BENEFITS AND COMPENSATION

Q: What benefits does ACME Corporation offer?
A: We offer comprehensive benefits including health insurance, dental and vision coverage, 401(k) with company match, paid time off, life insurance, and professional development opportunities.

Q: How often are performance reviews conducted?
A: Performance reviews are conducted annually, with informal check-ins quarterly.

Q: Are there opportunities for advancement?
A: Yes, we prioritize internal promotions and provide career development resources to help employees advance their careers.

Q: Does the company offer flexible work arrangements?
A: Yes, we offer flexible work schedules and remote work options for eligible positions, subject to supervisor approval.

WORKPLACE POLICIES

Q: What is the dress code policy?
A: We maintain a business casual dress code for office employees. Professional attire is required for client meetings and presentations.

Q: Are personal devices allowed to be used for work?
A: Personal devices may be used for work with IT approval and proper security measures installed.

Q: What is the policy on overtime work?
A: Non-exempt employees are eligible for overtime pay at time and a half for hours worked over 40 per week. All overtime must be pre-approved by supervisors.

Q: How are vacation requests handled?
A: Vacation requests should be submitted at least 2 weeks in advance through the employee portal or to your supervisor.

TRAINING AND DEVELOPMENT

Q: What training opportunities are available?
A: We offer various training programs including technical skills development, leadership training, and professional certification support.

Q: Does the company pay for professional development?
A: Yes, each employee has an annual professional development budget of $2,000 for training, conferences, and certifications.

Q: Is tuition reimbursement available?
A: Yes, we offer up to $5,000 annually in tuition reimbursement for job-related education with pre-approval.

TECHNOLOGY AND EQUIPMENT

Q: What equipment is provided to employees?
A: All employees receive necessary equipment including laptop, monitor, keyboard, mouse, and software licenses required for their role.

Q: How do I request IT support?
A: IT support can be reached at support@acmecorp.com or extension 2000. For urgent issues, use the emergency IT hotline.

Q: Are there restrictions on software installation?
A: Yes, all software installations must be approved by IT to ensure security and compliance standards.

HEALTH AND SAFETY

Q: What safety protocols are in place?
A: We maintain comprehensive safety protocols including emergency evacuation procedures, first aid stations, and safety training programs.

Q: Are there wellness programs available?
A: Yes, we offer on-site fitness facilities, wellness seminars, health screenings, and mental health resources.

Q: How do I report a workplace injury?
A: All workplace injuries must be reported immediately to your supervisor and HR. Incident reports must be filed within 24 hours.

COMMUNICATION AND FEEDBACK

Q: How can I provide feedback about company policies?
A: Feedback can be provided through your supervisor, HR, or anonymously through our employee suggestion system.

Q: Is there an open door policy?
A: Yes, we maintain an open door policy where employees can discuss concerns with management at any level.

Q: How does the company communicate important updates?
A: Company updates are communicated through email, the employee portal, team meetings, and quarterly all-hands meetings.

FACILITIES AND SERVICES

Q: What facilities are available at the office?
A: Our offices include conference rooms, break rooms, kitchen facilities, fitness center, and parking (where available).

Q: Are there food services available?
A: We provide complimentary coffee and snacks. Catered lunches are provided during company meetings and events.

Q: How do I reserve conference rooms?
A: Conference rooms can be reserved through the online booking system or by contacting facilities at extension 3000.

CUSTOMER SERVICE

Q: How can customers contact our support team?
A: Customers can reach our support team through our website contact form, email at support@acmecorp.com, or by calling our main number.

Q: What are our customer service hours?
A: Customer service is available Monday through Friday, 8:00 AM to 6:00 PM EST.

Q: How do we handle customer complaints?
A: All customer complaints are taken seriously and are escalated to the appropriate management level for resolution.

For additional questions not covered in this FAQ, please contact Human Resources at hr@acmecorp.com or extension 1001.

Last updated: January 1, 2024
"""

## Demo Implementation

Let's initialize the RAG system and demonstrate its capabilities.

In [None]:
# Initialize the RAG system components
print("Initializing RAG system...")

# Build the vector store and document processor first; RAGSystem takes both as arguments
vector_store = VectorStore(PINECONE_API_KEY)
document_processor = DocumentProcessor()
rag_system = RAGSystem(OPENAI_API_KEY, vector_store, document_processor)

print("RAG system initialized successfully!")

In [None]:
# Add sample documents to the knowledge base
print("Adding sample documents to knowledge base...")

# Add business documents
rag_system.add_document(business_policy, "business_policy.txt")
rag_system.add_document(employee_handbook, "employee_handbook.txt")
rag_system.add_document(company_faq, "company_faq.txt")

print("Sample documents added successfully!")

# Display system metrics
metrics = rag_system.get_metrics()
print("\nSystem Metrics:")
print(f"Total Documents: {metrics['total_documents']}")
print(f"Total Chunks: {metrics['total_chunks']}")
print(f"Queries Processed: {metrics['queries_processed']}")
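
Re-running the cell above calls `add_document` again for the same three texts. Whether that produces duplicate chunks depends on how chunk IDs are generated, which this section does not show. As a minimal sketch, assuming only the `add_document(content, name)` call shape used above, a content hash can guard against re-adding identical text. The helper name `add_document_once` and the `seen` set are hypothetical and not part of the notebook's classes.

```python
import hashlib
from typing import Callable, List, Set


def add_document_once(
    add_document: Callable[[str, str], object],
    seen_hashes: Set[str],
    content: str,
    source_name: str,
) -> bool:
    """Call add_document only if this exact content has not been added before.

    Returns True if the document was added, False if it was skipped.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return False
    add_document(content, source_name)
    seen_hashes.add(digest)
    return True


# Stand-in for rag_system.add_document so the sketch runs on its own.
added: List[str] = []


def fake_add_document(content: str, source_name: str) -> None:
    added.append(source_name)


seen: Set[str] = set()
first = add_document_once(fake_add_document, seen, "Policy text", "business_policy.txt")
second = add_document_once(fake_add_document, seen, "Policy text", "business_policy.txt")
print(first, second, added)
```

In the notebook, `rag_system.add_document` would be passed in place of `fake_add_document`. The `seen` set lives only in kernel memory, so it does not protect against duplicates after a kernel restart.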

## Demo Queries

Let's test the RAG system with various business-related queries.

In [None]:
# Test queries
test_queries = [
    "What is the company's vacation policy?",
    "What are the dress code requirements?",
    "How do I report a workplace injury?",
    "What benefits does the company offer?",
    "What is the process for expense reports?",
    "How can I apply for a job at ACME Corporation?"
]

print("Testing RAG system with sample queries...\n")

for i, query in enumerate(test_queries, 1):
    print(f"=== Query {i}: {query} ===")
    
    # Process query
    result = rag_system.query(query, top_k=3)
    
    # Display answer
    print(f"Answer: {result['answer']}")
    
    # Display sources
    print("\nSources:")
    for j, source in enumerate(result['sources'], 1):
        print(f"  {j}. {source['source']} (Score: {source['score']:.3f})")
        print(f"     Content: {source['content'][:100]}...")
    
    print("\n" + "="*80 + "\n")
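
The loop above prints answers and sources for manual inspection. One way to turn that into a repeatable check is to label each query with the file expected to answer it and count how often that file appears among the returned sources. The sketch below assumes only the result shape used in the loop (`result['sources']` as a list of dicts with a `'source'` key). The function name `source_hit_rate`, the stub `fake_query`, and the expected-source labels are all hypothetical and chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple


def source_hit_rate(
    query_fn: Callable[[str], Dict[str, Any]],
    labelled_queries: List[Tuple[str, str]],
) -> float:
    """Fraction of queries whose expected source appears among the returned sources."""
    if not labelled_queries:
        return 0.0
    hits = 0
    for question, expected_source in labelled_queries:
        result = query_fn(question)
        returned = {s["source"] for s in result.get("sources", [])}
        if expected_source in returned:
            hits += 1
    return hits / len(labelled_queries)


# Stub standing in for `lambda q: rag_system.query(q, top_k=3)`.
def fake_query(question: str) -> Dict[str, Any]:
    if "dress code" in question:
        return {
            "answer": "Business casual.",
            "sources": [{"source": "company_faq.txt", "score": 0.9, "content": "..."}],
        }
    return {"answer": "Not found.", "sources": []}


# Hypothetical labels: which file each question is expected to be answered from.
labelled = [
    ("What are the dress code requirements?", "company_faq.txt"),
    ("What is the process for expense reports?", "business_policy.txt"),
]
rate = source_hit_rate(fake_query, labelled)
print(f"Source hit rate: {rate:.2f}")
```

This measures retrieval only. It says nothing about whether the generated answer is correct, which still needs manual or model-assisted review.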

In [None]:
# Interactive query interface
def interactive_query():
    """
    Interactive query interface for testing the RAG system
    """
    print("Interactive RAG Query Interface")
    print("Type 'quit' to exit\n")
    
    while True:
        query = input("Enter your question about ACME Corporation: ")
        
        if query.strip().lower() == 'quit':
            print("Goodbye!")
            break
        
        if not query.strip():
            print("Please enter a valid question.\n")
            continue
        
        try:
            # Process query
            result = rag_system.query(query, top_k=3)
            
            # Display results
            print(f"\nAnswer: {result['answer']}")
            
            if result['sources']:
                print("\nSources:")
                for i, source in enumerate(result['sources'], 1):
                    print(f"  {i}. {source['source']} (Relevance: {source['score']:.3f})")
            
            print("\n" + "-"*60 + "\n")
            
        except Exception as e:
            print(f"Error processing query: {e}\n")

# Run interactive demo
interactive_query()
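
The batch demo and the interactive loop format results in nearly the same way. A small helper could hold that formatting in one place and make it testable without calling any API. This is a sketch under the assumption that results keep the shape used above (`'answer'` plus a list of `'sources'` with `'source'`, `'score'`, and `'content'` keys). The name `format_result` and the example data are hypothetical.

```python
from typing import Any, Dict


def format_result(result: Dict[str, Any], preview_chars: int = 100) -> str:
    """Render an answer and its sources as plain text."""
    lines = [f"Answer: {result['answer']}"]
    sources = result.get("sources") or []
    if sources:
        lines.append("")
        lines.append("Sources:")
        for i, src in enumerate(sources, 1):
            lines.append(f"  {i}. {src['source']} (Score: {src['score']:.3f})")
            content = src.get("content", "")
            if content:
                # Only add an ellipsis when the preview was actually truncated.
                suffix = "..." if len(content) > preview_chars else ""
                lines.append(f"     Content: {content[:preview_chars]}{suffix}")
    return "\n".join(lines)


# Example result, invented for illustration.
example = {
    "answer": "Business casual for office employees.",
    "sources": [
        {
            "source": "company_faq.txt",
            "score": 0.9123,
            "content": "We maintain a business casual dress code.",
        }
    ],
}
text = format_result(example)
print(text)
```

Unlike the slicing in the demo loop, this version appends `...` only when the content is longer than the preview length.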

## Performance Analysis

Let's analyze the performance of our RAG system.

In [None]:
# Performance analysis
import time

def analyze_performance():
    """
    Analyze RAG system performance with various queries
    """
    performance_queries = [
        "What is the company mission?",
        "How do I request time off?",
        "What are the IT policies?",
        "Tell me about employee benefits",
        "What is the disciplinary procedure?"
    ]
    
    print("Performance Analysis\n")
    print(f"{'Query':<40} {'Time (s)':<10} {'Sources':<10} {'Answer Length':<15}")
    print("-" * 75)
    
    total_time = 0
    total_queries = len(performance_queries)
    
    for query in performance_queries:
        start_time = time.perf_counter()  # monotonic clock, suited to measuring intervals
        
        result = rag_system.query(query, top_k=3)
        
        end_time = time.perf_counter()
        query_time = end_time - start_time
        total_time += query_time
        
        # Truncate query for display
        display_query = query if len(query) <= 37 else query[:37] + "..."
        
        print(f"{display_query:<40} {query_time:<10.2f} {len(result['sources']):<10} {len(result['answer']):<15}")
    
    print("-" * 75)
    print(f"Average query time: {total_time/total_queries:.2f} seconds")
    print(f"Total queries processed: {rag_system.get_metrics()['queries_processed']}")
    
    return total_time / total_queries

# Run performance analysis
avg_time = analyze_performance()
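
A single run per query gives a noisy average, since each call includes network round trips. A sketch of a slightly more robust measurement is shown below: it repeats each query and reports the median and maximum alongside the mean. The function name `measure_latency` and the sleeping stub are hypothetical. In the notebook, the callable would be something like `lambda q: rag_system.query(q, top_k=3)`.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(
    query_fn: Callable[[str], object],
    queries: List[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Time query_fn on each query `repeats` times and summarise the samples."""
    if not queries or repeats < 1:
        raise ValueError("need at least one query and one repeat")
    samples: List[float] = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            query_fn(query)
            samples.append(time.perf_counter() - start)
    return {
        "count": float(len(samples)),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Stub that sleeps briefly instead of calling the real pipeline.
stats = measure_latency(lambda q: time.sleep(0.01), ["a", "b"], repeats=2)
print({name: round(value, 3) for name, value in stats.items()})
```

Repeating queries against the real system increases API usage and cost accordingly, so `repeats` is best kept small.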

## Conclusion

This notebook demonstrates a complete RAG implementation for business QA systems. The system includes:

### Key Features:
1. **Document Processing**: Intelligent text cleaning and chunking
2. **Vector Storage**: Efficient semantic search using Pinecone
3. **Embedding Generation**: OpenAI's text-embedding-ada-002 model
4. **Answer Generation**: GPT-4o for contextual responses
5. **Source Attribution**: Transparent sourcing of information

### System Performance:
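- **Average Query Time**: Reported by `analyze_performance()` above as wall-clock seconds per `rag_system.query` call
- **Scalability**: Indexes multiple documents (three in this demo); concurrent queries are not exercised in the demo above
- **Accuracy**: Answers are generated from retrieved context and returned with their sources and relevance scores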

### Business Applications:
- Employee onboarding and training
- HR policy inquiries
- Compliance and procedure questions
- Customer service automation
- Knowledge management

### Next Steps:
1. Deploy as a web application using Streamlit
2. Add support for multiple file formats (PDF, DOCX)
3. Implement user authentication and access control
4. Add analytics and usage tracking
5. Optimize for production deployment

This RAG system provides a solid foundation for building enterprise-grade business QA applications.