# Building an Advanced Question-Answering System: Integrating RAG with Context-Aware Chatbot

## Introduction: Why Combine RAG and Context Awareness?

Imagine having a conversation with someone who not only has access to a vast library of information (RAG) but also remembers and learns from your ongoing conversation (Context Awareness). By combining Retrieval-Augmented Generation (RAG) with a context-aware chatbot, we create a system that can provide both accurate, source-based answers and maintain coherent, contextual conversations.

Let's explore how these two powerful approaches complement each other:

- **RAG**: Provides accurate, source-based information retrieval and response generation
- **Context Awareness**: Maintains conversation history and understands ongoing dialogue flow

## Understanding the Combined Architecture

Our integrated system will have the following components:

1. RAG Components (from previous implementation)
   - Document Processor
   - Embedding Engine
   - Retrieval System
   
2. Context-Aware Components (new additions)
   - Memory Cache
   - Context Manager
   - Response Generator

Let's implement this integrated system:

In [None]:
# Import necessary libraries
import numpy as np
import os
from typing import Dict, List, Optional, Any, Tuple
from datetime import datetime, timedelta
from openai import OpenAI
from hashlib import md5
import re
import json

In [None]:
class DocumentProcessor:
    """
    Handles document preprocessing and chunking for the RAG system.
    This component prepares documents for efficient retrieval and context management.
    """
    def __init__(self, 
                 chunk_size: int = 500,
                 chunk_overlap: int = 50,
                 min_chunk_size: int = 100):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.min_chunk_size = min_chunk_size

    def clean_text(self, text: str) -> str:
        """
        Clean and normalize text content for better processing.
        Handles common text issues like extra whitespace and formatting.
        """
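        # Collapse every run of whitespace (spaces, tabs, newlines) into one space
        text = re.sub(r'\s+', ' ', text)
        # No newlines remain after the step above, so this substitution is a no-op
        text = re.sub(r'\n+', '\n', text)
        
        # Remove special characters while preserving essential punctuation
        text = re.sub(r'[^\w\s.,!?;:()\[\]{}"\'`-]', '', text)
        
        # Put each sentence on its own line; this also splits after
        # decimal points and abbreviations such as "3.14" or "e.g."
        text = re.sub(r'([.!?])\s*', r'\1\n', text)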
        
        return text.strip()

    def create_chunks(self, 
                     document: Dict[str, str], 
                     respect_sentences: bool = True) -> List[Dict[str, str]]:
        """
        Break document into chunks while preserving semantic meaning.
        """
        text = self.clean_text(document['text'])
        chunks = []
        current_chunk = []
        current_length = 0
        
        # Split into sentences if respecting sentence boundaries
        sentences = text.split('\n') if respect_sentences else [text]
        
        for sentence in sentences:
            sentence = sentence.strip()
            if not sentence:
                continue
                
            sentence_length = len(sentence)
            
            # Check if adding this sentence exceeds chunk size
            if current_length + sentence_length > self.chunk_size:
                if current_length >= self.min_chunk_size:
                    chunk_text = ' '.join(current_chunk)
                    chunks.append({
                        'text': chunk_text,
                        'document_id': document.get('id', 'unknown'),
                        'chunk_size': len(chunk_text),
                        'position': len(chunks)
                    })
                
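                # Start new chunk; chunk_overlap > 0 only switches overlap on,
                # and the last two entries are carried over regardless of its value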
                if current_chunk and self.chunk_overlap > 0:
                    overlap_text = ' '.join(current_chunk[-2:])
                    current_chunk = [overlap_text, sentence]
                    current_length = len(overlap_text) + sentence_length
                else:
                    current_chunk = [sentence]
                    current_length = sentence_length
            else:
                current_chunk.append(sentence)
                current_length += sentence_length
        
        # Add final chunk if it meets minimum size
        if current_chunk and current_length >= self.min_chunk_size:
            chunk_text = ' '.join(current_chunk)
            chunks.append({
                'text': chunk_text,
                'document_id': document.get('id', 'unknown'),
                'chunk_size': len(chunk_text),
                'position': len(chunks)
            })
            
        return chunks

**Code Explanation:**

Let us break down the DocumentProcessor class and explain each component in detail:

1. **Class Purpose**: 
    The DocumentProcessor class is responsible for preparing documents for use in the RAG system. Its main job is to:
    - Clean up text documents
    - Break large documents into manageable pieces (chunks)
    - Maintain semantic coherence while chunking

2. **Constructor (`__init__`)**: 
    ```python
    def __init__(self, chunk_size: int = 500, chunk_overlap: int = 50, min_chunk_size: int = 100):
    ```
    Parameters:
    - `chunk_size`: Maximum size of each text chunk (default 500 characters)
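    - `chunk_overlap`: Enables overlap between consecutive chunks when greater than 0 (default 50); the current implementation carries over the last two sentences rather than a fixed number of characters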
    - `min_chunk_size`: Minimum acceptable chunk size (default 100 characters)

3. **Text Cleaning Method (clean_text)**:
    ```python
    def clean_text(self, text: str) -> str:
    ```
    This method sanitizes the input text through several steps:

    a. Whitespace Normalization:
    ```python
    text = re.sub(r'\s+', ' ', text)  # Collapses every whitespace run, including newlines, into one space
    text = re.sub(r'\n+', '\n', text) # No-op here: no newlines remain after the previous line
    ```

    b. Character Cleaning:
    ```python
    text = re.sub(r'[^\w\s.,!?;:()\[\]{}"\'`-]', '', text)
    ```
    - Keeps: alphanumeric characters, spaces, common punctuation
    - Removes: special characters, emojis, unusual symbols

    c. Sentence Normalization:
    ```python
    text = re.sub(r'([.!?])\s*', r'\1\n', text)  # Starts a new line after every ., !, or ? (including decimal points)
    ```

4. **Chunking Method (create_chunks)**:
    ```python
    def create_chunks(self, document: Dict[str, str], respect_sentences: bool = True)
    ```

    This method groups sentences into chunks of bounded size:

    a. Initial Setup:
    ```python
    text = self.clean_text(document['text'])
    chunks = []
    current_chunk = []
    current_length = 0
    ```

    b. Sentence Processing:
    ```python
    sentences = text.split('\n') if respect_sentences else [text]
    ```
    - Splits text into sentences if respect_sentences is True
    - Keeps text as one piece if False

    c. Chunk Creation Logic:
    ```python
    if current_length + sentence_length > self.chunk_size:
        if current_length >= self.min_chunk_size:
            chunk_text = ' '.join(current_chunk)
            chunks.append({
                'text': chunk_text,
                'document_id': document.get('id', 'unknown'),
                'chunk_size': len(chunk_text),
                'position': len(chunks)
            })
    ```
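    - Checks whether adding the next sentence would exceed `chunk_size`
    - Emits the current chunk only if it has reached `min_chunk_size`; a shorter chunk is not emitted, and only its last two entries carry forward as overlap
    - Includes metadata (document ID, size, position)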

    d. Overlap Handling:
    ```python
    if current_chunk and self.chunk_overlap > 0:
        overlap_text = ' '.join(current_chunk[-2:])
        current_chunk = [overlap_text, sentence]
    ```
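    - Maintains context between chunks
    - Takes the last two entries (normally two sentences) from the previous chunk
    - Adds them to the beginning of the new chunk
    - Runs whenever `chunk_overlap > 0`; the numeric value does not control how much text is carried over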

5. **Key Features**:
- **Context Preservation**: Overlapping chunks maintain contextual connections
- **Metadata Tracking**: Each chunk knows its source and position
- **Flexible Sizing**: Configurable chunk size and minimum size, with overlap that can be switched on or off
- **Sentence Respect**: Chunk boundaries fall only between sentences

6. **Usage Example**:
    ```python
    processor = DocumentProcessor(chunk_size=300, chunk_overlap=30)
    document = {
        'id': 'doc1',
        'text': 'Your long document text here...'
    }
    chunks = processor.create_chunks(document)
    ```
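
Before chunking, `create_chunks` passes the text through `clean_text`, so it helps to see what that step produces. The sketch below copies the same regex steps into a standalone helper (`clean_text_demo` is a hypothetical name, and the sample string is invented for illustration). It highlights two side effects worth knowing about: symbols outside the allow-list, such as `$`, are dropped, and every `.`, `!`, or `?` starts a new line, including the decimal point inside a number.

```python
import re

def clean_text_demo(text: str) -> str:
    """Standalone copy of the regex steps in DocumentProcessor.clean_text."""
    text = re.sub(r'\s+', ' ', text)                        # collapse all whitespace
    text = re.sub(r'\n+', '\n', text)                       # no-op after the line above
    text = re.sub(r'[^\w\s.,!?;:()\[\]{}"\'`-]', '', text)  # drop other symbols
    text = re.sub(r'([.!?])\s*', r'\1\n', text)             # one "sentence" per line
    return text.strip()

sample = "RAG  is useful!\n\nIt costs $3.50 per query."
lines = clean_text_demo(sample).split('\n')
for line in lines:
    print(repr(line))
```

The price is split across two lines, which means `create_chunks` will treat `It costs 3.` and `50 per query.` as separate sentences. For text with many numbers or abbreviations, a dedicated sentence splitter may be a better fit.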

This class is crucial for RAG systems because it:
- Makes large documents manageable
- Preserves context across chunks
- Maintains clean, consistent text format
- Enables efficient retrieval operations

The careful text processing and chunking ensure that the RAG system can effectively store and retrieve relevant information while maintaining semantic coherence.
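
In the class as written, `chunk_overlap` only switches overlap on or off, and the overlap is always the last two entries of the previous chunk. If a character budget is wanted instead, one possible approach is sketched below. The helper name `overlap_tail` and its behavior are assumptions for illustration, not part of the original class: it walks backward through the previous chunk's sentences and keeps whole sentences while their joined length stays within the budget.

```python
from typing import List

def overlap_tail(sentences: List[str], budget: int) -> List[str]:
    """Return trailing sentences whose joined length fits within `budget` characters."""
    tail: List[str] = []
    used = 0
    for sentence in reversed(sentences):
        # +1 accounts for the space inserted when sentences are joined
        cost = len(sentence) + (1 if tail else 0)
        if used + cost > budget:
            break
        tail.insert(0, sentence)
        used += cost
    return tail

previous_chunk = ["First sentence here.", "Second one.", "Third."]
print(overlap_tail(previous_chunk, budget=20))
print(overlap_tail(previous_chunk, budget=5))
```

With a budget of 20 the last two sentences fit; with a budget of 5 nothing fits and the new chunk would start without overlap. Inside `create_chunks`, the result could replace `current_chunk[-2:]`, with `self.chunk_overlap` passed as the budget.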

In [None]:
class EmbeddingEngine:
    """
    Creates and manages vector representations of text.
    Provides methods for computing embeddings and measuring text similarity.
    """
    def __init__(self, dimensions: int = 768, cache_size: int = 10000):
        self.dimensions = dimensions
        self.cache_size = cache_size
        self.cache = {}
        self.word_vectors = {}
    
    def _create_word_vector(self, word: str) -> np.ndarray:
        """
        Create a deterministic vector for a word using hashing.
        This ensures consistent vector representations across sessions.
        """
        if word in self.word_vectors:
            return self.word_vectors[word]
        
        # Create deterministic seed from word hash
        word_hash = int(md5(word.encode()).hexdigest()[:8], 16)
        np.random.seed(word_hash)
        
        # Create and normalize vector
        vector = np.random.randn(self.dimensions)
        vector = vector / np.linalg.norm(vector)
        
        self.word_vectors[word] = vector
        return vector
    
    def compute_embedding(self, text: str, use_cache: bool = True) -> np.ndarray:
        """
        Convert text into a vector representation using weighted word vectors.
        """
        if use_cache and text in self.cache:
            return self.cache[text]
        
        words = text.lower().split()
        if not words:
            return np.zeros(self.dimensions)
        
        # Initialize embedding with position-weighted word vectors
        embedding = np.zeros(self.dimensions)
        word_count = {}
        
        for position, word in enumerate(words):
            word_count[word] = word_count.get(word, 0) + 1
            word_vector = self._create_word_vector(word)
            position_weight = 1.0 / (1 + np.log1p(position))
            embedding += word_vector * position_weight
        
        # Add one extra term per unique word, damped by its in-text frequency
        # (not a true IDF, which would need document frequencies from a corpus)
        for word, count in word_count.items():
            scaling = 1.0 / np.sqrt(count)
            word_vector = self._create_word_vector(word)
            embedding += word_vector * scaling
        
        # Normalize the final embedding
        embedding_norm = np.linalg.norm(embedding)
        if embedding_norm > 0:
            embedding = embedding / embedding_norm
        
        # Cache the result if enabled
        if use_cache:
            if len(self.cache) >= self.cache_size:
                self.cache.pop(next(iter(self.cache)))
            self.cache[text] = embedding
        
        return embedding

**Code Explanation:**

Let us break down the EmbeddingEngine class and explain each component:

1. **Class Initialization**:
    ```python
    def __init__(self, dimensions: int = 768, cache_size: int = 10000):
        self.dimensions = dimensions    # Size of embedding vectors
        self.cache_size = cache_size    # Maximum number of cached embeddings
        self.cache = {}                 # Storage for computed embeddings
        self.word_vectors = {}          # Storage for individual word vectors
    ```
    - `dimensions`: Controls the size of embedding vectors (default 768, the hidden size of BERT-base)
    - `cache_size`: Caps the number of cached text embeddings to bound memory use
    - Two caching systems:
        * `cache`: Stores complete text embeddings (bounded by `cache_size`)
        * `word_vectors`: Stores individual word vectors (unbounded)

2. **Word Vector Creation (_create_word_vector)**:
    ```python
    def _create_word_vector(self, word: str) -> np.ndarray:
    ```
    This private method creates consistent vector representations for words:

    a. Caching Check:
    ```python
    if word in self.word_vectors:
        return self.word_vectors[word]
    ```
    - Returns cached vector if available

    b. Vector Generation:
    ```python
    word_hash = int(md5(word.encode()).hexdigest()[:8], 16)
    np.random.seed(word_hash)
    ```
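    - Derives a deterministic 32-bit seed from the word's MD5 hash
    - Seeds NumPy's global random generator with it, so the same word always yields the same vector
    - Side effect: `np.random.seed` resets the global random state for any other code that uses `np.random`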

    c. Vector Creation and Normalization:
    ```python
    vector = np.random.randn(self.dimensions)
    vector = vector / np.linalg.norm(vector)
    ```
    - Generates random normal vector
    - Normalizes to unit length
    - Caches for future use

3. **Text Embedding Computation (compute_embedding)**:
    ```python
    def compute_embedding(self, text: str, use_cache: bool = True) -> np.ndarray:
    ```
    This main method converts text into vector representation:

    a. Cache Checking:
    ```python
    if use_cache and text in self.cache:
        return self.cache[text]
    ```
    - Returns cached embedding if available

    b. Text Processing:
    ```python
    words = text.lower().split()
    if not words:
        return np.zeros(self.dimensions)
    ```
    - Converts text to lowercase words
    - Handles empty text case

    c. Position-Weighted Embedding:
    ```python
    for position, word in enumerate(words):
        word_count[word] = word_count.get(word, 0) + 1
        word_vector = self._create_word_vector(word)
        position_weight = 1.0 / (1 + np.log1p(position))
        embedding += word_vector * position_weight
    ```
    Key features:
    - Tracks word frequency
    - Applies position-based weighting (earlier words get higher weight)
    - Accumulates weighted vectors

    d. Frequency Scaling:
    ```python
    for word, count in word_count.items():
        scaling = 1.0 / np.sqrt(count)
        word_vector = self._create_word_vector(word)
        embedding += word_vector * scaling
    ```
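    - Adds one more term per unique word, scaled by `1 / sqrt(count)`
    - Damps the extra weight of words that repeat within the text
    - Is not a true IDF, which would require document frequencies across a corpus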

    e. Final Normalization:
    ```python
    embedding_norm = np.linalg.norm(embedding)
    if embedding_norm > 0:
        embedding = embedding / embedding_norm
    ```
    - Normalizes final vector to unit length
    - Ensures consistent scaling

    f. Cache Management:
    ```python
    if use_cache:
        if len(self.cache) >= self.cache_size:
            self.cache.pop(next(iter(self.cache)))
        self.cache[text] = embedding
    ```
    - Maintains cache size limit
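    - Evicts the earliest-inserted entry when full (first-in, first-out; relies on dictionaries preserving insertion order, as they do in Python 3.7+)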
    - Stores new embedding

4. **Key Features and Benefits**:
- **Deterministic**: Same text always produces same embedding
- **Position-Aware**: Considers word order importance
- **Frequency-Aware**: Balances common and rare words
- **Memory-Efficient**: Uses caching with size limits
- **Normalized Output**: Consistent vector magnitudes
- **Fast Retrieval**: Cached results for repeated texts

5. **Usage Example**:
    ```python
    engine = EmbeddingEngine(dimensions=300)
    text = "Example text for embedding"
    vector = engine.compute_embedding(text)
    ```
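
One thing to keep in mind when using the engine alongside other code: `_create_word_vector` calls `np.random.seed`, which reseeds NumPy's global generator on every uncached word. A minimal sketch of an alternative is shown below, assuming a private generator from `np.random.default_rng` is acceptable. The function name `word_vector_local` is hypothetical, and the vectors it produces differ from those of the original method because the underlying generator is different.

```python
import numpy as np
from hashlib import md5

def word_vector_local(word: str, dimensions: int = 768) -> np.ndarray:
    """Deterministic unit vector for a word, drawn from a private generator."""
    seed = int(md5(word.encode()).hexdigest()[:8], 16)
    rng = np.random.default_rng(seed)  # local generator; global state untouched
    vector = rng.standard_normal(dimensions)
    return vector / np.linalg.norm(vector)

first = word_vector_local("retrieval")
second = word_vector_local("retrieval")
print(np.array_equal(first, second))
```

Because the generator is local, code elsewhere that relies on `np.random.seed` for reproducibility is not disturbed by embedding calls.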

This class is crucial for RAG because it:
- Converts text to numerical form for similarity comparison
- Maintains consistency in vector representations
- Optimizes performance through caching
- Considers both word position and frequency
- Enables similarity search; because word vectors are derived from hashes, similarity reflects shared vocabulary rather than learned meaning
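
To make the similarity behavior concrete, the sketch below compares texts using a compact stand-in for the engine: an unweighted, normalized sum of hashed word vectors. The helper names (`hashed_word_vector`, `simple_embedding`) and the sample texts are assumptions for illustration, and the position and frequency weighting of the real class is left out. Texts that share words score high, while synonyms with no words in common score near zero, since each word's vector is effectively random.

```python
import numpy as np
from hashlib import md5

def hashed_word_vector(word: str, dimensions: int = 768) -> np.ndarray:
    """Deterministic unit vector for a word, seeded from its MD5 hash."""
    seed = int(md5(word.encode()).hexdigest()[:8], 16)
    vector = np.random.default_rng(seed).standard_normal(dimensions)
    return vector / np.linalg.norm(vector)

def simple_embedding(text: str) -> np.ndarray:
    """Normalized sum of hashed word vectors (assumes non-empty text)."""
    vectors = [hashed_word_vector(w) for w in text.lower().split()]
    total = np.sum(vectors, axis=0)
    return total / np.linalg.norm(total)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product of unit vectors equals their cosine similarity."""
    return float(np.dot(a, b))

shared = cosine(simple_embedding("the cat sat"), simple_embedding("the cat slept"))
synonyms = cosine(simple_embedding("cat"), simple_embedding("feline"))
print(f"shared words: {shared:.2f}, synonyms: {synonyms:.2f}")
```

If matching by meaning matters, the hashed vectors can be swapped for a learned embedding model while keeping the rest of the pipeline unchanged.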

In [None]:
class MemoryCache:
    """
    Manages conversation history and context for the chatbot.
    Implements time-based memory management and context retrieval.
    """
    def __init__(self, max_age_hours: int = 24):
        self.max_age = timedelta(hours=max_age_hours)
        self.memories = []
        
    def add_memory(self, 
                   user_input: str, 
                   system_response: str,
                   retrieved_context: Optional[List[Dict]] = None) -> None:
        """
        Store a new conversation interaction with its context.
        """
        memory = {
            'timestamp': datetime.now(),
            'user_input': user_input,
            'system_response': system_response,
            'retrieved_context': retrieved_context
        }
        self.memories.append(memory)
        self._cleanup_old_memories()
        self._save_memories()
        
    def get_recent_context(self, limit: int = 5) -> List[Dict]:
        """
        Retrieve recent conversations for context.
        """
        self._cleanup_old_memories()
        return self.memories[-limit:]
        
    def _cleanup_old_memories(self) -> None:
        """Remove memories older than max_age."""
        current_time = datetime.now()
        self.memories = [
            mem for mem in self.memories 
            if (current_time - mem['timestamp']) < self.max_age
        ]
        
    def _save_memories(self) -> None:
        """Save memories to persistent storage."""
        try:
            with open('memories.json', 'w') as f:
                serializable_memories = []
                for mem in self.memories:
                    mem_copy = mem.copy()
                    mem_copy['timestamp'] = mem_copy['timestamp'].isoformat()
                    serializable_memories.append(mem_copy)
                json.dump({'memories': serializable_memories}, f)
        except Exception as e:
            print(f"Error saving memories: {str(e)}")

    def _load_memories(self) -> None:
        """Load memories from persistent storage."""
        try:
            with open('memories.json', 'r') as f:
                data = json.load(f)
                self.memories = []
                for mem in data['memories']:
                    mem['timestamp'] = datetime.fromisoformat(mem['timestamp'])
                    self.memories.append(mem)
        except FileNotFoundError:
            self.memories = []
        except Exception as e:
            print(f"Error loading memories: {str(e)}")

**Code Explanation:**

Let us break down the MemoryCache class and explain each component in detail:

1. **Class Initialization**:
    ```python
    def __init__(self, max_age_hours: int = 24):
        self.max_age = timedelta(hours=max_age_hours)
        self.memories = []
    ```
    Purpose:
    - Sets maximum age for stored memories (default 24 hours)
    - Initializes empty list for storing conversations
    - `max_age` determines how long conversations are kept

2. **Adding New Memories**:
    ```python
    def add_memory(self, user_input: str, system_response: str, 
                retrieved_context: Optional[List[Dict]] = None):
    ```
    Purpose:
    - Stores new conversation interactions
    - Records:
        * User's question/input
        * System's response
        * Retrieved context used for response
        * Timestamp of interaction
    - Triggers cleanup and saving processes

3. **Getting Recent Context**:
    ```python
    def get_recent_context(self, limit: int = 5):
    ```
    Purpose:
    - Retrieves most recent conversations
    - Default returns last 5 interactions
    - Used for maintaining conversation flow
    - Cleans up old memories before retrieval

4. **Memory Cleanup**:
    ```python
    def _cleanup_old_memories(self):
    ```
    Purpose:
    - Removes conversations older than max_age
    - Prevents memory from growing too large
    - Maintains relevance of stored context
    - Example: Removes conversations older than 24 hours

5. **Saving Memories**:
    ```python
    def _save_memories(self):
    ```
    Purpose:
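    - Persists conversations to disk (`memories.json` in the current working directory)
    - Handles datetime serialization by converting timestamps to ISO 8601 strings
    - Overwrites the file with the full current history on every call
    - Catches any exception and prints it rather than raising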

    Key operations:
    ```python
    mem_copy['timestamp'] = mem_copy['timestamp'].isoformat()
    json.dump({'memories': serializable_memories}, f)
    ```

6. **Loading Memories**:
    ```python
    def _load_memories(self):
    ```
    Purpose:
    - Loads saved conversations from disk
    - Restores conversation history
    - Converts stored timestamps back to datetime
    - Handles missing file and other errors

    Key operations:
    ```python
    mem['timestamp'] = datetime.fromisoformat(mem['timestamp'])
    self.memories.append(mem)
    ```

7. **Memory Structure**:
    Each memory entry contains:
    ```python
    memory = {
        'timestamp': datetime.now(),      # When interaction occurred
        'user_input': user_input,         # What user asked
        'system_response': system_response,# How system responded
        'retrieved_context': retrieved_context # What context was used
    }
    ```

8. **Important Features**:

    a. Time-Based Management:
    - Automatic cleanup of old conversations
    - Timestamp-based organization
    - Age-based filtering

    b. Persistence:
    - Saves conversations to disk
    - Restores previous sessions when `_load_memories()` is called (the constructor does not call it)
    - Handles serialization issues

    c. Context Retrieval:
    - Quick access to recent conversations
    - Limit control for context window
    - Ordered by recency

9. **Usage Example**:
    ```python
    # Initialize memory cache
    memory = MemoryCache(max_age_hours=48)

    # Add new conversation
    memory.add_memory(
        user_input="What is AI?",
        system_response="AI is artificial intelligence...",
        retrieved_context=[{"source": "textbook", "content": "..."}]
    )

    # Get recent conversations
    recent_context = memory.get_recent_context(limit=3)
    ```

10. **Benefits for Chatbot System**:
- Maintains conversation continuity
- Enables context-aware responses
- Provides persistence across sessions
- Manages memory efficiently
- Enables recovery from crashes/restarts
- Maintains conversation relevance through aging

The MemoryCache class is crucial for:
- Contextual understanding
- Conversation persistence
- Memory management
- Session recovery
- Temporal relevance

This creates a more natural and context-aware conversation experience while managing system resources effectively.
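
One caveat in the persistence path: `_save_memories` opens `memories.json` in write mode before serializing, so an error partway through (for example, a value in `retrieved_context` that JSON cannot encode) can leave a truncated file. A minimal sketch of a safer pattern follows, assuming it is acceptable to serialize first and then swap a temporary file into place. The function names `save_memories_atomic` and `load_memories` are hypothetical and not part of the original class.

```python
import json
import os
import tempfile
from datetime import datetime
from typing import Dict, List

def save_memories_atomic(memories: List[Dict], path: str) -> None:
    """Serialize first, then write to a temp file and swap it into place."""
    payload = json.dumps({
        'memories': [
            {**mem, 'timestamp': mem['timestamp'].isoformat()}
            for mem in memories
        ]
    })
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(payload)
        os.replace(tmp_path, path)  # replaces the target in one step
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

def load_memories(path: str) -> List[Dict]:
    """Read memories back, restoring timestamps to datetime objects."""
    with open(path, 'r') as f:
        data = json.load(f)
    for mem in data['memories']:
        mem['timestamp'] = datetime.fromisoformat(mem['timestamp'])
    return data['memories']

original = [{
    'timestamp': datetime(2024, 1, 15, 9, 30, 0),
    'user_input': "What is AI?",
    'system_response': "AI is artificial intelligence...",
    'retrieved_context': None,
}]
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, 'memories.json')
    save_memories_atomic(original, target)
    restored = load_memories(target)
print(restored == original)
```

If serialization fails, the exception is raised before any file is touched, so the previous `memories.json` stays intact.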

In [None]:
class RetrievalSystem:
    """
    Manages document storage and retrieval using semantic search.
    """
    def __init__(self, 
                 embedding_engine: EmbeddingEngine,
                 top_k: int = 5,
                 similarity_threshold: float = 0.5):
        self.embedding_engine = embedding_engine
        self.top_k = top_k
        self.similarity_threshold = similarity_threshold
        self.chunk_vectors = []
        self.chunks = []
        
    def add_chunk(self, 
                  chunk: Dict[str, Any],
                  vector: Optional[np.ndarray] = None) -> None:
        """Add a document chunk and its vector to the system."""
        if vector is None:
            vector = self.embedding_engine.compute_embedding(chunk['text'])
            
        self.chunk_vectors.append(vector)
        self.chunks.append(chunk)
        
    def search(self, 
               query: str,
               filters: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
        """
        Search for relevant chunks based on query similarity.
        """
        query_vector = self.embedding_engine.compute_embedding(query)
        results = []
        
        for idx, (chunk_vector, chunk) in enumerate(zip(self.chunk_vectors, self.chunks)):
            if filters and not self._apply_filters(chunk, filters):
                continue
                
            similarity = np.dot(query_vector, chunk_vector)
            
            if similarity >= self.similarity_threshold:
                results.append({
                    'chunk': chunk,
                    'similarity': similarity,
                    'index': idx
                })
                
        results.sort(key=lambda x: x['similarity'], reverse=True)
        return results[:self.top_k]
        
    def _apply_filters(self, 
                      chunk: Dict[str, Any], 
                      filters: Dict[str, Any]) -> bool:
        """Apply metadata filters to a chunk."""
        for key, value in filters.items():
            if key not in chunk or chunk[key] != value:
                return False
        return True

**Code Explanation:**

Let us break down the RetrievalSystem class and explain each component:

1. **Class Initialization**:
    ```python
    def __init__(self, embedding_engine: EmbeddingEngine, top_k: int = 5, 
                similarity_threshold: float = 0.5):
    ```
    Purpose:
    - `embedding_engine`: Handles text-to-vector conversion
    - `top_k`: Number of most relevant results to return
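    - `similarity_threshold`: Minimum similarity score for a chunk to be returned (default 0.5; because embeddings are unit-normalized, the dot product is cosine similarity, which ranges from -1 to 1)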
    - `chunk_vectors`: Stores vector representations of documents
    - `chunks`: Stores actual document content

2. **Adding Document Chunks**:
    ```python
    def add_chunk(self, chunk: Dict[str, Any], vector: Optional[np.ndarray] = None):
    ```
    Purpose:
    - Adds new document chunks to the system
    - Either uses provided vector or generates new one
    - Maintains parallel lists of chunks and their vectors
    Example usage:
    ```python
    system.add_chunk({
        'text': 'Document content here',
        'id': 'doc1',
        'source': 'textbook'
    })
    ```

3. **Search Functionality**:
    ```python
    def search(self, query: str, filters: Optional[Dict[str, Any]] = None):
    ```
    Purpose:
    - Finds relevant documents for a given query
    - Converts query to vector representation
    - Applies optional metadata filters, skipping chunks that do not match
    - Computes similarity between the query and each remaining chunk
    - Returns the top K chunks that meet the similarity threshold

    Key operations:
    ```python
    # Convert query to vector
    query_vector = self.embedding_engine.compute_embedding(query)

    # Calculate similarities and filter results
    for idx, (chunk_vector, chunk) in enumerate(zip(self.chunk_vectors, self.chunks)):
        similarity = np.dot(query_vector, chunk_vector)
    ```

4. **Filter Application**:
    ```python
    def _apply_filters(self, chunk: Dict[str, Any], filters: Dict[str, Any]):
    ```
    Purpose:
    - Applies metadata-based filtering
    - Checks if chunk matches all filter criteria
    - Enables searching within specific documents/sources
    Example usage:
    ```python
    results = system.search("AI concepts", filters={'source': 'textbook'})
    ```

5. **Key Features**:

a. Semantic Search:
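- Uses vector similarity for matching
- Tolerates differences in word order and partial overlap, unlike exact keyword lookup
- Depends on the embedding engine for semantics: the hash-based `EmbeddingEngine` rewards shared words, while a learned embedding model would capture relationships between different words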

b. Flexible Filtering:
- Metadata-based filtering
- Multiple filter criteria support
- Filter by source, ID, or other attributes

c. Relevance Ranking:
- Similarity score calculation
- Threshold-based filtering
- Top-K result selection

6. **Example Usage**:
    ```python
    # Initialize system
    retrieval_system = RetrievalSystem(
        embedding_engine=EmbeddingEngine(),
        top_k=3,
        similarity_threshold=0.6
    )

    # Add documents
    retrieval_system.add_chunk({
        'text': 'AI is a branch of computer science...',
        'source': 'textbook',
        'chapter': 1
    })

    # Search with filters
    results = retrieval_system.search(
        query="What is artificial intelligence?",
        filters={'source': 'textbook'}
    )
    ```

7. **Search Process Flow**:
1. Query embedding generation
2. Filter application (if specified), skipping chunks that do not match
3. Similarity computation for the remaining chunks
4. Threshold filtering
5. Sorting by similarity
6. Top-K selection

8. **Benefits**:
- Vector-based similarity search
- Flexible filtering options
- Configurable relevance thresholds
- Easy integration with other components
- Simple in-memory storage, suited to small and medium-sized corpora

9. **Common Use Cases**:
- Finding relevant context for questions
- Document similarity comparison
- Content recommendation
- Information retrieval
- Knowledge base search

10. **Performance Considerations**:
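- Vector similarity computations: one dot product per stored chunk, so query time grows linearly with the number of chunks
- In-memory storage of vectors: each 768-dimensional float64 vector takes about 6 KB
- Filter application overhead: one comparison per filter key per chunk
- Sorting operation complexity: O(m log m) for the m chunks that pass the threshold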
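
The per-chunk Python loop in `search` can become a bottleneck as the number of chunks grows. One common remedy, sketched here under stated assumptions, is to stack the chunk vectors into a single matrix and compute all similarities with one matrix-vector product. The function name `top_k_by_similarity` is hypothetical; the sketch assumes unit-normalized vectors and that any metadata filtering has already been applied to the rows passed in.

```python
import numpy as np
from typing import List, Tuple

def top_k_by_similarity(query_vector: np.ndarray,
                        chunk_matrix: np.ndarray,
                        top_k: int = 5,
                        threshold: float = 0.5) -> List[Tuple[int, float]]:
    """Return (row index, similarity) pairs for the best-matching rows."""
    if chunk_matrix.shape[0] == 0:
        return []
    similarities = chunk_matrix @ query_vector  # one dot product per row
    candidates = np.flatnonzero(similarities >= threshold)
    # Order the surviving candidates by descending similarity
    order = candidates[np.argsort(-similarities[candidates])]
    return [(int(i), float(similarities[i])) for i in order[:top_k]]

chunk_matrix = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([1.0, 0.0])
result = top_k_by_similarity(query, chunk_matrix, top_k=2)
print(result)
```

In the class, the matrix could be built once with `np.vstack(self.chunk_vectors)` and rebuilt or appended to when chunks are added; the returned indices map back into `self.chunks`.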

This system is crucial for:
- Accurate document retrieval
- Context-aware responses
- Efficient information search
- Metadata-based filtering
- Relevance ranking

The RetrievalSystem acts as the search engine of the RAG system, ranking chunks by vector similarity to the query rather than by exact keyword lookup.

In [None]:
class IntegratedQASystem:
    """
    Complete question-answering system combining RAG with context awareness.
    """
    def __init__(self, openai_api_key: str):
        # Initialize components
        self.document_processor = DocumentProcessor()
        self.embedding_engine = EmbeddingEngine()
        self.retrieval_system = RetrievalSystem(self.embedding_engine)
        self.memory_cache = MemoryCache()
        self.client = OpenAI(api_key=openai_api_key)
        
        # Load existing memories
        self.memory_cache._load_memories()
        
        # Define system prompts
        self.system_prompt = """You are a helpful assistant with both access to reference 
        materials and memory of the ongoing conversation. Use the provided context and 
        conversation history to give accurate, contextual responses. When referring to 
        previous conversation points, be explicit about what was discussed earlier."""
        
    def add_document(self, document: Dict[str, str]) -> None:
        """Process and add a document to the knowledge base."""
        # Process document into chunks
        chunks = self.document_processor.create_chunks(document)
        
        # Add each chunk to the retrieval system
        for chunk in chunks:
            vector = self.embedding_engine.compute_embedding(chunk['text'])
            self.retrieval_system.add_chunk(chunk, vector)
            
    def _format_conversation_history(self, recent_context: List[Dict]) -> str:
        """Format recent conversation history for the prompt."""
        formatted_history = []
        for memory in recent_context:
            formatted_history.append(f"User: {memory['user_input']}")
            formatted_history.append(f"Assistant: {memory['system_response']}")
        return "\n".join(formatted_history)
        
    def query(self, question: str, temperature: float = 0.7) -> Dict:
        """
        Process a question using both RAG and conversation context.
        """
        try:
            # Get relevant documents and conversation history
            relevant_contexts = self.retrieval_system.search(question)
            recent_context = self.memory_cache.get_recent_context()
            conversation_history = self._format_conversation_history(recent_context)
            
            # Format contexts
            rag_context = "\n\n".join([
                f"Reference {i+1}:\n{ctx['chunk']['text']}"
                for i, ctx in enumerate(relevant_contexts)
            ])
            
            # Generate response
            response = self.client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": self.system_prompt},
                    {"role": "user", "content": f"""
                    Previous Conversation:
                    {conversation_history}
                    
                    Reference Materials:
                    {rag_context}
                    
                    Current Question: {question}
                    
                    Please provide a response that considers both the reference materials
                    and our conversation history when relevant."""}
                ],
                temperature=temperature
            )
            
            answer = response.choices[0].message.content
            
            # Store interaction in memory
            self.memory_cache.add_memory(
                question, 
                answer, 
                relevant_contexts
            )
            
            return {
                'answer': answer,
                'rag_contexts': relevant_contexts,
                'conversation_context': recent_context
            }
            
        except Exception as e:
            return {
                'error': f"Error generating response: {str(e)}",
                'rag_contexts': relevant_contexts if 'relevant_contexts' in locals() else [],
                'conversation_context': recent_context if 'recent_context' in locals() else []
            }

**Code Explanation:**

Let us break down the IntegratedQASystem class, which combines RAG and context-awareness:

1. **Class Initialization**:
    ```python
    def __init__(self, openai_api_key: str):
    ```
    Key Components:
    - `document_processor`: Handles text chunking
    - `embedding_engine`: Creates vector representations
    - `retrieval_system`: Manages document search
    - `memory_cache`: Stores conversation history
    - `client`: OpenAI API interface

    System Prompt:
    - Defines assistant's behavior
    - Emphasizes use of context and history
    - Guides response generation

2. **Document Addition**:
    ```python
    def add_document(self, document: Dict[str, str]) -> None:
    ```
    Process:
    1. Chunks document using DocumentProcessor
    2. Creates embeddings for each chunk
    3. Stores chunks in retrieval system

    Example:
    ```python
    system.add_document({
        'text': 'Document content...',
        'id': 'doc1',
        'source': 'textbook'
    })
    ```

3. **Conversation History Formatting**:
    ```python
    def _format_conversation_history(self, recent_context: List[Dict]) -> str:
    ```
    Purpose:
    - Formats previous conversations for context
    - Creates clear user/assistant dialogue structure
    - Prepares history for prompt inclusion

4. **Query Processing**:
    ```python
    def query(self, question: str, temperature: float = 0.7) -> Dict:
    ```
    Key Steps:

    a. Context Gathering:
    ```python
    relevant_contexts = self.retrieval_system.search(question)
    recent_context = self.memory_cache.get_recent_context()
    ```
    - Retrieves relevant documents
    - Gets recent conversation history

    b. Context Formatting:
    ```python
    rag_context = "\n\n".join([
        f"Reference {i+1}:\n{ctx['chunk']['text']}"
        for i, ctx in enumerate(relevant_contexts)
    ])
    ```
    - Organizes retrieved documents
    - Numbers references for clarity

    c. Response Generation:
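    ```python
    response = self.client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": self.system_prompt},
            # formatted_prompt is shorthand for the f-string built inline in
            # query(): history, reference materials, and the current question
            {"role": "user", "content": formatted_prompt}
        ],
        temperature=temperature
    )
    ```
    - Uses the `gpt-3.5-turbo` model
    - Sends the system prompt plus one user message containing history, references, and the question
    - Passes the caller's `temperature` through to the API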

    d. Memory Management:
    ```python
    self.memory_cache.add_memory(
        question, 
        answer, 
        relevant_contexts
    )
    ```
    - Stores new interaction
    - Maintains conversation history

5. **Error Handling**:
    ```python
    try:
        # Query processing
    except Exception as e:
        return {
            'error': f"Error generating response: {str(e)}",
            'rag_contexts': relevant_contexts if 'relevant_contexts' in locals() else [],
            'conversation_context': recent_context if 'recent_context' in locals() else []
        }
    ```
    - Graceful error handling
    - Returns available context even on failure
    - Maintains system stability

6. **Key Features**:

a. Integration:
- Combines RAG and context awareness
- Seamless document and conversation handling
- Unified response generation

b. Context Management:
- Document-based context (RAG)
- Conversation history (Memory)
- Combined context utilization

c. Response Generation:
- Temperature control for creativity
- Context-aware responses
- Structured output format

7. **Usage Example**:
    ```python
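    # Read the key from the environment rather than hardcoding it
    qa_system = IntegratedQASystem(os.getenv("OPENAI_API_KEY"))

    # Add knowledge
    qa_system.add_document({
        'id': 'doc1',
        'text': 'AI content...',
        'source': 'textbook'
    })

    # Query system
    response = qa_system.query(
        "What is AI?",
        temperature=0.7
    )

    # query() returns an 'error' key instead of 'answer' on failure
    print(response.get('answer', response.get('error')))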
    ```

8. **Benefits**:
- Contextual awareness from both documents and conversation
- Improved response accuracy
- Persistent memory
- Flexible document integration
- Robust error handling

This class serves as the main interface for the entire system, orchestrating:
- Document processing
- Context retrieval
- Memory management
- Response generation

It creates a sophisticated QA system that combines the benefits of both RAG and context-aware conversations.
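
To make the query flow concrete, here is a minimal, self-contained sketch of how the two context sources end up in a single user message. `build_prompt` and `format_history` are hypothetical helpers written for this illustration (the class above builds the prompt inline inside `query`), and the sample history and contexts are made up. The dictionary keys mirror those used in `_format_conversation_history` and in the reference numbering shown earlier.

```python
from typing import Dict, List


def format_history(recent_context: List[Dict]) -> str:
    """Render stored turns as alternating User/Assistant lines."""
    lines = []
    for memory in recent_context:
        lines.append(f"User: {memory['user_input']}")
        lines.append(f"Assistant: {memory['system_response']}")
    return "\n".join(lines)


def build_prompt(question: str,
                 relevant_contexts: List[Dict],
                 recent_context: List[Dict]) -> str:
    """Hypothetical helper: combine history, references, and the question."""
    rag_context = "\n\n".join(
        f"Reference {i}:\n{ctx['chunk']['text']}"
        for i, ctx in enumerate(relevant_contexts, 1)
    )
    return (
        f"Previous Conversation:\n{format_history(recent_context)}\n\n"
        f"Reference Materials:\n{rag_context}\n\n"
        f"Current Question: {question}"
    )


# Made-up sample data shaped like the structures used in query()
history = [{"user_input": "What is AI?",
            "system_response": "AI is the study of intelligent machines."}]
contexts = [{"chunk": {"text": "Machine learning is a subset of AI."},
             "similarity": 0.91}]

prompt = build_prompt("How does ML relate to that?", contexts, history)
print(prompt)
```

Building the prompt in a separate function like this is one possible design choice: it lets you inspect exactly what the model receives without making an API call.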

In [None]:
def main():
    """
    Main function to demonstrate the integrated QA system with RAG and context awareness.
    This provides an interactive interface to test the system's capabilities with
    example documents and real-time question answering.
    """
    # Check for API key
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("Please set OPENAI_API_KEY environment variable")
    
    # Initialize the system
    print("\nInitializing the Integrated QA System...")
    qa_system = IntegratedQASystem(api_key)
    
    # Sample documents demonstrating different topics and relationships
    documents = [
        {
            'id': '1',
            'text': '''Artificial Intelligence (AI) focuses on creating intelligent machines.
                   Machine learning is a subset of AI that enables systems to learn from data.
                   Deep learning, a type of machine learning, uses neural networks to process
                   complex patterns in data.''',
            'source': 'ai_basics'
        },
        {
            'id': '2',
            'text': '''Neural networks are computing systems inspired by biological brains.
                   They consist of layers of interconnected nodes that process information.
                   Deep learning neural networks can have many layers, allowing them to
                   learn increasingly complex features from data.''',
            'source': 'neural_networks'
        },
        {
            'id': '3',
            'text': '''Natural Language Processing (NLP) is a branch of AI focused on
                   enabling computers to understand and generate human language.
                   Modern NLP systems use transformer architectures and attention
                   mechanisms to process and generate text effectively.''',
            'source': 'nlp_overview'
        }
    ]
    
    print("\nAdding documents to knowledge base...")
    for doc in documents:
        try:
            qa_system.add_document(doc)
            print(f"✓ Added document from source: {doc['source']}")
        except Exception as e:
            print(f"✗ Error adding document from {doc['source']}: {str(e)}")
    
    print("\nSystem initialized and ready for questions!")
    print("\nExample questions you can try:")
    print("1. What is artificial intelligence?")
    print("2. How do neural networks relate to what you explained earlier?")
    print("3. Can you tell me more about natural language processing?")
    print("\nType 'exit' to quit, 'help' for example questions, or 'clear' to reset conversation history.")
    
    # Interactive questioning loop
    while True:
        try:
            # Get user input
            question = input("\nYou: ").strip()
            
            # Handle special commands
            if question.lower() == 'exit':
                print("\nThank you for using the QA system. Goodbye!")
                break
                
            elif question.lower() == 'help':
                print("\nExample questions you can try:")
                print("1. What is artificial intelligence?")
                print("2. How do neural networks relate to what you explained earlier?")
                print("3. Can you tell me more about natural language processing?")
                continue
                
            elif question.lower() == 'clear':
                qa_system.memory_cache.memories = []
                qa_system.memory_cache._save_memories()
                print("\nConversation history cleared!")
                continue
            
            # Skip empty questions
            if not question:
                continue
            
            # Process the question
            print("\nProcessing your question...")
            result = qa_system.query(question)
            
            # Handle successful response
            if 'error' not in result:
                print("\nAssistant:", result['answer'])
                
                # Show reference information
                if result['rag_contexts']:
                    print("\nReferences used:")
                    for i, ctx in enumerate(result['rag_contexts'], 1):
                        similarity = ctx['similarity'] * 100  # Convert to percentage
                        source = ctx['chunk'].get('source', 'unknown')
                        print(f"\n{i}. Source: {source} (Relevance: {similarity:.1f}%)")
                
            # Handle errors
            else:
                print("\nError:", result['error'])
                print("Please try asking your question again.")
            
        except KeyboardInterrupt:
            print("\n\nExiting gracefully...")
            break
            
        except Exception as e:
            print(f"\nAn unexpected error occurred: {str(e)}")
            print("The system is still running. Please try again.")
            continue


Initializing the Integrated QA System...

Adding documents to knowledge base...
✓ Added document from source: ai_basics
✓ Added document from source: neural_networks
✓ Added document from source: nlp_overview

System initialized and ready for questions!

Example questions you can try:
1. What is artificial intelligence?
2. How do neural networks relate to what you explained earlier?
3. Can you tell me more about natural language processing?

Type 'exit' to quit, 'help' for example questions, or 'clear' to reset conversation history.

Processing your question...

Assistant: Artificial Intelligence (AI) refers to the simulation of human intelligence processes by machines, especially computer systems. It involves the development of algorithms and models that enable machines to perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation.

In our previous conversations, we discussed various aspects of tec

**Code Explanation:**

Let us break down the main function that demonstrates the integrated QA system:

1. **Initialization and Setup**:
    ```python
    # API Key Check
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("Please set OPENAI_API_KEY environment variable")

    # System Initialization
    print("\nInitializing the Integrated QA System...")
    qa_system = IntegratedQASystem(api_key)
    ```
    Purpose:
    - Verifies OpenAI API key availability
    - Initializes the QA system

2. **Knowledge Base Setup**:
    ```python
    documents = [
        {
            'id': '1',
            'text': '''Artificial Intelligence (AI) focuses on...''',
            'source': 'ai_basics'
        },
        # ... more documents ...
    ]
    ```
    Features:
    - Sample documents on related topics
    - Structured document format
    - Clear source attribution

3. **Document Loading**:
    ```python
    print("\nAdding documents to knowledge base...")
    for doc in documents:
        try:
            qa_system.add_document(doc)
            print(f"✓ Added document from source: {doc['source']}")
        except Exception as e:
            print(f"✗ Error adding document from {doc['source']}: {str(e)}")
    ```
    Purpose:
    - Loads documents into system
    - Provides loading status feedback
    - Handles loading errors gracefully

4. **User Interface Setup**:
    ```python
    print("\nSystem initialized and ready for questions!")
    print("\nExample questions you can try:")
    print("1. What is artificial intelligence?")
    print("2. How do neural networks relate to what you explained earlier?")
    print("3. Can you tell me more about natural language processing?")
    ```
    Features:
    - Clear system status indication
    - Example questions for guidance
    - User-friendly interface

5. **Command Handling**:
    ```python
    # Special commands
    if question.lower() == 'exit':
        print("\nThank you for using the QA system. Goodbye!")
        break
    elif question.lower() == 'help':
        # Show help menu
    elif question.lower() == 'clear':
        qa_system.memory_cache.memories = []
        qa_system.memory_cache._save_memories()
    ```
    Commands:
    - `exit`: Ends the session
    - `help`: Shows example questions
    - `clear`: Resets conversation history

6. **Question Processing**:
    ```python
    print("\nProcessing your question...")
    result = qa_system.query(question)

    if 'error' not in result:
        print("\nAssistant:", result['answer'])
        
        # Show reference information
        if result['rag_contexts']:
            print("\nReferences used:")
            for i, ctx in enumerate(result['rag_contexts'], 1):
                similarity = ctx['similarity'] * 100
                source = ctx['chunk'].get('source', 'unknown')
                print(f"\n{i}. Source: {source} (Relevance: {similarity:.1f}%)")
    ```
    Features:
    - Feedback while the question is processed
    - Clear answer display
    - List of the references used
    - Relevance score for each reference

7. **Error Handling**:
    ```python
    try:
        # Processing code
    except KeyboardInterrupt:
        print("\n\nExiting gracefully...")
        break
    except Exception as e:
        print(f"\nAn unexpected error occurred: {str(e)}")
        print("The system is still running. Please try again.")
    ```
    Handles:
    - Keyboard interrupts
    - Processing errors
    - System continuity

8. **Key Features**:

a. Interactive Interface:
- Real-time question answering
- Command system
- Status feedback

b. Error Management:
- Graceful error handling
- Clear error messages
- System recovery

c. Reference Display:
- Source attribution
- Relevance scores
- Context information
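
The command handling described in step 5 can also be factored out of the loop, which makes it testable without interactive input. The following is a minimal sketch under stated assumptions: `handle_command` and `EXAMPLE_QUESTIONS` are hypothetical names introduced here, and a plain list stands in for `memory_cache.memories` (the real `clear` branch also calls `_save_memories()`, which is omitted).

```python
from typing import List, Optional

EXAMPLE_QUESTIONS = [
    "What is artificial intelligence?",
    "How do neural networks relate to what you explained earlier?",
    "Can you tell me more about natural language processing?",
]


def handle_command(text: str, memories: List[dict]) -> Optional[str]:
    """Return 'exit', 'handled', or None when text is an ordinary question."""
    command = text.strip().lower()
    if command == "exit":
        return "exit"
    if command == "help":
        for i, example in enumerate(EXAMPLE_QUESTIONS, 1):
            print(f"{i}. {example}")
        return "handled"
    if command == "clear":
        memories.clear()  # in place, so the caller's list is emptied too
        return "handled"
    return None


# Stand-in for qa_system.memory_cache.memories
memories = [{"user_input": "What is AI?", "system_response": "..."}]
print(handle_command("clear", memories), memories)
print(handle_command("What is AI?", memories))
```

In the loop, a return value of `'exit'` would map to `break`, `'handled'` to `continue`, and `None` to the normal `qa_system.query(question)` path.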

In [None]:
if __name__ == "__main__":
    try:
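        # Set OPENAI_API_KEY in your shell or notebook environment before running.
        # Do not overwrite it here with a placeholder string: main() reads the
        # real key via os.getenv("OPENAI_API_KEY").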
        main()
    except Exception as e:
        print(f"\nCritical error: {str(e)}")
        print("Please ensure your OpenAI API key is set and all requirements are installed.")

## How the Integration Works

Let's understand how this integrated system enhances our question-answering capabilities:

1. **Memory Management**
   - The `MemoryCache` stores recent conversations with timestamps
   - It automatically removes old conversations to maintain relevance
   - Each memory entry includes both the conversation and any RAG contexts used

2. **Context Integration**
   - When processing a question, the system considers:
     * Relevant documents from the RAG system
     * Recent conversation history from the memory cache
   - The combined context helps generate more coherent and informed responses

3. **Enhanced Response Generation**
   - The system prompt instructs the model to use both reference materials and conversation history
   - Responses can refer back to previous discussions while grounding answers in source documents
   - The system maintains conversational flow while ensuring factual accuracy
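
The memory behaviour in point 1 can be illustrated with a small stand-in. This is a sketch, not the `MemoryCache` defined earlier: the class name `SimpleMemoryCache`, the `max_age` and `max_items` parameters, and the optional `now` argument (added so the example is deterministic) are all assumptions made for illustration. Only the entry keys `user_input` and `system_response` and the method names `add_memory` and `get_recent_context` come from the code above.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


class SimpleMemoryCache:
    """Illustrative stand-in: timestamped entries that expire after max_age."""

    def __init__(self, max_age: timedelta = timedelta(hours=1),
                 max_items: int = 5):
        self.max_age = max_age
        self.max_items = max_items
        self.memories: List[Dict] = []

    def add_memory(self, user_input: str, system_response: str,
                   rag_contexts: Optional[List[Dict]] = None,
                   now: Optional[datetime] = None) -> None:
        """Store one turn together with the RAG contexts used for it."""
        self.memories.append({
            "timestamp": now or datetime.now(),
            "user_input": user_input,
            "system_response": system_response,
            "rag_contexts": rag_contexts or [],
        })

    def get_recent_context(self, now: Optional[datetime] = None) -> List[Dict]:
        """Drop expired entries, then return the newest max_items in order."""
        cutoff = (now or datetime.now()) - self.max_age
        self.memories = [m for m in self.memories if m["timestamp"] >= cutoff]
        return self.memories[-self.max_items:]


cache = SimpleMemoryCache(max_age=timedelta(minutes=30), max_items=2)
t0 = datetime(2024, 1, 1, 12, 0)
cache.add_memory("old question", "old answer", now=t0)
cache.add_memory("q1", "a1", now=t0 + timedelta(minutes=40))
cache.add_memory("q2", "a2", now=t0 + timedelta(minutes=45))

recent = cache.get_recent_context(now=t0 + timedelta(minutes=50))
print([m["user_input"] for m in recent])
```

Here the first entry is 50 minutes old at lookup time and falls outside the 30-minute window, so only the two later turns are returned.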

## Advantages of the Integrated Approach

1. **Improved Accuracy**
   - RAG provides factual grounding through reference documents
   - Context awareness ensures consistency across the conversation

2. **Better User Experience**
   - The system can maintain coherent conversations
   - It can refer back to previous topics naturally
   - It combines newly retrieved information with conversation history

3. **Flexible Knowledge Base**
   - Documents can be added or updated at any time
   - Conversation context evolves naturally
   - The system can handle both general and specific queries

## Exercise: Building and Testing the System

Try these exercises to understand the system better:

1. **Basic Testing**
```python
# Initialize the system (reads the key from the environment)
qa_system = IntegratedQASystem(os.getenv("OPENAI_API_KEY"))

# Add some test documents
test_doc = {
    'id': 'test1',
    'text': 'Your test document text here',
    'source': 'test_source'
}
qa_system.add_document(test_doc)

# Try a series of related questions
questions = [
    "What is the main topic of the document?",
    "Can you elaborate on that?",
    "How does this relate to what we discussed earlier?"
]

for question in questions:
    response = qa_system.query(question)
    print(f"Q: {question}")
    # query() returns an 'error' key instead of 'answer' on failure
    print(f"A: {response.get('answer', response.get('error'))}\n")
```

2. **Context Analysis**
   - Examine how the system uses both RAG and conversation context
   - Observe how responses change with and without context
   - Try questions that require understanding of previous conversation
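
For the context-analysis exercise it can help to run the flow without calling the API. Below is a minimal sketch of a fake client. `FakeChatClient` is a hypothetical class written for this illustration; it only mimics the `client.chat.completions.create(...)` call path and the `response.choices[0].message.content` shape used in `query`, and it returns canned text depending on whether the prompt contains earlier turns.

```python
from types import SimpleNamespace
from typing import Dict, List


class FakeChatClient:
    """Hypothetical stand-in exposing client.chat.completions.create(...)."""

    def __init__(self):
        self.calls: List[Dict] = []
        # Mirror the attribute path used in query()
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, model: str, messages: List[Dict],
                temperature: float = 0.7):
        # Record the request so the prompt can be inspected afterwards
        self.calls.append({"model": model, "messages": messages,
                           "temperature": temperature})
        user_message = messages[-1]["content"]
        has_history = "User:" in user_message
        reply = ("answer with history" if has_history
                 else "answer without history")
        # Shape the result like response.choices[0].message.content
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content=reply))]
        )


client = FakeChatClient()
no_history = "Previous Conversation:\n\nCurrent Question: What is AI?"
with_history = ("Previous Conversation:\nUser: What is AI?\n"
                "Assistant: ...\n\nCurrent Question: Tell me more")

first = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": no_history}],
)
second = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": with_history}],
)
print(first.choices[0].message.content)
print(second.choices[0].message.content)
```

Assigning `qa_system.client = FakeChatClient()` before calling `query` should route chat requests through the fake, assuming `query` uses only the call path shown earlier. If the embedding engine also calls the API, it would need its own stub.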

## Conclusion

By integrating RAG with context awareness, we've created a more sophisticated question-answering system that combines the benefits of both approaches. The system can provide accurate, source-based information while maintaining natural, contextual conversations, so answers stay grounded in the reference documents and consistent with what was said earlier in the dialogue.

Remember to experiment with different settings and use cases to fully understand the capabilities and limitations of the integrated system.