# 🐍 Python Concepts - Greek Derby RAG Chatbot

## Learning Objectives
By the end of this lesson, you will understand:
- Modern Python programming patterns and best practices
- Object-Oriented Programming (OOP) in Python
- Type hints and type annotations
- Error handling and exception management
- Environment management and configuration
- Async programming concepts
- Data structures and algorithms
- API development with FastAPI
- Web scraping and data processing
- Memory management and optimization
- Testing and debugging techniques

---

## Q1: How do we structure Python classes and use OOP principles in our Greek Derby chatbot?

**Answer:**

### What is Object-Oriented Programming (OOP)?

**Object-Oriented Programming (OOP)** is a programming paradigm that organizes code into objects that contain both data (attributes) and behavior (methods). Think of it like creating blueprints for real-world entities.

**Key OOP Concepts:**
- **Class**: A blueprint or template for creating objects (like a blueprint for a house)
- **Object**: An instance of a class (like a specific house built from the blueprint)
- **Attribute**: Data stored in an object (like the color of a house)
- **Method**: Functions that belong to an object (like opening the door of a house)

**Simple Example:**
```python
# Class definition (blueprint)
class Car:
    def __init__(self, brand, color):
        self.brand = brand  # Attribute
        self.color = color  # Attribute
    
    def start_engine(self):  # Method
        return f"The {self.color} {self.brand} is starting!"

# Creating objects (instances)
my_car = Car("Toyota", "red")
your_car = Car("Honda", "blue")

print(my_car.start_engine())  # "The red Toyota is starting!"
print(your_car.start_engine())  # "The blue Honda is starting!"
```

Object-Oriented Programming is fundamental to our Greek Derby RAG chatbot. We use classes to organize code, encapsulate functionality, and create reusable components.

### Class Structure in Our Chatbot:

Before diving into our chatbot code, let's understand the key terms:

**Type Hints and Annotations:**
- **Type hints** are annotations that indicate what type of data a variable should contain
- **`from typing import List, Dict, Any`** - imports type definitions for lists, dictionaries, and any type
- **`TypedDict`** - creates a dictionary with specific key-value types (like a structured data format)

**Example of Type Hints:**
```python
def greet_user(name: str, age: int) -> str:
    """Function with type hints: takes str and int, returns str"""
    return f"Hello {name}, you are {age} years old!"

# Type hints help catch errors and make code more readable
user_name: str = "John"
user_age: int = 25
message: str = greet_user(user_name, user_age)
```

Now let's look at our chatbot structure:

```python
# backend/standalone-service/greek_derby_chatbot.py
from typing import List, Dict, Any
from typing_extensions import TypedDict
from langchain_core.documents import Document  # Document type used in the state below

# TypedDict creates a structured dictionary with specific types
class GreekDerbyState(TypedDict):
    """Type definition for our chatbot state - like a data contract"""
    question: str        # Must be a string
    context: List[Document]  # Must be a list of Document objects
    answer: str          # Must be a string

class GreekDerbyChatbot:
    """Interactive RAG chatbot for Greek Derby discussions"""
    
    def __init__(self):
        """Constructor method - called when creating a new chatbot instance"""
        print("🚀 Initializing Greek Derby RAG Chatbot...")
        
        # Load environment variables
        self._load_environment()
        
        # Initialize components
        self._init_llm()           # Initialize Language Model
        self._init_embeddings()    # Initialize text embeddings
        self._init_vector_store()  # Initialize vector database
        self._init_rag_system()    # Initialize RAG system
        self._init_memory()        # Initialize conversation memory
        
        # Load or create knowledge base
        self._load_knowledge_base()
        
        print("✅ Greek Derby Chatbot initialized successfully!")
```

### Key OOP Principles Demonstrated:

#### 1. **Encapsulation**

**What is Encapsulation?**
Encapsulation is the principle of hiding internal implementation details and only exposing what's necessary. It's like a car - you don't need to know how the engine works internally, you just use the steering wheel and pedals.

**Key Terms:**
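- **Private attributes/methods**: Internal details marked with a leading underscore (`_`). This is a naming convention that Python does not enforce, so code outside the class should treat such names as off-limits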
- **Public interface**: Methods and attributes that external code can safely use
- **Data hiding**: Keeping internal state protected from external modification

**Simple Example:**
```python
class BankAccount:
    def __init__(self, initial_balance):
        self._balance = initial_balance  # Private - don't access directly
    
    def deposit(self, amount):  # Public method
        if amount > 0:
            self._balance += amount
            return True
        return False
    
    def get_balance(self):  # Public method
        return self._balance

# Usage
account = BankAccount(100)
account.deposit(50)  # ✅ Good - use public method
print(account.get_balance())  # ✅ Good - use public method
# account._balance = 1000  # ❌ Bad - don't access private directly
```
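Python does not actually enforce privacy: the single leading underscore is only a convention, and the commented-out assignment above would succeed if you ran it. A double leading underscore triggers *name mangling*, which makes accidental access from outside the class (or clashes in subclasses) harder, but still not impossible. The `Wallet` class below is a hypothetical illustration:

```python
class Wallet:
    def __init__(self, amount: int) -> None:
        self._soft_private = amount   # Convention only: still accessible
        self.__mangled = amount       # Stored as _Wallet__mangled

    def total(self) -> int:
        # Inside the class, the short name works transparently
        return self.__mangled


wallet = Wallet(10)
print(wallet._soft_private)          # 10 - works, but signals "internal"
print(wallet._Wallet__mangled)       # 10 - the mangled name is reachable
print(hasattr(wallet, "__mangled"))  # False - the plain name does not exist
```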

Now let's see how our chatbot uses encapsulation:

```python
class GreekDerbyChatbot:
    def __init__(self):
        # Private attributes (indicated by underscore) - internal implementation
        self._llm = None              # Language model - internal detail
        self._embeddings = None       # Embeddings model - internal detail
        self._vector_store = None     # Vector database - internal detail
        self._memory = None           # Memory system - internal detail
        
        # Public interface - what external code can safely use
        self.conversation_history = []  # Public - safe to access
        self.stats = {"questions_asked": 0, "responses_generated": 0}  # Public - safe to access
    
    def _load_environment(self):
        """Private method - internal implementation details"""
        # This method handles environment setup - users don't need to know how
        pass
    
    def ask_question(self, question: str) -> str:
        """Public method - interface for external use"""
        # This is what users call - it uses private methods internally
        return self._process_question(question)
```

#### 2. **Inheritance and Composition**

**What is Composition?**
Composition is when a class contains objects of other classes as attributes. It's a "HAS-A" relationship - our chatbot HAS-A language model, HAS-A memory system, etc.

**What is Inheritance?**
Inheritance is when a class inherits properties and methods from another class. It's an "IS-A" relationship - a Dog IS-A Animal.

**Simple Example of Composition:**
```python
class Engine:
    def start(self):
        return "Engine started"

class Car:
    def __init__(self):
        self.engine = Engine()  # Composition: Car HAS-A Engine
    
    def start_car(self):
        return self.engine.start()  # Use the engine's method

# Usage
my_car = Car()
print(my_car.start_car())  # "Engine started"
```

**Simple Example of Inheritance:**
```python
class Animal:
    def __init__(self, name):
        self.name = name
    
    def speak(self):
        return "Some sound"

class Dog(Animal):  # Dog inherits from Animal
    def speak(self):  # Override the parent method
        return f"{self.name} says Woof!"

# Usage
dog = Dog("Buddy")
print(dog.speak())  # "Buddy says Woof!"
```
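When a subclass needs to *extend* its parent's behavior rather than replace it, it can call the parent implementation through `super()`. A minimal sketch building on the `Animal` example (the `breed` attribute is a hypothetical addition):

```python
class Animal:
    def __init__(self, name: str) -> None:
        self.name = name

    def speak(self) -> str:
        return "Some sound"


class Dog(Animal):
    def __init__(self, name: str, breed: str) -> None:
        super().__init__(name)  # Let Animal set up self.name
        self.breed = breed      # Then add Dog-specific state

    def speak(self) -> str:
        # Extend the parent's result instead of discarding it
        return f"{self.name} the {self.breed} says Woof! ({super().speak()})"


dog = Dog("Buddy", "beagle")
print(dog.speak())              # Buddy the beagle says Woof! (Some sound)
print(isinstance(dog, Animal))  # True - a Dog IS-A Animal
```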

Now let's see how our chatbot uses composition:

```python
# Using composition with LangChain components
from langchain.chat_models import init_chat_model
from langchain.memory import ConversationBufferMemory
from langchain_openai import OpenAIEmbeddings

class GreekDerbyChatbot:
    def __init__(self):
        # Composition: the chatbot HAS-A language model, embeddings model, and memory
        self.llm = init_chat_model("gpt-4o-mini", model_provider="openai")
        self.embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
        self.memory = ConversationBufferMemory(return_messages=True)
    
    def _init_rag_system(self):
        """Initialize RAG system using composition"""
        # Builds on objects created elsewhere (self.vector_store is set up by _init_vector_store())
        self.retriever = self.vector_store.as_retriever(search_kwargs={"k": 5})
        self.rag_chain = self._create_rag_chain()
```

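#### 3. **Polymorphism**

**What is Polymorphism?**
Polymorphism means that objects of different classes can be used through the same interface. The caller invokes `load_documents()` without knowing which concrete loader it holds, and each subclass supplies its own behavior.

```python
from typing import List
from langchain_core.documents import Document

# Different types of document loaders can be used interchangeably
class DocumentLoader:
    def load_documents(self) -> List[Document]:
        raise NotImplementedError  # Subclasses must override this

class WebLoader(DocumentLoader):
    def load_documents(self) -> List[Document]:
        # Web-specific loading logic (_load_from_web is a placeholder helper)
        return self._load_from_web()

class FileLoader(DocumentLoader):
    def load_documents(self) -> List[Document]:
        # File-specific loading logic (_load_from_file is a placeholder helper)
        return self._load_from_file()

# Polymorphic usage
def load_knowledge_base(loader: DocumentLoader) -> List[Document]:
    return loader.load_documents()  # Works with any DocumentLoader subclass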
```
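Because `_load_from_web` and `_load_from_file` are placeholders, the snippet above cannot run on its own. Here is a self-contained sketch of the same idea. It uses a tiny stand-in `Document` dataclass (an assumption, not the LangChain class) and two hypothetical in-memory loaders, `StringLoader` and `LinesLoader`:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Minimal stand-in for a LangChain-style document (assumption)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class DocumentLoader:
    def load_documents(self) -> List[Document]:
        raise NotImplementedError


class StringLoader(DocumentLoader):
    """Hypothetical loader that wraps a list of raw strings."""
    def __init__(self, texts: List[str]) -> None:
        self.texts = texts

    def load_documents(self) -> List[Document]:
        return [Document(t, {"source": "string"}) for t in self.texts]


class LinesLoader(DocumentLoader):
    """Hypothetical loader that splits one text block into non-empty lines."""
    def __init__(self, block: str) -> None:
        self.block = block

    def load_documents(self) -> List[Document]:
        return [Document(line, {"source": "lines"})
                for line in self.block.splitlines() if line.strip()]


def load_knowledge_base(loader: DocumentLoader) -> List[Document]:
    # No type checks here: each subclass supplies its own load_documents()
    return loader.load_documents()


docs = load_knowledge_base(StringLoader(["sample text about the derby"]))
docs += load_knowledge_base(LinesLoader("line one\n\nline two"))
print(len(docs))  # 3
```

Adding a new source later only requires a new subclass; `load_knowledge_base` itself never changes.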

### Advanced Class Patterns:

#### 1. **Context Managers**

**What is a Context Manager?**
A context manager is an object that defines what happens when you enter and exit a `with` statement. It's like having automatic setup and cleanup - perfect for resources that need to be properly closed.

**Key Terms:**
- **`__enter__`**: Method called when entering the `with` block
- **`__exit__`**: Method called when exiting the `with` block (even if an error occurs)
- **`with` statement**: Python syntax for using context managers safely

**Simple Example:**
```python
class FileManager:
    def __init__(self, filename):
        self.filename = filename
        self.file = None
    
    def __enter__(self):
        print(f"Opening {self.filename}")
        self.file = open(self.filename, 'r')
        return self.file
    
    def __exit__(self, exc_type, exc_val, exc_tb):
        print(f"Closing {self.filename}")
        if self.file:
            self.file.close()

# Usage - file is automatically closed
with FileManager("data.txt") as f:
    content = f.read()
# File is automatically closed here, even if an error occurs
```

Now let's see a more complex example for our chatbot:

```python
class DatabaseConnection:
    def __init__(self, connection_string: str):
        self.connection_string = connection_string
        self.connection = None
    
    def __enter__(self):
        """Called when entering 'with' block"""
        self.connection = self._establish_connection()
        return self.connection
    
    def __exit__(self, exc_type, exc_val, exc_tb):
        """Called when exiting 'with' block"""
        if self.connection:
            self.connection.close()
    
    def _establish_connection(self):
        # Connection logic
        pass

# Usage
with DatabaseConnection("postgresql://...") as conn:
    # Use connection
    pass  # Automatically closed
```
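The three arguments of `__exit__` describe any exception raised inside the `with` block (all three are `None` when the block finishes normally). If `__exit__` returns a truthy value, the exception is suppressed; returning `None` or `False`, as both examples above do, lets it propagate. A small hypothetical `SuppressValueError` manager illustrates this (the standard library's `contextlib.suppress(ValueError)` provides the same behavior ready-made):

```python
from types import TracebackType
from typing import Optional, Type


class SuppressValueError:
    """Hypothetical context manager that swallows ValueError only."""

    def __init__(self) -> None:
        self.caught: Optional[BaseException] = None

    def __enter__(self) -> "SuppressValueError":
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> bool:
        if exc_type is not None and issubclass(exc_type, ValueError):
            self.caught = exc_val
            return True   # Suppress: execution continues after the with block
        return False      # Anything else propagates normally


with SuppressValueError() as guard:
    int("not a number")  # Raises ValueError, which __exit__ suppresses

print(type(guard.caught).__name__)  # ValueError
```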

#### 2. **Property Decorators**

**What are Property Decorators?**
Property decorators allow you to define methods that can be accessed like attributes, but with custom logic for getting, setting, and deleting values. It's like having smart attributes that can validate or compute values.

**Key Terms:**
- **`@property`**: Makes a method look like an attribute when getting a value
- **`@property_name.setter`**: Defines what happens when setting a value
- **`@property_name.deleter`**: Defines what happens when deleting a value

**Simple Example:**
```python
class Circle:
    def __init__(self, radius):
        self.radius = radius  # Goes through the setter, so validation applies here too
    
    @property
    def radius(self):
        """Get the radius"""
        return self._radius
    
    @radius.setter
    def radius(self, value):
        """Set the radius with validation"""
        if value < 0:
            raise ValueError("Radius cannot be negative")
        self._radius = value
    
    @property
    def area(self):
        """Calculate area (computed property)"""
        return 3.14159 * self._radius ** 2

# Usage
circle = Circle(5)
print(circle.radius)  # 5 (calls the getter)
print(circle.area)    # 78.53975 (calls the computed property)
circle.radius = 10    # Calls the setter
# circle.radius = -5   # Raises ValueError
```

Now let's see how our chatbot uses properties:

```python
from typing import Optional

class GreekDerbyChatbot:
    def __init__(self):
        self._conversation_count = 0
        self._last_question: Optional[str] = None
    
    @property
    def conversation_count(self) -> int:
        """Get conversation count (read-only: no setter is defined)"""
        return self._conversation_count
    
    @property
    def last_question(self) -> Optional[str]:
        """Get last question asked (None until a question is set)"""
        return self._last_question
    
    @last_question.setter
    def last_question(self, value: str):
        """Set last question with validation"""
        if not isinstance(value, str):
            raise TypeError("Question must be a string")
        self._last_question = value
        self._conversation_count += 1
```
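Because `conversation_count` defines only a getter, assigning to it raises `AttributeError`, and that is what makes it read-only. A trimmed-down, hypothetical `QuestionTracker` that mirrors the same two properties shows the behavior end to end:

```python
from typing import Optional


class QuestionTracker:
    def __init__(self) -> None:
        self._conversation_count = 0
        self._last_question: Optional[str] = None

    @property
    def conversation_count(self) -> int:
        return self._conversation_count  # Getter only -> read-only

    @property
    def last_question(self) -> Optional[str]:
        return self._last_question

    @last_question.setter
    def last_question(self, value: str) -> None:
        if not isinstance(value, str):
            raise TypeError("Question must be a string")
        self._last_question = value
        self._conversation_count += 1  # Side effect: count each question


tracker = QuestionTracker()
tracker.last_question = "Who won the last derby?"
print(tracker.conversation_count)  # 1

try:
    tracker.conversation_count = 99  # No setter defined
except AttributeError:
    print("conversation_count is read-only")
```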

#### 3. **Class Methods and Static Methods**

**What are Class Methods and Static Methods?**

**Class Methods (`@classmethod`):**
- Belong to the class, not to any specific instance
- First parameter is `cls` (the class itself)
- Can create alternative constructors
- Can access class-level data

**Static Methods (`@staticmethod`):**
- Don't need access to the class or instance
- Like regular functions but belong to the class
- No `self` or `cls` parameter
- Used for utility functions related to the class

**Simple Example:**
```python
class Rectangle:
    shape_name = "rectangle"  # Class variable shared by all instances

    def __init__(self, width, height):
        self.width = width
        self.height = height

    @classmethod
    def square(cls, side):
        """Alternative constructor: build a square from one side length"""
        return cls(side, side)

    @staticmethod
    def area_of(width, height):
        """Static method - needs neither the class nor an instance"""
        return width * height

# Usage
rect = Rectangle(5, 10)          # Regular constructor
square = Rectangle.square(3)     # Class method constructor: width=3, height=3
print(Rectangle.area_of(5, 10))  # 50 - static method
```

Now let's see how our chatbot uses these patterns:

```python
class GreekDerbyChatbot:
    _instance = None  # Class variable for singleton
    
    def __new__(cls):
        """Singleton pattern implementation"""
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
    
    @classmethod
    def from_config(cls, config_file: str) -> 'GreekDerbyChatbot':
        """Alternative constructor from config file"""
        instance = cls()
        instance._load_from_config(config_file)
        return instance
    
    @staticmethod
    def validate_question(question: str) -> bool:
        """Static method - doesn't need instance"""
        return len(question.strip()) > 0 and len(question) <= 1000
```
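One caveat with the `__new__`-based singleton: Python still calls `__init__` every time the class is called, even though `__new__` returns the same object, so any state set in `__init__` is reset on each call (including the `cls()` inside `from_config`). A common workaround is an "already initialized" flag. A minimal sketch using a hypothetical `Config` class:

```python
from typing import Dict


class Config:
    _instance = None

    def __new__(cls) -> "Config":
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self) -> None:
        # __init__ runs on every Config() call, so guard the setup
        # to make sure existing state is not wiped out
        if getattr(self, "_initialized", False):
            return
        self.settings: Dict[str, str] = {}
        self._initialized = True


first = Config()
first.settings["model"] = "gpt-4o-mini"
second = Config()
print(first is second)           # True
print(second.settings["model"])  # gpt-4o-mini (not reset by __init__)
```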

### Best Practices for Our Project:

1. **Single Responsibility**: Each class has one clear purpose
2. **Dependency Injection**: Dependencies are injected rather than hardcoded
3. **Interface Segregation**: Small, focused interfaces
4. **Open/Closed Principle**: Open for extension, closed for modification
5. **Type Hints**: All methods have proper type annotations
6. **Documentation**: Clear docstrings for all public methods
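
To make the dependency-injection point concrete: instead of constructing its model inside `__init__`, a class can accept any object that provides the methods it needs. In this sketch, `TextModel`, `complete()`, `InjectedChatbot`, and `FakeModel` are all hypothetical names (real LangChain chat models expose a different interface); production code would pass a real model wrapper, and tests would pass a fake:

```python
from typing import List, Protocol


class TextModel(Protocol):
    def complete(self, prompt: str) -> str:
        ...


class InjectedChatbot:
    def __init__(self, model: TextModel) -> None:
        self.model = model  # Dependency supplied by the caller, not hardcoded

    def ask_question(self, question: str) -> str:
        return self.model.complete(f"Answer about the Greek Derby: {question}")


class FakeModel:
    """Test double that records prompts instead of calling an API."""
    def __init__(self) -> None:
        self.prompts: List[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return "stub answer"


fake = FakeModel()
bot = InjectedChatbot(fake)
print(bot.ask_question("Who are the rivals?"))  # stub answer
print(fake.prompts[0])  # Answer about the Greek Derby: Who are the rivals?
```

Because the chatbot depends only on the `complete()` method, swapping providers or testing without network access needs no change to `InjectedChatbot`.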


## Q2: How do we implement type hints and error handling in our Python chatbot?

**Answer:**

### What are Type Hints and Why Do We Need Them?

**Type Hints** are annotations that tell static type checkers (such as mypy), IDEs, and other developers what type of data a variable should contain. Python itself does not enforce them at runtime, but running a type checker catches many errors before the code runs, and the hints make code more readable.

**Benefits:**
- **Early Error Detection**: A type checker flags type-related errors before runtime
- **Better IDE Support**: Autocomplete and error highlighting
- **Self-Documenting Code**: Clear expectations about data types
- **Easier Refactoring**: Tools can understand your code better

**Simple Example:**
```python
# Without type hints
def add_numbers(a, b):
    return a + b

# With type hints
def add_numbers(a: int, b: int) -> int:
    return a + b

# Usage
result = add_numbers(5, 3)  # ✅ Good
# result = add_numbers("hello", 3)  # ❌ Type checker will warn
```

Type hints and robust error handling are crucial for maintaining code quality and preventing runtime errors in our RAG chatbot. Python's type system helps catch errors early and makes code more maintainable.

### Type Hints and Annotations:

#### 1. **Basic Type Hints**

**Common Type Hint Patterns:**

**Basic Types:**
- `str` - String (text)
- `int` - Integer (whole number)
- `float` - Floating point number
- `bool` - Boolean (True/False)
- `list` - List of items
- `dict` - Dictionary (key-value pairs)

**Advanced Types from `typing` module:**
- `List[Type]` - List containing specific type
- `Dict[KeyType, ValueType]` - Dictionary with specific key/value types
- `Optional[Type]` - Can be the type or None
- `Union[Type1, Type2]` - Can be one of several types
- `Any` - Any type (use sparingly)

**Simple Examples:**
```python
from typing import Any, Dict, List, Optional, Union

# Basic types
name: str = "John"
age: int = 25
is_student: bool = True
raw_scores: list = [85, 90, 78]  # Generic list (element type unspecified)

# Specific types
scores: List[int] = [85, 90, 78]  # List of integers
person: Dict[str, Any] = {"name": "John", "age": 25}  # Dictionary

# Optional types
middle_name: Optional[str] = None  # Can be string or None
phone: Optional[str] = "123-456-7890"  # Can be string or None

# Union types
id_value: Union[int, str] = 123  # Can be int or str
id_value = "ABC123"  # Also valid
```

Now let's see how our chatbot uses these patterns:

```python
# backend/standalone-service/greek_derby_chatbot.py
from typing import List, Dict, Any, Optional
from typing_extensions import TypedDict
from datetime import datetime
from langchain_core.documents import Document

# TypedDict for structured data - like a data contract
class GreekDerbyState(TypedDict):
    question: str                    # Must be a string
    context: List[Document]          # Must be a list of Document objects
    answer: str                      # Must be a string
    timestamp: datetime              # Must be a datetime object

class GreekDerbyChatbot:
    # Method type hints - clear input and output types
    def process_question(self, question: str) -> str:
        """Process a user question and return response"""
        return self._generate_response(question)

    # Type hints combined with default values
    def search_documents(self, query: str, limit: int = 5) -> List[Document]:
        """Search for relevant documents"""
        return self.vector_store.similarity_search(query, k=limit)

    # Optional types - flexible but type-safe
    def get_conversation_history(self, session_id: Optional[str] = None) -> List[Dict[str, Any]]:
        """Get conversation history for a session"""
        if session_id:
            return self.memory.get_conversation(session_id)
        return self.conversation_history

    # Generic types - working with collections
    def process_batch_questions(self, questions: List[str]) -> Dict[str, str]:
        """Process multiple questions at once"""
        results = {}
        for question in questions:
            results[question] = self.process_question(question)
        return results
```
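Note that the "Must be ..." comments in `GreekDerbyState` describe what a static type checker enforces. At runtime a `TypedDict` is an ordinary `dict`, and nothing stops a wrong value from being stored. A minimal sketch with a hypothetical `ChatTurn` type and a hand-written runtime check (`is_valid_turn` is illustrative, not a library function):

```python
from typing import TypedDict  # Python 3.8+; typing_extensions also provides it


class ChatTurn(TypedDict):
    question: str
    answer: str


turn: ChatTurn = {"question": "Who scored?", "answer": "No data yet"}
print(type(turn) is dict)  # True - just a plain dict at runtime

bad = ChatTurn(question=42, answer="oops")  # type: ignore
print(bad["question"])  # 42 - accepted at runtime; a type checker would flag it


def is_valid_turn(data: dict) -> bool:
    """Hypothetical runtime check mirroring the ChatTurn contract."""
    keys = ChatTurn.__annotations__
    return set(data) == set(keys) and all(isinstance(data[k], str) for k in keys)


print(is_valid_turn(turn))  # True
print(is_valid_turn(bad))   # False
```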

#### 2. **Advanced Type Hints**

**Advanced Type Concepts:**

**TypeVar and Generics:**
- **`TypeVar`**: Creates a type variable that can represent any type
- **`Generic[T]`**: Makes a class generic, where T can be any type
- **`Callable`**: Represents a function type

**Protocol (Duck Typing):**
- **`Protocol`**: Defines an interface without inheritance
- If it walks like a duck and quacks like a duck, it's a duck!
- Allows flexible type checking based on methods, not inheritance

**Abstract Base Classes:**
- **`ABC`**: Abstract Base Class - cannot be instantiated directly
- **`@abstractmethod`**: Method that must be implemented by subclasses

**Simple Examples:**
```python
from typing import TypeVar, Generic, Callable

# TypeVar example
T = TypeVar('T')

class Box(Generic[T]):
    def __init__(self, item: T):
        self.item = item
    
    def get_item(self) -> T:
        return self.item

# Usage
string_box = Box("hello")  # T is str
int_box = Box(42)          # T is int
```

```python
# Protocol example
from typing import Protocol

class Drawable(Protocol):
    def draw(self) -> None:
        ...

def render_shape(shape: Drawable) -> None:
    shape.draw()  # Any object with draw() method works

class Circle:
    def draw(self):
        print("Drawing circle")

class Square:
    def draw(self):
        print("Drawing square")

# Both work with render_shape()
render_shape(Circle())  # ✅
render_shape(Square())  # ✅
```
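Protocols are normally checked only by static type checkers. Decorating one with `@runtime_checkable` also enables `isinstance()` checks, although those only verify that the required methods exist, not their signatures. Continuing the `Drawable` idea (redefined here so the block runs on its own; `Text` is a hypothetical non-drawable class):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Drawable(Protocol):
    def draw(self) -> None:
        ...


class Circle:
    def draw(self) -> None:
        print("Drawing circle")


class Text:
    def write(self) -> None:
        print("Writing text")


print(isinstance(Circle(), Drawable))  # True  - has draw()
print(isinstance(Text(), Drawable))    # False - no draw() method
```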

Now let's see how our chatbot uses these advanced patterns:

```python
from typing import TypeVar, Generic, Callable, Protocol, List
from abc import ABC, abstractmethod
from langchain_core.documents import Document

# Type variables for generic classes
T = TypeVar('T')

class DocumentProcessor(Generic[T]):
    """Generic document processor - can process to any type"""
    def __init__(self, processor_func: Callable[[str], T]):
        self.processor_func = processor_func
    
    def process(self, document: str) -> T:
        return self.processor_func(document)

# Protocol for duck typing - any class with these methods works
class VectorStore(Protocol):
    def similarity_search(self, query: str, k: int) -> List[Document]:
        ...
    
    def add_documents(self, documents: List[Document]) -> None:
        ...

# Abstract base class - must be inherited and methods implemented
class BaseChatbot(ABC):
    @abstractmethod
    def ask_question(self, question: str) -> str:
        """Abstract method that must be implemented"""
        pass
    
    @abstractmethod
    def clear_memory(self) -> None:
        """Clear conversation memory"""
        pass
```
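An abstract base class enforces its contract when an instance is created: instantiating `BaseChatbot` itself, or a subclass that forgets one of the abstract methods, raises `TypeError`. A minimal sketch with hypothetical `EchoBot` and `IncompleteBot` subclasses:

```python
from abc import ABC, abstractmethod


class BaseChatbot(ABC):
    @abstractmethod
    def ask_question(self, question: str) -> str: ...

    @abstractmethod
    def clear_memory(self) -> None: ...


class EchoBot(BaseChatbot):
    def __init__(self) -> None:
        self.history: list = []

    def ask_question(self, question: str) -> str:
        self.history.append(question)
        return f"You asked: {question}"

    def clear_memory(self) -> None:
        self.history.clear()


class IncompleteBot(BaseChatbot):
    def ask_question(self, question: str) -> str:
        return "..."
    # clear_memory() is missing, so this class stays abstract


print(EchoBot().ask_question("Derby date?"))  # You asked: Derby date?

for cls in (BaseChatbot, IncompleteBot):
    try:
        cls()
    except TypeError as exc:
        print(f"{cls.__name__}: {exc}")
```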

### Error Handling Strategies:

**What is Error Handling?**
Error handling is the process of anticipating, detecting, and responding to errors that occur during program execution. Instead of letting your program crash, you catch errors and handle them gracefully.

**Key Terms:**
- **Exception**: An error that occurs during program execution
- **Try/Except**: Python's way of catching and handling exceptions
- **Raise**: Creating and throwing an exception
- **Finally**: Code that always runs, even if an exception occurs

**Simple Example:**
```python
def divide_numbers(a, b):
    try:
        result = a / b
        return result
    except ZeroDivisionError:
        print("Cannot divide by zero!")
        return None
    except TypeError:
        print("Please provide numbers!")
        return None
    finally:
        print("Division attempt completed")

# Usage (the finally block also prints "Division attempt completed" on every call)
print(divide_numbers(10, 2))   # 5.0
print(divide_numbers(10, 0))   # Prints "Cannot divide by zero!", then None
```
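Besides `try`, `except`, and `finally`, Python supports an `else` clause that runs only when the `try` block raised nothing. Keeping the success path in `else` means its own errors are not accidentally caught by the `except` clauses. A sketch with a hypothetical `parse_score` helper that records the order in which the clauses run:

```python
from typing import List, Optional

log: List[str] = []


def parse_score(raw: str) -> Optional[int]:
    try:
        value = int(raw)               # Only the risky call lives in try
    except ValueError:
        log.append(f"bad input: {raw!r}")
        return None
    else:
        log.append(f"parsed {value}")  # Runs only if no exception occurred
        return value
    finally:
        log.append("done")             # Always runs, even after a return


print(parse_score("3"))      # 3
print(parse_score("three"))  # None
print(log)
```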

#### 1. **Exception Hierarchy**

**What is an Exception Hierarchy?**
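An exception hierarchy organizes error types into a tree of classes. Because `except SomeError` also catches every subclass of `SomeError`, you can handle one specific error or a whole category at once. Python uses the first `except` clause that matches, so list the most specific exceptions first.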

**Simple Example:**
```python
class MathError(Exception):
    """Base class for math-related errors"""

class DivisionError(MathError):
    """Error when division fails"""

class DivideByZeroError(DivisionError):
    """Error when dividing by zero (named so it doesn't shadow the built-in ZeroDivisionError)"""

class NegativeNumberError(MathError):
    """Error when working with negative numbers"""

def divide_ten_by(number):
    if number < 0:
        raise NegativeNumberError("Cannot work with negative numbers")
    if number == 0:
        raise DivideByZeroError("Cannot divide by zero")
    return 10 / number

# Usage: most specific exception first
number = 0
try:
    result = divide_ten_by(number)
except DivideByZeroError:
    print("Cannot divide by zero")
except DivisionError:
    print("Division failed")
except MathError:
    print("Math error occurred")
```

Now let's see how our chatbot uses custom exceptions:

```python
# Custom exceptions for our chatbot
class GreekDerbyError(Exception):
    """Base exception for Greek Derby chatbot"""
    pass

class APIKeyError(GreekDerbyError):
    """Raised when API keys are missing or invalid"""
    pass

class VectorStoreError(GreekDerbyError):
    """Raised when vector store operations fail"""
    pass

class WebScrapingError(GreekDerbyError):
    """Raised when web scraping fails"""
    pass

class ValidationError(GreekDerbyError):
    """Raised when input validation fails"""
    pass
```
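A common base class lets callers catch every chatbot-specific failure with a single `except GreekDerbyError`, while `raise ... from err` keeps the original low-level exception attached as `__cause__` for debugging. A minimal sketch; `fetch_page` and `scrape` are hypothetical functions that simulate a network failure:

```python
class GreekDerbyError(Exception):
    """Base exception for Greek Derby chatbot"""


class WebScrapingError(GreekDerbyError):
    """Raised when web scraping fails"""


def fetch_page(url: str) -> str:
    # Hypothetical stand-in for a real HTTP call that fails
    raise ConnectionError(f"could not reach {url}")


def scrape(url: str) -> str:
    try:
        return fetch_page(url)
    except ConnectionError as err:
        # Translate the low-level error, keeping it as __cause__
        raise WebScrapingError(f"Scraping failed for {url}") from err


try:
    scrape("https://example.com")
except GreekDerbyError as exc:           # Catches any chatbot-specific error
    print(type(exc).__name__)            # WebScrapingError
    print(type(exc.__cause__).__name__)  # ConnectionError
```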

#### 2. **Comprehensive Error Handling**

**What is Comprehensive Error Handling?**
Comprehensive error handling means anticipating different types of errors and handling each one appropriately. It's like having a plan for different types of problems that might occur.

**Key Strategies:**
- **Specific Exception Handling**: Catch specific errors and handle them differently
- **Graceful Degradation**: Provide fallback behavior when things go wrong
- **User-Friendly Messages**: Convert technical errors into messages users can understand
- **Logging**: Record errors for debugging while keeping users informed

**Simple Example:**
```python
def safe_divide(a, b):
    try:
        result = a / b
        return f"Result: {result}"
    except ZeroDivisionError:
        return "Cannot divide by zero"
    except TypeError:
        return "Please provide numbers"
    except Exception as e:
        return f"Unexpected error: {e}"

# Usage
print(safe_divide(10, 2))    # "Result: 5.0"
print(safe_divide(10, 0))    # "Cannot divide by zero"
print(safe_divide(10, "a"))  # "Please provide numbers"
```

Now let's see how our chatbot implements comprehensive error handling:

```python
class GreekDerbyChatbot:
    def __init__(self):
        try:
            self._load_environment()
            self._init_components()
        except APIKeyError as e:
            print(f"❌ API Key Error: {e}")
            raise
        except Exception as e:
            print(f"❌ Unexpected error during initialization: {e}")
            raise GreekDerbyError(f"Failed to initialize chatbot: {e}") from e
    
    def _load_environment(self) -> None:
        """Load and validate environment variables"""
        try:
            from dotenv import load_dotenv
            load_dotenv()
        except ImportError:
            print("⚠️  python-dotenv not installed. Using system environment variables.")
        
        required_vars = ['OPENAI_API_KEY', 'PINECONE_API_KEY']
        missing_vars = [var for var in required_vars if not os.getenv(var)]
        
        if missing_vars:
            raise APIKeyError(f"Missing required environment variables: {', '.join(missing_vars)}")
    
    def ask_question(self, question: str) -> str:
        """Ask a question with comprehensive error handling"""
        try:
            # Input validation
            if not self._validate_question(question):
                raise ValidationError("Invalid question format")
            
            # Process question
            response = self._process_question(question)
            
            # Log successful interaction
            self._log_interaction(question, response)
            
            return response
            
        except ValidationError as e:
            print(f"❌ Validation Error: {e}")
            return "I'm sorry, but I couldn't process your question. Please try rephrasing it."
        
        except VectorStoreError as e:
            print(f"❌ Vector Store Error: {e}")
            return "I'm having trouble accessing my knowledge base. Please try again later."
        
        except Exception as e:
            print(f"❌ Unexpected error: {e}")
            return "I encountered an unexpected error. Please try again."
    
    def _validate_question(self, question: str) -> bool:
        """Validate question input"""
        if not isinstance(question, str):
            return False
        if len(question.strip()) == 0:
            return False
        if len(question) > 1000:
            return False
        return True
```

#### 3. **Context Managers for Resource Management**

**What are Context Managers for Resource Management?**
Context managers help ensure that resources (like files, database connections, or timers) are properly cleaned up, even if an error occurs. They're perfect for "setup and cleanup" operations.

**Key Terms:**
- **`@contextmanager`**: Decorator that turns a function into a context manager
- **`yield`**: The point where control passes to the `with` block; the yielded value (if any) is bound by `as`, and the code after `yield` runs when the block exits
- **Resource Management**: Ensuring resources are properly acquired and released

**Simple Example:**
```python
from contextlib import contextmanager

@contextmanager
def file_manager(filename):
    print(f"Opening {filename}")
    file = open(filename, 'r')
    try:
        yield file  # This is where the 'with' block runs
    finally:
        print(f"Closing {filename}")
        file.close()

# Usage
with file_manager("data.txt") as f:
    content = f.read()
# File is automatically closed here
```

Now let's see how our chatbot uses context managers:

```python
from contextlib import contextmanager
import time

@contextmanager
def timed_operation(operation_name: str):
    """Context manager for timing operations"""
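    start_time = time.perf_counter()  # Monotonic clock suited to measuring durations
    try:
        print(f"🔄 Starting {operation_name}...")
        yield
    except Exception as e:
        print(f"❌ Error in {operation_name}: {e}")
        raise
    finally:
        # Runs on success and on failure, so report "finished" rather than "completed"
        duration = time.perf_counter() - start_time
        print(f"⏱️  {operation_name} finished in {duration:.2f} seconds")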

# Usage in our chatbot
def _load_knowledge_base(self) -> None:
    """Load knowledge base with timing"""
    with timed_operation("knowledge base loading"):
        try:
            documents = self._scrape_web_content()
            self._process_documents(documents)
        except WebScrapingError as e:
            print(f"⚠️  Web scraping failed, using fallback content: {e}")
            self._load_fallback_content()
```

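#### 4. **Retry Mechanism with Exponential Backoff**

**What is Exponential Backoff?**
When a call to an external service fails, retrying immediately often fails again. Exponential backoff waits longer after each failed attempt. With `max_retries=3` and `base_delay=1.0`, the decorator below sleeps roughly 1 s after the first failure and 2 s after the second (`base_delay * 2 ** attempt`), each plus up to 1 s of random "jitter" so that many clients don't retry at the same instant, and gives up after the third attempt.

```python
import time
import random
from functools import wraps
from typing import List

from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document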

def retry_with_backoff(max_retries: int = 3, base_delay: float = 1.0):
    """Decorator for retrying operations with exponential backoff"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None
            
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    last_exception = e
                    if attempt < max_retries - 1:
                        delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
                        print(f"⚠️  Attempt {attempt + 1} failed: {e}. Retrying in {delay:.2f}s...")
                        time.sleep(delay)
                    else:
                        print(f"❌ All {max_retries} attempts failed")
            
            raise last_exception
        return wrapper
    return decorator

# Usage in our chatbot
class GreekDerbyChatbot:
    @retry_with_backoff(max_retries=3, base_delay=1.0)
    def _scrape_web_content(self) -> List[Document]:
        """Scrape web content with retry mechanism"""
        try:
            loader = WebBaseLoader(self.urls)
            documents = loader.load()
            return documents
        except Exception as e:
            raise WebScrapingError(f"Failed to scrape web content: {e}") from e
```
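One refinement worth considering: the decorator above retries on *any* `Exception`, including errors such as `ValidationError` that will fail the same way every time. Below is a sketch of a variant that retries only on a tuple of exception types. The `retry_on` parameter, its default of `(ConnectionError, TimeoutError)`, the jitter scaled to `base_delay`, and the `flaky_fetch`/`reject_input` demos are all assumptions, not part of the original code; `base_delay=0` keeps the demo instant:

```python
import random
import time
from functools import wraps
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries - 1:
                        raise  # Out of attempts: re-raise the last error
                    # Delay doubles each attempt, plus jitter up to base_delay
                    delay = base_delay * (2 ** attempt)
                    time.sleep(delay + random.uniform(0, base_delay))
            raise AssertionError("unreachable")
        return wrapper
    return decorator


calls = {"flaky": 0, "bad": 0}


@retry_with_backoff(max_retries=3, base_delay=0)
def flaky_fetch() -> str:
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ConnectionError("temporary failure")  # Retryable
    return "page content"


@retry_with_backoff(max_retries=3, base_delay=0)
def reject_input() -> str:
    calls["bad"] += 1
    raise ValueError("invalid question")  # Not in retry_on: fails immediately


print(flaky_fetch())  # page content (succeeded on the third attempt)
```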

### Logging and Monitoring:

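#### 1. **Structured Logging**

**What is Structured Logging?**
Instead of free-form text, each log entry carries machine-readable data (here, a JSON object), so log tools can filter and aggregate by fields such as `duration` or `error_type`.
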
```python
import logging
import json
from datetime import datetime, timezone
from typing import Any, Dict

class GreekDerbyLogger:
    def __init__(self, name: str):
        self.logger = logging.getLogger(name)
        self.logger.setLevel(logging.INFO)
        
        # getLogger(name) returns the same logger each time, so attach a
        # handler only once; otherwise every message would be printed repeatedly
        if not self.logger.handlers:
            formatter = logging.Formatter(
                '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
            )
            handler = logging.StreamHandler()
            handler.setFormatter(formatter)
            self.logger.addHandler(handler)
    
    def log_interaction(self, question: str, response: str, duration: float):
        """Log chatbot interaction"""
        log_data = {
            # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC time
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "question": question,
            "response_length": len(response),
            "duration": duration,
            "type": "interaction"
        }
        self.logger.info(json.dumps(log_data))
    
    def log_error(self, error: Exception, context: Dict[str, Any]):
        """Log errors with context"""
        log_data = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "error_type": type(error).__name__,
            "error_message": str(error),
            "context": context,
            "type": "error"
        }
        self.logger.error(json.dumps(log_data))
```
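Note that the logger above embeds the JSON inside a plain-text format string, so each output line looks like `timestamp - name - LEVEL - {...}` and is not itself valid JSON. If log lines need to be machine-parseable end to end, one option is a custom `logging.Formatter` subclass. This is a sketch with hypothetical names (`JsonFormatter`, `make_json_logger`); structured fields are passed through the standard `extra=` argument under a `fields` key, an assumed convention that avoids clashing with built-in `LogRecord` attributes:

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render every log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge structured fields passed as extra={"fields": {...}}
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            payload.update(fields)
        return json.dumps(payload, default=str)


def make_json_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(stream)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


logger = make_json_logger("greek_derby.demo")
logger.info("interaction", extra={"fields": {"question_length": 27, "duration": 0.84}})
```

Each line can then be fed straight into `json.loads` or a log aggregator without first stripping a text prefix.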

### Best Practices for Our Project:

1. **Always use type hints** for function parameters and return values
2. **Create custom exceptions** for specific error conditions
3. **Use context managers** for resource management
4. **Implement retry mechanisms** for external API calls
5. **Log all important operations** with structured logging
6. **Validate inputs** at the boundaries of your system
7. **Handle exceptions gracefully** with meaningful error messages
8. **Use protocols** for duck typing instead of inheritance when appropriate
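Point 8 can be illustrated with `typing.Protocol`: a class satisfies a protocol simply by having matching methods, without inheriting from it. A minimal sketch; `Retriever`, `KeywordRetriever`, and `build_answer` are hypothetical names, not part of the project's code:

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class Retriever(Protocol):
    """Anything with a matching `retrieve` method counts as a Retriever."""

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        ...


class KeywordRetriever:
    """Does NOT inherit from Retriever, yet satisfies it structurally."""

    def __init__(self, docs: List[str]):
        self.docs = docs

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        terms = set(query.lower().split())
        # Keep documents that share at least one word with the query
        hits = [d for d in self.docs if terms & set(d.lower().split())]
        return hits[:k]


def build_answer(retriever: Retriever, question: str) -> str:
    # A type checker accepts any object whose methods match the Protocol
    context = retriever.retrieve(question, k=2)
    return " | ".join(context) if context else "No context found"


docs = ["derby match report", "derby ticket prices", "weather in Athens"]
print(build_answer(KeywordRetriever(docs), "derby result"))
# derby match report | derby ticket prices
```

Swapping in a vector-store-backed retriever later only requires the same `retrieve` signature, not a shared base class. `@runtime_checkable` additionally allows `isinstance` checks, which only verify that the method exists, not its signature.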


## Q3: How do we implement async programming and FastAPI in our chatbot?

**Answer:**

### What is Async Programming?

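**Async Programming** lets your program make progress on other work while it waits for slow operations such as network requests or disk I/O. It's like putting pasta on to boil and answering the phone while the water heats, instead of standing next to the pot. Python's `asyncio` does this on a single thread by switching between tasks at each `await`: it interleaves work rather than running it in parallel.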

**Key Terms:**
- **`async`**: Marks a function as asynchronous (can be paused and resumed)
- **`await`**: Pauses execution until an async operation completes
- **`asyncio`**: Python's built-in library for async programming
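- **Concurrency**: Multiple operations in progress at once, taking turns on one thread (as opposed to *parallelism*, where they literally run simultaneously on several CPU cores)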
- **Non-blocking**: Operations that don't stop other operations from running

**Simple Example:**
```python
import asyncio
import time

async def cook_pasta():
    print("Starting to cook pasta...")
    await asyncio.sleep(3)  # Simulate cooking time
    print("Pasta is ready!")

async def watch_tv():
    print("Starting to watch TV...")
    await asyncio.sleep(2)  # Simulate watching time
    print("Finished watching TV!")

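async def main():
    start = time.perf_counter()
    # Run both tasks concurrently: their waits overlap
    await asyncio.gather(cook_pasta(), watch_tv())
    print(f"Done in {time.perf_counter() - start:.1f}s")

# Run the async program
asyncio.run(main())
# Takes about 3 seconds (the longest task), not 3 + 2 = 5 seconds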
```

**Why Use Async Programming?**
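- **Better Performance**: Serve many I/O-bound requests at once instead of queueing them one by one
- **Responsive Applications**: Don't freeze while waiting for slow operations
- **Efficient Resource Usage**: A single thread can keep many waiting tasks in flight; note that async does not speed up CPU-bound work such as computing embeddings locally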

Async programming is essential for building high-performance web APIs. Our Greek Derby chatbot uses FastAPI with async/await patterns to handle multiple concurrent requests efficiently.

### Async Programming Concepts:

#### 1. **Basic Async/Await Pattern**

**What is the Async/Await Pattern?**
The async/await pattern is the modern way to write asynchronous code in Python. It makes async code look like regular code, making it easier to read and understand.

**Key Concepts:**
- **`async def`**: Defines an asynchronous function
- **`await`**: Waits for an async operation to complete
- **`run_in_executor`**: Runs blocking code in a thread pool (so it doesn't block the async loop)
- **Event Loop**: The core of async programming that manages all async operations

**Simple Example:**
```python
import asyncio

async def fetch_data(url):
    print(f"Fetching data from {url}...")
    await asyncio.sleep(1)  # Simulate network request
    return f"Data from {url}"

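async def main():
    # Sequential awaits: the second fetch starts only after the first finishes (~2s)
    data1 = await fetch_data("api1.com")
    data2 = await fetch_data("api2.com")
    print(data1, data2)

    # gather() runs both coroutines concurrently (~1s)
    data1, data2 = await asyncio.gather(fetch_data("api1.com"), fetch_data("api2.com"))
    print(data1, data2)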

asyncio.run(main())
```

Now let's see how our chatbot uses async patterns:

```python
# backend/api/greek_derby_api.py
import asyncio
from typing import Dict, List, Optional
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

class GreekDerbyAPI:
    def __init__(self):
        self.app = FastAPI(title="Greek Derby RAG Chatbot API")
        self.chatbot = None
        self._setup_routes()
    
    async def initialize(self):
        """Async initialization of the chatbot"""
        print("🚀 Initializing chatbot asynchronously...")
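        # Run the blocking constructor in the default thread pool;
        # get_running_loop() is the recommended way to get the loop inside a coroutine
        loop = asyncio.get_running_loop()
        self.chatbot = await loop.run_in_executor(None, self._create_chatbot)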
        print("✅ Chatbot initialized successfully!")
    
    def _create_chatbot(self):
        """Blocking chatbot creation (runs in thread pool)"""
        from greek_derby_chatbot import GreekDerbyChatbot
        return GreekDerbyChatbot()
    
    async def process_question_async(self, question: str) -> str:
        """Process question asynchronously"""
        if not self.chatbot:
            raise HTTPException(status_code=503, detail="Chatbot not initialized")
        
        # Run the blocking chatbot operation in the default thread pool
        loop = asyncio.get_running_loop()
        response = await loop.run_in_executor(
            None, self.chatbot.ask_question, question
        )
        return response
```
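To see why blocking calls are wrapped in `run_in_executor`, here is a self-contained comparison. `blocking_lookup` is a hypothetical stand-in for a synchronous call such as `ask_question`; it sleeps 0.2 s to simulate work:

```python
import asyncio
import time


def blocking_lookup(question: str) -> str:
    """Stand-in for a synchronous call such as a vector search or LLM request."""
    time.sleep(0.2)
    return f"answer to {question!r}"


async def handle_blocking(questions):
    # Calling the blocking function directly freezes the event loop each time
    return [blocking_lookup(q) for q in questions]


async def handle_offloaded(questions):
    loop = asyncio.get_running_loop()
    # run_in_executor(None, ...) uses the loop's default ThreadPoolExecutor,
    # so the three sleeps overlap instead of running back to back
    futures = [loop.run_in_executor(None, blocking_lookup, q) for q in questions]
    return await asyncio.gather(*futures)


async def main():
    questions = ["q1", "q2", "q3"]
    for handler in (handle_blocking, handle_offloaded):
        start = time.perf_counter()
        await handler(questions)
        print(f"{handler.__name__}: {time.perf_counter() - start:.2f}s")


asyncio.run(main())
```

On a typical machine the first handler takes roughly 0.6 s and the second roughly 0.2 s, because `time.sleep` releases the GIL and each call runs on its own pool thread. On Python 3.9+, `await asyncio.to_thread(func, *args)` is a shorter way to do the same offloading.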

#### 2. **FastAPI Async Endpoints**

**What is FastAPI?**
FastAPI is a modern, fast web framework for building APIs with Python. It's designed to be easy to use and automatically generates API documentation.

**Key Features:**
- **Automatic Documentation**: Creates interactive API docs automatically
- **Type Validation**: Uses Python type hints for request/response validation
- **Async Support**: Built-in support for async/await patterns
- **High Performance**: One of the fastest Python web frameworks

**Simple Example:**
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    name: str
    price: float

@app.get("/")
async def read_root():
    return {"Hello": "World"}

@app.post("/items/")
async def create_item(item: Item):
    return {"item_name": item.name, "item_price": item.price}
```

Now let's see how our chatbot uses FastAPI:

```python
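import time
from contextlib import asynccontextmanager
from datetime import datetime, timezone
from typing import Dict, Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

# Request/response models (assumed minimal versions; fields match their use below)
class ChatRequest(BaseModel):
    question: str

class ChatResponse(BaseModel):
    answer: str
    timestamp: str
    response_time: Optional[float] = None

# Global chatbot instance
chatbot_api = GreekDerbyAPI()

# Lifespan handler: code before `yield` runs once at startup, code after it at shutdown.
# It replaces @app.on_event("startup"), which is deprecated in recent FastAPI versions.
@asynccontextmanager
async def lifespan(app: FastAPI):
    """Initialize chatbot on startup"""
    await chatbot_api.initialize()
    yield

# FastAPI app setup
app = FastAPI(
    title="Greek Derby RAG Chatbot API",
    description="A RAG-powered chatbot about Greek football derby",
    version="1.0.0",
    lifespan=lifespan
)

# Async endpoint definitions
@app.get("/")
async def root() -> Dict[str, str]:
    """Root endpoint with API information"""
    return {
        "message": "🇬🇷 Greek Derby RAG Chatbot API",
        "version": "1.0.0",
        "status": "running"
    }

@app.get("/health")
async def health_check() -> Dict[str, str]:
    """Health check endpoint"""
    return {"status": "healthy", "timestamp": datetime.now(timezone.utc).isoformat()}

@app.post("/chat")
async def chat(request: ChatRequest) -> ChatResponse:
    """Main chat endpoint"""
    try:
        start_time = time.perf_counter()  # monotonic clock, suited to measuring durations
        
        # Process question asynchronously
        answer = await chatbot_api.process_question_async(request.question)
        
        # Calculate response time
        response_time = time.perf_counter() - start_time
        
        return ChatResponse(
            answer=answer,
            timestamp=datetime.now(timezone.utc).isoformat(),
            response_time=response_time
        )
    
    except HTTPException:
        # Re-raise deliberate HTTP errors (e.g. the 503 "not initialized") unchanged
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))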
```

#### 3. **Concurrent Operations**
```python
import asyncio
from typing import Any, List

from langchain_core.documents import Document  # LangChain's document type

class ConcurrentProcessor:
    """Handle multiple operations concurrently"""
    
    def __init__(self, vector_store: Any):
        # Any LangChain vector store exposing similarity_search(query, k)
        self.vector_store = vector_store
    
    async def process_multiple_questions(self, questions: List[str]) -> List[str]:
        """Process multiple questions concurrently"""
        # Create coroutines for concurrent execution
        tasks = [
            self._process_single_question(question) 
            for question in questions
        ]
        
        # Wait for all of them; exceptions are returned as values instead of raised
        results = await asyncio.gather(*tasks, return_exceptions=True)
        
        # Handle exceptions
        processed_results = []
        for i, result in enumerate(results):
            if isinstance(result, Exception):
                processed_results.append(f"Error processing question {i+1}: {str(result)}")
            else:
                processed_results.append(result)
        
        return processed_results
    
    async def _process_single_question(self, question: str) -> str:
        """Process a single question"""
        # Simulate async processing
        await asyncio.sleep(0.1)  # Simulate processing time
        return f"Response to: {question}"
    
    async def batch_vector_search(self, queries: List[str]) -> List[List[Document]]:
        """Perform multiple vector searches concurrently"""
        tasks = [
            self._search_vectors(query) 
            for query in queries
        ]
        return await asyncio.gather(*tasks)
    
    async def _search_vectors(self, query: str) -> List[Document]:
        """Search vectors for a single query"""
        # Run the blocking vector search in the default thread pool
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None, self.vector_store.similarity_search, query, 5
        )
```
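The effect of `return_exceptions=True` can be seen in isolation. In this standalone sketch (the `answer_one`/`answer_all` helpers are hypothetical), one question fails validation:

```python
import asyncio
from typing import List, Union


async def answer_one(question: str) -> str:
    await asyncio.sleep(0.01)  # simulated I/O
    if not question.strip():
        raise ValueError("empty question")
    return f"Response to: {question}"


async def answer_all(questions: List[str], return_exceptions: bool) -> List[Union[str, BaseException]]:
    return await asyncio.gather(
        *(answer_one(q) for q in questions), return_exceptions=return_exceptions
    )


async def main():
    questions = ["Who won?", "   ", "When is the next derby?"]

    # With return_exceptions=True, failures come back as values, in input order
    results = await answer_all(questions, return_exceptions=True)
    for q, r in zip(questions, results):
        print(repr(q), "->", "ERROR: " + str(r) if isinstance(r, Exception) else r)

    # Without it, the first exception propagates out of gather()
    try:
        await answer_all(questions, return_exceptions=False)
    except ValueError as exc:
        print("gather raised:", exc)


asyncio.run(main())
```

With the flag, the failed question yields a `ValueError` object in its slot while the other two answers still arrive; without it, the caller sees only the exception, which is why the batch method above turns exceptions into per-question error strings.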

### FastAPI Advanced Features:

#### 1. **Dependency Injection**
```python
import time
from collections import defaultdict
from datetime import datetime, timezone

from fastapi import Depends, HTTPException, Request, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

# Security dependency: reads the "Authorization: Bearer <token>" header
security = HTTPBearer()

async def get_current_user(credentials: HTTPAuthorizationCredentials = Depends(security)):
    """Dependency to get current user"""
    # Placeholder: a real app would verify the token here (e.g. decode a JWT)
    # and look up the matching user instead of returning a fixed one
    if not credentials.credentials:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid authentication credentials"
        )
    return {"user_id": "user123", "username": "greek_derby_fan"}

# Rate limiting dependency (in-memory, so each server process keeps its own counts)
class RateLimiter:
    def __init__(self, max_requests: int = 100, window_seconds: int = 3600):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = defaultdict(list)
    
    async def __call__(self, request: Request):
        client_ip = request.client.host if request.client else "unknown"
        now = time.monotonic()  # unaffected by system clock adjustments
        
        # Drop requests that fall outside the current window
        cutoff = now - self.window_seconds
        self.requests[client_ip] = [
            req_time for req_time in self.requests[client_ip] 
            if req_time > cutoff
        ]
        
        # Check rate limit
        if len(self.requests[client_ip]) >= self.max_requests:
            raise HTTPException(
                status_code=status.HTTP_429_TOO_MANY_REQUESTS,
                detail="Rate limit exceeded"
            )
        
        # Record the current request
        self.requests[client_ip].append(now)

# Usage in endpoints
rate_limiter = RateLimiter(max_requests=50, window_seconds=3600)

@app.post("/chat")
async def chat(
    request: ChatRequest,
    current_user: dict = Depends(get_current_user),
    _: None = Depends(rate_limiter)
) -> ChatResponse:
    """Chat endpoint with authentication and rate limiting"""
    # Both dependencies have run: the caller is authenticated and within the limit
    answer = await chatbot_api.process_question_async(request.question)
    return ChatResponse(answer=answer, timestamp=datetime.now(timezone.utc).isoformat())
```
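The `RateLimiter` above rebuilds each client's list on every request. A variant of the same sliding-window idea keeps timestamps in a `collections.deque` and only evicts expired entries from the left. This is a sketch with hypothetical names; the injectable `clock` exists only so the logic can be tested without waiting. Like the original, the state lives in process memory:

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Sliding-window-log rate limiter: allow at most max_requests per window."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self.requests: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        window = self.requests[client_id]
        # Timestamps are appended in order, so expired ones are always on the left
        while window and window[0] <= now - self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False
        window.append(now)
        return True


# Usage with a fake clock: 2 requests per 10 seconds
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10, clock=lambda: fake_now[0])
print([limiter.allow("1.2.3.4") for _ in range(3)])  # [True, True, False]
fake_now[0] = 10.0
print(limiter.allow("1.2.3.4"))  # True: the requests at t=0 have expired
```

Inside FastAPI, `allow()` would be called from the dependency and a `False` result turned into the same 429 `HTTPException`.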

#### 2. **Background Tasks**
```python
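import asyncio
from datetime import datetime, timezone

from fastapi import BackgroundTasks

class BackgroundProcessor:
    def __init__(self):
        self.task_queue: asyncio.Queue = asyncio.Queue()
        self.workers = []
    
    def start(self, num_workers: int = 3):
        """Start background worker tasks.

        asyncio.create_task() needs a running event loop, so call this from the
        app's startup/lifespan hook rather than at import time.
        """
        for i in range(num_workers):
            worker = asyncio.create_task(self._worker(f"worker-{i}"))
            self.workers.append(worker)
    
    async def stop(self):
        """Cancel the workers (e.g. on application shutdown)"""
        for worker in self.workers:
            worker.cancel()
        await asyncio.gather(*self.workers, return_exceptions=True)
        self.workers.clear()
    
    async def _worker(self, worker_name: str):
        """Background worker task"""
        while True:
            task = await self.task_queue.get()
            try:
                await self._process_task(task)
            except Exception as e:
                print(f"Worker {worker_name} error: {e}")
            finally:
                # Mark the item done even on failure so queue.join() cannot hang
                self.task_queue.task_done()
    
    async def _process_task(self, task: dict):
        """Process a background task"""
        task_type = task.get("type")
        if task_type == "update_vector_db":
            await self._update_vector_database()
        elif task_type == "log_interaction":
            await self._log_interaction(task.get("data"))
    
    async def _update_vector_database(self):
        """Placeholder: re-scrape sources and refresh the vector store"""
        ...
    
    async def _log_interaction(self, data: dict):
        """Placeholder: persist the interaction (file, database, etc.)"""
        print("Logged interaction:", data)
    
    async def add_task(self, task: dict):
        """Add task to background queue"""
        await self.task_queue.put(task)

# Usage in endpoints (call background_processor.start() from the startup/lifespan hook)
background_processor = BackgroundProcessor()

@app.post("/chat")
async def chat(
    request: ChatRequest,
    background_tasks: BackgroundTasks
) -> ChatResponse:
    """Chat endpoint with background task processing"""
    # Process chat
    response = await chatbot_api.process_question_async(request.question)
    
    # FastAPI runs this after the response has been sent to the client
    background_tasks.add_task(
        background_processor.add_task,
        {
            "type": "log_interaction",
            "data": {
                "question": request.question,
                "response": response,
                "timestamp": datetime.now(timezone.utc).isoformat()
            }
        }
    )
    
    return ChatResponse(answer=response, timestamp=datetime.now(timezone.utc).isoformat())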
```
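The same queue-and-workers pattern can be tried outside FastAPI. This standalone sketch (hypothetical names) starts three workers inside a running event loop, waits for the queue to drain with `queue.join()`, then cancels the workers, which would otherwise loop forever:

```python
import asyncio
from typing import List


async def worker(name: str, queue: asyncio.Queue, processed: List[str]) -> None:
    while True:
        job = await queue.get()
        try:
            await asyncio.sleep(0.01)  # simulated work, e.g. writing a log record
            processed.append(f"{name}:{job}")
        finally:
            queue.task_done()  # always mark done so queue.join() can return


async def main() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    processed: List[str] = []
    # Workers are created inside the running loop started by asyncio.run()
    workers = [asyncio.create_task(worker(f"worker-{i}", queue, processed)) for i in range(3)]

    for job in range(6):
        queue.put_nowait(f"job-{job}")

    await queue.join()  # returns once every job has been marked done
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)  # wait for cancellation
    return processed


results = asyncio.run(main())
print(len(results), "jobs processed")  # 6 jobs processed
```

Which worker handles which job varies from run to run; only the total is guaranteed.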

#### 3. **WebSocket Support**
```python
from fastapi import WebSocket, WebSocketDisconnect
from typing import List

class ConnectionManager:
    def __init__(self):
        self.active_connections: List[WebSocket] = []
    
    async def connect(self, websocket: WebSocket):
        await websocket.accept()
        self.active_connections.append(websocket)
    
    def disconnect(self, websocket: WebSocket):
        # Guard against removing the same connection twice
        if websocket in self.active_connections:
            self.active_connections.remove(websocket)
    
    async def send_personal_message(self, message: str, websocket: WebSocket):
        await websocket.send_text(message)
    
    async def broadcast(self, message: str):
        for connection in self.active_connections:
            await connection.send_text(message)

manager = ConnectionManager()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    """WebSocket endpoint for real-time chat"""
    await manager.connect(websocket)
    try:
        while True:
            # Receive message from client
            data = await websocket.receive_text()
            
            # Process message
            response = await chatbot_api.process_question_async(data)
            
            # Send response back
            await manager.send_personal_message(response, websocket)
    
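    except WebSocketDisconnect:
        pass  # the client closed the connection
    finally:
        # Runs on disconnect and on errors (e.g. a 503 from the chatbot),
        # so dead sockets never stay in active_connections
        manager.disconnect(websocket)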
```

### Performance Optimization:

#### 1. **Connection Pooling**
```python
import aiohttp
import asyncio
from typing import List, Optional

class HTTPClient:
    def __init__(self):
        self.session: Optional[aiohttp.ClientSession] = None
    
    async def __aenter__(self):
        """Async context manager entry: one session (one connection pool) for all requests"""
        self.session = aiohttp.ClientSession(
            # At most 100 open connections in total, 30 per host
            connector=aiohttp.TCPConnector(limit=100, limit_per_host=30)
        )
        return self
    
    async def __aexit__(self, exc_type, exc_val, exc_tb):
        """Async context manager exit"""
        if self.session:
            await self.session.close()
    
    async def get(self, url: str, **kwargs) -> str:
        """GET a page body as text, reusing pooled connections"""
        async with self.session.get(url, **kwargs) as response:
            response.raise_for_status()  # surface 4xx/5xx instead of returning an error page
            # Web pages are HTML, so read text; response.json() would fail on them
            return await response.text()

# Usage
async def fetch_web_content(urls: List[str]) -> List[str]:
    """Fetch multiple URLs concurrently with connection pooling"""
    async with HTTPClient() as client:
        tasks = [client.get(url) for url in urls]
        return await asyncio.gather(*tasks)
```

#### 2. **Caching with Redis**
```python
import hashlib
import json
from typing import Optional

import redis.asyncio as redis

class AsyncCache:
    def __init__(self, redis_url: str = "redis://localhost:6379"):
        self.redis = redis.from_url(redis_url)
    
    async def get(self, key: str) -> Optional[dict]:
        """Get cached value"""
        value = await self.redis.get(key)
        if value is not None:
            return json.loads(value)
        return None
    
    async def set(self, key: str, value: dict, expire: int = 3600):
        """Set cached value with expiration (in seconds)"""
        await self.redis.setex(key, expire, json.dumps(value))
    
    async def delete(self, key: str):
        """Delete cached value"""
        await self.redis.delete(key)

# Usage in chatbot
class CachedChatbot:
    def __init__(self):
        self.cache = AsyncCache()
    
    async def ask_question(self, question: str) -> str:
        """Ask question with caching"""
        # Use a stable digest: the built-in hash() of a str is randomized per process,
        # so keys would differ between workers and after every restart
        digest = hashlib.sha256(question.encode("utf-8")).hexdigest()
        cache_key = f"question:{digest}"
        
        # Check cache first
        cached_response = await self.cache.get(cache_key)
        if cached_response is not None:
            return cached_response["answer"]
        
        # Process question (_process_question is the uncached RAG call, not shown here)
        response = await self._process_question(question)
        
        # Cache response
        await self.cache.set(cache_key, {"answer": response})
        
        return response
```
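For local development or unit tests without a Redis server, a small in-memory stand-in can mimic the expire-on-read behavior of `SETEX`/`GET`. This is a hypothetical helper, not part of the project. It also derives keys with `hashlib.sha256`, because Python's built-in `hash()` of a string is salted per process (see `PYTHONHASHSEED`) and would not match across workers or restarts:

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple


def cache_key(question: str) -> str:
    # sha256 yields the same key in every process and after restarts
    return "question:" + hashlib.sha256(question.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for a Redis SETEX/GET cache (for tests and local dev)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, expire: float) -> None:
        # Store the absolute expiry time alongside the value
        self._data[key] = (self.clock() + expire, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry
            return None
        return value


now = [0.0]  # fake clock so expiry can be shown without waiting
cache = TTLCache(clock=lambda: now[0])
key = cache_key("Who won the derby?")
cache.set(key, {"answer": "cached text"}, expire=3600)
print(cache.get(cache_key("Who won the derby?")))  # {'answer': 'cached text'}
now[0] = 3600.0
print(cache.get(key))  # None (expired)
```

Unlike Redis, this cache is not shared between processes, so it only fits single-process tests.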

### Best Practices for Async Programming:

1. **Use async/await consistently** throughout the application
2. **Run blocking operations in thread pools** using `run_in_executor`
3. **Implement proper error handling** for async operations
4. **Use connection pooling** for external API calls
5. **Implement caching** to reduce redundant operations
6. **Use background tasks** for non-critical operations
7. **Monitor async performance** with proper logging
8. **Test async code thoroughly** with proper test frameworks


## Q4: What data structures and algorithms do we use in our RAG chatbot?

**Answer:**

### What are Data Structures and Algorithms?

**Data Structures** are ways of organizing and storing data in a computer so that it can be accessed and modified efficiently. Think of them as different types of containers for your data.

**Algorithms** are step-by-step procedures for solving problems or performing tasks. They're like recipes that tell you exactly how to do something.

**Why Are They Important?**
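- **Performance**: The right structure changes the cost of an operation; for example, `x in some_set` is O(1) on average while `x in some_list` is O(n), a gap that grows with the data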
- **Memory Efficiency**: Store data in the most space-efficient way
- **Scalability**: Handle large amounts of data without slowing down
- **Problem Solving**: Choose the right tool for each job

**Simple Example:**
```python
# Different data structures for different needs
# List - good for ordered data
scores = [85, 90, 78, 92]
print(scores[0])  # Fast access by index

# Dictionary - good for key-value lookups
student_grades = {"Alice": 85, "Bob": 90, "Charlie": 78}
print(student_grades["Alice"])  # Fast lookup by name

# Set - good for unique items and fast membership testing
unique_numbers = {1, 2, 3, 4, 5}
print(3 in unique_numbers)  # Very fast membership test
```

Our Greek Derby RAG chatbot uses various data structures and algorithms to efficiently process, store, and retrieve information. Understanding these concepts is crucial for building performant AI applications.

### Core Data Structures:

#### 1. **Document Processing and Storage**

**What is Document Processing?**
Document processing involves taking raw text (like web pages or PDFs) and converting it into a structured format that our chatbot can understand and search through efficiently.

**Key Concepts:**
- **`@dataclass`**: Automatically generates common methods such as `__init__`, `__repr__`, and `__eq__` from the class's annotated fields
- **`defaultdict`**: Dictionary that provides default values for missing keys
- **`deque`**: Double-ended queue for efficient adding/removing from both ends
- **Indexing**: Creating fast lookup structures for quick searches

**Simple Example:**
```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Book:
    title: str
    author: str
    pages: int

# defaultdict example
word_count = defaultdict(int)
text = "hello world hello"
for word in text.split():
    word_count[word] += 1
print(dict(word_count))  # {'hello': 2, 'world': 1}
```

Now let's see how our chatbot processes documents:
```python
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
from collections import defaultdict
from datetime import datetime

@dataclass
class Document:
    """Document structure for our knowledge base"""
    content: str
    metadata: Dict[str, Any]
    embedding: Optional[List[float]] = None
    id: Optional[str] = None
    timestamp: Optional[datetime] = None
    
    def __post_init__(self):
        # Default the timestamp to "now" (naive UTC) when none is given
        if self.timestamp is None:
            self.timestamp = datetime.utcnow()

class DocumentStore:
    """Efficient document storage and retrieval"""
    
    def __init__(self):
        # Multiple indexing strategies; every index stores positions in self.documents
        self.documents: List[Document] = []
        self.id_index: Dict[str, int] = {}  # O(1) lookup by ID
        self.content_index: Dict[str, List[int]] = defaultdict(list)  # Word -> doc positions
        self.metadata_index: Dict[str, List[int]] = defaultdict(list)  # "key:value" -> doc positions
    
    def add_document(self, doc: Document) -> None:
        """Add a document and update every index (cost grows with the number of words)"""
        position = len(self.documents)
        self.documents.append(doc)
        
        # Update indexes
        if doc.id:
            self.id_index[doc.id] = position
        
        # Index content words (a repeated word is recorded once per occurrence)
        words = doc.content.lower().split()
        for word in words:
            self.content_index[word].append(position)
        
        # Index metadata
        for key, value in doc.metadata.items():
            self.metadata_index[f"{key}:{value}"].append(position)
    
    def search_by_id(self, doc_id: str) -> Optional[Document]:
        """O(1) search by document ID"""
        if doc_id in self.id_index:
            return self.documents[self.id_index[doc_id]]
        return None
    
    def search_by_content(self, query: str) -> List[Document]:
        """Search documents by content words"""
        query_words = query.lower().split()
        doc_scores = defaultdict(int)
        
        # Score = total occurrences of the query words in each document
        for word in query_words:
            if word in self.content_index:
                for position in self.content_index[word]:
                    doc_scores[position] += 1
        
        # Return the 10 highest-scoring documents
        sorted_docs = sorted(doc_scores.items(), key=lambda x: x[1], reverse=True)
        return [self.documents[position] for position, _ in sorted_docs[:10]]
```

#### 2. **Conversation Memory Management**

**What is Conversation Memory?**
Conversation memory is how our chatbot remembers what was said in previous messages. It's like having a conversation with someone who remembers what you talked about earlier.

**Key Concepts:**
- **`deque`**: Double-ended queue - efficient for adding/removing from both ends
- **Sliding Window**: Only keep the most recent messages (like a moving window)
- **Token Limits**: AI models have limits on how much text they can process at once
- **Context Management**: Providing relevant conversation history to the AI

**Simple Example:**
```python
from collections import deque

# deque with max size - automatically removes old items
recent_messages = deque(maxlen=3)

recent_messages.append("Hello")
recent_messages.append("How are you?")
recent_messages.append("I'm fine")
recent_messages.append("What's new?")  # "Hello" is automatically removed

print(list(recent_messages))  # ['How are you?', "I'm fine", "What's new?"]
```

Now let's see how our chatbot manages conversation memory:

```python
from collections import deque
from datetime import datetime
from typing import Any, Dict, List, Optional

class ConversationMemory:
    """Efficient conversation memory with sliding window"""
    
    def __init__(self, max_size: int = 100):
        self.messages = deque(maxlen=max_size)  # O(1) append; oldest dropped automatically
        self.session_data: Dict[str, Any] = {}
        self.message_count = 0
    
    def add_message(self, role: str, content: str, metadata: Optional[Dict] = None) -> None:
        """Add message with automatic memory management"""
        message = {
            "role": role,
            "content": content,
            "timestamp": datetime.utcnow(),
            "metadata": metadata or {},
            "id": self.message_count
        }
        self.messages.append(message)
        self.message_count += 1
    
    def get_recent_messages(self, n: int = 10) -> List[Dict]:
        """Get the last n messages"""
        if n <= 0:
            return []  # list[-0:] would return *every* message
        return list(self.messages)[-n:]
    
    def get_conversation_context(self, max_tokens: int = 2000) -> str:
        """Get conversation context within an approximate token limit"""
        context_parts = []
        current_tokens = 0
        
        # Iterate from most recent to oldest
        for message in reversed(self.messages):
            message_text = f"{message['role']}: {message['content']}"
            # Rough estimate: whitespace-separated words, not real model tokens
            message_tokens = len(message_text.split())
            
            if current_tokens + message_tokens > max_tokens:
                break
            
            context_parts.append(message_text)
            current_tokens += message_tokens
        
        # Parts were collected newest-first; reverse once for chronological order
        return "\n".join(reversed(context_parts))
    
    def clear_session(self) -> None:
        """Clear conversation memory"""
        self.messages.clear()
        self.session_data.clear()
```

### Advanced Algorithms:

**What are Algorithms in AI?**
Algorithms are step-by-step procedures for solving specific problems. In AI applications, we use algorithms to find similar text, group related documents, and rank results by relevance.

**Key Algorithm Types:**
- **Similarity Algorithms**: Find how similar two pieces of text are
- **Clustering Algorithms**: Group similar documents together
- **Ranking Algorithms**: Sort results by relevance or importance
- **Search Algorithms**: Find the best matches for a query

**Simple Example:**
```python
# Simple similarity calculation
def simple_similarity(text1, text2):
    words1 = set(text1.lower().split())
    words2 = set(text2.lower().split())
    
    # Calculate Jaccard similarity
    intersection = len(words1 & words2)
    union = len(words1 | words2)
    
    return intersection / union if union > 0 else 0

# Usage
text1 = "Greek football derby"
text2 = "Greek soccer match"
similarity = simple_similarity(text1, text2)
print(f"Similarity: {similarity:.2f}")  # 0.20 (1 shared word out of 5 distinct words)
```

#### 1. **Text Similarity and Clustering**

**What is Text Similarity?**
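Text similarity measures how alike two pieces of text are. It's like asking "How close in meaning are these two sentences?"

**Key Concepts:**
- **Cosine Similarity**: Measures the angle between two vectors: 1 = same direction (most similar), 0 = unrelated (orthogonal), -1 = opposite. *Cosine distance* is 1 minus this value.
- **Embeddings**: Vectors of numbers that represent the meaning of a text, so that texts with similar meaning get nearby vectors
- **Clustering**: Group similar documents together automatically
- **K-means**: Algorithm that splits data into *k* groups by repeatedly assigning each point to its nearest cluster center (centroid) and recomputing the centers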

**Simple Example:**
```python
# Simple word-based similarity
def word_similarity(text1, text2):
    words1 = set(text1.lower().split())
    words2 = set(text2.lower().split())
    
    common_words = words1 & words2
    total_words = words1 | words2
    
    # Guard against division by zero when both texts are empty
    return len(common_words) / len(total_words) if total_words else 0.0

# Usage
similarity = word_similarity("Greek football", "Greek soccer")
print(f"Similarity: {similarity:.2f}")  # 0.33 (1 shared word out of 3 distinct words)
```
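Word overlap ignores meaning, so embedding-based search uses cosine similarity instead: the dot product of two vectors divided by the product of their lengths. A dependency-free sketch (`cosine` is a hypothetical helper; the class that follows uses scikit-learn's `cosine_similarity`):

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|), ranging from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; returning 0.0 is a common convention
    return dot / (norm_a * norm_b)


print(cosine([3, 4], [6, 8]))       # 1.0  (same direction, different length)
print(cosine([1, 0], [0, 1]))       # 0.0  (orthogonal)
print(cosine([1, 0], [-1, 0]))      # -1.0 (opposite)
print(1 - cosine([1, 0], [0, 1]))   # 1.0  (cosine distance)
```

Because only the angle matters, `[3, 4]` and `[6, 8]` score a perfect 1.0 even though one is twice as long; this is why cosine similarity is insensitive to vector magnitude.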

Now let's see how our chatbot uses advanced similarity algorithms:

```python
import numpy as np
# Imported under another name so it is not confused with the method defined below
from sklearn.metrics.pairwise import cosine_similarity as sk_cosine_similarity
from sklearn.cluster import KMeans
from typing import Dict, List, Tuple

class TextSimilarityEngine:
    """Text similarity and clustering helpers built on embeddings"""
    
    def __init__(self):
        self.embeddings_cache: Dict[str, np.ndarray] = {}
    
    def cosine_similarity(self, text1: str, text2: str) -> float:
        """Calculate cosine similarity between two texts"""
        emb1 = self._get_embedding(text1)
        emb2 = self._get_embedding(text2)
        
        similarity = sk_cosine_similarity([emb1], [emb2])[0][0]
        return float(similarity)
    
    def find_similar_documents(self, query: str, documents: List[Document], 
                             threshold: float = 0.7) -> List[Tuple[Document, float]]:
        """Find documents similar to query using cosine similarity"""
        query_embedding = self._get_embedding(query)
        similarities = []
        
        for doc in documents:
            if doc.embedding is None:
                doc.embedding = self._get_embedding(doc.content)
            
            similarity = float(sk_cosine_similarity([query_embedding], [doc.embedding])[0][0])
            if similarity >= threshold:
                similarities.append((doc, similarity))
        
        # Sort by similarity score, highest first
        return sorted(similarities, key=lambda x: x[1], reverse=True)
    
    def cluster_documents(self, documents: List[Document], n_clusters: int = 5) -> List[List[Document]]:
        """Cluster documents using K-means"""
        if not documents:
            return []
        
        # K-means cannot form more clusters than there are documents
        n_clusters = min(n_clusters, len(documents))
        
        # Prepare embeddings as a 2-D array: one row per document
        embeddings = []
        for doc in documents:
            if doc.embedding is None:
                doc.embedding = self._get_embedding(doc.content)
            embeddings.append(doc.embedding)
        
        # Perform clustering (n_init set explicitly so results don't depend on version defaults)
        kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
        cluster_labels = kmeans.fit_predict(np.array(embeddings))
        
        # Group documents by cluster
        clusters = [[] for _ in range(n_clusters)]
        for doc, label in zip(documents, cluster_labels):
            clusters[label].append(doc)
        
        return clusters
    
    def _get_embedding(self, text: str) -> np.ndarray:
        """Get or compute embedding for text"""
        if text in self.embeddings_cache:
            return self.embeddings_cache[text]
        
        # Placeholder: random vectors carry no meaning, so scores and clusters are
        # arbitrary until this calls a real embedding model (e.g. OpenAI embeddings)
        embedding = np.random.rand(1024)
        self.embeddings_cache[text] = embedding
        return embedding
```

#### 2. **Priority Queue for Document Ranking**
```python
import heapq
from datetime import datetime
from typing import List, Tuple

class DocumentRanker:
    """Rank documents using priority queue and multiple scoring factors"""
    
    def __init__(self):
        self.weights = {
            "relevance": 0.4,
            "recency": 0.3,
            "popularity": 0.2,
            "quality": 0.1
        }
    
    def rank_documents(self, query: str, documents: List[Document], 
                      top_k: int = 10) -> List[Document]:
        """Rank documents using multiple scoring factors"""
        # heapq.nlargest keeps a heap of only top_k items, O(n log k) overall.
        # With key=..., ties are broken by input order, so Document objects
        # themselves are never compared (they define no ordering).
        return heapq.nlargest(
            top_k,
            documents,
            key=lambda doc: self._calculate_composite_score(query, doc),
        )
    
    def _calculate_composite_score(self, query: str, doc: Document) -> float:
        """Calculate composite score for document"""
        scores = {
            "relevance": self._relevance_score(query, doc),
            "recency": self._recency_score(doc),
            "popularity": self._popularity_score(doc),
            "quality": self._quality_score(doc)
        }
        
        # Weighted sum
        composite_score = sum(
            scores[factor] * self.weights[factor] 
            for factor in scores
        )
        
        return composite_score
    
    def _relevance_score(self, query: str, doc: Document) -> float:
        """Calculate relevance score based on text similarity"""
        query_words = set(query.lower().split())
        doc_words = set(doc.content.lower().split())
        
        # Jaccard similarity
        intersection = len(query_words.intersection(doc_words))
        union = len(query_words.union(doc_words))
        
        return intersection / union if union > 0 else 0.0
    
    def _recency_score(self, doc: Document) -> float:
        """Calculate recency score based on timestamp"""
        if doc.timestamp is None:
            return 0.5  # Default score for documents without timestamp
        
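        # Score based on how recent the document is (assumes naive UTC timestamps);
        # clamp to [0, 1] so future-dated documents cannot score above 1
        days_old = (datetime.utcnow() - doc.timestamp).days
        return max(0.0, min(1.0, 1.0 - days_old / 365))  # Linear decay over a year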
    
    def _popularity_score(self, doc: Document) -> float:
        """Calculate popularity score based on metadata"""
        # Extract view count or similar metric from metadata
        view_count = doc.metadata.get("view_count", 0)
        return min(1.0, view_count / 1000)  # Normalize to 0-1
    
    def _quality_score(self, doc: Document) -> float:
        """Calculate quality score based on content characteristics"""
        content = doc.content
        words = content.split()

        # Factors that indicate quality
        length_score = min(1.0, len(content) / 1000)  # Prefer longer content
        structure_score = 1.0 if any(char in content for char in ".,!?") else 0.5
        # Ratio of distinct words; the guard also covers whitespace-only content
        uniqueness_score = len(set(words)) / len(words) if words else 0.0

        return (length_score + structure_score + uniqueness_score) / 3
```
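The ranking above only needs the best `top_k` documents, which is exactly what `heapq.nlargest` computes. Here is a minimal standalone sketch; the `Doc` dataclass and the word-count score are hypothetical stand-ins for the chatbot's `Document` and `_calculate_composite_score`. Passing the score as `key=` means the heap never compares documents themselves, which sidesteps the tie-breaking problem of mixing `doc.id` strings with `id(doc)` integers.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Doc:
    # Hypothetical stand-in for the chatbot's Document class
    id: str
    content: str
    metadata: dict = field(default_factory=dict)


def top_k_documents(docs: List[Doc], score: Callable[[Doc], float], k: int) -> List[Doc]:
    """Return the k highest-scoring docs, best first.

    nlargest behaves like sorted(docs, key=score, reverse=True)[:k],
    so documents with equal scores keep their original input order.
    """
    return heapq.nlargest(k, docs, key=score)


docs = [
    Doc("a", "derby preview"),
    Doc("b", "olympiakos panathinaikos derby history"),
    Doc("c", "weather"),
]
# Toy score: number of words in the content (hypothetical)
ranked = top_k_documents(docs, lambda d: len(d.content.split()), k=2)
print([d.id for d in ranked])  # ['b', 'a']
```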

#### 3. **Efficient Text Processing**
```python
import re
from collections import Counter
from typing import List, Set, Tuple

import numpy as np  # used by calculate_tf_idf

class TextProcessor:
    """Advanced text processing algorithms"""
    
    def __init__(self):
        self.stop_words = {"the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "of", "with", "by"}
        self.word_frequencies = Counter()
    
    def preprocess_text(self, text: str) -> str:
        """Clean and preprocess text"""
        # Remove special characters and normalize
        text = re.sub(r'[^\w\s]', ' ', text.lower())
        
        # Remove extra whitespace
        text = re.sub(r'\s+', ' ', text).strip()
        
        return text
    
    def extract_keywords(self, text: str, top_k: int = 10) -> List[Tuple[str, int]]:
        """Extract top keywords by raw frequency (stop words and short words removed)"""
        words = self.preprocess_text(text).split()
        
        # Filter stop words and short words
        filtered_words = [
            word for word in words 
            if word not in self.stop_words and len(word) > 2
        ]
        
        # Count word frequencies
        word_counts = Counter(filtered_words)
        
        # Return top-k most frequent words
        return word_counts.most_common(top_k)
    
    def build_vocabulary(self, documents: List[Document]) -> Set[str]:
        """Build vocabulary from document collection"""
        vocabulary = set()
        
        for doc in documents:
            words = self.preprocess_text(doc.content).split()
            vocabulary.update(word for word in words if len(word) > 2)
        
        return vocabulary
    
    def calculate_tf_idf(self, term: str, document: str,
                         document_collection: List[str]) -> float:
        """Calculate TF-IDF score for a term using whole-word matches"""
        term = term.lower()

        # Term frequency in document (guard against empty documents)
        doc_words = self.preprocess_text(document).split()
        if not doc_words:
            return 0.0
        tf = doc_words.count(term) / len(doc_words)

        # Document frequency: match whole words, not substrings
        # ("derby" should not count "derbyshire")
        docs_containing_term = sum(
            1 for doc in document_collection
            if term in self.preprocess_text(doc).split()
        )
        idf = np.log(len(document_collection) / docs_containing_term) if docs_containing_term > 0 else 0.0

        return tf * idf
```
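TF-IDF multiplies term frequency (tf = occurrences of the term / words in the document) by inverse document frequency (idf = log(N / df), where df is the number of documents containing the term). The sketch below is a standalone version, assuming simple regex tokenization, that matches whole words for both tf and df; the three-sentence corpus is made up for illustration.

```python
import math
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of word characters only
    return re.findall(r"\w+", text.lower())


def tf_idf(term: str, document: str, collection: List[str]) -> float:
    """TF-IDF with whole-word matching.

    tf  = count(term in document) / len(document)
    idf = log(N / df), df = number of documents containing the term as a word
    """
    term = term.lower()
    doc_tokens = tokenize(document)
    if not doc_tokens:
        return 0.0
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for doc in collection if term in set(tokenize(doc)))
    if df == 0:
        return 0.0
    return tf * math.log(len(collection) / df)


corpus = [
    "Olympiakos won the derby",
    "Panathinaikos lost the derby",
    "Derbyshire is in England",
]
print(round(tf_idf("olympiakos", corpus[0], corpus), 3))  # 0.275
```

Because "derbyshire" is a different token, "derby" appears in two documents here rather than three, so its idf stays above zero.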

### Performance Optimization:

#### 1. **Caching and Memoization**
```python
from functools import lru_cache, wraps
from typing import List
import time

def memoize_with_ttl(ttl_seconds: int = 3600):
    """Memoization decorator with time-to-live"""
    def decorator(func):
        cache = {}
        
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Create cache key
            key = str(args) + str(sorted(kwargs.items()))
            current_time = time.monotonic()  # Unaffected by system clock changes
            
            # Check if cached result is still valid
            if key in cache:
                result, timestamp = cache[key]
                if current_time - timestamp < ttl_seconds:
                    return result
            
            # Compute and cache result
            result = func(*args, **kwargs)
            cache[key] = (result, current_time)
            
            return result
        
        return wrapper
    return decorator

# Usage in our chatbot
class OptimizedChatbot:
    @memoize_with_ttl(ttl_seconds=1800)  # Cache for 30 minutes
    def get_embedding(self, text: str) -> List[float]:
        """Get embedding with caching"""
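        # Expensive embedding computation (_compute_embedding is a placeholder)
        return self._compute_embedding(text)
    
    # Note: lru_cache on a method includes `self` in the cache key and keeps
    # the instance alive for as long as its entries remain in the cache
    @lru_cache(maxsize=1000)
    def preprocess_text(self, text: str) -> str:
        """Preprocess text with LRU caching"""
        return self._clean_text(text)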
```
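If you want to see TTL expiry without waiting, injecting the clock helps. This is a sketch of the same `memoize_with_ttl` idea, assuming hashable arguments; the `clock` parameter is a hypothetical addition for testing, and `time.monotonic` is the default because it never jumps when the system clock changes. Like the original, it does not purge stale entries that are never requested again, so a long-running process may also want a size bound.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, Hashable, Tuple


def memoize_with_ttl(ttl_seconds: float = 3600,
                     clock: Callable[[], float] = time.monotonic) -> Callable:
    """Memoize a function; cached results expire after ttl_seconds."""
    def decorator(func: Callable) -> Callable:
        cache: Dict[Hashable, Tuple[Any, float]] = {}

        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Use the (hashable) arguments themselves as the key instead of str()
            key = (args, tuple(sorted(kwargs.items())))
            now = clock()
            entry = cache.get(key)
            if entry is not None and now - entry[1] < ttl_seconds:
                return entry[0]  # fresh hit
            result = func(*args, **kwargs)  # miss or expired: recompute
            cache[key] = (result, now)
            return result

        wrapper.cache_clear = cache.clear  # same name as functools.lru_cache
        return wrapper

    return decorator


# Usage with a fake clock so expiry can be shown without sleeping
fake_now = [0.0]
calls = []


@memoize_with_ttl(ttl_seconds=10, clock=lambda: fake_now[0])
def square(x: int) -> int:
    calls.append(x)
    return x * x


square(3)
square(3)          # served from cache
fake_now[0] = 11.0
square(3)          # older than 10 s, so recomputed
print(len(calls))  # 2
```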

#### 2. **Batch Processing**
```python
from typing import List

# Assumes the Document class used above
class BatchProcessor:
    """Process multiple items efficiently in batches"""
    
    def __init__(self, batch_size: int = 32):
        self.batch_size = batch_size
    
    def process_documents_batch(self, documents: List[Document]) -> List[Document]:
        """Process documents in batches for efficiency"""
        processed_docs = []
        
        for i in range(0, len(documents), self.batch_size):
            batch = documents[i:i + self.batch_size]
            processed_batch = self._process_batch(batch)
            processed_docs.extend(processed_batch)
        
        return processed_docs
    
    def _process_batch(self, batch: List[Document]) -> List[Document]:
        """Process a single batch of documents"""
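        # Documents are handled one at a time here for clarity; the real speed-up
        # comes from sending the whole batch to a batch-capable API
        # (for example, one embedding request per batch instead of per document)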
        for doc in batch:
            doc.content = self._preprocess_text(doc.content)
            doc.embedding = self._compute_embedding(doc.content)
        
        return batch
```

### Best Practices for Data Structures and Algorithms:

1. **Choose appropriate data structures** for your use case
2. **Use built-in collections** (deque, defaultdict, Counter) when possible
3. **Implement caching** for expensive computations
4. **Use batch processing** for handling large datasets
5. **Profile your code** to identify bottlenecks
6. **Consider memory usage** when working with large datasets
7. **Use generators** for memory-efficient iteration
8. **Implement proper indexing** for fast lookups
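Practices 4 and 7 combine naturally: a generator can feed fixed-size batches to `BatchProcessor`-style code without loading everything into memory first. A minimal sketch, assuming Python 3.9+ (the project's `requires-python`); Python 3.12 adds `itertools.batched`, which yields tuples instead of lists. `read_documents` is a hypothetical lazy source.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items without materializing the input."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))  # pull at most batch_size items
        if not batch:
            return
        yield batch


def read_documents(n: int) -> Iterator[str]:
    # Hypothetical lazy source, e.g. rows streamed from a file or an API
    for i in range(n):
        yield f"document {i}"


for batch in batched(read_documents(5), batch_size=2):
    print(batch)
```

Only one batch lives in memory at a time, so the same loop works whether the source yields five documents or five million.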


## Q5: How do we implement testing and debugging strategies for our Python chatbot?

**Answer:**

### What is Testing and Why Do We Need It?

**Testing** is the process of checking if your code works correctly by running it with different inputs and verifying the outputs. It's like quality control for software.

**Why Test Your Code?**
- **Catch Bugs Early**: Find problems before users do
- **Prevent Regressions**: Make sure new changes don't break old functionality
- **Document Behavior**: Tests show how your code is supposed to work
- **Confidence**: Know that your code works in different scenarios
- **Refactoring Safety**: Change code knowing tests will catch any mistakes

**Types of Testing:**
- **Unit Tests**: Test individual functions or methods
- **Integration Tests**: Test how different parts work together
- **End-to-End Tests**: Test the complete user workflow

**Simple Example:**
```python
def add_numbers(a, b):
    return a + b

def test_add_numbers():
    # Test normal case
    assert add_numbers(2, 3) == 5
    
    # Test edge cases
    assert add_numbers(0, 0) == 0
    assert add_numbers(-1, 1) == 0
    
    # Test with floats
    assert add_numbers(1.5, 2.5) == 4.0

# Run the test
test_add_numbers()
print("All tests passed! ✅")
```

Testing and debugging are essential for maintaining code quality and making sure our RAG chatbot works reliably. The strategies below cover unit tests, integration tests, async tests, and debugging techniques.

### Testing Strategies:

#### 1. **Unit Testing with pytest**

**What is pytest?**
pytest is a popular testing framework for Python that makes it easy to write and run tests. It automatically finds and runs test functions.

**Key Concepts:**
- **`@pytest.fixture`**: Creates reusable test data or objects
- **`Mock`**: Simulates objects for testing without using real dependencies
- **`patch`**: Temporarily replaces functions or objects during testing
- **`assert`**: Checks if conditions are true (test passes) or false (test fails)

**Simple Example:**
```python
import pytest

def calculate_total(items, tax_rate):
    subtotal = sum(items)
    tax = subtotal * tax_rate
    return subtotal + tax

def test_calculate_total():
    # Test with normal values
    items = [10, 20, 30]
    total = calculate_total(items, 0.1)  # 10% tax
    assert total == pytest.approx(66.0)  # 60 + 6; approx guards against float rounding
    
    # Test with empty list
    assert calculate_total([], 0.1) == 0.0

# Run with: pytest test_file.py
```
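The `Mock` and `patch` concepts listed above do not appear in that example, so here is a small sketch of both using only the standard library. `fetch_title` is a hypothetical helper: in the first test, `Mock` replaces an HTTP session so no network call happens; in the second, `patch` swaps `json.dumps` only for the duration of a `with` block.

```python
import json
from unittest.mock import Mock, patch


def fetch_title(session, url: str) -> str:
    """Hypothetical helper: fetch a JSON page and return its title."""
    response = session.get(url)
    return response.json()["title"]


def test_fetch_title_with_mock():
    # Configure the chain session.get(...).json() to return canned data
    session = Mock()
    session.get.return_value.json.return_value = {"title": "Greek Derby"}
    assert fetch_title(session, "https://example.com") == "Greek Derby"
    session.get.assert_called_once_with("https://example.com")


def test_patch_replaces_temporarily():
    # json.dumps is replaced only inside the with-block
    with patch("json.dumps", return_value="stub") as fake_dumps:
        assert json.dumps({"a": 1}) == "stub"
        fake_dumps.assert_called_once()
    assert json.dumps({"a": 1}) == '{"a": 1}'  # original restored


test_fetch_title_with_mock()
test_patch_replaces_temporarily()
print("mock/patch tests passed")
```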

Now let's see how our chatbot uses pytest for testing:
```python
# tests/test_chatbot.py
import pytest
from unittest.mock import Mock, patch
from backend.standalone_service.greek_derby_chatbot import GreekDerbyChatbot

class TestGreekDerbyChatbot:
    """Unit tests for GreekDerbyChatbot"""
    
    @pytest.fixture
    def mock_chatbot(self):
        """Create a chatbot with its external services patched out"""
        # yield (rather than return) keeps both patches active for the whole test
        with patch('backend.standalone_service.greek_derby_chatbot.OpenAIEmbeddings'):
            with patch('backend.standalone_service.greek_derby_chatbot.PineconeVectorStore'):
                yield GreekDerbyChatbot()
    
    def test_chatbot_initialization(self, mock_chatbot):
        """Test chatbot initializes correctly"""
        assert mock_chatbot is not None
        assert hasattr(mock_chatbot, 'llm')
        assert hasattr(mock_chatbot, 'embeddings')
        assert hasattr(mock_chatbot, 'vector_store')
    
    def test_ask_question_valid_input(self, mock_chatbot):
        """Test asking a valid question"""
        # Mock the response
        mock_response = "Olympiakos and Panathinaikos are two of the biggest football clubs in Greece."
        mock_chatbot.llm.invoke = Mock(return_value=Mock(content=mock_response))
        
        question = "What is the Greek Derby?"
        response = mock_chatbot.ask_question(question)
        
        assert response == mock_response
        mock_chatbot.llm.invoke.assert_called_once()
    
    def test_ask_question_empty_input(self, mock_chatbot):
        """Test handling empty input"""
        with pytest.raises(ValueError, match="Question cannot be empty"):
            mock_chatbot.ask_question("")
    
    def test_ask_question_invalid_type(self, mock_chatbot):
        """Test handling invalid input type"""
        with pytest.raises(TypeError, match="Question must be a string"):
            mock_chatbot.ask_question(123)
    
    @patch('backend.standalone_service.greek_derby_chatbot.WebBaseLoader')
    def test_load_knowledge_base_success(self, mock_loader, mock_chatbot):
        """Test successful knowledge base loading"""
        # Mock successful web scraping
        mock_docs = [Mock(page_content="Test content", metadata={})]  # LangChain Documents use page_content
        mock_loader.return_value.load.return_value = mock_docs
        
        result = mock_chatbot._load_knowledge_base()
        
        assert result is not None
        mock_loader.assert_called()
    
    @patch('backend.standalone_service.greek_derby_chatbot.WebBaseLoader')
    def test_load_knowledge_base_fallback(self, mock_loader, mock_chatbot):
        """Test fallback when web scraping fails"""
        # Mock web scraping failure
        mock_loader.return_value.load.side_effect = Exception("Network error")
        
        # Should not raise exception, should use fallback
        result = mock_chatbot._load_knowledge_base()
        assert result is not None
```

#### 2. **Integration Testing**
```python
# tests/test_integration.py
import pytest
from unittest.mock import AsyncMock, patch
from fastapi.testclient import TestClient
from backend.api.greek_derby_api import app

class TestAPIIntegration:
    """Integration tests for the API"""
    
    @pytest.fixture
    def client(self):
        """Create test client"""
        return TestClient(app)
    
    def test_root_endpoint(self, client):
        """Test root endpoint"""
        response = client.get("/")
        assert response.status_code == 200
        data = response.json()
        assert "message" in data
        assert "Greek Derby" in data["message"]
    
    def test_health_endpoint(self, client):
        """Test health check endpoint"""
        response = client.get("/health")
        assert response.status_code == 200
        data = response.json()
        assert data["status"] == "healthy"
    
    @patch('backend.api.greek_derby_api.chatbot_api')
    def test_chat_endpoint_success(self, mock_chatbot_api, client):
        """Test successful chat request"""
        # Assuming the endpoint awaits process_question_async, it needs an AsyncMock
        mock_chatbot_api.process_question_async = AsyncMock(return_value="Test response")
        
        response = client.post("/chat", json={"question": "What is the Greek Derby?"})
        assert response.status_code == 200
        
        data = response.json()
        assert "answer" in data
        assert data["answer"] == "Test response"
        assert "timestamp" in data
    
    def test_chat_endpoint_invalid_input(self, client):
        """Test chat endpoint with invalid input"""
        response = client.post("/chat", json={"invalid": "data"})
        assert response.status_code == 422  # Validation error
    
    @patch('backend.api.greek_derby_api.chatbot_api')
    def test_chat_endpoint_error_handling(self, mock_chatbot_api, client):
        """Test error handling in chat endpoint"""
        # Mock chatbot error (AsyncMock because the method is awaited)
        mock_chatbot_api.process_question_async = AsyncMock(side_effect=Exception("Test error"))
        
        # Assumes the endpoint catches the error and returns HTTP 500; an unhandled
        # exception would instead be re-raised by TestClient into the test
        response = client.post("/chat", json={"question": "Test question"})
        assert response.status_code == 500
```
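A note on mocking async methods: an awaited method such as `process_question_async` needs an `AsyncMock`, because calling a plain `MagicMock` attribute returns another `MagicMock`, which cannot be awaited. The sketch below demonstrates both cases; `handle_chat` is a hypothetical stand-in for the endpoint body.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def handle_chat(api, question: str) -> dict:
    """Hypothetical endpoint body that awaits the API's async method."""
    answer = await api.process_question_async(question)
    return {"answer": answer}


async def demo() -> bool:
    # AsyncMock: calling it returns an awaitable that resolves to return_value
    api = MagicMock()
    api.process_question_async = AsyncMock(return_value="Test response")
    result = await handle_chat(api, "What is the Greek Derby?")
    assert result == {"answer": "Test response"}
    api.process_question_async.assert_awaited_once_with("What is the Greek Derby?")

    # Plain MagicMock: the call returns another MagicMock, which is not awaitable
    plain = MagicMock()
    try:
        await handle_chat(plain, "x")
    except TypeError:
        return True
    return False


print(asyncio.run(demo()))  # True
```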

#### 3. **Async Testing**
```python
# tests/test_async.py (the asyncio marker requires the pytest-asyncio plugin)
import pytest
import asyncio
from unittest.mock import Mock

class TestAsyncFunctionality:
    """Test async functionality"""
    
    @pytest.mark.asyncio
    async def test_async_chat_processing(self):
        """Test async chat processing"""
        from backend.api.greek_derby_api import GreekDerbyAPI
        
        api = GreekDerbyAPI()
        
        # Mock the chatbot
        mock_chatbot = Mock()
        mock_chatbot.ask_question.return_value = "Async response"
        api.chatbot = mock_chatbot
        
        # Test async processing
        response = await api.process_question_async("Test question")
        
        assert response == "Async response"
        mock_chatbot.ask_question.assert_called_once_with("Test question")
    
    @pytest.mark.asyncio
    async def test_concurrent_requests(self):
        """Test handling concurrent requests"""
        from backend.api.greek_derby_api import GreekDerbyAPI
        
        api = GreekDerbyAPI()
        mock_chatbot = Mock()
        mock_chatbot.ask_question.return_value = "Concurrent response"
        api.chatbot = mock_chatbot
        
        # Create multiple concurrent requests
        tasks = [
            api.process_question_async(f"Question {i}")
            for i in range(5)
        ]
        
        responses = await asyncio.gather(*tasks)
        
        assert len(responses) == 5
        assert all(response == "Concurrent response" for response in responses)
```
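These tests assume `process_question_async` lets several questions run concurrently. One common way to wrap a blocking chatbot is `asyncio.to_thread` (Python 3.9+), which runs the call in a worker thread. The sketch below uses hypothetical `SyncChatbot` and `AsyncChatAPI` classes, not the project's actual `GreekDerbyAPI`; with five 0.1 s calls gathered together, the total should be close to 0.1 s rather than 0.5 s on a typical machine.

```python
import asyncio
import time
from typing import List


class SyncChatbot:
    """Hypothetical blocking chatbot (stands in for GreekDerbyChatbot)."""

    def ask_question(self, question: str) -> str:
        time.sleep(0.1)  # simulate a blocking LLM or network call
        return f"answer to {question}"


class AsyncChatAPI:
    """Hypothetical wrapper exposing an async method over a sync chatbot."""

    def __init__(self, chatbot: SyncChatbot) -> None:
        self.chatbot = chatbot

    async def process_question_async(self, question: str) -> str:
        # Run the blocking call in a worker thread so the event loop stays free
        return await asyncio.to_thread(self.chatbot.ask_question, question)


async def run_concurrently(n: int) -> List[str]:
    api = AsyncChatAPI(SyncChatbot())
    # gather returns results in the same order as the awaitables passed in
    return await asyncio.gather(*(api.process_question_async(f"Q{i}") for i in range(n)))


start = time.perf_counter()
answers = asyncio.run(run_concurrently(5))
elapsed = time.perf_counter() - start
print(answers[0], f"{elapsed:.2f}s")
```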

### Debugging Techniques:

**What is Debugging?**
Debugging is the process of finding and fixing errors (bugs) in your code. It's like being a detective, following clues to find out why something isn't working as expected.

**Key Debugging Concepts:**
- **Logging**: Recording what your program is doing as it runs
- **Breakpoints**: Pausing execution at specific points to examine variables
- **Profiling**: Measuring how much time and memory your code uses
- **Traceback**: The report Python prints when an exception goes unhandled, listing each function call that led to the error

**Simple Example:**
```python
import logging

# Set up logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

def divide_numbers(a, b):
    logger.debug(f"Dividing {a} by {b}")
    
    if b == 0:
        logger.error("Cannot divide by zero!")
        return None
    
    result = a / b
    logger.debug(f"Result: {result}")
    return result

# Usage
result = divide_numbers(10, 2)  # Will log the process
result = divide_numbers(10, 0)  # Will log the error
```
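To see what a traceback contains without crashing the program, `traceback.format_exc()` turns the exception being handled into a string you can log. A minimal sketch; `divide` and `describe_failure` are illustrative helpers. In real logging code, `logger.exception(...)` records the message and the traceback in one call.

```python
import traceback
from typing import Any, Callable, Optional


def divide(a: float, b: float) -> float:
    return a / b


def describe_failure(func: Callable[..., Any], *args: Any) -> Optional[str]:
    """Call func(*args); return the formatted traceback if it raises, else None."""
    try:
        func(*args)
    except Exception:
        # format_exc() renders the exception currently being handled:
        # one "File ..., line ..., in <function>" entry per call in the chain,
        # followed by the exception type and message
        return traceback.format_exc()
    return None


report = describe_failure(divide, 10, 0)
print(report.splitlines()[-1])  # ZeroDivisionError: division by zero
```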

#### 1. **Logging and Debugging**

**What is Logging?**
Logging is recording information about what your program is doing as it runs. It's like keeping a diary of your program's activities.

**Log Levels:**
- **DEBUG**: Detailed information for debugging
- **INFO**: General information about program flow
- **WARNING**: Something unexpected happened, but program continues
- **ERROR**: A serious problem occurred
- **CRITICAL**: A very serious error occurred

**Simple Example:**
```python
import logging

# Set up logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)s - %(message)s'
)

logger = logging.getLogger(__name__)

def process_data(data):
    logger.info(f"Processing {len(data)} items")
    
    for i, item in enumerate(data):
        logger.debug(f"Processing item {i}: {item}")
        # Process item...
    
    logger.info("Processing complete")

# Usage
data = [1, 2, 3, 4, 5]
process_data(data)
```
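Setting a logger's level filters out every less severe message, and passing arguments separately (`logger.debug("... %s", value)`) defers string formatting until a record is actually emitted. This sketch shows both, using a small in-memory handler (a hypothetical helper, handy for asserting on logs in tests); the logger name is illustrative.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collects "LEVEL:message" strings in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        # getMessage() applies the %-style arguments only now
        self.messages.append(f"{record.levelname}:{record.getMessage()}")


logger = logging.getLogger("greek_derby.levels_demo")
logger.propagate = False  # keep the demo out of the root logger's output
handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.WARNING)  # e.g. a production setting

logger.debug("Cache lookup for %s", "derby")  # dropped: below WARNING
logger.info("Processing %d items", 5)         # dropped: below WARNING
logger.warning("Retrying scrape of %s", "example.com")
logger.error("Vector store unavailable")

print(handler.messages)
# ['WARNING:Retrying scrape of example.com', 'ERROR:Vector store unavailable']
```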

Now let's see how our chatbot uses advanced debugging:

```python
import logging
import traceback
from functools import wraps
import time

class DebugLogger:
    """Enhanced logging for debugging"""
    
    def __init__(self, name: str):
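        self.logger = logging.getLogger(name)
        self.logger.setLevel(logging.DEBUG)
        
        # getLogger returns the same object for a given name, so add the handler
        # only once; otherwise each message is printed once per DebugLogger created
        if not self.logger.handlers:
            # Create detailed formatter
            formatter = logging.Formatter(
                '%(asctime)s - %(name)s - %(levelname)s - %(funcName)s:%(lineno)d - %(message)s'
            )
            
            handler = logging.StreamHandler()
            handler.setFormatter(formatter)
            self.logger.addHandler(handler)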
    
    def debug_function_call(self, func):
        """Decorator to debug function calls"""
        @wraps(func)
        def wrapper(*args, **kwargs):
            self.logger.debug(f"Calling {func.__name__} with args={args}, kwargs={kwargs}")
            
            start_time = time.time()
            try:
                result = func(*args, **kwargs)
                duration = time.time() - start_time
                self.logger.debug(f"{func.__name__} completed in {duration:.4f}s, result={result}")
                return result
            except Exception as e:
                duration = time.time() - start_time
                self.logger.error(f"{func.__name__} failed after {duration:.4f}s: {e}")
                self.logger.error(f"Traceback: {traceback.format_exc()}")
                raise
        
        return wrapper

# Usage in our chatbot
debug_logger = DebugLogger("greek_derby_chatbot")

class DebuggableChatbot:
    def __init__(self):
        self.logger = debug_logger.logger  # the underlying logging.Logger
    
    @debug_logger.debug_function_call
    def ask_question(self, question: str) -> str:
        """Ask question with debug logging"""
        self.logger.debug(f"Processing question: {question}")
        
        # Process question
        response = self._process_question(question)
        
        self.logger.debug(f"Generated response: {response[:100]}...")
        return response
```

#### 2. **Performance Profiling**
```python
import cProfile
import pstats
from functools import wraps
import time
import memory_profiler  # third-party: pip install memory-profiler

class PerformanceProfiler:
    """Performance profiling utilities"""
    
    @staticmethod
    def profile_function(func):
        """Decorator to profile function performance"""
        @wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            
            start_time = time.time()
            result = func(*args, **kwargs)
            end_time = time.time()
            
            profiler.disable()
            
            # Print profiling results
            stats = pstats.Stats(profiler)
            stats.sort_stats('cumulative')
            print(f"\n=== Profiling {func.__name__} ===")
            print(f"Execution time: {end_time - start_time:.4f}s")
            stats.print_stats(10)  # Top 10 functions
            
            return result
        
        return wrapper
    
    @staticmethod
    def memory_profile(func):
        """Decorator to profile memory usage"""
        @wraps(func)
        def wrapper(*args, **kwargs):
            print(f"\n=== Memory profiling {func.__name__} ===")
            result = memory_profiler.profile(func)(*args, **kwargs)
            return result
        
        return wrapper

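# Usage (profile_function wraps memory_profile, so its timings include
# memory_profiler's overhead; apply one decorator at a time for clean numbers)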
class ProfiledChatbot:
    @PerformanceProfiler.profile_function
    @PerformanceProfiler.memory_profile
    def process_question(self, question: str) -> str:
        """Process question with profiling"""
        # Chatbot logic here
        return "Response"
```

#### 3. **Interactive Debugging**
```python
import pdb
import ipdb                # third-party: pip install ipdb
from IPython import embed  # third-party: pip install ipython

class InteractiveDebugger:
    """Interactive debugging utilities"""
    
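    @staticmethod
    def debug_breakpoint():
        """Set a breakpoint for debugging"""
        # pdb pauses inside this helper; type `up` to inspect the caller.
        # Calling the built-in breakpoint() at the call site avoids the extra frame.
        pdb.set_trace()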
    
    @staticmethod
    def ipython_breakpoint():
        """Set an IPython breakpoint for enhanced debugging"""
        ipdb.set_trace()
    
    @staticmethod
    def ipython_embed():
        """Drop into IPython shell"""
        embed()
    
    @staticmethod
    def debug_chatbot_interaction(chatbot, question: str):
        """Debug a chatbot interaction step by step"""
        print(f"Debugging question: {question}")
        
        # Step 1: Validate input
        print("Step 1: Validating input...")
        if not question.strip():
            print("ERROR: Empty question")
            return
        
        # Step 2: Process question
        print("Step 2: Processing question...")
        try:
            response = chatbot.ask_question(question)
            print(f"Response: {response}")
        except Exception as e:
            print(f"ERROR: {e}")
            # Post-mortem: open the debugger at the frame where the exception was raised
            pdb.post_mortem(e.__traceback__)

# Usage in development
def debug_chatbot():
    """Debug chatbot interactively"""
    from backend.standalone_service.greek_derby_chatbot import GreekDerbyChatbot

    chatbot = GreekDerbyChatbot()
    
    # Test with various inputs
    test_questions = [
        "What is the Greek Derby?",
        "",  # Empty question
        "Tell me about Olympiakos",
        "Invalid question with special chars !@#$%"
    ]
    
    for question in test_questions:
        InteractiveDebugger.debug_chatbot_interaction(chatbot, question)
```

### Test Configuration and Setup:

#### 1. **pytest Configuration**
```python
# conftest.py
import pytest
import asyncio
from unittest.mock import Mock, patch
import os

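# Note: recent pytest-asyncio releases deprecate overriding `event_loop`;
# check the plugin's documentation for your version before reusing this fixture
@pytest.fixture(scope="session")
def event_loop():
    """Create an instance of the default event loop for the test session."""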
    loop = asyncio.get_event_loop_policy().new_event_loop()
    yield loop
    loop.close()

@pytest.fixture
def mock_environment():
    """Mock environment variables for testing"""
    with patch.dict(os.environ, {
        'OPENAI_API_KEY': 'test_key',
        'PINECONE_API_KEY': 'test_pinecone_key',
        'PINECONE_GREEK_DERBY_INDEX_NAME': 'test_index'
    }):
        yield

@pytest.fixture
def mock_chatbot(mock_environment):
    """Create a mock chatbot for testing"""
    with patch('backend.standalone_service.greek_derby_chatbot.OpenAIEmbeddings'):
        with patch('backend.standalone_service.greek_derby_chatbot.PineconeVectorStore'):
            from backend.standalone_service.greek_derby_chatbot import GreekDerbyChatbot
            yield GreekDerbyChatbot()  # yield keeps both patches active during the test

# pytest.ini (pytest.ini uses a [pytest] header; [tool:pytest] is for setup.cfg)
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts = -v --tb=short --strict-markers
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')
    integration: marks tests as integration tests
    unit: marks tests as unit tests
```

#### 2. **Test Data Management**
```python
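# tests/fixtures.py
import hashlib
import json
from pathlib import Path
from typing import Optional

import pytest

class TestDataManager:
    """Manage test data for consistent testing"""

    # The name starts with "Test", so tell pytest this is not a test class
    __test__ = False
    
    def __init__(self):
        self.fixtures_dir = Path(__file__).parent / "fixtures"
        self.fixtures_dir.mkdir(exist_ok=True)
    
    def load_test_documents(self) -> list:
        """Load test documents from fixtures"""
        docs_file = self.fixtures_dir / "test_documents.json"
        if docs_file.exists():
            with open(docs_file) as f:
                return json.load(f)
        return []
    
    def create_test_document(self, content: str, metadata: Optional[dict] = None) -> dict:
        """Create a test document"""
        # sha256 gives the same ID on every run; built-in hash() of a str is
        # randomized per process, so IDs based on it are not reproducible
        doc_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
        return {
            "content": content,
            "metadata": metadata or {},
            "id": f"test_doc_{doc_hash}",
            "timestamp": "2024-01-01T00:00:00Z"
        }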
    
    def save_test_results(self, test_name: str, results: dict):
        """Save test results for analysis"""
        results_file = self.fixtures_dir / f"{test_name}_results.json"
        with open(results_file, 'w') as f:
            json.dump(results, f, indent=2)

# Usage in tests
@pytest.fixture
def test_data():
    return TestDataManager()

def test_document_processing(test_data):
    """Test document processing with test data"""
    test_docs = test_data.load_test_documents()
    # Test processing logic
    pass
```
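Test documents need IDs that stay the same across runs. Python's built-in `hash()` of a string is salted per process (see `PYTHONHASHSEED`), so it is a poor basis for such IDs, while a `hashlib` digest is stable. A sketch, where the 16-character prefix length is an arbitrary choice:

```python
import hashlib


def stable_doc_id(content: str) -> str:
    """Deterministic ID: the same content yields the same ID in every process."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"test_doc_{digest[:16]}"


print(stable_doc_id("Olympiakos vs Panathinaikos"))
```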

### Best Practices for Testing and Debugging:

1. **Write tests first** (TDD approach) when possible
2. **Use descriptive test names** that explain what is being tested
3. **Mock external dependencies** to isolate units under test
4. **Test both success and failure cases**
5. **Use fixtures** for common test setup
6. **Profile performance** regularly to catch regressions
7. **Use interactive debugging** for complex issues
8. **Log extensively** in development, reduce in production
9. **Test async code** with proper async test patterns
10. **Maintain test data** separately from production data


## Q6: How do we create a complete Python development workflow for our RAG chatbot?

**Answer:**

### What is a Development Workflow?

A **development workflow** is a systematic process for writing, testing, and deploying code. It's like having a recipe for building software that ensures quality and consistency.

**Why Do We Need a Workflow?**
- **Consistency**: Everyone follows the same process
- **Quality**: Catch problems early and often
- **Collaboration**: Multiple developers can work together smoothly
- **Reliability**: Deploy code with confidence
- **Maintainability**: Code is easier to understand and modify

**Key Components:**
- **Project Structure**: How to organize files and folders
- **Environment Setup**: How to configure development tools
- **Code Quality**: How to maintain clean, readable code
- **Testing**: How to verify code works correctly
- **Deployment**: How to release code to users

**Simple Example:**
```python
# A simple workflow for a Python project
# 1. Create project structure
my_project/
├── src/
│   └── my_module.py
├── tests/
│   └── test_my_module.py
├── requirements.txt
└── README.md

# 2. Write code with tests
def add_numbers(a, b):
    return a + b

def test_add_numbers():
    assert add_numbers(2, 3) == 5

# 3. Run tests
# pytest tests/

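# 4. Install the package locally in editable mode
#    (requires a pyproject.toml or setup.py, which this minimal tree omits)
# pip install -e .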
```

Let's put everything together to create a comprehensive Python development workflow that covers the entire lifecycle of our RAG chatbot, from initial development to production deployment.

### Complete Development Workflow:

#### 1. **Project Structure and Organization**

**What is Project Structure?**
Project structure is how you organize your code files and folders. A good structure makes it easy to find things and understand how the project is organized.

**Key Principles:**
- **Separation of Concerns**: Different types of code in different folders
- **Modularity**: Break code into logical modules
- **Scalability**: Structure that grows with your project
- **Clarity**: Easy to understand what goes where

**Simple Example:**
```text
# Good project structure
my_app/
├── src/                    # Source code
│   ├── models/            # Data models
│   ├── services/          # Business logic
│   └── api/               # API endpoints
├── tests/                 # Test code
├── docs/                  # Documentation
├── requirements.txt       # Dependencies
└── README.md             # Project info
```

Now let's see how our chatbot project is organized:
```text
# Project structure for our Greek Derby RAG chatbot
greek-derby-rag-chatbot/
├── backend/
│   ├── api/
│   │   ├── __init__.py
│   │   ├── greek_derby_api.py
│   │   └── models.py
│   ├── standalone_service/
│   │   ├── __init__.py
│   │   ├── greek_derby_chatbot.py
│   │   └── utils.py
│   ├── scheduler/
│   │   ├── __init__.py
│   │   ├── update_vector_db.py
│   │   └── config.py
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── test_chatbot.py
│   │   ├── test_api.py
│   │   ├── test_integration.py
│   │   └── fixtures/
│   ├── requirements.txt
│   ├── requirements-dev.txt
│   ├── Dockerfile
│   └── README.md
├── front-end/
│   └── react-chatbot/
├── educational-content/
│   ├── 01_fastapi_concepts.ipynb
│   ├── 02_rag_concepts.ipynb
│   ├── 03_react_frontend_concepts.ipynb
│   ├── 04_docker_containerization.ipynb
│   ├── 05_web_scraping_concepts.ipynb
│   ├── 06_devops_cicd_concepts.ipynb
│   ├── 07_python_concepts.ipynb
│   └── README.md
├── .github/
│   └── workflows/
│       ├── ci.yml
│       ├── cd.yml
│       └── security.yml
├── docker-compose.yml
├── .env.example
├── .gitignore
├── pyproject.toml
└── README.md
```

#### 2. **Development Environment Setup**

**What is Development Environment Setup?**
Development environment setup is configuring your computer and tools so you can write, test, and run Python code effectively. It's like setting up your workspace with all the right tools.

**Key Components:**
- **Project Configuration**: Files that describe your project and its needs
- **Dependency Management**: Keeping track of what libraries your project uses
- **Tool Configuration**: Setting up code quality tools and testing frameworks
- **Environment Variables**: Configuration that changes between different environments

**Simple Example:**
```text
# requirements.txt - Simple dependency management
requests>=2.25.0
pytest>=6.0.0
black>=21.0.0

# .env - Environment variables
API_KEY=your_api_key_here
DEBUG=True
```
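Values from a `.env` file reach Python as strings once something like python-dotenv's `load_dotenv()` copies them into `os.environ`, so `DEBUG=True` must be parsed explicitly (`bool("False")` is `True`). A minimal sketch of typed settings; the `Settings` class and the accepted truthy spellings are assumptions for illustration, and accepting a plain dict makes the loader easy to test.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    debug: bool = False


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (hypothetical field names)."""
    env = os.environ if env is None else env
    api_key = env.get("API_KEY")
    if not api_key:
        raise RuntimeError("API_KEY is not set")
    # Parse the string explicitly instead of relying on bool()
    debug = env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes", "on"}
    return Settings(api_key=api_key, debug=debug)


print(load_settings({"API_KEY": "your_api_key_here", "DEBUG": "True"}))
```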

Now let's see our comprehensive setup:

```toml
# pyproject.toml - Modern Python project configuration
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "greek-derby-rag-chatbot"
version = "1.0.0"
description = "A RAG-powered chatbot about Greek football derby"
authors = [{name = "Your Name", email = "your.email@example.com"}]
license = {text = "MIT"}
readme = "README.md"
requires-python = ">=3.9"
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
]

dependencies = [
    "fastapi>=0.100.0",
    "uvicorn[standard]>=0.20.0",
    "langchain>=0.1.0",
    "langchain-openai>=0.0.5",
    "langchain-pinecone>=0.0.3",
    "pinecone-client>=2.2.0",
    "beautifulsoup4>=4.12.0",
    "requests>=2.31.0",
    "python-dotenv>=1.0.0",
    "pydantic>=2.0.0",
    "pydantic-settings>=2.0.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
    "pytest-asyncio>=0.21.0",
    "pytest-cov>=4.0.0",
    "black>=23.0.0",
    "isort>=5.12.0",
    "flake8>=6.0.0",
    "mypy>=1.0.0",
    "pre-commit>=3.0.0",
    "jupyter>=1.0.0",
    "ipython>=8.0.0",
]

[project.scripts]
greek-derby-chatbot = "backend.standalone_service.greek_derby_chatbot:main"
greek-derby-api = "backend.api.greek_derby_api:main"

[tool.black]
line-length = 88
target-version = ['py39']
include = '\.pyi?$'
extend-exclude = '''
/(
  # directories
  \.eggs
  | \.git
  | \.hg
  | \.mypy_cache
  | \.tox
  | \.venv
  | build
  | dist
)/
'''

[tool.isort]
profile = "black"
multi_line_output = 3
line_length = 88

[tool.mypy]
python_version = "3.9"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
check_untyped_defs = true
disallow_untyped_decorators = true
no_implicit_optional = true
warn_redundant_casts = true
warn_unused_ignores = true
warn_no_return = true
warn_unreachable = true
strict_equality = true

[tool.pytest.ini_options]
testpaths = ["backend/tests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = [
    "--strict-markers",
    "--strict-config",
    "--cov=backend",
    "--cov-report=term-missing",
    "--cov-report=html",
    "--cov-report=xml",
]
markers = [
    "slow: marks tests as slow",
    "integration: marks tests as integration tests",
    "unit: marks tests as unit tests",
]
```
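When the package is installed, pip turns each `[project.scripts]` entry into a small executable that imports the named module and calls the function after the colon with no arguments, passing its return value to `sys.exit()`. The sketch below shows one plausible shape for such a `main`. The argument names and the placeholder answer are hypothetical; the real `greek_derby_chatbot` module would call its RAG chain instead of echoing the question.

```python
import argparse
from typing import List, Optional


def main(argv: Optional[List[str]] = None) -> int:
    """Hypothetical entry point for the `greek-derby-chatbot` console script."""
    parser = argparse.ArgumentParser(prog="greek-derby-chatbot")
    parser.add_argument("question", nargs="?", help="Ask a single question and exit")
    parser.add_argument("--verbose", action="store_true", help="Print extra diagnostics")
    # argv=None makes argparse read sys.argv[1:], which is what a console script relies on
    args = parser.parse_args(argv)

    if args.question is None:
        print("No question given; a real implementation could start an interactive session.")
        return 0
    if args.verbose:
        print(f"Received a question of {len(args.question)} characters")
    # Placeholder: the real module would send the question to the RAG pipeline
    print(f"You asked: {args.question}")
    return 0  # becomes the process exit code via sys.exit(main())
```

Accepting an optional `argv` keeps the function testable: tests call `main(["..."])` directly, while the installed command calls `main()` and argparse falls back to the real command line.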

#### 3. **Pre-commit Hooks Configuration**
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
      - id: check-merge-conflict
      - id: debug-statements

  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
        language_version: python3.9

  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        args: ["--profile", "black"]

  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
        args: [--max-line-length=88, --extend-ignore=E203]

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.3.0
    hooks:
      - id: mypy
        additional_dependencies: [types-requests, types-beautifulsoup4]
```
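Besides the published hooks above, pre-commit can run project-specific checks. Its contract is simple: the hook receives the staged file names as command-line arguments, and a non-zero exit status blocks the commit. The sketch below is a hypothetical secret-leak check for this project, using a crude heuristic that OpenAI-style keys start with `sk-` followed by a long token; expect false positives and negatives.

```python
import re
from pathlib import Path
from typing import List, Sequence

# Crude, hypothetical heuristic: "sk-" at a word boundary followed by 20+ token characters
SECRET_PATTERN = re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")


def find_secrets(path: Path) -> List[int]:
    """Return 1-based line numbers that look like they contain a hardcoded key."""
    text = path.read_text(encoding="utf-8", errors="ignore")
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if SECRET_PATTERN.search(line)
    ]


def main(filenames: Sequence[str]) -> int:
    """pre-commit passes staged file names as arguments; returning 1 fails the hook."""
    status = 0
    for name in filenames:
        for lineno in find_secrets(Path(name)):
            print(f"{name}:{lineno}: possible hardcoded API key")
            status = 1
    return status
```

Saved as a script ending in `sys.exit(main(sys.argv[1:]))`, it could be registered under a `repo: local` entry in `.pre-commit-config.yaml`, with its `entry` pointing at that script.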

#### 4. **Development Scripts**
```python
#!/usr/bin/env python3
# scripts/dev_setup.py
"""Development environment setup script"""

import subprocess
import sys


def run_command(command: str, description: str) -> bool:
    """Run a shell command and return True if it exits with status 0"""
    print(f"🔄 {description}...")
    try:
        subprocess.run(
            command, shell=True, check=True, capture_output=True, text=True
        )
        print(f"✅ {description} completed successfully")
        return True
    except subprocess.CalledProcessError as e:
        print(f"❌ {description} failed: {e}")
        print(f"Error output: {e.stderr}")
        return False


def setup_development_environment() -> None:
    """Set up the complete development environment"""
    print("🚀 Setting up Greek Derby RAG Chatbot development environment...")

    # Check Python version (matches requires-python in pyproject.toml)
    if sys.version_info < (3, 9):
        print("❌ Python 3.9 or higher is required")
        sys.exit(1)

    # Run pip and pytest through the current interpreter so packages land in the
    # active virtual environment; quote ".[dev]" so the shell never treats the
    # brackets as a glob pattern
    python = f'"{sys.executable}"'
    commands = [
        (
            f'{python} -m pip install -e ".[dev]"',
            "Installing development dependencies",
        ),
        ("pre-commit install", "Installing pre-commit hooks"),
        (f"{python} -m pytest --version", "Verifying pytest installation"),
        ("black --version", "Verifying black installation"),
        ("isort --version", "Verifying isort installation"),
        ("flake8 --version", "Verifying flake8 installation"),
        ("mypy --version", "Verifying mypy installation"),
    ]

    success_count = 0
    for command, description in commands:
        if run_command(command, description):
            success_count += 1

    print(f"\n📊 Setup completed: {success_count}/{len(commands)} steps successful")

    if success_count == len(commands):
        print("🎉 Development environment setup complete!")
        print("\nNext steps:")
        print("1. Copy .env.example to .env and configure your API keys")
        print("2. Run 'python -m pytest' to run tests")
        print("3. Run 'python backend/standalone_service/greek_derby_chatbot.py' to test the chatbot")
        print("4. Run 'python backend/api/greek_derby_api.py' to start the API server")
    else:
        print("⚠️  Some setup steps failed. Please check the errors above.")
        # Non-zero exit status so CI and calling scripts notice the failure
        sys.exit(1)


if __name__ == "__main__":
    setup_development_environment()
```
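`shell=True` with a single command string works, but it hands the string to a shell, which adds quoting pitfalls (paths with spaces, `.[dev]` as a glob pattern). A common alternative is to pass an argument list, so each element reaches the program verbatim. The helper below is a hypothetical variant of `run_command` built that way; it also handles the case where the executable itself is missing, which surfaces as `FileNotFoundError` rather than `CalledProcessError`.

```python
import subprocess
import sys
from typing import Sequence


def run_step(args: Sequence[str], description: str) -> bool:
    """Run one setup step as an argument list (no shell) and report success."""
    print(f"🔄 {description}...")
    try:
        subprocess.run(list(args), check=True, capture_output=True, text=True)
    except FileNotFoundError:
        # Raised when the executable itself is missing, e.g. pre-commit not installed yet
        print(f"❌ {description} failed: {args[0]} not found")
        return False
    except subprocess.CalledProcessError as exc:
        print(f"❌ {description} failed with exit code {exc.returncode}")
        print(f"Error output: {exc.stderr}")
        return False
    print(f"✅ {description} completed successfully")
    return True


# Each list element is passed to pip verbatim, so ".[dev]" needs no shell quoting
INSTALL_DEV_DEPS = [sys.executable, "-m", "pip", "install", "-e", ".[dev]"]
```

A call such as `run_step(INSTALL_DEV_DEPS, "Installing development dependencies")` then behaves the same in bash, zsh, or on Windows, because no shell is involved.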

#### 5. **Testing and Quality Assurance**
```python
#!/usr/bin/env python3
# scripts/run_tests.py
"""Comprehensive test runner"""

import argparse
import subprocess
import sys

def run_tests(test_type: str = "all", verbose: bool = False) -> None:
    """Run tests based on type"""
    base_cmd = "python -m pytest"
    
    if verbose:
        base_cmd += " -v"
    
    test_commands = {
        "unit": f"{base_cmd} -m unit backend/tests/test_chatbot.py",
        "integration": f"{base_cmd} -m integration backend/tests/test_integration.py",
        "api": f"{base_cmd} backend/tests/test_api.py",
        "all": f"{base_cmd} backend/tests/",
        "coverage": f"{base_cmd} --cov=backend --cov-report=html --cov-report=term-missing backend/tests/",
    }
    
    if test_type not in test_commands:
        print(f"❌ Unknown test type: {test_type}")
        print(f"Available types: {', '.join(test_commands.keys())}")
        sys.exit(1)
    
    print(f"🧪 Running {test_type} tests...")
    result = subprocess.run(test_commands[test_type], shell=True)
    
    if result.returncode == 0:
        print(f"✅ {test_type} tests passed!")
    else:
        print(f"❌ {test_type} tests failed!")
        sys.exit(1)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run tests for Greek Derby RAG Chatbot")
    parser.add_argument("--type", default="all", help="Type of tests to run")
    parser.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
    
    args = parser.parse_args()
    run_tests(args.type, args.verbose)
```
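The runner validates `--type` by hand after parsing. argparse can do that check itself through `choices=`, which also lists the valid values in `--help` and exits with status 2 on a typo. The sketch below is a hypothetical restructuring along those lines. It mirrors the test types above (the `coverage` variant is omitted for brevity) and builds an argument list instead of a shell string, so `sys.executable` guarantees pytest runs from the current environment.

```python
import argparse
import subprocess
import sys
from typing import Dict, List, Optional

# Hypothetical mapping that mirrors the test types in run_tests.py
TEST_TARGETS: Dict[str, List[str]] = {
    "unit": ["-m", "unit", "backend/tests/test_chatbot.py"],
    "integration": ["-m", "integration", "backend/tests/test_integration.py"],
    "api": ["backend/tests/test_api.py"],
    "all": ["backend/tests/"],
}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run tests for Greek Derby RAG Chatbot")
    # choices= makes argparse reject unknown types and show the valid ones in --help
    parser.add_argument("--type", default="all", choices=sorted(TEST_TARGETS))
    parser.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
    return parser.parse_args(argv)


def build_pytest_command(test_type: str, verbose: bool) -> List[str]:
    """Return the pytest invocation as an argument list (no shell needed)."""
    command = [sys.executable, "-m", "pytest"]
    if verbose:
        command.append("-v")
    command.extend(TEST_TARGETS[test_type])
    return command


def run(argv: Optional[List[str]] = None) -> int:
    """Parse arguments, run pytest, and return its exit code."""
    args = parse_args(argv)
    return subprocess.run(build_pytest_command(args.type, args.verbose)).returncode
```

Separating argument parsing, command building, and execution means the first two can be unit-tested without actually launching pytest.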

#### 6. **Code Quality and Linting**
```python
#!/usr/bin/env python3
# scripts/lint_code.py
"""Code quality and linting script"""

import subprocess
import sys


def run_linter(tool: str, files: str = "backend/") -> None:
    """Run one linter (or all of them) and exit non-zero if it reports issues"""
    commands = {
        "black": f"black --check --diff {files}",
        "isort": f"isort --check-only --diff {files}",
        "flake8": f"flake8 {files}",
        "mypy": f"mypy {files}",
        # && stops at the first failing tool, so later tools are skipped
        "all": (
            f"black --check {files} && isort --check-only {files}"
            f" && flake8 {files} && mypy {files}"
        ),
    }

    if tool not in commands:
        print(f"❌ Unknown linter: {tool}")
        print(f"Available linters: {', '.join(commands.keys())}")
        sys.exit(1)

    print(f"🔍 Running {tool} on {files}...")
    result = subprocess.run(commands[tool], shell=True)

    if result.returncode == 0:
        print(f"✅ {tool} passed!")
    else:
        print(f"❌ {tool} found issues!")
        # Exit non-zero for every tool, including "all", so CI and hooks fail
        sys.exit(1)

if __name__ == "__main__":
    tool = sys.argv[1] if len(sys.argv) > 1 else "all"
    files = sys.argv[2] if len(sys.argv) > 2 else "backend/"
    run_linter(tool, files)
```
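Chaining the tools with `&&` stops at the first failure, so one run only reveals the first tool's problems. A hypothetical alternative is to run every check, collect the failures, and report them together. Each tool is invoked via `python -m`, which black, isort, flake8 and mypy all support.

```python
import subprocess
import sys
from typing import Dict, List, Sequence


def run_all(checks: Dict[str, Sequence[str]]) -> int:
    """Run every check even after a failure; return 1 if any check failed, else 0."""
    failed: List[str] = []
    for name, args in checks.items():
        result = subprocess.run(list(args))
        if result.returncode == 0:
            print(f"✅ {name} passed")
        else:
            print(f"❌ {name} found issues")
            failed.append(name)
    if failed:
        print(f"Failed checks: {', '.join(failed)}")
        return 1
    return 0


def default_checks(target: str = "backend/") -> Dict[str, List[str]]:
    """Build the check commands; dicts keep insertion order, so checks run in this order."""
    py = sys.executable
    return {
        "black": [py, "-m", "black", "--check", target],
        "isort": [py, "-m", "isort", "--check-only", target],
        "flake8": [py, "-m", "flake8", target],
        "mypy": [py, "-m", "mypy", target],
    }
```

A script would finish with `sys.exit(run_all(default_checks()))`, which gives CI a single failing status while still printing every tool's verdict.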

### Production Deployment Workflow:

#### 1. **Docker Multi-stage Build**
```dockerfile
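# Dockerfile
# Stage 1: build wheels; compilers are only needed in this stage
FROM python:3.9-slim AS builder

ENV PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# Install system dependencies for compiling native extensions
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /build

# Copy requirements first for better layer caching
COPY requirements.txt .
RUN pip wheel --wheel-dir /wheels -r requirements.txt

# Stage 2: runtime image without gcc/g++
FROM python:3.9-slim AS runtime

# Set environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# curl is used by the HEALTHCHECK below and is not included in slim images
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Create and set working directory
WORKDIR /app

# Install only the prebuilt wheels, without contacting the package index
COPY --from=builder /wheels /wheels
COPY requirements.txt .
RUN pip install --no-index --find-links=/wheels -r requirements.txt

# Copy application code
COPY backend/ ./backend/

# Create non-root user
RUN useradd --create-home --shell /bin/bash app && chown -R app:app /app
USER app

# Expose port
EXPOSE 8000

# Health check (assumes the API serves GET /health)
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Run the application
CMD ["python", "-m", "uvicorn", "backend.api.greek_derby_api:app", "--host", "0.0.0.0", "--port", "8000"]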
```
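The HEALTHCHECK relies on `curl`, which `python:*-slim` images do not ship by default. Because Python is already in the image, a small stdlib script can perform the same probe instead. This is a sketch that assumes the API answers `GET /health` with a 2xx status; the function name `is_healthy` and a file location such as `backend/healthcheck.py` are hypothetical.

```python
import http.client
import urllib.request


def is_healthy(url: str = "http://localhost:8000/health", timeout: float = 5.0) -> bool:
    """Return True only if the endpoint answers with a 2xx status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # OSError covers URLError (connection refused, DNS), HTTPError (4xx/5xx)
        # and socket timeouts; HTTPException covers malformed HTTP responses
        return False
```

Saved as a script ending in `sys.exit(0 if is_healthy() else 1)`, it could replace the curl command as `CMD python backend/healthcheck.py`, since Docker only looks at the exit status (0 healthy, 1 unhealthy).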

#### 2. **Environment Configuration**
```python
# backend/config.py
"""Configuration management for the application"""

from pydantic import Field

# In pydantic v2, BaseSettings lives in the separate pydantic-settings
# package (pip install pydantic-settings), not in pydantic itself
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Application settings loaded from environment variables and .env"""

    # With case_sensitive=False, each field is read from the env var of the
    # same name in any case (e.g. openai_api_key <- OPENAI_API_KEY)
    model_config = SettingsConfigDict(env_file=".env", case_sensitive=False)

    # Application metadata
    app_name: str = "Greek Derby RAG Chatbot"
    app_version: str = "1.0.0"
    debug: bool = False

    # OpenAI Configuration (fields without a default are required)
    openai_api_key: str
    openai_model: str = "gpt-4o-mini"
    openai_embedding_model: str = "text-embedding-3-small"

    # Pinecone Configuration
    pinecone_api_key: str
    # The env var name differs from the field name, so map it explicitly
    pinecone_index_name: str = Field(validation_alias="PINECONE_GREEK_DERBY_INDEX_NAME")
    pinecone_environment: str = "us-west1-gcp"

    # Web Scraping Configuration
    user_agent: str = "Greek Derby Bot/1.0"
    scraping_delay: float = 1.0

    # API Server Configuration
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    api_workers: int = 1

    # CORS Configuration: list values are parsed from JSON,
    # e.g. CORS_ORIGINS='["http://localhost:3000"]'
    cors_origins: list[str] = ["*"]

    # Logging Configuration
    log_level: str = "INFO"


# Global settings instance
settings = Settings()
```
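To make clear what `BaseSettings` automates (case-insensitive lookup, defaults, required fields, type coercion), here is a hypothetical stdlib-only equivalent for a handful of the fields. It is an illustration, not a replacement: pydantic-settings also reads `.env`, validates every field, and reports all errors at once.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping


class MissingSettingError(KeyError):
    """Raised when a required environment variable is absent."""


@dataclass(frozen=True)
class AppSettings:
    """Hypothetical hand-rolled subset of Settings, for illustration only."""

    openai_api_key: str
    openai_model: str = "gpt-4o-mini"
    debug: bool = False
    api_port: int = 8000
    cors_origins: List[str] = field(default_factory=lambda: ["*"])

    @classmethod
    def from_env(cls, environ: Mapping[str, str]) -> "AppSettings":
        # Upper-case every key so lookups ignore case, like case_sensitive=False
        env = {key.upper(): value for key, value in environ.items()}
        if "OPENAI_API_KEY" not in env:
            raise MissingSettingError("OPENAI_API_KEY")
        kwargs: Dict[str, Any] = {"openai_api_key": env["OPENAI_API_KEY"]}
        if "OPENAI_MODEL" in env:
            kwargs["openai_model"] = env["OPENAI_MODEL"]
        if "DEBUG" in env:
            # Env values are strings, so booleans must be interpreted explicitly
            kwargs["debug"] = env["DEBUG"].strip().lower() in {"1", "true", "yes", "on"}
        if "API_PORT" in env:
            kwargs["api_port"] = int(env["API_PORT"])  # ValueError on non-numeric input
        if "CORS_ORIGINS" in env:
            # Mirrors pydantic-settings, which parses list fields from JSON
            kwargs["cors_origins"] = json.loads(env["CORS_ORIGINS"])
        return cls(**kwargs)
```

Passing a mapping instead of reading `os.environ` directly keeps the loader testable; calling `AppSettings.from_env(os.environ)` gives the real behaviour.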

### Key Takeaways and Best Practices:

1. **Use modern Python tooling** (pyproject.toml, pre-commit, type hints)
2. **Implement comprehensive testing** (unit, integration, async)
3. **Maintain code quality** with automated linting and formatting
4. **Use proper project structure** for scalability
5. **Implement proper configuration management** for different environments
6. **Use Docker for consistent deployments**
7. **Follow Python best practices** (PEP 8, type hints, documentation)
8. **Implement proper error handling and logging**
9. **Use async programming** for I/O-bound work such as API and vector-store calls, so requests overlap instead of waiting in sequence
10. **Profile and optimize** your code regularly
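The async takeaway deserves a caveat: `asyncio` speeds things up only while the program is waiting on I/O, because a coroutine that awaits hands control back to the event loop. CPU-bound work gains nothing. The sketch below simulates three I/O-bound lookups with `asyncio.sleep`. The source names are hypothetical stand-ins for, say, an HTTP request and a Pinecone query.

```python
import asyncio
import time
from typing import List, Tuple


async def fetch_source(name: str, delay: float) -> str:
    """Stand-in for an I/O-bound call such as an HTTP request or vector-store query."""
    await asyncio.sleep(delay)  # yields to the event loop instead of blocking it
    return f"{name}: done"


async def fetch_all(sources: List[Tuple[str, float]]) -> Tuple[List[str], float]:
    start = time.perf_counter()
    # gather runs the coroutines concurrently and returns results in input order
    results = await asyncio.gather(*(fetch_source(n, d) for n, d in sources))
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(fetch_all([("wikipedia", 0.2), ("news", 0.2), ("stats", 0.2)]))
print(results, f"{elapsed:.2f}s")
```

Run sequentially, the three 0.2 s waits would take about 0.6 s. With `gather` the total is close to the longest single wait, about 0.2 s.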

---

**🎉 Congratulations!** You now have a comprehensive understanding of Python concepts and how to implement them in a real-world RAG chatbot project. This knowledge will help you build robust, scalable, and maintainable Python applications.

### Next Steps:

1. **Practice the concepts** by implementing them in your own projects
2. **Explore advanced Python features** like metaclasses, decorators, and generators
3. **Learn about Python performance optimization** techniques
4. **Study design patterns** and their Python implementations
5. **Contribute to open-source Python projects** to gain real-world experience
