# Reddit API Data Retrieval Prototype

**Purpose**: Prototype of the data-retrieval stage for a streaming, Reddit-based sentiment analysis pipeline

**Features**:
- Rate-limited Reddit API access (600 requests/10min)
- Exponential backoff and circuit breaker patterns
- Structured data storage with SQLite
- Keyword filtering and metadata extraction
- API usage monitoring and error handling

---

## 1. Setup & Configuration

In [1]:
# Dependencies installation
!pip install praw pandas python-dotenv requests

Collecting praw
  Downloading praw-7.8.1-py3-none-any.whl.metadata (9.4 kB)
Collecting prawcore<3,>=2.4 (from praw)
  Downloading prawcore-2.4.0-py3-none-any.whl.metadata (5.0 kB)
Collecting update_checker>=0.18 (from praw)
  Downloading update_checker-0.18.0-py3-none-any.whl.metadata (2.3 kB)
Downloading praw-7.8.1-py3-none-any.whl (189 kB)
Downloading prawcore-2.4.0-py3-none-any.whl (17 kB)
Downloading update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Installing collected packages: update_checker, prawcore, praw
Successfully installed praw-7.8.1 prawcore-2.4.0 update_checker-0.18.0


In [1]:
import praw
import pandas as pd
import sqlite3
import time
import json
import logging
import requests
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Union
from dataclasses import dataclass, asdict
from collections import deque
import os
from dotenv import load_dotenv
import threading
import queue
from enum import Enum

# Load environment variables
load_dotenv()

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('reddit_api.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

In [2]:
# Configuration
@dataclass
class RedditConfig:
    client_id: str
    client_secret: str
    user_agent: str
    username: Optional[str] = None
    password: Optional[str] = None
    
    # Rate limiting
    max_requests_per_window: int = 600
    window_duration_minutes: int = 10
    base_delay: float = 1.0
    max_delay: float = 60.0
    max_retries: int = 5
    circuit_breaker_threshold: int = 5
    
    # Target subreddits and keywords
    target_subreddits: Optional[List[str]] = None
    target_keywords: Optional[List[str]] = None
    
    def __post_init__(self):
        if self.target_subreddits is None:
            self.target_subreddits = ['technology', 'politics', 'investing', 'MachineLearning']
        if self.target_keywords is None:
            self.target_keywords = ['AI', 'interest rates', 'EVs', 'recession', 'inflation']

# Initialize configuration
config = RedditConfig(
    client_id=os.getenv('REDDIT_CLIENT_ID', 'your_client_id'),
    client_secret=os.getenv('REDDIT_CLIENT_SECRET', 'your_client_secret'),
    user_agent=os.getenv('REDDIT_USER_AGENT', 'SentimentAnalyzer:v1.0 (by /u/your_username)'),
    username=os.getenv('REDDIT_USERNAME'),
    password=os.getenv('REDDIT_PASSWORD')
)

print(f"Configuration loaded for subreddits: {config.target_subreddits}")
print(f"Target keywords: {config.target_keywords}")

Configuration loaded for subreddits: ['technology', 'politics', 'investing', 'MachineLearning']
Target keywords: ['AI', 'interest rates', 'EVs', 'recession', 'inflation']


## 2. Data Models

In [3]:
class ContentType(Enum):
    POST = "post"
    COMMENT = "comment"

@dataclass
class RedditPost:
    id: str
    title: str
    content: str
    upvotes: int
    timestamp: datetime
    subreddit: str
    author: str
    author_karma: int
    url: str
    num_comments: int
    content_type: str = ContentType.POST.value
    
    def to_dict(self) -> Dict:
        data = asdict(self)
        data['timestamp'] = self.timestamp.isoformat()
        return data

@dataclass
class RedditComment:
    id: str
    parent_id: str
    content: str
    upvotes: int
    timestamp: datetime
    subreddit: str
    author: str
    author_karma: int
    post_id: str
    content_type: str = ContentType.COMMENT.value
    
    def to_dict(self) -> Dict:
        data = asdict(self)
        data['timestamp'] = self.timestamp.isoformat()
        return data
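
The `to_dict` methods exist so that records can be written out as JSON, which cannot encode `datetime` values directly. A minimal sketch of the round trip, assuming a trimmed stand-in class (`Post` is hypothetical and carries only a few of the `RedditPost` fields) and made-up sample values:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Dict


@dataclass
class Post:
    """Trimmed stand-in for RedditPost, used only for this illustration."""
    id: str
    title: str
    upvotes: int
    timestamp: datetime

    def to_dict(self) -> Dict:
        data = asdict(self)
        # datetime is not JSON-serialisable, so store it as an ISO 8601 string
        data['timestamp'] = self.timestamp.isoformat()
        return data


post = Post(id="abc123", title="Example title", upvotes=42,
            timestamp=datetime(2025, 8, 6, 1, 21, 3))
line = json.dumps(post.to_dict())

# Round trip: parse the JSON line and restore the datetime
restored = json.loads(line)
restored['timestamp'] = datetime.fromisoformat(restored['timestamp'])
```

One JSON object per line is a convenient hand-off format for a downstream sentiment stage, though that choice is an assumption rather than something this prototype prescribes.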

## 3. Rate Limiting & Circuit Breaker Implementation
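
As background for the cell that follows, a sliding-window limiter can report how long a caller has to wait before the next request is allowed. The sketch below is a standalone illustration: `SlidingWindowLimiter` and its methods are hypothetical names, not part of PRAW or of this notebook's client, and the injectable `clock` is there only to make the behaviour easy to check.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_seconds` span (hypothetical helper)."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.calls: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds until another request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have slid out of the window
        while self.calls and now - self.calls[0] >= self.window_seconds:
            self.calls.popleft()
        if len(self.calls) < self.max_requests:
            return 0.0
        # The oldest call leaves the window at calls[0] + window_seconds
        return self.calls[0] + self.window_seconds - now

    def record(self) -> None:
        """Register that a request was just made."""
        self.calls.append(self.clock())
```

A caller would run `time.sleep(limiter.wait_time())` before each request and `limiter.record()` after it. For the limit configured earlier (600 requests per 10 minutes) that would be `SlidingWindowLimiter(600, 600.0)`.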

In [16]:
class CircuitBreakerState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class RateLimitedRedditClient:
    def __init__(self, config: RedditConfig):
        self.config = config
        self.circuit_state = CircuitBreakerState.CLOSED
        self.failure_count = 0
        self.last_failure_time = None
        self.request_times = deque(maxlen=config.max_requests_per_window)
        self.requests_made = 0
        self.requests_failed = 0
        
        # Initialize Reddit client - READ-ONLY mode (no username/password for data collection)
        # This avoids invalid_grant errors by using client credentials only
        self.reddit = praw.Reddit(
            client_id=config.client_id,
            client_secret=config.client_secret,
            user_agent=config.user_agent
            # Note: Intentionally NOT including username/password for read-only access
        )
        
        logger.info("Reddit client initialized in read-only mode")
    
    def _check_rate_limit(self) -> bool:
        """Check if we're within rate limits"""
        now = datetime.now()
        window_start = now - timedelta(minutes=self.config.window_duration_minutes)
        
        # Remove old requests outside the window
        while self.request_times and self.request_times[0] < window_start:
            self.request_times.popleft()
        
        # Check if we're at the limit
        return len(self.request_times) < self.config.max_requests_per_window

    def make_request(self, request_func, *args, **kwargs):
        """Make rate-limited request with circuit breaker"""
        if self.circuit_state == CircuitBreakerState.OPEN:
            raise Exception("Circuit breaker is OPEN")
        
        for attempt in range(self.config.max_retries):
            try:
                # Block until the sliding window has room for another request
                while not self._check_rate_limit():
                    time.sleep(1)
                
                # Make the request
                result = request_func(*args, **kwargs)
                self.request_times.append(datetime.now())
                self.requests_made += 1
                self.failure_count = 0
                return result
                
            except Exception as e:
                self.failure_count += 1
                self.requests_failed += 1
                
                logger.error(f"Request failed (attempt {attempt + 1}): {e}")
                
                if self.failure_count >= self.config.circuit_breaker_threshold:
                    self.circuit_state = CircuitBreakerState.OPEN
                    logger.error(f"Circuit breaker OPEN after {self.failure_count} failures")
                
                if attempt < self.config.max_retries - 1:
                    delay = min(self.config.base_delay * (2 ** attempt), self.config.max_delay)
                    logger.info(f"Retrying in {delay} seconds...")
                    time.sleep(delay)
                else:
                    raise  # Re-raise with the original traceback

## 4. Data Collection Functions

In [17]:
class RedditDataCollector:
    def __init__(self, config: RedditConfig):
        self.config = config
        self.client = RateLimitedRedditClient(config)
        self.collected_posts = []
        self.collected_comments = []
    
    def _extract_post_data(self, submission) -> Optional[RedditPost]:
        """Extract data from a Reddit submission; returns None if extraction fails"""
        try:
            # Reading karma lazily fetches the author's profile, which costs an
            # extra API request per post that make_request() does not count
            author = submission.author
            return RedditPost(
                id=submission.id,
                title=submission.title,
                content=submission.selftext or "",
                upvotes=submission.score,
                # created_utc is a UTC epoch; without tz, fromtimestamp() returns naive local time
                timestamp=datetime.fromtimestamp(submission.created_utc),
                subreddit=submission.subreddit.display_name,
                author=str(author) if author else "[deleted]",
                author_karma=(author.comment_karma + author.link_karma) if author else 0,
                url=submission.url,
                num_comments=submission.num_comments
            )
        except Exception as e:
            logger.error(f"Error extracting post data: {e}")
            return None

    def collect_subreddit_posts(self, subreddit_name: str, limit: int = 10) -> List[RedditPost]:
        """Collect posts from a specific subreddit"""
        logger.info(f"Collecting {limit} posts from r/{subreddit_name}")
        
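        def _get_subreddit_posts():
            subreddit = self.client.reddit.subreddit(subreddit_name)
            # hot() returns a lazy generator; list() forces the HTTP request to
            # happen here, inside make_request(), so retries and counters apply
            return list(subreddit.hot(limit=limit))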
        
        try:
            submissions = self.client.make_request(_get_subreddit_posts)
            posts = []
            
            for submission in submissions:
                post_data = self._extract_post_data(submission)
                if post_data:
                    posts.append(post_data)
                    logger.info(f"Collected post: {post_data.title[:50]}...")
            
            self.collected_posts.extend(posts)
            logger.info(f"Successfully collected {len(posts)} posts from r/{subreddit_name}")
            return posts

        except Exception as e:
            logger.error(f"Failed to collect posts from r/{subreddit_name}: {e}")
            return []

    def collect_all_data(self, posts_per_subreddit: int = 5) -> Dict:
        """Collect data from all target subreddits"""
        logger.info(f"Starting data collection from {len(self.config.target_subreddits)} subreddits")
        
        all_posts = []
        
        for subreddit in self.config.target_subreddits:
            try:
                posts = self.collect_subreddit_posts(subreddit, limit=posts_per_subreddit)
                all_posts.extend(posts)
                time.sleep(0.5)  # Small delay between subreddits
            except Exception as e:
                logger.error(f"Error collecting data from r/{subreddit}: {e}")
                continue
        
        results = {
            'posts': all_posts,
            'collection_time': datetime.now().isoformat(),
            'requests_made': self.client.requests_made,
            'requests_failed': self.client.requests_failed
        }
        
        logger.info(f"Data collection completed: {len(all_posts)} posts")
        return results

## 5. Data Storage

In [18]:
class RedditDataStorage:
    def __init__(self, db_path: str = 'reddit_data.db'):
        self.db_path = db_path
        self.init_database()
    
    def init_database(self):
        """Initialize SQLite database"""
        with sqlite3.connect(self.db_path) as conn:
            cursor = conn.cursor()
            
            cursor.execute('''
                CREATE TABLE IF NOT EXISTS posts (
                    id TEXT PRIMARY KEY,
                    title TEXT NOT NULL,
                    content TEXT,
                    upvotes INTEGER,
                    timestamp DATETIME,
                    subreddit TEXT,
                    author TEXT,
                    author_karma INTEGER,
                    url TEXT,
                    num_comments INTEGER,
                    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
                )
            ''')
            
            conn.commit()
        
        logger.info(f"Database initialized: {self.db_path}")
    
    def store_posts(self, posts: List[RedditPost]) -> int:
        """Store posts in database"""
        if not posts:
            return 0
        
        with sqlite3.connect(self.db_path) as conn:
            cursor = conn.cursor()
            stored_count = 0
            
            for post in posts:
                try:
                    cursor.execute('''
                        INSERT OR REPLACE INTO posts 
                        (id, title, content, upvotes, timestamp, subreddit, author, 
                         author_karma, url, num_comments)
                        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                    ''', (
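                        post.id, post.title, post.content, post.upvotes,
                        # Store an explicit ISO string; sqlite3's implicit datetime
                        # adapter is deprecated as of Python 3.12
                        post.timestamp.isoformat(sep=' '), post.subreddit, post.author,
                        post.author_karma, post.url, post.num_comments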
                    ))
                    stored_count += 1
                except Exception as e:
                    logger.error(f"Error storing post {post.id}: {e}")
            
            conn.commit()
        
        logger.info(f"Stored {stored_count} posts to database")
        return stored_count

    def get_data_summary(self) -> Dict:
        """Get summary of stored data"""
        with sqlite3.connect(self.db_path) as conn:
            cursor = conn.cursor()
            
            cursor.execute('SELECT COUNT(*) FROM posts')
            total_posts = cursor.fetchone()[0]
            
            cursor.execute('SELECT COUNT(DISTINCT subreddit) FROM posts')
            unique_subreddits = cursor.fetchone()[0]
            
            return {
                'total_posts': total_posts,
                'unique_subreddits': unique_subreddits,
                'database_size_mb': os.path.getsize(self.db_path) / 1024 / 1024 if os.path.exists(self.db_path) else 0
            }

## 6. Testing & Demonstration

In [19]:
# Test Reddit Authentication Before Data Collection
def test_reddit_connection(config: RedditConfig):
    """Test Reddit API connection with detailed debugging"""
    print("🔍 Testing Reddit API authentication...")
    
    # Check credentials first
    print("📋 Credential Check:")
    print(f"  Client ID: {'✅ Present' if config.client_id and config.client_id != 'your_client_id' else '❌ Missing/Default'}")
    print(f"  Client Secret: {'✅ Present' if config.client_secret and config.client_secret != 'your_client_secret' else '❌ Missing/Default'}")
    print(f"  User Agent: {'✅ Present' if config.user_agent and 'your_username' not in config.user_agent else '⚠️ Default (should be customized)'}")
    
    if config.client_id == 'your_client_id' or config.client_secret == 'your_client_secret':
        print("❌ CRITICAL: Default credentials detected!")
        print("📝 To get Reddit API credentials:")
        print("1. Go to https://www.reddit.com/prefs/apps")
        print("2. Click 'Create App' or 'Create Another App'")
        print("3. Choose 'script' for personal use")
        print("4. Copy the client ID (under the app name)")
        print("5. Copy the client secret")
        print("6. Add them to your .env file:")
        print("   REDDIT_CLIENT_ID=your_actual_client_id")
        print("   REDDIT_CLIENT_SECRET=your_actual_client_secret")
        return False
    
    try:
        # Test read-only access (no username/password to avoid invalid_grant errors)
        print("🔗 Testing read-only API access...")
        test_reddit = praw.Reddit(
            client_id=config.client_id,
            client_secret=config.client_secret,
            user_agent=config.user_agent
            # Note: No username/password for read-only access
        )
        
        # display_name is resolved locally by PRAW, so the first real network
        # call (and therefore the actual credential check) is the listing below
        test_subreddit = test_reddit.subreddit('test')
        posts = list(test_subreddit.hot(limit=1))
        
        print("✅ Authentication successful!")
        print(f"   Connected to r/{test_subreddit.display_name}")
        
        if posts:
            print(f"   Sample post: {posts[0].title[:50]}...")
            print("✅ Read access confirmed!")
        else:
            print("✅ Authentication works (no posts in test subreddit)")
        return True
            
    except Exception as e:
        print(f"❌ Authentication failed: {e}")
        print("💡 Common fixes:")
        print("1. Verify your REDDIT_CLIENT_ID is correct")
        print("2. Verify your REDDIT_CLIENT_SECRET is correct") 
        print("3. Ensure your Reddit app type is 'script' at https://reddit.com/prefs/apps")
        print("4. Make sure your user agent is unique and descriptive")
        print("5. Check if your Reddit account email is verified")
        
        # Additional debugging for specific errors
        if "invalid_grant" in str(e):
            print("🔍 invalid_grant Error Analysis:")
            print("  - This usually occurs with username/password auth issues")
            print("  - For data collection, we use read-only mode (no username/password needed)")
            print("  - Your app type should be 'script', not 'web app'")
        elif "401" in str(e):
            print("🔍 401 Error Analysis:")
            print("  - Invalid client_id or client_secret")
            print("  - App might be deleted or suspended")
            print("  - Check https://reddit.com/prefs/apps for your app status")
        elif "403" in str(e):
            print("🔍 403 Error Analysis:")
            print("  - Account might be suspended")
            print("  - Rate limiting (wait and try again)")
        
        return False

# Test authentication before proceeding
auth_success = test_reddit_connection(config)
if not auth_success:
    print("🛑 Please fix authentication before continuing!")
    print("Quick Setup Guide:")
    print("1. Copy .env.example to .env")
    print("2. Get Reddit credentials from https://reddit.com/prefs/apps")
    print("3. Update your .env file with real credentials")
    print("4. Restart this notebook")
else:
    print("🎉 Authentication successful! Ready to collect data.")

🔍 Testing Reddit API authentication...
📋 Credential Check:
  Client ID: ✅ Present
  Client Secret: ✅ Present
  User Agent: ✅ Present
🔗 Testing read-only API access...
✅ Authentication successful!
   Connected to r/test
   Sample post: Some test commands...
✅ Read access confirmed!
🎉 Authentication successful! Ready to collect data.


In [20]:
# Quick Test: Try collecting just 1 post from a reliable subreddit
print("🧪 Quick test: Collecting 1 post from r/test...")

try:
    # Reinitialize collector with fixed configuration
    test_collector = RedditDataCollector(config)
    
    # Test with a simple, reliable subreddit
    test_posts = test_collector.collect_subreddit_posts('test', limit=1)
    
    if test_posts:
        print(f"✅ SUCCESS! Retrieved {len(test_posts)} post(s)")
        for post in test_posts:
            print(f"   📰 {post.title[:60]}...")
            print(f"   👆 {post.upvotes} upvotes | 💬 {post.num_comments} comments")
    else:
        print("⚠️ No posts found in r/test, but authentication worked!")
    
    print("🎉 Authentication and data collection are working!")
    print("Ready to collect from target subreddits.")
    
except Exception as e:
    print(f"❌ Test failed: {e}")
    if "invalid_grant" in str(e):
        print("💡 Still getting invalid_grant - restart the notebook kernel and try again")
    else:
        print("💡 Different error - check your Reddit credentials")

2025-08-06 01:21:02,946 - INFO - Reddit client initialized in read-only mode
2025-08-06 01:21:02,947 - INFO - Collecting 1 posts from r/test


🧪 Quick test: Collecting 1 post from r/test...


2025-08-06 01:21:03,488 - INFO - Collected post: Some test commands...
2025-08-06 01:21:03,488 - INFO - Successfully collected 1 posts from r/test


✅ SUCCESS! Retrieved 1 post(s)
   📰 Some test commands...
   👆 41 upvotes | 💬 1765 comments
🎉 Authentication and data collection are working!
Ready to collect from target subreddits.


In [23]:
# Test: Collect sample data
print("🚀 Starting sample data collection...")
collector = RedditDataCollector(config)
storage = RedditDataStorage('reddit_prototype.db')
try:
    # Collect 2 posts from each subreddit
    results = collector.collect_all_data(posts_per_subreddit=2)
    
    print("📈 Collection Results:")
    print(f"  Posts collected: {len(results['posts'])}")
    print(f"  Collection time: {results['collection_time']}")
    
    # Store data
    posts_stored = storage.store_posts(results['posts'])
    
    print("💾 Storage Results:")
    print(f"  Posts stored: {posts_stored}")
    
except Exception as e:
    print(f"❌ Error during data collection: {e}")
    print("Make sure your Reddit API credentials are configured correctly in the .env file")

2025-08-06 01:22:47,705 - INFO - Reddit client initialized in read-only mode
2025-08-06 01:22:47,710 - INFO - Database initialized: reddit_prototype.db
2025-08-06 01:22:47,710 - INFO - Starting data collection from 4 subreddits
2025-08-06 01:22:47,711 - INFO - Collecting 2 posts from r/technology


🚀 Starting sample data collection...


2025-08-06 01:22:48,315 - INFO - Collected post: Grok generates fake Taylor Swift nudes without bei...
2025-08-06 01:22:48,434 - INFO - Collected post: White House Orders NASA to Destroy Important Satel...
2025-08-06 01:22:48,435 - INFO - Successfully collected 2 posts from r/technology
2025-08-06 01:22:48,940 - INFO - Collecting 2 posts from r/politics
2025-08-06 01:22:49,216 - INFO - Collected post: Discussion Thread: Texas House Convenes and Texas ...
2025-08-06 01:22:49,322 - INFO - Collected post: Republicans Subpoena Everyone and Anyone Over Epst...
2025-08-06 01:22:49,323 - INFO - Successfully collected 2 posts from r/politics
2025-08-06 01:22:49,829 - INFO - Collecting 2 posts from r/investing
2025-08-06 01:22:50,166 - INFO - Collected post: Daily General Discussion and Advice Thread - Augus...
2025-08-06 01:22:50,269 - INFO - Collected post: Annual PSA: Investing and Trading Scam Reminder...
2025-08-06 01:22:50,270 - INFO - Successfully collected 2 posts from r/investing
2025-

📈 Collection Results:
  Posts collected: 8
  Collection time: 2025-08-06T01:22:51.669024
💾 Storage Results:
  Posts stored: 8


In [24]:
# Display results
summary = storage.get_data_summary()
print("📊 Final Summary:")
for key, value in summary.items():
    if 'size' in key:
        print(f"  {key}: {value:.2f}")
    else:
        print(f"  {key}: {value}")

print("🚀 Ready for integration with sentiment analysis pipeline!")

📊 Final Summary:
  total_posts: 8
  unique_subreddits: 4
  database_size_mb: 0.02
🚀 Ready for integration with sentiment analysis pipeline!


## 7. Next Steps

This prototype demonstrates core Reddit API functionality. For production:

1. **Configure Reddit API**: Copy `.env.example` to `.env` and add credentials
2. **Add Sentiment Analysis**: Integrate VADER or RoBERTa models
3. **Scale Storage**: Move to PostgreSQL for production
4. **Add Monitoring**: Implement comprehensive metrics and alerting
5. **Create Dashboard**: Build Streamlit interface for visualization