# 03: Reset Database

## What This Notebook Does
This notebook resets the database by dropping all tables created during data ingestion. Use this notebook when you want to start fresh and re-run `01_data_ingestion.ipynb`.

**⚠️ WARNING**: This will delete all data from the following tables:
- `fund_holdings` - All holdings data
- `fund_trades` - All trades data

**Use Cases:**
- Starting fresh with new data
- Fixing data ingestion issues
- Testing data ingestion process
- Re-running ingestion after schema changes

## What Each Cell Does

**Cell 1: Imports and Setup**
- Imports required libraries (sqlalchemy, dotenv)
- Loads environment variables (database URL)
- Sets up database connection

**Cell 2: Drop Tables**
- Connects to the database and checks which tables exist
- Prints each table's row count before dropping it
- Drops `fund_holdings` and `fund_trades` (if they exist)
- Removes their indexes and constraints along with them

**Cell 3: Verify Reset**
- Checks that tables no longer exist
- Confirms database is ready for fresh ingestion
- Provides next steps instructions

In [None]:
# Cell 1: Imports and Setup
#
# WHAT THIS CELL DOES:
# - Imports required libraries for database operations
# - Loads database connection URL from environment variables
# - Sets up SQLAlchemy connection string
# - Displays warning about data deletion
#
# LOGIC:
# - Uses dotenv to load .env file from parent directory
# - Constructs database URL from environment variable or default
# - Ensures psycopg driver is specified (required for SQLAlchemy)
# - Shows warning to prevent accidental data loss

import os  # For environment variables
from pathlib import Path  # For path handling
from dotenv import load_dotenv  # For loading .env file
from sqlalchemy import create_engine, text, inspect, pool  # For database operations

# Project root: one level up from the notebook's working directory (notebook2/ -> loop-task/)
project_root = Path.cwd().parent

# Load environment variables from the .env file that defines DATABASE_URL
# Note: project_root.parent is one level above loop-task/, i.e. two levels above this notebook
load_dotenv(project_root.parent / ".env")

# Database URL from environment variable or default
# Logic: Defaults to localhost if not set, ensures psycopg driver is specified
DATABASE_URL = os.getenv("DATABASE_URL", "postgresql://postgres:postgres@localhost:5432/fund_data")
# Ensure psycopg driver is in URL (required for SQLAlchemy with PostgreSQL)
if DATABASE_URL.startswith("postgresql://") and "+psycopg" not in DATABASE_URL:
    DATABASE_URL = DATABASE_URL.replace("postgresql://", "postgresql+psycopg://", 1)

print("✅ Imports loaded")
print(f"   Database: {DATABASE_URL.split('@')[-1] if '@' in DATABASE_URL else DATABASE_URL}")
print("\n⚠️  WARNING: This will delete all data from fund_holdings and fund_trades tables!")

✅ Imports loaded
   Database: localhost:5432/funddb
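
The driver normalization and credential masking in Cell 1 can also be written as two small helpers, which makes them easy to check without a database. This is only a sketch using the standard library: the names `ensure_psycopg_driver` and `mask_credentials` are hypothetical and do not appear in the notebook.

```python
def ensure_psycopg_driver(url: str) -> str:
    """Return url with the psycopg driver named explicitly, as Cell 1 does."""
    if url.startswith("postgresql://"):
        return url.replace("postgresql://", "postgresql+psycopg://", 1)
    return url  # Already names a driver, or is not a PostgreSQL URL


def mask_credentials(url: str) -> str:
    """Return only the host/port/database part so the password is never printed."""
    # rpartition splits on the LAST "@", so an "@" inside the password is handled
    _, sep, tail = url.rpartition("@")
    return tail if sep else url


# Same default URL that Cell 1 falls back to
example = "postgresql://postgres:postgres@localhost:5432/fund_data"
normalized = ensure_psycopg_driver(example)
print(normalized)
print(mask_credentials(normalized))
```

Splitting on the last `@` gives the same result as the `split('@')[-1]` expression in Cell 1; the helper form just keeps the logic in one place if other notebooks need it.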



In [None]:
# Cell 2: Drop Tables
#
# WHAT THIS CELL DOES:
# - Connects to PostgreSQL database
# - Checks which tables exist
# - Shows row counts before deletion (for confirmation)
# - Drops fund_holdings and fund_trades tables
# - Removes the tables' indexes and constraints along with them
#
# LOGIC:
# - Connection: Creates SQLAlchemy engine with connection pooling
# - Table inspection: Uses SQLAlchemy inspector to check existing tables
# - Row count: Executes COUNT(*) before dropping to show what will be deleted
# - CASCADE: Also drops objects that depend on the table (e.g. views, foreign keys
#   in other tables); the table's own indexes and constraints go with it regardless
# - IF EXISTS: Prevents errors if tables don't exist
# - Transaction: Commits each drop operation

# Create database engine
# Logic: Same configuration as other notebooks for consistency
engine = create_engine(
    DATABASE_URL,
    poolclass=pool.QueuePool,  # Connection pooling
    pool_size=10,
    max_overflow=20,
    pool_pre_ping=True,  # Test connections before use
    echo=False
)

# Test connection
# Logic: Simple SELECT 1 query verifies database is accessible
try:
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))
    print("✅ Database connected")
except Exception as e:
    print(f"❌ Database connection failed: {e}")
    raise

# Drop tables (their indexes and constraints are dropped with them)
# Logic: CASCADE additionally drops dependent objects such as views or foreign keys in other tables
with engine.connect() as conn:
    # Check if tables exist before dropping
    # Logic: Uses SQLAlchemy inspector to get list of existing tables
    inspector = inspect(engine)
    existing_tables = inspector.get_table_names()
    
    tables_to_drop = ["fund_holdings", "fund_trades"]
    
    for table_name in tables_to_drop:
        if table_name in existing_tables:
            # Get row count before dropping
            # Logic: Shows user how much data will be deleted
            try:
                count_result = conn.execute(text(f"SELECT COUNT(*) FROM {table_name}"))
                row_count = count_result.scalar()
                print(f"\n📊 {table_name}: {row_count} rows")
            except Exception as e:
                # A failed statement aborts the PostgreSQL transaction, so roll back
                # before issuing the DROP; the count is informational only
                conn.rollback()
                print(f"\n⚠️  Could not count rows in {table_name}: {e}")
            
            # Drop table; CASCADE also removes dependent objects
            # Logic: IF EXISTS prevents errors if the table is already gone
            conn.execute(text(f"DROP TABLE IF EXISTS {table_name} CASCADE"))
            conn.commit()  # Commit the drop operation
            print(f"✅ Dropped table: {table_name}")
        else:
            print(f"ℹ️  Table does not exist: {table_name}")

print("\n✅ All tables dropped successfully")

✅ Database connected

📊 fund_holdings: 1020 rows
✅ Dropped table: fund_holdings

📊 fund_trades: 649 rows
✅ Dropped table: fund_trades

✅ All tables dropped successfully
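
The count-then-drop flow of Cell 2 can be tried without touching PostgreSQL. The sketch below uses the standard library's `sqlite3` with an in-memory database as a stand-in, so it is not the notebook's actual setup: SQLite has no `CASCADE` clause, and the `drop_tables` helper, the `ticker` column, and the sample rows are all made up for illustration.

```python
import sqlite3

TABLES_TO_DROP = ("fund_holdings", "fund_trades")


def drop_tables(conn: sqlite3.Connection, tables=TABLES_TO_DROP) -> dict:
    """Drop each listed table that exists; return {table_name: rows_before_drop}."""
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    dropped = {}
    for name in tables:
        if name not in existing:
            continue  # Mirrors the "Table does not exist" branch in Cell 2
        # The default names are a fixed tuple, not user input; validate names
        # before interpolating them into SQL if that ever changes.
        row_count = conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
        conn.execute(f"DROP TABLE IF EXISTS {name}")  # No CASCADE in SQLite
        conn.commit()
        dropped[name] = row_count
    return dropped


# Throwaway in-memory database with hypothetical rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fund_holdings (id INTEGER PRIMARY KEY, ticker TEXT)")
conn.executemany("INSERT INTO fund_holdings (ticker) VALUES (?)", [("AAA",), ("BBB",)])
conn.commit()

result = drop_tables(conn)
print(result)  # fund_trades was never created, so only fund_holdings is reported
```

Returning the row counts instead of only printing them makes it possible to log or assert on what was deleted.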


In [None]:
# Cell 3: Verify Reset
#
# WHAT THIS CELL DOES:
# - Verifies that tables were successfully deleted
# - Lists any remaining tables in the database
# - Provides next steps instructions
# - Confirms database is ready for fresh data ingestion
#
# LOGIC:
# - Table inspection: Uses SQLAlchemy inspector to check remaining tables
# - Verification: Checks if target tables (fund_holdings, fund_trades) still exist
# - Status reporting: Shows success or warning if tables still exist
# - Next steps: Provides clear instructions for re-running data ingestion

# Verify tables are deleted
# Logic: Inspects database to confirm tables no longer exist
inspector = inspect(engine)
remaining_tables = inspector.get_table_names()

target_tables = ["fund_holdings", "fund_trades"]
# Check if any target tables still exist
# Logic: Compares target list with remaining tables
tables_still_exist = [t for t in target_tables if t in remaining_tables]

if tables_still_exist:
    print(f"⚠️  Warning: These tables still exist: {tables_still_exist}")
else:
    print("✅ Verification: All target tables have been deleted")
    print(f"   Remaining tables in database: {remaining_tables if remaining_tables else 'None'}")

print("\n" + "=" * 80)
print("DATABASE RESET COMPLETE")
print("=" * 80)
print("\n✅ Next Steps:")
print("   1. Run notebook: 01_data_ingestion.ipynb")
print("   2. This will recreate the tables and load fresh data")
print("   3. Then run notebook: 02_test_chatbot.ipynb to verify")
print("\n💡 Tip: You can now safely re-run 01_data_ingestion.ipynb")

✅ Verification: All target tables have been deleted
   Remaining tables in database: None

DATABASE RESET COMPLETE

✅ Next Steps:
   1. Run notebook: 01_data_ingestion.ipynb
   2. This will recreate the tables and load fresh data
   3. Then run notebook: 02_test_chatbot.ipynb to verify

💡 Tip: You can now safely re-run 01_data_ingestion.ipynb

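
Cell 3 only prints a warning when a target table survives the reset. If the notebook is ever run unattended, it may be preferable to stop with an error instead. The following is a minimal sketch of that idea; `assert_tables_dropped` is a hypothetical helper, not part of the notebook.

```python
def assert_tables_dropped(remaining_tables, target_tables=("fund_holdings", "fund_trades")):
    """Raise RuntimeError if any target table is still present; otherwise return None."""
    leftover = sorted(set(target_tables) & set(remaining_tables))
    if leftover:
        raise RuntimeError(f"Reset incomplete, tables still exist: {leftover}")


# Both calls pass silently: no target table is in the list of remaining tables
assert_tables_dropped([])
assert_tables_dropped(["some_other_table"])
```

In the notebook it could be called as `assert_tables_dropped(inspect(engine).get_table_names())` after Cell 2, assuming the `engine` and `inspect` names from the earlier cells are still in scope.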