# Tier 1 Analysis - Setup Instructions

## Prerequisites
1. **Python 3.8+** installed on your system
2. **Jupyter Notebook/Lab** installed
3. **Database access** to the Worldly database

## Setup Steps

### 1. Clone and Navigate to Project
```bash
git clone <https://github.com/higgco/isaac-at-worldly>
cd tier-1-grouping
```

### 2. Create Virtual Environment
```bash
# Create virtual environment
python -m venv tier-1-venv

# Activate virtual environment
# On macOS/Linux:
source tier-1-venv/bin/activate
# On Windows:
# tier-1-venv\Scripts\activate
```

### 3. Install Dependencies
```bash
pip install -r requirements.txt
```

### 4. Database Configuration
Create a `.env` file in the project root with your database credentials:

```bash
# Create .env file
touch .env
```

Add the following content to `.env`:
```env
# Database Configuration
DB_USER=your_username
DB_PASSWORD=your_password
DB_HOST=wg-data-rds.data.higg.org
DB_PORT=5432
DB_NAME=db_higg
```

**⚠️ Important:** Replace the placeholder values with your actual database credentials.

### 5. Start Jupyter
```bash
# Make sure virtual environment is activated
source tier-1-venv/bin/activate

# Start Jupyter
jupyter notebook
# or
jupyter lab
```

### 6. Select Kernel
- Open `tier 1 analysis.ipynb`
- Select **"Tier 1 Analysis"** as your kernel
- Run the cells in order

## Security Notes
- **Never commit** the `.env` file to version control
- **Keep credentials secure** and don't share them
- **Use different credentials** for different environments (dev/staging/prod)

## Troubleshooting
- **Connection issues**: Verify your database credentials in `.env`
- **Kernel not found**: Make sure you've activated the virtual environment
- **Package errors**: Run `pip install -r requirements.txt` again


In [None]:
# Direct Python Database Connection
import pandas as pd
import psycopg2
from sqlalchemy import create_engine, text
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

print("🔧 Setting up direct Python database connection...")

# Create connection string
DB_USER = os.getenv('DB_USER')
DB_PASSWORD = os.getenv('DB_PASSWORD')
DB_HOST = os.getenv('DB_HOST')
DB_PORT = os.getenv('DB_PORT')
DB_NAME = os.getenv('DB_NAME')

connection_string = f"postgresql+psycopg2://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"

print(f"Connection string: postgresql+psycopg2://{DB_USER}:***@{DB_HOST}:{DB_PORT}/{DB_NAME}")
print("✅ Environment variables loaded successfully!")


The sql extension is already loaded. To reload it, use:
  %reload_ext sql
Database connection established using .env file!
Connected to: wg-data-rds.data.higg.org:5432/db_higg


In [15]:
# Test Database Connection (Fixed)
try:
    # Test with a simple query
    result = %sql SELECT 1 as test_connection;
    print("✅ Database connection successful!")
    print("Test query result:", result[0][0])
except Exception as e:
    print("❌ Database connection failed:")
    print(f"Error: {e}")
    print("\nTroubleshooting steps:")
    print("1. Check your .env file has correct credentials")
    print("2. Verify database host is accessible")
    print("3. Ensure your IP is whitelisted (if required)")
    print("4. Check if the database server is running")


 * postgresql+psycopg2://isaac_hopwood:***@wg-data-rds.data.higg.org:5432/db_higg
❌ Database connection failed:
Error: 'DEFAULT'

Troubleshooting steps:
1. Check your .env file has correct credentials
2. Verify database host is accessible
3. Ensure your IP is whitelisted (if required)
4. Check if the database server is running


In [16]:
# Test PostgreSQL Version (run this after connection test passes)
%sql SELECT version();


 * postgresql+psycopg2://isaac_hopwood:***@wg-data-rds.data.higg.org:5432/db_higg


KeyError: 'DEFAULT'

In [None]:
# Enhanced Method 1 with error handling
try:
    # Read the SQL file
    with open('facility type and pc.sql', 'r') as file:
        sql_query = file.read()
    
    print("SQL file loaded successfully!")
    print(f"Query length: {len(sql_query)} characters")
    
    # Store the query in a global variable for use with %sql
    globals()['sql_query'] = sql_query
    print("Query is ready to execute with: %sql $sql_query")
    
except FileNotFoundError:
    print("Error: SQL file not found. Make sure 'facility type and pc.sql' exists in the current directory.")
except Exception as e:
    print(f"Error reading SQL file: {e}")

In [None]:
# Simple Connection Test (No PrettyTable Issues)
# This will show the result directly without formatting issues
%sql SELECT 'Connection successful!' as status, current_timestamp as connected_at;


In [None]:
# Test Your SQL File
# Load and execute your facility type and pc.sql file
with open('facility type and pc.sql', 'r') as file:
    sql_query = file.read()

print("SQL file loaded successfully!")
print(f"Query length: {len(sql_query)} characters")

# Store the query in a global variable for use with %sql
globals()['sql_query'] = sql_query
print("Query is ready to execute with: %sql $sql_query")


In [None]:
# Skip the problematic test cell and go straight to your SQL file
print("✅ Database connection is working! (We can see the connection string above)")
print("Let's proceed with your SQL file execution...")


In [None]:
# Execute the SQL query
%sql $sql_query


In [None]:
# Method 3: Convert to DataFrame for better handling
print("=== Method 3: Convert to DataFrame ===")
import pandas as pd

# Enable autopandas for automatic DataFrame conversion
%config SqlMagic.autopandas = True

# Run query and get DataFrame
df = %sql sql_query
print("DataFrame:")
print(df)
print("\nDataFrame info:")
print(df.info())


In [None]:
# Method 3: Convert to DataFrame (FIXED VERSION)
print("=== Method 3: Convert to DataFrame (Fixed) ===")
import pandas as pd

# Enable autopandas for automatic DataFrame conversion
%config SqlMagic.autopandas = True

# Run a simple test query first (not using variables)
df = %sql SELECT 'Hello from database!' as message, current_timestamp as time;
print("DataFrame:")
print(df)
print("\nDataFrame info:")
print(df.info())


In [20]:
# Execute Your SQL File with DataFrame Output
print("=== Executing Your SQL File ===")

# Load your SQL file
with open('facility type and pc.sql', 'r') as file:
    sql_query = file.read()

print("SQL file loaded successfully!")
print(f"Query length: {len(sql_query)} characters")

# Method 1: Direct output (recommended for viewing)
print("\n--- Direct Output ---")
%sql $sql_query


=== Executing Your SQL File ===
SQL file loaded successfully!
Query length: 1461 characters

--- Direct Output ---
 * postgresql+psycopg2://isaac_hopwood:***@wg-data-rds.data.higg.org:5432/db_higg


In [22]:
# Direct Python Database Connection (No SQL Magic)
import pandas as pd
import psycopg2
from sqlalchemy import create_engine, text
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

print("🔧 Setting up direct Python database connection...")

# Create connection string
DB_USER = os.getenv('DB_USER')
DB_PASSWORD = os.getenv('DB_PASSWORD')
DB_HOST = os.getenv('DB_HOST')
DB_PORT = os.getenv('DB_PORT')
DB_NAME = os.getenv('DB_NAME')

connection_string = f"postgresql+psycopg2://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"

print(f"Connection string: postgresql+psycopg2://{DB_USER}:***@{DB_HOST}:{DB_PORT}/{DB_NAME}")
print("✅ Environment variables loaded successfully!")


🔧 Setting up direct Python database connection...
Connection string: postgresql+psycopg2://isaac_hopwood:***@wg-data-rds.data.higg.org:5432/db_higg
✅ Environment variables loaded successfully!


In [23]:
# Test Direct Database Connection
print("🔍 Testing direct database connection...")

try:
    # Create SQLAlchemy engine
    engine = create_engine(connection_string)
    
    # Test connection with a simple query
    with engine.connect() as connection:
        result = connection.execute(text("SELECT 'Connection successful!' as status, current_timestamp as connected_at"))
        row = result.fetchone()
        
    print("✅ DIRECT DATABASE CONNECTION SUCCESSFUL!")
    print(f"Status: {row[0]}")
    print(f"Connected at: {row[1]}")
    print("✅ Ready to execute queries!")
    
except Exception as e:
    print("❌ DIRECT DATABASE CONNECTION FAILED!")
    print(f"Error: {e}")
    print("\n🔧 Troubleshooting:")
    print("1. Check your .env file credentials")
    print("2. Verify database server is running")
    print("3. Check network connectivity")
    print("4. Ensure IP is whitelisted")


🔍 Testing direct database connection...
✅ DIRECT DATABASE CONNECTION SUCCESSFUL!
Status: Connection successful!
Connected at: 2025-09-09 21:22:16.745512+00:00
✅ Ready to execute queries!


In [25]:
# Execute Your SQL File with Direct Python Connection
print("📁 Loading and executing your SQL file...")

try:
    # Read your SQL file
    with open('facility type and pc.sql', 'r') as file:
        sql_query = file.read()
    
    print(f"✅ SQL file loaded successfully! ({len(sql_query)} characters)")
    print("🔍 Executing query...")
    
    # Execute the query using pandas
    df_results = pd.read_sql(sql_query, engine)
    
    print("✅ QUERY EXECUTED SUCCESSFULLY!")
    print(f"📊 Results shape: {df_results.shape}")
    print(f"📋 Columns: {list(df_results.columns)}")
    print("\n📄 First 5 rows:")
    print(df_results.head())
    
    # Store results for further analysis
    print(f"\n💾 Results stored in 'df_results' variable")
    print(f"📈 Total rows returned: {len(df_results)}")
    
except Exception as e:
    print("❌ QUERY EXECUTION FAILED!")
    print(f"Error: {e}")
    print("\n🔧 Possible issues:")
    print("1. SQL syntax error in your file")
    print("2. Table 'fem_simple_090825' doesn't exist")
    print("3. Missing permissions on the table")
    print("4. Database connection lost")


📁 Loading and executing your SQL file...
✅ SQL file loaded successfully! (1461 characters)
🔍 Executing query...
❌ QUERY EXECUTION FAILED!
Error: sqlalchemy.cyextension.immutabledict.immutabledict is not a sequence

🔧 Possible issues:
1. SQL syntax error in your file
2. Table 'fem_simple_090825' doesn't exist
3. Missing permissions on the table
4. Database connection lost


In [27]:
# Fix SQLAlchemy Compatibility Issue
print("🔧 Fixing SQLAlchemy compatibility issue...")

try:
    # Use a different approach that's more compatible
    from sqlalchemy import create_engine, text
    import pandas as pd
    
    # Create engine with specific parameters to avoid compatibility issues
    engine_fixed = create_engine(
        connection_string,
        pool_pre_ping=True,
        pool_recycle=300,
        echo=False
    )
    
    print("✅ Fixed engine created successfully!")
    
    # Test with a simple query using the fixed engine
    with engine_fixed.connect() as connection:
        result = connection.execute(text("SELECT 'Connection test' as message, current_timestamp as time"))
        row = result.fetchone()
        print(f"✅ Test successful: {row[0]} at {row[1]}")
    
    print("✅ Fixed engine is working!")
    
except Exception as e:
    print(f"❌ Error with fixed engine: {e}")


🔧 Fixing SQLAlchemy compatibility issue...
✅ Fixed engine created successfully!
✅ Test successful: Connection test at 2025-09-09 21:24:34.214548+00:00
✅ Fixed engine is working!


In [29]:
# Execute SQL File with Manual Method Only
print("📁 Executing SQL file with manual method...")

try:
    # Read your SQL file
    with open('facility type and pc.sql', 'r') as file:
        sql_query = file.read()
    
    print(f"✅ SQL file loaded! ({len(sql_query)} characters)")
    
    # Manual execution and DataFrame creation
    print("🔧 Using manual execution method...")
    with engine_fixed.connect() as connection:
        result = connection.execute(text(sql_query))
        rows = result.fetchall()
        columns = result.keys()
        
    # Create DataFrame manually
    df_results = pd.DataFrame(rows, columns=columns)
    
    print("✅ SUCCESS with manual execution!")
    print(f"📊 Results shape: {df_results.shape}")
    print(f"📋 Columns: {list(df_results.columns)}")
    print("\n📄 First 5 rows:")
    print(df_results.head())
    
except Exception as e:
    print(f"❌ Manual execution failed: {e}")
    print("\n🔍 Let's check what's in your SQL file...")
    print("SQL Query content:")
    print("=" * 50)
    print(sql_query)
    print("=" * 50)


📁 Executing SQL file with manual method...
✅ SQL file loaded! (1461 characters)
🔧 Using manual execution method...
✅ SUCCESS with manual execution!
📊 Results shape: (5434, 3)
📋 Columns: ['assessment_id', 'sipfacilityapparelpc', 'apparel_pc_count']

📄 First 5 rows:
                                    assessment_id  \
0  femsurvey:fffff92a-914f-446f-812f-8141dbe416a6   
1  femsurvey:ffff536a-d061-4fbd-84af-bfcaf59ac297   
2  femsurvey:fff81b83-f145-404b-aeae-d0bb63b0fa1a   
3  femsurvey:fff4757f-0b72-4f89-82cb-771a86980e0f   
4  femsurvey:ffe917d7-0bf2-4469-af77-cba73a34e513   

                                sipfacilityapparelpc  apparel_pc_count  
0          [Hosiery, Pants, Shirts, Skirts, T-shirt]                 5  
1                           [Jackets, Pants, Shirts]                 3  
2          [Dresses, Pants, Shirts, Skirts, T-shirt]                 5  
3  [Baselayers, Hosiery, Pants, Shirts, T-shirt, ...                 6  
4                                           [Shirts