# Tier 1 Analysis - Clean Version

## Setup Instructions

### 1. Create Virtual Environment
```bash
# Create virtual environment
python -m venv tier-1-venv

# Activate virtual environment
# On macOS/Linux:
source tier-1-venv/bin/activate
# On Windows:
# tier-1-venv\Scripts\activate
```

### 2. Install Dependencies
```bash
# Make sure virtual environment is activated
pip install -r requirements.txt
```

### 3. Create .env File
Create a `.env` file in the project root with your database credentials:

```env
DB_USER=your_username
DB_PASSWORD=your_password
DB_HOST=wg-data-rds.data.higg.org
DB_PORT=5432
DB_NAME=db_higg
```

**⚠️ Important:** Replace the placeholder values with your actual database credentials.
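
Because `os.getenv` silently returns `None` for a missing key, it can help to confirm that every variable listed above is set before building the connection string. The sketch below is one way to do that; the helper name `missing_env_vars` is hypothetical and not part of the notebook.

```python
import os

# Variable names taken from the .env example above
REQUIRED_VARS = ("DB_USER", "DB_PASSWORD", "DB_HOST", "DB_PORT", "DB_NAME")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print(f"Missing variables in .env: {', '.join(missing)}")
    else:
        print("All database variables are set.")
```

In the notebook this check would run after `load_dotenv()`, since that call is what copies the `.env` values into the process environment.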

### 4. Start Jupyter
```bash
# Make sure virtual environment is activated
source tier-1-venv/bin/activate

# Start Jupyter
jupyter notebook
# or
jupyter lab
```

### 5. Select Kernel
- Open this notebook
- Select **"Tier 1 Analysis"** as your kernel
- Run the cells in order

## Security Notes
- **Never commit** the `.env` file to version control
- **Keep credentials secure** and don't share them
- **Use different credentials** for different environments (dev/staging/prod)

## Troubleshooting
- **Connection issues**: Verify your database credentials in `.env`
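- **Kernel not found**: Make sure you've activated the virtual environment and registered it as a Jupyter kernel, for example with `python -m ipykernel install --user --name tier-1-venv --display-name "Tier 1 Analysis"` (this assumes `ipykernel` is listed in `requirements.txt`)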
- **Package errors**: Run `pip install -r requirements.txt` again

In [34]:
# Setup: Import libraries, load environment variables, and configure SQL file
import pandas as pd
import psycopg2
from sqlalchemy import create_engine, text
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Configuration: SQL file to execute
SQL_FILE = 'facility type and pc.sql'  # Change this to run a different SQL file

# Create connection string
DB_USER = os.getenv('DB_USER')
DB_PASSWORD = os.getenv('DB_PASSWORD')
DB_HOST = os.getenv('DB_HOST')
DB_PORT = os.getenv('DB_PORT')
DB_NAME = os.getenv('DB_NAME')

connection_string = f"postgresql+psycopg2://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"

print(f"🔗 Target database: {DB_HOST}:{DB_PORT}/{DB_NAME}")
print(f"📁 SQL file to execute: {SQL_FILE}")
print("✅ Environment variables loaded successfully!")

🔗 Target database: wg-data-rds.data.higg.org:5432/db_higg
📁 SQL file to execute: facility type and pc.sql
✅ Environment variables loaded successfully!
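
The connection string in the setup cell interpolates the credentials directly, which can produce an invalid URL if the password contains characters such as `@`, `/`, or `:`. A minimal sketch of one workaround is to percent-encode the user name and password with the standard library's `urllib.parse.quote_plus`. The function name `build_connection_string` and the sample values are hypothetical.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, port, name):
    """Build a SQLAlchemy PostgreSQL URL with percent-encoded credentials."""
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )


# Placeholder values only; real credentials should come from .env
example = build_connection_string("analyst", "p@ss/word", "localhost", 5432, "db_higg")
print(example)
```

The helper expects string credentials, so it would raise a `TypeError` if `DB_USER` or `DB_PASSWORD` were missing from the environment.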


In [35]:
# Create database engine and test connection
print("🔧 Creating database engine...")

try:
    # Create engine: pre-ping pooled connections before use and recycle them after 300 seconds
    engine = create_engine(
        connection_string,
        pool_pre_ping=True,
        pool_recycle=300,
        echo=False
    )
    
    # Test connection
    with engine.connect() as connection:
        result = connection.execute(text("SELECT 'Connection successful!' as status, current_timestamp as time"))
        row = result.fetchone()
        
    print("✅ DATABASE CONNECTION SUCCESSFUL!")
    print(f"Status: {row[0]}")
    print(f"Connected at: {row[1]}")
    
except Exception as e:
    print("❌ DATABASE CONNECTION FAILED!")
    print(f"Error: {e}")
    print("\n🔧 Check your .env file credentials")

🔧 Creating database engine...
✅ DATABASE CONNECTION SUCCESSFUL!
Status: Connection successful!
Connected at: 2025-09-09 21:37:57.335180+00:00


In [36]:
# Execute your SQL file
print(f"📁 Loading and executing SQL file: {SQL_FILE}")

try:
    # Read SQL file using the configured variable
    with open(SQL_FILE, 'r') as file:
        sql_query = file.read()
    
    print(f"✅ SQL file loaded ({len(sql_query)} characters)")
    
    # Execute the query, then fetch all rows and the column names explicitly
    with engine.connect() as connection:
        result = connection.execute(text(sql_query))
        rows = result.fetchall()
        columns = result.keys()
        
    # Create DataFrame
    df_results = pd.DataFrame(rows, columns=columns)
    
    print("✅ QUERY EXECUTED SUCCESSFULLY!")
    print(f"📊 Results: {df_results.shape[0]} rows, {df_results.shape[1]} columns")
    print(f"📋 Columns: {list(df_results.columns)}")
    
except FileNotFoundError:
    print(f"❌ SQL FILE NOT FOUND: {SQL_FILE}")
    print("🔧 Make sure the file exists in the current directory")
    print("💡 You can change the SQL_FILE variable in Cell 2 to point to a different file")
    
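except Exception as e:
    print(f"❌ QUERY EXECUTION FAILED: {e}")
    # sql_query is undefined if the failure happened while reading the file
    if 'sql_query' in locals():
        print(f"\n🔍 SQL Query content from {SQL_FILE}:")
        print("=" * 50)
        print(sql_query)
        print("=" * 50)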

📁 Loading and executing SQL file: facility type and pc.sql
✅ SQL file loaded (1461 characters)
✅ QUERY EXECUTED SUCCESSFULLY!
📊 Results: 5434 rows, 3 columns
📋 Columns: ['assessment_id', 'sipfacilityapparelpc', 'apparel_pc_count']


In [37]:
# Display and analyze results
if 'df_results' in locals() and not df_results.empty:
    print("📊 Dataset Overview:")
    print(f"   • Total rows: {len(df_results)}")
    print(f"   • Total columns: {len(df_results.columns)}")
    
    print("\n📋 Column Information:")
    for col in df_results.columns:
        dtype = df_results[col].dtype
        non_null = df_results[col].count()
        print(f"   • {col}: {dtype} ({non_null} non-null)")
    
    print("\n📄 First 10 rows:")
    print(df_results.head(10))
    
    print("\n💾 To save results:")
    print("df_results.to_csv('tier1_results.csv', index=False)")
    
else:
    print("❌ No results available. Run the previous cell first.")

📊 Dataset Overview:
   • Total rows: 5434
   • Total columns: 3

📋 Column Information:
   • assessment_id: object (5434 non-null)
   • sipfacilityapparelpc: object (5434 non-null)
   • apparel_pc_count: int64 (5434 non-null)

📄 First 10 rows:
                                    assessment_id  \
0  femsurvey:fffff92a-914f-446f-812f-8141dbe416a6   
1  femsurvey:ffff536a-d061-4fbd-84af-bfcaf59ac297   
2  femsurvey:fff81b83-f145-404b-aeae-d0bb63b0fa1a   
3  femsurvey:fff4757f-0b72-4f89-82cb-771a86980e0f   
4  femsurvey:ffe917d7-0bf2-4469-af77-cba73a34e513   
5  femsurvey:ffe53b8f-d79c-4480-a52f-92f026b318d2   
6  femsurvey:ffba25e8-eaea-486e-9abf-6e362be0f88d   
7  femsurvey:ffa679a1-848a-465e-92f3-9d518fbe7633   
8  femsurvey:ffa0e74d-2f96-4117-b7ba-46680f7741f4   
9  femsurvey:ff8205f1-281f-4d6f-8d69-72e447f694ca   

                                sipfacilityapparelpc  apparel_pc_count  
0          [Hosiery, Pants, Shirts, Skirts, T-shirt]                 5  
1                          

In [None]:
# Load PIC default product weights for apparel categories
print("📁 Loading PIC default product weights...")

try:
    # Load the CSV file (it's actually tab-separated)
    weights_df = pd.read_csv('PIC default product weights.csv', sep='\t')
    
    print("✅ PIC default product weights loaded successfully!")
    print(f"📊 Weights data: {weights_df.shape[0]} rows, {weights_df.shape[1]} columns")
    print(f"📋 Columns: {list(weights_df.columns)}")
    
    # Display the weights data (only FEM Apparel PC and Product Weight columns)
    print("\n📄 PIC Default Product Weights:")
    display_df = weights_df[['FEM Apparel PC', 'Product Weight (kg)']]
    print(display_df)
    
    # Create a dictionary mapping for easy lookup
    apparel_weights = dict(zip(weights_df['FEM Apparel PC'], weights_df['Product Weight (kg)']))
    
    print(f"\n🔗 Created weight mapping for {len(apparel_weights)} apparel categories:")
    for category, weight in apparel_weights.items():
        print(f"   • {category}: {weight} kg")
     
except FileNotFoundError:
    print("❌ PIC default product weights.csv file not found!")
    print("🔧 Make sure the file exists in the current directory")
    
except Exception as e:
    print(f"❌ Error loading weights file: {e}")


📁 Loading PIC default product weights...
✅ PIC default product weights loaded successfully!
📊 Weights data: 14 rows, 3 columns
📋 Columns: ['FEM Apparel PC', 'PIC Product', 'Product Weight (kg)']

📄 PIC Default Product Weights:
       FEM Apparel PC  Product Weight (kg)
0              Shirts             0.250000
1             Dresses             0.374213
2             Jackets             0.950000
3               Pants             0.453592
4              Skirts             0.290299
5               Socks             0.400000
6            Sweaters             0.550000
7           Swimsuits             0.100000
8          Baselayers             0.111130
9             Hosiery             0.227000
10  Leggings & Tights             0.227000
11            Jerseys             0.150000
12            T-shirt             0.150000
13          Underwear             0.138346

🔗 Created weight mapping for 14 apparel categories:
   • Shirts: 0.25 kg
   • Dresses: 0.3742134 kg
   • Jackets: 0.95 kg
   • 
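
One possible use of the `apparel_weights` mapping is to look up the default weight of each category in a row's `sipfacilityapparelpc` list. The sketch below assumes the goal is a simple unweighted mean per assessment, which the notebook does not state. `mean_default_weight` is a hypothetical helper, and the weights are copied from the table printed above (as displayed, rounded to six decimals).

```python
# Subset of the default weights printed above (kg)
apparel_weights = {
    "Shirts": 0.25,
    "Pants": 0.453592,
    "Skirts": 0.290299,
    "Hosiery": 0.227,
    "T-shirt": 0.15,
}


def mean_default_weight(categories, weights):
    """Average the default weights of the categories that have one.

    Returns None when no category has a weight, so callers can tell
    "no data" apart from a real value (an assumed convention).
    """
    known = [weights[c] for c in categories if c in weights]
    if not known:
        return None
    return sum(known) / len(known)


# Categories from row 0 of the query results
row0 = ["Hosiery", "Pants", "Shirts", "Skirts", "T-shirt"]
print(mean_default_weight(row0, apparel_weights))
```

In the notebook, the same helper could be applied per row with `df_results['sipfacilityapparelpc'].apply(lambda cats: mean_default_weight(cats, apparel_weights))`, assuming each cell holds a list of category names.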