# Day 1 Setup Check

**Run this notebook before class starts to verify your environment is ready!**

This notebook checks:
1. Python version
2. Required packages (pandas, numpy, duckdb)
3. Data files are accessible
4. Jupyter is working correctly

---

## Step 1: Check Python Version

In [1]:
import sys

print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")

# Check version is 3.8+
version_info = sys.version_info
if version_info >= (3, 8):  # tuple comparison, so a future 4.x also passes
    print(f"✅ Python {version_info.major}.{version_info.minor} detected (good!)")
else:
    print(f"⚠️  Python {version_info.major}.{version_info.minor} detected")
    print("   Recommended: Python 3.8 or higher")

Python version: 3.13.7 (main, Aug 14 2025, 11:12:11) [Clang 17.0.0 (clang-1700.0.13.3)]
Python executable: /Users/amh/.venvs/my_env/bin/python
✅ Python 3.13 detected (good!)
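Python compares tuples element by element, so a check like the one above can be written as a single tuple comparison. The sketch below wraps that in a helper. `meets_minimum` and `MIN_PYTHON` are hypothetical names, not part of the course materials. It also shows why you should not compare version *strings*: string comparison goes character by character, so `"3.13"` sorts before `"3.8"`.

```python
import sys

MIN_PYTHON = (3, 8)  # course minimum used in the check above (hypothetical constant name)

def meets_minimum(version, minimum=MIN_PYTHON):
    """Compare the leading components of a version tuple against `minimum`."""
    return tuple(version[:len(minimum)]) >= minimum

# Pitfall: "1" < "8" at the third character, so this prints False even though 3.13 is newer
print("3.13" >= "3.8")

print(meets_minimum((3, 13, 7)))      # True
print(meets_minimum((3, 7, 1)))       # False
print(meets_minimum(sys.version_info))
```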


## Step 2: Check Required Packages

In [None]:
# Check pandas
try:
    import pandas as pd
    print(f"✅ pandas {pd.__version__} installed")
except ImportError:
    print("❌ pandas not installed")
    print("   Install with: pip install pandas")

In [None]:
# Check numpy
try:
    import numpy as np
    print(f"✅ numpy {np.__version__} installed")
except ImportError:
    print("❌ numpy not installed")
    print("   Install with: pip install numpy")

In [None]:
# Check duckdb (we'll use this later today)
try:
    import duckdb
    print(f"✅ duckdb {duckdb.__version__} installed")
except ImportError:
    print("⚠️  duckdb not installed (needed for Block B)")
    print("   Install with: pip install duckdb")
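The three cells above repeat the same try/import/print pattern. If you prefer one cell, a loop over package names does the same job. This is a sketch. `check_package`, `REQUIRED`, and `OPTIONAL` are hypothetical names. It relies on the fact that pandas, numpy, and duckdb are imported under the same names you pass to `pip install`, which is not true of every package.

```python
import importlib

def check_package(name):
    """Import `name`; return its __version__ ('unknown' if it has none), or None if missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")

# Same packages as the cells above; duckdb is only needed for Block B
REQUIRED = ["pandas", "numpy"]
OPTIONAL = ["duckdb"]

for name in REQUIRED + OPTIONAL:
    version = check_package(name)
    if version is not None:
        print(f"✅ {name} {version} installed")
    elif name in REQUIRED:
        print(f"❌ {name} not installed")
        print(f"   Install with: pip install {name}")
    else:
        print(f"⚠️  {name} not installed (optional)")
        print(f"   Install with: pip install {name}")
```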

## Step 3: Test Basic Functionality

In [None]:
# Test pandas basic operations
try:
    test_df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
    result = test_df['a'].sum()
    assert result == 6, "Calculation error"
    print("✅ pandas operations working")
except Exception as e:
    print(f"❌ pandas test failed: {e}")

In [None]:
# Test numpy basic operations
try:
    test_array = np.array([1, 2, 3])
    result = test_array.sum()
    assert result == 6, "Calculation error"
    print("✅ numpy operations working")
except Exception as e:
    print(f"❌ numpy test failed: {e}")
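The cells above smoke-test pandas and numpy but not duckdb. A matching check could look like this sketch, which runs one query against a throwaway in-memory database. `duckdb_smoke_test` is a hypothetical helper name. It skips quietly when duckdb is missing, because this notebook treats duckdb as needed only for Block B.

```python
def duckdb_smoke_test():
    """Return True if DuckDB evaluates a simple query, False on error, None if not installed."""
    try:
        import duckdb
    except ImportError:
        print("⚠️  duckdb not installed (needed for Block B), skipping")
        return None
    try:
        con = duckdb.connect()  # no path argument: in-memory database, nothing written to disk
        result = con.execute("SELECT 1 + 2 AS total").fetchone()[0]
        con.close()
        assert result == 3, "Calculation error"
        print("✅ duckdb operations working")
        return True
    except Exception as e:
        print(f"❌ duckdb test failed: {e}")
        return False

duckdb_smoke_test()
```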

## Step 4: Check Data Files

In [None]:
import os

# Check if data directory exists
data_dir = '../../data/day1'
if os.path.exists(data_dir):
    print(f"✅ Data directory exists: {data_dir}")
else:
    print(f"❌ Data directory not found: {data_dir}")
    print(f"   Current working directory: {os.getcwd()}")
    print("   Make sure you're running from the notebooks/ directory")

In [None]:
# Check if dataset file exists
data_file = '../../data/day1/dirty_cafe_sales.csv'
if os.path.exists(data_file):
    print(f"✅ Dataset file exists: {data_file}")
    
    # Check file size
    file_size = os.path.getsize(data_file)
    print(f"   File size: {file_size:,} bytes ({file_size/1024:.1f} KB)")
    
    if file_size > 100_000:  # full file is ~500 KB; under 100 KB suggests a partial download
        print("   ✅ File size looks correct")
    else:
        print("   ⚠️  File seems small, may be incomplete")
else:
    print(f"❌ Dataset file not found: {data_file}")
    print("   Contact instructor for data files")
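Relative paths like `../../data/day1/...` are resolved against the notebook's current working directory, which is the usual reason this check fails. A sketch using `pathlib` that prints the absolute path being searched can make that easier to spot. `check_data_file` is a hypothetical helper name, and the 100,000-byte threshold mirrors the cell above.

```python
from pathlib import Path

def check_data_file(path, min_bytes=100_000):
    """Return 'missing', 'too_small', or 'ok', printing the absolute path that was checked."""
    p = Path(path)
    print(f"   Looking for: {p.resolve()}")  # resolve() works even if the file does not exist
    if not p.is_file():
        return "missing"
    if p.stat().st_size <= min_bytes:
        return "too_small"
    return "ok"

print(check_data_file("../../data/day1/dirty_cafe_sales.csv"))
```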

## Step 5: Test Data Loading

In [None]:
# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

# Try to load the dataset
try:
    df = pd.read_csv('../../data/day1/dirty_cafe_sales.csv')
    print("✅ Successfully loaded dataset")
    print(f"   Shape: {df.shape[0]} rows, {df.shape[1]} columns")
    print(f"   Columns: {', '.join(df.columns.tolist())}")
    
    # Verify expected structure
    if df.shape[0] == 10000:
        print("   ✅ Row count matches expected (10,000)")
    else:
        print(f"   ⚠️  Expected 10,000 rows, found {df.shape[0]}")
    
    if df.shape[1] == 8:
        print("   ✅ Column count matches expected (8)")
    else:
        print(f"   ⚠️  Expected 8 columns, found {df.shape[1]}")
        
except FileNotFoundError:
    print("❌ Could not find data file")
    print("   Make sure you're in the notebooks/ directory")
except Exception as e:
    print(f"❌ Error loading data: {e}")
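The row and column checks above follow one pattern, so they can be expressed as a comparison of `df.shape` against an expected `(rows, columns)` tuple. This is a sketch; `shape_problems` is a hypothetical name, and the default `(10_000, 8)` is taken from the expectations in the cell above. It works on plain tuples, so you would call it as `shape_problems(df.shape)`.

```python
def shape_problems(shape, expected=(10_000, 8)):
    """Compare a (rows, columns) tuple with the expected shape; return a list of warnings."""
    problems = []
    for label, found, want in zip(("rows", "columns"), shape, expected):
        if found != want:
            problems.append(f"Expected {want:,} {label}, found {found:,}")
    return problems

print(shape_problems((10_000, 8)))  # []
print(shape_problems((9_500, 8)))   # ['Expected 10,000 rows, found 9,500']
```

An empty list means the file matches; each string in a non-empty list can be printed with the same ⚠️ prefix the notebook uses.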

## Step 6: Test Jupyter Display

In [None]:
# Test DataFrame display
try:
    test_df = pd.DataFrame({
        'Product': ['Coffee', 'Tea', 'Sandwich'],
        'Price': [3.50, 2.50, 6.00],
        'Quantity': [10, 5, 8]
    })
    print("Testing DataFrame display:")
    display(test_df)
    print("✅ DataFrame display working")
except Exception as e:
    print(f"⚠️  Display test: {e}")
    print("   This is usually fine: display() is only built in when running inside Jupyter/IPython")
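`display()` is injected by IPython, so it exists in Jupyter but not under plain `python`. If you want the same code to run in both places, a common pattern is to import it explicitly and fall back to `print`. The sketch below also includes a heuristic for detecting a Jupyter kernel. `running_in_jupyter` is a hypothetical helper. The class-name check is an assumption that holds for the standard Jupyter kernel but may not hold in every hosted notebook service.

```python
def running_in_jupyter():
    """Heuristic: True if the active IPython shell is a Jupyter kernel (ZMQInteractiveShell)."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython not installed, so this cannot be a Jupyter kernel
    shell = get_ipython()  # None when running under plain `python`
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"

try:
    from IPython.display import display  # rich HTML tables inside Jupyter
except ImportError:
    display = print  # plain-text fallback outside IPython

print(f"Running in Jupyter: {running_in_jupyter()}")
```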

---

## Summary

In [None]:
print("=" * 50)
print("SETUP CHECK SUMMARY")
print("=" * 50)
print()
print("If you see ✅ for all critical items above, you're ready for class!")
print()
print("Critical items:")
print("  • Python 3.8+")
print("  • pandas installed")
print("  • numpy installed")
print("  • Dataset file accessible")
print()
print("Nice to have:")
print("  • duckdb installed (for Block B)")
print()
print("If you see ❌ for any critical item, please:")
print("  1. Install missing packages: pip install pandas numpy")
print("  2. Verify you're in the correct directory")
print("  3. Contact instructor if problems persist")
print()
print("See you in class!")
print("=" * 50)
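The summary above asks you to scan the earlier output by eye. As a sketch, the same critical and nice-to-have items can be re-checked and tallied in one cell. `summarize`, `results`, and `critical` are hypothetical names. The checks reuse the dataset path from Step 4 and use `importlib.util.find_spec`, which looks a package up without importing it.

```python
import importlib.util
import os
import sys

def summarize(results, critical):
    """Print one line per check; return True only if every critical check passed."""
    for item, passed in results.items():
        tag = "critical" if item in critical else "nice to have"
        print(f"{'✅' if passed else '❌'} {item} ({tag})")
    return all(results.get(item, False) for item in critical)

results = {
    "Python 3.8+": sys.version_info >= (3, 8),
    "pandas installed": importlib.util.find_spec("pandas") is not None,
    "numpy installed": importlib.util.find_spec("numpy") is not None,
    "Dataset file accessible": os.path.exists("../../data/day1/dirty_cafe_sales.csv"),
    "duckdb installed (for Block B)": importlib.util.find_spec("duckdb") is not None,
}
critical = {"Python 3.8+", "pandas installed", "numpy installed", "Dataset file accessible"}

print("Ready for class!" if summarize(results, critical) else "Fix the critical ❌ items first")
```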

---

## Quick Installation Commands

If you need to install packages, run these commands in your terminal (NOT in Jupyter):

```bash
# Install all required packages
pip install pandas numpy duckdb

# Or if using conda
conda install pandas numpy
pip install duckdb
```

After installing, restart this notebook's kernel and rerun every cell: **Kernel → Restart & Run All** (in JupyterLab: **Kernel → Restart Kernel and Run All Cells**)
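A bare `pip` in your terminal may belong to a different Python than the one your notebook kernel runs (compare the `Python executable` line from Step 1). Running pip as `<that interpreter> -m pip` installs into the kernel's environment. This sketch prints such a command for you to paste into the terminal. `pip_install_command` is a hypothetical helper. `shlex.quote` assumes a macOS/Linux shell; Windows `cmd.exe` quotes paths differently.

```python
import shlex
import sys

def pip_install_command(packages):
    """Build a pip command that targets the same interpreter this kernel is running on."""
    python = shlex.quote(sys.executable)  # quote in case the path contains spaces
    names = " ".join(shlex.quote(p) for p in packages)
    return f"{python} -m pip install {names}"

print(pip_install_command(["pandas", "numpy", "duckdb"]))
```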