# Week 3 Activity 1: Phase 0 - Setup & Data Preparation

## Overview

**Phase 0** is about **preparation**: setting up your environment, defining your study area, documenting your land cover classes, and creating training polygons in QGIS.

**Think of this as the foundation.** Without quality training data, even the best CNN will fail.

---

## Learning Objectives

By the end of Phase 0, you will:

1. ✅ Verify your Python environment has all required packages
2. ✅ Authenticate and connect to Google Earth Engine
3. ✅ Define your study area (Area of Interest)
4. ✅ Document clear definitions for your 5 land cover classes
5. ✅ Create training polygons in QGIS with proper attributes
6. ✅ Validate that your training data is ready for Phase 1

---

## Workflow Overview

```
Phase 0: Setup & Preparation (this notebook)
  ↓
  1. Environment verification
  2. Earth Engine authentication  
  3. Define study area (AOI)
  4. Document class definitions
  5. Digitize polygons in QGIS ⭐
  6. Validate training data
  ↓
Phase 1: Validation & Configuration
  ↓
Phase 2: Batch Extraction
  ↓
Phase 3: CNN Training
```

**Estimated Time:** 30-40 minutes (plus QGIS digitization time)

---

## 🎯 Key Concept: Data Quality > Model Complexity

**The most important lesson in geospatial ML:**

A simple model trained on high-quality, well-labeled data will usually outperform a complex model trained on poor data.

**What makes training data "high quality"?**
- Clear, unambiguous class definitions
- Consistent labeling across all examples
- Sufficient spatial distribution
- Appropriate polygon sizes
- Proper attribute structure

**Phase 0 focuses on getting this right.**

---

## Section 1: Environment Verification

Let's verify all required packages are installed and working.

In [27]:
# Test imports
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import ee
import geemap
from pathlib import Path
import json

# Set random seed for reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

# Check versions
print("Package Versions:")
print("="*50)
print(f"✓ NumPy:      {np.__version__}")
print(f"✓ Pandas:     {pd.__version__}")
print(f"✓ GeoPandas:  {gpd.__version__}")
print(f"✓ Earth Engine: {ee.__version__}")
print(f"✓ geemap:     {geemap.__version__}")
print("="*50)
print("\n✅ All packages imported successfully!")

Package Versions:
✓ NumPy:      2.3.3
✓ Pandas:     2.3.3
✓ GeoPandas:  1.1.1
✓ Earth Engine: 1.6.11
✓ geemap:     0.36.4

✅ All packages imported successfully!


### Set up paths

In [28]:
# Define project paths
REPO = Path.cwd().parent  # Repository root (assumes this notebook runs from notebooks/)
DATA = REPO / 'data'
DATA_EXTERNAL = DATA / 'external'
DATA_LABELS = DATA / 'labels'

# Create directories if needed
DATA_EXTERNAL.mkdir(exist_ok=True, parents=True)
DATA_LABELS.mkdir(exist_ok=True, parents=True)

# Key files
AOI_PATH = DATA_EXTERNAL / 'larger_aoi.geojson'
POLYGONS_PATH = DATA_LABELS / 'larger_polygons.geojson'

print("Project Structure:")
print("="*50)
print(f"Repository: {REPO}")
print(f"Data:       {DATA}")
print(f"\nKey Files:")
print(f"AOI:      {AOI_PATH}")
print(f"Polygons: {POLYGONS_PATH}")
print("="*50)

Project Structure:
Repository: /Users/mstone14/QGIS/GeoAI_Class/github/earth-vision-portfolio
Data:       /Users/mstone14/QGIS/GeoAI_Class/github/earth-vision-portfolio/data

Key Files:
AOI:      /Users/mstone14/QGIS/GeoAI_Class/github/earth-vision-portfolio/data/external/larger_aoi.geojson
Polygons: /Users/mstone14/QGIS/GeoAI_Class/github/earth-vision-portfolio/data/labels/larger_polygons.geojson


---

## Section 2: Earth Engine Authentication & Setup

Google Earth Engine provides access to petabytes of satellite imagery. We need to authenticate once, then we can access data in all future sessions.

In [29]:
# Initialize Earth Engine (requires a one-time authentication, see below)
try:
    ee.Initialize()
    print("✅ Earth Engine initialized successfully!")
except Exception as e:
    print("⚠️  Earth Engine not authenticated or initialization failed")
    print("\n📝 To authenticate:")
    print("   1. Run in terminal: earthengine authenticate")
    print("   2. Follow the prompts to authorize in browser")
    print("   3. Restart this notebook")
    print(f"\nError details: {e}")
else:
    # Connection test: query the Sentinel-2 collection instead of a
    # hard-coded image ID, which may not exist in the catalog
    try:
        first = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED').first()
        bands = first.bandNames().getInfo()
        print(f"✓ Connection test passed (accessed {len(bands)} bands)")
    except Exception as e:
        print(f"⚠️  Initialized, but the connection test failed: {e}")

✅ Earth Engine initialized successfully!


### 💡 If authentication needed:

**Option 1: Terminal (recommended)**
```bash
earthengine authenticate
```

**Option 2: In notebook**
```python
ee.Authenticate()
ee.Initialize()
```

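You only need to authenticate **once** per machine: credentials are cached locally, so `ee.Initialize()` works in later sessions. Depending on your account setup, recent versions of the Earth Engine Python API may also require a Google Cloud project, e.g. `ee.Initialize(project='your-project-id')`.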

---

## Section 3: Define Study Area (Area of Interest)

### What is an AOI?

An **Area of Interest (AOI)** is the geographic extent of your study. It defines:
- Where you'll extract satellite imagery
- Where you'll digitize training polygons
- The spatial extent of your final land cover classification

### Creating your AOI

**You have two options:**

**Option 1: Use existing AOI** (for this activity)
- We've provided `larger_aoi.geojson` for Los Lagos, Chile
- Covers the parcelización study area

**Option 2: Create your own in QGIS** (for your project)
1. Open QGIS
2. Add Google Satellite basemap
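3. Create a new polygon layer (`Layer → Create Layer`) with CRS EPSG:4326 (WGS84)
4. Draw a rectangle around your study area
5. Save it in GeoJSON format as `data/external/larger_aoi.geojson` (if the layer was not created as GeoJSON, right-click it → `Export → Save Features As…` and choose GeoJSON)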

Let's load and visualize the AOI:

In [30]:
# Check if AOI exists
if not AOI_PATH.exists():
    print(f"❌ AOI file not found: {AOI_PATH}")
    print("\n📝 Create AOI in QGIS (see instructions above)")
else:
    # Load AOI
    aoi = gpd.read_file(AOI_PATH)
    
    # Ensure WGS84
    if aoi.crs.to_string() != 'EPSG:4326':
        print(f"⚠️  Converting from {aoi.crs} to EPSG:4326")
        aoi = aoi.to_crs('EPSG:4326')
    
    # Display info
    bounds = aoi.total_bounds
    print("✅ AOI loaded successfully!")
    print("\nAOI Information:")
    print("="*50)
    print(f"CRS:       {aoi.crs}")
    print(f"Bounds:    {bounds}")
    print(f"  West:    {bounds[0]:.4f}°")
    print(f"  South:   {bounds[1]:.4f}°")
    print(f"  East:    {bounds[2]:.4f}°")
    print(f"  North:   {bounds[3]:.4f}°")
    
    # Calculate approximate area
    aoi_utm = aoi.to_crs('EPSG:32718')  # UTM 18S for Chile
    area_km2 = aoi_utm.area.sum() / 1e6
    print(f"Area:      {area_km2:.1f} km²")
    print("="*50)
    
    print("\n💡 The AOI mapped in the next cell is where you'll digitize training polygons.\n")


✅ AOI loaded successfully!

AOI Information:
CRS:       EPSG:4326
Bounds:    [-73.11168979 -41.16593399 -72.98757712 -41.06467994]
  West:    -73.1117°
  South:   -41.1659°
  East:    -72.9876°
  North:   -41.0647°
Area:      117.2 km²

💡 The AOI mapped in the next cell is where you'll digitize training polygons.
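The area calculation above hard-codes EPSG:32718 (UTM zone 18S), which suits this AOI. If you reuse the notebook for another region, a small helper can pick the UTM zone from the AOI centre instead. This sketch (the helper name `utm_epsg` is hypothetical) uses the standard 6° zone formula and ignores the special zones around Norway and Svalbard.

```python
import math


def utm_epsg(lon, lat):
    """Return the WGS84 / UTM EPSG code for a lon/lat point.

    Zones are 6 degrees wide starting at 180°W; EPSG 326xx covers the
    northern hemisphere and 327xx the southern. Norway/Svalbard exceptions
    are ignored.
    """
    if not (-180 <= lon <= 180 and -80 <= lat <= 84):
        raise ValueError("Point is outside the UTM coverage area")
    zone = min(math.floor((lon + 180) / 6) + 1, 60)  # lon = 180 belongs to zone 60
    return (32600 if lat >= 0 else 32700) + zone


# AOI centre from the bounds printed above
center_lon = (-73.1117 + -72.9876) / 2
center_lat = (-41.1659 + -41.0647) / 2
print(utm_epsg(center_lon, center_lat))  # 32718 -> UTM zone 18S
```

Within the notebook, GeoPandas' `aoi.estimate_utm_crs()` serves the same purpose directly on a GeoDataFrame.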



In [31]:
# Create an interactive geemap map
Map = geemap.Map()

# Add the AOI GeoDataFrame as a vector layer
Map.add_gdf(aoi, layer_name='Study Area (AOI)')

# Center map on AOI using bounds
Map.centerObject(ee.Geometry.BBox(bounds[0], bounds[1], bounds[2], bounds[3]), 10)

# Display the map
Map

[Interactive map: Study Area (AOI) layer, centered near -41.115, -73.050]

---

## Section 4: Define Land Cover Classes

### Why clear definitions matter

**Ambiguous class definitions = poor model performance**

If you're not sure whether a pixel is "Forest" or "Agriculture", how can you expect a CNN to learn the difference?

**Good class definitions have:**
1. **Clear descriptions** - What the class represents
2. **Spectral characteristics** - How it appears in satellite imagery
3. **Spatial characteristics** - Typical patterns and textures
4. **Examples** - Specific land cover types included
5. **Exclusions** - What NOT to include (prevents ambiguity)

### Our 5 Classes for Los Lagos Parcelización

---

### Class 0: Forest 🌲

**Description:** Native and plantation forests with dense canopy cover

**Spectral Characteristics:**
- High NIR reflectance (healthy vegetation)
- Low red reflectance (chlorophyll absorption)
- High NDVI (>0.6)
- Moderate SWIR reflectance

**Spatial Characteristics:**
- Continuous canopy texture
- Irregular boundaries (native) or regular (plantation)
- Large contiguous patches

**Include:**
- Temperate rainforest
- Eucalyptus plantations  
- Mixed native forest

**Exclude:**
- Sparse trees in agricultural areas
- Recently cleared forest
- Shrubland with <30% canopy cover

---

### Class 1: Agriculture 🌾

**Description:** Active agricultural fields including crops and pasture

**Spectral Characteristics:**
- Variable NDVI (0.3-0.7) depending on crop stage
- Lower NIR than forest
- Seasonal variability

**Spatial Characteristics:**
- Regular field boundaries
- Rectangular or geometric shapes
- Uniform texture within fields
- Medium patch sizes

**Include:**
- Crop fields (wheat, potato, etc.)
- Managed pasture
- Hay fields

**Exclude:**
- Fallow/bare fields
- Residential gardens (classify as Parcels)

---

### Class 2: Parcels 🏘️ ⭐ **KEY CLASS**

**Description:** Subdivided residential areas (parcelización) — the phenomenon we're studying!

**Spectral Characteristics:**
- Mixed signature (buildings + vegetation + bare soil)
- Moderate NDVI (0.2-0.5) due to mixed cover
- High spatial heterogeneity
- Often includes bright surfaces (roofs, roads)

**Spatial Characteristics:**
- **Small, regular subdivisions** (typically <1 hectare)
- **Grid pattern or linear arrangement** along roads
- Mix of structures and vegetation
- Access roads visible
- Often at forest-agriculture interface

**Include:**
- Residential subdivisions
- Rural housing developments
- Parceled land with structures

**Exclude:**
- Traditional rural homesteads (isolated houses)
- Urban areas (classify as Urban)
- Agricultural buildings without subdivision pattern

---

### Class 3: Water 💧

**Description:** Water bodies including lakes, rivers, and coastal areas

**Spectral Characteristics:**
- Very low NIR reflectance (water absorbs NIR)
- Negative NDVI
- High NDWI (>0.3)
- Low reflectance in all bands (clear water)

**Spatial Characteristics:**
- Smooth, homogeneous texture
- Irregular boundaries (natural) or regular (reservoirs)
- Low spatial variability

**Include:**
- Lakes, rivers, coastal waters
- Reservoirs

**Exclude:**
- Shadows (similar spectral signature)
- Wetlands with emergent vegetation
- Temporary flooding

---

### Class 4: Urban 🏙️

**Description:** Established urban and built-up areas

**Spectral Characteristics:**
- High reflectance in visible bands from concrete and roofs (asphalt is darker)
- Low NDVI (<0.2)
- High NDBI (Normalized Difference Built-up Index)
- High spatial heterogeneity

**Spatial Characteristics:**
- Dense building patterns
- Road networks
- Large contiguous built-up areas
- Mix of bright (roofs) and dark (roads) surfaces

**Include:**
- Town centers
- Industrial areas
- Commercial districts
- Dense residential areas

**Exclude:**
- Parcels (classify separately - key distinction!)
- Isolated rural buildings
- Agricultural infrastructure

---

### ✅ Class Definitions Summary

| Class | ID | Key Distinguishing Feature |
|-------|----|--------------------------|
| Forest | 0 | High NDVI, continuous canopy |
| Agriculture | 1 | Geometric fields, variable NDVI |
| **Parcels** | 2 | **Small subdivisions, mixed cover** |
| Water | 3 | Very low NIR, smooth texture |
| Urban | 4 | Dense buildings, low NDVI |

**💡 Keep these definitions handy while digitizing in QGIS!**
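The NDVI, NDWI, and NDBI thresholds above are all normalized differences of two Sentinel-2 bands. As a sketch, assuming the common band pairings (NDVI = (B8 - B4) / (B8 + B4), NDWI in the McFeeters form (B3 - B8) / (B3 + B8), NDBI = (B11 - B8) / (B11 + B8)), you can compute them with NumPy to sanity-check a few pixels. The reflectance values in the example are illustrative, not measured. Because each index is a ratio, the 0-10000 integer scaling of Sentinel-2 surface reflectance in Earth Engine does not change the result.

```python
import numpy as np


def normalized_difference(a, b):
    """Compute (a - b) / (a + b) element-wise, returning 0 where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    return np.divide(a - b, denom, out=np.zeros_like(denom), where=denom != 0)


def spectral_indices(b3, b4, b8, b11):
    """Return NDVI, NDWI (McFeeters) and NDBI from Sentinel-2 bands."""
    return {
        "NDVI": normalized_difference(b8, b4),   # NIR vs red
        "NDWI": normalized_difference(b3, b8),   # green vs NIR
        "NDBI": normalized_difference(b11, b8),  # SWIR1 vs NIR
    }


# Illustrative surface reflectances: [forest-like pixel, water-like pixel]
idx = spectral_indices(b3=[0.05, 0.06], b4=[0.03, 0.04], b8=[0.40, 0.02], b11=[0.15, 0.01])
for name, values in idx.items():
    print(name, np.round(values, 2))
```

With these values the forest-like pixel lands above the 0.6 NDVI threshold, while the water-like pixel has negative NDVI and NDWI above 0.3, consistent with the definitions above.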

---

## Section 5: QGIS Digitization Guide

### 🎯 Your Main Task: Create Training Polygons

This is the **most important step** in Phase 0. High-quality training polygons are essential for CNN success.

---

## Step-by-Step QGIS Workflow

### Step 1: Set Up QGIS Project

1. **Open QGIS** (version 3.x)
2. **Create new project:** `Project → New`
3. **Load AOI:**
   - `Layer → Add Layer → Add Vector Layer`
   - Navigate to: `data/external/larger_aoi.geojson`
   - Click `Add`
4. **Add basemap:**
   - **Option A:** Google Satellite (if QuickMapServices plugin installed)
     - `Web → QuickMapServices → Google → Google Satellite`
   - **Option B:** OpenStreetMap
     - `Browser Panel → XYZ Tiles → OpenStreetMap → drag to map`
5. **Zoom to AOI:** Right-click AOI layer → `Zoom to Layer`

---

### Step 2: Create Training Polygon Layer

**CRITICAL: Attribute structure must match exactly for Phase 1!**

1. **Create new layer:**
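   - `Layer → Create Layer → New GeoJSON Layer` (if your QGIS version has no GeoJSON option in this menu, create a GeoPackage or temporary scratch layer with the settings below, then right-click it → `Export → Save Features As…`, choose GeoJSON, and configure Step 3 on the exported layer)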

2. **Configure layer:**
   - **File name:** Browse to `data/labels/larger_polygons.geojson`
   - **Geometry type:** Polygon
   - **CRS:** EPSG:4326 (WGS84) ⚠️ **IMPORTANT**

3. **Add attribute fields** (click "New Field" for each):

   | Field Name | Type | Length | Description |
   |------------|------|--------|-------------|
   | `class_name` | Text | 50 | Land cover class name |
   | `class_id` | Integer | 10 | Numeric class ID (0-4) |
   | `confidence` | Text | 20 | Your confidence (High/Medium/Low) |
   | `notes` | Text | 200 | Observations |
   | `date_digitized` | Text | 20 | Date created |

4. **Click OK** to create layer

---

### Step 3: Configure Attribute Form (Prevents Typos!)

This step makes digitization faster and ensures consistency:

1. **Open properties:** Right-click layer → `Properties`
2. **Go to:** `Attributes Form` tab

3. **For `class_name` field:**
   - Click `class_name` in left panel
   - **Widget Type:** `Value Map`
   - Click `+` to add each value:
     - Forest, Forest
     - Agriculture, Agriculture
     - Parcels, Parcels
     - Water, Water
     - Urban, Urban

4. **For `class_id` field:**
   - **Widget Type:** `Value Map`
   - Add values:
     - 0, 0
     - 1, 1
     - 2, 2
     - 3, 3
     - 4, 4

5. **For `confidence` field:**
   - **Widget Type:** `Value Map`
   - Add values:
     - High, High
     - Medium, Medium
     - Low, Low

6. **Click OK**

Now when you digitize, you'll get dropdown menus instead of typing!

---

### Step 4: Digitization Strategy

**Goal:** at least 20-30 polygons per class (40-60 ideal), well-distributed across the AOI

#### How Many Polygons?

| Class | Minimum | Ideal | Notes |
|-------|---------|-------|-------|
| Forest | 30 | 60 | Easier to find |
| Agriculture | 30 | 60 | Easier to find |
| **Parcels** | 30 | 60 | **Key class - prioritize!** |
| Water | 20 | 40 | May be less common |
| Urban | 20 | 40 | May be less common |

**Total:** 130-260 polygons across all classes

#### How Big Should Polygons Be?

**Minimum:** 200m × 200m (20 pixels × 20 pixels at 10m resolution)
- Smaller polygons won't yield enough training patches

**Ideal:** 300-500m × 300-500m
- Large enough for multiple patch extractions
- Small enough to stay within class boundaries
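The size guidance follows from simple arithmetic: at 10 m resolution a 200 m side spans 20 pixels, and the number of patches that fit grows with the square of the side length. This sketch assumes square polygons and non-overlapping patches; the patch sizes (8 and 16 pixels) are hypothetical, since the actual patch size is chosen in Phase 1.

```python
def patches_per_polygon(side_m, patch_px, resolution_m=10):
    """Count non-overlapping patch_px x patch_px patches inside a square polygon."""
    side_px = int(side_m // resolution_m)  # e.g. 200 m / 10 m = 20 pixels
    per_row = side_px // patch_px
    return per_row * per_row


for side in (100, 200, 300, 500):
    counts = {p: patches_per_polygon(side, p) for p in (8, 16)}
    print(f"{side} m side -> {counts}")
```

With 16-pixel patches, a 100 m polygon yields no patch at all and a 200 m polygon yields exactly one, while a 500 m polygon yields nine.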

#### Where to Digitize?

**Spatial distribution is critical!**

Don't cluster all polygons in one area. Spread them across:
- North, South, East, West of AOI
- Different elevations (if relevant)
- Different aspects (north-facing vs. south-facing slopes)

**Why?** Ensures your model learns generalizable features, not location-specific patterns.
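One quick way to check spatial spread is to count polygon centroids in each quadrant of the AOI. This is a sketch under two assumptions: you already have centroid longitudes/latitudes (for example `cent = polygons.to_crs('EPSG:32718').centroid.to_crs('EPSG:4326')`, then `cent.x` and `cent.y`), and a split at the AOI midpoint is a good enough proxy for "spread out". The helper name `quadrant_counts` and the sample coordinates are hypothetical.

```python
def quadrant_counts(lons, lats, bounds):
    """Count points in each AOI quadrant; bounds = (west, south, east, north)."""
    west, south, east, north = bounds
    mid_lon = (west + east) / 2
    mid_lat = (south + north) / 2
    counts = {"NW": 0, "NE": 0, "SW": 0, "SE": 0}
    for lon, lat in zip(lons, lats):
        ns = "N" if lat >= mid_lat else "S"
        ew = "E" if lon >= mid_lon else "W"
        counts[ns + ew] += 1
    return counts


# Hypothetical centroids inside the Los Lagos AOI bounds
aoi_bounds = (-73.1117, -41.1659, -72.9876, -41.0647)
lons = [-73.10, -73.08, -73.00, -72.99, -73.09]
lats = [-41.07, -41.15, -41.08, -41.16, -41.10]
print(quadrant_counts(lons, lats, aoi_bounds))
```

An empty or nearly empty quadrant is a hint to digitize more polygons there; with real data, pass the AOI's `total_bounds` as `bounds`.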

#### What to Avoid?

❌ **Mixed pixels at boundaries**
- Draw polygons well inside homogeneous areas
- Leave buffer from class boundaries

❌ **Ambiguous areas**
- If you're not sure, skip it or mark as "Low" confidence

❌ **Cloud shadows**
- Check imagery date, avoid shadowed areas

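❌ **Seasonal changes**
- Ensure the land cover you label matches the imagery used for training (the 2019 Sentinel-2 median composite); basemap imagery may come from a different year or season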

---

### Step 5: Digitize Polygons

1. **Enable editing:**
   - Click pencil icon OR press `Ctrl+E`

2. **Add polygon:**
   - Click `Add Polygon Feature` icon
   - OR press `Ctrl+.`

3. **Draw polygon:**
   - **Left-click** to add vertices
   - **Right-click** to finish

4. **Fill attribute form:**
   - `class_name`: Select from dropdown (e.g., "Parcels")
   - `class_id`: Select corresponding ID (e.g., 2)
   - `confidence`: Select High/Medium/Low
   - `notes`: Optional (e.g., "Mixed native-plantation")
   - `date_digitized`: Today's date (e.g., "2025-10-20")

5. **Click OK**

6. **Repeat** for all training areas

7. **Save frequently:**
   - Click `Save Layer Edits` (disk icon on the Digitizing toolbar)
   - Note that `Ctrl+S` saves the QGIS project file, not the layer edits

---

### Step 6: Quality Check Your Polygons

Before finishing, verify:

1. **Open attribute table:**
   - Right-click layer → `Open Attribute Table`

2. **Check:**
   - ✅ All polygons have `class_name` and `class_id`
   - ✅ No typos in class names
   - ✅ class_id matches class_name (0=Forest, 1=Agriculture, etc.)
   - ✅ Roughly balanced class distribution

3. **Visual check:**
   - Style by `class_name` to see distribution
   - Ensure polygons spread across AOI
   - Check polygon sizes (not too small)
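The validation cell in Section 6 checks that class names are valid, but not that each `class_id` matches its `class_name`, nor the `confidence` values. A minimal pandas sketch of those checks, assuming the ID mapping from the class summary table and the High/Medium/Low values from Step 3; the helper name `find_label_problems` and the sample table are hypothetical. Because a GeoDataFrame is a pandas DataFrame, the same function can be applied to the polygons loaded in Section 6.

```python
import pandas as pd

# Mapping from the class summary table; the Step 3 value maps should agree
CLASS_IDS = {"Forest": 0, "Agriculture": 1, "Parcels": 2, "Water": 3, "Urban": 4}
CONFIDENCE_VALUES = {"High", "Medium", "Low"}


def find_label_problems(df):
    """Return rows whose class_id does not match class_name, or whose confidence is invalid."""
    expected_id = df["class_name"].map(CLASS_IDS)  # NaN for unknown or missing names
    # NaN never equals anything, so unknown names and missing IDs are flagged too
    problems = expected_id != pd.to_numeric(df["class_id"], errors="coerce")
    if "confidence" in df.columns:
        # A blank confidence is allowed; only flag values outside the value map
        bad_conf = df["confidence"].notna() & ~df["confidence"].isin(CONFIDENCE_VALUES)
        problems = problems | bad_conf
    return df[problems]


# Hypothetical attribute table with two deliberate errors
sample = pd.DataFrame({
    "class_name": ["Forest", "Parcels", "Water", "Urban"],
    "class_id": [0, 3, 3, 4],
    "confidence": ["High", "Medium", "medium", None],
})
print(find_label_problems(sample))
```

Here the Parcels row (ID 3 instead of 2) and the lowercase "medium" are reported; an empty result means the labels are consistent.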

---

### Step 7: Save and Finish

1. **Save final edits:**
   - Click `Save Layer Edits`

2. **Toggle editing off:**
   - Click pencil icon to disable editing

3. **Verify file:**
   - Check file exists: `data/labels/larger_polygons.geojson`
   - Check file size (should be >5 KB if you have 100+ polygons)

4. **Return to this notebook** to validate!

---

## ✅ Digitization Checklist

Before proceeding, verify:

- [ ] Training polygon layer created with correct fields
- [ ] At least 20-30 polygons per class, 40-60 ideal (or best effort)
- [ ] Polygons distributed across AOI
- [ ] All polygons have `class_name` and `class_id`
- [ ] Polygons in homogeneous areas (avoid edges)
- [ ] Polygon sizes ≥200m × 200m
- [ ] File saved: `data/labels/larger_polygons.geojson`
- [ ] CRS is EPSG:4326

**Once complete, run the next section to validate your polygons!**

---

## Section 6: Validate Training Polygons

Let's check that your digitized polygons are ready for Phase 1.

In [32]:
# Check if polygons file exists
if not POLYGONS_PATH.exists():
    print("❌ Training polygons file not found!")
    print(f"   Expected: {POLYGONS_PATH}")
    print("\n📝 Complete QGIS digitization (Section 5) before proceeding.")
    print("   Return to this cell after creating polygons in QGIS.")
else:
    # Load polygons
    polygons = gpd.read_file(POLYGONS_PATH)
    
    print("✅ Training polygons loaded!")
    print("="*60)
    print(f"Total polygons: {len(polygons)}")
    print(f"CRS: {polygons.crs}")
    
    # Validation 1: Check required attributes
    required = ['class_name', 'class_id']
    missing = [attr for attr in required if attr not in polygons.columns]
    
    if missing:
        print(f"\n⚠️  Missing required attributes: {missing}")
        print("   Go back to QGIS and add these fields")
    else:
        print(f"\n✓ Required attributes present: {required}")
    
    # Validation 2: Check CRS
    if polygons.crs.to_string() != 'EPSG:4326':
        print(f"\n⚠️  CRS is {polygons.crs}, converting to EPSG:4326")
        polygons = polygons.to_crs('EPSG:4326')
        print("   ✓ Converted")
    else:
        print("✓ CRS is correct (EPSG:4326)")
    
    # Validation 3: Check for missing values
    missing_class_name = polygons['class_name'].isna().sum()
    missing_class_id = polygons['class_id'].isna().sum()
    
    if missing_class_name > 0 or missing_class_id > 0:
        print(f"\n⚠️  Missing values:")
        print(f"   class_name: {missing_class_name} missing")
        print(f"   class_id: {missing_class_id} missing")
    else:
        print("✓ No missing values in required fields")
    
    # Validation 4: Check class names
    expected_classes = ['Forest', 'Agriculture', 'Parcels', 'Water', 'Urban']
    actual_classes = polygons['class_name'].unique().tolist()
    unexpected = [c for c in actual_classes if c not in expected_classes]
    
    if unexpected:
        print(f"\n⚠️  Unexpected class names: {unexpected}")
        print(f"   Expected: {expected_classes}")
        print("   Check for typos in QGIS")
    else:
        print("✓ All class names valid")
    
    # Validation 5: Class distribution
    counts = polygons['class_name'].value_counts().sort_index()
    
    print("\n📊 Class Distribution:")
    print("="*60)
    for class_name, count in counts.items():
        bar = '█' * min(50, int(count / 2))
        status = "✓" if count >= 20 else "⚠️"
        print(f"{status} {class_name:12s}: {count:3d} {bar}")
    print("="*60)
    
    if counts.min() < 20:
        print("\n⚠️  Some classes have <20 polygons")
        print("   Minimum 20 recommended, 50+ ideal")
    else:
        print("\n✓ All classes have sufficient polygons")
    
    print("\n💡 Visual check:")
    print("   - Polygons should be distributed across AOI (not clustered)")
    print("   - Each color should be visible (all classes represented)")
    print("   - Polygons should be within AOI bounds (black dashed box)")

✅ Training polygons loaded!
Total polygons: 126
CRS: EPSG:4326

✓ Required attributes present: ['class_name', 'class_id']
✓ CRS is correct (EPSG:4326)
✓ No missing values in required fields
✓ All class names valid

📊 Class Distribution:
✓ Agriculture :  35 █████████████████
⚠️ Forest      :  16 ████████
✓ Parcels     :  33 ████████████████
✓ Urban       :  29 ██████████████
⚠️ Water       :  13 ██████

⚠️  Some classes have <20 polygons
   Minimum 20 recommended, 50+ ideal

💡 Visual check:
   - Polygons should be distributed across AOI (not clustered)
   - Each color should be visible (all classes represented)
   - Polygons should be within AOI bounds (black dashed box)


In [33]:
# Visualize training polygons on interactive map
# Create interactive map
Map2 = geemap.Map()

# Add AOI boundary for context (black dashed outline, no fill)
if AOI_PATH.exists():
    Map2.add_gdf(aoi, layer_name='AOI Boundary',
                 style={'color': 'black', 'fillColor': 'none', 'weight': 2, 'dashArray': '5, 5'})

# Define colors for each class
class_colors = {
    'Forest': '#228B22',
    'Agriculture': '#FFD700', 
    'Parcels': '#FF6347',
    'Water': '#1E90FF',
    'Urban': '#808080'
}

# Add each class as a separate layer
for class_name in actual_classes:
    subset = polygons[polygons['class_name'] == class_name]
    color = class_colors.get(class_name, '#999999')
    Map2.add_gdf(subset, layer_name=class_name, style={'color': color, 'fillOpacity': 0.6})

# Center on polygons
polygon_bounds = polygons.total_bounds
Map2.centerObject(ee.Geometry.BBox(polygon_bounds[0], polygon_bounds[1], 
                                   polygon_bounds[2], polygon_bounds[3]), 10)

# Display the map
Map2

[Interactive map: AOI boundary and training polygons colored by class, centered near -41.115, -73.049]

---

## Section 7: Phase 0 Readiness Check

Let's verify everything is ready for Phase 1.

In [34]:
# Readiness checklist
checks = {
    '1. Environment': True,  # Already verified in Section 1
    '2. Earth Engine': True,  # Already verified in Section 2
    '3. AOI defined': AOI_PATH.exists(),
    '4. Polygons created': POLYGONS_PATH.exists(),
}

# Additional checks if polygons exist
if POLYGONS_PATH.exists():
    polygons = gpd.read_file(POLYGONS_PATH)
    checks['5. Attributes valid'] = all(col in polygons.columns for col in ['class_name', 'class_id'])
    checks['6. CRS correct'] = polygons.crs.to_string() == 'EPSG:4326'
    checks['7. Sufficient polygons'] = len(polygons) >= 50  # Total across classes; per-class counts are checked in Section 6
    counts = polygons['class_name'].value_counts()
    checks['8. Classes balanced'] = (counts.min() / counts.max()) > 0.3 if len(counts) > 0 else False

# Display checklist
print("="*60)
print("PHASE 0 READINESS CHECKLIST")
print("="*60)

for item, status in checks.items():
    symbol = "✅" if status else "❌"
    print(f"{symbol} {item}")

print("="*60)

all_ready = all(checks.values())

if all_ready:
    print("\n🎉 Phase 0 Complete! Ready for Phase 1!")
    
    # Create configuration file for Phase 1
    config = {
        'phase': 'Phase 0 Complete',
        'date': pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S'),
        'aoi_file': str(AOI_PATH),
        'polygons_file': str(POLYGONS_PATH),
        'total_polygons': len(polygons),
        'classes': counts.to_dict(),
        'composite_asset': 'users/markstonegobigred/Parcela/s2_2019_median_6b',
        'bands': ['B2', 'B3', 'B4', 'B8', 'B11', 'B12']
    }
    
    # Save configuration
    config_path = REPO / 'notebooks' / 'phase0_config.json'
    with open(config_path, 'w') as f:
        json.dump(config, f, indent=2)
    
    print(f"\n✓ Configuration saved: {config_path}")
    print("  (Phase 1 can import this configuration)")
    
    # Display summary
    print("\n" + "="*60)
    print("PHASE 0 SUMMARY")
    print("="*60)
    print(f"Total polygons:  {len(polygons)}")
    print(f"Classes:         {len(counts)}")
    print(f"\nClass breakdown:")
    for cls, cnt in counts.items():
        print(f"  {cls:12s}: {cnt:3d}")
    print("="*60)
    
else:
    print("\n⚠️  Phase 0 not complete")
    print("\nPlease complete the following:")
    for item, status in checks.items():
        if not status:
            print(f"  • {item}")
    print("\nReturn to earlier sections to fix issues.")

PHASE 0 READINESS CHECKLIST
✅ 1. Environment
✅ 2. Earth Engine
✅ 3. AOI defined
✅ 4. Polygons created
✅ 5. Attributes valid
✅ 6. CRS correct
✅ 7. Sufficient polygons
✅ 8. Classes balanced

🎉 Phase 0 Complete! Ready for Phase 1!

✓ Configuration saved: /Users/mstone14/QGIS/GeoAI_Class/github/earth-vision-portfolio/notebooks/phase0_config.json
  (Phase 1 can import this configuration)

PHASE 0 SUMMARY
Total polygons:  126
Classes:         5

Class breakdown:
  Agriculture :  35
  Parcels     :  33
  Urban       :  29
  Forest      :  16
  Water       :  13


---

## 🎓 Phase 0 Complete!

### What You Accomplished

✅ **Environment verified** - All packages installed and working

✅ **Earth Engine connected** - Authenticated and ready

✅ **Study area defined** - AOI created and visualized

✅ **Classes documented** - Clear definitions for all 5 classes

✅ **Training data created** - Polygons digitized with proper attributes

✅ **Data validated** - Quality checks run; resolve any ⚠️ warnings (e.g., classes with fewer than 20 polygons) where you can

### Next Steps: Phase 1

**Phase 1: Validation & Configuration**

Now that your training data is ready, Phase 1 will:

1. **Analyze polygon sizes** - Understand size distribution and constraints
2. **Load Sentinel-2 composite** - Access pre-built imagery
3. **Test single patch extraction** - Verify extraction works (critical!)
4. **Determine patch size** - Optimize based on polygon constraints
5. **Configure for Phase 2** - Set parameters for batch extraction

**Estimated time:** 30-45 minutes
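Phase 1 can pick up where Phase 0 left off by reading `phase0_config.json`, which the Section 7 readiness cell writes. A minimal sketch, assuming Phase 1 also runs from the `notebooks/` folder; the helper names and the set of required keys are assumptions based on the fields written in Section 7.

```python
import json
from pathlib import Path

# Keys written to phase0_config.json by the Section 7 readiness cell
REQUIRED_KEYS = {"aoi_file", "polygons_file", "total_polygons", "classes",
                 "composite_asset", "bands"}


def parse_phase0_config(text):
    """Parse the config JSON and fail early if an expected key is missing."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError(f"phase0_config.json is missing keys: {sorted(missing)}")
    return config


def load_phase0_config(path):
    """Read phase0_config.json from disk (hypothetical Phase 1 helper)."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run the Phase 0 readiness check first")
    return parse_phase0_config(path.read_text())


# Example, from a notebook in notebooks/:
# config = load_phase0_config(Path.cwd() / "phase0_config.json")
# print(config["composite_asset"], config["bands"])
```

Failing on a missing key at load time is cheaper than discovering it partway through a batch extraction.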

---

### Key Takeaway from Phase 0

**"Data quality matters more than model complexity."**

By investing time in:
- Clear class definitions
- Careful polygon digitization
- Proper attribute structure
- Quality validation

...you've set yourself up for CNN success in the later phases.

**Well done! Continue to Phase 1 →**

---