<a href="https://colab.research.google.com/github/lawrencejesse/Sentinel2_Extractor/blob/main/Reclamation_Analysis_AEFv2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Oilfield Reclamation Site Assessment Using AlphaEarth Foundations

**Objective:** Use Google's AlphaEarth Foundation 64D embeddings to assess reclamation success at oilfield lease sites by comparing to healthy regional cropland references.

**Methodology:**
1. Upload field boundary (arable land) and lease boundary polygons
2. Extract AAFC Annual Crop Inventory data to identify crop type per year (2017-2023)
3. Build regional "healthy reference" embeddings by sampling same crop within 10-20km
4. Compare lease embeddings vs regional reference and vs background field
5. Track recovery trajectory over time using cosine similarity

**Key Datasets:**
- AlphaEarth Foundation (AEF): `GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL` (64D embeddings, 10m resolution)
- AAFC Annual Crop Inventory: `AAFC/ACI` (30m resolution, 2009-2023)

**Site Location:** 50.30523°, -101.80618° (Saskatchewan, Canada)

## Setup and Authentication

In [4]:
# Import required libraries
import ee
import geemap
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.spatial.distance import cosine
import ipywidgets as widgets
from IPython.display import display, HTML
import json
import os
from io import BytesIO
import zipfile

In [5]:
# Authenticate and initialize Earth Engine
# The EEException: ee.Initialize: no project found error indicates that Earth Engine initialization failed because a project was not specified.
# To fix this, you need to authenticate with Earth Engine and select a project.
# The ee.Authenticate() function will guide you through the authentication process and allow you to set up your project.
# After successful authentication, ee.Initialize() will work correctly.
print("Authenticating with Earth Engine...")
ee.Authenticate()
ee.Initialize(project="jessemapping")
print("✓ Earth Engine initialized successfully")

Authenticating with Earth Engine...
✓ Earth Engine initialized successfully


In [7]:
%pip install fiona geopandas

Collecting fiona
  Downloading fiona-1.10.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (56 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/56.6 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m56.6/56.6 kB[0m [31m2.9 MB/s[0m eta [36m0:00:00[0m
Collecting click-plugins>=1.0 (from fiona)
  Downloading click_plugins-1.1.1.2-py2.py3-none-any.whl.metadata (6.5 kB)
Collecting cligj>=0.5 (from fiona)
  Downloading cligj-0.7.2-py3-none-any.whl.metadata (5.0 kB)
Downloading fiona-1.10.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.2 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m17.2/17.2 MB[0m [31m97.9 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading click_plugins-1.1.1.2-py2.py3-none-any.whl (11 kB)
Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Installing collected packages: cligj, click-plugins, fiona
Successfully installed click-plugins-1.1.

In [8]:
# Initialize boundary variables
field_boundary = None
lease_boundary = None
site_center = None

# Use google.colab.files for uploading
from google.colab import files

def upload_files():
    """Upload boundary files using google.colab.files"""
    global field_boundary, lease_boundary, site_center

    print("Please upload your Field Boundary file (KML, GeoJSON, SHP/ZIP)...")
    field_uploaded = files.upload()

    if not field_uploaded:
        print("✗ Field Boundary upload cancelled or failed.")
        return

    print("\nPlease upload your Lease Boundary file (KML, GeoJSON, SHP/ZIP)...")
    lease_uploaded = files.upload()

    if not lease_uploaded:
        print("✗ Lease Boundary upload cancelled or failed.")
        # Consider clearing field_boundary if lease upload is essential
        field_boundary = None
        return

    # Assume only one file uploaded per prompt
    field_filename = list(field_uploaded.keys())[0]
    lease_filename = list(lease_uploaded.keys())[0]

    print(f"\nProcessing '{field_filename}' and '{lease_filename}'...")

    try:
        # Process Field Boundary
        field_content = field_uploaded[field_filename]
        field_boundary = process_uploaded_file_content(field_content, field_filename, "Field Boundary")

        # Process Lease Boundary
        lease_content = lease_uploaded[lease_filename]
        lease_boundary = process_uploaded_file_content(lease_content, lease_filename, "Lease Boundary")

        if field_boundary and lease_boundary:
            # Get site centroid for reference
            site_center = field_boundary.centroid().coordinates().getInfo()
            print(f"\n✓ Both boundaries loaded successfully!")
            print(f"Site center: {site_center[1]:.5f}°, {site_center[0]:.5f}°")
            print("\nYou can now proceed to the next steps.")
        else:
            print("\n✗ Error processing files. Check the messages above.")

    except Exception as e:
        print(f"\n✗ An error occurred during file processing: {str(e)}")


def process_uploaded_file_content(content, filename, name):
    """Process uploaded file content and convert to ee.Geometry"""
    temp_path = f'/tmp/{filename}'
    # Ensure the /tmp directory exists
    os.makedirs('/tmp', exist_ok=True)

    with open(temp_path, 'wb') as f:
        f.write(content)

    try:
        # Handle different file types
        if filename.endswith('.kml'):
            import fiona
            fiona.drvsupport.supported_drivers['KML'] = 'r'
            import geopandas as gpd
            gdf = gpd.read_file(temp_path, driver='KML')
        elif filename.endswith(('.geojson', '.json')):
            import geopandas as gpd
            gdf = gpd.read_file(temp_path)
        elif filename.endswith('.zip'):
            # Extract zip content to a temporary directory first
            with zipfile.ZipFile(temp_path, 'r') as zip_ref:
                 zip_ref.extractall('/tmp/shp_extract')
            # Find the .shp file in the extracted directory
            shp_file = None
            for root, dirs, files in os.walk('/tmp/shp_extract'):
                for file in files:
                    if file.endswith('.shp'):
                        shp_file = os.path.join(root, file)
                        break
                if shp_file:
                    break
            if shp_file:
                import geopandas as gpd
                gdf = gpd.read_file(shp_file)
            else:
                 raise ValueError("No .shp file found in the uploaded zip archive.")

        elif filename.endswith('.shp'):
             # This case is less common for direct upload without zip, but included for completeness
             import geopandas as gpd
             gdf = gpd.read_file(temp_path)
        else:
            raise ValueError(f"Unsupported file format: {filename}")

        # Ensure WGS84 projection
        if gdf.crs and gdf.crs.to_string() != 'EPSG:4326':
            gdf = gdf.to_crs('EPSG:4326')

        # Convert to GeoJSON
        geojson = json.loads(gdf.to_json())

        # Get first feature geometry
        if geojson['features']:
            geometry = geojson['features'][0]['geometry']
            ee_geom = ee.Geometry(geometry)
            print(f"✓ {name} parsed successfully from {filename}")
            return ee_geom
        else:
            raise ValueError(f"No features found in {filename}")

    except Exception as e:
        print(f"✗ Error processing {name} file '{filename}': {str(e)}")
        return None
    finally:
        # Cleanup temp file
        if os.path.exists(temp_path):
            os.remove(temp_path)
        # Clean up extracted shapefile directory if it exists
        if os.path.exists('/tmp/shp_extract'):
             import shutil
             shutil.rmtree('/tmp/shp_extract')

# Call the upload function to start the process
upload_files()

Please upload your Field Boundary file (KML, GeoJSON, SHP/ZIP)...


Saving 6-14Field.kml to 6-14Field (1).kml

Please upload your Lease Boundary file (KML, GeoJSON, SHP/ZIP)...


Saving 6-14Lease.kml to 6-14Lease (1).kml

Processing '6-14Field (1).kml' and '6-14Lease (1).kml'...
✓ Field Boundary parsed successfully from 6-14Field (1).kml
✓ Lease Boundary parsed successfully from 6-14Lease (1).kml

✓ Both boundaries loaded successfully!
Site center: 50.33521°, -101.84006°

You can now proceed to the next steps.


## 1. Upload Boundary Files

Upload your polygon files (KML, GeoJSON, or SHP/ZIP):
- **Field Boundary:** The clean agricultural area (quarter section minus non-arable areas)
- **Lease Boundary:** The disturbed oilfield lease site

## 2. Extract Crop History (AAFC Annual Crop Inventory)

Identify what crop was grown in the field for each year (2017-2023)

In [11]:
# AAFC crop classification lookup
# Source: https://agriculture.canada.ca/atlas/data_donnees/annualCropInventory/supportdocument_documentdesupport/
CROP_CLASSES = {
    10: 'Cloud',
    20: 'Water',
    30: 'Exposed Land and Barren',
    34: 'Urban and Developed',
    35: 'Greenhouses',
50: 'Shrubland',
    80: 'Wetland',
    85: 'Peatland',
    110: 'Grassland',
    120: 'Agriculture (undifferentiated)',
    122: 'Pasture and Forages',
    130: 'Too Wet to be Seeded',
    131: 'Fallow',
    132: 'Cereals',
    133: 'Barley',
    134: 'Other Grains',
    135: 'Millet',
    136: 'Oats',
    137: 'Rye',
    138: 'Spelt',
    139: 'Triticale',
    140: 'Wheat',
    141: 'Switchgrass',
    142: 'Sorghum',
    143: 'Quinoa',
    145: 'Winter Wheat',
    146: 'Spring Wheat',
    147: 'Corn',
    148: 'Tobacco',
    149: 'Ginseng',
    150: 'Oilseeds',
    151: 'Borage',
    152: 'Camelina',
    153: 'Canola and Rapeseed',
    154: 'Flaxseed',
    155: 'Mustard',
    156: 'Safflower',
    157: 'Sunflower',
    158: 'Soybeans',
    160: 'Pulses',
    161: 'Other Pulses',
    162: 'Peas',
    163: 'Chickpeas',
    167: 'Beans',
    168: 'Fababeans',
    174: 'Lentils',
    175: 'Vegetables',
    176: 'Tomatoes',
    177: 'Potatoes',
    178: 'Sugarbeets',
    179: 'Other Vegetables',
    180: 'Fruits',
    181: 'Berries',
    182: 'Blueberry',
    183: 'Cranberry',
    185: 'Other Berry',
    188: 'Orchards',
    189: 'Other Fruits',
    190: 'Vineyards',
    191: 'Hops',
    192: 'Sod',
    193: 'Herbs',
    194: 'Nursery',
    195: 'Buckwheat',
    196: 'Canaryseed',
    197: 'Hemp',
    198: 'Vetch',
    199: 'Other Crops',
    200: 'Forest (undifferentiated)',
    210: 'Coniferous',
    220: 'Broadleaf',
    230: 'Mixedwood'
}

# The user pointed out that Hops (153) and Sugar Beets (146) were incorrect for this region.
# While I've updated the core list based on AAFC documentation,
# I'll keep the deletion logic commented out in case the user wants to remove specific crops not relevant to their study area.
# del CROP_CLASSES[153] # Remove Hops
# del CROP_CLASSES[146] # Remove Sugar Beets
# print(f"Remaining crop classes: {list(CROP_CLASSES.keys())}") # Uncomment to see remaining codes

def get_crop_history(geometry, years=range(2017, 2024), scale=30, sample_size=500):
    """
    Extract the most frequent crop type within a geometry for each year,
    considering only allowed crop codes.
    """
    aafc = ee.ImageCollection('AAFC/ACI')
    allowed_codes = list(CROP_CLASSES.keys()) # Get codes from the modified dictionary

    crop_history = {}

    for year in years:
        # Get crop inventory for this year
        crop_img = aafc.filter(ee.Filter.date(f'{year}-01-01', f'{year}-12-31')).first()

        if crop_img:
            # Sample pixels within the geometry
            samples = crop_img.select('landcover').sample(
                region=geometry,
                scale=scale,
                numPixels=sample_size,
                seed=year, # Use year as seed for reproducibility per year
                geometries=False
            )

            # Convert samples to a list and filter based on allowed codes
            sample_list = samples.aggregate_array('landcover').getInfo()

            # Filter out codes that are not in the allowed list
            filtered_samples = [code for code in sample_list if code in allowed_codes]

            if filtered_samples:
                # Find the mode (most frequent) among the filtered samples
                # If there's a tie, numpy.bincount takes the smallest value
                # We can use collections.Counter for potentially more robust mode finding
                from collections import Counter
                code_counts = Counter(filtered_samples)
                # Get the most common code(s) - returns a list of tuples (code, count)
                most_common = code_counts.most_common(1)

                if most_common:
                     crop_code = most_common[0][0]
                     crop_name = CROP_CLASSES.get(crop_code, f'Unknown ({crop_code})')
                     crop_history[year] = {'code': crop_code, 'name': crop_name}
                else:
                     # Should not happen if filtered_samples is not empty, but as a fallback
                     print(f"Warning: No valid crop code found in filtered samples for year {year}")
                     crop_history[year] = {'code': None, 'name': 'No Valid Crop Found'}
            else:
                 print(f"Warning: No sampled pixels had an allowed crop code for year {year}. Consider increasing sample_size or checking boundaries/AAFC data.")
                 crop_history[year] = {'code': None, 'name': 'No Allowed Crop Sampled'}

    return crop_history

In [12]:
# Extract crop history for the field
if field_boundary:
    print("Extracting crop history from AAFC Annual Crop Inventory...")
    crop_history = get_crop_history(field_boundary)

    # Display as table
    crop_df = pd.DataFrame.from_dict(crop_history, orient='index')
    crop_df.index.name = 'Year'
    print("\nCrop History:")
    display(crop_df)
else:
    print("Please upload field boundary first.")

Extracting crop history from AAFC Annual Crop Inventory...

Crop History:


Unnamed: 0_level_0,code,name
Year,Unnamed: 1_level_1,Unnamed: 2_level_1
2017,146,Spring Wheat
2018,153,Canola and Rapeseed
2019,146,Spring Wheat
2020,153,Canola and Rapeseed
2021,146,Spring Wheat
2022,153,Canola and Rapeseed
2023,146,Spring Wheat


## 3. Extract AlphaEarth Foundation Embeddings

Get 64D embeddings for:
- Lease pixels (disturbed site)
- Background pixels (healthy field, excluding lease)
- Regional reference (same crop within 10-20km)

In [13]:
# Load AEF dataset
aef_collection = ee.ImageCollection("GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL")

def get_embeddings(geometry, year, scale=10):
    """
    Extract mean 64D embedding for a geometry and year

    Args:
        geometry: ee.Geometry
        year: int (2017-2024)
        scale: int (default 10m)

    Returns:
        dict with 'embedding' (64D array) and 'pixel_count'
    """
    # Filter to specific year
    aef_year = aef_collection.filter(ee.Filter.date(f'{year}-01-01', f'{year}-12-31')).first()

    # Get all 64 bands
    band_names = [f'embedding_{i}' for i in range(64)]

    # Compute mean embedding across the geometry
    stats = aef_year.select(band_names).reduceRegion(
        reducer=ee.Reducer.mean().combine(
            reducer2=ee.Reducer.count(),
            sharedInputs=True
        ),
        geometry=geometry,
        scale=scale,
        maxPixels=1e8
    )

    result = stats.getInfo()

    # Extract embedding values
    embedding = np.array([result.get(f'embedding_{i}_mean', np.nan) for i in range(64)])
    pixel_count = result.get('embedding_0_count', 0)

    return {
        'embedding': embedding,
        'pixel_count': pixel_count,
        'year': year
    }

def get_regional_reference(center_point, crop_code, year, radius_km=15, sample_pixels=1000, max_radius_km=50):
    """
    Build regional reference embedding by sampling healthy pixels of same crop

    Args:
        center_point: ee.Geometry.Point
        crop_code: int (AAFC crop classification code)
        year: int
        radius_km: float (initial sampling radius in km)
        sample_pixels: int (number of pixels to sample)
        max_radius_km: float (maximum radius to try if no samples found)

    Returns:
        dict with 'embedding' (64D centroid), 'sample_count', and 'actual_radius'
    """
    band_names = [f'embedding_{i}' for i in range(64)]

    # Try progressively larger radii if needed
    for current_radius in [radius_km, radius_km * 2, max_radius_km]:
        # Create sampling region (circular buffer)
        sampling_region = center_point.buffer(current_radius * 1000)  # Convert km to meters

        # Get crop mask for this year
        aafc = ee.ImageCollection('AAFC/ACI')
        crop_img = aafc.filter(ee.Filter.date(f'{year}-01-01', f'{year}-12-31')).first()

        # Create mask for target crop only
        crop_mask = crop_img.select('landcover').eq(crop_code)

        # Get AEF embeddings for this year
        aef_year = aef_collection.filter(ee.Filter.date(f'{year}-01-01', f'{year}-12-31')).first()

        # Mask to only include target crop
        masked_embeddings = aef_year.updateMask(crop_mask)

        # Sample pixels
        samples = masked_embeddings.select(band_names).sample(
            region=sampling_region,
            scale=10,
            numPixels=sample_pixels,
            seed=42,
            geometries=False
        )

        sample_count = samples.size().getInfo()

        # If we found enough samples, compute centroid
        if sample_count >= 10:  # Require at least 10 samples for reliability
            # Compute mean (centroid) embedding
            centroid = samples.reduceColumns(
                reducer=ee.Reducer.mean().repeat(64),
                selectors=band_names
            )

            result = centroid.getInfo()

            # Handle case where mean might be None or missing
            if result and 'mean' in result and result['mean'] is not None:
                embedding = np.array(result['mean'], dtype=float)

                # Verify embedding is valid
                if len(embedding) == 64 and not np.all(np.isnan(embedding)):
                    print(f"    Found {sample_count} samples within {current_radius}km radius")
                    return {
                        'embedding': embedding,
                        'sample_count': sample_count,
                        'year': year,
                        'crop_code': crop_code,
                        'actual_radius': current_radius
                    }

        # If we didn't find enough samples, try next radius
        print(f"    Only found {sample_count} samples at {current_radius}km, expanding search...")

    # If we exhausted all radii, return NaN embedding
    print(f"    ⚠ WARNING: Could not find sufficient samples for crop code {crop_code} within {max_radius_km}km")
    return {
        'embedding': np.full(64, np.nan),
        'sample_count': 0,
        'year': year,
        'crop_code': crop_code,
        'actual_radius': max_radius_km
    }

print("✓ Embedding extraction functions ready")


✓ Embedding extraction functions ready


## 4. Compute Similarity Metrics

Calculate cosine similarity between lease and references

In [14]:
def cosine_similarity(vec1, vec2):
    """Compute cosine similarity between two vectors (1 = identical, 0 = orthogonal, -1 = opposite)"""
    # Remove any NaN values
    if np.any(np.isnan(vec1)) or np.any(np.isnan(vec2)):
        return np.nan

    # Cosine similarity = 1 - cosine distance
    return 1 - cosine(vec1, vec2)

def euclidean_distance(vec1, vec2):
    """Compute Euclidean distance between two vectors"""
    if np.any(np.isnan(vec1)) or np.any(np.isnan(vec2)):
        return np.nan
    return np.linalg.norm(vec1 - vec2)

print("✓ Similarity metric functions ready")

✓ Similarity metric functions ready


## 5. Run Complete Analysis

Process all years and compute reclamation assessment metrics

In [15]:
# Run analysis for all available years
if field_boundary and lease_boundary and crop_history:
    print("Running reclamation analysis...\n")

    results = []
    site_center_point = field_boundary.centroid()

    # Calculate background area (field minus lease)
    background_area = field_boundary.difference(lease_boundary)

    for year in sorted(crop_history.keys()):
        crop_info = crop_history[year]
        print(f"\nProcessing {year} - {crop_info['name']}...")

        try:
            # Extract embeddings
            print("  - Extracting lease embeddings...")
            lease_emb = get_embeddings(lease_boundary, year)

            print("  - Extracting background embeddings...")
            background_emb = get_embeddings(background_area, year)

            print("  - Building regional reference...")
            regional_ref = get_regional_reference(
                site_center_point,
                crop_info['code'],
                year,
                radius_km=15
            )

            # Compute similarities
            lease_vs_regional = cosine_similarity(lease_emb['embedding'], regional_ref['embedding'])
            background_vs_regional = cosine_similarity(background_emb['embedding'], regional_ref['embedding'])
            lease_vs_background = cosine_similarity(lease_emb['embedding'], background_emb['embedding'])

            # Difference-in-differences: How much worse is lease compared to background?
            did_score = lease_vs_regional - background_vs_regional

            results.append({
                'year': year,
                'crop': crop_info['name'],
                'crop_code': crop_info['code'],
                'lease_pixels': lease_emb['pixel_count'],
                'background_pixels': background_emb['pixel_count'],
                'regional_samples': regional_ref['sample_count'],
                'lease_vs_regional': lease_vs_regional,
                'background_vs_regional': background_vs_regional,
                'lease_vs_background': lease_vs_background,
                'difference_in_differences': did_score
            })

            print(f"  ✓ Lease vs Regional: {lease_vs_regional:.4f}")
            print(f"  ✓ Background vs Regional: {background_vs_regional:.4f}")
            print(f"  ✓ Difference-in-Differences: {did_score:.4f}")

        except Exception as e:
            print(f"  ✗ Error: {str(e)}")
            continue

    # Create results DataFrame
    results_df = pd.DataFrame(results)

    print("\n" + "="*80)
    print("ANALYSIS COMPLETE")
    print("="*80)
    display(results_df)
else:
    print("Please complete all previous steps first.")

Running reclamation analysis...


Processing 2017 - Spring Wheat...
  - Extracting lease embeddings...
  ✗ Error: Image.select: Band pattern 'embedding_0' did not match any bands. Available bands: [A00, A01, A02, A03, A04, A05, A06, A07, A08, A09, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25, A26, A27, A28, A29, A30, A31, A32, A33, A34, A35, A36, A37, A38, A39, A40, A41, A42, A43, A44, A45, A46, A47, A48, A49, A50, A51, A52, A53, A54, A55, A56, A57, A58, A59, A60, A61, A62, A63]

Processing 2018 - Canola and Rapeseed...
  - Extracting lease embeddings...
  ✗ Error: Image.select: Band pattern 'embedding_0' did not match any bands. Available bands: [A00, A01, A02, A03, A04, A05, A06, A07, A08, A09, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25, A26, A27, A28, A29, A30, A31, A32, A33, A34, A35, A36, A37, A38, A39, A40, A41, A42, A43, A44, A45, A46, A47, A48, A49, A50, A51, A52, A53, A54, A55, A56, A57, A58, A59, A60, A61, 

## 6. Visualization: Recovery Trajectory

Plot similarity metrics over time to assess reclamation progress

In [None]:
if 'results_df' in locals() and len(results_df) > 0:
    fig, axes = plt.subplots(2, 1, figsize=(12, 10))

    # Plot 1: Cosine similarity trends
    ax1 = axes[0]
    ax1.plot(results_df['year'], results_df['lease_vs_regional'],
             marker='o', linewidth=2, label='Lease vs Regional Reference', color='red')
    ax1.plot(results_df['year'], results_df['background_vs_regional'],
             marker='s', linewidth=2, label='Background vs Regional Reference', color='green')
    ax1.plot(results_df['year'], results_df['lease_vs_background'],
             marker='^', linewidth=2, label='Lease vs Background', color='blue', linestyle='--')

    ax1.set_xlabel('Year', fontsize=12)
    ax1.set_ylabel('Cosine Similarity', fontsize=12)
    ax1.set_title('Reclamation Site Recovery Trajectory\n(Higher = More Similar to Healthy Reference)',
                  fontsize=14, fontweight='bold')
    ax1.legend(loc='best', fontsize=10)
    ax1.grid(True, alpha=0.3)
    ax1.set_ylim([0, 1])

    # Add crop labels
    for idx, row in results_df.iterrows():
        ax1.text(row['year'], row['lease_vs_regional'] + 0.02,
                row['crop'][:10], fontsize=8, rotation=45, ha='left')

    # Plot 2: Difference-in-Differences (Recovery Gap)
    ax2 = axes[1]
    colors = ['red' if x < 0 else 'green' for x in results_df['difference_in_differences']]
    ax2.bar(results_df['year'], results_df['difference_in_differences'],
            color=colors, alpha=0.7, edgecolor='black')
    ax2.axhline(y=0, color='black', linestyle='-', linewidth=1)

    ax2.set_xlabel('Year', fontsize=12)
    ax2.set_ylabel('Difference-in-Differences Score', fontsize=12)
    ax2.set_title('Recovery Gap: Lease Performance vs Background\n(Positive = Lease Recovering, Negative = Lease Underperforming)',
                  fontsize=14, fontweight='bold')
    ax2.grid(True, alpha=0.3, axis='y')

    plt.tight_layout()
    plt.show()

    # Summary statistics
    print("\nSummary Statistics:")
    print(f"Mean Lease vs Regional Similarity: {results_df['lease_vs_regional'].mean():.4f}")
    print(f"Mean Background vs Regional Similarity: {results_df['background_vs_regional'].mean():.4f}")
    print(f"Mean Recovery Gap: {results_df['difference_in_differences'].mean():.4f}")

    if results_df['difference_in_differences'].mean() >= -0.05:
        print("\n✓ ASSESSMENT: Lease appears to be performing similarly to background field.")
        print("  Reclamation may be approaching equivalent land capability.")
    else:
        print("\n⚠ ASSESSMENT: Lease is underperforming compared to background field.")
        print("  Further reclamation work or monitoring may be needed.")
else:
    print("No results to visualize yet.")

## 7. Export Results

Save analysis results for reporting

In [None]:
if 'results_df' in locals():
    # Export to CSV
    output_file = 'reclamation_analysis_results.csv'
    results_df.to_csv(output_file, index=False)
    print(f"✓ Results saved to {output_file}")

    # Create summary report
    summary = f"""
    RECLAMATION ASSESSMENT SUMMARY
    ==============================

    Site Location: {site_center[1]:.5f}°, {site_center[0]:.5f}°
    Analysis Period: {results_df['year'].min()} - {results_df['year'].max()}

    Average Metrics:
    - Lease vs Regional Reference: {results_df['lease_vs_regional'].mean():.4f}
    - Background vs Regional Reference: {results_df['background_vs_regional'].mean():.4f}
    - Recovery Gap (DiD): {results_df['difference_in_differences'].mean():.4f}

    Trend Analysis:
    - First Year DiD: {results_df.iloc[0]['difference_in_differences']:.4f}
    - Last Year DiD: {results_df.iloc[-1]['difference_in_differences']:.4f}
    - Change: {results_df.iloc[-1]['difference_in_differences'] - results_df.iloc[0]['difference_in_differences']:.4f}

    Interpretation:
    The difference-in-differences (DiD) score shows how the lease performs relative
    to the background field when both are compared to regional healthy cropland.

    - DiD ≈ 0: Lease performing similar to background (equivalent land use)
    - DiD < -0.05: Lease underperforming (needs attention)
    - DiD > 0: Lease outperforming background (unexpected but possible)
    """

    print(summary)

    with open('reclamation_summary.txt', 'w') as f:
        f.write(summary)
    print("\n✓ Summary report saved to reclamation_summary.txt")

## Interpretation Guide

### Cosine Similarity Scores
- **1.0** = Identical embedding vectors (perfect match)
- **0.9-1.0** = Very high similarity (typical for same crop type in good condition)
- **0.7-0.9** = Moderate similarity (some differences but generally similar)
- **<0.7** = Low similarity (significant differences)

### Difference-in-Differences (DiD) Score
This metric answers: **"Given this year's crop and regional conditions, did the lease behave like healthy peers?"**

**DiD = (Lease vs Regional) - (Background vs Regional)**

- **DiD ≈ 0** (±0.05): Lease is performing equivalently to background field → **Reclamation Success**
- **DiD < -0.05**: Lease is underperforming compared to background → **Needs Attention**
- **DiD > 0.05**: Lease is outperforming background (rare, investigate if real or artifact)

### Recovery Trajectory
Look for these patterns over time:
- **Improving trend**: DiD increasing toward zero = recovery in progress
- **Stable at zero**: DiD consistently near zero = equivalent land capability achieved
- **Declining trend**: DiD becoming more negative = degradation or poor reclamation

### Spatial Resolution Considerations
- **AEF Resolution**: 10m × 10m pixels
- **100m × 100m lease** = ~100 pixels (adequate for statistical analysis)
- **15m access road** = 1-2 pixels wide (may be too small for reliable assessment)

For small features like access roads, consider aggregating multiple years or focusing on larger disturbed areas.