# Reclamation Assessment using Robust Z-Score Transformed NDVI

## Overview
This notebook analyzes crop growth within lease boundaries compared to the background field using robust z-score transformation of NDVI rasters. The robust z-score method uses median and MAD (Median Absolute Deviation) statistics, making it more resistant to outliers than standard z-scores.
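Here is a minimal sketch of that difference. It uses invented NDVI values, not data from any real field. A single bare-soil pixel inflates the standard deviation, which squeezes the healthy pixels into a narrow band of standard z-scores and leaves the outlier itself at only about -2.6. The robust version keeps the healthy pixels spread around 0 and puts the outlier near -22. `standard_z` and `robust_z` are illustrative helper names, not functions from this notebook.

```python
import numpy as np

def standard_z(x: np.ndarray) -> np.ndarray:
    """Classic z-score: (x - mean) / std (population std, ddof=0)."""
    return (x - x.mean()) / x.std()

def robust_z(x: np.ndarray) -> np.ndarray:
    """Robust z-score: (x - median) / (1.4826 * MAD)."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / (1.4826 * mad)

# Hypothetical background NDVI: healthy crop plus one bare-soil pixel (0.05)
ndvi = np.array([0.70, 0.72, 0.74, 0.71, 0.73, 0.75, 0.69, 0.05])

print("standard:", np.round(standard_z(ndvi), 2))
print("robust:  ", np.round(robust_z(ndvi), 2))
```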

## Workflow
1. Upload multiple NDVI rasters (GeoTIFFs)
2. Upload two polygon boundaries:
   - Field boundary (entire field)
   - Lease boundary (area of interest within field)
3. For each NDVI raster:
   - Create background mask (field minus lease)
   - Calculate robust statistics on background pixels
   - Transform entire raster using background statistics
   - Generate a z-score raster that expresses each pixel in robust standard deviations from the background median
4. Download transformed rasters for further analysis
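The per-raster steps above can be sketched on a toy grid using only NumPy. This illustration simplifies things and is not the notebook's code. The boolean arrays `field` and `lease` stand in for rasterized polygons (the notebook derives them from the uploaded boundaries with rasterio), NaN marks NoData, and all NDVI values are invented.

```python
import numpy as np

# Hypothetical 4x4 NDVI tile; NaN marks a NoData pixel
ndvi = np.array([
    [0.70, 0.72, 0.71, 0.69],
    [0.73, 0.40, 0.42, 0.70],
    [0.74, 0.41, 0.43, np.nan],
    [0.10, 0.72, 0.71, 0.70],
])

# Stand-ins for rasterized polygons: the field covers every pixel except the
# bottom-left one, and the lease is the 2x2 block in the middle
field = np.ones_like(ndvi, dtype=bool)
field[3, 0] = False
lease = np.zeros_like(ndvi, dtype=bool)
lease[1:3, 1:3] = True

# Background = valid field pixels outside the lease
background = field & ~lease & ~np.isnan(ndvi)

# Robust statistics from background pixels only
median = np.median(ndvi[background])
mad = np.median(np.abs(ndvi[background] - median))
robust_std = 1.4826 * mad

# Transform every field pixel with the background statistics; outside the field -> NaN
z = np.where(field, (ndvi - median) / robust_std, np.nan)

print(f"median={median:.3f}, MAD={mad:.3f}")
print("lease z-scores:", np.round(z[lease], 1))
```

The lease pixels score around -20 here only because the toy background is unrealistically uniform (MAD = 0.01). Real fields usually have a wider spread.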

## Interpretation
- **Z-score = 0**: Pixel value equals the background median
- **Z-score > 0**: NDVI above the background median (stronger crop growth than background)
- **Z-score < 0**: NDVI below the background median (weaker crop growth than background)
- **|Z-score| > 2**: Unusually far from background (outlier); if background NDVI is roughly normal, only about 5% of background pixels fall beyond ±2
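Written out, the transform applied to each pixel value $x$ takes its statistics only from the background pixel set $B$ (field minus lease):

$$
\tilde{x}_B = \mathrm{median}_{i \in B}\, x_i, \qquad
\mathrm{MAD}_B = \mathrm{median}_{i \in B} \left| x_i - \tilde{x}_B \right|, \qquad
z = \frac{x - \tilde{x}_B}{1.4826\,\mathrm{MAD}_B}
$$

The constant $1.4826 \approx 1/\Phi^{-1}(0.75)$, where $\Phi$ is the standard normal CDF. This scaling makes $1.4826\,\mathrm{MAD}_B$ a consistent estimate of the standard deviation when background NDVI is roughly normally distributed. Under the same assumption, about 95% of background pixels satisfy $|z| \le 2$, which motivates the ±2 cut-off. Neighboring pixels are spatially correlated, so treat that cut-off as a rule of thumb rather than a formal significance test.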

## 1. Setup and Imports

In [None]:
# Install required packages
%pip install -q geopandas rasterio fiona shapely numpy pandas matplotlib

# Import libraries
import os
import warnings
import zipfile
from datetime import datetime
from typing import List, Tuple, Optional, Dict, Any

import numpy as np
import pandas as pd
import geopandas as gpd
import rasterio
from rasterio.io import MemoryFile
from rasterio.mask import mask
from rasterio.warp import calculate_default_transform, reproject, Resampling
from shapely.geometry import shape, mapping
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from matplotlib.patches import Patch

# Google Colab specific imports
from google.colab import files
from IPython.display import display, HTML

warnings.filterwarnings('ignore')

print("✅ Setup complete. All libraries imported successfully.")
print("📍 Running in Google Colab environment")

## 2. File Upload Interface

In [None]:
# Create upload interface
print("üìÇ Please upload your files:")
print("="*50)
print("1Ô∏è‚É£ NDVI Rasters (multiple .tif files)")
print("2Ô∏è‚É£ Field Boundary polygon (1 file: .kml, .geojson, or .shp)")
print("3Ô∏è‚É£ Lease Boundary polygon (1 file: .kml, .geojson, or .shp)")
print("="*50)

uploaded = files.upload()

# Categorize uploaded files
ndvi_files = []
polygon_files = []

for filename in uploaded.keys():
    if filename.lower().endswith(('.tif', '.tiff')):
        ndvi_files.append(filename)
    elif filename.lower().endswith(('.kml', '.geojson', '.shp')):
        polygon_files.append(filename)

print("\n✅ Upload Summary:")
print(f"   - NDVI rasters: {len(ndvi_files)} files")
print(f"   - Polygon files: {len(polygon_files)} files")

if len(ndvi_files) == 0:
    print("⚠️ Warning: No NDVI rasters uploaded")
if len(polygon_files) < 2:
    print("⚠️ Warning: Need 2 polygon files (field and lease boundaries)")

## 3. Identify Field and Lease Boundaries

In [None]:
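# Helper function to load polygon
def load_polygon(filename: str) -> Optional[gpd.GeoDataFrame]:
    """Load a polygon layer from KML, GeoJSON, or shapefile.

    Returns None (after printing the error) if the file cannot be read.
    """
    try:
        if filename.lower().endswith('.kml'):
            # Fiona (if it is the active engine) does not enable the KML driver by default
            import fiona
            fiona.drvsupport.supported_drivers['KML'] = 'rw'
        # Read directly from disk so GDAL can find shapefile sidecar files
        # (.shx/.dbf/.prj), which an in-memory copy of the .shp alone would lack
        return gpd.read_file(filename)
    except Exception as e:
        print(f"Error loading {filename}: {e}")
        return None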

# Load and identify polygons
field_gdf = None
lease_gdf = None

if len(polygon_files) >= 2:
    print("\n🔍 Identifying field and lease boundaries from filenames...")
    print("\nAvailable polygon files:")
    for i, filename in enumerate(polygon_files):
        print(f"  {i+1}. {filename}")

    # Match 'field' / 'lease' in the filename (case-insensitive); never pick the same file twice
    field_boundary_file = next((f for f in polygon_files if 'field' in f.lower()), None)
    lease_boundary_file = next(
        (f for f in polygon_files if 'lease' in f.lower() and f != field_boundary_file), None
    )

    # Fall back to upload order if either boundary could not be identified
    if not field_boundary_file or not lease_boundary_file:
        print("\n⚠️ Could not auto-identify boundaries from filenames.")
        print("   Assuming first file is field boundary, second is lease boundary.")
        field_boundary_file, lease_boundary_file = polygon_files[0], polygon_files[1]

    print(f"\n📍 Field Boundary: {field_boundary_file}")
    print(f"📍 Lease Boundary: {lease_boundary_file}")

    # Load the polygons
    field_gdf = load_polygon(field_boundary_file)
    lease_gdf = load_polygon(lease_boundary_file)

    if field_gdf is not None and lease_gdf is not None:
        print("\n✅ Both boundaries loaded successfully")
        print(f"   Field CRS: {field_gdf.crs}")
        print(f"   Lease CRS: {lease_gdf.crs}")
    else:
        print("\n❌ Could not load one or both boundaries; check the files and rerun this cell")
else:
    print("\n❌ Need at least 2 polygon files to proceed")

## 4. Process NDVI Rasters with Robust Z-Score Transformation

In [None]:
def calculate_robust_stats(data: np.ndarray) -> Dict[str, float]:
    """Calculate robust statistics (median, MAD, scaled MAD) on valid pixels."""
    # Remove NaN and common NoData sentinel values
    valid_data = data[~np.isnan(data)]
    valid_data = valid_data[(valid_data != -9999) & (valid_data != -10000)]
    
    if len(valid_data) == 0:
        # Include n_valid so callers can always read it
        return {'median': np.nan, 'mad': np.nan, 'robust_std': np.nan, 'n_valid': 0}
    
    median = np.median(valid_data)
    mad = np.median(np.abs(valid_data - median))
    # 1.4826 * MAD estimates the standard deviation for normally distributed data
    robust_std = 1.4826 * mad
    
    return {
        'median': median,
        'mad': mad,
        'robust_std': robust_std,
        'n_valid': len(valid_data)
    }

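def robust_z_score_transform(data: np.ndarray, median: float, robust_std: float) -> np.ndarray:
    """Transform data to robust z-scores: (data - median) / robust_std."""
    if robust_std == 0 or np.isnan(robust_std):
        # MAD is 0 (e.g. most background pixels share one value) or undefined:
        # z-scores are undefined, so return NaN instead of a misleading 0
        return np.full(np.shape(data), np.nan, dtype=np.float32)
    
    z_scores = (data - median) / robust_std
    return z_scores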

def process_ndvi_raster(raster_file: str, field_geom, lease_geom) -> Dict[str, Any]:
    """Process a single NDVI raster with robust z-score transformation.

    field_geom and lease_geom are shapely geometries in the CRS of the
    global field_gdf and lease_gdf, respectively.
    """
    from rasterio.features import geometry_mask
    
    results = {'filename': raster_file}
    
    try:
        with rasterio.open(raster_file) as src:
            raster_crs = src.crs
            
            # Reproject each boundary to the raster CRS independently
            # (the field and lease files may use different CRSs)
            field_geom_proj = gpd.GeoSeries([field_geom], crs=field_gdf.crs).to_crs(raster_crs).iloc[0]
            lease_geom_proj = gpd.GeoSeries([lease_geom], crs=lease_gdf.crs).to_crs(raster_crs).iloc[0]
            
            # Crop to the field extent. filled=False returns a masked array that masks
            # pixels outside the field polygon and the raster's NoData pixels. With the
            # default filled=True they would be filled with NoData (or 0 if the raster
            # has no NoData value) and could leak into the background statistics
            field_ma, out_transform = mask(src, [field_geom_proj], crop=True, filled=False)
            
            # Work in float32 so NaN can mark invalid pixels (also handles integer NDVI)
            field_data = field_ma[0].astype(np.float32).filled(np.nan)
            # Treat the same NoData sentinels as calculate_robust_stats as invalid
            field_data[np.isin(field_data, (-9999, -10000))] = np.nan
            valid = ~np.isnan(field_data)
            
            # Lease mask on the same grid (True inside the lease polygon)
            lease_mask = geometry_mask(
                [lease_geom_proj],
                out_shape=field_data.shape,
                transform=out_transform,
                invert=True
            )
            
            # Background mask: valid field pixels outside the lease
            background_mask = valid & ~lease_mask
            
            # Calculate robust statistics on background pixels only
            stats = calculate_robust_stats(field_data[background_mask])
            
            # Transform the whole field raster; NaN (outside field / NoData) propagates
            z_score_raster = robust_z_score_transform(
                field_data,
                stats['median'],
                stats['robust_std']
            )
            
            # Store results
            results['success'] = True
            results['z_score_raster'] = z_score_raster
            results['transform'] = out_transform
            results['crs'] = raster_crs
            results['stats'] = stats
            results['lease_mask'] = lease_mask
            results['shape'] = z_score_raster.shape
            
            # Summary statistics for valid lease pixels
            lease_pixels = z_score_raster[lease_mask & valid]
            lease_pixels = lease_pixels[~np.isnan(lease_pixels)]
            
            if len(lease_pixels) > 0:
                results['lease_stats'] = {
                    'mean_z': np.mean(lease_pixels),
                    'median_z': np.median(lease_pixels),
                    'std_z': np.std(lease_pixels),
                    'min_z': np.min(lease_pixels),
                    'max_z': np.max(lease_pixels),
                    'n_pixels': len(lease_pixels)
                }
            else:
                results['lease_stats'] = None
            
    except Exception as e:
        results['success'] = False
        results['error'] = str(e)
    
    return results

print("✅ Processing functions defined")

In [None]:
# Process all NDVI rasters
from shapely.ops import unary_union

processed_rasters = []

if ndvi_files and field_gdf is not None and lease_gdf is not None:
    print("\n🔄 Processing NDVI rasters...")
    print("="*50)
    
    # Merge all features in each file, in case a boundary is stored as several polygons
    field_geom = unary_union(field_gdf.geometry)
    lease_geom = unary_union(lease_gdf.geometry)
    
    for i, raster_file in enumerate(ndvi_files):
        print(f"\n[{i+1}/{len(ndvi_files)}] Processing: {raster_file}")
        
        result = process_ndvi_raster(raster_file, field_geom, lease_geom)
        
        if result['success']:
            processed_rasters.append(result)
            stats = result['stats']
            print("   ✅ Background Statistics:")
            print(f"      - Median: {stats['median']:.4f}")
            print(f"      - MAD: {stats['mad']:.4f}")
            print(f"      - Robust Std: {stats['robust_std']:.4f}")
            print(f"      - Valid pixels: {stats['n_valid']:,}")
            
            if result['lease_stats']:
                lease_stats = result['lease_stats']
                print("   📊 Lease Area Z-Score Statistics:")
                print(f"      - Mean Z: {lease_stats['mean_z']:.3f}")
                print(f"      - Median Z: {lease_stats['median_z']:.3f}")
                print(f"      - Range: [{lease_stats['min_z']:.3f}, {lease_stats['max_z']:.3f}]")
        else:
            print(f"   ❌ Error: {result.get('error', 'Unknown error')}")
    
    print(f"\n✅ Processed {len(processed_rasters)}/{len(ndvi_files)} rasters successfully")
else:
    print("\n❌ Cannot process: Missing required files")

## 5. Visualize Z-Score Transformed Rasters

In [None]:
def plot_z_score_raster(result: Dict, figsize=(12, 8)):
    """Create visualization of z-score transformed raster"""
    
    fig, axes = plt.subplots(1, 2, figsize=figsize)
    
    z_raster = result['z_score_raster']
    lease_mask = result['lease_mask']
    
    # Create custom colormap (red-white-green)
    colors = ['darkred', 'red', 'white', 'lightgreen', 'darkgreen']
    n_bins = 100
    cmap = mcolors.LinearSegmentedColormap.from_list('z_score', colors, N=n_bins)
    
    # Fix the color range to ±3 so maps from different dates are comparable
    vmin, vmax = -3, 3
    
    # Plot 1: Full field z-score map
    im1 = axes[0].imshow(z_raster, cmap=cmap, vmin=vmin, vmax=vmax)
    axes[0].set_title('Z-Score Transformed NDVI\n(Full Field)', fontsize=12, fontweight='bold')
    axes[0].axis('off')
    
    # Add lease overlay: a single-color map, because 'Blues' renders a constant
    # array in its palest shade and the overlay would be nearly invisible
    lease_overlay = np.ma.masked_where(~lease_mask, np.ones_like(z_raster))
    axes[0].imshow(lease_overlay, alpha=0.2, cmap=mcolors.ListedColormap(['blue']))
    
    # Plot 2: Histogram of z-scores (NaN pixels outside the field are excluded)
    valid = ~np.isnan(z_raster)
    lease_z = z_raster[lease_mask & valid]
    background_z = z_raster[~lease_mask & valid]
    
    axes[1].hist(background_z, bins=50, alpha=0.5, label='Background', color='gray', density=True)
    axes[1].hist(lease_z, bins=50, alpha=0.7, label='Lease Area', color='blue', density=True)
    axes[1].axvline(0, color='black', linestyle='--', label='Background Median')
    axes[1].axvline(-2, color='red', linestyle=':', alpha=0.5)
    axes[1].axvline(2, color='red', linestyle=':', alpha=0.5, label='±2 Robust SD')
    
    axes[1].set_xlabel('Z-Score', fontsize=10)
    axes[1].set_ylabel('Density', fontsize=10)
    axes[1].set_title('Distribution of Z-Scores', fontsize=12, fontweight='bold')
    axes[1].legend(loc='upper right', fontsize=9)
    axes[1].grid(True, alpha=0.3)
    
    # Add colorbar under the map only (the histogram does not use the colormap)
    cbar = fig.colorbar(im1, ax=axes[0], orientation='horizontal', pad=0.05, aspect=30)
    cbar.set_label('Z-Score (Robust Standard Deviations from Background Median)', fontsize=10)
    
    # Add title with filename
    fig.suptitle(f"File: {result['filename']}", fontsize=14, fontweight='bold', y=1.02)
    
    plt.tight_layout()
    return fig

# Visualize processed rasters
if processed_rasters:
    print("\n📊 Generating visualizations...")
    
    # Show first few rasters (to avoid overwhelming output)
    max_plots = min(3, len(processed_rasters))
    
    for i in range(max_plots):
        fig = plot_z_score_raster(processed_rasters[i])
        plt.show()
    
    if len(processed_rasters) > max_plots:
        print(f"\n📌 Showing first {max_plots} of {len(processed_rasters)} visualizations")
        print("   (All rasters will be included in download)")

## 6. Export Z-Score Transformed Rasters

In [None]:
def save_z_score_geotiff(result: Dict, output_dir: str) -> str:
    """Save z-score raster as GeoTIFF"""
    
    # Create output filename
    base_name = os.path.splitext(result['filename'])[0]
    output_file = os.path.join(output_dir, f"{base_name}_zscore.tif")
    
    # Write GeoTIFF
    with rasterio.open(
        output_file,
        'w',
        driver='GTiff',
        height=result['shape'][0],
        width=result['shape'][1],
        count=1,
        dtype='float32',
        crs=result['crs'],
        transform=result['transform'],
        compress='lzw',
        nodata=np.nan
    ) as dst:
        dst.write(result['z_score_raster'].astype(np.float32), 1)
        
        # Add metadata tags
        dst.update_tags(
            description="Robust Z-Score Transformed NDVI",
            background_median=str(result['stats']['median']),
            background_mad=str(result['stats']['mad']),
            background_robust_std=str(result['stats']['robust_std']),
            processing_date=datetime.now().isoformat(),
            interpretation="Values represent robust standard deviations from background median"
        )
    
    return output_file

# Create output directory and save all processed rasters
if processed_rasters:
    output_dir = 'zscore_outputs'
    os.makedirs(output_dir, exist_ok=True)
    
    print("\nüíæ Saving z-score transformed rasters...")
    print("="*50)
    
    saved_files = []
    
    for i, result in enumerate(processed_rasters):
        try:
            output_file = save_z_score_geotiff(result, output_dir)
            saved_files.append(output_file)
            print(f"   ‚úÖ [{i+1}/{len(processed_rasters)}] Saved: {os.path.basename(output_file)}")
        except Exception as e:
            print(f"   ‚ùå [{i+1}/{len(processed_rasters)}] Error saving {result['filename']}: {e}")
    
    print(f"\n‚úÖ Saved {len(saved_files)} z-score rasters to '{output_dir}/'")

## 7. Generate Summary Statistics

In [None]:
# Create summary statistics CSV
if processed_rasters:
    print("\nüìä Generating summary statistics...")
    
    summary_data = []
    
    for result in processed_rasters:
        row = {
            'Filename': result['filename'],
            'Background_Median': result['stats']['median'],
            'Background_MAD': result['stats']['mad'],
            'Background_Robust_Std': result['stats']['robust_std'],
            'Background_Pixels': result['stats']['n_valid']
        }
        
        if result['lease_stats']:
            row.update({
                'Lease_Mean_Z': result['lease_stats']['mean_z'],
                'Lease_Median_Z': result['lease_stats']['median_z'],
                'Lease_Std_Z': result['lease_stats']['std_z'],
                'Lease_Min_Z': result['lease_stats']['min_z'],
                'Lease_Max_Z': result['lease_stats']['max_z'],
                'Lease_Pixels': result['lease_stats']['n_pixels']
            })
        
        summary_data.append(row)
    
    # Create DataFrame and save to CSV
    df_summary = pd.DataFrame(summary_data)
    summary_file = os.path.join(output_dir, 'zscore_summary_statistics.csv')
    df_summary.to_csv(summary_file, index=False)
    
    print("\nüìã Summary Statistics:")
    print(df_summary.to_string(index=False))
    print(f"\n‚úÖ Summary saved to: {summary_file}")

## 8. Create Download Archive

In [None]:
# Create ZIP archive for download
if processed_rasters and 'saved_files' in locals():
    print("\nüì¶ Creating download archive...")
    
    zip_filename = 'zscore_transformed_ndvi.zip'
    
    with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        # Add all saved GeoTIFFs
        for file in saved_files:
            if os.path.exists(file):
                zipf.write(file, os.path.basename(file))
        
        # Add summary CSV
        if os.path.exists(summary_file):
            zipf.write(summary_file, os.path.basename(summary_file))
        
        # Add README
        readme_content = """Z-Score Transformed NDVI Rasters
=====================================

This archive contains robust z-score transformed NDVI rasters.

Transformation Method:
- Background area: Field boundary minus lease boundary
- Statistics: Median and MAD (Median Absolute Deviation) calculated on background pixels
- Transformation: Z = (NDVI - Background_Median) / (1.4826 * Background_MAD)

Interpretation:
- Z = 0: Pixel equals background median
- Z > 0: Above background median (stronger growth than background)
- Z < 0: Below background median (weaker growth than background)
- |Z| > 2: Unusually far from background (about 5% of background pixels if background NDVI is roughly normal)

Files Included:
- *_zscore.tif: Z-score transformed NDVI rasters (GeoTIFF format)
- zscore_summary_statistics.csv: Summary statistics for all processed rasters

Processing Date: {}
""".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
        
        zipf.writestr('README.txt', readme_content)
    
    print(f"✅ Archive created: {zip_filename}")
    print(f"   Size: {os.path.getsize(zip_filename) / 1024 / 1024:.2f} MB")
    print("\n⬇️ Starting download...")
    
    # Trigger download
    files.download(zip_filename)
    
    print("\nüéâ Processing complete! Your z-score transformed rasters are ready.")
else:
    print("\n‚ö†Ô∏è No files to download. Please process rasters first.")

## 9. Interpretation Guide

### Understanding Z-Scores in Reclamation Context

The robust z-score transformation provides a standardized way to compare lease area performance against the background field:

#### Z-Score Ranges:
- **Z < -2**: Significantly below background (potential problem area)
- **-2 ≤ Z < -1**: Moderately below background
- **-1 ≤ Z < 0**: Slightly below background
- **Z ≈ 0**: Similar to background
- **0 < Z ≤ 1**: Slightly above background
- **1 < Z ≤ 2**: Moderately above background
- **Z > 2**: Significantly above background (excellent performance)
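The bands above can be applied pixel by pixel. The sketch below assumes NumPy, and `classify_z` is a hypothetical helper. The guide gives no width for the "Z ≈ 0" band, so the `tol` parameter is an assumption. Its default of 0 labels only exact zeros as "Similar".

```python
import numpy as np

def classify_z(z, tol: float = 0.0) -> np.ndarray:
    """Label each z-score with the bands listed above.

    `tol` is a hypothetical half-width for the "Z ≈ 0" band (not defined in
    the guide). NaN pixels (outside the field / NoData) get "No data".
    """
    z = np.asarray(z, dtype=float)
    # np.select returns the label of the FIRST true condition, so order matters
    conditions = [
        z < -2,             # Z < -2
        z < -1,             # -2 <= Z < -1
        np.abs(z) <= tol,   # Z ≈ 0
        z < 0,              # -1 <= Z < 0
        z <= 1,             # 0 < Z <= 1
        z <= 2,             # 1 < Z <= 2
        z > 2,              # Z > 2
    ]
    labels = [
        "Significantly below",
        "Moderately below",
        "Similar",
        "Slightly below",
        "Slightly above",
        "Moderately above",
        "Significantly above",
    ]
    return np.select(conditions, labels, default="No data")

print(classify_z([-2.5, -1.5, -0.5, 0.0, 0.5, 1.5, 2.5, np.nan]))
```

In the notebook, passing `result['z_score_raster'][result['lease_mask']]` and counting labels with `np.unique(..., return_counts=True)` would give the share of lease pixels in each band.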

#### Reclamation Assessment:
- **Successful Reclamation**: Lease area Z-scores close to 0 or positive
- **Needs Attention**: Lease area Z-scores consistently negative
- **Excellent Recovery**: Lease area Z-scores consistently positive
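These rules are qualitative. One hypothetical way to apply them to a series of per-date lease median z-scores is sketched below. The `near_zero` threshold for "close to 0", the reading of "consistently" as "on every date", and the extra "Mixed" label are all assumptions, not rules from this guide.

```python
import numpy as np

def assess_reclamation(lease_median_z, near_zero: float = 0.5) -> str:
    """Map per-date lease median z-scores to the assessment categories above.

    Assumptions (hypothetical): `near_zero` defines "close to 0";
    "consistently" means "on every date"; "Mixed" covers anything else.
    """
    z = np.asarray(lease_median_z, dtype=float)
    z = z[~np.isnan(z)]  # ignore dates with no valid lease pixels
    if z.size == 0:
        return "No data"
    if np.all(z > near_zero):
        return "Excellent Recovery"
    if np.all(z < -near_zero):
        return "Needs Attention"
    if np.all(z >= -near_zero):
        return "Successful Reclamation"
    return "Mixed"

print(assess_reclamation([0.8, 1.2, 0.9]))
print(assess_reclamation([-0.9, -1.5]))
print(assess_reclamation([0.1, -0.2, 0.6]))
```

The `Lease_Median_Z` column of `zscore_summary_statistics.csv` is one possible input, for example `assess_reclamation(df_summary['Lease_Median_Z'])`. The function does not depend on the order of dates.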

#### Advantages of Robust Z-Score:
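1. **Outlier Resistant**: Uses median/MAD instead of mean/std, so a few extreme pixels (bare soil, water, cloud residue) barely move the background statistics
2. **Standardized Scale**: Each date is scaled by its own background, so z-scores from different dates are directly comparable
3. **Interpretable Threshold**: ±2 marks an unusual deviation (about 5% of background pixels if background NDVI is roughly normal); treat it as a rule of thumb, not a formal significance test
4. **Relative Performance**: Field-wide changes (weather, growth stage) that affect lease and background alike cancel out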
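Points 2 and 4 follow from one property: the robust z-score does not change when every pixel on a date is shifted by the same amount or multiplied by the same positive factor. The sketch below uses invented values and assumes a uniform field-wide decline that halves NDVI everywhere. `robust_z_vs_background` is an illustrative helper name. Real field-wide changes are rarely this uniform, so this shows the idea rather than a guarantee.

```python
import numpy as np

def robust_z_vs_background(x: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Robust z-scores of x relative to the background median and MAD."""
    med = np.median(background)
    mad = np.median(np.abs(background - med))
    return (x - med) / (1.4826 * mad)

# Hypothetical background and lease NDVI on a good date
background = np.array([0.68, 0.70, 0.71, 0.72, 0.74, 0.75])
lease = np.array([0.60, 0.66])

# Same field after a uniform decline: every pixel multiplied by 0.5
z_good = robust_z_vs_background(lease, background)
z_dry = robust_z_vs_background(lease * 0.5, background * 0.5)

# Median and MAD both halve, so the ratio (the z-score) is unchanged
print(np.round(z_good, 2), np.round(z_dry, 2))
```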