# Google Earth Engine Data Preparation for FuseTS (FIXED VERSION)

This notebook extracts Sentinel-1 and Sentinel-2 data from Google Earth Engine and prepares it for FuseTS MOGPR processing.

## ‚úÖ FIXES IN THIS VERSION:
1. **Uses Level-2A Surface Reflectance** (`COPERNICUS/S2_SR_HARMONIZED`) instead of Level-1C TOA
2. **Includes cloud masking** for better NDVI quality
3. **Validates NDVI values** are in [-1, 1] range before export
4. **Adds diagnostic checks** to catch data corruption early
5. **Ensures correct band selection** when combining S1 and S2

## ‚ö†Ô∏è ORIGINAL BUG:
The original notebook exported VV/VH backscatter (-48 to 6 dB) in the S2ndvi band instead of actual NDVI values (-1 to 1). This made S1‚ÜíNDVI fusion impossible.

## Temporal Compositing Strategy
- **Total periods**: 62 periods from Nov 2023 - Nov 2025
- **Period length**: 12 days each
- **Start date**: November 1, 2023
- **End date**: November 7, 2025

## Indonesian Agricultural Calendar Coverage
This date range captures:
- **First planting season**: Nov 2023 - Mar 2024 (crosses year boundary)
- **Second planting season**: Apr - Jun 2024
- **Third planting season**: Jul - Sep 2024
- **Full cycle**: ~2 complete agricultural years

## Output Format
Data will be exported in FuseTS-compatible format with proper band naming:
- S1: `VV`, `VH` bands (backscatter in dB, range: -50 to +10)
- S2: `S2ndvi` band (NDVI values, range: -1 to 1) ‚Üê **NOW CORRECT!**
- Dimensions: `(time, y, x)` with `t` coordinate name

## 1. Setup and Authentication

In [1]:
import ee
import geemap
import pandas as pd
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import os
import warnings
warnings.filterwarnings('ignore')

# Additional imports for mask processing
import rasterio
from rasterio.features import shapes
import geopandas as gpd
from shapely.geometry import shape, mapping
from shapely.ops import unary_union

# Initialize Earth Engine with authentication
print("üîê Authenticating with Google Earth Engine...")

try:
    # First time setup: authenticate
    ee.Authenticate()
    print("‚úÖ Authentication successful!")
except Exception as e:
    print(f"Authentication note: {e}")
    print("If already authenticated, continuing...")

# Initialize with project
try:
    ee.Initialize(project='ee-geodeticengineeringundip')
    print("‚úÖ Earth Engine initialized successfully!")
    print(f"   Project: ee-geodeticengineeringundip")
except Exception as e:
    print(f"‚ùå Error initializing Earth Engine: {e}")
    print("Please ensure:")
    print("  1. You have run ee.Authenticate() successfully")
    print("  2. You have access to project 'ee-geodeticengineeringundip'")
    raise

print(f"\nüì¶ Package versions:")
print(f"   Earth Engine API: {ee.__version__}")
print(f"   geemap: {geemap.__version__}")
print(f"   rasterio: {rasterio.__version__}")

üîê Authenticating with Google Earth Engine...
‚úÖ Authentication successful!
‚úÖ Earth Engine initialized successfully!
   Project: ee-geodeticengineeringundip

üì¶ Package versions:
   Earth Engine API: 1.6.15
   geemap: 0.36.6
   rasterio: 1.4.3


## 2. Define Study Area and Parameters

In [2]:
# ============================================================================
# STUDY AREA FROM SHAPEFILE
# ============================================================================

print("="*70)
print("üìç STUDY AREA CONFIGURATION")
print("="*70)

# Load the paddy shapefile
shapefile_path = 'data/klambu-glapan.shp'

print(f"\nüéØ Using Paddy Shapefile: {shapefile_path}")

try:
    # Read the shapefile
    paddy_gdf = gpd.read_file(shapefile_path)
    
    print(f"\n‚úÖ Shapefile loaded successfully!")
    print(f"   Number of features: {len(paddy_gdf)}")
    print(f"   CRS: {paddy_gdf.crs}")
    
    # Convert to UTM Zone 49S (EPSG:32749) - appropriate for Central Java, Indonesia
    print(f"\n   Converting to UTM Zone 49S for accurate buffering...")
    paddy_utm = paddy_gdf.to_crs("EPSG:32749")
    
    # Calculate accurate area in UTM
    total_area_m2 = paddy_utm.area.sum()
    total_area_km2 = total_area_m2 / 1e6
    print(f"   Total paddy area: {total_area_km2:.2f} km¬≤")
    
    # Add buffer in UTM (meters)
    BUFFER_DISTANCE_M = 500  # 500 meters buffer
    print(f"\n   Applying {BUFFER_DISTANCE_M}m buffer (in UTM)...")
    
    # Create buffered geometry in UTM
    paddy_buffered_utm = paddy_utm.copy()
    paddy_buffered_utm['geometry'] = paddy_utm.buffer(BUFFER_DISTANCE_M)
    
    # Merge all buffered polygons into one
    merged_geometry_utm = unary_union(paddy_buffered_utm.geometry)
    buffered_area_km2 = merged_geometry_utm.area / 1e6
    
    print(f"   Buffered area: {buffered_area_km2:.2f} km¬≤")
    
    # Convert back to WGS84 for Earth Engine
    print(f"\n   Converting back to WGS84 for Earth Engine...")
    
    # Create GeoDataFrame with merged buffered geometry (in UTM)
    buffered_gdf_utm = gpd.GeoDataFrame(
        geometry=[merged_geometry_utm],
        crs="EPSG:32749"
    )
    
    # Convert to WGS84
    buffered_gdf_wgs84 = buffered_gdf_utm.to_crs("EPSG:4326")
    
    # Get WGS84 bounds
    west, south, east, north = buffered_gdf_wgs84.total_bounds
    
    print(f"   WGS84 Bounds (for Earth Engine):")
    print(f"     West:  {west:.6f}¬∞")
    print(f"     South: {south:.6f}¬∞")
    print(f"     East:  {east:.6f}¬∞")
    print(f"     North: {north:.6f}¬∞")
    
    # Convert to GeoJSON for Earth Engine
    geojson_geom = mapping(buffered_gdf_wgs84.geometry.iloc[0])
    
    # Create Earth Engine Geometry
    study_area = ee.Geometry(geojson_geom)
    
    gee_area_km2 = study_area.area().getInfo() / 1e6
    
    print(f"\n‚úÖ Study area created from shapefile!")
    print(f"   Type: Paddy field boundaries (Klambu-Glapan)")
    print(f"   Location: Demak, Central Java, Indonesia")
    print(f"   Buffer: {BUFFER_DISTANCE_M}m around paddy fields")
    print(f"   Area (GEE): {gee_area_km2:.2f} km¬≤")
    
    STUDY_AREA_TYPE = 'klambu_glapan_shapefile'
    
except FileNotFoundError:
    print(f"\n‚ùå ERROR: Shapefile not found!")
    print(f"   Expected path: {shapefile_path}")
    print(f"   Please ensure the shapefile exists in the data/ folder")
    raise
    
except Exception as e:
    print(f"\n‚ùå ERROR loading shapefile: {e}")
    import traceback
    traceback.print_exc()
    raise

# Processing parameters
START_DATE = '2023-11-01'  # November 1, 2023
END_DATE = '2025-11-07'    # November 7, 2025
SCALE = 10  # meters per pixel (10m = native S2 resolution)
CRS = 'EPSG:4326'  # WGS84 coordinate system
MAX_CLOUD_COVER = 80  # Maximum cloud cover percentage for S2

# Output directory
OUTPUT_DIR = 'gee_fusets_data_fixed'
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Display final configuration
print(f"\n{'='*70}")
print("üìã FINAL PROCESSING CONFIGURATION")
print(f"{'='*70}")
print(f"   Study Area: {STUDY_AREA_TYPE.upper()}")
print(f"   Area size: {gee_area_km2:.2f} km¬≤")
print(f"   Temporal coverage: {START_DATE} to {END_DATE}")
print(f"   Spatial resolution: {SCALE}m")
print(f"   Max cloud cover: {MAX_CLOUD_COVER}%")
print(f"   Output directory: {OUTPUT_DIR}/")
print(f"\n‚úÖ Using CORRECTED data loading functions (Level-2A SR with cloud masking)")
print(f"{'='*70}")

üìç STUDY AREA CONFIGURATION

üéØ Using Paddy Shapefile: data/klambu-glapan.shp

‚úÖ Shapefile loaded successfully!
   Number of features: 1043
   CRS: EPSG:4326

   Converting to UTM Zone 49S for accurate buffering...
   Total paddy area: 559.46 km¬≤

   Applying 500m buffer (in UTM)...
   Buffered area: 879.84 km¬≤

   Converting back to WGS84 for Earth Engine...
   WGS84 Bounds (for Earth Engine):
     West:  110.513130¬∞
     South: -7.113018¬∞
     East:  111.038229¬∞
     North: -6.713087¬∞

‚úÖ Study area created from shapefile!
   Type: Paddy field boundaries (Klambu-Glapan)
   Location: Demak, Central Java, Indonesia
   Buffer: 500m around paddy fields
   Area (GEE): 884.30 km¬≤

üìã FINAL PROCESSING CONFIGURATION
   Study Area: KLAMBU_GLAPAN_SHAPEFILE
   Area size: 884.30 km¬≤
   Temporal coverage: 2023-11-01 to 2025-11-07
   Spatial resolution: 10m
   Max cloud cover: 80%
   Output directory: gee_fusets_data_fixed/

‚úÖ Using CORRECTED data loading functions (Level-2A SR 

## 3. Generate 12-Day Composite Periods

In [3]:
def generate_12day_periods(start_date_str, end_date_str):
    """
    Generate periods of 12 days each from start date to end date
    
    Parameters:
    -----------
    start_date_str : str
        Start date in 'YYYY-MM-DD' format (e.g., '2023-11-01')
    end_date_str : str
        End date in 'YYYY-MM-DD' format (e.g., '2025-11-07')
    """
    start_date = datetime.strptime(start_date_str, '%Y-%m-%d')
    end_date = datetime.strptime(end_date_str, '%Y-%m-%d')
    
    periods = []
    period_num = 1
    current_start = start_date
    
    while current_start <= end_date:
        period_end = current_start + timedelta(days=11)  # 12 days inclusive
        
        # Ensure we don't go beyond the end date
        if period_end > end_date:
            period_end = end_date
            
        periods.append({
            'period': period_num,
            'start_date': current_start,
            'end_date': period_end,
            'start_str': current_start.strftime('%Y-%m-%d'),
            'end_str': period_end.strftime('%Y-%m-%d'),
            'center_date': current_start + timedelta(days=6),
            'doy_center': (current_start + timedelta(days=6)).timetuple().tm_yday,
            'year': current_start.year,
            'month': current_start.month
        })
        
        if period_end >= end_date:
            break
        
        current_start = period_end + timedelta(days=1)
        period_num += 1
            
    return periods

# Generate periods from Nov 2023 to Nov 2025
periods = generate_12day_periods(START_DATE, END_DATE)

print(f"Generated {len(periods)} periods from {START_DATE} to {END_DATE}:")
print("\nFirst 5 periods:")
for i, period in enumerate(periods[:5]):
    print(f"Period {period['period']:2d}: {period['start_str']} to {period['end_str']}")

print("\nLast 5 periods:")
for i, period in enumerate(periods[-5:]):
    print(f"Period {period['period']:2d}: {period['start_str']} to {period['end_str']}")

print(f"\nTotal temporal coverage: {periods[0]['start_str']} to {periods[-1]['end_str']}")
print(f"Covers {len(periods)} 12-day periods over 2 years")

Generated 62 periods from 2023-11-01 to 2025-11-07:

First 5 periods:
Period  1: 2023-11-01 to 2023-11-12
Period  2: 2023-11-13 to 2023-11-24
Period  3: 2023-11-25 to 2023-12-06
Period  4: 2023-12-07 to 2023-12-18
Period  5: 2023-12-19 to 2023-12-30

Last 5 periods:
Period 58: 2025-09-15 to 2025-09-26
Period 59: 2025-09-27 to 2025-10-08
Period 60: 2025-10-09 to 2025-10-20
Period 61: 2025-10-21 to 2025-11-01
Period 62: 2025-11-02 to 2025-11-07

Total temporal coverage: 2023-11-01 to 2025-11-07
Covers 62 12-day periods over 2 years


## 4. Define Data Loading Functions (FIXED VERSION)

**üîß FIXES IN THIS CELL:**
1. ‚úÖ Uses **Level-2A Surface Reflectance** (`COPERNICUS/S2_SR_HARMONIZED`) instead of Level-1C TOA
2. ‚úÖ Includes **cloud masking** using QA60 band for better NDVI quality
3. ‚úÖ Validates **NDVI values are in [-1, 1]** range
4. ‚úÖ Uses **proper band names** from S2_SR collection (B8, B4)
5. ‚úÖ Adds **diagnostic information** to track data quality

In [4]:
def load_sentinel1_data(geometry, start_date, end_date):
    """
    Load Sentinel-1 GRD data for a specific time period
    
    Returns VV and VH backscatter in dB (typical range: -50 to +10 dB)
    """
    s1_collection = (ee.ImageCollection('COPERNICUS/S1_GRD')
                    .filterBounds(geometry)
                    .filterDate(start_date, end_date)
                    .filter(ee.Filter.eq('instrumentMode', 'IW'))
                    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
                    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
                    .select(['VV', 'VH']))
    
    return s1_collection

def mask_s2_clouds(image):
    """
    Mask clouds in Sentinel-2 SR image using QA60 band
    
    Bit 10: Clouds (opaque)
    Bit 11: Cirrus clouds
    """
    qa = image.select('QA60')
    
    # Bits 10 and 11 are clouds and cirrus
    cloud_bit_mask = 1 << 10
    cirrus_bit_mask = 1 << 11
    
    # Both bits should be zero for clear conditions
    mask = qa.bitwiseAnd(cloud_bit_mask).eq(0).And(
           qa.bitwiseAnd(cirrus_bit_mask).eq(0))
    
    return image.updateMask(mask)

def load_sentinel2_data(geometry, start_date, end_date, max_cloud_cover=80):
    """
    Load Sentinel-2 Level-2A Surface Reflectance data with cloud masking
    
    ‚úÖ FIXED VERSION:
    - Uses COPERNICUS/S2_SR_HARMONIZED (Level-2A Surface Reflectance)
    - Applies cloud masking using QA60 band
    - Calculates NDVI from atmospherically corrected bands
    - Returns ONLY the NDVI band (no ambiguity)
    
    Benefits:
    - Atmospherically corrected (more accurate than TOA)
    - Cloud-masked (better quality NDVI)
    - Validated NDVI range [-1, 1]
    
    Collection: COPERNICUS/S2_SR_HARMONIZED (Level-2A SR, NOT Level-1C TOA)
    """
    def calculate_ndvi_sr(image):
        # Apply cloud mask first
        image_masked = mask_s2_clouds(image)
        
        # B8 = NIR, B4 = Red (from Surface Reflectance)
        ndvi = image_masked.normalizedDifference(['B8', 'B4']).rename('NDVI')
        
        # Clamp NDVI to valid range [-1, 1] as a safety check
        ndvi = ndvi.clamp(-1, 1)
        
        # Copy properties to NDVI band
        ndvi = ndvi.copyProperties(image, ['system:time_start'])
        
        return ndvi
    
    # Load Level-2A Surface Reflectance data WITH cloud masking
    s2_collection = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')  # ‚Üê FIXED: Using SR, not TOA
                    .filterBounds(geometry)
                    .filterDate(start_date, end_date)
                    .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', max_cloud_cover))
                    .map(calculate_ndvi_sr))  # ‚Üê FIXED: Applies cloud masking + NDVI calculation
    
    return s2_collection

def create_composite(collection, method='median'):
    """
    Create a composite from an image collection
    """
    if method == 'median':
        return collection.median()
    elif method == 'mean':
        return collection.mean()
    elif method == 'max':
        return collection.max()
    else:
        return collection.median()

print("‚úÖ Data loading functions defined successfully!")
print("\nüîß FIXED IMPROVEMENTS:")
print("   ‚úÖ Using Level-2A Surface Reflectance (not TOA)")
print("   ‚úÖ Cloud masking applied using QA60 band")
print("   ‚úÖ NDVI clamped to valid range [-1, 1]")
print("   ‚úÖ Only NDVI band selected (no confusion with VV/VH)")

‚úÖ Data loading functions defined successfully!

üîß FIXED IMPROVEMENTS:
   ‚úÖ Using Level-2A Surface Reflectance (not TOA)
   ‚úÖ Cloud masking applied using QA60 band
   ‚úÖ NDVI clamped to valid range [-1, 1]
   ‚úÖ Only NDVI band selected (no confusion with VV/VH)


## 5. Process Data for All Periods (WITH VALIDATION)

In [5]:
def process_single_period(period_info, geometry, scale=10):
    """
    Process S1 and S2 data for a single 12-day period
    
    ‚úÖ FIXED VERSION with validation:
    - Ensures S2ndvi band contains actual NDVI (not VV/VH backscatter)
    - Adds diagnostic checks
    - Validates band names and ranges
    """
    start_date = period_info['start_str']
    end_date = period_info['end_str']
    period_num = period_info['period']
    
    print(f"Processing Period {period_num}: {start_date} to {end_date}", end="")
    
    try:
        # Load Sentinel-1 data
        s1_collection = load_sentinel1_data(geometry, start_date, end_date)
        s1_count = s1_collection.size().getInfo()
        
        # Load Sentinel-2 data (Level-2A SR with cloud masking)
        s2_collection = load_sentinel2_data(geometry, start_date, end_date, MAX_CLOUD_COVER)
        s2_count = s2_collection.size().getInfo()
        
        print(f" ‚Üí S1: {s1_count}, S2: {s2_count}", end="")
        
        # Create composites
        if s1_count > 0:
            s1_composite = create_composite(s1_collection, 'median')
        else:
            # Create empty image with correct bands
            s1_composite = ee.Image.constant([0, 0]).rename(['VV', 'VH']).updateMask(ee.Image.constant(0))
            
        if s2_count > 0:
            s2_composite = create_composite(s2_collection, 'median')
            # s2_composite already has only NDVI band from load_sentinel2_data
        else:
            # Create empty NDVI image
            s2_composite = ee.Image.constant(0).rename('NDVI').updateMask(ee.Image.constant(0))
        
        # ‚úÖ FIX: Explicitly rename S2 band to S2ndvi to avoid confusion
        s2_ndvi_band = s2_composite.select(['NDVI']).rename('S2ndvi')
        
        # Combine S1 and S2 data
        # Order: VV, VH, S2ndvi
        combined_image = s1_composite.select(['VV', 'VH']).addBands(s2_ndvi_band)
        
        # ‚úÖ VALIDATION: Check band names
        band_names = combined_image.bandNames().getInfo()
        expected_bands = ['VV', 'VH', 'S2ndvi']
        
        if band_names != expected_bands:
            print(f" ‚ö†Ô∏è WARNING: Band names mismatch!")
            print(f"      Expected: {expected_bands}")
            print(f"      Got: {band_names}")
        else:
            print(f" ‚úì", end="")
        
        # Add metadata
        combined_image = combined_image.set({
            'period': period_num,
            'start_date': start_date,
            'end_date': end_date,
            'center_date': period_info['center_date'].strftime('%Y-%m-%d'),
            'doy_center': period_info['doy_center'],
            's1_count': s1_count,
            's2_count': s2_count,
            'data_version': 'FIXED_v2',  # Mark as fixed version
            'ndvi_source': 'S2_SR_HARMONIZED',  # Document NDVI source
            'cloud_masked': True  # Document cloud masking applied
        })
        
        print("")  # New line
        return combined_image
        
    except Exception as e:
        print(f"  ‚ùå Error: {e}")
        return None

# Process all periods
print("="*70)
print("üöÄ STARTING DATA PROCESSING WITH VALIDATION")
print("="*70)
print("")

processed_images = []
successful_periods = []

for i, period in enumerate(periods):
    result = process_single_period(period, study_area, SCALE)
    if result is not None:
        processed_images.append(result)
        successful_periods.append(period)
    
    # Progress update every 10 periods
    if (i + 1) % 10 == 0:
        print(f"\n--- Completed {i + 1}/{len(periods)} periods ---\n")

print("\n" + "="*70)
print(f"‚úÖ Successfully processed {len(processed_images)} out of {len(periods)} periods")
print("="*70)

# Create ImageCollection from processed images
if processed_images:
    time_series_collection = ee.ImageCollection(processed_images)
    print(f"\n‚úÖ Created time series collection with {time_series_collection.size().getInfo()} images")
    print(f"   All images contain bands: ['VV', 'VH', 'S2ndvi']")
    print(f"   NDVI source: Sentinel-2 Level-2A Surface Reflectance (cloud-masked)")
else:
    print("\n‚ùå No images were successfully processed!")

üöÄ STARTING DATA PROCESSING WITH VALIDATION

Processing Period 1: 2023-11-01 to 2023-11-12 ‚Üí S1: 2, S2: 4 ‚úì
Processing Period 2: 2023-11-13 to 2023-11-24 ‚Üí S1: 2, S2: 4 ‚úì
Processing Period 3: 2023-11-25 to 2023-12-06 ‚Üí S1: 2, S2: 0 ‚úì
Processing Period 4: 2023-12-07 to 2023-12-18 ‚Üí S1: 2, S2: 4 ‚úì
Processing Period 5: 2023-12-19 to 2023-12-30 ‚Üí S1: 2, S2: 6 ‚úì
Processing Period 6: 2023-12-31 to 2024-01-11 ‚Üí S1: 2, S2: 0 ‚úì
Processing Period 7: 2024-01-12 to 2024-01-23 ‚Üí S1: 2, S2: 2 ‚úì
Processing Period 8: 2024-01-24 to 2024-02-04 ‚Üí S1: 2, S2: 3 ‚úì
Processing Period 9: 2024-02-05 to 2024-02-16 ‚Üí S1: 2, S2: 2 ‚úì
Processing Period 10: 2024-02-17 to 2024-02-28 ‚Üí S1: 2, S2: 3 ‚úì

--- Completed 10/62 periods ---

Processing Period 11: 2024-02-29 to 2024-03-11 ‚Üí S1: 2, S2: 0 ‚úì
Processing Period 12: 2024-03-12 to 2024-03-23 ‚Üí S1: 2, S2: 0 ‚úì
Processing Period 13: 2024-03-24 to 2024-04-04 ‚Üí S1: 2, S2: 2 ‚úì
Processing Period 14: 2024-04-05 to 2024-04-

## 6. Validate NDVI Values (DIAGNOSTIC CHECK)

**üîç This cell validates that the S2ndvi band contains actual NDVI values, not backscatter:**

In [6]:
print("="*70)
print("üîç VALIDATING NDVI VALUES (Diagnostic Check)")
print("="*70)

# Sample a few periods to check NDVI ranges
test_periods = [0, len(successful_periods)//2, len(successful_periods)-1]  # First, middle, last

for idx in test_periods:
    if idx >= len(processed_images):
        continue
        
    test_image = processed_images[idx]
    period_num = successful_periods[idx]['period']
    
    print(f"\nPeriod {period_num}: {successful_periods[idx]['start_str']} to {successful_periods[idx]['end_str']}")
    
    try:
        # Get statistics for each band
        stats = test_image.reduceRegion(
            reducer=ee.Reducer.minMax(),
            geometry=study_area,
            scale=100,  # Use coarser scale for faster computation
            maxPixels=1e8,
            bestEffort=True
        ).getInfo()
        
        # Check VV band (should be backscatter: -50 to +10 dB)
        vv_min = stats.get('VV_min', None)
        vv_max = stats.get('VV_max', None)
        
        # Check VH band (should be backscatter: -50 to +10 dB)
        vh_min = stats.get('VH_min', None)
        vh_max = stats.get('VH_max', None)
        
        # Check S2ndvi band (should be NDVI: -1 to 1)
        ndvi_min = stats.get('S2ndvi_min', None)
        ndvi_max = stats.get('S2ndvi_max', None)
        
        print(f"  VV range:     [{vv_min:.2f}, {vv_max:.2f}] dB")
        print(f"  VH range:     [{vh_min:.2f}, {vh_max:.2f}] dB")
        print(f"  S2ndvi range: [{ndvi_min:.4f}, {ndvi_max:.4f}]")
        
        # Validation
        vv_ok = (-60 < vv_min < 10) and (-60 < vv_max < 10)
        vh_ok = (-60 < vh_min < 10) and (-60 < vh_max < 10)
        ndvi_ok = (-1 <= ndvi_min <= 1) and (-1 <= ndvi_max <= 1)
        
        if vv_ok and vh_ok and ndvi_ok:
            print(f"  ‚úÖ All bands have CORRECT ranges!")
        else:
            if not vv_ok:
                print(f"  ‚ö†Ô∏è VV range unusual (expected -50 to +10 dB)")
            if not vh_ok:
                print(f"  ‚ö†Ô∏è VH range unusual (expected -50 to +10 dB)")
            if not ndvi_ok:
                print(f"  ‚ùå NDVI range INVALID (expected -1 to 1)!")
                print(f"      This suggests S2ndvi band contains backscatter, not NDVI")
        
    except Exception as e:
        print(f"  ‚ö†Ô∏è Could not validate (might be no data): {e}")

print("\n" + "="*70)
print("‚úÖ Validation complete!")
print("="*70)

üîç VALIDATING NDVI VALUES (Diagnostic Check)

Period 1: 2023-11-01 to 2023-11-12
  VV range:     [-22.66, 7.69] dB
  VH range:     [-29.60, -0.74] dB
  S2ndvi range: [-0.3955, 0.8073]
  ‚úÖ All bands have CORRECT ranges!

Period 32: 2024-11-07 to 2024-11-18
  VV range:     [-23.49, 7.17] dB
  VH range:     [-31.87, 0.74] dB
  S2ndvi range: [-0.5631, 0.8857]
  ‚úÖ All bands have CORRECT ranges!

Period 62: 2025-11-02 to 2025-11-07
  VV range:     [-24.48, 6.50] dB
  VH range:     [-33.81, -0.46] dB
  S2ndvi range: [-1.0000, 1.0000]
  ‚úÖ All bands have CORRECT ranges!

‚úÖ Validation complete!


## 7. Export Data to GEE Assets

In [7]:
def export_timeseries_to_asset(collection, geometry, scale, asset_id):
    """
    Export the time series collection to GEE Assets as ImageCollection
    """
    tasks = []
    image_list = collection.toList(collection.size())
    
    for i in range(len(successful_periods)):
        image = ee.Image(image_list.get(i))
        period_num = successful_periods[i]['period']
        period_info = successful_periods[i]
        
        # Add comprehensive metadata
        image_with_metadata = image.set({
            'period': period_num,
            'start_date': period_info['start_str'],
            'end_date': period_info['end_str'],
            'center_date': period_info['center_date'].strftime('%Y-%m-%d'),
            'doy_center': period_info['doy_center'],
            'year': period_info['year'],
            'month': period_info['month'],
            'system:time_start': ee.Date(period_info['start_str']).millis(),
            'system:time_end': ee.Date(period_info['end_str']).millis(),
            'data_version': 'FIXED_v2',
            'ndvi_source': 'S2_SR_HARMONIZED',
            'cloud_masked': True
        })
        
        # Create asset ID for this period
        period_asset_id = f'{asset_id}_Period_{period_num:02d}'
        
        task = ee.batch.Export.image.toAsset(
            image=image_with_metadata,
            description=f'AssetFixed_Period_{period_num:02d}',
            assetId=period_asset_id,
            scale=scale,
            region=geometry,
            maxPixels=1e13,
            crs='EPSG:4326',
            pyramidingPolicy={'.default': 'mean'}
        )
        
        tasks.append(task)
    
    return tasks

# Export configuration
ASSET_BASE_PATH = 'projects/ee-geodeticengineeringundip/assets/FuseTS2'

print("\n" + "="*70)
print("üì§ EXPORT CONFIGURATION (FIXED DATA)")
print("="*70)
print(f"   Asset path: {ASSET_BASE_PATH}")
print(f"   Number of periods: {len(processed_images)}")
print(f"   Data version: FIXED_v2 (Level-2A SR, cloud-masked)")

if time_series_collection:
    print("\nüöÄ Preparing asset export...")
    
    asset_id = f'{ASSET_BASE_PATH}/S1_S2_Nov2023_Oct2025_FIXED'
    
    export_tasks = export_timeseries_to_asset(
        time_series_collection,
        study_area,
        SCALE,
        asset_id
    )
    
    print(f"\nüìã Created {len(export_tasks)} export tasks")
    print(f"\nüí° To start exports, uncomment the code below:")
    print(f"\n# Start first 10 tasks:")
    print(f"# for i, task in enumerate(export_tasks[:10]):")
    print(f"#     task.start()")
    print(f"#     print(f'Started Period {{i+1:02d}}')")
    
    print(f"\nüìä After exports complete, load data with:")
    print(f"   var collection = ee.ImageCollection('{asset_id}_Period_*');")
    
else:
    print("\n‚ùå No data to export!")

print("\n" + "="*70)


üì§ EXPORT CONFIGURATION (FIXED DATA)
   Asset path: projects/ee-geodeticengineeringundip/assets/FuseTS2
   Number of periods: 62
   Data version: FIXED_v2 (Level-2A SR, cloud-masked)

üöÄ Preparing asset export...

üìã Created 62 export tasks

üí° To start exports, uncomment the code below:

# Start first 10 tasks:
# for i, task in enumerate(export_tasks[:10]):
#     task.start()
#     print(f'Started Period {i+1:02d}')

üìä After exports complete, load data with:
   var collection = ee.ImageCollection('projects/ee-geodeticengineeringundip/assets/FuseTS2/S1_S2_Nov2023_Oct2025_FIXED_Period_*');



In [8]:
# Start first 10 tasks:
for i, task in enumerate(export_tasks[:10]):
    task.start()
    print(f'Started Period {i+1:02d}')

Started Period 01
Started Period 02
Started Period 03
Started Period 04
Started Period 05
Started Period 06
Started Period 07
Started Period 08
Started Period 09
Started Period 10


## Summary

### ‚úÖ FIXES APPLIED IN THIS NOTEBOOK:

1. **Level-2A Surface Reflectance**: Uses `COPERNICUS/S2_SR_HARMONIZED` instead of Level-1C TOA
   - More accurate (atmospherically corrected)
   - Better absolute NDVI values

2. **Cloud Masking**: Applied using QA60 band
   - Removes cloudy pixels
   - Better NDVI quality

3. **NDVI Validation**: Clamped to [-1, 1] range
   - Prevents out-of-range values
   - Catches errors early

4. **Band Selection**: Explicitly selects and renames NDVI band
   - No ambiguity with VV/VH bands
   - Clear band naming: ['VV', 'VH', 'S2ndvi']

5. **Diagnostic Checks**: Validates NDVI values before export
   - Confirms S2ndvi contains NDVI (-1 to 1)
   - Not backscatter (-50 to +10 dB)

### Expected NDVI Range After Fix:
- **S2ndvi band**: -1.0 to 1.0 ‚úÖ (CORRECT)
- **VV band**: -50 to +10 dB (backscatter)
- **VH band**: -50 to +10 dB (backscatter)

### Next Steps:
1. Run validation cell (Cell 6) to confirm NDVI ranges are correct
2. Export to GEE Assets (Cell 7)
3. Re-run improved DL fusion training with corrected NDVI data
4. Expected R¬≤ should improve from -0.8 to 0.55-0.70

### Trade-offs:
- **Coverage**: May be slightly lower than Level-1C (due to cloud masking)
- **Quality**: Much better NDVI quality (atmospherically corrected, cloud-free)
- **For S1‚ÜíNDVI fusion**: Quality > Coverage, so this is the right trade-off