# Google Earth Engine Data Preparation for FuseTS

This notebook extracts Sentinel-1 and Sentinel-2 data from Google Earth Engine and prepares it for FuseTS MOGPR processing.

## Temporal Compositing Strategy
- **Total periods**: 31 periods from Nov 2024 - Oct 2025
- **Period length**: 12 days each
- **Start date**: November 1, 2024
- **End date**: October 31, 2025
- **Period 1**: Nov 1-12, 2024
- **Period 2**: Nov 13-24, 2024  
- **Period 3**: Nov 25 - Dec 6, 2024
- **... and so on**
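
The period boundaries listed above follow from stepping 12 days at a time from the start date. Here is a minimal sketch, assuming the final period is simply truncated at the end date; the helper name `twelve_day_periods` is illustrative and not part of FuseTS or Earth Engine:

```python
from datetime import date, timedelta

def twelve_day_periods(start, end, length_days=12):
    """Return inclusive (start, end) date pairs covering [start, end]."""
    periods = []
    current = start
    while current <= end:
        # Each period spans `length_days` days inclusive; clip the last one
        period_end = min(current + timedelta(days=length_days - 1), end)
        periods.append((current, period_end))
        current = period_end + timedelta(days=1)
    return periods

periods = twelve_day_periods(date(2024, 11, 1), date(2025, 10, 31))
print(len(periods))   # 31
print(periods[0])     # Nov 1 to Nov 12, 2024
print(periods[-1])    # the final period is shorter than 12 days
```

The range holds 365 days, so 30 full periods cover 360 days and the 31st period holds the remaining 5 days.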

## Indonesian Agricultural Calendar Coverage
This date range covers:
- **First planting season**: Nov 2024 - Mar 2025 (crosses year boundary)
- **Second planting season**: Apr - Jun 2025
- **Third planting season**: Jul - Sep 2025 (optional)
- **Full cycle**: Complete agricultural year

## Output Format
Data will be exported in FuseTS-compatible xarray format with proper band naming:
- S1: `VV`, `VH` bands
- S2: `S2ndvi` band
- Dimensions: time, y, x, with the time dimension named `t`, i.e. `(t, y, x)`
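
As a rough illustration of the target layout, the sketch below builds a tiny `xarray.Dataset` with the three band names and a time dimension named `t`. The grid size, coordinate values, and random data are placeholders chosen for this example, not values produced by the notebook:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Placeholder grid: 3 time steps on a 4 x 5 pixel window
times = pd.date_range("2024-11-07", periods=3, freq="12D")
ny, nx = 4, 5
rng = np.random.default_rng(0)

ds = xr.Dataset(
    data_vars={
        # One variable per band, each with dimensions (t, y, x)
        name: (("t", "y", "x"), rng.random((len(times), ny, nx)).astype("float32"))
        for name in ("VV", "VH", "S2ndvi")
    },
    coords={
        "t": times,
        "y": np.linspace(-6.75, -7.05, ny),    # latitude, north to south
        "x": np.linspace(110.35, 110.75, nx),  # longitude, west to east
    },
)
print(ds)
```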

## 1. Setup and Authentication

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!pip install rasterio earthengine-api geemap pandas numpy xarray matplotlib geopandas shapely

In [None]:
import ee
import geemap

In [None]:
ee.Authenticate()

In [None]:
#ee.Authenticate()
ee.Initialize(project='ee-geodeticengineeringundip')


In [None]:
import ee
import geemap
import pandas as pd
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import os
import warnings
warnings.filterwarnings('ignore')

# Additional imports for mask processing
import rasterio
from rasterio.features import shapes
import geopandas as gpd
from shapely.geometry import shape, mapping
from shapely.ops import unary_union

# Initialize Earth Engine with authentication
print("🔐 Authenticating with Google Earth Engine...")

try:
    # First time setup: authenticate
    ee.Authenticate()
    print("✅ Authentication successful!")
except Exception as e:
    print(f"Authentication note: {e}")
    print("If already authenticated, continuing...")

# Initialize with project
try:
    ee.Initialize(project='ee-geodeticengineeringundip')
    print("✅ Earth Engine initialized successfully!")
    print(f"   Project: ee-geodeticengineeringundip")
except Exception as e:
    print(f"❌ Error initializing Earth Engine: {e}")
    print("Please ensure:")
    print("  1. You have run ee.Authenticate() successfully")
    print("  2. You have access to project 'ee-geodeticengineeringundip'")
    raise

print(f"\n📦 Package versions:")
print(f"   Earth Engine API: {ee.__version__}")
print(f"   geemap: {geemap.__version__}")
print(f"   rasterio: {rasterio.__version__}")


## 2. Define Study Area and Parameters

In [None]:
# ============================================================================
# STUDY AREA SELECTION
# ============================================================================

# Choose your study area:
STUDY_AREA_TYPE = 'demak'  # Options: 'java_island' or 'demak'

print("="*70)
print("📍 STUDY AREA CONFIGURATION")
print("="*70)

if STUDY_AREA_TYPE == 'demak':
    # ========================================================================
    # OPTION 1: KABUPATEN DEMAK (Small area - faster processing)
    # ========================================================================
    print("\n🎯 Using Kabupaten Demak, Central Java")
    
    # Demak administrative boundary (approximate coordinates)
    # You can adjust these based on your specific area of interest
    demak_bounds = {
        'west': 110.35,   # Western boundary
        'east': 110.75,   # Eastern boundary  
        'south': -7.05,   # Southern boundary
        'north': -6.75    # Northern boundary
    }
    
    # Create rectangle geometry for Demak
    study_area = ee.Geometry.Rectangle([
        demak_bounds['west'], 
        demak_bounds['south'],
        demak_bounds['east'], 
        demak_bounds['north']
    ])
    
    # Alternative: Use GEE administrative boundaries (more accurate)
    # Uncomment these lines to use official boundaries:
    # admin_boundaries = ee.FeatureCollection("FAO/GAUL/2015/level2")
    # demak = admin_boundaries.filter(ee.Filter.eq('ADM2_NAME', 'Demak'))
    # study_area = demak.geometry()
    
    print(f"   Type: Administrative boundary (regency/kabupaten)")
    print(f"   Location: Central Java Province")
    print(f"   Approximate area: ~900 km²")
    print(f"   Bounds: {demak_bounds}")
    print(f"   ✅ Much smaller than Java Island → faster export!")
    
elif STUDY_AREA_TYPE == 'java_island':
    # ========================================================================
    # OPTION 2: FULL JAVA ISLAND (Large area - requires more storage)
    # ========================================================================
    print("\n🏝️  Using Full Java Island")
    
    import rasterio
    from rasterio.features import shapes
    import geopandas as gpd
    from shapely.geometry import shape, mapping
    
    # Path to Java Island mask
    MASK_FILE = 'java_island_mask.tif'
    
    print(f"   Loading Java Island mask from: {MASK_FILE}")
    
    # Read the mask file and extract geometry
    with rasterio.open(MASK_FILE) as src:
        # Read the mask (assuming mask values > 0 indicate valid areas)
        mask_data = src.read(1)
        mask_transform = src.transform
        mask_crs = src.crs
        
        # Get bounds
        bounds = src.bounds
        print(f"   Mask bounds: {bounds}")
        print(f"   Mask CRS: {mask_crs}")
        print(f"   Mask shape: {mask_data.shape}")
        
        # Extract geometry from mask (vectorize the raster mask)
        mask_geoms = []
        for geom, val in shapes(mask_data, mask=mask_data > 0, transform=mask_transform):
            mask_geoms.append(shape(geom))
    
    # Create a unified geometry for Java Island
    if len(mask_geoms) > 0:
        from shapely.ops import unary_union
        java_geometry = unary_union(mask_geoms)
        
        # Add 5 km buffer to the Java Island geometry
        BUFFER_DISTANCE_KM = 5
        BUFFER_DISTANCE_DEGREES = BUFFER_DISTANCE_KM / 111.0  # Approximate conversion (1 degree ≈ 111 km)
        
        print(f"   Applying {BUFFER_DISTANCE_KM} km buffer to Java Island mask...")
        java_geometry_buffered = java_geometry.buffer(BUFFER_DISTANCE_DEGREES)
        
        # Convert to GeoJSON format for Earth Engine
        java_geojson = mapping(java_geometry_buffered)
        
        # Upload to Earth Engine
        study_area = ee.Geometry(java_geojson)
        
        print(f"   ✅ Java Island mask loaded successfully!")
        print(f"   Number of geometries merged: {len(mask_geoms)}")
        print(f"   Buffer applied: {BUFFER_DISTANCE_KM} km")
        print(f"   Approximate area: ~150,000 km²")
    else:
        print("   ⚠️  No valid mask areas found, falling back to bounding box")
        study_area = ee.Geometry.Rectangle([bounds.left, bounds.bottom, bounds.right, bounds.top])

else:
    raise ValueError(f"Invalid STUDY_AREA_TYPE: {STUDY_AREA_TYPE}. Use 'demak' or 'java_island'")

# Processing parameters
START_DATE = '2024-11-01'  # November 1, 2024
END_DATE = '2025-10-31'    # October 31, 2025
SCALE = 50  # meters per pixel (50m resolution for both S1 and S2)
CRS = 'EPSG:4326'  # WGS84 coordinate system
MAX_CLOUD_COVER = 20  # Maximum cloud cover percentage for S2

# Output directory
OUTPUT_DIR = 'gee_fusets_data'
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Display final configuration
print(f"\n{'='*70}")
print("📋 FINAL CONFIGURATION")
print(f"{'='*70}")
print(f"   Study Area: {STUDY_AREA_TYPE.upper()}")
print(f"   Bounds: {study_area.bounds().getInfo()}")
print(f"   Area size: {study_area.area().getInfo() / 1e6:.1f} km²")
print(f"   Processing period: {START_DATE} to {END_DATE}")
print(f"   Temporal resolution: 12-day composites (31 periods)")
print(f"   Spatial resolution: {SCALE}m")
print(f"   Coordinate system: {CRS}")
print(f"   Max cloud cover: {MAX_CLOUD_COVER}%")
print(f"   Output directory: {OUTPUT_DIR}")

# Estimate data size
area_km2 = study_area.area().getInfo() / 1e6
pixels_per_period = (area_km2 * 1e6) / (SCALE * SCALE)  # Total pixels
bands = 3  # VV, VH, S2ndvi
bytes_per_pixel = 4  # Float32
total_size_gb = (pixels_per_period * bands * bytes_per_pixel * 31) / 1e9

print(f"\n💾 Estimated data size:")
print(f"   Per period: ~{total_size_gb/31:.2f} GB")
print(f"   Total (31 periods): ~{total_size_gb:.1f} GB")

if total_size_gb > 250:
    print(f"\n   ⚠️  WARNING: Exceeds GEE Asset quota (250GB)")
    print(f"   → Use Google Drive export instead")
elif total_size_gb > 100:
    print(f"\n   ⚡ Large dataset - GEE Assets recommended")
else:
    print(f"\n   ✅ Manageable size - Google Drive or Assets both work")

print(f"{'='*70}")
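
The size estimate above is plain arithmetic: pixel count times bands times bytes per value times number of periods. The sketch below reproduces it for the two approximate areas quoted in this notebook (~900 km² and ~150,000 km²). It assumes uncompressed Float32 values, so real GeoTIFF exports may be smaller; the function name `estimate_size_gb` is illustrative:

```python
def estimate_size_gb(area_km2, scale_m, n_bands=3, bytes_per_value=4, n_periods=31):
    """Uncompressed size in GB for a stack of single-precision rasters."""
    pixels = (area_km2 * 1e6) / (scale_m ** 2)  # area in m² / pixel footprint in m²
    return pixels * n_bands * bytes_per_value * n_periods / 1e9

demak_gb = estimate_size_gb(900, 50)      # approximate Demak area at 50 m
java_gb = estimate_size_gb(150_000, 50)   # approximate Java area at 50 m

print(f"Demak: ~{demak_gb:.2f} GB")
print(f"Java:  ~{java_gb:.2f} GB")
```

Halving the pixel size quadruples the pixel count, so moving from 50 m to 10 m multiplies every estimate by 25.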


## 2b. Load Paddy Shapefile Mask (Klambu-Glapan)

In [None]:
# ============================================================================
# LOAD PADDY SHAPEFILE AND CREATE STUDY AREA FROM IT
# ============================================================================

print("="*70)
print("📍 LOADING PADDY SHAPEFILE MASK")
print("="*70)

# Load the shapefile
shapefile_path = 'data/klambu-glapan.shp'

try:
    paddy_gdf = gpd.read_file(shapefile_path)
    
    print(f"\n✅ Shapefile loaded successfully!")
    print(f"   File: {shapefile_path}")
    print(f"   Number of features: {len(paddy_gdf)}")
    print(f"   CRS: {paddy_gdf.crs}")
    print(f"   Total area: {paddy_gdf.area.sum() / 1e6:.2f} km²")
    
    # Get bounds in original CRS
    minx, miny, maxx, maxy = paddy_gdf.total_bounds
    print(f"\n   Original CRS Bounds:")
    print(f"     West (MinX):  {minx:.2f}")
    print(f"     South (MinY): {miny:.2f}")
    print(f"     East (MaxX):  {maxx:.2f}")
    print(f"     North (MaxY): {maxy:.2f}")
    print(f"     Width:  {(maxx - minx):.2f} m ({(maxx - minx)/1000:.2f} km)")
    print(f"     Height: {(maxy - miny):.2f} m ({(maxy - miny)/1000:.2f} km)")
    
    # Add buffer around shapefile (500m buffer)
    buffer_m = 500
    print(f"\n   Applying {buffer_m}m buffer to paddy areas...")
    
    # Buffer in the original CRS (should be meters)
    paddy_buffered = paddy_gdf.copy()
    paddy_buffered['geometry'] = paddy_gdf.buffer(buffer_m)
    
    # Convert to WGS84 (EPSG:4326) for GEE
    paddy_wgs84 = paddy_buffered.to_crs("EPSG:4326")
    
    # Get bounds in WGS84
    west, south, east, north = paddy_wgs84.total_bounds
    
    print(f"\n   WGS84 Bounds (for GEE):")
    print(f"     West:  {west:.6f}°")
    print(f"     South: {south:.6f}°")
    print(f"     East:  {east:.6f}°")
    print(f"     North: {north:.6f}°")
    
    # Create GEE geometry from the buffered shapefile
    # Convert to GeoJSON and upload to Earth Engine
    from shapely.ops import unary_union
    
    # Merge all polygons into a single geometry
    merged_geometry = unary_union(paddy_wgs84.geometry)
    
    # Convert to GeoJSON
    paddy_geojson = mapping(merged_geometry)
    
    # Upload to Earth Engine
    study_area = ee.Geometry(paddy_geojson)
    
    print(f"\n✅ Study area created from paddy shapefile!")
    print(f"   Type: Paddy field boundaries with {buffer_m}m buffer")
    print(f"   Location: Klambu-Glapan, Demak, Central Java")
    print(f"   Area (GEE): {study_area.area().getInfo() / 1e6:.2f} km²")
    
    # Visualize the shapefile and buffer
    fig, axes = plt.subplots(1, 2, figsize=(16, 8))
    
    # Plot 1: Original shapefile
    paddy_gdf.plot(ax=axes[0], facecolor='lightgreen', edgecolor='darkgreen', linewidth=1.5, alpha=0.7)
    axes[0].set_title('Original Paddy Shapefile\n(Klambu-Glapan)', fontsize=12, fontweight='bold')
    axes[0].set_xlabel('Easting (m)')
    axes[0].set_ylabel('Northing (m)')
    axes[0].grid(True, alpha=0.3)
    
    # Plot 2: With buffer
    paddy_buffered.plot(ax=axes[1], facecolor='yellow', edgecolor='orange', linewidth=1.5, alpha=0.5, label=f'{buffer_m}m buffer')
    paddy_gdf.plot(ax=axes[1], facecolor='lightgreen', edgecolor='darkgreen', linewidth=1.5, alpha=0.7, label='Paddy areas')
    axes[1].set_title(f'Paddy Areas with {buffer_m}m Buffer\n(Study Area for GEE Download)', fontsize=12, fontweight='bold')
    axes[1].set_xlabel('Easting (m)')
    axes[1].set_ylabel('Northing (m)')
    axes[1].legend(loc='best')
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('paddy_shapefile_study_area.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print(f"\n   Visualization saved: paddy_shapefile_study_area.png")
    
    # Override the study area type
    STUDY_AREA_TYPE = 'paddy_shapefile'
    
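except Exception as e:
    # geopandas reports a missing file with a backend-specific error
    # (pyogrio/fiona) rather than FileNotFoundError, so catch broadly here
    print(f"\n❌ Could not load shapefile: {shapefile_path}")
    print(f"   Reason: {e}")
    print(f"   Please ensure the shapefile exists in the data/ folder")
    print(f"   Falling back to Demak bounding box...")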
    
    # Fall back to Demak bounds if shapefile not found
    STUDY_AREA_TYPE = 'demak'
    demak_bounds = {
        'west': 110.35,
        'east': 110.75,
        'south': -7.05,
        'north': -6.75
    }
    study_area = ee.Geometry.Rectangle([
        demak_bounds['west'], 
        demak_bounds['south'],
        demak_bounds['east'], 
        demak_bounds['north']
    ])
    west, south, east, north = demak_bounds['west'], demak_bounds['south'], demak_bounds['east'], demak_bounds['north']

# Processing parameters
START_DATE = '2023-11-01'  # November 1, 2023
END_DATE = '2025-11-07'    # November 7, 2025
SCALE = 10  # meters per pixel (10m resolution - native S2 resolution)
CRS = 'EPSG:4326'  # WGS84 coordinate system
MAX_CLOUD_COVER = 80  # Maximum cloud cover percentage for S2 (relaxed for better coverage)

# Output directory
OUTPUT_DIR = 'gee_fusets_data'
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Display final configuration
print(f"\n{'='*70}")
print("📋 FINAL CONFIGURATION")
print(f"{'='*70}")
print(f"   Study Area: {STUDY_AREA_TYPE.upper()}")
print(f"   Bounds: W={west:.6f}°, S={south:.6f}°, E={east:.6f}°, N={north:.6f}°")
print(f"   Area size: {study_area.area().getInfo() / 1e6:.2f} km²")
print(f"   Processing period: {START_DATE} to {END_DATE}")
print(f"   Temporal resolution: 12-day composites")
print(f"   Spatial resolution: {SCALE}m")
print(f"   Coordinate system: {CRS}")
print(f"   Max cloud cover: {MAX_CLOUD_COVER}%")
print(f"   Output directory: {OUTPUT_DIR}")

# Estimate data size
area_km2 = study_area.area().getInfo() / 1e6
pixels_per_period = (area_km2 * 1e6) / (SCALE * SCALE)  # Total pixels
bands = 3  # VV, VH, S2ndvi
bytes_per_pixel = 4  # Float32

# Count 12-day periods over the inclusive date range (same rule as generate_12day_periods)
start = datetime.strptime(START_DATE, '%Y-%m-%d')
end = datetime.strptime(END_DATE, '%Y-%m-%d')
days = (end - start).days + 1  # +1 because both the start and end dates are included
periods_count = int(np.ceil(days / 12))

total_size_gb = (pixels_per_period * bands * bytes_per_pixel * periods_count) / 1e9

print(f"\n💾 Estimated data size:")
print(f"   Number of periods: {periods_count}")
print(f"   Per period: ~{total_size_gb/periods_count:.2f} GB")
print(f"   Total ({periods_count} periods): ~{total_size_gb:.1f} GB")

if total_size_gb > 250:
    print(f"\n   ⚠️  WARNING: Exceeds GEE Asset quota (250GB)")
    print(f"   → Use Google Drive export instead")
elif total_size_gb > 100:
    print(f"\n   ⚡ Large dataset - GEE Assets recommended")
else:
    print(f"\n   ✅ Manageable size - Google Drive or Assets both work")

print(f"{'='*70}")

## 3. Generate 12-Day Composite Periods

In [None]:
def generate_12day_periods(start_date_str, end_date_str):
    """
    Generate periods of 12 days each from start date to end date
    
    Parameters:
    -----------
    start_date_str : str
        Start date in 'YYYY-MM-DD' format (e.g., '2023-11-01')
    end_date_str : str
        End date in 'YYYY-MM-DD' format (e.g., '2025-11-07')
    """
    start_date = datetime.strptime(start_date_str, '%Y-%m-%d')
    end_date = datetime.strptime(end_date_str, '%Y-%m-%d')
    
    periods = []
    period_num = 1
    current_start = start_date
    
    while current_start <= end_date:
        period_end = current_start + timedelta(days=11)  # 12 days inclusive
        
        # Ensure we don't go beyond the end date
        if period_end > end_date:
            period_end = end_date
            
        periods.append({
            'period': period_num,
            'start_date': current_start,
            'end_date': period_end,
            'start_str': current_start.strftime('%Y-%m-%d'),
            'end_str': period_end.strftime('%Y-%m-%d'),
            'center_date': current_start + timedelta(days=6),  # Middle of period
            'doy_center': (current_start + timedelta(days=6)).timetuple().tm_yday,
            'year': current_start.year,
            'month': current_start.month
        })
        
        if period_end >= end_date:
            break
        
        current_start = period_end + timedelta(days=1)  # Start next period
        period_num += 1
            
    return periods

# Generate periods from Nov 2023 to Nov 2025
periods = generate_12day_periods(START_DATE, END_DATE)

print(f"Generated {len(periods)} periods from {START_DATE} to {END_DATE}:")
print("\nFirst 5 periods:")
for i, period in enumerate(periods[:5]):
    print(f"Period {period['period']:2d}: {period['start_str']} to {period['end_str']} (center: DOY {period['doy_center']:3d}, {period['year']})")

print("\nPeriods covering 2-year span (Nov 2023 - Nov 2025):")
print(f"  • 2023 periods: {len([p for p in periods if p['year'] == 2023])}")
print(f"  • 2024 periods: {len([p for p in periods if p['year'] == 2024])}")
print(f"  • 2025 periods: {len([p for p in periods if p['year'] == 2025])}")

print("\nYear boundary crossings:")
year_boundary_periods = [p for p in periods if p['start_date'].year != p['end_date'].year]
for period in year_boundary_periods:
    print(f"Period {period['period']:2d}: {period['start_str']} to {period['end_str']} ← CROSSES YEAR BOUNDARY")

print("\nLast 5 periods:")
for i, period in enumerate(periods[-5:]):
    print(f"Period {period['period']:2d}: {period['start_str']} to {period['end_str']} (center: DOY {period['doy_center']:3d}, {period['year']})")

# Create a DataFrame for easier handling
periods_df = pd.DataFrame(periods)
print(f"\nTotal temporal coverage: {periods[0]['start_str']} to {periods[-1]['end_str']}")
print(f"Covers {len(periods)} 12-day periods over 2 years")
print(f"\nIndonesian agricultural seasons covered:")
print(f"  ‚Ä¢ 2023-2024 cycle: Nov 2023 - Oct 2024 (full year)")
print(f"  ‚Ä¢ 2024-2025 cycle: Nov 2024 - Nov 2025 (full year + 1 week)")
print(f"  • Total: ~6 growing seasons (3 per year × 2 years)")

## 4. Define Data Loading Functions

**📌 Configuration: Using Sentinel-2 Level-1C (TOA) without cloud masking**
- **Coverage**: 99.9% (maximum)
- **Trade-off**: TOA reflectance (not atmospherically corrected)
- **Rationale**: Best for tropical rainy season (Indonesia Nov-Oct)
- **Suitability**: Excellent for MOGPR fusion temporal analysis
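
The `S2ndvi` band is a normalized difference of the near-infrared (B8) and red (B4) bands: NDVI = (NIR - Red) / (NIR + Red). Here is a minimal pure-Python sketch of that formula for a single pixel. The reflectance values are made up for illustration, and returning `None` for a zero denominator is a choice made for this example rather than a description of Earth Engine's behaviour:

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        return None  # undefined when both reflectances are zero
    return (nir - red) / total

# Illustrative TOA reflectances (not measured values)
print(ndvi(0.45, 0.05))  # dense vegetation, close to 0.8
print(ndvi(0.20, 0.18))  # bare soil or sparse cover, close to 0.05
print(ndvi(0.0, 0.0))    # None
```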

In [None]:
def load_sentinel1_data(geometry, start_date, end_date):
    """
    Load Sentinel-1 GRD data for a specific time period
    """
    s1_collection = (ee.ImageCollection('COPERNICUS/S1_GRD')
                    .filterBounds(geometry)
                    .filterDate(start_date, end_date)
                    .filter(ee.Filter.eq('instrumentMode', 'IW'))
                    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
                    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
                    .select(['VV', 'VH']))
    
    return s1_collection

def load_sentinel2_data(geometry, start_date, end_date, max_cloud_cover=60):
    """
    Load Sentinel-2 Level-1C (TOA) data without cloud masking
    
    ⚠️  IMPORTANT TRADE-OFFS:
    ✅ Pros:
       • Maximum coverage (99.9%)
       • No data loss from cloud masking
       • Works well in tropical rainy season
    
    ❌ Cons:
       • NOT atmospherically corrected (TOA reflectance)
       • May include some cloudy pixels
       • NDVI values affected by atmosphere
       ‚Ä¢ Suitable for temporal analysis but absolute values less accurate
    
    Collection: COPERNICUS/S2 (Level-1C TOA, not Level-2A SR)
    """
    def calculate_ndvi_toa(image):
        # B8 = NIR, B4 = Red (same as Level-2A)
        ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI')
        return image.addBands(ndvi)
    
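    # Load Level-1C TOA data WITHOUT cloud masking
    # Note: the Earth Engine catalog lists 'COPERNICUS/S2' as superseded by
    # 'COPERNICUS/S2_HARMONIZED'; consider switching if this ID stops resolving
    s2_collection = (ee.ImageCollection('COPERNICUS/S2')  # Note: S2, not S2_SR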
                    .filterBounds(geometry)
                    .filterDate(start_date, end_date)
                    .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', max_cloud_cover))
                    .map(calculate_ndvi_toa)
                    .select(['NDVI']))
    
    return s2_collection

def create_composite(collection, method='median'):
    """
    Create a composite from an image collection
    """
    if method == 'median':
        return collection.median()
    elif method == 'mean':
        return collection.mean()
    elif method == 'max':
        return collection.max()
    else:
        return collection.median()

print("Data loading functions defined successfully!")
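
`create_composite` reduces all images in a period to one value per pixel. The pure-Python sketch below mimics that reduction for a single pixel to show why the median is a common default when no cloud mask is applied. The NDVI values are invented for illustration, and the helper `composite` is a stand-in for this example, not an Earth Engine call:

```python
from statistics import mean, median

def composite(values, method="median"):
    """Reduce one pixel's observations; None marks a masked observation."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None  # no usable observation in this period
    reducers = {"median": median, "mean": mean, "max": max}
    # Unknown methods fall back to the median, as create_composite does
    return reducers.get(method, median)(valid)

# Made-up NDVI values for one pixel; 0.05 mimics a cloud-contaminated scene
observations = [0.71, 0.68, None, 0.05, 0.73]

print(composite(observations, "median"))  # stays near the clear-sky values
print(composite(observations, "mean"))    # pulled down by the outlier
```

With only one or two images in a 12-day window the median offers little protection, which is worth keeping in mind when a period reports a low image count.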

## 5. Process Data for All Periods

In [None]:
def process_single_period(period_info, geometry, scale=10):
    """
    Process S1 and S2 data for a single 12-day period
    """
    start_date = period_info['start_str']
    end_date = period_info['end_str']
    period_num = period_info['period']
    # filterDate() treats its end date as exclusive, so pass the day after the
    # period's last day to keep images acquired on that final day
    filter_end = (period_info['end_date'] + timedelta(days=1)).strftime('%Y-%m-%d')
    
    print(f"Processing Period {period_num}: {start_date} to {end_date}")
    
    try:
        # Load Sentinel-1 data
        s1_collection = load_sentinel1_data(geometry, start_date, filter_end)
        s1_count = s1_collection.size().getInfo()
        
        # Load Sentinel-2 data
        s2_collection = load_sentinel2_data(geometry, start_date, filter_end, MAX_CLOUD_COVER)
        s2_count = s2_collection.size().getInfo()
        
        print(f"  Found {s1_count} S1 images, {s2_count} S2 images")
        
        # Create composites
        if s1_count > 0:
            s1_composite = create_composite(s1_collection, 'median')
        else:
            # Create empty image with correct bands
            s1_composite = ee.Image.constant([0, 0]).rename(['VV', 'VH']).updateMask(ee.Image.constant(0))
            
        if s2_count > 0:
            s2_composite = create_composite(s2_collection, 'median')
        else:
            # Create empty NDVI image
            s2_composite = ee.Image.constant(0).rename('NDVI').updateMask(ee.Image.constant(0))
        
        # Combine S1 and S2 data
        combined_image = s1_composite.addBands(s2_composite.rename('S2ndvi'))
        
        # Add metadata
        combined_image = combined_image.set({
            'period': period_num,
            'start_date': start_date,
            'end_date': end_date,
            'center_date': period_info['center_date'].strftime('%Y-%m-%d'),
            'doy_center': period_info['doy_center'],
            's1_count': s1_count,
            's2_count': s2_count
        })
        
        return combined_image
        
    except Exception as e:
        print(f"  Error processing period {period_num}: {e}")
        return None

# Process all periods
print("Starting data processing for all periods...\n")

processed_images = []
successful_periods = []

for i, period in enumerate(periods):
    result = process_single_period(period, study_area, SCALE)
    if result is not None:
        processed_images.append(result)
        successful_periods.append(period)
    
    # Progress update every 5 periods
    if (i + 1) % 5 == 0:
        print(f"Completed {i + 1}/{len(periods)} periods\n")

print(f"Successfully processed {len(processed_images)} out of {len(periods)} periods")

# Create ImageCollection from processed images
if processed_images:
    time_series_collection = ee.ImageCollection(processed_images)
    print(f"Created time series collection with {time_series_collection.size().getInfo()} images")
else:
    time_series_collection = None  # lets the export cell skip cleanly instead of raising NameError
    print("No images were successfully processed!")
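
Earth Engine's `filterDate(start, end)` includes the start date but excludes the end date, while the period table stores an inclusive last day. Here is a minimal sketch of converting an inclusive end date into the exclusive bound the filter expects; the helper name `exclusive_end` is illustrative:

```python
from datetime import datetime, timedelta

def exclusive_end(end_str):
    """Return the day after `end_str` (YYYY-MM-DD) as a string."""
    end = datetime.strptime(end_str, "%Y-%m-%d")
    return (end + timedelta(days=1)).strftime("%Y-%m-%d")

print(exclusive_end("2024-11-12"))  # 2024-11-13
print(exclusive_end("2024-12-31"))  # 2025-01-01
```

Passing the converted value as the filter's end date keeps acquisitions from the last day of each period in the composite.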

## 6. Export Data from GEE

In [None]:
def export_timeseries_to_drive(collection, geometry, scale, output_name):
    """
    Export the time series collection to Google Drive as a multi-band image
    """
    # Convert collection to multi-band image
    # Each period becomes a separate set of bands
    image_list = collection.toList(collection.size())
    
    def rename_bands_with_period(image):
        image = ee.Image(image)
        period = ee.Number(image.get('period')).format('%02d')
        
        # Rename bands to include period number
        old_names = image.bandNames()
        new_names = old_names.map(lambda name: ee.String(name).cat('_P').cat(period))
        
        return image.rename(new_names)
    
    # Rename bands with period numbers
    renamed_collection = collection.map(rename_bands_with_period)
    
    # Convert to single multi-band image
    multi_band_image = renamed_collection.toBands()
    
    # Export task
    task = ee.batch.Export.image.toDrive(
        image=multi_band_image,
        description=output_name,
        folder='GEE_FuseTS_Data',
        fileNamePrefix=output_name,
        scale=scale,
        region=geometry,
        maxPixels=1e9,
        crs='EPSG:4326',
        fileFormat='GeoTIFF'
    )
    
    return task

def export_individual_periods_to_drive(collection, geometry, scale, base_name):
    """
    Export each period as a separate GeoTIFF file to Google Drive
    """
    tasks = []
    image_list = collection.toList(collection.size())
    
    for i in range(len(successful_periods)):
        image = ee.Image(image_list.get(i))
        period_num = successful_periods[i]['period']
        
        task = ee.batch.Export.image.toDrive(
            image=image,
            description=f'{base_name}_Period_{period_num:02d}',
            folder='GEE_FuseTS_Data',
            fileNamePrefix=f'{base_name}_Period_{period_num:02d}',
            scale=scale,
            region=geometry,
            maxPixels=1e9,
            crs='EPSG:4326',
            fileFormat='GeoTIFF'
        )
        
        tasks.append(task)
    
    return tasks

# ============================================================================
# NEW: GEE ASSETS EXPORT FUNCTIONS (Better for large datasets!)
# ============================================================================

def export_timeseries_to_asset(collection, geometry, scale, asset_id):
    """
    Export the time series collection to GEE Assets as ImageCollection
    
    Advantages over Drive export:
    - Not limited by Google Drive storage; bounded by the project's asset quota instead
    - Data stays in GEE cloud (faster processing)
    - Can be used immediately in other GEE scripts
    - Better for large study areas
    
    Parameters:
    -----------
    asset_id : str
        Full path to asset, e.g., 'projects/ee-geodeticengineeringundip/assets/S1_S2_Nov2024_Oct2025'
    """
    tasks = []
    image_list = collection.toList(collection.size())
    
    for i in range(len(successful_periods)):
        image = ee.Image(image_list.get(i))
        period_num = successful_periods[i]['period']
        period_info = successful_periods[i]
        
        # Add comprehensive metadata
        image_with_metadata = image.set({
            'period': period_num,
            'start_date': period_info['start_str'],
            'end_date': period_info['end_str'],
            'center_date': period_info['center_date'].strftime('%Y-%m-%d'),
            'doy_center': period_info['doy_center'],
            'year': period_info['year'],
            'month': period_info['month'],
            'system:time_start': ee.Date(period_info['start_str']).millis(),
            'system:time_end': ee.Date(period_info['end_str']).millis()
        })
        
        # Create asset ID for this period
        period_asset_id = f'{asset_id}_Period_{period_num:02d}'
        
        task = ee.batch.Export.image.toAsset(
            image=image_with_metadata,
            description=f'Asset_Period_{period_num:02d}',
            assetId=period_asset_id,
            scale=scale,
            region=geometry,
            maxPixels=1e13,  # Higher limit for assets
            crs='EPSG:4326',
            pyramidingPolicy={'.default': 'mean'}  # Better for time series
        )
        
        tasks.append(task)
    
    return tasks

def export_imagecollection_to_asset(collection, asset_id, geometry, scale):
    """
    Export entire ImageCollection to a single GEE Asset
    
    Note: For very large collections, individual image exports (above function) are more reliable
    """
    # This exports the collection metadata structure
    # Individual images still need to be exported separately
    print("⚠️  GEE doesn't support direct ImageCollection export.")
    print("    Use export_timeseries_to_asset() to export individual images.")
    print("    To load them together, export into an ImageCollection asset or")
    print("    build the collection from a list of the image asset IDs.")
    return None

# Choose export method
EXPORT_METHOD = 'individual'  # 'combined' or 'individual'
EXPORT_DESTINATION = 'drive'  # 'drive' or 'asset' - CHANGED TO 'drive' due to asset quota limit

# Your GEE Assets path (update this to your project!)
ASSET_BASE_PATH = 'projects/ee-geodeticengineeringundip/assets/FuseTS'

print(f"\n📤 EXPORT CONFIGURATION:")
print(f"   Destination: {EXPORT_DESTINATION.upper()}")
print(f"   Method: {EXPORT_METHOD}")
if EXPORT_DESTINATION == 'asset':
    print(f"   Asset path: {ASSET_BASE_PATH}")
print(f"\n💡 Choose export destination:")
print(f"   • 'drive': Google Drive (good for < 2GB, need to download)")
print(f"   • 'asset': GEE Assets (recommended for large data, stays in cloud)")

if time_series_collection:
    if EXPORT_DESTINATION == 'asset':
        # ====================================================================
        # EXPORT TO GEE ASSETS (Recommended for large datasets!)
        # ====================================================================
        print("\n🚀 Exporting to GEE Assets...")
        print("   ✅ Limited by the project's asset quota, not by Drive storage")
        print("   ✅ Data stays in GEE cloud")
        print("   ✅ Can use immediately in other scripts")
        
        asset_id = f'{ASSET_BASE_PATH}/S1_S2_Nov2024_Oct2025'
        
        export_tasks = export_timeseries_to_asset(
            time_series_collection,
            study_area,
            SCALE,
            asset_id
        )
        
        print(f"\n📋 Starting {len(export_tasks)} asset export tasks...")
        
        # Start first 10 tasks (GEE limits concurrent tasks)
        for i, task in enumerate(export_tasks[:10]):
            task.start()
            print(f"  ✅ Started: Period {i+1:02d} → {asset_id}_Period_{i+1:02d}")
        
        if len(export_tasks) > 10:
            print(f"\n⏳ Remaining {len(export_tasks) - 10} tasks queued")
            print("   Start them manually from: https://code.earthengine.google.com/tasks")
            print("   Or run this code to start next batch:")
            print(f"   for task in export_tasks[10:20]: task.start()")
        
        print(f"\n📊 After exports complete, load the images in Python with:")
        print(f"   ee.ImageCollection([ee.Image(i) for i in asset_ids])")
        print(f"   where asset_ids lists '{asset_id}_Period_NN' for each exported period")
        
    elif EXPORT_DESTINATION == 'drive':
        # ====================================================================
        # EXPORT TO GOOGLE DRIVE (Original method)
        # ====================================================================
        if EXPORT_METHOD == 'combined':
            # Export as single multi-band file
            print("\n📤 Preparing export as single multi-band GeoTIFF to Google Drive...")
            export_task = export_timeseries_to_drive(
                time_series_collection, 
                study_area, 
                SCALE, 
                f'S1_S2_TimeSeries_Nov2024_Oct2025'
            )
            
            print(f"Starting export task: {export_task.config['description']}")
            export_task.start()
            
            print(f"Export task submitted. Monitor progress at: https://code.earthengine.google.com/tasks")
            
        else:
            # Export individual period files
            print("\n📤 Preparing export as individual period GeoTIFFs to Google Drive...")
            export_tasks = export_individual_periods_to_drive(
                time_series_collection,
                study_area,
                SCALE,
                f'S1_S2_Nov2024_Oct2025'
            )
            
            print(f"Starting {len(export_tasks)} export tasks...")
            for i, task in enumerate(export_tasks[:5]):  # Start first 5 tasks
                task.start()
                print(f"  Started: {task.config['description']}")
            
            if len(export_tasks) > 5:
                print(f"\nRemaining {len(export_tasks) - 5} tasks can be started manually or in batches")
                print("Monitor all tasks at: https://code.earthengine.google.com/tasks")

else:
    print("No data to export!")
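
The batch-start pattern used in the export cell can be factored into a small helper. This is a minimal sketch, assuming each task object only needs a `.start()` method (which `ee.batch.Task` provides). `start_in_batches` and `FakeTask` are hypothetical names introduced here; `FakeTask` merely stands in for a real export task so the example runs without Earth Engine.

```python
def start_in_batches(tasks, batch_size=10, already_started=0):
    """Start the next `batch_size` tasks and return the new started count."""
    batch = tasks[already_started:already_started + batch_size]
    for task in batch:
        task.start()
    return already_started + len(batch)


class FakeTask:
    """Stand-in for an export task (hypothetical, for demonstration only)."""

    def __init__(self):
        self.started = False

    def start(self):
        self.started = True


demo_tasks = [FakeTask() for _ in range(31)]
started = start_in_batches(demo_tasks, batch_size=10)                 # tasks 1-10
started = start_in_batches(demo_tasks, 10, already_started=started)   # tasks 11-20
print(f"{started} of {len(demo_tasks)} tasks started")
```

With real tasks, pass `export_tasks` instead of `demo_tasks` and keep the returned count between calls so no task is started twice.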

## 7. Create Local Processing Function (Alternative to Export)

## 6b. Load Data from GEE Assets (For Subsequent Processing)

If you exported to GEE Assets, use this code to load the data later in GEE or download specific regions:

In [None]:
# ============================================================================
# LOAD DATA FROM GEE ASSETS
# ============================================================================

def load_asset_collection(asset_base_path, pattern='*'):
    """
    Load ImageCollection from GEE Assets
    
    Parameters:
    -----------
    asset_base_path : str
        Base path to assets folder
    pattern : str
        Pattern to match asset names (e.g., 'S1_S2_Nov2024_Oct2025_Period_*')
    
    Returns:
    --------
    ee.ImageCollection
    """
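    # ee.ImageCollection() does not expand wildcards, so list the folder's
    # assets and keep the images whose names match the pattern
    import fnmatch
    full_pattern = f'{asset_base_path}/{pattern}'
    
    try:
        assets = ee.data.listAssets({'parent': asset_base_path}).get('assets', [])
        image_ids = sorted(
            a['name'] for a in assets
            if a.get('type') == 'IMAGE'
            and fnmatch.fnmatch(a['name'].split('/')[-1], pattern)
        )
        if not image_ids:
            raise ValueError('no image assets matched the pattern')
        
        # Build the collection from the individual image assets
        collection = ee.ImageCollection([ee.Image(asset_id) for asset_id in image_ids])
        print(f"✅ Loaded {len(image_ids)} images from assets")
        return collection
    except Exception as e:
        print(f"❌ Error loading assets: {e}")
        print(f"   Make sure assets exist at: {full_pattern}")
        print(f"   Check: https://code.earthengine.google.com/?asset={asset_base_path}")
        return None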

def download_region_from_assets(collection, region_geometry, scale, output_format='GeoTIFF'):
    """
    Download a specific region from asset collection
    
    This is useful when you've exported large Java Island data but only want
    a smaller region for analysis
    """
    # Convert collection to multi-band image
    def add_period_to_bands(image):
        period = ee.Number(image.get('period')).format('%02d')
        old_names = image.bandNames()
        new_names = old_names.map(lambda name: ee.String(name).cat('_P').cat(period))
        return image.rename(new_names)
    
    renamed_collection = collection.map(add_period_to_bands)
    multi_band = renamed_collection.toBands()
    
    # Create download URL
    url = multi_band.getDownloadURL({
        'scale': scale,
        'crs': 'EPSG:4326',
        'region': region_geometry,
        'format': output_format
    })
    
    print(f"📥 Download URL generated:")
    print(f"   {url}")
    print(f"\n   Copy this URL to your browser to download")
    
    return url

# Example: Load your exported assets
if EXPORT_DESTINATION == 'asset':
    print("="*60)
    print("📖 LOADING DATA FROM GEE ASSETS")
    print("="*60)
    
    # Exports run asynchronously; the assets exist only once the tasks finish
    print("\n⏳ Note: Asset exports take time. Check status at:")
    print("   https://code.earthengine.google.com/tasks")
    
    # Example of how to load later (after exports complete)
    print(f"\n💡 To load your exported data later, use:")
    print(f"\n```python")
    print(f"# Load the exported images (ee.ImageCollection does not expand wildcards)")
    print(f"collection = load_asset_collection('{ASSET_BASE_PATH}', 'S1_S2_Nov2024_Oct2025_Period_*')")
    print(f"")
    print(f"# Check what was loaded")
    print(f"print(f'Loaded {{collection.size().getInfo()}} images')")
    print(f"")
    print(f"# Download a specific region (optional)")
    print(f"small_region = ee.Geometry.Rectangle([106.8, -6.3, 107.0, -6.1])  # Example: Jakarta area")
    print(f"url = download_region_from_assets(collection, small_region, scale={SCALE})")
    print(f"```")
    
    print(f"\n🔄 Or use directly in GEE Code Editor:")
    print(f"```javascript")
    print(f"// Build the collection from the exported images (wildcards are not expanded)")
    print(f"var base = '{ASSET_BASE_PATH}/S1_S2_Nov2024_Oct2025_Period_';")
    print("var images = [];")
    print("for (var i = 1; i <= 31; i++) {  // 31 = number of exported periods")
    print("  images.push(ee.Image(base + (i < 10 ? '0' : '') + i));")
    print("}")
    print("var collection = ee.ImageCollection(images);")
    print(f"")
    print(f"// Sort by period")
    print(f"var sorted = collection.sort('period');")
    print(f"")
    print(f"// Get first image")
    print(f"var first = sorted.first();")
    print(f"print('First period bands:', first.bandNames());")
    print(f"")
    print(f"// Process further or export to Drive from here")
    print(f"```")
    
    print(f"\n✅ Assets allow you to:")
    print(f"   • Process data entirely in GEE (no download needed)")
    print(f"   • Download only specific regions when needed")
    print(f"   • Share with collaborators")
    print(f"   • Use in GEE Code Editor or Python API")

elif EXPORT_DESTINATION == 'drive':
    print("\n💡 For Google Drive exports:")
    print("   1. Monitor tasks at: https://code.earthengine.google.com/tasks")
    print("   2. Download files from Google Drive")
    print("   3. Use local processing (Section 7 below) or load in MOGPR notebook")

print("\n" + "="*60)
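
`download_region_from_assets` renames each band to `<variable>_P<period>` before flattening the collection with `toBands()`, so a downloaded multi-band file has to be split back into one series per variable. The sketch below shows one way to group such band names. It assumes the variables are `VV`, `VH` and `S2ndvi` (the output bands named in this notebook) and that `toBands()` may add a prefix in front of each name; `group_bands_by_variable` is a hypothetical helper, not part of any library.

```python
import re
from collections import defaultdict

# Matches e.g. "VV_P01" or "0_S2ndvi_P12"; any prefix in front of the
# variable name (such as one added by toBands()) is ignored.
BAND_PATTERN = re.compile(r'(?:^|_)(VV|VH|S2ndvi)_P(\d+)$')


def group_bands_by_variable(band_names):
    """Return {variable: [(period, band_name), ...]} sorted by period."""
    groups = defaultdict(list)
    for name in band_names:
        match = BAND_PATTERN.search(name)
        if match is None:
            continue  # skip bands that do not follow the naming scheme
        variable, period = match.group(1), int(match.group(2))
        groups[variable].append((period, name))
    return {var: sorted(items) for var, items in groups.items()}


example = ['1_VV_P02', '0_VV_P01', '0_VH_P01', '1_VH_P02', '0_S2ndvi_P01', 'angle']
grouped = group_bands_by_variable(example)
print(grouped['VV'])
```

Each sorted list gives the band order needed to stack one variable along the time axis.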

In [None]:
def extract_timeseries_locally(collection, geometry, scale, max_pixels=1e6):
    """
    Extract time series data directly to memory for small areas
    This is faster than export/download for small study areas
    """
    print("Extracting time series data locally...")
    
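    # Get the region bounds
    region = geometry.bounds()
    
    # Rough pixel-count estimate: bounding-box area (m²) / pixel area (m²)
    estimated_pixels = region.area(maxError=1).getInfo() / (scale * scale)
    
    # Extract data for each period
    image_list = collection.toList(collection.size())
    
    periods_data = []
    
    for i in range(len(successful_periods)):
        print(f"Extracting period {i+1}/{len(successful_periods)}...")
        
        image = ee.Image(image_list.get(i))
        period_info = successful_periods[i]
        
        try:
            # Only extract when the estimated pixel count fits the budget
            if estimated_pixels < max_pixels:
                # Use geemap for efficient extraction
                # (assumption: recent geemap versions name this argument
                # `geometry`; check the signature of your installed version,
                # and whether `scale` is expected in CRS units)
                data_array = geemap.ee_to_xarray(
                    image, 
                    geometry=region, 
                    scale=scale,
                    crs='EPSG:4326'
                )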
                
                # Add period information
                data_array = data_array.assign_coords(
                    period=period_info['period'],
                    center_date=period_info['center_date'],
                    doy_center=period_info['doy_center']
                )
                
                periods_data.append(data_array)
                
            else:
                print(f"  Area too large for local extraction, use export method instead")
                break
                
        except Exception as e:
            print(f"  Error extracting period {i+1}: {e}")
            continue
    
    if periods_data:
        # Combine all periods into a single xarray Dataset
        print("Combining periods into time series...")
        
        # Concatenate along a new time dimension
        combined_data = xr.concat(periods_data, dim='time')
        
        # Create proper time coordinates
        time_coords = [p['center_date'] for p in successful_periods[:len(periods_data)]]
        combined_data = combined_data.assign_coords(time=time_coords)
        
        return combined_data
    
    return None

# Try local extraction for small areas
area_size = study_area.area().getInfo()  # in square meters
area_km2 = area_size / 1e6

print(f"Study area size: {area_km2:.2f} km²")

if area_km2 < 100:  # Less than 100 km²
    print("Area is small enough for local extraction. Attempting direct download...")
    
    try:
        local_data = extract_timeseries_locally(
            time_series_collection, 
            study_area, 
            SCALE, 
            max_pixels=1e6
        )
        
        if local_data is not None:
            print("Local extraction successful!")
            print(f"Data shape: {local_data.dims}")
            print(f"Variables: {list(local_data.data_vars)}")
            
            # Save locally
            output_file = os.path.join(OUTPUT_DIR, f'S1_S2_timeseries_Nov2024_Oct2025_local.nc')
            local_data.to_netcdf(output_file)
            print(f"Data saved to: {output_file}")
            
        else:
            print("Local extraction failed, use export method instead")
            
    except Exception as e:
        print(f"Local extraction error: {e}")
        print("Use export method instead")
        
else:
    print("Area is too large for local extraction. Use the export method above.")
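
Whether local extraction is feasible depends mostly on pixel count, which follows from the area and the pixel size. A back-of-the-envelope sketch, assuming square pixels, 3 bands (`VV`, `VH`, `S2ndvi`), 31 periods and 4-byte float values; `estimate_raster_size` is a hypothetical helper and the result ignores compression and any padding from a bounding box.

```python
def estimate_raster_size(area_km2, scale_m, n_bands=3, n_periods=31, bytes_per_value=4):
    """Return (pixels_per_band, total_megabytes) for an uncompressed stack."""
    pixels = area_km2 * 1e6 / (scale_m ** 2)   # m² of area / m² per pixel
    total_bytes = pixels * n_bands * n_periods * bytes_per_value
    return int(pixels), total_bytes / 1e6


pixels, megabytes = estimate_raster_size(area_km2=100, scale_m=50)
print(f"{pixels:,} pixels per band, about {megabytes:.0f} MB uncompressed")
# → 40,000 pixels per band, about 15 MB uncompressed
```

Halving the pixel size quadruples the pixel count, so the same area at 25 m needs roughly four times the memory.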

## 8. Create Metadata and Processing Summary

In [None]:
# Create processing summary
processing_summary = {
    'processing_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
    'start_date': START_DATE,
    'end_date': END_DATE,
    'temporal_coverage': f'{START_DATE} to {END_DATE}',
    'agricultural_year': 'Nov 2024 - Oct 2025',
    'total_periods': len(periods),
    'successful_periods': len(successful_periods),
    'study_area_bounds': study_area.bounds().getInfo(),
    'spatial_resolution': f'{SCALE}m',
    'coordinate_system': CRS,
    'max_cloud_cover': MAX_CLOUD_COVER,
    'composite_method': 'median',
    'output_bands': ['VV', 'VH', 'S2ndvi'],
    'agricultural_seasons_covered': {
        'season_1': 'Nov 2024 - Mar 2025 (first planting, crosses year boundary)',
        'season_2': 'Apr - Jun 2025 (second planting, dry season)',
        'season_3': 'Jul - Sep 2025 (third planting, optional intensive)',
        'full_coverage': 'Through Oct 2025'
    }
}

# Create detailed period information
period_details = []
for period in successful_periods:
    period_details.append({
        'period': period['period'],
        'start_date': period['start_str'],
        'end_date': period['end_str'],
        'center_date': period['center_date'].strftime('%Y-%m-%d'),
        'doy_center': period['doy_center'],
        'year': period['year'],
        'month': period['month']
    })

# Save metadata
import json

metadata = {
    'summary': processing_summary,
    'periods': period_details
}

metadata_file = os.path.join(OUTPUT_DIR, f'processing_metadata_Nov2024_Oct2025.json')
with open(metadata_file, 'w') as f:
    json.dump(metadata, f, indent=2, default=str)

print("Processing Summary:")
print(f"  Temporal coverage: {START_DATE} to {END_DATE}")
print(f"  Agricultural year: Nov 2024 - Oct 2025")
print(f"  Total periods: {len(periods)}")
print(f"  Successful periods: {len(successful_periods)}")
print(f"  Spatial resolution: {SCALE}m")
print(f"  Coordinate system: {CRS}")
print(f"  Output bands: {processing_summary['output_bands']}")
print(f"\nAgricultural Seasons Covered:")
print(f"  Season 1 (Nov-Mar): First planting season (crosses 2024→2025 boundary)")
print(f"  Season 2 (Apr-Jun): Second planting season (dry season)")
print(f"  Season 3 (Jul-Sep): Third planting season (optional intensive)")
print(f"  Full coverage: Through October 2025")
print(f"\nMetadata saved to: {metadata_file}")

# Create period visualization
fig, ax = plt.subplots(figsize=(16, 7))

# Plot period timeline
period_dates = [p['center_date'] for p in successful_periods]
period_numbers = [p['period'] for p in successful_periods]

ax.scatter(period_dates, period_numbers, alpha=0.7, s=50)
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Period Number', fontsize=12)
ax.set_title(f'12-Day Composite Periods: {START_DATE} to {END_DATE}\nIndonesian Agricultural Year Coverage ({SCALE}m resolution, {CRS})', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)

# Add month boundaries and labels for both years
from matplotlib.dates import DateFormatter, MonthLocator
ax.xaxis.set_major_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter('%b\n%Y'))

# Highlight agricultural seasons with colored backgrounds
from matplotlib.patches import Rectangle
from datetime import datetime

# Season 1: Nov 2024 - Mar 2025 (first planting)
season1_start = datetime(2024, 11, 1)
season1_end = datetime(2025, 3, 31)
ax.axvspan(season1_start, season1_end, alpha=0.15, color='green', label='Season 1: Nov-Mar (First Planting)')

# Season 2: Apr - Jun 2025 (second planting)
season2_start = datetime(2025, 4, 1)
season2_end = datetime(2025, 6, 30)
ax.axvspan(season2_start, season2_end, alpha=0.15, color='blue', label='Season 2: Apr-Jun (Second Planting)')

# Season 3: Jul - Sep 2025 (third planting)
season3_start = datetime(2025, 7, 1)
season3_end = datetime(2025, 9, 30)
ax.axvspan(season3_start, season3_end, alpha=0.15, color='orange', label='Season 3: Jul-Sep (Third Planting)')

# Highlight year boundary
year_boundary = datetime(2025, 1, 1)
ax.axvline(year_boundary, color='red', linewidth=2, linestyle='--', label='Year Boundary (2024→2025)')

ax.legend(loc='upper left', fontsize=10)

plt.tight_layout()
plt.savefig(os.path.join(OUTPUT_DIR, f'period_timeline_Nov2024_Oct2025.png'), dpi=150, bbox_inches='tight')
plt.show()

print(f"\nPeriod timeline saved to: {os.path.join(OUTPUT_DIR, f'period_timeline_Nov2024_Oct2025.png')}")


## 9. Data Conversion for FuseTS

In [None]:
def prepare_fusets_format(data_path_or_array, metadata_path=None):
    """
    Convert GEE-exported data to FuseTS-compatible format
    """
    
    if isinstance(data_path_or_array, str):
        # Load from file
        print(f"Loading data from: {data_path_or_array}")
        
        if data_path_or_array.endswith('.nc'):
            data = xr.open_dataset(data_path_or_array)
        else:
            # Assume GeoTIFF
            import rioxarray
            data = rioxarray.open_rasterio(data_path_or_array)
            
    else:
        # Use provided array
        data = data_path_or_array
    
    print("Converting to FuseTS format...")
    
    # Ensure proper dimension naming
    if 'time' in data.dims:
        data = data.rename({'time': 't'})
    
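    # Ensure proper band naming for FuseTS
    # (only a Dataset has named variables; a DataArray from
    # rioxarray.open_rasterio has no `data_vars` and is left unchanged here)
    if isinstance(data, xr.Dataset):
        if 'NDVI' in data.data_vars:
            data = data.rename({'NDVI': 'S2ndvi'})
        
        # Ensure dimension order is (t, y, x)
        expected_dims = ['t', 'y', 'x']
        
        for var in data.data_vars:
            if set(data[var].dims) == set(expected_dims):
                data[var] = data[var].transpose('t', 'y', 'x')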
    
    # Add FuseTS-specific attributes
    data.attrs.update({
        'title': f'Sentinel-1/2 Time Series for FuseTS Processing',
        'description': '12-day composite periods extracted from Google Earth Engine',
        'bands': 'VV (S1), VH (S1), S2ndvi (S2 NDVI)',
        'temporal_resolution': '12-day composites',
        'processing_software': 'Google Earth Engine + Python',
        'fusets_ready': True
    })
    
    return data

def create_example_usage_script():
    """
    Create a script showing how to use the exported data with FuseTS
    """
    
    script_content = '''
# Example script to use GEE-exported data with FuseTS
# Run this after downloading the exported data from Google Drive
# Temporal coverage: November 2024 - October 2025 (Indonesian agricultural year)

import xarray as xr
import rioxarray
from fusets.mogpr import MOGPRTransformer
from fusets.analytics import phenology
from fusets import whittaker

# Load the exported data
# Option 1: If you exported as individual periods
# data_files = ['S1_S2_Nov2024_Oct2025_Period_01.tif', 'S1_S2_Nov2024_Oct2025_Period_02.tif', ...]
# data = combine_period_files(data_files)  # You'll need to implement this

# Option 2: If you exported as single multi-band file
data_path = 'S1_S2_TimeSeries_Nov2024_Oct2025.tif'
data = rioxarray.open_rasterio(data_path)

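# Convert to FuseTS format
# NOTE: prepare_fusets_format() is defined in the data-preparation notebook;
# copy it into this script (or import it) before running
fusets_data = prepare_fusets_format(data)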

# Apply MOGPR fusion
mogpr = MOGPRTransformer()
fused_data = mogpr.fit_transform(fusets_data)

# Extract phenological metrics for Indonesian agricultural seasons
# Season 1: Nov 2024 - Mar 2025 (first planting, crosses year boundary)
# Season 2: Apr - Jun 2025 (second planting, dry season)
# Season 3: Jul - Sep 2025 (third planting, optional intensive)

phenology_metrics = phenology(fused_data['S2ndvi'])

# Access results
sos_times = phenology_metrics.da_sos_times
eos_times = phenology_metrics.da_eos_times

print("FuseTS processing completed for Nov 2024 - Oct 2025!")
print("Captured full Indonesian agricultural calendar including year-boundary season")
'''
    
    script_file = os.path.join(OUTPUT_DIR, 'fusets_processing_example.py')
    with open(script_file, 'w') as f:
        f.write(script_content)
    
    return script_file

# Create example script
example_script = create_example_usage_script()
print(f"Example FuseTS processing script created: {example_script}")

# If we have local data, prepare it for FuseTS
if 'local_data' in locals() and local_data is not None:
    print("\nPreparing local data for FuseTS...")
    fusets_ready_data = prepare_fusets_format(local_data)
    
    # Save FuseTS-ready data
    fusets_output = os.path.join(OUTPUT_DIR, f'S1_S2_timeseries_Nov2024_Oct2025_fusets_ready.nc')
    fusets_ready_data.to_netcdf(fusets_output)
    print(f"FuseTS-ready data saved to: {fusets_output}")
    
    # Display data structure
    print("\nFuseTS-ready data structure:")
    print(fusets_ready_data)
    
    print("\nThis data is now ready for the MOGPR fusion notebook!")
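
The example script leaves `combine_period_files` unimplemented. A first step toward it is putting the per-period GeoTIFFs in order and spotting gaps before any raster is read. The sketch below does only that, using the `..._Period_NN.tif` naming shown in the example script; `order_period_files` is a hypothetical helper, and the actual stacking (for instance with `rioxarray` and `xr.concat`) is not shown.

```python
import re

# File names are expected to end in "_Period_<number>.tif"
PERIOD_FILE = re.compile(r'_Period_(\d+)\.tif$')


def order_period_files(filenames, expected_periods=31):
    """Sort files by period number and report periods with no file."""
    by_period = {}
    for name in filenames:
        match = PERIOD_FILE.search(name)
        if match:
            by_period[int(match.group(1))] = name
    ordered = [by_period[p] for p in sorted(by_period)]
    missing = [p for p in range(1, expected_periods + 1) if p not in by_period]
    return ordered, missing


files = ['S1_S2_Nov2024_Oct2025_Period_03.tif',
         'S1_S2_Nov2024_Oct2025_Period_01.tif',
         'notes.txt']
ordered, missing = order_period_files(files, expected_periods=3)
print(ordered)
print(missing)
```

Knowing the missing periods up front matters because a skipped period shifts every later time step if the files are simply stacked in list order.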

## 10. Summary and Next Steps

### What this notebook accomplishes:

1. **Temporal Strategy**: Creates 31 composite periods of up to 12 days each from **Nov 2024 to Oct 2025** (the last period is shorter if clipped to Oct 31)
2. **Data Collection**: Extracts S1 (VV, VH) and S2 (NDVI) data from Google Earth Engine
3. **Cloud Processing**: Uses GEE's computational power for large-scale data processing
4. **Flexible Export**: **GEE Assets (recommended)** or Google Drive
5. **Local Processing**: For small areas, extracts data directly without export/download
6. **FuseTS Preparation**: Converts data to the exact format needed for MOGPR processing

### Export Options Comparison:

| Feature | GEE Assets ⭐ RECOMMENDED | Google Drive |
|---------|---------------------------|--------------|
| **Size limit** | 10 TB per user | ~15 GB per file |
| **Best for** | Large areas (Java Island) | Small test areas |
| **Speed** | Fast (stays in cloud) | Slow (download required) |
| **Usage** | Use directly in GEE | Must download first |
| **Sharing** | Easy (asset permissions) | Manual file sharing |
| **Cost** | Free (GEE quota) | Free (Drive quota) |
| **Processing** | Process in GEE cloud | Local processing needed |

### When to use GEE Assets:
✅ **Study area > 1000 km²** (like Java Island with 5km buffer)  
✅ **Multiple people need access** to the same data  
✅ **Want to process in GEE** without downloading  
✅ **Need to reuse data** in multiple projects  
✅ **Data size > 2GB**  

### When to use Google Drive:
✅ **Small test area** (< 100 km²)  
✅ **Quick prototyping** with local tools  
✅ **One-time download** for offline work  
✅ **Prefer local storage** over cloud  

### Temporal Coverage (Indonesian Agricultural Year):
- **Period 1**: 2024-11-01 to 2024-11-12 ← **First planting season starts**
- **Period 2**: 2024-11-13 to 2024-11-24  
- **Period 3**: 2024-11-25 to 2024-12-06
- **Period 6**: 2024-12-31 to 2025-01-11 ← **Crosses year boundary**
- **...**
- **Period 13**: 2025-03-25 to 2025-04-05 ← **First planting season ends**
- **Periods 14-20**: 2025-04-06 to 2025-06-28 ← **Second planting season**
- **Periods 21-28**: 2025-06-29 to 2025-10-02 ← **Third planting season (optional)**
- **Period 31**: 2025-10-27 to 2025-10-31 if clipped to the end date (a full 12-day window would end 2025-11-07) ← **Full coverage complete**
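
The period boundaries follow directly from the stated rule: consecutive 12-day windows starting on 2024-11-01. The sketch below derives them with the standard library. `build_periods` is a hypothetical helper, and clipping the final window to the end date is an assumption made here, not necessarily what the notebook's own period generator does.

```python
from datetime import date, timedelta


def build_periods(start, end, length_days=12):
    """Return (period_number, first_day, last_day) tuples covering start..end."""
    schedule = []
    first = start
    while first <= end:
        # Assumption: the final window is clipped to the end date
        last = min(first + timedelta(days=length_days - 1), end)
        schedule.append((len(schedule) + 1, first, last))
        first = last + timedelta(days=1)
    return schedule


schedule = build_periods(date(2024, 11, 1), date(2025, 10, 31))
for number, first, last in schedule:
    if number in (1, 6, 31):
        print(f"Period {number:2d}: {first} to {last}")
print(f"{len(schedule)} periods")
```

Thirty full windows cover 360 days, so the 31st window holds the remaining 5 days of the 365-day range.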

### Agricultural Seasons Captured:
- **Season 1 (Nov-Mar)**: First planting season - **handles year boundary transition**
  - Start: Nov 2024 (Period 1)
  - Peak: Jan 2025 (crosses from 2024→2025)
  - End: Mar 2025 (Periods ~12-13)
  
- **Season 2 (Apr-Jun)**: Second planting season (dry season)
  - Periods ~14-20 in 2025
  
- **Season 3 (Jul-Sep)**: Third planting season (optional intensive)
  - Periods ~21-28 in 2025
  
- **Full Monitoring**: Through October 2025 (Period 31)
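
To tag each composite with its season, a period's center date can be compared against the season windows. A minimal sketch, using the season boundaries from the timeline plot in this notebook; `SEASONS` and `season_for` are hypothetical names, and dates outside all three windows (October 2025) return `None`.

```python
from datetime import date

# Season windows as stated in this notebook (inclusive dates)
SEASONS = [
    ('Season 1', date(2024, 11, 1), date(2025, 3, 31)),
    ('Season 2', date(2025, 4, 1), date(2025, 6, 30)),
    ('Season 3', date(2025, 7, 1), date(2025, 9, 30)),
]


def season_for(center_date):
    """Return the season containing center_date, or None outside all seasons."""
    for name, start, end in SEASONS:
        if start <= center_date <= end:
            return name
    return None


print(season_for(date(2025, 1, 6)))     # a January composite
print(season_for(date(2025, 10, 29)))   # after the third season
```

A period that straddles a season boundary is assigned by its center date alone in this sketch.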

### Next Steps:

#### If you exported to GEE Assets (Recommended):
1. **Monitor exports**: https://code.earthengine.google.com/tasks
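2. **Use in GEE Code Editor**: load the individual `..._Period_NN` images (`*` wildcards in asset IDs are not expanded), for example the first period:
   ```javascript
   var first = ee.Image('projects/ee-geodeticengineeringundip/assets/FuseTS/S1_S2_Nov2024_Oct2025_Period_01');
   ```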
3. **Or download specific regions** when needed (see Section 6b)
4. **Process in GEE** or download small regions for local analysis

#### If you exported to Google Drive:
1. **Download Data**: Monitor exports at https://code.earthengine.google.com/tasks
2. **Load in FuseTS**: Use the exported GeoTIFF files with the MOGPR fusion notebook
3. **Apply MOGPR**: Run the S1+S2 fusion using the prepared time series
4. **Multi-Season Analysis**: Detect all three Indonesian agricultural seasons

### File Outputs:
- **Assets**: `projects/ee-geodeticengineeringundip/assets/FuseTS/S1_S2_Nov2024_Oct2025_Period_*`
- **Or Drive**: S1_S2_TimeSeries_Nov2024_Oct2025.tif (or individual period files)
- **Metadata**: processing_metadata_Nov2024_Oct2025.json
- **Timeline**: period_timeline_Nov2024_Oct2025.png
- **Example Script**: fusets_processing_example.py

### Key Features:
✅ **Perfect alignment** with Indonesian agricultural calendar  
✅ **Year boundary handling** for Nov 2024 → Mar 2025 first season  
✅ **Complete coverage** of all potential planting seasons  
✅ **31 periods** of 12 days cover the 365-day agricultural year (31 × 12 = 372 days, so the last period reaches past Oct 31 unless clipped)  
✅ **50m resolution** for efficient regional analysis  
✅ **GEE Assets support** for large-scale datasets  

### For Large Datasets (Java Island):
💡 **Recommended workflow**:
1. Export to **GEE Assets** (large storage quota)
2. Process and analyze **entirely in GEE** using Code Editor or Python API
3. Download **only final results** or specific regions of interest
4. Use MOGPR fusion **on cloud-processed data** for maximum efficiency

The exported data is now ready for the FuseTS MOGPR processing workflow with full Indonesian agricultural season detection!

---

## ✅ READY TO RUN

**Current Configuration:**
- **Study Area**: Kabupaten Demak (~900 km²)
- **Temporal Coverage**: 31 periods × 12 days (Nov 2024 - Oct 2025)
- **Sentinel-1**: VV, VH bands (99.9% coverage)
- **Sentinel-2**: Level-1C TOA NDVI without masking (99.9% coverage)
- **Export**: Google Drive, 31 individual files
- **Expected Size**: ~15-30GB total

**Next Steps:**
1. ✅ **Run Section 5** to process all 31 periods (~10-30 min)
2. ✅ **Run Section 6** to export to Google Drive
3. ✅ **Monitor exports** at https://code.earthengine.google.com/tasks
4. ✅ **Download** from Google Drive after completion (~30-60 min)
5. ✅ **Run MOGPR fusion** in S1_S2_MOGPR_Fusion_Tutorial.ipynb

**Ready to start? Run Section 5 now!** 🚀