# Google Earth Engine Data Preparation for FuseTS

This notebook extracts Sentinel-1 and Sentinel-2 data from Google Earth Engine and prepares it for FuseTS MOGPR processing.

## Temporal Compositing Strategy
- **Total periods**: 31 periods from Nov 2024 - Oct 2025
- **Period length**: 12 days each (the final period, Oct 27-31, 2025, is truncated to 5 days)
- **Start date**: November 1, 2024
- **End date**: October 31, 2025
- **Period 1**: Nov 1-12, 2024
- **Period 2**: Nov 13-24, 2024  
- **Period 3**: Nov 25 - Dec 6, 2024
- **... and so on**
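
The schedule above can be reproduced with a few lines of standard-library Python. This is a minimal sketch rather than the notebook's own implementation, and the function name `twelve_day_periods` is made up for illustration. It shows that the year holds 30 full 12-day periods plus a shorter final one.

```python
from datetime import date, timedelta


def twelve_day_periods(start, end, length=12):
    """Return (number, first_day, last_day) tuples; the last period is clipped to `end`."""
    periods = []
    first = start
    number = 1
    while first <= end:
        # Inclusive last day of the period, never later than the overall end date
        last = min(first + timedelta(days=length - 1), end)
        periods.append((number, first, last))
        first = last + timedelta(days=1)
        number += 1
    return periods


schedule = twelve_day_periods(date(2024, 11, 1), date(2025, 10, 31))
for number, first, last in schedule[:3] + schedule[-1:]:
    print(f"Period {number:2d}: {first} to {last} ({(last - first).days + 1} days)")
```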

## Indonesian Agricultural Calendar Coverage
This date range covers:
- **First planting season**: Nov 2024 - Mar 2025 (crosses year boundary)
- **Second planting season**: Apr - Jun 2025
- **Third planting season**: Jul - Sep 2025 (optional)
- **Full cycle**: Complete agricultural year
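
To see which composite periods fall in each season, each period can be assigned to a season by its start date. The sketch below assumes that assignment rule (a period straddling two seasons counts toward the one it starts in) and uses the season date ranges listed above. `periods_by_season` is an illustrative name, not part of the notebook.

```python
from datetime import date, timedelta

# Season date ranges taken from the list above (inclusive)
SEASONS = {
    "Season 1 (Nov-Mar)": (date(2024, 11, 1), date(2025, 3, 31)),
    "Season 2 (Apr-Jun)": (date(2025, 4, 1), date(2025, 6, 30)),
    "Season 3 (Jul-Sep)": (date(2025, 7, 1), date(2025, 9, 30)),
}


def periods_by_season(first_day, n_periods=31, length=12):
    """Map each season name to the 1-based period numbers that start inside it."""
    starts = [first_day + timedelta(days=length * i) for i in range(n_periods)]
    result = {}
    for name, (season_start, season_end) in SEASONS.items():
        result[name] = [
            i + 1 for i, start in enumerate(starts)
            if season_start <= start <= season_end
        ]
    return result


season_periods = periods_by_season(date(2024, 11, 1))
for name, numbers in season_periods.items():
    print(f"{name}: periods {numbers[0]}-{numbers[-1]}")
```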

## Output Format
Data will be exported in FuseTS-compatible xarray format with proper band naming:
- S1: `VV`, `VH` bands
- S2: `S2ndvi` band
- Dimensions: `(t, y, x)`, with the time dimension named `t`

## 1. Setup and Authentication

In [None]:
import ee
import geemap
import pandas as pd
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import os
import warnings
warnings.filterwarnings('ignore')

# Additional imports for mask processing
import rasterio
from rasterio.features import shapes
import geopandas as gpd
from shapely.geometry import shape, mapping
from shapely.ops import unary_union

# Initialize Earth Engine with authentication
print("🔐 Authenticating with Google Earth Engine...")

try:
    # First time setup: authenticate
    ee.Authenticate()
    print("✅ Authentication successful!")
except Exception as e:
    print(f"Authentication note: {e}")
    print("If already authenticated, continuing...")

# Initialize with project
try:
    ee.Initialize(project='ee-geodeticengineeringundip')
    print("✅ Earth Engine initialized successfully!")
    print(f"   Project: ee-geodeticengineeringundip")
except Exception as e:
    print(f"❌ Error initializing Earth Engine: {e}")
    print("Please ensure:")
    print("  1. You have run ee.Authenticate() successfully")
    print("  2. You have access to project 'ee-geodeticengineeringundip'")
    raise

print(f"\n📦 Package versions:")
print(f"   Earth Engine API: {ee.__version__}")
print(f"   geemap: {geemap.__version__}")
print(f"   rasterio: {rasterio.__version__}")


## 2. Define Study Area and Parameters
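
The storage estimate printed at the end of this section is plain arithmetic: the pixel count is the area divided by the pixel area, multiplied by the number of bands, bytes per value, and number of periods. The sketch below repeats that calculation with the approximate areas quoted in this notebook (about 900 km² for Demak and 150,000 km² for Java). `estimate_size_gb` is a hypothetical helper, and the result is uncompressed size only.

```python
def estimate_size_gb(area_km2, scale_m, n_bands=3, n_periods=31, bytes_per_value=4):
    """Rough uncompressed size in GB for float32 composites over a given area."""
    pixels_per_period = area_km2 * 1e6 / (scale_m ** 2)  # m^2 divided by m^2 per pixel
    total_bytes = pixels_per_period * n_bands * bytes_per_value * n_periods
    return total_bytes / 1e9


demak_gb = estimate_size_gb(area_km2=900, scale_m=50)
java_gb = estimate_size_gb(area_km2=150_000, scale_m=50)
print(f"Demak: ~{demak_gb:.2f} GB, Java: ~{java_gb:.1f} GB")
```

Halving `scale_m` quadruples the estimate, which is why the 50 m resolution matters more than the number of periods for the Java-wide export.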

In [None]:
# ============================================================================
# STUDY AREA SELECTION
# ============================================================================

# Choose your study area:
STUDY_AREA_TYPE = 'demak'  # Options: 'java_island' or 'demak'

print("="*70)
print("📍 STUDY AREA CONFIGURATION")
print("="*70)

if STUDY_AREA_TYPE == 'demak':
    # ========================================================================
    # OPTION 1: KABUPATEN DEMAK (Small area - faster processing)
    # ========================================================================
    print("\n🎯 Using Kabupaten Demak, Central Java")
    
    # Demak administrative boundary (approximate coordinates)
    # You can adjust these based on your specific area of interest
    demak_bounds = {
        'west': 110.35,   # Western boundary
        'east': 110.75,   # Eastern boundary  
        'south': -7.05,   # Southern boundary
        'north': -6.75    # Northern boundary
    }
    
    # Create rectangle geometry for Demak
    study_area = ee.Geometry.Rectangle([
        demak_bounds['west'], 
        demak_bounds['south'],
        demak_bounds['east'], 
        demak_bounds['north']
    ])
    
    # Alternative: Use GEE administrative boundaries (more accurate)
    # Uncomment these lines to use official boundaries:
    # admin_boundaries = ee.FeatureCollection("FAO/GAUL/2015/level2")
    # demak = admin_boundaries.filter(ee.Filter.eq('ADM2_NAME', 'Demak'))
    # study_area = demak.geometry()
    
    print(f"   Type: Bounding rectangle around the regency (kabupaten)")
    print(f"   Location: Central Java Province")
    print(f"   Approximate area: ~900 km² (regency; the bounding rectangle above is larger)")
    print(f"   Bounds: {demak_bounds}")
    print(f"   ✅ Much smaller than Java Island → faster export!")
    
elif STUDY_AREA_TYPE == 'java_island':
    # ========================================================================
    # OPTION 2: FULL JAVA ISLAND (Large area - requires more storage)
    # ========================================================================
    print("\n🏝️  Using Full Java Island")
    
    import rasterio
    from rasterio.features import shapes
    import geopandas as gpd
    from shapely.geometry import shape, mapping
    
    # Path to Java Island mask
    MASK_FILE = 'java_island_mask.tif'
    
    print(f"   Loading Java Island mask from: {MASK_FILE}")
    
    # Read the mask file and extract geometry
    with rasterio.open(MASK_FILE) as src:
        # Read the mask (assuming mask values > 0 indicate valid areas)
        mask_data = src.read(1)
        mask_transform = src.transform
        mask_crs = src.crs
        
        # Get bounds
        bounds = src.bounds
        print(f"   Mask bounds: {bounds}")
        print(f"   Mask CRS: {mask_crs}")
        print(f"   Mask shape: {mask_data.shape}")
        
        # Extract geometry from mask (vectorize the raster mask)
        mask_geoms = []
        for geom, val in shapes(mask_data, mask=mask_data > 0, transform=mask_transform):
            mask_geoms.append(shape(geom))
    
    # Create a unified geometry for Java Island
    if len(mask_geoms) > 0:
        from shapely.ops import unary_union
        java_geometry = unary_union(mask_geoms)
        
        # Add 5 km buffer to the Java Island geometry
        BUFFER_DISTANCE_KM = 5
        BUFFER_DISTANCE_DEGREES = BUFFER_DISTANCE_KM / 111.0  # Approximate conversion (1 degree ≈ 111 km)
        
        print(f"   Applying {BUFFER_DISTANCE_KM} km buffer to Java Island mask...")
        java_geometry_buffered = java_geometry.buffer(BUFFER_DISTANCE_DEGREES)
        
        # Convert to GeoJSON format for Earth Engine
        java_geojson = mapping(java_geometry_buffered)
        
        # Upload to Earth Engine
        study_area = ee.Geometry(java_geojson)
        
        print(f"   ✅ Java Island mask loaded successfully!")
        print(f"   Number of geometries merged: {len(mask_geoms)}")
        print(f"   Buffer applied: {BUFFER_DISTANCE_KM} km")
        print(f"   Approximate area: ~150,000 km²")
    else:
        print("   ⚠️  No valid mask areas found, falling back to bounding box")
        study_area = ee.Geometry.Rectangle([bounds.left, bounds.bottom, bounds.right, bounds.top])

else:
    raise ValueError(f"Invalid STUDY_AREA_TYPE: {STUDY_AREA_TYPE}. Use 'demak' or 'java_island'")

# Processing parameters
START_DATE = '2024-11-01'  # November 1, 2024
END_DATE = '2025-10-31'    # October 31, 2025
SCALE = 50  # meters per pixel (50m resolution for both S1 and S2)
CRS = 'EPSG:4326'  # WGS84 coordinate system
MAX_CLOUD_COVER = 20  # Maximum cloud cover percentage for S2

# Output directory
OUTPUT_DIR = 'gee_fusets_data'
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Display final configuration
print(f"\n{'='*70}")
print("📋 FINAL CONFIGURATION")
print(f"{'='*70}")
print(f"   Study Area: {STUDY_AREA_TYPE.upper()}")
print(f"   Bounds: {study_area.bounds().getInfo()}")
print(f"   Area size: {study_area.area().getInfo() / 1e6:.1f} km²")
print(f"   Processing period: {START_DATE} to {END_DATE}")
print(f"   Temporal resolution: 12-day composites (31 periods)")
print(f"   Spatial resolution: {SCALE}m")
print(f"   Coordinate system: {CRS}")
print(f"   Max cloud cover: {MAX_CLOUD_COVER}%")
print(f"   Output directory: {OUTPUT_DIR}")

# Estimate data size
area_km2 = study_area.area().getInfo() / 1e6
pixels_per_period = (area_km2 * 1e6) / (SCALE * SCALE)  # Total pixels
bands = 3  # VV, VH, S2ndvi
bytes_per_pixel = 4  # Float32
total_size_gb = (pixels_per_period * bands * bytes_per_pixel * 31) / 1e9

print(f"\n💾 Estimated data size:")
print(f"   Per period: ~{total_size_gb/31:.2f} GB")
print(f"   Total (31 periods): ~{total_size_gb:.1f} GB")

if total_size_gb > 250:
    print(f"\n   ⚠️  WARNING: Exceeds GEE Asset quota (250GB)")
    print(f"   → Use Google Drive export instead")
elif total_size_gb > 100:
    print(f"\n   ⚡ Large dataset - GEE Assets recommended")
else:
    print(f"\n   ✅ Manageable size - Google Drive or Assets both work")

print(f"{'='*70}")


## 3. Generate 12-Day Composite Periods

In [None]:
def generate_12day_periods(start_date_str, end_date_str):
    """
    Generate 31 periods of 12 days each from Nov 1, 2024 to Oct 31, 2025
    
    Parameters:
    -----------
    start_date_str : str
        Start date in 'YYYY-MM-DD' format (e.g., '2024-11-01')
    end_date_str : str
        End date in 'YYYY-MM-DD' format (e.g., '2025-10-31')
    """
    start_date = datetime.strptime(start_date_str, '%Y-%m-%d')
    end_date = datetime.strptime(end_date_str, '%Y-%m-%d')
    
    periods = []
    
    for period_num in range(31):
        period_start = start_date + timedelta(days=period_num * 12)
        period_end = period_start + timedelta(days=11)  # 12 days inclusive
        
        # Ensure we don't go beyond the end date
        if period_end > end_date:
            period_end = end_date
            
        periods.append({
            'period': period_num + 1,
            'start_date': period_start,
            'end_date': period_end,
            'start_str': period_start.strftime('%Y-%m-%d'),
            'end_str': period_end.strftime('%Y-%m-%d'),
            'center_date': period_start + timedelta(days=6),  # Middle of period
            'doy_center': (period_start + timedelta(days=6)).timetuple().tm_yday,
            'year': period_start.year,
            'month': period_start.month
        })
        
        if period_end >= end_date:
            break
            
    return periods

# Generate periods from Nov 2024 to Oct 2025
periods = generate_12day_periods(START_DATE, END_DATE)

print(f"Generated {len(periods)} periods from {START_DATE} to {END_DATE}:")
print("\nFirst 5 periods:")
for i, period in enumerate(periods[:5]):
    print(f"Period {period['period']:2d}: {period['start_str']} to {period['end_str']} (center: DOY {period['doy_center']:3d}, {period['year']})")

print("\nPeriods crossing year boundary (Dec 2024 -> Jan 2025):")
year_boundary_periods = [p for p in periods if p['start_date'].year != p['end_date'].year]
for period in year_boundary_periods:
    print(f"Period {period['period']:2d}: {period['start_str']} to {period['end_str']} ← CROSSES YEAR BOUNDARY")

print("\nLast 5 periods:")
for i, period in enumerate(periods[-5:]):
    print(f"Period {period['period']:2d}: {period['start_str']} to {period['end_str']} (center: DOY {period['doy_center']:3d}, {period['year']})")

# Create a DataFrame for easier handling
periods_df = pd.DataFrame(periods)
print(f"\nTotal temporal coverage: {periods[0]['start_str']} to {periods[-1]['end_str']}")
print(f"Covers Indonesian agricultural seasons:")
# Period ranges are assigned by period start date
print(f"  • Season 1 (Nov-Mar): Periods 1-13 (Nov 2024 - Mar 2025)")
print(f"  • Season 2 (Apr-Jun): Periods 14-21 (Apr - Jun 2025)")
print(f"  • Season 3 (Jul-Sep): Periods 22-28 (Jul - Sep 2025)")
print(f"  • Full coverage: Through Period 31 (Oct 2025)")

## 4. Define Data Loading Functions
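
For intuition, the per-pixel logic that the Sentinel-2 loader applies through Earth Engine can be written in plain Python. NDVI is (NIR - Red) / (NIR + Red), with B8 as NIR and B4 as red, and a pixel is kept only when its SCL class is 4, 5, 6, or 11. The function names are illustrative, and the sketch works on single values rather than `ee.Image` objects.

```python
# SCL classes kept by the loader: vegetation, bare soil, water, snow/ice
KEEP_SCL_CLASSES = {4, 5, 6, 11}


def ndvi(nir, red):
    """Normalized difference of NIR (B8) and red (B4); None when undefined."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def masked_ndvi(nir, red, scl):
    """Return NDVI for a pixel, or None when its SCL class is masked out."""
    if scl not in KEEP_SCL_CLASSES:
        return None
    return ndvi(nir, red)


clear_vegetation = masked_ndvi(nir=0.4, red=0.1, scl=4)  # vegetation pixel, kept
cloudy_pixel = masked_ndvi(nir=0.4, red=0.1, scl=9)      # high-probability cloud, masked
print(clear_vegetation, cloudy_pixel)
```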

In [None]:
def load_sentinel1_data(geometry, start_date, end_date):
    """
    Load Sentinel-1 GRD data for a specific time period
    """
    s1_collection = (ee.ImageCollection('COPERNICUS/S1_GRD')
                    .filterBounds(geometry)
                    .filterDate(start_date, end_date)
                    .filter(ee.Filter.eq('instrumentMode', 'IW'))
                    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
                    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
                    .select(['VV', 'VH']))
    
    return s1_collection

def load_sentinel2_data(geometry, start_date, end_date, max_cloud_cover=20):
    """
    Load Sentinel-2 data and calculate NDVI for a specific time period
    """
    def calculate_ndvi(image):
        ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI')
        return image.addBands(ndvi)
    
    def mask_clouds(image):
        # Use SCL band for cloud masking
        scl = image.select('SCL')
        # Keep vegetation, soil, water, snow classes (4,5,6,11)
        good_pixels = scl.eq(4).Or(scl.eq(5)).Or(scl.eq(6)).Or(scl.eq(11))
        return image.updateMask(good_pixels)
    
    s2_collection = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
                    .filterBounds(geometry)
                    .filterDate(start_date, end_date)
                    .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', max_cloud_cover))
                    .map(mask_clouds)
                    .map(calculate_ndvi)
                    .select(['NDVI']))
    
    return s2_collection

def create_composite(collection, method='median'):
    """
    Create a composite from an image collection
    """
    if method == 'median':
        return collection.median()
    elif method == 'mean':
        return collection.mean()
    elif method == 'max':
        return collection.max()
    else:
        return collection.median()

print("Data loading functions defined successfully!")

## 5. Process Data for All Periods
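
One detail worth keeping in mind when filtering by period: Earth Engine's `filterDate(start, end)` includes `start` but excludes `end`. An inclusive period end such as `2024-11-12` therefore has to be shifted forward by one day, otherwise acquisitions from the last day are dropped. A minimal sketch of that conversion follows; `exclusive_end` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def exclusive_end(end_str):
    """Convert an inclusive 'YYYY-MM-DD' end date to the exclusive end for filterDate()."""
    end = datetime.strptime(end_str, "%Y-%m-%d")
    return (end + timedelta(days=1)).strftime("%Y-%m-%d")


print(exclusive_end("2024-11-12"))
print(exclusive_end("2024-12-31"))
```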

In [None]:
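def process_single_period(period_info, geometry, scale=SCALE):
    """
    Process S1 and S2 data for a single 12-day period.

    `scale` is accepted for interface consistency but is not used when
    building the composites.
    """
    start_date = period_info['start_str']
    end_date = period_info['end_str']
    period_num = period_info['period']

    # filterDate() excludes its end date, so filter up to the day after the
    # inclusive period end to keep acquisitions from the final day.
    filter_end = (period_info['end_date'] + timedelta(days=1)).strftime('%Y-%m-%d')
    
    print(f"Processing Period {period_num}: {start_date} to {end_date}")
    
    try:
        # Load Sentinel-1 data
        s1_collection = load_sentinel1_data(geometry, start_date, filter_end)
        s1_count = s1_collection.size().getInfo()
        
        # Load Sentinel-2 data
        s2_collection = load_sentinel2_data(geometry, start_date, filter_end, MAX_CLOUD_COVER)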
        s2_count = s2_collection.size().getInfo()
        
        print(f"  Found {s1_count} S1 images, {s2_count} S2 images")
        
        # Create composites
        if s1_count > 0:
            s1_composite = create_composite(s1_collection, 'median')
        else:
            # Create empty image with correct bands
            s1_composite = ee.Image.constant([0, 0]).rename(['VV', 'VH']).updateMask(ee.Image.constant(0))
            
        if s2_count > 0:
            s2_composite = create_composite(s2_collection, 'median')
        else:
            # Create empty NDVI image
            s2_composite = ee.Image.constant(0).rename('NDVI').updateMask(ee.Image.constant(0))
        
        # Combine S1 and S2 data
        combined_image = s1_composite.addBands(s2_composite.rename('S2ndvi'))
        
        # Add metadata
        combined_image = combined_image.set({
            'period': period_num,
            'start_date': start_date,
            'end_date': end_date,
            'center_date': period_info['center_date'].strftime('%Y-%m-%d'),
            'doy_center': period_info['doy_center'],
            's1_count': s1_count,
            's2_count': s2_count
        })
        
        return combined_image
        
    except Exception as e:
        print(f"  Error processing period {period_num}: {e}")
        return None

# Process all periods
print("Starting data processing for all periods...\n")

processed_images = []
successful_periods = []

for i, period in enumerate(periods):
    result = process_single_period(period, study_area, SCALE)
    if result is not None:
        processed_images.append(result)
        successful_periods.append(period)
    
    # Progress update every 5 periods
    if (i + 1) % 5 == 0:
        print(f"Completed {i + 1}/{len(periods)} periods\n")

print(f"Successfully processed {len(processed_images)} out of {len(periods)} periods")

# Create ImageCollection from processed images
if processed_images:
    time_series_collection = ee.ImageCollection(processed_images)
    print(f"Created time series collection with {time_series_collection.size().getInfo()} images")
else:
    print("No images were successfully processed!")

## 6. Export Data from GEE

## 5b. Preview 12-Day Composites

Before exporting all 31 periods, let's preview a few composites to verify the data quality and spatial coverage.
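
The preview samples periods spread evenly across the year. If fewer than 31 periods were processed successfully, hard-coded indices may fall outside the list, so it can help to derive them from the number of available periods. The helper below is a sketch of one way to do that; `spread_indices` is not part of the notebook.

```python
def spread_indices(n_items, n_samples=4):
    """Return up to n_samples zero-based indices spread evenly over n_items."""
    if n_items <= 0 or n_samples <= 0:
        return []
    if n_samples >= n_items:
        return list(range(n_items))
    if n_samples == 1:
        return [0]
    step = (n_items - 1) / (n_samples - 1)
    return [round(i * step) for i in range(n_samples)]


print(spread_indices(31))  # all 31 periods available
print(spread_indices(25))  # only 25 periods processed
```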

In [None]:
# ============================================================================
# PREVIEW COMPOSITES BEFORE EXPORT
# ============================================================================

print("="*70)
print("🔍 PREVIEW: 12-Day Composites Quality Check")
print("="*70)

# Select a few sample periods to preview
sample_periods = [0, 10, 20, 30]  # Period 1, 11, 21, 31 (spread across the year)

print(f"\n📊 Previewing {len(sample_periods)} sample periods:")
for idx in sample_periods:
    if idx < len(successful_periods):
        p = successful_periods[idx]
        print(f"   Period {p['period']:2d}: {p['start_str']} to {p['end_str']}")

# Create visualization map
print("\n🗺️  Creating interactive map...")
Map = geemap.Map(center=[-7.2, 110.5], zoom=7)

# Add the study area boundary for reference
Map.addLayer(study_area, {'color': 'red'}, f'Study Area ({STUDY_AREA_TYPE})', opacity=0.3)

# Visualization parameters for different bands
vis_params_vv = {
    'min': -25,
    'max': 0,
    'palette': ['blue', 'yellow', 'red']
}

vis_params_vh = {
    'min': -30,
    'max': -5,
    'palette': ['blue', 'green', 'yellow']
}

vis_params_ndvi = {
    'min': 0,
    'max': 1,
    'palette': ['brown', 'yellow', 'green', 'darkgreen']
}

# Add sample periods to map
for idx in sample_periods:
    if idx < len(processed_images):
        image = processed_images[idx]
        period_info = successful_periods[idx]
        period_num = period_info['period']
        
        # Add each band as a separate layer
        Map.addLayer(
            image.select('VV'), 
            vis_params_vv, 
            f'Period {period_num:02d} - S1 VV', 
            shown=False
        )
        
        Map.addLayer(
            image.select('VH'), 
            vis_params_vh, 
            f'Period {period_num:02d} - S1 VH', 
            shown=False
        )
        
        Map.addLayer(
            image.select('S2ndvi'), 
            vis_params_ndvi, 
            f'Period {period_num:02d} - S2 NDVI', 
            shown=(idx == 0)  # Show only first period by default
        )
        
        # RGB composite (false color)
        rgb_vis = {
            'min': [0, -25, -30],
            'max': [1, 0, -5],
            'bands': ['S2ndvi', 'VV', 'VH']
        }
        Map.addLayer(
            image, 
            rgb_vis, 
            f'Period {period_num:02d} - RGB (NDVI/VV/VH)', 
            shown=False
        )

print("✅ Map created! Toggle layers in the map to compare periods and bands")
print("   • Red outline: Study area boundary")
print("   • VV: Sentinel-1 VV polarization (blue=low, red=high backscatter)")
print("   • VH: Sentinel-1 VH polarization (blue=low, yellow=high backscatter)")
print("   • NDVI: Vegetation index (brown=no veg, green=dense vegetation)")
print("   • RGB: False color composite (Red=NDVI, Green=VV, Blue=VH)")

# Display the map
Map

In [None]:
# ============================================================================
# PIXEL-LEVEL DATA QUALITY CHECK
# ============================================================================

print("\n" + "="*70)
print("📈 PIXEL DATA QUALITY CHECK")
print("="*70)

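# Define sample points across Java Island.
# With STUDY_AREA_TYPE = 'demak', points far from the Demak bounding box may return
# no data, because only scenes intersecting the study area are loaded.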
sample_points = [
    {'name': 'Western Banten', 'lon': 106.0, 'lat': -6.5},
    {'name': 'West Java (Bandung)', 'lon': 107.6, 'lat': -6.9},
    {'name': 'Central Java (Semarang)', 'lon': 110.4, 'lat': -7.0},
    {'name': 'Central Java Coast', 'lon': 109.0, 'lat': -6.8},
    {'name': 'East Java (Surabaya)', 'lon': 112.7, 'lat': -7.3},
    {'name': 'Eastern Java', 'lon': 114.2, 'lat': -8.0}
]

print(f"\n🎯 Sampling {len(sample_points)} locations across Java Island:")
for pt in sample_points:
    print(f"   • {pt['name']:25s} ({pt['lon']:6.2f}°E, {pt['lat']:5.2f}°N)")

# Sample first period to check for data availability
if len(processed_images) > 0:
    first_image = processed_images[0]
    
    print(f"\n📊 Checking Period 1 data at sample locations...")
    print(f"{'Location':<25s} {'VV':>8s} {'VH':>8s} {'NDVI':>8s} {'Status':>12s}")
    print("-" * 70)
    
    for pt in sample_points:
        point = ee.Geometry.Point([pt['lon'], pt['lat']])
        
        # Sample the image at this point
        try:
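            # dropNulls=False keeps the sample even when one band (e.g. cloud-masked NDVI) is missing
            sample = first_image.sample(point, scale=SCALE, dropNulls=False).first().getInfo()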
            
            if sample and 'properties' in sample:
                props = sample['properties']
                vv = props.get('VV', None)
                vh = props.get('VH', None)
                ndvi = props.get('S2ndvi', None)
                
                # Check if data exists
                if vv is not None and vh is not None and ndvi is not None:
                    status = "✅ HAS DATA"
                    vv_str = f"{vv:8.2f}"
                    vh_str = f"{vh:8.2f}"
                    ndvi_str = f"{ndvi:8.3f}"
                else:
                    status = "❌ NO DATA"
                    vv_str = "None" if vv is None else f"{vv:8.2f}"
                    vh_str = "None" if vh is None else f"{vh:8.2f}"
                    ndvi_str = "None" if ndvi is None else f"{ndvi:8.3f}"
                
                print(f"{pt['name']:<25s} {vv_str:>8s} {vh_str:>8s} {ndvi_str:>8s} {status:>12s}")
            else:
                print(f"{pt['name']:<25s} {'None':>8s} {'None':>8s} {'None':>8s} {'❌ NO DATA':>12s}")
                
        except Exception as e:
            print(f"{pt['name']:<25s} {'Error':>8s} {'Error':>8s} {'Error':>8s} {'❌ ERROR':>12s}")
    
    print("\n💡 Interpretation:")
    print("   • VV/VH values between -30 and 0 dB are normal for Sentinel-1")
    print("   • NDVI values between 0 and 1 are normal (0=no vegetation, 1=dense)")
    print("   • 'None' values indicate missing data (possible mask issue)")
    
    # Check if Java Island mask was properly applied
    print("\n⚠️  IMPORTANT:")
    print("   If you see 'NO DATA' for most locations, the Java Island mask")
    print("   might not be properly applied during export. This is the issue")
    print("   we identified earlier. Make sure to apply the mask fix before export!")
    
else:
    print("❌ No processed images available for quality check")

In [None]:
# ============================================================================
# TIME SERIES PROFILE AT SAMPLE LOCATION
# ============================================================================

print("\n" + "="*70)
print("📉 TIME SERIES PROFILE")
print("="*70)

# Pick one location for detailed time series analysis
test_location = {'name': 'Central Java (Agricultural Area)', 'lon': 110.4, 'lat': -7.0}
test_point = ee.Geometry.Point([test_location['lon'], test_location['lat']])

print(f"\n📍 Extracting full time series at: {test_location['name']}")
print(f"   Coordinates: {test_location['lon']:.2f}°E, {test_location['lat']:.2f}°N")

# Extract values for all periods
time_series_data = {
    'period': [],
    'date': [],
    'VV': [],
    'VH': [],
    'NDVI': []
}

print("\n⏳ Extracting data from all periods...")
for i, (image, period_info) in enumerate(zip(processed_images, successful_periods)):
    try:
        # dropNulls=False keeps VV/VH values for periods where NDVI is cloud-masked
        sample = image.sample(test_point, scale=SCALE, dropNulls=False).first().getInfo()
        
        if sample and 'properties' in sample:
            props = sample['properties']
            time_series_data['period'].append(period_info['period'])
            time_series_data['date'].append(period_info['center_date'])
            time_series_data['VV'].append(props.get('VV', None))
            time_series_data['VH'].append(props.get('VH', None))
            time_series_data['NDVI'].append(props.get('S2ndvi', None))
        else:
            time_series_data['period'].append(period_info['period'])
            time_series_data['date'].append(period_info['center_date'])
            time_series_data['VV'].append(None)
            time_series_data['VH'].append(None)
            time_series_data['NDVI'].append(None)
            
    except Exception as e:
        print(f"   Error at period {period_info['period']}: {e}")

# Create time series plot
fig, axes = plt.subplots(3, 1, figsize=(14, 10))

# Plot VV
ax = axes[0]
dates = time_series_data['date']
vv_vals = [v if v is not None else np.nan for v in time_series_data['VV']]
ax.plot(dates, vv_vals, 'o-', color='blue', linewidth=2, markersize=6)
ax.set_ylabel('VV Backscatter (dB)', fontsize=11)
ax.set_title(f'Sentinel-1/2 Time Series at {test_location["name"]}\n12-Day Composites (Nov 2024 - Oct 2025)', 
             fontsize=13, fontweight='bold')
ax.grid(True, alpha=0.3)
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)

# Plot VH
ax = axes[1]
vh_vals = [v if v is not None else np.nan for v in time_series_data['VH']]
ax.plot(dates, vh_vals, 'o-', color='green', linewidth=2, markersize=6)
ax.set_ylabel('VH Backscatter (dB)', fontsize=11)
ax.grid(True, alpha=0.3)
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)

# Plot NDVI
ax = axes[2]
ndvi_vals = [v if v is not None else np.nan for v in time_series_data['NDVI']]
ax.plot(dates, ndvi_vals, 'o-', color='darkgreen', linewidth=2, markersize=6)
ax.set_ylabel('NDVI', fontsize=11)
ax.set_xlabel('Date', fontsize=11)
ax.grid(True, alpha=0.3)
ax.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5, label='Moderate vegetation')
ax.legend()

# Highlight agricultural seasons
from matplotlib.dates import DateFormatter, MonthLocator
for ax in axes:
    ax.xaxis.set_major_locator(MonthLocator())
    ax.xaxis.set_major_formatter(DateFormatter('%b\n%Y'))
    
    # Season 1: Nov-Mar
    ax.axvspan(datetime(2024, 11, 1), datetime(2025, 3, 31), alpha=0.1, color='green', label='Season 1')
    # Season 2: Apr-Jun
    ax.axvspan(datetime(2025, 4, 1), datetime(2025, 6, 30), alpha=0.1, color='blue', label='Season 2')
    # Season 3: Jul-Sep
    ax.axvspan(datetime(2025, 7, 1), datetime(2025, 9, 30), alpha=0.1, color='orange', label='Season 3')

plt.tight_layout()
plt.savefig(os.path.join(OUTPUT_DIR, 'time_series_preview.png'), dpi=150, bbox_inches='tight')
plt.show()

# Summary statistics
print(f"\n📊 Time Series Summary:")
print(f"   Valid VV points: {sum(1 for v in vv_vals if not np.isnan(v))}/{len(vv_vals)}")
print(f"   Valid VH points: {sum(1 for v in vh_vals if not np.isnan(v))}/{len(vh_vals)}")
print(f"   Valid NDVI points: {sum(1 for v in ndvi_vals if not np.isnan(v))}/{len(ndvi_vals)}")

vv_valid = [v for v in vv_vals if not np.isnan(v)]
vh_valid = [v for v in vh_vals if not np.isnan(v)]
ndvi_valid = [v for v in ndvi_vals if not np.isnan(v)]

if len(vv_valid) > 0:
    print(f"\n   VV range: {min(vv_valid):.2f} to {max(vv_valid):.2f} dB")
if len(vh_valid) > 0:
    print(f"   VH range: {min(vh_valid):.2f} to {max(vh_valid):.2f} dB")
if len(ndvi_valid) > 0:
    print(f"   NDVI range: {min(ndvi_valid):.3f} to {max(ndvi_valid):.3f}")

print(f"\n💾 Time series plot saved to: {os.path.join(OUTPUT_DIR, 'time_series_preview.png')}")

print("\n✅ PREVIEW COMPLETE!")
print("="*70)
print("\n💡 Next Steps:")
print("   1. Review the interactive map above")
print("   2. Check the time series plot for data continuity")
print("   3. Verify pixel values are reasonable")
print("   4. If everything looks good, proceed to Section 6 (Export)")
print("   5. If you see missing data, check the export function for mask application")
print("="*70)

## 5c. Coverage Analysis - Check if 12 Days is Enough

**IMPORTANT**: A single Sentinel-1 satellite has a 12-day repeat cycle, while the two-satellite Sentinel-2 constellation revisits every 5 days. Cloud cover can still sharply reduce the effective Sentinel-2 coverage, so this section checks whether 12-day composites provide full spatial coverage.
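
Coverage here means the share of pixels in the study area that are unmasked in a composite. The sketch below shows the percentage calculation together with the 80% and 95% thresholds that the analysis cell uses to label Sentinel-2 coverage. The function names are illustrative and the pixel counts are made-up inputs, not results.

```python
def coverage_percent(valid_pixels, total_pixels):
    """Share of valid pixels as a percentage; 0.0 when the region is empty."""
    return 100.0 * valid_pixels / total_pixels if total_pixels > 0 else 0.0


def classify_s2_coverage(pct):
    """Label coverage using the 80% / 95% thresholds from the analysis cell."""
    if pct < 80:
        return "low"
    if pct < 95:
        return "moderate"
    return "excellent"


example_pct = coverage_percent(valid_pixels=450, total_pixels=600)  # made-up counts
print(f"{example_pct:.1f}% -> {classify_s2_coverage(example_pct)}")
```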

In [None]:
# ============================================================================
# SPATIAL COVERAGE ANALYSIS
# ============================================================================

print("="*70)
print("🌍 SPATIAL COVERAGE ANALYSIS")
print("="*70)

print("\n📊 Analyzing coverage for first few periods...")
print("   This checks what percentage of the study area has valid data\n")

# Analyze first 3 periods
coverage_results = []

for idx in range(min(3, len(processed_images))):
    image = processed_images[idx]
    period_info = successful_periods[idx]
    period_num = period_info['period']
    
    print(f"\n{'='*70}")
    print(f"Period {period_num}: {period_info['start_str']} to {period_info['end_str']}")
    print(f"{'='*70}")
    
    # Count valid pixels for each band
    for band_name in ['VV', 'VH', 'S2ndvi']:
        band = image.select(band_name)
        
        # Create a binary mask (1 = has data, 0 = no data)
        valid_mask = band.mask()
        
        # Calculate statistics over the study area
        stats = valid_mask.reduceRegion(
            reducer=ee.Reducer.sum().combine(
                reducer2=ee.Reducer.count(),
                sharedInputs=True
            ),
            geometry=study_area,
            scale=SCALE,
            maxPixels=1e10
        ).getInfo()
        
        valid_pixels = stats.get(f'{band_name}_sum', 0)
        total_pixels = stats.get(f'{band_name}_count', 1)
        coverage_pct = (valid_pixels / total_pixels * 100) if total_pixels > 0 else 0
        
        print(f"   {band_name:8s}: {coverage_pct:5.1f}% coverage ({int(valid_pixels):,} / {int(total_pixels):,} pixels)")
        
        coverage_results.append({
            'period': period_num,
            'band': band_name,
            'coverage_pct': coverage_pct,
            'valid_pixels': valid_pixels,
            'total_pixels': total_pixels
        })

# Summary
print(f"\n{'='*70}")
print("📈 COVERAGE SUMMARY")
print(f"{'='*70}")

# Group by band
for band_name in ['VV', 'VH', 'S2ndvi']:
    band_coverage = [r for r in coverage_results if r['band'] == band_name]
    avg_coverage = np.mean([r['coverage_pct'] for r in band_coverage])
    min_coverage = np.min([r['coverage_pct'] for r in band_coverage])
    max_coverage = np.max([r['coverage_pct'] for r in band_coverage])
    
    print(f"\n{band_name:8s}:")
    print(f"   Average coverage: {avg_coverage:5.1f}%")
    print(f"   Range: {min_coverage:5.1f}% - {max_coverage:5.1f}%")

# Interpretation
print(f"\n{'='*70}")
print("💡 INTERPRETATION")
print(f"{'='*70}")

s2_coverage = [r['coverage_pct'] for r in coverage_results if r['band'] == 'S2ndvi']
s1_coverage = [r['coverage_pct'] for r in coverage_results if r['band'] in ['VV', 'VH']]

avg_s2 = np.mean(s2_coverage) if s2_coverage else 0
avg_s1 = np.mean(s1_coverage) if s1_coverage else 0

print(f"\nSentinel-1 (VV/VH) average: {avg_s1:.1f}%")
print(f"Sentinel-2 (NDVI) average:  {avg_s2:.1f}%")

if avg_s2 < 80:
    print(f"\n⚠️  WARNING: S2 coverage is low ({avg_s2:.1f}%)")
    print("   Possible reasons:")
    print("   • 12 days too short for cloud-free S2 coverage")
    print("   • High cloud cover in tropical Indonesia")
    print("   • Rainy season (Nov-Mar)")
    print("\n💡 RECOMMENDATIONS:")
    print("   1. Increase composite period to 16-30 days for better S2 coverage")
    print("   2. Use longer periods during rainy season (Nov-Mar)")
    print("   3. Rely more on S1 data (radar, cloud-penetrating)")
    print("   4. Adjust MAX_CLOUD_COVER threshold (currently 20%)")
elif avg_s2 < 95:
    print(f"\n⚡ S2 coverage is moderate ({avg_s2:.1f}%)")
    print("   • Should work for MOGPR fusion (fills gaps)")
    print("   • Consider 16-day periods for more consistent coverage")
else:
    print(f"\n✅ S2 coverage is excellent ({avg_s2:.1f}%)")
    print("   • 12-day periods work well for this time period")

if avg_s1 < 90:
    print(f"\n‚ö†Ô∏è  WARNING: S1 coverage is low ({avg_s1:.1f}%)")
    print("   This is unusual for Sentinel-1 (radar, all-weather)")
    print("   ‚Ä¢ Check if data availability issue")
    print("   ‚Ä¢ Verify study area geometry")
else:
    print(f"\n‚úÖ S1 coverage is good ({avg_s1:.1f}%)")
    print("   ‚Ä¢ Sentinel-1 provides reliable all-weather coverage")

print(f"\n{'='*70}")
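
If the composite period is lengthened as recommended above, the window boundaries need some care. Earth Engine's `filterDate(start, end)` includes the start date and excludes the end date, so a window's last covered day and the end date passed to the filter differ by one day. The sketch below shows that arithmetic with plain `datetime`; `period_window` is a hypothetical helper name, and whether the loaders in this notebook expect the inclusive or the exclusive form is an assumption to check against their definitions.

```python
from datetime import datetime, timedelta


def period_window(start, days):
    """Return (start, last_day, exclusive_end) as 'YYYY-MM-DD' strings."""
    fmt = "%Y-%m-%d"
    start_dt = datetime.strptime(start, fmt)
    last_day = start_dt + timedelta(days=days - 1)   # last day inside the window
    exclusive_end = start_dt + timedelta(days=days)  # first day after the window
    return start_dt.strftime(fmt), last_day.strftime(fmt), exclusive_end.strftime(fmt)


# Candidate period lengths, all starting on Nov 1, 2024
for days in [12, 16, 20, 24, 30]:
    print(days, period_window("2024-11-01", days))
```

Passing the third value as the filter's end date keeps the final day of each window in the composite.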

## 5d. ALTERNATIVE: Test Different Composite Periods

If 12-day coverage is insufficient, let's test what period length gives ~95%+ coverage:

In [None]:
# ============================================================================
# TEST DIFFERENT COMPOSITE PERIOD LENGTHS
# ============================================================================

print("="*70)
print("🧪 TESTING DIFFERENT COMPOSITE PERIOD LENGTHS")
print("="*70)

# Test different period lengths
test_periods = [12, 16, 20, 24, 30]  # days

print("\n📊 Testing coverage for different composite periods...")
print("   (Testing on first period: Nov 1-X, 2024)\n")

coverage_by_period = {}

for days in test_periods:
    print(f"\n{'='*70}")
    print(f"Testing {days}-day composite: Nov 1 - Nov {days}, 2024")
    print(f"{'='*70}")
    
    test_start = '2024-11-01'
    test_end = (datetime(2024, 11, 1) + timedelta(days=days-1)).strftime('%Y-%m-%d')
    
    # Load data for this period
    s1_test = load_sentinel1_data(study_area, test_start, test_end)
    s2_test = load_sentinel2_data(study_area, test_start, test_end, MAX_CLOUD_COVER)
    
    s1_count = s1_test.size().getInfo()
    s2_count = s2_test.size().getInfo()
    
    print(f"   S1 images found: {s1_count}")
    print(f"   S2 images found: {s2_count}")
    
    if s1_count > 0:
        s1_composite = create_composite(s1_test, 'median')
    else:
        s1_composite = ee.Image.constant([0, 0]).rename(['VV', 'VH']).updateMask(ee.Image.constant(0))
    
    if s2_count > 0:
        s2_composite = create_composite(s2_test, 'median')
    else:
        s2_composite = ee.Image.constant(0).rename('NDVI').updateMask(ee.Image.constant(0))
    
    test_image = s1_composite.addBands(s2_composite.rename('S2ndvi'))
    
    # Calculate coverage
    coverage_by_period[days] = {}
    
    for band_name in ['VV', 'VH', 'S2ndvi']:
        band = test_image.select(band_name)
        valid_mask = band.mask()
        
        stats = valid_mask.reduceRegion(
            reducer=ee.Reducer.sum().combine(
                reducer2=ee.Reducer.count(),
                sharedInputs=True
            ),
            geometry=study_area,
            scale=SCALE,
            maxPixels=1e10
        ).getInfo()
        
        valid_pixels = stats.get(f'{band_name}_sum', 0)
        total_pixels = stats.get(f'{band_name}_count', 1)
        coverage_pct = (valid_pixels / total_pixels * 100) if total_pixels > 0 else 0
        
        coverage_by_period[days][band_name] = coverage_pct
        print(f"   {band_name:8s}: {coverage_pct:5.1f}% coverage")

# Summary comparison
print(f"\n{'='*70}")
print("📊 COVERAGE COMPARISON")
print(f"{'='*70}\n")

print(f"{'Period':>8s} {'S1 VV':>8s} {'S1 VH':>8s} {'S2 NDVI':>10s} {'Avg S1':>8s} {'Recommendation':>20s}")
print("-" * 70)

for days in test_periods:
    vv_cov = coverage_by_period[days]['VV']
    vh_cov = coverage_by_period[days]['VH']
    s2_cov = coverage_by_period[days]['S2ndvi']
    avg_s1 = (vv_cov + vh_cov) / 2
    
    if s2_cov >= 95:
        rec = "✅ Excellent"
    elif s2_cov >= 85:
        rec = "⚡ Good"
    elif s2_cov >= 70:
        rec = "⚠️  Moderate"
    else:
        rec = "❌ Poor"
    
    print(f"{days:>8d} {vv_cov:>7.1f}% {vh_cov:>7.1f}% {s2_cov:>9.1f}% {avg_s1:>7.1f}% {rec:>20s}")

# Recommendation
print(f"\n{'='*70}")
print("💡 RECOMMENDATION")
print(f"{'='*70}\n")

# Pick the period length with the highest S2 coverage
s2_coverages = [(days, coverage_by_period[days]['S2ndvi']) for days in test_periods]
optimal = max(s2_coverages, key=lambda x: x[1])

print("Based on coverage analysis:")
print("   • Current setting: 12-day composites")
print(f"   • Best coverage: {optimal[0]}-day composites ({optimal[1]:.1f}% S2 coverage)")

if optimal[1] < 85:
    print(f"\n⚠️  Even {optimal[0]} days gives <85% S2 coverage")
    print("   This is expected for tropical Indonesia (frequent clouds)")
    print("\n   Options:")
    print("   1. Use monthly composites (30 days) for reliable coverage")
    print("   2. Accept gaps - MOGPR fusion is designed to handle this")
    print(f"   3. Increase MAX_CLOUD_COVER threshold (currently {MAX_CLOUD_COVER}%)")
    print("   4. Rely more on S1 data (all-weather)")
else:
    print(f"\n✅ Recommended period length: {optimal[0]} days")
    
    if optimal[0] != 12:
        print(f"\n   To use {optimal[0]}-day periods:")
        print(f"   1. Go back to Section 3")
        print(f"   2. Modify generate_12day_periods() function")
        print(f"   3. Change period_num * 12 to period_num * {optimal[0]}")
        print(f"   4. Adjust total number of periods for the year")
        print(f"   5. Re-run from Section 3 onwards")

print(f"\n{'='*70}")
print("🔄 Or continue with 12-day periods and let MOGPR handle gaps")
print(f"{'='*70}")
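
The recommendation above picks the period length with the highest S2 coverage, which tends to favour the longest window. An alternative, sketched here as an assumption rather than as the notebook's method, is to take the shortest period that reaches a target coverage so that temporal resolution is preserved. `shortest_period_meeting` is a hypothetical helper and the coverage numbers are invented for illustration.

```python
def shortest_period_meeting(coverage_by_period, band="S2ndvi", target=85.0):
    """Return the shortest period length whose coverage reaches `target`, else None."""
    for days in sorted(coverage_by_period):
        if coverage_by_period[days].get(band, 0.0) >= target:
            return days
    return None


# Hypothetical coverage values (percent), for illustration only
demo_coverage = {
    12: {"S2ndvi": 41.0},
    16: {"S2ndvi": 63.5},
    20: {"S2ndvi": 86.2},
    24: {"S2ndvi": 90.1},
    30: {"S2ndvi": 97.4},
}

print(shortest_period_meeting(demo_coverage))               # shortest period reaching 85%
print(shortest_period_meeting(demo_coverage, target=99.0))  # no period reaches 99%
```

The dictionary has the same shape as `coverage_by_period` built in the cell above, so the helper could be applied to it directly.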

## 5e. CRITICAL: Diagnose S2 Coverage Problem

⚠️ **If S2 coverage is < 5% even with 30-day periods, there's a fundamental issue!**

This section will diagnose why Sentinel-2 data is not appearing in the composites.
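
Much of the diagnosis concerns the Scene Classification Layer (SCL) mask, so it helps to see which classes a given mask keeps and which it drops. The lookup below lists the SCL class values as documented for the Sentinel-2 surface reflectance product; `describe_scl_mask` is a hypothetical helper written for this illustration.

```python
# Sentinel-2 Scene Classification Layer (SCL) class values
SCL_CLASSES = {
    0: "No data",
    1: "Saturated or defective",
    2: "Dark area pixels",
    3: "Cloud shadows",
    4: "Vegetation",
    5: "Not vegetated (bare soil)",
    6: "Water",
    7: "Unclassified (low cloud probability)",
    8: "Cloud medium probability",
    9: "Cloud high probability",
    10: "Thin cirrus",
    11: "Snow or ice",
}


def describe_scl_mask(kept_classes):
    """Return (kept_names, dropped_names) for a set of SCL values to keep."""
    kept = set(kept_classes)
    unknown = kept - set(SCL_CLASSES)
    if unknown:
        raise ValueError(f"Unknown SCL classes: {sorted(unknown)}")
    kept_names = [SCL_CLASSES[c] for c in sorted(kept)]
    dropped_names = [SCL_CLASSES[c] for c in sorted(set(SCL_CLASSES) - kept)]
    return kept_names, dropped_names


# The mask tested in this section keeps classes 4, 5, 6 and 11
kept_names, dropped_names = describe_scl_mask([4, 5, 6, 11])
print("Kept:   ", kept_names)
print("Dropped:", dropped_names)
```

Dropping class 7 alongside the cloud classes is one reason a strict SCL mask can remove far more pixels than expected in hazy scenes.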

In [None]:
# ============================================================================
# DIAGNOSE S2 COVERAGE PROBLEM
# ============================================================================

print("="*70)
print("🔍 DIAGNOSING SENTINEL-2 COVERAGE ISSUE")
print("="*70)

print("\n⚠️  Your results show S2 NDVI = 0.1% coverage even with 30 days!")
print("   This is NOT normal. Let's investigate...\n")

# filterDate() excludes the end date, so use Dec 1 to include Nov 30
test_start = '2024-11-01'
test_end = '2024-12-01'

print(f"Testing period: {test_start} to {test_end}, end exclusive (30 days)")
print(f"Study area: {STUDY_AREA_TYPE.upper()}\n")

# Step 1: Check raw S2 data availability (before cloud masking)
print("="*70)
print("STEP 1: Check raw Sentinel-2 data availability")
print("="*70)

s2_raw = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
          .filterBounds(study_area)
          .filterDate(test_start, test_end))

s2_count_raw = s2_raw.size().getInfo()
print(f"\n✅ Raw S2 images found (no filters): {s2_count_raw}")

if s2_count_raw == 0:
    print("❌ PROBLEM: No Sentinel-2 images found for this area/period!")
    print("   Possible causes:")
    print("   • Study area outside S2 coverage")
    print("   • Date range has no S2 data")
    print("   • GEE data availability issue")
    print("\n💡 Try a different date range or check study area bounds")
else:
    # Get sample image info
    sample_image = s2_raw.first()
    sample_info = sample_image.getInfo()
    print(f"   First image ID: {sample_info['properties'].get('system:index', 'unknown')}")
    print(f"   Cloud cover: {sample_info['properties'].get('CLOUDY_PIXEL_PERCENTAGE', 'unknown')}%")

# Step 2: Check after cloud cover filtering
print("\n" + "="*70)
print("STEP 2: Check after cloud cover filtering")
print("="*70)

s2_cloud_filtered = s2_raw.filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', MAX_CLOUD_COVER))
s2_count_cloud = s2_cloud_filtered.size().getInfo()

print(f"\n   Max cloud cover threshold: {MAX_CLOUD_COVER}%")
print(f"   Images after cloud filtering: {s2_count_cloud}")
print(f"   Images removed by cloud filter: {s2_count_raw - s2_count_cloud}")

if s2_count_cloud == 0 and s2_count_raw > 0:
    print("\n❌ PROBLEM: All images filtered out due to cloud cover!")
    print(f"   Your MAX_CLOUD_COVER = {MAX_CLOUD_COVER}% is too strict for tropical Indonesia")
    print("\n💡 SOLUTIONS:")
    print("   1. Increase MAX_CLOUD_COVER to 50-80% (recommended for Indonesia)")
    print("   2. Use longer composite periods (30+ days)")
    print("   3. Accept that cloud masking will remove cloudy pixels")

# Step 3: Check after cloud masking (pixel-level)
print("\n" + "="*70)
print("STEP 3: Check pixel-level cloud masking effect")
print("="*70)

if s2_count_cloud > 0:
    # Baseline: NDVI with no pixel-level masking
    s2_no_mask = s2_cloud_filtered.map(
        lambda img: img.normalizedDifference(['B8', 'B4']).rename('NDVI')
    )
    
    # Reproduce the SCL-based masking to measure its effect
    def mask_clouds_test(image):
        scl = image.select('SCL')
        # Keep vegetation (4), bare soil (5), water (6) and snow/ice (11)
        good_pixels = scl.eq(4).Or(scl.eq(5)).Or(scl.eq(6)).Or(scl.eq(11))
        return image.updateMask(good_pixels)
    
    s2_scl_masked = s2_cloud_filtered.map(mask_clouds_test).map(
        lambda img: img.normalizedDifference(['B8', 'B4']).rename('NDVI')
    )
    
    # Create composites
    composite_no_mask = s2_no_mask.median()
    composite_scl_mask = s2_scl_masked.median()
    
    # Check coverage
    for name, composite in [('Without pixel masking', composite_no_mask), 
                             ('With SCL cloud masking', composite_scl_mask)]:
        valid_mask = composite.mask()
        stats = valid_mask.reduceRegion(
            reducer=ee.Reducer.sum().combine(reducer2=ee.Reducer.count(), sharedInputs=True),
            geometry=study_area,
            scale=SCALE,
            maxPixels=1e10
        ).getInfo()
        
        valid_pixels = stats.get('NDVI_sum', 0)
        total_pixels = stats.get('NDVI_count', 1)
        coverage_pct = (valid_pixels / total_pixels * 100) if total_pixels > 0 else 0
        
        print(f"\n   {name}:")
        print(f"   Coverage: {coverage_pct:.1f}%")
        print(f"   Valid pixels: {int(valid_pixels):,} / {int(total_pixels):,}")
    
    # coverage_pct holds the last loop result, i.e. the SCL-masked composite
    if coverage_pct < 10:
        print("\n❌ CRITICAL: SCL cloud masking removes almost everything!")
        print("   The SCL mask is too aggressive for this area/period")
        print("\n💡 SOLUTIONS:")
        print("   1. Remove SCL masking (use QA60 band instead)")
        print("   2. Keep more SCL classes (7 = unclassified; 8-10 are cloud and cirrus classes)")
        print("   3. Increase composite period to 60+ days")
        print("   4. Accept lower quality data in exchange for coverage")

# Step 4: Check if it's a date range issue
print("\n" + "="*70)
print("STEP 4: Check historical S2 data availability")
print("="*70)

print("\nChecking S2 availability for different months in 2024-2025:")

# End dates are exclusive in filterDate(), so each range ends on the first day of the next month
test_months = [
    ('2024-11-01', '2024-12-01', 'Nov 2024 (rainy season)'),
    ('2025-01-01', '2025-02-01', 'Jan 2025 (rainy season)'),
    ('2025-04-01', '2025-05-01', 'Apr 2025 (dry season)'),
    ('2025-07-01', '2025-08-01', 'Jul 2025 (dry season)'),
]

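# Build the column header separately so MAX_CLOUD_COVER is interpolated
cloud_header = f"<{MAX_CLOUD_COVER}% cloud"
print(f"\n{'Period':<30s} {'Raw Images':>12s} {cloud_header:>15s}")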
print("-" * 70)

for start, end, label in test_months:
    s2_test = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
               .filterBounds(study_area)
               .filterDate(start, end))
    
    count_raw = s2_test.size().getInfo()
    count_filtered = s2_test.filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', MAX_CLOUD_COVER)).size().getInfo()
    
    print(f"{label:<30s} {count_raw:>12d} {count_filtered:>15d}")

print("\n💡 If dry season months have more images, consider:")
print("   • Using seasonal composite periods (longer in rainy season)")
print("   • Accepting higher cloud cover in rainy months")

# FINAL RECOMMENDATION
print("\n" + "="*70)
print("🎯 RECOMMENDED FIXES")
print("="*70)

print("\nBased on your 0.1% S2 coverage, the issue is likely:")
print("\n1. ❌ SCL cloud masking is TOO AGGRESSIVE")
print("   Current code removes pixels with clouds/shadows/cirrus")
print("   For Indonesia, this removes ~99% of pixels!")

print("\n2. 💡 IMMEDIATE FIX - Modify Section 4:")
print("   Change the mask_clouds() function to be less aggressive:")
print("""
   # Option A: Don't use SCL masking at all
   def mask_clouds(image):
       return image

   # Option B: Use QA60 band instead (less aggressive)
   def mask_clouds(image):
       qa = image.select('QA60')
       cloudBitMask = 1 << 10
       cirrusBitMask = 1 << 11
       mask = qa.bitwiseAnd(cloudBitMask).eq(0).And(
              qa.bitwiseAnd(cirrusBitMask).eq(0))
       return image.updateMask(mask)
   """)

print("\n3. 💡 ALTERNATIVE FIX:")
print("   Increase MAX_CLOUD_COVER from 20% to 60-80%")
print("   Then let cloud masking remove only the worst pixels")

print("\n4. ✅ FOR TROPICAL AREAS:")
print("   • Use 30-60 day composites")
print("   • MAX_CLOUD_COVER = 60-80%")
print("   • Less aggressive cloud masking")
print("   • Rely more on S1 data (cloud-penetrating)")

print("\n" + "="*70)
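
Option B in the recommendations above tests two bits of the `QA60` band: bit 10 for opaque clouds and bit 11 for cirrus. The same bit arithmetic can be checked on plain integers, as in this minimal sketch; `qa60_is_clear` is a hypothetical name used only for illustration.

```python
CLOUD_BIT = 1 << 10   # bit 10: opaque clouds
CIRRUS_BIT = 1 << 11  # bit 11: cirrus clouds


def qa60_is_clear(qa_value):
    """True when neither the cloud bit nor the cirrus bit is set."""
    return (qa_value & CLOUD_BIT) == 0 and (qa_value & CIRRUS_BIT) == 0


# 0 = clear, 1024 = opaque cloud, 2048 = cirrus, 3072 = both flags set
for value in (0, 1024, 2048, 3072):
    print(value, qa60_is_clear(value))
```

Only a pixel with both bits unset passes, which mirrors the `bitwiseAnd(...).eq(0).And(...)` expression used on the Earth Engine side.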

## 5f. QUICK FIX: Update Cloud Masking Parameters

Based on the diagnosis above, apply one of these fixes and re-run from Section 4:

In [None]:
# ============================================================================
# APPLY QUICK FIX FOR S2 COVERAGE
# ============================================================================

print("="*70)
print("🔧 APPLYING FIX FOR LOW S2 COVERAGE")
print("="*70)

# Choose your fix approach:
FIX_APPROACH = 'relaxed_scl'  # Options: 'no_masking', 'qa60_masking', 'relaxed_scl', 'increase_threshold'

print(f"\nSelected approach: {FIX_APPROACH}\n")

if FIX_APPROACH == 'no_masking':
    # ========================================================================
    # OPTION 1: Remove cloud masking entirely (highest coverage)
    # ========================================================================
    print("✅ Option 1: NO CLOUD MASKING")
    print("   • Highest coverage (95-100%)")
    print("   • May include some cloudy pixels")
    print("   • Good for MOGPR (it handles outliers)")
    
    def load_sentinel2_data_FIXED(geometry, start_date, end_date, max_cloud_cover=20):
        def calculate_ndvi(image):
            ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI')
            return image.addBands(ndvi)
        
        s2_collection = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
                        .filterBounds(geometry)
                        .filterDate(start_date, end_date)
                        .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', max_cloud_cover))
                        .map(calculate_ndvi)
                        .select(['NDVI']))
        
        return s2_collection
    
elif FIX_APPROACH == 'qa60_masking':
    # ========================================================================
    # OPTION 2: Use QA60 band instead of SCL (less aggressive)
    # ========================================================================
    print("✅ Option 2: QA60 CLOUD MASKING")
    print("   • Less aggressive than SCL")
    print("   • Masks only opaque clouds and cirrus")
    print("   • Better coverage for tropical areas")
    
    def load_sentinel2_data_FIXED(geometry, start_date, end_date, max_cloud_cover=20):
        def calculate_ndvi(image):
            ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI')
            return image.addBands(ndvi)
        
        def mask_clouds_qa60(image):
            qa = image.select('QA60')
            # Bits 10 and 11 are clouds and cirrus
            cloudBitMask = 1 << 10
            cirrusBitMask = 1 << 11
            # Both flags should be set to zero (clear)
            mask = qa.bitwiseAnd(cloudBitMask).eq(0).And(
                   qa.bitwiseAnd(cirrusBitMask).eq(0))
            return image.updateMask(mask)
        
        s2_collection = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
                        .filterBounds(geometry)
                        .filterDate(start_date, end_date)
                        .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', max_cloud_cover))
                        .map(mask_clouds_qa60)
                        .map(calculate_ndvi)
                        .select(['NDVI']))
        
        return s2_collection
    
elif FIX_APPROACH == 'relaxed_scl':
    # ========================================================================
    # OPTION 3: Relaxed SCL masking (keep more pixels)
    # ========================================================================
    print("✅ Option 3: RELAXED SCL MASKING")
    print("   • Keeps more pixel classes than original")
    print("   • Adds SCL class 7 (unclassified / low cloud probability)")
    print("   • Better balance for Indonesia")
    
    def load_sentinel2_data_FIXED(geometry, start_date, end_date, max_cloud_cover=20):
        def calculate_ndvi(image):
            ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI')
            return image.addBands(ndvi)
        
        def mask_clouds_relaxed(image):
            scl = image.select('SCL')
            # Keep more classes: 4,5,6,7,11 (vegetation, bare soil, water, unclassified, snow/ice)
            # Original only kept 4,5,6,11
            good_pixels = (scl.eq(4).Or(scl.eq(5)).Or(scl.eq(6))
                          .Or(scl.eq(7)).Or(scl.eq(11)))
            return image.updateMask(good_pixels)
        
        s2_collection = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
                        .filterBounds(geometry)
                        .filterDate(start_date, end_date)
                        .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', max_cloud_cover))
                        .map(mask_clouds_relaxed)
                        .map(calculate_ndvi)
                        .select(['NDVI']))
        
        return s2_collection

elif FIX_APPROACH == 'increase_threshold':
    # ========================================================================
    # OPTION 4: Increase cloud cover threshold + original masking
    # ========================================================================
    print("✅ Option 4: INCREASED CLOUD THRESHOLD")
    print("   • MAX_CLOUD_COVER increased to 60%")
    print("   • More images available for compositing")
    print("   • SCL masking removes cloudy pixels")
    
    # Update global variable
    MAX_CLOUD_COVER = 60
    
    def load_sentinel2_data_FIXED(geometry, start_date, end_date, max_cloud_cover=60):
        def calculate_ndvi(image):
            ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI')
            return image.addBands(ndvi)
        
        def mask_clouds(image):
            scl = image.select('SCL')
            good_pixels = scl.eq(4).Or(scl.eq(5)).Or(scl.eq(6)).Or(scl.eq(11))
            return image.updateMask(good_pixels)
        
        s2_collection = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
                        .filterBounds(geometry)
                        .filterDate(start_date, end_date)
                        .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', max_cloud_cover))
                        .map(mask_clouds)
                        .map(calculate_ndvi)
                        .select(['NDVI']))
        
        return s2_collection

# Test the fix
print("\n" + "="*70)
print("🧪 Testing the fix with 30-day period...")
print("="*70)

test_start = '2024-11-01'
test_end = '2024-12-01'  # filterDate() excludes the end date; this covers Nov 1-30

s2_fixed = load_sentinel2_data_FIXED(study_area, test_start, test_end, MAX_CLOUD_COVER)
s2_count_fixed = s2_fixed.size().getInfo()

print(f"\nS2 images found: {s2_count_fixed}")

if s2_count_fixed > 0:
    s2_composite_fixed = create_composite(s2_fixed, 'median')
    
    # Check coverage
    valid_mask = s2_composite_fixed.mask()
    stats = valid_mask.reduceRegion(
        reducer=ee.Reducer.sum().combine(reducer2=ee.Reducer.count(), sharedInputs=True),
        geometry=study_area,
        scale=SCALE,
        maxPixels=1e10
    ).getInfo()
    
    valid_pixels = stats.get('NDVI_sum', 0)
    total_pixels = stats.get('NDVI_count', 1)
    coverage_pct = (valid_pixels / total_pixels * 100) if total_pixels > 0 else 0
    
    print(f"\n✅ FIXED S2 NDVI Coverage: {coverage_pct:.1f}%")
    print(f"   Valid pixels: {int(valid_pixels):,} / {int(total_pixels):,}")
    
    if coverage_pct > 80:
        print(f"\n🎉 SUCCESS! Coverage improved from 0.1% to {coverage_pct:.1f}%")
        print("\n📋 NEXT STEPS:")
        print("   1. The load_sentinel2_data_FIXED() function is now defined")
        print("   2. Go back to Section 4 (Data Loading Functions)")
        print("   3. Replace load_sentinel2_data() with load_sentinel2_data_FIXED()")
        print("   4. Re-run Section 5 (Process Data)")
        print("   5. Export with full S2 coverage!")
    elif coverage_pct > 50:
        print(f"\n⚡ IMPROVED! Coverage increased from 0.1% to {coverage_pct:.1f}%")
        print("   Consider trying a different approach for even better coverage")
    else:
        print(f"\n⚠️  Still low coverage ({coverage_pct:.1f}%)")
        print("   Try a different FIX_APPROACH")
else:
    print("\n❌ No S2 images found")
    print("   Try FIX_APPROACH = 'increase_threshold'")

print("\n" + "="*70)
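
In the cell above, a misspelled `FIX_APPROACH` matches no branch, so `load_sentinel2_data_FIXED` is either never defined or silently left over from an earlier run. One way to guard against that, sketched here as an assumption rather than as part of the notebook, is to keep the options in a lookup table and fail early on unknown names. `FIX_APPROACHES` and `resolve_fix_approach` are hypothetical names; the settings mirror the four options described above.

```python
# Settings per approach, mirroring the four options in this section
FIX_APPROACHES = {
    "no_masking":         {"mask": None,   "scl_keep": None,             "max_cloud_cover": 20},
    "qa60_masking":       {"mask": "QA60", "scl_keep": None,             "max_cloud_cover": 20},
    "relaxed_scl":        {"mask": "SCL",  "scl_keep": (4, 5, 6, 7, 11), "max_cloud_cover": 20},
    "increase_threshold": {"mask": "SCL",  "scl_keep": (4, 5, 6, 11),    "max_cloud_cover": 60},
}


def resolve_fix_approach(name):
    """Return the settings for `name`, or raise ValueError listing valid names."""
    try:
        return FIX_APPROACHES[name]
    except KeyError:
        valid = ", ".join(sorted(FIX_APPROACHES))
        raise ValueError(f"Unknown FIX_APPROACH {name!r}; expected one of: {valid}") from None


settings = resolve_fix_approach("relaxed_scl")
print(settings)
```

Calling the resolver before defining the loader turns a typo into an immediate, readable error instead of a later `NameError`.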

## 5g. ADVANCED FIX: Use S2 Cloud Probability

Instead of SCL masking, use Sentinel-2 Cloud Probability for more nuanced cloud detection:

In [None]:
# ============================================================================
# ADVANCED: CLOUD PROBABILITY MASKING (Most flexible approach)
# ============================================================================

print("="*70)
print("🌥️  ADVANCED CLOUD MASKING: Using S2 Cloud Probability")
print("="*70)

print("\n💡 This approach uses a dedicated cloud probability dataset")
print("   that gives you fine control over cloud masking threshold\n")

# Cloud probability threshold (0-100%)
CLOUD_PROBABILITY_THRESHOLD = 50  # Adjust this: lower = stricter, higher = more data

print(f"Cloud probability threshold: {CLOUD_PROBABILITY_THRESHOLD}%")
print("   • 50% = balanced (recommended for Indonesia)")
print("   • 30% = strict (clearer data, less coverage)")
print("   • 70% = relaxed (more coverage, some clouds)")

def load_sentinel2_data_CLOUD_PROB(geometry, start_date, end_date, max_cloud_cover=60):
    """
    Load Sentinel-2 data using cloud probability masking
    More flexible than SCL-based masking
    """
    
    def calculate_ndvi(image):
        ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI')
        return image.addBands(ndvi)
    
    def mask_clouds_with_probability(image):
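        # Look up the cloud probability image for this S2 scene.
        # The date/bounds filters narrow the search; matching on system:index
        # (assumed to be shared by both collections) avoids picking a
        # neighbouring tile acquired on the same day.
        cloud_prob = ee.Image(
            ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY')
            .filterBounds(image.geometry())
            .filterDate(image.date(), image.date().advance(1, 'day'))
            .filter(ee.Filter.eq('system:index', image.get('system:index')))
            .first()
        )
        
        # Get cloud probability band
        cloud = cloud_prob.select('probability')
        
        # Keep pixels whose cloud probability is below the threshold
        is_not_cloud = cloud.lt(CLOUD_PROBABILITY_THRESHOLD)
        
        # Also mask cloud shadows using a simple approach:
        # shadows are typically dark in NIR. Water and flooded fields are
        # also dark in NIR, so this test will mask them as well.
        is_not_shadow = image.select('B8').gt(1000)  # NIR > 1000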
        
        # Combine masks
        final_mask = is_not_cloud.And(is_not_shadow)
        
        return image.updateMask(final_mask)
    
    # Load S2 data
    s2_collection = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
                    .filterBounds(geometry)
                    .filterDate(start_date, end_date)
                    .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', max_cloud_cover))
                    .map(mask_clouds_with_probability)
                    .map(calculate_ndvi)
                    .select(['NDVI']))
    
    return s2_collection

# Test cloud probability masking
print("\n" + "="*70)
print("🧪 Testing Cloud Probability Masking (30-day period)")
print("="*70)

test_start = '2024-11-01'
test_end = '2024-12-01'  # filterDate() excludes the end date; this covers Nov 1-30

try:
    s2_cloud_prob = load_sentinel2_data_CLOUD_PROB(
        study_area, 
        test_start, 
        test_end, 
        max_cloud_cover=60
    )
    
    s2_count_prob = s2_cloud_prob.size().getInfo()
    print(f"\nS2 images found: {s2_count_prob}")
    
    if s2_count_prob > 0:
        s2_composite_prob = create_composite(s2_cloud_prob, 'median')
        
        # Check coverage
        valid_mask = s2_composite_prob.mask()
        stats = valid_mask.reduceRegion(
            reducer=ee.Reducer.sum().combine(
                reducer2=ee.Reducer.count(), 
                sharedInputs=True
            ),
            geometry=study_area,
            scale=SCALE,
            maxPixels=1e10
        ).getInfo()
        
        valid_pixels = stats.get('NDVI_sum', 0)
        total_pixels = stats.get('NDVI_count', 1)
        coverage_pct = (valid_pixels / total_pixels * 100) if total_pixels > 0 else 0
        
        print(f"\n✅ Cloud Probability S2 Coverage: {coverage_pct:.1f}%")
        print(f"   Valid pixels: {int(valid_pixels):,} / {int(total_pixels):,}")
        
        if coverage_pct > 80:
            print(f"\n🎉 EXCELLENT! Cloud probability masking gives {coverage_pct:.1f}% coverage")
            print("\n💡 You can fine-tune by adjusting:")
            print(f"   • CLOUD_PROBABILITY_THRESHOLD (currently {CLOUD_PROBABILITY_THRESHOLD}%)")
            print(f"   • Lower threshold = stricter masking = less coverage")
            print(f"   • Higher threshold = relaxed masking = more coverage")
        elif coverage_pct > 60:
            print(f"\n⚡ GOOD! Coverage is {coverage_pct:.1f}%")
            print(f"   Try increasing CLOUD_PROBABILITY_THRESHOLD to {CLOUD_PROBABILITY_THRESHOLD + 10}% for more coverage")
        else:
            print(f"\n⚠️  Coverage still low ({coverage_pct:.1f}%)")
            print("   Try increasing CLOUD_PROBABILITY_THRESHOLD or use 'no_masking' approach")
            
    else:
        print("\n❌ No S2 images found after filtering")
        print("   Try increasing max_cloud_cover parameter")
        
except Exception as e:
    print(f"\n❌ Error testing cloud probability masking: {e}")
    print("   The cloud probability collection might not have data for all S2 images")
    print("   Fall back to QA60 or relaxed SCL masking")

# Comparison table
print("\n" + "="*70)
print("📊 CLOUD MASKING APPROACHES COMPARISON")
print("="*70)

print(f"""
{'Approach':<25s} {'Complexity':>12s} {'Coverage':>12s} {'Quality':>12s}
{'-'*70}
{'No masking':<25s} {'Simple':>12s} {'~95-100%':>12s} {'Lower':>12s}
{'QA60 band':<25s} {'Simple':>12s} {'~70-90%':>12s} {'Good':>12s}
{'Relaxed SCL':<25s} {'Simple':>12s} {'~60-80%':>12s} {'Good':>12s}
{'Cloud Probability':<25s} {'Advanced':>12s} {'~70-95%':>12s} {'Best':>12s}
{'Original SCL':<25s} {'Simple':>12s} {'~0.1%':>12s} {'Unusable':>12s}
""")

print("\n💡 RECOMMENDATIONS:")
print("   For Kabupaten Demak (tropical, agricultural):")
print("   1. Best: Cloud Probability (threshold=50-60%)")
print("   2. Good: QA60 masking")
print("   3. Fast: No masking (let MOGPR handle outliers)")
print("\n   For final export, use approach that gives >70% coverage")

print("\n" + "="*70)
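
The effect of `CLOUD_PROBABILITY_THRESHOLD` is easiest to see on a handful of values: pixels are kept when their cloud probability is below the threshold, so raising the threshold keeps more of them. The sketch below uses invented probabilities and a hypothetical helper, `clear_fraction`, to show that relationship outside Earth Engine.

```python
def clear_fraction(probabilities, threshold):
    """Fraction of pixels kept when pixels with probability >= threshold are masked."""
    if not probabilities:
        return 0.0
    kept = sum(1 for p in probabilities if p < threshold)
    return kept / len(probabilities)


# Hypothetical per-pixel cloud probabilities (0-100), for illustration only
sample_probabilities = [5, 12, 28, 35, 47, 52, 64, 71, 88, 95]

for threshold in (30, 50, 70):
    fraction = clear_fraction(sample_probabilities, threshold)
    print(f"threshold={threshold}%  kept={fraction:.0%}")
```

Real scenes will not follow these numbers; the point is only that coverage rises monotonically with the threshold while the share of cloud-contaminated pixels rises with it.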

In [None]:
def export_timeseries_to_drive(collection, geometry, scale, output_name):
    """
    Export the time series collection to Google Drive as a multi-band image
    """
    # Convert collection to multi-band image
    # Each period becomes a separate set of bands
    
    def rename_bands_with_period(image):
        image = ee.Image(image)
        period = ee.Number(image.get('period')).format('%02d')
        
        # Rename bands to include period number
        old_names = image.bandNames()
        new_names = old_names.map(lambda name: ee.String(name).cat('_P').cat(period))
        
        return image.rename(new_names)
    
    # Rename bands with period numbers
    renamed_collection = collection.map(rename_bands_with_period)
    
    # Convert to single multi-band image
    # Note: toBands() prefixes each band name with the ID of its source image
    multi_band_image = renamed_collection.toBands()
    
    # Export task
    task = ee.batch.Export.image.toDrive(
        image=multi_band_image,
        description=output_name,
        folder='GEE_FuseTS_Data',
        fileNamePrefix=output_name,
        scale=scale,
        region=geometry,
        maxPixels=1e9,
        crs='EPSG:4326',
        fileFormat='GeoTIFF'
    )
    
    return task

def export_individual_periods_to_drive(collection, geometry, scale, base_name):
    """
    Export each period as a separate GeoTIFF file to Google Drive
    """
    tasks = []
    image_list = collection.toList(collection.size())
    
    for i in range(len(successful_periods)):
        image = ee.Image(image_list.get(i))
        period_num = successful_periods[i]['period']
        
        task = ee.batch.Export.image.toDrive(
            image=image,
            description=f'{base_name}_Period_{period_num:02d}',
            folder='GEE_FuseTS_Data',
            fileNamePrefix=f'{base_name}_Period_{period_num:02d}',
            scale=scale,
            region=geometry,
            maxPixels=1e9,
            crs='EPSG:4326',
            fileFormat='GeoTIFF'
        )
        
        tasks.append(task)
    
    return tasks

# ============================================================================
# NEW: GEE ASSETS EXPORT FUNCTIONS (Better for large datasets!)
# ============================================================================

def export_timeseries_to_asset(collection, geometry, scale, asset_id):
    """
    Export the time series collection to GEE Assets as ImageCollection
    
    Advantages over Drive export:
    - No size limits (up to 10TB per user)
    - Data stays in GEE cloud (faster processing)
    - Can be used immediately in other GEE scripts
    - Better for large study areas
    
    Parameters:
    -----------
    asset_id : str
        Full path to asset, e.g., 'projects/ee-geodeticengineeringundip/assets/S1_S2_Nov2024_Oct2025'
    """
    tasks = []
    image_list = collection.toList(collection.size())
    
    for i in range(len(successful_periods)):
        image = ee.Image(image_list.get(i))
        period_num = successful_periods[i]['period']
        period_info = successful_periods[i]
        
        # Add comprehensive metadata
        image_with_metadata = image.set({
            'period': period_num,
            'start_date': period_info['start_str'],
            'end_date': period_info['end_str'],
            'center_date': period_info['center_date'].strftime('%Y-%m-%d'),
            'doy_center': period_info['doy_center'],
            'year': period_info['year'],
            'month': period_info['month'],
            'system:time_start': ee.Date(period_info['start_str']).millis(),
            'system:time_end': ee.Date(period_info['end_str']).millis()
        })
        
        # Create asset ID for this period
        period_asset_id = f'{asset_id}_Period_{period_num:02d}'
        
        task = ee.batch.Export.image.toAsset(
            image=image_with_metadata,
            description=f'Asset_Period_{period_num:02d}',
            assetId=period_asset_id,
            scale=scale,
            region=geometry,
            maxPixels=1e13,  # Higher limit for assets
            crs='EPSG:4326',
            pyramidingPolicy={'.default': 'mean'}  # Better for time series
        )
        
        tasks.append(task)
    
    return tasks

def export_imagecollection_to_asset(collection, asset_id, geometry, scale):
    """
    Export entire ImageCollection to a single GEE Asset
    
    Note: For very large collections, individual image exports (above function) are more reliable
    """
    # This exports the collection metadata structure
    # Individual images still need to be exported separately
    print("⚠️  GEE doesn't support direct ImageCollection export.")
    print("    Use export_timeseries_to_asset() to export individual images.")
    print("    They will form an ImageCollection when all are in the same folder.")
    return None

# Choose export method
EXPORT_METHOD = 'individual'  # 'combined' or 'individual'
EXPORT_DESTINATION = 'drive'  # 'drive' or 'asset' - CHANGED TO 'drive' due to asset quota limit

# Your GEE Assets path (update this to your project!)
ASSET_BASE_PATH = 'projects/ee-geodeticengineeringundip/assets/FuseTS'

print(f"\n📤 EXPORT CONFIGURATION:")
print(f"   Destination: {EXPORT_DESTINATION.upper()}")
print(f"   Method: {EXPORT_METHOD}")
if EXPORT_DESTINATION == 'asset':
    print(f"   Asset path: {ASSET_BASE_PATH}")
print(f"\n💡 Choose export destination:")
print(f"   • 'drive': Google Drive (good for < 2GB, need to download)")
print(f"   • 'asset': GEE Assets (recommended for large data, stays in cloud)")

if time_series_collection:
    if EXPORT_DESTINATION == 'asset':
        # ====================================================================
        # EXPORT TO GEE ASSETS (Recommended for large datasets!)
        # ====================================================================
        print("\n🚀 Exporting to GEE Assets...")
        print("   ✅ Limited by your asset storage quota rather than Drive space")
        print("   ✅ Data stays in GEE cloud")
        print("   ✅ Can use immediately in other scripts")
        
        asset_id = f'{ASSET_BASE_PATH}/S1_S2_Nov2024_Oct2025'
        
        export_tasks = export_timeseries_to_asset(
            time_series_collection,
            study_area,
            SCALE,
            asset_id
        )
        
        print(f"\n📋 Starting {len(export_tasks)} asset export tasks...")
        
        # Start the first 10 tasks; the rest are started in later batches
        for task, period_info in zip(export_tasks[:10], successful_periods):
            task.start()
            period_num = period_info['period']  # actual period number, not the loop index
            print(f"  ✅ Started: Period {period_num:02d} → {asset_id}_Period_{period_num:02d}")
        
        if len(export_tasks) > 10:
            print(f"\n⏳ Remaining {len(export_tasks) - 10} tasks have not been started yet")
            print("   Tasks only appear at https://code.earthengine.google.com/tasks once started,")
            print("   so run this code to start the next batch:")
            print(f"   for task in export_tasks[10:20]: task.start()")
        
        print(f"\n📊 After exports complete, load the images by asset ID, e.g.:")
        print(f"   first = ee.Image('{asset_id}_Period_01')")
        print(f"   (ee.ImageCollection does not expand wildcards such as '_Period_*')")
        
    elif EXPORT_DESTINATION == 'drive':
        # ====================================================================
        # EXPORT TO GOOGLE DRIVE (Original method)
        # ====================================================================
        if EXPORT_METHOD == 'combined':
            # Export as single multi-band file
            print("\n📤 Preparing export as single multi-band GeoTIFF to Google Drive...")
            export_task = export_timeseries_to_drive(
                time_series_collection, 
                study_area, 
                SCALE, 
                f'S1_S2_TimeSeries_Nov2024_Oct2025'
            )
            
            print(f"Starting export task: {export_task.config['description']}")
            export_task.start()
            
            print(f"Export task submitted. Monitor progress at: https://code.earthengine.google.com/tasks")
            
        else:
            # Export individual period files
            print("\n📤 Preparing export as individual period GeoTIFFs to Google Drive...")
            export_tasks = export_individual_periods_to_drive(
                time_series_collection,
                study_area,
                SCALE,
                f'S1_S2_Nov2024_Oct2025'
            )
            
            print(f"Starting {len(export_tasks)} export tasks...")
            for i, task in enumerate(export_tasks[:5]):  # Start first 5 tasks
                task.start()
                print(f"  Started: {task.config['description']}")
            
            if len(export_tasks) > 5:
                print(f"\nRemaining {len(export_tasks) - 5} tasks can be started manually or in batches")
                print("Monitor all tasks at: https://code.earthengine.google.com/tasks")

else:
    print("No data to export!")
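
The export cell starts only the first few tasks and leaves the rest for later batches. A minimal sketch of a batching helper follows. `start_in_batches` and `FakeTask` are hypothetical names used for illustration; the stand-in class replaces real `ee.batch.Task` objects so the snippet runs without Earth Engine, and only a `start()` method is assumed.

```python
def start_in_batches(tasks, batch_size, already_started=0):
    """Start the next `batch_size` tasks and return the total number started so far."""
    next_batch = tasks[already_started:already_started + batch_size]
    for task in next_batch:
        task.start()
    return already_started + len(next_batch)


class FakeTask:
    """Hypothetical stand-in for ee.batch.Task; only start() is modelled."""
    def __init__(self):
        self.started = False

    def start(self):
        self.started = True


demo_tasks = [FakeTask() for _ in range(31)]
started = start_in_batches(demo_tasks, batch_size=10)                           # tasks 0-9
started = start_in_batches(demo_tasks, batch_size=10, already_started=started)  # tasks 10-19
print(f"{started} of {len(demo_tasks)} tasks started")
```

Keeping the running total outside the helper makes it easy to call again from a later cell once the first batch has finished.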

## 6b. Load Data from GEE Assets (For Subsequent Processing)

If you exported to GEE Assets, use this code to load the data later in GEE or download specific regions:

In [None]:
# ============================================================================
# LOAD DATA FROM GEE ASSETS
# ============================================================================

def load_asset_collection(asset_base_path, pattern='*'):
    """
    Load ImageCollection from GEE Assets
    
    Parameters:
    -----------
    asset_base_path : str
        Base path to assets folder
    pattern : str
        Pattern to match asset names (e.g., 'S1_S2_Nov2024_Oct2025_Period_*')
    
    Returns:
    --------
    ee.ImageCollection
    """
    from fnmatch import fnmatch
    
    try:
        # ee.ImageCollection() does not expand wildcards, so list the folder
        # and match the asset names against the pattern on the client side
        assets = ee.data.listAssets({'parent': asset_base_path}).get('assets', [])
        image_ids = sorted(
            a['id'] for a in assets
            if a['type'] == 'IMAGE' and fnmatch(a['id'].split('/')[-1], pattern)
        )
        if not image_ids:
            raise ValueError(f"no image assets match '{pattern}'")
        
        collection = ee.ImageCollection([ee.Image(image_id) for image_id in image_ids])
        print(f"✅ Loaded {len(image_ids)} images from assets")
        return collection
    except Exception as e:
        print(f"❌ Error loading assets: {e}")
        print(f"   Make sure assets exist at: {asset_base_path}/{pattern}")
        print(f"   Check: https://code.earthengine.google.com/?asset={asset_base_path}")
        return None

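def download_region_from_assets(collection, region_geometry, scale, output_format='GEO_TIFF'):
    """
    Download a specific region from asset collection
    
    This is useful when you've exported large Java Island data but only want
    a smaller region for analysis. getDownloadURL() only serves small requests,
    so keep the region and band count modest. output_format is passed to
    getDownloadURL() ('GEO_TIFF', 'ZIPPED_GEO_TIFF' or 'NPY').
    """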
    # Convert collection to multi-band image
    def add_period_to_bands(image):
        period = ee.Number(image.get('period')).format('%02d')
        old_names = image.bandNames()
        new_names = old_names.map(lambda name: ee.String(name).cat('_P').cat(period))
        return image.rename(new_names)
    
    renamed_collection = collection.map(add_period_to_bands)
    multi_band = renamed_collection.toBands()
    
    # Create download URL
    url = multi_band.getDownloadURL({
        'scale': scale,
        'crs': 'EPSG:4326',
        'region': region_geometry,
        'format': output_format
    })
    
    print(f"📥 Download URL generated:")
    print(f"   {url}")
    print(f"\n   Copy this URL to your browser to download")
    
    return url

# Example: Load your exported assets
if EXPORT_DESTINATION == 'asset':
    print("="*60)
    print("📖 LOADING DATA FROM GEE ASSETS")
    print("="*60)
    
    # Asset exports take time; nothing can be loaded until the tasks finish
    print("\n⏳ Note: Asset exports take time. Check status at:")
    print("   https://code.earthengine.google.com/tasks")
    
    # Example of how to load later (after exports complete)
    print(f"\n💡 To load your exported data later, use:")
    print(f"\n```python")
    print(f"# Load the exported images (ee.ImageCollection does not expand wildcards,")
    print(f"# so list the asset folder and keep the matching image IDs)")
    print(f"assets = ee.data.listAssets({{'parent': '{ASSET_BASE_PATH}'}})['assets']")
    print(f"ids = sorted(a['id'] for a in assets if 'S1_S2_Nov2024_Oct2025_Period_' in a['id'])")
    print(f"collection = ee.ImageCollection([ee.Image(i) for i in ids])")
    print(f"")
    print(f"# Check what was loaded")
    print(f"print(f'Loaded {{collection.size().getInfo()}} images')")
    print(f"")
    print(f"# Download a specific region (optional)")
    print(f"small_region = ee.Geometry.Rectangle([106.8, -6.3, 107.0, -6.1])  # Example: Jakarta area")
    print(f"url = download_region_from_assets(collection, small_region, scale={SCALE})")
    print(f"```")
    
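    print(f"\n🔄 Or use directly in GEE Code Editor:")
    print(f"```javascript")
    print(f"// Build the collection from the exported images")
    print(f"// (ee.ImageCollection does not expand wildcards; skip periods that were not exported)")
    print(f"var prefix = '{ASSET_BASE_PATH}/S1_S2_Nov2024_Oct2025_Period_';")
    print(f"var images = [];")
    print(f"for (var p = 1; p <= 31; p++) {{")
    print(f"  images.push(ee.Image(prefix + (p < 10 ? '0' + p : p)));")
    print(f"}}")
    print(f"var collection = ee.ImageCollection(images);")
    print(f"")
    print(f"// Sort by period")
    print(f"var sorted = collection.sort('period');")
    print(f"")
    print(f"// Get first image")
    print(f"var first = sorted.first();")
    print(f"print('First period bands:', first.bandNames());")
    print(f"")
    print(f"// Process further or export to Drive from here")
    print(f"```")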
    
    print(f"\n✅ Assets allow you to:")
    print(f"   • Process data entirely in GEE (no download needed)")
    print(f"   • Download only specific regions when needed")
    print(f"   • Share with collaborators")
    print(f"   • Use in GEE Code Editor or Python API")

elif EXPORT_DESTINATION == 'drive':
    print("\n💡 For Google Drive exports:")
    print("   1. Monitor tasks at: https://code.earthengine.google.com/tasks")
    print("   2. Download files from Google Drive")
    print("   3. Use local processing (Section 7 below) or load in MOGPR notebook")

print("\n" + "="*60)
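
The `ee.ImageCollection()` constructor takes a collection ID or a list of images rather than a glob, so any `*` pattern has to be matched on the client against a listing of the asset folder. The sketch below shows only that matching step, using Python's `fnmatch`. The `asset_ids` list is made-up sample data standing in for a real folder listing, and `match_assets` is a hypothetical helper name.

```python
from fnmatch import fnmatch

# Made-up listing of an asset folder (illustrative IDs only)
asset_ids = [
    'projects/my-project/assets/FuseTS/S1_S2_Nov2024_Oct2025_Period_02',
    'projects/my-project/assets/FuseTS/S1_S2_Nov2024_Oct2025_Period_01',
    'projects/my-project/assets/FuseTS/study_area_mask',
]

def match_assets(ids, pattern):
    """Return the IDs whose last path component matches a shell-style pattern, sorted."""
    return sorted(i for i in ids if fnmatch(i.rsplit('/', 1)[-1], pattern))

period_ids = match_assets(asset_ids, 'S1_S2_Nov2024_Oct2025_Period_*')
print(period_ids)
```

Sorting the matched IDs keeps the periods in order, because the period number is zero-padded in the asset name.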

## 7. Create Local Processing Function (Alternative to Export)

In [None]:
def extract_timeseries_locally(collection, geometry, scale, max_pixels=1e6):
    """
    Extract time series data directly to memory for small areas
    This is faster than export/download for small study areas
    """
    print("Extracting time series data locally...")
    
    # Get the region bounds
    region = geometry.bounds()
    
    # Rough pixel count per band: bounding-box area (m²) / pixel area (m²)
    n_pixels = region.area(maxError=1).getInfo() / (scale * scale)
    if n_pixels > max_pixels:
        print(f"  Area too large for local extraction (~{n_pixels:,.0f} pixels), use export method instead")
        return None
    
    # Extract data for each period
    image_list = collection.toList(collection.size())
    
    periods_data = []
    extracted_dates = []  # center dates of the periods that were actually extracted
    
    for i in range(len(successful_periods)):
        print(f"Extracting period {i+1}/{len(successful_periods)}...")
        
        image = ee.Image(image_list.get(i))
        period_info = successful_periods[i]
        
        try:
            # Use geemap (xee backend) for efficient extraction. The backend
            # takes the region as `geometry` and reads `scale` in CRS units,
            # so metres are converted to degrees for EPSG:4326 (approximate,
            # ~111,320 m per degree near the equator). Check this against
            # your installed geemap/xee version.
            data_array = geemap.ee_to_xarray(
                image, 
                geometry=region, 
                scale=scale / 111320.0,
                crs='EPSG:4326'
            )
            
            # Add period information
            data_array = data_array.assign_coords(
                period=period_info['period'],
                center_date=period_info['center_date'],
                doy_center=period_info['doy_center']
            )
            
            periods_data.append(data_array)
            extracted_dates.append(period_info['center_date'])
            
        except Exception as e:
            print(f"  Error extracting period {i+1}: {e}")
            continue
    
    if periods_data:
        # Combine all periods into a single xarray Dataset
        print("Combining periods into time series...")
        
        # Concatenate along the time dimension
        combined_data = xr.concat(periods_data, dim='time')
        
        # Create proper time coordinates (only for periods that succeeded)
        combined_data = combined_data.assign_coords(time=pd.to_datetime(extracted_dates))
        
        return combined_data
    
    return None

# Try local extraction for small areas
area_size = study_area.area().getInfo()  # in square meters
area_km2 = area_size / 1e6

print(f"Study area size: {area_km2:.2f} km²")

if area_km2 < 100:  # Less than 100 km²
    print("Area is small enough for local extraction. Attempting direct download...")
    
    try:
        local_data = extract_timeseries_locally(
            time_series_collection, 
            study_area, 
            SCALE, 
            max_pixels=1e6
        )
        
        if local_data is not None:
            print("Local extraction successful!")
            print(f"Data shape: {local_data.dims}")
            print(f"Variables: {list(local_data.data_vars)}")
            
            # Save locally
            output_file = os.path.join(OUTPUT_DIR, f'S1_S2_timeseries_Nov2024_Oct2025_local.nc')
            local_data.to_netcdf(output_file)
            print(f"Data saved to: {output_file}")
            
        else:
            print("Local extraction failed, use export method instead")
            
    except Exception as e:
        print(f"Local extraction error: {e}")
        print("Use export method instead")
        
else:
    print("Area is too large for local extraction. Use the export method above.")
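
Whether local extraction is feasible depends mainly on how many pixels the request covers. As a rough guide, the count can be estimated from the bounding box and the pixel size. The helper below is a sketch that assumes a lon/lat bounding box and a spherical-Earth approximation of about 111,320 m per degree; `estimate_pixels` is a hypothetical name, not part of geemap or Earth Engine.

```python
import math

METERS_PER_DEGREE = 111_320.0  # approximate length of one degree at the equator

def estimate_pixels(west, south, east, north, scale_m):
    """Approximate pixel count of a lon/lat bounding box at `scale_m` metres per pixel."""
    mid_lat = math.radians((south + north) / 2)
    width_m = (east - west) * METERS_PER_DEGREE * math.cos(mid_lat)
    height_m = (north - south) * METERS_PER_DEGREE
    return math.ceil(width_m / scale_m) * math.ceil(height_m / scale_m)

# Example: the 0.2° x 0.2° Jakarta box from Section 6b, at 50 m resolution
n = estimate_pixels(106.8, -6.3, 107.0, -6.1, scale_m=50)
print(f"~{n:,} pixels per band; fits a 1e6 budget: {n <= 1e6}")
```

The estimate is per band and per period, so the total request is larger by the number of bands and periods extracted.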

## 8. Create Metadata and Processing Summary

In [None]:
# Create processing summary
processing_summary = {
    'processing_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
    'start_date': START_DATE,
    'end_date': END_DATE,
    'temporal_coverage': f'{START_DATE} to {END_DATE}',
    'agricultural_year': 'Nov 2024 - Oct 2025',
    'total_periods': len(periods),
    'successful_periods': len(successful_periods),
    'study_area_bounds': study_area.bounds().getInfo(),
    'spatial_resolution': f'{SCALE}m',
    'coordinate_system': CRS,
    'max_cloud_cover': MAX_CLOUD_COVER,
    'composite_method': 'median',
    'output_bands': ['VV', 'VH', 'S2ndvi'],
    'agricultural_seasons_covered': {
        'season_1': 'Nov 2024 - Mar 2025 (first planting, crosses year boundary)',
        'season_2': 'Apr - Jun 2025 (second planting, dry season)',
        'season_3': 'Jul - Sep 2025 (third planting, optional intensive)',
        'full_coverage': 'Through Oct 2025'
    }
}

# Create detailed period information
period_details = []
for period in successful_periods:
    period_details.append({
        'period': period['period'],
        'start_date': period['start_str'],
        'end_date': period['end_str'],
        'center_date': period['center_date'].strftime('%Y-%m-%d'),
        'doy_center': period['doy_center'],
        'year': period['year'],
        'month': period['month']
    })

# Save metadata
import json

metadata = {
    'summary': processing_summary,
    'periods': period_details
}

metadata_file = os.path.join(OUTPUT_DIR, f'processing_metadata_Nov2024_Oct2025.json')
with open(metadata_file, 'w') as f:
    json.dump(metadata, f, indent=2, default=str)

print("Processing Summary:")
print(f"  Temporal coverage: {START_DATE} to {END_DATE}")
print(f"  Agricultural year: Nov 2024 - Oct 2025")
print(f"  Total periods: {len(periods)}")
print(f"  Successful periods: {len(successful_periods)}")
print(f"  Spatial resolution: {SCALE}m")
print(f"  Coordinate system: {CRS}")
print(f"  Output bands: {processing_summary['output_bands']}")
print(f"\nAgricultural Seasons Covered:")
print(f"  Season 1 (Nov-Mar): First planting season (crosses 2024→2025 boundary)")
print(f"  Season 2 (Apr-Jun): Second planting season (dry season)")
print(f"  Season 3 (Jul-Sep): Third planting season (optional intensive)")
print(f"  Full coverage: Through October 2025")
print(f"\nMetadata saved to: {metadata_file}")

# Create period visualization
fig, ax = plt.subplots(figsize=(16, 7))

# Plot period timeline
period_dates = [p['center_date'] for p in successful_periods]
period_numbers = [p['period'] for p in successful_periods]

ax.scatter(period_dates, period_numbers, alpha=0.7, s=50)
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Period Number', fontsize=12)
ax.set_title(f'12-Day Composite Periods: {START_DATE} to {END_DATE}\nIndonesian Agricultural Year Coverage ({SCALE}m resolution, {CRS})', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)

# Add month boundaries and labels for both years
from matplotlib.dates import DateFormatter, MonthLocator
ax.xaxis.set_major_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter('%b\n%Y'))

# Highlight agricultural seasons with colored backgrounds

# Season 1: Nov 2024 - Mar 2025 (first planting)
season1_start = datetime(2024, 11, 1)
season1_end = datetime(2025, 3, 31)
ax.axvspan(season1_start, season1_end, alpha=0.15, color='green', label='Season 1: Nov-Mar (First Planting)')

# Season 2: Apr - Jun 2025 (second planting)
season2_start = datetime(2025, 4, 1)
season2_end = datetime(2025, 6, 30)
ax.axvspan(season2_start, season2_end, alpha=0.15, color='blue', label='Season 2: Apr-Jun (Second Planting)')

# Season 3: Jul - Sep 2025 (third planting)
season3_start = datetime(2025, 7, 1)
season3_end = datetime(2025, 9, 30)
ax.axvspan(season3_start, season3_end, alpha=0.15, color='orange', label='Season 3: Jul-Sep (Third Planting)')

# Highlight year boundary
year_boundary = datetime(2025, 1, 1)
ax.axvline(year_boundary, color='red', linewidth=2, linestyle='--', label='Year Boundary (2024→2025)')

ax.legend(loc='upper left', fontsize=10)

plt.tight_layout()
plt.savefig(os.path.join(OUTPUT_DIR, f'period_timeline_Nov2024_Oct2025.png'), dpi=150, bbox_inches='tight')
plt.show()

print(f"\nPeriod timeline saved to: {os.path.join(OUTPUT_DIR, f'period_timeline_Nov2024_Oct2025.png')}")
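
The metadata file written in this cell can later be used to rebuild the time axis without contacting Earth Engine. A small sketch, assuming the JSON layout created here (`periods` entries with `period` and `center_date` keys). The inline `sample` string stands in for the real file, and its dates are illustrative values only.

```python
import json
from datetime import date

# Stand-in for the contents of processing_metadata_Nov2024_Oct2025.json
# (two periods only, with illustrative center dates)
sample = '''
{"summary": {"total_periods": 31},
 "periods": [
   {"period": 2, "center_date": "2024-11-18"},
   {"period": 1, "center_date": "2024-11-06"}
 ]}
'''

loaded = json.loads(sample)
# Sort by period number, then parse the ISO date strings into date objects
ordered = sorted(loaded['periods'], key=lambda p: p['period'])
time_axis = [date.fromisoformat(p['center_date']) for p in ordered]
print(time_axis)
```

With the real file, replace `json.loads(sample)` with `json.load(f)` on the opened metadata file.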


## 9. Data Conversion for FuseTS

In [None]:
def prepare_fusets_format(data_path_or_array, metadata_path=None):
    """
    Convert GEE-exported data to FuseTS-compatible format
    """
    
    if isinstance(data_path_or_array, str):
        # Load from file
        print(f"Loading data from: {data_path_or_array}")
        
        if data_path_or_array.endswith('.nc'):
            data = xr.open_dataset(data_path_or_array)
        else:
            # Assume GeoTIFF
            import rioxarray
            data = rioxarray.open_rasterio(data_path_or_array)
            
    else:
        # Use provided array
        data = data_path_or_array
    
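    print("Converting to FuseTS format...")
    
    # GeoTIFFs load as a single DataArray with a 'band' dimension; the steps
    # below need a Dataset with one (t, y, x) variable per band
    if isinstance(data, xr.DataArray):
        raise TypeError("Expected an xarray.Dataset with VV, VH and S2ndvi variables; "
                        "split the GeoTIFF bands into per-period variables first")
    
    # Ensure proper dimension naming (geemap/xee returns lon/lat or X/Y)
    dim_names = {'time': 't', 'lon': 'x', 'lat': 'y', 'X': 'x', 'Y': 'y'}
    data = data.rename({old: new for old, new in dim_names.items() if old in data.dims})
    
    # Ensure proper band naming for FuseTS
    if 'NDVI' in data.data_vars:
        data = data.rename({'NDVI': 'S2ndvi'})
    
    # Ensure coordinate order is (t, y, x)
    expected_dims = ['t', 'y', 'x']
    
    for var in data.data_vars:
        if set(data[var].dims) == set(expected_dims):
            data[var] = data[var].transpose('t', 'y', 'x')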
    
    # Add FuseTS-specific attributes
    data.attrs.update({
        'title': f'Sentinel-1/2 Time Series for FuseTS Processing',
        'description': '12-day composite periods extracted from Google Earth Engine',
        'bands': 'VV (S1), VH (S1), S2ndvi (S2 NDVI)',
        'temporal_resolution': '12-day composites',
        'processing_software': 'Google Earth Engine + Python',
        'fusets_ready': True
    })
    
    return data

def create_example_usage_script():
    """
    Create a script showing how to use the exported data with FuseTS
    """
    
    script_content = '''
# Example script to use GEE-exported data with FuseTS
# Run this after downloading the exported data from Google Drive
# Temporal coverage: November 2024 - October 2025 (Indonesian agricultural year)

import xarray as xr
import rioxarray
from fusets.mogpr import MOGPRTransformer
from fusets.analytics import phenology
from fusets import whittaker

# Load the exported data
# Option 1: If you exported as individual periods
# data_files = ['S1_S2_Nov2024_Oct2025_Period_01.tif', 'S1_S2_Nov2024_Oct2025_Period_02.tif', ...]
# data = combine_period_files(data_files)  # You'll need to implement this

# Option 2: If you exported as single multi-band file
data_path = 'S1_S2_TimeSeries_Nov2024_Oct2025.tif'
data = rioxarray.open_rasterio(data_path)

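# Convert to FuseTS format
# NOTE: prepare_fusets_format() is defined in the data preparation notebook; copy it into
# this script. It expects an xarray.Dataset with VV, VH and S2ndvi variables on (t, y, x),
# so split the GeoTIFF bands into per-period variables before calling it.
fusets_data = prepare_fusets_format(data)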

# Apply MOGPR fusion
mogpr = MOGPRTransformer()
fused_data = mogpr.fit_transform(fusets_data)

# Extract phenological metrics for Indonesian agricultural seasons
# Season 1: Nov 2024 - Mar 2025 (first planting, crosses year boundary)
# Season 2: Apr - Jun 2025 (second planting, dry season)
# Season 3: Jul - Sep 2025 (third planting, optional intensive)

phenology_metrics = phenology(fused_data['S2ndvi'])

# Access results
sos_times = phenology_metrics.da_sos_times
eos_times = phenology_metrics.da_eos_times

print("FuseTS processing completed for Nov 2024 - Oct 2025!")
print("Captured full Indonesian agricultural calendar including year-boundary season")
'''
    
    script_file = os.path.join(OUTPUT_DIR, 'fusets_processing_example.py')
    with open(script_file, 'w') as f:
        f.write(script_content)
    
    return script_file

# Create example script
example_script = create_example_usage_script()
print(f"Example FuseTS processing script created: {example_script}")

# If we have local data, prepare it for FuseTS
if 'local_data' in locals() and local_data is not None:
    print("\nPreparing local data for FuseTS...")
    fusets_ready_data = prepare_fusets_format(local_data)
    
    # Save FuseTS-ready data
    fusets_output = os.path.join(OUTPUT_DIR, f'S1_S2_timeseries_Nov2024_Oct2025_fusets_ready.nc')
    fusets_ready_data.to_netcdf(fusets_output)
    print(f"FuseTS-ready data saved to: {fusets_output}")
    
    # Display data structure
    print("\nFuseTS-ready data structure:")
    print(fusets_ready_data)
    
    print("\nThis data is now ready for the MOGPR fusion notebook!")
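
When a whole time series is stored as one multi-band GeoTIFF, the per-period bands have to be regrouped into one `(t, y, x)` variable per sensor band. The sketch below covers only the bookkeeping step. It assumes band names end in `<band>_P<period>`, as produced by `add_period_to_bands` in Section 6b (names from other export paths may differ), and `group_bands` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches a trailing "<band>_P<period>", e.g. "..._VV_P01"
BAND_PATTERN = re.compile(r'(?P<var>[A-Za-z0-9]+)_P(?P<period>\d+)$')

def group_bands(band_names):
    """Map each variable to [(period, band_index), ...] sorted by period."""
    groups = defaultdict(list)
    for index, name in enumerate(band_names):
        match = BAND_PATTERN.search(name)
        if match:  # bands that do not follow the naming scheme are skipped
            groups[match['var']].append((int(match['period']), index))
    return {var: sorted(items) for var, items in groups.items()}

names = ['0_VV_P02', '0_VH_P02', '0_S2ndvi_P02', '1_VV_P01', '1_VH_P01', '1_S2ndvi_P01']
print(group_bands(names))
```

The band indices can then be used to pick slices out of the `(band, y, x)` array in period order and stack them along a new `t` dimension.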

## 10. Summary and Next Steps

### What this notebook accomplishes:

1. **Temporal Strategy**: Creates 31 composite periods from **Nov 2024 to Oct 2025** (30 full 12-day periods plus a 5-day final period)
2. **Data Collection**: Extracts S1 (VV, VH) and S2 (NDVI) data from Google Earth Engine
3. **Cloud Processing**: Uses GEE's computational power for large-scale data processing
4. **Flexible Export**: **GEE Assets (recommended)** or Google Drive
5. **Local Processing**: For small areas, extracts data directly without export/download
6. **FuseTS Preparation**: Converts data to the exact format needed for MOGPR processing

### Export Options Comparison:

| Feature | GEE Assets ⭐ RECOMMENDED | Google Drive |
|---------|---------------------------|--------------|
| **Size limit** | Project asset storage quota | Available Drive storage (15 GB on a free account) |
| **Best for** | Large areas (Java Island) | Small test areas |
| **Speed** | Fast (stays in cloud) | Slow (download required) |
| **Usage** | Use directly in GEE | Must download first |
| **Sharing** | Easy (asset permissions) | Manual file sharing |
| **Cost** | Free (GEE quota) | Free (Drive quota) |
| **Processing** | Process in GEE cloud | Local processing needed |
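
To choose between the two destinations it helps to estimate the data volume first. The sketch below computes an uncompressed float32 size and applies this notebook's rules of thumb (areas over 1000 km² or more than 2 GB go to Assets). Both function names are hypothetical, the 140,000 km² figure is only a rough Java-sized example, and real GeoTIFFs are usually smaller after compression.

```python
def estimate_size_gb(area_km2, scale_m, n_bands, n_periods, bytes_per_value=4):
    """Uncompressed size in GB of a float32 stack covering `area_km2` at `scale_m`."""
    pixels = area_km2 * 1e6 / (scale_m ** 2)
    return pixels * n_bands * n_periods * bytes_per_value / 1e9

def suggest_destination(area_km2, size_gb):
    """Apply the notebook's rules of thumb (1000 km² / 2 GB thresholds)."""
    return 'asset' if area_km2 > 1000 or size_gb > 2 else 'drive'

for area in (80, 140_000):  # a small test area vs. a roughly Java-sized area
    size = estimate_size_gb(area, scale_m=50, n_bands=3, n_periods=31)
    print(f"{area:>7} km²: ~{size:.2f} GB -> {suggest_destination(area, size)}")
```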

### When to use GEE Assets:
✅ **Study area > 1000 km²** (like Java Island with 5km buffer)  
✅ **Multiple people need access** to the same data  
✅ **Want to process in GEE** without downloading  
✅ **Need to reuse data** in multiple projects  
✅ **Data size > 2GB**  

### When to use Google Drive:
✅ **Small test area** (< 100 km²)  
✅ **Quick prototyping** with local tools  
✅ **One-time download** for offline work  
✅ **Prefer local storage** over cloud  

### Temporal Coverage (Indonesian Agricultural Year):
- **Period 1**: 2024-11-01 to 2024-11-12 ← **First planting season starts**
- **Period 2**: 2024-11-13 to 2024-11-24  
- **Period 3**: 2024-11-25 to 2024-12-06
- **Period 6**: 2024-12-31 to 2025-01-11 ← **Crosses year boundary**
- **...**
- **Period 13**: 2025-03-25 to 2025-04-05 ← **First planting season ends**
- **Periods 13-21**: overlap 2025-04-01 to 2025-06-30 ← **Second planting season**
- **Periods 21-28**: overlap 2025-07-01 to 2025-09-30 ← **Third planting season (optional)**
- **Period 31**: 2025-10-27 to 2025-10-31 ← **Full coverage complete** (final period shortened to 5 days)
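
The period boundaries follow from simple date arithmetic. A minimal sketch, assuming contiguous 12-day windows starting on 2024-11-01 with the last window clipped to 2025-10-31. The notebook's own period-generation cell may differ in detail, and `make_periods` is a hypothetical name.

```python
from datetime import date, timedelta

def make_periods(start, end, length_days=12):
    """Return (number, first_day, last_day) tuples; the last period is clipped to `end`."""
    periods = []
    first_day = start
    while first_day <= end:
        last_day = min(first_day + timedelta(days=length_days - 1), end)
        periods.append((len(periods) + 1, first_day, last_day))
        first_day = last_day + timedelta(days=1)
    return periods

periods = make_periods(date(2024, 11, 1), date(2025, 10, 31))
for number, first_day, last_day in periods:
    if number in (1, 6, 13, 31):
        print(f"Period {number:2d}: {first_day} to {last_day}")
```

Because 365 is not a multiple of 12, the 31st window holds only the five days left after 30 full periods.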

### Agricultural Seasons Captured:
- **Season 1 (Nov-Mar)**: First planting season - **handles year boundary transition**
  - Start: Nov 2024 (Period 1)
  - Peak: Jan 2025 (crosses from 2024→2025)
  - End: Mar 2025 (Period 13)
  
- **Season 2 (Apr-Jun)**: Second planting season (dry season)
  - Periods 13-21 in 2025
  
- **Season 3 (Jul-Sep)**: Third planting season (optional intensive)
  - Periods 21-28 in 2025
  
- **Full Monitoring**: Through October 2025 (Period 31)
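
The periods that overlap each season can be derived from the season dates rather than counted by hand. A short sketch, assuming 12-day periods counted from 2024-11-01; `period_of` is a hypothetical helper and the season dates are the ones used for the timeline plot in Section 8.

```python
from datetime import date

START = date(2024, 11, 1)  # first day of Period 1
PERIOD_DAYS = 12

def period_of(day):
    """1-based index of the 12-day period containing `day`."""
    return (day - START).days // PERIOD_DAYS + 1

seasons = {
    'Season 1 (Nov-Mar)': (date(2024, 11, 1), date(2025, 3, 31)),
    'Season 2 (Apr-Jun)': (date(2025, 4, 1), date(2025, 6, 30)),
    'Season 3 (Jul-Sep)': (date(2025, 7, 1), date(2025, 9, 30)),
}
season_periods = {name: (period_of(a), period_of(b)) for name, (a, b) in seasons.items()}
for name, (first, last) in season_periods.items():
    print(f"{name}: periods {first}-{last}")
```

Neighbouring seasons share a period wherever a 12-day window straddles the season boundary.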

### Next Steps:

#### If you exported to GEE Assets (Recommended):
1. **Monitor exports**: https://code.earthengine.google.com/tasks
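2. **Use in GEE Code Editor** (`ee.ImageCollection` does not expand wildcards, so list the images explicitly):
   ```javascript
   var prefix = 'projects/ee-geodeticengineeringundip/assets/FuseTS/S1_S2_Nov2024_Oct2025_Period_';
   var collection = ee.ImageCollection([ee.Image(prefix + '01'), ee.Image(prefix + '02') /* ... through '31' */]);
   ```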
3. **Or download specific regions** when needed (see Section 6b)
4. **Process in GEE** or download small regions for local analysis

#### If you exported to Google Drive:
1. **Download Data**: Monitor exports at https://code.earthengine.google.com/tasks
2. **Load in FuseTS**: Use the exported GeoTIFF files with the MOGPR fusion notebook
3. **Apply MOGPR**: Run the S1+S2 fusion using the prepared time series
4. **Multi-Season Analysis**: Detect all three Indonesian agricultural seasons

### File Outputs:
- **Assets**: `projects/ee-geodeticengineeringundip/assets/FuseTS/S1_S2_Nov2024_Oct2025_Period_*`
- **Or Drive**: S1_S2_TimeSeries_Nov2024_Oct2025.tif (or individual period files)
- **Metadata**: processing_metadata_Nov2024_Oct2025.json
- **Timeline**: period_timeline_Nov2024_Oct2025.png
- **Example Script**: fusets_processing_example.py

### Key Features:
✅ **Coverage aligned** with the Indonesian agricultural calendar  
✅ **Year boundary handling** for Nov 2024 → Mar 2025 first season  
✅ **Complete coverage** of all potential planting seasons  
✅ **31 periods**: 30 × 12 days + one 5-day period = 365 days (full agricultural year)  
✅ **50m resolution** for efficient regional analysis  
✅ **GEE Assets support** for large-scale datasets  

### For Large Datasets (Java Island):
💡 **Recommended workflow**:
1. Export to **GEE Assets** (subject to your asset storage quota)
2. Process and analyze **entirely in GEE** using Code Editor or Python API
3. Download **only final results** or specific regions of interest
4. Use MOGPR fusion **on cloud-processed data** for maximum efficiency

The exported data is now ready for the FuseTS MOGPR processing workflow with full Indonesian agricultural season detection!