# Google Earth Engine Data Preparation for FuseTS

This notebook extracts Sentinel-1 and Sentinel-2 data from Google Earth Engine and prepares it for FuseTS MOGPR processing.

## Temporal Compositing Strategy
- **Total periods**: 31 periods for 2024
- **Period length**: 12 days each (the final period is clipped at Dec 31)
- **Period 1**: Jan 1-12, 2024
- **Period 2**: Jan 13-24, 2024
- **Period 3**: Jan 25 - Feb 5, 2024
- **... and so on**
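
The schedule above can be reproduced with a few lines of standard-library Python. This is a minimal sketch rather than the notebook's own function (that one is defined in section 3); `period_bounds` is a hypothetical helper name. It assumes that windows advance in fixed 12-day steps from January 1 and that the final window is clipped to December 31.

```python
from datetime import date, timedelta

def period_bounds(year, n_periods=31, length=12):
    """Return (start, end) dates, both inclusive, for fixed-length windows."""
    first = date(year, 1, 1)
    last = date(year, 12, 31)
    bounds = []
    for i in range(n_periods):
        start = first + timedelta(days=i * length)
        if start > last:
            break  # no window may start after the end of the year
        # Clip the window so it never runs past December 31
        end = min(start + timedelta(days=length - 1), last)
        bounds.append((start, end))
    return bounds

bounds = period_bounds(2024)
for number, (start, end) in enumerate(bounds[:3], start=1):
    print(f"Period {number}: {start} to {end}")
print(f"Period {len(bounds)}: {bounds[-1][0]} to {bounds[-1][1]}")
```

Because 31 windows of 12 days would span 372 days, the last window is necessarily shorter than the others.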

## Output Format
Data is exported as GeoTIFF (or extracted directly for small areas) and converted to a FuseTS-compatible xarray Dataset with the following band names:
- S1: `VV`, `VH` bands
- S2: `S2ndvi` band
- Dimensions: `(t, y, x)`, with the time dimension named `t`

## 1. Setup and Authentication

In [None]:
import ee
import geemap
import pandas as pd
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import os
import warnings
warnings.filterwarnings('ignore')

# Initialize Earth Engine
try:
    ee.Initialize()
    print("Earth Engine initialized successfully!")
except Exception as e:
    print(f"Error initializing Earth Engine: {e}")
    print("Please run: ee.Authenticate() first if this is your first time")
    # ee.Authenticate()  # Uncomment this line for first-time setup
    # ee.Initialize()

print(f"Earth Engine version: {ee.__version__}")
print(f"geemap version: {geemap.__version__}")

## 2. Define Study Area and Parameters

In [None]:
# Define your study area (modify coordinates as needed)
# Example: Agricultural area in Belgium
study_area = ee.Geometry.Rectangle([
    5.0,   # min longitude
    50.8,  # min latitude  
    5.4,   # max longitude
    51.2   # max latitude
])

# Alternative: Define study area from shapefile or other geometry
# study_area = geemap.shp_to_ee('path/to/your/shapefile.shp')

# Processing parameters
YEAR = 2024
SCALE = 10  # meters per pixel (10m for S2, will be resampled for S1)
MAX_CLOUD_COVER = 20  # Maximum cloud cover percentage for S2

# Output directory
OUTPUT_DIR = 'gee_fusets_data'
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f"Study area bounds: {study_area.bounds().getInfo()}")
print(f"Processing year: {YEAR}")
print(f"Spatial resolution: {SCALE}m")
print(f"Output directory: {OUTPUT_DIR}")

## 3. Generate 12-Day Composite Periods

In [None]:
def generate_12day_periods(year):
    """
    Generate 31 consecutive 12-day periods for the specified year.
    The final period is clipped to December 31, so it is shorter than 12 days.
    """
    start_date = datetime(year, 1, 1)
    year_end = datetime(year, 12, 31)
    periods = []
    
    for period_num in range(31):
        period_start = start_date + timedelta(days=period_num * 12)
        
        # Stop once a period would start after the end of the year
        if period_start > year_end:
            break
        
        # 12 days inclusive, clipped so we don't go beyond the year
        period_end = min(period_start + timedelta(days=11), year_end)
        
        # Middle of the (possibly shortened) period; start + 6 days for a full period
        n_days = (period_end - period_start).days + 1
        center_date = period_start + timedelta(days=n_days // 2)
        
        periods.append({
            'period': period_num + 1,
            'start_date': period_start,
            'end_date': period_end,
            'start_str': period_start.strftime('%Y-%m-%d'),
            'end_str': period_end.strftime('%Y-%m-%d'),
            'center_date': center_date,
            'doy_center': center_date.timetuple().tm_yday
        })
            
    return periods

# Generate periods
periods = generate_12day_periods(YEAR)

print(f"Generated {len(periods)} periods for {YEAR}:")
print("\nFirst 5 periods:")
for i, period in enumerate(periods[:5]):
    print(f"Period {period['period']:2d}: {period['start_str']} to {period['end_str']} (center: DOY {period['doy_center']:3d})")

print("\nLast 5 periods:")
for i, period in enumerate(periods[-5:]):
    print(f"Period {period['period']:2d}: {period['start_str']} to {period['end_str']} (center: DOY {period['doy_center']:3d})")

# Create a DataFrame for easier handling
periods_df = pd.DataFrame(periods)
print(f"\nTotal temporal coverage: {periods[0]['start_str']} to {periods[-1]['end_str']}")
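
Before sending requests to Earth Engine, it can be worth confirming that the windows tile the year without gaps or overlaps. The check below is a sketch that assumes each period is a dict with `period`, `start_date`, and `end_date` keys, as produced above. `check_contiguous` is a hypothetical helper name, and the three sample periods are illustrative (the third is deliberately shifted by one day).

```python
from datetime import datetime, timedelta

def check_contiguous(periods):
    """Return a list of messages for periods that do not follow their predecessor."""
    problems = []
    for prev, nxt in zip(periods, periods[1:]):
        # With inclusive end dates, the next period must start one day later
        expected = prev['end_date'] + timedelta(days=1)
        if nxt['start_date'] != expected:
            problems.append(
                f"Period {nxt['period']} starts {nxt['start_date']:%Y-%m-%d}, "
                f"expected {expected:%Y-%m-%d}"
            )
    return problems

sample = [
    {'period': 1, 'start_date': datetime(2024, 1, 1), 'end_date': datetime(2024, 1, 12)},
    {'period': 2, 'start_date': datetime(2024, 1, 13), 'end_date': datetime(2024, 1, 24)},
    {'period': 3, 'start_date': datetime(2024, 1, 26), 'end_date': datetime(2024, 2, 6)},
]
issues = check_contiguous(sample)
print(issues)
```

Calling `check_contiguous(periods)` on the generated list should return an empty list.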

## 4. Define Data Loading Functions

In [None]:
def load_sentinel1_data(geometry, start_date, end_date):
    """
    Load Sentinel-1 GRD data for a specific time period
    """
    s1_collection = (ee.ImageCollection('COPERNICUS/S1_GRD')
                    .filterBounds(geometry)
                    .filterDate(start_date, end_date)
                    .filter(ee.Filter.eq('instrumentMode', 'IW'))
                    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
                    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
                    .select(['VV', 'VH']))
    
    return s1_collection

def load_sentinel2_data(geometry, start_date, end_date, max_cloud_cover=20):
    """
    Load Sentinel-2 data and calculate NDVI for a specific time period
    """
    def calculate_ndvi(image):
        ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI')
        return image.addBands(ndvi)
    
    def mask_clouds(image):
        # Use SCL band for cloud masking
        scl = image.select('SCL')
        # Keep vegetation, soil, water, snow classes (4,5,6,11)
        good_pixels = scl.eq(4).Or(scl.eq(5)).Or(scl.eq(6)).Or(scl.eq(11))
        return image.updateMask(good_pixels)
    
    s2_collection = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
                    .filterBounds(geometry)
                    .filterDate(start_date, end_date)
                    .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', max_cloud_cover))
                    .map(mask_clouds)
                    .map(calculate_ndvi)
                    .select(['NDVI']))
    
    return s2_collection

def create_composite(collection, method='median'):
    """
    Create a composite from an image collection
    """
    if method == 'median':
        return collection.median()
    elif method == 'mean':
        return collection.mean()
    elif method == 'max':
        return collection.max()
    else:
        return collection.median()

print("Data loading functions defined successfully!")
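
To make the masking and NDVI steps concrete, here is a small NumPy sketch of the same per-pixel logic that `mask_clouds` and `calculate_ndvi` apply on the Earth Engine side. The arrays hold made-up values, not real Sentinel-2 data, and `ndvi_with_scl_mask` is a hypothetical helper. It only illustrates the arithmetic, assuming SCL classes 4, 5, 6, and 11 are kept as in the cell above.

```python
import numpy as np

# SCL classes kept by the notebook: vegetation, not vegetated, water, snow/ice
KEEP_SCL = (4, 5, 6, 11)

def ndvi_with_scl_mask(nir, red, scl):
    """Compute NDVI = (NIR - red) / (NIR + red), with NaN where the pixel is masked."""
    nir = nir.astype(float)
    red = red.astype(float)
    denom = nir + red
    ndvi = np.full(nir.shape, np.nan)
    # A pixel is valid if its SCL class is kept and the denominator is non-zero
    valid = np.isin(scl, KEEP_SCL) & (denom != 0)
    ndvi[valid] = (nir[valid] - red[valid]) / denom[valid]
    return ndvi

nir = np.array([[3000, 2500], [4000, 1000]])   # band B8
red = np.array([[1000, 2500], [1000, 1000]])   # band B4
scl = np.array([[4, 9], [5, 3]])               # 9 = cloud (high probability), 3 = cloud shadow
ndvi = ndvi_with_scl_mask(nir, red, scl)
print(ndvi)
```

The two cloudy pixels come out as NaN, which mirrors how masked pixels are ignored when the median composite is computed.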

## 5. Process Data for All Periods

In [None]:
def process_single_period(period_info, geometry, scale=10):
    """
    Process S1 and S2 data for a single 12-day period
    """
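    start_date = period_info['start_str']
    end_date = period_info['end_str']
    period_num = period_info['period']
    
    # filterDate() treats its end date as exclusive, so filter up to the day after
    # the period's inclusive end date to keep acquisitions from the last day
    filter_end = (period_info['end_date'] + timedelta(days=1)).strftime('%Y-%m-%d')
    
    print(f"Processing Period {period_num}: {start_date} to {end_date}")
    
    try:
        # Load Sentinel-1 data
        s1_collection = load_sentinel1_data(geometry, start_date, filter_end)
        s1_count = s1_collection.size().getInfo()
        
        # Load Sentinel-2 data
        s2_collection = load_sentinel2_data(geometry, start_date, filter_end, MAX_CLOUD_COVER)
        s2_count = s2_collection.size().getInfo()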
        
        print(f"  Found {s1_count} S1 images, {s2_count} S2 images")
        
        # Create composites
        if s1_count > 0:
            s1_composite = create_composite(s1_collection, 'median')
        else:
            # Create empty image with correct bands
            s1_composite = ee.Image.constant([0, 0]).rename(['VV', 'VH']).updateMask(ee.Image.constant(0))
            
        if s2_count > 0:
            s2_composite = create_composite(s2_collection, 'median')
        else:
            # Create empty NDVI image
            s2_composite = ee.Image.constant(0).rename('NDVI').updateMask(ee.Image.constant(0))
        
        # Combine S1 and S2 data
        combined_image = s1_composite.addBands(s2_composite.rename('S2ndvi'))
        
        # Add metadata
        combined_image = combined_image.set({
            'period': period_num,
            'start_date': start_date,
            'end_date': end_date,
            'center_date': period_info['center_date'].strftime('%Y-%m-%d'),
            'doy_center': period_info['doy_center'],
            's1_count': s1_count,
            's2_count': s2_count
        })
        
        return combined_image
        
    except Exception as e:
        print(f"  Error processing period {period_num}: {e}")
        return None

# Process all periods
print("Starting data processing for all periods...\n")

processed_images = []
successful_periods = []

for i, period in enumerate(periods):
    result = process_single_period(period, study_area, SCALE)
    if result is not None:
        processed_images.append(result)
        successful_periods.append(period)
    
    # Progress update every 5 periods
    if (i + 1) % 5 == 0:
        print(f"Completed {i + 1}/{len(periods)} periods\n")

print(f"Successfully processed {len(processed_images)} out of {len(periods)} periods")

# Create ImageCollection from processed images
if processed_images:
    time_series_collection = ee.ImageCollection(processed_images)
    print(f"Created time series collection with {time_series_collection.size().getInfo()} images")
else:
    # Define the name anyway so later cells can test for it
    time_series_collection = None
    print("No images were successfully processed!")
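
One detail worth checking when filtering by period: Earth Engine's `filterDate(start, end)` includes `start` but excludes `end`, so a period whose end date is meant to be inclusive needs the following day as its filter bound. A sketch of that conversion using only the standard library (`exclusive_end` is a hypothetical helper name):

```python
from datetime import datetime, timedelta

def exclusive_end(end_str, fmt='%Y-%m-%d'):
    """Turn an inclusive end date string into the exclusive bound one day later."""
    return (datetime.strptime(end_str, fmt) + timedelta(days=1)).strftime(fmt)

print(exclusive_end('2024-01-12'))
print(exclusive_end('2024-12-31'))
```

For the last period of the year the bound falls on January 1 of the next year, which is fine because the bound itself is never included.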

## 6. Export Data from GEE

In [None]:
def export_timeseries_to_drive(collection, geometry, scale, output_name):
    """
    Export the time series collection to Google Drive as a multi-band image
    """
    # Convert collection to multi-band image
    # Each period becomes a separate set of bands
    
    def rename_bands_with_period(image):
        image = ee.Image(image)
        period = ee.Number(image.get('period')).format('%02d')
        
        # Rename bands to include period number
        old_names = image.bandNames()
        new_names = old_names.map(lambda name: ee.String(name).cat('_P').cat(period))
        
        return image.rename(new_names)
    
    # Rename bands with period numbers
    renamed_collection = collection.map(rename_bands_with_period)
    
    # Convert to single multi-band image
    # Note: toBands() prefixes each band name with the image's system:index
    multi_band_image = renamed_collection.toBands()
    
    # Export task
    task = ee.batch.Export.image.toDrive(
        image=multi_band_image,
        description=output_name,
        folder='GEE_FuseTS_Data',
        fileNamePrefix=output_name,
        scale=scale,
        region=geometry,
        maxPixels=1e9,
        crs='EPSG:4326',
        fileFormat='GeoTIFF'
    )
    
    return task

def export_individual_periods(collection, geometry, scale, base_name):
    """
    Export each period as a separate GeoTIFF file
    """
    tasks = []
    image_list = collection.toList(collection.size())
    
    for i in range(len(successful_periods)):
        image = ee.Image(image_list.get(i))
        period_num = successful_periods[i]['period']
        
        task = ee.batch.Export.image.toDrive(
            image=image,
            description=f'{base_name}_Period_{period_num:02d}',
            folder='GEE_FuseTS_Data',
            fileNamePrefix=f'{base_name}_Period_{period_num:02d}',
            scale=scale,
            region=geometry,
            maxPixels=1e9,
            crs='EPSG:4326',
            fileFormat='GeoTIFF'
        )
        
        tasks.append(task)
    
    return tasks

# Choose export method
EXPORT_METHOD = 'individual'  # 'combined' or 'individual'

if time_series_collection is not None:
    if EXPORT_METHOD == 'combined':
        # Export as single multi-band file
        print("Preparing export as single multi-band GeoTIFF...")
        export_task = export_timeseries_to_drive(
            time_series_collection, 
            study_area, 
            SCALE, 
            f'S1_S2_TimeSeries_{YEAR}'
        )
        
        print(f"Starting export task: {export_task.config['description']}")
        export_task.start()
        
        print(f"Export task submitted. Monitor progress at: https://code.earthengine.google.com/tasks")
        
    else:
        # Export individual period files
        print("Preparing export as individual period GeoTIFFs...")
        export_tasks = export_individual_periods(
            time_series_collection,
            study_area,
            SCALE,
            f'S1_S2_{YEAR}'
        )
        
        print(f"Starting {len(export_tasks)} export tasks...")
        for i, task in enumerate(export_tasks[:5]):  # Start first 5 tasks
            task.start()
            print(f"  Started: {task.config['description']}")
        
        if len(export_tasks) > 5:
            print(f"\nRemaining {len(export_tasks) - 5} tasks can be started manually or in batches")
            print("Monitor all tasks at: https://code.earthengine.google.com/tasks")

else:
    print("No data to export!")
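
When the combined export is read back, each band name carries both the variable and the period. The sketch below parses such names into `(variable, period)` pairs. It assumes names of the form `<index>_<band>_P<NN>`, which is what `toBands()` is expected to produce here by prefixing each band with the image's `system:index`; verify this against the band descriptions of your own GeoTIFF. `parse_band_name` is a hypothetical helper.

```python
import re

# Optional "<index>_" prefix, then the variable name, then "_P" and the period number
BAND_PATTERN = re.compile(
    r'^(?:(?P<index>[^_]+)_)?(?P<var>VV|VH|S2ndvi)_P(?P<period>\d+)$'
)

def parse_band_name(name):
    """Split a band name such as '0_VV_P01' into ('VV', 1)."""
    match = BAND_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized band name: {name!r}")
    return match.group('var'), int(match.group('period'))

names = ['0_VV_P01', '0_VH_P01', '0_S2ndvi_P01', '1_VV_P02']
parsed = [parse_band_name(n) for n in names]
print(parsed)
```

Raising on unknown names, rather than skipping them, makes a changed naming scheme visible immediately.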

## 7. Create Local Processing Function (Alternative to Export)

In [None]:
def extract_timeseries_locally(collection, geometry, scale, max_pixels=1e6):
    """
    Extract time series data directly to memory for small areas
    This is faster than export/download for small study areas
    """
    print("Extracting time series data locally...")
    
    # Get the region bounds
    region = geometry.bounds()
    
    # Rough per-band pixel estimate: region area divided by the pixel footprint
    estimated_pixels = region.area(1).getInfo() / (scale * scale)
    if estimated_pixels > max_pixels:
        print(f"  Area too large for local extraction (~{estimated_pixels:,.0f} pixels), use export method instead")
        return None
    
    # Extract data for each period
    image_list = collection.toList(collection.size())
    
    periods_data = []
    time_coords = []  # center dates of the periods that were extracted successfully
    
    for i in range(len(successful_periods)):
        print(f"Extracting period {i+1}/{len(successful_periods)}...")
        
        image = ee.Image(image_list.get(i))
        period_info = successful_periods[i]
        
        try:
            # Use geemap for efficient extraction.
            # NOTE: ee_to_xarray forwards keyword arguments to the xee backend; the accepted
            # argument names and the units of `scale` depend on the installed version,
            # so check them against your environment.
            data_array = geemap.ee_to_xarray(
                image, 
                region=region, 
                scale=scale,
                crs='EPSG:4326'
            )
            
            # Add period information
            data_array = data_array.assign_coords(
                period=period_info['period'],
                center_date=period_info['center_date'],
                doy_center=period_info['doy_center']
            )
            
            periods_data.append(data_array)
            time_coords.append(period_info['center_date'])
                
        except Exception as e:
            print(f"  Error extracting period {i+1}: {e}")
            continue
    
    if periods_data:
        # Combine all periods into a single xarray Dataset
        print("Combining periods into time series...")
        
        # Concatenate along the time dimension
        combined_data = xr.concat(periods_data, dim='time')
        
        # Use the center dates of the periods that were actually extracted,
        # so skipped periods cannot shift the time axis
        combined_data = combined_data.assign_coords(time=time_coords)
        
        return combined_data
    
    return None

# Try local extraction for small areas
area_size = study_area.area().getInfo()  # in square meters
area_km2 = area_size / 1e6

print(f"Study area size: {area_km2:.2f} km²")

if area_km2 < 100:  # Less than 100 km²
    print("Area is small enough for local extraction. Attempting direct download...")
    
    try:
        local_data = extract_timeseries_locally(
            time_series_collection, 
            study_area, 
            SCALE, 
            max_pixels=1e6
        )
        
        if local_data is not None:
            print("Local extraction successful!")
            print(f"Data shape: {local_data.dims}")
            print(f"Variables: {list(local_data.data_vars)}")
            
            # Save locally
            output_file = os.path.join(OUTPUT_DIR, f'S1_S2_timeseries_{YEAR}_local.nc')
            local_data.to_netcdf(output_file)
            print(f"Data saved to: {output_file}")
            
        else:
            print("Local extraction failed, use export method instead")
            
    except Exception as e:
        print(f"Local extraction error: {e}")
        print("Use export method instead")
        
else:
    print("Area is too large for local extraction. Use the export method above.")
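
Whether an area is small enough for in-memory extraction depends on the number of pixels requested, which is roughly the area divided by the pixel footprint, multiplied by the number of bands and periods. A back-of-the-envelope sketch follows; `estimate_pixels` and `estimate_values` are hypothetical helpers, and the 100 km² figure simply mirrors the threshold used above.

```python
def estimate_pixels(area_m2, scale_m):
    """Approximate pixel count of one band: area divided by the pixel footprint."""
    return int(area_m2 / (scale_m ** 2))

def estimate_values(area_m2, scale_m, n_bands=3, n_periods=31):
    """Approximate number of values in the full time series (all bands and periods)."""
    return estimate_pixels(area_m2, scale_m) * n_bands * n_periods

area_m2 = 100 * 1e6  # 100 km² expressed in square meters
print(estimate_pixels(area_m2, 10))   # 1000000
print(estimate_values(area_m2, 10))   # 93000000
```

At 4 bytes per value (float32), 93 million values correspond to roughly 372 MB, which is a useful sanity check before attempting a direct download.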

## 8. Create Metadata and Processing Summary

In [None]:
# Create processing summary
processing_summary = {
    'processing_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
    'year': YEAR,
    'total_periods': len(periods),
    'successful_periods': len(successful_periods),
    'study_area_bounds': study_area.bounds().getInfo(),
    'spatial_resolution': SCALE,
    'max_cloud_cover': MAX_CLOUD_COVER,
    'composite_method': 'median',
    'output_bands': ['VV', 'VH', 'S2ndvi']
}

# Create detailed period information
period_details = []
for period in successful_periods:
    period_details.append({
        'period': period['period'],
        'start_date': period['start_str'],
        'end_date': period['end_str'],
        'center_date': period['center_date'].strftime('%Y-%m-%d'),
        'doy_center': period['doy_center']
    })

# Save metadata
import json

metadata = {
    'summary': processing_summary,
    'periods': period_details
}

metadata_file = os.path.join(OUTPUT_DIR, f'processing_metadata_{YEAR}.json')
with open(metadata_file, 'w') as f:
    json.dump(metadata, f, indent=2, default=str)

print("Processing Summary:")
print(f"  Year: {YEAR}")
print(f"  Total periods: {len(periods)}")
print(f"  Successful periods: {len(successful_periods)}")
print(f"  Spatial resolution: {SCALE}m")
print(f"  Output bands: {processing_summary['output_bands']}")
print(f"  Metadata saved to: {metadata_file}")

# Create period visualization
fig, ax = plt.subplots(figsize=(15, 6))

# Plot period timeline
period_dates = [p['center_date'] for p in successful_periods]
period_numbers = [p['period'] for p in successful_periods]

ax.scatter(period_dates, period_numbers, alpha=0.7, s=50)
ax.set_xlabel('Date')
ax.set_ylabel('Period Number')
ax.set_title(f'12-Day Composite Periods for {YEAR}')
ax.grid(True, alpha=0.3)

# Add month boundaries
for month in range(1, 13):
    month_start = datetime(YEAR, month, 1)
    ax.axvline(month_start, color='red', alpha=0.3, linestyle='--')
    ax.text(month_start, max(period_numbers) * 0.9, 
           month_start.strftime('%b'), rotation=90, ha='right')

plt.tight_layout()
plt.savefig(os.path.join(OUTPUT_DIR, f'period_timeline_{YEAR}.png'), dpi=150, bbox_inches='tight')
plt.show()

print(f"\nPeriod timeline saved to: {os.path.join(OUTPUT_DIR, f'period_timeline_{YEAR}.png')}")
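
The metadata file can later supply the time axis for the downloaded rasters. The following is a minimal sketch, assuming the JSON layout written above (a `periods` list whose entries carry `center_date` as `YYYY-MM-DD`). The inline `metadata_text` stands in for the file contents, and `center_dates_from_metadata` is a hypothetical helper.

```python
import json
from datetime import datetime

# Stand-in for the contents of processing_metadata_2024.json (two periods only)
metadata_text = '''
{"summary": {"year": 2024},
 "periods": [
   {"period": 2, "start_date": "2024-01-13", "end_date": "2024-01-24",
    "center_date": "2024-01-19", "doy_center": 19},
   {"period": 1, "start_date": "2024-01-01", "end_date": "2024-01-12",
    "center_date": "2024-01-07", "doy_center": 7}
 ]}
'''

def center_dates_from_metadata(metadata):
    """Return the period center dates as datetimes, ordered by period number."""
    ordered = sorted(metadata['periods'], key=lambda p: p['period'])
    return [datetime.strptime(p['center_date'], '%Y-%m-%d') for p in ordered]

dates = center_dates_from_metadata(json.loads(metadata_text))
print(dates)
```

Sorting by period number guards against entries that were written or edited out of order.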

## 9. Data Conversion for FuseTS

In [None]:
def prepare_fusets_format(data_path_or_array, metadata_path=None):
    """
    Convert GEE-exported data to FuseTS-compatible format
    """
    
    if isinstance(data_path_or_array, str):
        # Load from file
        print(f"Loading data from: {data_path_or_array}")
        
        if data_path_or_array.endswith('.nc'):
            data = xr.open_dataset(data_path_or_array)
        else:
            # Assume GeoTIFF
            import rioxarray
            data = rioxarray.open_rasterio(data_path_or_array)
            
    else:
        # Use provided array
        data = data_path_or_array
    
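    print("Converting to FuseTS format...")
    
    # rioxarray returns a (band, y, x) DataArray for GeoTIFFs. The bands must be
    # regrouped into per-variable (t, y, x) arrays before this function can be used.
    if isinstance(data, xr.DataArray):
        raise TypeError(
            "Expected an xarray.Dataset with VV, VH and S2ndvi/NDVI variables; "
            "regroup the GeoTIFF bands into (t, y, x) variables first"
        )
    
    # Ensure proper dimension naming (geographic grids may use lon/lat instead of x/y)
    dim_names = {'time': 't', 'lon': 'x', 'lat': 'y'}
    data = data.rename({old: new for old, new in dim_names.items() if old in data.dims})
    
    # Ensure proper band naming for FuseTS
    if 'NDVI' in data.data_vars:
        data = data.rename({'NDVI': 'S2ndvi'})
    
    # Ensure dimension order is (t, y, x)
    expected_dims = ['t', 'y', 'x']
    
    for var in data.data_vars:
        if set(data[var].dims) == set(expected_dims):
            data[var] = data[var].transpose('t', 'y', 'x')
    
    # Add FuseTS-specific attributes
    data.attrs.update({
        'title': 'Sentinel-1/2 Time Series for FuseTS Processing',
        'description': '12-day composite periods extracted from Google Earth Engine',
        'bands': 'VV (S1), VH (S1), S2ndvi (S2 NDVI)',
        'temporal_resolution': '12-day composites',
        'processing_software': 'Google Earth Engine + Python',
        'fusets_ready': 'true'  # stored as a string: NetCDF attributes have no boolean type
    })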
    
    return data

def create_example_usage_script():
    """
    Create a script showing how to use the exported data with FuseTS
    """
    
    script_content = '''
# Example script to use GEE-exported data with FuseTS
# Run this after downloading the exported data from Google Drive

import xarray as xr
import rioxarray
from fusets.mogpr import MOGPRTransformer
from fusets.analytics import phenology
from fusets import whittaker

# Load the exported data
# Option 1: If you exported as individual periods
# data_files = ['S1_S2_2024_Period_01.tif', 'S1_S2_2024_Period_02.tif', ...]
# data = combine_period_files(data_files)  # You'll need to implement this

# Option 2: If you exported as single multi-band file
data_path = 'S1_S2_TimeSeries_2024.tif'
data = rioxarray.open_rasterio(data_path)

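# Convert to FuseTS format
# NOTE: prepare_fusets_format() is defined in the data preparation notebook; copy it
# into this script (or import it) before running. It expects an xarray Dataset with
# VV, VH and S2ndvi variables, so regroup the GeoTIFF bands into (t, y, x) first.
fusets_data = prepare_fusets_format(data)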

# Apply MOGPR fusion
mogpr = MOGPRTransformer()
fused_data = mogpr.fit_transform(fusets_data)

# Extract phenological metrics
phenology_metrics = phenology(fused_data['S2ndvi'])

# Access results
sos_times = phenology_metrics.da_sos_times
eos_times = phenology_metrics.da_eos_times

print("FuseTS processing completed!")
'''
    
    script_file = os.path.join(OUTPUT_DIR, 'fusets_processing_example.py')
    with open(script_file, 'w') as f:
        f.write(script_content)
    
    return script_file

# Create example script
example_script = create_example_usage_script()
print(f"Example FuseTS processing script created: {example_script}")

# If we have local data, prepare it for FuseTS
if 'local_data' in locals() and local_data is not None:
    print("\nPreparing local data for FuseTS...")
    fusets_ready_data = prepare_fusets_format(local_data)
    
    # Save FuseTS-ready data
    fusets_output = os.path.join(OUTPUT_DIR, f'S1_S2_timeseries_{YEAR}_fusets_ready.nc')
    fusets_ready_data.to_netcdf(fusets_output)
    print(f"FuseTS-ready data saved to: {fusets_output}")
    
    # Display data structure
    print("\nFuseTS-ready data structure:")
    print(fusets_ready_data)
    
    print("\nThis data is now ready for the MOGPR fusion notebook!")
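
A multi-band GeoTIFF arrives as a single `(band, y, x)` stack, whereas FuseTS expects one `(t, y, x)` array per variable. The NumPy sketch below shows the regrouping on a synthetic stack. It assumes the bands are ordered period by period as VV, VH, S2ndvi, which should be confirmed from the band names of the actual file; `regroup_bands` is a hypothetical helper.

```python
import numpy as np

VARIABLES = ('VV', 'VH', 'S2ndvi')

def regroup_bands(stack, variables=VARIABLES):
    """Split a (band, y, x) stack into one (t, y, x) array per variable."""
    n_bands, height, width = stack.shape
    if n_bands % len(variables) != 0:
        raise ValueError(
            f"{n_bands} bands cannot be split evenly into {len(variables)} variables"
        )
    n_periods = n_bands // len(variables)
    # Bands are assumed to be ordered as (period, variable), so reshape accordingly
    cube = stack.reshape(n_periods, len(variables), height, width)
    return {name: cube[:, i] for i, name in enumerate(variables)}

# Synthetic stack: 2 periods x 3 variables on a 2 x 2 pixel grid
stack = np.arange(6 * 2 * 2, dtype=float).reshape(6, 2, 2)
grouped = regroup_bands(stack)
print({name: arr.shape for name, arr in grouped.items()})
```

Each resulting array can then be wrapped in an `xarray.DataArray` with dims `('t', 'y', 'x')` and collected into a Dataset before calling `prepare_fusets_format`.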

## 10. Summary and Next Steps

### What this notebook accomplishes:

1. **Temporal Strategy**: Creates 31 composite periods for 2024: 30 full 12-day periods plus a final period clipped at December 31
2. **Data Collection**: Extracts S1 (VV, VH) and S2 (NDVI) data from Google Earth Engine
3. **Cloud Processing**: Uses GEE's computational power for large-scale data processing
4. **Export Options**: Provides both individual period files and combined multi-band exports
5. **Local Processing**: For small areas, extracts data directly without export/download
6. **FuseTS Preparation**: Converts data to the exact format needed for MOGPR processing

### Temporal Coverage:
- **Period 1**: 2024-01-01 to 2024-01-12
- **Period 2**: 2024-01-13 to 2024-01-24
- **Period 3**: 2024-01-25 to 2024-02-05
- **...**
- **Period 31**: 2024-12-26 to 2024-12-31 (clipped at year end)

### Next Steps:

1. **Download Data**: Monitor exports at https://code.earthengine.google.com/tasks
2. **Load in FuseTS**: Use the exported GeoTIFF files with the MOGPR fusion notebook
3. **Apply MOGPR**: Run the S1+S2 fusion using the prepared time series
4. **Phenological Analysis**: Extract seasonal metrics from the fused data

### File Outputs:
- **Data**: S1_S2_TimeSeries_2024.tif (or individual period files)
- **Metadata**: processing_metadata_2024.json
- **Timeline**: period_timeline_2024.png
- **Example Script**: fusets_processing_example.py

The exported data is now ready for the FuseTS MOGPR processing workflow!