# Sentinel-1 & Sentinel-2 Temporal Dataset Creation

This notebook shows how to build a dataset of temporally aligned Sentinel-1 (SAR) and Sentinel-2 (optical) images. Each pair combines acquisitions over the same area that were taken within a few days of each other.

## Features
- Fetch Sentinel-1 and Sentinel-2 images for any region of interest (ROI)
- Filter by date range (2016 onwards)
- Filter Sentinel-2 by cloud coverage (<5% by default)
- Match Sentinel-1 and Sentinel-2 images acquired within a configurable time window (3 days in this notebook)
- Export matched pairs to Google Drive
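
The matching step can be pictured as a nearest-neighbour search in time. The sketch below is a minimal pure-Python illustration and assumes each acquisition has been reduced to a calendar date. `match_within_window` and the `s1_date` key are hypothetical names, and `create_dataset` may resolve ties or the reuse of Sentinel-1 images differently.

```python
from datetime import date

def match_within_window(s1_dates, s2_dates, max_days=3):
    """Pair each S2 date with its nearest S1 date; keep pairs at most max_days apart."""
    pairs = []
    for s2 in sorted(s2_dates):
        # Closest S1 acquisition in time (first one wins on ties); None if no S1 dates
        nearest = min(s1_dates, key=lambda s1: abs((s1 - s2).days), default=None)
        if nearest is None:
            continue
        diff = abs((nearest - s2).days)
        if diff <= max_days:
            pairs.append({"s2_date": s2, "s1_date": nearest, "time_diff_days": diff})
    return pairs

# Hypothetical acquisition dates, for illustration only
s1 = [date(2023, 1, 2), date(2023, 1, 14), date(2023, 1, 26)]
s2 = [date(2023, 1, 3), date(2023, 1, 8), date(2023, 1, 28)]
for p in match_within_window(s1, s2):
    print(p)
```

In this sketch, a single Sentinel-1 image may pair with several Sentinel-2 images. A stricter one-to-one variant would remove each Sentinel-1 date from the pool once it has been used.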

## 1. Setup and Initialization

In [None]:
# Import required libraries
import ee
import pandas as pd
import json
from sentinel_dataset import (
    initialize_earth_engine,
    create_dataset,
    export_matched_images,
    get_sample_roi_geometry
)

# Initialize Earth Engine
initialize_earth_engine()

## 2. Define Region of Interest (ROI)

You can define your ROI in multiple ways:
- Rectangle: `ee.Geometry.Rectangle([lon_min, lat_min, lon_max, lat_max])`
- Point with buffer: `ee.Geometry.Point([lon, lat]).buffer(distance_meters)`
- Polygon: `ee.Geometry.Polygon([[[lon1, lat1], [lon2, lat2], ...]])`
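
All three constructors take coordinates in `[longitude, latitude]` order. Swapping them is a common way to place an ROI in the wrong part of the world. The hypothetical helper below uses plain Python with no Earth Engine calls. It lists the closed ring of corners spanned by a `[lon_min, lat_min, lon_max, lat_max]` rectangle, and such a ring could be passed to `ee.Geometry.Polygon([ring])`. It also catches swapped input, but only when the would-be latitudes fall outside ±90°.

```python
def rectangle_ring(lon_min, lat_min, lon_max, lat_max):
    """Return the closed [lon, lat] ring for a lon/lat bounding box (hypothetical helper)."""
    if not (-180 <= lon_min < lon_max <= 180 and -90 <= lat_min < lat_max <= 90):
        raise ValueError("expected lon_min < lon_max and lat_min < lat_max, in degrees")
    return [
        [lon_min, lat_min],
        [lon_max, lat_min],
        [lon_max, lat_max],
        [lon_min, lat_max],
        [lon_min, lat_min],  # repeat the first vertex to close the ring
    ]

ring = rectangle_ring(-122.5, 37.5, -122.0, 38.0)
print(ring)

# Latitude-first input puts -122.5 in a latitude slot and is rejected
try:
    rectangle_ring(37.5, -122.5, 38.0, -122.0)
except ValueError as err:
    print("rejected:", err)
```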

In [None]:
# Option 1: Use sample ROI (San Francisco Bay Area)
roi = get_sample_roi_geometry()

# Option 2: Define custom rectangular ROI
# roi = ee.Geometry.Rectangle([-122.5, 37.5, -122.0, 38.0])

# Option 3: Define point-based ROI with buffer
# roi = ee.Geometry.Point([-122.4, 37.8]).buffer(10000)  # 10 km radius

# Option 4: Define custom polygon ROI
# (vertices are [lon, lat]; this example covers an area in southern Portugal)
# roi = ee.Geometry.Polygon([
#     [[-8.4, 37.8], [-8.4, 37.7], [-8.3, 37.7], [-8.3, 37.8]]
# ])

print("ROI defined successfully!")
print(f"ROI bounds: {roi.bounds().getInfo()['coordinates']}")
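
`buffer()` takes a radius in meters, but the bounds printed above are in degrees. For a rough sanity check, the sketch below converts a radius to degrees. It uses the spherical approximation of about 111,320 m per degree of latitude and scales longitude degrees by 1/cos(latitude). This only helps you read the printed bounds; it is not how Earth Engine computes buffers. `buffer_extent_deg` is a hypothetical helper.

```python
import math

METERS_PER_DEG_LAT = 111_320  # approximate value on a spherical Earth

def buffer_extent_deg(lat, radius_m):
    """Approximate half-extent of a circular buffer as (degrees latitude, degrees longitude)."""
    dlat = radius_m / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with cos(latitude), so more degrees cover the same meters
    dlon = radius_m / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    return dlat, dlon

dlat, dlon = buffer_extent_deg(37.8, 10_000)
print(f"~±{dlat:.3f}° latitude, ~±{dlon:.3f}° longitude")
```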

## 3. Create Temporally-Aligned Dataset

In [None]:
# Set parameters
START_DATE = '2023-01-01'
END_DATE = '2023-12-31'
CLOUD_PERCENTAGE = 5.0  # Maximum 5% cloud coverage
MAX_TIME_DIFF = 3       # Maximum 3 days between S1 and S2
OUTPUT_DIR = './sentinel_dataset'

# Create dataset
s1_collection, s2_collection, matched_pairs = create_dataset(
    roi=roi,
    start_date=START_DATE,
    end_date=END_DATE,
    cloud_percentage=CLOUD_PERCENTAGE,
    max_time_diff_days=MAX_TIME_DIFF,
    output_dir=OUTPUT_DIR
)
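
Earth Engine's `ee.ImageCollection.filterDate(start, end)` treats `end` as exclusive. This notebook does not show whether `create_dataset` compensates for that. If it forwards `END_DATE` unchanged, scenes acquired on 31 December would be left out.

The pure-Python sketch below mimics a half-open date filter combined with the cloud threshold. It runs on hypothetical scene records whose `date` and `cloud_pct` fields are invented for illustration. It also converts an inclusive end date into the exclusive bound.

```python
from datetime import date, timedelta

def exclusive_end(inclusive_end: str) -> str:
    """Turn an inclusive end date into the exclusive bound that filterDate expects."""
    return (date.fromisoformat(inclusive_end) + timedelta(days=1)).isoformat()

def filter_scenes(scenes, start, end_exclusive, max_cloud):
    """Keep scenes dated in [start, end_exclusive) with cloud cover strictly below max_cloud."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end_exclusive)
    return [s for s in scenes if lo <= s["date"] < hi and s["cloud_pct"] < max_cloud]

scenes = [  # hypothetical Sentinel-2 scene records
    {"date": date(2023, 6, 1), "cloud_pct": 2.0},
    {"date": date(2023, 6, 6), "cloud_pct": 40.0},
    {"date": date(2023, 12, 31), "cloud_pct": 1.0},
]

end = exclusive_end("2023-12-31")  # becomes '2024-01-01'
kept = filter_scenes(scenes, "2023-01-01", end, max_cloud=5.0)
print(kept)
```

Using `"2023-12-31"` directly as the exclusive bound would drop the 31 December scene.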

## 4. Analyze Matched Pairs

In [None]:
from IPython.display import display  # built into Jupyter; explicit import for other runners

if matched_pairs:
    # Convert to DataFrame for analysis
    df = pd.DataFrame(matched_pairs)
    
    print(f"Total matched pairs: {len(matched_pairs)}")
    print("\nFirst 5 matched pairs:")
    display(df.head())
    
    print("\nTime difference statistics (days):")
    print(df['time_diff_days'].describe())
    
    # Temporal distribution by month of the Sentinel-2 acquisition
    df['s2_date'] = pd.to_datetime(df['s2_date'])
    df['month'] = df['s2_date'].dt.month
    
    print("\nMonthly distribution:")
    print(df['month'].value_counts().sort_index())
else:
    print("No matched pairs found. Try increasing cloud_percentage or max_time_diff_days.")
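
`value_counts()` lists only months that contain at least one pair. Months with no matches, such as persistently cloudy ones, therefore disappear from the monthly table and the bar chart. Reindexing over months 1 to 12 with `fill_value=0` keeps them visible. Here is a sketch on made-up dates standing in for `df['s2_date']`:

```python
import pandas as pd

# Hypothetical Sentinel-2 dates of matched pairs (stand-ins for df['s2_date'])
s2_dates = pd.to_datetime(["2023-01-10", "2023-01-22", "2023-03-05", "2023-07-19"])

monthly = (
    pd.Series(s2_dates.month)
    .value_counts()
    .reindex(range(1, 13), fill_value=0)  # months without pairs appear as 0
)
print(monthly.to_dict())
```

In the notebook, the same `.reindex(range(1, 13), fill_value=0)` call could follow `df['month'].value_counts()` in both the table and the bar chart.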

## 5. Visualize Time Differences

In [None]:
import matplotlib.pyplot as plt

if matched_pairs:
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Plot 1: Time difference distribution
    axes[0].hist(df['time_diff_days'], bins=20, edgecolor='black')
    axes[0].set_xlabel('Time Difference (days)')
    axes[0].set_ylabel('Frequency')
    axes[0].set_title('Temporal Difference Distribution')
    axes[0].grid(True, alpha=0.3)
    
    # Plot 2: Monthly distribution
    monthly_counts = df['month'].value_counts().sort_index()
    axes[1].bar(monthly_counts.index, monthly_counts.values)
    axes[1].set_xlabel('Month')
    axes[1].set_ylabel('Number of Matched Pairs')
    axes[1].set_title('Monthly Distribution of Matched Pairs')
    axes[1].set_xticks(range(1, 13))
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
else:
    print("No data to visualize.")
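
With `MAX_TIME_DIFF = 3`, `bins=20` splits a three-day range into 20 narrow bins. It has not been confirmed here that `time_diff_days` holds whole days; the helper could return fractional values. If it does hold whole days, bin edges centred on the integers give exactly one bar per day of difference:

```python
import numpy as np

MAX_TIME_DIFF = 3  # same value as in section 3

# Edges at -0.5, 0.5, ..., MAX_TIME_DIFF + 0.5: one bin centred on each whole day
day_bins = np.arange(-0.5, MAX_TIME_DIFF + 1, 1)

# Toy check with differences of 0, 1, 1 and 3 days
counts, _ = np.histogram([0, 1, 1, 3], bins=day_bins)
print(day_bins, counts)
```

Passing `bins=day_bins` to `axes[0].hist(...)` would replace `bins=20`. For fractional differences, an evenly spaced grid such as `np.linspace(0, MAX_TIME_DIFF, 13)` fits better.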

## 6. View Metadata

In [None]:
# Load and display metadata
metadata_file = f"{OUTPUT_DIR}/matched_pairs.json"

try:
    with open(metadata_file, 'r') as f:
        metadata = json.load(f)
    
    print("Dataset Metadata:")
    print("=" * 60)
    print(f"Start Date: {metadata['start_date']}")
    print(f"End Date: {metadata['end_date']}")
    print(f"Cloud Threshold: {metadata['cloud_percentage_threshold']}%")
    print(f"Max Time Difference: {metadata['max_time_diff_days']} days")
    print(f"S1 Orbit: {metadata['s1_orbit'] or 'Both'}")
    print(f"\nTotal S1 Images: {metadata['total_s1_images']}")
    print(f"Total S2 Images: {metadata['total_s2_images']}")
    print(f"Matched Pairs: {metadata['matched_pairs_count']}")
    print("=" * 60)
except FileNotFoundError:
    print(f"Metadata file not found: {metadata_file}")
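
The three totals above are easier to compare as ratios: matched pairs per Sentinel-1 image and per Sentinel-2 image. The small helper below is hypothetical and reads only keys this cell already prints. The numbers in the example are toy values, not results.

```python
def pairs_per_image(metadata):
    """Matched pairs per S1 image and per S2 image (0.0 when a total is zero)."""
    pairs = metadata["matched_pairs_count"]
    n_s1, n_s2 = metadata["total_s1_images"], metadata["total_s2_images"]
    return {
        "s1": pairs / n_s1 if n_s1 else 0.0,
        "s2": pairs / n_s2 if n_s2 else 0.0,
    }

example = {"total_s1_images": 30, "total_s2_images": 20, "matched_pairs_count": 12}  # toy numbers
print(pairs_per_image(example))  # {'s1': 0.4, 's2': 0.6}
```

One image can appear in several pairs, so these ratios can exceed 1 and should not be read as strict percentages.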

## 7. Export Matched Pairs (Optional)

⚠️ **Warning**: This will start export tasks to your Google Drive. Make sure you have enough storage space.

Each export runs as an Earth Engine task and writes its output to the Google Drive folder set by `EXPORT_FOLDER` below.
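
A back-of-the-envelope size estimate helps you judge whether Drive has enough room. The sketch assumes uncompressed 32-bit values and a band count you supply. This notebook does not show the bands, data type, or compression that `export_matched_images` uses. Each matched pair presumably yields two images, one Sentinel-1 and one Sentinel-2. `estimate_export_mb` is a hypothetical helper.

```python
def estimate_export_mb(width_m, height_m, scale_m=10, bands=1, bytes_per_value=4):
    """Rough uncompressed size of one exported image, in megabytes (1 MB = 1e6 bytes)."""
    pixels = (width_m / scale_m) * (height_m / scale_m)
    return pixels * bands * bytes_per_value / 1e6

# Example: a 20 km x 20 km ROI at 10 m with 4 float32 bands
print(f"{estimate_export_mb(20_000, 20_000, scale_m=10, bands=4):.0f} MB per image")
```

Halving `scale_m` quadruples the pixel count, so resolution dominates the estimate.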

In [None]:
# Export first 5 matched pairs to Google Drive
# UNCOMMENT THE FOLLOWING LINES TO START EXPORT

# NUM_PAIRS_TO_EXPORT = 5
# EXPORT_FOLDER = 'Sentinel_Dataset_2023'
# EXPORT_SCALE = 10  # output pixel size in meters

# if matched_pairs:
#     # Slicing may return fewer pairs than requested
#     pairs_to_export = matched_pairs[:NUM_PAIRS_TO_EXPORT]
#     export_matched_images(
#         matched_pairs=pairs_to_export,
#         s1_collection=s1_collection,
#         s2_collection=s2_collection,
#         roi=roi,
#         output_folder=EXPORT_FOLDER,
#         scale=EXPORT_SCALE,
#         export_to='drive'
#     )
#     print(f"\n✓ Export tasks submitted for {len(pairs_to_export)} pairs!")
#     print(f"✓ Check your Google Drive folder: {EXPORT_FOLDER}")
#     print("✓ You can monitor progress at: https://code.earthengine.google.com/tasks")
# else:
#     print("No matched pairs available for export.")

print("Export section ready. Uncomment the code above to start exporting.")

## 8. Advanced: Filter by Orbit Direction

In [None]:
# Create a dataset that uses only ascending Sentinel-1 passes.
# Ascending and descending passes look at the ground from opposite sides,
# so restricting to one direction keeps the SAR viewing geometry consistent.
# Reuses the parameters defined in section 3.

# s1_asc, s2_asc, pairs_asc = create_dataset(
#     roi=roi,
#     start_date=START_DATE,
#     end_date=END_DATE,
#     cloud_percentage=CLOUD_PERCENTAGE,
#     max_time_diff_days=MAX_TIME_DIFF,
#     s1_orbit='ASCENDING',  # only ascending passes
#     output_dir='./sentinel_dataset_ascending'
# )

print("Uncomment the code above to create orbit-specific datasets.")
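
In the Earth Engine `COPERNICUS/S1_GRD` collection, each image stores its pass direction in the `orbitProperties_pass` property, with the value `'ASCENDING'` or `'DESCENDING'`. Whether `create_dataset` reads that collection and filters on this property when `s1_orbit` is set is an assumption. Before committing to one direction, it can help to count how many pairs each direction contributes. The plain-Python sketch below runs over hypothetical pair records, and its `s1_orbit_pass` key is invented.

```python
from collections import Counter

# Hypothetical matched-pair records carrying the Sentinel-1 pass direction
toy_pairs = [
    {"s1_orbit_pass": "ASCENDING", "time_diff_days": 1},
    {"s1_orbit_pass": "DESCENDING", "time_diff_days": 2},
    {"s1_orbit_pass": "ASCENDING", "time_diff_days": 0},
]

# Count pairs per direction and pick the one that contributes the most
counts = Counter(p["s1_orbit_pass"] for p in toy_pairs)
best = counts.most_common(1)[0][0]
print(dict(counts), "->", best)
```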

## 9. Summary and Next Steps

### What you've accomplished:
1. ✓ Initialized Google Earth Engine
2. ✓ Defined a Region of Interest
3. ✓ Created a temporally-aligned Sentinel-1/Sentinel-2 dataset
4. ✓ Analyzed matched pairs statistics
5. ✓ Generated metadata for the dataset

### Next steps:
- Export matched pairs to Google Drive for analysis
- Use the imagery for change detection, machine learning, or monitoring
- Experiment with different ROIs, date ranges, and parameters
- Combine with other Earth Engine datasets for enhanced analysis

### Useful Resources:
- [Google Earth Engine Code Editor](https://code.earthengine.google.com/)
- [Task Manager](https://code.earthengine.google.com/tasks) - Monitor export progress
- [Sentinel-1 Documentation](https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-1-sar)
- [Sentinel-2 Documentation](https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi)