# Sentinel-1 & Sentinel-2 Temporal Dataset Creation
## Microsoft Planetary Computer Version

This notebook demonstrates how to create temporally-aligned datasets combining Sentinel-1 (SAR) and Sentinel-2 (optical) imagery using Microsoft Planetary Computer.

## Features
- Fetch Sentinel-1 and Sentinel-2 images for any bounding box
- Filter by date range (2016 onwards)
- Filter Sentinel-2 by cloud coverage (<5% default)
- Match Sentinel-1 and Sentinel-2 acquisitions within a configurable temporal window (3 days in this notebook)
- Download matched pairs to local storage
- No authentication required!
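The fetch-and-filter steps listed above correspond to a STAC search. Below is a minimal sketch of how such a query could be issued with `pystac-client` against the Planetary Computer STAC API. It assumes the `sentinel-1-grd` and `sentinel-2-l2a` collections and a recent `pystac-client` that provides `ItemSearch.items()`. The helper names `build_search_kwargs` and `search_items` are hypothetical, and this is not necessarily how `sentinel_dataset_mpc` implements the search.

```python
STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def build_search_kwargs(collection, bbox, start_date, end_date, max_cloud=None):
    """Build keyword arguments for pystac_client.Client.search (hypothetical helper)."""
    kwargs = {
        "collections": [collection],
        "bbox": list(bbox),  # (min_lon, min_lat, max_lon, max_lat) in WGS84 degrees
        "datetime": f"{start_date}/{end_date}",  # closed date interval
    }
    if max_cloud is not None:
        # eo:cloud_cover exists on Sentinel-2 items; SAR items have no cloud cover
        kwargs["query"] = {"eo:cloud_cover": {"lt": max_cloud}}
    return kwargs


def search_items(collection, bbox, start_date, end_date, max_cloud=None):
    """Run the search and return a list of pystac Items (requires network access)."""
    import planetary_computer
    import pystac_client

    catalog = pystac_client.Client.open(
        STAC_URL,
        modifier=planetary_computer.sign_inplace,  # anonymous signing of asset URLs
    )
    search = catalog.search(
        **build_search_kwargs(collection, bbox, start_date, end_date, max_cloud)
    )
    return list(search.items())
```

For example, `search_items("sentinel-2-l2a", bbox, "2023-06-01", "2023-08-31", max_cloud=5.0)` would return the low-cloud Sentinel-2 scenes for a bounding box. Keeping the parameter building separate from the network call makes the filters easy to inspect without querying the API.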

## 0. Fix Import Path (Run This First!)

This cell adds the notebook's working directory to `sys.path` so that Python can find the `sentinel_dataset_mpc` module.

In [None]:
import sys
import os

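# Add the notebook's working directory to the Python path.
# Notebooks have no __file__, so the working directory is used instead.
current_dir = os.getcwd()
if current_dir not in sys.path:
    sys.path.insert(0, current_dir)
    print(f"Added to Python path: {current_dir}")
else:
    print(f"Already on Python path: {current_dir}")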
print("\nReady to import sentinel_dataset_mpc!")
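If the notebook is launched from a subdirectory, the working directory may not contain the module. Below is a sketch of a hypothetical helper, `add_module_dir_to_path`, that searches the working directory and a few parent directories for the module. The module can be a `.py` file or a package directory. The helper prepends the first matching directory to `sys.path`.

```python
import sys
from pathlib import Path


def add_module_dir_to_path(module_name, start=None, max_levels=3):
    """Search `start` and up to `max_levels` parent directories for `module_name`.

    `module_name` may be a file (e.g. "sentinel_dataset_mpc.py") or a package
    directory. The first directory containing it is prepended to sys.path and
    returned; None is returned if nothing is found.
    """
    here = Path(start or Path.cwd()).resolve()
    for candidate in [here, *here.parents][: max_levels + 1]:
        if (candidate / module_name).exists():
            path = str(candidate)
            if path not in sys.path:
                sys.path.insert(0, path)
            return candidate
    return None
```

Usage: `add_module_dir_to_path("sentinel_dataset_mpc.py")` before running the import cell.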

## 1. Setup and Imports

If you get import errors, try installing dependencies:
```bash
pip install planetary-computer pystac-client pandas rasterio shapely xarray matplotlib
```
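Before running the import cell, you can list which of these dependencies are missing. The following sketch uses `importlib.util.find_spec`. The import names `planetary_computer` and `pystac_client` differ from their pip names, so the hypothetical `REQUIRED` table maps one to the other.

```python
import importlib.util

# import name -> pip package name
REQUIRED = {
    "planetary_computer": "planetary-computer",
    "pystac_client": "pystac-client",
    "pandas": "pandas",
    "rasterio": "rasterio",
    "shapely": "shapely",
    "xarray": "xarray",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]
```

Usage: `missing = missing_packages()`, then `print("pip install " + " ".join(missing))` if the list is not empty.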

In [None]:
# Import required libraries
import json

import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import display  # renders DataFrames in section 4

try:
    from sentinel_dataset_mpc import (
        create_dataset,
        export_matched_pairs,
        get_sample_bbox
    )
    print("✓ Imports successful!")
    print("✓ Microsoft Planetary Computer - No authentication required!")
except ImportError as e:
    print(f"❌ Import Error: {e}")
    print("\nPlease install required packages:")
    print("pip install planetary-computer pystac-client pandas rasterio shapely xarray matplotlib")
    raise

## 2. Define Bounding Box

A bounding box is given as `(min_lon, min_lat, max_lon, max_lat)` in decimal degrees (WGS84).

You can:
- Use the sample bbox (San Francisco Bay Area)
- Define a custom rectangular bbox
- Create a bbox from a point with buffer

In [None]:
# Option 1: Use sample bbox (San Francisco Bay Area)
bbox = get_sample_bbox()

# Option 2: Define custom bbox
# bbox = (-122.5, 37.5, -122.0, 38.0)

# Option 3: Create bbox from point with buffer
# lon, lat = -122.4, 37.8  # San Francisco
# buffer = 0.05  # degrees: ~5.6 km north-south, ~4.4 km east-west at this latitude
# bbox = (lon - buffer, lat - buffer, lon + buffer, lat + buffer)

# Option 4: Custom reservoir area
# bbox = (-8.4, 37.7, -8.3, 37.8)  # Portugal reservoir

print(f"Bounding Box defined: {bbox}")
print(f"  Min Longitude: {bbox[0]}°")
print(f"  Min Latitude: {bbox[1]}°")
print(f"  Max Longitude: {bbox[2]}°")
print(f"  Max Latitude: {bbox[3]}°")
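Option 3 uses a fixed buffer in degrees. A degree of longitude shrinks away from the equator, so this buffer covers less distance east-west than north-south. The sketch below defines a hypothetical helper, `bbox_from_point_km`, that converts a buffer in kilometres to degrees. It assumes a spherical Earth with about 111.32 km per degree of latitude and does not handle boxes that cross the antimeridian. It also includes a simple range check, `validate_bbox`.

```python
import math

KM_PER_DEG_LAT = 111.32  # spherical-Earth approximation


def validate_bbox(bbox):
    """Raise ValueError unless bbox is an ordered (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    if not -180 <= min_lon < max_lon <= 180:
        raise ValueError(f"invalid longitudes: {min_lon}, {max_lon}")
    if not -90 <= min_lat < max_lat <= 90:
        raise ValueError(f"invalid latitudes: {min_lat}, {max_lat}")
    return bbox


def bbox_from_point_km(lon, lat, buffer_km):
    """Bounding box extending buffer_km from (lon, lat) in each direction."""
    dlat = buffer_km / KM_PER_DEG_LAT
    # One degree of longitude spans KM_PER_DEG_LAT * cos(latitude) kilometres
    dlon = buffer_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return validate_bbox((lon - dlon, lat - dlat, lon + dlon, lat + dlat))
```

Usage: `bbox = bbox_from_point_km(-122.4, 37.8, 5.0)` gives a box extending roughly 5 km from the point in every direction.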

## 3. Create Temporally-Aligned Dataset

In [None]:
# Set parameters
START_DATE = '2023-06-01'
END_DATE = '2023-08-31'     # Summer 2023
CLOUD_PERCENTAGE = 5.0      # Maximum 5% cloud coverage
MAX_TIME_DIFF = 3           # Maximum 3 days between S1 and S2
OUTPUT_DIR = './sentinel_dataset_mpc'

# Create dataset
s1_items, s2_items, matched_pairs = create_dataset(
    bbox=bbox,
    start_date=START_DATE,
    end_date=END_DATE,
    cloud_percentage=CLOUD_PERCENTAGE,
    max_time_diff_days=MAX_TIME_DIFF,
    output_dir=OUTPUT_DIR
)
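`create_dataset` performs the S1/S2 matching internally. As an illustration only, the sketch below shows a nearest-neighbour rule that pairs each Sentinel-2 scene with the closest Sentinel-1 scene inside the window. The actual rule in `sentinel_dataset_mpc` may differ; for example, it may forbid reusing one S1 scene in several pairs, which this sketch allows. The function name `match_nearest` and its `(item_id, datetime)` input format are assumptions.

```python
from datetime import timedelta


def match_nearest(s1_scenes, s2_scenes, max_days):
    """Pair each S2 scene with the closest S1 scene within max_days.

    s1_scenes, s2_scenes: lists of (item_id, datetime) tuples.
    One S1 scene may be reused for several S2 scenes.
    """
    window = timedelta(days=max_days)
    pairs = []
    for s2_id, s2_dt in s2_scenes:
        best = None  # (s1_id, time difference)
        for s1_id, s1_dt in s1_scenes:
            diff = abs(s2_dt - s1_dt)
            if diff <= window and (best is None or diff < best[1]):
                best = (s1_id, diff)
        if best is not None:
            pairs.append({
                "s1_id": best[0],
                "s2_id": s2_id,
                "time_diff_days": best[1].total_seconds() / 86400,
            })
    return pairs
```

The double loop is O(n·m). For long date ranges, sorting the S1 dates and using `bisect` to find neighbours would scale better.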

## 4. Analyze Matched Pairs

In [None]:
if matched_pairs:
    # Convert to DataFrame for analysis
    df = pd.DataFrame([{
        's1_id': p['s1_id'],
        's1_date': p['s1_date'],
        's1_orbit': p['s1_orbit'],
        's2_id': p['s2_id'],
        's2_date': p['s2_date'],
        's2_cloud_cover': p['s2_cloud_cover'],
        'time_diff_days': p['time_diff_days']
    } for p in matched_pairs])
    
    print(f"Total matched pairs: {len(matched_pairs)}")
    print(f"\nFirst 5 matched pairs:")
    display(df.head())
    
    print(f"\n{'='*60}")
    print("Time difference statistics (days):")
    print(f"{'='*60}")
    print(df['time_diff_days'].describe())
    
    print(f"\n{'='*60}")
    print("Cloud cover statistics (%):")
    print(f"{'='*60}")
    print(df['s2_cloud_cover'].describe())
    
    print(f"\n{'='*60}")
    print("Orbit distribution:")
    print(f"{'='*60}")
    print(df['s1_orbit'].value_counts())
else:
    print("No matched pairs found. Try:")
    print("  - Increasing cloud_percentage")
    print("  - Increasing max_time_diff_days")
    print("  - Expanding the date range")
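The separate `describe()` and `value_counts()` calls above can also be combined into one per-orbit table. Below is a sketch using pandas named aggregation. It assumes `df` has the columns built in the cell above, and the function name `summarize_pairs` is hypothetical.

```python
import pandas as pd


def summarize_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """One row per Sentinel-1 orbit direction with pair count and quality stats."""
    return (
        df.groupby("s1_orbit")
        .agg(
            pairs=("s2_id", "count"),
            mean_time_diff_days=("time_diff_days", "mean"),
            max_cloud_cover=("s2_cloud_cover", "max"),
        )
        .sort_index()
    )
```

Usage: `display(summarize_pairs(df))`.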

## 5. Visualize Dataset Characteristics

In [None]:
if matched_pairs:
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Plot 1: Time difference distribution
    axes[0, 0].hist(df['time_diff_days'], bins=20, edgecolor='black', color='steelblue')
    axes[0, 0].set_xlabel('Time Difference (days)', fontsize=12)
    axes[0, 0].set_ylabel('Frequency', fontsize=12)
    axes[0, 0].set_title('Temporal Difference Distribution', fontsize=14, fontweight='bold')
    axes[0, 0].grid(True, alpha=0.3)
    
    # Plot 2: Cloud cover distribution
    axes[0, 1].hist(df['s2_cloud_cover'], bins=20, edgecolor='black', color='coral')
    axes[0, 1].set_xlabel('Cloud Cover (%)', fontsize=12)
    axes[0, 1].set_ylabel('Frequency', fontsize=12)
    axes[0, 1].set_title('Sentinel-2 Cloud Coverage', fontsize=14, fontweight='bold')
    axes[0, 1].grid(True, alpha=0.3)
    
    # Plot 3: Orbit direction distribution
    orbit_counts = df['s1_orbit'].value_counts()
    axes[1, 0].bar(orbit_counts.index, orbit_counts.values, color=['#1f77b4', '#ff7f0e'])
    axes[1, 0].set_xlabel('Orbit Direction', fontsize=12)
    axes[1, 0].set_ylabel('Number of Pairs', fontsize=12)
    axes[1, 0].set_title('Sentinel-1 Orbit Distribution', fontsize=14, fontweight='bold')
    axes[1, 0].grid(True, alpha=0.3, axis='y')
    
    # Plot 4: Matched pairs per Sentinel-2 acquisition day
    # (a separate Series avoids adding a helper column to df, which is exported in section 9)
    s2_days = pd.to_datetime(df['s2_date']).dt.normalize()
    daily_counts = s2_days.value_counts().sort_index()
    axes[1, 1].plot(daily_counts.index, daily_counts.values, marker='o', linestyle='-', color='green')
    axes[1, 1].set_xlabel('Sentinel-2 Acquisition Date', fontsize=12)
    axes[1, 1].set_ylabel('Number of Pairs', fontsize=12)
    axes[1, 1].set_title('Matched Pairs per Day', fontsize=14, fontweight='bold')
    axes[1, 1].grid(True, alpha=0.3)
    axes[1, 1].tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()
else:
    print("No data to visualize.")

## 6. View Detailed Metadata

In [None]:
# Load and display metadata
metadata_file = f"{OUTPUT_DIR}/matched_pairs.json"

try:
    with open(metadata_file, 'r') as f:
        metadata = json.load(f)
    
    print("="*70)
    print("DATASET METADATA")
    print("="*70)
    print(f"Data Source: {metadata['source']}")
    print(f"Bounding Box: {metadata['bbox']}")
    print(f"Start Date: {metadata['start_date']}")
    print(f"End Date: {metadata['end_date']}")
    print(f"Cloud Threshold: {metadata['cloud_percentage_threshold']}%")
    print(f"Max Time Difference: {metadata['max_time_diff_days']} days")
    print(f"Orbit Filter: {metadata['orbit_direction'] or 'Both'}")
    print("-"*70)
    print(f"Total Sentinel-1 Images: {metadata['total_s1_images']}")
    print(f"Total Sentinel-2 Images: {metadata['total_s2_images']}")
    print(f"Matched Pairs: {metadata['matched_pairs_count']}")
    print("="*70)
    
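    if metadata['matched_pairs_count'] > 0:
        # Pairs per image in the smaller collection; this can exceed 100%
        # if one image appears in more than one pair
        match_rate = (metadata['matched_pairs_count'] /
                      min(metadata['total_s1_images'], metadata['total_s2_images']) * 100)
        print(f"\nMatching rate (pairs / smaller image count): {match_rate:.1f}%")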
        
except FileNotFoundError:
    print(f"Metadata file not found: {metadata_file}")
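The success rate above divides the number of pairs by the smaller image count. It can therefore exceed 100% when one image takes part in several pairs. The sketch below shows an alternative that counts distinct scenes instead. It assumes each pair dict carries `s1_id` and `s2_id`, as in section 4, and the name `pairing_coverage` is hypothetical.

```python
def pairing_coverage(pairs, total_s1, total_s2):
    """Percentage of S1 and S2 scenes that appear in at least one matched pair."""
    used_s1 = {p["s1_id"] for p in pairs}
    used_s2 = {p["s2_id"] for p in pairs}

    def pct(used, total):
        return 100.0 * len(used) / total if total else 0.0

    return {"s1_pct": pct(used_s1, total_s1), "s2_pct": pct(used_s2, total_s2)}
```

Usage: `pairing_coverage(matched_pairs, metadata['total_s1_images'], metadata['total_s2_images'])`. Both values stay between 0 and 100.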

## 7. Download Matched Pairs (Optional)

⚠️ **Warning**: This will download satellite imagery to your local machine. Make sure you have enough disk space.

**Estimated sizes:**
- Each Sentinel-2 RGB image: ~50-200 MB (depends on area)
- Each Sentinel-1 image: ~20-100 MB (depends on area)
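Because these sizes depend on the area, a rough uncompressed estimate for a raster clipped to your bounding box can be computed from the pixel count. This is a sketch under stated assumptions. It uses a spherical-Earth conversion of about 111,320 m per degree and the hypothetical helper name `estimate_raster_mb`. Use 2 bytes per sample for uint16 data and 4 bytes for float32 data. The notebook does not state whether `export_matched_pairs` clips to the bounding box or writes full tiles, or how it compresses output, so real file sizes may differ.

```python
import math

M_PER_DEG_LAT = 111_320  # spherical-Earth approximation


def estimate_raster_mb(bbox, resolution_m, n_bands, bytes_per_sample):
    """Rough uncompressed size (MB) of a raster covering bbox."""
    min_lon, min_lat, max_lon, max_lat = bbox
    mid_lat = math.radians((min_lat + max_lat) / 2)
    width_m = (max_lon - min_lon) * M_PER_DEG_LAT * math.cos(mid_lat)
    height_m = (max_lat - min_lat) * M_PER_DEG_LAT
    n_pixels = math.ceil(width_m / resolution_m) * math.ceil(height_m / resolution_m)
    return n_pixels * n_bands * bytes_per_sample / 1e6
```

Usage: `estimate_raster_mb(bbox, 10, 3, 2)` estimates a 10 m, three-band uint16 RGB clip.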

In [None]:
# Download first 3 matched pairs
# UNCOMMENT THE FOLLOWING LINES TO START DOWNLOAD

# NUM_PAIRS = 3
# OUTPUT_IMAGES_DIR = f'{OUTPUT_DIR}/images'

# if matched_pairs:
#     print(f"Downloading {NUM_PAIRS} matched pairs...\n")
#     export_matched_pairs(
#         matched_pairs=matched_pairs[:NUM_PAIRS],
#         output_dir=OUTPUT_IMAGES_DIR,
#         s2_bands=['B04', 'B03', 'B02'],  # RGB bands (Red, Green, Blue)
#         s1_bands=['vh', 'vv']             # SAR polarizations
#     )
#     print(f"\n✓ Download complete!")
#     print(f"✓ Images saved to: {OUTPUT_IMAGES_DIR}")
# else:
#     print("No matched pairs available for download.")

print("Download section ready. Uncomment the code above to start downloading.")

## 8. Advanced: Custom Band Selection

Sentinel-2 bands can be combined into composites for different purposes:
- **True color (RGB)**: ['B04', 'B03', 'B02'] - Red, Green, Blue
- **False color (NIR)**: ['B08', 'B04', 'B03'] - NIR, Red, Green; vegetation appears red
- **SWIR**: ['B12', 'B11', 'B04'] - SWIR-2, SWIR-1, Red; geology and moisture
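B02, B03, B04 and B08 are 10 m bands, while B11 and B12 are 20 m bands. A composite such as SWIR therefore mixes resolutions, and its bands must be resampled before stacking. The sketch below keeps the composites in one place and flags such mixes. The names `COMPOSITES`, `NATIVE_RES_M`, `composite_bands` and `mixes_resolutions` are hypothetical, and this notebook does not say whether `export_matched_pairs` resamples automatically.

```python
# Band combinations from the list above
COMPOSITES = {
    "rgb": ["B04", "B03", "B02"],
    "nir": ["B08", "B04", "B03"],
    "swir": ["B12", "B11", "B04"],
}

# Native resolution (m) of the Sentinel-2 bands used above
NATIVE_RES_M = {"B02": 10, "B03": 10, "B04": 10, "B08": 10, "B11": 20, "B12": 20}


def composite_bands(name):
    """Return a copy of the band list for a named composite (KeyError if unknown)."""
    return list(COMPOSITES[name])


def mixes_resolutions(bands):
    """True if the bands do not all share one native resolution."""
    return len({NATIVE_RES_M[b] for b in bands}) > 1
```

Usage: `export_matched_pairs(..., s2_bands=composite_bands("nir"), ...)`.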

In [None]:
# Example: Download with NIR band for vegetation analysis
# UNCOMMENT TO USE

# if matched_pairs:
#     export_matched_pairs(
#         matched_pairs=matched_pairs[:2],
#         output_dir='./vegetation_analysis',
#         s2_bands=['B08', 'B04', 'B03'],  # NIR, Red, Green
#         s1_bands=['vh', 'vv']
#     )

print("Uncomment the code above to download with custom bands.")

## 9. Export Metadata to CSV

In [None]:
if matched_pairs:
    # Export to CSV for external analysis
    csv_file = f'{OUTPUT_DIR}/matched_pairs.csv'
    df.to_csv(csv_file, index=False)
    print(f"✓ Metadata exported to: {csv_file}")
    print(f"✓ Total records: {len(df)}")
else:
    print("No data to export.")
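CSV stores dates as plain text. When you reload the file for later analysis, the following sketch restores the date columns as datetimes. It assumes the `s1_date` and `s2_date` column names written above, and `load_pairs_csv` is a hypothetical helper.

```python
import pandas as pd


def load_pairs_csv(path):
    """Reload matched-pair metadata with the date columns parsed as datetimes."""
    pairs = pd.read_csv(path)
    for col in ("s1_date", "s2_date"):
        pairs[col] = pd.to_datetime(pairs[col])
    return pairs
```

Usage: `pairs_df = load_pairs_csv(f'{OUTPUT_DIR}/matched_pairs.csv')`.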

## 10. Summary and Next Steps

### What you've accomplished:
1. ✓ Connected to Microsoft Planetary Computer (no auth needed!)
2. ✓ Defined a bounding box for your area of interest
3. ✓ Created a temporally-aligned Sentinel-1/Sentinel-2 dataset
4. ✓ Analyzed matched pairs statistics and visualizations
5. ✓ Generated metadata in JSON and CSV formats

### Next steps:
- Download matched pairs for local analysis
- Use the imagery for change detection or monitoring
- Train machine learning models with multi-modal data
- Experiment with different ROIs, date ranges, and parameters
- Calculate indices (NDVI, NDWI, etc.) from the downloaded imagery
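Both NDVI (NIR vs. red: B08, B04) and the McFeeters NDWI (green vs. NIR: B03, B08) are normalized differences. The sketch below works on NumPy arrays and returns NaN where the denominator is zero. It assumes the inputs are already surface reflectance. Sentinel-2 L2A digital numbers should first be converted using the scale and offset in the product metadata; from processing baseline 04.00, an additive offset of -1000 is applied to L2A values. The function names are hypothetical.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), with NaN where a + b == 0."""
    a = np.asarray(a, dtype="float64")
    b = np.asarray(b, dtype="float64")
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(nir, red):
    """NDVI from Sentinel-2 B08 (NIR) and B04 (red)."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """McFeeters NDWI from Sentinel-2 B03 (green) and B08 (NIR)."""
    return normalized_difference(green, nir)
```

The same helper works for other two-band indices. Only the band order changes.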

### Useful Resources:
- [Microsoft Planetary Computer](https://planetarycomputer.microsoft.com/)
- [Data Catalog](https://planetarycomputer.microsoft.com/catalog)
- [Sentinel-1 Documentation](https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-1-sar)
- [Sentinel-2 Documentation](https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi)
- [STAC API Specification](https://stacspec.org/)