# Satellite Image Analysis: Low Methane Concentration Areas

**Author:** Yuexin (Joy) Wang
**Date:** November 2025
**Purpose:** Predoctoral Research Assistant Application - Code Sample

---

## Overview

This notebook identifies low-methane concentration areas in the continental United States using Sentinel-5P atmospheric data and extracts corresponding high-resolution Sentinel-2 imagery for further analysis.

### Research Question
**Where are the lowest methane concentration areas in the US, and what do they look like from space?**

### Data Sources
- **Sentinel-5P**: Methane (CH4) column concentration data (~11km resolution)
- **Sentinel-2**: Multispectral optical imagery (10-20m resolution)
- **Study Period**: May 4, 2019
- **Study Area**: Continental United States

## 1. Setup and Configuration

In [None]:
# Import required libraries
import ee
import geemap
from datetime import datetime, timedelta
from typing import Tuple, List

In [None]:
# Configuration parameters
STUDY_DATE = '2019-05-04'
PROJECT_NAME = 'ee-epic-code-test'
METHANE_PERCENTILE = 10
CLOUD_THRESHOLD = 10
TARGET_RESOLUTION = 20  # meters
TARGET_CRS = 'EPSG:3857'
EXPORT_CRS = 'EPSG:4326'

# Sentinel-2 spectral bands of interest
SENTINEL2_BANDS = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B8A', 'B11', 'B12']

# Export configuration
EXPORT_FOLDER = 'EarthEngineExports'
MAX_PIXELS = 1e13

## 2. Initialize Google Earth Engine

In [None]:
def initialize_gee(project: str) -> None:
    """
    Initialize Google Earth Engine with authentication.

    Parameters:
    ----------
    project : str
        GEE project name
    """
    try:
        ee.Initialize(project=project)
        print(f"✓ Successfully initialized GEE project: {project}")
    except Exception as e:
        # Report the original error, then fall back to interactive authentication
        print(f"✗ Initialization failed ({e}). Attempting authentication...")
        ee.Authenticate()
        ee.Initialize(project=project)
        print(f"✓ Successfully authenticated and initialized GEE project: {project}")

In [None]:
# Initialize GEE
initialize_gee(PROJECT_NAME)

## 3. Define Helper Functions

In [None]:
def get_study_region() -> ee.FeatureCollection:
    """
    Define the study region (Continental United States).

    Returns:
    -------
    ee.FeatureCollection
        Feature collection representing US boundaries
    """
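    # Note: this filter matches every LSIB feature named 'United States', so the
    # region may extend beyond the contiguous states (e.g. Alaska and Hawaii).
    us_boundary = (ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017")
                   .filter(ee.Filter.eq('country_na', 'United States')))
    return us_boundary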


def get_date_range(date_str: str) -> Tuple[str, str]:
    """
    Convert single date to GEE-compatible date range.

    Parameters:
    ----------
    date_str : str
        Date in 'YYYY-MM-DD' format

    Returns:
    -------
    tuple
        (start_date, end_date) where end_date is exclusive
    """
    date_obj = datetime.strptime(date_str, '%Y-%m-%d')
    start = date_str
    end = (date_obj + timedelta(days=1)).strftime('%Y-%m-%d')
    return start, end

In [None]:
# Set up study parameters
study_region = get_study_region()
start_date, end_date = get_date_range(STUDY_DATE)

print("Study Region: Continental United States")
print(f"Study Date: {STUDY_DATE}")
print(f"Date Range for GEE: {start_date} to {end_date}")

## 4. Identify Low Methane Concentration Areas

This step uses Sentinel-5P data to identify pixels whose methane concentration falls below the 10th percentile of all values in the study region.

In [None]:
def identify_low_methane_areas(
    region: ee.FeatureCollection,
    start_date: str,
    end_date: str,
    percentile: int = 10
) -> ee.Image:
    """
    Identify areas with methane concentrations below specified percentile.

    Parameters:
    ----------
    region : ee.FeatureCollection
        Geographic region of interest
    start_date : str
        Start date (inclusive)
    end_date : str
        End date (exclusive)
    percentile : int
        Percentile threshold for low methane (default: 10)

    Returns:
    -------
    ee.Image
        Masked image showing only low-methane areas
    """
    # Load and filter Sentinel-5P methane data
    methane_collection = (
        ee.ImageCollection('COPERNICUS/S5P/OFFL/L3_CH4')
        .select('CH4_column_volume_mixing_ratio_dry_air')
        .filterDate(start_date, end_date)
        .filterBounds(region)
    )

    # Compute median methane concentration
    methane_image = methane_collection.median().clip(region)

    # Calculate percentile threshold
    percentile_threshold = methane_image.reduceRegion(
        reducer=ee.Reducer.percentile([percentile]),
        geometry=region.geometry(),
        scale=11132,  # Analysis scale in meters (~11 km)
        bestEffort=True,
        maxPixels=1e9
    ).get('CH4_column_volume_mixing_ratio_dry_air')

    # Mask out every pixel at or above the threshold (kept pixels retain their CH4 values)
    low_methane_mask = methane_image.updateMask(
        methane_image.lt(ee.Number(percentile_threshold))
    )

    print(f"✓ Identified low methane areas (< {percentile}th percentile)")
    return low_methane_mask
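
To make the thresholding logic concrete, the sketch below reproduces it locally on a small list of synthetic values, with no Earth Engine calls. The helper names `percentile` and `mask_below_threshold` are hypothetical, the values are made up for illustration, and the linear-interpolation rule is an assumption that may differ from the one `ee.Reducer.percentile` uses internally.

```python
from typing import List, Optional


def percentile(values: List[float], pct: float) -> float:
    """Linear-interpolation percentile of a non-empty list (pct in [0, 100])."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def mask_below_threshold(values: List[float], pct: float) -> List[Optional[float]]:
    """Keep values strictly below the percentile threshold; mask the rest as None."""
    threshold = percentile(values, pct)
    return [v if v < threshold else None for v in values]


# Synthetic CH4 column values in ppb (illustrative only, not real data)
synthetic_ch4 = [1800.0 + i for i in range(11)]
masked = mask_below_threshold(synthetic_ch4, 10)
kept = [v for v in masked if v is not None]
print(f"Threshold: {percentile(synthetic_ch4, 10)}, kept {len(kept)} of {len(masked)} values")
```

As in the Earth Engine version, the comparison is strict, so a pixel exactly equal to the threshold is masked out.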

In [None]:
# Identify low methane areas
print("[Step 1/4] Identifying low methane concentration areas...")
low_methane_mask = identify_low_methane_areas(
    region=study_region,
    start_date=start_date,
    end_date=end_date,
    percentile=METHANE_PERCENTILE
)

### Optional: Visualize Low Methane Areas

In [None]:
# Create an interactive map to visualize results
Map = geemap.Map()
Map.addLayer(low_methane_mask, {'palette': ['blue']}, 'Low Methane Areas')
Map.centerObject(study_region, 4)
Map

## 5. Extract Sentinel-2 Images

Extract Sentinel-2 images that overlap the identified low-methane areas, keeping only scenes whose `CLOUDY_PIXEL_PERCENTAGE` metadata value is below 10%.

In [None]:
def extract_sentinel2_images(
    region: ee.FeatureCollection,
    start_date: str,
    end_date: str,
    methane_mask: ee.Image,
    cloud_threshold: int = 10
) -> ee.ImageCollection:
    """
    Extract Sentinel-2 images corresponding to low-methane areas.

    Parameters:
    ----------
    region : ee.FeatureCollection
        Geographic region of interest
    start_date : str
        Start date (inclusive)
    end_date : str
        End date (exclusive)
    methane_mask : ee.Image
        Masked methane image whose valid pixels mark low-methane areas
    cloud_threshold : int
        Maximum cloud cover percentage (default: 10)

    Returns:
    -------
    ee.ImageCollection
        Sentinel-2 images masked to low-methane areas
    """
    # Load Sentinel-2 Top-of-Atmosphere reflectance data
    sentinel2_collection = (
        ee.ImageCollection("COPERNICUS/S2_HARMONIZED")
        .filterDate(start_date, end_date)
        .filterBounds(region)
        .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', cloud_threshold))
    )

    # Apply methane mask to each image
    def apply_mask(image):
        return image.updateMask(methane_mask)

    masked_collection = sentinel2_collection.map(apply_mask)

    image_count = masked_collection.size().getInfo()
    print(f"✓ Extracted {image_count} Sentinel-2 images (cloud cover < {cloud_threshold}%)")

    return masked_collection

In [None]:
# Extract Sentinel-2 imagery
print("[Step 2/4] Extracting Sentinel-2 imagery...")
sentinel2_images = extract_sentinel2_images(
    region=study_region,
    start_date=start_date,
    end_date=end_date,
    methane_mask=low_methane_mask,
    cloud_threshold=CLOUD_THRESHOLD
)

## 6. Process Images

Select the spectral bands listed in `SENTINEL2_BANDS` and reproject them to a common 20 m grid in `TARGET_CRS` (`EPSG:3857`).

In [None]:
def process_images(
    image_collection: ee.ImageCollection,
    bands: List[str],
    target_crs: str,
    target_scale: int
) -> ee.ImageCollection:
    """
    Process images: select bands and reproject to target resolution.

    Parameters:
    ----------
    image_collection : ee.ImageCollection
        Input image collection
    bands : list
        List of band names to select
    target_crs : str
        Target coordinate reference system
    target_scale : int
        Target spatial resolution in meters

    Returns:
    -------
    ee.ImageCollection
        Processed image collection
    """
    # Select specified bands
    selected = image_collection.select(bands)

    # Reproject to target resolution
    def reproject(image):
        return image.reproject(crs=target_crs, scale=target_scale)

    processed = selected.map(reproject)

    print(f"✓ Processed images: {len(bands)} bands, {target_scale}m resolution, {target_crs}")
    return processed
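
One point worth keeping in mind about the reprojection step: `TARGET_CRS` is Web Mercator (`EPSG:3857`), where one projection unit corresponds to less than one meter on the ground away from the equator. The sketch below estimates the ground footprint of a nominal 20 m pixel, assuming the spherical Mercator scale factor of 1/cos(latitude) and assuming the scale is interpreted in projection units. The function name `ground_pixel_size` and the sample latitudes are hypothetical choices for illustration.

```python
import math


def ground_pixel_size(nominal_size_m: float, latitude_deg: float) -> float:
    """Approximate ground size of a Web Mercator pixel at a given latitude."""
    return nominal_size_m * math.cos(math.radians(latitude_deg))


# Rough southern, central, and northern latitudes of the contiguous US
for lat in (25.0, 37.0, 49.0):
    print(f"lat {lat:>4.1f}: {ground_pixel_size(20.0, lat):.1f} m")
```

If equal ground resolution across the study area matters for downstream analysis, an equal-area or UTM projection may be a better fit than `EPSG:3857`.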

In [None]:
# Process images
print("[Step 3/4] Processing images...")
processed_images = process_images(
    image_collection=sentinel2_images,
    bands=SENTINEL2_BANDS,
    target_crs=TARGET_CRS,
    target_scale=TARGET_RESOLUTION
)

## 7. Export to Google Drive

Export processed images as GeoTIFF files to Google Drive for downstream analysis.

In [None]:
def export_images_to_drive(
    image_collection: ee.ImageCollection,
    folder_name: str,
    file_prefix: str,
    export_crs: str,
    max_pixels: float
) -> List[ee.batch.Task]:
    """
    Export processed images to Google Drive as GeoTIFF files.

    Parameters:
    ----------
    image_collection : ee.ImageCollection
        Images to export
    folder_name : str
        Google Drive folder name
    file_prefix : str
        Prefix for exported filenames
    export_crs : str
        Coordinate reference system for export
    max_pixels : float
        Maximum number of pixels per image

    Returns:
    -------
    list
        List of export tasks
    """
    image_list = image_collection.toList(image_collection.size())
    n_images = image_list.size().getInfo()

    tasks = []

    for i in range(n_images):
        image = ee.Image(image_list.get(i))
        valid_geometry = image.geometry()
        clipped_image = image.clip(valid_geometry)

        # Configure export task
        task = ee.batch.Export.image.toDrive(
            image=clipped_image,
            description=f'{file_prefix}_{i:03d}',
            folder=folder_name,
            fileNamePrefix=f'{file_prefix}_{i:03d}',
            fileFormat='GeoTIFF',
            region=valid_geometry.bounds().getInfo()['coordinates'],
            scale=TARGET_RESOLUTION,  # Module-level constant from the configuration cell
            crs=export_crs,
            maxPixels=max_pixels
        )

        task.start()
        tasks.append(task)
        print(f"  → Started export task {i+1}/{n_images}: {file_prefix}_{i:03d}")

    print(f"✓ Initiated {n_images} export tasks to Google Drive folder: {folder_name}")
    return tasks

In [None]:
# Export to Google Drive
print("[Step 4/4] Exporting images to Google Drive...")
tasks = export_images_to_drive(
    image_collection=processed_images,
    folder_name=EXPORT_FOLDER,
    file_prefix='LowMethane_S2',
    export_crs=EXPORT_CRS,
    max_pixels=MAX_PIXELS
)

## 8. Summary and Results

Monitor export progress at: https://code.earthengine.google.com/tasks
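
Export progress can also be checked from Python. The sketch below counts tasks by state, relying on the `status()` method of `ee.batch.Task`, which returns a dictionary containing a `'state'` entry. The helper `summarize_task_states` and the `FakeTask` class are hypothetical: `FakeTask` only stands in for real tasks so the example runs without an Earth Engine session.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_task_states(tasks: Iterable[Any]) -> Dict[str, int]:
    """Count tasks by the 'state' field of each task's status dictionary."""
    return dict(Counter(task.status().get('state', 'UNKNOWN') for task in tasks))


class FakeTask:
    """Hypothetical stand-in for ee.batch.Task, used only for this demo."""

    def __init__(self, state: str) -> None:
        self._state = state

    def status(self) -> Dict[str, str]:
        return {'state': self._state}


demo_tasks = [FakeTask('COMPLETED'), FakeTask('RUNNING'), FakeTask('COMPLETED')]
print(summarize_task_states(demo_tasks))
```

In the notebook, the same helper could be called on the `tasks` list returned by `export_images_to_drive`, for example every few minutes until no task remains in a running state.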

In [None]:
# Print summary
print("\n" + "="*70)
print("WORKFLOW COMPLETE")
print("="*70)
print(f"Study Date: {STUDY_DATE}")
print("Study Area: Continental United States")
print(f"Methane Threshold: {METHANE_PERCENTILE}th percentile")
print(f"Cloud Threshold: < {CLOUD_THRESHOLD}%")
print(f"Output Resolution: {TARGET_RESOLUTION}m")
print(f"Export Location: Google Drive/{EXPORT_FOLDER}")
print(f"Number of Export Tasks Started: {len(tasks)}")
print("\nMonitor export progress at: https://code.earthengine.google.com/tasks")
print("="*70)

---

## Next Steps

### Potential Extensions:
1. **Time Series Analysis**: Extend to multiple dates to track temporal patterns
2. **Statistical Analysis**: Correlate methane levels with land use/land cover
3. **Machine Learning**: Train models to predict methane concentrations from surface features
4. **Validation**: Compare with ground-based measurements
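
As a starting point for the time series extension, the sketch below generates consecutive one-day windows in the same (inclusive start, exclusive end) form that `get_date_range` produces for a single date. The function name `daily_windows` and the three-day span are hypothetical choices for illustration.

```python
from datetime import date, timedelta
from typing import List, Tuple


def daily_windows(first_day: str, n_days: int) -> List[Tuple[str, str]]:
    """Return n_days consecutive (start, end) pairs with an exclusive end date."""
    start = date.fromisoformat(first_day)
    windows = []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        windows.append((day.isoformat(), (day + timedelta(days=1)).isoformat()))
    return windows


windows = daily_windows('2019-05-04', 3)
for window_start, window_end in windows:
    print(f"{window_start} to {window_end}")
```

Each pair could then be passed to `identify_low_methane_areas` and `extract_sentinel2_images` in a loop, one date at a time.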

### Skills Demonstrated:
- Remote sensing data processing
- Geospatial analysis with Google Earth Engine
- Python programming with scientific libraries
- Reproducible research workflows
- Environmental data analysis

---

*This code sample was prepared for applications to predoctoral research assistant positions in environmental economics, climate science, and applied economics.*