<a href="https://colab.research.google.com/github/hucarlos08/GEE-CIMAT/blob/main/2_ImageCollections.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploring Satellite Image Operations and Spectral Indices with GEE

This notebook delves into fundamental image processing techniques using Google Earth Engine (GEE) and Sentinel-2 satellite imagery. We will explore how to transform raw satellite reflectance data into meaningful information by calculating spectral indices, applying thresholds, and classifying pixels based on their spectral properties.

## Introduction

Satellite images provide a wealth of information about the Earth's surface, but raw reflectance values across different spectral bands often need further processing to extract specific features or characteristics. Spectral indices, which are mathematical combinations of different bands, are powerful tools designed to enhance specific phenomena like vegetation health, water content, or soil/mineral composition while minimizing noise and illumination effects.

Thresholding and classification techniques allow us to convert these continuous index values or raw reflectance data into discrete categories or thematic maps, enabling quantitative analysis and visualization of land cover types and conditions.

## Topics Covered

In this notebook, we will cover:

1.  **Loading and Visualizing Sentinel-2 Data:** Accessing and displaying cloud-filtered Surface Reflectance imagery.
2.  **Calculating Common Spectral Indices:**
    *   **NDVI (Normalized Difference Vegetation Index):** Quantifying vegetation greenness and health.
    *   **NDWI (Normalized Difference Water Index):** Highlighting water bodies and vegetation moisture content.
3.  **Thresholding and Masking:** Creating binary maps (e.g., vegetation vs. non-vegetation) by applying thresholds to index values.
4.  **Basic Statistics:** Calculating pixel counts and percentages for thresholded areas within a Region of Interest (ROI).
5.  **Multi-Class Mapping with `.where()`:** Classifying pixels into multiple categories based on index ranges.
6.  **Domain-Specific Indices (Geology Example):**
    *   **CMR (Clay Minerals Ratio):** Identifying potential clay mineral presence.
    *   **IOR (Iron Oxide Ratio):** Highlighting areas potentially rich in iron oxides.

## Workflow Overview

1.  **Setup:** Initialize GEE and configure the environment.
2.  **Data Acquisition:** Load and preprocess a Sentinel-2 SR image for a chosen location and time.
3.  **Index Calculation:** Compute various spectral indices (NDVI, NDWI, CMR, IOR).
4.  **Analysis & Classification:** Apply thresholds and use `.where()` for classification. Calculate basic statistics.
5.  **Visualization:** Display the original image, calculated indices, and classified maps using Folium for interactive comparison.

## Dataset

We will primarily use **Sentinel-2 Surface Reflectance (SR)** data provided by the Copernicus program (`COPERNICUS/S2_SR_HARMONIZED`). This dataset provides atmospherically corrected imagery at 10-20m resolution across multiple spectral bands, suitable for a wide range of land cover analyses.

## Learning Objectives

*   Load, preprocess (scale), and visualize Sentinel-2 imagery.
*   Understand the formulas and rationale behind NDVI, NDWI, CMR, and IOR.
*   Calculate spectral indices using GEE's built-in functions (`.normalizedDifference`, band math).
*   Apply thresholds (`.gt`, `.lte`) to create binary masks.
*   Use `.reduceRegion()` with `ee.Reducer.sum()` and `ee.Reducer.count()` for basic statistics.
*   Implement multi-class classification using chained `.where()` methods.
*   Visualize and compare different image processing results on an interactive map.

## 1. Setting Up Google Earth Engine in Colab

First, we need to import the necessary packages and authenticate with Earth Engine.

In [None]:
# Import libraries
import ee
import folium
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime

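# Authenticate and initialize Earth Engine.
# If initialization fails (e.g., no stored credentials), authenticate first
# and then initialize with an explicit Cloud project.
try:
    ee.Initialize()
except Exception:
    ee.Authenticate()
    ee.Initialize(project="ee-cimat")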

## 2. Basic GEE Concepts

Before we dive into ImageCollections, let's review some basic GEE concepts:

### Key Earth Engine Objects
- **Image**: A raster with bands (like a multi-band satellite image)
- **Feature**: A geometry with properties (like a point, line, or polygon with attributes)
- **ImageCollection**: A stack or time-series of Images
- **FeatureCollection**: A group of Features

### GEE's Client-Server Architecture
- Code runs in your browser/Colab (client)
- Computations happen on Google's servers
- Results are sent back to the client for display
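
To make the client/server split more concrete, here is a small pure-Python analogy. It is not Earth Engine code: the `DeferredNumber` class and its `get_info()` method are hypothetical names invented for this sketch. The idea it illustrates is that a server-side object is only a description of a computation, and nothing is evaluated until you explicitly ask for the result (the role `getInfo()` plays in Earth Engine).

```python
class DeferredNumber:
    """Toy stand-in for a server-side object: it stores a recipe, not a value."""

    def __init__(self, recipe):
        # recipe is a zero-argument function that produces the value on demand
        self._recipe = recipe

    @classmethod
    def constant(cls, value):
        return cls(lambda: value)

    def add(self, other):
        # Build a new recipe; nothing is computed at this point
        return DeferredNumber(lambda: self._recipe() + other)

    def get_info(self):
        # Only now is the recipe evaluated and a plain Python value returned
        return self._recipe()


total = DeferredNumber.constant(2).add(3)  # still just a description
result = total.get_info()                  # evaluation happens here
print(type(total).__name__, result)
```

In real Earth Engine code, each call that pulls a value back to the client costs a round trip to the server, so it is usually worth requesting results as few times as possible.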

Let's define our area of interest around the Parque Científico y Tecnológico de Yucatán (PCTY).

In [None]:
import geemap

# Define coordinates for Parque Científico y Tecnológico de Yucatán
parque_coords = [21.1518, -89.6676]

# ... (rest of the code) ...

# Create a point for Parque Científico y Tecnológico de Yucatán
parque_point = ee.Geometry.Point(parque_coords[1], parque_coords[0])

# Create a 10km buffer around the park for our region of interest
parque_area = parque_point.buffer(10000)  # 10km buffer

# Create a map centered on the park
map_parque = geemap.Map(center=parque_coords, zoom=12)
map_parque.add_basemap('HYBRID')
map_parque.addLayer(parque_area, {'color': 'red'}, 'Parque Buffer')
map_parque.addLayer(parque_point, {'color': 'blue'}, 'Parque')
map_parque

## 3. Introduction to ImageCollections

An **ImageCollection** is a stack or time-series of images. Think of it as a catalog of images with similar properties, typically covering different time periods.

### Common ImageCollections in GEE:
- **Landsat**: Landsat 5, 7, 8, 9 collections (30m resolution)
- **Sentinel**: Sentinel-1 (radar), Sentinel-2 (optical) collections (10m resolution)
- **MODIS**: Various products including land surface temperature, vegetation indices (250m-1km)
- **Climate data**: Precipitation, temperature collections

Let's start by exploring the Sentinel-2 ImageCollection, which provides high-resolution optical imagery.

In [None]:
# Define a date range
start_date = '2023-01-01'
end_date   = '2023-01-15'

# Filter the collection by location, date, and cloud cover.
# The harmonized collection matches the dataset named in the Dataset section.
filtered_s2 = (ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
               .filterBounds(parque_area)
               .filterDate(start_date, end_date)
               .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)))

# Fetch the collection size once and reuse it (each getInfo() is a server round trip)
n_images = filtered_s2.size().getInfo()
print(f"Number of images after filtering: {n_images}")

# Print information about each image in the filtered collection
image_list = filtered_s2.toList(n_images)
for i in range(n_images):
    img = ee.Image(image_list.get(i))
    date = ee.Date(img.get('system:time_start')).format('YYYY-MM-dd').getInfo()
    clouds = img.get('CLOUDY_PIXEL_PERCENTAGE').getInfo()
    # Get the MGRS tile ID, if the image has one
    tile_id = img.get('MGRS_TILE').getInfo() if img.propertyNames().contains('MGRS_TILE').getInfo() else 'Unknown'
    # Get the image's index string to differentiate images
    img_id = img.get('system:index').getInfo()
    print(f"Image {i+1}: Date = {date}, Cloud cover = {clouds:.2f}%, Tile = {tile_id}, ID = {img_id}")

### Multiple images

You're seeing multiple images with the same date (like two images from 2023-01-04 with identical cloud cover) because of how Sentinel-2 imagery is structured. There are a few reasons this can happen:

 1. **Two satellites:** Sentinel-2 is a constellation of two satellites (Sentinel-2A and Sentinel-2B). Their orbits are phased so that they normally pass over a given area on different days, so this mainly increases the number of acquisitions in a date range rather than producing same-day duplicates.
 2. **Adjacent tiles:** Sentinel-2 data is organized into tiles using the Military Grid Reference System (MGRS). Your area of interest (the 10km buffer around PCTY) likely falls on the boundary between two or more tiles. GEE returns all tiles that intersect with your area, which can result in multiple images from the same date.
 3. **Different granules:** Even within a single tile, Sentinel-2 data can be processed as separate granules, which appear as distinct images in Earth Engine with the same date.
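
A quick way to see which dates are affected is to group the printed metadata by date. The sketch below uses plain Python and a few hypothetical records (the dates and the `TILE_A`/`TILE_B` labels are made up for illustration, not real output); in practice you would build the list from the values printed by the loop above.

```python
from collections import defaultdict

# Hypothetical records, shaped like the date/tile values printed above
records = [
    {"date": "2023-01-04", "tile": "TILE_A"},
    {"date": "2023-01-04", "tile": "TILE_B"},
    {"date": "2023-01-09", "tile": "TILE_A"},
]


def tiles_by_date(records):
    """Group tile IDs by acquisition date, preserving input order."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["date"]].append(rec["tile"])
    return dict(grouped)


grouped = tiles_by_date(records)

# Keep only the dates that have more than one image
multi_image_dates = {d: tiles for d, tiles in grouped.items() if len(tiles) > 1}
print(multi_image_dates)
```

Dates that appear in `multi_image_dates` are the ones where combining the images into a single image per date becomes relevant.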

In [None]:
# Get the first image from the filtered collection
first_image = ee.Image(filtered_s2.first())

# Define visualization parameters for true color (RGB)
vis_params = {
    'bands': ['B4', 'B3', 'B2'],  # R, G, B bands in Sentinel-2
    'min': 0,
    'max': 3000,
    'gamma': 1.4
}

# Create a map centered on PCTY
map_single = geemap.Map(center=parque_coords, zoom=12)
map_single.add_basemap('HYBRID')

# Add the image to the map
date = ee.Date(first_image.get('system:time_start')).format('YYYY-MM-dd').getInfo()
map_single.addLayer(first_image, vis_params, f'Sentinel-2 ({date})')
map_single.addLayer(parque_point, {'color': 'red'}, 'PCTY')
map_single
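
To get a feel for what `min`, `max`, and `gamma` do in `vis_params`, here is a rough pure-Python sketch of a display stretch. It assumes a common convention (linear scaling between `min` and `max`, clamping, then raising to `1 / gamma`); the exact rendering Earth Engine applies may differ, and the `stretch` function is a hypothetical helper written only for this illustration.

```python
def stretch(value, vmin=0, vmax=3000, gamma=1.4):
    """Map a raw band value to a 0-255 display value (assumed convention)."""
    # Linear rescale to the 0-1 range
    scaled = (value - vmin) / (vmax - vmin)
    # Clamp values outside [vmin, vmax]
    scaled = min(max(scaled, 0.0), 1.0)
    # Gamma adjustment: with this convention, gamma > 1 brightens midtones
    return round(255 * scaled ** (1 / gamma))


for raw in (0, 1500, 3000, 5000):
    print(raw, stretch(raw))
```

Values at or below `min` map to black, values at or above `max` saturate to white, and everything in between is spread across the display range. Lowering `max` therefore brightens the image at the cost of saturating bright surfaces.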

## When and Why to Use `mosaic()` in Google Earth Engine

In satellite image collections like Sentinel-2, a single location can be covered by **multiple image tiles on the same day**. This happens because:

- The satellite's path is divided into **overlapping tiles**.
- Each tile is a separate image, but they may share the same acquisition time and metadata (like cloud cover).
- As a result, when you filter by date and location, you may get **several images with the same timestamp**.

This redundancy can be problematic when:
- You want to visualize or export a **single image per date**.
- You are preparing a **time series analysis** and need exactly one image per time step.
- You want to eliminate **repetitive layers** that result from tile overlaps.

### What `mosaic()` Does

The `mosaic()` function helps by:
- Taking all overlapping images in a collection and **composing a single image**.
- **Overlaying** the images in draw order (last on top).
- Returning a **clean, seamless composite** that covers the union of all input images.
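
The "last on top" rule can be sketched on tiny grids in plain Python. This is only an illustration of the overlay logic, not how Earth Engine implements it: `None` stands in for masked (no-data) pixels, and `mosaic_last_on_top` is a hypothetical helper name.

```python
def mosaic_last_on_top(images):
    """Overlay equally sized grids; later images win wherever they have data."""
    rows, cols = len(images[0]), len(images[0][0])
    out = [[None] * cols for _ in range(rows)]
    for img in images:  # later images are drawn over earlier ones
        for r in range(rows):
            for c in range(cols):
                if img[r][c] is not None:  # masked pixels never overwrite
                    out[r][c] = img[r][c]
    return out


# Two overlapping "tiles": tile_a covers the left, tile_b covers the right
tile_a = [[1, 1, None],
          [1, 1, None]]
tile_b = [[None, 2, 2],
          [None, 2, 2]]

combined = mosaic_last_on_top([tile_a, tile_b])
print(combined)  # [[1, 2, 2], [1, 2, 2]]
```

In the overlapping middle column the value comes from `tile_b`, simply because it is last in the list. Reversing the order would give that column to `tile_a`, which is why the ordering of a collection matters before mosaicking.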

This is especially useful when working with:
- **Small areas** (like the region around PCTY) that straddle tile boundaries and are therefore covered by multiple tiles.
- **Quick previews** or **monthly composites**.
- **Visualizations** where image consistency is more important than pixel-level provenance.

### It depends on your goal

 **Use `mosaic()` if:**
- You want one clean image per date (especially for time series analysis).
- You want to hide tile boundaries and work at region-scale.
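- You're preparing inputs for later composites or trends (median, min, max, NDVI over time) and want one image per date first.
- You're fine with a single pixel per location on each date: the one from the last image in the collection that has valid data there.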

⚠️ **Be cautious with `mosaic()` if:**
- You need per-tile quality control (e.g., selecting only the best tile).
- You're doing per-image atmospheric correction or QA (e.g., cloud masking).
- You need complete metadata for each pixel (e.g., data provenance or pixel QA bits).

**QA stands for Quality Assessment**, and in the context of satellite imagery it refers to metadata stored in special bands (like `QA60` for Sentinel-2) that encode information such as cloud masks, shadows, saturation, or snow. These bands are crucial for high-quality, pixel-level analysis and should be preserved when needed.

In summary, `mosaic()` is a powerful tool when you want to **resolve spatial redundancy** in your filtered collections and work with **one image per date** in a clean, intuitive way.

In [None]:
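# Select every image acquired on the same day as the first image in the collection.
# (same_day_images was not defined earlier; this is one plausible way to build it.)
first_date = ee.Date(filtered_s2.first().get('system:time_start'))
day_range = first_date.getRange('day')
same_day_images = filtered_s2.filterDate(day_range.start(), day_range.end())
mosaic_date = first_date.format('YYYY-MM-dd').getInfo()

# Create and visualize the mosaic
mosaic = same_day_images.mosaic()

# Create a unified geometry from all input tiles
union_geom = same_day_images.geometry()

# Create a new map
map2 = geemap.Map(center=parque_coords, zoom=10)
map2.add_basemap('HYBRID')

# Add the mosaic, labeled with its actual acquisition date
map2.addLayer(mosaic, vis_params, f'Mosaic for {mosaic_date}')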

# Add the unified geometry in red
map2.addLayer(union_geom, {'color': 'red'}, 'Mosaic Geometry')

# Add PCTY location
map2.addLayer(parque_point, {'color': 'red'}, 'PCTY')

# Show the map
display(map2)


## 🧪 Try it Yourself!

- Change the cloud cover threshold to 30% and observe the number of available images.
- Try a different season (e.g., rainy season: June–September).
- Replace the visualization bands with `['B8', 'B4', 'B3']` for a false color composite.
