<a href="https://colab.research.google.com/github/hucarlos08/GEE-CIMAT/blob/main/PCA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Principal Component Analysis (PCA) for Image Transformation

Principal Component Analysis (PCA) is another technique that uses matrix algebra for image transformation. Unlike the Tasseled Cap Transformation (TCT), whose coefficients are fixed, PCA is **data-driven**: it derives a new set of coordinate axes, called **Principal Components (PCs)**, entirely from the statistical properties (variance and covariance) of the input image data itself.

## The Core Idea: Variance Maximization

Imagine your multi-band image data plotted in a multi-dimensional space (one dimension per band). Often, the original bands are correlated, meaning the data cloud is elongated or slanted rather than spherical. PCA finds a new set of orthogonal axes that align with the spread of this data cloud, so that the data are uncorrelated when expressed along them.

1.  **First Principal Component (PC1):** This new axis is oriented in the direction of the **maximum variance** within the data. It captures the most significant pattern or trend across all input bands.
2.  **Second Principal Component (PC2):** This axis is orthogonal (perpendicular) to PC1 and points in the direction of the **next highest variance** remaining in the data.
3.  **Subsequent Components (PC3, PC4, etc.):** Each subsequent PC is orthogonal to all preceding components and captures the maximum remaining variance.

The result is a transformation of the original correlated bands into a new set of uncorrelated PC bands. Often, the first few PCs capture the vast majority of the information (variance) present in the original dataset, while later PCs tend to represent finer details or noise.
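
To make the geometry concrete, here is a minimal sketch in plain Python (no Earth Engine) for a two-band case. The pixel values are made up for illustration, and the closed-form eigen-decomposition it uses only applies to a 2x2 covariance matrix, so treat it as a toy rather than a general implementation.

```python
import math

# Hypothetical pixel values for two correlated bands (illustration only).
band_a = [2.0, 4.0, 6.0, 8.0, 10.0]
band_b = [1.0, 3.0, 2.0, 5.0, 4.0]

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Population covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

# Entries of the symmetric 2x2 covariance matrix.
var_a = covariance(band_a, band_a)
var_b = covariance(band_b, band_b)
cov_ab = covariance(band_a, band_b)

# Closed-form eigenvalues of a symmetric 2x2 matrix.
half_trace = (var_a + var_b) / 2
radius = math.hypot((var_a - var_b) / 2, cov_ab)
eig1, eig2 = half_trace + radius, half_trace - radius

# Angle of PC1 relative to the band_a axis; PC2 is perpendicular to it.
theta = 0.5 * math.atan2(2 * cov_ab, var_a - var_b)
pc1_axis = (math.cos(theta), math.sin(theta))
pc2_axis = (-math.sin(theta), math.cos(theta))

# Project the centered pixels onto the new axes.
ma, mb = mean(band_a), mean(band_b)
pc1 = [(x - ma) * pc1_axis[0] + (y - mb) * pc1_axis[1]
       for x, y in zip(band_a, band_b)]
pc2 = [(x - ma) * pc2_axis[0] + (y - mb) * pc2_axis[1]
       for x, y in zip(band_a, band_b)]

print("variance along PC1:", covariance(pc1, pc1))
print("variance along PC2:", covariance(pc2, pc2))
print("covariance between PC1 and PC2:", covariance(pc1, pc2))
```

The variance along each new axis matches the corresponding eigenvalue, and the covariance between the two score sets is zero up to floating-point error, which is what "uncorrelated PC bands" means in practice.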

## The Matrix Algebra Behind PCA

PCA relies heavily on linear algebra operations:

1.  **Data Representation:** The input image is treated as a collection of pixel vectors, where each vector contains that pixel's values in the different bands (for example, reflectance or brightness temperature). In GEE this is usually represented as an array image, i.e. an `ee.Image` whose pixels hold arrays.
2.  **Centering/Scaling (Optional but Recommended):** Since PCA is based on variance, it's sensitive to the scale of the input bands. To prevent bands with larger numerical ranges from dominating the analysis, the data is often:
    *   **Centered:** Subtracting the mean value of each band from all pixels in that band.
    *   **Standardized:** Centering the data and then dividing each band by its standard deviation. PCA on standardized data is equivalent to performing PCA on the **correlation matrix** instead of the covariance matrix, giving each band equal weight regardless of its original variance.
3.  **Covariance Matrix Calculation:** The core statistical relationship between the bands is captured in the **covariance matrix** (or correlation matrix if standardized). This matrix shows how each band varies with every other band. In GEE, it can be calculated by applying a reducer such as `ee.Reducer.covariance()` (or `ee.Reducer.centeredCovariance()` for data that is already centered) to an array image with `reduceRegion`.
4.  **Eigenvalue Decomposition:** This is the key mathematical step. The covariance matrix is decomposed into its **eigenvalues** and **eigenvectors**:
    *   **Eigenvectors:** These vectors define the directions of the new principal component axes in the original band space. Each eigenvector corresponds to one PC.
    *   **Eigenvalues:** These scalar values represent the amount of variance explained by their corresponding eigenvector (and thus, the corresponding PC). They are typically sorted in descending order, so PC1 has the largest eigenvalue. GEE's `ee.Array.eigen()` performs this decomposition and returns one row per component, with the eigenvalue in the first column and its eigenvector in the remaining columns, sorted by descending eigenvalue.
5.  **Projection:** The final step is to transform the original (centered/standardized) pixel data into the new PC space. This is done by projecting the pixel vectors onto the new axes defined by the eigenvectors. Mathematically, this is achieved via **matrix multiplication**: `PC_Scores = Centered_Data * Eigenvector_Matrix`, where each row of `Centered_Data` is one pixel and each column of `Eigenvector_Matrix` is one eigenvector.
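
The steps above can be summarized compactly. The notation is introduced here for illustration and does not come from the notebook: $\mathbf{x}_i \in \mathbb{R}^p$ is the vector of $p$ band values at pixel $i$, $\boldsymbol{\mu}$ is the vector of per-band means, and $n$ is the number of pixels. The $1/n$ normalization is an assumption; some implementations use $1/(n-1)$, which rescales the eigenvalues but leaves the eigenvectors and the explained-variance fractions unchanged.

$$
\Sigma = \frac{1}{n}\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^{\top},
\qquad
\Sigma\,\mathbf{v}_k = \lambda_k \mathbf{v}_k,
\qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0
$$

$$
y_{ik} = \mathbf{v}_k^{\top}(\mathbf{x}_i - \boldsymbol{\mu}),
\qquad
\text{fraction of variance explained by PC}_k = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}
$$

Stacking the centered pixels as rows of a matrix $X_c$ and the eigenvectors as columns of $V$ gives $Y = X_c V$, which is the matrix form of `PC_Scores = Centered_Data * Eigenvector_Matrix`. For standardized data, each $\mathbf{x}_i - \boldsymbol{\mu}$ is additionally divided band by band by the standard deviation, and $\Sigma$ becomes the correlation matrix.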

## Why Use PCA in Remote Sensing?

*   **Dimensionality Reduction:** Compresses information from many (potentially correlated) bands (e.g., hyperspectral data, multi-temporal stacks) into fewer uncorrelated PC bands, simplifying subsequent analysis like classification.
*   **Noise Reduction:** Later PCs often capture random noise. By discarding these components and reconstructing the image from the first few significant PCs, noise can be reduced.
*   **Feature Enhancement:** Sometimes specific features or patterns that are obscured across multiple bands become more apparent in one or two specific PCs.
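*   **Change Detection:** Performing PCA on stacked images from different dates (or on difference images) can highlight areas of change. For a multi-date stack, the first PCs usually capture what the dates have in common, so change tends to show up in the later components; for difference images, change tends to dominate the first few PCs.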
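
As a rough illustration of dimensionality and noise reduction, the sketch below keeps only PC1 of a two-band toy dataset and maps it back to band space. It is plain Python with made-up values (the second band is roughly half the first plus a small wobble), not Earth Engine code, and it relies on the closed-form solution for a 2x2 covariance matrix.

```python
import math

# Hypothetical two-band pixels: band_b is roughly 0.5 * band_a plus small noise.
band_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
band_b = [0.6, 0.9, 1.6, 1.9, 2.6, 2.9]

n = len(band_a)
mean_a = sum(band_a) / n
mean_b = sum(band_b) / n
ca = [v - mean_a for v in band_a]  # centered band_a
cb = [v - mean_b for v in band_b]  # centered band_b

# Population covariance matrix entries.
var_a = sum(x * x for x in ca) / n
var_b = sum(y * y for y in cb) / n
cov_ab = sum(x * y for x, y in zip(ca, cb)) / n

# PC1 direction (unit vector) and both eigenvalues of the 2x2 matrix.
theta = 0.5 * math.atan2(2 * cov_ab, var_a - var_b)
ux, uy = math.cos(theta), math.sin(theta)
radius = math.hypot((var_a - var_b) / 2, cov_ab)
eig1 = (var_a + var_b) / 2 + radius
eig2 = (var_a + var_b) / 2 - radius

# Keep only the PC1 score, then map it back to band space.
scores = [x * ux + y * uy for x, y in zip(ca, cb)]
recon_a = [mean_a + s * ux for s in scores]
recon_b = [mean_b + s * uy for s in scores]

# Mean squared reconstruction error equals the discarded eigenvalue.
mse = sum((a - ra) ** 2 + (b - rb) ** 2
          for a, ra, b, rb in zip(band_a, recon_a, band_b, recon_b)) / n

print(f"variance kept by PC1: {eig1 / (eig1 + eig2):.1%}")
print(f"reconstruction MSE: {mse:.5f} (discarded eigenvalue: {eig2:.5f})")
```

The reconstruction error is exactly the variance of the discarded component, which is why dropping low-eigenvalue PCs loses little information when those components mostly hold noise.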

## Implementation in GEE

The general workflow in GEE involves:
1.  Selecting the input bands from an image.
2.  Converting the image to an array image with `toArray()`.
3.  Calculating means and potentially standard deviations for centering/standardization.
4.  Calculating the covariance or correlation matrix.
5.  Performing eigenvalue decomposition on the matrix.
6.  Projecting the centered/standardized array image onto the eigenvectors using `matrixMultiply`.
7.  Converting the resulting array image (containing PC scores) back into a multi-band image using `arrayFlatten`.
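
Steps 3 and 4 interact: once each band is standardized, its covariance matrix is the correlation matrix of the original bands. The sketch below checks this in plain Python on hypothetical values, so it says nothing about Earth Engine's internals. The two scales are chosen to mimic a reflectance band next to a thermal band in kelvin.

```python
import math

# Hypothetical values on very different scales (reflectance vs. kelvin).
reflectance = [0.10, 0.12, 0.15, 0.11, 0.20]
thermal_k = [300.0, 305.0, 310.0, 302.0, 315.0]

def standardize(values):
    """Return (value - mean) / population standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sigma for v in values]

def covariance(x, y):
    """Population covariance of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

z_refl = standardize(reflectance)
z_therm = standardize(thermal_k)

# Covariance matrix of the standardized bands.
cov_matrix = [
    [covariance(z_refl, z_refl), covariance(z_refl, z_therm)],
    [covariance(z_therm, z_refl), covariance(z_therm, z_therm)],
]

# Pearson correlation of the raw bands, for comparison.
pearson = covariance(reflectance, thermal_k) / math.sqrt(
    covariance(reflectance, reflectance) * covariance(thermal_k, thermal_k)
)

print("raw variances:", covariance(reflectance, reflectance),
      covariance(thermal_k, thermal_k))
print("covariance of standardized bands:", cov_matrix)
print("Pearson correlation of raw bands:", pearson)
```

The raw variances differ by several orders of magnitude, so an unstandardized PCA would be dominated by the thermal band. After standardization every diagonal entry is 1, which means the eigenvalues of a standardized p-band input sum to p.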

In [1]:
import ee
import folium
from IPython.display import display

# --------------------------------------------------------------------
# 1. Initialize Earth Engine
# --------------------------------------------------------------------
try:
    ee.Initialize()
    print("Earth Engine initialized successfully.")
except Exception as e:
    print("Initializing Earth Engine...")
    ee.Authenticate()  # Will prompt for auth tokens
    ee.Initialize(project='ee-cimat')
    print("Earth Engine authenticated and initialized.")

# --------------------------------------------------------------------
# 2. Define a helper function to add Earth Engine layers to folium
# --------------------------------------------------------------------
def add_ee_layer(self, ee_image_object, vis_params, name):
    """Adds a given EE image to a Folium map."""
    if not ee_image_object:
        print(f"Warning: no image to display for {name}")
        return

    try:
        map_id_dict = ee.Image(ee_image_object).getMapId(vis_params)
        folium.raster_layers.TileLayer(
            tiles=map_id_dict['tile_fetcher'].url_format,
            attr='Map Data © Google Earth Engine',
            name=name,
            overlay=True,
            control=True
        ).add_to(self)
        print(f"Layer '{name}' added.")
    except Exception as exc:
        print(f"Could not add layer '{name}': {exc}")

folium.Map.add_ee_layer = add_ee_layer

# --------------------------------------------------------------------
# 3. Prepare the Data
# --------------------------------------------------------------------
# Define a region of interest
point = ee.Geometry.Point([-118.6947, 47.3141])  # Example region
Map_location = [47.3141, -118.6947]  # lat, lon order for folium

# Load the least-cloudy L8 TOA image for the date range and region
# (sorted ascending by CLOUD_COVER, then take the first)
imageL8 = (ee.ImageCollection('LANDSAT/LC08/C02/T1_TOA')
           .filterBounds(point)
           .filterDate('2018-06-01', '2018-09-01')
           .sort('CLOUD_COVER')
           .first())

# Visualization for the original true-color composite
true_color_params = {
    'bands': ['B4', 'B3', 'B2'],  # Red, Green, Blue
    'min': 0.0,
    'max': 0.3
}

# Define which bands to use in the PCA
pca_bands = ['B2','B3','B4','B5','B6','B7','B10','B11']

# --------------------------------------------------------------------
# 4. STANDARDIZE the data for PCA (The correct Earth Engine way)
# --------------------------------------------------------------------
# Select the bands we want to use for PCA
selected_image = imageL8.select(pca_bands)

# Calculate mean and standard deviation across the entire image
means = selected_image.reduceRegion(
    reducer=ee.Reducer.mean(),
    geometry=selected_image.geometry(),
    scale=30,
    maxPixels=1e9
)

stdDevs = selected_image.reduceRegion(
    reducer=ee.Reducer.stdDev(),
    geometry=selected_image.geometry(),
    scale=30,
    maxPixels=1e9
)

# Build constant multi-band images from the per-band statistics,
# keeping the values in the same order as pca_bands
mean_image = ee.Image.constant(means.values(pca_bands))
stddev_image = ee.Image.constant(stdDevs.values(pca_bands))

# Calculate (pixel - mean) / stdDev, band by band
centered = selected_image.subtract(mean_image)
standardized_image = centered.divide(stddev_image)

# Display info about standardization
print("Means of bands:", means.getInfo())
print("Standard deviations of bands:", stdDevs.getInfo())

# --------------------------------------------------------------------
# 5. Convert the standardized bands into an array image
#    (one 1D array of band values per pixel)
# --------------------------------------------------------------------
array_image = standardized_image.toArray()

# --------------------------------------------------------------------
# 6. Compute the Covariance Matrix
# --------------------------------------------------------------------
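# No geometry or scale is passed here, so reduceRegion falls back to the
# image footprint and the projection of the image's first band. Pass the
# same geometry and scale as in step 4 if both sets of statistics must be
# computed over exactly the same pixels.
covar_dict = array_image.reduceRegion(
    reducer=ee.Reducer.covariance(),
    maxPixels=1e9
)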
covar_array = ee.Array(covar_dict.get('array'))

# --------------------------------------------------------------------
# 7. Do Eigen Decomposition to obtain eigenvalues & eigenvectors
# --------------------------------------------------------------------
eigens = covar_array.eigen()

# Slice out just the eigenvalues (1D array)
eigenvalues = eigens.slice(1, 0, 1).project([0])

# Slice out just the eigenvectors (2D array)
eigen_vectors = eigens.slice(1, 1)

# --------------------------------------------------------------------
# 8. Project the Standardized Array Data onto the Eigenvectors
# --------------------------------------------------------------------
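# eigen_vectors holds one eigenvector per row. toArray(1) turns each pixel's
# 1D array into a 2D column (bands x 1), so the product is a column of PC scores.
principal_components = ee.Image(eigen_vectors) \
    .matrixMultiply(array_image.toArray(1))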

# --------------------------------------------------------------------
# 9. Convert Principal Components Array into a Multi-Band Image
# --------------------------------------------------------------------
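# arrayProject([0]) drops the length-1 second axis, leaving a 1D array per
# pixel; arrayFlatten then turns each array element into its own named band.
pc_image = principal_components \
    .arrayProject([0]) \
    .arrayFlatten([
        ['pc1','pc2','pc3','pc4','pc5','pc6','pc7','pc8']
    ])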

# --------------------------------------------------------------------
# 10. Calculate and display variance explained by each PC
# --------------------------------------------------------------------
# Get eigenvalues as a list (will be used for variance calculation)
try:
    eigenvalue_list = eigenvalues.getInfo()
    total_variance = sum(eigenvalue_list)
    print("\nEigenvalues:", eigenvalue_list)

    # Calculate percentage of variance explained by each PC
    print("\nPercent variance explained:")
    for i, val in enumerate(eigenvalue_list):
        percent = (val / total_variance) * 100
        print(f"PC{i+1}: {percent:.2f}%")
except Exception as e:
    print("Could not calculate variance explained:", str(e))
    print("Continuing with visualization...")

# --------------------------------------------------------------------
# 11. Visualize Results in Folium
# --------------------------------------------------------------------
# Create a folium map
m = folium.Map(location=Map_location, zoom_start=8)

# Add original Landsat 8 true color
m.add_ee_layer(imageL8, true_color_params, "L8 True Color")

# Add the first principal component as grayscale
m.add_ee_layer(
    pc_image.select('pc1'),
    {'min': -2, 'max': 2, 'palette': ['black', 'white']},
    "PC1 (grayscale)"
)

# Add the second principal component as grayscale
m.add_ee_layer(
    pc_image.select('pc2'),
    {'min': -2, 'max': 2, 'palette': ['black', 'white']},
    "PC2 (grayscale)"
)

# Add the third principal component as grayscale
m.add_ee_layer(
    pc_image.select('pc3'),
    {'min': -2, 'max': 2, 'palette': ['black', 'white']},
    "PC3 (grayscale)"
)

# RGB composite of PC1, PC2, PC3
m.add_ee_layer(
    pc_image.select(['pc1', 'pc2', 'pc3']),
    {'min': [-2, -2, -2], 'max': [2, 2, 2]},
    "PCA RGB (pc1, pc2, pc3)"
)

# Add a layer control panel to toggle layers on/off
m.add_child(folium.LayerControl())

# Display the map in a Jupyter/Colab environment
display(m)

print("\nPCA with standardized data complete. Inspect the interactive map above.")

Initializing Earth Engine...
Earth Engine authenticated and initialized.
Means of bands: {'B10': 310.07550590006116, 'B11': 308.1221940798781, 'B2': 0.11811594188354033, 'B3': 0.11410512407270741, 'B4': 0.1261035440143871, 'B5': 0.25657820745207427, 'B6': 0.24553526384442345, 'B7': 0.16419869448089147}
Standard deviations of bands: {'B10': 6.304766936345135, 'B11': 5.776204563510937, 'B2': 0.02231708374495679, 'B3': 0.02995641117369862, 'B4': 0.0516798102496665, 'B5': 0.08686276970882281, 'B6': 0.07747979028983698, 'B7': 0.0682773145783062}

Eigenvalues: [3.5048860027943105, 1.3229361796554022, 0.4460081351815703, 0.16500617887467292, 0.0503696889665898, 0.019256324544092505, 0.005898380486718894, 0.0018303829421575937]

Percent variance explained:
PC1: 63.54%
PC2: 23.98%
PC3: 8.09%
PC4: 2.99%
PC5: 0.91%
PC6: 0.35%
PC7: 0.11%
PC8: 0.03%
Layer 'L8 True Color' added.
Layer 'PC1 (grayscale)' added.
Layer 'PC2 (grayscale)' added.
Layer 'PC3 (grayscale)' added.
Layer 'PCA RGB (pc1, pc2, pc3)' added.


PCA with standardized data complete. Inspect the interactive map above.
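
A common follow-up is to ask how many components are worth keeping. The sketch below reuses the eigenvalues printed above and accumulates their shares. The 95% threshold and the helper name `components_needed` are illustrative choices, not part of the notebook.

```python
from itertools import accumulate

# Eigenvalues copied from the printed output above.
eigenvalues = [
    3.5048860027943105, 1.3229361796554022, 0.4460081351815703,
    0.16500617887467292, 0.0503696889665898, 0.019256324544092505,
    0.005898380486718894, 0.0018303829421575937,
]

total = sum(eigenvalues)
cumulative = [running / total for running in accumulate(eigenvalues)]

def components_needed(threshold):
    """Smallest number of leading PCs whose cumulative share >= threshold."""
    for k, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return k
    return len(cumulative)

for k, share in enumerate(cumulative, start=1):
    print(f"PC1..PC{k}: {share:.2%}")

print("PCs needed for 95% of the variance:", components_needed(0.95))
```

With these values the first three components together account for about 95.6% of the variance, which supports the PC1/PC2/PC3 composite used in the map. The eigenvalues sum to about 5.52, whereas the covariance matrix of eight exactly standardized bands would have trace 8; one possible explanation is that the band statistics in step 4 and the covariance in step 6 were computed over different pixel samples.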
