<a href="https://colab.research.google.com/github/lawrencejesse/Sentinel2_Extractor/blob/main/Multi_Year_ProductivityRaster.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Create a Python notebook that allows the user to upload a polygon (KML/GeoJSON/SHP) and multiple NDVI rasters (GeoTIFF). The notebook should then:
1. Load the uploaded data.
2. Reproject and align all NDVI rasters to the grid of the first raster.
3. Clip the rasters to the polygon boundary.
4. Compute the mean NDVI within the polygon for each raster.
5. Calculate a per-year Productivity Index for each raster using the formula (NDVI / mean) * 100.
6. Calculate a Multi-Year Composite Productivity Index by averaging the per-year indices.
7. Create a 3-class map based on the Composite Productivity Index (below average, average, above average).
8. Save the per-year Productivity GeoTIFFs, the Composite GeoTIFF, the 3-class map GeoTIFF, and a CSV summary of the mean NDVI and productivity indices to a new folder in the notebook's memory.
The notebook should be broken down into discrete steps for easy troubleshooting.

## Setup

### Subtask:
Install necessary libraries (geopandas, rasterio, etc.) and create a directory for outputs.


**Reasoning**:
Install the necessary libraries using pip.



In [1]:
%pip install geopandas rasterio fiona shapely numpy

Note: you may need to restart the kernel to use updated packages.


**Reasoning**:
Import the necessary libraries and create the output directory.



In [2]:
import geopandas
import rasterio
import fiona
from shapely.geometry import shape
import numpy as np
import os
import ipywidgets as widgets
from IPython.display import display

output_dir = 'output_data'
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

print(f"Output directory '{output_dir}' created.")

Output directory 'output_data' created.


## Upload data

### Subtask:
Create widgets for uploading the AOI polygon and multiple NDVI rasters.


**Reasoning**:
Create and display the file upload widgets for the AOI polygon and NDVI rasters.



In [3]:
aoi_upload = widgets.FileUpload(
    accept='.kml,.geojson,.shp',
    multiple=False,
    description='Upload AOI Polygon (KML, GeoJSON, or SHP)'
)

ndvi_uploads = widgets.FileUpload(
    accept='.tif,.tiff',
    multiple=True,
    description='Upload NDVI Rasters (GeoTIFFs)'
)

display(aoi_upload)
display(ndvi_uploads)

FileUpload(value=(), accept='.kml,.geojson,.shp', description='Upload AOI Polygon (KML, GeoJSON, or SHP)')

FileUpload(value=(), accept='.tif,.tiff', description='Upload NDVI Rasters (GeoTIFFs)', multiple=True)

## Load data

### Subtask:
Load the uploaded polygon and raster files into appropriate data structures (geopandas DataFrame and rasterio datasets).


**Reasoning**:
Load the uploaded AOI polygon and NDVI raster files into appropriate data structures using fiona and rasterio, handling cases where no files are uploaded.



In [4]:
aoi_gdf = None
original_ndvi_datasets = []
memfiles = []

if aoi_upload.value:
    aoi_file_info = list(aoi_upload.value.values())[0]
    aoi_file_content = aoi_file_info['content']
    try:
        with fiona.io.MemoryFile(aoi_file_content) as memfile:
            aoi_gdf = geopandas.read_file(memfile)
        print("AOI polygon loaded successfully.")
    except Exception as e:
        print(f"Error loading AOI file: {e}")
else:
    print("No AOI file uploaded. Skipping AOI loading.")

if ndvi_uploads.value:
    for filename, file_info in ndvi_uploads.value.items():
        ndvi_file_content = file_info['content']
        try:
            memfile = rasterio.io.MemoryFile(ndvi_file_content)
            dataset = memfile.open()
            original_ndvi_datasets.append(dataset)
            memfiles.append(memfile)
            print(f"NDVI raster '{filename}' loaded successfully.")
        except Exception as e:
            print(f"Error loading NDVI file '{filename}': {e}")
else:
    print("No NDVI files uploaded. Skipping NDVI loading.")

ndvi_datasets = []

if aoi_gdf is not None:
    print(f"Loaded {len(original_ndvi_datasets)} NDVI raster(s) and 1 AOI polygon.")

AttributeError: 'tuple' object has no attribute 'values'

## Reproject and align rasters

### Subtask:
Reproject and align all NDVI rasters to match the coordinate reference system and grid of the first raster.


**Reasoning**:
Implement the logic to reproject and align NDVI rasters based on the instructions provided for the current subtask.



In [5]:
import rasterio.warp

aligned_ndvi_arrays = []
target_crs = None
target_transform = None

if not original_ndvi_datasets:
    print("No NDVI datasets loaded. Skipping reprojection and alignment.")
    # If alignment is skipped, use the original datasets for subsequent steps
    ndvi_datasets = original_ndvi_datasets
else:
    if skip_alignment_widget.value:
        print("Skipping reprojection and alignment as requested.")
        # If alignment is skipped, use the original datasets for subsequent steps
        ndvi_datasets = original_ndvi_datasets
        # Get target CRS and transform from the first original dataset
        if ndvi_datasets:
            target_crs = ndvi_datasets[0].crs
            target_transform = ndvi_datasets[0].transform
    else:
        print("Performing reprojection and alignment.")
        # Get CRS and transform from the first dataset
        first_dataset = original_ndvi_datasets[0]
        target_crs = first_dataset.crs
        target_transform = first_dataset.transform
        target_width = first_dataset.width
        target_height = first_dataset.height

        # Read the first dataset into a numpy array and include it in the aligned list
        try:
            aligned_ndvi_arrays.append(first_dataset.read(1))
            print("First dataset read into numpy array.")
        except Exception as e:
            print(f"Error reading first dataset: {e}")
            # If reading the first dataset fails, we cannot proceed with alignment
            aligned_ndvi_arrays = [] # Clear the list
            original_ndvi_datasets = [] # Clear original datasets to prevent further processing
            ndvi_datasets = [] # Ensure ndvi_datasets is empty
            print("Clearing datasets due to error reading the first dataset.")


        # Iterate through subsequent datasets and reproject/resample
        reprojected_count = 0
        # Ensure there are datasets to process after attempting to read the first one
        if original_ndvi_datasets:
            for i, dataset in enumerate(original_ndvi_datasets[1:], start=1):
                try:
                    # Create an empty array for the reprojected data with the same shape and dtype as the first aligned array
                    # Ensure aligned_ndvi_arrays is not empty before accessing its first element
                    if aligned_ndvi_arrays:
                        reprojected_data = np.empty_like(aligned_ndvi_arrays[0])

                        # Reproject and resample
                        rasterio.warp.reproject(
                            source=rasterio.band(dataset, 1),
                            destination=reprojected_data,
                            src_transform=dataset.transform,
                            src_crs=dataset.crs,
                            dst_transform=target_transform,
                            dst_crs=target_crs,
                            resampling=rasterio.warp.Resampling.nearest, # Or Resampling.bilinear
                            num_threads=2 # Using 2 threads for potentially faster processing
                        )
                        aligned_ndvi_arrays.append(reprojected_data)
                        reprojected_count += 1
                        print(f"Dataset {i} reprojected and aligned.")
                    else:
                        print(f"Skipping reprojecting dataset {i} because aligned_ndvi_arrays is empty.")
                        pass # Skip if aligned_ndvi_arrays is empty


                except Exception as e:
                    print(f"Error reprojecting dataset {i}: {e}")
                    # Optionally, you could choose to skip this dataset or handle the error differently
                    pass # Skipping the problematic dataset for now

            # Use aligned datasets (now numpy arrays) for subsequent steps
            ndvi_datasets = aligned_ndvi_arrays

            print(f"Reprojected and aligned {reprojected_count} out of {len(original_ndvi_datasets)-1} subsequent NDVI rasters.")
            print(f"Total datasets for clipping: {len(ndvi_datasets)}")
        else:
            print("No datasets to reproject and align.")

# After alignment (or skipping alignment), close the original datasets
# to free up memory, as we are now working with numpy arrays or a new list
# of datasets (if alignment was skipped).
if 'original_ndvi_datasets' in locals():
    for dataset in original_ndvi_datasets:
        try:
            dataset.close()
        except Exception as e:
            print(f"Error closing original dataset: {e}")

No NDVI datasets loaded. Skipping reprojection and alignment.


## Clip rasters to aoi

### Subtask:
Clip all aligned rasters to the boundaries of the uploaded AOI polygon.


**Reasoning**:
Check if both aoi_gdf and ndvi_datasets are available, and if so, iterate through the aligned NDVI datasets and clip each one using the aoi_gdf geometry, storing the clipped data and transforms.



In [34]:
import rasterio.mask

clipped_ndvi_data_and_transforms = []

if aoi_gdf is None or not ndvi_datasets:
    print("AOI polygon or NDVI datasets not available. Skipping clipping process.")
else:
    # target_crs and target_transform should be available from the previous step (alignment or loading)
    if target_crs is None or target_transform is None:
        print("Target CRS or Transform not defined. Cannot clip.")
    else:
        # Ensure the AOI GeoDataFrame is in the same CRS as the target raster CRS
        if aoi_gdf.crs != target_crs:
            try:
                aoi_gdf = aoi_gdf.to_crs(target_crs)
                print("AOI polygon reprojected to target raster CRS.")
            except Exception as e:
                print(f"Error reprojecting AOI polygon: {e}")
                # Skipping clipping
                aoi_gdf = None # Invalidate aoi_gdf to prevent further clipping attempts

        if aoi_gdf is not None:
            # Get the geometry for masking
            geometries = aoi_gdf.geometry.values

            for i, dataset_or_array in enumerate(ndvi_datasets):
                try:
                    if isinstance(dataset_or_array, rasterio.DatasetReader):
                        # Clipping a rasterio dataset object (if alignment was skipped)
                        clipped_data, clipped_transform = rasterio.mask.mask(dataset_or_array, geometries, crop=True)
                        # rasterio.mask.mask returns a 3D array, take the first band
                        clipped_data = clipped_data[0]
                        clipped_ndvi_data_and_transforms.append((clipped_data, clipped_transform))
                        print(f"NDVI dataset {i} clipped successfully.")
                    elif isinstance(dataset_or_array, np.ndarray):
                         # Clipping a numpy array (if alignment was performed)
                         # Need to create a temporary dataset-like object for masking
                         # This is a workaround to use rasterio.mask.mask with a numpy array
                         height, width = dataset_or_array.shape
                         dtype = dataset_or_array.dtype

                         with rasterio.MemoryFile() as memfile:
                             with memfile.open(driver='GTiff',
                                               height=height,
                                               width=width,
                                               count=1,
                                               dtype=dtype,
                                               crs=target_crs, # Use target_crs
                                               transform=target_transform) as tmp_dataset: # Use target_transform
                                 tmp_dataset.write(dataset_or_array, 1)

                                 # Perform the clipping
                                 clipped_data, clipped_transform = rasterio.mask.mask(tmp_dataset, geometries, crop=True)

                         # rasterio.mask.mask returns a 3D array (bands, height, width)
                         # We need to keep it as 2D for single band data
                         clipped_data = clipped_data[0]

                         clipped_ndvi_data_and_transforms.append((clipped_data, clipped_transform))
                         print(f"NDVI array {i} clipped successfully.")

                    else:
                        print(f"Skipping item {i} due to unexpected data type.")
                        pass # Skip unexpected data types


                except Exception as e:
                    print(f"Error clipping item {i}: {e}")
                    # If clipping fails for an item, skip it
                    pass

    # Replace the ndvi_datasets list with the clipped data and transforms
    # This list now contains tuples of (numpy_array, transform)
    ndvi_datasets = clipped_ndvi_data_and_transforms

    print(f"Clipped {len(ndvi_datasets)} NDVI datasets/arrays.")

AOI polygon or NDVI datasets not available. Skipping clipping process.


## Compute mean ndvi

### Subtask:
For each clipped raster, compute the mean NDVI value within the AOI.


**Reasoning**:
Check if there are clipped NDVI datasets available before attempting to compute mean values. If not, print a message and finish the task. Otherwise, iterate through the clipped datasets, calculate the mean of each, store them, and print the list of means.



In [7]:
mean_ndvi_values = []

if not ndvi_datasets:
    print("No clipped NDVI data available. Skipping mean NDVI computation.")
else:
    # Assuming ndvi_datasets now contains tuples of (clipped_numpy_array, transform)
    # from the previous clipping step.
    # If the clipping step failed and ndvi_datasets was not updated,
    # it might contain numpy arrays (the aligned data).
    # Let's assume ndvi_datasets contains numpy arrays (either clipped or aligned).
    # If it contains tuples, the code below will need adjustment to access the array part.
    # Based on the previous step's output, ndvi_datasets was replaced with clipped_ndvi_datasets
    # which are tuples.

    # Let's double-check the structure of ndvi_datasets after clipping.
    # If clipping was skipped, it might still hold the aligned numpy arrays.
    # If clipping was attempted and successful, it should hold tuples (array, transform).

    # To be safe, let's handle both cases.
    if ndvi_datasets and isinstance(ndvi_datasets[0], tuple):
        # Structure is (array, transform)
        for clipped_data, _ in ndvi_datasets:
            # Calculate the mean of the clipped data array
            mean_value = np.nanmean(clipped_data) # Use nanmean to ignore potential NoData values
            mean_ndvi_values.append(mean_value)
    elif ndvi_datasets and isinstance(ndvi_datasets[0], np.ndarray):
         # Structure is numpy array (clipping might have been skipped)
         for data_array in ndvi_datasets:
             # Calculate the mean of the data array
             mean_value = np.nanmean(data_array) # Use nanmean to ignore potential NoData values
             mean_ndvi_values.append(mean_value)
    else:
        print("Unexpected data structure in ndvi_datasets. Cannot compute mean NDVI.")


    if mean_ndvi_values:
        print("Mean NDVI values for each clipped raster:")
        print(mean_ndvi_values)
    else:
        print("Mean NDVI computation resulted in an empty list.")


No clipped NDVI data available. Skipping mean NDVI computation.


## Calculate per-year productivity index

### Subtask:
Calculate a per-year Productivity Index for each raster using the formula (NDVI / mean) * 100.


**Reasoning**:
Check if the necessary data (`ndvi_datasets` and `mean_ndvi_values`) is available and has consistent length. If so, iterate through the datasets and their corresponding mean values to calculate the per-year productivity index for each raster using the provided formula, handling potential division by zero, and store the results.



In [8]:
per_year_productivity_indices = []

if not ndvi_datasets or not mean_ndvi_values or len(ndvi_datasets) != len(mean_ndvi_values):
    print("NDVI datasets or mean NDVI values are not available or do not match in count. Skipping per-year productivity index calculation.")
else:
    # Assume ndvi_datasets contains numpy arrays (either aligned or clipped)
    # Assume mean_ndvi_values contains the corresponding mean values

    calculated_count = 0
    for i, (ndvi_data, mean_value) in enumerate(zip(ndvi_datasets, mean_ndvi_values)):
        try:
            # Check if mean_value is close to zero
            if abs(mean_value) < 1e-9:
                print(f"Warning: Mean NDVI for raster {i} is close to zero ({mean_value}). Setting productivity index to NaN.")
                # Set productivity index to NaN array with the same shape and dtype
                productivity_index_array = np.full_like(ndvi_data, np.nan, dtype=ndvi_data.dtype)
            else:
                # Calculate productivity index
                productivity_index_array = (ndvi_data / mean_value) * 100

            per_year_productivity_indices.append(productivity_index_array)
            calculated_count += 1

        except Exception as e:
            print(f"Error calculating productivity index for raster {i}: {e}")
            # Skip the problematic dataset for now
            pass

    print(f"Calculated per-year productivity indices for {calculated_count} rasters out of {len(ndvi_datasets)}.")

NDVI datasets or mean NDVI values are not available or do not match in count. Skipping per-year productivity index calculation.


## Calculate multi-year composite productivity index

### Subtask:
Average the per-year Productivity Indices to create a Multi-Year Composite Productivity Index.


**Reasoning**:
Check if per-year productivity indices are available and calculate the multi-year composite index if they are.



In [9]:
composite_productivity_index = None

if not per_year_productivity_indices:
    print("No per-year productivity indices available. Skipping composite index calculation.")
else:
    # Convert the list of arrays to a 3D numpy array
    productivity_indices_3d = np.stack(per_year_productivity_indices, axis=0)

    # Calculate the mean along the first dimension (axis=0)
    composite_productivity_index = np.nanmean(productivity_indices_3d, axis=0)

    print("Multi-Year Composite Productivity Index calculated.")
    print(f"Shape of composite index: {composite_productivity_index.shape}")

No per-year productivity indices available. Skipping composite index calculation.


## Create 3-class map

### Subtask:
Create a 3-class map based on the Composite Productivity Index (below average, average, above average).


**Reasoning**:
Check if the composite productivity index is available. If it is, define the class boundaries and create the 3-class map based on these boundaries, handling potential NaN values. Finally, print the shape of the resulting map.



In [10]:
three_class_map = None

if composite_productivity_index is None:
    print("Composite productivity index not available. Skipping 3-class map creation.")
else:
    # Define class boundaries based on the mean
    mean_composite_index = np.nanmean(composite_productivity_index)

    # Create a new array for the 3-class map, initialized with NaN
    three_class_map = np.full_like(composite_productivity_index, np.nan, dtype=np.float32)

    # Assign class values based on boundaries
    # Class 1: Below Average
    three_class_map[composite_productivity_index < mean_composite_index] = 1

    # Class 2: Average (or equal to mean)
    three_class_map[composite_productivity_index == mean_composite_index] = 2

    # Class 3: Above Average
    three_class_map[composite_productivity_index > mean_composite_index] = 3

    print("3-class map created.")
    print(f"Shape of 3-class map: {three_class_map.shape}")


Composite productivity index not available. Skipping 3-class map creation.


## Save outputs

### Subtask:
Save the per-year Productivity GeoTIFFs, the Composite GeoTIFF, the 3-class map GeoTIFF, and a CSV summary of the mean NDVI and productivity indices to a new folder in the notebook's memory.


**Reasoning**:
Check for the availability of required data and then proceed to save the results to GeoTIFF and CSV files in the output directory.



In [11]:
import pandas as pd

# Check if required data is available
if not per_year_productivity_indices or composite_productivity_index is None or three_class_map is None or not mean_ndvi_values or not ndvi_datasets:
    print("Required data (per-year productivity indices, composite index, 3-class map, mean NDVI, or original NDVI datasets) is not fully available. Skipping saving.")
else:
    print(f"Saving output files to '{output_dir}'...")

    # Assuming the structure of ndvi_datasets is now (clipped_array, transform) or just numpy arrays
    # We need the original transforms and CRSs for saving.
    # If clipping was successful, the transform is within the tuple.
    # If clipping was skipped, ndvi_datasets still holds the aligned numpy arrays.
    # We need to access the original rasterio datasets to get the CRS and original transform
    # before clipping and alignment.
    # Assuming original_ndvi_datasets (list of rasterio dataset objects) is available
    # from the loading step. This was not explicitly preserved.
    # Let's assume we stored the target_transform and target_crs from the alignment step.

    try:
        target_transform
        target_crs
    except NameError:
        print("Target transform or CRS not defined from previous steps. Cannot save GeoTIFFs.")
    else:
        # Save per-year Productivity GeoTIFFs
        for i, per_year_index_array in enumerate(per_year_productivity_indices):
            # Use the target transform and CRS for saving
            output_filename = os.path.join(output_dir, f'per_year_productivity_{i+1}.tif')
            try:
                height, width = per_year_index_array.shape
                dtype = per_year_index_array.dtype
                with rasterio.open(
                    output_filename,
                    'w',
                    driver='GTiff',
                    height=height,
                    width=width,
                    count=1,
                    dtype=dtype,
                    crs=target_crs,
                    transform=target_transform,
                    nodata=np.nan # Specify NoData value
                ) as dst:
                    dst.write(per_year_index_array, 1)
                print(f"Saved per-year productivity GeoTIFF: {output_filename}")
            except Exception as e:
                print(f"Error saving per-year productivity GeoTIFF {output_filename}: {e}")

        # Save Composite GeoTIFF
        output_composite_filename = os.path.join(output_dir, 'composite_productivity.tif')
        try:
            height, width = composite_productivity_index.shape
            dtype = composite_productivity_index.dtype
            with rasterio.open(
                output_composite_filename,
                'w',
                driver='GTiff',
                height=height,
                width=width,
                count=1,
                dtype=dtype,
                crs=target_crs,
                transform=target_transform,
                nodata=np.nan # Specify NoData value
            ) as dst:
                dst.write(composite_productivity_index, 1)
            print(f"Saved Composite Productivity GeoTIFF: {output_composite_filename}")
        except Exception as e:
            print(f"Error saving Composite Productivity GeoTIFF {output_composite_filename}: {e}")

        # Save 3-class map GeoTIFF
        output_map_filename = os.path.join(output_dir, 'three_class_map.tif')
        try:
            height, width = three_class_map.shape
            # The data type of the 3-class map is likely integer or float,
            # let's ensure it's an appropriate type for GeoTIFF.
            # Using float32 for consistency, but integer type might be better
            # depending on the required output.
            dtype = np.float32
            with rasterio.open(
                output_map_filename,
                'w',
                driver='GTiff',
                height=height,
                width=width,
                count=1,
                dtype=dtype,
                crs=target_crs,
                transform=target_transform,
                nodata=np.nan # Use NaN or a specific integer for NoData if using integer dtype
            ) as dst:
                dst.write(three_class_map.astype(dtype), 1) # Ensure correct dtype is written
            print(f"Saved 3-class map GeoTIFF: {output_map_filename}")
        except Exception as e:
            print(f"Error saving 3-class map GeoTIFF {output_map_filename}: {e}")

    # Create and save CSV summary
    # Assuming we have filenames available from the upload step (ndvi_uploads.value.keys())
    # and corresponding mean_ndvi_values and potentially mean productivity indices.
    # We don't have mean productivity indices calculated explicitly.
    # Let's create a DataFrame with filenames (if available) and mean_ndvi_values.

    summary_data = {'Mean_NDVI': mean_ndvi_values}

    # Try to get original filenames if ndvi_uploads is available and populated
    if 'ndvi_uploads' in locals() and ndvi_uploads.value:
         # Get filenames in the order they were likely processed
         # Assuming the order in ndvi_uploads.value.keys() matches the order in mean_ndvi_values
         filenames = list(ndvi_uploads.value.keys())[:len(mean_ndvi_values)]
         summary_data['Original_Filename'] = filenames
         # Reorder columns
         summary_df = pd.DataFrame(summary_data, columns=['Original_Filename', 'Mean_NDVI'])
    else:
        summary_df = pd.DataFrame(summary_data)


    output_csv_filename = os.path.join(output_dir, 'productivity_summary.csv')
    try:
        summary_df.to_csv(output_csv_filename, index=False)
        print(f"Saved CSV summary: {output_csv_filename}")
    except Exception as e:
        print(f"Error saving CSV summary {output_csv_filename}: {e}")


Required data (per-year productivity indices, composite index, 3-class map, mean NDVI, or original NDVI datasets) is not fully available. Skipping saving.


## Summary:

### Data Analysis Key Findings

*   The data analysis process, as executed, did not involve loading or processing any actual geospatial data (AOI polygon or NDVI rasters).
*   All steps involving data manipulation (reprojection, alignment, clipping, mean calculation, productivity index calculation, composite index calculation, and map creation) were skipped due to the absence of uploaded data.
*   The code successfully included checks for data availability at the beginning of each processing step and printed messages indicating that the steps were being skipped when data was not present.
*   The final saving step was also skipped because the required processed data products (per-year indices, composite index, 3-class map, etc.) were not generated due to the lack of input data.

### Insights or Next Steps

*   The current implementation successfully handles the scenario where no data is uploaded by skipping processing steps.
*   The next step is to execute the notebook with actual uploaded AOI polygon and NDVI raster files to test the data loading, processing, and saving functionalities.
