<a href="https://colab.research.google.com/github/liangchow/zindi-amazon-secret-runway/blob/main/utils/GeoTIFF%20modifications.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Remove NaN columns in GeoTIFF files

This code was needed to fix a few images that included an extra column filled with NaN values. The extra column (on the left or right side of the image) was created by Google Earth Engine. It is not clear why this happened, but it did not affect the bounding box of the image. It was probably caused by a floating-point calculation when converting between CRS.
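
To make the symptom concrete, here is a minimal sketch that counts NaN pixels per column. It uses a synthetic array as a stand-in for the `(bands, height, width)` array that `rasterio` returns; the array shape and values are made up for illustration and are not taken from the real images.

```python
import numpy as np

# Synthetic stand-in for a (bands, height, width) raster.
# The last column is filled with NaN on purpose to mimic the problem.
image_data = np.ones((3, 4, 5), dtype=np.float32)
image_data[:, :, -1] = np.nan

# Count NaN pixels per column, summed over all bands and rows.
nan_per_column = np.isnan(image_data).sum(axis=(0, 1))

# Indices of the columns that contain at least one NaN.
nan_columns = np.flatnonzero(nan_per_column).tolist()

print(nan_per_column.tolist())  # [0, 0, 0, 0, 12]
print(nan_columns)              # [4]
```

A column whose count equals `bands * height` is entirely NaN, which is the pattern described above.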

**Attention:** This code saves the modified files to the compute node. You need to download the files and then overwrite or replace the original files on the drive.

# Install and import packages

In [5]:
!pip -q install rasterio

In [9]:
import os
import glob
import rasterio
import numpy as np

# Mount Google Drive

In [4]:
# mount your drive in case you have any new data uploaded there you want to use
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Functions

In [11]:
def check_nan_border_pixels(geotiff_path):
    """
    Checks if any pixel in the first or last column of a multi-band GeoTIFF
    file has NaN as a value.

    Args:
        geotiff_path (str): Path to the GeoTIFF file.

    Returns:
        tuple: A tuple containing:
            - bool: True if NaN is found in the border columns, False otherwise.
            - str: 'first' if NaN is found in the first column,
                   'last' if found in the last column,
                   None if no NaN is found.
    """
    with rasterio.open(geotiff_path) as src:
        image_data = src.read()  # Read all bands

        # Check for NaN in the first column
        if np.isnan(image_data[:, :, 0]).any():
            return True, 'first'

        # Check for NaN in the last column
        if np.isnan(image_data[:, :, -1]).any():
            return True, 'last'

    return False, None  # No NaN found

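Note that `check_nan_border_pixels` returns as soon as it finds NaN in the first column, so a file with NaN on both sides is reported only as `'first'`. The sketch below shows one possible variant that reports every affected side. `nan_border_sides` is a hypothetical helper, not part of the notebook, and it works on an in-memory array rather than a file path.

```python
import numpy as np

def nan_border_sides(image_data):
    """Return the border columns ('first', 'last') that contain any NaN.

    Hypothetical helper; image_data has shape (bands, height, width).
    """
    sides = []
    if np.isnan(image_data[:, :, 0]).any():
        sides.append('first')
    if np.isnan(image_data[:, :, -1]).any():
        sides.append('last')
    return sides

# Synthetic example with NaN on both sides (values are made up).
demo = np.zeros((2, 3, 4), dtype=np.float32)
demo[0, 1, 0] = np.nan   # a single NaN pixel in the first column
demo[:, :, -1] = np.nan  # the whole last column is NaN

print(nan_border_sides(demo))  # ['first', 'last']
```

With a list of sides, a caller could remove one column per entry instead of handling only the first match.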

In [23]:
def remove_nan_column(geotiff_path, output_path, remove='first'):
    """
    Remove the first or last column from all bands in a GeoTIFF file.

    Args:
        geotiff_path (str): Path to the input GeoTIFF file.
        output_path (str): Path to the output GeoTIFF file.
        remove (str): 'first' to remove the first column, 'last' to remove the last column.
    """

    # Open the GeoTIFF file for reading
    with rasterio.open(geotiff_path) as src:
        # Read the original image data
        image_data = src.read()  # shape: (num_bands, height, width)

        # Get band descriptions (band names)
        band_descriptions = src.descriptions

        # Check the dimensions
        _, height, width = image_data.shape

        # Decide whether to remove the first or last column
        if remove == 'first':
            new_image_data = image_data[:, :, 1:]  # Remove first column
            # Update the transform for the removed column
            new_transform = src.transform * rasterio.Affine.translation(1, 0)
        elif remove == 'last':
            new_image_data = image_data[:, :, :-1]  # Remove last column
            new_transform = src.transform
        else:
            raise ValueError("Invalid value for 'remove'. Use 'first' or 'last'.")

        # Update metadata. Band descriptions are not part of the creation
        # profile; they are set on the output dataset after writing.
        new_meta = src.meta.copy()
        new_meta.update({
            "width": new_image_data.shape[2],  # New width
            "transform": new_transform  # Adjust the transform for geospatial coordinates
        })

        # Write the modified image data to a new GeoTIFF file
        with rasterio.open(output_path, 'w', **new_meta) as dst:
            dst.write(new_image_data)
            # Set the descriptions to maintain band names
            dst.descriptions = band_descriptions



    print(f"Updated GeoTIFF saved to {output_path}")

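The transform update for the `'first'` case can be checked by hand. Multiplying the transform by a translation of one pixel in the column direction moves the raster origin by one pixel width, so the remaining pixels keep their world coordinates. The sketch below reproduces that arithmetic with plain tuples in the usual `(a, b, c, d, e, f)` affine coefficient order; the helper names and the sample transform are hypothetical.

```python
def shift_origin_one_column(a, b, c, d, e, f):
    """Affine coefficients after dropping the first pixel column.

    Equivalent to transform * translation(1, 0):
    the origin (c, f) moves by one column step (a, d).
    """
    return (a, b, c + a, d, e, f + d)

def pixel_to_world(coeffs, col, row):
    """Map a (col, row) pixel index to world (x, y) coordinates."""
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)

# Made-up north-up transform: 10 m pixels, origin at (500000, 4000000).
old = (10.0, 0.0, 500000.0, 0.0, -10.0, 4000000.0)
new = shift_origin_one_column(*old)

# Column 1 of the old grid and column 0 of the new grid are the same place.
print(pixel_to_world(old, 1, 0) == pixel_to_world(new, 0, 0))  # True
print(new)  # (10.0, 0.0, 500010.0, 0.0, -10.0, 4000000.0)
```

Removing the last column needs no such shift because the origin sits at the upper-left corner, which is why the `'last'` branch reuses the original transform.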

In [16]:
import re

def find_matching_mask_file(mask_files, id):
    """
    Finds the file in mask_files that matches the given id value.

    Args:
        mask_files (list): A list of mask file paths.
        id (int): The id value to search for.

    Returns:
        str: The path to the matching mask file, or None if not found.
    """
    # Require that no digit follows the id, so that id 18 does not match "Id_182"
    pattern = re.compile(rf"Id_{re.escape(str(id))}(?!\d)")
    for mask_file in mask_files:
        if pattern.search(os.path.basename(mask_file)):
            return mask_file
    return None


# Main Code

In [25]:
# Edit the path to the directory containing the GeoTIFF files
image_directory = '/content/drive/MyDrive/Sentinel_temp'

# Get a list of all GeoTIFF files in the directory
image_files = glob.glob(f"{image_directory}/*.tif")
image_files.sort(key=os.path.basename)


for image_file in image_files:
    has_nan, nan_column = check_nan_border_pixels(image_file)
    if has_nan:
        filename = os.path.basename(image_file)
        print(f"Image {filename} has NaN in the {nan_column} column")
        #id = os.path.basename(image_file).split('_')[4].split('.')[0]
        remove_nan_column(image_file, filename, remove=nan_column)



Image Copy of Sentinel_AllBands_Training_Id_182.tif has NaN in the last column
Updated GeoTIFF saved to Copy of Sentinel_AllBands_Training_Id_182.tif
Image Sentinel_AllBands_Training_Id_182.tif has NaN in the last column
Updated GeoTIFF saved to Sentinel_AllBands_Training_Id_182.tif

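The loop above passes the bare filename as the output path, so the fixed files land in the current working directory of the compute node. One possible alternative, sketched below, is to collect them in a dedicated folder so they are easier to download together. `build_output_path` and the folder name `fixed` are hypothetical and not part of the notebook; the demo uses a temporary directory so that it can run anywhere.

```python
import os
import tempfile

def build_output_path(image_file, output_dir):
    """Hypothetical helper: keep the filename, but place it in output_dir."""
    os.makedirs(output_dir, exist_ok=True)  # create the folder if needed
    return os.path.join(output_dir, os.path.basename(image_file))

# Demo in a temporary directory (stand-in for a folder on the compute node).
demo_root = tempfile.mkdtemp()
output_dir = os.path.join(demo_root, 'fixed')

image_file = '/content/drive/MyDrive/Sentinel_temp/Sentinel_AllBands_Training_Id_182.tif'
output_path = build_output_path(image_file, output_dir)

print(os.path.basename(output_path))  # Sentinel_AllBands_Training_Id_182.tif
print(os.path.isdir(output_dir))      # True
```

In the loop, the result could then be passed as the `output_path` argument of `remove_nan_column` in place of `filename`.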