# GIS Image Pre-Processing 

The processing pipeline that transforms globally available data types into spatially aligned image stacks with uniform pixel sizes involves the following steps.

- **Specialized Processing**: One-time processing to transform certain data types into the required format (e.g., building footprints and rainfall statistics).
- **Crop Data to UTM Zone**: Crop the (geographic) source data to the UTM zone that encompasses the region of interest, adding a buffer of ±1° in longitude. The buffer is needed so that image tiles near the zone border can be created.
- **Standardize NoData / NaN Values**: Replace NoData values and NaNs with a fixed value to ensure consistency across data types and to facilitate pixel imputation.
- **Pixel Imputation (for NoData)**: This step is required to infill pixels that do not contain valid data, but requires further investigation to determine the appropriate sequence in the processing pipeline and the algorithm for pixel value imputation.
- **Transform the Geographic Data to UTM**: Transform the geographic data to UTM using the appropriate UTM zone.
- **Resample Data**: Resample each data type to a consistent pixel size.

Apart from one-time specialized pre-processing, the remaining processing steps outlined above will be implemented in this notebook using configurable Dataset classes.

### UTM Zones for Reference

```
EPSG:32628  28N  18°W to 12°W
EPSG:32629  29N  12°W to 6°W
EPSG:32630  30N  6°W  to 0°
EPSG:32631  31N  0°   to 6°E
EPSG:32632  32N  6°E  to 12°E
EPSG:32633  33N  12°E to 18°E
EPSG:32634  34N  18°E to 24°E
EPSG:32635  35N  24°E to 30°E
EPSG:32636  36N  30°E to 36°E
EPSG:32637  37N  36°E to 42°E
EPSG:32638  38N  42°E to 48°E
EPSG:32639  39N  48°E to 54°E
EPSG:32640  40N  54°E to 60°E
EPSG:32641  41N  60°E to 66°E
EPSG:32642  42N  66°E to 72°E
EPSG:32643  43N  72°E to 78°E
EPSG:32644  44N  78°E to 84°E
EPSG:32645  45N  84°E to 90°E
```
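The table follows the standard UTM numbering: each zone spans 6° of longitude, zone 1 starts at 180°W, and the northern-hemisphere WGS 84 zones use EPSG codes 32601 to 32660. The sketch below reproduces that arithmetic. `utm_zone_bounds_from_epsg` is a hypothetical helper written for illustration; it is not necessarily how `utm_zone_longitude_bounds` in `gist_utils` is implemented.

```python
def utm_zone_bounds_from_epsg(epsg: str) -> tuple[float, float]:
    """Return (west_lon, east_lon) in degrees for a WGS 84 / UTM north zone.

    Hypothetical helper: expects a string such as "EPSG:32642".
    """
    code = int(epsg.split(":")[1])
    zone = code - 32600  # EPSG 326xx -> UTM zone xx (northern hemisphere)
    if not 1 <= zone <= 60:
        raise ValueError(f"{epsg} is not a WGS 84 / UTM north zone code")
    west = (zone - 1) * 6.0 - 180.0  # zone 1 starts at 180°W
    return west, west + 6.0


west, east = utm_zone_bounds_from_epsg("EPSG:32642")
print(west, east)  # 66.0 72.0
```

Negative values denote western longitudes, so zone 28N comes out as (-18.0, -12.0), matching the first row of the table.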

In [1]:
import os
from dataclasses import dataclass

# Import module that contains several convenience functions (e.g., gdal wrappers).
from gist_utils import *

# Adding path to gdal commands for local system.
os.environ['PATH'] += ':/Users/billk/miniforge3/envs/py39-pt/bin/' 

## Configure Data Type

In [2]:
@dataclass(frozen=True)
class DatasetConfig_Population:
    DATA_DIR:   str  = "./Population"                  # Top-level folder for source data
    INPUT_TIF:  str  = "./landscan-global-2022.tif"    # Source data GeoTIFF
    OUT_DIR:    str  = "./Population/output"           # Output folder to store processed data
    NODATA_SRC: int  = -2147483647                     # NoData value in source data
    NODATA_SET: int  = -999                            # NoData value to write in processed data
    CASE_ID:    str  = "PK_42N"                        # Case ID to distinguish image stacks
    GDAL_INFO:  bool = True                            # Print gdalinfo summaries when True

In [3]:
@dataclass(frozen=True)
class DatasetConfig_Nightlights:
    DATA_DIR:     str = "./Nightlights"                   
    INPUT_TIF:    str = "./VNL_v22_npp-j01_2022_global_vcmslcfg_c202303062300_median_masked.tif"     
    OUT_DIR:      str = "./Nightlights/output"           
    NODATA_SRC:   int = -999                        
    NODATA_SET:   int = -999                             
    CASE_ID:      str = "PK_42N"
    GDAL_INFO:   bool = True

In [4]:
@dataclass(frozen=True)
class DatasetConfig_Rainfall:
    DATA_DIR:   str   = "./Rainfall"                   
    INPUT_TIF:  str   = "./chirps-v2.0.2022_avg.tif"     
    OUT_DIR:    str   = "./Rainfall/output"           
    NODATA_SRC: float = 3.4028234663852886e+38                   
    NODATA_SET: int   = -999                             
    CASE_ID:    str   = "PK_42N"
    GDAL_INFO:   bool = True

In [5]:
@dataclass(frozen=True)
class DatasetConfig_Flood:
    DATA_DIR:   str   = "./Flood"                   
    INPUT_TIF:  str   = "./dfo_3696_from_20100727_to_20101115_band5.tif"     
    OUT_DIR:    str   = "./Flood/output"           
    NODATA_SRC: int   = -999               
    NODATA_SET: int   = -999                             
    CASE_ID:    str   = "PK_42N"
    GDAL_INFO:   bool = True

In [6]:
@dataclass(frozen=True)
class DatasetConfig_Landcover:
    DATA_DIR:   str   = "./Landcover"                   
    INPUT_TIF:  str   = "./esalc_2020.tif"     
    OUT_DIR:    str   = "./Landcover/output"           
    NODATA_SRC: int   = -999               
    NODATA_SET: int   = -999                             
    CASE_ID:    str   = "PK_42N"
    GDAL_INFO:   bool = True
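The five configuration classes above share the same fields and differ only in their values. As one possible alternative (not the approach this notebook uses), a single frozen dataclass could be instantiated once per data type. The names `DatasetConfig` and `DATASETS` below are hypothetical, and only two of the five data types are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical single config class; field names mirror the classes above."""
    DATA_DIR: str            # Top-level folder for source data
    INPUT_TIF: str           # Source data GeoTIFF
    OUT_DIR: str             # Output folder to store processed data
    NODATA_SRC: float        # NoData value in source data
    NODATA_SET: int = -999   # NoData value to write in processed data
    CASE_ID: str = "PK_42N"  # Case ID to distinguish image stacks
    GDAL_INFO: bool = True   # Print gdalinfo summaries when True


# One instance per data type, with values copied from the classes above.
DATASETS = {
    "population": DatasetConfig(
        "./Population", "./landscan-global-2022.tif",
        "./Population/output", -2147483647),
    "rainfall": DatasetConfig(
        "./Rainfall", "./chirps-v2.0.2022_avg.tif",
        "./Rainfall/output", 3.4028234663852886e+38),
}

data_config = DATASETS["population"]
print(data_config.INPUT_TIF, data_config.NODATA_SET)
```

Selecting a data type then becomes a dictionary lookup rather than commenting lines in and out, and the shared defaults are stated once.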

## Configure Data Type Execution Here

In [7]:
#-------------------------------------------------------------------
# Configure data type here (uncomment one line from the list below)
#-------------------------------------------------------------------
data_config = DatasetConfig_Population
#data_config = DatasetConfig_Nightlights
#data_config = DatasetConfig_Rainfall
#data_config = DatasetConfig_Flood
#data_config = DatasetConfig_Landcover

# Full path to input tif.
input_tif =  os.path.join(data_config.DATA_DIR, data_config.INPUT_TIF)

# Create the output folder if it does not already exist.
os.makedirs(data_config.OUT_DIR, exist_ok=True)

## Configure AOI

In [8]:
@dataclass(frozen=True)
class AOIConfig:
    UTM_ZONE:      str = "EPSG:32642"              # Edit for appropriate UTM Zone
    RESAMPLE:    float = 400                       # Resample data in UTM x & y (meters)
    UTM_BUF_DEG: float = 1.0                       # Extended buffer in deg longitude beyond UTM zone
    LAT_NORTH:   float = 38.0                      # Define max latitude for AOI
    LAT_SOUTH:   float = 23.0                      # Define min latitude for AOI

aoi_config = AOIConfig

##  Information Summary: Source Data

In [9]:
if data_config.GDAL_INFO:
    run_gdalinfo(input_tif)

Driver: GTiff/GeoTIFF
Files: ./Population/./landscan-global-2022.tif
Size is 43200, 21600
Coordinate System is:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.01

##  Crop Data to AOI and Replace NoData Values
Crop the source data to the appropriate UTM zone with a buffer in longitude. Adding a buffer is necessary to enable image tiles near the border to be created.
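The implementation of `gdal_crop` lives in `gist_utils` and is not shown in this notebook. As a rough sketch of what a crop by bounding box can look like, the hypothetical helper below builds a `gdal_translate` command using its `-projwin ulx uly lrx lry` option. Whether `gdal_crop` uses this exact command is an assumption.

```python
def build_crop_command(src, dst, ul_lon, ul_lat, lr_lon, lr_lat):
    """Build a gdal_translate command that crops `src` to a lon/lat window.

    Hypothetical helper for illustration; -projwin expects the window as
    upper-left x, upper-left y, lower-right x, lower-right y.
    """
    return [
        "gdal_translate",
        "-projwin", str(ul_lon), str(ul_lat), str(lr_lon), str(lr_lat),
        src, dst,
    ]


cmd = build_crop_command("in.tif", "out.tif", 65.0, 38.0, 73.0, 23.0)
print(" ".join(cmd))
# To execute, pass the list to subprocess.run(cmd, check=True)
# with GDAL's command-line tools available on PATH.
```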

In [10]:
utm_west_lon, utm_east_lon = utm_zone_longitude_bounds(aoi_config.UTM_ZONE)

# Define AOI to encompass local UTM zone (+/- small buffer). Choose latitude to cover data for region.
ul_lat, ul_lon = aoi_config.LAT_NORTH, utm_west_lon - aoi_config.UTM_BUF_DEG
lr_lat, lr_lon = aoi_config.LAT_SOUTH, utm_east_lon + aoi_config.UTM_BUF_DEG

# Print the results
print(f"Upper Left Lat: {ul_lat}")
print(f"Upper Left Lon: {ul_lon}")
print(f"Lower Right Lat: {lr_lat}")
print(f"Lower Right Lon: {lr_lon}")

Upper Left Lat: 38.0
Upper Left Lon: 65.0
Lower Right Lat: 23.0
Lower Right Lon: 73.0
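As a sanity check, the expected size of the cropped raster can be estimated from these bounds. The sketch assumes the 43200-pixel-wide source covers the full 360° of longitude (so 120 pixels per degree, the same in latitude); the truncated `gdalinfo` output above does not confirm the pixel size, so treat this as an assumption.

```python
SRC_WIDTH_PX = 43200      # from the gdalinfo summary of the source data
SRC_LON_SPAN_DEG = 360.0  # assumption: the source raster is global in longitude

px_per_deg = SRC_WIDTH_PX / SRC_LON_SPAN_DEG  # 120 pixels per degree


def expected_crop_size(ul_lat, ul_lon, lr_lat, lr_lon, px_per_deg):
    """Return (width, height) in pixels for a lon/lat crop window."""
    width = round((lr_lon - ul_lon) * px_per_deg)
    height = round((ul_lat - lr_lat) * px_per_deg)
    return width, height


print(expected_crop_size(38.0, 65.0, 23.0, 73.0, px_per_deg))  # (960, 1800)
```

The result agrees with the 960 x 1800 size that GDAL reports for the cropped file later in this notebook.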


In [11]:
# Construct the output file name based on the input file name.
case = "_" + data_config.CASE_ID + "_1_crop_geo.tif"
intermediate_tif = os.path.join(data_config.OUT_DIR, os.path.splitext(data_config.INPUT_TIF)[0] + case)
    
gdal_crop(input_tif, intermediate_tif, ul_lon, ul_lat, lr_lon, lr_lat, True)

Input  TIF: ./Population/./landscan-global-2022.tif
Output TIF: ./Population/output/./landscan-global-2022_PK_42N_1_crop_geo.tif

Input file size is 43200, 21600
0...10...20...30...40...50...60...70...80...90...100 - done.
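Each stage of the notebook builds its output path with the same pattern: the input file stem, the case ID, and a stage suffix. The pattern could be factored into one function; `stage_path` below is a hypothetical helper, not part of `gist_utils`.

```python
import os


def stage_path(out_dir: str, input_tif: str, case_id: str, stage: str) -> str:
    """Return the output path for a processing stage.

    Hypothetical helper: `stage` is a suffix such as "1_crop_geo" or "2_nodata".
    """
    stem = os.path.splitext(input_tif)[0]  # drop the ".tif" extension
    return os.path.join(out_dir, f"{stem}_{case_id}_{stage}.tif")


print(stage_path("./Population/output", "./landscan-global-2022.tif",
                 "PK_42N", "1_crop_geo"))
# On POSIX: ./Population/output/./landscan-global-2022_PK_42N_1_crop_geo.tif
```

The redundant `./` in the middle of the path comes from the leading `./` in `INPUT_TIF`; it is harmless and matches the paths printed in the cell output.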


In [12]:
input_tif = intermediate_tif 

# Construct the output file name based on the input file name.
case = "_" + data_config.CASE_ID + "_2_nodata.tif"
intermediate_tif = os.path.join(data_config.OUT_DIR, os.path.splitext(data_config.INPUT_TIF)[0] + case)

# Replace nodata values.
gdal_replace_nodata(input_tif, intermediate_tif, data_config.NODATA_SET, data_config.NODATA_SRC, True)

Input TIF: ./Population/output/./landscan-global-2022_PK_42N_1_crop_geo.tif
Final Output TIF: ./Population/output/./landscan-global-2022_PK_42N_2_nodata.tif

gdalwarp command: gdalwarp -overwrite -co COMPRESS=DEFLATE -co ZLEVEL=9 ./Population/output/./landscan-global-2022_PK_42N_1_crop_geo.tif ./Population/output/./landscan-global-2022_PK_42N_2_nodata.tif -srcnodata -2147483647 -dstnodata -999

Creating output file that is 960P x 1800L.
Processing ./Population/output/./landscan-global-2022_PK_42N_1_crop_geo.tif [1/1] : 0...10...20...30...40...50...60...70...80...90...100 - done.
Output TIF has been written to: ./Population/output/./landscan-global-2022_PK_42N_2_nodata.tif
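The `gdalwarp` command printed above shows how the NoData value is replaced: `-srcnodata` marks the source value to treat as missing and `-dstnodata` sets the value written in its place. The sketch below rebuilds that command line from its parts. `build_nodata_command` is a hypothetical helper, and its argument order is not that of `gdal_replace_nodata`.

```python
import shlex


def build_nodata_command(src, dst, nodata_src, nodata_set):
    """Build the gdalwarp command that rewrites NoData values.

    Hypothetical helper mirroring the command printed in the cell output.
    """
    return [
        "gdalwarp", "-overwrite",
        "-co", "COMPRESS=DEFLATE", "-co", "ZLEVEL=9",  # GeoTIFF compression options
        src, dst,
        "-srcnodata", str(nodata_src),  # value treated as missing in the source
        "-dstnodata", str(nodata_set),  # value written to the output
    ]


cmd = build_nodata_command("in.tif", "out.tif", -2147483647, -999)
print(shlex.join(cmd))
```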


## Information Summary: Cropped Data

In [13]:
if data_config.GDAL_INFO:
    run_gdalinfo(intermediate_tif)

Driver: GTiff/GeoTIFF
Files: ./Population/output/./landscan-global-2022_PK_42N_2_nodata.tif
Size is 960, 1800
Coordinate System is:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANG

## Transform Geographic Data to UTM

Transform the geographic data to a projected CRS (UTM) so that image tiles consist of uniform pixel sizes as measured by their edge extents on the ground.
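To see why this matters, consider the east-west ground width of a single geographic pixel at the southern and northern edges of the AOI. The sketch uses a spherical approximation with the WGS 84 semi-major axis as the radius and assumes a pixel size of 1/120° (inferred from the 43200-pixel global width); both are simplifying assumptions.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used here as a sphere radius
PIXEL_DEG = 1.0 / 120.0     # assumption: 43200 pixels across 360° of longitude


def pixel_width_m(lat_deg: float, pixel_deg: float = PIXEL_DEG) -> float:
    """Approximate east-west ground width (m) of one pixel at a given latitude."""
    metres_per_deg_at_equator = 2.0 * math.pi * EARTH_RADIUS_M / 360.0
    return metres_per_deg_at_equator * math.cos(math.radians(lat_deg)) * pixel_deg


for lat in (23.0, 38.0):  # LAT_SOUTH and LAT_NORTH from AOIConfig
    print(f"lat {lat:4.1f}: {pixel_width_m(lat):.0f} m")
```

Under these assumptions a pixel is roughly 854 m wide at 23°N but only about 731 m wide at 38°N, so tiles cut from the geographic raster would cover different ground areas depending on latitude.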

In [14]:
utm_zone = aoi_config.UTM_ZONE 

input_tif = intermediate_tif 

case = "_" + data_config.CASE_ID + "_3_utm.tif"
temp = os.path.splitext(data_config.INPUT_TIF)[0] + case

intermediate_tif = os.path.join(data_config.OUT_DIR, temp)

transform_to_utm(input_tif, intermediate_tif, utm_zone, True)

Input  TIF: ./Population/output/./landscan-global-2022_PK_42N_2_nodata.tif
Output TIF: ./Population/output/./landscan-global-2022_PK_42N_3_utm.tif

Creating output file that is 914P x 1861L.
Processing ./Population/output/./landscan-global-2022_PK_42N_2_nodata.tif [1/1] : 0Using internal nodata values (e.g. -999) for image ./Population/output/./landscan-global-2022_PK_42N_2_nodata.tif.
Copying nodata values from source ./Population/output/./landscan-global-2022_PK_42N_2_nodata.tif to destination ./Population/output/./landscan-global-2022_PK_42N_3_utm.tif.
...10...20...30...40...50...60...70...80...90...100 - done.


##  Information Summary: UTM Data

In [15]:
if data_config.GDAL_INFO:
    run_gdalinfo(intermediate_tif)

Driver: GTiff/GeoTIFF
Files: ./Population/output/./landscan-global-2022_PK_42N_3_utm.tif
Size is 914, 1861
Coordinate System is:
PROJCRS["WGS 84 / UTM zone 42N",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 42N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",69,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],

## Resample Data

In [16]:
# Use the UTM-projected file from the previous step as the input
input_tif = intermediate_tif 

# Construct the final output file name
case = "_" + data_config.CASE_ID + "_4_resampled.tif"
temp = os.path.splitext(data_config.INPUT_TIF)[0] + case

intermediate_tif = os.path.join(data_config.OUT_DIR, temp)

# Resample the data
gdal_resample(input_tif, intermediate_tif, aoi_config.RESAMPLE, aoi_config.RESAMPLE)

Creating output file that is 2050P x 4174L.
Processing ./Population/output/./landscan-global-2022_PK_42N_3_utm.tif [1/1] : 0Using internal nodata values (e.g. -999) for image ./Population/output/./landscan-global-2022_PK_42N_3_utm.tif.
Copying nodata values from source ./Population/output/./landscan-global-2022_PK_42N_3_utm.tif to destination ./Population/output/./landscan-global-2022_PK_42N_4_resampled.tif.
...10...20...30...40...50...60...70...80...90...100 - done.
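The log format above is that of `gdalwarp`, whose `-tr xres yres` option sets the target pixel size in the units of the CRS (metres for UTM). A hedged sketch of such a command is shown below, along with the ground extent implied by the output size. `build_resample_command` is a hypothetical helper; the exact options used by `gdal_resample`, including the resampling method, are not shown in this notebook, so the sketch leaves the method at GDAL's default (nearest neighbour).

```python
def build_resample_command(src, dst, x_res, y_res):
    """Build a gdalwarp command that resamples `src` to a fixed pixel size.

    Hypothetical helper; x_res and y_res are in CRS units (metres for UTM).
    """
    return ["gdalwarp", "-overwrite", "-tr", str(x_res), str(y_res), src, dst]


cmd = build_resample_command("in_utm.tif", "out_resampled.tif", 400, 400)
print(" ".join(cmd))

# Ground extent implied by the reported output size at 400 m per pixel.
width_px, height_px = 2050, 4174
print(width_px * 400 / 1000, "km x", height_px * 400 / 1000, "km")  # 820.0 km x 1669.6 km
```

The north-south extent of about 1670 km is consistent with the 15° latitude span of the AOI.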


##  Information Summary: Resampled Data

In [17]:
if data_config.GDAL_INFO:
    run_gdalinfo(intermediate_tif)

Driver: GTiff/GeoTIFF
Files: ./Population/output/./landscan-global-2022_PK_42N_4_resampled.tif
Size is 2050, 4174
Coordinate System is:
PROJCRS["WGS 84 / UTM zone 42N",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 42N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",69,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["met