# Pre-Process Global Rainfall Data (GPM)

This notebook pre-processes global rainfall data for specific Areas of Interest (AOIs). The global data is provided as 12 monthly GeoTiff files covering 2001 to 2022; each file contains raster values in mm/day representing the daily average rainfall for that calendar month over the 22-year span.

For a given AOI, the global files are cropped to the AOI, and bad values (-9999.9) are replaced with -999, which is set as the NoData value for consistency across all data types in this project. The cropped files are then averaged to produce a single GeoTiff file representing the daily average rainfall over the 22-year span.

The source data is available globally [here](https://gpm.nasa.gov/data/imerg/precipitation-climatology).


## File System Structure
The top-level file structure is shown below. This notebook processes the global rainfall data stored under `./Rainfall`.


<pre style="font-family: monospace;">
<span style="color: black;">./GIS-Image-Stack-Processing</span> 
<span></span>
<span style="color: gray;">    ./AOI         # AOI Image Stacks and Image Tiles</span>  
<span style="color: gray;">    ./DHS         # DHS survey data</span>
<span style="color: blue;">    ./gist_utils</span>  <span style="color: gray;"># Python package with convenience functions</span>
<span style="color: gray;">    ./Nightlights</span>
<span style="color: gray;">    ./Population</span>
<span style="color: blue;">    ./Rainfall</span>

<span style="color: blue;">    ./01_prep_rainfall_gpm.ipynb (this notebook)</span>
<span style="color: gray;">    ./02_prep_geospatial_data.ipynb</span>
<span style="color: gray;">    ./03_prep_aoi_image_tiles.ipynb</span>

</pre>

## **Input (Global Monthly Rainfall):**

The following file structure is required as input for this notebook. The GPM IMERG climatology dataset consists of 12 global GeoTiff files, one per calendar month, as indicated below.

<pre style="font-family: monospace;">
    ./Rainfall/
        GPM_2001-2022/
            IMERG-Final.CLIM.2001-2022.01.V07B.tif
            :
            :
            IMERG-Final.CLIM.2001-2022.12.V07B.tif
</pre>

## **Output (AOI Daily Average):**

The following file structure will be created by this notebook. The 12 global monthly GeoTiff files are processed to produce a single daily average rainfall GeoTiff file for the specified country.

<pre style="font-family: monospace;">
    ./Rainfall/
        GPM_2001-2022/
            PK/
                AOI_crop_monthly/
                AOI_crop_monthly_nodata/
                AOI_crop_daily/
                    GPM_2001-2022.01.V07B_PK_avg.tif
</pre>

## Required Configurations

The only configuration required for each execution of this notebook is the two-letter country code for the specified AOI. Execute the notebook once per AOI.

<pre style="font-family: monospace;">
<span style="color: blue;">country_code = 'PK'</span>  # Set the country code
</pre>

In [1]:
import os
import rasterio
import numpy as np
from dataclasses import dataclass

# Import module that contains several convenience functions (e.g., gdal wrappers)
from project_utils import *

# Adding path to gdal commands for local system
os.environ['PATH'] += ':/Users/billk/miniforge3/envs/py39-pt/bin/' 

## 1 Set Country Code and Define AOI

The only input setting required in this notebook is the two-letter country code. The AOI for the specified country is computed automatically from the country's bounding box plus a buffer, so that image tiles near the borders can be cropped.

In [118]:
#-------------------------------------------------
# REQUIRED CONFIGURATIONS HERE
#-------------------------------------------------
country_code = 'TD'   # Set the country code
#-------------------------------------------------

lat_north = aoi_configurations[country_code]['lat_north']
lat_south = aoi_configurations[country_code]['lat_south']
lon_west  = aoi_configurations[country_code]['lon_west']
lon_east  = aoi_configurations[country_code]['lon_east']

case = country_code
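
The lookup above raises a bare `KeyError` when the country code is missing. The following is a minimal sketch of a friendlier lookup, assuming `aoi_configurations` is a dictionary keyed by country code, as the cell above suggests. The helper name `get_aoi_bounds`, the stand-in dictionary, and its sample values are hypothetical and only for illustration.

```python
# Hypothetical stand-in for the project's aoi_configurations dictionary.
# The numbers are illustrative placeholders, not authoritative country bounds.
sample_aoi_configurations = {
    "TD": {"lat_north": 23.41, "lat_south": 7.44, "lon_west": 13.47, "lon_east": 24.0},
}

def get_aoi_bounds(configs, country_code):
    """Return (lat_north, lat_south, lon_west, lon_east) for a country code."""
    try:
        cfg = configs[country_code]
    except KeyError:
        known = ", ".join(sorted(configs))
        raise ValueError(
            f"Unknown country code {country_code!r}; known codes: {known}"
        ) from None
    return cfg["lat_north"], cfg["lat_south"], cfg["lon_west"], cfg["lon_east"]

bounds = get_aoi_bounds(sample_aoi_configurations, "TD")
print(bounds)
```

Failing early with the list of known codes makes a mistyped country code easier to spot than a traceback from deep inside a later cell.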

## Define Data Classes to Configure Case

In [119]:
@dataclass(frozen=True)
class AOIConfig:
    # The following Lat, Lon bounds are programmatically set based on pre-defined AOI configurations
    # in gist_utils/aoi_configurations.py
    LAT_NORTH:   float                      
    LAT_SOUTH:   float                      
    LON_WEST:    float
    LON_EAST:    float
    BUF_DEG: float = 1.0
        
@dataclass(frozen=True)
class DatasetConfig:
    COUNTRY_CODE:  str
    DATA_DIR:      str = './Rainfall/GPM_2001-2022/'
    OUT_DIR_CROP:  str = './Rainfall/GPM_2001-2022/{country_code}/AOI_crop_monthly' 
    OUT_DIR_NODATA:str = './Rainfall/GPM_2001-2022/{country_code}/AOI_crop_monthly_nodata' 
    OUT_DIR:       str = './Rainfall/GPM_2001-2022/{country_code}/AOI_crop_daily'
    OUT_BASE:      str = 'GPM_2001-2022.01.V07B'
    BAD_VALUES:  float = -9999.9  # Bad values in source data
    NODATA_SET:    int = -999     # NoData value used for this project
    GDAL_INFO:    bool = False

    def get_out_dir(self):
        return self.OUT_DIR.format(country_code=self.COUNTRY_CODE)
    
    def get_out_dir_crop(self):
        return self.OUT_DIR_CROP.format(country_code=self.COUNTRY_CODE)
    
    def get_out_dir_nodata(self):
        return self.OUT_DIR_NODATA.format(country_code=self.COUNTRY_CODE)
    
data_config = DatasetConfig(COUNTRY_CODE=country_code)
aoi_config  = AOIConfig(LAT_NORTH=lat_north, LAT_SOUTH=lat_south, LON_WEST=lon_west, LON_EAST=lon_east)

In [120]:
print(data_config.get_out_dir_crop())
print(data_config.get_out_dir_nodata())
print(data_config.get_out_dir())

./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly_nodata
./Rainfall/GPM_2001-2022/TD/AOI_crop_daily


## Set Output Filenames

In [121]:
# Set output filenames
output_avg = data_config.get_out_dir() + "/" + data_config.OUT_BASE + "_" + case + "_avg.tif"

print("output_avg will be saved here: ", output_avg)

output_avg will be saved here:  ./Rainfall/GPM_2001-2022/TD/AOI_crop_daily/GPM_2001-2022.01.V07B_TD_avg.tif


In [122]:
# Create output folders if they do not already exist
for out_dir in (data_config.get_out_dir(),
                data_config.get_out_dir_crop(),
                data_config.get_out_dir_nodata()):
    os.makedirs(out_dir, exist_ok=True)

## 2 Define the Cropped Region
The cropped region is defined by the AOI country bounds plus an additional buffer to allow for cropping tiles near the AOI bounds.

In [123]:
# Define AOI to encompass the country (+/- small buffer).
ul_lat, ul_lon = aoi_config.LAT_NORTH + aoi_config.BUF_DEG, aoi_config.LON_WEST - aoi_config.BUF_DEG
lr_lat, lr_lon = aoi_config.LAT_SOUTH - aoi_config.BUF_DEG, aoi_config.LON_EAST + aoi_config.BUF_DEG

# Print the results
print(f"Upper Left Lat: {ul_lat}")
print(f"Upper Left Lon: {ul_lon}")
print(f"Lower Right Lat: {lr_lat}")
print(f"Lower Right Lon: {lr_lon}")

Upper Left Lat: 24.41
Upper Left Lon: 12.47
Lower Right Lat: 6.44
Lower Right Lon: 25.0
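
As a sanity check on the bounds printed above, a small helper can apply the buffer and clamp the result to valid latitude and longitude ranges, which matters for AOIs near the poles or the edges of the longitude range. This is a sketch under assumptions: the function name `buffered_bounds` is hypothetical, the sample inputs are the printed bounds with the 1.0 degree buffer removed, and AOIs that cross the antimeridian are not handled.

```python
def buffered_bounds(lat_north, lat_south, lon_west, lon_east, buf_deg=1.0):
    """Expand a bounding box by buf_deg and clamp it to valid lat/lon ranges.

    Returns (ul_lat, ul_lon, lr_lat, lr_lon).
    """
    if lat_north <= lat_south or lon_east <= lon_west:
        raise ValueError("Expected lat_north > lat_south and lon_east > lon_west")
    ul_lat = min(lat_north + buf_deg, 90.0)    # upper-left latitude (north edge)
    lr_lat = max(lat_south - buf_deg, -90.0)   # lower-right latitude (south edge)
    ul_lon = max(lon_west - buf_deg, -180.0)   # upper-left longitude (west edge)
    lr_lon = min(lon_east + buf_deg, 180.0)    # lower-right longitude (east edge)
    return ul_lat, ul_lon, lr_lat, lr_lon

# Sample inputs back-computed from the printed output (assumption, see above)
print(buffered_bounds(23.41, 7.44, 13.47, 24.0))
```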


### Confirm Global Input Source Files

In [124]:
# Create a list of all files in the directory
files_in_directory = os.listdir(data_config.DATA_DIR)

# Filter the list to include only TIFF files
tiff_files = sorted([file for file in files_in_directory if file.endswith('.tif')])

for file in tiff_files:
    print(data_config.DATA_DIR + file)

./Rainfall/GPM_2001-2022/IMERG-Final.CLIM.2001-2022.01.V07B.tif
./Rainfall/GPM_2001-2022/IMERG-Final.CLIM.2001-2022.02.V07B.tif
./Rainfall/GPM_2001-2022/IMERG-Final.CLIM.2001-2022.03.V07B.tif
./Rainfall/GPM_2001-2022/IMERG-Final.CLIM.2001-2022.04.V07B.tif
./Rainfall/GPM_2001-2022/IMERG-Final.CLIM.2001-2022.05.V07B.tif
./Rainfall/GPM_2001-2022/IMERG-Final.CLIM.2001-2022.06.V07B.tif
./Rainfall/GPM_2001-2022/IMERG-Final.CLIM.2001-2022.07.V07B.tif
./Rainfall/GPM_2001-2022/IMERG-Final.CLIM.2001-2022.08.V07B.tif
./Rainfall/GPM_2001-2022/IMERG-Final.CLIM.2001-2022.09.V07B.tif
./Rainfall/GPM_2001-2022/IMERG-Final.CLIM.2001-2022.10.V07B.tif
./Rainfall/GPM_2001-2022/IMERG-Final.CLIM.2001-2022.11.V07B.tif
./Rainfall/GPM_2001-2022/IMERG-Final.CLIM.2001-2022.12.V07B.tif
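
Before cropping, it can help to verify that all 12 monthly files are present. The sketch below parses the month from filenames of the form listed above; the regular expression and the helper name `missing_months` are assumptions for illustration and are not part of the project utilities.

```python
import re

# Assumed filename pattern: IMERG-Final.CLIM.2001-2022.<MM>.V07B.tif
MONTH_PATTERN = re.compile(r"^IMERG-Final\.CLIM\.2001-2022\.(\d{2})\.V07B\.tif$")

def missing_months(file_names):
    """Return the sorted list of month numbers (1-12) with no matching file."""
    found = set()
    for name in file_names:
        match = MONTH_PATTERN.match(name)
        if match:
            found.add(int(match.group(1)))
    return sorted(set(range(1, 13)) - found)

example_files = [f"IMERG-Final.CLIM.2001-2022.{m:02d}.V07B.tif" for m in range(1, 13)]
print(missing_months(example_files))       # []
print(missing_months(example_files[:-1]))  # [12]
```

In the notebook, the same check could be applied to `tiff_files` so that a missing month is caught before the average is computed.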


## 3 Crop the Monthly Source Files 

In [125]:
# Loop through each global TIFF file
for file_name in tiff_files:
    
    input_tif = os.path.join(data_config.DATA_DIR, file_name)
    
    # Construct the output file name based on the input file name
    temp = '_' + case + "_crop.tif"
    intermediate_tif = os.path.join(data_config.get_out_dir_crop(), os.path.splitext(file_name)[0] + temp)
    print(intermediate_tif)
    
    # Crop the data to the specified AOI
    gdal_crop(input_tif, intermediate_tif, ul_lon, ul_lat, lr_lon, lr_lat, False)

./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.01.V07B_TD_crop.tif
Input file size is 3600, 1800
0...10...20...30...40...50...60...70...80...90...100 - done.

./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.02.V07B_TD_crop.tif
Input file size is 3600, 1800
0...10...20...30...40...50...60...70...80...90...100 - done.

./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.03.V07B_TD_crop.tif
Input file size is 3600, 1800
0...10...20...30...40...50...60...70...80...90...100 - done.

./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.04.V07B_TD_crop.tif
Input file size is 3600, 1800
0...10...20...30...40...50...60...70...80...90...100 - done.

./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.05.V07B_TD_crop.tif
Input file size is 3600, 1800
0...10...20...30...40...50...60...70...80...90...100 - done.

./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.06.V07B_
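
The implementation of `gdal_crop` is not shown in this notebook. One common way to crop a raster to a latitude/longitude window is the GDAL command `gdal_translate` with its `-projwin ulx uly lrx lry` option, and the log lines above resemble its output. The sketch below only builds such a command without running it; the helper name `build_crop_command` and the file names are hypothetical, and this is not the project's function.

```python
def build_crop_command(input_tif, output_tif, ul_lon, ul_lat, lr_lon, lr_lat):
    """Build a gdal_translate command that crops a raster to the given window.

    -projwin expects: upper-left x, upper-left y, lower-right x, lower-right y,
    which for a lat/lon raster means (ul_lon, ul_lat, lr_lon, lr_lat).
    """
    return [
        "gdal_translate",
        "-projwin", str(ul_lon), str(ul_lat), str(lr_lon), str(lr_lat),
        input_tif,
        output_tif,
    ]

# Placeholder file names; the window is the buffered AOI printed earlier
cmd = build_crop_command("global.tif", "aoi_crop.tif", 12.47, 24.41, 25.0, 6.44)
print(" ".join(cmd))
# To execute (requires GDAL on PATH), pass cmd to subprocess.run(cmd, check=True)
```

Note the argument order: longitude comes before latitude because `-projwin` takes x before y.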

### Confirm the Cropped Files

In [126]:
# List all the cropped TIFF files for the specified AOI
cropped_files = sorted([os.path.join(data_config.get_out_dir_crop(), file) 
                        for file in os.listdir(data_config.get_out_dir_crop()) 
                        if file.endswith('.tif') or file.endswith('.tiff')])

for file in cropped_files:
    print(file)

./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.01.V07B_TD_crop.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.02.V07B_TD_crop.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.03.V07B_TD_crop.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.04.V07B_TD_crop.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.05.V07B_TD_crop.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.06.V07B_TD_crop.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.07.V07B_TD_crop.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.08.V07B_TD_crop.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.09.V07B_TD_crop.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.10.V07B_TD_crop.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.11.V07B_

## 4 Set NoData Values in the Cropped Monthly Source Files

Replace "bad value" pixels with NoData set to -999 for consistency with other data types.
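
The replacement in this step is performed by the project helper `gdal_set_nodata`, whose implementation is not shown. To illustrate the idea on an in-memory array, the NumPy sketch below swaps the bad value for the NoData value. It uses a tolerant comparison because -9999.9 is not exactly representable in floating point, so an exact equality test against raster data can be unreliable. The function name `replace_bad_values` and the tolerance are assumptions.

```python
import numpy as np

def replace_bad_values(data, bad_value=-9999.9, nodata=-999.0, atol=0.01):
    """Return a float32 copy of data with values close to bad_value set to nodata."""
    data = np.asarray(data, dtype=np.float32)
    # Absolute tolerance only: flag pixels within atol of the bad value
    is_bad = np.isclose(data, bad_value, rtol=0.0, atol=atol)
    return np.where(is_bad, np.float32(nodata), data)

sample = np.array([[0.5, -9999.9], [3.25, 12.0]], dtype=np.float32)
cleaned = replace_bad_values(sample)
print(cleaned)
```

Writing the result back to a GeoTiff would additionally require setting the NoData tag in the file metadata, which this array-only sketch does not cover.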

In [127]:
processed_files = []
    
for cropped_file in cropped_files:
    
    # Split the filename from its extension
    base_name, extension = os.path.splitext(os.path.basename(cropped_file))
    
    # Append the suffix before the extension
    new_base_name = f"{base_name}_nodata{extension}"
    
    # Create the full path for the output file
    output_tif = os.path.join(data_config.get_out_dir_nodata(), new_base_name)
    
    # Replace bad values with the project NoData value in each file
    gdal_set_nodata(cropped_file, output_tif, data_config.BAD_VALUES, data_config.NODATA_SET, False)
    
    processed_files.append(output_tif)
    
    print(f"Processed file: {os.path.basename(output_tif)}\n")

Creating output file that is 125P x 180L.
Processing ./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.01.V07B_TD_crop.tif [1/1] : 0...10...20...30...40...50...60...70...80...90...100 - done.

Processed file: IMERG-Final.CLIM.2001-2022.01.V07B_TD_crop_nodata.tif

Creating output file that is 125P x 180L.
Processing ./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.02.V07B_TD_crop.tif [1/1] : 0...10...20...30...40...50...60...70...80...90...100 - done.

Processed file: IMERG-Final.CLIM.2001-2022.02.V07B_TD_crop_nodata.tif

Creating output file that is 125P x 180L.
Processing ./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.03.V07B_TD_crop.tif [1/1] : 0...10...20...30...40...50...60...70...80...90...100 - done.

Processed file: IMERG-Final.CLIM.2001-2022.03.V07B_TD_crop_nodata.tif

Creating output file that is 125P x 180L.
Processing ./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly/IMERG-Final.CLIM.2001-2022.04.V07B_TD_crop.t

## 5 Compute the Daily Average

In [128]:
# List all the cropped TIFF files that have NoData set
cropped_nodata_files = sorted([os.path.join(data_config.get_out_dir_nodata(), file) 
                        for file in os.listdir(data_config.get_out_dir_nodata()) 
                        if file.endswith('.tif') or file.endswith('.tiff')])

for file in cropped_nodata_files:
    print(file)

./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly_nodata/IMERG-Final.CLIM.2001-2022.01.V07B_TD_crop_nodata.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly_nodata/IMERG-Final.CLIM.2001-2022.02.V07B_TD_crop_nodata.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly_nodata/IMERG-Final.CLIM.2001-2022.03.V07B_TD_crop_nodata.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly_nodata/IMERG-Final.CLIM.2001-2022.04.V07B_TD_crop_nodata.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly_nodata/IMERG-Final.CLIM.2001-2022.05.V07B_TD_crop_nodata.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly_nodata/IMERG-Final.CLIM.2001-2022.06.V07B_TD_crop_nodata.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly_nodata/IMERG-Final.CLIM.2001-2022.07.V07B_TD_crop_nodata.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly_nodata/IMERG-Final.CLIM.2001-2022.08.V07B_TD_crop_nodata.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly_nodata/IMERG-Final.CLIM.2001-2022.09.V07B_TD_crop_nodata.tif
./Rainfall/GPM_2001-2022/TD/AOI_crop_monthly_n

In [129]:
# run_gdalinfo('./Rainfall/Chirps_2023/PK/AOI_crop_monthly/chirps-v2.0.2023.06_PK_crop.tif')

In [130]:
# Compute the daily average
average_rasters(cropped_nodata_files, output_avg, nodata_value=data_config.NODATA_SET)

Average operation successfully saved to: ./Rainfall/GPM_2001-2022/TD/AOI_crop_daily/GPM_2001-2022.01.V07B_TD_avg.tif
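
The helper `average_rasters` comes from the project utilities and its implementation is not shown. The sketch below illustrates one way a NoData-aware average can be computed with NumPy: each pixel is averaged over only the months where it holds valid data, and pixels with no valid data in any month stay NoData. It takes an unweighted mean across months; whether the project helper weights months by their number of days is not stated in this notebook. The function name `nodata_aware_mean` is hypothetical.

```python
import numpy as np

def nodata_aware_mean(arrays, nodata=-999.0):
    """Average same-shaped arrays pixel by pixel, ignoring NoData pixels."""
    stack = np.stack([np.asarray(a, dtype=np.float64) for a in arrays])
    valid = stack != nodata                          # True where a pixel holds data
    count = valid.sum(axis=0)                        # valid months per pixel
    total = np.where(valid, stack, 0.0).sum(axis=0)  # sum over valid months only
    mean = total / np.maximum(count, 1)              # avoid division by zero
    return np.where(count > 0, mean, nodata)         # all-NoData pixels stay NoData

jan = np.array([[1.0, -999.0], [3.0, -999.0]])
feb = np.array([[3.0, 5.0], [-999.0, -999.0]])
# Pixel (0, 0) averages 1.0 and 3.0; pixel (1, 1) has no valid data in any month
print(nodata_aware_mean([jan, feb]))
```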
