# Pre-Process Global Rainfall Data

This notebook pre-processes global rainfall data for a specific Area of Interest (AOI). The global rainfall data is provided as monthly sum GeoTiff files. For a given AOI, the monthly rainfall sum files are first cropped to the AOI, and bad values are replaced with NoData. A yearly sum is then computed as an intermediate output and divided by the number of days in the year to produce a single daily average GeoTiff file for the AOI. No coordinate transformations are performed. The file structure is indicated below.

The source data (CHIRPS-2.0) is available [here](https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_monthly/tifs/) as global monthly rainfall totals in mm. This notebook uses the 12 monthly files for 2023.

Pixels over water bodies in the source data are set to -9999.0 to indicate "bad values." These pixels are reassigned to NoData = -999 for consistency with the other data types in this project, and they are excluded from the computation of rainfall statistics.
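As a minimal illustration of the flagging rule described above, the sketch below remaps the source bad value to the project NoData value and computes a mean that skips flagged pixels. It uses plain Python lists in place of raster bands, and the helper names `remap_bad_values` and `mean_valid` are hypothetical; they are not part of `gist_utils`.

```python
BAD_VALUE = -9999.0   # flag used in the source files
NODATA = -999.0       # NoData value used in this project

def remap_bad_values(pixels, bad_value=BAD_VALUE, nodata=NODATA):
    """Return a copy of pixels with every bad_value replaced by nodata."""
    return [nodata if p == bad_value else p for p in pixels]

def mean_valid(pixels, nodata=NODATA):
    """Mean over pixels that are not NoData; None if no valid pixel exists."""
    valid = [p for p in pixels if p != nodata]
    return sum(valid) / len(valid) if valid else None

row = [12.0, -9999.0, 30.0, 18.0]   # made-up monthly totals in mm
clean = remap_bad_values(row)
print(clean)              # [12.0, -999.0, 30.0, 18.0]
print(mean_valid(clean))  # 20.0
```

Without the exclusion, the flagged pixel would drag the mean far below zero, which is why the flag has to be handled before any statistic is computed.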

## File System Structure
The top-level file structure is shown below, with the entries used by this notebook highlighted in blue.

<pre style="font-family: monospace;">
<span style="color: gray;">./AOI         # AOI Image Stacks and Image Tiles</span>  
<span style="color: gray;">./DHS         # DHS survey data</span>
<span style="color: blue;">./gist_utils</span>  <span style="color: gray;"># Python package with convenience functions</span>
<span style="color: gray;">./Nightlights</span>
<span style="color: gray;">./Population</span>
<span style="color: blue;">./Rainfall</span>

<span style="color: gray;">./prep_aoi_image_tiles.ipynb</span>
<span style="color: gray;">./prep_geospatial_data.ipynb</span>
<span style="color: blue;">./prep_rainfall_chirps.ipynb (this notebook)</span>
</pre>

## **Input (Global Monthly Rainfall):**

The following file structure is required as input for this notebook. The CHIRPS dataset for 2023 consists of 12 global monthly rainfall sum GeoTiff files, as indicated below.

<pre style="font-family: monospace;">
./Rainfall/
    Chirps_2023/
        chirps-v2.0.2023.01.tif
        :
        :
        chirps-v2.0.2023.12.tif
</pre>
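Before running the notebook, it can help to confirm that all 12 monthly inputs are in place. The following is a small sketch, assuming the naming pattern and directory shown above; the functions `expected_chirps_files` and `missing_files` are hypothetical helpers introduced only for this illustration.

```python
import os

def expected_chirps_files(year=2023, data_dir="./Rainfall/Chirps_2023"):
    """Build the 12 monthly file paths following the naming pattern above."""
    return [
        os.path.join(data_dir, f"chirps-v2.0.{year}.{month:02d}.tif")
        for month in range(1, 13)
    ]

def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.isfile(p)]

paths = expected_chirps_files()
print(len(paths))                    # 12
print(os.path.basename(paths[0]))    # chirps-v2.0.2023.01.tif
print(os.path.basename(paths[-1]))   # chirps-v2.0.2023.12.tif

missing = missing_files(paths)
if missing:
    print(f"{len(missing)} monthly file(s) not found")
```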

## **Output (AOI Daily Average):**

The following file structure will be created by this notebook. The 12 global monthly rainfall sum GeoTiff files are processed to produce a single daily average rainfall GeoTiff file for the specified country (`PK` in this example).

<pre style="font-family: monospace;">
./Rainfall/
    Chirps_2023/
        PK/
            AOI_crop_daily/
                chirps-v2.0.2023_PK_avg.tif
</pre>

## Required Configurations

One configuration is required for each execution of this notebook: the two-letter country code for the specified AOI. The notebook should be executed once per AOI.

<pre style="font-family: monospace;">
<span style="color: blue;">country_code = 'PK'</span>  # Set the country code
</pre>

In [1]:
import os
from dataclasses import dataclass

# Import module that contains several convenience functions (e.g., gdal wrappers)
from gist_utils import *

# Add the directory containing the GDAL command-line tools to PATH (adjust for the local system)
os.environ['PATH'] += os.pathsep + '/Users/billk/miniforge3/envs/py39-pt/bin/'

## 1 Set Country Code and Define AOI

The only input setting required in this notebook is the two-letter country code. The AOI for the specified country is computed automatically from the bounding box for the country plus an added buffer that allows image tiles near the borders to be cropped.

In [2]:
#-------------------------------------------------
# REQUIRED CONFIGURATIONS HERE
#-------------------------------------------------
country_code = 'TD'   # Set the country code
#-------------------------------------------------

lat_north = aoi_configurations[country_code]['lat_north']
lat_south = aoi_configurations[country_code]['lat_south']
lon_west  = aoi_configurations[country_code]['lon_west']
lon_east  = aoi_configurations[country_code]['lon_east']

case = country_code

## Define Data Classes to Configure Case

In [3]:
@dataclass(frozen=True)
class AOIConfig:
    # The following Lat, Lon bounds are programmatically set based on pre-defined AOI configurations
    # in gist_utils/aoi_configurations.py
    LAT_NORTH:   float                      
    LAT_SOUTH:   float                      
    LON_WEST:    float
    LON_EAST:    float
    BUF_DEG: float = 1.0
        
@dataclass(frozen=True)
class DatasetConfig:
    COUNTRY_CODE:  str
    DATA_DIR:      str = './Rainfall/Chirps_2023/'
    OUT_DIR_CROP:  str = './Rainfall/Chirps_2023/{country_code}/AOI_crop_monthly' 
    OUT_DIR_NODATA:str = './Rainfall/Chirps_2023/{country_code}/AOI_crop_monthly_nodata' 
    OUT_DIR:       str = './Rainfall/Chirps_2023/{country_code}/AOI_crop_daily'
    OUT_BASE:      str = 'chirps-v2.0.2023'
    BAD_VALUES:    int = -9999  # Bad values in source data
    NODATA_SET:    int = -999   # NoData value used for this project
    GDAL_INFO:    bool = False

    def get_out_dir_crop(self):
        return self.OUT_DIR_CROP.format(country_code=self.COUNTRY_CODE)
    
    def get_out_dir_nodata(self):
        return self.OUT_DIR_NODATA.format(country_code=self.COUNTRY_CODE)
    
    def get_out_dir(self):
        return self.OUT_DIR.format(country_code=self.COUNTRY_CODE)
    
data_config = DatasetConfig(COUNTRY_CODE=country_code)
aoi_config  = AOIConfig(LAT_NORTH=lat_north, LAT_SOUTH=lat_south, LON_WEST=lon_west, LON_EAST=lon_east)

In [4]:
print(data_config.get_out_dir_crop())
print(data_config.get_out_dir())

./Rainfall/Chirps_2023/TD/AOI_crop_monthly
./Rainfall/Chirps_2023/TD/AOI_crop_daily


## Set Output Filenames

In [5]:
# Set output filenames
output_sum = data_config.get_out_dir() + "/" + data_config.OUT_BASE + "_" + case + "_sum.tif"
output_avg = data_config.get_out_dir() + "/" + data_config.OUT_BASE + "_" + case + "_avg.tif"

print("output_sum will be saved here: ", output_sum)
print("output_avg will be saved here: ", output_avg)

output_sum will be saved here:  ./Rainfall/Chirps_2023/TD/AOI_crop_daily/chirps-v2.0.2023_TD_sum.tif
output_avg will be saved here:  ./Rainfall/Chirps_2023/TD/AOI_crop_daily/chirps-v2.0.2023_TD_avg.tif


In [6]:
# Create output folders if they do not already exist
os.makedirs(data_config.get_out_dir_crop(), exist_ok=True)
os.makedirs(data_config.get_out_dir_nodata(), exist_ok=True)
os.makedirs(data_config.get_out_dir(), exist_ok=True)

## 2 Define the Cropped Region
The cropped region is defined by the AOI country bounds plus an additional buffer (`BUF_DEG`, 1.0 degree by default) on each side, so that image tiles near the AOI bounds can still be cropped.

In [7]:
# Define AOI to encompass the country (+/- small buffer).
ul_lat, ul_lon = aoi_config.LAT_NORTH + aoi_config.BUF_DEG, aoi_config.LON_WEST - aoi_config.BUF_DEG
lr_lat, lr_lon = aoi_config.LAT_SOUTH - aoi_config.BUF_DEG, aoi_config.LON_EAST + aoi_config.BUF_DEG

# Print the results
print(f"Upper Left Lat: {ul_lat}")
print(f"Upper Left Lon: {ul_lon}")
print(f"Lower Right Lat: {lr_lat}")
print(f"Lower Right Lon: {lr_lon}")

Upper Left Lat: 24.41
Upper Left Lon: 12.47
Lower Right Lat: 6.44
Lower Right Lon: 25.0
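The bounds above can be sanity-checked against the raster size reported later in the notebook. The sketch below assumes the CHIRPS-2.0 grid spacing of 0.05 degrees, which is consistent with the 7200 x 2000 global input size, and assumes rounding to the nearest pixel; the exact rounding GDAL applies is not confirmed here, and `expected_crop_size` is a hypothetical helper.

```python
CHIRPS_RES_DEG = 0.05  # assumed grid spacing in degrees

def expected_crop_size(ul_lon, ul_lat, lr_lon, lr_lat, res=CHIRPS_RES_DEG):
    """Approximate (columns, rows) of a crop, rounded to the nearest pixel."""
    cols = round((lr_lon - ul_lon) / res)
    rows = round((ul_lat - lr_lat) / res)
    return cols, rows

# Bounds printed above for country code 'TD'
cols, rows = expected_crop_size(12.47, 24.41, 25.0, 6.44)
print(cols, rows)  # 251 359
```

The result agrees with the `251P x 359L` output size reported in step 4.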


### Confirm Global Input Source Files

In [8]:
# Create a list of all files in the directory
files_in_directory = os.listdir(data_config.DATA_DIR)

# Filter the list to include only TIFF files
tiff_files = sorted([file for file in files_in_directory if file.endswith('.tif')])

for file in tiff_files:
    print(data_config.DATA_DIR + file)

./Rainfall/Chirps_2023/chirps-v2.0.2023.01.tif
./Rainfall/Chirps_2023/chirps-v2.0.2023.02.tif
./Rainfall/Chirps_2023/chirps-v2.0.2023.03.tif
./Rainfall/Chirps_2023/chirps-v2.0.2023.04.tif
./Rainfall/Chirps_2023/chirps-v2.0.2023.05.tif
./Rainfall/Chirps_2023/chirps-v2.0.2023.06.tif
./Rainfall/Chirps_2023/chirps-v2.0.2023.07.tif
./Rainfall/Chirps_2023/chirps-v2.0.2023.08.tif
./Rainfall/Chirps_2023/chirps-v2.0.2023.09.tif
./Rainfall/Chirps_2023/chirps-v2.0.2023.10.tif
./Rainfall/Chirps_2023/chirps-v2.0.2023.11.tif
./Rainfall/Chirps_2023/chirps-v2.0.2023.12.tif


## 3 Crop the Monthly Sum Source Files 

In [9]:
# Loop through each global TIFF file
for file_name in tiff_files:
    
    input_tif = os.path.join(data_config.DATA_DIR, file_name)
    
    # Construct the output file name based on the input file name
    temp = '_' + case + "_crop.tif"
    intermediate_tif = os.path.join(data_config.get_out_dir_crop(), os.path.splitext(file_name)[0] + temp)
    print(intermediate_tif)
    
    # Crop the data to the specified AOI
    gdal_crop(input_tif, intermediate_tif, ul_lon, ul_lat, lr_lon, lr_lat, False)

./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.01_TD_crop.tif
Input file size is 7200, 2000
0...10...20...30...40...50...60...70...80...90...100 - done.

./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.02_TD_crop.tif
Input file size is 7200, 2000
0...10...20...30...40...50...60...70...80...90...100 - done.

./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.03_TD_crop.tif
Input file size is 7200, 2000
0...10...20...30...40...50...60...70...80...90...100 - done.

./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.04_TD_crop.tif
Input file size is 7200, 2000
0...10...20...30...40...50...60...70...80...90...100 - done.

./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.05_TD_crop.tif
Input file size is 7200, 2000
0...10...20...30...40...50...60...70...80...90...100 - done.

./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.06_TD_crop.tif
Input file size is 7200, 2000
0...10...20...30...40...50...60...70...80...90...100 - done.
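The implementation of `gdal_crop` lives in `gist_utils` and is not shown in this notebook. One plausible equivalent, offered only as an assumption, is a call to the GDAL command-line tool `gdal_translate` with its `-projwin` option, which takes the upper-left and lower-right corners of the window. The sketch below only assembles the command; `build_crop_command` is a hypothetical name.

```python
import shlex

def build_crop_command(src, dst, ul_lon, ul_lat, lr_lon, lr_lat):
    """Assemble a gdal_translate call that crops src to a lon/lat window."""
    return [
        "gdal_translate",
        # -projwin expects: upper-left x, upper-left y, lower-right x, lower-right y
        "-projwin", str(ul_lon), str(ul_lat), str(lr_lon), str(lr_lat),
        src,
        dst,
    ]

cmd = build_crop_command(
    "./Rainfall/Chirps_2023/chirps-v2.0.2023.01.tif",
    "./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.01_TD_crop.tif",
    12.47, 24.41, 25.0, 6.44,
)
print(shlex.join(cmd))
```

With GDAL on the `PATH`, the command could be executed with `subprocess.run(cmd, check=True)`.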

### Confirm the Cropped Files

In [10]:
# List all the cropped TIFF files for the specified AOI
cropped_files = [os.path.join(data_config.get_out_dir_crop(), file) for file in os.listdir(data_config.get_out_dir_crop()) if file.endswith('.tif') or file.endswith('.tiff')]

for file in cropped_files:
    print(file)

./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.02_TD_crop.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.07_TD_crop.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.12_TD_crop.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.11_TD_crop.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.04_TD_crop.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.01_TD_crop.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.08_TD_crop.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.06_TD_crop.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.03_TD_crop.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.09_TD_crop.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.05_TD_crop.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.10_TD_crop.tif
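The listing above is not in calendar order because `os.listdir` returns entries in arbitrary order. The order does not affect a sum, but a chronological listing makes it easier to spot a missing month. The following sketch sorts paths by the year and month embedded in the file name; `month_key` is a hypothetical helper.

```python
import os
import re

# Matches the year and two-digit month in names like chirps-v2.0.2023.07_TD_crop.tif
MONTH_PATTERN = re.compile(r"chirps-v2\.0\.(\d{4})\.(\d{2})")

def month_key(path):
    """Return (year, month) parsed from a CHIRPS file name."""
    match = MONTH_PATTERN.search(os.path.basename(path))
    if match is None:
        raise ValueError(f"Unrecognized CHIRPS file name: {path}")
    return int(match.group(1)), int(match.group(2))

unsorted_files = [
    "./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.02_TD_crop.tif",
    "./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.07_TD_crop.tif",
    "./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.01_TD_crop.tif",
]
ordered = sorted(unsorted_files, key=month_key)
print([month_key(p)[1] for p in ordered])  # [1, 2, 7]
```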


In [11]:
# run_gdalinfo('./Rainfall/Chirps_2023/PK/AOI_crop_monthly/chirps-v2.0.2023.06_PK_crop.tif')

## 4 Set NoData Values in the Cropped Monthly Sum Source Files

Replace "bad value" pixels with NoData set to -999 for consistency with other data types.

In [12]:
processed_files = []
    
for cropped_file in cropped_files:
    
    # Split the filename from its extension
    base_name, extension = os.path.splitext(os.path.basename(cropped_file))
    
    # Append the '_nodata' suffix before the extension
    new_base_name = f"{base_name}_nodata{extension}"
    
    # Create the full path for the output file
    output_tif = os.path.join(data_config.get_out_dir_nodata(), new_base_name)
    
    # Replace bad values with the project NoData value in each file
    gdal_set_nodata(cropped_file, output_tif, data_config.BAD_VALUES, data_config.NODATA_SET, False)
    
    processed_files.append(output_tif)
    
    print(f"Processed file: {output_tif}")

Creating output file that is 251P x 359L.
Processing ./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.02_TD_crop.tif [1/1] : 0...10...20...30...40...50...60...70...80...90...100 - done.

Processed file: ./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.02_TD_crop_nodata.tif
Creating output file that is 251P x 359L.
Processing ./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.07_TD_crop.tif [1/1] : 0...10...20...30...40...50...60...70...80...90...100 - done.

Processed file: ./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.07_TD_crop_nodata.tif
Creating output file that is 251P x 359L.
Processing ./Rainfall/Chirps_2023/TD/AOI_crop_monthly/chirps-v2.0.2023.12_TD_crop.tif [1/1] : 0...10...20...30...40...50...60...70...80...90...100 - done.

Processed file: ./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.12_TD_crop_nodata.tif
Creating output file that is 251P x 359L.
Processing ./Rainfall/Chirps_2023/TD/AOI_crop_mon

## 5 Compute the Yearly Sum

In [13]:
# List all the cropped TIFF files with NoData values set
cropped_nodata_files = [os.path.join(data_config.get_out_dir_nodata(), file) for file in os.listdir(data_config.get_out_dir_nodata()) if file.endswith('.tif') or file.endswith('.tiff')]

for file in cropped_nodata_files:
    print(file)

./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.03_TD_crop_nodata.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.10_TD_crop_nodata.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.06_TD_crop_nodata.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.08_TD_crop_nodata.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.05_TD_crop_nodata.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.09_TD_crop_nodata.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.04_TD_crop_nodata.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.07_TD_crop_nodata.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.11_TD_crop_nodata.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.01_TD_crop_nodata.tif
./Rainfall/Chirps_2023/TD/AOI_crop_monthly_nodata/chirps-v2.0.2023.02_TD_crop_nodata.tif
./Rainfall/Chirps_202

In [14]:
# run_gdalinfo('./Rainfall/Chirps_2023/PK/AOI_crop_monthly/chirps-v2.0.2023.06_PK_crop.tif')

In [15]:
# Compute the yearly sum
sum_rasters(cropped_nodata_files, output_sum, nodata_value=data_config.NODATA_SET)

Sum of rasters successfully saved to: ./Rainfall/Chirps_2023/TD/AOI_crop_daily/chirps-v2.0.2023_TD_sum.tif
Output from command:
0...10...20...30...40...50...60...70...80...90...100 - done.
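The implementation of `sum_rasters` is part of `gist_utils` and is not shown here. To illustrate what a NoData-aware yearly sum involves, the sketch below sums flat lists pixel by pixel and marks a pixel as NoData when any month is NoData. That propagation policy is an assumption, and `sum_rasters_sketch` is a hypothetical stand-in, not the project function.

```python
NODATA = -999.0  # NoData value used in this project

def sum_pixel(values, nodata=NODATA):
    """Sum one pixel across months; NoData if any month is NoData."""
    if any(v == nodata for v in values):
        return nodata
    return sum(values)

def sum_rasters_sketch(rasters, nodata=NODATA):
    """Pixel-wise sum over a list of equally sized flat rasters."""
    return [sum_pixel(stack, nodata) for stack in zip(*rasters)]

# Made-up monthly totals in mm for three pixels; the middle pixel is water
jan = [10.0, -999.0, 5.0]
feb = [20.0, -999.0, 0.0]
mar = [30.0, -999.0, 2.5]
print(sum_rasters_sketch([jan, feb, mar]))  # [60.0, -999.0, 7.5]
```

Summing the flag value itself would produce a large negative total, so the NoData check has to come before the addition.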



## 6 Compute the Daily Average

In [16]:
# Set the input file to the sum computed above
input_file = output_sum

# Compute the daily average (2023 is not a leap year, so the divisor is 365)
average_raster(input_file, output_avg, divisor=365, nodata_value=data_config.NODATA_SET)

Average operation successfully saved to: ./Rainfall/Chirps_2023/TD/AOI_crop_daily/chirps-v2.0.2023_TD_avg.tif
Output from command:
0...10...20...30...40...50...60...70...80...90...100 - done.
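The divisor of 365 is correct for 2023. As a sketch of how the same step could be generalized to other years, the example below derives the divisor from the calendar and leaves NoData pixels untouched. It works on flat lists rather than rasters, and `days_in_year` and `daily_average` are hypothetical helpers, not the `average_raster` function from `gist_utils`.

```python
import calendar

NODATA = -999.0  # NoData value used in this project

def days_in_year(year):
    """Number of days in the given calendar year."""
    return 366 if calendar.isleap(year) else 365

def daily_average(yearly_sum, year, nodata=NODATA):
    """Divide each valid pixel by the days in the year; keep NoData as-is."""
    divisor = days_in_year(year)
    return [nodata if v == nodata else v / divisor for v in yearly_sum]

print(days_in_year(2023))                         # 365
print(daily_average([730.0, -999.0, 0.0], 2023))  # [2.0, -999.0, 0.0]
```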

