# Objectives of this Jupyter notebook (prepare_model_application_data.ipynb):

1. Load the necessary data: the custom boundaries of the two regions used for application (subROI1 and subROI2), the flood IDs for both regions, the Ghana historical flood information table, the relevant daily rainfall data, and the relevant geospatial data.


2. Automatically acquire flood and non-flood points (longitude, latitude) in the two regions used for application (subROI1 and subROI2). The final data type is `pandas.DataFrame`.


**Note 1: Before re-running the code below, redefine the data file paths to suit your current environment.**

**Note 2: Google Colab is recommended for running the following code. Make sure your account has permission to use Google Earth Engine.**
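
One way to follow the first note above is to keep the project root in a single variable and derive every path from it, so that only one line changes between environments. The sketch below reuses two paths that appear in this notebook. `PROJECT_ROOT` and `project_path` are illustrative names and are not part of the project's tools.

```python
from pathlib import PurePosixPath

# Assumption: the whole project lives under one root folder.
# Change only this line when moving to another environment.
PROJECT_ROOT = PurePosixPath("/content/gdrive/MyDrive/irp_project_111")


def project_path(*parts: str) -> str:
    """Join path components onto PROJECT_ROOT and return a plain string."""
    return str(PROJECT_ROOT.joinpath(*parts))


tool_folder_dir = project_path("tools")
historical_info_path = project_path("data", "Historical_Ghana_Flood_info_table.csv")
print(tool_folder_dir)
print(historical_info_path)
```

`PurePosixPath` is used so that the joined strings keep forward slashes regardless of the operating system running the notebook.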

### Import the necessary packages and authenticate Earth Engine

In [1]:
!earthengine authenticate

Successfully saved authorization token.


In [2]:
import pickle
import ee
import numpy as np
import pandas as pd
import json
from google.colab import drive
drive.mount('/content/gdrive')
import sys
tool_folder_dir = "/content/gdrive/MyDrive/irp_project_111/tools"
sys.path.append(tool_folder_dir)
from get_flood_sampling_data import uniform_get_flood_data_from_ROI, coordinates_dict_to_df
from common_fucs import convert_to_ee_rectangle
from prepare_Application_data import auto_get_flood_data_for_application

Mounted at /content/gdrive
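
`convert_to_ee_rectangle` is imported above from the project's `common_fucs` module, and its source is not shown in this notebook. As a rough idea of what such a helper has to do, the sketch below reduces a closed GeoJSON-style ring to `[west, south, east, north]` bounds, a form that `ee.Geometry.Rectangle` accepts. The function name `ring_to_bounds` is hypothetical and the project's helper may work differently. The ring used is the subROI1 boundary defined later in this notebook.

```python
from typing import List, Sequence


def ring_to_bounds(polygon: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Return [west, south, east, north] for the outer ring of a polygon."""
    outer_ring = polygon[0]
    lons = [point[0] for point in outer_ring]
    lats = [point[1] for point in outer_ring]
    return [min(lons), min(lats), max(lons), max(lats)]


# Closed ring: the first and last vertices are identical.
ring = [[[-0.642576, 8.172062], [-0.502478, 8.172062], [-0.502478, 8.339226],
         [-0.642576, 8.339226], [-0.642576, 8.172062]]]
bounds = ring_to_bounds(ring)
print(bounds)
# With Earth Engine initialized, the bounds could then be passed on as
# ee.Geometry.Rectangle(bounds); that call is left out so the sketch runs offline.
```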


### For Objectives 1 and 2: automatically get the flood data for application
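
The sampling in the cell below is done by `uniform_get_flood_data_from_ROI`, whose source is not part of this notebook. Conceptually, uniform sampling over a rectangle means laying a regular longitude/latitude grid over it and reading the flood label at each node. The sketch below shows only the grid step in plain Python. The function name `uniform_grid` and the 0.01-degree spacing are assumptions for illustration, and the code does not query Earth Engine.

```python
from typing import List, Tuple


def uniform_grid(west: float, south: float, east: float, north: float,
                 step: float) -> List[Tuple[float, float]]:
    """Return (lon, lat) nodes of a regular grid covering the rectangle."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Count intervals with a small tolerance so floating-point noise does
    # not drop the last node when the extent is a multiple of step.
    n_lon = int((east - west) / step + 1e-9) + 1
    n_lat = int((north - south) / step + 1e-9) + 1
    # Rows run south to north; within a row, points run west to east.
    return [(round(west + i * step, 6), round(south + j * step, 6))
            for j in range(n_lat) for i in range(n_lon)]


points = uniform_grid(-0.64, 8.17, -0.60, 8.19, step=0.01)
print(len(points))
print(points[0], points[-1])
```

Each `(lon, lat)` pair would still need a flood or non-flood label per event before it resembles the rows produced by the project's tools.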

In [3]:
# Define new subROI boundaries for application and save them in case they are needed later
sub_ROI_boundary1_for_Application = [[[-0.642576, 8.172062], [-0.502478, 8.172062], [-0.502478, 8.339226], [-0.642576, 8.339226], [-0.642576, 8.172062]]]
sub_ROI_boundary2_for_Application = [[[-1.011302, 9.744245], [-0.929578, 9.744245], [-0.929578, 9.856564], [-1.011302, 9.856564], [-1.011302, 9.744245]]]

# Combine into a dictionary
roi_boundaries_for_modeling = {
    'sub_ROI1_for_Application': sub_ROI_boundary1_for_Application,
    'sub_ROI2_for_Application': sub_ROI_boundary2_for_Application
}

# Save the new subROI boundaries to a JSON file for later visualization
sub_ROI_boundary_file_path = '/content/gdrive/MyDrive/irp_project_111/data/ROI_boundaries_for_Application.json'
with open(sub_ROI_boundary_file_path, 'w') as file:
    json.dump(roi_boundaries_for_modeling, file)

# Load the common flood IDs, which were defined in the modeling stage
common_flood_ids_file_path = '/content/gdrive/MyDrive/irp_project_111/data/common_flood_ids_for_both_Modeling_ROI.json'
with open(common_flood_ids_file_path, 'r') as file:
    common_flood_ids = json.load(file)
common_flood_ids = common_flood_ids['common_flood_ids_for_both_Modeling_ROI']

# Always initialize ee before using it
ee.Initialize()

# Uniformly sample coordinate points from each target region for application
sub_ROI_boundary1 = convert_to_ee_rectangle(sub_ROI_boundary1_for_Application)
sub_ROI1_samples_dic = uniform_get_flood_data_from_ROI(common_flood_ids, sub_ROI_boundary1)
sub_ROI_boundary2 = convert_to_ee_rectangle(sub_ROI_boundary2_for_Application)
sub_ROI2_samples_dic = uniform_get_flood_data_from_ROI(common_flood_ids, sub_ROI_boundary2)

key_to_delete = [3534, 3663, 3747]  # subROI2 has no flooded points for these flood IDs
for key in key_to_delete:
    if key in sub_ROI2_samples_dic:
        del sub_ROI2_samples_dic[key]

print()
print()

# Build a DataFrame for each subROI and concatenate them:
sub_roi1_flood_samples = coordinates_dict_to_df(sub_ROI1_samples_dic, None, prefix="subROI1")
sub_roi2_flood_samples = coordinates_dict_to_df(sub_ROI2_samples_dic, None, prefix="subROI2")
flood_samples = pd.concat([sub_roi1_flood_samples, sub_roi2_flood_samples])

# Note: re-downloading the relevant daily precipitation data takes a long time if re_download_daily_precip=True.
# It is advisable to skip the download and keep re_download_daily_precip=False.
re_download_daily_precip = False
historical_info_path = '/content/gdrive/MyDrive/irp_project_111/data/Historical_Ghana_Flood_info_table.csv'
daily_precip_pickle_file_path = '/content/gdrive/MyDrive/irp_project_111/data/Daily_Precipitation/daily_precip_application'
geo_data_path = '/content/gdrive/MyDrive/irp_project_111/data/Application_Data/flood_samples_geospatial_data_for_application.csv'

# Convert the samples into the final application DataFrame and save it as a CSV file:
auto_get_flood_data_for_application(
    flood_samples, historical_info_path, daily_precip_pickle_file_path, geo_data_path,
    download_daily_precip=re_download_daily_precip,
    save_path='/content/gdrive/MyDrive/irp_project_111/data/Application_Data/application_data.csv',
)

Sampling data from each event: progress bar with a total of 6 events, shown twice.

Processing subROI1_2320, subROI1_3166, subROI1_3534, subROI1_3663, subROI1_3747, subROI1_4683: progress-bar total of 4588 per event.

Processing subROI2_2320, subROI2_3166, subROI2_4683: progress-bar total of 1800 per event.

For each event, one "Calculating ... features" and one "Processing ... column" progress bar is shown for each of `Mean_Rainfall`, `Max_Continuous_Rainy_Days`, `Max_Continuous_Rainfall`, `Max_Single_Day_Rainfall`, and `Max_Rainfall_Increase`.
No duplicate rows found. Returning the original DataFrame.
your file has been saved to /content/gdrive/MyDrive/irp_project_111/data/Application_Data/application_data.csv 


| Unnamed: 0 | Flood_ID | Lon | Lat | Mean_Rainfall | Median_Rainfall | Max_Continuous_Rainy_Days | Max_Continuous_Rainfall | Max_Single_Day_Rainfall | Max_Rainfall_Increase | label | DEM | slope | slope_aspect | SPI | TWI | plan_curvature | general_curvature | soiltype | landcover | distance_to_river |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | subROI1_2320 | -0.641173 | 8.220708 | 7.755191 | 0.000000 | 3 | 57.240845 | 28.620422 | 28.620422 | 0 | 91.0 | 1.96806 | 225.00000 | 975.88324 | 10.24301 | 0.02338 | 0.00105 | WR | 50.0 | 1600.0 |
| 1 | subROI1_2320 | -0.638927 | 8.220708 | 7.755191 | 0.000000 | 3 | 57.240845 | 28.620422 | 28.620422 | 0 | 91.0 | 0.00000 |  | 492.63519 | 10.92659 | 4.11867 | 0.00422 | WR | 50.0 | 1600.0 |
| 2 | subROI1_2320 | -0.636681 | 8.220708 | 7.755191 | 0.000000 | 3 | 57.240845 | 28.620422 | 28.620422 | 0 | 90.0 | 0.98432 | 135.00000 | 1543.00708 | 11.96128 | 0.00000 | 0.00105 | WR | 50.0 | 1200.0 |
| 3 | subROI1_2320 | -0.634435 | 8.220708 | 7.755191 | 0.000000 | 3 | 57.240845 | 28.620422 | 28.620422 | 0 | 90.0 | 1.76652 | 203.19859 | 6172.02832 | 13.33965 | -0.02314 | -0.00001 | WR | 50.0 | 1200.0 |
| 4 | subROI1_2320 | -0.632189 | 8.220708 | 7.755191 | 0.000000 | 3 | 57.240845 | 28.620422 | 28.620422 | 0 | 86.0 | 1.46716 | 71.56505 | 514.33569 | 10.88348 | 0.01749 | -0.00106 | WR | 20.0 | 1200.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32923 | subROI2_4683 | -0.939862 | 9.844413 | 10.534732 | 9.015319 | 3 | 57.231897 | 38.154598 | 19.077299 | 0 | 135.0 | 0.00000 |  | 490.29932 | 10.92198 | 0.03466 | 0.00002 | G | 40.0 | 1200.0 |
| 32924 | subROI2_4683 | -0.937617 | 9.844413 | 10.534732 | 9.015319 | 3 | 57.231897 | 38.154598 | 19.077299 | 0 | 137.0 | 0.32814 | 315.00000 | 162.26115 | 12.02779 | 0.06503 | 0.00318 | G | 40.0 | 800.0 |
| 32925 | subROI2_4683 | -0.935371 | 9.844413 | 10.534732 | 9.015319 | 3 | 57.231897 | 38.154598 | 19.077299 | 0 | 138.0 | 1.35270 | 329.03625 | 70247.09375 | 15.23391 | -0.01153 | -0.00210 | G | 30.0 | 800.0 |
| 32926 | subROI2_4683 | -0.933125 | 9.844413 | 10.534732 | 9.015319 | 3 | 57.231897 | 38.154598 | 19.077299 | 0 | 141.0 | 1.46716 | 288.43494 | 1451.30786 | 11.20743 | -0.01176 | -0.00109 | G | 30.0 | 1200.0 |
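
The rainfall columns in the table above are computed inside `auto_get_flood_data_for_application`, which is not shown in this notebook. The sketch below gives one plausible reading of those column names for a single point's daily precipitation series. The function name `rainfall_features`, the 0 mm wet-day threshold, and the exact definitions (for example, taking `Max_Continuous_Rainfall` as the largest total over a run of consecutive rainy days) are assumptions, so the values need not match the project's output.

```python
from statistics import mean, median
from typing import Dict, Sequence


def rainfall_features(daily_mm: Sequence[float],
                      wet_threshold: float = 0.0) -> Dict[str, float]:
    """Summarise a non-empty daily precipitation series (mm/day)."""
    longest_run = 0      # most consecutive rainy days seen so far
    largest_total = 0.0  # largest rainfall total over one rainy run
    run_days, run_total = 0, 0.0
    for amount in daily_mm:
        if amount > wet_threshold:
            run_days += 1
            run_total += amount
            longest_run = max(longest_run, run_days)
            largest_total = max(largest_total, run_total)
        else:
            # A dry day ends the current run.
            run_days, run_total = 0, 0.0
    # Day-to-day changes; the largest one is reported, floored at 0.0.
    increases = [b - a for a, b in zip(daily_mm, daily_mm[1:])]
    return {
        "Mean_Rainfall": mean(daily_mm),
        "Median_Rainfall": median(daily_mm),
        "Max_Continuous_Rainy_Days": longest_run,
        "Max_Continuous_Rainfall": largest_total,
        "Max_Single_Day_Rainfall": max(daily_mm),
        "Max_Rainfall_Increase": max(increases + [0.0]),
    }


features = rainfall_features([0.0, 5.0, 12.0, 0.0, 3.0, 0.0])
print(features)
```

Because every sample point of an event in the table shares the same rainfall values within a region, a function like this would plausibly be evaluated once per distinct rainfall series rather than once per row.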
