# Objectives of this Jupyter notebook (prepare_modeling_data_part2.ipynb)

1. Load the necessary data: the custom boundaries of the two regions used for modelling (ROI1 and ROI2), the flood IDs for both regions, the Ghana historical flood information table, the relevant daily rainfall data, and the relevant geospatial data.


2. Automatically acquire flood and non-flood points (longitude, latitude) in the two regions used for modelling (ROI1 and ROI2). The final result is a `pandas.DataFrame`.


**Note 1: Before re-running the code below, update the data file paths to match your current environment.**

**Note 2: Google Colab is the recommended environment for running the code below. Make sure your Google account has permission to use Google Earth Engine.**
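
One way to follow Note 1 is to build every file path from a single base directory, so that only one line changes when the notebook moves to another environment. The sketch below assumes the folder layout used later in this notebook; `PROJECT_ROOT` and `project_paths` are hypothetical names introduced for illustration only.

```python
from pathlib import Path

# Hypothetical: the only line to edit when the environment changes.
PROJECT_ROOT = Path("/content/gdrive/MyDrive/irp_project_111")


def project_paths(root: Path) -> dict:
    """Return the tool and data paths used in this notebook, built from `root`."""
    data = root / "data"
    return {
        "tools": root / "tools",
        "common_flood_ids": data / "common_flood_ids_for_both_Modeling_ROI.json",
        "roi_boundaries": data / "ROI_boundaries_for_Modeling.json",
        "historical_info": data / "Historical_Ghana_Flood_info_table.csv",
        "daily_precip": data / "Daily_Precipitation" / "daily_precip_Modeling.pickle",
        "geo_data": data / "Modeling_Data" / "flood_samples_geospatial_data.csv",
        # Output file: it is expected to be missing before the first run.
        "modeling_data": data / "Modeling_Data" / "modeling_data.csv",
    }


paths = project_paths(PROJECT_ROOT)
for name, path in paths.items():
    # Report missing inputs up front instead of failing midway through a long run.
    status = "ok" if path.exists() else "MISSING"
    print(f"{name:18s} {status:8s} {path}")
```

Each entry can then be passed wherever a path string is expected, for example `str(paths["roi_boundaries"])`.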

### Import the necessary packages and authenticate Earth Engine

In [1]:
!earthengine authenticate

Successfully saved authorization token.


In [2]:
import ee
import json

from google.colab import drive
drive.mount('/content/gdrive')

import sys
tool_folder_dir = "/content/gdrive/MyDrive/irp_project_111/tools"
sys.path.append(tool_folder_dir)
from prepare_Modeling_data import get_flood_data

Mounted at /content/gdrive
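
The import in the cell above only works if Drive is mounted and the tools folder really exists at the given path. As a minimal sketch, assuming nothing beyond the standard library, the check below reports a wrong folder early rather than through an `ImportError`. `ensure_importable` is a hypothetical helper, not part of the project's tools.

```python
import importlib.util
import sys
from pathlib import Path


def ensure_importable(module_name: str, folder: str) -> bool:
    """Add `folder` to sys.path (once) and report whether `module_name` can be found."""
    if Path(folder).is_dir() and folder not in sys.path:
        sys.path.append(folder)
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(module_name) is not None


tool_folder_dir = "/content/gdrive/MyDrive/irp_project_111/tools"
if not ensure_importable("prepare_Modeling_data", tool_folder_dir):
    print(f"prepare_Modeling_data not found; check that {tool_folder_dir} exists and Drive is mounted.")
```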


### For Objective 1 and 2: auto-get the flood data for modeling

In [3]:
# load necessary data:
common_flood_ids_file_path = '/content/gdrive/MyDrive/irp_project_111/data/common_flood_ids_for_both_Modeling_ROI.json'
with open(common_flood_ids_file_path, 'r') as file:
    common_flood_ids = json.load(file)
common_flood_ids = common_flood_ids['common_flood_ids_for_both_Modeling_ROI']
# Report the actual number of IDs instead of a hard-coded count
print(f'Only these {len(common_flood_ids)} flood ID(s) will affect both Modeling ROIs: {common_flood_ids}')

roi_boundaries_file_path = '/content/gdrive/MyDrive/irp_project_111/data/ROI_boundaries_for_Modeling.json'
with open(roi_boundaries_file_path, 'r') as file:
    roi_boundaries = json.load(file)

# Always initialize ee before using it
ee.Initialize()

# First, the total number of sample points across the two areas is set to about 6,000.
# The split is based on the number of floods occurring in each area:
# ROI1 has 6 floods (assigned a 65% share) and ROI2 has 3 floods (assigned a 35% share).
# Therefore, ROI1 is allocated 6000 * 65% points and ROI2 is allocated 6000 * 35% points.
total_num_data = 6000
total_num_data_ROI1 = total_num_data * 0.65
total_num_data_ROI2 = total_num_data * 0.35

# One entry per ROI: [boundary, [number of points], [ROI name]]
ROI_features_ls_3d = [
    [roi_boundaries['ROI_boundary1'], [total_num_data_ROI1], ["ROI1"]],
    [roi_boundaries['ROI_boundary2'], [total_num_data_ROI2], ["ROI2"]],
]

historical_info_path = '/content/gdrive/MyDrive/irp_project_111/data/Historical_Ghana_Flood_info_table.csv'
daily_precip_pickle_file_path = '/content/gdrive/MyDrive/irp_project_111/data/Daily_Precipitation/daily_precip_Modeling.pickle'
geo_data_path = '/content/gdrive/MyDrive/irp_project_111/data/Modeling_Data/flood_samples_geospatial_data.csv'
data_path = '/content/gdrive/MyDrive/irp_project_111/data/Modeling_Data/modeling_data.csv'
# Note: re-downloading the relevant daily precipitation data takes some time if re_download_daily_precip=True.
# It is advisable to skip the re-download by setting re_download_daily_precip=False.
re_download_daily_precip = False
flood_samples_for_Modeling = get_flood_data(
    common_flood_ids,
    historical_info_path,
    geo_data_path,
    daily_precip_pickle_file_path,
    ROI_features_ls_3d,
    flood_data_scale=30,
    download_daily_precip=re_download_daily_precip,
    check_each_sample_amount=False,
    print_info=True,
    save_path=data_path,
)
flood_samples_for_Modeling

Only these 6 flood ID(s) will affect both Modeling ROIs: [2320, 3166, 3534, 3663, 3747, 4683]
Start Sampling......
---------------------------------------------------------------


Calculating flooded area proportion:   0%|          | 0/6 [00:00<?, ?it/s]

Performing stratified sampling:   0%|          | 0/6 [00:00<?, ?it/s]

---------------------------------------------------------------
Finish Sampling


Start Sampling......
---------------------------------------------------------------


Calculating flooded area proportion:   0%|          | 0/6 [00:00<?, ?it/s]

Performing stratified sampling:   0%|          | 0/6 [00:00<?, ?it/s]

---------------------------------------------------------------
Finish Sampling




Processing <Duration> column:   0%|          | 0/5992 [00:00<?, ?it/s]

Processing <Max_Duration> column:   0%|          | 0/9 [00:00<?, ?it/s]

Processing <Event_Start_Date> & <Event_End_Date> columns:   0%|          | 0/5992 [00:00<?, ?it/s]

Calculating <Mean_Rainfall> features:   0%|          | 0/5992 [00:00<?, ?it/s]

Processing <Mean_Rainfall> column:   0%|          | 0/5992 [00:00<?, ?it/s]

Calculating <Max_Continuous_Rainy_Days> features:   0%|          | 0/5992 [00:00<?, ?it/s]

Processing <Max_Continuous_Rainy_Days> column:   0%|          | 0/5992 [00:00<?, ?it/s]

Calculating <Max_Continuous_Rainfall> features:   0%|          | 0/5992 [00:00<?, ?it/s]

Processing <Max_Continuous_Rainfall> column:   0%|          | 0/5992 [00:00<?, ?it/s]

Calculating <Max_Single_Day_Rainfall> features:   0%|          | 0/5992 [00:00<?, ?it/s]

Processing <Max_Single_Day_Rainfall> column:   0%|          | 0/5992 [00:00<?, ?it/s]

Calculating <Max_Rainfall_Increase> features:   0%|          | 0/5992 [00:00<?, ?it/s]

Processing <Max_Rainfall_Increase> column:   0%|          | 0/5992 [00:00<?, ?it/s]

No duplicate rows found. Returning the original DataFrame.
your file has been saved to /content/gdrive/MyDrive/irp_project_111/data/Modeling_Data/modeling_data.csv 


| Unnamed: 0 | Flood_ID | Lon | Lat | Duration | Max_Duration | Event_Start_Date | Event_End_Date | Mean_Rainfall | Median_Rainfall | Max_Continuous_Rainy_Days | ... | slope_aspect | SPI | STI | TWI | plan_curvature | general_curvature | soiltype | landcover | distance_to_river | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ROI1_2320 | -0.755618 | 8.351143 | 22 | 26 | 2003-08-10 | 2003-09-05 | 8.080259 | 0.000000 | 3 | ... |  | 86211.16406 |  | 16.06017 | 0.00000 | 0.00000 | WR | 80.0 | 1200.0 | 1 |
| 1 | ROI1_2320 | -0.827842 | 8.431991 | 7 | 26 | 2003-08-10 | 2003-09-05 | 7.559837 | 0.000000 | 2 | ... | 135.00000 | 325.29449 |  | 12.71235 | -1.53688 | -0.00001 | WR | 90.0 | 400.0 | 1 |
| 2 | ROI1_2320 | -0.701719 | 8.353299 | 19 | 26 | 2003-08-10 | 2003-09-05 | 8.176556 | 0.000000 | 4 | ... |  | 138430.48438 |  | 16.53367 | 0.00000 | 0.00000 | WR | 80.0 | 400.0 | 1 |
| 3 | ROI1_2320 | -0.815715 | 8.446275 | 13 | 26 | 2003-08-10 | 2003-09-05 | 7.559837 | 0.000000 | 2 | ... | 255.96376 | 492.63519 |  | 10.92659 | 0.00957 | 0.00001 | WR | 90.0 | 800.0 | 1 |
| 4 | ROI1_2320 | -0.661834 | 8.323385 | 17 | 26 | 2003-08-10 | 2003-09-05 | 8.212687 | 0.000000 | 2 | ... | 63.43495 | 193675.07813 |  | 17.00637 | 0.00000 | 0.00000 | WR | 80.0 | 800.0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5987 | ROI2_4683 | -0.906894 | 9.886993 | 0 | 11 | 2018-09-01 | 2018-09-12 | 9.022603 | 6.282393 | 3 | ... | 300.96375 | 6435.45020 | 24.24362 | 13.11458 | -0.00072 | -0.00105 | Lf | 30.0 | 400.0 | 0 |
| 5988 | ROI2_4683 | -0.956481 | 9.746047 | 0 | 11 | 2018-09-01 | 2018-09-12 | 8.436683 | 8.563931 | 3 | ... | 45.00000 | 2271.65625 | 26.25356 | 13.25358 | 0.00004 | -0.00204 | G | 40.0 | 800.0 | 0 |
| 5989 | ROI2_4683 | -0.987473 | 9.924183 | 0 | 11 | 2018-09-01 | 2018-09-12 | 10.082670 | 8.717156 | 2 | ... |  | 229.47194 |  | 11.68121 | 0.00000 | 0.00104 | G | 20.0 | 1600.0 | 0 |
| 5990 | ROI2_4683 | -1.049996 | 9.800485 | 0 | 11 | 2018-09-01 | 2018-09-12 | 9.278730 | 8.562659 | 2 | ... |  | 2596.17847 |  | 14.77084 | -0.03511 | -0.00192 | Lp | 30.0 | 800.0 | 0 |
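
The sampling budget in the cell above is split between the two ROIs with fixed 65%/35% weights, which produces floats rather than integer counts. As a minimal sketch, and not the method used by `get_flood_data`, the helper below derives integer counts from any set of weights using largest-remainder rounding, so the counts always sum to the budget. `allocate_samples` is a hypothetical name, and whether `get_flood_data` needs integer counts is an assumption to check against its implementation.

```python
def allocate_samples(total: int, weights: dict) -> dict:
    """Split `total` into integer counts proportional to `weights` (largest-remainder rounding)."""
    weight_sum = sum(weights.values())
    exact = {name: total * w / weight_sum for name, w in weights.items()}
    # int() floors here because all exact shares are non-negative.
    counts = {name: int(value) for name, value in exact.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining points to the entries with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Fixed shares used in this notebook (65% / 35%).
fixed_split = allocate_samples(6000, {"ROI1": 65, "ROI2": 35})
print(fixed_split)

# Shares derived directly from the flood counts quoted in the comments (6 and 3 floods).
count_based_split = allocate_samples(6000, {"ROI1": 6, "ROI2": 3})
print(count_based_split)
```

With the fixed shares this gives 3900 and 2100 points. Weighting strictly by the 6 and 3 flood counts would give 4000 and 2000 instead, so the 65%/35% split used in the notebook is close to, but not identical with, a purely count-based allocation.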
