## Smoothing Method Notebook

### Overview: 

##### Import necessary functions

These functions are stored in src/utils, organized into subfolders by purpose. The VIEWS_FAO_index base directory must be on sys.path so that these .py files can be imported; the cell below adds it, assuming the notebook is run from notebooks/methods.

In [1]:
import os
import sys

# Get the current working directory
current_directory = os.getcwd()

# Print the current working directory
print("The current Working Directory is:", current_directory)

# Get the path to the base directory (VIEWS_FAO_index)
base_dir = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))
print(f'The base directory will be set to: {base_dir}')

# Add the base directory to sys.path
sys.path.insert(0, base_dir)

The current Working Directory is: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods
The base directory will be set to: /Users/gbenz/Documents/VIEWS_FAO_index
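
The cell above assumes the notebook sits exactly two levels below the base directory. As a hedged alternative, the sketch below walks upward from a starting folder until it finds one containing `src/utils`. The helper name `find_base_dir` and the marker folder are illustrative assumptions, not part of the project code.

```python
from pathlib import Path
import tempfile


def find_base_dir(start: Path, marker: str = "src/utils") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found.

    Hypothetical helper for illustration; not part of src/utils.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains {marker!r}")


# Demonstration on a throwaway directory tree that mimics the repository layout.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "VIEWS_FAO_index"
    (repo / "src" / "utils").mkdir(parents=True)
    notebook_dir = repo / "notebooks" / "methods"
    notebook_dir.mkdir(parents=True)

    found = find_base_dir(notebook_dir)
    demo_ok = found == repo.resolve()
    # In the notebook one would then call: sys.path.insert(0, str(found))
```

Searching for a marker folder keeps the import setup working if the notebook is moved to a different depth inside the repository.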


In [17]:
import sys
import os
import pandas as pd
import matplotlib.pyplot as plt

#Functions necessary for all methods:
from src.utils.universal_functions.setup.generate_base_file import give_primary_frame
from src.utils.universal_functions.setup.associate_country_id import associate_country_years, pull_from_c_y_dictionary
from src.utils.universal_functions.setup.build_directory import ensure_directory_exists


#Functions to generate the insurance table requested by FAO partners:
from src.utils.universal_functions.FAO_table_formatting.calculate_percentiles import format_stats, clean_percentile_table
from src.utils.universal_functions.FAO_table_formatting.generate_output_tables import insurance_table, annual_summary_table

#Functions bespoke to smoothing method:
from src.utils.functions_for_method_smoothing.generate_smoothing_dataframe import smoothed_dataframe

#Function for Event-Year Return Period process:
from src.utils.functions_for_single_cell_return_period.cell_return_period import calculate_cumulative_distribution, calculate_probabilities, calculate_expected_time_periods, calculate_expected_voxels, compare_empirical_vs_expected
from src.utils.functions_for_single_cell_return_period.Ei_insurance_table_setup import insurance_table_for_E_i

from src.utils.functions_for_method_aggregation.generate_aggregate_dataframe import aggregate_priogrid_for_country, map_c_y_dictionary_to_data, map_c_id_to_aggregations

#Function to run the method in a single line:
from src.utils.functions_for_return_periods.insurance_products_for_RP import aggregation_Country_Year_files, aggregation_Event_Year_files

### First, we define our primary dataframe. 

The variable 'data' is built from two querysets: (1) Fatalities_fao_pgm and (2) cm_properties. Together they capture fatality attributes and unique country identifiers, including the start and end years of boundary changes.

Year parameters can be adapted to incorporate future annual releases of Uppsala Conflict Data Program (UCDP) Geospatial Events Database (GED) data or constrain the temporal window for unique applications. The recommendation is to capture data from the full extent, which at the time of this release is 1989 - 2023.
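
To illustrate what constraining the temporal window means for an already loaded table, here is a minimal sketch on a toy dataframe. The helper `constrain_years` is hypothetical and is not how `give_primary_frame` applies its year arguments internally (that function's internals are not shown here); the sketch only assumes a `year` column like the one listed in the output below.

```python
import pandas as pd


def constrain_years(df, start_year, end_year, year_column="year"):
    """Keep rows whose year lies in [start_year, end_year], both ends inclusive.

    Hypothetical helper for illustration only.
    """
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    mask = df[year_column].between(start_year, end_year)
    return df.loc[mask].copy()


# Toy data with made-up values; only the column names follow the notebook.
toy = pd.DataFrame(
    {
        "pg_id": [1, 1, 1, 2],
        "year": [1989, 2005, 2023, 2024],
        "fatalities_sum": [0.0, 3.0, 1.0, 7.0],
    }
)
window = constrain_years(toy, 1989, 2023)
```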

In [3]:
data = give_primary_frame('Fatalities_fao_pgm', 'cm_properties', 1988, 2023)

100%|██████████| 40.0M/40.0M [00:02<00:00, 15.7MB/s]


Queryset Fatalities_fao_pgm read successfully 
Queryset cm_properties read successfully 




['month_id', 'pg_id', 'country_name', 'C_start_year', 'C_end_year', 'pop_gpw_sum', 'ged_sb', 'ged_ns', 'ged_os', 'year', 'fatalities_sum', 'country_id']


#### Make a copy of the variable data to keep a clean version accessible.

In [4]:
data_working_copy = data.copy()  # .copy() creates an independent dataframe; plain assignment would only alias `data`
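
A note on copy semantics: in Python, plain assignment binds a second name to the same dataframe, whereas `DataFrame.copy()` creates an independent object. The toy example below (made-up values) shows the difference.

```python
import pandas as pd

original = pd.DataFrame({"pg_id": [135077, 135078], "fatalities_sum": [0.0, 0.0]})

alias = original                # second name for the same object
independent = original.copy()   # separate object with its own data (deep copy by default)

# Modify the table through the alias.
alias.loc[0, "fatalities_sum"] = 99.0

alias_is_same_object = alias is original
original_value = original.loc[0, "fatalities_sum"]        # changed, because alias IS original
independent_value = independent.loc[0, "fatalities_sum"]  # unchanged
```

Only the object produced by `.copy()` stays clean when the working table is modified later.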

#### Specify countries to investigate

In [5]:
countries = ['Ethiopia']

#### Template process to construct a function unique to this method

#### Output variables and application:
annual_summary:
insurance_table:
df_annual_cleaned:

#### Parameters Explained with conditions for changes:

countries: 
'country_id':
smoothed_dataframe_file:
insurance_table (percentiles):
insurance_table ('perca_Mean'):

In [6]:
for country in countries:
    print('working on: '+ country)

#   First:
#   1. Build the dictionary of country_ids and their start and end years for the selected country
    country_and_year_dictionary = associate_country_years(data_working_copy, country)
    print('printing the country and year dictionary:')
    print(country_and_year_dictionary)
#-----------------------------------------------------------------------------------------------------------------------------------
#   Second:
#       1. Subset the designated country in the list. This requires examining country_id information and the corresponding start and end years.
#   Some countries have more than one country_id. The functions used in this section identify the most recent country range and
#   subset the necessary temporal ranges.
#-----------------------------------------------------------------------------------------------------------------------------------
#-----------------------------------------------------------------------------------------------------------------------------------
    cid_int = pull_from_c_y_dictionary(country_and_year_dictionary)
    subset_to_country = data_working_copy[data_working_copy['country_id'] == cid_int]

    display(subset_to_country.head(3))
    #concludes the AOI subsetting parameters----
    smoothed_dataframe_file = base_dir + '/data/generated/Smoothing/' + 'pgy_smoothing.csv'
    
    df_annual_cleaned = smoothed_dataframe(smoothed_dataframe_file, subset_to_country, df_pg_column='pg_id', df_year_column='year', smooth_pg_column='gid', smooth_year_column='year_id', smoothing_column='perca_Mean')
    df_annual_cleaned['perca_Mean'] = df_annual_cleaned['perca_Mean'].round(1)

    percentile_df = format_stats(df_annual_cleaned, 'perca_Mean')
    filtered_x = clean_percentile_table(percentile_df)
    insurance_table_df = insurance_table(filtered_x, df_annual_cleaned, ['90','95','98','99','100'], 'perca_Mean') #uses the default attribute = percapita_100k and appened_1_value = yes
    annual_summary = annual_summary_table(df_annual_cleaned, 'perca_Mean')
#-----------------------------------------------------------------------------------------------------------------------------------
#----- SET DIRECTORIES 
#-----------------------------------------------------------------------------------------------------------------------------------
#----- THIS SETS A DIRECTORY THAT IS UNIQUE FOR THE SMOOTHING METHOD ----
#-----------------------------------------------------------------------------------------------------------------------------------
#----- <<< working just with the 'Cell Year' Return Period Process >>>-
    output_path = base_dir + '/notebooks/methods/Country_Results/' + country + '/Smoothing/Cell Year/FAO tables/'
    ensure_directory_exists(output_path)
#-----------------------------------------------------------------------------------------------------------------------------------
    annual_summary_file_path = output_path + country + ' annual summary.csv'
    print(f'saving annual_summary table to: {annual_summary_file_path}')
#-----------------------------------------------------------------------------------------------------------------------------------
    insurance_table_file_path = output_path + country + ' insurance table.csv'
    print(f'saving insurance table to: {insurance_table_file_path}')
#-----------------------------------------------------------------------------------------------------------------------------------
    main_dataframe_file_path = output_path + country + ' main dataframe.csv'
    print(f'saving main dataframe table to: {main_dataframe_file_path}')
#-----------------------------------------------------------------------------------------------------------------------------------
#-----------------------------------------------------------------------------------------------------------------------------------
#----- NOW WE WRITE TO THE FOLDERS. -----------------------------------
    annual_summary.to_csv(annual_summary_file_path)
    insurance_table_df.to_csv(insurance_table_file_path)
    df_annual_cleaned.to_csv(main_dataframe_file_path)
#-----------------------------------------------------------------------------------------------------------------------------------

working on: Ethiopia
   c_id country_name  C_start_year  C_end_year       0
0    57     Ethiopia          1993        2050  132397
1   191     Ethiopia          1946        1993   22112
the length of country_ids for the selected country is: 2
printing the country and year dictionary:
{57: (1993, 2050), 191: (1946, 1993)}


Unnamed: 0,month_id,pg_id,country_name,C_start_year,C_end_year,pop_gpw_sum,ged_sb,ged_ns,ged_os,year,fatalities_sum,country_id
686393,161,135077,Ethiopia,1993,2050,15539.84082,0.0,0.0,0.0,1993,0.0,57
686395,161,135078,Ethiopia,1993,2050,23565.462891,0.0,0.0,0.0,1993,0.0,57
686397,161,135079,Ethiopia,1946,1993,32318.042969,0.0,0.0,0.0,1993,0.0,57


The smoothed dataframe contains the following columns: ['OBJECTID_1', 'Join_Count', 'TARGET_FID', 'gid', 'Shape_Length', 'Shape_Area', 'X_center', 'Y_center', 'percapita_100k', 'DIST_Mean', 'DIST_NNBRS', 'perca_Mean', 'perc_NNBRS', 'year_id']
the length of the smoothed country priogrid dataframe is: 11189
After joining...the length of the country priogrid dataframe is: 11189
Index where perca_Mean equals 1: 83.1: 1.0
Directory '/Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Smoothing/Cell Year/FAO tables' already exists.
saving annual_summary table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Smoothing/Cell Year/FAO tables/Ethiopia annual summary.csv
saving insurance table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Smoothing/Cell Year/FAO tables/Ethiopia insurance table.csv
saving main dataframe table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country
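
The output above shows two country_ids for Ethiopia, and the code comments state that the most recent country range is selected. A minimal sketch of that selection, assuming "most recent" means the range with the latest end year, could look like this. The helper `most_recent_country_id` is hypothetical; `pull_from_c_y_dictionary` may implement the choice differently.

```python
def most_recent_country_id(country_years):
    """Return the country_id whose (start_year, end_year) range ends latest.

    Ties on end year are broken by the later start year.
    Hypothetical helper for illustration only.
    """
    if not country_years:
        raise ValueError("country_years is empty")
    return max(
        country_years,
        key=lambda cid: (country_years[cid][1], country_years[cid][0]),
    )


# Dictionary as printed in the notebook output for Ethiopia.
ethiopia = {57: (1993, 2050), 191: (1946, 1993)}
selected = most_recent_country_id(ethiopia)
```

With the dictionary printed above, this picks country_id 57, which matches the `country_id` value in the displayed rows.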

In [12]:

def smoothing_Country_Year_files(data, country_name):
    country_and_year_dictionary = associate_country_years(data, country_name)
#-----------------------------------------------------------------------------------------------------------------------------------
#   Second:
#       1. Subset the designated country in the list. This requires examining country_id information and the corresponding start and end years.
#   Some countries have more than one country_id. The functions used in this section identify the most recent country range and
#   subset the necessary temporal ranges.
#-----------------------------------------------------------------------------------------------------------------------------------
#-----------------------------------------------------------------------------------------------------------------------------------
    cid_int = pull_from_c_y_dictionary(country_and_year_dictionary)
    subset_to_country = data[data['country_id'] == cid_int]

    display(subset_to_country.head(3))
    #concludes the AOI subsetting parameters----
    smoothed_dataframe_file = base_dir + '/data/generated/Smoothing/' + 'pgy_smoothing.csv'
    
    df_annual_cleaned = smoothed_dataframe(smoothed_dataframe_file, subset_to_country, df_pg_column='pg_id', df_year_column='year', smooth_pg_column='gid', smooth_year_column='year_id', smoothing_column='perca_Mean')
    df_annual_cleaned['perca_Mean'] = df_annual_cleaned['perca_Mean'].round(1)

    percentile_df = format_stats(df_annual_cleaned, 'perca_Mean')
    filtered_x = clean_percentile_table(percentile_df)
    insurance_table_df = insurance_table(filtered_x, df_annual_cleaned, ['90','95','98','99','100'], 'perca_Mean') #uses the default attribute = percapita_100k and appened_1_value = yes
    annual_summary = annual_summary_table(df_annual_cleaned, 'perca_Mean')
#-----------------------------------------------------------------------------------------------------------------------------------
#----- SET DIRECTORIES 
#-----------------------------------------------------------------------------------------------------------------------------------
#----- THIS SETS A DIRECTORY THAT IS UNIQUE FOR THE SMOOTHING METHOD ----
#-----------------------------------------------------------------------------------------------------------------------------------
#----- <<< working just with the 'Cell Year' Return Period Process >>>-
    output_path = base_dir + '/notebooks/methods/Country_Results/' + country_name + '/Smoothing/Cell Year/FAO tables/'
    ensure_directory_exists(output_path)
#-----------------------------------------------------------------------------------------------------------------------------------
    annual_summary_file_path = output_path + country_name + ' annual summary.csv'
    print(f'saving annual_summary table to: {annual_summary_file_path}')
#-----------------------------------------------------------------------------------------------------------------------------------
    insurance_table_file_path = output_path + country_name + ' insurance table.csv'
    print(f'saving insurance table to: {insurance_table_file_path}')
#-----------------------------------------------------------------------------------------------------------------------------------
    main_dataframe_file_path = output_path + country_name + ' main dataframe.csv'
    print(f'saving main dataframe table to: {main_dataframe_file_path}')
#-----------------------------------------------------------------------------------------------------------------------------------
#-----------------------------------------------------------------------------------------------------------------------------------
#----- NOW WE WRITE TO THE FOLDERS. -----------------------------------
    annual_summary.to_csv(annual_summary_file_path)
    insurance_table_df.to_csv(insurance_table_file_path)
    df_annual_cleaned.to_csv(main_dataframe_file_path)
#-----------------------------------------------------------------------------------------------------------------------------------
    return df_annual_cleaned, insurance_table_df, annual_summary
#-----------------------------------------------------------------------------------------------------------------------------------
#-----------------------------------------------------------------------------------------------------------------------------------
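
The function above builds its three output paths by string concatenation. As a hedged alternative, the paths can be assembled in one place with `pathlib`, which makes a missing file name component easier to spot. The helper `build_output_paths` is an illustrative assumption; the folder and file names are taken from the code above.

```python
from pathlib import PurePosixPath


def build_output_paths(base_dir, country_name, method="Smoothing", process="Cell Year"):
    """Return the three CSV paths written for one country.

    Hypothetical helper; PurePosixPath keeps the demo platform independent.
    In a notebook, pathlib.Path would be the natural choice.
    """
    folder = (
        PurePosixPath(base_dir)
        / "notebooks" / "methods" / "Country_Results"
        / country_name / method / process / "FAO tables"
    )
    return {
        "annual_summary": folder / f"{country_name} annual summary.csv",
        "insurance_table": folder / f"{country_name} insurance table.csv",
        "main_dataframe": folder / f"{country_name} main dataframe.csv",
    }


# "/tmp/VIEWS_FAO_index" is a placeholder base directory for the demo.
paths = build_output_paths("/tmp/VIEWS_FAO_index", "Ethiopia")
```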

In [13]:
df_annual_cleaned, insurance_table_df, annual_summary = smoothing_Country_Year_files(data_working_copy, 'Ethiopia')


   c_id country_name  C_start_year  C_end_year       0
0    57     Ethiopia          1993        2050  132397
1   191     Ethiopia          1946        1993   22112
the length of country_ids for the selected country is: 2


Unnamed: 0,month_id,pg_id,country_name,C_start_year,C_end_year,pop_gpw_sum,ged_sb,ged_ns,ged_os,year,fatalities_sum,country_id
686393,161,135077,Ethiopia,1993,2050,15539.84082,0.0,0.0,0.0,1993,0.0,57
686395,161,135078,Ethiopia,1993,2050,23565.462891,0.0,0.0,0.0,1993,0.0,57
686397,161,135079,Ethiopia,1946,1993,32318.042969,0.0,0.0,0.0,1993,0.0,57


The smoothed dataframe contains the following columns: ['OBJECTID_1', 'Join_Count', 'TARGET_FID', 'gid', 'Shape_Length', 'Shape_Area', 'X_center', 'Y_center', 'percapita_100k', 'DIST_Mean', 'DIST_NNBRS', 'perca_Mean', 'perc_NNBRS', 'year_id']
the length of the smoothed country priogrid dataframe is: 11189
After joining...the length of the country priogrid dataframe is: 11189
Index where perca_Mean equals 1: 83.1: 1.0
Directory '/Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Smoothing/Cell Year/FAO tables' already exists.
saving annual_summary table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Smoothing/Cell Year/FAO tables/Ethiopia annual summary.csv
saving insurance table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Smoothing/Cell Year/FAO tables/Ethiopia insurance table.csv
saving main dataframe table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country
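
The insurance table is requested for the 90th, 95th, 98th, 99th, and 100th percentiles of `perca_Mean`. The internals of `format_stats` and `insurance_table` are not shown in this notebook, so the sketch below only illustrates the general idea of reading threshold values off a distribution. It uses the nearest-rank percentile definition on made-up values, which is an assumption; the project functions may interpolate or filter differently.

```python
import math


def nearest_rank_percentile(values, percentile):
    """Smallest observed value with at least `percentile` percent of the data at or below it."""
    if not values:
        raise ValueError("values is empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up cell-year values: mostly zeros with a short upper tail.
toy_values = [0.0] * 90 + [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0, 9.0, 15.0]
thresholds = {p: nearest_rank_percentile(toy_values, p) for p in (90, 95, 98, 99, 100)}
```

Because conflict data are dominated by zeros, lower percentiles often sit at 0.0, which is why only the upper tail is tabulated.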

#### You must run the method for the Cell-Year process before the Event-Year Return Period type can be generated.

In [18]:
for country in countries:

    df_annual_cleaned, insurance_table_df, annual_summary = smoothing_Country_Year_files(data_working_copy, country)

    #This cell is exclusively working with E_i values (Return Period by Cell)
    cumulative_distribution = calculate_cumulative_distribution(df_annual_cleaned, 'perca_Mean')
    #calculate_probabilities(cumulative_distribution, data, id_column='percapita_100k'):
    probabilities = calculate_probabilities(cumulative_distribution, df_annual_cleaned, 'pg_id')
    #print(probabilities)
    probabilities['E_i'] = calculate_expected_time_periods(probabilities['P_i'])
    # Calculate E_i^voxels
    probabilities['E_i_voxels'] = calculate_expected_voxels(probabilities['p_i'])
    probabilities_renamed = probabilities.rename(columns={'value': 'perca_Mean'})
    print(probabilities_renamed.head(100))

    probabilities_with_empirical = compare_empirical_vs_expected(df_annual_cleaned, probabilities_renamed, time_column='year', value_column = 'perca_Mean')

    probabilities_with_empirical_sorted = probabilities_with_empirical.sort_values(by='E_i_value')
    subset_E_i = probabilities_with_empirical_sorted[probabilities_with_empirical_sorted['E_i_value'] >= 4.0]
    insurance_from_E_i = insurance_table_for_E_i([5, 10, 20, 30], subset_E_i, df_annual_cleaned, value_field='perca_Mean')
    insurance_from_E_i = insurance_from_E_i.round({
        'Closest E_i': 1,
        'perca_Mean': 1,
    })
    
    print(insurance_from_E_i)

   c_id country_name  C_start_year  C_end_year       0
0    57     Ethiopia          1993        2050  132397
1   191     Ethiopia          1946        1993   22112
the length of country_ids for the selected country is: 2


Unnamed: 0,month_id,pg_id,country_name,C_start_year,C_end_year,pop_gpw_sum,ged_sb,ged_ns,ged_os,year,fatalities_sum,country_id
686393,161,135077,Ethiopia,1993,2050,15539.84082,0.0,0.0,0.0,1993,0.0,57
686395,161,135078,Ethiopia,1993,2050,23565.462891,0.0,0.0,0.0,1993,0.0,57
686397,161,135079,Ethiopia,1946,1993,32318.042969,0.0,0.0,0.0,1993,0.0,57


The smoothed dataframe contains the following columns: ['OBJECTID_1', 'Join_Count', 'TARGET_FID', 'gid', 'Shape_Length', 'Shape_Area', 'X_center', 'Y_center', 'percapita_100k', 'DIST_Mean', 'DIST_NNBRS', 'perca_Mean', 'perc_NNBRS', 'year_id']
the length of the smoothed country priogrid dataframe is: 11189
After joining...the length of the country priogrid dataframe is: 11189
Index where perca_Mean equals 1: 83.1: 1.0
Directory '/Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Smoothing/Cell Year/FAO tables' already exists.
saving annual_summary table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Smoothing/Cell Year/FAO tables/Ethiopia annual summary.csv
saving insurance table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Smoothing/Cell Year/FAO tables/Ethiopia insurance table.csv
saving main dataframe table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country
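
For readers unfamiliar with return periods, the sketch below shows the conventional calculation on made-up values: the exceedance probability P_i of a value is the share of observations at or above it, and the expected return period is E_i = 1 / P_i. This is an assumption about the general technique; the definitions used inside `calculate_probabilities` and `calculate_expected_time_periods` are not shown in this notebook and may differ.

```python
from collections import Counter


def exceedance_return_periods(values):
    """Map each distinct value to (P_i, E_i).

    P_i is the share of observations greater than or equal to the value,
    and E_i = 1 / P_i is the corresponding return period in observations.
    Hypothetical helper for illustration only.
    """
    if not values:
        raise ValueError("values is empty")
    counts = Counter(values)
    total = len(values)
    result = {}
    at_or_above = 0
    # Walk from the largest value down, accumulating the count at or above it.
    for value in sorted(counts, reverse=True):
        at_or_above += counts[value]
        result[value] = (at_or_above / total, total / at_or_above)
    return result


# Made-up values: 16 zeros and a short upper tail (20 observations in total).
toy = [0.0] * 16 + [1.0, 1.0, 2.0, 5.0]
periods = exceedance_return_periods(toy)
```

In this toy data, the value 2.0 is reached or exceeded by 2 of 20 observations, giving a return period of 10, which is the kind of threshold looked up for the 5, 10, 20, and 30 period rows of the table.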