## Standard Method Notebook

### Overview: 

##### Import necessary functions

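These functions are stored in `src/utils`, grouped into descriptively named subfolders. The notebook must be run from inside the `VIEWS_FAO_index` directory tree (this copy runs from `notebooks/methods`). The setup cell below finds `src/utils` by searching the notebook's path for the `VIEWS_FAO_index` folder.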

In [1]:
import pandas as pd
import os
import sys
from pathlib import Path
import matplotlib.pyplot as plt
# # Get the current working directory
# current_directory = os.getcwd()

# # Print the current working directory
# print("The current Working Directory is:", current_directory)

# # Get the path to the base directory (VIEWS_FAO_index)
# base_dir = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))
# print(f'The base directory will be set to: {base_dir}')

# # Add the base directory to sys.path
# sys.path.insert(0, base_dir)


In [2]:
# Jupyter Notebook solution  --------------------------------------------------------------------------------------------
notebook_dir = os.getcwd() # notebook specific
notebook_name = "reference_standard.ipynb" # notebook specific name

PATH = Path(notebook_dir) / Path(notebook_name) # notebook specific

# alt script version -----------------------------------------------------------------------------------------------------
# PATH = Path(__file__)

# Common for notebooks and scripts alike: add VIEWS_FAO_index/src/utils to the import path
project_root = Path(*PATH.parts[:PATH.parts.index("VIEWS_FAO_index") + 1])
sys.path.insert(0, str(project_root / "src/utils"))

from set_paths import setup_project_paths, get_logo_path, get_data_paths, setup_root_paths, get_plot_path
setup_project_paths(PATH)
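The setup cell locates the project root by searching `PATH.parts` for `VIEWS_FAO_index`. If the notebook is run from outside the project, this raises a bare `ValueError`. Below is a minimal sketch of a more explicit alternative using `pathlib`. `find_project_root` is a hypothetical helper and is not part of `src/utils`.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "VIEWS_FAO_index") -> Path:
    """Walk up from `start` until a directory named `marker` is found (hypothetical helper)."""
    start = start.resolve()
    # Check `start` itself first, then each parent directory up to the filesystem root
    for candidate in (start, *start.parents):
        if candidate.name == marker:
            return candidate
    # A descriptive error instead of the ValueError raised by tuple.index()
    raise FileNotFoundError(f"No '{marker}' directory found above {start}")


# Hypothetical usage in this notebook:
# root = find_project_root(Path.cwd())
# sys.path.insert(0, str(root / "src" / "utils"))
```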

## Access public ViEWS data to support `1x1` analysis:

### Consult `starter_notebook.ipynb` for information on running `main.py`. Running it is a precondition for operating this notebook.

**If you have not yet run `main.py`, please do so now.** You can also download the VIEWS-FAO data through the Google Cloud Storage links in the repository's root README, but `main.py` offers more options and information. It checks whether your data is in the expected directory, compares your library versions with those used by the developers of this repository, and can download and process the data directly. A full run currently takes up to 20 minutes, because a suite of tests checks the integrity of the data and of the transformations in each processing step. Future versions will allow you to skip these tests, but while the project is in very active development they are mandatory.

**Welcome back!** Hopefully, `main.py` was able to execute flawlessly, and you're all set up now. If you were not able to execute `main.py` to your satisfaction, see the root README file for contact information for the project owner and team lead of this repository.

**Let's move on!**

In [3]:
PATH_RAW_VIEWSER, PATH_RAW_EXTERNAL, PATH_PROCESSED, PATH_GENERATED = get_data_paths(PATH)
PATH_ROOT = setup_root_paths(PATH)

# Print the resolved paths
print(PATH_RAW_VIEWSER)
print(PATH_RAW_EXTERNAL)
print(PATH_PROCESSED)
print(PATH_GENERATED) 
#-----
print(PATH_ROOT)

/Users/gbenz/Documents/VIEWS_FAO_index/data/raw_viewser
/Users/gbenz/Documents/VIEWS_FAO_index/data/raw_external
/Users/gbenz/Documents/VIEWS_FAO_index/data/processed
/Users/gbenz/Documents/VIEWS_FAO_index/data/generated
/Users/gbenz/Documents/VIEWS_FAO_index


In [4]:
#Functions necessary for all methods:
from src.utils.universal_functions.setup.generate_base_file import give_primary_frame
from src.utils.universal_functions.setup.associate_country_id import associate_country_years, pull_from_c_y_dictionary
from src.utils.universal_functions.setup.build_directory import ensure_directory_exists

#Functions to generate the insurance table requested by FAO partners:
from src.utils.universal_functions.FAO_table_formatting.calculate_percentiles import format_stats, clean_percentile_table
from src.utils.universal_functions.FAO_table_formatting.generate_output_tables import insurance_table, annual_summary_table

#Functions bespoke to standard method:
from src.utils.functions_for_method_standard.standard_per_capita_fatalities import native_per_capita_fatalities

#function to calculate Ei & big P return period
from src.utils.functions_for_single_cell_return_period.cell_return_period import calculate_cumulative_distribution, calculate_probabilities, calculate_expected_time_periods, calculate_expected_voxels, compare_empirical_vs_expected
from src.utils.functions_for_single_cell_return_period.Ei_insurance_table_setup import insurance_table_for_E_i

# Summary functions that wrap the country-year and event-year steps demonstrated in this notebook:
from src.utils.functions_for_return_periods.insurance_products_for_RP import standard_Country_Year_files, standard_Event_Year_files

The current Working Directory is: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods
The base directory will be set to: /Users/gbenz/Documents/VIEWS_FAO_index
The current Working Directory is: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods
The base directory will be set to: /Users/gbenz/Documents/VIEWS_FAO_index
The current Working Directory is: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods
The base directory will be set to: /Users/gbenz/Documents/VIEWS_FAO_index


### First, we define our primary dataframe. 

The variable `data` is built from two querysets, `Fatalities_fao_pgm` and `cm_properties`. Together they supply the fatality attributes and the unique country identifiers, including the start and end years that record boundary changes.

Year parameters can be adjusted to include future annual releases of the Uppsala Conflict Data Program (UCDP) Georeferenced Event Dataset (GED), or to constrain the temporal window for specific applications. We recommend capturing the full extent, which at the time of this release is 1989 to 2023.
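If you only need part of the record, one option is to load the full extent and then narrow it with pandas. The sketch below assumes the `year` column shown in the primary dataframe's column list. `constrain_years` is a hypothetical helper, not a function from this repository.

```python
import pandas as pd


def constrain_years(df: pd.DataFrame, start_year: int, end_year: int) -> pd.DataFrame:
    """Return rows whose 'year' lies in [start_year, end_year], both ends included."""
    mask = df["year"].between(start_year, end_year)  # inclusive on both ends by default
    return df.loc[mask].copy()  # copy so later edits do not touch the full frame


# Hypothetical usage: restrict the primary frame to the 2010s
# data_2010s = constrain_years(data, 2010, 2019)
```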

In [None]:
df_monthly = pd.read_pickle(PATH_GENERATED / "df_monthly_country_return_periods.pkl")
df_yearly = pd.read_pickle(PATH_GENERATED / "df_yearly_country_return_periods.pkl")
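These pickles are read from `PATH_GENERATED`. If they are missing, `pd.read_pickle` fails with a generic error. Below is a minimal sketch of a friendlier guard. `load_generated_pickles` is a hypothetical helper, and the sketch assumes that `main.py` is what writes these files to that folder.

```python
from pathlib import Path

import pandas as pd


def load_generated_pickles(generated_dir: Path, filenames: list[str]) -> dict[str, pd.DataFrame]:
    """Load pickled frames from `generated_dir`, failing early with a clear message."""
    # Collect every missing file so the user sees the full list at once
    missing = [name for name in filenames if not (generated_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(
            f"Missing in {generated_dir}: {missing}. Run main.py first (see starter_notebook.ipynb)."
        )
    return {name: pd.read_pickle(generated_dir / name) for name in filenames}


# Hypothetical usage in this notebook:
# frames = load_generated_pickles(PATH_GENERATED, ["df_monthly_country_return_periods.pkl",
#                                                  "df_yearly_country_return_periods.pkl"])
```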

#### Alternatively, if you are part of the ViEWS family operating this notebook, the data can be retrieved directly from our server:

In [5]:
data = give_primary_frame('Fatalities_fao_pgm', 'cm_properties', 1988, 2023)

100%|██████████| 40.0M/40.0M [00:02<00:00, 16.3MB/s]


Queryset Fatalities_fao_pgm read successfully 
Queryset cm_properties read successfully 




['month_id', 'pg_id', 'country_name', 'C_start_year', 'C_end_year', 'pop_gpw_sum', 'ged_sb', 'ged_ns', 'ged_os', 'year', 'fatalities_sum', 'country_id']


#### Make a copy of `data` so that a clean version stays accessible.

In [6]:
data_working_copy = data.copy()  # .copy() creates an independent frame; plain assignment would only alias `data`
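In pandas, plain assignment binds a second name to the same DataFrame rather than copying it. Only `DataFrame.copy()` yields an independent frame. A minimal demonstration on a toy frame (`toy`, `alias` and `independent` are illustrative names only):

```python
import pandas as pd

# Toy frame standing in for `data`
toy = pd.DataFrame({"year": [2020, 2021], "fatalities_sum": [5, 7]})

alias = toy                 # plain assignment: a second name for the same object
independent = toy.copy()    # deep copy (the default): a separate object

independent.loc[0, "fatalities_sum"] = 999
print(toy.loc[0, "fatalities_sum"])   # 5: the original is untouched

alias.loc[0, "fatalities_sum"] = 0
print(toy.loc[0, "fatalities_sum"])   # 0: the edit made through the alias changed the original
```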

#### Specify countries to investigate

In [7]:
countries = ['Burkina Faso']

#### Process for Country Year Return Period:
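The loop below calls `native_per_capita_fatalities` to build the annual table, including a `percapita_100k` column. That function's internals are not shown here. The following is a minimal sketch under explicit assumptions. Fatalities are summed over all grid cells and months of a year. The population denominator is taken as the mean of the monthly country totals of `pop_gpw_sum`. That denominator is our assumption, not necessarily the repository's rule, and `annual_per_capita` is a hypothetical helper.

```python
import pandas as pd


def annual_per_capita(df: pd.DataFrame) -> pd.DataFrame:
    """Annual fatalities per 100,000 people (hypothetical sketch, not native_per_capita_fatalities)."""
    # Country totals per month: sum fatalities and population over grid cells
    monthly = df.groupby(["year", "month_id"], as_index=False)[["fatalities_sum", "pop_gpw_sum"]].sum()
    # Annual totals: fatalities summed over months; population = mean monthly total (assumption)
    annual = monthly.groupby("year").agg(
        fatalities_sum=("fatalities_sum", "sum"),
        population=("pop_gpw_sum", "mean"),
    ).reset_index()
    # Rate per 100k, rounded to one decimal as in the notebook
    annual["percapita_100k"] = (annual["fatalities_sum"] / annual["population"] * 100_000).round(1)
    return annual
```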

In [12]:
for country in countries:
    print('working on: '+ country)

#   First:
#   1. Aggregate the standard PRIO-Grid scale to a coarser resolution
    country_and_year_dictionary = associate_country_years(data_working_copy, country)
    print('printing he country and year dictionary:')
    print(country_and_year_dictionary)
#-----------------------------------------------------------------------------------------------------------------------------------
#   Second:
#       1. Subset the designated country. This requires examining country_id information and the corresponding start and end years.
#   Some countries have more than one country_id. The functions employed in this section identify the most recent country range and
#   subset the necessary temporal ranges.
#-----------------------------------------------------------------------------------------------------------------------------------
    cid_int = pull_from_c_y_dictionary(country_and_year_dictionary)
    subset_to_country = data_working_copy[data_working_copy['country_id'] == cid_int]
    conflict_profile = {col: subset_to_country[col].sum() for col in ['ged_sb', 'ged_ns', 'ged_os', 'fatalities_sum']}

    df_annual = native_per_capita_fatalities(subset_to_country)
    df_annual['percapita_100k'] = df_annual['percapita_100k'].round(1)

    percentile_df = format_stats(df_annual, field_to_describe='fatalities_sum') #or 'percapita_100k'
    filtered_x = clean_percentile_table(percentile_df)
    insurance_table_df = insurance_table(filtered_x, df_annual, ['90','95','98','99','100'], attribute_to_explore= 'fatalities_sum') # overrides the default attribute (percapita_100k); appened_1_value keeps its default (yes)
    annual_summary = annual_summary_table(df_annual, 'standard', fat_or_pcf='fatalities_sum') #or 'percapita_100k'
#-----------------------------------------------------------------------------------------------------------------------------------
#----- SET DIRECTORIES 
#-----------------------------------------------------------------------------------------------------------------------------------
#----- THIS SETS A DIRECTORY THAT IS UNIQUE TO THE STANDARD METHOD ----
#-----------------------------------------------------------------------------------------------------------------------------------
#----- <<< working just with the 'Country Year' Return Period Process >>>-
    output_path = PATH_ROOT / 'notebooks/methods/Country_Results' / country / 'Standard/Country Year/FAO tables/'
    ensure_directory_exists(output_path)
#-----------------------------------------------------------------------------------------------------------------------------------
    annual_summary_file_path = output_path /  f'{country} annual summary.csv'
    print(f'saving annual_summary table to: {annual_summary_file_path}')
#-----------------------------------------------------------------------------------------------------------------------------------
    insurance_table_file_path = output_path / f'{country} insurance table.csv'
    print(f'saving insurance table to: {insurance_table_file_path}')
#-----------------------------------------------------------------------------------------------------------------------------------
    main_dataframe_file_path = output_path / f'{country} main dataframe.csv'
    print(f'saving main dataframe table to: {main_dataframe_file_path}')
#-----------------------------------------------------------------------------------------------------------------------------------
#-----------------------------------------------------------------------------------------------------------------------------------
#----- NOW WE WRITE TO THE FOLDERS. -----------------------------------
    annual_summary.to_csv(annual_summary_file_path)
    insurance_table_df.to_csv(insurance_table_file_path)
    df_annual.to_csv(main_dataframe_file_path)

working on: Burkina Faso
   c_id  country_name  C_start_year  C_end_year      0
0    47  Burkina Faso          1960        2050  35496
the length of country_ids for the selected country is: 1
printing he country and year dictionary:
{47: (1960, 2050)}
Index where fatalities_sum equals 1: 94.2: 1.0
Directory '/Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Burkina Faso/Standard/Country Year/FAO tables' already exists.
saving annual_summary table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Burkina Faso/Standard/Country Year/FAO tables/Burkina Faso annual summary.csv
saving insurance table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Burkina Faso/Standard/Country Year/FAO tables/Burkina Faso insurance table.csv
saving main dataframe table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Burkina Faso/Standard/Country Year/FAO tables/Burkina Faso main dataframe.csv
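The insurance table above is built from the 90th, 95th, 98th, 99th and 100th percentiles of annual fatalities. One common way to read a percentile as a return period works as follows. A level at the q-th percentile of annual values is exceeded in a share (1 - q/100) of years. It therefore recurs on average every 1 / (1 - q/100) years: 10 years at q = 90, 20 at q = 95, 100 at q = 99. The sketch below shows that arithmetic. It is not necessarily what `insurance_table` computes, and `percentile_return_periods` is a hypothetical helper that takes numeric percentiles rather than the strings used in the notebook.

```python
import numpy as np
import pandas as pd


def percentile_return_periods(annual: pd.Series, percentiles: list[float]) -> pd.DataFrame:
    """Threshold and implied return period (in years) for each percentile (hypothetical sketch)."""
    rows = []
    for q in percentiles:
        threshold = float(np.percentile(annual, q))
        # Annual exceedance probability is (1 - q/100); the return period is its inverse.
        # At q = 100 nothing in the record exceeds the maximum, so the period is unbounded.
        return_period = np.inf if q >= 100 else 1.0 / (1.0 - q / 100.0)
        rows.append({"percentile": q, "threshold": threshold, "return_period_years": return_period})
    return pd.DataFrame(rows)
```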


In [13]:
conflict_profile, df_annual, annual_summary, insurance_table_df = standard_Country_Year_files(data_working_copy, 'Ethiopia', 'fatalities_sum')

   c_id country_name  C_start_year  C_end_year       0
0    57     Ethiopia          1993        2050  132397
1   191     Ethiopia          1946        1993   22112
the length of country_ids for the selected country is: 2
printing he country and year dictionary:
{57: (1993, 2050), 191: (1946, 1993)}
Index where fatalities_sum equals 1: 94.5: 1.0
Directory '/Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Standard/Country Year/FAO tables' already exists.
saving annual_summary table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Standard/Country Year/FAO tables/Ethiopia annual summary.csv
saving insurance table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Standard/Country Year/FAO tables/Ethiopia insurance table.csv
saving main dataframe table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Standard/Country Year/FAO tables/Ethiopia main da
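Ethiopia illustrates the multi-ID case: the dictionary printed above holds two `country_id` values, `{57: (1993, 2050), 191: (1946, 1993)}`. The loop's comments say the most recent country range is kept. Below is a minimal sketch of one way to make that choice: the ID whose validity range starts latest. `pull_from_c_y_dictionary` may apply a different rule. `most_recent_country_id` and `subset_to_most_recent_id` are hypothetical helpers.

```python
import pandas as pd


def most_recent_country_id(c_y: dict[int, tuple[int, int]]) -> int:
    """Pick the country_id whose (start_year, end_year) range starts latest."""
    return max(c_y, key=lambda cid: c_y[cid][0])


def subset_to_most_recent_id(df: pd.DataFrame, c_y: dict[int, tuple[int, int]]) -> pd.DataFrame:
    """Keep only the rows carrying the most recent country_id."""
    cid = most_recent_country_id(c_y)
    return df[df["country_id"] == cid]


# Dictionary as printed for Ethiopia above
ethiopia = {57: (1993, 2050), 191: (1946, 1993)}
print(most_recent_country_id(ethiopia))  # 57
```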

#### Process for Event Year Return Period:

In [14]:
for country in countries:

    conflict_profile, df_annual, annual_summary, insurance_table_df = standard_Country_Year_files(data, country, 'fatalities_sum') #or 'percapita_100k'

    df_annual = df_annual.rename(columns={'GIS__Index': 'priogrid_gid'})
    cumulative_distribution = calculate_cumulative_distribution(df_annual, 'fatalities_sum') #or 'percapita_100k'

    #calculate_probabilities(cumulative_distribution, data, id_column='percapita_100k'):
    probabilities = calculate_probabilities(cumulative_distribution, df_annual, 'pg_id')
    #print(probabilities)
    probabilities['E_i'] = calculate_expected_time_periods(probabilities['P_i'])
    # Calculate E_i^voxels
    probabilities['E_i_voxels'] = calculate_expected_voxels(probabilities['p_i'])
    probabilities_renamed = probabilities.rename(columns={'value': 'fatalities_sum'}) #or 'percapita_100k'
    print(probabilities_renamed.head(100))

    probabilities_with_empirical = compare_empirical_vs_expected(df_annual, probabilities_renamed, time_column='year', value_column='fatalities_sum') #or 'percapita_100k'
    probabilities_with_empirical_sorted = probabilities_with_empirical.sort_values(by='E_i_value')
    subset_E_i = probabilities_with_empirical_sorted[probabilities_with_empirical_sorted['E_i_value'] >= 4.0]
    insurance_from_E_i = insurance_table_for_E_i([5,10,20,30], subset_E_i, df_annual, value_field='fatalities_sum') #or 'percapita_100k'
    insurance_from_E_i = insurance_from_E_i.round({
        'closest r.p.': 1,
        'fatalities_sum': 1, #or 'percapita_100k'
    })
#-----------------------------------------------------------------------------------------------------------------------------------
#----- SET DIRECTORIES 
#-----------------------------------------------------------------------------------------------------------------------------------
#----- THIS SETS A DIRECTORY THAT IS UNIQUE TO THE STANDARD METHOD ----
#-----------------------------------------------------------------------------------------------------------------------------------
#----- <<< working just with the 'Event Year' Return Period Process >>>-
    output_path = PATH_ROOT / 'notebooks/methods/Country_Results' / country / 'Standard/Event year/FAO tables/'
    ensure_directory_exists(output_path)
#-----------------------------------------------------------------------------------------------------------------------------------
    E_i_insurance_table_file_path = output_path / f'{country} Event year insurance table.csv'
    print(f'saving insurance table to: {E_i_insurance_table_file_path}')
#-----------------------------------------------------------------------------------------------------------------------------------
    event_year_probabilities_file_path = output_path / 'Event year probabilities.csv'
    print(f'saving event year probabilities table to: {event_year_probabilities_file_path}')
#-----------------------------------------------------------------------------------------------------------------------------------
#-----------------------------------------------------------------------------------------------------------------------------------
#----- NOW WE WRITE TO THE FOLDERS. -----------------------------------
    insurance_from_E_i.to_csv(E_i_insurance_table_file_path)
    probabilities_with_empirical_sorted.to_csv(event_year_probabilities_file_path)


   c_id  country_name  C_start_year  C_end_year      0
0    47  Burkina Faso          1960        2050  35496
the length of country_ids for the selected country is: 1
printing he country and year dictionary:
{47: (1960, 2050)}
Index where fatalities_sum equals 1: 94.2: 1.0
Directory '/Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Burkina Faso/Standard/Country Year/FAO tables' already exists.
saving annual_summary table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Burkina Faso/Standard/Country Year/FAO tables/Burkina Faso annual summary.csv
saving insurance table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Burkina Faso/Standard/Country Year/FAO tables/Burkina Faso insurance table.csv
saving main dataframe table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Burkina Faso/Standard/Country Year/FAO tables/Burkina Faso main dataframe.csv
    fatalities_sum       p_i      
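The event-year loop derives `p_i`, `P_i`, `E_i` and `E_i_voxels` from the distribution of observed values. The setup cell describes this step as calculating "Ei & big P return period". A minimal sketch under stated assumptions follows:

- `p_i` is the share of observations equal to each value.
- `P_i` is the share at or above it (the exceedance probability).
- `E_i = 1 / P_i` and `E_i_voxels = 1 / p_i` are the expected number of observation periods between such events.

This is our reading, not necessarily what `calculate_probabilities`, `calculate_expected_time_periods` or `calculate_expected_voxels` do, and `exceedance_table` is a hypothetical helper.

```python
import pandas as pd


def exceedance_table(values: pd.Series) -> pd.DataFrame:
    """Empirical probabilities and expected waiting times per observed value (hypothetical sketch)."""
    counts = values.value_counts().sort_index()      # occurrences of each distinct value, ascending
    n = counts.sum()
    p_i = counts / n                                  # share of observations equal to each value
    P_i = p_i.iloc[::-1].cumsum().iloc[::-1]          # share of observations >= each value
    table = pd.DataFrame({
        "value": counts.index.to_numpy(),
        "p_i": p_i.to_numpy(),
        "P_i": P_i.to_numpy(),
    })
    table["E_i"] = 1.0 / table["P_i"]                 # expected periods between exceedances
    table["E_i_voxels"] = 1.0 / table["p_i"]          # expected periods between exact occurrences
    return table
```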

In [15]:
conflict_profile, df_annual, annual_summary, insurance_from_E_i = standard_Event_Year_files(data_working_copy, 'Ethiopia', 'Standard', 'Event year', 'fatalities_sum')

   c_id country_name  C_start_year  C_end_year       0
0    57     Ethiopia          1993        2050  132397
1   191     Ethiopia          1946        1993   22112
the length of country_ids for the selected country is: 2
printing he country and year dictionary:
{57: (1993, 2050), 191: (1946, 1993)}
Index where fatalities_sum equals 1: 94.5: 1.0
Directory '/Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Standard/Country Year/FAO tables' already exists.
saving annual_summary table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Standard/Country Year/FAO tables/Ethiopia annual summary.csv
saving insurance table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Standard/Country Year/FAO tables/Ethiopia insurance table.csv
saving main dataframe table to: /Users/gbenz/Documents/VIEWS_FAO_index/notebooks/methods/Country_Results/Ethiopia/Standard/Country Year/FAO tables/Ethiopia main da
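The event-year insurance table reports values near requested return-period horizons (`[5, 10, 20, 30]` in the event-year loop), with a rounded `closest r.p.` column. Below is a minimal sketch, assuming each horizon is matched to the row whose `E_i` is nearest. `insurance_table_for_E_i` may select rows differently, and `closest_return_periods` is a hypothetical helper.

```python
import pandas as pd


def closest_return_periods(table: pd.DataFrame, targets: list[float],
                           e_col: str = "E_i", value_col: str = "value") -> pd.DataFrame:
    """For each target horizon, report the row whose expected return period is nearest."""
    rows = []
    for target in targets:
        # Label of the row minimising |E_i - target|; ties go to the first such row
        idx = (table[e_col] - target).abs().idxmin()
        rows.append({
            "return period": target,
            "closest r.p.": table.at[idx, e_col],
            value_col: table.at[idx, value_col],
        })
    return pd.DataFrame(rows)
```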