### Application: Develop a process to compute potential insurance payouts informed by conflict return period thresholds.

#### Contextual information

<span style="color:red">Guidance provided by Håvard</span>

1. Start out with a dataset with rows for each grid cell year 
2. Identify the cells for a country for each year that qualify to each of the thresholds and set the payout rate as a value in a column for each cell year 
3. Multiply the payout with the population in the cell in a new column to get the population-weighted payout 
4. Sum the population-weighted payout for each country year 
5. Divide the summed population-weighted payout for each country year by the country’s total population 
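
The five steps above can be sketched compactly with pandas. This is a minimal illustration on made-up rows, not the project's implementation: the `country` and `payout_rate` column names and all values are assumptions, and only `pop_gpw_sum` and `year` mirror fields named in this notebook.

```python
import pandas as pd

# Hypothetical grid-cell-year rows; 'country' and 'payout_rate' are assumed column names.
cells = pd.DataFrame({
    "country":     ["A", "A", "A", "B", "B"],
    "year":        [2020, 2020, 2020, 2020, 2020],
    "pop_gpw_sum": [1000, 3000, 6000, 2000, 2000],
    "payout_rate": [1.00, 0.00, 0.55, 0.75, 0.00],  # step 2: rate assigned per cell-year
})

# Step 3: population-weighted payout per cell-year.
cells["population_weighted_payout"] = cells["payout_rate"] * cells["pop_gpw_sum"]

# Steps 4 and 5: sum per country-year, then divide by the country's total population.
country_year = cells.groupby(["country", "year"], as_index=False).agg(
    weighted_payout=("population_weighted_payout", "sum"),
    total_population=("pop_gpw_sum", "sum"),
)
country_year["payout_rate"] = (
    country_year["weighted_payout"] / country_year["total_population"]
)

print(country_year)
```

Grouping on both `country` and `year` keeps the sketch valid when several countries and years share one table.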

<span style="color:red">Sentence structure example provided by Jerry</span>

To make this a bit clearer, assume two grid cells each have a 100 percent payout rate and each holds 1 percent of the country's population. The national number is 100 x 0.01 + 100 x 0.01 = 2 percent. That 2 percent value represents the payout rate for the nation. A protection level of $1M for the nation would pay 2 percent of $1M.
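
The arithmetic in this example can be checked with a few lines of Python. The variable names are illustrative only; the numbers are the ones stated in the example.

```python
# Inputs taken from the worked example: two cells, each with a 100 percent
# payout rate and 1 percent of the national population.
payout_rates_pct = [100.0, 100.0]   # payout rate of each qualifying cell, in percent
population_shares = [0.01, 0.01]    # each cell's share of the national population

# National payout rate: sum of (cell payout rate x cell population share).
national_rate_pct = sum(r * s for r, s in zip(payout_rates_pct, population_shares))

# Apply the national rate to the protection level.
protection_level_usd = 1_000_000
payout_usd = protection_level_usd * national_rate_pct / 100

print(f"National payout rate: {national_rate_pct:.2f} percent")
print(f"Payout on a 1M protection level: {payout_usd:,.0f} USD")
```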

<span style="color:red">Key difference to explore</span>

What effect does the order of operations have on the resulting payout metric, specifically when the proportional population is calculated as the third step rather than the fifth?
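
Algebraically, dividing by the national total before or after the summation gives the same value, since sum(rate_i x pop_i) / P equals sum(rate_i x (pop_i / P)). Any difference should therefore be limited to floating-point rounding. The sketch below illustrates this on hypothetical numbers that are not drawn from the project files.

```python
import math

# Hypothetical cell-level data for one country-year (not taken from the project files).
cell_population = [120_000, 45_000, 300_000, 35_000]
cell_payout_rate = [1.00, 0.75, 0.55, 0.00]  # 0.00 means the cell met no threshold

total_population = sum(cell_population)

# Order A (steps 3 to 5 as listed): weight by population, sum, then divide by the total.
weighted_sum = sum(r * p for r, p in zip(cell_payout_rate, cell_population))
payout_a = weighted_sum / total_population

# Order B: convert each cell to a population share first, then weight and sum.
payout_b = sum(
    r * (p / total_population) for r, p in zip(cell_payout_rate, cell_population)
)

print(payout_a, payout_b)
print("Same within floating-point tolerance:", math.isclose(payout_a, payout_b))
```

Comparing with `math.isclose` rather than `==` avoids a false mismatch caused by rounding in the last digits.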


### Locate necessary files:

The function used to generate the complete payout table takes seven arguments in this review: four input tables and three settings. To keep the focus on the process that generates the final payout value, the preceding steps that compute the input tables are omitted. If you want to explore them, the up-to-date Benz_Graphics branch contains the .ipynb notebook used to generate all tables and infographics.

**Base files relevant for this review:**
- x: <span style="color:lightblue">/.../VIEWS_FAO_index/notebooks/methods/Proof_For_Summary_Table/Example_dataframe.csv</span>
- y: <span style="color:lightblue">/.../VIEWS_FAO_index/notebooks/methods/Proof_For_Summary_Table/y__annual_summary_intensity.csv</span>
- z: <span style="color:lightblue">/.../VIEWS_FAO_index/notebooks/methods/Proof_For_Summary_Table/z__raw_insurance_table.csv</span>
- info_df: <span style="color:lightblue">/.../VIEWS_FAO_index/notebooks/methods/Proof_For_Summary_Table/Example_return_period_ranges.csv</span>

#### Address the requirement for multiple input tables:

- `x` is the main DataFrame. It contains the most granular data, with fields such as [pg_id, year, fatalities_sum, pop_gpw_sum, percapita_100k]. This represents the most disaggregated information.

- `y` contains summary statistics, including fields like 'max' and 'average'. This table was initially requested by Jerry on XX to complement specific analysis needs.

- `z` is the original, unrevised "insurance payout table." It communicates the floor thresholds associated with each return period and the respective payout rates. 

- `info_df` is used primarily for formatting graphics and aids in deriving a range that facilitates feature engineering.

These seemingly arbitrary names are intentionally chosen. Using easily distinguishable variable names facilitates the differentiation between tables that contain closely related information. For instance, the term "annual table" could apply to both **`y`** and the resulting payout table, which may lead to confusion. Similarly, a variable named "insurance" could be misinterpreted as referring to either **`info_df`** or **`z`**. Each table contains unique fields that need to remain distinguishable, yet all are collectively integrated in the final payout table through the `append_return_periods_to_annual_table` function.

Table `y` offers the least intuitive contribution to the final payout table. It is included so that the comprehensive table can still be sorted by magnitude or intensity, which keeps the analysis flexible.
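
To illustrate how floor thresholds such as those in `z` map a cell's value to a payout rate, here is a minimal sketch. It assumes the thresholds and rates hard-coded in the 2020 walkthrough later in this notebook (floors of 1, 8, and 31 fatalities paying 0.55, 0.75, and 1.00); the authoritative values live in `z` and may differ. The helper `payout_rate_for` is hypothetical and is not part of `generate_payout_table`.

```python
from bisect import bisect_right

# Floor thresholds and payout rates as hard-coded in the 2020 walkthrough;
# treat them as assumptions, since the authoritative values are stored in `z`.
floors = [1.0, 8.0, 31.0]    # fatalities_sum floors: 1-in-20, 1-in-50, 1-in-100 bands
rates = [0.55, 0.75, 1.00]   # payout rate attached to each band


def payout_rate_for(value):
    """Return the payout rate of the highest floor that `value` meets, or 0.0 if none."""
    band = bisect_right(floors, value)  # number of floors less than or equal to value
    return rates[band - 1] if band > 0 else 0.0


for fatalities in [0, 1, 7.9, 8, 30, 31, 250]:
    print(fatalities, payout_rate_for(fatalities))
```

Unlike the walkthrough, this sketch places no upper bound on the highest band.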

#### Load the csv files

In [None]:
import os
import pandas as pd

#SET PATH TO FILES
#--------------------------------------------------------------------------------------------
main_dir = os.getcwd()
#--------------------------------------------------------------------------------------------
# Build the file paths:
example_dataframe_path = os.path.join(main_dir, 'Example_dataframe.csv')
example_return_period_ranges = os.path.join(main_dir, 'Example_return_period_ranges.csv')
#--------------------------------------------------------------------------------------------
original_insurance_table = os.path.join(main_dir, 'z__raw_insurance_table.csv')
#--------------------------------------------------------------------------------------------
annual_intensity_table = os.path.join(main_dir, 'y__annual_summary_intensity.csv')
#--------------------------------------------------------------------------------------------
# access the files
x = pd.read_csv(example_dataframe_path, index_col=None)
y = pd.read_csv(annual_intensity_table, index_col=None)
#--------------------------------------------------------------------------------------------
z = pd.read_csv(original_insurance_table, index_col=None)
filtered_info = pd.read_csv(example_return_period_ranges, index_col=None)
#--------------------------------------------------------------------------------------------
#--------------------------------------------------------------------------------------------
print('This provides a review of x:')
print()
display(x.head(5))
print()
print('This provides a review of y:')
display(y)
print()
print('This provides a review of z:')
display(z)
print()
print('This provides a review of filtered_info:')
display(filtered_info)

### Load the function: `append_return_periods_to_annual_table`

In addition to the previously defined DataFrames x, y, z, and filtered_info, three additional parameters must be specified when using this function.

- **Attribute Designation:** The user is required to identify the attribute that corresponds to the construction of the x, y, z, and filtered_info tables. In this application, the relevant field is <span style="color:red">'fatalities_sum'</span>.

- **Population Field Reference:** The user must also provide the name of the population field. This keeps the code adaptable to scenarios where different population data are used in the future and the field name differs from the one evaluated here. The field for this use case is <span style="color:red">'pop_gpw_sum'</span>.

- **Return Period Statistic:** Similarly, designate the return period type. The parameter expects one of two strings:
    - Country year -- this is the 'little p' method
    - Event year -- this is the 'big p' method
    - This analysis employs the 'little p' method, so <span style="color:red">'Country year'</span> is passed as the parameter.

Information on these methods is available in the glossary markdown file located in VIEWS_FAO_index/docs.

In [None]:
from generate_payout_table import append_return_periods_to_annual_table

#### Run the function:

The steps that follow break down the two processes being compared.

In [None]:
payout_table = append_return_periods_to_annual_table(x, y, z, filtered_info, 'fatalities_sum', 'pop_gpw_sum', 'Country year')

display(payout_table)

1. Start out with a dataset with rows for each grid cell year 
2. <span style="color:yellow">Identify the cells for a country for each year that qualify to each of the thresholds and set the payout rate as a value in a column for each cell year</span>
3. Multiply the payout with the population in the cell in a new column to get the population-weighted payout 
4. Sum the population-weighted payout for each country year 
5. Divide the summed population-weighted payout for each country year by the country’s total population 

In [None]:
# Example dataset: subset the grid-cell rows to a single year (2020)
x_year = x[x['year'] == 2020]

x_100yr_events = x_year[(x_year['fatalities_sum'] >= 31.0) & 
                (x_year['fatalities_sum'] < 100000.0)]

print('shows rows meeting the range corresponding to the 1 in 100 year thresholds:')
display(x_100yr_events)

x_50yr_events = x_year[(x_year['fatalities_sum'] >= 8.0) & 
                (x_year['fatalities_sum'] < 31.0)]

print('shows rows meeting the range corresponding to the 1 in 50 year thresholds:')
display(x_50yr_events)

x_20yr_events = x_year[(x_year['fatalities_sum'] >= 1.0) & 
                (x_year['fatalities_sum'] < 8.0)]

print('shows rows meeting the range corresponding to the 1 in 20 year thresholds:')
display(x_20yr_events)

1. Start out with a dataset with rows for each grid cell year 
2. Identify the cells for a country for each year that qualify to each of the thresholds and set the payout rate as a value in a column for each cell year
3. <span style="color:yellow">Multiply the payout with the population in the cell in a new column to get the population-weighted payout</span>
4. Sum the population-weighted payout for each country year 
5. Divide the summed population-weighted payout for each country year by the country’s total population 

In [None]:
print('After adding the population weighted column the dataframes now contain:')

# .assign returns a new DataFrame, which avoids pandas' SettingWithCopyWarning on the filtered subsets
x_100yr_events = x_100yr_events.assign(population_weighted_payout=x_100yr_events['pop_gpw_sum'] * 1.00)
display(x_100yr_events)

x_50yr_events = x_50yr_events.assign(population_weighted_payout=x_50yr_events['pop_gpw_sum'] * .75)
display(x_50yr_events)

x_20yr_events = x_20yr_events.assign(population_weighted_payout=x_20yr_events['pop_gpw_sum'] * .55)
display(x_20yr_events)


1. Start out with a dataset with rows for each grid cell year 
2. Identify the cells for a country for each year that qualify to each of the thresholds and set the payout rate as a value in a column for each cell year
3. Multiply the payout with the population in the cell in a new column to get the population-weighted payout
4. <span style="color:yellow">Sum the population-weighted payout for each country year</span>
5. Divide the summed population-weighted payout for each country year by the country’s total population 

In [None]:
x_100_sum_population_weighted_payout = x_100yr_events['population_weighted_payout'].sum()
print(f'The sum of population weighted payout for 1 in 100 year cells is: {x_100_sum_population_weighted_payout}')

x_50_sum_population_weighted_payout = x_50yr_events['population_weighted_payout'].sum()
print(f'The sum of population weighted payout for 1 in 50 year cells is: {x_50_sum_population_weighted_payout}')

x_20_sum_population_weighted_payout = x_20yr_events['population_weighted_payout'].sum()
print(f'The sum of population weighted payout for 1 in 20 year cells is: {x_20_sum_population_weighted_payout}')

1. Start out with a dataset with rows for each grid cell year 
2. Identify the cells for a country for each year that qualify to each of the thresholds and set the payout rate as a value in a column for each cell year
3. Multiply the payout with the population in the cell in a new column to get the population-weighted payout
4. Sum the population-weighted payout for each country year
5. <span style="color:yellow">Divide the summed population-weighted payout for each country year by the country’s total population</span>

In [None]:
total_population = x[(x['year'] == 2020)]['pop_gpw_sum'].sum()

x_100_payout_percentage = x_100_sum_population_weighted_payout / total_population
print(x_100_payout_percentage)

x_50_payout_percentage = x_50_sum_population_weighted_payout / total_population
print(x_50_payout_percentage)

x_20_payout_percentage = x_20_sum_population_weighted_payout / total_population
print(x_20_payout_percentage)

#### Compare this result with the product derived from the function

The fields within the `payout_table` to compare against the observed results of this sanity check are:
- `pay weight 100`
- `pay weight 50`
- `pay weight 20`

Keep in mind that these fields are already expressed as percentages, whereas the sanity-check values above are fractions, so the two should differ by a factor of 100 (two decimal places). To make the values comparable, the values returned by the function are multiplied by 0.01.
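
Because one value is a percentage and the other a fraction, and both carry floating-point rounding, a tolerance-based check is more reliable than comparing printed digits by eye. The sketch below uses hypothetical numbers, not results from the payout table.

```python
import math

# Hypothetical values: one computed by hand as a fraction, one reported as a percentage.
manual_fraction = 0.0125    # e.g. summed weighted payout / total population
function_percent = 1.25     # e.g. a 'pay weight' value from the payout table

# Put both on the same scale before comparing, and allow for floating-point rounding.
function_fraction = function_percent * 0.01
matches = math.isclose(manual_fraction, function_fraction, rel_tol=1e-9)

print(f"manual={manual_fraction}, function={function_fraction}, match={matches}")
```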

In [None]:
#subset 2020
payout_table_2020 = payout_table[payout_table['year'] == 2020]

display(payout_table_2020)

#compare the results of appropriate fields:
fromfunction__pay_weight_100 = payout_table_2020.iloc[0]['pay weight 100'] * 0.01
fromfunction__pay_weight_50 = payout_table_2020.iloc[0]['pay weight 50'] * 0.01
fromfunction__pay_weight_20 = payout_table_2020.iloc[0]['pay weight 20'] * 0.01

print(f'Compare the 1 in 100 year result from this sanity test {x_100_payout_percentage} against the result produced from the function {fromfunction__pay_weight_100}')
print()
print(f'Compare the 1 in 50 year result from this sanity test {x_50_payout_percentage} against the result produced from the function {fromfunction__pay_weight_50}')
print()
print(f'Compare the 1 in 20 year result from this sanity test {x_20_payout_percentage} against the result produced from the function {fromfunction__pay_weight_20}')



### Excellent, the results confirm that both procedures yield the same result.