# rCSI time-series dataset released by WFP

This notebook analyzes the complete data released by WFP for the **Reduced Coping Strategy Index (rCSI)**. WFP released two versions of the data. The second version, which is analyzed first here, collects the rCSI indicator at a daily level, while the first version collects it at a monthly level. The first version is already analyzed in the notebook of the demo version, so here it is used only for the rCSI data of Yemen.

In [1]:
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.''')

In [2]:
from scipy.interpolate import splrep, splev
from plotly_dataframe import plot, plot_comparison
import pandas as pd
import numpy as np
import ntpath
import glob

In [3]:
# Define the path where the results arising from this analysis will be saved.
path_to_save_data = "./time-series/"

# 2° version - day granularity - YEM, NGA, SYR, BFA

This version provides rCSI data for four countries: *Yemen* (YEM), *Nigeria* (NGA), *Syria* (SYR) and *Burkina Faso* (BFA).

In [4]:
# Read the data released by wfp regarding the rCSI indicator.
path = "./wfp_data/"
all_files = glob.glob(path + "*.csv")

dfs = []

for filename in all_files:
    df = pd.read_csv(filename)
    df["Country"] = ntpath.basename(filename).split(".")[0]
    dfs.append(df)

In [5]:
print("The data released by wfp:")
df = pd.concat(dfs, axis = 0, ignore_index = True)
df.head()

The data released by wfp:


Unnamed: 0,SvyDate,Date,Dmgrph,DmgrphCode,Mean_crrnt,Pop,PopNmbr,Country
0,2019-7-15_2019-9-8,2019-09-08,BOUCLE-DU-MOUHOUN,900712.0,19.499504,1976217.0,385352.0,Burkina Faso
1,2019-7-15_2019-9-8,2019-09-08,CASCADES,900713.0,8.430561,822445.0,69336.0,Burkina Faso
2,2019-7-15_2019-9-8,2019-09-08,CENTRE,900714.0,7.589967,2854356.0,216644.0,Burkina Faso
3,2019-7-15_2019-9-8,2019-09-08,CENTRE-EST,900715.0,20.051049,2854356.0,572328.0,Burkina Faso
4,2019-7-15_2019-9-8,2019-09-08,CENTRE-NORD,900716.0,18.757613,1687858.0,316601.0,Burkina Faso


### Brief items description

- *SvyDate*: reference period of the data collection.
- *Date*: reference date (i.e. the end of the reference period). 
- *Dmgrph*: administrative area name.
- *DmgrphCode*: a code identifying the adminstrata.
- *Mean_crrnt*: the % of people with rCSI >= 19.
- *Pop*: area population size.
- *PopNmbr*: number of people with rCSI >= 19 (i.e. Mean_crrnt / 100 * Pop, since Mean_crrnt is a percentage).
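
The relation between *Mean_crrnt*, *Pop* and *PopNmbr* can be checked on the rows displayed above. This is a minimal sketch: the frame `sample` is built by hand from those five rows for illustration and is not part of the original notebook.

```python
import pandas as pd

# Values copied from the first five rows displayed above.
sample = pd.DataFrame({
    "Mean_crrnt": [19.499504, 8.430561, 7.589967, 20.051049, 18.757613],
    "Pop": [1976217.0, 822445.0, 2854356.0, 2854356.0, 1687858.0],
    "PopNmbr": [385352.0, 69336.0, 216644.0, 572328.0, 316601.0],
})

# Mean_crrnt is a percentage, so divide by 100 before multiplying by Pop.
expected = sample["Mean_crrnt"] / 100 * sample["Pop"]
max_abs_diff = (expected - sample["PopNmbr"]).abs().max()
print(max_abs_diff)
```

On these rows the product and *PopNmbr* differ by less than one person, which is consistent with *PopNmbr* being the product with its fractional part dropped.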

In [6]:
# Check if the dataframe contains NaN values.
print("Check if the dataframe contains NaN values:")
df.isnull().sum()

Check if the dataframe contains NaN values:


SvyDate         0
Date            0
Dmgrph          0
DmgrphCode    524
Mean_crrnt      2
Pop             2
PopNmbr         2
Country         0
dtype: int64

In [7]:
# Let's delete the item 'DmgrphCode' because it is not of interest.
df.drop(["DmgrphCode"], axis = 1, inplace = True) 

In [8]:
# Rename some columns.
df.rename(columns = {"Date": "Datetime", "Dmgrph": "AdminStrata", "Mean_crrnt": "Metric"}, inplace = True)

### Country item

In [9]:
print("The countries are:", ", ".join(df.Country.unique()))

The countries are: Burkina Faso, Nigeria, Syria, Yemen


### AdminStrata item

In [10]:
AdminStratas = df.groupby("Country")["AdminStrata"].unique()

- **Yemen**: Yemen has two main levels of administrative division (governorates and districts). There are 22 governorates, including the capital Sana'a (Amanat Al Asimah) and the Socotra Archipelago. The rCSI dataframe has values for all 22 governorates.

In [11]:
print(AdminStratas["Yemen"].shape)
AdminStratas["Yemen"]

(22,)


array(['Abyan', 'Aden', 'Al Bayda', "Al Dhale'e", 'Al Hudaydah',
       'Al Jawf', 'Al Maharah', 'Al Mahwit', 'Amanat Al Asimah', 'Amran',
       'Dhamar', 'Hadramaut', 'Hajjah', 'Ibb', 'Lahj', 'Marib', 'Raymah',
       "Sa'ada", "Sana'a", 'Shabwah', 'Socotra', 'Taizz'], dtype=object)

- **Nigeria**: Nigeria is divided into 36 states, plus the *Federal Capital Territory*, which is not a state and is under the direct control of the federal government. The AdminStrata items of the rCSI dataframe cover 3 states (Adamawa, Borno and Yobe), each divided into three parts: North, South and Central for Adamawa and Borno; East, North and South for Yobe.

In [12]:
print(AdminStratas["Nigeria"].shape)
AdminStratas["Nigeria"]

(9,)


array(['Adamawa Central', 'Adamawa North', 'Adamawa South',
       'Borno Central', 'Borno North', 'Borno South', 'Yobe East',
       'Yobe North', 'Yobe South'], dtype=object)

- **Syria**: Syria has 14 governorates. The rCSI dataframe has values for 13 of them (Idlib governorate is not included).

In [13]:
print(AdminStratas["Syria"].shape)
AdminStratas["Syria"]

(13,)


array(['Al-Hasakeh', 'Aleppo', 'As-Sweida', 'Damascus', "Dar'a", 'Hama',
       'Homs', 'Lattakia', 'Rural Damascus', 'Tartous', 'Ar-Raqqa',
       'Deir-ez-Zor', 'Quneitra'], dtype=object)

- **Burkina Faso**: Burkina Faso is divided into 13 administrative regions. The rCSI dataframe has the values of all 13 regions.

In [14]:
print(AdminStratas["Burkina Faso"].shape)
AdminStratas["Burkina Faso"]

(13,)


array(['BOUCLE-DU-MOUHOUN', 'CASCADES', 'CENTRE', 'CENTRE-EST',
       'CENTRE-NORD', 'CENTRE-OUEST', 'CENTRE-SUD', 'EST',
       'HAUTS-BASSINS', 'NORD', 'PLATEAU-CENTRAL', 'SAHEL', 'SUD-OUEST'],
      dtype=object)

In [15]:
# The AdminStrata names for Burkina Faso are uppercase; convert them to title case (first letter of each word capitalized).
def to_lower(country, admin):
    if country == "Burkina Faso":
        admin = admin.lower().title()      
    return admin
    
df["AdminStrata"] = df[["Country", "AdminStrata"]].apply(lambda x: to_lower(*x), axis = 1)
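
The same renaming can be done without a row-wise `apply`. The following is only a sketch on a hand-made frame (`toy` and its two rows are hypothetical), using a boolean mask and the vectorized `.str.title()` accessor.

```python
import pandas as pd

# Hypothetical two-row frame: one Burkina Faso row (uppercase) and one Yemen row.
toy = pd.DataFrame({
    "Country": ["Burkina Faso", "Yemen"],
    "AdminStrata": ["BOUCLE-DU-MOUHOUN", "Al Hudaydah"],
})

# Title-case only the rows that belong to Burkina Faso.
is_bfa = toy["Country"] == "Burkina Faso"
toy.loc[is_bfa, "AdminStrata"] = toy.loc[is_bfa, "AdminStrata"].str.title()
print(toy)
```

Note that title case also capitalizes the letter after each hyphen, which is why the names appear as `Boucle-Du-Mouhoun` in the outputs of this notebook.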

In [16]:
# Check the min and max values of the Metric.
print("The min and max values of the Metric:")
print(df.Metric.min(), ",", df.Metric.max())

The min and max values of the Metric:
0.0 , 99.97458240652935


In [17]:
# Build a regular daily datetime index for each adminstrata (the item 'SvyDate' is not considered).
df.drop(["SvyDate"], axis = 1, inplace = True) 
df["Datetime"] = pd.to_datetime(df["Datetime"])
df.sort_values("Datetime", ascending = True, inplace = True) # Sort on datetime.
df = df.groupby(["Country", "AdminStrata"]).apply(lambda group: group.set_index("Datetime").resample("D").mean()).reset_index()
df.reset_index(drop = True, inplace = True)
df.head()

Unnamed: 0,Country,AdminStrata,Datetime,Metric,Pop,PopNmbr
0,Burkina Faso,Boucle-Du-Mouhoun,2019-09-08,19.499504,1976217.0,385352.0
1,Burkina Faso,Boucle-Du-Mouhoun,2019-09-09,19.567379,1976217.0,386693.0
2,Burkina Faso,Boucle-Du-Mouhoun,2019-09-10,20.400684,1976217.0,403161.0
3,Burkina Faso,Boucle-Du-Mouhoun,2019-09-11,20.405577,1976217.0,403258.0
4,Burkina Faso,Boucle-Du-Mouhoun,2019-09-12,19.238527,1976217.0,380195.0
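
The effect of the per-group daily resampling can be seen on a small example. This is a sketch with made-up values (`toy` is hypothetical): days with no observation inside an adminstrata's range become rows with NaN, so every series ends up on a regular daily grid.

```python
import pandas as pd

# Hypothetical long-format data: adminstrata "A" has a two-day hole.
toy = pd.DataFrame({
    "AdminStrata": ["A", "A", "B", "B"],
    "Datetime": pd.to_datetime(["2019-09-08", "2019-09-11", "2019-09-08", "2019-09-09"]),
    "Metric": [19.5, 20.4, 8.4, 8.6],
})

# Resample each adminstrata separately to a daily frequency.
daily = (
    toy.set_index("Datetime")
       .groupby("AdminStrata")["Metric"]
       .resample("D")
       .mean()
       .reset_index()
)
print(daily)
```

Adminstrata "A" now has four rows (two of them NaN) and "B" keeps its two rows; the NaN values are the ones handled later by interpolation.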


In [18]:
# Create a dataframe with multi index column in order to have a summary dataframe of the time-series.
df.drop(labels = ["Pop", "PopNmbr"], axis = 1, inplace = True)
df = df.set_index(["Datetime", "Country", "AdminStrata"]).unstack(["Country", "AdminStrata"])
df.columns = df.columns.droplevel(0)
df.columns = pd.MultiIndex.from_tuples(list(map(lambda x: tuple(list(x) + ["rCSI"]), df.columns)))
df.columns.rename("Country", level = 0, inplace = True)
df.columns.rename("AdminStrata", level = 1, inplace = True)
df.columns.rename("Indicator", level = 2, inplace = True)
freq = "D"
df.index.freq = freq
df.head()

Country,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,...,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen
AdminStrata,Boucle-Du-Mouhoun,Cascades,Centre,Centre-Est,Centre-Nord,Centre-Ouest,Centre-Sud,Est,Hauts-Bassins,Nord,...,Hajjah,Ibb,Lahj,Marib,Raymah,Sa'ada,Sana'a,Shabwah,Socotra,Taizz
Indicator,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,...,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI
Datetime
2018-08-22,,,,,,,,,,,...,66.69857,60.797456,44.782684,60.490004,56.3011,46.435518,52.667849,34.3022,38.303104,50.919038
2018-08-23,,,,,,,,,,,...,61.331133,61.01165,44.038726,59.568557,57.324106,45.261947,50.214606,36.742499,39.545071,50.415852
2018-08-24,,,,,,,,,,,...,62.286205,62.032249,43.296925,58.277608,57.417614,44.520462,50.073157,35.857748,41.867782,49.391298
2018-08-25,,,,,,,,,,,...,59.525377,60.993741,42.149079,56.908462,55.324169,46.34594,48.791655,34.658434,48.585591,50.302392
2018-08-26,,,,,,,,,,,...,59.384136,58.80024,42.146629,57.722567,57.666522,43.294599,48.83964,35.91308,48.662394,50.293046
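
The reshaping from long format to a three-level column index can be sketched on a tiny example. The frame `long_df` and its values are hypothetical; the level names match the ones used in this notebook.

```python
import pandas as pd

# Hypothetical long-format data with one adminstrata per country.
long_df = pd.DataFrame({
    "Datetime": pd.to_datetime(["2019-09-08", "2019-09-08", "2019-09-09", "2019-09-09"]),
    "Country": ["Yemen", "Syria", "Yemen", "Syria"],
    "AdminStrata": ["Aden", "Homs", "Aden", "Homs"],
    "Metric": [40.0, 30.0, 41.0, 31.0],
})

# Move Country and AdminStrata from the rows to the columns.
wide = (
    long_df.set_index(["Datetime", "Country", "AdminStrata"])["Metric"]
           .unstack(["Country", "AdminStrata"])
)

# Append a constant third level that names the indicator.
wide.columns = pd.MultiIndex.from_tuples(
    [(country, admin, "rCSI") for country, admin in wide.columns],
    names=["Country", "AdminStrata", "Indicator"],
)
print(wide)
```

A single time-series can then be selected with a full tuple, for example `wide[("Yemen", "Aden", "rCSI")]`.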


In [19]:
plot(df, title = "rCSI original (2° version - daily)", yaxis = "% of people with rCSI >= 19", 
     first_last_valid_index_group = True)

interactive(children=(ToggleButtons(description='Country', options=('Burkina Faso', 'Nigeria', 'Syria', 'Yemen…

ATTENTION: the adminstratas 'Hadramaut' and 'Socotra' of Yemen have identical rCSI time-series! I delete the time-series of 'Socotra'.

In [20]:
df = df.drop("Socotra", axis = 1, level = 1)
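
A duplicate such as the one between 'Hadramaut' and 'Socotra' can also be found programmatically rather than by eye. A minimal sketch on a hand-made frame (`toy` and its values are illustrative only):

```python
import pandas as pd

# Illustrative frame in which two columns carry exactly the same values.
toy = pd.DataFrame({
    "Hadramaut": [38.3, 39.5, 41.9],
    "Socotra": [38.3, 39.5, 41.9],
    "Taizz": [50.9, 50.4, 49.4],
})

# Transpose so each time-series becomes a row, then flag rows that repeat an earlier one.
duplicated_cols = toy.columns[toy.T.duplicated().to_numpy()].tolist()
print(duplicated_cols)
```

`duplicated` keeps the first occurrence and flags the later ones, so here only the second of the two identical columns is reported.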

In [21]:
# Now save the time-series of each country, trimmed to its own first and last valid index.
def save(group, name):
    country = group.name
    group = group[country]
    # Adjust time-series group.
    first_idx = group.first_valid_index()
    last_idx = group.last_valid_index()
    group = group.loc[first_idx:last_idx]
    # Save.
    group.to_csv(path_to_save_data + country + "/" + name + ".csv", index_label = False)

In [22]:
_ = df.groupby(level = 0, axis = 1).apply(lambda x: save(x, name = "wfp_rcsi-v2-daily-original"))

## Adjusting March 2019 for Yemen

The values for March 2019 seem to contain some anomalies, so they are replaced: they are set to NaN and filled by linear interpolation. This dataset is kept as the reference for the next computations.

In [23]:
def correction_march_Yemen(group):
    country = group.name
    if country == "Yemen":
        mask = (group.index >= "2019-3-1") & (group.index <= "2019-3-31")
        group.loc[mask] = np.nan
        group = group.interpolate(method = "linear")
        
        return group
    else:
        return group

In [24]:
df = df.groupby(level = 0, axis = 1).apply(correction_march_Yemen)
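
The correction can be illustrated on a single synthetic series. This is only a sketch: the series below is a straight line with one artificial spike in March, not WFP data.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: a straight line with an artificial anomaly in March.
idx = pd.date_range("2019-02-27", "2019-04-02", freq="D")
series = pd.Series(np.arange(len(idx), dtype=float), index=idx)
series.loc["2019-03-10"] = 500.0

# Blank out the whole month, then bridge the gap linearly.
mask = (series.index >= "2019-03-01") & (series.index <= "2019-03-31")
cleaned = series.copy()
cleaned.loc[mask] = np.nan
cleaned = cleaned.interpolate(method="linear")
print(cleaned.loc["2019-03-09":"2019-03-11"])
```

Because the month is bridged by a straight line between the last February value and the first April value, any real variation inside March is lost along with the anomaly.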

In [25]:
plot(df, title = "rCSI March Yemen adjustment (2° version - daily)", yaxis = "% of people with rCSI >= 19", 
     first_last_valid_index_group = True)

interactive(children=(ToggleButtons(description='Country', options=('Burkina Faso', 'Nigeria', 'Syria', 'Yemen…

## Adjusting the time-series (interpolation of NaN values)

In [26]:
# For each country, get time-series without NaN values in the middle and with the same start and end for all its adminstratas.
def interpolation(group):   
    group.columns = group.columns.droplevel()
    first_idx = group.first_valid_index()
    last_idx = group.last_valid_index()
    group = group.loc[first_idx:last_idx]
    group = group.interpolate(method = "linear", limit = 7)
    # Delete time-series that still have some NaN values.
    group.dropna(inplace = True, axis = 1)
    return group

df_interpolate = df.groupby(axis = 1, level = 0).apply(interpolation)
df_interpolate.head()

Country,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,...,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen
AdminStrata,Boucle-Du-Mouhoun,Cascades,Centre,Centre-Est,Centre-Nord,Centre-Ouest,Centre-Sud,Est,Hauts-Bassins,Nord,...,Hadramaut,Hajjah,Ibb,Lahj,Marib,Raymah,Sa'ada,Sana'a,Shabwah,Taizz
Indicator,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,...,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI
Datetime
2018-08-22,,,,,,,,,,,...,38.303104,66.69857,60.797456,44.782684,60.490004,56.3011,46.435518,52.667849,34.3022,50.919038
2018-08-23,,,,,,,,,,,...,39.545071,61.331133,61.01165,44.038726,59.568557,57.324106,45.261947,50.214606,36.742499,50.415852
2018-08-24,,,,,,,,,,,...,41.867782,62.286205,62.032249,43.296925,58.277608,57.417614,44.520462,50.073157,35.857748,49.391298
2018-08-25,,,,,,,,,,,...,48.585591,59.525377,60.993741,42.149079,56.908462,55.324169,46.34594,48.791655,34.658434,50.302392
2018-08-26,,,,,,,,,,,...,48.662394,59.384136,58.80024,42.146629,57.722567,57.666522,43.294599,48.83964,35.91308,50.293046
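
The role of the `limit` argument can be shown on a small frame. This is a sketch with made-up values and `limit = 2` instead of 7 to keep it short: gaps no longer than the limit are filled completely, longer gaps are filled only partially, and the columns that still contain NaN are then dropped.

```python
import numpy as np
import pandas as pd

# Made-up data: one column with a gap of 2 values, one with a gap of 4 values.
toy = pd.DataFrame({
    "short_gap": [1.0, np.nan, np.nan, 4.0, 5.0, 6.0],
    "long_gap": [1.0, np.nan, np.nan, np.nan, np.nan, 6.0],
})

# Fill at most 2 consecutive NaN values per gap.
filled = toy.interpolate(method="linear", limit=2)

# Drop the time-series that still contain NaN values.
kept = filled.dropna(axis=1)
print(filled)
print(list(kept.columns))
```

In the notebook this means that an adminstrata with more than 7 consecutive missing days inside its country's range is removed from `df_interpolate`.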


In [27]:
plot(df_interpolate, title = "rCSI interpolation (2° version - daily)", yaxis = "% of people with rCSI >= 19", 
     first_last_valid_index_group = True)

interactive(children=(ToggleButtons(description='Country', options=('Burkina Faso', 'Nigeria', 'Syria', 'Yemen…

In [28]:
_ = df_interpolate.groupby(level = 0, axis = 1).apply(lambda x: save(x, name = "wfp_rcsi-v2-daily-interpolate"))

## Fit of the time-series (smooth data)

In [29]:
def fit(group):   
    group.columns = group.columns.droplevel()
    # Delete rows (dates) that still have some NaN values.
    group.dropna(inplace = True, axis = 0)
    
    def smooth(serie):
        bspl = splrep(np.arange(0, len(serie)), serie.values, s = 500)
        bspl_y = splev(np.arange(0, len(serie)), bspl)
        return pd.Series(bspl_y, index = serie.index, name = serie.name)
    
    group_fit = group.apply(smooth)

    return group_fit

df_fit = df_interpolate.groupby(axis = 1, level = 0).apply(fit)
df_fit.dropna(axis = 0, how = "all", inplace = True)
df_fit.head()

Country,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,...,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen
AdminStrata,Boucle-Du-Mouhoun,Cascades,Centre,Centre-Est,Centre-Nord,Centre-Ouest,Centre-Sud,Est,Hauts-Bassins,Nord,...,Hadramaut,Hajjah,Ibb,Lahj,Marib,Raymah,Sa'ada,Sana'a,Shabwah,Taizz
Indicator,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,...,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI
Datetime
2018-08-22,,,,,,,,,,,...,37.980913,66.113915,61.20538,44.934896,60.381167,57.055439,45.558955,52.552281,35.197134,50.723749
2018-08-23,,,,,,,,,,,...,40.092413,62.895618,61.046822,44.001177,59.249679,56.238853,45.887905,50.504234,35.278945,50.41171
2018-08-24,,,,,,,,,,,...,42.831949,60.91866,60.813031,43.280723,58.407722,56.338785,45.811153,49.432431,35.313619,50.17807
2018-08-25,,,,,,,,,,,...,45.929383,59.93234,60.522595,42.755471,57.815499,57.084495,45.423209,49.128108,35.485134,50.017246
2018-08-26,,,,,,,,,,,...,49.114576,59.685954,60.194107,42.407358,57.433216,58.205246,44.818584,49.382503,35.97747,49.923655
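
The smoothing step can be tried on synthetic data. The sketch below assumes a noisy sine curve (not WFP data) and reuses the same `splrep`/`splev` calls with `s = 500`; in `splrep`, `s` is an upper target for the sum of squared residuals between the data and the fitted spline, so larger values give smoother curves.

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Synthetic noisy signal, used only for illustration.
rng = np.random.default_rng(0)
x = np.arange(200)
signal = 50 + 10 * np.sin(x / 20)
noisy = signal + rng.normal(scale=1.5, size=x.size)

# Fit a smoothing cubic B-spline and evaluate it on the same grid.
s = 500
tck = splrep(x, noisy, s=s)
smoothed = splev(x, tck)

# Sum of squared residuals between the data and the smoothed curve.
rss = float(np.sum((noisy - smoothed) ** 2))
print(rss)
```

Since `s` is an absolute quantity, the same value smooths a long or highly variable series much less than a short or flat one; this is worth keeping in mind when one value is applied to every adminstrata.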


In [30]:
plot_comparison(df_interpolate, df_fit, title = "Fit comparison", yaxis = "% of people with rCSI >= 19", 
                first_last_valid_index_group = True)

interactive(children=(ToggleButtons(description='Country:', options=('Burkina Faso', 'Nigeria', 'Syria', 'Yeme…

In [31]:
_ = df_fit.groupby(level = 0, axis = 1).apply(lambda x: save(x, name = "wfp_rcsi-v2-daily-smooth"))

## Resampling datetime

I resample the data to a monthly frequency, taking the last day of each month as the reference for that month, in line with the sliding window used during the survey.

In [32]:
# Monthly resampling according to the survey date: keep only the month-end days.
mask = df_interpolate.index.is_month_end
df_resample = df_interpolate[mask]
df_resample.head()

Country,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,Burkina Faso,...,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen
AdminStrata,Boucle-Du-Mouhoun,Cascades,Centre,Centre-Est,Centre-Nord,Centre-Ouest,Centre-Sud,Est,Hauts-Bassins,Nord,...,Hadramaut,Hajjah,Ibb,Lahj,Marib,Raymah,Sa'ada,Sana'a,Shabwah,Taizz
Indicator,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,...,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI
Datetime
2018-08-31,,,,,,,,,,,...,59.116148,61.055159,59.364978,42.989421,57.486047,60.087065,42.064457,52.503916,42.519974,49.918146
2018-09-30,,,,,,,,,,,...,61.638783,72.325001,62.570381,47.185779,54.724298,48.38632,44.69682,63.282387,40.860363,56.390524
2018-10-31,,,,,,,,,,,...,62.808985,75.145629,65.079506,47.624562,52.545863,68.766048,58.253724,61.704043,36.893336,66.881011
2018-11-30,,,,,,,,,,,...,37.546534,50.661478,67.122719,30.660859,70.323817,73.489547,62.265032,63.611793,36.270584,48.757339
2018-12-31,,,,,,,,,,,...,43.100547,57.058951,60.75953,37.402488,54.020975,56.890895,53.718758,53.722729,36.010941,58.003071
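
Selecting the month-end days is a plain boolean filter on the datetime index. A minimal sketch on a synthetic daily frame (`daily` is hypothetical):

```python
import numpy as np
import pandas as pd

# Synthetic daily frame covering three months.
idx = pd.date_range("2019-01-01", "2019-03-31", freq="D")
daily = pd.DataFrame({"rCSI": np.arange(len(idx), dtype=float)}, index=idx)

# Keep only the rows whose date is the last day of its month.
monthly = daily[daily.index.is_month_end]
print(monthly)
```

Unlike `resample(...).mean()`, this keeps the observed value of the last day and does not average over the month.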


In [33]:
plot(df_resample, title = "rCSI resampling (2° version - monthly)", yaxis = "% of people with rCSI >= 19", 
     style = "lines+markers", first_last_valid_index_group = True)

interactive(children=(ToggleButtons(description='Country', options=('Burkina Faso', 'Nigeria', 'Syria', 'Yemen…

In [34]:
_ = df_resample.groupby(level = 0, axis = 1).apply(lambda x: save(x, name = "wfp_rcsi-v2-monthly"))

# 1° version - month granularity - YEM

In [35]:
# Load the 1° version data from the demo folder.
df_demo = pd.read_excel("../../../Demo/Data Sources/Reduced Coping Strategy Index (rCSI)/mVam_ReducedCopingStrategiesIndex.xlsx")
# Select only the data of Yemen, ignoring the adminstrata that represents the full country.
df_demo = df_demo[(df_demo.Country == "Yemen") & (df_demo.AdminStrata != "Yemen")]
df_demo.reset_index(drop = True, inplace = True)
# Rename some adminstratas to match the names used in the 2° version.
# Assign the result back instead of using inplace on a column selection.
df_demo["AdminStrata"] = df_demo["AdminStrata"].replace({"Ad Dali": "Al Dhale'e", "Sana'a City": "Amanat Al Asimah", "Sa'dah": "Sa'ada"})
df_demo.head()

Unnamed: 0,Country,Year,Month,AdminStrata,Mean,Median,Coping Prevalence,% Reducing Meals,% Restricting Consumption of Adults,% Receiving help from family friends,% Limiting Portion Size,% Using Less Expensive Food
0,Yemen,2019.0,September,Al Bayda,19.64,18.0,87.36,67.21,64.51,56.47,73.5,76.73
1,Yemen,2019.0,September,Al Mahwit,21.46,21.0,88.58,74.7,76.82,65.92,81.76,73.39
2,Yemen,2019.0,September,Dhamar,23.12,24.0,91.29,69.92,70.21,64.88,79.9,80.1
3,Yemen,2019.0,September,Ibb,21.49,21.0,91.44,69.6,69.54,69.18,75.21,82.38
4,Yemen,2019.0,September,Amanat Al Asimah,24.22,24.0,91.38,77.23,69.6,67.93,81.83,76.9


In [36]:
# Delete rows that contain NaN values.
df_demo.dropna(inplace = True)
# Adjust the temporal information.
data = pd.to_datetime(df_demo["Year"].astype(int).astype(str) + df_demo["Month"], format = "%Y%B") 
df_demo.insert(1, "Datetime", data)
df_demo.drop(["Year", "Month"], axis = 1, inplace = True)
df_demo.sort_values("Datetime", ascending = True, inplace = True) 
df_demo = df_demo.groupby(["Country", "AdminStrata"]).apply(lambda group: group.set_index("Datetime").resample("M").mean()).reset_index()
df_demo.reset_index(drop = True, inplace = True)
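
The construction of the datetime from the `Year` and `Month` items can be sketched on two hypothetical rows. The example assumes English month names (the `%B` directive depends on the locale) and uses `pd.offsets.MonthEnd(0)` to move each date to the end of its month, which is where the monthly resampling above places the values.

```python
import pandas as pd

# Hypothetical rows with the same layout as the Year and Month items.
toy = pd.DataFrame({"Year": [2019.0, 2019.0], "Month": ["September", "October"]})

# "2019" + "September" -> first day of the month.
first_of_month = pd.to_datetime(
    toy["Year"].astype(int).astype(str) + toy["Month"], format="%Y%B"
)

# Roll each date forward to the last day of its month.
month_end = first_of_month + pd.offsets.MonthEnd(0)
print(month_end)
```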

In [37]:
# Selection of the metric for the rcsi indicator.
df_demo = df_demo[["Country", "AdminStrata", "Datetime", "Coping Prevalence"]]

In [38]:
# Create an appropriate multi-columns dataframe.
df_demo = df_demo.set_index(["Datetime", "Country", "AdminStrata"]).unstack(["Country", "AdminStrata"])
df_demo.columns = df_demo.columns.droplevel(0)
df_demo.columns = pd.MultiIndex.from_tuples(list(map(lambda x: tuple(list(x) + ["rCSI"]), df_demo.columns)))
df_demo.columns.rename("Country", level = 0, inplace = True)
df_demo.columns.rename("AdminStrata", level = 1, inplace = True)
df_demo.columns.rename("Indicator", level = 2, inplace = True)
freq = "M"
df_demo.index.freq = freq
# Fill the missing months by linear interpolation.
df_demo = df_demo.interpolate(method = "linear")
# Delete rows that still have some nan values.
df_demo.dropna(inplace = True, axis = 0)
df_demo.head()

Country,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen
AdminStrata,Abyan,Aden,Al Bayda,Al Dhale'e,Al Hudaydah,Al Jawf,Al Maharah,Al Mahwit,Amanat Al Asimah,Amran,...,Hadramaut,Hajjah,Ibb,Lahj,Marib,Raymah,Sa'ada,Sana'a,Shabwah,Taizz
Indicator,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,...,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI
Datetime
2015-09-30,81.7,79.15,79.33,91.67,83.1,85.74,74.38,89.55,83.86,80.29,...,70.84,95.81,85.11,71.12,92.91,96.81,92.46,86.89,69.22,88.64
2015-10-31,81.99,69.58,86.36,87.86,91.68,93.54,75.0,87.56,82.34,81.04,...,64.17,98.56,85.11,74.81,93.56,90.54,90.851429,89.29,71.59,89.67
2015-11-30,82.28,73.95,83.01,89.3,82.87,86.94,80.0,91.52,79.27,84.95,...,75.99,98.21,85.35,73.69,86.93,91.41,89.242857,80.14,72.87,90.7
2015-12-31,88.54,69.99,85.7,84.31,81.43,87.39,64.78,86.18,80.77,82.13,...,78.61,88.17,83.13,84.28,74.33,93.48,87.634286,87.85,77.0,87.09
2016-01-31,88.36,60.77,79.08,87.06,77.47,84.35,65.03,87.06,76.88,79.78,...,83.98,93.46,82.33,78.94,74.88,90.61,86.025714,81.97,80.97,86.21


In [39]:
plot(df_demo, title = "rCSI interpolation (1° version - monthly)", yaxis = "Coping Prevalence", 
     style = "lines+markers")

interactive(children=(ToggleButtons(description='Country', options=('Yemen',), value='Yemen'), RadioButtons(de…

In [40]:
_ = df_demo.groupby(level = 0, axis = 1).apply(lambda x: save(x, name = "wfp_rcsi-v1-monthly-interpolate"))

## Interpolation of the time-series at daily level

In [41]:
df_demo_fit = df_demo.resample("D").interpolate(method = "polynomial", order = 2)
freq = "D"
df_demo_fit.index.freq = freq
df_demo_fit.head()

Country,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen
AdminStrata,Abyan,Aden,Al Bayda,Al Dhale'e,Al Hudaydah,Al Jawf,Al Maharah,Al Mahwit,Amanat Al Asimah,Amran,...,Hadramaut,Hajjah,Ibb,Lahj,Marib,Raymah,Sa'ada,Sana'a,Shabwah,Taizz
Indicator,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,...,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI
Datetime
2015-09-30,81.7,79.15,79.33,91.67,83.1,85.74,74.38,89.55,83.86,80.29,...,70.84,95.81,85.11,71.12,92.91,96.81,92.46,86.89,69.22,88.64
2015-10-01,81.731383,78.556985,79.77608,91.42392,83.729734,86.287102,74.243776,89.340493,83.859928,80.231951,...,70.24292,95.916981,85.09647,71.374948,93.044492,96.481975,92.409261,87.248846,69.324399,88.656163
2015-10-02,81.761297,77.982924,80.20754,91.186052,84.335937,86.814504,74.117968,89.140674,83.856591,80.179384,...,69.671301,96.022744,85.083842,71.620835,93.171416,96.162334,92.358446,87.58893,69.426935,88.673463
2015-10-03,81.789742,77.427817,80.62438,90.956396,84.91861,87.322208,74.002574,88.950543,83.849991,80.132301,...,69.125143,96.127289,85.072116,71.857661,93.290772,95.851078,92.307554,87.910252,69.527608,88.691901
2015-10-04,81.816719,76.891663,81.026599,90.734952,85.477752,87.810212,73.897595,88.770099,83.840127,80.0907,...,68.604447,96.230616,85.061293,72.085426,93.402559,95.548206,92.256585,88.212813,69.626418,88.711477
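
The upsampling from monthly to daily values can be reproduced on a single short series. This sketch uses the first four 'Abyan' values shown earlier; because only four points are used, the daily values it produces are not expected to match the table above, which is computed from the full series. The polynomial method requires SciPy.

```python
import numpy as np
import pandas as pd

# First four monthly values of the 'Abyan' column shown earlier.
monthly = pd.Series(
    [81.70, 81.99, 82.28, 88.54],
    index=pd.to_datetime(["2015-09-30", "2015-10-31", "2015-11-30", "2015-12-31"]),
)

# Insert the missing days as NaN and fill them with a second-order polynomial.
daily = monthly.resample("D").interpolate(method="polynomial", order=2)
print(daily.head())
```

The original month-end values are kept as they are; only the days in between are estimated, so the daily series carries no information beyond the monthly one.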


In [42]:
plot(df_demo_fit, title = "rCSI fit (1° version - daily)", yaxis = "Coping Prevalence", 
     style = "lines")

interactive(children=(ToggleButtons(description='Country', options=('Yemen',), value='Yemen'), RadioButtons(de…

In [43]:
_ = df_demo_fit.groupby(level = 0, axis = 1).apply(lambda x: save(x, name = "wfp_rcsi-v1-daily-fit"))

# 2° version + 1° version - month and day granularity - YEM

In [44]:
# Concatenate the two monthly dataframes, giving precedence to the second version of the rCSI for the overlapping rows.
mask = (df_demo.index < df_resample[["Yemen"]].index[0])
df_union_month = pd.concat([df_demo.loc[mask], df_resample[["Yemen"]]])
df_union_month.head()

Country,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen,Yemen
AdminStrata,Abyan,Aden,Al Bayda,Al Dhale'e,Al Hudaydah,Al Jawf,Al Maharah,Al Mahwit,Amanat Al Asimah,Amran,...,Hadramaut,Hajjah,Ibb,Lahj,Marib,Raymah,Sa'ada,Sana'a,Shabwah,Taizz
Indicator,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,...,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI,rCSI
Datetime
2015-09-30,81.7,79.15,79.33,91.67,83.1,85.74,74.38,89.55,83.86,80.29,...,70.84,95.81,85.11,71.12,92.91,96.81,92.46,86.89,69.22,88.64
2015-10-31,81.99,69.58,86.36,87.86,91.68,93.54,75.0,87.56,82.34,81.04,...,64.17,98.56,85.11,74.81,93.56,90.54,90.851429,89.29,71.59,89.67
2015-11-30,82.28,73.95,83.01,89.3,82.87,86.94,80.0,91.52,79.27,84.95,...,75.99,98.21,85.35,73.69,86.93,91.41,89.242857,80.14,72.87,90.7
2015-12-31,88.54,69.99,85.7,84.31,81.43,87.39,64.78,86.18,80.77,82.13,...,78.61,88.17,83.13,84.28,74.33,93.48,87.634286,87.85,77.0,87.09
2016-01-31,88.36,60.77,79.08,87.06,77.47,84.35,65.03,87.06,76.88,79.78,...,83.98,93.46,82.33,78.94,74.88,90.61,86.025714,81.97,80.97,86.21
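
The precedence rule used in the concatenation can be checked on two short series. This is a sketch with made-up values (`old` stands for the 1° version and `new` for the 2° version): the older series is kept only before the first date covered by the newer one.

```python
import pandas as pd

# Made-up monthly series that overlap on two dates.
old = pd.Series(
    [80.0, 81.0, 82.0],
    index=pd.to_datetime(["2018-07-31", "2018-08-31", "2018-09-30"]),
)
new = pd.Series(
    [55.0, 56.0],
    index=pd.to_datetime(["2018-08-31", "2018-09-30"]),
)

# Keep the older series only before the first date of the newer one.
mask = old.index < new.index[0]
union = pd.concat([old.loc[mask], new])
print(union)
```

The made-up values are deliberately on different levels to show that this rule only removes duplicated dates: it does nothing to reconcile two series that measure different quantities.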


In [45]:
plot(df_union_month, title = "rCSI (1° version + 2° version - monthly)", yaxis = "?", 
     style = "lines+markers")

interactive(children=(ToggleButtons(description='Country', options=('Yemen',), value='Yemen'), RadioButtons(de…

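This last union does not work well, and the reason still needs to be checked. It may be related to the metric chosen for the demo (1°) version: Coping Prevalence is not the same quantity as the % of people with rCSI >= 19 used in the 2° version.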