# Between model
This model takes every static variable as input, i.e. the OSM variables, the ESA Landcover variables, and the WSF variables. In addition, it takes the cluster-level mean of each dynamic variable. The dynamic variables are Nightlights, NDVI, NDWI_Gao, and NDWI_McF.

The idea is that the between model captures variation between clusters. Its target variable is therefore the cluster mean $\bar{w}_c = \frac{1}{T_c}\sum_{t=1}^{T_c} w_{ct}$, where $T_c$ is the number of observations of cluster $c$.
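
As a minimal sketch of how such a cluster-level dataset can be built, the toy example below averages the outcome and one dynamic feature per cluster and keeps a static feature unchanged. The column names `w`, `ndvi`, and `osm_roads` and all values are made up for illustration and are not taken from the feature data.

```python
import pandas as pd

# Toy panel: two clusters observed in several survey waves (values are made up).
toy = pd.DataFrame({
    "cluster_id": ["a", "a", "a", "b", "b"],
    "w":          [1.0, 2.0, 3.0, 4.0, 6.0],   # outcome w_ct
    "ndvi":       [0.2, 0.4, 0.6, 0.1, 0.3],   # a dynamic feature
    "osm_roads":  [5, 5, 5, 9, 9],             # a static feature
})

# One row per cluster: mean outcome, mean of the dynamic feature,
# and the static feature (constant within a cluster, so "first" suffices).
between = toy.groupby("cluster_id", as_index=False).agg(
    w_bar=("w", "mean"),
    avg_ndvi=("ndvi", "mean"),
    osm_roads=("osm_roads", "first"),
)
print(between)
```

Each cluster contributes exactly one row, so the between model sees as many training observations as there are clusters.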

# Within model
The goal of this model is to predict the deviations from the cluster mean for each year, i.e. the model should capture variation within each cluster. To do so, the target variable is $\tilde{w}_{ct} = w_{ct} - \bar{w}_{c}$.
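
The two targets split each observation into a cluster mean and a deviation from it. The identities below follow directly from the definitions of $\bar{w}_c$ and $\tilde{w}_{ct}$; the grand mean $\bar{w}$ over all observations is notation introduced here for illustration.

$$
w_{ct} = \bar{w}_c + \tilde{w}_{ct}, \qquad \sum_{t=1}^{T_c} \tilde{w}_{ct} = 0 \quad \text{for every cluster } c
$$

$$
\sum_{c}\sum_{t=1}^{T_c} (w_{ct} - \bar{w})^2 = \sum_{c} T_c\,(\bar{w}_c - \bar{w})^2 + \sum_{c}\sum_{t=1}^{T_c} \tilde{w}_{ct}^2
$$

The first term on the right is the between-cluster variation and the second term is the within-cluster variation; the cross term vanishes because the deviations sum to zero within each cluster.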

For cluster $c$ in time period $t$, the feature vector is defined as $\tilde{\boldsymbol{x}}_{ct} = \boldsymbol{x}_{ct} - \bar{\boldsymbol{x}}_{c}$, where $\bar{\boldsymbol{x}}_{c} \in \mathbb{R}^{k\times1}$.
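
The demeaning step can be sketched as follows. `demean_by_cluster` is a hypothetical stand-in written for this example; the actual `demean_df` helper from `analysis_utils` is not shown in this notebook and may differ. The toy columns and values are made up.

```python
import pandas as pd

def demean_by_cluster(df, cols, group_col="cluster_id"):
    """Hypothetical helper: subtract each cluster's mean from the columns in `cols`."""
    out = df.copy()
    out[cols] = df[cols] - df.groupby(group_col)[cols].transform("mean")
    return out

toy = pd.DataFrame({
    "cluster_id": ["a", "a", "a", "b", "b"],
    "w":    [1.0, 2.0, 3.0, 4.0, 6.0],   # outcome w_ct
    "ndvi": [0.2, 0.4, 0.6, 0.1, 0.3],   # one dynamic feature x_ct
})

demeaned = demean_by_cluster(toy, ["w", "ndvi"])
print(demeaned)
```

After this step every column in `cols` sums to zero within each cluster, so anything that is constant within a cluster is removed from both the target and the features.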

To predict $\tilde{w}_{ct}$, I rely on $\tilde{\boldsymbol{x}}_{ct}$. This allows me to interpret the performance metric as the within $R^2$, i.e. the share of the within-cluster variance that the model captures.
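
The sketch below illustrates the three metrics on simulated data. It assumes that the overall prediction is the sum of the between and the within prediction; how `CombinedModel` actually combines and weights the two models is not shown here, so this is an illustration rather than a description of its implementation. The helper `r2` and the noise levels are chosen for the example only.

```python
import numpy as np
import pandas as pd

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)

# Simulated panel: 200 clusters with 3 observations each.
n_clusters, n_obs = 200, 3
cluster_id = np.repeat(np.arange(n_clusters), n_obs)
panel = pd.DataFrame({
    "cluster_id": cluster_id,
    "w": rng.normal(size=n_clusters)[cluster_id]
         + 0.3 * rng.normal(size=n_clusters * n_obs),
})

# True decomposition of the outcome into cluster mean and deviation.
panel["w_bar"] = panel.groupby("cluster_id")["w"].transform("mean")
panel["w_tilde"] = panel["w"] - panel["w_bar"]

# Stand-ins for model output: a noisy between prediction, a weak within prediction.
panel["w_bar_hat"] = panel["w_bar"] + 0.5 * rng.normal(size=n_clusters)[cluster_id]
panel["w_tilde_hat"] = 0.1 * panel["w_tilde"]

# Assumed combination rule: overall prediction = between + within prediction.
panel["w_hat"] = panel["w_bar_hat"] + panel["w_tilde_hat"]

clusters = panel.drop_duplicates("cluster_id")
scores = {
    "between": r2(clusters["w_bar"], clusters["w_bar_hat"]),
    "within": r2(panel["w_tilde"], panel["w_tilde_hat"]),
    "overall": r2(panel["w"], panel["w_hat"]),
}
print(scores)
```

Because the demeaned target has mean zero, the within $R^2$ compares the model against a baseline that always predicts "no deviation from the cluster mean".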


(This does not help at all, so disregard the following paragraph.)
To augment the number of training observations, I train the model on deltas rather than on the demeaned variables. This substantially increases the number of training observations and covers a wider range of differences, which makes the training dataset more varied. Ideally, this helps the model learn from a wider range of differences and thus improves the out-of-sample performance when predicting $\tilde{w}_{ct}$.
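
One way to build such deltas is sketched below. It assumes that a delta is the difference between two observations of the same cluster and that both orderings of a pair are kept; neither detail is stated above, so `pairwise_deltas` is a hypothetical helper and not the code that was actually used.

```python
from itertools import permutations

import pandas as pd

def pairwise_deltas(df, cols, group_col="cluster_id"):
    """Hypothetical helper: all ordered within-cluster differences row_i - row_j, i != j."""
    rows = []
    for cid, grp in df.groupby(group_col):
        values = grp[cols].to_numpy()
        for i, j in permutations(range(len(grp)), 2):
            rows.append([cid, *(values[i] - values[j])])
    return pd.DataFrame(rows, columns=[group_col, *cols])

toy = pd.DataFrame({
    "cluster_id": ["a", "a", "a", "b", "b"],
    "w":    [1.0, 2.0, 3.0, 4.0, 6.0],
    "ndvi": [0.2, 0.4, 0.6, 0.1, 0.3],
})

deltas = pairwise_deltas(toy, ["w", "ndvi"])
print(f"{len(toy)} observations -> {len(deltas)} deltas")
```

A cluster with $T_c$ observations yields $T_c(T_c-1)$ ordered deltas. These rows are all differences of the same $T_c$ observations, so they reweight existing information rather than add new information.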

In [21]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import pickle
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error

In [22]:
# set the font size for matplotlib and the font family.
font = {'family' : 'sans-serif',
        'weight' : 'normal',
        'size'   : 14}

matplotlib.rc('font', **font)

In [23]:
# load the necessary functions from the analysis package

# load the variable names; this gives compact access to the variables in the feature data
from analysis_utils.variable_names import *

# load flagged ids 
from analysis_utils.flagged_uids import *

# load the functions to do spatial k-fold CV
from analysis_utils.spatial_CV import *

# load the helper functions
from analysis_utils.analysis_helpers import *

# load the random forest trainer and cross_validator
import analysis_utils.RandomForest as rf

# load the combined model
from analysis_utils.CombinedModel import CombinedModel

In [24]:
# set the global file paths
root_data_dir = "../../Data"

# the lsms data
lsms_pth = f"{root_data_dir}/lsms/processed/labels_cluster_v1.csv"

# the feature data
feat_data_pth = f"{root_data_dir}/feature_data/tabular_data.csv"

# set the random seed
random_seed = 423
spatial_cv_random_seed = 348

# set the number of folds for k-fold CV
n_folds = 5
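
As a rough illustration of spatial k-fold cross-validation, the sketch below assigns whole grid cells to folds, so that neighbouring clusters end up in the same fold. The algorithm behind `split_lsms_spatial` is not shown in this notebook; `grid_block_folds`, the `lat` and `lon` column names, and the one-degree cell size are assumptions made for this example.

```python
import numpy as np
import pandas as pd

def grid_block_folds(df, n_folds=5, cell_deg=1.0, seed=0):
    """Hypothetical spatial split: all clusters in one lat/lon grid cell share a fold."""
    rng = np.random.default_rng(seed)
    # Label each cluster with the grid cell it falls into.
    cell = (np.floor(df["lat"] / cell_deg).astype(int).astype(str)
            + "_"
            + np.floor(df["lon"] / cell_deg).astype(int).astype(str))
    cells = cell.unique()
    # Shuffle the cells and deal them out to the folds in turn.
    fold_of_cell = dict(zip(cells, rng.permutation(len(cells)) % n_folds))
    return cell.map(fold_of_cell).rename("fold")

# Simulated cluster locations (values are made up).
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "cluster_id": np.arange(300),
    "lat": rng.uniform(-5, 5, size=300),
    "lon": rng.uniform(30, 40, size=300),
})

folds = grid_block_folds(toy, n_folds=5, cell_deg=1.0, seed=348)
print(folds.value_counts(normalize=True).sort_index())
```

Because folds are formed from spatial groups rather than from single observations, the realised test share of each fold generally deviates from the nominal `1 / n_folds`.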

In [25]:
# load the feature and the label data
lsms_df = pd.read_csv(lsms_pth)
# remove flagged ids from the dataset
lsms_df = lsms_df[~lsms_df.unique_id.isin(flagged_uids)].reset_index(drop = True)

# remove ethiopia from the sample
lsms_df = lsms_df[lsms_df.country != 'eth'].reset_index(drop = True)

lsms_df['avg_log_mean_pc_cons_usd_2017'] = lsms_df.groupby('cluster_id')['log_mean_pc_cons_usd_2017'].transform('mean')
lsms_df['avg_mean_asset_index_yeh'] = lsms_df.groupby('cluster_id')['mean_asset_index_yeh'].transform('mean')
feat_df = pd.read_csv(feat_data_pth)

# describe the training data broadly
print(f"Number of observations {len(lsms_df)}")
print(f"Number of clusters {len(np.unique(lsms_df.cluster_id))}")
print(f"Number of x vars {len(feat_df.columns)-2}")

Number of observations 5214
Number of clusters 1697
Number of x vars 113


In [26]:
# merge the label and the feature data to one dataset
lsms_vars = ['unique_id', 'n_households',           
             'log_mean_pc_cons_usd_2017', 'avg_log_mean_pc_cons_usd_2017',
             'mean_asset_index_yeh', 'avg_mean_asset_index_yeh']
df = pd.merge(lsms_df[lsms_vars], feat_df, on = 'unique_id', how = 'left')
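
A left merge keeps label rows even when no feature row matches, which silently introduces missing values. The sketch below shows one way to check this with the `validate` and `indicator` arguments of `pd.merge`; the toy frames and their columns are made up for illustration.

```python
import pandas as pd

# Toy label and feature tables; "u3" has no feature row.
labels = pd.DataFrame({"unique_id": ["u1", "u2", "u3"], "y": [1.0, 2.0, 3.0]})
feats = pd.DataFrame({"unique_id": ["u1", "u2"], "x": [0.1, 0.2]})

merged = pd.merge(
    labels, feats,
    on="unique_id", how="left",
    validate="one_to_one",   # raises if unique_id is duplicated on either side
    indicator=True,          # adds a "_merge" column with the match status
)

# Label rows without a matching feature row.
unmatched = merged.loc[merged["_merge"] == "left_only", "unique_id"].tolist()
print(unmatched)
```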

# Run Training

In [27]:
# define the within and between x variables
avg_rs_vars = avg_ndvi_vars + avg_ndwi_gao_vars + avg_nl_vars
osm_vars = osm_dist_vars + osm_count_vars + osm_road_vars

between_x_vars = osm_vars + esa_lc_vars + wsf_vars + avg_rs_vars + avg_preciptiation

dyn_rs_vars = dyn_ndvi_vars + dyn_ndwi_gao_vars + dyn_nl_vars
within_x_vars = dyn_rs_vars + precipitation

### Target: Log per capita consumption

In [28]:
between_target_var = 'avg_log_mean_pc_cons_usd_2017'
cl_df = df[['cluster_id', between_target_var] + between_x_vars].drop_duplicates().reset_index(drop = True)

# normalise the feature data
cl_df_norm = standardise_df(cl_df, exclude_cols = [between_target_var])
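
For illustration, standardisation of the feature columns could look like the sketch below. `zscore_df` is a hypothetical stand-in; the real `standardise_df` from `analysis_utils` is not shown here and may treat id columns, constant columns, or the degrees of freedom differently.

```python
import pandas as pd

def zscore_df(df, exclude_cols=(), id_cols=("cluster_id", "unique_id")):
    """Hypothetical helper: z-score every column except ids and excluded columns."""
    out = df.copy()
    skip = set(exclude_cols) | set(id_cols)
    cols = [c for c in df.columns if c not in skip]
    # Population standard deviation (ddof=0); a constant column would give NaN.
    out[cols] = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)
    return out

toy = pd.DataFrame({
    "cluster_id": ["a", "b", "c"],
    "target": [1.0, 2.0, 4.0],
    "x1": [10.0, 20.0, 30.0],
    "x2": [0.0, 1.0, 5.0],
})

toy_norm = zscore_df(toy, exclude_cols=["target"])
print(toy_norm)
```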

In [29]:
# get the within dataframe
# define the within variables
within_target_var = 'log_mean_pc_cons_usd_2017'
within_df = df[['cluster_id','unique_id', within_target_var] + within_x_vars]

# demean the data and standardise the variables
demeaned_df = demean_df(within_df)
demeaned_df_norm = standardise_df(demeaned_df, exclude_cols = [within_target_var])

In [30]:
# run repeated cross validation
rep_cv_res_cons = {
    'between_r2': [],
    'within_r2': [],
    'overall_r2': []
}

for j in range(10):
    print("="*100)
    print(f"Iteration {j}")
    print("="*100)
    rep_seed = random_seed + j
    
    # divide the data into k different folds
    fold_ids = split_lsms_spatial(lsms_df, n_folds = n_folds, random_seed = spatial_cv_random_seed + j)
    
    # run the between training
    print('Between training')
    between_cv_trainer_cons = rf.CrossValidator(cl_df_norm, 
                                                fold_ids, 
                                                between_target_var, 
                                                between_x_vars, 
                                                id_var = 'cluster_id', 
                                                random_seed = rep_seed)
    between_cv_trainer_cons.run_cv_training(min_samples_leaf = 1)
    
    # run the within training
    print("\nWithin training")
    within_cv_trainer_cons = rf.CrossValidator(demeaned_df_norm, 
                                               fold_ids, 
                                               within_target_var, 
                                               within_x_vars, 
                                               id_var = 'unique_id', 
                                               random_seed = rep_seed)
    within_cv_trainer_cons.run_cv_training(min_samples_leaf = 15)
    
    # combine both models
    combined_model_cons = CombinedModel(lsms_df, between_cv_trainer_cons, within_cv_trainer_cons)
    combined_model_cons.evaluate()
    combined_results = combined_model_cons.compute_overall_performance(use_fold_weights = True)
    
    # store the results 
    rep_cv_res_cons['between_r2'].append(combined_results['r2']['between'])
    rep_cv_res_cons['within_r2'].append(combined_results['r2']['within'])
    rep_cv_res_cons['overall_r2'].append(combined_results['r2']['overall'])
    
    # print the results
    print("."*100)
    print(combined_results)
    print("."*100)

Iteration 0
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.21
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.18
Between training
Initialising training
Finished training after 163 seconds

Within training
Initialising training
Finished training after 176 seconds
....................................................................................................
{'r2': {'overall': 0.36151827969612477, 'between': 0.4398150991729906, 'within': 0.007934999864063074}, 'mse': {'overall': 0.19480732381111998, 'between': 0.13673283813680862, 'within': 0.05198376582885943}}
....................................................................................................
Iteration 1
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.22
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.21
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.17
Between training
Initialising training
Finished training after 161 seconds

Within training
Initialising training
Finished training after 168 seconds
....................................................................................................
{'r2': {'overall': 0.3866317094725334, 'between': 0.450740510040778, 'within': 0.01613106921336538}, 'mse': {'overall': 0.19217749151285865, 'between': 0.1361103394022302, 'within': 0.05094585958152407}}
....................................................................................................
Iteration 2
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.21
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.18
Between training
Initialising training
Finished training after 160 seconds

Within training
Initialising training
Finished training after 183 seconds
....................................................................................................
{'r2': {'overall': 0.3395030207654687, 'between': 0.40457129653934265, 'within': 0.012276536583358144}, 'mse': {'overall': 0.19485633176266665, 'between': 0.1395488553922638, 'within': 0.05166292013832535}}
....................................................................................................
Iteration 3
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.21
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.21
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.18
Between training
Initialising training
Finished training after 152 seconds

Within training
Initialising training
Finished training after 185 seconds
....................................................................................................
{'r2': {'overall': 0.40651756157938956, 'between': 0.468452698986991, 'within': 0.01639036200656241}, 'mse': {'overall': 0.1904219203529563, 'between': 0.1344568767470028, 'within': 0.05150239244640409}}
....................................................................................................
Iteration 4
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.24
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.15
Between training
Initialising training
Finished training after 152 seconds

Within training
Initialising training
Finished training after 167 seconds
....................................................................................................
{'r2': {'overall': 0.3747394571576124, 'between': 0.44584964101927377, 'within': 0.005820348018511922}, 'mse': {'overall': 0.1907022880766038, 'between': 0.13384885681746536, 'within': 0.05207692938522908}}
....................................................................................................
Iteration 5
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.28
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.11
Between training
Initialising training
Finished training after 155 seconds

Within training
Initialising training
Finished training after 172 seconds
....................................................................................................
{'r2': {'overall': 0.37897679398530043, 'between': 0.4586526750460834, 'within': 0.00483261875012706}, 'mse': {'overall': 0.1899470966342628, 'between': 0.13335379011876133, 'within': 0.05208156691411051}}
....................................................................................................
Iteration 6
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.22
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.17
Between training
Initialising training
Finished training after 153 seconds

Within training
Initialising training
Finished training after 169 seconds
....................................................................................................
{'r2': {'overall': 0.38496596370551245, 'between': 0.44955651668627683, 'within': 0.007363604665373513}, 'mse': {'overall': 0.19335103930958375, 'between': 0.13747807754717473, 'within': 0.05174258753591787}}
....................................................................................................
Iteration 7
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.27
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.13
Between training
Initialising training
Finished training after 154 seconds

Within training
Initialising training
Finished training after 173 seconds
....................................................................................................
{'r2': {'overall': 0.3837836510834522, 'between': 0.46490856930748664, 'within': 0.004867340146313026}, 'mse': {'overall': 0.1915661546646743, 'between': 0.13381759845087263, 'within': 0.052375305074746845}}
....................................................................................................
Iteration 8
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.24
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.15
Between training
Initialising training
Finished training after 153 seconds

Within training
Initialising training
Finished training after 167 seconds
....................................................................................................
{'r2': {'overall': 0.351246339015771, 'between': 0.42433255777196044, 'within': -0.0002933726203318443}, 'mse': {'overall': 0.19403825632673752, 'between': 0.13810484681996152, 'within': 0.05282304244188975}}
....................................................................................................
Iteration 9
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.21
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.19
Between training
Initialising training
Finished training after 159 seconds

Within training
Initialising training
Finished training after 184 seconds
....................................................................................................
{'r2': {'overall': 0.34763482909083065, 'between': 0.42543976401601036, 'within': 0.01113745410327729}, 'mse': {'overall': 0.19412514852072507, 'between': 0.13741493225285015, 'within': 0.0519222346572748}}
....................................................................................................


In [31]:
pth = 'results/baseline/rep_cv_res_cons_no_eth.pkl'
with open(pth, 'wb') as f:
    pickle.dump(rep_cv_res_cons, f)

In [32]:
mean_r2_cons = {k: np.mean(v) for k,v in rep_cv_res_cons.items()}
se_r2_cons = {k: np.std(v)/np.sqrt(len(v)) for k,v in rep_cv_res_cons.items()}

# print as tex
print(f"& {mean_r2_cons['between_r2']:.4f} & {mean_r2_cons['within_r2']:.4f} & {mean_r2_cons['overall_r2']:.4f} \\\\")
print(f"& \\footnotesize({se_r2_cons['between_r2']:.4f}) & \\footnotesize({se_r2_cons['within_r2']:.4f}) & \\footnotesize({se_r2_cons['overall_r2']:.4f})\\\\")

& 0.4432 & 0.0086 & 0.3716 \\
& \footnotesize(0.0060) & \footnotesize(0.0016) & \footnotesize(0.0063)\\
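
The cell above divides NumPy's default population standard deviation (`ddof=0`) by $\sqrt{10}$. As a minimal sketch of the alternative, the block below recomputes the mean and the standard error of the between $R^2$ for log consumption from the ten values printed in the log above, once with `ddof=0` and once with the sample standard deviation (`ddof=1`). The variable names are chosen for this example.

```python
import numpy as np

# Between R2 of the ten repetitions (log consumption), copied from the log above.
between_r2 = [
    0.4398150991729906, 0.450740510040778, 0.40457129653934265,
    0.468452698986991, 0.44584964101927377, 0.4586526750460834,
    0.44955651668627683, 0.46490856930748664, 0.42433255777196044,
    0.42543976401601036,
]

n_rep = len(between_r2)
mean_r2 = np.mean(between_r2)

# Standard error of the mean across repetitions.
se_pop = np.std(between_r2) / np.sqrt(n_rep)              # ddof=0, as in the cell above
se_sample = np.std(between_r2, ddof=1) / np.sqrt(n_rep)   # ddof=1, sample standard deviation

print(f"mean = {mean_r2:.4f}, se (ddof=0) = {se_pop:.4f}, se (ddof=1) = {se_sample:.4f}")
```

With only ten repetitions the `ddof=1` version is slightly larger, by a factor of $\sqrt{10/9}$.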


### Target: Asset index

In [33]:
# get a dataset that only varies at the cluster level
between_target_var = 'avg_mean_asset_index_yeh'
cl_df = df[['cluster_id', between_target_var] + between_x_vars].drop_duplicates().reset_index(drop = True)

# normalise the feature data
cl_df_norm = standardise_df(cl_df, exclude_cols = [between_target_var])

In [34]:
# define the within variables
within_target_var = 'mean_asset_index_yeh'
within_df = df[['cluster_id','unique_id', within_target_var] + within_x_vars]

# demean the data and standardise the variables
demeaned_df = demean_df(within_df)
demeaned_df_norm = standardise_df(demeaned_df, exclude_cols = [within_target_var])

In [35]:
# run repeated cross validation
rep_cv_res_asset = {
    'between_r2': [],
    'within_r2': [],
    'overall_r2': []
}

for j in range(10):
    print("="*100)
    print(f"Iteration {j}")
    print("="*100)
    rep_seed = random_seed + j
    
    # divide the data into k different folds
    fold_ids = split_lsms_spatial(lsms_df, n_folds = n_folds, random_seed = spatial_cv_random_seed + j)
    
    # run the between training
    print('Between training')
    between_cv_trainer_asset = rf.CrossValidator(cl_df_norm, 
                                                fold_ids, 
                                                between_target_var, 
                                                between_x_vars, 
                                                id_var = 'cluster_id', 
                                                random_seed = rep_seed)
    between_cv_trainer_asset.run_cv_training(min_samples_leaf = 1)
    
    # run the within training
    print("\nWithin training")
    within_cv_trainer_asset = rf.CrossValidator(demeaned_df_norm, 
                                               fold_ids, 
                                               within_target_var, 
                                               within_x_vars, 
                                               id_var = 'unique_id', 
                                               random_seed = rep_seed)
    within_cv_trainer_asset.run_cv_training(min_samples_leaf = 15)
    
    # combine both models
    combined_model_asset = CombinedModel(lsms_df, between_cv_trainer_asset, within_cv_trainer_asset)
    combined_model_asset.evaluate()
    combined_results = combined_model_asset.compute_overall_performance(use_fold_weights = True)
    
    # store the results 
    rep_cv_res_asset['between_r2'].append(combined_results['r2']['between'])
    rep_cv_res_asset['within_r2'].append(combined_results['r2']['within'])
    rep_cv_res_asset['overall_r2'].append(combined_results['r2']['overall'])
    
    # print the results
    print("."*100)
    print(combined_results)
    print("."*100)

Iteration 0
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.21
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.18
Between training
Initialising training
Finished training after 165 seconds

Within training
Initialising training
Finished training after 178 seconds
....................................................................................................
{'r2': {'overall': 0.42163113907399485, 'between': 0.4372769627998298, 'within': 0.014935226039086502}, 'mse': {'overall': 1.1300655033858413, 'between': 1.1758038060692082, 'within': 0.07897135862843373}}
....................................................................................................
Iteration 1
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.22
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.21
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.17
Between training
Initialising training
Finished training after 166 seconds

Within training
Initialising training
Finished training after 187 seconds
....................................................................................................
{'r2': {'overall': 0.45052250774172575, 'between': 0.44129937993512625, 'within': 0.01556166901934115}, 'mse': {'overall': 1.1334751141476562, 'between': 1.194226580144326, 'within': 0.0759218959943763}}
....................................................................................................
Iteration 2
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.21
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.18
Between training
Initialising training
Finished training after 168 seconds

Within training
Initialising training
Finished training after 179 seconds
....................................................................................................
{'r2': {'overall': 0.4263777019983448, 'between': 0.4333465656155204, 'within': 0.015096121879853519}, 'mse': {'overall': 1.1211576344121756, 'between': 1.1756799250773384, 'within': 0.07772333977098587}}
....................................................................................................
Iteration 3
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.21
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.21
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.18
Between training
Initialising training
Finished training after 148 seconds

Within training
Initialising training
Finished training after 189 seconds
....................................................................................................
{'r2': {'overall': 0.4333436910696416, 'between': 0.4395164815708625, 'within': 0.018693818028638676}, 'mse': {'overall': 1.1464344429838509, 'between': 1.1965974279117688, 'within': 0.07895440211720198}}
....................................................................................................
Iteration 4
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.24
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.15
Between training
Initialising training
Finished training after 153 seconds

Within training
Initialising training
Finished training after 179 seconds
....................................................................................................
{'r2': {'overall': 0.3886245288126949, 'between': 0.39402806569901, 'within': 0.0174017735855585}, 'mse': {'overall': 1.150634543736813, 'between': 1.2021157900572785, 'within': 0.07687608645508008}}
....................................................................................................
Iteration 5
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.28
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.11
Between training
Initialising training
Finished training after 153 seconds

Within training
Initialising training
Finished training after 189 seconds
....................................................................................................
{'r2': {'overall': 0.4052163721664537, 'between': 0.39576019094587384, 'within': 0.017071678245108024}, 'mse': {'overall': 1.1364139951219143, 'between': 1.2043609205191388, 'within': 0.0748601716966445}}
....................................................................................................
Iteration 6
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.22
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.17
Between training
Initialising training
Finished training after 154 seconds

Within training
Initialising training
Finished training after 185 seconds
....................................................................................................
{'r2': {'overall': 0.44523596739434945, 'between': 0.4379681577960241, 'within': 0.016780762176740027}, 'mse': {'overall': 1.1441985444883265, 'between': 1.20179588133688, 'within': 0.0763898724873263}}
....................................................................................................
Iteration 7
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.27
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.13
Between training
Initialising training
Finished training after 151 seconds

Within training
Initialising training
Finished training after 197 seconds
....................................................................................................
{'r2': {'overall': 0.4202367243752705, 'between': 0.4189305489004145, 'within': 0.022753807789239588}, 'mse': {'overall': 1.135345381368025, 'between': 1.1969577952009147, 'within': 0.0755345471230208}}
....................................................................................................
Iteration 8
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.24
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.15
Between training
Initialising training
Finished training after 152 seconds

Within training
Initialising training
Finished training after 177 seconds
....................................................................................................
{'r2': {'overall': 0.3880589889779468, 'between': 0.391616383624699, 'within': 0.015379038125294332}, 'mse': {'overall': 1.130771701993096, 'between': 1.192433702517365, 'within': 0.07685750301775318}}
....................................................................................................
Iteration 9
Fold 0, specified test ratio: 0.2 - Actual test ratio 0.21
Fold 1, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 2, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 3, specified test ratio: 0.2 - Actual test ratio 0.20
Fold 4, specified test ratio: 0.2 - Actual test ratio 0.19
Between training
Initialising training
Finished training after 149 seconds

Within training
Initialising training
Finished training after 177 seconds
....................................................................................................
{'r2': {'overall': 0.4074314514825143, 'between': 0.4168541173008199, 'within': 0.014565199352215074}, 'mse': {'overall': 1.1233417290932357, 'between': 1.1765293277239035, 'within': 0.07864547255102924}}
....................................................................................................


In [36]:
pth = 'results/baseline/rep_cv_res_asset_no_eth.pkl'
with open(pth, 'wb') as f:
    pickle.dump(rep_cv_res_asset, f)

In [37]:
mean_r2_asset = {k: np.mean(v) for k,v in rep_cv_res_asset.items()}
se_r2_asset = {k: np.std(v)/np.sqrt(len(v)) for k,v in rep_cv_res_asset.items()}

# print as tex
print(f"& {mean_r2_asset['between_r2']:.4f} & {mean_r2_asset['within_r2']:.4f} & {mean_r2_asset['overall_r2']:.4f} \\\\")
print(f"& \\footnotesize({se_r2_asset['between_r2']:.4f}) & \\footnotesize({se_r2_asset['within_r2']:.4f}) & \\footnotesize({se_r2_asset['overall_r2']:.4f})\\\\")

& 0.4207 & 0.0168 & 0.4187 \\
& \footnotesize(0.0061) & \footnotesize(0.0007) & \footnotesize(0.0064)\\
