## Example: Pre-Compute Weight Matrices

This notebook shows how to create the weight matrices for the councillor and affair graphs using Approaches I and II.
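A weight matrix here is a square, symmetric array whose entry (i, j) scores how strongly nodes i and j are connected. The sketch below is a hypothetical stand-in for a co-sponsorship weighting of the councillor graph: the weight is the number of affairs that both councillors sponsored. It is not the implementation of `cospon_weight_matrix`. The input format (a dict from affair ID to sponsoring councillor IDs), the helper name `cosponsorship_counts`, and the toy IDs are all assumptions.

```python
import numpy as np

def cosponsorship_counts(sponsors_by_affair, c_id2idx):
    """Hypothetical sketch: W[i, j] = number of affairs sponsored by both i and j."""
    n = len(c_id2idx)
    # Binary incidence matrix: one row per affair, one column per councillor
    B = np.zeros((len(sponsors_by_affair), n))
    for row, sponsors in enumerate(sponsors_by_affair.values()):
        for c_id in sponsors:
            if c_id in c_id2idx:  # skip councillors outside the loaded period
                B[row, c_id2idx[c_id]] = 1.0
    W = B.T @ B               # (i, j) = count of affairs shared by i and j
    np.fill_diagonal(W, 0.0)  # drop self-loops
    return W

# Toy example with made-up councillor and affair IDs
toy_c_id2idx = {101: 0, 102: 1, 103: 2}
toy_sponsors = {"a1": [101, 102], "a2": [101, 102, 103], "a3": [103]}
W_toy = cosponsorship_counts(toy_sponsors, toy_c_id2idx)
```

Mapping IDs to row indices through a sorted dictionary, as the notebook does with `c_id2idx`, is what keeps the row order of such a matrix reproducible.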


Of course, one can also compute the weight matrices inside an end-to-end pipeline instead of in this separate pre-computation step.


As with the Gower weights, this example computes the weight matrices for the 49th period only.
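The Gower weights presumably hold one weight per feature in `feature_cols`, used in a weighted Gower similarity between councillors. The sketch below shows the textbook weighted Gower similarity, assuming each feature is either numeric/ordinal (range-normalised absolute difference) or categorical (exact match) and that nothing is missing. The actual `weighted_gower_similarity` may treat feature types, missing values, or normalisation differently; the function name `gower_similarity_sketch` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def gower_similarity_sketch(df, numeric_cols, categorical_cols, weights):
    """Weighted Gower similarity: S[i, j] = sum_f w_f * s_ijf / sum_f w_f."""
    n = len(df)
    weighted_sum = np.zeros((n, n))
    total_weight = 0.0
    for col in numeric_cols:
        x = df[col].to_numpy(dtype=float)
        value_range = x.max() - x.min()
        if value_range > 0:
            # 1 - |x_i - x_j| / range: 1 for equal values, 0 for the two extremes
            s = 1.0 - np.abs(x[:, None] - x[None, :]) / value_range
        else:
            s = np.ones((n, n))  # constant feature: all pairs identical
        weighted_sum += weights[col] * s
        total_weight += weights[col]
    for col in categorical_cols:
        x = df[col].to_numpy()
        s = (x[:, None] == x[None, :]).astype(float)  # 1 if same category, else 0
        weighted_sum += weights[col] * s
        total_weight += weights[col]
    return weighted_sum / total_weight

# Toy data reusing two of the notebook's feature names; values and weights are made up
councillors_toy = pd.DataFrame({"average_age": [40, 50, 60], "gender": ["f", "m", "f"]})
weights_toy = pd.Series({"average_age": 1.0, "gender": 3.0})
S = gower_similarity_sketch(councillors_toy, ["average_age"], ["gender"], weights_toy)
```

Here councillors 0 and 2 share a gender (weight 3) but sit at opposite ends of the age range (weight 1), so their similarity is 3/4 = 0.75.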

Again, as a disclaimer: when doing CV across periods or feed-forward CV, the weight matrices (and the Gower weights they depend on) should be computed separately for each fold from the training data only, so that no information from the held-out data leaks into them.
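One way to organise this is to recompute the matrices inside each fold. The sketch below assumes feed-forward CV means training on all earlier periods and testing on the next one. `compute_weight_matrices` is a hypothetical placeholder for whatever wraps the `create_weight_matrices` functions, and the period numbers are only illustrative.

```python
from typing import Callable, Dict, Iterator, List, Tuple

def forward_chaining_splits(periods: List[int]) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_periods, test_period): each test period follows all its training periods."""
    ordered = sorted(periods)
    for k in range(1, len(ordered)):
        yield ordered[:k], ordered[k]

def weights_per_fold(periods: List[int],
                     compute_weight_matrices: Callable[[List[int]], object]) -> Dict[int, object]:
    """Recompute the weight matrices in every fold from the training periods only."""
    fold_weights = {}
    for train_periods, test_period in forward_chaining_splits(periods):
        # Only the training periods are passed in, so the test period cannot leak.
        fold_weights[test_period] = compute_weight_matrices(train_periods)
    return fold_weights

# Toy usage: a stand-in "computation" that just records which periods it saw
seen = weights_per_fold([47, 48, 49, 50], lambda train: tuple(train))
print(seen)  # {48: (47,), 49: (47, 48), 50: (47, 48, 49)}
```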

### Import Modules

In [1]:
import sys
import os
import pandas as pd

# make the project's src/ directory (one level up) importable
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..', 'src')))
from data_loading import load_data
from create_weight_matrices import (
    cospon_weight_matrix,
    weighted_gower_similarity,
    tfidf_weight_matrix,
    contextual_embedding_weight_matrix,
)
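`tfidf_weight_matrix` presumably builds the affair graph from the affairs' texts; its preprocessing is not shown in this notebook. As a hedged illustration, the sketch below computes TF-IDF vectors with a simple regex tokenizer and the smoothed IDF formula that scikit-learn's `TfidfVectorizer` uses by default, then takes pairwise cosine similarity. The tokenizer, the helper name `tfidf_cosine_sketch`, and the toy texts are assumptions.

```python
import re
import numpy as np

def tfidf_cosine_sketch(texts):
    """Cosine similarity between TF-IDF vectors; entry (i, j) compares texts i and j."""
    docs = [re.findall(r"\w+", text.lower()) for text in texts]
    vocab = {w: k for k, w in enumerate(sorted({w for doc in docs for w in doc}))}
    tf = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for w in doc:
            tf[row, vocab[w]] += 1.0  # raw term counts
    n_docs = len(docs)
    doc_freq = (tf > 0).sum(axis=0)  # number of texts containing each term
    idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1.0  # smoothed IDF
    X = tf * idf
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0, 1.0, norms)  # L2-normalise rows (empty texts stay zero)
    return X @ X.T  # dot products of unit vectors are cosine similarities

affair_texts = ["Tax reform for families", "Family tax relief", "Railway tunnel funding"]
W_y_toy = tfidf_cosine_sketch(affair_texts)
```

The first two toy affairs share the term "tax" and get a positive weight, while the railway affair shares no terms with either and gets weight 0.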

### Create Weight Matrices for 49th Period

In [None]:
# load main data
period = [49]
votes, affairs, councillors = load_data(period)

# load corresponding gower weights (for W(x) with Approach II)
gower_weights = pd.read_csv(f'../data/clean/gower_weights_{period[0]}.csv', index_col=0).squeeze()

# Define features
feature_cols = [
    'degree_class', 
    'profession_class', 
    'gender',
    'average_age', 
    'lang_region', 
    'military_rank_ordinal', 
    'faction_ordinal'
]

# mappings
ordered_c_ids = sorted(set(councillors['elanId']))
ordered_a_ids = sorted(set(affairs['id']))

c_id2idx = {id_: i for i, id_ in enumerate(ordered_c_ids)}
a_id2idx = {id_: i for i, id_ in enumerate(ordered_a_ids)}

# compute weight matrices

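# Councillor graph W(x): Approach I (co-sponsorship) and Approach II (weighted Gower similarity)
W_x1 = cospon_weight_matrix(c_id2idx, affairs, councillors)
W_x2 = weighted_gower_similarity(councillors, feature_cols, gower_weights)

# Affair graph W(y): Approach I (TF-IDF) and Approach II (contextual embeddings)
W_y1 = tfidf_weight_matrix(affairs, ordered_a_ids)
W_y2 = contextual_embedding_weight_matrix(affairs, ordered_a_ids)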

# save output (create the directory first if it does not exist yet)
output_dir = '../data/weight_matrices'
os.makedirs(output_dir, exist_ok=True)
for name, W in {'W_x1': W_x1, 'W_x2': W_x2, 'W_y1': W_y1, 'W_y2': W_y2}.items():
    pd.DataFrame(W).to_csv(os.path.join(output_dir, f'{name}_{period[0]}.csv'), index=False)
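Because the matrices are written with `index=False`, the CSV files do not record which councillor or affair each row belongs to; that information lives only in the sort order of `ordered_c_ids` / `ordered_a_ids`. The sketch below reloads a matrix and re-attaches the IDs, using a made-up matrix and made-up IDs in a temporary directory. It assumes each function returns rows in sorted ID order, which `c_id2idx` suggests for `cospon_weight_matrix` but which this notebook does not confirm for the other functions.

```python
import os
import tempfile
import numpy as np
import pandas as pd

# Hypothetical stand-ins: sorted councillor IDs and a matrix built in that order
ordered_c_ids = [1001, 1002, 1003]
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])

with tempfile.TemporaryDirectory() as output_dir:
    path = os.path.join(output_dir, "W_x1_49.csv")
    pd.DataFrame(W).to_csv(path, index=False)  # same format as above: no IDs in the file

    # Reload and re-attach the IDs; relies on rows/columns following the sorted ID order
    W_loaded = pd.read_csv(path)
    W_loaded.index = ordered_c_ids
    W_loaded.columns = ordered_c_ids

print(W_loaded.loc[1001, 1002])  # 2.0
```

If the ordering assumption is uncertain, saving with the labels, e.g. `pd.DataFrame(W, index=ids, columns=ids).to_csv(path)` and reading back with `index_col=0`, keeps the IDs in the file itself (the column labels then come back as strings).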