### This notebook derives the participant dominance index (PDI) and combines it with the subjective scores.

Participant dominance is defined as the ratio of the number of words sent by the data donor (participant) to the total number of words exchanged, i.e., the words sent by the participant and their contacts combined. The subjective counterpart of this concept is assessed by asking participants to rate the statement "On average, I send a higher number of words per month than I receive" on a 7-point Likert scale (Disagree strongly = 1, ..., Agree strongly = 7). There are two assessments, one before and one after seeing the visual feedback, referred to in the paper and code as "pre" and "post".
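As a minimal sketch of this definition, the function below computes the PDI from two word counts. `participant_dominance_index` is a hypothetical stand-in for the notebook's `calculate_PDI` (imported from `modules.metrics`, which is not shown), and returning NaN when no words were exchanged at all is an assumption about how that undefined case is handled.

```python
import math


def participant_dominance_index(ego_word_count: float, total_word_count: float) -> float:
    """Share of all exchanged words that were sent by the donor.

    Hypothetical stand-in for modules.metrics.calculate_PDI; the real helper
    may treat edge cases differently.
    """
    # No words exchanged at all: the ratio is undefined, so return NaN (assumption)
    if total_word_count == 0:
        return math.nan
    return ego_word_count / total_word_count


# Example: the donor sent 300 of the 1,200 words exchanged in total
print(participant_dominance_index(300, 1200))  # 0.25
```

Under this definition, a PDI above 0.5 means the donor sent more words than they received, which is exactly the situation described by the survey statement.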

In [1]:
import os
import sys
import warnings
import numpy as np
import pandas as pd
from pathlib import Path

warnings.filterwarnings("ignore")
sys.path.insert(1, os.path.abspath('../'))
sys.path.insert(1, os.path.abspath('../../..'))

raw_data_path = Path("../../data/raw")
processed_data_path = Path("../../data/processed")

### Load messaging data 

In [2]:
# Load the donation info from the donation table
donation_table = pd.read_csv(raw_data_path / 'donation_table_CHB_filtered.csv')

# Load the donated messages from the relevant donations (i.e., those whose donors filled in both surveys)
messages_table = pd.read_csv(raw_data_path / 'messages_table_CHB_filtered.csv')
messages_table['datetime'] = pd.to_datetime(messages_table['datetime'])  # ensure the dates are in datetime format

### Calculate the PDI for the entire donation period and for each month

The monthly calculations support the robustness analysis. In particular, we check for a recency effect, i.e., whether the PDI of the last month correlates more strongly with the subjective score than the overall or median PDI does.
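The per-month values can be sketched with pandas as follows. This is a hypothetical illustration of what `align_monthly_data`, `calculate_PDI`, and `get_last_non_nan_value` achieve together, assuming both message tables have `datetime` and `word_count` columns and that a month without any messages has an undefined (NaN) PDI; the notebook's own helpers may bin or align months differently.

```python
import numpy as np
import pandas as pd


def monthly_pdi(donation_messages: pd.DataFrame, ego_messages: pd.DataFrame) -> pd.Series:
    """Per-month PDI; both frames are assumed to have 'datetime' and 'word_count' columns."""
    # Total words exchanged per calendar month ('MS' = month-start bins);
    # months inside the donation span with no messages get a sum of 0
    total = donation_messages.resample('MS', on='datetime')['word_count'].sum()
    # Words sent by the donor per month, aligned to the same months;
    # months in which the donor sent nothing count as 0 words
    ego = (ego_messages.resample('MS', on='datetime')['word_count'].sum()
           .reindex(total.index, fill_value=0))
    # A month with no words at all has an undefined PDI, so divide by NaN there
    return ego / total.replace(0, np.nan)


def summarize_monthly_pdi(pdi: pd.Series) -> dict:
    """Median over the defined months and the most recent defined month."""
    defined = pdi.dropna()
    return {
        'Median PDI': float(defined.median()) if len(defined) else np.nan,
        'Last Month PDI': float(defined.iloc[-1]) if len(defined) else np.nan,
    }


# Toy donation: four messages in January and March, none in February
donation = pd.DataFrame({
    'datetime': pd.to_datetime(['2023-01-05', '2023-01-20', '2023-03-02', '2023-03-15']),
    'word_count': [10, 30, 20, 20],
})
ego = donation.iloc[[0, 2]]  # the donor wrote the first and third messages
pdi = monthly_pdi(donation, ego)
print(pdi.tolist())                # [0.25, nan, 0.5]
print(summarize_monthly_pdi(pdi))  # {'Median PDI': 0.375, 'Last Month PDI': 0.5}
```

In the toy example, February has no messages, so its PDI is NaN and both the median and the last-month value skip it, mirroring the notebook's use of `np.nanmedian` and `get_last_non_nan_value`.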

In [5]:
from modules.metrics import calculate_PDI
from modules.utils import get_relevant_messages, align_monthly_data, get_last_non_nan_value

donationIDs = list(donation_table['donation_id'])
donor_info = {}
for donationID in donationIDs:
    # Look up the donation's row once and read the survey ID (external_id) from it
    donation_row = donation_table[donation_table['donation_id'] == donationID].iloc[0]
    external_id = donation_row['external_id']
    donor_info[external_id] = {}

    # Get the donor_id for the donation and separate the donor's messages
    egoID = donation_row['donor_id']
    donation_messages, ego_messages = get_relevant_messages(messages_table, donationID, egoID)

    # Overall PDI: the donor's words divided by all words in the donation
    donor_info[external_id]['Overall PDI'] = calculate_PDI(ego_messages['word_count'].sum(), donation_messages['word_count'].sum())

    # Aggregate all messages and the donor's messages per month
    all_messages_monthly, ego_messages_monthly = align_monthly_data(donation_messages, ego_messages)

    # Monthly PDI; the two monthly tables are expected to cover the same months, so rows are paired
    PDI_monthly = []
    for (_, ego), (_, total) in zip(ego_messages_monthly.iterrows(), all_messages_monthly.iterrows()):
        PDI_monthly.append(calculate_PDI(ego['word_count'], total['word_count']))

    # Summarize the monthly values, ignoring months where the PDI is undefined (NaN)
    donor_info[external_id]['Median PDI'] = np.nanmedian(PDI_monthly)
    donor_info[external_id]['Last Month PDI'] = get_last_non_nan_value(PDI_monthly)
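The loop relies on `get_relevant_messages` to split a donation into all of its messages and the donor's own messages. Its implementation is not shown here; the sketch below is a hypothetical equivalent that assumes `messages_table` has a `donation_id` column and a sender column named `sender_id` (the real column name may differ).

```python
import pandas as pd


def split_donation_messages(messages: pd.DataFrame, donation_id, donor_id):
    """Hypothetical equivalent of get_relevant_messages; column names are assumed."""
    # All messages (sent and received) that belong to this donation
    donation_messages = messages[messages['donation_id'] == donation_id]
    # The subset of those messages written by the donor
    ego_messages = donation_messages[donation_messages['sender_id'] == donor_id]
    return donation_messages, ego_messages


messages = pd.DataFrame({
    'donation_id': [1, 1, 1, 2],
    'sender_id': ['ego', 'friend', 'ego', 'ego'],
    'word_count': [12, 30, 8, 50],
})
donation_messages, ego_messages = split_donation_messages(messages, donation_id=1, donor_id='ego')
overall_pdi = ego_messages['word_count'].sum() / donation_messages['word_count'].sum()
print(overall_pdi)  # 0.4
```

Here the donor wrote 20 of the 50 words in donation 1, so the overall PDI is 0.4; the message from donation 2 is excluded.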

### Combine the objective PDI score with the subjective scores 

In [4]:
from modules.utils import map_7point_likert
objective_table = pd.DataFrame.from_dict(donor_info, orient='index').reset_index().rename(columns={'index': 'external_id'})
question_column = 'sending_more_words' 

# Load and transform the question columns relevant for this aspect
pre_survey = map_7point_likert(raw_data_path / 'pre-survey_CHB.xlsx', question_column)  # makes sure the Likert scale is in numerical form
post_survey = map_7point_likert(raw_data_path / 'post-survey_CHB.xlsx', question_column)  # makes sure the Likert scale is in numerical form
combined_survey = pd.merge(pre_survey, post_survey, on='external_id', how='inner', suffixes=('_pre', '_post'))

# Pair objective data with the subjective assessments based on external_id
all_data = pd.merge(combined_survey, objective_table, on='external_id', how='inner')
# Change in agreement after seeing the visual feedback (post minus pre)
all_data[f'{question_column}_diff'] = all_data[f'{question_column}_post'] - all_data[f'{question_column}_pre']
all_data.to_excel(processed_data_path / 'messaging_dominance_data.xlsx', index=False)
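As a self-contained sketch of the survey step, the code below maps Likert labels to numbers, merges the pre and post answers, and computes their difference on toy data. `likert_to_numeric` is a hypothetical stand-in for `map_7point_likert` (which reads the Excel files directly), and the five intermediate labels are assumed, since the notebook only names the endpoints.

```python
import pandas as pd

# Assumed response labels; only "Disagree strongly" = 1 and
# "Agree strongly" = 7 are given in the notebook text
LIKERT_7 = {
    'Disagree strongly': 1,
    'Disagree moderately': 2,
    'Disagree a little': 3,
    'Neither agree nor disagree': 4,
    'Agree a little': 5,
    'Agree moderately': 6,
    'Agree strongly': 7,
}


def likert_to_numeric(survey: pd.DataFrame, column: str) -> pd.DataFrame:
    """Keep the ID and one question column, with labels mapped to 1..7 (unknown labels become NaN)."""
    out = survey[['external_id', column]].copy()
    out[column] = out[column].map(LIKERT_7)
    return out


q = 'sending_more_words'
pre = pd.DataFrame({'external_id': ['a', 'b', 'c'],
                    q: ['Agree a little', 'Disagree strongly', 'Agree strongly']})
post = pd.DataFrame({'external_id': ['a', 'b'],
                     q: ['Agree strongly', 'Disagree moderately']})

# Inner merge keeps only participants who answered both surveys
combined = pd.merge(likert_to_numeric(pre, q), likert_to_numeric(post, q),
                    on='external_id', how='inner', suffixes=('_pre', '_post'))
combined[f'{q}_diff'] = combined[f'{q}_post'] - combined[f'{q}_pre']
print(combined)
```

A positive `_diff` means the participant agreed more strongly with the statement after seeing the visual feedback. Participant `c`, who appears only in the pre-survey, is dropped by the inner merge, as in the notebook.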