# Implementing the Extended Moral Foundations Dictionary (eMFD) for rationality

Hopp, F. R., Fisher, J. T., Cornell, D., Huskey, R., & Weber, R. (2020). The extended Moral Foundations Dictionary (eMFD): Development and applications of a crowd-sourced approach to extracting moral intuitions from text. Behavior Research Methods, https://doi.org/10.3758/s13428-020-01433-0

## Install spaCy, its English model, and eMFDscore before running the notebook
1. Install spaCy (version 3.4.0): *pip install spacy==3.4.0*
2. Install the English model: *python -m spacy download en_core_web_sm*
3. Install eMFDscore: *pip install https://github.com/medianeuroscience/emfdscore/archive/master.zip*
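
Before moving on, it can help to confirm that the three installs are visible to the Python environment running the notebook. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the module names are assumed to match the packages installed above (`spacy`, `en_core_web_sm`, `emfdscore`).

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


# Module names assumed to correspond to the three install steps above.
required = ["spacy", "en_core_web_sm", "emfdscore"]
missing = missing_packages(required)
print("Missing:", missing if missing else "none")
```

An empty result only means the modules can be found; it does not check their versions.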

### Import dataset
We use both the full dataset and the train set.

In [None]:
#Dependencies
import pandas as pd 
import numpy as np
from scipy import stats
import spacy

In [None]:
#import dataset
full_data = pd.read_csv(r"~\Desktop\Research\Social Media Analysis\publicsphere\data\sample\Data_ReadyForAnalysis_WithComments&MetaInfo.csv")
full_data = full_data[['commentText','ID']] #retain only comment text and comment ID
full_data = full_data.rename(columns={'commentText':0}) #the text column must be named 0; any other name leads to a KeyError in score_docs

### Implementation
Following the authors' recommendation for discriminating which foundations are more or less represented in a text, we use their second scoring method: assign a single probability per word and return sentiment scores.
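
To make the "single probability per word" idea concrete, here is a toy sketch. It is not eMFDscore's implementation: the lexicon values are invented for illustration, the helper names `single_mapping` and `score_tokens` are hypothetical, and the per-foundation averaging is a simplifying assumption that may differ from how the package normalises its scores.

```python
# Hypothetical toy lexicon: word -> foundation -> (probability, sentiment).
# These numbers are made up for illustration; they are NOT eMFD values.
TOY_LEXICON = {
    "hurt":  {"care": (0.40, -0.60), "fairness": (0.10, -0.20)},
    "cheat": {"care": (0.05, -0.30), "fairness": (0.35, -0.50)},
    "flag":  {"loyalty": (0.30, 0.20), "authority": (0.20, 0.10)},
}


def single_mapping(lexicon):
    """Keep, for each word, only the foundation with the highest probability."""
    mapped = {}
    for word, weights in lexicon.items():
        foundation = max(weights, key=lambda f: weights[f][0])
        prob, sent = weights[foundation]
        mapped[word] = (foundation, prob, sent)
    return mapped


def score_tokens(tokens, mapped):
    """Average the sentiment per foundation over tokens found in the lexicon."""
    totals, counts = {}, {}
    for tok in tokens:
        if tok in mapped:
            foundation, _prob, sent = mapped[tok]
            totals[foundation] = totals.get(foundation, 0.0) + sent
            counts[foundation] = counts.get(foundation, 0) + 1
    return {f: totals[f] / counts[f] for f in totals}


mapped = single_mapping(TOY_LEXICON)
scores = score_tokens("they cheat and hurt and cheat".split(), mapped)
print(scores)
```

Because each word contributes to exactly one foundation, a word such as "cheat" counts only toward fairness here, even though the toy lexicon also gives it a small care probability.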

In [None]:
#Assign Single Probability per Word and Return Sentiment Scores
from emfdscore.scoring import score_docs 

num_docs = len(full_data) #number of documents to score (here, every comment)
 
DICT_TYPE = 'emfd' #use emfd as the dictionary
PROB_MAP = 'single' #assign a single probability to each word in the eMFD according to the foundation with the highest probability
SCORE_METHOD = 'bow' #bow approach (contextless)
OUT_METRICS = 'sentiment' #average sentiment for each foundation

df = score_docs(full_data,DICT_TYPE,PROB_MAP,SCORE_METHOD,OUT_METRICS,num_docs)

df

Unnamed: 0,care_p,fairness_p,loyalty_p,authority_p,sanctity_p,care_sent,fairness_sent,loyalty_sent,authority_sent,sanctity_sent,moral_nonmoral_ratio,f_var,sent_var
0,0.185185,0.000000,0.000000,0.000000,0.000000,-0.291760,0.000000,0.000000,0.000000,0.000000,1.000000,0.006859,0.017025
1,0.038095,0.098627,0.036405,0.000000,0.000000,-0.106818,0.018554,0.062361,0.000000,0.000000,1.400000,0.001627,0.003877
2,0.071429,0.053635,0.000000,0.037500,0.000000,0.028342,-0.068767,0.000000,0.022872,0.000000,1.000000,0.001025,0.001498
3,0.058345,0.032072,0.016008,0.030516,0.012626,-0.105237,0.026819,-0.009292,-0.057323,0.015111,1.500000,0.000326,0.003005
4,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3857,0.078012,0.052659,0.026260,0.000000,0.000000,-0.022988,-0.000511,0.043429,0.000000,0.000000,1.000000,0.001156,0.000584
3858,0.000000,0.000000,0.000000,0.081633,0.000000,0.000000,0.000000,0.000000,-0.170325,0.000000,0.333333,0.001333,0.005802
3859,0.039031,0.019687,0.008003,0.040979,0.008475,-0.082759,-0.017446,0.028764,-0.040175,0.001157,1.750000,0.000257,0.001789
3860,0.037715,0.048842,0.034504,0.020918,0.012500,0.006129,0.001440,0.043606,-0.024904,-0.019173,1.111111,0.000205,0.000730


### Labelling conservatives and liberals by likelihood

Conservatives and liberals are assumed to use distinct foundations while expressing opinions:
1. Liberals: Care and fairness
2. Conservatives: Loyalty, authority and sanctity

In [None]:
#Retain only useful columns
clean_df = df[['care_p','fairness_p','loyalty_p','authority_p','sanctity_p']]

#create max_likelihood_dict for storing likelihood and ideology
column_names = ['Liberal_foundation_likelihood', 'Conservative_foundation_likelihood', 'Likelihood_distance', 'Ideology']
max_likelihood_dict = pd.DataFrame(columns=column_names)


#Get max likelihood from both liberal and conservative foundations
max_likelihood_dict['Liberal_foundation_likelihood'] = clean_df[['care_p','fairness_p']].max(axis = 1) 
max_likelihood_dict['Conservative_foundation_likelihood'] = clean_df[['loyalty_p','authority_p','sanctity_p']].max(axis = 1)

#Get the likelihood distance between the liberal and conservative foundations
max_likelihood_dict['Likelihood_distance'] = max_likelihood_dict['Liberal_foundation_likelihood'] - max_likelihood_dict['Conservative_foundation_likelihood']

#Label by the sign of the distance; a distance of 0 (including comments with no moral words) is labelled 'Liberal'
max_likelihood_dict['Ideology'] = max_likelihood_dict.apply(lambda x: 'Liberal' if x['Likelihood_distance'] >= 0 else 'Conservative', axis = 1)

max_likelihood_dict.head(5)

Unnamed: 0,Liberal_foundation_likelihood,Conservative_foundation_likelihood,Likelihood_distance,Ideology
0,0.185185,0.0,0.185185,Liberal
1,0.098627,0.036405,0.062222,Liberal
2,0.071429,0.0375,0.033929,Liberal
3,0.058345,0.030516,0.027829,Liberal
4,0.0,0.0,0.0,Liberal


In [None]:
max_likelihood_dict

Unnamed: 0,Liberal_foundation_likelihood,Conservative_foundation_likelihood,Likelihood_distance,Ideology
0,0.185185,0.000000,0.185185,Liberal
1,0.098627,0.036405,0.062222,Liberal
2,0.071429,0.037500,0.033929,Liberal
3,0.058345,0.030516,0.027829,Liberal
4,0.000000,0.000000,0.000000,Liberal
...,...,...,...,...
3857,0.078012,0.026260,0.051752,Liberal
3858,0.000000,0.081633,-0.081633,Conservative
3859,0.039031,0.040979,-0.001947,Conservative
3860,0.048842,0.034504,0.014338,Liberal
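
Row 4 in the output above has a likelihood of 0 for every foundation, yet it is labelled Liberal because the rule uses `>= 0`. If that is not the intended behaviour, one possible variant keeps ties separate. This is a sketch only: the function name `label_ideology` and the `'Undetermined'` label are hypothetical and are not part of the original analysis.

```python
def label_ideology(liberal_max, conservative_max, tie_label="Undetermined"):
    """Label a comment by the sign of the likelihood distance.

    A distance of exactly 0 (for example, no moral words at all)
    returns tie_label instead of defaulting to 'Liberal'.
    """
    distance = liberal_max - conservative_max
    if distance > 0:
        return "Liberal"
    if distance < 0:
        return "Conservative"
    return tie_label


print(label_ideology(0.185185, 0.0))
print(label_ideology(0.0, 0.081633))
print(label_ideology(0.0, 0.0))
```

The function takes plain numbers, so it could be applied row by row with `max_likelihood_dict.apply(lambda x: label_ideology(x['Liberal_foundation_likelihood'], x['Conservative_foundation_likelihood']), axis = 1)`.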


In [None]:
#Merge with original document
max_likelihood_dict = max_likelihood_dict.merge(full_data, left_index=True, right_index=True)

#rename columns
max_likelihood_dict = max_likelihood_dict.rename(columns = {0:'commentText'})

print("Full dataset:\n----------------------------------------------")
display(max_likelihood_dict)

#Create df with rows contained in train set
train_set = pd.read_csv(r"C:\Users\shrim\Desktop\Research\Social Media Analysis\publicsphere\data\sample\test_train_split\train.csv")
train_set = train_set[['ID']] #retain only comment ID

print("Train set subset:\n----------------------------------------------")
train_set_merge = train_set.merge(max_likelihood_dict, how='left', on='ID')
display(train_set_merge)

#export csv
max_likelihood_dict.to_csv('emfd_diversity.csv', index=False)
train_set_merge.to_csv('emfd_diversity_train_set.csv', index=False)

Full dataset:
----------------------------------------------


Unnamed: 0,Liberal_foundation_likelihood,Conservative_foundation_likelihood,Likelihood_distance,Ideology,commentText,ID
0,0.185185,0.000000,0.185185,Liberal,sad,UgyPHwv8G0cDE6-wEgl4AaABAg.8_0ZjJKSJty8_0kXGkAd2U
1,0.098627,0.036405,0.062222,Liberal,That's a vicious insult!!! What did a box of r...,Ugx2WXq9UdV8mPPjejJ4AaABAg.8yHCKV0Boe58yYRxEQEF45
2,0.071429,0.037500,0.033929,Liberal,@colbertlateshow The question has always been ...,1110578710648890000
3,0.058345,0.030516,0.027829,Liberal,Goya Solidar. So there are a few of us left. ...,UgwUPFScjJ0MCeaP2F54AaABAg.8lvp3fc9Euf8lvvgsUgEgV
4,0.000000,0.000000,0.000000,Liberal,hello hello \r\nNo-one else will hug him.,UgwWKCWtSJdFvjGHvTp4AaABAg.8kUC5dGrQ2H8kUDRihE2f3
...,...,...,...,...,...,...
3857,0.078012,0.026260,0.051752,Liberal,@FullFrontalSamB They can’t afford chemical pe...,1152219467579100000
3858,0.000000,0.081633,-0.081633,Conservative,@AC360 @CNN @andersoncooper It's not if..... h...,1085362296472430000
3859,0.039031,0.040979,-0.001947,Conservative,"Nah, they knew all about the cameras. I'm gue...",UghFY3QJ6nmT_ngCoAEC.7-H0Z7--wxd8goqpaPs-bl
3860,0.048842,0.034504,0.014338,Liberal,Alexander Hamilton. Troops are waiting in the ...,UgyWabsmmnq3zam4DgZ4AaABAg


Train set subset:
----------------------------------------------


Unnamed: 0,ID,Liberal_foundation_likelihood,Conservative_foundation_likelihood,Likelihood_distance,Ideology,commentText
0,UgyPHwv8G0cDE6-wEgl4AaABAg.8_0ZjJKSJty8_0kXGkAd2U,0.185185,0.000000,0.185185,Liberal,sad
1,Ugx2WXq9UdV8mPPjejJ4AaABAg.8yHCKV0Boe58yYRxEQEF45,0.098627,0.036405,0.062222,Liberal,That's a vicious insult!!! What did a box of r...
2,1110578710648890000,0.071429,0.037500,0.033929,Liberal,@colbertlateshow The question has always been ...
3,UgwUPFScjJ0MCeaP2F54AaABAg.8lvp3fc9Euf8lvvgsUgEgV,0.058345,0.030516,0.027829,Liberal,Goya Solidar. So there are a few of us left. ...
4,UgwWKCWtSJdFvjGHvTp4AaABAg.8kUC5dGrQ2H8kUDRihE2f3,0.000000,0.000000,0.000000,Liberal,hello hello \r\nNo-one else will hug him.
...,...,...,...,...,...,...
3084,1152219467579100000,0.078012,0.026260,0.051752,Liberal,@FullFrontalSamB They can’t afford chemical pe...
3085,1085362296472430000,0.000000,0.081633,-0.081633,Conservative,@AC360 @CNN @andersoncooper It's not if..... h...
3086,UghFY3QJ6nmT_ngCoAEC.7-H0Z7--wxd8goqpaPs-bl,0.039031,0.040979,-0.001947,Conservative,"Nah, they knew all about the cameras. I'm gue..."
3087,UgyWabsmmnq3zam4DgZ4AaABAg,0.048842,0.034504,0.014338,Liberal,Alexander Hamilton. Troops are waiting in the ...
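
A left merge on `ID` keeps every train-set row, but it fills the score columns with missing values for any ID that is absent from the full dataset, and it duplicates rows when an ID occurs more than once in the full dataset. The numeric IDs in the output above end in several zeros, which may indicate rounding and therefore possible collisions. The sketch below checks for both problems; the helper name `check_merge_keys` is hypothetical, and it works on any iterable of IDs.

```python
from collections import Counter


def check_merge_keys(left_ids, right_ids):
    """Return (unmatched, duplicated) for a left merge on a key column.

    unmatched:  left IDs that do not occur in the right table.
    duplicated: right IDs that occur more than once, in first-seen order.
    """
    left_ids = list(left_ids)
    right_ids = list(right_ids)
    right_set = set(right_ids)
    unmatched = [i for i in left_ids if i not in right_set]
    duplicated = [k for k, n in Counter(right_ids).items() if n > 1]
    return unmatched, duplicated


# Toy IDs for illustration only.
unmatched, duplicated = check_merge_keys(["a", "b", "x"], ["a", "b", "b", "c"])
print("Unmatched:", unmatched)
print("Duplicated:", duplicated)
```

In this notebook the call would be `check_merge_keys(train_set['ID'], max_likelihood_dict['ID'])`; two empty lists suggest the merge is one-to-one on the train-set rows.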
