# Training to Spot Fake News
### Research Question
Is training developed to inoculate people against fake news effective? We ran an experiment that tested two training methods designed to help people spot fake news.
### Method
Participants were randomly assigned to one of three conditions. Participants in the first condition played the [Bad News Game](https://getbadnews.com/#intro), designed to "vaccinate the world against disinformation". Participants in the second condition watched the [video](https://www.factcheck.org/2016/12/video-spotting-fake-news/) "How to Spot Fake News", created by [factcheck.org](https://www.factcheck.org/). The third condition served as a control and received no training.

Participants were then asked to classify 20 articles into one of five categories: fake news, satire, extreme bias, political, or credible.  


### Load Libraries and Packages

In [2]:
import numpy as np
import pandas as pd

### Extract, Transform, Load 
Take the original Qualtrics .csv file, remove rejected subjects, complete the transformations, and save the cleaned data to a .csv file.

In [3]:
# Read in original Qualtrics .csv file.
df_raw = pd.read_csv('data/Spot Fake News_July 28, 2019_13.40.csv', skiprows=[1,2])
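
The `skiprows=[1,2]` argument keeps the first line of the file as the header and drops the two lines that follow it. Qualtrics exports typically place two extra descriptive rows (question text and import metadata) beneath the column names, and the call above appears to assume that layout. The sketch below shows the effect on a small in-memory file; the column names and values are made up for illustration and are not the study's data.

```python
import io

import pandas as pd

# Hypothetical miniature export: one header row, two descriptive rows, then data.
sample_csv = io.StringIO(
    "mTurkID,Condition,Article01\n"
    "Worker ID,Assigned condition,Article 1 rating\n"
    "ImportId1,ImportId2,ImportId3\n"
    "W001,T1,2\n"
    "W002,C,1\n"
)

# Row 0 becomes the header; rows 1 and 2 (zero-indexed file lines) are skipped.
df_demo = pd.read_csv(sample_csv, skiprows=[1, 2])
print(df_demo)
```

Skipping the descriptive rows also lets pandas infer a numeric type for the rating column, which the later equality checks against integer codes rely on.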

In [8]:
# Remove rejected responses from data
df_raw = df_raw[df_raw.Consent != 0] # Take out surveys where participant did not consent
df_raw = df_raw[df_raw.Finished != 0] # Take out incomplete surveys
df_raw = df_raw[df_raw.mTurkID != 'asd'] # Nonsensical text response/incorrect completion code
df_raw = df_raw[df_raw.mTurkID != 'A1TXOZQU1O4F0N'] # Response not in mTurk
df_raw = df_raw[df_raw.mTurkID != 'A5LYLHG880ABE'] # Worker repeated survey
df_raw = df_raw[df_raw.mTurkID != 'AZM3H44W1D65P'] # Response not in mTurk
df_raw = df_raw[df_raw.mTurkID != 'A1YC558J4E5KZ'] # Worker repeated survey
df_raw = df_raw[df_raw.mTurkID != 'AK2C9AX5QJWUU'] # Incorrect completion code
df_raw = df_raw[df_raw.mTurkID != 'A110KENBXU7SUJ'] # Incorrect completion code
df_raw = df_raw[df_raw.mTurkID != 'AJ60KRY0FTB1F'] # Incorrect completion code
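
The exclusions above can also be expressed as one boolean mask, which keeps each rejected ID and its reason together in a single place. This is a minimal sketch on made-up data, not the notebook's own method: the worker IDs, the `rejected_ids` dictionary, and the `df_demo` frame are hypothetical.

```python
import pandas as pd

# Made-up responses; these IDs are placeholders, not real worker IDs.
df_demo = pd.DataFrame({
    'mTurkID': ['W001', 'W002', 'W003', 'W004'],
    'Consent': [1, 1, 0, 1],
    'Finished': [1, 1, 1, 0],
})

# Hypothetical exclusion list, each ID paired with the reason for rejection.
rejected_ids = {
    'W002': 'Worker repeated survey',
}

# Keep rows that consented, finished, and are not on the exclusion list.
keep = (
    (df_demo['Consent'] != 0)
    & (df_demo['Finished'] != 0)
    & ~df_demo['mTurkID'].isin(list(rejected_ids))
)
df_kept = df_demo[keep]
print(df_kept['mTurkID'].tolist())
```

Building the mask once also makes it easy to report how many responses each criterion removed, for example with `(~keep).sum()`.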

In [13]:
# Create data frame for cleaned data
col_names = ['ID', 'Cond'] + [f'Cor{i:02d}' for i in range(1, 21)] + ['TotCor']
df_clean = pd.DataFrame(columns=col_names)
df_clean.ID = df_raw.mTurkID
df_clean.Cond = df_raw.Condition

##### Variable Description
ID: Subject ID (mTurk ID)

Cond: Assigned experimental condition. T1 = training game, T2 = training video, C = control (no training).

Cor01-Cor20: Whether the participant categorized each article (01-20) correctly. 1 if categorized correctly, 0 otherwise.

TotCor: Total correct out of 20 articles classified. 

In [16]:
# Record whether each article was coded correctly, then total the correct codes.
# Correct response code by article: 01-08 -> 2, 09-11 -> 1, 12-14 -> 3, 15-17 -> 4, 18-20 -> 5
answer_key = [2]*8 + [1]*3 + [3]*3 + [4]*3 + [5]*3
cor_cols = []
for article_num, correct_code in enumerate(answer_key, start=1):
    cor_col = f'Cor{article_num:02d}'
    df_clean[cor_col] = np.where(df_raw[f'Article{article_num:02d}'] == correct_code, 1, 0)
    cor_cols.append(cor_col)
df_clean['TotCor'] = df_clean[cor_cols].sum(axis=1)
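
With `TotCor` computed, condition-level summaries can be obtained in one step with `groupby`, which reports group sizes and spread alongside the mean. The sketch below uses made-up scores purely for illustration; the IDs and values are hypothetical and are not the study's results.

```python
import pandas as pd

# Made-up scores for illustration only; not the study's data.
df_demo = pd.DataFrame({
    'ID': ['W001', 'W002', 'W003', 'W004', 'W005', 'W006'],
    'Cond': ['C', 'C', 'T1', 'T1', 'T2', 'T2'],
    'TotCor': [8, 10, 7, 9, 6, 12],
})

# One row per condition: number of participants, mean, and standard deviation.
summary = df_demo.groupby('Cond')['TotCor'].agg(['count', 'mean', 'std'])
print(summary)
```

Reporting the count and standard deviation next to each mean helps a reader judge whether a difference between conditions is large relative to the variation within them.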

In [22]:
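# Calculate mean accuracy by condition.
# numeric_only=True skips the text columns (ID, Cond), which newer pandas versions no longer drop silently.
print('Mean Accuracy rates')

Control_Check = df_clean['Cond'] == 'C'
Control_Res = df_clean[Control_Check]
print('Control: ', Control_Res.mean(numeric_only=True))

T1_Check = df_clean['Cond'] == 'T1'
T1_Res = df_clean[T1_Check]
print('T1: ', T1_Res.mean(numeric_only=True))

T2_Check = df_clean['Cond'] == 'T2'
T2_Res = df_clean[T2_Check]
print('T2: ', T2_Res.mean(numeric_only=True))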

Mean Accuracy rates
Control:  Cor01     0.340426
Cor02     0.180851
Cor03     0.436170
Cor04     0.297872
Cor05     0.265957
Cor06     0.617021
Cor07     0.563830
Cor08     0.574468
Cor09     0.744681
Cor10     0.585106
Cor11     0.585106
Cor12     0.287234
Cor13     0.436170
Cor14     0.308511
Cor15     0.159574
Cor16     0.276596
Cor17     0.180851
Cor18     0.553191
Cor19     0.340426
Cor20     0.372340
TotCor    8.106383
dtype: float64
T1:  Cor01     0.391753
Cor02     0.164948
Cor03     0.536082
Cor04     0.298969
Cor05     0.134021
Cor06     0.567010
Cor07     0.412371
Cor08     0.608247
Cor09     0.701031
Cor10     0.618557
Cor11     0.556701
Cor12     0.381443
Cor13     0.381443
Cor14     0.350515
Cor15     0.082474
Cor16     0.237113
Cor17     0.216495
Cor18     0.484536
Cor19     0.278351
Cor20     0.195876
TotCor    7.597938
dtype: float64
T2:  Cor01     0.405941
Cor02     0.247525
Cor03     0.455446
Cor04     0.336634
Cor05     0.178218
Cor06     0.594059
Cor07     0.564356