<a href="https://www.kaggle.com/code/jgiaquinto/mturk-final-project?scriptVersionId=193110155" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np



## This cell imports the survey results collected from Amazon's MTurk crowdsourcing marketplace. 

mturk_data = pd.read_csv("/kaggle/input/mturk-data/data_mturk.csv")



## The MTurk workers were given an audio clip and asked to transcribe the recording.
## The "baseline" group was offered 15 cents for their work.
## The "bonus" (treatment) group was offered the same 15 cents plus a $5 bonus for the most accurate transcription.

treatment = mturk_data.loc[mturk_data["Treatment.1"] == "bonus"]


baseline = mturk_data.loc[mturk_data["Treatment.1"] == "baseline"]



## The lines below calculate the observed test statistic for each survey variable:
## the mean of the bonus group minus the mean of the baseline group.

observed_test_stat_count = np.mean(treatment["Word count"]) - np.mean(baseline["Word count"])


observed_test_stat_quality = np.mean(treatment["Quality_Grammar"]) - np.mean(baseline["Quality_Grammar"])


observed_test_stat_typo = np.mean(treatment["Quality_Typo"]) - np.mean(baseline["Quality_Typo"])
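
## The three statistics above share one pattern, so a small helper can express it once.
## This is a minimal sketch, not part of the original analysis: the name diff_in_means
## and the toy DataFrame are assumptions for illustration, and the toy values are not survey data.

```python
import pandas as pd


def diff_in_means(df, value_col, group_col="Treatment.1",
                  treated="bonus", control="baseline"):
    """Mean of value_col in the treated group minus the mean in the control group."""
    group_means = df.groupby(group_col)[value_col].mean()
    return group_means[treated] - group_means[control]


# Toy data for illustration only; these are not values from the survey.
demo = pd.DataFrame({
    "Treatment.1": ["bonus", "bonus", "baseline", "baseline"],
    "Word count": [12, 14, 9, 11],
})

# Bonus mean is 13, baseline mean is 10.
demo_stat = diff_in_means(demo, "Word count")
```

## With the real data, the same call would take mturk_data and a column name such as "Quality_Typo".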




## Preview of the shuffling step: sample(frac = 1) returns every row of a column in random order.

mturk_data["Word count"].sample(frac = 1)



mturk_data["Quality_Typo"].sample(frac = 1)



mturk_data["Quality_Grammar"].sample(frac = 1)



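## perm returns a shuffled copy of a column. Resetting the index lets the shuffled
## values line up row by row with the original treatment labels.
def perm(column) :
    return column.sample(frac = 1).reset_index(drop = True)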



## This loop randomly permutes the Word count data, computes the bonus-minus-baseline
## difference in means for each permutation, and stores the results in an array called sim_test_stat.


sim_test_stat = np.array([])

reps = 10000

for i in range(reps) :
    perm_info = perm(mturk_data["Word count"])
    df = pd.DataFrame({"Permuted Word Count" : perm_info, "Treatment" : mturk_data["Treatment.1"]})
    bonus_incentive = df.loc[df["Treatment"] == "bonus", "Permuted Word Count"]
    no_incentive = df.loc[df["Treatment"] == "baseline", "Permuted Word Count"]
    stat = np.mean(bonus_incentive) - np.mean(no_incentive)
    sim_test_stat = np.append(sim_test_stat, stat)
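
## The same simulation can be written with a seeded NumPy random generator and a preallocated
## array, which makes runs reproducible and avoids growing the array on every pass.
## This is a sketch under assumptions: permutation_diffs, its seed argument, and the toy
## inputs are hypothetical names and values, not part of the original notebook.

```python
import numpy as np


def permutation_diffs(values, is_treated, reps=10_000, seed=0):
    """Shuffle values reps times and return the treated-minus-control mean differences."""
    values = np.asarray(values, dtype=float)
    is_treated = np.asarray(is_treated, dtype=bool)
    rng = np.random.default_rng(seed)  # fixed seed so the run can be repeated
    diffs = np.empty(reps)             # allocate once instead of appending
    for i in range(reps):
        shuffled = rng.permutation(values)  # shuffled copy; labels stay in place
        diffs[i] = shuffled[is_treated].mean() - shuffled[~is_treated].mean()
    return diffs


# Toy inputs for illustration only; these are not values from the survey.
demo_values = [12, 14, 9, 11, 10, 13]
demo_is_treated = [True, True, True, False, False, False]
demo_diffs = permutation_diffs(demo_values, demo_is_treated, reps=1000)
```

## With the real data, values would be mturk_data["Word count"] and is_treated would be
## mturk_data["Treatment.1"] == "bonus".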




sim_test_stat



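## Here we conduct the permutation test to see whether the bonus treatment has a statistically significant effect on word count.
## The p-value below is one-sided (lower tail): the share of permuted statistics at or below the observed difference.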

p_value = np.count_nonzero(sim_test_stat <= observed_test_stat_count) / reps
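
## Which tail to count depends on the alternative hypothesis. The sketch below compares the
## lower-tail, upper-tail, and two-sided versions of a permutation p-value side by side.
## The function name permutation_p_values and the toy array are assumptions for illustration,
## and the two-sided version assumes the permutation distribution is centered near zero.

```python
import numpy as np


def permutation_p_values(sim_stats, observed):
    """Return lower-tail, upper-tail, and two-sided permutation p-values."""
    sim_stats = np.asarray(sim_stats, dtype=float)
    return {
        # Share of simulated statistics at or below the observed one.
        "lower": np.mean(sim_stats <= observed),
        # Share of simulated statistics at or above the observed one.
        "upper": np.mean(sim_stats >= observed),
        # Share of simulated statistics at least as far from zero as the observed one.
        "two_sided": np.mean(np.abs(sim_stats) >= abs(observed)),
    }


# Toy simulated statistics for illustration only; not results from the survey.
demo_sim = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
demo_p = permutation_p_values(demo_sim, observed=1.0)
```

## If the question is whether the bonus changes word count in either direction, the two-sided
## value is the usual choice; if the question is whether it raises word count, the upper tail applies.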



if p_value > .05 :
    print("Since the p-value is higher than our level of significance, we fail to reject the null hypothesis.")
else :
    print("Since the p-value is lower than our level of significance, we reject the null hypothesis.")

Since the p-value is higher than our level of significance, we fail to reject the null hypothesis.
