### Coursera
#### Wesleyan University Data Analysis and Interpretation Specialization
###### Course 2: Data Analysis Tools
###### Week 4: Testing Moderation
###### Author: Matt Clark
---

#### Introduction:
For our final assignment in the Data Analysis Tools course, we test *_moderation_* of variables of interest, a.k.a. *_statistical interaction._* We continue working with the Outlook On Life codebook, applying our newly learned tool in an attempt to glean further insight from our data.

First, we look for an association between the variables *_W1_F6_* (Economic Optimism: How far along the road to your American Dream do you think you will ultimately get, on a 10-point scale where 1 is not far at all and 10 is nearly there?) and *_W1_A5A_* (Who did you vote for?).

```python
# import libraries

import pandas as pd
import numpy as np
import seaborn as sb
import scipy.stats
import matplotlib.pyplot as plt
```

```python
# generate dataframe

df = pd.read_csv('mycodebook.csv', low_memory=False)
```

```python
# condition/subset data

# keep respondents who reported a vote (W1_A5A >= 1) and whose
# W1_C1 value is 1-3 (Republican, Democrat, Independent)
df1 = df[(df['W1_A5A'] >= 1) & (df['W1_C1'] >= 1) & (df['W1_C1'] <= 3)]
```
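The crosstab that follows still contains a row for W1_F6 = -1, which is likely a non-response code rather than a point on the 1-10 scale (an assumption; check the codebook). A minimal sketch of also filtering that out, using hypothetical toy rows as a stand-in for `mycodebook.csv`. Applying this filter to the real data would change the counts and test statistics reported in this notebook, which reflect the original filter.

```python
import pandas as pd

# Hypothetical toy rows standing in for mycodebook.csv. Assumption: a
# negative code such as -1 marks a refused/missing answer, not a score.
df = pd.DataFrame({
    "W1_F6": [-1, 3, 7, 10, 5],
    "W1_A5A": [1, 2, 2, 1, -1],
    "W1_C1": [1, 2, 3, 2, 1],
})

# Keep rows with a recorded vote, W1_C1 in 1-3, and a valid 1-10 optimism score
df1 = df[
    (df["W1_A5A"] >= 1)
    & df["W1_C1"].between(1, 3)
    & df["W1_F6"].between(1, 10)
]
print(df1)
```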

```python
# contingency table of observed counts
ct1 = pd.crosstab(df1['W1_F6'], df1['W1_A5A'])
print(ct1)
```

```
W1_A5A  1.0  2.0  3.0
W1_F6                
-1        4   23    1
 1        5   29    0
 2        4   30    0
 3       17   47    1
 4       19   80    1
 5       41  200    2
 6       47  191    3
 7       72  276    6
 8       80  211    0
 9       34  101    3
 10      45  116    3
```


```python
# column percentages
colsum = ct1.sum(axis=0)
colpct = ct1 / colsum
print(colpct)
```

```
W1_A5A       1.0       2.0   3.0
W1_F6                           
-1      0.010870  0.017638  0.05
 1      0.013587  0.022239  0.00
 2      0.010870  0.023006  0.00
 3      0.046196  0.036043  0.05
 4      0.051630  0.061350  0.05
 5      0.111413  0.153374  0.10
 6      0.127717  0.146472  0.15
 7      0.195652  0.211656  0.30
 8      0.217391  0.161810  0.00
 9      0.092391  0.077454  0.15
 10     0.122283  0.088957  0.15
```
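pandas can also produce these column percentages directly. A small sketch on hypothetical toy data (only the column names come from the notebook) checking that `pd.crosstab(..., normalize="columns")` matches the manual division by column totals used above:

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the notebook
df1 = pd.DataFrame({
    "W1_F6": [1, 1, 2, 2, 2, 3],
    "W1_A5A": [1, 2, 1, 2, 2, 2],
})

ct1 = pd.crosstab(df1["W1_F6"], df1["W1_A5A"])

# Manual approach: divide each count by its column total
colpct_manual = ct1 / ct1.sum(axis=0)

# Built-in alternative: crosstab normalizes over each column itself
colpct_builtin = pd.crosstab(df1["W1_F6"], df1["W1_A5A"], normalize="columns")

print(colpct_builtin)
```

The built-in option avoids keeping a separate `colsum` variable, and each column of the result sums to 1.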


```python
# chi-square test of independence
print('chi-square value, p value, degrees of freedom, expected counts')
cs1 = scipy.stats.chi2_contingency(ct1)
print(cs1)
```

```
chi-square value, p value, degrees of freedom, expected counts
(27.590043992694298, 0.11946460890382729, 20, array([[  6.08983452,  21.57919622,   0.33096927],
       [  7.39479905,  26.20330969,   0.40189125],
       [  7.39479905,  26.20330969,   0.40189125],
       [ 14.13711584,  50.09456265,   0.76832151],
       [ 21.74940898,  77.06855792,   1.1820331 ],
       [ 52.85106383, 187.27659574,   2.87234043],
       [ 52.41607565, 185.73522459,   2.84869976],
       [ 76.9929078 , 272.82269504,   4.18439716],
       [ 63.29078014, 224.26950355,   3.43971631],
       [ 30.0141844 , 106.35460993,   1.63120567],
       [ 35.66903073, 126.39243499,   1.93853428]]))
```
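The test can be re-checked from the printed observed counts alone. This sketch re-enters the W1_F6 × W1_A5A table, confirms the degrees of freedom, (11 − 1) × (3 − 1) = 20, and counts how many expected cells fall below 5. The 20% threshold used here is a common rule of thumb for when the chi-square approximation becomes questionable, not something stated in the course material.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts copied from the W1_F6 x W1_A5A contingency table
# (rows: W1_F6 = -1, 1, ..., 10; columns: W1_A5A = 1, 2, 3)
observed = np.array([
    [4, 23, 1], [5, 29, 0], [4, 30, 0], [17, 47, 1], [19, 80, 1],
    [41, 200, 2], [47, 191, 3], [72, 276, 6], [80, 211, 0],
    [34, 101, 3], [45, 116, 3],
])

chi2, p, dof, expected = chi2_contingency(observed)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
rows, cols = observed.shape
assert dof == (rows - 1) * (cols - 1)

# Rule of thumb: be cautious when more than 20% of expected counts are below 5
sparse = int((expected < 5).sum())
share = sparse / expected.size
print(f"chi2={chi2:.2f}, p={p:.3f}, dof={dof}")
print(f"{sparse} of {expected.size} expected counts < 5 ({share:.0%})")
```

With these counts, every expected count in the third W1_A5A column is below 5 (11 of 33 cells), so the non-significant p value of about 0.12 should be read with some caution. Merging or dropping the sparse third category is one possible remedy.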


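Now we investigate whether the variable *W1_C1* moderates the association between W1_F6 and W1_A5A. To do so, we split the sample by W1_C1 level (Republicans, Democrats, Independents) and compute the Pearson correlation between W1_F6 and W1_A5A within each subgroup.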

```python
# split the sample by W1_C1 level
sub1 = df1[df1['W1_C1'] == 1]  # Republicans
sub2 = df1[df1['W1_C1'] == 2]  # Democrats
sub3 = df1[df1['W1_C1'] == 3]  # Independents
```
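Because W1_A5A is a categorical vote choice, a Pearson correlation treats its codes 1, 2, 3 as if they were ordered quantities. One alternative that stays consistent with the chi-square test used for the full sample is to repeat that test within each W1_C1 subgroup. A minimal sketch, where the helper `chi2_by_group` and the toy data are hypothetical:

```python
import pandas as pd
from scipy.stats import chi2_contingency


def chi2_by_group(data, x, y, moderator):
    """Run a chi-square test of x vs. y separately within each moderator level.

    Returns {level: (chi2, p, dof)}. Levels whose table has fewer than two
    rows or columns are skipped, since the test has zero degrees of freedom there.
    """
    results = {}
    for level, group in data.groupby(moderator):
        table = pd.crosstab(group[x], group[y])
        if min(table.shape) < 2:
            continue
        # note: for 2 x 2 tables scipy applies Yates' continuity correction by default
        chi2, p, dof, _ = chi2_contingency(table)
        results[level] = (chi2, p, dof)
    return results


# Hypothetical toy data; only the column names come from the notebook
df1 = pd.DataFrame({
    "W1_F6":  [1, 1, 2, 2, 1, 2, 1, 2, 5, 5, 6, 6],
    "W1_A5A": [1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2],
    "W1_C1":  [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
})

for level, (chi2, p, dof) in chi2_by_group(df1, "W1_F6", "W1_A5A", "W1_C1").items():
    print(f"W1_C1 = {level}: chi2 = {chi2:.3f}, p = {p:.3f}, dof = {dof}")
```

On the real data each subgroup table would be 11 × 3 built from fewer respondents, so the sparse expected-count caveat for the full sample likely applies even more strongly within subgroups.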

```python
print('Association between W1_F6 and W1_A5A for Republicans')
print(scipy.stats.pearsonr(sub1['W1_F6'], sub1['W1_A5A']))
print()
print('Association between W1_F6 and W1_A5A for Democrats')
print(scipy.stats.pearsonr(sub2['W1_F6'], sub2['W1_A5A']))
print()
print('Association between W1_F6 and W1_A5A for Independents')
print(scipy.stats.pearsonr(sub3['W1_F6'], sub3['W1_A5A']))
```

```
Association between W1_F6 and W1_A5A for Republicans
(0.012067822822930115, 0.8429525398567782)

Association between W1_F6 and W1_A5A for Democrats
(-0.053446433505905756, 0.08721866846736208)

Association between W1_F6 and W1_A5A for Independents
(-0.034155825362122236, 0.498482231284581)
```
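The three hand-built subsets and `pearsonr` calls can also be written as a single `groupby` loop, which scales to any number of W1_C1 levels. A sketch on hypothetical toy data, reusing the party labels from the printout (W1_C1 = 1, 2, 3 as Republicans, Democrats, Independents):

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical toy data; only the column names come from the notebook
df1 = pd.DataFrame({
    "W1_F6":  [2, 4, 6, 8, 3, 5, 7, 9, 1, 4, 6, 10],
    "W1_A5A": [1, 2, 2, 3, 3, 2, 2, 1, 1, 1, 2, 2],
    "W1_C1":  [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
})

labels = {1: "Republicans", 2: "Democrats", 3: "Independents"}

# One pearsonr call per W1_C1 level instead of three hand-built subsets
results = {}
for level, group in df1.groupby("W1_C1"):
    r, p = pearsonr(group["W1_F6"], group["W1_A5A"])
    results[level] = (r, p)
    print(f"W1_F6 vs W1_A5A for {labels.get(level, level)}: r = {r:.3f}, p = {p:.3f}")
```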


#### Summary:
