### 1. Is gender independent of education level? A random sample of 395 people was surveyed, and each person was asked to report the highest education level they had obtained. The resulting data are summarized in the following table:


| | High School | Bachelors | Masters | Ph.d. | Total |
|---|---|---|---|---|---|
| Female | 60 | 54 | 46 | 41 | 201 |
| Male | 40 | 44 | 53 | 57 | 194 |
| Total | 100 | 98 | 99 | 98 | 395 |

Are gender and education level dependent at the 5% level of significance? In other
words, given the data collected above, is there a relationship between the gender of an
individual and the level of education they have obtained?



- H0: Gender and education level are independent
- H1: Gender and education level are dependent


In [57]:
import numpy as np
import pandas as pd
import scipy.stats as stats

female_list = [60,54,46,41]
male_list = [40,44,53,57]
column_list=['High School', 'Bachelors', 'Masters', 'Ph.d.']
data_directory={"male":male_list,"female":female_list}
df=pd.DataFrame(data_directory,index=column_list).T
df

,High School,Bachelors,Masters,Ph.d.
male,40,44,53,57
female,60,54,46,41


#### Chi Square Statistic

- To get the expected count for a cell, multiply the row total for that cell by the column total for that cell and then divide by the total number of observations. 
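Written as a formula, the rule above and the statistic built from it look as follows. The symbols are notation introduced here for illustration: $R_i$ is the total of row $i$, $C_j$ is the total of column $j$, $N$ is the number of observations, and $O_{ij}$ and $E_{ij}$ are the observed and expected counts in cell $(i, j)$.

```latex
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
E_{\text{male},\,\text{High School}} = \frac{194 \times 100}{395} \approx 49.11,
\qquad
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
```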

In [58]:
df2 = pd.concat([df,pd.DataFrame(df.sum(axis=0),columns=['col_total']).T])
df2=pd.concat([df2,pd.DataFrame(df2.sum(axis=1),columns=['row_total'])],axis=1)

In [65]:
df2

,High School,Bachelors,Masters,Ph.d.,row_total
male,40,44,53,57,194
female,60,54,46,41,201
col_total,100,98,99,98,395


In [63]:
expected =  np.outer(df2["row_total"][0:2],
                     df2.loc["col_total"][0:4]) / 395.0
expected = pd.DataFrame(expected)
expected.columns = ["High School","Bachelors","Masters","Ph.d."]
expected.index = ["male","female"]
expected

,High School,Bachelors,Masters,Ph.d.
male,49.113924,48.131646,48.622785,48.131646
female,50.886076,49.868354,50.377215,49.868354


In [64]:
# Sum (observed - expected)^2 / expected over every cell of the table
chi_squared_stat = (((df-expected)**2)/expected).sum().sum()

print(chi_squared_stat)

8.006066246262538


- The degrees of freedom for a test of independence equal (number of rows - 1) x (number of columns - 1). Here we have a 2x4 table, so df = (2 - 1) x (4 - 1) = 3.

In [67]:
critical_value = stats.chi2.ppf(q = 0.95,df = 3)

print("Critical value")
print(critical_value)

p_value = 1 - stats.chi2.cdf(x=chi_squared_stat,df=3)
print("P value")
print(p_value)

Critical value
7.814727903251179
P value
0.04588650089174717


- The chi-square statistic is approximately 8.006 with 3 degrees of freedom, and the p-value is approximately 0.046.

- The critical value with 3 degrees of freedom is 7.815. Since 8.006 > 7.815 (equivalently, the p-value of 0.046 is below 0.05), we reject the null hypothesis and conclude that gender and education level are dependent at the 5% level of significance.
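As a cross-check on the step-by-step calculation, the same test can be run in a single call with `scipy.stats.chi2_contingency`, which returns the statistic, the p-value, the degrees of freedom, and the expected counts. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import numpy as np
from scipy import stats

# Observed counts from the survey table
# (rows: male, female; columns: High School, Bachelors, Masters, Ph.d.).
observed = np.array([
    [40, 44, 53, 57],
    [60, 54, 46, 41],
])

# Returns the chi-square statistic, p-value, degrees of freedom,
# and the expected counts under independence.
chi2_stat, p_value, dof, expected_counts = stats.chi2_contingency(observed)

print(f"chi-square = {chi2_stat:.3f}, p = {p_value:.4f}, dof = {dof}")
print(np.round(expected_counts, 2))
```

SciPy applies the Yates continuity correction only when there is 1 degree of freedom, so for this 2x4 table the result should agree with the manual calculation.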

### 2. Using the following data, perform a one-way analysis of variance using α = .05. Write up the results in APA format.

[Group1: 51, 45, 33, 45, 67] [Group2: 23, 43, 23, 43, 45] [Group3: 56, 76, 74, 87, 56]

In [69]:
import scipy.stats as stats
Group1 = [51, 45, 33, 45, 67]
Group2 = [23, 43, 23, 43, 45]
Group3 = [56, 76, 74, 87, 56]
# Perform the ANOVA
statistic, pvalue = stats.f_oneway(Group1,Group2,Group3)
print("F Statistic value {} , p-value {}".format(statistic,pvalue))

F Statistic value 9.747205503009463 , p-value 0.0030597541434430556


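- The p-value returned is 0.00306, which is below α = .05, so we reject the null hypothesis that all three group means are equal.
- The result suggests that at least one population mean differs from the others; the test alone does not say which groups differ.
- In APA format: a one-way ANOVA showed a significant difference between the groups, F(2, 12) = 9.75, p = .003.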
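To see where the F statistic and its degrees of freedom come from, the one-way ANOVA can be rebuilt from the between-group and within-group sums of squares. This is an illustrative sketch, not how `f_oneway` is implemented internally; all variable names are chosen here.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([51, 45, 33, 45, 67]),   # Group1
    np.array([23, 43, 23, 43, 45]),   # Group2
    np.array([56, 76, 74, 87, 56]),   # Group3
]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = k - 1
df_within = n_total - k

# F is the ratio of the two mean squares.
f_manual = (ss_between / df_between) / (ss_within / df_within)
# Upper-tail probability of the F distribution gives the p-value.
p_manual = stats.f.sf(f_manual, df_between, df_within)

print(f"F({df_between}, {df_within}) = {f_manual:.4f}, p = {p_manual:.5f}")
```

The degrees of freedom computed here (2 and 12) are the ones reported in an APA-style write-up of the result.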

### 3. Calculate the F test for the two samples 10, 20, 30, 40, 50 and 5, 10, 15, 20, 25.

In [70]:
import scipy.stats as stats
stats.f_oneway([10, 20, 30, 40, 50],[5,10,15, 20, 25])

F_onewayResult(statistic=3.6, pvalue=0.0943497728424377)

In [75]:
mean_1 = np.mean([10, 20, 30, 40, 50])
mean_2 = np.mean([5,10,15, 20, 25])

add1 = 0
add2 = 0
for items in [10, 20, 30, 40, 50]:
    add1 += (items - mean_1)**2
for items in [5,10,15, 20, 25]:
    add2 += (items - mean_2)**2
var1 = add1/4
var2 = add2/4

F_Test = var1/var2
print("F Test result is : ",F_Test)

F Test result is :  4.0
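The two results in this section answer different questions: `f_oneway` is a one-way ANOVA that compares the sample means (F = 3.6), whereas the hand calculation is the ratio of the two sample variances (F = 4.0). A more compact sketch of the variance-ratio version, with a one-sided p-value from the F distribution, might look like this. It assumes the larger variance goes in the numerator and that both samples come from normal populations; the variable names are chosen here for illustration.

```python
import numpy as np
from scipy import stats

sample_1 = [10, 20, 30, 40, 50]
sample_2 = [5, 10, 15, 20, 25]

# ddof=1 gives the unbiased sample variance (divide by n - 1).
var_1 = np.var(sample_1, ddof=1)
var_2 = np.var(sample_2, ddof=1)

# Variance-ratio F statistic with (n1 - 1, n2 - 1) degrees of freedom.
f_ratio = var_1 / var_2
df_1 = len(sample_1) - 1
df_2 = len(sample_2) - 1

# One-sided p-value: probability of a ratio at least this large
# if the two population variances were equal.
p_one_sided = stats.f.sf(f_ratio, df_1, df_2)

print(f"F({df_1}, {df_2}) = {f_ratio:.1f}, one-sided p = {p_one_sided:.4f}")
```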
