#### Importing Libraries

In [1]:
import scipy.stats as sts
from scipy.stats import norm
import math
import numpy as np
import pandas as pd

### Problem Statement 1:
Is gender independent of education level? A random sample of 395 people was
surveyed, and each person was asked to report the highest education level they
had obtained. The survey data are summarized in the following table:

            High School  Bachelors Masters Ph.d. Total
    Female  60           54        46      41    201
    Male    40           44        53      57    194
    Total   100          98        99      98    395

Question: Are gender and education level dependent at the 5% level of significance? In
other words, given the data collected above, is there a relationship between the
gender of an individual and the level of education that they have obtained?
The null hypothesis is that gender and education level are independent.

In [2]:
data = {"High School" : [60, 40, 100],
        "Bachelors" : [54, 44, 98],
        "Masters" : [46, 53, 99],
        "Ph.d" : [41, 57, 98],
        "Total" : [201, 194, 395]
}
index = ["Female", "Male", "Total"]
df = pd.DataFrame(data, index = index)
print(df)

        High School  Bachelors  Masters  Ph.d  Total
Female           60         54       46    41    201
Male             40         44       53    57    194
Total           100         98       99    98    395


In [3]:
alpha = 0.05  # Significance level

# Degrees of freedom for a contingency table: (rows - 1) * (columns - 1)
dof = (2 - 1) * (4 - 1)

# Critical Chi Square value: the upper-tail cutoff, i.e. the (1 - alpha) quantile.
chi2 = sts.chi2.ppf(1 - alpha, dof)
print("Critical Chi Square value for degree of freedom 3 = ", round(chi2, 3))

Critical Chi Square value for degree of freedom 3 =  7.815


In [4]:
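# Calculated test statistic: sum of (observed - expected)^2 / expected over all cells,
# where expected = row total * column total / grand total.

X2 = 0
n = df.iat[2, 4]  # grand total (395)
for r in range(0, 2):        # rows: Female, Male
    for c in range(0, 4):    # columns: the four education levels
        expected = df.iat[r, 4] * df.iat[2, c] / n
        X2 = X2 + ((df.iat[r, c] - expected)**2) / expected
print(round(X2, 3))

8.006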


In [5]:
print("Calculated Chi Square > Critical Chi Square")
print("So, we reject the Null Hypothesis.")
print("Therefore, there is a relationship between the gender of an individual and the level of education that they have obtained.")

Calculated Chi Square > Critical Chi Square
So, we reject the Null Hypothesis.
Therefore, there is a relationship between the gender of an individual and the level of education that they have obtained.
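
As an independent cross-check of the hand calculation, the same test of independence can be run with `scipy.stats.chi2_contingency`. This is a minimal sketch, not part of the original solution: the `observed` array is typed in from the survey table (the "Total" row and column are left out), and the variable names are illustrative. For a table with more than one degree of freedom, SciPy applies no continuity correction, so the statistic is the plain Pearson chi-square.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts only; the "Total" row and column are excluded.
observed = np.array([
    [60, 54, 46, 41],   # Female
    [40, 44, 53, 57],   # Male
])

# Returns the Pearson chi-square statistic, its p-value, the degrees of
# freedom, and the table of expected counts under independence.
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

print("Chi-square statistic:", round(chi2_stat, 3))
print("Degrees of freedom:", dof)
print("p-value:", round(p_value, 4))
print("Reject H0 at the 5% level:", p_value < 0.05)
```

Comparing the p-value with 0.05 is equivalent to comparing the statistic with the critical value for the same degrees of freedom.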


### Problem Statement 2:
Using the following data, perform a one-way analysis of variance using α = .05. Write
up the results in APA format. <br>
[Group1: 51, 45, 33, 45, 67] <br>
[Group2: 23, 43, 23, 43, 45] <br>
[Group3: 56, 76, 74, 87, 56]

In [6]:
Group1 = [51, 45, 33, 45, 67]
Group2 = [23, 43, 23, 43, 45]
Group3 = [56, 76, 74, 87, 56]

# Perform the ANOVA
statistic, pvalue = sts.f_oneway(Group1,Group2,Group3)
print("F Statistic value {} , p-value {}".format(statistic,pvalue))
if pvalue < 0.05:
    print('True')
else:
    print('False')

F Statistic value 9.747205503009463 , p-value 0.0030597541434430556
True


Note: The p-value returned is 0.00306, which is below 0.05 (and also below 0.01), so we reject the null hypothesis that the three groups have the same population mean. In APA format: a one-way ANOVA showed a significant difference between the group means, F(2, 12) = 9.75, p = .003.
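
To show where the F statistic and the degrees of freedom in the APA write-up come from, the one-way ANOVA can also be computed by hand. The sketch below is illustrative (the names `ss_between`, `ss_within`, `df_between` and `df_within` are not from the original notebook): it splits the total variation into between-group and within-group sums of squares and takes the ratio of the two mean squares.

```python
import numpy as np
from scipy import stats

groups = [
    [51, 45, 33, 45, 67],
    [23, 43, 23, 43, 45],
    [56, 76, 74, 87, 56],
]

k = len(groups)                           # number of groups
n_total = sum(len(g) for g in groups)     # total number of observations
grand_mean = np.mean(np.concatenate(groups))

# Between-group sum of squares: spread of the group means around the grand mean.
ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of each observation around its group mean.
ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)

df_between = k - 1
df_within = n_total - k

# F is the ratio of the between-group and within-group mean squares.
f_stat = (ss_between / df_between) / (ss_within / df_within)
p_value = stats.f.sf(f_stat, df_between, df_within)  # upper-tail probability

print(f"F({df_between}, {df_within}) = {f_stat:.2f}, p = {p_value:.3f}")
```

The result should agree with `sts.f_oneway` up to floating-point rounding.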

### Problem Statement 3:
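Calculate the F test for the two samples 10, 20, 30, 40, 50 and 5, 10, 15, 20, 25.<br>
Two F statistics are computed below: the one-way ANOVA F, which compares the group means, and the ratio of the two sample variances.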

In [7]:
sts.f_oneway([10, 20, 30, 40, 50],[5,10,15, 20, 25])

F_onewayResult(statistic=3.6, pvalue=0.0943497728424377)

In [8]:
Group1 = [10, 20, 30, 40, 50]
Group2 = [5,10,15, 20, 25]

# Sample variances (ddof=1 divides the sum of squared deviations by n - 1)
var1 = np.var(Group1, ddof=1)
var2 = np.var(Group2, ddof=1)

# F statistic for comparing two variances: ratio of the sample variances
F_Test = var1/var2
print("F Test for given 10, 20, 30, 40, 50 and 5, 10, 15, 20, 25 is : ", F_Test)

F Test for given 10, 20, 30, 40, 50 and 5, 10, 15, 20, 25 is :  4.0
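
The variance ratio on its own does not say whether the two variances differ significantly. One way to finish the test, sketched below, is to compare the ratio with the F distribution on (n1 - 1, n2 - 1) degrees of freedom. The problem statement gives no significance level, so `alpha = 0.05` and the one-sided alternative (the first variance is larger) are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

group1 = [10, 20, 30, 40, 50]
group2 = [5, 10, 15, 20, 25]

# Ratio of the sample variances (ddof=1 gives the n - 1 denominator).
f_ratio = np.var(group1, ddof=1) / np.var(group2, ddof=1)

dfn = len(group1) - 1   # numerator degrees of freedom
dfd = len(group2) - 1   # denominator degrees of freedom

alpha = 0.05            # assumed significance level, not given in the problem
f_crit = stats.f.ppf(1 - alpha, dfn, dfd)         # upper-tail critical value
p_one_sided = stats.f.sf(f_ratio, dfn, dfd)       # P(F >= f_ratio) under H0

print("F ratio:", f_ratio)
print("Critical value:", round(f_crit, 3))
print("One-sided p-value:", round(p_one_sided, 3))
print("Reject equal variances:", f_ratio > f_crit)
```

Under these assumptions the ratio falls below the critical value, so the data do not give enough evidence that the first variance is larger.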
