**Problem Statement 1:**

Is gender independent of education level? A random sample of 395 people was surveyed, and each person was asked to report the highest level of education they had obtained. The survey results are summarized in the following table:

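|        | High School | Bachelors | Masters | PhD | Total |
|--------|-------------|-----------|---------|-----|-------|
| Female | 60          | 54        | 46      | 41  | 201   |
| Male   | 40          | 44        | 53      | 57  | 194   |
| Total  | 100         | 98        | 99      | 98  | 395   |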
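Under the null hypothesis of independence, each expected count is the product of its row and column totals divided by the grand total. A short worked example for the Female / High School cell, using the totals above (the symbols $R_i$, $C_j$ and $N$ for row total, column total and grand total are our own notation):

$$
\begin{aligned}
E_{ij} &= \frac{R_i \, C_j}{N}, \qquad E_{11} = \frac{201 \times 100}{395} \approx 50.886 \\
\chi^2 &= \sum_{i=1}^{2}\sum_{j=1}^{4} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad \text{df} = (r-1)(c-1) = (2-1)(4-1) = 3
\end{aligned}
$$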

Question: Are gender and education level dependent at the 5% level of significance? In other words, given the data collected above, is there a relationship between an individual's gender and the highest level of education they have obtained?

import scipy.stats
import pandas as pd
from scipy.stats import chi2_contingency

# 1. Hypotheses
# H0: gender and education level are independent
# H1: gender and education level are dependent
# Rows: Female, Male; columns: High School, Bachelors, Masters, PhD
observed_table = pd.DataFrame([[60, 54, 46, 41], [40, 44, 53, 57]])
print("The table of observed frequencies is:")
print(observed_table)

# 2. Test statistic
# Chi-square statistic = sum over all cells of (O - E)^2 / E, where O is the observed frequency
# and E is the expected frequency under H0, computed as
# E = (row total * column total) / sample size, e.g. 201 * 100 / 395 = 50.88607595
chi2, p, dof, expected = chi2_contingency(observed_table)
print("\nThe table of expected counts is:")
print(expected)
print("\nChi-square value is: " + str(chi2))

# 3. Decision at the 5% level of significance
# Degrees of freedom (dof) = (r - 1) * (c - 1) = (2 - 1) * (4 - 1) = 3
cv = scipy.stats.chi2.ppf(0.95, dof)
print("Critical value of chi-square with " + str(dof) + " degrees of freedom is " + str(cv) + "\n")
if chi2 < cv:
    print("Fail to reject the null hypothesis: the data do not show that gender and education level are dependent at the 5% level of significance")
else:
    print("Reject the null hypothesis and conclude that gender and education level are dependent at the 5% level of significance")


The table of observed frequencies is:
    0   1   2   3
0  60  54  46  41
1  40  44  53  57

The table of expected counts is:
[[50.88607595 49.86835443 50.37721519 49.86835443]
 [49.11392405 48.13164557 48.62278481 48.13164557]]

Chi-square value is: 8.006066246262538
Critical value of chi-square with 3 degrees of freedom is 7.814727903251179

Reject the null hypothesis and conclude that gender and education level are dependent at the 5% level of significance
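The `chi2_contingency` call hides the arithmetic. As a minimal sketch (assuming only NumPy and SciPy are available), the same statistic can be computed directly from the expected-count formula, and the decision can equivalently be made from the p-value instead of the critical value. The variable names here are illustrative, not part of the original notebook.

```python
import numpy as np
from scipy import stats

# Observed counts: rows = Female, Male; columns = High School, Bachelors, Masters, PhD
observed = np.array([[60, 54, 46, 41],
                     [40, 44, 53, 57]])

n = observed.sum()                  # grand total
row_totals = observed.sum(axis=1)   # one total per gender
col_totals = observed.sum(axis=0)   # one total per education level

# Expected counts under H0: E_ij = row_total_i * col_total_j / n
expected = np.outer(row_totals, col_totals) / n

# Pearson chi-square statistic and its degrees of freedom
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value = upper-tail probability of the chi-square distribution
p_value = stats.chi2.sf(chi2_stat, dof)

# Cross-check against scipy's implementation (no continuity correction applies when dof > 1)
ref_chi2, ref_p, _, _ = stats.chi2_contingency(observed)
assert np.isclose(chi2_stat, ref_chi2) and np.isclose(p_value, ref_p)

print(f"chi2 = {chi2_stat:.4f}, dof = {dof}, p = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

The p-value route and the critical-value route always agree: the statistic exceeds the 5% critical value exactly when the upper-tail probability falls below 0.05.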


**Problem Statement 2:**

Using the following data, perform a one-way analysis of variance (ANOVA) using α = .05, and write up the results in APA format.


[Group1: 51, 45, 33, 45, 67]

[Group2: 23, 43, 23, 43, 45]

[Group3: 56, 76, 74, 87, 56]
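For reference, the F statistic can be worked out by hand from the group means (48.2, 35.4 and 69.8) and the grand mean 767/15 ≈ 51.133, with k = 3 groups of n_j = 5 and N = 15 observations. A sketch of the arithmetic:

$$
\begin{aligned}
SS_B &= \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^2 = 5\left[(48.2-51.133)^2 + (35.4-51.133)^2 + (69.8-51.133)^2\right] \approx 3022.93 \\
SS_W &= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2 = 612.8 + 515.2 + 732.8 = 1860.8 \\
F &= \frac{SS_B/(k-1)}{SS_W/(N-k)} = \frac{3022.93/2}{1860.8/12} \approx \frac{1511.47}{155.07} \approx 9.75
\end{aligned}
$$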

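import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

# One column per group: A = Group1, B = Group2, C = Group3 (five observations each)
table = pd.DataFrame()
table['A'] = [51, 45, 33, 45, 67]
table['B'] = [23, 43, 23, 43, 45]
table['C'] = [56, 76, 74, 87, 56]

# scipy.stats.f_oneway takes each group as a separate argument and returns the F statistic and p-value
fvalue, pvalue = stats.f_oneway(table['A'], table['B'], table['C'])
print(fvalue, pvalue)

# Reshape the data frame into long format (one row per observation) for statsmodels
d_melt = pd.melt(table.reset_index(), id_vars=['index'], value_vars=['A', 'B', 'C'])
# Rename the columns
d_melt.columns = ['index', 'treatments', 'value']
# Fit an Ordinary Least Squares (OLS) model with the group as a categorical predictor
model = ols('value ~ C(treatments)', data=d_melt).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Degrees of freedom: between groups = k - 1, within groups = N - k
df_between = table.shape[1] - 1
df_within = table.size - table.shape[1]

# APA style: F to two decimals, p to three decimals without a leading zero
p_text = "p < .001" if pvalue < 0.001 else "p = " + f"{pvalue:.3f}".lstrip("0")
verdict = "a statistically significant" if pvalue < 0.05 else "no statistically significant"
print("\nAPA write-up:")
print(f"A one-way ANOVA found {verdict} difference between the group means, "
      f"F({df_between}, {df_within}) = {fvalue:.2f}, {p_text}.")

anova_table

9.747205503009463 0.0030597541434430556

APA write-up:
A one-way ANOVA found a statistically significant difference between the group means, F(2, 12) = 9.75, p = .003.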


|               | sum_sq      | df   | F        | PR(>F)  |
|---------------|-------------|------|----------|---------|
| C(treatments) | 3022.933333 | 2.0  | 9.747206 | 0.00306 |
| Residual      | 1860.8      | 12.0 | NaN      | NaN     |
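An APA-style write-up of a one-way ANOVA usually also reports each group's mean and standard deviation. The sketch below (assuming NumPy and SciPy; the `groups` dictionary is our own illustrative structure) prints those descriptives and recomputes F from the sums of squares, checking the result against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# The three groups from Problem Statement 2
groups = {
    "A": np.array([51, 45, 33, 45, 67]),
    "B": np.array([23, 43, 23, 43, 45]),
    "C": np.array([56, 76, 74, 87, 56]),
}

# Descriptive statistics (sample SD uses ddof=1)
for name, g in groups.items():
    print(f"Group {name}: M = {g.mean():.2f}, SD = {g.std(ddof=1):.2f}")

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()
k = len(groups)             # number of groups
n_total = all_values.size   # total number of observations

# Between-group and within-group sums of squares
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

df_between, df_within = k - 1, n_total - k
f_stat = (ss_between / df_between) / (ss_within / df_within)
p_value = stats.f.sf(f_stat, df_between, df_within)

# Should agree with scipy.stats.f_oneway
f_ref, p_ref = stats.f_oneway(*groups.values())
assert np.isclose(f_stat, f_ref) and np.isclose(p_value, p_ref)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}, p = {p_value:.3f}")
```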


**Problem Statement 3:**

Calculate the F-test statistic for the two samples 10, 20, 30, 40, 50 and 5, 10, 15, 20, 25.
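The F statistic is the ratio of the two sample variances. Worked by hand (means 30 and 15, n₁ = n₂ = 5):

$$
\begin{aligned}
s_1^2 &= \frac{\sum_i (x_i - \bar{x})^2}{n_1 - 1} = \frac{400 + 100 + 0 + 100 + 400}{4} = 250 \\
s_2^2 &= \frac{\sum_i (y_i - \bar{y})^2}{n_2 - 1} = \frac{100 + 25 + 0 + 25 + 100}{4} = 62.5 \\
F &= \frac{s_1^2}{s_2^2} = \frac{250}{62.5} = 4.0, \qquad \text{df} = (n_1 - 1,\ n_2 - 1) = (4, 4)
\end{aligned}
$$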

import statistics as stat

set1 = (10, 20, 30, 40, 50)
set2 = (5, 10, 15, 20, 25)
# Sample variance (n - 1 in the denominator) of the first set
var1 = stat.variance(set1)
# Sample variance of the second set
var2 = stat.variance(set2)
# F statistic = ratio of the two sample variances (larger variance in the numerator)
F = var1 / var2
print("The F-test value is " + str(F))

The F-test value is 4.0
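The F value on its own does not say whether the variances differ significantly. A minimal sketch, assuming SciPy is available and a 5% significance level (the original cell stops at the statistic), compares F against the F(4, 4) distribution:

```python
import statistics as stat
from scipy import stats

set1 = (10, 20, 30, 40, 50)
set2 = (5, 10, 15, 20, 25)

var1 = stat.variance(set1)   # sample variance of set1
var2 = stat.variance(set2)   # sample variance of set2
F = var1 / var2

# Degrees of freedom: n - 1 for each sample
df1, df2 = len(set1) - 1, len(set2) - 1

# One-sided p-value (H1: set1 has the larger variance)
p_one_sided = stats.f.sf(F, df1, df2)
# Two-sided p-value (H1: the variances differ in either direction)
p_two_sided = 2 * min(stats.f.sf(F, df1, df2), stats.f.cdf(F, df1, df2))
# Upper 5% critical value of F(df1, df2)
critical = stats.f.ppf(0.95, df1, df2)

print(f"F = {F}, p (one-sided) = {p_one_sided:.3f}, p (two-sided) = {p_two_sided:.3f}")
print(f"Critical value F(0.95; {df1}, {df2}) = {critical:.3f}")
```

With these data the one-sided p-value is 0.104 and the 5% critical value of F(4, 4) is about 6.39, so F = 4.0 does not indicate a significant difference in variances at α = .05.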
