### Problem Statement 1:

Is gender independent of education level? A random sample of 395 people was
surveyed, and each person was asked to report the highest level of education they
had obtained. The survey results are summarized in the following table:

| | High School | Bachelors | Masters | Ph.D. | Total |
|---|---|---|---|---|---|
| Female | 60 | 54 | 46 | 41 | 201 |
| Male | 40 | 44 | 53 | 57 | 194 |
| Total | 100 | 98 | 99 | 98 | 395 |

**Question: Are gender and education level dependent at the 5% level of significance? In
other words, given the data collected above, is there a relationship between the
gender of an individual and the level of education that they have obtained?**
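For reference, the test used below is Pearson's chi-square test of independence. With observed counts $O_{ij}$, row totals $R_i$, column totals $C_j$ and grand total $N$, the expected counts under independence, the test statistic and its degrees of freedom for this $2 \times 4$ table are:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{4} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (2 - 1)(4 - 1) = 3

% Example: expected count for Female, High School
E_{11} = \frac{201 \times 100}{395} \approx 50.89
```

H0 is rejected at the 5% level when $\chi^2$ exceeds the upper 5% point of the $\chi^2_3$ distribution, which is about 7.81.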

In [1]:
import numpy as np
import pandas as pd
from scipy import stats as st

In [2]:
# H0: Gender and education are independent.
# H1: Gender and education are dependent.

# Level of significance = 0.05

# Observed counts of the highest education level by gender; the last entry in each list is the row total.

female_qual = [60, 54, 46, 41, 201]
male_qual = [40, 44, 53, 57, 194]
Total_qual = [100, 98, 99, 98, 395]

sample_size = 395

In [3]:
data = pd.DataFrame([ female_qual, male_qual, Total_qual], columns = ["High_school", "Bachelors", "Masters", "Ph.d", "Total"], 
             index = ["Female", "Male", "Total"])
data

| | High_school | Bachelors | Masters | Ph.d | Total |
|---|---|---|---|---|---|
| Female | 60 | 54 | 46 | 41 | 201 |
| Male | 40 | 44 | 53 | 57 | 194 |
| Total | 100 | 98 | 99 | 98 | 395 |


In [4]:
# Perform a chi-square test of independence to determine whether the two
# variables, gender and education level, are independent.

# Keep only the observed counts: drop the "Total" row and the "Total" column (the margins).
x = data[:2].drop("Total", axis=1)

# Defaults: correction=True, lambda_=None (Pearson's statistic). Yates' continuity
# correction is applied only when dof == 1, so it has no effect on this 2 x 4 table.
chi, p, dof, exp = st.chi2_contingency(x)

print("The Degree of Freedom: ", dof)
print("The Chi-Square Value: ", chi)
print("\nThe Expected Values: \n", exp)
print("\np-value: ", p)

The Degree of Freedom:  3
The Chi-Square Value:  8.006066246262538

The Expected Values: 
 [[50.88607595 49.86835443 50.37721519 49.86835443]
 [49.11392405 48.13164557 48.62278481 48.13164557]]

p-value:  0.045886500891747214
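As a cross-check of what `st.chi2_contingency` computes, here is a minimal NumPy sketch that builds the expected counts from the margins and sums the Pearson contributions. The names `observed`, `expected`, `chi_sq`, `p_value` and `min_expected_ok` are illustrative, not part of the original notebook.

```python
import numpy as np
from scipy import stats as st

# Observed counts (rows: Female, Male; columns: High school, Bachelors, Masters, PhD)
observed = np.array([[60, 54, 46, 41],
                     [40, 44, 53, 57]])

row_totals = observed.sum(axis=1)   # [201, 194]
col_totals = observed.sum(axis=0)   # [100, 98, 99, 98]
n = observed.sum()                  # 395

# Expected count under independence: E_ij = R_i * C_j / N
expected = np.outer(row_totals, col_totals) / n

# Pearson chi-square statistic and its degrees of freedom
chi_sq = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# The p-value is the upper-tail area of the chi-square distribution
p_value = st.chi2.sf(chi_sq, dof)

# Rule of thumb for the chi-square approximation: every expected count >= 5
min_expected_ok = (expected >= 5).all()

# Reference values from SciPy for comparison
chi_ref, p_ref, dof_ref, exp_ref = st.chi2_contingency(observed)

print(chi_sq, dof, p_value, min_expected_ok)
```

It should reproduce the statistic (about 8.006), dof = 3 and p-value (about 0.0459) printed above. The expected-count check holds here, since the smallest expected count is about 48.1.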


In [5]:
from scipy.stats import chi2

# Upper 5% point of the chi-square distribution with dof (= 3) degrees of freedom
critical_value = chi2.ppf(q=1 - 0.05, df=dof)
print('critical_value:', critical_value)

critical_value: 7.814727903251179


In [6]:
alpha = 0.05

# Critical-value approach: reject H0 if the statistic falls in the rejection region.
if chi >= critical_value:
    print("Reject H0: gender and education level are dependent.")
else:
    print("Fail to reject H0: no evidence that gender and education level are dependent.")

# p-value approach: reject H0 if p <= alpha (equivalent to the check above).
if p <= alpha:
    print("Reject H0: gender and education level are dependent.")
else:
    print("Fail to reject H0: no evidence that gender and education level are dependent.")

Reject H0: gender and education level are dependent.
Reject H0: gender and education level are dependent.


### Problem Statement 2:

Using the following data, perform a one-way analysis of variance using α = .05. Write
up the results in APA format.

* Group 1: 51, 45, 33, 45, 67
* Group 2: 23, 43, 23, 43, 45
* Group 3: 56, 76, 74, 87, 56

In [7]:
# One-way ANOVA at significance level alpha = 0.05

alpha = 0.05

group1 = [51, 45, 33, 45, 67]
group2 = [23, 43, 23, 43, 45]
group3 = [56, 76, 74, 87, 56]

#### Hypotheses
---
* H0: All three population means are equal (μ1 = μ2 = μ3).
* H1: At least one population mean differs from the others.
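The one-way ANOVA statistic compares variability between groups with variability within groups. With $k = 3$ groups, $N = 15$ observations, group sizes $n_g = 5$, group means $\bar{x}_g$ and grand mean $\bar{x}$:

```latex
SS_B = \sum_{g=1}^{k} n_g (\bar{x}_g - \bar{x})^2, \qquad
SS_W = \sum_{g=1}^{k} \sum_{i=1}^{n_g} (x_{gi} - \bar{x}_g)^2, \qquad
F = \frac{SS_B / (k - 1)}{SS_W / (N - k)}

% Group means 48.2, 35.4, 69.8; grand mean 767/15 \approx 51.13
SS_B \approx 3022.93, \quad SS_W = 1860.8, \quad
F = \frac{3022.93 / 2}{1860.8 / 12} \approx \frac{1511.47}{155.07} \approx 9.75
```

The numerator and denominator degrees of freedom, $k - 1 = 2$ and $N - k = 12$, are the `dfn` and `dfd` used for the critical value.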

In [8]:
f_statistics, p = st.f_oneway(group1, group2, group3)
# Critical F for dfn = k - 1 = 2 (between groups) and dfd = N - k = 15 - 3 = 12 (within groups)
f_critical = st.f.ppf(q=1 - alpha, dfn=2, dfd=12)
print("Test statistics:", f_statistics)
print("F_critical:", f_critical)

Test statistics: 9.747205503009463
F_critical: 3.8852938346523933
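To show where the statistic returned by SciPy's `st.f_oneway` comes from, the sketch below recomputes the sums of squares by hand with NumPy. The variable names (`ss_between`, `ss_within`, `f_stat`, and so on) are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats as st

groups = [np.array([51, 45, 33, 45, 67]),
          np.array([23, 43, 23, 43, 45]),
          np.array([56, 76, 74, 87, 56])]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = k - 1          # 2
df_within = n_total - k     # 12

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# p-value: upper-tail area of the F(df_between, df_within) distribution
p_value = st.f.sf(f_stat, df_between, df_within)

print(f"SSB = {ss_between:.2f}, SSW = {ss_within:.2f}, F = {f_stat:.4f}, p = {p_value:.4f}")
```

The hand-computed F and p-value should match the `f_oneway` result above (F about 9.747, p about 0.0031).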


In [9]:
if f_statistics > f_critical:
    print("Reject H0: at least one group mean differs.")
else:
    print("Fail to reject H0: no evidence that the group means differ.")

Reject H0: at least one group mean differs.


In [10]:
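# APA style reports F(df_between, df_within) = F, p = .xxx (no leading zero on p).
df_between, df_within = 2, 12
p_apa = f"{p:.3f}".lstrip("0")
print(f"APA: F({df_between}, {df_within}) = {f_statistics:.2f}, p = {p_apa}")

APA: F(2, 12) = 9.75, p = .003

A one-way ANOVA showed a significant difference among the three group means, F(2, 12) = 9.75, p = .003.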


### Problem Statement 3:

Calculate the F-test statistic for the two samples 10, 20, 30, 40, 50 and 5, 10, 15, 20, 25.

Sample 1 is 10, 20, 30, 40, 50 and sample 2 is 5, 10, 15, 20, 25:

In [11]:
sample1 = [10,20,30,40,50]
sample2 = [5,10,15,20,25]

In [12]:
import statistics as s
s.variance(sample1), s.variance(sample2)

(250, 62.5)

In [13]:
# The F statistic is the ratio of the two sample variances (larger variance in the numerator).

f_test = (s.variance(sample1)/ s.variance(sample2))
print('The f-test value is:', f_test)

The f-test value is: 4.0
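The ratio alone does not say whether the two variances differ significantly. A minimal sketch, assuming a two-sided test at α = 0.05 (the problem states neither the level nor the tails), compares F with an F distribution on (n1 - 1, n2 - 1) = (4, 4) degrees of freedom. The variable names are illustrative, and the test assumes both samples come from normal populations.

```python
import statistics
from scipy import stats as st

sample1 = [10, 20, 30, 40, 50]
sample2 = [5, 10, 15, 20, 25]
alpha = 0.05  # assumed significance level; not given in the problem

var1 = statistics.variance(sample1)   # sample variance (n - 1 denominator)
var2 = statistics.variance(sample2)

# Put the larger variance in the numerator so that F >= 1
if var1 >= var2:
    f_stat, dfn, dfd = var1 / var2, len(sample1) - 1, len(sample2) - 1
else:
    f_stat, dfn, dfd = var2 / var1, len(sample2) - 1, len(sample1) - 1

# Two-sided test of H0: sigma1^2 == sigma2^2 (double the upper-tail area)
p_value = min(1.0, 2 * st.f.sf(f_stat, dfn, dfd))
# Two-sided critical value: upper alpha/2 point of F(dfn, dfd)
f_critical = st.f.ppf(1 - alpha / 2, dfn, dfd)

print(f"F = {f_stat:.2f}, df = ({dfn}, {dfd}), p = {p_value:.3f}, F_critical = {f_critical:.3f}")
```

Under these assumptions, F = 4.00 with (4, 4) degrees of freedom gives a two-sided p-value of about 0.208 and stays below the critical value of about 9.60, so equal variances would not be rejected at the 5% level.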
