# --- Hypothesis Test ---

# T-TEST
https://www.investopedia.com/terms/t/t-test.asp

A t-test is a type of inferential statistic used to determine whether there is a significant difference between the means of two groups, which may be related in certain features.

A t-test is used as a hypothesis testing tool, which allows testing of an assumption applicable to a population.

A t-test looks at the t-statistic, the t-distribution values, and the degrees of freedom to determine the statistical significance. To conduct a test with three or more means, one must use an analysis of variance.
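
The point about three or more means can be sketched with `scipy.stats.f_oneway`, which runs a one-way analysis of variance. The three groups below are made-up illustration data and are not part of the original notebook.

```python
from scipy import stats

# Hypothetical measurements for three groups (illustration only)
group_a = [5.1, 4.9, 5.4, 5.0, 5.2]
group_b = [5.8, 6.1, 5.9, 6.3, 6.0]
group_c = [5.0, 5.3, 5.1, 4.8, 5.2]

# One-way ANOVA: the null hypothesis is that all three group means are equal
anova_result = stats.f_oneway(group_a, group_b, group_c)
print(anova_result.statistic, anova_result.pvalue)
```

A small p-value here only says that at least one group mean differs; it does not say which one.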

In [1]:
import pandas as pd
df = pd.DataFrame({'a':[1,2,3,4,5], 'b':[2,4,6,8,10]})
df.head()

,a,b
0,1,2
1,2,4
2,3,6
3,4,8
4,5,10


In [2]:
# descriptive statistics
# All about describing a variable: Its center, its spread, its distribution
df.describe()

,a,b
count,5.0,5.0
mean,3.0,6.0
std,1.581139,3.162278
min,1.0,2.0
25%,2.0,4.0
50%,3.0,6.0
75%,4.0,8.0
max,5.0,10.0


In [23]:
# for a coinflip coded as 0/1, the probability of heads is the population mean
import numpy as np

coinflip = np.random.binomial(n=1, p=.5, size=10)

In [24]:
coinflip.mean()

0.6

In [25]:
coinflip = np.random.binomial(n=1, p=.5, size=1000)

In [26]:
coinflip.mean()

0.48
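
The two runs above suggest that a larger sample gives a sample mean closer to the population mean. A minimal sketch of that idea follows. The seeded generator (`np.random.default_rng(0)`) and the chosen sizes are assumptions made here so the run is repeatable; the original cells are unseeded.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for repeatability (assumption)

sample_means = {}
for size in (10, 1000, 100000):
    flips = rng.binomial(n=1, p=0.5, size=size)
    sample_means[size] = flips.mean()
    print(size, sample_means[size])
```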

## T-Test process

In [30]:
from scipy import stats

In [31]:
coinflip = np.random.binomial(n=1, p=.6, size=10)

In [32]:
coinflip

array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0])

In [None]:
# Null hypothesis:
# This is a fair coin, i.e. probability(heads) = 0.5.
# This means my population mean (μ, "mu") is equal to 0.5.


# Alternative hypothesis: the opposite.
# My population mean is different from (not equal to) 0.5.


# Confidence level: 95%. This means I'm willing to be tricked by the randomness of the sampling process 5% of the time
# (significance level = 1 - 0.95 = 0.05).

# This also means that I reject the null hypothesis only if a sample at least this extreme
# would occur less than 5% of the time when the null hypothesis is true.

In [34]:
#sample, null hypothesis value (fair coin == 0.5)
stats.ttest_1samp(coinflip,0.5)

Ttest_1sampResult(statistic=-2.2499999999999996, pvalue=0.05100326070695081)
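
To see where these two numbers come from, the statistic can be recomputed by hand. This is a sketch that assumes the sample is the array printed above; the names `coinflip_sample`, `mu0`, `t_manual` and `p_manual` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

coinflip_sample = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0])  # the sample printed above
mu0 = 0.5  # population mean under the null hypothesis

n = coinflip_sample.size
standard_error = coinflip_sample.std(ddof=1) / np.sqrt(n)  # sample std uses n - 1
t_manual = (coinflip_sample.mean() - mu0) / standard_error

# two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)
print(t_manual, p_manual)
```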

In [35]:
statistic = -2.249
pvalue = 0.051

In [36]:
# The p-value is the probability of observing a sample at least as extreme as the one I collected,
# assuming the null hypothesis is true: here 5.1%. It is not the probability that the null hypothesis is true.

# p-value >= .05 (1 - .95 (confidence level) = 0.05) -> fail to reject the null hypothesis
# p-value <  .05 (1 - .95 (confidence level) = 0.05) -> reject the null hypothesis

# Conclusion: based on my p-value = 0.051,
# I fail to reject my null hypothesis that
# the population mean of coinflips is equal to 0.5.
# I cannot say that this is an unfair coin.
# (Failing to reject does not prove the coin is fair: this sample was drawn with p = .6.)
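
The decision rule can be written as a small helper. `decide` is a hypothetical function introduced here for illustration, and the default `alpha=0.05` assumes the 95% confidence level used in this notebook.

```python
def decide(pvalue, alpha=0.05):
    """Return the test decision for a p-value at significance level alpha."""
    if pvalue < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.051))
```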

# Two Sample Test

In [37]:
# compare two samples with each other

In [40]:
xbar1 = 180  # sample mean 1
s1 = 5 # Standard Deviation 1
# instead of np.random.binomial we can use np.random.normal
sample1 =  np.random.normal(xbar1, s1, 1000)
sample1

array([174.20930223, 175.52903401, 178.21549696, 187.92042666,
       176.42653579, 177.87951205, 192.29690654, 186.73517348,
       181.68452604, 177.82316892, 189.74024091, 177.0005559 ,
       173.81379387, 174.88688427, 179.88457278, 179.05533938,
       180.87420716, 172.12495762, 180.41489756, 180.62266093,
       177.07414269, 174.92837793, 191.13011719, 171.43286869,
       184.55832547, 182.45951835, 178.52349168, 180.30738075,
       188.86889461, 174.95147526, 188.01554472, 182.45531894,
       185.46074569, 177.634918  , 174.81310486, 172.2981585 ,
       176.8415317 , 178.97547905, 181.52543616, 181.44620526,
       183.72497526, 183.04187422, 180.0681406 , 191.46766634,
       170.49132755, 185.78601395, 172.89542924, 178.87586188,
       179.79928531, 176.8992009 , 182.69920432, 184.24919079,
       184.45040859, 182.17770308, 181.81160064, 180.58543651,
       187.05253069, 175.22612266, 167.92975011, 187.21768093,
       186.18277117, 179.38934061, 177.33080706, 189.81

In [41]:
xbar2 = 178.5
s2 = 4.25
sample2 =  np.random.normal(xbar2, s2, 800)
sample2

array([177.13559222, 177.20914761, 181.84732762, 178.13952237,
       182.16402234, 175.21625821, 168.65654479, 179.32290316,
       176.11225271, 181.67585907, 171.22209412, 175.32129291,
       177.31950912, 180.29175576, 181.65343312, 177.40919323,
       186.7370861 , 179.84407078, 177.75838354, 185.18623999,
       181.49137292, 182.98387208, 174.49050328, 170.11296693,
       178.23906291, 179.41913508, 174.30060645, 177.50788936,
       171.29770302, 180.01724025, 173.30609484, 173.54460952,
       180.12643759, 172.65565456, 178.11687778, 182.35382394,
       175.65187897, 183.22149608, 179.39645164, 177.90362435,
       177.76490787, 181.74099044, 176.0430774 , 175.93772902,
       173.55179   , 180.72754972, 185.53584428, 177.23189411,
       177.43129618, 176.79380239, 177.50164048, 179.03914146,
       181.76427964, 180.46930636, 180.67646638, 178.21508284,
       173.25828427, 177.90125967, 181.77095783, 184.36735966,
       178.63366583, 171.88655733, 179.06978814, 176.74

In [42]:
# null hypothesis: the means of the two populations the samples come from are equal
# alternative: the means are different
# confidence level: 95%

In [43]:
# independent means that the two samples don't affect each other
stats.ttest_ind(sample1, sample2)

Ttest_indResult(statistic=8.80256579704537, pvalue=3.0745119566138474e-18)

In [None]:
# Conclusion: based on statistic=8.80256579704537, pvalue=3.0745119566138474e-18
# I reject the null hypothesis because 3.0745119566138474e-18 is smaller than 0.05
# this supports the alternative hypothesis that the two means are different
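
By default `ttest_ind` assumes both groups have the same variance. The two samples here were generated with different standard deviations and sizes, so Welch's t-test (`equal_var=False`) may be the more cautious choice. The sketch below regenerates comparable data with a seeded generator; the seed and the names `group1`, `group2` and `welch_result` are assumptions, so the numbers will not match the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # seed is an assumption, for repeatability only
group1 = rng.normal(180, 5, 1000)
group2 = rng.normal(178.5, 4.25, 800)

# equal_var=False switches ttest_ind to Welch's t-test
welch_result = stats.ttest_ind(group1, group2, equal_var=False)
print(welch_result.statistic, welch_result.pvalue)
```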

# Chisquare

In [54]:
import numpy as np
from scipy.stats import chisquare # one-way chi-square test

# a chi-square test on a crosstab/table can test the independence of rows/cols
# the null hypothesis is that the rows/cols are independent -> low chi-square
# the alternative hypothesis is that there is a dependence -> high chi-square
# be aware! Chi-square does not tell you direction/causation
# note: chisquare(..., axis=None) flattens the table and compares all cells against equal
# expected counts; scipy.stats.chi2_contingency is the test of row/col independence

independence_obs = np.array([[1, 1],[2, 2]]).T
print(independence_obs)
print(chisquare(independence_obs, axis=None))

dependence_obs = np.array([[16, 18, 48, 14, 14, 12],[32, 36, 56, 28, 26,20]]).T
print(dependence_obs)
print(chisquare(dependence_obs, axis=None))

[[1 2]
 [1 2]]
Power_divergenceResult(statistic=0.6666666666666666, pvalue=0.8810148425137847)
[[16 32]
 [18 36]
 [48 56]
 [14 28]
 [14 26]
 [12 20]]
Power_divergenceResult(statistic=82.6, pvalue=4.626984040565082e-13)
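
The calls above treat every cell of the table as one category and compare it against equal expected counts. For a test of independence between rows and columns, `scipy.stats.chi2_contingency` derives the expected counts from the row and column totals instead. A minimal sketch on the same two tables follows; the variable names are introduced here, and the results can differ markedly from the one-way output above.

```python
import numpy as np
from scipy.stats import chi2_contingency

independent_table = np.array([[1, 1], [2, 2]]).T
dependent_table = np.array([[16, 18, 48, 14, 14, 12], [32, 36, 56, 28, 26, 20]]).T

# each call returns the statistic, the p-value, the degrees of freedom
# and the expected counts under independence
ind_stat, ind_p, ind_dof, ind_expected = chi2_contingency(independent_table)
dep_stat, dep_p, dep_dof, dep_expected = chi2_contingency(dependent_table)

print(ind_stat, ind_p, ind_dof)
print(dep_stat, dep_p, dep_dof)
```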


## Chi-square goodness-of-fit test

In [62]:
# Roll a die n times and observe the frequencies:
# 1 - 27
# 2 - 13
# 3 - 10
# 4 - 15
# 5 - 30
# 6 - 32

observed_frequencies = np.array([27, 13, 10, 15, 30, 32])

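# how to calculate the expected frequency? For a fair die every face is equally likely,
# so it is the mean of the observed frequencies (total rolls / 6)
expected_frequency = observed_frequencies.mean()

# The expected frequencies have to be an array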
expected_frequencies = np.array([expected_frequency]*6)
expected_frequencies

array([21.16666667, 21.16666667, 21.16666667, 21.16666667, 21.16666667,
       21.16666667])

In [60]:
# Null hypothesis: the distribution of observed frequencies is equal to
# the distribution of expected frequencies (the die is fair)

# Alternative hypothesis: the distribution of observed frequencies is not equal to
# the distribution of expected frequencies - they are different

# Confidence level: 95%

In [64]:
from scipy import stats
stats.chisquare(observed_frequencies,expected_frequencies)

Power_divergenceResult(statistic=21.67716535433071, pvalue=0.0006029877129094496)

In [65]:
# Conclusion:
# Based on a chi^2 of 21.68 and a p-value of 0.0006, I reject the null hypothesis that the expected frequencies
# are the same as the observed frequencies, and conclude that this is not a fair die at the 95% level.
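
The same decision can be reached by comparing the statistic with a critical value. This sketch relies on `chisquare` assuming equal expected frequencies when `f_exp` is omitted, and on `stats.chi2.ppf` for the 95% critical value with 6 - 1 = 5 degrees of freedom. The names `observed`, `gof_result` and `critical_value` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

observed = np.array([27, 13, 10, 15, 30, 32])

# with no f_exp, chisquare assumes equal expected frequencies
gof_result = stats.chisquare(observed)

# critical value of the chi-square distribution at the 95% level, df = 6 - 1
critical_value = stats.chi2.ppf(0.95, df=observed.size - 1)
print(gof_result.statistic, critical_value, gof_result.statistic > critical_value)
```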

## The Difference Between a T-Test & a Chi Square

A t-test tests a null hypothesis about two means; most often, it tests the hypothesis that two means are equal, or that the difference between them is zero. For example, we could test whether boys and girls in fourth grade have the same average height.

A chi-square test tests a null hypothesis about the relationship between two variables. For example, you could test the hypothesis that men and women are equally likely to vote "Democratic," "Republican," "Other" or "not at all."

# Distribution Test

In [49]:
# We often assume that something is normal, but it can be important to check

# for example, later on with predictive modeling, a typical assumption is that
# residuals (prediction errors) are normal - checking is a good diagnostic

from scipy.stats import normaltest
import numpy as np
# Poisson models the number of arrivals in a fixed interval and is related to binomial (coinflip)
sample = np.random.poisson(5, 1000)
print(normaltest(sample)) # pretty clearly not normal

NormaltestResult(statistic=22.338626497169525, pvalue=1.4100317422707899e-05)
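
For contrast, the same test can be run on a sample that really is drawn from a normal distribution. This is a sketch under assumptions: the seed, and the choice of a normal sample with the same mean and variance as the Poisson sample (both 5), are mine and not from the original notebook.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(1)  # seed is an assumption, for repeatability only

normal_sample = rng.normal(loc=5, scale=np.sqrt(5), size=1000)
poisson_sample = rng.poisson(5, 1000)

# null hypothesis of normaltest: the sample comes from a normal distribution
normal_check = normaltest(normal_sample)
poisson_check = normaltest(poisson_sample)
print(normal_check.pvalue, poisson_check.pvalue)
```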


In [44]:
# Kruskal-Wallis H-test compares the median rank between 2+ groups
# Can be applied to ranking decisions/outcomes/recommendations
# The underlying math comes from the chi-square distribution, and is best for n>5

In [47]:
from scipy.stats import kruskal

x1 = [1, 3, 5, 7, 9]
y1 = [2, 4, 6, 8, 10]
print(kruskal(x1, y1)) # x1 is a little better, but not "significantly" so

x2 = [1, 1, 1]
y2 = [2, 2, 2]
z = [2, 2] # a third group and of different size
print(kruskal(x2, y2, z)) # x2 clearly dominates!

KruskalResult(statistic=0.2727272727272734, pvalue=0.6015081344405895)
KruskalResult(statistic=7.0, pvalue=0.0301973834223185)
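
The first statistic can be reproduced from the ranks. The sketch below uses the standard H formula without a tie correction, which is enough for `x1` and `y1` because they contain no ties. `h_manual` and the rank-sum names are introduced here for illustration.

```python
import numpy as np
from scipy.stats import kruskal, rankdata

x1 = [1, 3, 5, 7, 9]
y1 = [2, 4, 6, 8, 10]

ranks = rankdata(x1 + y1)  # rank all observations together
n_total = len(ranks)
rank_sum_x = ranks[:len(x1)].sum()
rank_sum_y = ranks[len(x1):].sum()

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1); no tie correction needed here
h_manual = (12 / (n_total * (n_total + 1))
            * (rank_sum_x**2 / len(x1) + rank_sum_y**2 / len(y1))
            - 3 * (n_total + 1))
print(h_manual, kruskal(x1, y1).statistic)
```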


## Graphically Represent a Confidence Interval