### Z-Test

#### Finding the p-value from a z-value (lower-tailed)

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
stats.norm.cdf(-1.46) #for lower tailed

0.07214503696589378

#### Find Z-value using p-Value

In [3]:
stats.norm.ppf(0.1)

-1.2815515655446004

#### Find p-value using z-value (upper-tailed)

In [4]:
1 - stats.norm.cdf(1.75) #for upper tailed

0.040059156863817114

In [5]:
1 - stats.norm.cdf(2.29)

0.011010658324411393

#### Find z-value at given area covered

In [6]:
stats.norm.ppf(1 - 0.04) # upper-tail critical z for alpha = 4%

1.7506860712521692

#### Finding the two-tailed p-value from a z-value

In [7]:
(1 - stats.norm.cdf(2.74))*2 # double the upper-tail area to cover both tails

0.006143918437300888
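The three tail conventions used so far can be gathered into one helper. This is only a sketch: the name `p_from_z` and its `tail` argument are assumptions introduced here, not part of the notebook. It relies on `stats.norm.sf`, the survival function, which is equivalent to `1 - cdf`.

```python
from scipy import stats

def p_from_z(z, tail="two"):
    """Return the p-value for a z statistic (hypothetical helper)."""
    if tail == "lower":
        return stats.norm.cdf(z)          # area to the left of z
    if tail == "upper":
        return stats.norm.sf(z)           # area to the right of z
    if tail == "two":
        return 2 * stats.norm.sf(abs(z))  # both tails beyond |z|
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

print(p_from_z(-1.46, "lower"))
print(p_from_z(1.75, "upper"))
print(p_from_z(2.74, "two"))
```

The three calls reuse the z-values from the cells above, so the printed values should agree with those outputs.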

In [8]:
# If we are given the area (alpha) for a two-tailed test,
# we look up the z-value at alpha/2 instead of alpha.

# Example:
# Given alpha (area) = 0.05
# we use alpha/2 = 0.025

stats.norm.ppf(0.025) #for left side

-1.9599639845400545

In [9]:
stats.norm.ppf(1-0.025) #for right side

1.959963984540054

In [10]:
stats.norm.ppf(0.015)

-2.1700903775845606

In [11]:
stats.norm.ppf(1 - 0.015)

2.17009037758456
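The critical-value lookups above follow one pattern, which the sketch below wraps in a single function. The name `z_critical` and its `tail` argument are hypothetical; `stats.norm.isf` is the inverse survival function, equivalent to `ppf(1 - q)`.

```python
from scipy import stats

def z_critical(alpha, tail="two"):
    """Return the critical z-value(s) for a given alpha (hypothetical helper)."""
    if tail == "lower":
        return stats.norm.ppf(alpha)       # cutoff with area alpha to its left
    if tail == "upper":
        return stats.norm.isf(alpha)       # cutoff with area alpha to its right
    if tail == "two":
        z = stats.norm.isf(alpha / 2)      # split alpha evenly between the tails
        return (-z, z)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

print(z_critical(0.05, "two"))
print(z_critical(0.04, "upper"))
```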

### T-Test

#### 1 - sample

In [12]:
X  = [10,12,20,21,22,24,18,15] #given sample
stats.ttest_1samp(X,15) # 15 is assumed mean 

Ttest_1sampResult(statistic=1.5623450931857947, pvalue=0.1621787560592894)

##### Q. ICE CREAM DEMAND

In an ice-cream parlour at CURAJ, the following data represent the number of ice creams sold on each of 20 days:
X = [13,8,10,10,8,9,10,11,6,8,12,11,11,12,10,12,7,10,11,8]
Test the hypothesis: Ho: mu <= 10 , Ha: mu > 10 
alpha = 0.05

In [13]:
X = [13,8,10,10,8,9,10,11,6,8,12,11,11,12,10,12,7,10,11,8]

In [14]:
stats.ttest_1samp(X,10)

Ttest_1sampResult(statistic=-0.35843385854878496, pvalue=0.7239703579964252)

In [15]:
0.72397/2

0.361985

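Halving the two-tailed p-value gives the tail area beyond the observed statistic, 0.362. Because the t-statistic is negative while Ha is upper-tailed (mu > 10), the one-tailed p-value is 1 - 0.362 = 0.638. This is greater than 0.05, so we do not reject the null hypothesis.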

In [16]:
stats.t.cdf(-0.3584,19) # area below the t-statistic; 19 is the degrees of freedom (n-1)

0.36199764140607527

In [17]:
stats.t.ppf(0.05,19) # lower-tail critical t; by symmetry the upper-tail critical value is its negative

-1.7291328115213678
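For the upper-tailed alternative Ha: mu > 10, the p-value is the area to the right of the t-statistic. The sketch below computes it directly from the sample, under the assumption that the usual one-sample t formula applies; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

X = [13, 8, 10, 10, 8, 9, 10, 11, 6, 8, 12, 11, 11, 12, 10, 12, 7, 10, 11, 8]
mu0 = 10      # mean under the null hypothesis
alpha = 0.05
n = len(X)

# t = (sample mean - mu0) / (sample std / sqrt(n)), using ddof=1 for the sample std
t_stat = (np.mean(X) - mu0) / (np.std(X, ddof=1) / np.sqrt(n))

p_upper = stats.t.sf(t_stat, n - 1)   # area to the right of t_stat
t_crit = stats.t.isf(alpha, n - 1)    # upper-tail critical value
reject = t_stat > t_crit

print(t_stat, p_upper, t_crit, reject)
```

The upper-tail p-value comes out at roughly 0.638, and the statistic falls well below the positive critical value, so the null hypothesis is not rejected.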

#### Hypothesis Testing using proportion

In [18]:
from statsmodels.stats.proportion import proportions_ztest

In [19]:
count = 67
samplesize = 120
p = 0.5

In [20]:
proportions_ztest(count,samplesize,p)

(1.286806739751111, 0.1981616572238455)
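The statistic above is consistent with a standard error built from the sample proportion rather than the hypothesised one. The sketch below reproduces both variants by hand, using only `math` and `scipy`; the variable names are illustrative. In statsmodels this choice is, to my knowledge, controlled by the `prop_var` argument of `proportions_ztest`.

```python
import math
from scipy import stats

count, nobs, p0 = 67, 120, 0.5
p_hat = count / nobs

# Standard error from the sample proportion
se_sample = math.sqrt(p_hat * (1 - p_hat) / nobs)
# Standard error from the null proportion (the common textbook form)
se_null = math.sqrt(p0 * (1 - p0) / nobs)

z_sample = (p_hat - p0) / se_sample
z_null = (p_hat - p0) / se_null

# Two-tailed p-values
p_sample = 2 * stats.norm.sf(abs(z_sample))
p_null = 2 * stats.norm.sf(abs(z_null))

print(z_sample, p_sample)
print(z_null, p_null)
```

Both variants lead to the same decision here at alpha = 0.05, but the statistics differ slightly, which matters when comparing against hand calculations.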

#### Alpha and Beta value

Define a function that returns the tail area (alpha) beyond a given sample mean:

In [21]:
def z_value(X,mu,sem):
    z = (X - mu)/sem
    if (z<0):
        alpha = stats.norm.cdf(z)
    else:
        alpha = 1 - stats.norm.cdf(z)
    return alpha

In [22]:
x= 48.5
mu = 50
sem = 0.79

In [23]:
l1 = z_value(x,mu,sem)

In [24]:
l2 = z_value(51.5,mu,sem)

In [25]:
l1+l2

0.05759943549430551

This implies that about 5.76% of all random samples would lead to rejection of the null hypothesis Ho: mu = 50 even when it is true.

#### Type II Error (Beta value)

Calculating the probability of Type II Error:

In [26]:
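def type_2(mu1,mu2,sigma,n,alpha):
    # mu1 is the mean under Ho, mu2 is the true mean under the alternative
    z = stats.norm.ppf(alpha) # lower-tail critical z (negative)
    if(mu1>mu2):
        # lower-tailed test: Ho is retained when the sample mean exceeds xbar
        xbar = mu1 + (z * sigma/np.sqrt(n))
        z2 = (xbar - mu2)/(sigma/np.sqrt(n))
        beta = 1 - stats.norm.cdf(z2)
    else:
        # upper-tailed test: the cutoff lies above mu1, so the sign of z is flipped
        xbar = mu1 - (z * sigma/np.sqrt(n))
        z2 = (xbar - mu2)/(sigma/np.sqrt(n))
        beta = stats.norm.cdf(z2)
    return beta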

In [27]:
type_2(8.3,7.4,3.1,60,0.05)

0.27292999450730004
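Beta depends on the sample size, and power is its complement (1 - beta). The sketch below recomputes the lower-tailed case for a few sample sizes; the function name `beta_lower_tailed` and the chosen sizes are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def beta_lower_tailed(mu0, mu_true, sigma, n, alpha):
    """Type II error for Ho: mu >= mu0 vs Ha: mu < mu0 (hypothetical helper)."""
    se = sigma / np.sqrt(n)
    cutoff = mu0 + stats.norm.ppf(alpha) * se   # reject Ho when xbar < cutoff
    # Probability that xbar stays above the cutoff when the true mean is mu_true
    return stats.norm.sf((cutoff - mu_true) / se)

for n in (30, 60, 120):
    beta = beta_lower_tailed(8.3, 7.4, 3.1, n, 0.05)
    print(n, round(beta, 4), round(1 - beta, 4))   # n, beta, power
```

With n = 60 this matches the value returned above, and beta shrinks as n grows.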

Problem: The quantity of interest is the difference in mean drying times, mu1 - mu2.
         Ho: mu1 - mu2 = 0
         H1: mu1 - mu2 > 0

         Test at alpha = 0.05.

In [28]:
import math

In [29]:
def z_and_p(x1, x2, sigma1, sigma2, n1, n2):
    z = (x1 - x2)/math.sqrt(((sigma1**2/n1)+(sigma2**2/n2)))
    if z<0:
        p = stats.norm.cdf(z)
    else:
        p = 1 - stats.norm.cdf(z)
    return (z,p)

In [30]:
z_and_p(121,112,8,8,10,10)

(2.5155764746872635, 0.00594189462107364)

sigma1 and sigma2 are unknown but assumed equal: pooled two-sample t-test

In [31]:
b = [89.19,90.95,90.46,93.21,97.19,97.04,91.07,92.75]

a = [91.5,94.18,92.18,95.39,91.79,89.07,94.72,89.21]

In [32]:
stats.ttest_ind(a,b,equal_var = True) #for two independent samples

Ttest_indResult(statistic=-0.3535908643461798, pvalue=0.7289136186068217)

In [33]:
stats.t.ppf(0.025,14)

-2.1447866879169277
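The statistic and the critical value can be compared in code to reach a decision. A minimal sketch, assuming a two-tailed test at alpha = 0.05 with n1 + n2 - 2 degrees of freedom:

```python
from scipy import stats

a = [91.5, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21]
b = [89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75]
alpha = 0.05

t_stat, p_value = stats.ttest_ind(a, b, equal_var=True)

df = len(a) + len(b) - 2               # pooled degrees of freedom
t_crit = stats.t.isf(alpha / 2, df)    # positive two-tailed critical value
reject = abs(t_stat) > t_crit

print(t_stat, t_crit, reject)
```

Here |t| is far smaller than the critical value, so the null hypothesis of equal means is not rejected, which agrees with the p-value of about 0.73.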

sigma1 and sigma2 are unknown and assumed unequal: Welch's t-test

In [34]:
stats.t.ppf(0.025,13) # alpha = 5%, two-tailed, so alpha/2 = 0.025

-2.160368656461013

In [35]:
metro = [3,7,25,10,15,6,12,25,15,7]
rural = [48,44,40,38,33,21,20,12,1,18]

In [36]:
stats.ttest_ind(metro,rural,equal_var = False)

Ttest_indResult(statistic=-2.7669395785560558, pvalue=0.015827284816100885)

From above, the t-statistic -2.7669 is less than the critical value -2.16,
so we reject the null hypothesis.
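The 13 degrees of freedom used for the critical value appear to come from the Welch-Satterthwaite approximation, rounded down. The sketch below computes the statistic and the approximate degrees of freedom by hand; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

metro = [3, 7, 25, 10, 15, 6, 12, 25, 15, 7]
rural = [48, 44, 40, 38, 33, 21, 20, 12, 1, 18]
n1, n2 = len(metro), len(rural)

# Per-sample variance of the mean, using sample variances (ddof=1)
v1 = np.var(metro, ddof=1) / n1
v2 = np.var(rural, ddof=1) / n2

t_stat = (np.mean(metro) - np.mean(rural)) / np.sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed p-value

print(t_stat, df, p_value)
```

The approximate degrees of freedom fall between 13 and 14, and the hand-computed statistic and p-value should agree with the `ttest_ind` output above.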

##### Test for Two Dependent Samples

In [37]:
stats.t.ppf(0.025,8) # 0.025 is alpha/2; 8 is the degrees of freedom (n-1)

-2.306004135033371

In [38]:
stats.t.ppf(1-0.025,8)

2.3060041350333704

So our acceptance region lies between -2.306 and 2.306.

In [39]:
karl = [1.186,1.151,1.322,1.339,1.200,1.402,1.365,1.537,1.559]
leh = [1.061,0.992,1.063,1.062,1.065,1.178,1.037,1.086,1.052]

In [40]:
stats.ttest_rel(karl,leh)

Ttest_relResult(statistic=6.0819394375848255, pvalue=0.00029529546278604066)

From above, the t-statistic (6.08) lies outside the acceptance region, so we reject the null hypothesis.
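A paired t-test is equivalent to a one-sample t-test on the pairwise differences against a mean of zero. The sketch below makes that explicit; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

karl = [1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559]
leh = [1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052]

d = np.array(karl) - np.array(leh)   # pairwise differences
n = len(d)

# One-sample t-statistic of the differences against zero
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), n - 1)   # two-tailed, n-1 degrees of freedom

print(t_stat, p_value)
```

The result should match the `ttest_rel` output, which shows why the degrees of freedom are n - 1 = 8 rather than 2n - 2.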

#### Inference About the Difference between two population proportions

In [41]:
def two_samp_proportion(p1,p2,n1,n2):
    p_pool = ((p1*n1)+(p2*n2))/(n1+n2)
    x = (p_pool*(1 - p_pool)*((1/n1)+(1/n2)))
    s = math.sqrt(x)
    z = (p1-p2)/s
    if (z<0):
        p_val = stats.norm.cdf(z)
    else:
        p_val = 1 - stats.norm.cdf(z)
    return z,p_val*2

In [42]:
two_samp_proportion(0.27,0.19,100,100)

(1.3442056254198995, 0.17888190308175567)

Here the p-value (0.179) is greater than the significance level alpha = 0.05, so we do not reject the null hypothesis.

#### F-test for testing variance

In [43]:
stats.f.ppf(q = 1-0.05,dfn = 15,dfd = 10)

2.8450165269958436

In [44]:
stats.f.ppf(q = 0.05, dfn = 15, dfd = 10)

0.3931252536255495

###### Example

In [45]:
x = [3,7,25,10,15,6,12,25,15,7]
y = [48,44,40,38,33,21,20,12,1,18]

In [46]:
import numpy as np

In [47]:
F = np.var(x, ddof=1)/np.var(y, ddof=1) # ratio of sample variances (ddof=1)

In [48]:
dfn = len(x) - 1
dfd = len(y) - 1

In [49]:
p_value = stats.f.cdf(F,dfn,dfd)
p_value

0.024680183438910465
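The value above is a lower-tail p-value. If the alternative is two-sided (the variances simply differ), one common convention is to double the smaller tail area. A sketch under that assumption:

```python
import numpy as np
from scipy import stats

x = [3, 7, 25, 10, 15, 6, 12, 25, 15, 7]
y = [48, 44, 40, 38, 33, 21, 20, 12, 1, 18]

F = np.var(x, ddof=1) / np.var(y, ddof=1)   # ratio of sample variances
dfn, dfd = len(x) - 1, len(y) - 1

p_lower = stats.f.cdf(F, dfn, dfd)   # area to the left of F
p_upper = stats.f.sf(F, dfn, dfd)    # area to the right of F
p_two = 2 * min(p_lower, p_upper)    # two-sided p-value (doubling convention)

print(F, p_lower, p_two)
```

Because both samples have the same size, the ratio is identical whether `ddof` is 0 or 1; with unequal sizes the choice would change F.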

#### Determining the sample size

In [50]:
def samplesize(alpha,beta,mu1,mu2,sigma):
    z1 = -1 * stats.norm.ppf(alpha)
    z2 = -1 * stats.norm.ppf(beta)
    n = (((z1+z2)**2) * (sigma**2))/((mu1-mu2)**2)
    return n

In [51]:
samplesize(0.05,0.1,12,12.75,3.2)

155.900083325938
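A sample size must be a whole number, so the result is rounded up. The sketch below applies the same formula and returns an integer; the name `required_n` is hypothetical, and a one-tailed test is assumed, as in the function above.

```python
import math
from scipy import stats

def required_n(alpha, beta, mu1, mu2, sigma):
    """Smallest integer sample size for a one-tailed z-test (hypothetical helper)."""
    z_alpha = stats.norm.isf(alpha)   # upper-tail z for the Type I error rate
    z_beta = stats.norm.isf(beta)     # upper-tail z for the Type II error rate
    n = ((z_alpha + z_beta) * sigma / (mu1 - mu2)) ** 2
    return math.ceil(n)               # round up so the power target is met

print(required_n(0.05, 0.1, 12, 12.75, 3.2))
```

Rounding up rather than to the nearest integer keeps the achieved power at or above 1 - beta.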