# Hypothesis Testing

## 1. Z-Test

Given:

(Data: 72, 75, 68, 74, 71, 73; n = 6; known population σ = 10)

We want to check whether the teacher's class mean score differs from the historical mean of 70.

Null hypothesis (H₀):

$H_0: \mu = 70$

Alternative hypothesis (H₁):

$H_1: \mu \neq 70$

(two-tailed test)
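For reference, the cells below compute the standard one-sample Z statistic for a known σ and its two-tailed p-value. With the sample mean rounded to 72.17, as the code does, the arithmetic is (Φ is the standard normal CDF):

$$
Z = \frac{|\bar{x} - \mu_0|}{\sigma / \sqrt{n}}
  = \frac{|72.17 - 70|}{10 / \sqrt{6}}
  \approx \frac{2.17}{4.0825}
  \approx 0.5315,
\qquad
p = 2\,\bigl(1 - \Phi(Z)\bigr) \approx 0.595
$$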

In [28]:
import numpy as np
from scipy.stats import norm
import pandas as pd

In [2]:
# Creating Dataset
data = np.array([72,75,68,74,71,73])

In [3]:
data

array([72, 75, 68, 74, 71, 73])

In [18]:
mu_Z = data.mean().round(2)   # sample mean (x̄), rounded to 2 decimals; Z below uses this rounded value
n = len(data)                 # sample size

In [21]:
print('mean of the sample data is' , mu_Z)
print('no. of points in the data is' , n)

mean of the sample data is 72.17
no. of points in the data is 6


In [23]:
# population standard deviation is known (sigma = 10), so SE = sigma / sqrt(n)

SE = 10/(np.sqrt(n))
print('standard error of the data is' , SE)

standard error of the data is 4.08248290463863


In [32]:
Z = abs((mu_Z - 70) / SE)   # |x̄ - μ0| / SE; the sign does not matter for a two-tailed test
print('Z score of the data is' , Z)

Z score of the data is 0.5315392741839501


In [34]:
# two-tailed p-value: 2 * P(Z > |z|), using the standard normal survival function norm.sf
p_two = 2 * norm.sf(Z) 

In [35]:
print('the p value is ', p_two)

the p value is  0.5950451330775138


*p ≈ 0.595 is far above the usual significance level α = 0.05, so we fail to reject H₀: we do not have enough statistical evidence to say the true mean is different from 70.*

*The observed sample mean (72.17) is not significantly different from 70.*

*The data is consistent with a population whose mean is 70.*

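scipy.stats doesn't have a ztest function.

Likely reasons:

The Z-test is mathematically simple.

SciPy provides the building blocks (the normal CDF and survival function), not the wrapper.

Z-tests are less common with real data, because the population σ is rarely known.

statsmodels already provides ztest (in statsmodels.stats.weightstats), so SciPy didn't duplicate it. Note that statsmodels' ztest estimates the standard deviation from the sample rather than taking a known σ.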
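Since SciPy has no ztest, a known-σ Z-test can be wrapped in a few lines. The helper below, one_sample_z_test, is a hypothetical function written for illustration (not part of SciPy or statsmodels); it uses only scipy.stats.norm and, unlike the cells above, does not round the mean before computing Z.

```python
import math

from scipy.stats import norm


def one_sample_z_test(sample, mu0, sigma, alternative="two-sided"):
    """One-sample Z-test with a KNOWN population standard deviation sigma.

    Hypothetical helper for illustration, not a SciPy/statsmodels API.
    Returns (z, p); alternative is "two-sided", "greater" or "less".
    """
    n = len(sample)
    xbar = sum(sample) / n                # sample mean, not rounded
    se = sigma / math.sqrt(n)             # standard error = sigma / sqrt(n)
    z = (xbar - mu0) / se                 # signed Z statistic
    if alternative == "two-sided":
        p = 2 * norm.sf(abs(z))           # P(|Z| >= |z|)
    elif alternative == "greater":
        p = norm.sf(z)                    # P(Z >= z)
    elif alternative == "less":
        p = norm.cdf(z)                   # P(Z <= z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p


z, p = one_sample_z_test([72, 75, 68, 74, 71, 73], mu0=70, sigma=10)
print(f"z = {z:.4f}, two-tailed p = {p:.4f}")
```

On the teacher's data this gives z ≈ 0.531 and a two-tailed p ≈ 0.596, agreeing with the cells above to about two decimal places; the small gap comes from rounding the mean to 72.17 there.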

## 2. One-Sample t-Test

Data: 72,75,68,74,71,73

n=6

**Hypothesis**

We test whether the population mean equals 70.

Null hypothesis:
$H_0: \mu = 70$

Alternative (two-tailed):
$H_1: \mu \neq 70$
(We will also show the one-tailed result for $H_1: \mu > 70$.)

Significance level: $\alpha = 0.05$

In [38]:
import numpy as np
from scipy import stats

data = np.array([72,75,68,74,71,73])

t_stat, p_two = stats.ttest_1samp(data, popmean=70.0)
print("t-stat:", t_stat)
print("Two-tailed p:", p_two)

# by default (alternative='two-sided') scipy returns a two-tailed p-value

# Convert to one-tailed (H1: mean > 70)
if t_stat > 0:
    p_one = p_two / 2
else:
    p_one = 1 - p_two / 2

print("One-tailed p (mean > 70):", p_one)


t-stat: 2.1371868349696497
Two-tailed p: 0.08562160527971222
One-tailed p (mean > 70): 0.04281080263985611


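At α = 0.05, the two-tailed test stated above gives p ≈ 0.0856 > 0.05, so we fail to reject H₀: the data do not show that the mean differs from 70.

The one-tailed p-value (0.0428) is below 0.05, so if H₁: μ > 70 had been specified before seeing the data, we would reject H₀ and conclude, at the 5% level, that the true mean is greater than 70. Choosing the one-tailed direction after looking at the sample mean inflates the Type I error rate, so the one-tailed result is only valid when it was planned in advance.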
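Instead of halving the two-tailed p-value, the one-tailed p can be taken directly from the t distribution. A minimal sketch, assuming SciPy 1.6 or later (where ttest_1samp accepts an alternative argument), that also reproduces the t statistic by hand as t = (x̄ − μ₀) / (s / √n) with n − 1 degrees of freedom:

```python
import numpy as np
from scipy import stats

data = np.array([72, 75, 68, 74, 71, 73])
mu0 = 70.0

# Manual one-sample t statistic: t = (xbar - mu0) / (s / sqrt(n)), df = n - 1
n = len(data)
xbar = data.mean()
s = data.std(ddof=1)                      # sample standard deviation (n - 1 denominator)
t_manual = (xbar - mu0) / (s / np.sqrt(n))
df = n - 1

# One-tailed p for H1: mu > 70, read straight from the t distribution's upper tail
p_greater_manual = stats.t.sf(t_manual, df)

# SciPy >= 1.6 can return the one-sided p-value directly
res = stats.ttest_1samp(data, popmean=mu0, alternative="greater")

print("t (manual):", t_manual, " t (scipy):", res.statistic)
print("p for H1: mu > 70 (manual):", p_greater_manual, " (scipy):", res.pvalue)
```

Both routes should agree with the t ≈ 2.137 and one-tailed p ≈ 0.0428 printed by the cell above.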

## 3. Two-Sample Independent t-Test

We'll compare two independent groups:

Group A (n₁ = 10): 78, 82, 85, 80, 79, 81, 77, 84, 83, 86

Group B (n₂ = 8): 74, 75, 72, 70, 73, 71, 69, 76

We'll test whether the two population means differ (two-tailed), using Welch's t-test, which does not assume equal variances and is the safer default.

In [39]:
from scipy import stats

A = [78,82,85,80,79,81,77,84,83,86]
B = [74,75,72,70,73,71,69,76]

t_stat, p_two = stats.ttest_ind(A, B, equal_var=False)  # Welch's t-test
print("t =", t_stat, "two-tailed p =", p_two)

t = 6.971370023173351 two-tailed p = 3.161333356771593e-06


Since p ≪ 0.05, reject H₀.

Conclusion: There is very strong statistical evidence that the mean of Group A differs from the mean of Group B. Numerically, the Group A mean (81.5) is significantly higher than the Group B mean (72.5).
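To make Welch's test less of a black box, the sketch below recomputes it by hand: the standard error is built from each group's own variance (no pooling), and the degrees of freedom come from the Welch-Satterthwaite approximation, so they need not be an integer.

```python
import numpy as np
from scipy import stats

A = np.array([78, 82, 85, 80, 79, 81, 77, 84, 83, 86])
B = np.array([74, 75, 72, 70, 73, 71, 69, 76])

# Squared standard error of each group mean: s^2 / n, with sample variance (ddof=1)
va = A.var(ddof=1) / len(A)
vb = B.var(ddof=1) / len(B)

# Welch statistic: difference in means over the unpooled standard error
t_welch = (A.mean() - B.mean()) / np.sqrt(va + vb)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch = (va + vb) ** 2 / (va ** 2 / (len(A) - 1) + vb ** 2 / (len(B) - 1))

# Two-tailed p-value from the t distribution with non-integer df
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

print(f"t = {t_welch:.4f}, df = {df_welch:.2f}, two-tailed p = {p_welch:.3g}")
print("scipy:", stats.ttest_ind(A, B, equal_var=False))
```

Here the Welch df comes out at about 15.99, very close to the pooled value n₁ + n₂ − 2 = 16, because the two sample variances (about 9.17 and 6.0) are similar.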

## 3b. Paired t-Test

We test whether the treatment changed scores (after − before).

Null hypothesis: $H_0: \mu_d = 0$ (the mean difference is zero).

Two-tailed alternative: $H_1: \mu_d \neq 0$


In [43]:
import numpy as np
from scipy import stats

before = np.array([65,68,64,70,66,67])
after  = np.array([68,70,67,72,69,68])

t_stat, p_two = stats.ttest_rel(after, before)   # paired t-test
print("t-stat:", t_stat)
print("two-tailed p:", p_two)

# If you want one-tailed p (H1: mean difference > 0) convert:
if t_stat > 0:
    p_one = p_two / 2
else:
    p_one = 1 - p_two / 2
print("one-tailed p (mean difference > 0):", p_one)


t-stat: 7.000000000000001
two-tailed p: 0.0009167475143984045
one-tailed p (mean difference > 0): 0.00045837375719920225


Two-tailed: p ≈ 0.000917 < 0.05 → reject H₀.

The after scores are significantly higher than the before scores. The average increase (≈ 2.33 points) is statistically significant at the 5% level.
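A paired t-test is a one-sample t-test on the within-subject differences. The sketch below checks that equivalence on the same data: d = after − before is [3, 2, 3, 2, 3, 1], whose mean 14/6 ≈ 2.33 is the average increase quoted above.

```python
import numpy as np
from scipy import stats

before = np.array([65, 68, 64, 70, 66, 67])
after = np.array([68, 70, 67, 72, 69, 68])

d = after - before                              # per-subject change
t_d, p_d = stats.ttest_1samp(d, popmean=0.0)    # one-sample t-test of mean(d) = 0
t_rel, p_rel = stats.ttest_rel(after, before)   # paired t-test on the same pairs

print("differences:", d, "mean:", d.mean())
print("one-sample on d:", t_d, p_d)
print("ttest_rel      :", t_rel, p_rel)
```

Both calls should report t = 7 with df = 5, which is why the paired test is far more sensitive here than an independent-samples test would be: it removes the between-subject spread and only looks at each person's change.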

## 4. ANOVA

Test whether the three group population means are equal.

Hypotheses: $H_0: \mu_A = \mu_B = \mu_C$ (all group means equal)

$H_1$: at least one group mean differs

Significance level: $\alpha = 0.05$.

Data (small, easy arithmetic)

Group A (n=4): 10, 12, 11, 13
Group B (n=4): 14, 15, 13, 16
Group C (n=4): 20, 18, 19, 21

Total observations: $N = 12$

Number of groups: $k = 3$.

In [52]:
from scipy.stats import f_oneway

A = [10,12,11,13]
B = [14,15,13,16]
C = [20,18,19,21]

F, p = f_oneway(A, B, C)
print("F =", F)
print("p =", p)

F = 39.199999999999925
p = 3.608193714412663e-05

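f_oneway hides the arithmetic. The sketch below computes the same F by hand from the between- and within-group sums of squares, with k − 1 = 2 and N − k = 9 degrees of freedom, and takes the p-value as the upper tail of the F(2, 9) distribution.

```python
import numpy as np
from scipy import stats

groups = [np.array([10, 12, 11, 13]),
          np.array([14, 15, 13, 16]),
          np.array([20, 18, 19, 21])]

all_obs = np.concatenate(groups)
N, k = len(all_obs), len(groups)           # N = 12 observations, k = 3 groups
grand_mean = all_obs.mean()

# Between-group SS: group size times squared distance of each group mean from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group SS: squared deviations of each observation from its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)          # df_between = k - 1 = 2
ms_within = ss_within / (N - k)            # df_within  = N - k = 9
F = ms_between / ms_within
p = stats.f.sf(F, k - 1, N - k)            # upper-tail area of F(2, 9)

print(f"SSB = {ss_between:.4f}, SSW = {ss_within:.4f}, F = {F:.4f}, p = {p:.3g}")
```

With SSB ≈ 130.67 and SSW = 15, this gives F = (130.67 / 2) / (15 / 9) = 39.2, the same value f_oneway reports.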

### 4.1. Eta-Squared (η²): Effect Size for ANOVA

In [53]:
# ETA-SQUARED (eta^2) from one-way ANOVA (statsmodels)
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data (replace with your own)
A = [10,12,11,13]
B = [14,15,13,16]
C = [20,18,19,21]
data = np.array(A + B + C)
groups = ['A']*len(A) + ['B']*len(B) + ['C']*len(C)
df = pd.DataFrame({'score': data, 'group': groups})

# Fit ANOVA model (using formula)
model = ols('score ~ C(group)', data=df).fit()

# ANOVA table (type II)
anova_table = sm.stats.anova_lm(model, typ=2)

# Extract sums of squares
ss_between = anova_table.loc['C(group)', 'sum_sq']   # SSB
ss_within  = anova_table.loc['Residual', 'sum_sq']  # SSW
ss_total   = ss_between + ss_within                 # SST

# Compute eta-squared
eta2 = ss_between / ss_total

# Print results
print(f"SS_between = {ss_between:.6f}")
print(f"SS_within  = {ss_within:.6f}")
print(f"SSTotal    = {ss_total:.6f}")
print(f"eta-squared = {eta2:.6f}")

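PatsyError: Error evaluating factor: TypeError: 'list' object is not callable
    score ~ C(group)
            ^^^^^^^^

This cell fails because the list named C (Group C's scores) shadows patsy's built-in C() function, which marks a column as categorical inside the formula, so patsy ends up trying to call a list. The next cell renames the list to C_group to avoid the clash.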

In [54]:
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
import statsmodels.api as sm

# Data (replace with your real data)
A = [10,12,11,13]
B = [14,15,13,16]
C_group = [20,18,19,21]   # avoid naming a variable 'C'

data = np.array(A + B + C_group)
groups = ['A']*len(A) + ['B']*len(B) + ['C']*len(C_group)
df = pd.DataFrame({'score': data, 'group': groups})

# Ensure group is categorical
df['group'] = df['group'].astype('category')

# Fit ANOVA model: 'group' is already categorical, so no C() call is needed
model = ols('score ~ group', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)


              sum_sq   df     F    PR(>F)
group     130.666667  2.0  39.2  0.000036
Residual   15.000000  9.0   NaN       NaN

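The corrected cell prints the ANOVA table but not the effect size this subsection is about. Eta-squared is SS_between / SS_total, so from the table it is 130.67 / (130.67 + 15.00) ≈ 0.897: roughly 90% of the variation in score is associated with group membership. A small NumPy-only helper (eta_squared is a hypothetical name, not a library function) computes it straight from the raw groups:

```python
import numpy as np


def eta_squared(*groups):
    """Eta-squared for a one-way ANOVA: SS_between / SS_total (hypothetical helper)."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.concatenate(arrays)
    grand_mean = all_obs.mean()
    ss_total = ((all_obs - grand_mean) ** 2).sum()                           # SST
    ss_between = sum(len(a) * (a.mean() - grand_mean) ** 2 for a in arrays)  # SSB
    return ss_between / ss_total


eta2 = eta_squared([10, 12, 11, 13], [14, 15, 13, 16], [20, 18, 19, 21])
print(f"eta-squared = {eta2:.4f}")
```

For a one-way design SS_total = SS_between + SS_within, so this matches the ratio computed from the statsmodels table.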

## 5. Chi-Square Test of Independence

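|          | Science | Commerce | Arts | Row sums |
|----------|---------|----------|------|----------|
| Male     | 20      | 15       | 5    | 40       |
| Female   | 10      | 18       | 12   | 40       |
| Col sums | 30      | 33       | 17   | 80       |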


STEP 0: Hypotheses

Null (H₀): Gender and Stream are independent (no association).

Alternative (H₁): Gender and Stream are associated (not independent).

In [50]:
import numpy as np
from scipy import stats
import math

# observed table
obs = np.array([[20,15,5],
                [10,18,12]])

chi2, p, dof, expected = stats.chi2_contingency(obs)

# Cramer's V
n = obs.sum()
k = min(obs.shape)   # min(rows, cols)
cramers_v = math.sqrt(chi2 / (n * (k - 1)))

print("chi2 =", round(chi2, 4))
print("p =", round(p, 6))
print("dof =", dof)
print("expected counts:\n", expected)
print("Cramer's V =", round(cramers_v, 4))


chi2 = 6.4884
p = 0.038999
dof = 2
expected counts:
 [[15.  16.5  8.5]
 [15.  16.5  8.5]]
Cramer's V = 0.2848


At α = 0.05:

p ≈ 0.0390 < 0.05 → reject H₀.

Report wording:

The chi-square test of independence yields $\chi^2(2) = 6.49$, p ≈ 0.039. We reject the null hypothesis of independence; there is statistical evidence of an association between Gender and Stream. Cramér's V ≈ 0.285 indicates a modest association, just below the conventional 0.3 benchmark for a medium effect in a table with min(rows, cols) = 2.
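chi2_contingency does all of this in one call. The sketch below spells out the steps on the same table: expected counts E = (row total × column total) / n, the Pearson statistic Σ(O − E)² / E, degrees of freedom (r − 1)(c − 1) = 2, the upper-tail p-value, and Cramér's V. It uses only NumPy and scipy.stats.chi2.

```python
import numpy as np
from scipy import stats

obs = np.array([[20, 15, 5],
                [10, 18, 12]])

row_sums = obs.sum(axis=1, keepdims=True)   # [[40], [40]]
col_sums = obs.sum(axis=0, keepdims=True)   # [[30, 33, 17]]
n = obs.sum()                               # 80

# Expected count under independence: (row total * column total) / grand total
expected = row_sums * col_sums / n

# Pearson chi-square statistic and its degrees of freedom
chi2 = ((obs - expected) ** 2 / expected).sum()
dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
p = stats.chi2.sf(chi2, dof)                # upper tail of chi-square(dof)

# Cramer's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))
cramers_v = np.sqrt(chi2 / (n * (min(obs.shape) - 1)))

print(expected)
print(f"chi2 = {chi2:.4f}, dof = {dof}, p = {p:.4f}, Cramer's V = {cramers_v:.4f}")
```

Because dof = 2, chi2_contingency applies no Yates continuity correction (it only does so when dof = 1), so this hand computation should match its output exactly.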