# Hypothesis Testing Assignment Using a Synthetic Dataset

This notebook runs the following statistical hypothesis tests on a synthetic three-group dataset:
- Shapiro-Wilk Test (normality)
- Levene’s Test (homogeneity of variance)
- One-Way ANOVA (parametric comparison of group means)
- Mann-Whitney U Test (non-parametric comparison of two groups)
- Chi-Square Test (association between categorical variables)


## Import Libraries and Create Synthetic Dataset

In [7]:
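import pandas as pd
import numpy as np
from scipy.stats import shapiro, levene, f_oneway, mannwhitneyu, chi2_contingency

# Fix the seed so the synthetic data is reproducible.
np.random.seed(42)

# Three groups with the same mean (100); group C has twice the standard deviation.
group_A = np.random.normal(loc=100, scale=5, size=100)
group_B = np.random.normal(loc=100, scale=5, size=100)
group_C = np.random.normal(loc=100, scale=10, size=100)

df = pd.DataFrame({
    "Score": np.concatenate([group_A, group_B, group_C]),
    "Group": ["A"]*100 + ["B"]*100 + ["C"]*100
})

# NOTE: the bins stop at 100, so any score above 100 gets a missing Category (NaN),
# and with scores centered on 100 almost every remaining score lands in "High".
df["Category"] = pd.cut(df["Score"], bins=[0, 50, 60, 100], labels=["Low", "Medium", "High"])
df.head()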

|   | Score | Group | Category |
|---|---|---|---|
| 0 | 102.483571 | A | NaN |
| 1 | 99.308678 | A | High |
| 2 | 103.238443 | A | NaN |
| 3 | 107.615149 | A | NaN |
| 4 | 98.829233 | A | High |
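
The fixed bins above top out at 100, so scores above 100 receive no category. Below is a minimal sketch of one alternative, assuming tercile-based bins are acceptable for the assignment. The names `scores` and `category` and the choice of `pd.qcut` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Regenerate the same synthetic scores as the notebook (same seed, same order).
np.random.seed(42)
scores = np.concatenate([
    np.random.normal(loc=100, scale=5, size=100),   # group A
    np.random.normal(loc=100, scale=5, size=100),   # group B
    np.random.normal(loc=100, scale=10, size=100),  # group C
])

# Assumption: split at the empirical terciles so every score gets a label
# and each label holds roughly a third of the data.
category = pd.qcut(scores, q=3, labels=["Low", "Medium", "High"])

print(pd.Series(category).value_counts())
```

With quantile-based edges, no score falls outside the bins, so the `Category` column would contain no missing values.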


## Shapiro-Wilk Test for Normality

In [2]:
# H0: the sample comes from a normal distribution. A large p-value means we
# fail to reject H0; it does not prove that the data is normal.
for group in ["A", "B", "C"]:
    stat, p = shapiro(df.loc[df["Group"] == group, "Score"])
    print(f"Group {group} Shapiro-Wilk p-value: {p:.4f}")
    if p > 0.05:
        print(f"✅ Group {group}: no evidence against normality.\n")
    else:
        print(f"❌ Group {group}: normality rejected at the 5% level.\n")

Group A Shapiro-Wilk p-value: 0.6552
✅ Group A: no evidence against normality.

Group B Shapiro-Wilk p-value: 0.0853
✅ Group B: no evidence against normality.

Group C Shapiro-Wilk p-value: 0.3551
✅ Group C: no evidence against normality.
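
To see what a rejection looks like, the same test can be run on deliberately skewed data. This is a sketch only: the exponential sample and the `rng` generator are assumptions added for illustration and are not part of the assignment data.

```python
import numpy as np
from scipy.stats import shapiro

# Separate generator so the notebook's global seed is left untouched.
rng = np.random.default_rng(0)

normal_sample = rng.normal(loc=100, scale=5, size=100)
skewed_sample = rng.exponential(scale=5, size=100)  # strongly right-skewed

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    stat, p = shapiro(sample)
    # W close to 1 is consistent with normality; a small p-value rejects it.
    print(f"{name}: W = {stat:.3f}, p = {p:.4f}")
```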



## Levene’s Test for Homogeneity of Variance

In [3]:
# H0: all groups have equal variance (SciPy's levene centers on the median by default).
stat, p = levene(df[df["Group"] == "A"]["Score"],
                 df[df["Group"] == "B"]["Score"],
                 df[df["Group"] == "C"]["Score"])
print(f"Levene’s Test p-value: {p:.4f}")
if p > 0.05:
    print("✅ No evidence of unequal variances across groups.")
else:
    print("❌ Variances are NOT equal across groups.")

Levene’s Test p-value: 0.0000
❌ Variances are NOT equal across groups.
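
Levene’s test reports that the variances differ but not which group is responsible. A quick look at the per-group spread answers that. The sketch below rebuilds the same synthetic data so that it runs on its own; the `spread` name is illustrative.

```python
import numpy as np
import pandas as pd

# Same seed and draw order as the notebook, so the data is identical.
np.random.seed(42)
df = pd.DataFrame({
    "Score": np.concatenate([
        np.random.normal(loc=100, scale=5, size=100),
        np.random.normal(loc=100, scale=5, size=100),
        np.random.normal(loc=100, scale=10, size=100),
    ]),
    "Group": ["A"] * 100 + ["B"] * 100 + ["C"] * 100,
})

# Sample standard deviation and variance for each group.
spread = df.groupby("Group")["Score"].agg(["std", "var"])
print(spread.round(2))
```

Group C was generated with `scale=10`, twice that of A and B, so its sample standard deviation should come out clearly larger.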


## One-Way ANOVA

In [8]:
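# H0: all group means are equal. Classic one-way ANOVA assumes equal variances,
# which Levene’s test rejected for this data, so interpret the result with caution.
stat, p = f_oneway(df[df["Group"] == "A"]["Score"],
                   df[df["Group"] == "B"]["Score"],
                   df[df["Group"] == "C"]["Score"])
print(f"ANOVA p-value: {p:.4f}")
if p < 0.05:
    print("❗ There is a significant difference in means among groups.")
else:
    print("✅ No significant difference in means among groups.")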

ANOVA p-value: 0.5294
✅ No significant difference in means among groups.
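
Because Levene’s test rejected equal variances, the equal-variance assumption behind the classic one-way ANOVA is questionable here. One option, sketched below as an assumption rather than as part of the original assignment, is SciPy's `alexandergovern` test (available in SciPy 1.7 and later), which tests equality of group means without assuming equal variances.

```python
import numpy as np
from scipy.stats import alexandergovern

# Same seed and draw order as the notebook, so the groups are identical.
np.random.seed(42)
group_A = np.random.normal(loc=100, scale=5, size=100)
group_B = np.random.normal(loc=100, scale=5, size=100)
group_C = np.random.normal(loc=100, scale=10, size=100)

# H0: all group means are equal; variances are allowed to differ.
result = alexandergovern(group_A, group_B, group_C)
print(f"Alexander-Govern p-value: {result.pvalue:.4f}")
```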


## Mann-Whitney U Test (Group A vs B)

In [9]:
stat, p = mannwhitneyu(df[df["Group"] == "A"]["Score"],
                       df[df["Group"] == "B"]["Score"],
                       alternative='two-sided')
print(f"Mann-Whitney U Test p-value: {p:.4f}")
if p < 0.05:
    print("❗ Significant difference between Group A and B (non-parametric).")
else:
    print("✅ No significant difference between Group A and B.")

Mann-Whitney U Test p-value: 0.5181
✅ No significant difference between Group A and B.
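
The test above compares only A and B. Below is a minimal sketch of extending it to all three pairs, assuming a Bonferroni correction is an acceptable way to handle the multiple comparisons. The `groups` and `pairwise_p` names are illustrative.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu

# Same seed and draw order as the notebook (A, then B, then C).
np.random.seed(42)
groups = {
    "A": np.random.normal(loc=100, scale=5, size=100),
    "B": np.random.normal(loc=100, scale=5, size=100),
    "C": np.random.normal(loc=100, scale=10, size=100),
}

pairs = list(combinations(groups, 2))  # ("A", "B"), ("A", "C"), ("B", "C")
pairwise_p = {}
for g1, g2 in pairs:
    stat, p = mannwhitneyu(groups[g1], groups[g2], alternative="two-sided")
    # Bonferroni: multiply each p-value by the number of comparisons, cap at 1.
    pairwise_p[(g1, g2)] = min(p * len(pairs), 1.0)

for (g1, g2), p_adj in pairwise_p.items():
    print(f"{g1} vs {g2}: Bonferroni-adjusted p = {p_adj:.4f}")
```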


## Chi-Square Test for Independence (Group vs Category)

In [6]:
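# NOTE: Category was binned with an upper edge of 100, so "Low" and "Medium" are
# almost certainly empty and the table collapses to a single column. In that case
# the test has 0 degrees of freedom and p = 1.0 by construction, so the result
# says nothing about whether Group and Category are related.
contingency = pd.crosstab(df["Group"], df["Category"])
stat, p, dof, expected = chi2_contingency(contingency)
print(f"Chi-Square Test p-value: {p:.4f}")
if p < 0.05:
    print("❗ There is a relationship between Group and Category.")
else:
    print("✅ No evidence of association between Group and Category.")

Chi-Square Test p-value: 1.0000
✅ No evidence of association between Group and Category.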
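
A chi-square test is only informative when the contingency table has at least two rows and two columns. SciPy's `chi2_contingency` reports zero degrees of freedom and p = 1 for a single-column table, whatever the counts are. The helper below is a hypothetical guard (`chi2_if_valid` is not a SciPy function) that checks the table shape and warns about small expected counts before reporting a p-value. The example counts are made up for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_if_valid(table, min_expected=5):
    """Return the chi-square p-value, or None if the table is degenerate."""
    table = np.asarray(table)
    if table.ndim != 2 or min(table.shape) < 2:
        print("Table needs at least 2 rows and 2 columns; test skipped.")
        return None
    stat, p, dof, expected = chi2_contingency(table)
    if (expected < min_expected).any():
        # Common rule of thumb: expected counts below 5 weaken the approximation.
        print("Warning: some expected counts are below", min_expected)
    return p

# Single-column table: the degenerate situation the guard is meant to catch.
print(chi2_if_valid([[48], [52], [50]]))

# A 3x2 table with made-up counts, for illustration only.
print(chi2_if_valid([[30, 70], [45, 55], [60, 40]]))
```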
