# Hypothesis and Statistical Testing
---

Import libraries:

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import pingouin as pg

---

## Import cleaned data parquet file

Load the cleaned data produced by the first notebook.

In [2]:
df = pd.read_parquet("../data/mental_health_social_media_dataset_cleaned.parquet")

---
## Hypotheses to test

- Sleep vs Age Group
  - H₀: Mean sleep hours are equal across age groups
  - H₁: At least one group differs
- Stress vs Platform
  - H₀: Mean stress is equal across platforms
  - H₁: At least one platform differs
- Platform × Mental State
  - H₀: No association between platform and mental state
  - H₁: They are associated
- Sleep vs Mental State
  - H₀: Sleep is equal across mental states
  - H₁: At least one mental_state differs
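- Screen Time vs Stress
  - Correlation test (H₀: no correlation between screen time and stress)
- Negative Interaction Ratio vs Stress
  - Correlation test (H₀: no correlation between negative interaction ratio and stress)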

---
## Sleep vs Age Group

Hypotheses:

- H₀: Mean sleep hours are equal across age groups
- H₁: At least one group differs

Code:

In [5]:
# Select the relevant columns
data = df[["sleep_hours", "age_group"]].dropna()

# Perform one-way ANOVA
anova = pg.anova(data=data, dv="sleep_hours", between="age_group", detailed=True)

# Display the ANOVA results
anova

| | Source | SS | DF | MS | F | p-unc | np2 |
|---|---|---|---|---|---|---|---|
| 0 | age_group | 1215.360066 | 5 | 243.072013 | 5898.930092 | 0.0 | 0.855199 |
| 1 | Within | 205.783356 | 4994 | 0.041206 | | | |


Results:

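A one-way ANOVA revealed extremely large differences in mean sleep hours across age groups, F(5, 4994) = 5898.93.

The p-value is printed as 0.0 because it is too small to represent in floating point (p < 0.001), so the differences in mean sleep hours between age groups are statistically significant.

The effect size was exceptionally high (np2 = 0.855), indicating that age group explained approximately 85.5% of the total variance in sleep hours.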

Such a strong effect is far greater than would be expected in real-world behavioural data and likely reflects the synthetic structure of the dataset.

We reject H₀ (mean sleep hours are equal across age groups) in favour of H₁ (at least one group differs).
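The F statistic and effect size in the table can be reproduced from the sums of squares alone, which makes a useful sanity check on the 85.5% figure. The sketch below uses only the values printed in the ANOVA table; the helper name `anova_summary` is introduced here for illustration and is not part of pingouin.

```python
def anova_summary(ss_between, df_between, ss_within, df_within):
    """Recompute MS, F and eta-squared from one-way ANOVA sums of squares."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_stat = ms_between / ms_within
    # In a one-way design, partial eta-squared equals SS_between / SS_total
    eta_sq = ss_between / (ss_between + ss_within)
    return {
        "MS_between": ms_between,
        "MS_within": ms_within,
        "F": f_stat,
        "np2": eta_sq,
    }


# Values copied from the sleep_hours ~ age_group table above
summary = anova_summary(1215.360066, 5, 205.783356, 4994)
print(summary)
```

The recomputed `F` and `np2` should agree with the table up to rounding of the printed values.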

---
## Stress vs Platform

Hypotheses:
- H₀: Mean stress is equal across platforms
- H₁: At least one platform differs

Code:

In [6]:
# Select the relevant columns
data = df[["stress_level", "platform"]].dropna()

# Perform one-way ANOVA
anova = pg.anova(data=data, dv="stress_level", between="platform", detailed=True)

# Display the ANOVA results
anova

| | Source | SS | DF | MS | F | p-unc | np2 |
|---|---|---|---|---|---|---|---|
| 0 | platform | 1076.422753 | 6 | 179.403792 | 196.194538 | 3.467886e-225 | 0.190784 |
| 1 | Within | 4565.688447 | 4993 | 0.914418 | | | |


Results:

A one-way ANOVA showed a statistically significant effect of platform on stress level, F(6, 4993) = 196.19.

With p < 0.001, the differences in mean stress level between platforms are statistically significant.

The effect size was large (np2 = 0.191), indicating that platform accounts for about 19% of the total variance in stress level.

This represents a substantial difference between platforms, consistent with observations in the EDA.

We can reject H₀ (mean stress is equal across platforms) in favour of H₁ (at least one platform differs).
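The ANOVA only says that at least one platform differs, not which ones. A common follow-up is a post-hoc pairwise comparison such as Tukey's HSD. The sketch below is one possible way to do this, not part of the original analysis: it uses `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later) rather than pingouin, the helper name `tukey_by_group` is hypothetical, and the data are a toy stand-in for the notebook's `df`.

```python
import numpy as np
import pandas as pd
from scipy import stats


def tukey_by_group(data, dv, between):
    """Run Tukey's HSD on `dv` for every pair of levels in `between`."""
    groups = {name: g[dv].to_numpy() for name, g in data.groupby(between)}
    labels = list(groups)
    result = stats.tukey_hsd(*groups.values())
    rows = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            rows.append({
                "A": labels[i],
                "B": labels[j],
                # statistic[i, j] is mean(group i) - mean(group j)
                "mean_diff": result.statistic[i, j],
                "p_tukey": result.pvalue[i, j],
            })
    return pd.DataFrame(rows)


# Toy data for illustration only (not the notebook's dataset)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "platform": np.repeat(["A", "B", "C"], 50),
    "stress_level": np.concatenate([
        rng.normal(5.0, 1.0, 50),
        rng.normal(5.0, 1.0, 50),
        rng.normal(7.0, 1.0, 50),
    ]),
})
pairs = tukey_by_group(toy, dv="stress_level", between="platform")
print(pairs)
```

In the notebook itself, the same helper could be called as `tukey_by_group(data, dv="stress_level", between="platform")` on the `data` frame built in the cell above.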

---
## Platform × Mental State

Hypotheses:
- H₀: No association between platform and mental state
- H₁: They are associated

Code:

In [10]:
# Build the platform × mental_state contingency table
ct = pd.crosstab(df["platform"], df["mental_state"])

# Chi-square test
chi2, p, dof, expected = stats.chi2_contingency(ct)
print("Chi-square:", chi2)
print("p-value:", p)
print("dof:", dof)

Chi-square: 466.0026390307361
p-value: 3.765329048527234e-92
dof: 12


Results:

There is overwhelming statistical evidence that platform and mental_state are not independent, χ²(12) = 466.00.

With p < 0.001, the distribution of mental_state differs across platforms. The p-value alone does not measure how strong the association is.

We can reject H₀ (no association between platform and mental state) in favour of H₁ (they are associated).
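Unlike the ANOVA cells, the chi-square test above reports no effect size. One common choice is Cramér's V, computed as sqrt(χ² / (n · min(r − 1, c − 1))). The sketch below is an illustration under that definition; the helper name `cramers_v` is hypothetical and the tables are toy data, not the notebook's crosstab.

```python
import numpy as np
from scipy import stats


def cramers_v(table):
    """Cramér's V for an r x c contingency table of counts."""
    table = np.asarray(table)
    chi2, _, _, _ = stats.chi2_contingency(table)
    n = table.sum()
    # Smaller of (rows - 1) and (columns - 1)
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))


# Toy tables for illustration only
perfect = [[50, 0, 0], [0, 50, 0], [0, 0, 50]]      # rows fully determine columns
independent = [[10, 20, 30], [20, 40, 60]]          # identical row proportions
print(cramers_v(perfect), cramers_v(independent))
```

In the notebook this could be called as `cramers_v(ct)`. As a rough check from the printed values, and assuming n = 5000 rows (inferred from the ANOVA degrees of freedom, not stated directly) with min(r − 1, c − 1) = 2, V ≈ sqrt(466.00 / (5000 × 2)) ≈ 0.22.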

---
## Sleep vs Mental State

Hypotheses:
- H₀: Sleep is equal across mental states
- H₁: At least one mental_state differs

Code:

In [11]:
# Select the relevant columns
data = df[["sleep_hours", "mental_state"]].dropna()

# Perform one-way ANOVA
anova = pg.anova(data=data, dv="sleep_hours", between="mental_state", detailed=True)

# Display the ANOVA results
anova

| | Source | SS | DF | MS | F | p-unc | np2 |
|---|---|---|---|---|---|---|---|
| 0 | mental_state | 353.511891 | 2 | 176.755946 | 827.298029 | 4.416012e-311 | 0.248752 |
| 1 | Within | 1067.631531 | 4997 | 0.213654 | | | |


Results:

A one-way ANOVA showed a significant effect of mental state on sleep duration, F(2, 4997) = 827.30.

With p < 0.001, differences this large would be extremely unlikely if all mental-state groups had equal mean sleep hours.

The effect size was large (np2 = 0.249), indicating that mental-state category explains roughly 25% of the total variance in sleep hours.

This suggests substantial differences in sleep patterns across mental-state groups.

We can reject H₀ (mean sleep is equal across mental states) in favour of H₁ (at least one mental_state differs).
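One-way ANOVA assumes roughly equal variances across groups, and none of the cells above check this. A minimal sketch of two optional robustness checks follows: Levene's test for equal variances and the Kruskal-Wallis test as a rank-based alternative. Both are standard SciPy functions, but the helper name `robustness_checks` and the toy samples are assumptions made here for illustration; they are not part of the original analysis.

```python
import numpy as np
from scipy import stats


def robustness_checks(samples):
    """Levene's test (equal variances) and Kruskal-Wallis (rank-based ANOVA alternative)."""
    levene_stat, levene_p = stats.levene(*samples, center="median")
    kw_stat, kw_p = stats.kruskal(*samples)
    return {
        "levene_W": levene_stat,
        "levene_p": levene_p,
        "kruskal_H": kw_stat,
        "kruskal_p": kw_p,
    }


# Toy samples for illustration only (three groups with different means)
rng = np.random.default_rng(1)
toy_samples = [
    rng.normal(6.0, 0.5, 100),
    rng.normal(7.0, 0.5, 100),
    rng.normal(8.0, 0.5, 100),
]
checks = robustness_checks(toy_samples)
print(checks)

# In the notebook, the per-group samples could be built with, for example:
# samples = [g["sleep_hours"].to_numpy() for _, g in data.groupby("mental_state")]
```

A small Levene p-value would suggest unequal variances, in which case the Kruskal-Wallis result is the more cautious one to report alongside the ANOVA.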