## ANOVA - Analysis of Variance

1. In the code below, we first import the necessary libraries: NumPy for numerical operations and SciPy for statistical tests.
2. We generate data for three groups (`group1_data`, `group2_data`, `group3_data`) with the `np.random.normal()` function to represent the observations in each group.
3. We then perform a one-way ANOVA using the `stats.f_oneway()` function from SciPy. This test checks the null hypothesis that all group means are equal against the alternative that at least one mean differs. It returns the F-statistic and the p-value.
4. Next, we set the significance level (`alpha`) to 0.05, which is commonly used in hypothesis testing. This represents the threshold for determining statistical significance.
5. We check the p-value against the significance level. If the p-value is less than alpha, we reject the null hypothesis; otherwise, we fail to reject the null hypothesis.
6. Finally, we display the F-statistic and p-value to provide additional information about the test results.

ANOVA allows us to determine if there are any significant differences in means between multiple groups. By running this code, you can see the result of the one-way ANOVA and determine if there is a significant difference in means between the three groups.
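To make the F-statistic less of a black box, here is a minimal sketch that computes it by hand as the ratio of the between-group mean square to the within-group mean square (with k - 1 and N - k degrees of freedom) and checks the result against `stats.f_oneway()`. The helper name `one_way_f` is hypothetical and exists only for this illustration.

```python
import numpy as np
import scipy.stats as stats


def one_way_f(*groups):
    """Compute the one-way ANOVA F-statistic from sums of squares (hypothetical helper)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = np.concatenate(groups).mean()

    # Between-group sum of squares: spread of the group means around the
    # grand mean, weighted by group size
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


np.random.seed(42)
a = np.random.normal(loc=10, scale=2, size=50)
b = np.random.normal(loc=12, scale=2, size=50)
c = np.random.normal(loc=15, scale=2, size=50)

manual_f = one_way_f(a, b, c)
scipy_f, _ = stats.f_oneway(a, b, c)
print(f"manual F: {manual_f:.6f}, scipy F: {scipy_f:.6f}")
```

A large F means the group means are spread out relative to the noise inside each group; the p-value is then the upper-tail probability of that F under the null hypothesis.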

- Example 1: ANOVA with equal group means
- Example 2: ANOVA with unequal group means
- Example 3: ANOVA with unequal group variances
- Example 4: ANOVA with unequal group sizes
- Example 5: ANOVA with missing values (NaNs)
- Example 6: Two-way ANOVA with interaction effect
- Example 7: Two-way ANOVA without interaction effect
- Example 8: One-way repeated measures ANOVA
- Example 9: ANOVA with non-normal data
- Example 10: ANOVA with non-parametric alternative test (Kruskal-Wallis test)

Each example demonstrates a specific variation of ANOVA and checks the obtained p-value against the significance level to determine whether to reject or fail to reject the null hypothesis. The code also displays the F-statistic (or Kruskal-Wallis statistic) and the p-value for each test.

In [19]:
# Importing the required libraries
import numpy as np
import scipy.stats as stats

In [20]:
# Setting the significance level (alpha)
alpha = 0.05

Example 1: ANOVA with equal group means

In this example, we generate three groups of data (`group1_data`, `group2_data`, `group3_data`) from the same normal distribution (mean 10, standard deviation 2), so the null hypothesis of equal means is true. We perform a one-way ANOVA using `stats.f_oneway()` to test whether the group means differ significantly, check the p-value against the significance level (`alpha`), and display the F-statistic and p-value.

In [21]:
# Example 1: ANOVA with equal group means
np.random.seed(42)

group1_data = np.random.normal(loc=10, scale=2, size=50)
group2_data = np.random.normal(loc=10, scale=2, size=50)
group3_data = np.random.normal(loc=10, scale=2, size=50)

f_statistic, p_value = stats.f_oneway(group1_data, group2_data, group3_data)

if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

print(f"F-statistic: {f_statistic}")
print(f"p-value: {p_value}")

Fail to reject the null hypothesis
F-statistic: 0.9100682232514057
p-value: 0.4047522508412271
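Failing to reject is the expected outcome here, but with `alpha = 0.05` the test will still reject in roughly 5% of datasets even when the null hypothesis is true (a type I error). A quick simulation makes this concrete; it assumes the same setup as Example 1 (three groups of 50 drawn from a normal distribution with mean 10 and standard deviation 2), and the number of repetitions is an arbitrary choice.

```python
import numpy as np
import scipy.stats as stats

alpha = 0.05
n_simulations = 2000
np.random.seed(0)

rejections = 0
for _ in range(n_simulations):
    # All three groups come from the same distribution, so H0 (equal means) is true
    groups = [np.random.normal(loc=10, scale=2, size=50) for _ in range(3)]
    _, p_value = stats.f_oneway(*groups)
    if p_value < alpha:
        rejections += 1

false_positive_rate = rejections / n_simulations
print(f"Estimated type I error rate: {false_positive_rate:.3f}")
```

The estimated rate should land near `alpha`, which is exactly what the significance level promises.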


Example 2: ANOVA with unequal group means

This example is similar to the previous one, but now the means of the three groups (`group1_data`, `group2_data`, `group3_data`) are different. We perform a one-way ANOVA and interpret the results by comparing the p-value with the significance level.

In [22]:
# Example 2: ANOVA with unequal group means
np.random.seed(42)

group1_data = np.random.normal(loc=10, scale=2, size=50)
group2_data = np.random.normal(loc=12, scale=2, size=50)
group3_data = np.random.normal(loc=15, scale=2, size=50)

f_statistic, p_value = stats.f_oneway(group1_data, group2_data, group3_data)

if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

print(f"F-statistic: {f_statistic}")
print(f"p-value: {p_value}")

Reject the null hypothesis
F-statistic: 101.63413536241711
p-value: 1.9247105735805746e-28
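A significant F-test only says that at least one group mean differs; it does not say which pairs differ. One common follow-up is Tukey's HSD test. The sketch below assumes SciPy 1.8 or newer, where `stats.tukey_hsd()` is available, and regenerates the Example 2 data so it runs on its own.

```python
import numpy as np
import scipy.stats as stats

np.random.seed(42)
group1_data = np.random.normal(loc=10, scale=2, size=50)
group2_data = np.random.normal(loc=12, scale=2, size=50)
group3_data = np.random.normal(loc=15, scale=2, size=50)

# Pairwise comparisons of all group means with family-wise error control
result = stats.tukey_hsd(group1_data, group2_data, group3_data)
print(result)

# result.pvalue[i, j] is the adjusted p-value for group i vs. group j
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"group{i + 1} vs group{j + 1}: p-value = {result.pvalue[i, j]:.3g}")
```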


Example 3: ANOVA with unequal group variances

In this example, the three groups (`group1_data`, `group2_data`, `group3_data`) have different means (10, 12 and 15) and different standard deviations (2, 3 and 4). We perform a one-way ANOVA to test whether the group means differ. Note that `stats.f_oneway()` assumes equal variances across groups, so with clearly unequal variances its p-value should be treated with caution. The results are interpreted by comparing the p-value with the significance level.

In [23]:
# Example 3: ANOVA with unequal group variances
np.random.seed(42)

group1_data = np.random.normal(loc=10, scale=2, size=50)
group2_data = np.random.normal(loc=12, scale=3, size=50)
group3_data = np.random.normal(loc=15, scale=4, size=50)

f_statistic, p_value = stats.f_oneway(group1_data, group2_data, group3_data)

if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

print(f"F-statistic: {f_statistic}")
print(f"p-value: {p_value}")

Reject the null hypothesis
F-statistic: 39.157552091059785
p-value: 2.3320433225368608e-14
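Because `stats.f_oneway()` assumes equal variances, it can help to check that assumption (here with Levene's test, `stats.levene()`) and, when it is doubtful, use Welch's ANOVA, which does not assume equal variances. The sketch below implements Welch's F-test by hand from its textbook formula; the helper name `welch_anova` is hypothetical, and recent SciPy releases may offer an equivalent option directly, so check the documentation for your version.

```python
import numpy as np
import scipy.stats as stats


def welch_anova(*groups):
    """Welch's one-way ANOVA (hypothetical helper): returns (F, df1, df2, p)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])

    weights = n / variances                  # precision weight of each group mean
    weighted_mean = np.sum(weights * means) / np.sum(weights)

    numerator = np.sum(weights * (means - weighted_mean) ** 2) / (k - 1)
    tmp = np.sum((1 - weights / np.sum(weights)) ** 2 / (n - 1))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * tmp)
    p_value = stats.f.sf(f_stat, df1, df2)   # upper tail of the F distribution
    return f_stat, df1, df2, p_value


# Same data as Example 3
np.random.seed(42)
group1_data = np.random.normal(loc=10, scale=2, size=50)
group2_data = np.random.normal(loc=12, scale=3, size=50)
group3_data = np.random.normal(loc=15, scale=4, size=50)

# H0 of Levene's test: all groups have equal variances
levene_stat, levene_p = stats.levene(group1_data, group2_data, group3_data)
print(f"Levene statistic: {levene_stat:.3f}, p-value: {levene_p:.3g}")

welch_f, df1, df2, welch_p = welch_anova(group1_data, group2_data, group3_data)
print(f"Welch F({df1}, {df2:.1f}) = {welch_f:.3f}, p-value: {welch_p:.3g}")
```

With only two groups, Welch's F equals the square of Welch's t-statistic (`stats.ttest_ind(..., equal_var=False)`), which is a handy sanity check for the implementation.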


Example 4: ANOVA with unequal group sizes

Here, the three groups (`group1_data`, `group2_data`, `group3_data`) have different sizes (30, 40 and 50 observations) and different means (10, 12 and 15) but the same standard deviation (2). `stats.f_oneway()` accepts samples of different lengths, so no extra handling is needed. The obtained p-value is compared with the significance level to make a decision.

In [24]:
# Example 4: ANOVA with unequal group sizes
np.random.seed(42)

group1_data = np.random.normal(loc=10, scale=2, size=30)
group2_data = np.random.normal(loc=12, scale=2, size=40)
group3_data = np.random.normal(loc=15, scale=2, size=50)

f_statistic, p_value = stats.f_oneway(group1_data, group2_data, group3_data)

if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

print(f"F-statistic: {f_statistic}")
print(f"p-value: {p_value}")

Reject the null hypothesis
F-statistic: 83.51051012874761
p-value: 2.9367680540941276e-23
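With unequal group sizes the call does not change, but the degrees of freedom depend on the total sample size: k - 1 between groups and N - k within groups, which for 30 + 40 + 50 = 120 observations in 3 groups gives 2 and 117. As a sketch, the p-value reported by `stats.f_oneway()` can be reproduced as the upper-tail probability of the F distribution with those degrees of freedom via `stats.f.sf()`:

```python
import numpy as np
import scipy.stats as stats

# Same data as Example 4
np.random.seed(42)
groups = [
    np.random.normal(loc=10, scale=2, size=30),
    np.random.normal(loc=12, scale=2, size=40),
    np.random.normal(loc=15, scale=2, size=50),
]

f_statistic, p_value = stats.f_oneway(*groups)

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # 30 + 40 + 50 = 120
df_between = k - 1                       # 2
df_within = n_total - k                  # 117

# Survival function = P(F >= observed F) under the null hypothesis
p_from_f = stats.f.sf(f_statistic, df_between, df_within)
print(f"df = ({df_between}, {df_within})")
print(f"p-value from f_oneway: {p_value}, from the F distribution: {p_from_f}")
```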


Example 5: ANOVA with missing values (NaNs)

This example introduces missing values (NaNs) into one of the groups (`group2_data`). Because `stats.f_oneway()` does not drop NaNs by default, we remove them from each group separately before running the one-way ANOVA. The groups are independent, so a missing value in one group is no reason to discard observations from the others. The p-value is checked against the significance level to determine whether the differences in means are significant.
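To see why the missing values have to be handled at all, this sketch first passes the data with NaNs straight to `stats.f_oneway()` and then drops the NaNs from each group separately. Under SciPy's default NaN handling the first result should come back as NaN; any accompanying warnings may vary between SciPy versions.

```python
import numpy as np
import scipy.stats as stats

np.random.seed(42)
group1_data = np.random.normal(loc=10, scale=2, size=50)
group2_data = np.random.normal(loc=12, scale=2, size=50)
group3_data = np.random.normal(loc=15, scale=2, size=50)
group2_data[10:20] = np.nan  # 10 missing observations in group 2

# Passing the raw arrays: the NaNs propagate into the result
f_raw, p_raw = stats.f_oneway(group1_data, group2_data, group3_data)
print(f"With NaNs:    F = {f_raw}, p-value = {p_raw}")

# Dropping NaNs group by group keeps every valid observation (50, 40, 50)
cleaned = [g[~np.isnan(g)] for g in (group1_data, group2_data, group3_data)]
f_clean, p_clean = stats.f_oneway(*cleaned)
print(f"NaNs removed: F = {f_clean:.3f}, p-value = {p_clean:.3g}")
print("Group sizes after cleaning:", [len(g) for g in cleaned])
```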

In [25]:
# Example 5: ANOVA with missing values (NaNs)
np.random.seed(42)

group1_data = np.random.normal(loc=10, scale=2, size=50)
group2_data = np.random.normal(loc=12, scale=2, size=50)
group3_data = np.random.normal(loc=15, scale=2, size=50)

group2_data[10:20] = np.nan

# Removing NaNs before performing ANOVA: drop them from each group separately.
# The groups are independent, so group2's missing positions say nothing about groups 1 and 3.
group1_data_clean = group1_data[~np.isnan(group1_data)]
group2_data_clean = group2_data[~np.isnan(group2_data)]
group3_data_clean = group3_data[~np.isnan(group3_data)]

f_statistic, p_value = stats.f_oneway(group1_data_clean, group2_data_clean, group3_data_clean)

if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

print(f"F-statistic: {f_statistic}")
print(f"p-value: {p_value}")


Example 6: Two-way ANOVA with interaction effect

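Here, we run a two-way ANOVA with two categorical independent variables, `factor1` and `factor2` (three levels each, 10 observations per combination), and one response variable. SciPy's `stats.f_oneway()` handles only a single factor, so this example uses statsmodels instead: it fits an ordinary least squares model with the formula `response ~ C(factor1) * C(factor2)`, which includes both main effects and their interaction, and `anova_lm()` turns the fit into an ANOVA table. The data are generated with an interaction (one factor combination gets an extra shift), and the p-value of the interaction term is compared with the significance level.

In [26]:
# Example 6: Two-way ANOVA with interaction effect
# SciPy has no two-way ANOVA, so this uses statsmodels (pip install statsmodels)
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

np.random.seed(42)

# Balanced 3 x 3 design: every combination of the two factors, 10 observations each
factor1 = np.repeat([1, 2, 3], 30)
factor2 = np.tile(np.repeat([1, 2, 3], 10), 3)

# Additive main effects plus an interaction: cell (3, 3) gets an extra +4
interaction = 4 * ((factor1 == 3) & (factor2 == 3))
response = 10 + 2 * factor1 + factor2 + interaction + np.random.normal(loc=0, scale=2, size=90)

df = pd.DataFrame({"factor1": factor1, "factor2": factor2, "response": response})

# '*' expands to C(factor1) + C(factor2) + C(factor1):C(factor2)
model = smf.ols("response ~ C(factor1) * C(factor2)", data=df).fit()
anova_table = anova_lm(model, typ=2)
print(anova_table)

p_value = anova_table.loc["C(factor1):C(factor2)", "PR(>F)"]

if p_value < alpha:
    print("Reject the null hypothesis of no interaction")
else:
    print("Fail to reject the null hypothesis of no interaction")

print(f"Interaction p-value: {p_value}")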


Example 7: Two-way ANOVA without interaction effect

This example uses the same 3 x 3 design, but the response is built from purely additive effects, so the effect of `factor2` is the same at every level of `factor1`. We fit the additive model `response ~ C(factor1) + C(factor2)` with statsmodels, which has no interaction term, and compare the p-value of each main effect with the significance level.

In [27]:
# Example 7: Two-way ANOVA without interaction effect
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

np.random.seed(42)

factor1 = np.repeat([1, 2, 3], 30)
factor2 = np.tile(np.repeat([1, 2, 3], 10), 3)

# Purely additive effects: the effect of factor2 is the same at every level of factor1
response = 10 + 2 * factor1 + factor2 + np.random.normal(loc=0, scale=2, size=90)

df = pd.DataFrame({"factor1": factor1, "factor2": factor2, "response": response})

# '+' fits the two main effects only (additive model, no interaction term)
model = smf.ols("response ~ C(factor1) + C(factor2)", data=df).fit()
anova_table = anova_lm(model, typ=2)
print(anova_table)

# Test each main effect against the significance level
for term in ["C(factor1)", "C(factor2)"]:
    p_value = anova_table.loc[term, "PR(>F)"]
    if p_value < alpha:
        print(f"{term}: Reject the null hypothesis (p-value: {p_value})")
    else:
        print(f"{term}: Fail to reject the null hypothesis (p-value: {p_value})")


Example 8: One-way repeated measures ANOVA

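Here, we demonstrate a one-way repeated measures ANOVA, where the same 50 individuals are measured under three different conditions (`group1_data`, `group2_data`, `group3_data`). Because the measurements are paired by subject, `stats.f_oneway()` is not appropriate: it treats the groups as independent and leaves the between-subject variation in the error term. Instead, we reshape the data into long format and use `AnovaRM` from statsmodels, which accounts for the subject effect before testing the condition effect. The p-value is compared with the significance level to determine the outcome.

In [28]:
# Example 8: One-way repeated measures ANOVA
import pandas as pd
from statsmodels.stats.anova import AnovaRM

np.random.seed(42)

# The same 50 subjects measured under three conditions
group1_data = np.random.normal(loc=10, scale=2, size=50)
group2_data = group1_data + np.random.normal(loc=1, scale=0.5, size=50)
group3_data = group1_data + np.random.normal(loc=2, scale=0.8, size=50)

# Long format: one row per (subject, condition) measurement
df = pd.DataFrame({
    "subject": np.tile(np.arange(50), 3),
    "condition": np.repeat(["cond1", "cond2", "cond3"], 50),
    "score": np.concatenate([group1_data, group2_data, group3_data]),
})

result = AnovaRM(data=df, depvar="score", subject="subject", within=["condition"]).fit()
f_statistic = result.anova_table.loc["condition", "F Value"]
p_value = result.anova_table.loc["condition", "Pr > F"]

if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

print(f"F-statistic: {f_statistic}")
print(f"p-value: {p_value}")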


Example 9: ANOVA with non-normal data

This example involves non-normal data generated using the uniform distribution for all three groups (`group1_data`, `group2_data`, `group3_data`). We conduct a one-way ANOVA to investigate if there are significant differences in means between the groups. The obtained p-value is compared with the significance level to make a decision.

In [29]:
# Example 9: ANOVA with non-normal data
np.random.seed(42)

group1_data = np.random.uniform(low=0, high=1, size=50)
group2_data = np.random.uniform(low=0, high=1, size=50)
group3_data = np.random.uniform(low=0, high=1, size=50)

f_statistic, p_value = stats.f_oneway(group1_data, group2_data, group3_data)

if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

print(f"F-statistic: {f_statistic}")
print(f"p-value: {p_value}")

Fail to reject the null hypothesis
F-statistic: 0.3440180218170918
p-value: 0.7094853477989632
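One-way ANOVA assumes the observations in each group are roughly normally distributed. Before choosing between ANOVA and a rank-based test, one option is to check each group with the Shapiro-Wilk test (`stats.shapiro()`). This sketch reuses the uniform data from Example 9; keep in mind that such tests have limited power with small samples, so a non-significant result is not proof of normality.

```python
import numpy as np
import scipy.stats as stats

alpha = 0.05
np.random.seed(42)
groups = {
    "group1_data": np.random.uniform(low=0, high=1, size=50),
    "group2_data": np.random.uniform(low=0, high=1, size=50),
    "group3_data": np.random.uniform(low=0, high=1, size=50),
}

shapiro_results = {}
for name, data in groups.items():
    # H0: the sample was drawn from a normal distribution
    statistic, p_value = stats.shapiro(data)
    shapiro_results[name] = p_value
    verdict = "looks non-normal" if p_value < alpha else "no evidence against normality"
    print(f"{name}: W = {statistic:.3f}, p-value = {p_value:.3g} ({verdict})")
```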


Example 10: ANOVA with non-parametric alternative test

In this example, we use the Kruskal-Wallis test (`stats.kruskal()`) as a non-parametric alternative to ANOVA. The data in the three groups (`group1_data`, `group2_data`, `group3_data`) do not need to follow a normal distribution. We compare the p-value obtained from the Kruskal-Wallis test with the significance level to determine the outcome.

In [30]:
# Example 10: ANOVA with non-parametric alternative test
np.random.seed(42)

group1_data = np.random.normal(loc=10, scale=2, size=50)
group2_data = np.random.normal(loc=12, scale=2, size=50)
group3_data = np.random.normal(loc=15, scale=2, size=50)

h_statistic, p_value = stats.kruskal(group1_data, group2_data, group3_data)

if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

print(f"Kruskal-Wallis statistic: {h_statistic}")
print(f"p-value: {p_value}")

Reject the null hypothesis
Kruskal-Wallis statistic: 91.34433907284767
p-value: 1.4616008731241652e-20
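The Kruskal-Wallis statistic H is computed on ranks rather than raw values. Without tied values it is H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), where R_i is the rank sum and n_i the size of group i. `stats.kruskal()` additionally divides by a tie correction, which equals 1 for continuous data like this. The helper name `kruskal_h` in this sketch is hypothetical.

```python
import numpy as np
import scipy.stats as stats


def kruskal_h(*groups):
    """Kruskal-Wallis H without tie correction (hypothetical helper)."""
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    # Rank all observations together (rank 1 = smallest value)
    ranks = stats.rankdata(np.concatenate(groups))
    # Split the pooled ranks back into their groups and sum them
    boundaries = np.cumsum(sizes)[:-1]
    rank_sums = [r.sum() for r in np.split(ranks, boundaries)]
    return 12 / (n_total * (n_total + 1)) * sum(
        r_sum ** 2 / n for r_sum, n in zip(rank_sums, sizes)
    ) - 3 * (n_total + 1)


# Same data as Example 10
np.random.seed(42)
group1_data = np.random.normal(loc=10, scale=2, size=50)
group2_data = np.random.normal(loc=12, scale=2, size=50)
group3_data = np.random.normal(loc=15, scale=2, size=50)

h_manual = kruskal_h(group1_data, group2_data, group3_data)
h_scipy, _ = stats.kruskal(group1_data, group2_data, group3_data)
print(f"manual H: {h_manual:.6f}, scipy H: {h_scipy:.6f}")
```

Because only ranks enter the formula, a single extreme value cannot pull H around the way it can shift a group mean in the F-test.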
