
## Step 1: Import Required Libraries

We need several libraries to handle data, perform statistical tests, and visualize results.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import shapiro, levene, bartlett, f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd
```

### **Breaking Down Each Import:**
1. **NumPy (`numpy`)**:
   - A numerical computing library used for **array operations and mathematical functions**.
   - Here, it helps in **handling numerical data** and performing matrix calculations.

2. **Pandas (`pandas`)**:
   - A powerful library for **handling and manipulating structured data**.
   - We use Pandas **to create and transform datasets**.

3. **Statsmodels (`statsmodels.api`)**:
   - A library used for **statistical modeling and hypothesis testing**.
   - We use it to **perform ANOVA** and **build regression models**.

4. **OLS from `statsmodels.formula.api`**:
   - **OLS (Ordinary Least Squares)** is used to **fit a linear model**.
   - In ANOVA, we use it to **compare mean differences among treatment groups**.

5. **Matplotlib (`matplotlib.pyplot`)**:
   - A fundamental library for **plotting graphs**.
   - Used for **customizing and displaying visualizations**.

6. **Seaborn (`seaborn`)**:
   - A higher-level visualization library **built on Matplotlib**.
   - Helps in **creating aesthetically appealing statistical plots** (boxplots, histograms).

7. **SciPy Stats (`scipy.stats`)**:
   - A module containing **various statistical tests**.
   - Used for **Shapiro-Wilk test (normality), Levene’s test (variance), Bartlett’s test, and One-Way ANOVA**.

8. **Pairwise Tukey HSD (`statsmodels.stats.multicomp.pairwise_tukeyhsd`)**:
   - Used for **post-hoc multiple comparison testing**.
   - Helps in identifying **which specific groups differ** after ANOVA.


In [None]:
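import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import shapiro, levene, bartlett, f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd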



## Step 2: Creating the Dataset

We define a dataset representing **four treatment groups**:

```python
data = {
    'Treatment 1': [44, 30, 27, 48, 42, 19, 15, 34, 20, 47],
    'Treatment 2': [28, 38, 35, 41, 25, 16, 46, 19, 17, 33],
    'Treatment 3': [49, 36, 23, 29, 11, 39, 18, 12, 45, 26],
    'Treatment 4': [14, 43, 32, 50, 37, 22, 13, 21, 31, 40]
}
df = pd.DataFrame(data)
```

### **Breaking Down Each Line:**
1. **Dictionary `data`**:
   - Defines **four treatment groups**, each containing **10 values**.
   - Each treatment represents **a different experimental condition**.

2. **Creating a DataFrame (`pd.DataFrame(data)`)**:
   - Converts the dictionary into a **structured tabular format**.
   - **Rows** represent **individual observations**.
   - **Columns** represent **different treatment groups**.

### **Why is this Important?**
- This dataset mimics **real-world experiments** where different treatments affect the observed values.
- It serves as the basis for **ANOVA analysis** to test **whether group means differ significantly**.
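
Before testing anything, it can help to look at the per-group summaries. The following is a minimal sketch, assuming the same `data` dictionary as above; the name `summary` is introduced here only for illustration.

```python
import pandas as pd

data = {
    'Treatment 1': [44, 30, 27, 48, 42, 19, 15, 34, 20, 47],
    'Treatment 2': [28, 38, 35, 41, 25, 16, 46, 19, 17, 33],
    'Treatment 3': [49, 36, 23, 29, 11, 39, 18, 12, 45, 26],
    'Treatment 4': [14, 43, 32, 50, 37, 22, 13, 21, 31, 40]
}
df = pd.DataFrame(data)

# One row per treatment: sample size, mean, and standard deviation
summary = df.agg(['count', 'mean', 'std']).T
print(summary)
```

The group means are close to one another relative to the spread within each group, which is the comparison ANOVA formalizes.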


In [None]:
data = {'Treatment 1':[44, 30, 27, 48, 42, 19, 15, 34, 20, 47],
        'Treatment 2':[28, 38, 35, 41, 25, 16, 46, 19, 17, 33],
        'Treatment 3':[49, 36, 23, 29, 11, 39, 18, 12, 45, 26],
        'Treatment 4':[14, 43, 32, 50, 37, 22, 13, 21, 31, 40]}
df = pd.DataFrame(data)


## Step 3: Transforming Data for ANOVA

The Statsmodels formula interface expects **long format data** (each row holds one measurement and the treatment it belongs to).

```python
data_melt = df.melt(var_name='Treatment', value_name='Value')
```

### **Breaking Down the Code:**
1. **`df.melt(var_name='Treatment', value_name='Value')`**:
   - Reshapes the dataset from **wide format** (columns = treatments) to **long format**.
   - **`var_name='Treatment'`** → Creates a new column called `"Treatment"` listing group names.
   - **`value_name='Value'`** → Creates a `"Value"` column containing the numerical data.

### **Example Before and After:**

#### **Before (Wide Format)**
|   | Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4 |
|---|------------|------------|------------|------------|
| 0 | 44         | 28         | 49         | 14         |
| 1 | 30         | 38         | 36         | 43         |

#### **After (Long Format)**
| Treatment   | Value |
|------------|-------|
| Treatment 1 | 44    |
| Treatment 1 | 30    |
| Treatment 2 | 28    |
| Treatment 2 | 38    |

### **Why is this Important?**
- ANOVA in **Statsmodels** requires a **categorical column** (Treatment) and a **numeric column** (Value).
- This transformation allows **ANOVA to analyze data properly**.
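
The reshaping can be checked on a small frame. This sketch uses only the two rows shown in the wide-format table; the names `wide` and `long_df` are illustrative and not part of the notebook.

```python
import pandas as pd

# First two observations of each treatment, as in the wide-format table
wide = pd.DataFrame({
    'Treatment 1': [44, 30],
    'Treatment 2': [28, 38],
    'Treatment 3': [49, 36],
    'Treatment 4': [14, 43],
})

# melt stacks the columns one after another: all of Treatment 1, then Treatment 2, ...
long_df = wide.melt(var_name='Treatment', value_name='Value')
print(long_df)

# Every wide cell becomes exactly one long row
print(long_df.shape)
```

With the full dataset the same call produces 40 rows (4 treatments × 10 observations).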


In [None]:
data_melt = df.melt(var_name = 'Treatment', value_name = 'Value')
data_melt


## Step 4: Performing One-Way ANOVA

```python
model = ols('Value ~ C(Treatment)', data=data_melt).fit()
anova_table = sm.stats.anova_lm(model)
```

### **Breaking Down the Code:**
1. **`ols('Value ~ C(Treatment)', data=data_melt).fit()`**:
   - **Fits an Ordinary Least Squares (OLS) model** to compare treatment group means.
   - `Value ~ C(Treatment)`: Formula specifying **dependent and independent variables**.
     - `Value` = Dependent Variable (**numeric response variable**).
     - `C(Treatment)` = Categorical Independent Variable (**treatment groups**).

2. **`sm.stats.anova_lm(model)`**:
   - Performs **ANOVA** using the OLS model.
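   - Outputs the **degrees of freedom, Sum of Squares (SS), mean squares, F-statistic, and p-value**.

### **Interpreting the Results:**
- **Sum of Squares (SS)** → Splits the variability into a **between-group** part (the `C(Treatment)` row) and a **within-group** part (the `Residual` row).
- **F-statistic** → Ratio of the between-group mean square to the within-group mean square.
- **p-value** → Probability of an F-statistic at least this large if all group means were equal.
  - If **p < 0.05**, we reject the hypothesis of equal means: at least one group mean differs **significantly**.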

### **Why is this Important?**
- This test tells us **whether at least one treatment group differs**.
- However, it does **not specify which groups are different** (for that, we use **Tukey’s HSD**).
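
To make the F-statistic less of a black box, it can be rebuilt from the sums of squares. The following is a sketch under the assumption that every group is passed as a NumPy array; the variable names are illustrative, and `scipy.stats.f_oneway` is used only as a cross-check.

```python
import numpy as np
from scipy.stats import f, f_oneway

groups = [
    np.array([44, 30, 27, 48, 42, 19, 15, 34, 20, 47]),
    np.array([28, 38, 35, 41, 25, 16, 46, 19, 17, 33]),
    np.array([49, 36, 23, 29, 11, 39, 18, 12, 45, 26]),
    np.array([14, 43, 32, 50, 37, 22, 13, 21, 31, 40]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)        # number of groups
n = all_values.size    # total number of observations

# Between-group SS: squared distance of each group mean from the grand mean,
# weighted by group size
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group SS: squared distance of each value from its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_manual = ms_between / ms_within
p_manual = f.sf(f_manual, k - 1, n - k)  # upper tail of the F distribution

f_scipy, p_scipy = f_oneway(*groups)
print(f"manual: F={f_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  F={f_scipy:.4f}, p={p_scipy:.4f}")
```

Both lines should print the same numbers, which confirms that the ANOVA table is this ratio of mean squares.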


In [None]:
model = ols('Value ~ C(Treatment)', data=data_melt).fit()
print(sm.stats.anova_lm(model))
r_squared = model.rsquared
print('R-Squared (%):', round(r_squared*100, 2))


## Step 5: Checking Assumptions of ANOVA

Before trusting the ANOVA result, we must check **two key assumptions**:

1. **Normality Assumption** (Shapiro-Wilk Test)  
   - Tests whether the data in each group is consistent with a **normal distribution**.
   - Null Hypothesis (H₀): Data **follows** a normal distribution.
   - If **p < 0.05**, reject H₀ → Data **does not** follow a normal distribution.

2. **Equal Variance (Homoscedasticity) Assumption**  
   - **Levene’s Test**: Checks if variance is equal across groups.
   - **Bartlett’s Test**: Tests the same hypothesis but is **more sensitive** to departures from normality, so Levene’s test is the safer choice when normality is in doubt.
   - Null Hypothesis (H₀): Groups **have equal variance**.
   - If **p < 0.05**, reject H₀ → Variances **are not equal**, violating ANOVA assumptions.
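
Strictly speaking, the normality assumption concerns the model residuals, that is, each value minus its own group mean. A minimal sketch of that variant follows, assuming the wide `df` from Step 2; `residuals` is a name introduced here for illustration.

```python
import pandas as pd
from scipy.stats import shapiro

df = pd.DataFrame({
    'Treatment 1': [44, 30, 27, 48, 42, 19, 15, 34, 20, 47],
    'Treatment 2': [28, 38, 35, 41, 25, 16, 46, 19, 17, 33],
    'Treatment 3': [49, 36, 23, 29, 11, 39, 18, 12, 45, 26],
    'Treatment 4': [14, 43, 32, 50, 37, 22, 13, 21, 31, 40],
})

# Subtract each column's mean from its values, then pool all 40 residuals
residuals = (df - df.mean()).to_numpy().ravel()

stat, p = shapiro(residuals)
print(f"Shapiro-Wilk on residuals: Statistic={stat:.4f}, p-value={p:.4f}")
```

Pooling the residuals gives one test on 40 values instead of four tests on 10 values each, which avoids running several low-powered tests side by side.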


In [None]:

from scipy.stats import shapiro, levene, bartlett
import matplotlib.pyplot as plt
import seaborn as sns

# Normality Check using Shapiro-Wilk Test
shapiro_results = {treatment: shapiro(df[treatment]) for treatment in df.columns}

print("Shapiro-Wilk Normality Test Results:")
for treatment, result in shapiro_results.items():
    print(f"{treatment}: Statistic={result.statistic:.4f}, p-value={result.pvalue:.4f}")

# Homogeneity of Variance Tests
levene_stat, levene_p = levene(*[df[col] for col in df.columns])
bartlett_stat, bartlett_p = bartlett(*[df[col] for col in df.columns])

print("\nLevene’s Test (Equal Variance Assumption):")
print(f"Statistic={levene_stat:.4f}, p-value={levene_p:.4f}")

print("\nBartlett’s Test (Equal Variance Assumption):")
print(f"Statistic={bartlett_stat:.4f}, p-value={bartlett_p:.4f}")

# Visualizing Distributions with Histograms
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle("Distribution of Treatment Groups")

for i, col in enumerate(df.columns):
    sns.histplot(df[col], bins=10, kde=True, ax=axes[i//2, i%2])
    axes[i//2, i%2].set_title(f"{col} Distribution")

plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()



## Step 6: Effect Size Calculation (Partial Eta Squared)

Even if ANOVA shows a significant difference, we need to **quantify** how strong the effect is.

### **Partial Eta Squared (η²) Formula:**
\[ \eta^2 = \frac{SS_{Between}}{SS_{Total}} \]

Where:
- **SS_{Between}**: Sum of Squares between groups (treatment effect).
- **SS_{Total}**: Total Sum of Squares (overall variance in data).

This ratio is eta squared. In a one-way ANOVA, where \(SS_{Total} = SS_{Between} + SS_{Within}\), it coincides with partial eta squared.

### **Interpretation of η² Values (Cohen's benchmarks):**
- **η² ≈ 0.01** → Small effect  
- **η² ≈ 0.06** → Medium effect  
- **η² ≈ 0.14** → Large effect  

A larger η² means that more of the variance is **explained** by the treatment.
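
The formula translates directly into a few lines of pandas. This is a sketch that assumes the wide `df` from Step 2 with equal group sizes; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Treatment 1': [44, 30, 27, 48, 42, 19, 15, 34, 20, 47],
    'Treatment 2': [28, 38, 35, 41, 25, 16, 46, 19, 17, 33],
    'Treatment 3': [49, 36, 23, 29, 11, 39, 18, 12, 45, 26],
    'Treatment 4': [14, 43, 32, 50, 37, 22, 13, 21, 31, 40],
})

grand_mean = df.to_numpy().mean()

# SS_Between: group size times squared deviation of each group mean
ss_between = (len(df) * (df.mean() - grand_mean) ** 2).sum()
# SS_Total: squared deviation of every observation from the grand mean
ss_total = ((df - grand_mean) ** 2).to_numpy().sum()

eta_squared = ss_between / ss_total
print(f"Eta squared: {eta_squared:.4f}")
```

For a one-way design this value equals the R-squared of the fitted OLS model, so the two can be used to check each other.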


In [None]:
from scipy.stats import f_oneway

Treatment_1 = [44, 30, 27, 48, 42, 19, 15, 34, 20, 47]
Treatment_2 = [28, 38, 35, 41, 25, 16, 46, 19, 17, 33]
Treatment_3 = [49, 36, 23, 29, 11, 39, 18, 12, 45, 26]
Treatment_4 = [14, 43, 32, 50, 37, 22, 13, 21, 31, 40]

f_stats, p_value = f_oneway(Treatment_1, Treatment_2, Treatment_3, Treatment_4)

print(f_stats, p_value)


## Step 7: Post-hoc Analysis (Tukey’s HSD Test)

ANOVA tells us **if there is a difference**, but **not where**.  
We use **Tukey’s Honest Significant Difference (HSD) Test** to compare groups **pairwise**.

### **How Tukey’s HSD Works:**
- Compares **every group against every other group**.
- Adjusts for **multiple comparisons** to **reduce false positives**.
- Outputs **which groups differ significantly**.

### **Interpreting Tukey’s Results:**
- **p < 0.05** → Groups **differ significantly**.
- **Confidence Intervals** that **do not contain zero** indicate a significant difference between the two groups.
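
A minimal sketch of the test, assuming the long-format frame built with `melt` in Step 3 and the `pairwise_tukeyhsd` function imported in Step 1; the name `tukey` is illustrative.

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.DataFrame({
    'Treatment 1': [44, 30, 27, 48, 42, 19, 15, 34, 20, 47],
    'Treatment 2': [28, 38, 35, 41, 25, 16, 46, 19, 17, 33],
    'Treatment 3': [49, 36, 23, 29, 11, 39, 18, 12, 45, 26],
    'Treatment 4': [14, 43, 32, 50, 37, 22, 13, 21, 31, 40],
})
data_melt = df.melt(var_name='Treatment', value_name='Value')

# Compare all pairs of treatments at a 5% family-wise error rate
tukey = pairwise_tukeyhsd(endog=data_melt['Value'],
                          groups=data_melt['Treatment'],
                          alpha=0.05)

# One row per pair: mean difference, adjusted p-value, confidence interval, reject flag
print(tukey.summary())
```

With four groups there are six pairwise comparisons, and the `reject` column of the summary flags the pairs whose difference is significant.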



## Step 8: Visualizing ANOVA Results with Boxplots

A **boxplot** helps visualize treatment group distributions.  
If ANOVA finds significant differences, we expect at least one group's box to sit **visibly apart** from the others.

### **Breaking Down the Code:**

```python
plt.figure(figsize=(10,6))
sns.boxplot(x='Treatment', y='Value', data=data_melt, palette="Set2")
plt.title("Treatment-wise Distribution with ANOVA", fontsize=14)
plt.xlabel("Treatment Groups", fontsize=12)
plt.ylabel("Values", fontsize=12)
plt.show()
```

- **`plt.figure(figsize=(10,6))`** → Creates a **10x6-inch** figure.  
- **`sns.boxplot(x='Treatment', y='Value', data=data_melt, palette="Set2")`**  
  - Plots a **boxplot** comparing treatment groups.  
  - **`x='Treatment'`** → Groups on the x-axis.  
  - **`y='Value'`** → Numeric values on the y-axis.  
  - **`palette="Set2"`** → Uses a predefined color scheme.  
- **Adding Title & Labels** → Ensures readability.  
- **`plt.show()`** → Displays the plot.

### **How to Interpret the Boxplot?**
- **Non-overlapping boxes** → Hint at a difference between groups, although the plot is not a formal test.  
- **Overlapping boxes** → Suggest groups may **not** differ significantly.  


In [None]:

# Step 8: Boxplot Visualization

# Creating a boxplot to compare treatment distributions
plt.figure(figsize=(10,6))
sns.boxplot(x='Treatment', y='Value', data=data_melt, palette="Set2")

# Title and labels
plt.title("Treatment-wise Distribution with ANOVA", fontsize=14)
plt.xlabel("Treatment Groups", fontsize=12)
plt.ylabel("Values", fontsize=12)

# Display the plot
plt.show()
