# PlantGrowth

## 1. Introduction

In this project, we analyse the **PlantGrowth** dataset. It contains plant weights for three groups:

- **ctrl** (control)  
- **trt1** (treatment 1)  
- **trt2** (treatment 2)

We want to:
1. Describe the dataset
2. Explain t-tests and perform a t-test between `trt1` and `trt2`
3. Explain and perform ANOVA across `ctrl`, `trt1`, and `trt2`
4. Justify why ANOVA is more appropriate than multiple t-tests for three or more groups

We will document each step in Markdown cells and keep the code organised.
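
As a rough preview of goal 4, the sketch below shows how the chance of at least one false positive grows when three groups are compared with separate pairwise t-tests. The significance level of 0.05 and the treatment of the tests as independent are assumptions made for illustration. Pairwise tests that share a group are not truly independent, so the figure is an approximation.

```python
from math import comb

alpha = 0.05                  # per-test significance level (assumed, conventional choice)
n_groups = 3                  # ctrl, trt1, trt2
n_tests = comb(n_groups, 2)   # number of distinct pairwise comparisons

# Chance of at least one false positive if every null hypothesis is true
# and the tests are treated as independent (a simplifying assumption).
familywise_error = 1 - (1 - alpha) ** n_tests

print(f"pairwise tests: {n_tests}")
print(f"family-wise error rate: {familywise_error:.3f}")
```

Under these assumptions, three tests at 0.05 each give a combined error rate of about 0.143, which is the inflation a single ANOVA avoids.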

---


In [51]:
# 2. Data Import & Setup

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_ind, f_oneway, shapiro, levene
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# URL of the CSV file
url = "https://raw.githubusercontent.com/nf-me/8651-applied-statistics/refs/heads/main/data/PlantGrowth.csv"
df = pd.read_csv(url)

# Group is categorical
df['group'] = df['group'].astype('category')

# Preview the first rows
display(df.head())

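# Basic descriptive stats
# observed=True restricts the result to groups present in the data and
# avoids the pandas FutureWarning raised when grouping on a categorical column
print("\nBasic Summary by Group:")
print(df.groupby('group', observed=True)['weight'].describe().round(3))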


   rownames  weight group
0         1    4.17  ctrl
1         2    5.58  ctrl
2         3    5.18  ctrl
3         4    6.11  ctrl
4         5    4.50  ctrl



Basic Summary by Group:
       count   mean    std   min    25%    50%    75%   max
group                                                      
ctrl    10.0  5.032  0.583  4.17  4.550  5.155  5.292  6.11
trt1    10.0  4.661  0.794  3.59  4.208  4.550  4.870  6.03
trt2    10.0  5.526  0.443  4.92  5.268  5.435  5.735  6.31


## Description of Dataset

Each row records the weight of a single plant and the treatment group it belongs to.  
- **Observations**: 30 in total (10 plants per group, as the summary above shows).  
- **Variables**:  
  - `rownames`: Integer row identifier carried over from the CSV file.  
  - `group`: Categorical (ctrl, trt1, trt2)  
  - `weight`: Numeric, representing plant weight.
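
The structure described above can be checked programmatically rather than read off the summary table. The sketch below uses a hypothetical helper, `summarise_structure` (not part of the notebook), and demonstrates it on a toy frame built from the five rows shown in the preview, so it runs without downloading the file.

```python
import pandas as pd


def summarise_structure(df: pd.DataFrame, group_col: str = "group") -> dict:
    """Return the row count, per-group counts, and whether group sizes are equal.

    Hypothetical helper for illustration only.
    """
    counts = df[group_col].value_counts().sort_index()
    return {
        "n_rows": len(df),
        "group_counts": counts.to_dict(),
        "balanced": bool(counts.nunique() == 1),
    }


# Toy frame: the first five rows from the preview output (all in ctrl).
demo = pd.DataFrame(
    {"weight": [4.17, 5.58, 5.18, 6.11, 4.50], "group": ["ctrl"] * 5}
)
result = summarise_structure(demo)
print(result)
```

Calling `summarise_structure(df)` on the full frame loaded earlier should report 30 rows and 10 plants in each of the three groups, in line with the `count` column of the summary.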

---
