# PlantGrowth Project

The PlantGrowth dataset consists of 30 observations of plant weight across three groups: one control group and two treatment groups.

Variables:

* rownames: The row number (1 to 30).
* weight: The weight of the plant (in grams).
* group: The treatment group (ctrl, trt1, or trt2).

There are 10 plants in each group: 10 in ctrl (control), 10 in trt1 (treatment 1), and 10 in trt2 (treatment 2).

Now, let's import the dataset and take a look at the data!

In [None]:
import pandas as pd

plant_data = pd.read_csv('data_files/plantgrowth.csv')

plant_data.head()


|   | rownames | weight | group |
|---|----------|--------|-------|
| 0 | 1        | 4.17   | ctrl  |
| 1 | 2        | 5.58   | ctrl  |
| 2 | 3        | 5.18   | ctrl  |
| 3 | 4        | 6.11   | ctrl  |
| 4 | 5        | 4.50   | ctrl  |


Now let's also look at some stats of the data.

In [11]:
plant_data.describe()

|       | rownames | weight   |
|-------|----------|----------|
| count | 30.0     | 30.0     |
| mean  | 15.5     | 5.073    |
| std   | 8.803408 | 0.701192 |
| min   | 1.0      | 3.59     |
| 25%   | 8.25     | 4.55     |
| 50%   | 15.5     | 5.155    |
| 75%   | 22.75    | 5.53     |
| max   | 30.0     | 6.31     |
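The overall summary mixes all three groups together, so a per-group breakdown is often more informative. The following is a minimal sketch, assuming the CSV holds the same values as R's built-in PlantGrowth dataset; the weights are typed in here only so the example runs on its own, and `demo` and `group_summary` are illustrative names.

```python
import pandas as pd

# Weights transcribed from R's built-in PlantGrowth dataset (assumed to
# match data_files/plantgrowth.csv).
weights = {
    "ctrl": [4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14],
    "trt1": [4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69],
    "trt2": [6.31, 5.12, 5.54, 5.50, 5.37, 5.29, 4.92, 6.15, 5.80, 5.26],
}

# Build a long-format frame with one row per plant
demo = pd.DataFrame(
    [(group, weight) for group, values in weights.items() for weight in values],
    columns=["group", "weight"],
)

# Count, mean and sample standard deviation of weight within each group
group_summary = demo.groupby("group")["weight"].agg(["count", "mean", "std"])
print(group_summary.round(3))
```

In the notebook itself, the same summary should be available with `plant_data.groupby('group')['weight'].agg(['count', 'mean', 'std'])`.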


## T-Test

A t-test is a statistical test used to compare the means of two groups and determine whether their difference is statistically significant. It calculates a t-statistic, which is the difference between the group means divided by the standard error of that difference (a measure of the variability in the data). T-tests rely on some assumptions:

* Normality: The data (or differences in paired data) should be approximately normally distributed.
* Independence: Observations in each group must be independent (for independent t-tests).
* Equal variance: For the standard (Student's) independent t-test, which is the default in `scipy.stats.ttest_ind`, the groups should have roughly equal variances.
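Two of these assumptions can be checked from the data; independence depends on how the experiment was run and cannot be tested this way. The sketch below is one possible approach, not a required step: it uses the Shapiro-Wilk test for normality, Levene's test for equal variances, and Welch's t-test as an alternative that does not assume equal variances. The weights are transcribed from R's built-in PlantGrowth dataset and are assumed to match the CSV.

```python
from scipy import stats

# Weights transcribed from R's built-in PlantGrowth dataset; assumed to match
# the trt1 and trt2 rows of data_files/plantgrowth.csv.
trt1 = [4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69]
trt2 = [6.31, 5.12, 5.54, 5.50, 5.37, 5.29, 4.92, 6.15, 5.80, 5.26]

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed.
normality_p = {}
for name, sample in [("trt1", trt1), ("trt2", trt2)]:
    _, normality_p[name] = stats.shapiro(sample)
    print(f"Shapiro-Wilk p-value for {name}: {normality_p[name]:.3f}")

# Levene's test: the null hypothesis is that the groups have equal variances.
_, levene_p = stats.levene(trt1, trt2)
print(f"Levene p-value: {levene_p:.3f}")

# Welch's t-test drops the equal-variance assumption.
t_welch, p_welch = stats.ttest_ind(trt1, trt2, equal_var=False)
print(f"Welch t-statistic: {t_welch:.4f}, p-value: {p_welch:.4f}")
```

A small p-value in the Shapiro-Wilk or Levene test would suggest that the corresponding assumption is questionable. With only 10 plants per group, these checks have limited power, so they are best read alongside a plot of the data.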




Let's perform a t-test on the PlantGrowth data to determine whether there is a significant difference between the two treatment groups trt1 and trt2.

In [12]:
import scipy.stats as stats

# select the weights for trt1 and trt2 (plant_data is already a DataFrame)
trt1_weights = plant_data.loc[plant_data['group'] == 'trt1', 'weight']
trt2_weights = plant_data.loc[plant_data['group'] == 'trt2', 'weight']

# t-test between trt1 and trt2
t_statistic, p_value = stats.ttest_ind(trt1_weights, trt2_weights)

print(f"t-statistic: {t_statistic}")
print(f"p-value: {p_value}")

t-statistic: -3.0100985421243616
p-value: 0.0075184261182198574


The t-statistic: -3.01

This indicates that the mean plant weight in trt1 is about 3.01 standard errors lower than in trt2. The sign is negative because the statistic is computed as the trt1 mean minus the trt2 mean.
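To see where this number comes from, the t-statistic can be recomputed by hand as the difference in means divided by the pooled standard error. This is a sketch of the standard equal-variance (Student's) formula, using weights transcribed from R's built-in PlantGrowth dataset, which are assumed to match the CSV; `pooled_var`, `std_err` and `t_manual` are illustrative names.

```python
import math
from statistics import mean, variance

# Weights transcribed from R's built-in PlantGrowth dataset; assumed to match
# the trt1 and trt2 rows of data_files/plantgrowth.csv.
trt1 = [4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69]
trt2 = [6.31, 5.12, 5.54, 5.50, 5.37, 5.29, 4.92, 6.15, 5.80, 5.26]

n1, n2 = len(trt1), len(trt2)

# Pooled sample variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * variance(trt1) + (n2 - 1) * variance(trt2)) / (n1 + n2 - 2)

# Standard error of the difference between the two means
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t-statistic: difference in means measured in standard errors
t_manual = (mean(trt1) - mean(trt2)) / std_err
print(f"t-statistic: {t_manual:.4f}")  # t-statistic: -3.0101
```

The difference in means is 4.661 - 5.526 = -0.865 and the standard error is roughly 0.287, which gives the same value that `stats.ttest_ind` reports.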

p-value: 0.0075

Since the p-value is less than 0.05, we reject the null hypothesis that the two group means are equal and conclude that there is a statistically significant difference between trt1 and trt2 at the 5% level.

Conclusion:

There is a statistically significant difference between trt1 and trt2, with trt1 having a lower mean plant weight than trt2.

## ANOVA

Let's perform a one-way ANOVA to determine whether there is a significant difference among the three groups (ctrl, trt1, and trt2).

In [18]:
from scipy import stats

# select the weights for each group
ctrl = plant_data[plant_data['group'] == 'ctrl']['weight']
trt1 = plant_data[plant_data['group'] == 'trt1']['weight']
trt2 = plant_data[plant_data['group'] == 'trt2']['weight']

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(ctrl, trt1, trt2)

# Output results
print(f"f-statistic: {f_statistic}")
print(f"p-value: {p_value}")


f-statistic: 4.846087862380136
p-value: 0.0159099583256229


Let's analyse the results.

F-statistic: 4.8461

The F-statistic is the ratio of the variance between the group means (the between-group mean square) to the variance within the groups (the within-group mean square).

A higher F-statistic generally suggests a larger difference between the group means relative to the variability within the groups.
The F-statistic of 4.8461 suggests that there is some evidence that the group means differ, but we need to look at the p-value to decide whether this difference is statistically significant.
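The ratio described above can be reproduced step by step. The following sketch computes the between-group and within-group mean squares directly, using weights transcribed from R's built-in PlantGrowth dataset (assumed to match the CSV); the variable names are illustrative.

```python
import numpy as np

# Weights transcribed from R's built-in PlantGrowth dataset; assumed to match
# data_files/plantgrowth.csv.
groups = [
    np.array([4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14]),  # ctrl
    np.array([4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69]),  # trt1
    np.array([6.31, 5.12, 5.54, 5.50, 5.37, 5.29, 4.92, 6.15, 5.80, 5.26]),  # trt2
]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares divide each sum of squares by its degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within
print(f"F-statistic: {f_manual:.4f}")  # F-statistic: 4.8461
```

Here the between-group sum of squares is about 3.766 on 2 degrees of freedom and the within-group sum of squares is about 10.492 on 27 degrees of freedom, which matches the value returned by `stats.f_oneway`.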

The p-value is approximately 0.016.

Since the p-value is less than 0.05, we reject the null hypothesis that all three group means are equal and conclude that at least one group mean differs from the others. ANOVA on its own does not tell us which groups differ.

ANOVA is more appropriate than using several t-tests when analyzing more than 2 groups.

The reasons are as follows:

Type I error control: ANOVA controls the overall Type I error rate when comparing more than two groups, while multiple t-tests increase the likelihood of Type I errors as more tests are performed.

Efficiency: ANOVA allows for the simultaneous comparison of multiple groups in one test, whereas multiple t-tests become increasingly inefficient as the number of groups increases.

Comprehensive analysis: ANOVA tests for overall differences between all groups at once, while multiple t-tests only compare pairs of groups.

Post-hoc tests: After ANOVA, post-hoc tests can identify which specific groups differ, and these tests are designed to adjust for multiple comparisons, unlike t-tests which don't provide this built-in correction.
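The first and last points can be illustrated with a short sketch. The first part estimates how the chance of at least one false positive grows with the number of pairwise t-tests, under the simplifying assumption that the tests are independent (pairwise tests on shared groups are not strictly independent, so this is only an approximation). The second part runs Tukey's HSD as one possible post-hoc test, assuming SciPy 1.8 or later, where `scipy.stats.tukey_hsd` is available. `familywise_error_rate` is a hypothetical helper written for this example, and the weights are transcribed from R's built-in PlantGrowth dataset, assumed to match the CSV.

```python
from math import comb

from scipy import stats


def familywise_error_rate(n_groups, alpha=0.05):
    """Chance of at least one false positive across all pairwise tests,
    under the simplifying assumption that the tests are independent."""
    n_tests = comb(n_groups, 2)  # number of distinct pairs of groups
    return 1 - (1 - alpha) ** n_tests


for k in (3, 4, 5):
    rate = familywise_error_rate(k)
    print(f"{k} groups: {comb(k, 2)} pairwise tests, familywise error rate ~ {rate:.3f}")

# Weights transcribed from R's built-in PlantGrowth dataset; assumed to match
# data_files/plantgrowth.csv.
ctrl = [4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14]
trt1 = [4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69]
trt2 = [6.31, 5.12, 5.54, 5.50, 5.37, 5.29, 4.92, 6.15, 5.80, 5.26]

# Tukey's HSD compares every pair of groups while adjusting for multiple comparisons.
# In the printed table, group 0 is ctrl, 1 is trt1 and 2 is trt2.
tukey = stats.tukey_hsd(ctrl, trt1, trt2)
print(tukey)
```

With three groups there are three pairwise tests, so the approximate familywise error rate is 1 - 0.95^3, or about 14%, rather than 5%. In the Tukey output, `tukey.pvalue[i, j]` holds the adjusted p-value for the comparison between groups `i` and `j`.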