# Independent (Two-Sample) T-Test

The independent (two-sample) T-test is a statistical method for comparing the means of two independent groups to determine whether the difference between them is statistically significant. The standard (Student's) version assumes that observations are independent, that each group is approximately normally distributed, and that the two groups have similar variances.
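The normality and equal-variance conditions can be checked before running the test. The following is a minimal sketch that assumes synthetic, normally distributed data. The group names, means, and spreads are hypothetical. It uses `scipy.stats.shapiro` (null hypothesis: the sample comes from a normal distribution) and `scipy.stats.levene` (null hypothesis: the groups have equal variances).

```python
import numpy as np
from scipy import stats

# Hypothetical synthetic data standing in for two independent groups
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=3.4, scale=0.38, size=50)
group_b = rng.normal(loc=2.8, scale=0.31, size=50)

# Shapiro-Wilk test of normality, run separately on each group
shapiro_a = stats.shapiro(group_a)
shapiro_b = stats.shapiro(group_b)

# Levene test of equal variances across the two groups
levene = stats.levene(group_a, group_b)

print(f"Shapiro-Wilk p (group A): {shapiro_a.pvalue:.3f}")
print(f"Shapiro-Wilk p (group B): {shapiro_b.pvalue:.3f}")
print(f"Levene p: {levene.pvalue:.3f}")
```

Large p-values here are consistent with the assumptions but do not prove them. With small samples, these checks have little power to detect violations.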

#### Statistical Formula

The formula for the independent two-sample T-test, assuming equal variances and equal sample sizes, is:

$$
t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \cdot \sqrt{\frac{2}{n}}}
$$

where:
- $\bar{X}_1$ and $\bar{X}_2$ are the sample means of the two groups,
- $s_p$ is the pooled standard deviation of the two samples,
- $n$ is the sample size of each group (equal sample sizes are assumed for simplicity).

With equal sample sizes, the pooled standard deviation is calculated as:

$$
s_p = \sqrt{\frac{s_1^2 + s_2^2}{2}}
$$

where $s_1^2$ and $s_2^2$ are the sample variances of the two groups (computed with $n - 1$ in the denominator).
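The two formulas can be checked by computing the statistic by hand and comparing the result with `scipy.stats.ttest_ind`. The sketch below uses a small, hypothetical set of measurements. The helper `pooled_t_equal_n` is an illustrative name and is not part of any library.

```python
import math

import numpy as np
from scipy import stats


def pooled_t_equal_n(x1, x2):
    """Hypothetical helper: two-sample t statistic for equal group sizes."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n = len(x1)
    if len(x2) != n:
        raise ValueError("this simplified formula assumes equal sample sizes")
    # Sample variances with n - 1 in the denominator (ddof=1)
    s1_sq = x1.var(ddof=1)
    s2_sq = x2.var(ddof=1)
    # Pooled standard deviation for equal n: sqrt((s1^2 + s2^2) / 2)
    s_p = math.sqrt((s1_sq + s2_sq) / 2)
    # t = (mean1 - mean2) / (s_p * sqrt(2 / n))
    return (x1.mean() - x2.mean()) / (s_p * math.sqrt(2 / n))


# Hypothetical toy measurements, five per group
a = [3.5, 3.0, 3.2, 3.1, 3.6]
b = [3.2, 3.2, 3.1, 2.3, 2.8]

manual_t = pooled_t_equal_n(a, b)
scipy_t, _ = stats.ttest_ind(a, b, equal_var=True)
print(f"manual: {manual_t:.4f}, scipy: {scipy_t:.4f}")
```

With equal group sizes, SciPy's general pooled variance $\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ reduces to $\frac{s_1^2 + s_2^2}{2}$. For that reason, the two results should agree up to floating-point rounding.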

#### Business Scenario: Comparing Sepal Widths of Iris Setosa and Iris Versicolor

In this scenario, a botanist wants to determine if there is a significant difference in the sepal widths between two species of the Iris flower: Iris Setosa and Iris Versicolor. The botanist uses the Iris dataset, which includes measurements of sepal widths among other features, for this analysis.

#### Python Code for Independent Two-Sample T-Test

```python
import pandas as pd
from scipy import stats

# Load the Iris dataset
url = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
iris = pd.read_csv(url)

# Filter the data for Iris Setosa and Iris Versicolor
setosa = iris[iris['species'] == 'setosa']['sepal_width']
versicolor = iris[iris['species'] == 'versicolor']['sepal_width']

# Perform the independent two-sample T-test (pooled variance)
t_stat, p_value = stats.ttest_ind(setosa, versicolor, equal_var=True)

print(f"T-statistic: {round(t_stat, 4)}")
print(f"P-value: {p_value:.2e}")
```

```text
T-statistic: 9.2828
P-value: 4.36e-15
```
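The Iris analysis passes `equal_var=True`, which runs Student's pooled-variance test. The sketch below compares that test with Welch's variant (`equal_var=False`), which drops the equal-variance assumption. It uses hypothetical synthetic data and assumes equal group sizes of 50, matching the size of each Iris species subset. When $n_1 = n_2$, the two t statistics are algebraically identical, because $s_p \sqrt{2/n} = \sqrt{(s_1^2 + s_2^2)/n}$. Only the degrees of freedom, and therefore the p-value, differ.

```python
import numpy as np
from scipy import stats

# Hypothetical synthetic groups of equal size (50 each)
rng = np.random.default_rng(seed=1)
x = rng.normal(loc=3.4, scale=0.38, size=50)
y = rng.normal(loc=2.8, scale=0.31, size=50)

student = stats.ttest_ind(x, y, equal_var=True)  # pooled variance, df = n1 + n2 - 2
welch = stats.ttest_ind(x, y, equal_var=False)   # Welch: no equal-variance assumption

print(f"Student t = {student.statistic:.4f}, p = {student.pvalue:.3e}")
print(f"Welch   t = {welch.statistic:.4f}, p = {welch.pvalue:.3e}")
```

Welch's degrees of freedom never exceed $n_1 + n_2 - 2$. As a result, its p-value is equal to or slightly larger than Student's, which makes it the more conservative choice when the variances may differ.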


#### Interpretation

The T-statistic of 9.2828 means that the mean sepal width of Iris Setosa exceeds that of Iris Versicolor by about 9.28 estimated standard errors of the difference. The positive sign follows from passing `setosa` as the first sample. A value this far from zero is strong evidence that the two means differ.
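Rearranging the equal-sample-size formula, $t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{2/n}}$, gives the difference in means expressed in pooled standard deviations: $\frac{\bar{X}_1 - \bar{X}_2}{s_p} = t\sqrt{2/n}$. This quantity is Cohen's d computed with the pooled standard deviation. The sketch below takes the T-statistic from the output above and assumes $n = 50$ flowers per species, as in the standard Iris data.

```python
import math

t_stat = 9.2828  # T-statistic reported above
n = 50           # assumed sample size per species (standard Iris data)

# (mean1 - mean2) / s_p = t * sqrt(2 / n)
standardized_diff = t_stat * math.sqrt(2 / n)
print(f"Standardized mean difference: {standardized_diff:.3f}")  # prints 1.857
```

Unlike the T-statistic, this value does not grow with sample size. It therefore describes how large the difference is rather than how confidently it is detected.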

The P-value of 4.36e-15 is far below the conventional significance level (α = 0.05). It therefore provides very strong evidence against the null hypothesis that the two species have the same mean sepal width, and we reject that hypothesis.
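The P-value follows from the T-statistic and the degrees of freedom of the pooled-variance test, $n_1 + n_2 - 2$. The sketch below assumes 50 flowers per species and a two-sided alternative, which is the default for `ttest_ind`. It uses the survival function of `scipy.stats.t`.

```python
from scipy import stats

t_stat = 9.2828        # T-statistic reported above
n1 = n2 = 50           # assumed group sizes (50 per species in the Iris data)
df = n1 + n2 - 2       # degrees of freedom for the pooled-variance test

# Two-sided p-value: probability under H0 that |T| >= |t_stat|
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)
print(f"df = {df}, two-sided p = {p_two_sided:.2e}")
```

Because the T-statistic is rounded to four decimals, the result may differ from the reported 4.36e-15 in the last digit.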

**Conclusion**: There is a statistically significant difference in the sepal widths between Iris Setosa and Iris Versicolor. This result suggests that sepal width can be one of the distinguishing features between these two Iris species.

Given the extremely low P-value, a difference in means at least as large as the one observed would be very unlikely if the two species truly had the same mean sepal width. Strictly, the P-value is this probability computed under the null hypothesis, not the probability that the difference arose by chance. The result supports the conclusion that Iris Setosa and Iris Versicolor differ in sepal width, at least for the dataset analyzed. This information could be valuable for botanists or researchers interested in morphological differences among Iris species, contributing to classification, identification, and the understanding of species variation.
