

# Introduction
<hr style="border:2px solid black"> </hr>


**What?** T-test



# What is it?
<hr style="border:2px solid black"> </hr>


- The t-test is a statistical test used to determine whether there is a significant difference between the means of two independent samples of data.
- In this tutorial, we illustrate the most basic version of the t-test, which **assumes** that the two samples have equal variances.
- A more advanced variant is **Welch's t-test**, an adaptation of the t-test that is more reliable when the two samples have unequal variances and possibly unequal sample sizes.
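
To make the equal-variance version concrete, here is a minimal sketch that computes the pooled-variance t statistic by hand and compares it with `stats.ttest_ind`. The arrays `x` and `y` are synthetic data invented for this illustration; they are not the iris samples used later in the tutorial.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only (not the iris samples used in this tutorial)
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=1.0, size=40)
y = rng.normal(loc=5.5, scale=1.0, size=40)

n1, n2 = len(x), len(y)
# Unbiased sample variances (ddof=1)
v1, v2 = np.var(x, ddof=1), np.var(y, ddof=1)

# Pooled variance and the equal-variance (Student) t statistic
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_manual = (np.mean(x) - np.mean(y)) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

t_scipy, p_scipy = stats.ttest_ind(x, y, equal_var=True)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

The two printed pairs should agree up to floating-point rounding, which shows that `equal_var=True` corresponds to the pooled-variance formula.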



# Imports
<hr style="border:2px solid black"> </hr>

In [1]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Import dataset
<hr style="border:2px solid black"> </hr>

In [2]:
from sklearn import datasets
iris = datasets.load_iris()
sep_length = iris.data[:, 0]
a_1, a_2 = train_test_split(sep_length, test_size=0.4, random_state=0)
b_1, b_2 = train_test_split(sep_length, test_size=0.4, random_state=1)
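
Note that `a_1` and `b_1` are both 60% subsets of the same 150 measurements, so they necessarily share some rows. The sketch below counts the shared rows by splitting row indices instead of values; the names `idx`, `idx_a`, `idx_b` and `shared` are introduced here for illustration only. Because the t-test assumes independent samples, this overlap is worth keeping in mind when reading the results.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Split row indices instead of values so that shared rows can be counted
idx = np.arange(150)  # the iris dataset has 150 rows
idx_a, _ = train_test_split(idx, test_size=0.4, random_state=0)
idx_b, _ = train_test_split(idx, test_size=0.4, random_state=1)

# Rows that appear in both 60% subsets
shared = np.intersect1d(idx_a, idx_b)
print(len(idx_a), len(idx_b), len(shared))
```

Two subsets of 90 rows taken from 150 rows must have at least 30 rows in common, whatever the random seeds are.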

# Run T-test Using Equal Sample Size
<hr style="border:2px solid black"> </hr>


- We observe that setting the `equal_var` parameter to `True` or `False` barely changes the t-test results: the statistic is identical and the p-values agree to four decimal places.
- We also observe that interchanging the order of the sample arrays `a_1` and `b_1` flips the sign of the t statistic, but does not change its magnitude, as expected.
- Since the calculated p-value (about 0.41) is far larger than the threshold value of 0.05, we cannot reject the null hypothesis that the means of sample 1 and sample 2 are equal; the difference between them is not statistically significant.
- This is consistent with the sepal lengths for sample 1 and sample 2 being drawn from the same population.
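
The decision rule behind the last two observations can be sketched as follows. The p-value is the one reported for Welch's test in this notebook; the variable names `alpha`, `p_value` and `decision` are chosen here for illustration.

```python
alpha = 0.05                  # significance threshold used in this tutorial
p_value = 0.4076270841218671  # Welch p-value reported in this notebook

# A small p-value is evidence against H0 (equal means); a large one is not
if p_value < alpha:
    decision = "reject H0: the means differ significantly"
else:
    decision = "fail to reject H0: no significant difference detected"
print(decision)
```

Failing to reject the null hypothesis does not prove that the means are equal; it only means that the data give no strong evidence of a difference.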


In [3]:
# Calculate the sample means and sample standard deviations

mu1 = np.mean(a_1)
mu2 = np.mean(b_1)

# np.std defaults to ddof=0; the notebook displays only the last expression
np.std(a_1)
np.std(b_1)

0.7912242428472069
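
The value above is a standard deviation computed with NumPy's default `ddof=0`, not a sample variance. A minimal sketch of the difference, using a small made-up array `x` (an assumption for illustration, not data from this notebook):

```python
import numpy as np

# Small made-up sample, for illustration only
x = np.array([4.9, 5.1, 5.8, 6.3, 6.7])

std_population = np.std(x)        # divides by n (ddof=0, the NumPy default)
std_sample = np.std(x, ddof=1)    # divides by n - 1
var_sample = np.var(x, ddof=1)    # sample variance, the square of std_sample

print(std_population, std_sample, var_sample)
```

The t statistic is built from the `n - 1` variance estimates, so `ddof=1` is the setting to use when checking a t-test by hand.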

In [4]:
# Welch's t-test (equal_var=False): does not assume equal variances
stats.ttest_ind(a_1, b_1, equal_var=False)

Ttest_indResult(statistic=0.830066093774641, pvalue=0.4076270841218671)

In [5]:
# Swap the argument order: the sign of the statistic flips
stats.ttest_ind(b_1, a_1, equal_var=False)

Ttest_indResult(statistic=-0.830066093774641, pvalue=0.4076270841218671)

In [6]:
# Student's t-test (equal_var=True): uses the pooled variance
stats.ttest_ind(a_1, b_1, equal_var=True)

Ttest_indResult(statistic=0.830066093774641, pvalue=0.4076132965045395)
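
The statistic is identical for both settings because, when the two samples have the same size, the pooled and the Welch standard errors coincide; only the degrees of freedom differ, which explains the tiny change in the p-value. The sketch below illustrates this on synthetic data (the arrays `x` and `y` are assumptions for illustration) and computes the Welch-Satterthwaite degrees of freedom by hand.

```python
import numpy as np
from scipy import stats

# Synthetic equal-sized samples with different spreads, for illustration only
rng = np.random.default_rng(1)
x = rng.normal(loc=5.8, scale=0.8, size=90)
y = rng.normal(loc=5.8, scale=1.2, size=90)

student = stats.ttest_ind(x, y, equal_var=True)
welch = stats.ttest_ind(x, y, equal_var=False)

# Welch-Satterthwaite degrees of freedom
n1, n2 = len(x), len(y)
s1, s2 = np.var(x, ddof=1) / n1, np.var(y, ddof=1) / n2
df_welch = (s1 + s2) ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
df_student = n1 + n2 - 2

print(student.statistic, welch.statistic)  # equal when n1 == n2
print(df_student, df_welch)                # Welch's value is never larger
```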

# Run T-test Using Unequal Sample Size
<hr style="border:2px solid black"> </hr>

In [7]:
a_1, a_2 = train_test_split(sep_length, test_size=0.4, random_state=0)
b_1, b_2 = train_test_split(sep_length, test_size=0.5, random_state=1)

In [8]:
# Calculate the sample means and sample standard deviations

mu1 = np.mean(a_1)
mu2 = np.mean(b_1)

# np.std defaults to ddof=0; the notebook displays only the last expression
np.std(a_1)
np.std(b_1)

0.8016915450055172

In [9]:
# Welch's t-test on samples of unequal size (90 vs. 75 values)
stats.ttest_ind(a_1, b_1, equal_var=False)

Ttest_indResult(statistic=0.808385246795547, pvalue=0.4200557921940715)
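
With the iris splits the two samples have very similar spreads, so the choice of test matters little. To see when it does matter, here is a hedged sketch on synthetic data with both unequal sizes and unequal variances; the arrays `small_noisy` and `large_tight` are invented for this illustration.

```python
import numpy as np
from scipy import stats

# Synthetic samples with unequal sizes and unequal spreads, for illustration only
rng = np.random.default_rng(2)
small_noisy = rng.normal(loc=6.0, scale=2.0, size=20)
large_tight = rng.normal(loc=5.8, scale=0.5, size=200)

student = stats.ttest_ind(small_noisy, large_tight, equal_var=True)
welch = stats.ttest_ind(small_noisy, large_tight, equal_var=False)

print("Student:", student.statistic, student.pvalue)
print("Welch:  ", welch.statistic, welch.pvalue)
```

Here the pooled variance is dominated by the large, low-variance sample, so the equal-variance test understates the standard error and reports a larger statistic in magnitude than Welch's test.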

# References
<hr style="border:2px solid black"> </hr>



- https://www.kdnuggets.com/2023/01/performing-ttest-python.html

