# T-Test
A t-test is a statistical test used to determine whether the difference between the means (averages) of two groups is statistically significant. The two groups may be independent or related to each other.

### When Is a T-Test Useful?
Consider the following scenario:

100 students in class A scored an average of 85% with a standard deviation of 3%.  
100 students in class B scored an average of 87% with a standard deviation of 4%.  

However, we CANNOT immediately conclude that class B is smarter than class A. 
- This is because the scores in each class vary around their mean, so a 2-point gap between the averages could arise from chance alone
    - The standard deviation of class B (4%) is also higher than that of class A (3%), which indicates that class B's scores are more spread out around their mean
    
We can use a t-test to determine if it's statistically reasonable to conclude that class B is smarter than class A based on the means, standard deviations, and sample sizes. 
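Because the scenario gives only summary statistics, one way to test it directly is `scipy.stats.ttest_ind_from_stats`, which takes means, standard deviations, and sample sizes instead of raw scores. The following is a minimal sketch under the assumption that the two classes are independent and have equal population variances (the function's default):

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics taken from the scenario above
result = ttest_ind_from_stats(
    mean1=85.0, std1=3.0, nobs1=100,  # class A
    mean2=87.0, std2=4.0, nobs2=100,  # class B
)

t_stat, p_value = result.statistic, result.pvalue
print(f"t = {t_stat:.3f}, p = {p_value:.2e}")
```

With 100 students per class, the standard error of the difference is 0.5, so the 2-point gap gives a t statistic of -4. The result depends heavily on sample size, which is worth keeping in mind when the samples are small.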

In [73]:
import numpy as np

# create a normally distributed random array of 10 values with a center (mean) of 85 and a standard dev of 3
# (a small sample of 10 scores, rather than all 100 students in the scenario)
class_a_scores = np.random.normal(loc=85, scale=3, size=10)

# create a normally distributed random array of 10 values with a center (mean) of 87 and a standard dev of 4
class_b_scores = np.random.normal(loc=87, scale=4, size=10)

In [74]:
class_a_scores

array([79.73354979, 84.69203149, 88.57870978, 85.5571137 , 87.73715506,
       79.3719265 , 81.7782444 , 85.49569007, 85.45887397, 83.98827031])

In [75]:
class_b_scores

array([92.75103818, 90.53133718, 87.62736307, 84.00571902, 91.38628752,
       82.42913812, 81.19672127, 86.68159926, 86.46684252, 87.23551415])

In [76]:
# import the t-test function from scipy
from scipy.stats import ttest_ind

# perform a t-test for the class scores
class_scores_test = ttest_ind(class_a_scores, class_b_scores)

class_scores_test

Ttest_indResult(statistic=-1.8050225224867893, pvalue=0.08782737217775613)

# Conclusion Based on T-Test
We reject the null hypothesis (that the two classes have the same mean score) if the p-value is LESS than 0.05.
- If we reject the null hypothesis, then we can accept that the difference between the class means is statistically significant

However, from the results, the p-value was 0.0878, which is greater than 0.05. Therefore, we fail to reject the null hypothesis and CANNOT conclude that class B is smarter than class A.
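To make the decision rule concrete, here is a rough sketch that recomputes the t statistic and p-value by hand from the scores printed above, using the pooled-variance formula that `ttest_ind` applies by default. The variable names `pooled_var`, `std_err`, `dof`, `alpha`, and `reject_null` are introduced here for illustration only:

```python
import numpy as np
from scipy import stats

# Scores copied from the output cells above
class_a_scores = np.array([79.73354979, 84.69203149, 88.57870978, 85.5571137, 87.73715506,
                           79.3719265, 81.7782444, 85.49569007, 85.45887397, 83.98827031])
class_b_scores = np.array([92.75103818, 90.53133718, 87.62736307, 84.00571902, 91.38628752,
                           82.42913812, 81.19672127, 86.68159926, 86.46684252, 87.23551415])

n_a, n_b = len(class_a_scores), len(class_b_scores)
dof = n_a + n_b - 2  # degrees of freedom for the pooled test

# Pooled variance: weighted average of the two sample variances (ddof=1)
pooled_var = (
    (n_a - 1) * class_a_scores.var(ddof=1) + (n_b - 1) * class_b_scores.var(ddof=1)
) / dof

# Standard error of the difference between the two sample means
std_err = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (class_a_scores.mean() - class_b_scores.mean()) / std_err

# Two-sided p-value from the t distribution
p_value = 2 * stats.t.sf(abs(t_stat), dof)

alpha = 0.05  # significance level used in the text
reject_null = bool(p_value < alpha)
print(t_stat, p_value, reject_null)
```

The hand-computed values should agree with the `Ttest_indResult` shown earlier, and `reject_null` comes out `False` because the p-value is above 0.05.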

In [90]:
from scipy.stats import ttest_1samp
ttest_1samp(class_a_scores, popmean=0)

Ttest_1sampResult(statistic=86.0829710084803, pvalue=1.9519068572843585e-14)
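The call above compares the mean of class A against 0, a hypothesis that test scores in the 80s will always reject, which is why the p-value is so small. As an illustration, the sketch below tests the same scores against a more informative hypothesized mean. The value 85 is an assumption, chosen because it is the class A mean from the scenario:

```python
import numpy as np
from scipy.stats import ttest_1samp

# Class A scores copied from the output cell above
class_a_scores = np.array([79.73354979, 84.69203149, 88.57870978, 85.5571137, 87.73715506,
                           79.3719265, 81.7782444, 85.49569007, 85.45887397, 83.98827031])

# Null hypothesis: the population mean of class A is 85
result = ttest_1samp(class_a_scores, popmean=85)
print(result.statistic, result.pvalue)
```

Here the p-value is well above 0.05, so the sample gives no evidence that the class A mean differs from 85.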

# T-Test For Individual Variables
We can also perform a t-test on individual variable(s). For example, in a regression model, we can take as the null hypothesis that an independent variable's coefficient is equal to zero.

Using ```scipy.stats```, import the ```ttest_1samp``` function to test a sample against the population mean assumed by the null hypothesis. For instance, for regression, if the population mean is assumed to be 0:
```python
ttest_1samp(regressor_coeffs_array, popmean=0)
```
Given a 2D array with one column per independent variable, calling the above function returns a result whose ```pvalue``` attribute is a 1D array containing one p-value per column.

Therefore, for any p-value less than 0.05, we reject the null hypothesis and keep the independent variable when building the regression model. If the p-value is greater than 0.05, we fail to reject the null hypothesis: there is no evidence that the coefficient differs from 0, so the independent variable is a candidate for removal from the regression model.
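The selection rule can be sketched as follows. The data here is hypothetical: `regressor_coeffs_array` is filled with simulated coefficient estimates (30 rows, one column per independent variable), standing in for whatever estimates are actually available, and `keep_columns` is a name introduced for illustration:

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)

# Hypothetical data: 30 coefficient estimates (rows) for 3 independent variables (columns).
# Columns 0 and 2 are centered away from zero; column 1 is centered on zero.
regressor_coeffs_array = rng.normal(loc=[2.0, 0.0, -1.5], scale=0.5, size=(30, 3))

# The test runs along axis 0 by default, so each column is tested separately
result = ttest_1samp(regressor_coeffs_array, popmean=0)

alpha = 0.05
keep_columns = np.flatnonzero(result.pvalue < alpha)  # indices of variables to keep

print(result.pvalue)
print(keep_columns)
```

Columns whose estimates are centered far from zero produce very small p-values and are kept; a column centered on zero will usually, though not always, fall above the 0.05 threshold.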