# T-Test
A t-test is a statistical test used to determine whether there is a significant difference between the means (averages) of two groups, which may be related to each other.

### When Is a T-Test Useful?
Consider the following scenario:

100 students in class A scored an average of 85% with a standard deviation of 3%.  
100 students in class B scored an average of 87% with a standard deviation of 4%.  

However, we CANNOT immediately conclude that class B is smarter than class A.
- This is because, along with the mean (average), the standard deviation of class B is also higher than that of class A
    - This indicates that the scores for class B are more spread out around their mean than the scores for class A

We can use a t-test to determine whether the difference between the two means is statistically significant, given each class's mean, standard deviation, and number of scores.
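
As a rough check on the scenario above, SciPy's `ttest_ind_from_stats` can run a two-sample t-test directly from summary statistics. This is only a sketch: it assumes the two classes are independent, that the quoted standard deviations are sample standard deviations, and that SciPy's default equal-variance setting is acceptable. Because it uses all 100 students per class, its result need not match a test run on a smaller sample of scores.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics taken from the scenario (percent scores)
summary_test = ttest_ind_from_stats(
    mean1=85, std1=3, nobs1=100,  # class A
    mean2=87, std2=4, nobs2=100,  # class B
)

print(summary_test.statistic, summary_test.pvalue)
```

With equal group sizes, the statistic here is the difference in means divided by `sqrt(3**2 / 100 + 4**2 / 100)`, which is `-2 / 0.5 = -4`.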

In [11]:
import numpy as np

In [12]:
# draw 10 random scores from a normal distribution with a mean of 85 and a standard deviation of 3
class_a_scores = np.random.normal(loc=85, scale=3, size=10)

# draw 10 random scores from a normal distribution with a mean of 87 and a standard deviation of 4
class_b_scores = np.random.normal(loc=87, scale=4, size=10)

In [13]:
class_a_scores

array([86.17742361, 86.51055478, 81.62915934, 89.59871709, 85.11774997,
       82.80365601, 84.51441016, 87.09215749, 93.6427178 , 80.49942699])

In [14]:
class_b_scores

array([85.64660125, 88.53644971, 86.80024697, 93.7500767 , 86.15836509,
       87.17237874, 87.23807021, 86.76164762, 91.37636324, 90.78678979])

In [42]:
# import the t-test function from scipy
from scipy.stats import ttest_ind

# perform a t-test for the class scores
class_scores_test = ttest_ind(class_a_scores, class_b_scores)

class_scores_test

Ttest_indResult(statistic=-1.7928549660490936, pvalue=0.0898179668151302)

### Conclusion Based on T-Test
We reject the null hypothesis if the p-value is LESS than 0.05.
- If we reject the null hypothesis, then the data support our original conclusion that the class means differ

However, from the results, the p-value was ~0.09, which is greater than 0.05. Therefore, we fail to reject the null hypothesis and CANNOT conclude that class B is smarter than class A.
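
The decision rule can be sketched in code. The helper name `decide` and the `alpha=0.05` default are illustrative assumptions, and the p-value is copied from the `ttest_ind` output printed above.

```python
def decide(pvalue, alpha=0.05):
    """Return the test decision for a p-value at significance level alpha."""
    if pvalue < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# p-value copied from the ttest_ind output above
print(decide(0.0898179668151302))  # fail to reject the null hypothesis
```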

### ttest_ind Function
We used the ```ttest_ind``` function to perform a t-test on the class A and class B samples.

The function calculates a t-test under the null hypothesis that the 2 independent samples have equal population means. By default, it also assumes that the two populations have equal variances.
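
To make the calculation concrete, the sketch below recomputes the statistic by hand with the pooled-variance formula, which corresponds to the default `equal_var=True` setting of `ttest_ind`. The arrays are copied from the outputs printed earlier, and the variable names are illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind

# Scores copied from the printed arrays above
a = np.array([86.17742361, 86.51055478, 81.62915934, 89.59871709, 85.11774997,
              82.80365601, 84.51441016, 87.09215749, 93.6427178, 80.49942699])
b = np.array([85.64660125, 88.53644971, 86.80024697, 93.7500767, 86.15836509,
              87.17237874, 87.23807021, 86.76164762, 91.37636324, 90.78678979])

n_a, n_b = len(a), len(b)

# Pooled sample variance (ddof=1 gives the unbiased sample variance)
pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)

# t = difference in sample means / standard error of that difference
t_manual = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

print(t_manual)                   # approximately -1.79
print(ttest_ind(a, b).statistic)  # same value computed by SciPy
```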


Population mean is NOT the same as sample mean.
- Population mean is the average of a population (the whole group)
- Sample mean is the average of a sample (part of the group)

### Population Mean Formula
The formula to find the population mean is: ```μ = (ΣX) / N```

where:
- Σ means “the sum of”
- X = all the individual items in the group
- N = the number of items in the group

### Population Mean Example
Ex: 125 people were surveyed to see how many hours per day they play video games.
- 1 hour (50 people)
- 3 hours (25 people)
- 5 hours (50 people)

What is the population mean for the number of hours played per day?  
```[(1 * 50) + (3 * 25) + (5 * 50)] / 125``` = 3 hours
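
The same arithmetic can be checked with NumPy. This sketch uses `np.average` with the head counts as weights; the variable names are illustrative.

```python
import numpy as np

hours = np.array([1, 3, 5])      # hours played per day
people = np.array([50, 25, 50])  # number of people reporting each value

# Weighted average: sum(hours * people) / sum(people)
population_mean = np.average(hours, weights=people)

print(population_mean)  # 3.0
```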

### ttest_1samp Function
We can use the ```ttest_1samp``` function to perform a t-test on a single sample under the null hypothesis that its population mean equals a given value, passed as the ```popmean``` parameter.

In [45]:
from scipy.stats import ttest_1samp

# perform a t-test under a null hypothesis that the class A scores population mean is 85
class_a_score_test = ttest_1samp(class_a_scores, popmean=85)

class_a_score_test

Ttest_1sampResult(statistic=0.6197902692695071, pvalue=0.550762935241339)

### Conclusion Based on T-Test
We performed a t-test on the class A scores under the null hypothesis that the population mean is 85. The p-value (~0.55) is greater than 0.05, so we fail to reject the null hypothesis.
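
For reference, the one-sample statistic can be reproduced by hand as the difference between the sample mean and `popmean`, divided by the standard error of the mean. This is a sketch: the array is copied from the class A output printed earlier, and the variable names are illustrative.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Class A scores copied from the printed array above
a = np.array([86.17742361, 86.51055478, 81.62915934, 89.59871709, 85.11774997,
              82.80365601, 84.51441016, 87.09215749, 93.6427178, 80.49942699])
popmean = 85

# Standard error of the mean, using the sample standard deviation (ddof=1)
standard_error = a.std(ddof=1) / np.sqrt(len(a))

# t = (sample mean - hypothesized population mean) / standard error
t_manual = (a.mean() - popmean) / standard_error

print(t_manual)                                     # approximately 0.62
print(ttest_1samp(a, popmean=popmean).statistic)    # same value computed by SciPy
```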