In [1]:
import numpy       as np
import pandas      as pd
import scipy.stats as stats

## The two-sample t test for unpaired data is defined as
* $H_0$: $\mu_1 = \mu_2$
* $H_a$: $\mu_1 \neq \mu_2$

### Test statistic T = $\frac{\overline{X}_1 - \overline{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

* where $n_1$ and $n_2$ are the sample sizes and $\overline{X}_1$ and $\overline{X}_2$ are the sample means
* $s_1^2$ and $s_2^2$ are the sample variances
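
As a rough illustration of the formula above, the statistic can be computed by hand with only the Python standard library. The function name `two_sample_t` and the toy lists are hypothetical and chosen only for this sketch; they are not part of the weight loss data.

```python
import math
import statistics

def two_sample_t(x1, x2):
    """Unpaired two-sample T using separate sample variances (formula above)."""
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = statistics.mean(x1), statistics.mean(x2)
    # statistics.variance is the sample variance (divides by n - 1)
    var1, var2 = statistics.variance(x1), statistics.variance(x2)
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error

# Hypothetical toy data, used only to exercise the function
sample_a = [2.0, 4.0, 6.0]
sample_b = [1.0, 2.0, 3.0]
print('T = %1.3f' % two_sample_t(sample_a, sample_b))
```

For the toy data the means are 4 and 2 and the sample variances are 4 and 1, so T = 2 / sqrt(4/3 + 1/3).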

### Example 3

Compare two unrelated samples. Data was collected on the weight loss of 16 women and 20 men enrolled in a weight reduction program.
At $\alpha$ = 0.05, test whether the mean weight loss of these two groups is different.

In [2]:
Weight_loss_Male   = [ 3.69, 4.12, 4.65, 3.19,  4.34, 3.68, 4.12, 4.50, 3.70, 3.09,3.65, 4.73, 3.93, 3.46, 3.28, 4.43, 4.13, 3.62, 3.71, 2.92]
Weight_loss_Female = [2.99, 1.80, 3.79, 4.12, 1.76, 3.50, 3.61, 2.32, 3.67, 4.26, 4.57, 3.01, 3.82, 4.33, 3.40, 3.86]

In [3]:
from    scipy.stats             import  ttest_1samp,ttest_ind, wilcoxon, ttest_ind_from_stats
import  scipy.stats             as      stats  
from    statsmodels.stats.power import  ttest_power
import  matplotlib.pyplot       as      plt

### Step 1: Define null and alternative hypotheses

To test whether the mean weight loss of women and men is the same, the null hypothesis states that the mean weight loss of men, $\mu_M$, equals the mean weight loss of women, $\mu_F$. The alternative hypothesis states that the mean weight loss is different for men and women, $\mu_M \neq \mu_F$.

* $H_0$: $\mu_M - \mu_F = 0$
* $H_A$: $\mu_M - \mu_F \neq 0$

### Step 2: Decide the significance level

Here we select $\alpha$ = 0.05. Both samples are small (n < 30) and the population standard deviations are not known.

### Step 3: Identify the test statistic

* We have two samples and we do not know the population standard deviations.
* The two sample sizes are not the same.
* Neither sample is large (n < 30), so we use the t distribution and the $t_{STAT}$ test statistic for the two-sample unpaired test.

### Step 4: Calculate the p-value and test statistic

**We use `scipy.stats.ttest_ind` to calculate the t test for the means of two independent samples, given the two sets of observations. This function returns the t statistic and the two-tailed p-value.**

**This is a two-sided test for the null hypothesis that two independent samples have identical average (expected) values. By default (`equal_var=True`) the test assumes that the populations have identical variances and uses a pooled variance estimate. The test statistic defined at the start of this section keeps the two sample variances separate, which corresponds to `equal_var=False` (Welch's t test).**
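
To make the equal-variance assumption concrete, the following sketch runs `scipy.stats.ttest_ind` on the weight loss data twice: once with the default pooled variance and once with `equal_var=False` (Welch's t test). The variable names `t_pooled`, `p_pooled`, `t_welch` and `p_welch` are introduced here only for illustration.

```python
from scipy import stats

# Weight loss data from Example 3
Weight_loss_Male = [3.69, 4.12, 4.65, 3.19, 4.34, 3.68, 4.12, 4.50, 3.70, 3.09,
                    3.65, 4.73, 3.93, 3.46, 3.28, 4.43, 4.13, 3.62, 3.71, 2.92]
Weight_loss_Female = [2.99, 1.80, 3.79, 4.12, 1.76, 3.50, 3.61, 2.32,
                      3.67, 4.26, 4.57, 3.01, 3.82, 4.33, 3.40, 3.86]

# Default: pooled variance, assumes equal population variances
t_pooled, p_pooled = stats.ttest_ind(Weight_loss_Male, Weight_loss_Female)

# Welch's t test: separate variances, no equal-variance assumption
t_welch, p_welch = stats.ttest_ind(Weight_loss_Male, Weight_loss_Female,
                                   equal_var=False)

print('Pooled: t = %1.3f, p = %1.3f' % (t_pooled, p_pooled))
print('Welch : t = %1.3f, p = %1.3f' % (t_welch, p_welch))
```

If the two variants lead to different decisions at the chosen $\alpha$, that is a signal to look more closely at whether the equal-variance assumption is reasonable for the data.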

In [4]:
t_statistic, p_value  =  stats.ttest_ind(Weight_loss_Male,Weight_loss_Female)
print('P Value %1.3f' % p_value)    

P Value 0.076


### Step 5: Decide whether to reject the null hypothesis

In this example, the p-value is 0.076, which is greater than the 5% level of significance.

So the statistical decision is to fail to reject the null hypothesis at the 5% level of significance.

### So there is not sufficient evidence to reject the null hypothesis that the mean weight loss of men and women is the same.
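
The decision rule used in Step 5 can be sketched as a small helper. The function name `decide` is hypothetical and is used only for illustration; the value 0.076 is the p-value reported above.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision at significance level alpha."""
    if p_value < alpha:
        return 'reject the null hypothesis'
    # A large p-value does not prove the null hypothesis; we only fail to reject it
    return 'fail to reject the null hypothesis'

print(decide(0.076))  # prints "fail to reject the null hypothesis"
```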

### Practice Exercise 2

Compare the following two unrelated samples. Data was collected on the weight of women and men enrolled in a weight reduction program.
At $\alpha$ = 0.05, test whether the mean weight of these two groups is different.

In [5]:
Weight_Female       =  [ 53.8, 54.4, 51.2, 52.5, 61.0, 50.6, 51.6, 70.0]
Weight_Male         =  [ 72.5, 80.3, 71.3, 67.7, 66.2, 73.4, 61.3, 76.8]

## Two sample t test for paired data

### Example 4

Compare two related samples. Data was collected on the marks scored by 25 students in their final practice exam and the marks scored by the same students after attending special coaching classes conducted by their college.
At the 5% level of significance, is there any evidence that the coaching classes have an effect on the marks scored?

In [6]:
Marks_before = [ 52, 56, 61, 47, 58, 52, 56, 60, 52, 46, 51, 62, 54, 50, 48, 59, 56, 51, 52, 44, 52, 45, 57, 60, 45]

Marks_after  = [62, 64, 40, 65, 76, 82, 53, 68, 77, 60, 69, 34, 69, 73, 67, 82, 62, 49, 44, 43, 77, 61, 67, 67, 54]

### Step 1: Define null and alternative hypotheses

To test whether coaching has any effect on the marks scored, the null hypothesis states that the mean marks after coaching, $\mu_{After}$, equal the mean marks before coaching, $\mu_{Before}$, so the mean difference is 0. The alternative hypothesis states that the mean difference in marks is not 0, $\mu_{After} \neq \mu_{Before}$.

* $H_0$: $\mu_{After} - \mu_{Before} = 0$
* $H_A$: $\mu_{After} - \mu_{Before} \neq 0$

### Step 2: Decide the significance level

Here we select $\alpha$ = 0.05. The sample is small (n < 30) and the population standard deviation is not known.

### Step 3: Identify the test statistic

* The sample sizes for both samples are the same.
* We have two paired samples and we do not know the population standard deviation.
* The sample is not large (n < 30), so we use the t distribution and the $t_{STAT}$ test statistic for the two-sample paired test.

### Step 4: Calculate the p-value and test statistic

**We use `scipy.stats.ttest_rel` to calculate the t test on two related samples of scores.
This is a two-sided test for the null hypothesis that two related or repeated samples have identical average (expected) values. Here we give the two sets of observations as input. This function returns the t statistic and the two-tailed p-value.**
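
A paired t test is equivalent to a one-sample t test on the per-subject differences, with n - 1 degrees of freedom. The sketch below computes that statistic by hand with the standard library. The function name `paired_t` and the toy lists are hypothetical and serve only as an illustration.

```python
import math
import statistics

def paired_t(after, before):
    """Paired t statistic: mean difference divided by its standard error."""
    if len(after) != len(before):
        raise ValueError('paired samples must have the same length')
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    # statistics.stdev is the sample standard deviation (divides by n - 1)
    sd_diff = statistics.stdev(differences)
    return mean_diff / (sd_diff / math.sqrt(n))

# Hypothetical toy data, used only to exercise the function
toy_before = [1, 2, 3]
toy_after = [2, 4, 6]
print('t = %1.3f' % paired_t(toy_after, toy_before))
```

Passing the samples in the order (after, before) gives a positive statistic when marks increase, which matches the argument order used with `ttest_rel` in this example.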

In [None]:
import  scipy.stats             as      stats  
t_statistic, p_value  =  stats.ttest_rel(Marks_after, Marks_before )
print('P Value %1.3f' % p_value)  

P Value 0.002


### Step 5: Decide whether to reject the null hypothesis

In this example, the p-value is 0.002, which is less than the 5% level of significance.

So the statistical decision is to reject the null hypothesis at the 5% level of significance.

### So there is sufficient evidence to reject the null hypothesis that the coaching classes have no effect on the marks scored by students.

## End