### How to Code the Student’s t-Test from Scratch in Python
Perhaps one of the most widely used statistical hypothesis tests is the Student’s t-test.

Because you may use this test yourself someday, it is important to understand how it works. As a developer, the best way to build that understanding is to implement the hypothesis test yourself from scratch.

In this tutorial, you will discover how to implement the Student’s t-test statistical hypothesis test from scratch in Python.

After completing this tutorial, you will know:

- The Student’s t-test comments on how likely it is to observe two samples, given that the samples were drawn from the same population.
- How to implement the Student’s t-test from scratch for two independent samples.
- How to implement the paired Student’s t-test from scratch for two dependent samples.

### Student’s t-Test
The Student’s t-test is a statistical hypothesis test for testing whether two samples are expected to have been drawn from the same population.

It is named for the pseudonym “Student” used by William Gosset, who developed the test.

The test works by checking the means from two samples to see if they are significantly different from each other. It does this by calculating the standard error of the difference between the means, which indicates how likely the observed difference is if the two samples have the same mean (the null hypothesis).

The t statistic calculated by the test can be interpreted by comparing it to critical values from the t-distribution. The critical value can be calculated from the degrees of freedom and a significance level using the percent point function (PPF).

We can interpret the statistic in a two-tailed test, meaning that if we reject the null hypothesis, it could be because the first mean is smaller or greater than the second mean. To do this, we calculate the absolute value of the test statistic and compare it to the positive (right-tailed) critical value, as follows:

- If abs(t-statistic) <= critical value: Fail to reject the null hypothesis that the means are equal.
- If abs(t-statistic) > critical value: Reject the null hypothesis that the means are equal.

We can also retrieve the cumulative probability of observing the absolute value of the t-statistic using the cumulative distribution function (CDF) of the t-distribution in order to calculate a p-value. The p-value can then be compared to a chosen significance level (alpha), such as 0.05, to determine if the null hypothesis can be rejected:

- If p > alpha: Fail to reject the null hypothesis that the means are equal.
- If p <= alpha: Reject the null hypothesis that the means are equal.

In working with the means of the samples, the test assumes that both samples were drawn from a Gaussian distribution. The test also assumes that the samples have the same variance and the same size, although there are corrections to the test if these assumptions do not hold. For example, see Welch’s t-test.

There are two main versions of the Student’s t-test:

- Independent Samples. The case where the two samples are unrelated.
- Dependent Samples. The case where the samples are related, such as repeated measures on the same population. Also called a paired test.

Both the independent and the dependent Student’s t-tests are available in Python via the `ttest_ind()` and `ttest_rel()` SciPy functions respectively.

Note: I recommend using these SciPy functions to calculate the Student’s t-test for your applications, if they are suitable. The library implementations will be faster and less prone to bugs. I would only recommend implementing the test yourself for learning purposes or when you require a modified version of the test.

We will use the SciPy functions to confirm the results from our own version of the tests.

Note, for reference, all calculations presented in this tutorial are taken directly from Chapter 9 “t Tests” in “Statistics in Plain English”, Third Edition, 2010. I mention this because you may see the equations in different forms, depending on the reference text that you use.
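
The two decision rules described above can be sketched with SciPy's `t` distribution object. This is only a minimal illustration: the t statistic of -2.1 and the 30 degrees of freedom are made-up values, and the critical value is taken at `1 - alpha / 2` because the test is two-tailed.

```python
from scipy.stats import t

# Hypothetical inputs, chosen only for illustration
t_stat = -2.1
df = 30
alpha = 0.05

# Two-tailed critical value: alpha is split across both tails
critical_value = t.ppf(1.0 - alpha / 2.0, df)

# Two-tailed p-value: probability of a statistic at least this extreme
p_value = 2.0 * t.sf(abs(t_stat), df)

# Both rules should lead to the same decision
reject_by_cv = abs(t_stat) > critical_value
reject_by_p = p_value <= alpha

print('critical_value=%.3f, p_value=%.3f' % (critical_value, p_value))
print('reject via critical value:', reject_by_cv)
print('reject via p-value:', reject_by_p)
```

The survival function `t.sf(x, df)` is equivalent to `1 - t.cdf(x, df)` and is used here only because it is slightly more accurate in the far tails.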

### Student’s t-Test for Independent Samples

We’ll start with the most common form of the Student’s t-test: the case where we are comparing the means of two independent samples.

#### Calculation

The calculation of the t-statistic for two independent samples is as follows:

`t = observed difference between sample means / standard error of the difference between the means`

or

`t = (mean(X1) - mean(X2)) / sed`

Where X1 and X2 are the first and second data samples and sed is the standard error of the difference between the means.

The standard error of the difference between the means can be calculated as follows:

`sed = sqrt(se1^2 + se2^2)`

Where se1 and se2 are the standard errors for the first and second datasets.

The standard error of a sample can be calculated as:

`se = std / sqrt(n)`

Where se is the standard error of the sample, std is the sample standard deviation, and n is the number of observations in the sample.

These calculations make the following assumptions:

- The samples are drawn from a Gaussian distribution.
- The size of each sample is approximately equal.
- The samples have the same variance.

For more details, see: https://machinelearningmastery.com/how-to-code-the-students-t-test-from-scratch-in-python/
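
As a rough check of the formulas above, the standard error can be computed by hand and compared with SciPy's `sem()`. The sketch below assumes two made-up Gaussian samples; the helper name `standard_error` and the sample parameters are illustrative and not part of the original tutorial.

```python
import numpy as np
from scipy.stats import sem

# Fixed seed so the illustrative samples are reproducible
rng = np.random.default_rng(1)
sample1 = 5 * rng.standard_normal(100) + 50
sample2 = 5 * rng.standard_normal(100) + 51

def standard_error(sample):
    # se = std / sqrt(n), using the sample standard deviation (ddof=1)
    return np.std(sample, ddof=1) / np.sqrt(len(sample))

se1, se2 = standard_error(sample1), standard_error(sample2)

# sed = sqrt(se1^2 + se2^2)
sed = np.sqrt(se1**2 + se2**2)

# t = (mean(X1) - mean(X2)) / sed
t_stat = (np.mean(sample1) - np.mean(sample2)) / sed

print('se1=%.3f, se2=%.3f, sed=%.3f, t=%.3f' % (se1, se2, sed, t_stat))
print('matches scipy sem():', np.isclose(se1, sem(sample1)))
```

Note that `sem()` also uses `ddof=1` by default, which is why the hand calculation agrees with it.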

In [4]:
from math import sqrt
from numpy.random import seed, randn
from numpy import mean
from scipy.stats import sem
from scipy.stats import t

def independent_ttest(data1, data2, alpha):
    # calculate the sample means
    mean1, mean2 = mean(data1), mean(data2)
    # calculate the standard error of each sample
    standard_error1, standard_error2 = sem(data1), sem(data2)
    # standard error of the difference between the means
    standard_error_difference = sqrt(standard_error1**2 + standard_error2**2)
    # calculate the t-statistic
    t_statistic = (mean1 - mean2) / standard_error_difference
    # degrees of freedom
    degrees_of_freedom = len(data1) + len(data2) - 2
    # two-tailed critical value: alpha is split across both tails
    critical_value = t.ppf(1.0 - alpha / 2.0, degrees_of_freedom)
    # two-tailed p-value
    p_value = (1.0 - t.cdf(abs(t_statistic), degrees_of_freedom)) * 2.0
    # return everything
    return t_statistic, degrees_of_freedom, critical_value, p_value


# seed the random number generator
seed(1)

# generate two independent samples
data1 = 5*randn(100) + 50
data2 = 5*randn(100) + 51

# calculate the t-test
alpha = 0.05
t_statistic, degrees_of_freedom, critical_value, p_value = independent_ttest(data1, data2, alpha)

print('t_statistic=%.3f, degrees_of_freedom=%d, critical_value=%.3f, p_value=%.3f' % (t_statistic, degrees_of_freedom, critical_value, p_value))

# interpret via critical value
if abs(t_statistic) <= critical_value:
    print('Via critical value: fail to reject the null hypothesis that the means are equal.')
else:
    print('Via critical value: reject the null hypothesis that the means are equal.')

# interpret via p-value
if p_value > alpha:
    print('Via p-value: fail to reject the null hypothesis that the means are equal.')
else:
    print('Via p-value: reject the null hypothesis that the means are equal.')

t_statistic=-2.262, degrees_of_freedom=198, critical_value=1.972, p_value=0.025
Via critical value: reject the null hypothesis that the means are equal.
Via p-value: reject the null hypothesis that the means are equal.
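
The calculation above assumes equal variances and roughly equal sample sizes. When those assumptions are doubtful, SciPy's `ttest_ind()` accepts `equal_var=False` to run Welch's t-test instead. The sketch below compares the two on made-up samples with different sizes and spreads; the sample names and parameters are assumptions chosen only to make the difference visible.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

# Hypothetical samples with unequal sizes and unequal variances
small_noisy = 10 * rng.standard_normal(30) + 50
large_tight = 2 * rng.standard_normal(200) + 51

# Student's t-test (pooled variance) versus Welch's t-test (unpooled)
student_stat, student_p = ttest_ind(small_noisy, large_tight)
welch_stat, welch_p = ttest_ind(small_noisy, large_tight, equal_var=False)

print('Student: t=%.3f, p=%.3f' % (student_stat, student_p))
print('Welch:   t=%.3f, p=%.3f' % (welch_stat, welch_p))
```

Welch's version uses the unpooled standard error `sqrt(se1^2 + se2^2)` together with adjusted degrees of freedom, so the two tests can disagree noticeably when the sample sizes and variances differ.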


In [5]:
def dependent_ttest(data1, data2, alpha):
    # calculate the sample means
    mean1, mean2 = mean(data1), mean(data2)
    # number of paired observations
    n = len(data1)
    # sum of squared differences between paired observations
    d1 = sum([(data1[i] - data2[i])**2 for i in range(n)])
    # sum of differences between paired observations
    d2 = sum([data1[i] - data2[i] for i in range(n)])
    # standard deviation of the differences
    sd = sqrt((d1 - (d2**2 / n)) / (n - 1))
    # standard error of the mean difference
    sed = sd / sqrt(n)
    # calculate the t-statistic
    t_stat = (mean1 - mean2) / sed
    # degrees of freedom
    df = n - 1
    # two-tailed critical value: alpha is split across both tails
    cv = t.ppf(1.0 - alpha / 2.0, df)
    # two-tailed p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    # return everything
    return t_stat, df, cv, p


# re-seed so the samples match the ones used above; here they are treated as paired
seed(1)
data1 = 5*randn(100) + 50
data2 = 5*randn(100) + 51
alpha = 0.05

t_stat, df, cv, p = dependent_ttest(data1, data2, alpha)

print('t_stat=%.3f, df=%d, cv=%.3f, p=%.3f' % (t_stat, df, cv, p))

# interpret via critical value
if abs(t_stat) <= cv:
    print('Via critical value: fail to reject the null hypothesis that the means are equal')
else:
    print('Via critical value: reject the null hypothesis that the means are equal')

# interpret via p-value
if p > alpha:
    print('Via p-value: fail to reject the null hypothesis that the means are equal')
else:
    print('Via p-value: reject the null hypothesis that the means are equal')

t_stat=-2.372, df=99, cv=1.984, p=0.020
Via critical value: reject the null hypothesis that the means are equal
Via p-value: reject the null hypothesis that the means are equal
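
The paired calculation above is equivalent to a one-sample t-test on the per-pair differences, testing whether their mean is zero. The following is a vectorized sketch of that idea with NumPy, assuming both samples are equal-length arrays; the function name `paired_t_statistic` and the `before`/`after` samples are hypothetical and used only for illustration.

```python
import numpy as np
from scipy.stats import t, ttest_rel

def paired_t_statistic(data1, data2):
    # per-pair differences
    diff = np.asarray(data1) - np.asarray(data2)
    n = len(diff)
    # standard error of the mean difference (sample std, ddof=1)
    sed = np.std(diff, ddof=1) / np.sqrt(n)
    t_stat = np.mean(diff) / sed
    df = n - 1
    # two-tailed p-value
    p_value = 2.0 * t.sf(abs(t_stat), df)
    return t_stat, df, p_value

# Made-up paired samples with a fixed seed for reproducibility
rng = np.random.default_rng(1)
before = 5 * rng.standard_normal(100) + 50
after = 5 * rng.standard_normal(100) + 51

t_stat, df, p_value = paired_t_statistic(before, after)
ref_stat, ref_p = ttest_rel(before, after)

print('manual: t=%.3f, p=%.3f' % (t_stat, p_value))
print('scipy:  t=%.3f, p=%.3f' % (ref_stat, ref_p))
```

Working on the differences directly avoids the explicit Python loops and makes it clearer why the degrees of freedom are `n - 1`: there is effectively a single sample of `n` differences.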


In [11]:
from scipy.stats import ttest_rel, ttest_ind

In [7]:
seed(1)
data1 = 5*randn(100)+50
data2 = 5*randn(100)+51
stat, p = ttest_rel(data1, data2)
print('Statistics = %.3f, p =%.3f'%(stat, p))

Statistics = -2.372, p =0.020


### Function

In [10]:
def func1():
    print('I am learning Python function')
    print('Still in func')
    
    
func1()


def square(x):
    return x**2

def multiply(x, y = 0):
    print('value of x=', x)
    print('value of y=', y)
    return x*y

print(multiply(y = 2, x = 4))

I am learning Python function
Still in func
value of x= 4
value of y= 2
8


In [12]:
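# re-seed so the samples match those passed to independent_ttest() above
seed(1)
data1 = 5*randn(100) + 50
data2 = 5*randn(100) + 51
stat, p = ttest_ind(data1, data2)
print('Statistics = %.3f, p = %.3f' % (stat, p))

Statistics = -2.262, p = 0.025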
