# The T-Test, Commonly Used in Statistics
The t-test (also called Student's t-test) compares two averages (means) and tells you whether they differ from each other. It also tells you how significant the difference is; in other words, it tells you how likely a difference at least this large would be to arise by chance alone if the two population means were actually equal.
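
For reference, the two-sample (pooled-variance) Student's t-statistic can be written as follows, with sample means $\bar{a}, \bar{b}$, unbiased sample variances $s_a^2, s_b^2$ and sample sizes $n_a, n_b$. The NumPy code in this notebook is the special case $n_a = n_b = N$, where $s_p = \sqrt{(s_a^2 + s_b^2)/2}$, the denominator becomes $s_p\sqrt{2/N}$, and $\text{df} = 2N - 2$:

$$
t = \frac{\bar{a} - \bar{b}}{s_p\sqrt{\frac{1}{n_a} + \frac{1}{n_b}}}, \qquad
s_p^2 = \frac{(n_a - 1)s_a^2 + (n_b - 1)s_b^2}{n_a + n_b - 2}, \qquad
\text{df} = n_a + n_b - 2
$$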

# References
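* [Definition of degrees of freedom (summarized below)](https://zh.wikipedia.org/wiki/%E8%87%AA%E7%94%B1%E5%BA%A6_(%E7%BB%9F%E8%AE%A1%E5%AD%A6))
* Degrees of freedom in plain terms: mathematically, the degrees of freedom is the number of dimensions of a random vector, i.e. the minimum number of unit vectors needed to describe the vector completely. For example, the displacement from a computer screen to the kitchen can be described by the 3D vector $x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$, so this displacement vector has 3 degrees of freedom.
* If two variables $a$ and $b$ satisfy $a + b = 6$, there is only 1 degree of freedom: only $a$ can vary freely, and once $a$ is chosen, $b = 6 - a$ is fixed.
* [Reference code](https://towardsdatascience.com/inferential-statistics-series-t-test-using-numpy-2718f8f9bf2f)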
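
The same kind of constraint as $a + b = 6$ shows up when estimating a variance: the deviations from the sample mean always sum to zero, so once $N-1$ of them are known the last one is forced. That is why a sample of size $N$ leaves only $N-1$ degrees of freedom for the variance. A minimal check on made-up numbers:

```python
import numpy as np

# Made-up sample (N = 4); any values would show the same constraint
x = np.array([2.0, 4.0, 9.0, 5.0])
dev = x - x.mean()            # deviations from the sample mean
print(dev.sum())              # always 0 (up to floating-point rounding)

# Given the first N-1 deviations, the last one is no longer free
last = -dev[:-1].sum()
print(np.isclose(last, dev[-1]))
```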

# Computing the T-test with plain NumPy functions, to see how the calculation works step by step

```python
## Import the packages
import numpy as np
from scipy import stats


## Define 2 random distributions
# Sample size
N = 10
# Gaussian distributed data with mean = 10 and var = 1
a = np.random.randn(N) + 10
# Gaussian distributed data with mean = 0 and var = 1
b = np.random.randn(N)
```
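
Because `np.random.randn` draws new numbers on every run, rerunning this notebook will not reproduce the outputs shown below. A minimal sketch that makes the draws repeatable, assuming NumPy's `Generator` API (`np.random.default_rng`) and an arbitrary seed of 42 (the numbers will still differ from the recorded outputs here):

```python
import numpy as np

# Arbitrary, hypothetical seed; any fixed integer makes the draws repeatable
rng = np.random.default_rng(seed=42)
N = 10
a = rng.normal(loc=10, scale=1, size=N)  # mean 10, variance 1, like np.random.randn(N) + 10
b = rng.normal(loc=0, scale=1, size=N)   # mean 0, variance 1, like np.random.randn(N)
```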


```python
## Calculate the standard deviation
# First compute each sample's variance.

# Dividing by N-1 (Bessel's correction) gives an unbiased estimate of the variance,
# hence ddof=1 (NumPy's default ddof=0 divides by N, the maximum-likelihood estimate)
var_a = a.var(ddof=1)
var_b = b.var(ddof=1)
```
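
To see what `ddof` changes, this small check on made-up numbers compares the manual formulas with `ndarray.var`: dividing the sum of squared deviations by $N-1$ matches `var(ddof=1)`, while dividing by $N$ matches the default `var(ddof=0)`:

```python
import numpy as np

x = np.array([2.0, 4.0, 9.0, 5.0])  # made-up sample, N = 4
n = len(x)

sq_dev = ((x - x.mean()) ** 2).sum()  # sum of squared deviations from the mean
manual_unbiased = sq_dev / (n - 1)     # divide by N-1 (unbiased)
manual_mle = sq_dev / n                # divide by N (maximum likelihood)

print(manual_unbiased, x.var(ddof=1))
print(manual_mle, x.var(ddof=0))
```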

```python
# Pooled standard deviation (averaging the two variances like this assumes equal sample sizes)
s = np.sqrt((var_a + var_b)/2)
s
```

```
0.8530830470732802
```
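
The expression `np.sqrt((var_a + var_b)/2)` is the pooled standard deviation only because both samples have the same size N. A sketch of the general pooled formula, which weights each variance by its own degrees of freedom; `pooled_std` is a hypothetical helper name, not a NumPy or SciPy function:

```python
import numpy as np

def pooled_std(x, y):
    """Hypothetical helper: pooled standard deviation of two independent samples."""
    nx, ny = len(x), len(y)
    vx, vy = np.var(x, ddof=1), np.var(y, ddof=1)
    # Weight each unbiased variance by its degrees of freedom (n - 1)
    return np.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
# With equal sample sizes the general formula reduces to sqrt((var_x + var_y) / 2)
print(np.isclose(pooled_std(x, y), np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)))
```

With unequal sizes the larger sample simply gets more weight; the simple average of the two variances would no longer be correct.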

```python
## Calculate the t-statistic
# The standard error of the difference in means is s * sqrt(1/N + 1/N) = s * sqrt(2/N)
t = (a.mean() - b.mean())/(s*np.sqrt(2/N))
t
```

```
20.328015687432096
```

```python
## Compare with the critical t-value
# Degrees of freedom: N - 1 for each sample, 2N - 2 in total
df = 2*N - 2
df
```

```
18
```

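```python
# One-tailed p-value: probability of a t-statistic at least this large under the null hypothesis
# (abs(t) keeps the two-tailed doubling below valid even when t is negative)
p = 1 - stats.t.cdf(abs(t), df=df)
p
```

```
3.6304292905242619e-14
```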
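
For very large t, `1 - stats.t.cdf(...)` subtracts two numbers that are both extremely close to 1, so precision is lost and the result can even collapse to 0.0. SciPy's survival function `stats.t.sf` computes the upper tail directly. A small comparison, using a deliberately extreme, hypothetical t of 40 with the notebook's df = 18:

```python
from scipy import stats

df = 18
t_large = 40.0  # hypothetical, deliberately extreme t-statistic

p_cdf = 1 - stats.t.cdf(t_large, df=df)  # cdf is rounded near 1.0 first, losing precision
p_sf = stats.t.sf(t_large, df=df)         # survival function: upper tail computed directly
print(p_cdf, p_sf)

# For moderate t both forms agree
print(1 - stats.t.cdf(2.0, df=df), stats.t.sf(2.0, df=df))
```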

```python
print("t = " + str(t))
# Multiply the one-tailed p-value by 2 because this is a two-tailed t-test
print("p = " + str(2*p))
### The p-value (about 7.3e-14) is far below the usual 0.05 significance level
### (equivalently, |t| far exceeds the critical t-value for df = 18), so we reject the
### null hypothesis: the difference between the two means is statistically significant.
```

```
t = 20.3280156874
p = 7.26085858105e-14
```
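
The decision can also be made explicitly against the critical value. A minimal sketch, assuming the conventional two-tailed significance level alpha = 0.05; `reject_null` is a hypothetical helper, and `stats.t.ppf` is the inverse CDF of the t distribution:

```python
from scipy import stats

alpha = 0.05  # assumed significance level
df = 18       # 2*N - 2 with N = 10

# Two-tailed critical value: each tail beyond +/- t_crit holds alpha/2
t_crit = stats.t.ppf(1 - alpha / 2, df=df)
print(round(t_crit, 3))  # 2.101

def reject_null(t, df, alpha=0.05):
    """Hypothetical helper: two-tailed test via the critical value."""
    return abs(t) > stats.t.ppf(1 - alpha / 2, df=df)

# Equivalent decision via the p-value: 2 * P(T > |t|) < alpha
t_obs = 20.328  # the t-statistic reported above, rounded
print(reject_null(t_obs, df), 2 * stats.t.sf(abs(t_obs), df=df) < alpha)
```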


# Using SciPy's built-in T-test function

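```python
## Cross-check with SciPy's built-in function
# ttest_ind defaults to equal_var=True, i.e. the same pooled-variance Student's t-test
# as above, and it returns a two-tailed p-value
t2, p2 = stats.ttest_ind(a, b)
print("t = " + str(t2))
print("p = " + str(p2))
```

```
t = 20.3280156874
p = 7.27244199236e-14
```

The t values agree. The p-values differ slightly in the third significant digit, most likely because of floating-point precision loss in `1 - stats.t.cdf(...)` when the CDF is this close to 1.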
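
The manual computation above first builds a one-tailed p-value and then doubles it. `ttest_ind` can return either form directly through its `alternative` parameter (available in SciPy 1.6 and later, which this sketch assumes). The samples below are freshly drawn with an arbitrary seed and are not the notebook's `a` and `b`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed for repeatable illustration data
a = rng.normal(loc=10, scale=1, size=10)
b = rng.normal(loc=0, scale=1, size=10)

two_sided = stats.ttest_ind(a, b)                         # default alternative='two-sided'
one_sided = stats.ttest_ind(a, b, alternative='greater')  # H1: mean(a) > mean(b)

print(two_sided.statistic, two_sided.pvalue)
print(one_sided.pvalue)  # half of the two-sided p-value when t > 0
```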


# Conclusion
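* The t-test checks whether the means of two samples differ significantly
* The quantities involved are:
  * the number of observations in each sample
  * the variance of each sample
  * from these, the (pooled) standard deviation
  * the difference between the two sample means, divided by its standard error $s\sqrt{2/N}$, which "standardizes" the difference
  * the resulting t-statistic, from which the p-value is looked up in the t distribution with the appropriate degrees of freedom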
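
The steps listed above can be collected into one function. This is a sketch under the same equal-variance assumption, generalized to unequal sample sizes with the pooled formula; `student_t_test` is a hypothetical name, and the result is cross-checked against `stats.ttest_ind` on made-up data:

```python
import numpy as np
from scipy import stats

def student_t_test(x, y):
    """Hypothetical helper: two-sample Student's t-test with pooled variance."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)                              # sample sizes
    vx, vy = x.var(ddof=1), y.var(ddof=1)                # unbiased variances
    df = nx + ny - 2                                     # degrees of freedom
    sp = np.sqrt(((nx - 1) * vx + (ny - 1) * vy) / df)   # pooled standard deviation
    se = sp * np.sqrt(1 / nx + 1 / ny)                   # standard error of the mean difference
    t = (x.mean() - y.mean()) / se                       # "standardized" mean difference
    p = 2 * stats.t.sf(abs(t), df=df)                    # two-tailed p-value
    return t, p, df

rng = np.random.default_rng(1)               # arbitrary seed
x = rng.normal(loc=1.0, scale=1.0, size=12)  # made-up samples of unequal size
y = rng.normal(loc=0.0, scale=1.0, size=8)

t, p, df = student_t_test(x, y)
t_ref, p_ref = stats.ttest_ind(x, y)         # SciPy's pooled-variance version
print(t, p, df)
print(t_ref, p_ref)
```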