In [7]:
from __future__ import print_function  # a __future__ import must be the first statement in the cell
import numpy as np
from scipy import stats
from scipy.stats import norm

## Analysing One Sample

In [8]:
np.random.seed(282629734)
x = stats.t.rvs(df=10, size=1000)

### Descriptive Statistics

How do some sample properties compare to their theoretical counterparts?

In [16]:
m, v, s, k = stats.t.stats(df=10, moments='mvsk')  # Theoretical Counterparts
n, (smin, smax), sm, sv, ss, sk = stats.describe(x)

In [18]:
print(m, '\t', v, '\t', s, '\t', k)
print(sm, '\t', sv, '\t', ss, '\t', sk)

0.0 	 1.25 	 0.0 	 1.0
0.007760379167583533 	 1.0639650855972986 	 0.0027585736657464812 	 0.20389024339467765
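The first row above can be checked by hand. For a Student's t variable $T$ with $\nu$ degrees of freedom, the standard closed-form moments (each defined only for the stated range of $\nu$) give, for $\nu = 10$:

$$
\begin{aligned}
\mathbb{E}[T] &= 0 && (\nu > 1)\\
\operatorname{Var}[T] &= \frac{\nu}{\nu - 2} = \frac{10}{8} = 1.25 && (\nu > 2)\\
\text{skewness} &= 0 && (\nu > 3)\\
\text{excess kurtosis} &= \frac{6}{\nu - 4} = \frac{6}{6} = 1 && (\nu > 4)
\end{aligned}
$$

The kurtosis returned by `moments='mvsk'` is the excess (Fisher) kurtosis, which is why it equals 1 rather than 4.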

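The second row comes from `stats.describe`, whose defaults are `ddof=1` for the variance and `bias=True` for skewness and kurtosis (kurtosis again in Fisher's excess form). As a minimal sketch, assuming those defaults, the same four numbers can be recomputed with plain NumPy; the variable names here are illustrative only.

```python
import numpy as np
from scipy import stats

# Same seed and call as the notebook cells above
np.random.seed(282629734)
x = stats.t.rvs(df=10, size=1000)

d = stats.describe(x)

# Recompute the four moments by hand
mean = x.mean()
var = x.var(ddof=1)          # describe() uses ddof=1 for the variance
centered = x - mean
m2 = np.mean(centered**2)    # biased (ddof=0) central moments, as with bias=True
m3 = np.mean(centered**3)
m4 = np.mean(centered**4)
skew = m3 / m2**1.5
kurt = m4 / m2**2 - 3.0      # Fisher (excess) kurtosis

print(mean, var, skew, kurt)
print(d.mean, d.variance, d.skewness, d.kurtosis)
```

The two printed rows should agree up to floating-point rounding.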

### T-test and KS-test

We can use the t-test to check whether **the mean of the sample differs** in a statistically significant way from the theoretical expectation.

In [25]:
value = stats.ttest_1samp(x, m)
print(value.statistic, value.pvalue)

0.23791359136285903 0.8119969000496687
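For reference, the one-sample t statistic is $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$, where $s$ is the sample standard deviation (`ddof=1`), and the two-sided p-value comes from a t distribution with $n - 1$ degrees of freedom. The sketch below recomputes both and compares them with `stats.ttest_1samp`; it draws a fresh sample from an assumed `default_rng(0)` seed, so its numbers will not match the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # assumed seed, not from the notebook
x = stats.t.rvs(df=10, size=1000, random_state=rng)
mu0 = 0.0  # hypothesised mean: the theoretical mean of t with df=10

n = x.size
s = x.std(ddof=1)  # sample standard deviation
t_stat = (x.mean() - mu0) / (s / np.sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # two-sided p-value

res = stats.ttest_1samp(x, mu0)
print(t_stat, res.statistic)
print(p_value, res.pvalue)
```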

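The heading also mentions the KS test. A one-sample Kolmogorov-Smirnov test checks whether the whole sample, not just its mean, is consistent with a reference distribution; `stats.kstest` accepts a distribution name plus its shape parameters via `args`. The sketch below, again on a fresh sample with an assumed seed, tests against a t distribution with 10 degrees of freedom and recomputes the statistic as the largest gap between the empirical CDF and the model CDF.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # assumed seed, not from the notebook
x = stats.t.rvs(df=10, size=1000, random_state=rng)

# H0: x was drawn from a t distribution with df=10
res = stats.kstest(x, 't', args=(10,))
print(res.statistic, res.pvalue)

# Recompute the statistic: largest gap between the empirical CDF and the model CDF
xs = np.sort(x)
n = xs.size
cdf = stats.t.cdf(xs, df=10)
d_plus = np.max(np.arange(1, n + 1) / n - cdf)   # ECDF just after each point
d_minus = np.max(cdf - np.arange(0, n) / n)      # ECDF just before each point
d_manual = max(d_plus, d_minus)
print(d_manual)
```

A high p-value here means the sample is consistent with the t(10) model; it does not prove the sample came from it.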

## Comparing Two Samples

In the following, we are given two samples, which may come from the same distribution or from different distributions, and we want to test **whether the two samples have the same statistical properties**.

### Comparing means

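**The t-test is used to determine whether the means of two samples differ significantly.**

Null hypothesis of **ttest_ind**: the two samples have identical (population) means.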

In [27]:
# Samples with identical means
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)
stats.ttest_ind(rvs1, rvs2)

Ttest_indResult(statistic=0.8771673121577234, pvalue=0.3806068672181312)

In [28]:
# Samples with different means
rvs3 = stats.norm.rvs(loc=8, scale=10, size=500)
stats.ttest_ind(rvs1, rvs3)

Ttest_indResult(statistic=-3.8568417151087644, pvalue=0.00012222309536958636)

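For rvs1 and rvs2, the p-value is high, so we cannot reject the null hypothesis.

For rvs1 and rvs3, the p-value is below 0.05, so we reject the null hypothesis: **the two means differ, and the difference is statistically significant**.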
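By default `ttest_ind` assumes both populations share the same variance (`equal_var=True`) and uses a pooled variance estimate with $n_1 + n_2 - 2$ degrees of freedom. As a minimal sketch with freshly drawn samples (assumed seed, so the numbers differ from the cells above), the pooled statistic can be reproduced by hand; passing `equal_var=False` switches to Welch's t-test, which drops the equal-variance assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # assumed seed, not from the notebook
a = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
b = stats.norm.rvs(loc=8, scale=10, size=500, random_state=rng)

n1, n2 = a.size, b.size
dof = n1 + n2 - 2
# Pooled variance, matching ttest_ind's default equal_var=True
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof
t_stat = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_value = 2 * stats.t.sf(abs(t_stat), df=dof)  # two-sided p-value

res = stats.ttest_ind(a, b)
print(t_stat, res.statistic)
print(p_value, res.pvalue)

# Welch's t-test: no equal-variance assumption
print(stats.ttest_ind(a, b, equal_var=False))
```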

### Kolmogorov-Smirnov test for two samples ks_2samp

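**The KS test is used to determine whether two samples come from the same distribution.**

Null hypothesis of **ks_2samp**: the two samples are drawn from the same distribution.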

In [30]:
stats.ks_2samp(rvs1, rvs2)

Ks_2sampResult(statistic=0.03799999999999998, pvalue=0.8567045353837416)

In [33]:
stats.ks_2samp(rvs1, rvs3)

Ks_2sampResult(statistic=0.12400000000000005, pvalue=0.0008097301144065585)

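For rvs1 and rvs2, the p-value is high, so we cannot reject the null hypothesis; the data are consistent with both samples coming from the same distribution.

For rvs1 and rvs3, the p-value is below 0.05, so we reject the null hypothesis: the two samples do **not** come from the same distribution.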
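The `ks_2samp` statistic is the largest vertical distance between the two empirical CDFs, evaluated at every point of the pooled sample. The helper `ks_2samp_statistic` below is a hypothetical name for a minimal sketch of that computation, compared against SciPy's own value on freshly drawn samples (assumed seed, so the numbers differ from the cells above).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # assumed seed, not from the notebook
a = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
b = stats.norm.rvs(loc=8, scale=10, size=500, random_state=rng)


def ks_2samp_statistic(a, b):
    """Hypothetical helper: max vertical distance between two empirical CDFs."""
    a = np.sort(a)
    b = np.sort(b)
    grid = np.concatenate([a, b])  # evaluate both ECDFs at every pooled point
    ecdf_a = np.searchsorted(a, grid, side='right') / a.size
    ecdf_b = np.searchsorted(b, grid, side='right') / b.size
    return np.max(np.abs(ecdf_a - ecdf_b))


d_manual = ks_2samp_statistic(a, b)
res = stats.ks_2samp(a, b)
print(d_manual, res.statistic)
```

Because the ECDFs are step functions that only jump at sample points, checking the pooled points is enough to find the maximum gap.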