In [1]:
# In this lesson, as the name suggests, we'll discuss p-values which
# have nothing to do with urological testing. Instead they are the
# most common measure of statistical significance.

# However, because they're so widely used, they're also often
# misused or misinterpreted. In this lecture we'll focus on how to
# generate them and how to interpret them correctly.

# The question motivating p-values is this. Given that we have some
# null hypothesis concerning our data (for example, its mean), how
# unusual or extreme is the sample value we get from our data? Is
# our test statistic consistent with our hypothesis? So there are,
# implicitly, three steps we have to take to answer these types of
# questions.

In [23]:
from scipy import stats

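# Draw random values from a normal distribution with mean 5 and
# standard deviation 10. size=(50, 2) gives two columns of 50 observations.
rvs = stats.norm.rvs(loc=5, scale=10, size=(50,2))

# ttest_1samp is a two-sided test for the null hypothesis that the expected value (mean) 
# of a sample of independent observations (here, rvs) is equal to the given population mean, 
# popmean (here, 5.0). The test runs on each column separately, so we get two p-values.
# We name the first result t_stat rather than stats so that we don't overwrite the stats module.
t_stat, p_value = stats.ttest_1samp(rvs, 5.0)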

p_value

array([0.70190223, 0.22967982])
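
# To make the link between the test statistic and the p-value concrete, here is a
# minimal sketch that computes both by hand and compares them with ttest_1samp.
# The seed, the use of NumPy's default_rng, and the names sample, t_manual and
# p_manual are assumptions made for this illustration; they are not part of the
# lecture's own code.

```python
import numpy as np
from scipy import stats

# Assumed setup for illustration: one column of 50 draws, seeded for reproducibility.
rng = np.random.default_rng(0)
sample = rng.normal(loc=5, scale=10, size=50)
popmean = 5.0

# t statistic: distance of the sample mean from popmean, in standard-error units.
n = sample.size
standard_error = sample.std(ddof=1) / np.sqrt(n)
t_manual = (sample.mean() - popmean) / standard_error

# Two-sided p-value: probability, under the null hypothesis, of a t statistic
# at least this far from zero in either direction (t distribution, n - 1 df).
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# The same quantities from SciPy, for comparison.
t_scipy, p_scipy = stats.ttest_1samp(sample, popmean)

print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

# The two printed lines should agree up to floating-point error. A large p-value
# here means the sample mean is not unusual if the true mean is 5.0; it does not
# prove that the true mean is 5.0.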

In [31]:
# ttest_ind: calculates the t-test for the means of two independent samples of scores. 
# This is a two-sided test for the null hypothesis that two independent samples have 
# identical average (expected) values. By default, this test assumes that the populations 
# have identical variances.

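# We can use this test when we observe two independent samples, whether they come from 
# the same population or from different ones. Let us consider the following example.
from scipy import stats

# Two independent samples drawn from the same normal distribution (mean 5, std 10).
rvs1 = stats.norm.rvs(loc=5,scale=10,size=(500,1))
rvs2 = stats.norm.rvs(loc=5,scale=10,size=(500,1))
# Store the statistic as t_stat so that the stats module is not overwritten.
t_stat, p_value = stats.ttest_ind(rvs1, rvs2)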
p_value

array([0.95711003])
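
# Because ttest_ind assumes equal variances by default, it can help to see what
# changes when that assumption is dropped. The sketch below passes equal_var=False,
# which makes SciPy run Welch's t-test instead of the pooled-variance test. The
# sample sizes, the scale values, and the seed are assumptions chosen for this
# illustration only.

```python
import numpy as np
from scipy import stats

# Assumed setup for illustration: same mean, different spreads and sample sizes.
rng = np.random.default_rng(1)
a = rng.normal(loc=5, scale=10, size=500)
b = rng.normal(loc=5, scale=20, size=100)

# Default behaviour: pooled-variance t-test (assumes equal population variances).
t_pooled, p_pooled = stats.ttest_ind(a, b)

# Welch's t-test: does not assume equal population variances.
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)

print("pooled:", t_pooled, p_pooled)
print("welch: ", t_welch, p_welch)
```

# When the variances and sample sizes both differ, as they do here, the two tests
# give different statistics and p-values, and Welch's version is usually the safer
# choice.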