In [1]:
import numpy as np
from scipy.stats import ttest_1samp

First, let's create a population by simulating data from an imagined world in which the true mean is 100 and the true standard deviation is 1. We'll set the random seed so that this code gives the same results every time it is run. (The seed is the number that initializes the pseudo-random number generator.)

In [2]:
np.random.seed(52240)
simulated_population = np.random.normal(100, 1, 10000) # generate 10,000 instances
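
To make the effect of seeding concrete, here is a minimal sketch. It assumes NumPy's `default_rng` generator interface rather than the global `np.random.seed` call used in this notebook, so that it leaves the notebook's own random state untouched. The second seed (1) and the draw count of 5 are arbitrary choices for illustration.

```python
import numpy as np

# Two independent generators seeded with the same value produce identical draws.
rng_a = np.random.default_rng(52240)
rng_b = np.random.default_rng(52240)
draws_a = rng_a.normal(100, 1, 5)
draws_b = rng_b.normal(100, 1, 5)
same_seed_matches = np.array_equal(draws_a, draws_b)

# A generator with a different (arbitrary) seed produces a different sequence.
rng_c = np.random.default_rng(1)
draws_c = rng_c.normal(100, 1, 5)
different_seed_matches = np.array_equal(draws_a, draws_c)

print(f"same seed, same draws: {same_seed_matches}")
print(f"different seed, same draws: {different_seed_matches}")
```

Because the local generators carry their own state, running this sketch should not change what the sampling cells below return.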

In [3]:
small_sample = np.random.choice(simulated_population, size=10, replace=False)
big_sample = np.random.choice(simulated_population, size=100, replace=False)


In [4]:
print(f"standard deviation of small sample is: {np.std(small_sample)}")
print(f"standard deviation of big sample is: {np.std(big_sample)}")

standard deviation of small sample is: 1.1044950204366775
standard deviation of big sample is: 1.0059816078685067
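
Note that `np.std` divides by n by default (`ddof=0`), which is the population formula. When estimating a population's standard deviation from a sample, the n - 1 version (`ddof=1`) is the usual choice. The sketch below compares the two on a small made-up array; the values are hypothetical and are not taken from the samples above.

```python
import numpy as np

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
values = np.array([99.0, 100.0, 101.0, 102.0])

# Default: divide the summed squared deviations by n.
population_sd = np.std(values)

# ddof=1: divide by n - 1 instead, giving the usual sample estimate.
sample_sd = np.std(values, ddof=1)

print(f"ddof=0: {population_sd}")
print(f"ddof=1: {sample_sd}")
```

The gap between the two shrinks as n grows, so it matters more for the sample of 10 than for the sample of 100.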


In [5]:
print(f"p-value of small sample w/ null hypothesis of zero-mean is: {ttest_1samp(small_sample, popmean=0).pvalue}")
print(f"p-value of big sample w/ null hypothesis of zero-mean is: {ttest_1samp(big_sample, popmean=0).pvalue}")

p-value of small sample w/ null hypothesis of zero-mean is: 6.446575878194687e-19
p-value of big sample w/ null hypothesis of zero-mean is: 1.2942964843008244e-199
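
To see why the larger sample gives a far smaller p-value, it can help to compute the one-sample t statistic by hand: t = (sample mean - null mean) / (s / sqrt(n)), where s is the sample standard deviation with `ddof=1`. The sketch below does this for a freshly drawn, hypothetical sample (the seed 0 and the name `demo_sample` are assumptions for illustration, not part of the notebook) and compares the result against `ttest_1samp`.

```python
import numpy as np
from scipy import stats

# Hypothetical sample drawn from the same kind of population (mean 100, SD 1).
rng = np.random.default_rng(0)
demo_sample = rng.normal(100, 1, 10)

null_mean = 0
n = demo_sample.size

# Standard error of the mean uses the n - 1 standard deviation.
standard_error = demo_sample.std(ddof=1) / np.sqrt(n)
t_manual = (demo_sample.mean() - null_mean) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(demo_sample, popmean=null_mean)
print(f"manual: t={t_manual}, p={p_manual}")
print(f"scipy:  t={result.statistic}, p={result.pvalue}")
```

Since the standard error shrinks in proportion to 1 / sqrt(n), the same gap between the sample mean and the null mean yields a larger t statistic as n grows, and the degrees of freedom rise as well.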
