Suppose we wish to simulate the power of the independent-samples t-test
under the following conditions:

- The first sample has 10 observations drawn from a normal distribution
  with mean 0 and unit variance.
- The second sample has 12 observations drawn from a normal distribution
  with mean 1.0 and unit variance.
- The threshold on p-values for significance is 0.05.


In [None]:
import numpy as np
from scipy import stats
rng = np.random.default_rng()

test = stats.ttest_ind
n_observations = (10, 12)
rvs1 = rng.normal
rvs2 = lambda size: rng.normal(loc=1, size=size)
rvs = (rvs1, rvs2)
res = stats.power(test, rvs, n_observations, significance=0.05)
res.power

0.6116
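
To make concrete what is being estimated, the sketch below repeats the calculation by hand: it draws many pairs of samples under the alternative, runs `ttest_ind` on each pair, and reports the fraction of p-values below the threshold. The seed, the replicate count of 10,000, and the strict `<` comparison are assumptions made for this illustration, not a description of how `power` is implemented internally.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)  # arbitrary seed, for reproducibility only
n_replicates = 10_000               # assumed number of simulated experiments

# One row per simulated experiment
x = rng.normal(loc=0, size=(n_replicates, 10))
y = rng.normal(loc=1, size=(n_replicates, 12))

# Run the t-test on every row at once and count the rejections
pvalues = stats.ttest_ind(x, y, axis=1).pvalue
manual_power = np.mean(pvalues < 0.05)
print(manual_power)
```

The printed value should land close to the simulated power reported above, up to Monte Carlo error.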

With samples of size 10 and 12, respectively, the power of the t-test
with a significance threshold of 0.05 is approximately 60% under the chosen
alternative. We can investigate the effect of sample size on the power
by passing arrays of sample sizes.


In [None]:
import matplotlib.pyplot as plt
nobs_x = np.arange(5, 21)
nobs_y = nobs_x
n_observations = (nobs_x, nobs_y)
res = stats.power(test, rvs, n_observations, significance=0.05)
ax = plt.subplot()
ax.plot(nobs_x, res.power)
ax.set_xlabel('Sample Size')
ax.set_ylabel('Simulated Power')
ax.set_title('Simulated Power of `ttest_ind` with Equal Sample Sizes')
plt.show()

Alternatively, we can investigate the impact that effect size has on the power.
In this case, the effect size is the location of the distribution underlying
the second sample.


In [None]:
n_observations = (10, 12)
loc = np.linspace(0, 1, 20)
rvs2 = lambda size, loc: rng.normal(loc=loc, size=size)
rvs = (rvs1, rvs2)
res = stats.power(test, rvs, n_observations, significance=0.05,
                  kwargs={'loc': loc})
ax = plt.subplot()
ax.plot(loc, res.power)
ax.set_xlabel('Effect Size')
ax.set_ylabel('Simulated Power')
ax.set_title('Simulated Power of `ttest_ind`, Varying Effect Size')
plt.show()
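
For this particular setup the simulated curve can be cross-checked analytically. Assuming both populations are normal with unit variance and the default pooled-variance form of `ttest_ind` is used, the test statistic follows a noncentral t distribution under the alternative. The sketch below is a sanity check under those assumptions, not part of the `power` API; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

n1, n2 = 10, 12
alpha = 0.05
loc = np.linspace(0, 1, 20)  # same grid of effect sizes as above

df = n1 + n2 - 2                           # degrees of freedom of the pooled t-test
nc = loc / np.sqrt(1 / n1 + 1 / n2)        # noncentrality parameter (sigma = 1 assumed)
t_crit = stats.t.ppf(1 - alpha / 2, df)    # two-sided critical value

# Probability that |T| exceeds the critical value when T ~ nct(df, nc)
analytic_power = (stats.nct.sf(t_crit, df, nc)
                  + stats.nct.cdf(-t_crit, df, nc))
print(analytic_power[[0, -1]])
```

At zero effect size the analytic power equals the significance level, and it rises toward roughly 0.6 at an effect size of 1, which is in line with the simulation.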

We can also use `power` to estimate the Type I error rate (also referred to by the
ambiguous term "size") of a test and assess whether it matches the nominal level.
For example, the null hypothesis of `jarque_bera` is that the sample was drawn from
a distribution with the same skewness and kurtosis as the normal distribution. To
estimate the Type I error rate, we draw the samples from a distribution for which
the null hypothesis is true (here, the normal distribution itself) and treat it as
the *alternative*; the "power" computed this way is the rate at which the test
rejects a true null hypothesis.


In [None]:
test = stats.jarque_bera
n_observations = 10
rvs = rng.normal
significance = np.linspace(0.0001, 0.1, 1000)
res = stats.power(test, rvs, n_observations, significance=significance)
size = res.power

As shown below, the Type I error rate of the test is far below the nominal level
for such a small sample, as mentioned in its documentation.


In [None]:
ax = plt.subplot()
ax.plot(significance, size)
ax.plot([0, 0.1], [0, 0.1], '--')
ax.set_xlabel('nominal significance level')
ax.set_ylabel('estimated test size (Type I error rate)')
ax.set_title('Estimated test size vs nominal significance level')
ax.set_aspect('equal', 'box')
ax.legend(('`jarque_bera`', 'ideal test'))
plt.show()
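
The same quantity can be checked by hand at a single nominal level. The sketch below draws normal samples of size 10, applies `jarque_bera` to each, and counts how often the p-value falls below 0.05. The seed and the number of replicates are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2468)  # arbitrary seed, for reproducibility only
n_replicates = 10_000              # assumed number of simulated samples

# Each row is one sample of 10 observations for which the null hypothesis is true
x = rng.normal(size=(n_replicates, 10))
pvalues = stats.jarque_bera(x, axis=1).pvalue

# Fraction of false rejections at the nominal 5% level
type1_rate = np.mean(pvalues < 0.05)
print(type1_rate)
```

If the test were exact, this fraction would be close to 0.05; for samples this small it should come out noticeably lower.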

As one might expect from such a conservative test, the power is quite low with
respect to some alternatives. For example, the power of the test under the
alternative that the sample was drawn from the Laplace distribution may not
be much greater than the Type I error rate.


In [None]:
rvs = rng.laplace
res = stats.power(test, rvs, n_observations, significance=0.05)
print(res.power)

0.0587

This is not a mistake in SciPy's implementation; it is simply due to the fact
that the null distribution of the test statistic is derived under the assumption
that the sample size is large (i.e. approaches infinity), and this asymptotic
approximation is not accurate for small samples. In such cases, resampling
and Monte Carlo methods (e.g. `permutation_test`, `goodness_of_fit`,
`monte_carlo_test`) may be more appropriate.
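
As a rough illustration of that alternative, the sketch below uses `monte_carlo_test` to compare the Jarque-Bera statistic of one small sample against a null distribution simulated from normal samples of the same size, instead of relying on the asymptotic approximation. The seed, the Laplace-distributed data, and the helper name `statistic` are assumptions chosen for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # arbitrary seed, for reproducibility only
x = rng.laplace(size=10)           # one small sample, drawn here from a Laplace distribution

def statistic(x, axis):
    # Jarque-Bera statistic only; the asymptotic p-value is discarded
    return stats.jarque_bera(x, axis=axis).statistic

# Null distribution of the statistic is simulated from normal samples of size 10;
# large values of the statistic count as evidence against normality
res = stats.monte_carlo_test(x, rng.normal, statistic,
                             vectorized=True, alternative='greater')
print(res.pvalue)
```

Because the null distribution is simulated at the actual sample size, the resulting p-value does not depend on the large-sample approximation.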
