In [None]:
import numpy as np
from scipy import stats
x, y = [1, 2, 3, 4, 5, 6, 7], [10, 9, 2.5, 6, 4, 3, 2]
res = stats.pearsonr(x, y)
res

PearsonRResult(statistic=-0.828503883588428, pvalue=0.021280260007523286)

To perform an exact permutation version of the test:


In [None]:
rng = np.random.default_rng()
method = stats.PermutationMethod(n_resamples=np.inf, random_state=rng)
stats.pearsonr(x, y, method=method)

PearsonRResult(statistic=-0.828503883588428, pvalue=0.028174603174603175)
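
The exact permutation p-value can also be reproduced by brute force, which makes clear what "exact" means here: every one of the 7! = 5040 pairings of the y values with the fixed x values is enumerated. The following is a minimal sketch using only NumPy; the helper `corr_with_x` and the tolerance `tol` are illustrative names, and SciPy's internal bookkeeping for the two-sided case may differ in detail.

```python
from itertools import permutations

import numpy as np

x_obs = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y_obs = np.array([10, 9, 2.5, 6, 4, 3, 2], dtype=float)


def corr_with_x(y_rows):
    """Pearson r between x_obs and y_rows (one vector, or one vector per row)."""
    xc = x_obs - x_obs.mean()
    yc = y_rows - y_rows.mean(axis=-1, keepdims=True)
    return (yc @ xc) / (np.linalg.norm(xc) * np.linalg.norm(yc, axis=-1))


r_obs = corr_with_x(y_obs)

# All 7! = 5040 orderings of y, each paired with the unchanged x.
all_pairings = np.array(list(permutations(y_obs)))
r_null = corr_with_x(all_pairings)

# Two-sided p-value: fraction of pairings whose |r| is at least as large
# as the observed |r|. The small tolerance guards against rounding error.
tol = 1e-12
p_exact = np.mean(np.abs(r_null) >= abs(r_obs) - tol)
print(r_obs, p_exact)
```

Because the x values are equally spaced, the permutation distribution of r is symmetric about zero, so counting on `abs(r)` agrees with doubling the smaller one-sided tail.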

To perform the test under the null hypothesis that the data were drawn from
*uniform* distributions:


In [None]:
method = stats.MonteCarloMethod(rvs=(rng.uniform, rng.uniform))
stats.pearsonr(x, y, method=method)

PearsonRResult(statistic=-0.828503883588428, pvalue=0.0188)
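
The idea behind the Monte Carlo version can be sketched by hand: draw many pairs of independent uniform samples of the same size as the data, compute r for each pair, and see how often it is at least as extreme as the observed value. This is only an illustration of the principle, not SciPy's implementation; the seed, the number of resamples (9999) and the helper `row_corr` are assumptions made for this example.

```python
import numpy as np

x_obs = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y_obs = np.array([10, 9, 2.5, 6, 4, 3, 2], dtype=float)
n = x_obs.size
r_obs = np.corrcoef(x_obs, y_obs)[0, 1]

rng_mc = np.random.default_rng(12345)  # seeded only for reproducibility
n_resamples = 9999  # assumed value for this sketch


def row_corr(a, b):
    """Pearson r between corresponding rows of two 2-D arrays."""
    ac = a - a.mean(axis=1, keepdims=True)
    bc = b - b.mean(axis=1, keepdims=True)
    return (ac * bc).sum(axis=1) / np.sqrt(
        (ac**2).sum(axis=1) * (bc**2).sum(axis=1)
    )


# Null samples: x and y independent and uniform on [0, 1). The correlation
# coefficient is unaffected by shifting or rescaling either variable.
x_null = rng_mc.uniform(size=(n_resamples, n))
y_null = rng_mc.uniform(size=(n_resamples, n))
r_null = row_corr(x_null, y_null)

# Count the observed sample as one more draw so the p-value is never zero.
n_extreme = np.sum(np.abs(r_null) >= abs(r_obs))
p_mc = (n_extreme + 1) / (n_resamples + 1)
print(p_mc)  # varies with the seed
```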

To produce an asymptotic 90% confidence interval:


In [None]:
res.confidence_interval(confidence_level=0.9)

ConfidenceInterval(low=-0.9644331982722841, high=-0.3460237473272273)
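
An asymptotic interval of this kind is commonly built with the Fisher z-transformation: `arctanh(r)` is treated as approximately normal with standard error `1/sqrt(n - 3)`, and the interval endpoints are mapped back with `tanh`. The sketch below assumes that construction and uses the standard library's `NormalDist` for the normal quantile; all variable names are illustrative.

```python
from statistics import NormalDist

import numpy as np

x_obs = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y_obs = np.array([10, 9, 2.5, 6, 4, 3, 2], dtype=float)
n = x_obs.size
r_obs = np.corrcoef(x_obs, y_obs)[0, 1]

confidence_level = 0.9
z_r = np.arctanh(r_obs)            # Fisher z-transform of r
se = 1 / np.sqrt(n - 3)            # approximate standard error on the z scale
z_crit = NormalDist().inv_cdf(0.5 + confidence_level / 2)

# Symmetric interval on the z scale, transformed back to the r scale.
ci_low, ci_high = np.tanh([z_r - z_crit * se, z_r + z_crit * se])
print(ci_low, ci_high)
```

For this data set the result agrees with the interval shown above.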

And for a bootstrap confidence interval:


In [None]:
method = stats.BootstrapMethod(method='BCa', random_state=rng)
res.confidence_interval(confidence_level=0.9, method=method)

ConfidenceInterval(low=-0.9983163756488651, high=-0.22771001702132443)  # may vary
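
To see what bootstrapping does here, the following sketch resamples the (x, y) pairs with replacement and takes quantiles of the resulting correlation coefficients. Note that this is the simpler *percentile* bootstrap, not the BCa method used above, so the endpoints will differ; the seed and the number of resamples are arbitrary choices for illustration.

```python
import numpy as np

x_obs = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y_obs = np.array([10, 9, 2.5, 6, 4, 3, 2], dtype=float)
n = x_obs.size
r_obs = np.corrcoef(x_obs, y_obs)[0, 1]

rng_boot = np.random.default_rng(2024)  # seeded only for reproducibility
n_boot = 9999

# Each row of idx selects n pairs (with replacement) from the data.
idx = rng_boot.integers(0, n, size=(n_boot, n))
xb, yb = x_obs[idx], y_obs[idx]
xc = xb - xb.mean(axis=1, keepdims=True)
yc = yb - yb.mean(axis=1, keepdims=True)

# A resample that repeats a single pair n times has zero variance and
# yields NaN; such rows are ignored by nanquantile below.
with np.errstate(invalid="ignore", divide="ignore"):
    r_boot = (xc * yc).sum(axis=1) / np.sqrt(
        (xc**2).sum(axis=1) * (yc**2).sum(axis=1)
    )

# 90% percentile interval: the 5th and 95th percentiles of the resampled r.
boot_low, boot_high = np.nanquantile(r_boot, [0.05, 0.95])
print(boot_low, boot_high)  # varies with the seed
```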

There is a linear dependence between x and y if y = a + b*x + e, where
a and b are constants and e is a random error term, assumed to be
independent of x. For simplicity, assume that x is standard normal, a=0,
b=1, and let e follow a normal distribution with mean zero and standard
deviation s>0.


In [None]:
rng = np.random.default_rng()
s = 0.5
x = stats.norm.rvs(size=500, random_state=rng)
e = stats.norm.rvs(scale=s, size=500, random_state=rng)
y = x + e
stats.pearsonr(x, y).statistic

0.9001942438244763

This should be close to the exact value given by


In [None]:
1/np.sqrt(1 + s**2)

0.8944271909999159
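
The exact value follows from the model assumptions (x standard normal, e independent of x with variance s^2, and y = x + e):

```latex
\begin{aligned}
\operatorname{cov}(x, y) &= \operatorname{cov}(x, x + e)
  = \operatorname{var}(x) + \operatorname{cov}(x, e) = 1, \\
\operatorname{var}(y) &= \operatorname{var}(x) + \operatorname{var}(e) = 1 + s^2, \\
\operatorname{corr}(x, y) &= \frac{\operatorname{cov}(x, y)}
  {\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}}
  = \frac{1}{\sqrt{1 + s^2}}.
\end{aligned}
```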

For s=0.5, we observe a high level of correlation. In general, a large
variance of the noise reduces the correlation, while the correlation
approaches one as the variance of the error goes to zero.
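
The effect of the noise level can be checked with a small simulation over several values of s. This is a sketch with an arbitrary seed, sample size and grid of noise levels; the variable names are chosen so as not to overwrite `x` and `y` from the cells above.

```python
import numpy as np

rng_sweep = np.random.default_rng(0)  # seeded only for reproducibility
n_samples = 20_000
noise_levels = [0.1, 0.5, 1.0, 2.0, 5.0]

sample_corr = []
exact_corr = []
for s_k in noise_levels:
    x_sim = rng_sweep.standard_normal(n_samples)
    e_sim = rng_sweep.normal(scale=s_k, size=n_samples)
    y_sim = x_sim + e_sim
    sample_corr.append(np.corrcoef(x_sim, y_sim)[0, 1])
    exact_corr.append(1 / np.sqrt(1 + s_k**2))

# Sample and exact correlations side by side; both shrink as s grows.
for s_k, r_hat, r_exact in zip(noise_levels, sample_corr, exact_corr):
    print(f"s={s_k:4.1f}  sample={r_hat:.3f}  exact={r_exact:.3f}")
```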

It is important to keep in mind that zero correlation does not imply
independence unless (x, y) is jointly normal. Correlation can even be zero
when there is a very simple dependence structure: if x follows a
standard normal distribution, let y = abs(x). Then y is completely
determined by x, yet the correlation between x and y is zero. Indeed,
since the expectation of x is zero, cov(x, y) = E[x*y] = E[x*abs(x)],
which is zero by symmetry. The following lines of code illustrate this
observation:


In [None]:
y = np.abs(x)
stats.pearsonr(x, y)

PearsonRResult(statistic=-0.05444919272687482, pvalue=0.22422294836207743)
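
With a larger sample the estimate settles much closer to zero, and the dependence that the linear correlation misses shows up as soon as x is transformed. The sketch below (arbitrary seed and sample size, illustrative variable names) compares the correlation of y = abs(x) with x and with x**2.

```python
import numpy as np

rng_abs = np.random.default_rng(1)  # seeded only for reproducibility
x_big = rng_abs.standard_normal(100_000)
y_big = np.abs(x_big)

# Linear correlation between x and abs(x): close to zero.
r_linear = np.corrcoef(x_big, y_big)[0, 1]

# Correlation between x**2 and abs(x): large, because abs(x) is a
# monotone function of x**2.
r_square = np.corrcoef(x_big**2, y_big)[0, 1]

print(r_linear, r_square)
```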

A non-zero correlation coefficient can be misleading. For example, if x has
a standard normal distribution, define y = x if x < 0 and y = 0 otherwise.
Since cov(x, y) = 1/2 and var(y) = 1/2 - 1/(2*pi), a simple calculation
shows that corr(x, y) = sqrt(pi/(2*(pi - 1))) = 0.856..., implying a high
level of correlation:


In [None]:
y = np.where(x < 0, x, 0)
stats.pearsonr(x, y)

PearsonRResult(statistic=0.861985781588, pvalue=4.813432002751103e-149)

This is unintuitive since y does not depend on x at all when x is larger
than zero, which happens in about half of the cases when we sample x and y.
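
The two regimes can be separated explicitly. Working from the moments of a standard normal, cov(x, y) = 1/2 and var(y) = 1/2 - 1/(2*pi), which gives a population correlation of about 0.856. The sketch below (arbitrary seed and sample size, illustrative variable names) compares a large-sample estimate with that value and then looks at each half of the sample on its own.

```python
import numpy as np

rng_trunc = np.random.default_rng(2)  # seeded only for reproducibility
x_big = rng_trunc.standard_normal(200_000)
y_big = np.where(x_big < 0, x_big, 0.0)

# Overall correlation versus the value derived from the moments.
r_all = np.corrcoef(x_big, y_big)[0, 1]
r_theory = 0.5 / np.sqrt(0.5 - 1 / (2 * np.pi))  # about 0.856

# For x < 0, y equals x, so the correlation within that half is 1.
neg = x_big < 0
r_neg = np.corrcoef(x_big[neg], y_big[neg])[0, 1]

# For x >= 0, y is constant (zero), so it carries no information about x
# and the correlation within that half is undefined.
y_range_pos = y_big[~neg].max() - y_big[~neg].min()

print(r_all, r_theory, r_neg, y_range_pos)
```

The overall coefficient is therefore driven entirely by the half of the sample where x is negative.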