### Correlation Coefficient

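The Pearson correlation coefficient ($r$ or $\rho$) measures the linear correlation between two sets of paired data. Its value lies between $-1$ (perfect negative linear relationship) and $+1$ (perfect positive linear relationship), and $0$ means there is no linear relationship.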
$$
\rho_{X,Y} = \frac{n\cdot \sum{XY} - \sum{X}\cdot \sum{Y}}{\sqrt{[n\cdot \sum{X^2} - (\sum{X})^2][n\cdot \sum{Y^2} - (\sum{Y})^2]}}
$$
Here $n$ is the number of $(x_i, y_i)$ pairs, and each sum runs over all pairs.
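
The raw-sum formula above needs only five running totals, so it takes one pass over the data. Below is a minimal Python sketch of it. The function name `pearson_from_sums` is a hypothetical helper and is not taken from any library.

```python
import math


def pearson_from_sums(xs, ys):
    """Pearson r via the raw-sum formula (hypothetical helper for illustration)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    # The five totals that appear in the formula
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


print(round(pearson_from_sums([1, 2, 3], [2, 4, 6]), 6))  # perfectly linear: 1.0
```

If either variable is constant, the denominator is zero and this sketch raises `ZeroDivisionError`. In that case $\rho$ is undefined.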
___
The formula for $\rho$ can also be written as:<br /><br />
$$
\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_{X}\sigma_{Y}}
$$
Where:<br />
$\operatorname{cov}(X, Y)$ is the covariance of $X$ and $Y$<br />
$\sigma_{X}$ is the standard deviation of $X$<br />
$\sigma_{Y}$ is the standard deviation of $Y$<br />
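
In the ratio $\operatorname{cov}(X, Y) / (\sigma_X \sigma_Y)$, the covariance and both standard deviations must use the same normalization, either divide-by-$n$ (population) or divide-by-$(n-1)$ (sample). The normalization factor then cancels. The sketch below uses the standard library `statistics` module to compute $\rho$ both ways. The function names are illustrative only.

```python
import statistics


def pearson_population(xs, ys):
    """r = cov(X, Y) / (sigma_X * sigma_Y) using divide-by-n moments."""
    n = len(xs)
    mu_x, mu_y = statistics.fmean(xs), statistics.fmean(ys)
    cov_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    return cov_xy / (statistics.pstdev(xs) * statistics.pstdev(ys))


def pearson_sample(xs, ys):
    """Same ratio using divide-by-(n - 1) moments; the factor cancels."""
    n = len(xs)
    mu_x, mu_y = statistics.fmean(xs), statistics.fmean(ys)
    cov_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / (n - 1)
    return cov_xy / (statistics.stdev(xs) * statistics.stdev(ys))


X = [1, 2, 3, 4, 5, 6]
Y = [2, 4, 7, 9, 12, 14]
print(round(pearson_population(X, Y), 3), round(pearson_sample(X, Y), 3))  # 0.998 0.998
```

The result changes only if the conventions are mixed, for example a sample covariance divided by population standard deviations.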

The formula for $\rho$ can also be expressed in terms of means and expectations. Since<br />
$$
\operatorname{cov}(X, Y) = E[(X - \mu_{X})(Y - \mu_{Y})]
$$
the formula for $\rho$ can also be written as:<br /><br />
$$
\rho_{X,Y} = \frac{E[(X - \mu_{X})(Y - \mu_{Y})]}{\sigma_{X}\sigma_{Y}} = \frac{\sum_i{(x_i - \mu_{X})(y_i - \mu_{Y})}}{n\sigma_{X}\sigma_{Y}}
$$
Where:<br />
$\mu_{X}$ is the mean of $X$<br />
$\mu_{Y}$ is the mean of $Y$<br />
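$E$ is the expected value. For $n$ data pairs it is the average over the pairs, which is where the factor $n$ in the last denominator comes from; $\sigma_{X}$ and $\sigma_{Y}$ are then the divide-by-$n$ standard deviations.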
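
The short derivation sketch below shows that this deviation form agrees with the raw-sum formula at the top of the section. It takes $\mu_X = \frac{1}{n}\sum{X}$ and $\sigma_X^2 = \frac{1}{n}\sum_i (x_i - \mu_X)^2$, and likewise for $Y$. Expanding the numerator and the variances gives:

$$
\begin{aligned}
\sum_i (x_i - \mu_X)(y_i - \mu_Y) &= \sum{XY} - \mu_Y \sum{X} - \mu_X \sum{Y} + n\,\mu_X \mu_Y = \sum{XY} - \frac{1}{n}\sum{X}\sum{Y} \\
n\sigma_X^2 &= \sum{X^2} - \frac{1}{n}\left(\sum{X}\right)^2 \\
n\sigma_Y^2 &= \sum{Y^2} - \frac{1}{n}\left(\sum{Y}\right)^2
\end{aligned}
$$

Since $n\sigma_X\sigma_Y = \sqrt{n\sigma_X^2 \cdot n\sigma_Y^2}$, multiplying both the numerator and the denominator by $n$ gives the raw-sum form:

$$
\rho_{X,Y} = \frac{n\cdot \sum{XY} - \sum{X}\cdot \sum{Y}}{\sqrt{[n\cdot \sum{X^2} - (\sum{X})^2][n\cdot \sum{Y^2} - (\sum{Y})^2]}}
$$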

___
*Example 1*

Calculate the correlation coefficient between the two variables $X$ and $Y$ shown below:<br />
$X$: 1 2 3 4 5  6<br />
$Y$: 2 4 7 9 12 14<br />

*Solution*

|  X |  Y |  XY | X^2 | Y^2 |
|:--:|:--:|:---:|:---:|:---:|
| 1  | 2  | 2   | 1   | 4   |
| 2  | 4  | 8   | 4   | 16  |
| 3  | 7  | 21  | 9   | 49  |
| 4  | 9  | 36  | 16  | 81  |
| 5  | 12 | 60  | 25  | 144 |
| 6  | 14 | 84  | 36  | 196 |
| **21** | **48** | **211** | **91**  | **490** |

Here:
$$
\begin{aligned}
n &= 6 \\
\sum{X} &= 21 \\
\sum{Y} &= 48 \\
\sum{XY} &= 211 \\
\sum{X^2} &= 91 \\
\sum{Y^2} &= 490
\end{aligned}
$$

$$
\rho = \frac{6\cdot 211 - 21\cdot 48}{\sqrt{[6\cdot 91 - 21^2][6\cdot 490 - 48^2]}} = \frac{258}{\sqrt{105\cdot 636}} = \frac{258}{\sqrt{66780}} \approx 0.998
$$
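
The following sketch checks the hand arithmetic. It recomputes the table's column totals and the three intermediate terms of the raw-sum formula ($258$, $105$ and $636$). The variable names are chosen for this illustration only.

```python
import math

X = [1, 2, 3, 4, 5, 6]
Y = [2, 4, 7, 9, 12, 14]
n = len(X)

# Column totals from the table
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 21 48 211 91 490

# Intermediate terms of the raw-sum formula
numerator = n * sum_xy - sum_x * sum_y  # 258
x_term = n * sum_x2 - sum_x ** 2        # 105
y_term = n * sum_y2 - sum_y ** 2        # 636
rho = numerator / math.sqrt(x_term * y_term)
print(numerator, x_term, y_term, round(rho, 3))  # 258 105 636 0.998
```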

*Code*

```python
import math

X = [1, 2, 3, 4, 5, 6]
Y = [2, 4, 7, 9, 12, 14]
n = len(X)


def std(arr, mu, n):
    """Population standard deviation (divide by n)."""
    return math.sqrt(sum((el - mu) ** 2 for el in arr) / n)


mu_x = sum(X) / n
mu_y = sum(Y) / n

# Sum of products of deviations from the means; this equals n * cov(X, Y)
sum_dev_products = sum((x - mu_x) * (y - mu_y) for x, y in zip(X, Y))
r = sum_dev_products / (n * std(X, mu_x, n) * std(Y, mu_y, n))

print('Pearson correlation coefficient = {}'.format(round(r, 3)))
```

```
Pearson correlation coefficient = 0.998
```
