#### General formula for Pearson's correlation coefficient
Taken from [Wikipedia](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient).
\begin{equation*}
\rho_{XY} = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}
\end{equation*}
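The general formula translates almost literally into NumPy when the covariance and the standard deviations are estimated from data. A minimal sketch, assuming simulated data (the seed, sample size and the linear relation below are illustrative choices, not from the original). `ddof=1` is used consistently in `np.cov` and `np.std`, so the normalization factors cancel in the ratio.

```python
import numpy as np

# Illustrative data (assumed setup): y depends linearly on x plus independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = 2.0 * x + rng.normal(size=1_000)

# Cov(X, Y) estimated from the sample; np.cov returns the 2x2 covariance matrix
cov_xy = np.cov(x, y, ddof=1)[0, 1]

# rho = Cov(X, Y) / (sigma_X * sigma_Y); the same ddof makes the 1/(n-1) factors cancel
rho = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(rho)
```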

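For particular $n$-element samples $x_1, \dots, x_n$ and $y_1, \dots, y_n$ from the $X$ and $Y$ distributions, the population means are replaced by the sample means $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$:

\begin{equation*}
\hat{\rho}_{XY}= \frac{\sum_{i=1}^n{(x_i - \bar{x})(y_i - \bar{y})}}{\sqrt{\sum_{i=1}^n(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^n(y_i - \bar{y})^2}}
\end{equation*}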
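As a small worked example (made-up numbers, not from the original), take $x = (1, 2, 3)$ and $y = (2, 4, 7)$. The sample means are $\bar{x} = 2$ and $\bar{y} = 13/3$, so the deviations are $(-1, 0, 1)$ and $(-7/3, -1/3, 8/3)$, and the sample formula gives:

\begin{equation*}
\begin{aligned}
\sum_{i=1}^3 (x_i - \bar{x})(y_i - \bar{y}) &= (-1)\left(-\tfrac{7}{3}\right) + 0 \cdot \left(-\tfrac{1}{3}\right) + 1 \cdot \tfrac{8}{3} = 5,\\
\sum_{i=1}^3 (x_i - \bar{x})^2 &= 1 + 0 + 1 = 2,\\
\sum_{i=1}^3 (y_i - \bar{y})^2 &= \tfrac{49}{9} + \tfrac{1}{9} + \tfrac{64}{9} = \tfrac{38}{3},\\
\hat{\rho}_{XY} &= \frac{5}{\sqrt{2}\,\sqrt{38/3}} = \frac{5}{\sqrt{76/3}} \approx 0.9934.
\end{aligned}
\end{equation*}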

I am going to implement this formula directly, because algebraically simplified versions of it (for example, one rewritten in terms of raw sums such as $\sum x_i y_i$ and $\sum x_i^2$) can be numerically unstable, and implementing a more complex algorithm would defeat the purpose of this kernel, which is to demonstrate Pearson's correlation coefficient.
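To illustrate that instability, here is a sketch (assumed data, not from the original) comparing the direct two-pass formula with the algebraically equivalent single-pass form $\frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\sum x_i^2 - n\bar{x}^2}\sqrt{\sum y_i^2 - n\bar{y}^2}}$. When the data sit on a large constant offset (here $10^9$, chosen arbitrarily), the raw sums are dominated by huge, nearly equal terms whose difference is lost to floating-point rounding. The helper names `two_pass_r` and `one_pass_r` are hypothetical.

```python
import numpy as np

def two_pass_r(x, y):
    # Direct formula: subtract the sample means first, then sum the products
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

def one_pass_r(x, y):
    # "Simplified" form built from raw sums: algebraically equal, numerically fragile
    n = x.size
    sxy = np.sum(x * y) - n * x.mean() * y.mean()
    sxx = np.sum(x * x) - n * x.mean() ** 2
    syy = np.sum(y * y) - n * y.mean() ** 2
    # Rounding can make sxx or syy zero or negative, so silence nan/inf warnings
    with np.errstate(invalid="ignore", divide="ignore"):
        return sxy / np.sqrt(sxx * syy)

rng = np.random.default_rng(42)
base = rng.normal(size=1_000)
noise = rng.normal(scale=0.2, size=1_000)
offset = 1e9  # a constant shift does not change Pearson's r mathematically
x = base + offset
y = base + noise + offset

print(two_pass_r(base, base + noise))  # reference, computed without the offset
print(two_pass_r(x, y))                 # stays close to the reference
print(one_pass_r(x, y))                 # typically far off, or nan
```

The two-pass version pays for a second traversal of the data, but the centring step keeps the sums at the scale of the deviations rather than of the raw values.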

```python
##########################
# Import libraries
##########################
import numpy as np

# Declaring two vectors
x = np.random.normal(loc=0, scale=1, size=50)
y = x + np.random.normal(loc=0, scale=0.2, size=50)
```

I derived the vector `y` from the vector `x` by adding to it a sample from a normal distribution with a small standard deviation (0.2, compared with 1 for `x`), so the correlation should be positive and quite high, but let's see.
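How high? Treating `x` as draws of $X \sim \mathcal{N}(0, 1)$ and the added noise as an independent $\varepsilon \sim \mathcal{N}(0, 0.2^2)$, the population correlation of $Y = X + \varepsilon$ follows from the general formula (a back-of-the-envelope check, not part of the original analysis):

\begin{equation*}
\begin{aligned}
Cov(X,Y) &= Cov(X,X) + Cov(X,\varepsilon) = 1 + 0 = 1,\\
\sigma_Y^2 &= \sigma_X^2 + \sigma_\varepsilon^2 = 1 + 0.04 = 1.04,\\
\rho_{XY} &= \frac{1}{1 \cdot \sqrt{1.04}} \approx 0.9806.
\end{aligned}
\end{equation*}

A sample of only 50 points will scatter around this value.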

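```python
def pearson_r(x_, y_):
    # Deviations of each sample from its mean
    dx = x_ - np.mean(x_)
    dy = y_ - np.mean(y_)
    # Sum of cross-products divided by the product of the root sums of squares
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2)))

print(pearson_r(x, y))
```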

0.9842099417406069


And that is an outrageously inefficient way to calculate Pearson's correlation coefficient!
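In practice you would reach for a library routine. As a sketch (re-creating data the same way as above, with an assumed seed, so the numbers will differ from the output shown), NumPy's `np.corrcoef` returns the full 2x2 correlation matrix, whose off-diagonal entry should agree with `pearson_r` up to floating-point rounding.

```python
import numpy as np

def pearson_r(x_, y_):
    # Same direct formula: centred cross-products over root sums of squares
    dx = x_ - np.mean(x_)
    dy = y_ - np.mean(y_)
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2)))

# Data generated as above; the seed is an assumption added for reproducibility
rng = np.random.default_rng(2024)
x = rng.normal(loc=0, scale=1, size=50)
y = x + rng.normal(loc=0, scale=0.2, size=50)

manual = pearson_r(x, y)
builtin = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the correlation matrix
print(manual, builtin, np.isclose(manual, builtin))
```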