# Correlation

Be sure to import Symbulate using the following commands.

In [1]:
from symbulate import *
%matplotlib inline

<a id='contents'></a>

The **correlation coefficient** is a standardized measure of linear dependence which takes values in $[-1, 1]$.
$$
Corr(X,Y) = \frac{Cov(X,Y)}{\sqrt{Var(X)}\sqrt{Var(Y)}} = Cov\left(\frac{X - E(X)}{SD(X)},\frac{Y - E(Y)}{SD(Y)}\right)
$$
The correlation coefficient can be approximated by simulating many pairs of values and calling `.corr()` on the simulated results.

In [2]:
die = list(range(1, 6+1, 1)) # this is just a list of the numbers 1 through 6
roll1, roll2 = RV(BoxModel(die, size = 1)*BoxModel(die, size = 1))
rollpairs = (roll1 & roll2).sim(10000)
rollpairs.corr()

-0.00399570813616338

Again, we expect this value to be close to zero: the two rolls are independent, and independent random variables have correlation 0.
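
The two forms of the formula above can be checked directly on simulated values. The following is a minimal sketch that uses only Python's standard library rather than Symbulate; the helper names `cov`, `corr_from_cov`, and `corr_from_standardized` are illustrative and are not part of Symbulate.

```python
import random
from statistics import fmean, pstdev

def cov(xs, ys):
    """Average product of deviations from the means (divides by n)."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def corr_from_cov(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return cov(xs, ys) / (pstdev(xs) * pstdev(ys))

def corr_from_standardized(xs, ys):
    """Covariance of the standardized values."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return cov(zx, zy)

# Two independent fair die rolls, simulated 10000 times (seeded for repeatability)
rng = random.Random(0)
roll1 = [rng.randint(1, 6) for _ in range(10000)]
roll2 = [rng.randint(1, 6) for _ in range(10000)]

r1 = corr_from_cov(roll1, roll2)
r2 = corr_from_standardized(roll1, roll2)
print(r1, r2)  # the two values agree up to rounding, and both are near 0
```

Dividing by $n$ or by $n-1$ makes no difference here, since the same factor appears in the numerator and the denominator and cancels.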

*Example.* Simulating from a bivariate normal distribution with correlation $-0.25$.

In [24]:
X, Y = RV(BivariateNormal(mean1=0, mean2=1, sd1=1, sd2=2, corr=-0.25 ))
xy = (X & Y).sim(10000)
xy.corr()

-0.2574815238793667
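
To see where a correlation of $-0.25$ can come from, the sketch below builds correlated normal pairs from two independent standard normals, $Z_1$ and $Z_2$, using $X = \mu_1 + \sigma_1 Z_1$ and $Y = \mu_2 + \sigma_2\left(\rho Z_1 + \sqrt{1-\rho^2}\, Z_2\right)$. This is one standard construction, written with the standard library only; it is an assumption for illustration and not necessarily how Symbulate generates its values. The helper `sample_corr` is hypothetical.

```python
import math
import random
from statistics import fmean, pstdev

def sample_corr(xs, ys):
    """Sample correlation: covariance over the product of standard deviations."""
    mx, my = fmean(xs), fmean(ys)
    c = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return c / (pstdev(xs) * pstdev(ys))

# Same parameters as the BivariateNormal example
mean1, mean2, sd1, sd2, rho = 0.0, 1.0, 1.0, 2.0, -0.25

rng = random.Random(1)  # seeded for repeatability
xs, ys = [], []
for _ in range(10000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)  # independent standard normals
    xs.append(mean1 + sd1 * z1)
    ys.append(mean2 + sd2 * (rho * z1 + math.sqrt(1 - rho**2) * z2))

r = sample_corr(xs, ys)
print(r)  # should be close to rho = -0.25
```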

When simulating more than two random variables, applying `.corr()` returns the **correlation matrix**, which contains the correlation between each pair of variables (with 1s on the diagonal since a variable is perfectly correlated with itself).

In [25]:
(X & Y & X+Y).sim(10000).corr()

array([[ 1.        , -0.24955397,  0.25710783],
       [-0.24955397,  1.        ,  0.87164496],
       [ 0.25710783,  0.87164496,  1.        ]])
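
The simulated entries can be compared with their theoretical values, which follow from the bivariate normal parameters and the bilinearity of covariance. The sketch below works through that arithmetic in plain Python; the variable names are illustrative.

```python
import math

# Parameters of the bivariate normal example
sd1, sd2, rho = 1.0, 2.0, -0.25

var_x, var_y = sd1**2, sd2**2
cov_xy = rho * sd1 * sd2                 # Cov(X, Y) = Corr(X, Y) * SD(X) * SD(Y)

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
sd_sum = math.sqrt(var_x + var_y + 2 * cov_xy)

# Cov(X, X + Y) = Var(X) + Cov(X, Y);  Cov(Y, X + Y) = Cov(X, Y) + Var(Y)
corr_x_sum = (var_x + cov_xy) / (sd1 * sd_sum)
corr_y_sum = (cov_xy + var_y) / (sd2 * sd_sum)

theoretical = [
    [1.0,        rho,        corr_x_sum],
    [rho,        1.0,        corr_y_sum],
    [corr_x_sum, corr_y_sum, 1.0],
]
for row in theoretical:
    print(row)
```

The theoretical off-diagonal entries are $-0.25$, $0.25$, and $0.875$, which the simulated matrix approximates.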

<a id='transform'></a>