# Karl Pearson’s correlation

## Both variables should have a Gaussian (or Gaussian-like) distribution and a linear relationship

## Covariance

### cov(x,y) = sum((x-mean(x))*(y-mean(y)))/(n-1), where n is the number of paired observations
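A minimal pure-Python sketch of the sample covariance formula, dividing by `n - 1` as written. The helper name `sample_covariance` is hypothetical, and the two score lists are the same History and Physics scores used in the computation later in this notebook.

```python
def sample_covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of the products of paired deviations from the respective means
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


history = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]
physics = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
print(f"{sample_covariance(history, physics):.3f}")  # → 2.444 (that is, 22 / 9)
```

A positive value says the two scores tend to move in the same direction, but the number 2.444 on its own says little about how strong that tendency is.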

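## A problem with covariance as a statistical tool alone is that it is challenging to interpret: its magnitude depends on the units and scale of the variables, so there is no fixed range against which to judge it. This leads us to Pearson’s correlation coefficient next.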

## Pearson’s correlation coefficient is calculated as the covariance of the two variables divided by the product of the standard deviations of the two data samples. This normalizes the covariance into a unitless score between -1 and 1 that can be interpreted regardless of the original units.

### Pearson’s correlation coefficient = covariance(x,y)/(stdv(x)*stdv(y))
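A minimal sketch of this formula in plain Python, assuming sample (n - 1) statistics throughout. The function name `pearson_r` is hypothetical; the data are the notebook’s History and Physics scores.

```python
import math


def pearson_r(x, y):
    """Pearson's r: sample covariance / (std(x) * std(y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    std_x = math.sqrt(sum(a * a for a in dx) / (n - 1))
    std_y = math.sqrt(sum(b * b for b in dy) / (n - 1))
    return cov / (std_x * std_y)


history = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]
physics = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
print(f"{pearson_r(history, physics):.3f}")  # → 0.145
```

The `n - 1` factors cancel between numerator and denominator, so using `n` everywhere gives the same r. Rescaling either variable (for example multiplying the History scores by 10) leaves r unchanged, which is exactly the normalization the covariance lacks. On Python 3.10+, `statistics.correlation(history, physics)` should return the same value.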

## The use of mean and standard deviation in the calculation suggests the need for the two data samples to have a Gaussian or Gaussian-like distribution.

# Spearman’s Correlation

## Spearman’s correlation is useful when two variables are related by a nonlinear (but monotonic) relationship, such that the relationship is stronger or weaker across the distribution of the variables.

## Spearman’s correlation is also useful when the two variables being considered have a non-Gaussian distribution.

## This test of relationship can also be used if there is a linear relationship between the variables, but will have slightly less power (e.g. may result in lower coefficient scores).

## Instead of calculating the coefficient using covariance and standard deviations on the samples themselves, these statistics are calculated from the relative rank of values on each sample. This is a common approach in non-parametric statistics, i.e., statistical methods that do not assume a particular distribution of the data, such as a Gaussian.



```python
History_Scores = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]
Physics_Scores = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
n = len(History_Scores)

# Means (true division, so the physics mean is 7.8, not 7)
avg_history_score = sum(History_Scores) / n
avg_physics_score = sum(Physics_Scores) / n

# Deviations from the mean, stored as lists because they are reused below
# (a map() object is a one-shot iterator and has no len())
difference_history_score = [x - avg_history_score for x in History_Scores]
difference_physics_score = [y - avg_physics_score for y in Physics_Scores]

# Sample covariance
product = [x * y for x, y in zip(difference_history_score, difference_physics_score)]
covariance = sum(product) / (n - 1)

# Sample standard deviations
stdv_history_score = (sum(d ** 2 for d in difference_history_score) / (n - 1)) ** 0.5
stdv_physics_score = (sum(d ** 2 for d in difference_physics_score) / (n - 1)) ** 0.5

score = covariance / (stdv_history_score * stdv_physics_score)
print("Karl Pearson’s coefficient {score:.3f}".format(score=score))
```

Karl Pearson’s coefficient 0.145


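```python
from scipy import stats

# pearsonr returns the correlation coefficient and a two-sided p-value
r, p_value = stats.pearsonr(History_Scores, Physics_Scores)
print("SciPy Pearson’s coefficient {r:.3f}".format(r=r))
```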