### Review problem 6 (hard)

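For this problem you will define a function that calculates the **Spearman rank correlation coefficient** between two lists. The Spearman correlation measures how well the relationship between two sets of numbers can be described as monotonic. It is +1 when one list always increases as the other increases, and -1 when one always decreases as the other increases.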

Your function should:

- Accept two provided lists of numbers, ```X``` and ```Y```
- Print the lengths of ```X``` and ```Y``` using the ```len()``` function, like so:
    ```
    Length of X: 40
    Length of Y: 40
    ```
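- Calculate the **rank** of the numbers in the ```X``` and ```Y``` lists. The **rank** of a number is the position it would occupy if the list were sorted in ascending order.
    - For example: say ```list1 = [5,2,0,9,-5]```, then ```list1_rank = [3,2,1,4,0]``` (counting positions from 0)
    - Calculating the rank by hand is not trivial, mainly because tied values must share a rank. You can use the ```rankdata()``` function from ```scipy.stats``` on a list to get the ranks of its numbers. Note that ```rankdata()``` counts from 1 and, by default, gives tied values the average of the positions they span, so for ```list1``` it returns ```[4., 3., 2., 5., 1.]```. Shifting every rank by the same amount does not change the correlation.
    - Assign the rank of list ```X``` to ```X_rank``` and the rank of list ```Y``` to ```Y_rank```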
- Calculate the **covariance between ```X_rank``` and ```Y_rank```** as ```XY_rank_cov```:
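    - The **covariance** measures how two lists of values vary together: it is positive when above-average values in one list tend to pair with above-average values in the other, and negative when they tend to pair with below-average values.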
    - To calculate the covariance between these two lists:
        1. Calculate ```X_mean```: the mean of ```X_rank``` using ```np.mean()```
        2. Calculate ```Y_mean```: the mean of ```Y_rank``` using ```np.mean()```
        3. Calculate ```X_deviation```: subtract ```X_mean``` from each element of ```X_rank```
        4. Calculate ```Y_deviation```: subtract ```Y_mean``` from each element of ```Y_rank```
        5. Calculate ```XY_d```: multiply ```X_deviation``` by ```Y_deviation```, **element by element**. You can use Python's ```zip()``` function to iterate over both lists at the same time:
            ```python
            XY_d = []
            for xd, yd in zip(X_deviation, Y_deviation):
                XY_d.append(xd * yd)
            ```
        6. Calculate ```sum_XY_d```: the sum of the elements in ```XY_d``` with ```np.sum()```
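        7. Calculate ```XY_rank_cov```: divide ```sum_XY_d``` by ```len(XY_d)```. Dividing by the number of elements (rather than by n - 1) gives the population covariance, which matches the default behavior of ```np.std()``` used in the next step.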
- Calculate the standard deviations ```X_rank_std``` and ```Y_rank_std``` of the ```X_rank``` and ```Y_rank``` lists using ```np.std()```
- Calculate the **Spearman rank correlation coefficient** as ```XY_spearman```: divide ```XY_rank_cov``` by ```(X_rank_std * Y_rank_std)```
- Print ```XY_spearman```
- Compare your value with SciPy's Spearman function: print out ```spearmanr(X, Y)```
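
The ranking step can be sketched in plain NumPy for the tie-free example ```list1```. The helper ```simple_rank``` is a hypothetical name introduced here for illustration, not part of the problem; it uses ```np.argsort``` and ignores ties, which are exactly the case ```rankdata()``` handles for you.

```python
import numpy as np
from scipy.stats import rankdata


def simple_rank(values):
    """Hypothetical helper: 0-based rank of each value. Assumes no ties."""
    order = np.argsort(values)                # indices that would sort the list
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(len(values))     # write each sorted position back to its original index
    return ranks


list1 = [5, 2, 0, 9, -5]
print(simple_rank(list1))   # [3 2 1 4 0]
print(rankdata(list1))      # [4. 3. 2. 5. 1.]  (1-based, as floats)
```

The two results differ only by a constant offset of 1, which is why either ranking gives the same correlation.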

In [90]:
import numpy as np
from scipy.stats import spearmanr
from scipy.stats import rankdata

X = [14.2,5.8,4.8,12.7,5.6,-1.2,5.3,11.9,4.8,8.1,1.5,8.5,14.9,6.1,
     6.8,12.6,15.5,24.3,15.6,16.8,22.3,22.6,26.2,19.0,24.3,26.3,
     25.3,31.6,27.3,33.0,32.6,30.7,29.6,34.7,32.7,43.1,40.1,35.4,49.6,38.6]

Y = [-15.5,-8.5,0.8,-3.9,4.9,12.7,10.0,16.5,5.7,13.1,10.3,12.4,-1.5,
     1.7,26.0,14.3,30.3,21.7,27.5,38.2,18.9,21.2,18.2,26.1,14.7,16.4,
     22.8,34.3,37.1,38.9,39.1,33.8,52.2,36.5,20.7,21.6,14.5,33.6,44.5,44.2]

In [110]:
# Compute the Spearman correlation step by step, starting with the lengths of both lists
print("Length of X:", len(X))
print("Length of Y:", len(Y))

Length of X: 40
Length of Y: 40


In [108]:
# Rank lists X and Y into x_rank and y_rank, respectively
x_rank = rankdata(X)
y_rank = rankdata(Y)

print(x_rank)
print(y_rank)


[ 15.    7.    3.5  14.    6.    1.    5.   12.    3.5  10.    2.   11.
  16.    8.    9.   13.   17.   23.5  18.   19.   21.   22.   26.   20.
  23.5  27.   25.   31.   28.   34.   32.   30.   29.   35.   33.   39.
  38.   36.   40.   37. ]
[  1.   2.   5.   3.   7.  12.   9.  18.   8.  13.  10.  11.   4.   6.  26.
  14.  29.  24.  28.  35.  20.  22.  19.  27.  16.  17.  25.  32.  34.  36.
  37.  31.  40.  33.  21.  23.  15.  30.  39.  38.]
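
The ```3.5``` and ```23.5``` entries in ```x_rank``` come from ties: ```X``` contains ```4.8``` twice and ```24.3``` twice, and ```rankdata()``` gives each tied pair the average of the two positions they span. A small illustration on made-up values:

```python
from scipy.stats import rankdata

# The two 4.8s would occupy positions 2 and 3, so both receive the average rank 2.5
tied = [4.8, 1.5, 4.8]
print(rankdata(tied))                      # [2.5 1.  2.5]

# The method parameter selects another tie rule; "min" gives ties the lowest shared rank: 2, 1, 2
print(rankdata(tied, method="min"))
```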


In [100]:
# Calculate the mean values of x_rank and y_rank
x_mean = np.mean(x_rank)
y_mean = np.mean(y_rank)
print(x_mean, y_mean)
# Calculate the deviation of each rank from the mean of its own list
x_deviation = [i - x_mean for i in x_rank]
y_deviation = [i - y_mean for i in y_rank]
print(x_deviation)
print(y_deviation)

20.5 20.5
[-5.5, -13.5, -17.0, -6.5, -14.5, -19.5, -15.5, -8.5, -17.0, -10.5, -18.5, -9.5, -4.5, -12.5, -11.5, -7.5, -3.5, 3.0, -2.5, -1.5, 0.5, 1.5, 5.5, -0.5, 3.0, 6.5, 4.5, 10.5, 7.5, 13.5, 11.5, 9.5, 8.5, 14.5, 12.5, 18.5, 17.5, 15.5, 19.5, 16.5]
[-19.5, -18.5, -15.5, -17.5, -13.5, -8.5, -11.5, -2.5, -12.5, -7.5, -10.5, -9.5, -16.5, -14.5, 5.5, -6.5, 8.5, 3.5, 7.5, 14.5, -0.5, 1.5, -1.5, 6.5, -4.5, -3.5, 4.5, 11.5, 13.5, 15.5, 16.5, 10.5, 19.5, 12.5, 0.5, 2.5, -5.5, 9.5, 18.5, 17.5]


In [101]:
# Multiply the deviations element by element
XY_d = [xd * yd for xd, yd in zip(x_deviation, y_deviation)]
print(XY_d)

[107.25, 249.75, 263.5, 113.75, 195.75, 165.75, 178.25, 21.25, 212.5, 78.75, 194.25, 90.25, 74.25, 181.25, -63.25, 48.75, -29.75, 10.5, -18.75, -21.75, -0.25, 2.25, -8.25, -3.25, -13.5, -22.75, 20.25, 120.75, 101.25, 209.25, 189.75, 99.75, 165.75, 181.25, 6.25, 46.25, -96.25, 147.25, 360.75, 288.75]
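
Because ```rankdata()``` already returns NumPy arrays, the deviation and product steps can also be written without list comprehensions: subtracting a scalar mean broadcasts across the array, and ```*``` multiplies element by element. This sketch uses small made-up rank arrays rather than the problem's data.

```python
import numpy as np

# Hypothetical small rank arrays (not the problem's X and Y)
x_rank = np.array([1.0, 2.0, 3.0, 4.0])
y_rank = np.array([1.0, 3.0, 2.0, 4.0])

# The scalar mean is broadcast, so every element is shifted at once
x_deviation = x_rank - np.mean(x_rank)
y_deviation = y_rank - np.mean(y_rank)

# Element-by-element product, equivalent to the zip() list comprehension
XY_d = x_deviation * y_deviation
print(np.sum(XY_d))   # 4.0
```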


In [102]:
# Sum the element-wise products of the deviations
sum_XY_d = np.sum(XY_d)
print(sum_XY_d)

3847.5


In [103]:
# Calculate XY_rank_cov: divide sum_XY_d by len(XY_d)
XY_rank_cov = sum_XY_d / len(XY_d)
print(XY_rank_cov)

96.1875
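
One way to cross-check this step is ```np.cov```. It divides by n - 1 by default, so ```bias=True``` is needed to match the divide-by-```len(XY_d)``` convention used here. The arrays below are made-up examples.

```python
import numpy as np

# Hypothetical small rank arrays (not the problem's data)
x_rank = np.array([1.0, 2.0, 3.0, 4.0])
y_rank = np.array([1.0, 3.0, 2.0, 4.0])

# Population covariance by hand: sum of deviation products divided by n
manual_cov = np.sum((x_rank - x_rank.mean()) * (y_rank - y_rank.mean())) / len(x_rank)

# np.cov returns a 2x2 matrix; the off-diagonal entry is the covariance of x and y
numpy_cov = np.cov(x_rank, y_rank, bias=True)[0, 1]
print(manual_cov, numpy_cov)   # 1.0 1.0
```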


In [104]:
# Calculate the standard deviations X_rank_std and Y_rank_std of the X_rank and Y_rank lists using np.std()
X_rank_std = np.std(x_rank)
Y_rank_std = np.std(y_rank)
print(X_rank_std)
print(Y_rank_std)

11.5423134596
11.5433963806


In [105]:
# Calculate the spearman rank correlation coefficient as XY_spearman: divide XY_rank_cov by (X_rank_std * Y_rank_std)
XY_spearman = XY_rank_cov / (X_rank_std * Y_rank_std)
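
Dividing the covariance of the ranks by the product of their standard deviations is the ordinary Pearson correlation applied to the ranks. That gives a second cross-check through ```np.corrcoef()```. The lists ```a``` and ```b``` are made-up toy data, not the problem's ```X``` and ```Y```.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Hypothetical toy data
a = [10, 20, 30, 40, 50]
b = [12, 9, 35, 33, 60]

# Pearson correlation of the ranks (off-diagonal entry of the 2x2 correlation matrix)
pearson_of_ranks = np.corrcoef(rankdata(a), rankdata(b))[0, 1]

# SciPy's result unpacks into (correlation, p-value)
rho, pval = spearmanr(a, b)
print(pearson_of_ranks, rho)   # both are about 0.8
```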

In [106]:
# Print XY_spearman and compare it with SciPy's spearmanr(X, Y)
print(XY_spearman)
print(spearmanr(X, Y))

0.721925136867
SpearmanrResult(correlation=0.72192513686692761, pvalue=1.4606957738616958e-07)
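
The problem asks for a function, while the cells above run the steps one at a time. A minimal sketch collects the same steps into one function. The name ```spearman``` is our own choice (the problem does not name the function), and the variable names follow the problem statement. The toy call at the bottom uses made-up data.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def spearman(X, Y):
    """Spearman rank correlation following the problem's steps (population formulas, ddof=0)."""
    print("Length of X:", len(X))
    print("Length of Y:", len(Y))

    # Rank both lists; ties receive the average of the positions they span
    X_rank = rankdata(X)
    Y_rank = rankdata(Y)

    # Deviations of each rank from its list's mean
    X_mean = np.mean(X_rank)
    Y_mean = np.mean(Y_rank)
    X_deviation = [x - X_mean for x in X_rank]
    Y_deviation = [y - Y_mean for y in Y_rank]

    # Population covariance of the ranks
    XY_d = [xd * yd for xd, yd in zip(X_deviation, Y_deviation)]
    sum_XY_d = np.sum(XY_d)
    XY_rank_cov = sum_XY_d / len(XY_d)

    # Population standard deviations of the ranks
    X_rank_std = np.std(X_rank)
    Y_rank_std = np.std(Y_rank)

    XY_spearman = XY_rank_cov / (X_rank_std * Y_rank_std)
    print(XY_spearman)
    print(spearmanr(X, Y))
    return XY_spearman


# Toy call with made-up data; the expected coefficient is about 0.8
result = spearman([10, 20, 30, 40, 50], [12, 9, 35, 33, 60])
```

Calling ```spearman(X, Y)``` on the two 40-element lists defined earlier should reproduce the value printed above, 0.721925136867.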
