In statistics, correlation refers to the strength and direction of a relationship between two variables. The value of a correlation coefficient can range from -1 to 1, with the following interpretations:

* -1: a perfect negative relationship between two variables
* 0: no linear (or, for rank-based measures, monotonic) relationship between two variables
* 1: a perfect positive relationship between two variables

One special type of correlation is called **Spearman Rank Correlation**, which measures the strength and direction of the monotonic relationship between two ranked variables (e.g., the rank of a student’s math exam score vs. the rank of their science exam score in a class). Spearman Rank Correlation is also appropriate when the variables are measured on an interval or ratio scale but do not satisfy the normality assumption.
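For reference, Spearman's coefficient is the Pearson correlation computed on the ranks of the two variables. In the special case where no values are tied, it reduces to the standard shortcut formula below. Here $d_i$ is the difference between the two ranks of observation $i$ and $n$ is the number of observations; these symbols are introduced only for illustration.

```latex
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)}
```

When ties are present, the shortcut is no longer exact, and the Pearson-on-ranks definition (with tied values given their average rank) should be used instead.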

In [1]:
import pandas as pd

# create a DataFrame of exam scores for ten students
df = pd.DataFrame({"student": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
                   "math": [70, 78, 90, 87, 84, 86, 91, 74, 83, 85],
                   "science": [90, 94, 79, 86, 84, 83, 88, 92, 76, 75]})
df

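|   | student | math | science |
|---|---------|------|---------|
| 0 | A | 70 | 90 |
| 1 | B | 78 | 94 |
| 2 | C | 90 | 79 |
| 3 | D | 87 | 86 |
| 4 | E | 84 | 84 |
| 5 | F | 86 | 83 |
| 6 | G | 91 | 88 |
| 7 | H | 74 | 92 |
| 8 | I | 83 | 76 |
| 9 | J | 85 | 75 |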


In [4]:
from scipy.stats import spearmanr

# calculate Spearman Rank correlation and corresponding p-value
rho, p = spearmanr(df["math"], df["science"])

print("Spearman's Rho: %.3f" % rho)
print("Significance (p-value): %.3f" % p)

Spearman's Rho: -0.418
Significance (p-value): 0.229
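
A rho of -0.418 suggests a moderate negative monotonic association between the math and science ranks. With a p-value of 0.229, however, it would not be considered statistically significant at the conventional 0.05 level. As a sanity check, the sketch below reproduces the coefficient by hand, assuming the same ten scores as above. It ranks each variable with pandas, takes the Pearson correlation of the ranks, and also applies the no-ties shortcut formula. The variable names (`math_rank`, `rho_from_ranks`, `rho_shortcut`) are illustrative only.

```python
import pandas as pd

# same scores as in the DataFrame above (students A through J)
math = pd.Series([70, 78, 90, 87, 84, 86, 91, 74, 83, 85])
science = pd.Series([90, 94, 79, 86, 84, 83, 88, 92, 76, 75])

# rank each variable (1 = lowest score); tied values would get their average rank,
# but this data set contains no ties
math_rank = math.rank()
science_rank = science.rank()

# Spearman's rho is the Pearson correlation of the ranks
rho_from_ranks = math_rank.corr(science_rank)

# shortcut formula, exact only when there are no ties:
# rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
d = math_rank - science_rank
n = len(d)
rho_shortcut = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

print("Rho (Pearson on ranks): %.3f" % rho_from_ranks)
print("Rho (shortcut formula): %.3f" % rho_shortcut)
```

Both values should agree with the `spearmanr` result because the scores contain no ties. If ties were present, only the Pearson-on-ranks value would be expected to match.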
