# Spearman's r test

[Spearman's r test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html), or Spearman's rank correlation test, measures how well the relationship between two variables can be described by a monotonic function. Unlike Pearson's r test, it does not require that relationship to be linear. It is performed on two paired variables and uses the rank of each value rather than the value itself. The rank of an entry is its position when all entries of that variable are sorted, so it depends only on how the entry compares to the others. It is the non-parametric counterpart of Pearson's r test. The Spearman correlation coefficient, or "r", is a number between -1 and 1 that measures the strength and direction of the relationship between the two variables. The equation used to calculate it is the same as that of the Pearson correlation test, applied to the ranks instead of the raw values. It is shown below, where x and y are the ranks and n is the number of pairs.

$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{(n\sum x^{2} - (\sum x)^{2})(n\sum y^{2} - (\sum y)^{2})}}$
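
To make the formula concrete, here is a minimal sketch that ranks a small illustrative sample with `scipy.stats.rankdata`, evaluates the equation on those ranks by hand, and compares the result with `scipy.stats.spearmanr`. The sample values and variable names are assumptions chosen for illustration only.

```python
import numpy as np
from scipy import stats

# Small illustrative sample (hypothetical values)
x = np.array([1, 12, 3, 7, 8, 9])
y = np.array([4, 23, 9, 12, 8, 17])

# Replace each value by its rank (1 = smallest); tied values share an average rank
rx = stats.rankdata(x)
ry = stats.rankdata(y)

n = len(rx)

# Evaluate the Pearson formula on the ranks instead of the raw values
numerator = n * np.sum(rx * ry) - np.sum(rx) * np.sum(ry)
denominator = np.sqrt(
    (n * np.sum(rx ** 2) - np.sum(rx) ** 2)
    * (n * np.sum(ry ** 2) - np.sum(ry) ** 2)
)
rho_manual = numerator / denominator

# Reference value from SciPy for comparison
rho_scipy, _ = stats.spearmanr(x, y)

print(rho_manual, rho_scipy)
```

Both numbers should agree up to floating-point rounding, which illustrates that Spearman's coefficient is Pearson's coefficient computed on ranks.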

The example below builds a small pandas DataFrame, extracts two of its columns, and passes them to `stats.spearmanr`.

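```python
import pandas as pd
from scipy import stats

# Small example dataset with two paired columns
data = {'col1': [1, 12, 3, 7, 8, 9],
        'col2': [4, 23, 9, 12, 8, 17]}
df = pd.DataFrame(data)

# Extract each column as a pandas Series
setOne = df['col1']
setTwo = df['col2']

# Compute the Spearman correlation coefficient and its p-value
result = stats.spearmanr(setOne, setTwo)
print(result)
```

```
SignificanceResult(statistic=0.8285714285714287, pvalue=0.04156268221574334)
```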


## Understanding the output

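The test returns a value between -1 and 1, labeled `statistic`. The sign indicates whether the variables are positively or negatively correlated, and the magnitude indicates the strength of the monotonic relationship. A value of 0 means there is no monotonic association between the ranks, while -1 or 1 means the relationship is perfectly monotonic. In the example above, a statistic of about 0.83 indicates a strong positive correlation.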
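
As a rough illustration of how the sign and magnitude behave, the sketch below uses made-up data in which one variable is a strictly increasing or strictly decreasing function of the other. The data and variable names are assumptions for illustration; `stats.pearsonr` is included only for contrast.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11)

# Strictly increasing but non-linear function of x
y_increasing = x ** 3
# Strictly decreasing function of x
y_decreasing = -np.exp(x / 2.0)

rho_up, _ = stats.spearmanr(x, y_increasing)
rho_down, _ = stats.spearmanr(x, y_decreasing)

# Pearson's r on the same increasing data, for comparison
r_up, _ = stats.pearsonr(x, y_increasing)

print(rho_up, rho_down, r_up)
```

Because the ranks of `y_increasing` match the ranks of `x` exactly, the Spearman statistic is 1 even though the relationship is not linear, and it is -1 for the decreasing case. Pearson's r on the same increasing data falls below 1 because it measures linear association.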

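The second value, labeled `pvalue`, is the p-value of a hypothesis test whose null hypothesis is that the two variables have no rank correlation. The lower it is, the stronger the evidence against that null hypothesis, and so the more statistically significant the correlation. In the example above, the p-value of about 0.042 is below the commonly used 0.05 threshold. SciPy's documentation cautions that this p-value is only accurate for large samples, so with six observations it should be treated as approximate.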
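
A minimal sketch of how the two returned values might be used together is shown below. The significance level `alpha = 0.05` is an assumed convention, not something prescribed by SciPy, and the variable names are illustrative.

```python
from scipy import stats

set_one = [1, 12, 3, 7, 8, 9]
set_two = [4, 23, 9, 12, 8, 17]

# The result unpacks into the correlation statistic and the p-value
rho, p_value = stats.spearmanr(set_one, set_two)

# Assumed significance level; pick one appropriate for your own study
alpha = 0.05

if p_value < alpha:
    verdict = "reject the null hypothesis of no rank correlation"
else:
    verdict = "fail to reject the null hypothesis of no rank correlation"

print(f"statistic = {rho:.3f}, pvalue = {p_value:.3f}: {verdict}")
```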