This page compares the performance of four measures of association.
Pearson's correlation and Kendall's tau only measure correlation: a value of zero does not imply that the covariates are independent.
Distance correlation and tau-star, on the other hand, are zero iff the covariates are independent.
Tau-star is not available in Python (to the best of my knowledge), but we can call the R functions from
Python through rpy2. This requires Python 3.
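
To make the gap between "uncorrelated" and "independent" concrete, here is a minimal sketch. The data are an illustrative assumption (a symmetric grid with y = x^2), not part of the comparison itself: y is a deterministic function of x, yet both Pearson's correlation and Kendall's tau come out as zero up to floating-point error.

```python
import numpy as np
from scipy import stats

# Illustrative data (an assumption for this sketch): a grid symmetric around
# zero, with y fully determined by x.
x = np.arange(-100, 101, dtype=float)
y = x ** 2

r, _ = stats.pearsonr(x, y)    # linear correlation
t, _ = stats.kendalltau(x, y)  # rank correlation (tau-b, handles the ties in y)
print(r, t)  # both are essentially zero although y depends on x
```

A measure that is zero only under independence should not return zero on this pair.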

In [1]:
import numpy as np

import scipy as sp
import scipy.stats as stats
from scipy.spatial.distance import pdist, squareform
import copy

Distance correlation (https://projecteuclid.org/download/pdfview_1/euclid.aos/1201012979) is a measure of
dependence between two variables. The population quantity is zero iff the variables are independent. Permutation-test p-values can be computed as well:

In [2]:

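def distcorr(u, v, pval, nruns=500):
    """Compute the sample distance correlation of u and v.

    If pval is True, return [dcor, p-value], where the p-value comes from a
    permutation test with nruns shuffles of v; otherwise return dcor alone.
    reference: https://gist.github.com/wladston/c931b1495184fbb99bec
    """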
    X = u.flatten().reshape(-1, 1)
    Y = v.flatten().reshape(-1, 1)
    n = X.shape[0]
    if Y.shape[0] != X.shape[0]:
        raise ValueError('Number of samples must match')
    # Pairwise distance matrices, one per variable
    a = squareform(pdist(X))
    b = squareform(pdist(Y))
    # Double-center: subtract row and column means, add back the grand mean
    A = a - a.mean(axis=0)[None, :] - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0)[None, :] - b.mean(axis=1)[:, None] + b.mean()

    # Squared sample distance covariance and distance variances
    dcov2_xy = (A * B).sum() / float(n * n)
    dcov2_xx = (A * A).sum() / float(n * n)
    dcov2_yy = (B * B).sum() / float(n * n)
    dcor = np.sqrt(dcov2_xy) / np.sqrt(np.sqrt(dcov2_xx) * np.sqrt(dcov2_yy))

    if pval:
        greater = 0.
        Y_r = Y.copy()
        for i in range(nruns):
            np.random.shuffle(Y_r)
            if distcorr(X, Y_r, pval=False) >= dcor:
                greater += 1
        return ([dcor, greater / nruns])
    else:
        return dcor
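
For reference, the quantities that the function above computes can be written out as follows. This is a transcription of the code into formulas; the symbols a_{jk}, A_{jk}, dCov_n and dCor_n are notation chosen here, and B_{jk} is built from y in the same way as A_{jk} is built from x.

```latex
a_{jk} = |x_j - x_k|, \qquad
A_{jk} = a_{jk} - \bar{a}_{j\cdot} - \bar{a}_{\cdot k} + \bar{a}_{\cdot\cdot}

\mathrm{dCov}_n^2(x, y) = \frac{1}{n^2} \sum_{j,k=1}^{n} A_{jk} B_{jk}

\mathrm{dCor}_n(x, y) =
  \sqrt{\frac{\mathrm{dCov}_n^2(x, y)}
             {\sqrt{\mathrm{dCov}_n^2(x, x)\,\mathrm{dCov}_n^2(y, y)}}}
```

The permutation p-value is the fraction of shuffles of y whose distance correlation with x is at least the observed value.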


Example:

In [3]:
x = np.random.normal(size=(1000,))
y = np.random.normal(size=(1000,))
print(distcorr(x, y, pval=True, nruns=10))  # nruns should be on the order of 10000; 10 is used here for a quick run.

[0.04922592084371134, 0.8]
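
The example above uses independent samples, so the statistic is small. As a contrast, the following sketch compares an independent pair with a dependent but uncorrelated pair (y = x^2). The helper `dcor_sketch`, the seed and the sample size are assumptions made for this illustration; the helper repeats the same double-centering computation in compact form and skips the permutation test.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def dcor_sketch(x, y):
    """Sample distance correlation of two 1-D arrays (hypothetical helper)."""
    def centered(z):
        # Pairwise distances, then double-centering
        d = squareform(pdist(np.asarray(z, dtype=float).reshape(-1, 1)))
        return d - d.mean(axis=0)[None, :] - d.mean(axis=1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    return np.sqrt((A * B).mean() / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
x = rng.normal(size=500)
dcor_indep = dcor_sketch(x, rng.normal(size=500))  # independent pair
dcor_dep = dcor_sketch(x, x ** 2)                  # dependent, uncorrelated pair
print(dcor_indep, dcor_dep)
```

The dependent pair should give a clearly larger value than the independent one, which stays close to zero but not exactly at zero in finite samples.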


Tau-star, the Bergsma-Dassios sign covariance, is also a measure of dependence. We compute it with the R package TauStar:

In [4]:

#####################################################################
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
utils = importr("TauStar")
ro.r('library("TauStar")')

def taustar(u, v, pval):
    """
    u and v are two columns of continuous variables.
    pval is a boolean indicating whether the p-value of tau-star should also be computed.
    """
    # Pass the data as R numeric vectors. Formatting a tuple into R source
    # breaks when the elements do not print as plain numbers.
    ro.globalenv['x'] = ro.FloatVector(np.ravel(u).astype(float).tolist())
    ro.globalenv['y'] = ro.FloatVector(np.ravel(v).astype(float).tolist())
    tau_star = float(ro.r('tStar(x, y)')[0])
    if not pval:
        return tau_star
    ro.r('testResults = tauStarTest(x, y)')
    pvalue = ro.r('testResults$pVal[1]')[0]
    return [tau_star, float(pvalue)]


Example using the above function:

In [5]:
x = np.random.normal(size=(1000,))
y = np.random.normal(size=(1000,))
print(taustar(x, y, pval=True))

[-0.0001384570762604664, 0.480547183146005]
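
To show what the statistic measures without going through R, here is a brute-force sketch in pure Python. It assumes the U-statistic form of the Bergsma-Dassios sign covariance (an average of a sign kernel over all ordered quadruples of distinct observations) and data without ties; whether this matches the default estimator of the R package exactly is an assumption on my part. The names `sign_kernel` and `tstar_bruteforce` are hypothetical, and the loop is O(n^4), so it is only usable for very small samples.

```python
from itertools import permutations

def sign_kernel(z1, z2, z3, z4):
    """Sign kernel in indicator form; returns -1, 0 or 1 (assumes no ties)."""
    return (int(max(z1, z3) < min(z2, z4)) + int(min(z1, z3) > max(z2, z4))
            - int(max(z1, z2) < min(z3, z4)) - int(min(z1, z2) > max(z3, z4)))

def tstar_bruteforce(x, y):
    """Average of kernel products over ordered quadruples of distinct indices."""
    total, count = 0, 0
    for i, j, k, l in permutations(range(len(x)), 4):
        total += sign_kernel(x[i], x[j], x[k], x[l]) * sign_kernel(y[i], y[j], y[k], y[l])
        count += 1
    return total / count

x = [0.3, -1.2, 2.5, 0.9, -0.4, 1.7]  # small illustrative sample
print(tstar_bruteforce(x, x))  # strictly monotone relationship
```

For a strictly monotone relationship the kernel is nonzero for 16 of the 24 orderings of any four distinct points, so this sketch returns 2/3; for independent samples the value fluctuates around zero.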


You can also simulate the variables directly in R and run the test from Python:

In [6]:
ro.r('x = rnorm(1000)')
ro.r('y = rnorm(1000)')
ro.r('testResults = tauStarTest(x,y)')
ro.r('testResults$pVal')[0]

0.20851458333092898

Pearson correlation is very common, but it measures linear correlation, not necessarily dependence. Its p-values are easy to compute without permutation:

In [7]:
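def pearson(u, v, pval):
    if pval:
        corr, p = stats.pearsonr(u, v)
        return [float(corr), float(p)]
    else:
        # Center both variables first; without centering this ratio would be
        # the cosine similarity rather than the Pearson correlation.
        uc = u - np.mean(u)
        vc = v - np.mean(v)
        return float(np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc)))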


Example:

In [8]:
x = np.random.normal(size=(1000,))
y = np.random.normal(size=(1000,))
print(pearson(x, y, pval=True))

[-0.05027041437360486, 0.11212670182974116]
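
The reason no permutation is needed is that, under the null hypothesis of zero correlation (and assuming roughly bivariate normal data), the statistic t = r * sqrt((n - 2) / (1 - r^2)) follows a Student t distribution with n - 2 degrees of freedom. The sketch below recomputes the two-sided p-value from that formula and compares it with the value reported by `scipy.stats.pearsonr`; the seed and sample size are arbitrary choices for the illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
x = rng.normal(size=1000)
y = rng.normal(size=1000)
n = len(x)

r, p_scipy = stats.pearsonr(x, y)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
t_stat = r * np.sqrt((n - 2) / (1.0 - r ** 2))
p_manual = 2.0 * stats.t.sf(abs(t_stat), df=n - 2)
print(p_scipy, p_manual)  # the two values should agree closely
```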


Kendall's tau is a rank-based measure of signed correlation. Like Pearson correlation, it measures correlation, not necessarily dependence:

In [9]:
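def tau(u, v, pval):
    # Keep the results in separate names: assigning the p-value to `pval`
    # would overwrite the flag, and the p-value would then never be returned.
    tau_value, p = stats.kendalltau(u, v)
    if pval:
        return [float(tau_value), float(p)]
    else:
        return float(tau_value)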

Example:

In [10]:
x = np.random.normal(size=(1000,))
y = np.random.normal(size=(1000,))
print(tau(x, y, pval=False))

0.0059899899899899895
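
To show where the number comes from, here is a minimal sketch that computes tau directly from pair counts: the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs. It assumes data without ties, in which case it should agree with `scipy.stats.kendalltau`; the helper name `kendall_tau_pairs` and the simulated data are assumptions for this illustration.

```python
import numpy as np
from scipy import stats

def kendall_tau_pairs(x, y):
    """Tau from concordant/discordant pair counts; assumes no ties (hypothetical helper)."""
    n = len(x)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 for a concordant pair, -1 for a discordant pair
            s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return s / (n * (n - 1) / 2)

rng = np.random.default_rng(2)  # fixed seed, chosen arbitrarily
x = rng.normal(size=200)
y = x + rng.normal(size=200)    # positively related pair

tau_scipy, _ = stats.kendalltau(x, y)
print(kendall_tau_pairs(x, y), tau_scipy)
```

Because only the ordering of the values enters the computation, tau is unchanged by any strictly increasing transformation of x or y.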
