An implementation of the Universal Correlation Coefficient in Python via Pandas
Here is a high-level overview, based on the R library that I wrote to compute the UCC. In a nutshell, for two discrete random variables, the UCC indicates whether there is a relationship between them, possibly a non-linear one.
- Include tests and examples
- Extend to computing UCCs for pairs of columns from a given list
  - Also, allow for automatic output of scatterplots for pairs having UCC >= a given threshold
  - Print modes:
    - Pretty print mode (for interactive use)
    - CSV output mode (for dumping to file for later).
- Figure out how to make a proper pip package out of this (with setup.py and all that happy stuff).
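
The pairwise item above could look roughly like the sketch below. It is only a sketch under assumptions: `pairwise_scores`, `report`, and the `mode` and `threshold` parameters are hypothetical names, not an existing API. The `agreement` function is a trivial stand-in and is not the UCC; the real UCC function would be passed in its place. Scatterplot output is left out.

```python
from itertools import combinations

import pandas as pd


def pairwise_scores(df, columns, measure):
    """Apply `measure` to every unordered pair of the given columns.

    `measure` is any callable taking two Series and returning a number;
    the real UCC function would be passed here (hypothetical interface).
    """
    rows = [
        {"x": a, "y": b, "score": measure(df[a], df[b])}
        for a, b in combinations(columns, 2)
    ]
    return pd.DataFrame(rows, columns=["x", "y", "score"])


def report(scores, threshold=0.0, mode="pretty"):
    """Render the pairs whose score is >= threshold as text.

    mode="pretty" gives an aligned table for interactive use,
    mode="csv" gives CSV text suitable for writing to a file.
    """
    kept = scores[scores["score"] >= threshold]
    if mode == "csv":
        return kept.to_csv(index=False)
    if mode == "pretty":
        return kept.to_string(index=False)
    raise ValueError(f"unknown mode: {mode!r}")


def agreement(x, y):
    """Placeholder measure (NOT the UCC): fraction of rows where x equals y."""
    return float((x == y).mean())


df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 2, 3, 4], "c": [1, 0, 3, 0]})
scores = pairwise_scores(df, ["a", "b", "c"], agreement)

print(report(scores, mode="pretty"))
print(report(scores, threshold=0.9, mode="csv"))
```

Passing the measure in as a callable keeps the pairing and printing logic independent of how the UCC itself ends up being computed.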