Correlation coefficients with uncertainties

# privong/pymccorrelation

## Latest commit

privong and George C. Privon
`4155244`

# pymccorrelation

A tool to calculate correlation coefficients for data, using bootstrapping and/or perturbation to estimate the uncertainties on the correlation coefficient. This began as a Python implementation of the Curran (2014) method for calculating uncertainties on Spearman's rank correlation coefficient, but has since been expanded. Curran's original C implementation is `MCSpearman` (ASCL entry).

Currently the following correlation coefficients can be calculated (with bootstrapping and/or perturbation):

• Pearson's r
• Spearman's rank correlation coefficient
• Kendall's tau

Kendall's tau can also be calculated when some of the data are left/right censored, following the method described by Isobe+1986.

## Requirements

• python3
• scipy
• numpy

## Installation

`pymccorrelation` is available via PyPI and can be installed with:

``````pip install pymccorrelation
``````

## Usage

`pymccorrelation` exports a single function to the user (also called `pymccorrelation`).

``````from pymccorrelation import pymccorrelation

``````

The correlation coefficient can be one of `pearsonr`, `spearmanr`, or `kendallt`.

For example, to compute Pearson's r for a sample, using 1000 bootstrapping iterations to estimate the uncertainties:

``````res = pymccorrelation(data['x'], data['y'],
                      coeff='pearsonr',
                      Nboot=1000)
``````

The output, `res`, is a tuple of length 2 whose elements are:

• numpy array with the correlation coefficient (Pearson's r, in this case) percentiles (by default 16%, 50%, and 84%)
• numpy array with the p-value percentiles (by default 16%, 50%, and 84%)

The percentile ranges can be adjusted using the `percentiles` keyword argument.
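
The percentile arrays are often easiest to report as a median with asymmetric uncertainties. The following is a minimal sketch of that conversion, assuming the default 16%, 50%, and 84% percentiles. The helper `summarize` and the example numbers are hypothetical and are not part of `pymccorrelation`:

```python
import numpy as np


def summarize(percentile_values):
    """Convert a [16th, 50th, 84th] percentile array to (median, minus, plus)."""
    low, median, high = np.asarray(percentile_values, dtype=float)
    return median, median - low, high - median


# Hypothetical values standing in for the first element of `res`.
r_percentiles = np.array([0.41, 0.52, 0.61])

median, minus, plus = summarize(r_percentiles)
print(f"r = {median:.2f} (+{plus:.2f} / -{minus:.2f})")
```

The same helper can be applied to the p-value percentiles in the second element of `res`.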

Additionally, if the full posterior distribution is desired, it can be obtained by setting the `return_dist` keyword argument to `True`. In that case, `res` becomes a tuple of length 4:

• numpy array with the correlation coefficient (Pearson's r, in this case) percentiles (by default 16%, 50%, and 84%)
• numpy array with the p-value percentiles (by default 16%, 50%, and 84%)
• numpy array with the full set of correlation coefficient values from the bootstrapping
• numpy array with the full set of p-values computed from the bootstrapping
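
With the full distributions in hand, summaries other than the default percentiles can be computed directly with numpy. The sketch below uses synthetic arrays as stand-ins for the third and fourth elements of `res`. The names `coeff_dist` and `p_dist` and the generated values are assumptions for illustration only, not real output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the per-iteration coefficients and p-values
# (the third and fourth elements of `res`); these are NOT real results.
coeff_dist = np.clip(rng.normal(0.5, 0.1, size=1000), -1.0, 1.0)
p_dist = rng.uniform(0.0, 0.2, size=1000)

# A 95% interval on the coefficient, taken from the full distribution.
ci_low, ci_high = np.percentile(coeff_dist, [2.5, 97.5])

# Fraction of iterations in which the p-value falls below 0.05.
frac_significant = np.mean(p_dist < 0.05)

print(f"95% interval: [{ci_low:.2f}, {ci_high:.2f}]")
print(f"fraction with p < 0.05: {frac_significant:.2f}")
```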

Please see the docstring for the full set of arguments, including how to supply measurement uncertainties (necessary for point perturbation) and how to mark censored data.
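
To illustrate what bootstrapping and point perturbation mean in this context, here is a rough, self-contained sketch for Spearman's rank correlation coefficient. It is not the package's implementation: the function `mc_spearman` and its arguments (`dx`, `dy`, `n_iter`, `seed`) are hypothetical names, and Gaussian perturbation by the quoted uncertainties is an assumption made for this example.

```python
import numpy as np
from scipy import stats


def mc_spearman(x, y, dx=None, dy=None, n_iter=1000, seed=None):
    """Bootstrap (and optionally perturb) the data; return percentile summaries."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat missing uncertainties as zero, which disables the perturbation.
    dx = np.zeros_like(x) if dx is None else np.asarray(dx, dtype=float)
    dy = np.zeros_like(y) if dy is None else np.asarray(dy, dtype=float)

    rhos = np.empty(n_iter)
    pvals = np.empty(n_iter)
    for i in range(n_iter):
        # Bootstrapping: draw N (x, y) pairs with replacement.
        idx = rng.integers(0, x.size, size=x.size)
        # Perturbation (assumed Gaussian): shift each point by a random
        # draw scaled by its measurement uncertainty.
        xs = x[idx] + rng.normal(size=x.size) * dx[idx]
        ys = y[idx] + rng.normal(size=y.size) * dy[idx]
        rhos[i], pvals[i] = stats.spearmanr(xs, ys)

    return np.percentile(rhos, [16, 50, 84]), np.percentile(pvals, [16, 50, 84])


# Synthetic, positively correlated data for demonstration only.
rng = np.random.default_rng(42)
x = rng.normal(size=50)
y = x + rng.normal(scale=0.5, size=50)

rho_pct, p_pct = mc_spearman(x, y, dx=np.full(50, 0.1), dy=np.full(50, 0.1),
                             n_iter=500, seed=1)
print("rho percentiles:", rho_pct)
print("p-value percentiles:", p_pct)
```

Each iteration yields one coefficient and one p-value, and the spread of those values across iterations is what the reported percentiles summarize.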

## Citing

If you use this script as part of your research, I encourage you to cite the following papers:

• Curran 2014: Describes the technique and application to Spearman's rank correlation coefficient
• Privon+ 2020: First use of this software, as `pymcspearman`.

Please also cite scipy and numpy.

If your work uses Kendall's tau with censored data please also cite:

• Isobe+ 1986: Censoring of data when computing Kendall's rank correlation coefficient.

v0.2.4 Latest
May 26, 2021
