# Correlations

## Simple correlation
Computing a correlation between two variables in Pingouin is done with the [corr](https://pingouin-stats.org/generated/pingouin.corr.html#pingouin.corr) function.

In [28]:
x = [4, 5, 7, 4, 5, 6, 8, 9]
y = [3, 4, 5, 3, 4, 5, 4, 3]

from pingouin import corr
corr(x, y)

Unnamed: 0,n,r,CI95%,r2,adj_r2,p-val,BF10,power
pearson,8,0.1849,"[-0.6, 0.79]",0.034188,-0.352137,0.661133,0.468,0.071911


The [corr](https://pingouin-stats.org/generated/pingouin.corr.html#pingouin.corr) function returns a pandas DataFrame with:

1. the sample size `n` (after removal of NaN)
2. the correlation coefficient (`r`)
3. the parametric 95% confidence interval of the coefficient (`CI95%`)
4. the R<sup>2</sup> (= coefficient of determination, `r2`)
5. the adjusted R<sup>2</sup> (`adj_r2`)
6. the p-value (`p-val`)
7. the Bayes Factor for the alternative hypothesis (`BF10`)
8. the achieved power of the test (`power`, = 1 - type 2 error)
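
As a sanity check, a few of these quantities can be reproduced by hand. The sketch below uses only the standard library and recomputes `r`, `r2` and `adj_r2` for the `x` and `y` lists defined above. The helper name `pearson_r` is illustrative and not part of Pingouin, and the adjusted R<sup>2</sup> formula is an assumption (two variables in the model) that happens to agree with the table.

```python
import math

x = [4, 5, 7, 4, 5, 6, 8, 9]
y = [3, 4, 5, 3, 4, 5, 4, 3]


def pearson_r(a, b):
    """Pearson correlation: cross-product of deviations scaled by both spreads."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cross / math.sqrt(ss_a * ss_b)


n = len(x)
r = pearson_r(x, y)
r2 = r ** 2
# Adjusted R^2 (assumed formula, with k = 2 variables in the model)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 3)

print(round(r, 4), round(r2, 6), round(adj_r2, 6))
# 0.1849 0.034188 -0.352137
```

The negative adjusted R<sup>2</sup> simply reflects the penalty applied to a weak correlation in a very small sample.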

By default, the function returns the two-sided Pearson correlation coefficient. This can be adjusted using the `tail` and `method` arguments:

In [29]:
corr(x, y, method='spearman', tail='one-sided')

Unnamed: 0,n,r,CI95%,r2,adj_r2,p-val,power
spearman,8,0.318788,"[-0.5, 0.84]",0.101626,-0.257724,0.220767,0.20127


***********************

## Pairwise correlations

We will now see how to compute pairwise correlation coefficients across the columns of a pandas DataFrame using the [pairwise_corr](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html#pingouin.pairwise_corr) function.

To do so, we will first load an example dataset in which each row represents one subject and each column represents a score on one of the well-known Big Five personality traits. There are 500 subjects in total.

In [30]:
from pingouin import read_dataset

df = read_dataset('pairwise_corr')

# Remove the 'Subject' column
df.drop(columns='Subject', inplace=True)

# Print the first lines
df.head()

Unnamed: 0,Neuroticism,Extraversion,Openness,Agreeableness,Conscientiousness
0,2.47917,4.20833,3.9375,3.95833,3.45833
1,2.60417,3.1875,3.95833,3.39583,3.22917
2,2.8125,2.89583,3.41667,2.75,3.5
3,2.89583,3.5625,3.52083,3.16667,2.79167
4,3.02083,3.33333,4.02083,3.20833,2.85417


Let's see if the personality dimensions are correlated or not. For that, we will compute the pairwise correlations between all the columns of the DataFrame:

In [31]:
from pingouin import pairwise_corr
pairwise_corr(df)  # Similar to df.pairwise_corr()

Unnamed: 0,X,Y,method,tail,n,r,CI95%,r2,adj_r2,z,p-unc,BF10,power
0,Neuroticism,Extraversion,pearson,two-sided,500,-0.350079,"[-0.42, -0.27]",0.122555,0.119024,-0.365533,7.323047e-16,6765000000000.0,1.0
1,Neuroticism,Openness,pearson,two-sided,500,-0.010383,"[-0.1, 0.08]",0.000108,-0.003916,-0.010383,0.816854,0.058,0.056138
2,Neuroticism,Agreeableness,pearson,two-sided,500,-0.134322,"[-0.22, -0.05]",0.018042,0.014091,-0.135138,0.002615436,5.122,0.854395
3,Neuroticism,Conscientiousness,pearson,two-sided,500,-0.368007,"[-0.44, -0.29]",0.135429,0.13195,-0.386115,1.7589680000000002e-17,264400000000000.0,1.0
4,Extraversion,Openness,pearson,two-sided,500,0.267131,"[0.18, 0.35]",0.071359,0.067622,0.273772,1.287742e-09,5277000.0,0.999983
5,Extraversion,Agreeableness,pearson,two-sided,500,0.054547,"[-0.03, 0.14]",0.002975,-0.001037,0.054601,0.2233908,0.117,0.229853
6,Extraversion,Conscientiousness,pearson,two-sided,500,0.064591,"[-0.02, 0.15]",0.004172,0.000165,0.064681,0.1492461,0.158,0.302974
7,Openness,Agreeableness,pearson,two-sided,500,0.159208,"[0.07, 0.24]",0.025347,0.021425,0.160574,0.0003516781,32.635,0.947714
8,Openness,Conscientiousness,pearson,two-sided,500,-0.013448,"[-0.1, 0.07]",0.000181,-0.003843,-0.013449,0.7641957,0.059,0.060345
9,Agreeableness,Conscientiousness,pearson,two-sided,500,0.15867,"[0.07, 0.24]",0.025176,0.021253,0.160022,0.0003685092,31.243,0.946382


In the example above, we can see that the strongest correlation between personality dimensions is between `Neuroticism` and `Conscientiousness`, as indicated by the correlation coefficient (-0.368), the p-value (1.76e-17) and the Bayes Factor (2.64e14).
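
The `z` and `CI95%` columns are tied to `r` through the Fisher transformation. As a rough check, and assuming the interval is the usual normal approximation on the Fisher z scale with standard error 1/sqrt(n - 3) (a standard construction, not copied from Pingouin's source), the first row of the table can be reproduced with the standard library:

```python
import math
from statistics import NormalDist

r, n = -0.350079, 500  # Neuroticism vs Extraversion, taken from the table above

z = math.atanh(r)                    # Fisher z-transform of r
se = 1 / math.sqrt(n - 3)            # approximate standard error of z
crit = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value (about 1.96)

ci_low = math.tanh(z - crit * se)    # back-transform the bounds to the r scale
ci_high = math.tanh(z + crit * se)

print(round(z, 4), [round(ci_low, 2), round(ci_high, 2)])
```

Because the interval is built on the z scale and then back-transformed, it is not symmetric around `r`, which becomes more visible for small samples or coefficients close to -1 or 1.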

### Non-parametric correlations
If your data do not follow a normal distribution or contain outliers, you may want to use a non-parametric method such as the Spearman rank-correlation.

In the example below, we compute the one-sided Spearman pairwise correlations between a subset of columns. Note that the Bayes Factor is only computed when using the Pearson method and is therefore not present in the table below.

In [32]:
pairwise_corr(data=df, columns=['Neuroticism', 'Extraversion'], tail='one-sided', method='spearman')

Unnamed: 0,X,Y,method,tail,n,r,CI95%,r2,adj_r2,z,p-unc,power
0,Neuroticism,Extraversion,spearman,one-sided,500,-0.325488,"[-0.4, -0.24]",0.105943,0.102345,-0.337774,4.192429e-14,1.0
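
Spearman's coefficient is the Pearson correlation computed on ranks, with tied values receiving the average of their ranks. The following is a minimal standard-library sketch of that idea, applied to the small `x` and `y` lists from the first section. The helpers `average_ranks` and `pearson_r` are illustrative names, not Pingouin functions.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cross / math.sqrt(ss_a * ss_b)


x = [4, 5, 7, 4, 5, 6, 8, 9]
y = [3, 4, 5, 3, 4, 5, 4, 3]

rho = pearson_r(average_ranks(x), average_ranks(y))
print(round(rho, 4))  # agrees with the Spearman r reported for x and y earlier
```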


### Robust correlations
If you believe that your dataset contains outliers, you can use a robust correlation method. There are currently three robust correlation methods implemented in Pingouin, namely the percentage bend correlation ([Wilcox 1994](https://link.springer.com/article/10.1007/BF02294395)), Shepherd's pi correlation ([Schwarzkopf et al. 2012](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3397314/)) and the skipped correlation ([Rousselet and Pernet 2012](https://www.frontiersin.org/articles/10.3389/fnhum.2012.00119/full)).

While the first method is particularly well-suited for univariate outliers (i.e. outliers present in only one variable), the latter two methods work well with multivariate outliers. Note that the skipped correlation requires the scikit-learn package. Learn more in the documentation of the [corr](https://pingouin-stats.org/generated/pingouin.corr.html#pingouin.corr) function.

In [33]:
# Introduce four outliers in variable X
df.loc[[5, 12, 24, 58], 'Neuroticism'] = 18

# Percentage bend correlation
pairwise_corr(data=df, columns=['Neuroticism', 'Extraversion'], method='percbend')

Unnamed: 0,X,Y,method,tail,n,r,CI95%,r2,adj_r2,z,p-unc,power
0,Neuroticism,Extraversion,percbend,two-sided,500,-0.327312,"[-0.4, -0.25]",0.107133,0.10354,-0.339815,5.985071e-14,1.0


In [34]:
# Shepherd's correlation
pairwise_corr(data=df, columns=['Neuroticism', 'Extraversion'], method='shepherd')

Unnamed: 0,X,Y,method,tail,n,outliers,r,CI95%,r2,adj_r2,z,p-unc,power
0,Neuroticism,Extraversion,shepherd,two-sided,500,16.0,-0.318793,"[-0.4, -0.24]",0.101629,0.098014,-0.330303,6.790904e-13,1.0


### Correction for multiple comparisons
Finally, if you are computing a large number of correlation coefficients, you might want to correct the p-values for multiple comparisons. This can be done with the `padjust` argument:

In [35]:
pairwise_corr(df, method='spearman', padjust="holm").round(3)

Unnamed: 0,X,Y,method,tail,n,r,CI95%,r2,adj_r2,z,p-unc,p-corr,p-adjust,power
0,Neuroticism,Extraversion,spearman,two-sided,500,-0.33,"[-0.41, -0.25]",0.109,0.105,-0.343,0.0,0.0,holm,1.0
1,Neuroticism,Openness,spearman,two-sided,500,-0.02,"[-0.11, 0.07]",0.0,-0.004,-0.02,0.662,1.0,holm,0.072
2,Neuroticism,Agreeableness,spearman,two-sided,500,-0.132,"[-0.22, -0.04]",0.017,0.014,-0.133,0.003,0.015,holm,0.843
3,Neuroticism,Conscientiousness,spearman,two-sided,500,-0.365,"[-0.44, -0.29]",0.133,0.129,-0.382,0.0,0.0,holm,1.0
4,Extraversion,Openness,spearman,two-sided,500,0.243,"[0.16, 0.32]",0.059,0.055,0.248,0.0,0.0,holm,1.0
5,Extraversion,Agreeableness,spearman,two-sided,500,0.062,"[-0.03, 0.15]",0.004,-0.0,0.062,0.166,0.666,holm,0.283
6,Extraversion,Conscientiousness,spearman,two-sided,500,0.056,"[-0.03, 0.14]",0.003,-0.001,0.056,0.213,0.666,holm,0.238
7,Openness,Agreeableness,spearman,two-sided,500,0.17,"[0.08, 0.25]",0.029,0.025,0.171,0.0,0.001,holm,0.969
8,Openness,Conscientiousness,spearman,two-sided,500,-0.007,"[-0.09, 0.08]",0.0,-0.004,-0.007,0.88,1.0,holm,0.053
9,Agreeableness,Conscientiousness,spearman,two-sided,500,0.161,"[0.07, 0.24]",0.026,0.022,0.162,0.0,0.002,holm,0.951
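
To make the `p-corr` column less of a black box, here is a minimal sketch of the Holm step-down procedure in plain Python. The function name `holm` and the input p-values are hypothetical and chosen for illustration. This is not Pingouin's implementation, only the textbook algorithm.

```python
def holm(pvals):
    """Holm step-down adjustment of a list of p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1
        candidate = min(1.0, (m - rank) * pvals[idx])
        # Enforce monotonicity: an adjusted p-value never drops below a smaller one
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical uncorrected p-values, for illustration only
p_unc = [0.01, 0.04, 0.03, 0.005]
p_corr = holm(p_unc)
print([round(p, 3) for p in p_corr])
# [0.03, 0.06, 0.06, 0.02]
```

The same logic can be read off the table above: the `Neuroticism`/`Agreeableness` p-value (0.003) is the sixth smallest of ten, so it is multiplied by 5, giving the reported 0.015.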


### Advanced column selection

One of the advantages of the [pairwise_corr](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html#pingouin.pairwise_corr) function is that it allows for almost unlimited flexibility with regard to column indexing. To understand this, we'll first start by adding some fake columns to our dataframe:

In [36]:
import numpy as np
np.random.seed(123)
df['Age'] = np.random.randint(18, 70, size=df.shape[0])
df['BMI'] = np.random.randint(18, 45, size=df.shape[0])
df['Gender'] = np.random.randint(0, 2, size=df.shape[0])
df['Ethnicity'] = 'Caucasian'

df.head()

Unnamed: 0,Neuroticism,Extraversion,Openness,Agreeableness,Conscientiousness,Age,BMI,Gender,Ethnicity
0,2.47917,4.20833,3.9375,3.95833,3.45833,63,28,1,Caucasian
1,2.60417,3.1875,3.95833,3.39583,3.22917,20,36,0,Caucasian
2,2.8125,2.89583,3.41667,2.75,3.5,46,25,0,Caucasian
3,2.89583,3.5625,3.52083,3.16667,2.79167,52,27,0,Caucasian
4,3.02083,3.33333,4.02083,3.20833,2.85417,56,33,0,Caucasian


Now, let's assume that I am interested in looking only at the correlation between `Age` on the one hand and all the other variables on the other hand. This can be done very easily:

In [37]:
pairwise_corr(df, columns='Age')  # Age versus all the other numeric columns

Unnamed: 0,X,Y,method,tail,n,r,CI95%,r2,adj_r2,z,p-unc,BF10,power
0,Age,Neuroticism,pearson,two-sided,500,-0.036287,"[-0.12, 0.05]",0.001317,-0.002702,-0.036303,0.418148,0.078,0.127849
1,Age,Extraversion,pearson,two-sided,500,-0.004429,"[-0.09, 0.08]",2e-05,-0.004004,-0.004429,0.921314,0.056,0.051087
2,Age,Openness,pearson,two-sided,500,0.034727,"[-0.05, 0.12]",0.001206,-0.002813,0.034741,0.438452,0.076,0.12113
3,Age,Agreeableness,pearson,two-sided,500,-0.045198,"[-0.13, 0.04]",0.002043,-0.001973,-0.045229,0.313146,0.093,0.172313
4,Age,Conscientiousness,pearson,two-sided,500,0.058631,"[-0.03, 0.15]",0.003438,-0.000573,0.058698,0.190576,0.132,0.258275
5,Age,BMI,pearson,two-sided,500,-0.05343,"[-0.14, 0.03]",0.002855,-0.001158,-0.053481,0.233026,0.114,0.222418
6,Age,Gender,pearson,two-sided,500,-0.022832,"[-0.11, 0.06]",0.000521,-0.003501,-0.022836,0.610529,0.064,0.080198


Great! What if I am interested in looking at `Age` and `Gender` on the one hand and the personality dimensions on the other hand? That's also very easy:

In [38]:
subj = ['Age', 'Gender']
personality = ['Neuroticism', 'Extraversion', 'Openness', 'Agreeableness', 'Conscientiousness']
pairwise_corr(df, columns=[subj, personality])  # Cartesian product between the first and second list

Unnamed: 0,X,Y,method,tail,n,r,CI95%,r2,adj_r2,z,p-unc,BF10,power
0,Age,Neuroticism,pearson,two-sided,500,-0.036287,"[-0.12, 0.05]",0.001317,-0.002702,-0.036303,0.418148,0.078,0.127849
1,Age,Extraversion,pearson,two-sided,500,-0.004429,"[-0.09, 0.08]",2e-05,-0.004004,-0.004429,0.921314,0.056,0.051087
2,Age,Openness,pearson,two-sided,500,0.034727,"[-0.05, 0.12]",0.001206,-0.002813,0.034741,0.438452,0.076,0.12113
3,Age,Agreeableness,pearson,two-sided,500,-0.045198,"[-0.13, 0.04]",0.002043,-0.001973,-0.045229,0.313146,0.093,0.172313
4,Age,Conscientiousness,pearson,two-sided,500,0.058631,"[-0.03, 0.15]",0.003438,-0.000573,0.058698,0.190576,0.132,0.258275
5,Gender,Neuroticism,pearson,two-sided,500,0.006815,"[-0.08, 0.09]",4.6e-05,-0.003978,0.006815,0.879184,0.057,0.05262
6,Gender,Extraversion,pearson,two-sided,500,0.010541,"[-0.08, 0.1]",0.000111,-0.003913,0.010541,0.814121,0.058,0.056328
7,Gender,Openness,pearson,two-sided,500,-0.046408,"[-0.13, 0.04]",0.002154,-0.001862,-0.046441,0.300354,0.096,0.179144
8,Gender,Agreeableness,pearson,two-sided,500,0.02007,"[-0.07, 0.11]",0.000403,-0.00362,0.020073,0.65437,0.062,0.073245
9,Gender,Conscientiousness,pearson,two-sided,500,0.028945,"[-0.06, 0.12]",0.000838,-0.003183,0.028953,0.518443,0.069,0.098978
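
The pairing behaviour of a flat list versus a list of two lists can be pictured with `itertools`. The sketch below only mimics which pairs end up in the output tables shown so far. It is not how Pingouin builds them internally.

```python
from itertools import combinations, product

personality = ['Neuroticism', 'Extraversion', 'Openness',
               'Agreeableness', 'Conscientiousness']
subj = ['Age', 'Gender']

# A flat list of columns: every unordered pair is tested once
all_pairs = list(combinations(personality, 2))

# A list of two lists: each column of the first is paired with each column of the second
between_pairs = list(product(subj, personality))

print(len(all_pairs), all_pairs[0])
# 10 ('Neuroticism', 'Extraversion')
print(len(between_pairs), between_pairs[5])
# 10 ('Gender', 'Neuroticism')
```

Note that the between-list form never pairs two columns from the same list, which is why `Age` versus `Gender` is absent from the table above.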


Also great... but what if I don't want to write out all the column names, and am just interested in looking at `Age` and `Gender` versus all the other columns in the dataframe?

In [39]:
pairwise_corr(df, columns=[subj, None])

Unnamed: 0,X,Y,method,tail,n,r,CI95%,r2,adj_r2,z,p-unc,BF10,power
0,Age,Neuroticism,pearson,two-sided,500,-0.036287,"[-0.12, 0.05]",0.001317,-0.002702,-0.036303,0.418148,0.078,0.127849
1,Age,Extraversion,pearson,two-sided,500,-0.004429,"[-0.09, 0.08]",2e-05,-0.004004,-0.004429,0.921314,0.056,0.051087
2,Age,Openness,pearson,two-sided,500,0.034727,"[-0.05, 0.12]",0.001206,-0.002813,0.034741,0.438452,0.076,0.12113
3,Age,Agreeableness,pearson,two-sided,500,-0.045198,"[-0.13, 0.04]",0.002043,-0.001973,-0.045229,0.313146,0.093,0.172313
4,Age,Conscientiousness,pearson,two-sided,500,0.058631,"[-0.03, 0.15]",0.003438,-0.000573,0.058698,0.190576,0.132,0.258275
5,Age,BMI,pearson,two-sided,500,-0.05343,"[-0.14, 0.03]",0.002855,-0.001158,-0.053481,0.233026,0.114,0.222418
6,Gender,Neuroticism,pearson,two-sided,500,0.006815,"[-0.08, 0.09]",4.6e-05,-0.003978,0.006815,0.879184,0.057,0.05262
7,Gender,Extraversion,pearson,two-sided,500,0.010541,"[-0.08, 0.1]",0.000111,-0.003913,0.010541,0.814121,0.058,0.056328
8,Gender,Openness,pearson,two-sided,500,-0.046408,"[-0.13, 0.04]",0.002154,-0.001862,-0.046441,0.300354,0.096,0.179144
9,Gender,Agreeableness,pearson,two-sided,500,0.02007,"[-0.07, 0.11]",0.000403,-0.00362,0.020073,0.65437,0.062,0.073245


### MultiIndex columns

The [pairwise_corr](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html#pingouin.pairwise_corr) function also works with two-dimensional columns. To illustrate this, let's create a fake multi-level dataframe:

In [40]:
import pandas as pd
from numpy.random import randint as rdint
np.random.seed(123)

# Create MultiIndex dataframe
columns = pd.MultiIndex.from_tuples([('Behavior', 'Rating'),
                                     ('Behavior', 'ReactionTime'),
                                     ('Physio', 'PupilDilation'),
                                     ('Physio', 'BPM'),
                                     ('Psycho', 'Sleepiness')])

data = pd.DataFrame(dict(Rating=rdint(0, 100, size=10),
                         RT=rdint(200, 500, size=10),
                         PupilDilation=rdint(-5, 5, size=10),
                         HR=rdint(45, 90, size=10),
                         Sleepiness=rdint(1, 7, size=10)))

data.columns = columns
data

Unnamed: 0_level_0,Behavior,Behavior,Physio,Physio,Psycho
Unnamed: 0_level_1,Rating,ReactionTime,PupilDilation,BPM,Sleepiness
0,66,273,-3,66,1
1,92,232,-1,75,4
2,98,424,3,72,3
3,17,311,-5,79,3
4,83,453,2,78,3
5,57,296,4,57,6
6,86,408,-2,85,3
7,97,268,-1,48,5
8,96,202,1,87,4
9,47,239,-4,50,4


To compute the correlation on the combination of all the columns, one can simply use:

In [41]:
pairwise_corr(data)

Unnamed: 0,X,Y,method,tail,n,r,CI95%,r2,adj_r2,z,p-unc,BF10,power
0,"(Behavior, Rating)","(Behavior, ReactionTime)",pearson,two-sided,10,0.134624,"[-0.54, 0.7]",0.018124,-0.262412,0.135446,0.710789,0.411,0.06515
1,"(Behavior, Rating)","(Physio, PupilDilation)",pearson,two-sided,10,0.541574,"[-0.13, 0.87]",0.293302,0.091389,0.60638,0.105902,1.234,0.387566
2,"(Behavior, Rating)","(Physio, BPM)",pearson,two-sided,10,0.159533,"[-0.52, 0.72]",0.025451,-0.252992,0.160907,0.659769,0.422,0.071938
3,"(Behavior, Rating)","(Psycho, Sleepiness)",pearson,two-sided,10,0.075526,"[-0.58, 0.67]",0.005704,-0.27838,0.07567,0.835726,0.394,0.053902
4,"(Behavior, ReactionTime)","(Physio, PupilDilation)",pearson,two-sided,10,0.327401,"[-0.38, 0.79]",0.107191,-0.147897,0.339914,0.355769,0.566,0.15504
5,"(Behavior, ReactionTime)","(Physio, BPM)",pearson,two-sided,10,0.299504,"[-0.41, 0.78]",0.089703,-0.170382,0.308975,0.400509,0.53,0.136152
6,"(Behavior, ReactionTime)","(Psycho, Sleepiness)",pearson,two-sided,10,-0.293512,"[-0.78, 0.41]",0.086149,-0.174951,-0.302405,0.410455,0.524,0.132385
7,"(Physio, PupilDilation)","(Physio, BPM)",pearson,two-sided,10,0.050473,"[-0.6, 0.66]",0.002548,-0.282439,0.050516,0.88987,0.39,0.051095
8,"(Physio, PupilDilation)","(Psycho, Sleepiness)",pearson,two-sided,10,0.424365,"[-0.28, 0.83]",0.180085,-0.054176,0.453003,0.221587,0.753,0.23977
9,"(Physio, BPM)","(Psycho, Sleepiness)",pearson,two-sided,10,-0.41955,"[-0.83, 0.29]",0.176022,-0.0594,-0.447145,0.227435,0.741,0.234806


All the previously mentioned advanced column selection tricks work just as well here. For example, to compute the one versus all:

In [42]:
pairwise_corr(data, columns=('Behavior', 'Rating'))

Unnamed: 0,X,Y,method,tail,n,r,CI95%,r2,adj_r2,z,p-unc,BF10,power
0,"(Behavior, Rating)","(Behavior, ReactionTime)",pearson,two-sided,10,0.134624,"[-0.54, 0.7]",0.018124,-0.262412,0.135446,0.710789,0.411,0.06515
1,"(Behavior, Rating)","(Physio, PupilDilation)",pearson,two-sided,10,0.541574,"[-0.13, 0.87]",0.293302,0.091389,0.60638,0.105902,1.234,0.387566
2,"(Behavior, Rating)","(Physio, BPM)",pearson,two-sided,10,0.159533,"[-0.52, 0.72]",0.025451,-0.252992,0.160907,0.659769,0.422,0.071938
3,"(Behavior, Rating)","(Psycho, Sleepiness)",pearson,two-sided,10,0.075526,"[-0.58, 0.67]",0.005704,-0.27838,0.07567,0.835726,0.394,0.053902


Or between multiple columns:

In [43]:
pairwise_corr(data, columns=[('Behavior', 'ReactionTime'), ('Psycho', 'Sleepiness')])

Unnamed: 0,X,Y,method,tail,n,r,CI95%,r2,adj_r2,z,p-unc,BF10,power
0,"(Behavior, ReactionTime)","(Psycho, Sleepiness)",pearson,two-sided,10,-0.293512,"[-0.78, 0.41]",0.086149,-0.174951,-0.302405,0.410455,0.524,0.132385


In [44]:
pairwise_corr(data, columns=[[('Behavior', 'Rating'), ('Behavior', 'ReactionTime')], None])

Unnamed: 0,X,Y,method,tail,n,r,CI95%,r2,adj_r2,z,p-unc,BF10,power
0,"(Behavior, Rating)","(Physio, PupilDilation)",pearson,two-sided,10,0.541574,"[-0.13, 0.87]",0.293302,0.091389,0.60638,0.105902,1.234,0.387566
1,"(Behavior, Rating)","(Physio, BPM)",pearson,two-sided,10,0.159533,"[-0.52, 0.72]",0.025451,-0.252992,0.160907,0.659769,0.422,0.071938
2,"(Behavior, Rating)","(Psycho, Sleepiness)",pearson,two-sided,10,0.075526,"[-0.58, 0.67]",0.005704,-0.27838,0.07567,0.835726,0.394,0.053902
3,"(Behavior, ReactionTime)","(Physio, PupilDilation)",pearson,two-sided,10,0.327401,"[-0.38, 0.79]",0.107191,-0.147897,0.339914,0.355769,0.566,0.15504
4,"(Behavior, ReactionTime)","(Physio, BPM)",pearson,two-sided,10,0.299504,"[-0.41, 0.78]",0.089703,-0.170382,0.308975,0.400509,0.53,0.136152
5,"(Behavior, ReactionTime)","(Psycho, Sleepiness)",pearson,two-sided,10,-0.293512,"[-0.78, 0.41]",0.086149,-0.174951,-0.302405,0.410455,0.524,0.132385


And finally between levels, using the [pandas.DataFrame.xs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.xs.html) method:

In [45]:
col_behavior = data.xs('Behavior', axis=1, level=0, drop_level=False).columns.to_list()
col_physio = data.xs('Physio', axis=1, level=0, drop_level=False).columns.to_list()

print('All columns in level "Behavior"', col_behavior)
print('All columns in level "Physio"', col_physio)

# pairwise_corr(data, columns=col_behavior + col_physio)  # All possible correlations
pairwise_corr(data, columns=[col_behavior, col_physio])   # Between-list correlations

All columns in level "Behavior" [('Behavior', 'Rating'), ('Behavior', 'ReactionTime')]
All columns in level "Physio" [('Physio', 'PupilDilation'), ('Physio', 'BPM')]


Unnamed: 0,X,Y,method,tail,n,r,CI95%,r2,adj_r2,z,p-unc,BF10,power
0,"(Behavior, Rating)","(Physio, PupilDilation)",pearson,two-sided,10,0.541574,"[-0.13, 0.87]",0.293302,0.091389,0.60638,0.105902,1.234,0.387566
1,"(Behavior, Rating)","(Physio, BPM)",pearson,two-sided,10,0.159533,"[-0.52, 0.72]",0.025451,-0.252992,0.160907,0.659769,0.422,0.071938
2,"(Behavior, ReactionTime)","(Physio, PupilDilation)",pearson,two-sided,10,0.327401,"[-0.38, 0.79]",0.107191,-0.147897,0.339914,0.355769,0.566,0.15504
3,"(Behavior, ReactionTime)","(Physio, BPM)",pearson,two-sided,10,0.299504,"[-0.41, 0.78]",0.089703,-0.170382,0.308975,0.400509,0.53,0.136152


***
## Partial correlation

In some cases, you will want to measure the correlation between two variables whilst controlling for the potential influence of other variables (also known as covariates). This can be done easily using the [partial_corr](https://pingouin-stats.org/generated/pingouin.partial_corr.html#pingouin.partial_corr) function.

In [46]:
from pingouin import partial_corr

# Correlation between extraversion and openness whilst controlling for age:
# Pandas-style: df.partial_corr(x='Extraversion', y='Openness', covar='Age')
partial_corr(data=df, x='Extraversion', y='Openness', covar='Age', method='pearson')

Unnamed: 0,n,r,CI95%,r2,adj_r2,p-val,BF10,power
pearson,500,0.267449,"[0.18, 0.35]",0.071529,0.067793,1.229016e-09,5522000.0,0.999984


In [47]:
# Correlation between extraversion and openness whilst controlling for age and BMI:
partial_corr(data=df, x='Extraversion', y='Openness', covar=['Age', 'BMI'], method='pearson')

Unnamed: 0,n,r,CI95%,r2,adj_r2,p-val,BF10,power
pearson,500,0.265946,"[0.18, 0.35]",0.070727,0.066988,1.531824e-09,4458000.0,0.999981
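
Conceptually, a partial correlation with a single covariate is the Pearson correlation between the residuals of `x` and `y` after each has been regressed on that covariate. The sketch below checks this on hypothetical toy data (not the Big Five dataset) against the closed-form expression based on the three pairwise correlations. The helpers `pearson_r` and `residuals` are illustrative names, and this is not a description of Pingouin's internals.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cross / math.sqrt(ss_a * ss_b)


def residuals(target, covar):
    """Residuals of a simple least-squares regression of target on covar."""
    n = len(target)
    mean_t, mean_c = sum(target) / n, sum(covar) / n
    slope = (sum((c - mean_c) * (t - mean_t) for c, t in zip(covar, target))
             / sum((c - mean_c) ** 2 for c in covar))
    intercept = mean_t - slope * mean_c
    return [t - (intercept + slope * c) for t, c in zip(target, covar)]


# Hypothetical toy data, for illustration only
x = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0, 9.0]
c = [1.0, 2.0, 2.0, 3.0, 5.0, 4.0]

# Route 1: correlate the two sets of residuals
r_resid = pearson_r(residuals(x, c), residuals(y, c))

# Route 2: closed-form expression from the three pairwise correlations
r_xy, r_xc, r_yc = pearson_r(x, y), pearson_r(x, c), pearson_r(y, c)
r_formula = (r_xy - r_xc * r_yc) / math.sqrt((1 - r_xc ** 2) * (1 - r_yc ** 2))

print(math.isclose(r_resid, r_formula, abs_tol=1e-9))  # both routes agree
```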


Cherry on the cake, the [pairwise_corr](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html#pingouin.pairwise_corr) function also supports partial correlation with the `covar` argument!

In [48]:
# Pandas-style: df.pairwise_corr(covar=['Age', 'Gender', 'BMI'], method='spearman')
pairwise_corr(data=df, covar=['Age', 'Gender', 'BMI'], method='spearman')

Unnamed: 0,X,Y,method,covar,tail,n,r,CI95%,r2,adj_r2,z,p-unc,power
0,Neuroticism,Extraversion,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,-0.32399,"[-0.4, -0.24]",0.10497,0.101368,-0.336099,1.104245e-13,1.0
1,Neuroticism,Openness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,-0.010925,"[-0.1, 0.08]",0.000119,-0.003904,-0.010926,0.8074715,0.056802
2,Neuroticism,Agreeableness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,-0.130847,"[-0.22, -0.04]",0.017121,0.013166,-0.131602,0.003377397,0.835604
3,Neuroticism,Conscientiousness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,-0.366116,"[-0.44, -0.29]",0.134041,0.130556,-0.38393,2.6358400000000003e-17,1.0
4,Extraversion,Openness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,0.241304,"[0.16, 0.32]",0.058228,0.054438,0.246158,4.677573e-08,0.999794
5,Extraversion,Agreeableness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,0.064141,"[-0.02, 0.15]",0.004114,0.000106,0.064229,0.1521116,0.299469
6,Extraversion,Conscientiousness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,0.056395,"[-0.03, 0.14]",0.00318,-0.000831,0.056455,0.2080778,0.242477
7,Openness,Agreeableness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,0.173708,"[0.09, 0.26]",0.030174,0.026272,0.175487,9.452982e-05,0.974759
8,Openness,Conscientiousness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,-0.002239,"[-0.09, 0.09]",5e-06,-0.004019,-0.002239,0.9601765,0.050254
9,Agreeableness,Conscientiousness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,0.161829,"[0.08, 0.25]",0.026189,0.02227,0.163264,0.0002795706,0.953823


If you are only interested in the partial correlation matrix (and not the p-values, confidence intervals, etc.), an alternative is to use the pandas.DataFrame.pcorr() method that is implemented in Pingouin. This returns the matrix of pairwise partial correlations, in which each pair of variables is correlated while controlling for all the other variables:

In [49]:
df.pcorr().round(3)

Unnamed: 0,Neuroticism,Extraversion,Openness,Agreeableness,Conscientiousness,Age,BMI,Gender
Neuroticism,1.0,-0.14,0.074,0.057,-0.169,-0.026,0.016,0.014
Extraversion,-0.14,1.0,0.272,0.007,0.046,-0.023,-0.018,0.025
Openness,0.074,0.272,1.0,0.156,-0.043,0.046,-0.056,-0.049
Agreeableness,0.057,0.007,0.156,1.0,0.172,-0.06,-0.001,0.021
Conscientiousness,-0.169,0.046,-0.043,0.172,1.0,0.067,0.043,0.023
Age,-0.026,-0.023,0.046,-0.06,0.067,1.0,-0.052,-0.017
BMI,0.016,-0.018,-0.056,-0.001,0.043,-0.052,1.0,0.067
Gender,0.014,0.025,-0.049,0.021,0.023,-0.017,0.067,1.0


### Semi-partial correlation

With partial correlation, we find the correlation between $x$ and $y$ holding $C$ constant for both $x$ and $y$. Sometimes, however, we want to hold $C$ constant for just $x$ or just $y$. In that case, we compute a semi-partial correlation. While a partial correlation is computed between two residuals, a semi-partial correlation is computed between one residual and another raw (or unresidualized) variable.
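
For a single covariate $C$, the difference between the two shows up in the denominator of the closed-form expressions. These are the standard textbook formulas, stated here for reference rather than taken from Pingouin's source. The notation $r_{xy \cdot C}$ (partial) and $r_{y(x \cdot C)}$ (semi-partial, with $C$ removed from $x$ only) is introduced here for illustration:

$$
r_{xy \cdot C} = \frac{r_{xy} - r_{xC}\, r_{yC}}{\sqrt{\left(1 - r_{xC}^{2}\right)\left(1 - r_{yC}^{2}\right)}},
\qquad
r_{y(x \cdot C)} = \frac{r_{xy} - r_{xC}\, r_{yC}}{\sqrt{1 - r_{xC}^{2}}}
$$

Since the semi-partial denominator is never smaller than the partial one, the semi-partial coefficient is never larger in absolute value than the corresponding partial coefficient.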

In [50]:
# Correlation between extraversion and openness whilst controlling
# extraversion for age:
df.partial_corr(x='Extraversion', y='Openness', x_covar='Age')

Unnamed: 0,n,r,CI95%,r2,adj_r2,p-val,BF10,power
pearson,500,0.267288,"[0.18, 0.35]",0.071443,0.067706,1.258499e-09,5396000.0,0.999984


In [51]:
# Correlation between extraversion and openness whilst controlling extraversion
# for age and openness for BMI and gender:
df.partial_corr(x='Extraversion', y='Openness', x_covar='Age', y_covar=['BMI', 'Gender'])

Unnamed: 0,n,r,CI95%,r2,adj_r2,p-val,BF10,power
pearson,500,0.26638,"[0.18, 0.35]",0.070958,0.06722,1.437726e-09,4741000.0,0.999982


***
## Correlation matrix

If you have a lot of variables, it can be difficult to read the output of the [pairwise_corr](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html#pingouin.pairwise_corr) function. An alternative is to use the [rcorr](https://pingouin-stats.org/generated/pingouin.rcorr.html#pingouin.rcorr) function, which works directly as a Pandas DataFrame method, to obtain a correlation matrix with the r-values on the lower triangle and the p-values on the upper triangle.

In [52]:
df.rcorr()

Unnamed: 0,Neuroticism,Extraversion,Openness,Agreeableness,Conscientiousness,Age,BMI,Gender
Neuroticism,-,**,,,***,,,
Extraversion,-0.131,-,***,,,,,
Openness,0.046,0.267,-,***,,,,
Agreeableness,0.036,0.055,0.159,-,***,,,
Conscientiousness,-0.171,0.065,-0.013,0.159,-,,,
Age,-0.036,-0.004,0.035,-0.045,0.059,-,,
BMI,0.011,-0.033,-0.069,-0.0,0.039,-0.053,-,
Gender,0.007,0.011,-0.046,0.02,0.029,-0.023,0.071,-
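
The stars in the upper triangle encode significance levels. Assuming the conventional thresholds (`***` for p < 0.001, `**` for p < 0.01, `*` for p < 0.05), which you should confirm against the `rcorr` documentation for your Pingouin version, the mapping can be sketched as follows. `p_to_stars` is a hypothetical helper, not a Pingouin function.

```python
def p_to_stars(p, thresholds=((0.001, '***'), (0.01, '**'), (0.05, '*'))):
    """Return the star label of the strictest threshold that p falls below."""
    for cutoff, stars in thresholds:  # thresholds are ordered strictest first
        if p < cutoff:
            return stars
    return ''  # not significant at any of the listed levels


for p in (0.0004, 0.003, 0.03, 0.2):
    print(p, repr(p_to_stars(p)))
```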


In [53]:
# Using Spearman correlation and adjusting the p-values for multiple comparisons
df.rcorr(method='spearman', padjust='holm')

Unnamed: 0,Neuroticism,Extraversion,Openness,Agreeableness,Conscientiousness,Age,BMI,Gender
Neuroticism,-,***,,,***,,,
Extraversion,-0.33,-,***,,,,,
Openness,-0.02,0.243,-,**,,,,
Agreeableness,-0.132,0.062,0.17,-,**,,,
Conscientiousness,-0.365,0.056,-0.007,0.161,-,,,
Age,-0.045,0.003,0.032,-0.048,0.044,-,,
BMI,0.031,-0.054,-0.07,0.017,0.027,-0.053,-,
Gender,0.021,-0.014,-0.043,0.017,0.016,-0.025,0.072,-


In [54]:
# Showing the raw p-values (with 2 decimals) instead of stars
df.rcorr(method='spearman', stars=False, padjust='holm', decimals=2)

Unnamed: 0,Neuroticism,Extraversion,Openness,Agreeableness,Conscientiousness,Age,BMI,Gender
Neuroticism,-,0.00,1.,0.07,0.00,1.,1.,1.
Extraversion,-0.33,-,0.00,1.,1.,1.,1.,1.
Openness,-0.02,0.24,-,0.00,1.,1.,1.,1.
Agreeableness,-0.13,0.06,0.17,-,0.01,1.,1.,1.
Conscientiousness,-0.36,0.06,-0.01,0.16,-,1.,1.,1.
Age,-0.04,0.0,0.03,-0.05,0.04,-,1.,1.
BMI,0.03,-0.05,-0.07,0.02,0.03,-0.05,-,1.
Gender,0.02,-0.01,-0.04,0.02,0.02,-0.02,0.07,-
