# Correlations

## Simple correlation
Computing a correlation between two variables in Pingouin is done using the [corr](https://pingouin-stats.org/generated/pingouin.corr.html#pingouin.corr) function.

In [1]:
import pandas as pd
import pingouin as pg

# Set the default Pandas float precision to 3 decimals
pd.set_option("display.precision", 3)

x = [4, 5, 7, 4, 5, 6, 8, 9]
y = [3, 4, 5, 3, 4, 5, 4, 3]

pg.corr(x, y)

Unnamed: 0,n,r,CI95%,p-val,BF10,power
pearson,8,0.185,"[-0.6, 0.79]",0.661,0.468,0.072


The [corr](https://pingouin-stats.org/generated/pingouin.corr.html#pingouin.corr) function returns a pandas DataFrame with:

1. the sample size `n` (after removal of NaN)
2. the correlation coefficient (`r`)
3. the parametric 95% confidence interval of the coefficient (`CI95%`)
4. the p-value (`p-val`)
5. the Bayes Factor for the alternative hypothesis (`BF10`)
6. the achieved power of the test (`power`, = 1 - type II error rate)
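
To make the first columns less of a black box, here is a minimal sketch that recomputes `r` and `CI95%` for the example above using only the standard library. It assumes the confidence interval is based on the usual Fisher z-transformation; the helper names `pearson_r` and `fisher_ci` are illustrative and are not part of Pingouin.

```python
import math
from statistics import NormalDist

x = [4, 5, 7, 4, 5, 6, 8, 9]
y = [3, 4, 5, 3, 4, 5, 4, 3]


def pearson_r(a, b):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sxy = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    sxx = sum((ai - mean_a) ** 2 for ai in a)
    syy = sum((bi - mean_b) ** 2 for bi in b)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval via the Fisher z-transformation (assumed method)."""
    z = math.atanh(r)                      # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)              # standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


r = pearson_r(x, y)
ci_low, ci_high = fisher_ci(r, len(x))
print(round(r, 3), round(ci_low, 2), round(ci_high, 2))  # 0.185 -0.6 0.79
```

The printed values match the `r` and `CI95%` columns of the table above. The p-value, Bayes Factor and power require more machinery and are best left to `pg.corr`.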

By default, the function returns the two-sided Pearson's correlation coefficient. This can be adjusted using the `alternative` and `method` arguments:

In [2]:
pg.corr(x, y, method='spearman', alternative='greater')

Unnamed: 0,n,r,CI95%,p-val,power
spearman,8,0.319,"[-0.38, 1.0]",0.221,0.201
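
For intuition, the Spearman coefficient is simply the Pearson coefficient computed on ranks, with tied values sharing the mean of their ranks. The sketch below reproduces the `r` value shown above with the standard library only; `average_ranks` and `pearson_r` are illustrative helpers, not Pingouin functions.

```python
import math

x = [4, 5, 7, 4, 5, 6, 8, 9]
y = [3, 4, 5, 3, 4, 5, 4, 3]


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(a, b):
    """Plain Pearson correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sxy = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    sxx = sum((ai - mean_a) ** 2 for ai in a)
    syy = sum((bi - mean_b) ** 2 for bi in b)
    return sxy / math.sqrt(sxx * syy)


rho = pearson_r(average_ranks(x), average_ranks(y))
print(round(rho, 3))  # 0.319
```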


***

## Pairwise correlations

We will now see how to compute pairwise correlation coefficients across the columns of a pandas DataFrame using the [pairwise_corr](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html#pingouin.pairwise_corr) function.

To do so, we will first load an example dataset in which each row represents one subject and each column represents a score on one of the well-known Big Five personality traits. There are 500 subjects in total.

In [3]:
df = pg.read_dataset('pairwise_corr')

# Remove the 'Subject' column
df.drop(columns='Subject', inplace=True)

# Print the first lines
df.head()

Unnamed: 0,Neuroticism,Extraversion,Openness,Agreeableness,Conscientiousness
0,2.479,4.208,3.938,3.958,3.458
1,2.604,3.188,3.958,3.396,3.229
2,2.812,2.896,3.417,2.75,3.5
3,2.896,3.562,3.521,3.167,2.792
4,3.021,3.333,4.021,3.208,2.854


Let's see if the personality dimensions are correlated or not. For that, we will compute the pairwise correlations between all the columns of the DataFrame:

In [4]:
pg.pairwise_corr(df)  # Similar to df.pairwise_corr()

Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,BF10,power
0,Neuroticism,Extraversion,pearson,two-sided,500,-0.35,"[-0.42, -0.27]",7.323e-16,6765000000000.0,1.0
1,Neuroticism,Openness,pearson,two-sided,500,-0.01,"[-0.1, 0.08]",0.8169,0.058,0.056
2,Neuroticism,Agreeableness,pearson,two-sided,500,-0.134,"[-0.22, -0.05]",0.002615,5.122,0.854
3,Neuroticism,Conscientiousness,pearson,two-sided,500,-0.368,"[-0.44, -0.29]",1.759e-17,264400000000000.0,1.0
4,Extraversion,Openness,pearson,two-sided,500,0.267,"[0.18, 0.35]",1.288e-09,5277000.0,1.0
5,Extraversion,Agreeableness,pearson,two-sided,500,0.055,"[-0.03, 0.14]",0.2234,0.117,0.23
6,Extraversion,Conscientiousness,pearson,two-sided,500,0.065,"[-0.02, 0.15]",0.1492,0.158,0.303
7,Openness,Agreeableness,pearson,two-sided,500,0.159,"[0.07, 0.24]",0.0003517,32.635,0.948
8,Openness,Conscientiousness,pearson,two-sided,500,-0.013,"[-0.1, 0.07]",0.7642,0.059,0.06
9,Agreeableness,Conscientiousness,pearson,two-sided,500,0.159,"[0.07, 0.24]",0.0003685,31.243,0.946


In the example above, we can see that the strongest correlation between personality dimensions is between `Neuroticism` and `Conscientiousness`, as indicated by the correlation coefficient (-0.368), the p-value (1.76e-17) and the Bayes Factor (2.64e14).

### Non-parametric correlations
If your data do not follow a normal distribution or contain outliers, you may want to use a non-parametric method such as the Spearman rank-correlation.

In the example below, we compute the two-sided Spearman pairwise correlation between a subset of columns. Note that the Bayes Factor is only computed when using the Pearson method and is therefore not present in the table below.

In [5]:
pg.pairwise_corr(data=df, columns=['Neuroticism', 'Extraversion'], method='spearman')

Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,power
0,Neuroticism,Extraversion,spearman,two-sided,500,-0.325,"[-0.4, -0.24]",8.385e-14,1.0
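
To see why a rank-based method can help when outliers are present, here is a small standard-library sketch on hypothetical toy data (not the personality dataset). A single extreme value flips the sign of the Pearson coefficient, while the Spearman coefficient, computed on ranks, stays positive. The helpers `pearson_r` and `ranks` are illustrative only, and `ranks` assumes there are no ties.

```python
import math


def pearson_r(a, b):
    """Plain Pearson correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sxy = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    sxx = sum((ai - mean_a) ** 2 for ai in a)
    syy = sum((bi - mean_b) ** 2 for bi in b)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks from 1 to n (valid only when all values are distinct)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


# Hypothetical data: a perfect increasing trend, except for one extreme last point
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, -50]

r_pearson = pearson_r(x, y)
r_spearman = pearson_r(ranks(x), ranks(y))
print(round(r_pearson, 3), round(r_spearman, 3))  # -0.391 0.455
```

Ranks limit how far a single point can pull the estimate, but they do not remove its influence entirely, which is where dedicated robust methods come in.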


### Robust correlations
If you believe that your dataset contains outliers, you can use a robust correlation method. You can learn more about these methods in the documentation of the [corr](https://pingouin-stats.org/generated/pingouin.corr.html#pingouin.corr) function.

In [6]:
# Introduce four outliers in variable X
df.loc[[5, 12, 24, 58], 'Neuroticism'] = 18

# Biweight midcorrelation
pg.pairwise_corr(data=df, columns=['Neuroticism', 'Extraversion'], method='bicor')

Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,power
0,Neuroticism,Extraversion,bicor,two-sided,500,-0.343,"[-0.42, -0.26]",2.908e-15,1.0


In [7]:
# Percentage bend correlation
pg.pairwise_corr(data=df, columns=['Neuroticism', 'Extraversion'], method='percbend')

Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,power
0,Neuroticism,Extraversion,percbend,two-sided,500,-0.327,"[-0.4, -0.25]",5.985e-14,1.0


In [8]:
# Shepherd's correlation
pg.pairwise_corr(data=df, columns=['Neuroticism', 'Extraversion'], method='shepherd')

Unnamed: 0,X,Y,method,alternative,n,outliers,r,CI95%,p-unc,power
0,Neuroticism,Extraversion,shepherd,two-sided,500,16.0,-0.319,"[-0.4, -0.24]",6.791e-13,1.0


### Correction for multiple comparisons
Finally, if you are computing a large number of correlation coefficients, you might want to correct the p-values for multiple comparisons. This can be done with the `padjust` argument:

In [9]:
pg.pairwise_corr(df, method='spearman', padjust="holm").round(3)

Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,p-corr,p-adjust,power
0,Neuroticism,Extraversion,spearman,two-sided,500,-0.33,"[-0.41, -0.25]",0.0,0.0,holm,1.0
1,Neuroticism,Openness,spearman,two-sided,500,-0.02,"[-0.11, 0.07]",0.662,1.0,holm,0.072
2,Neuroticism,Agreeableness,spearman,two-sided,500,-0.132,"[-0.22, -0.04]",0.003,0.015,holm,0.843
3,Neuroticism,Conscientiousness,spearman,two-sided,500,-0.365,"[-0.44, -0.29]",0.0,0.0,holm,1.0
4,Extraversion,Openness,spearman,two-sided,500,0.243,"[0.16, 0.32]",0.0,0.0,holm,1.0
5,Extraversion,Agreeableness,spearman,two-sided,500,0.062,"[-0.03, 0.15]",0.166,0.666,holm,0.283
6,Extraversion,Conscientiousness,spearman,two-sided,500,0.056,"[-0.03, 0.14]",0.213,0.666,holm,0.238
7,Openness,Agreeableness,spearman,two-sided,500,0.17,"[0.08, 0.25]",0.0,0.001,holm,0.969
8,Openness,Conscientiousness,spearman,two-sided,500,-0.007,"[-0.09, 0.08]",0.88,1.0,holm,0.053
9,Agreeableness,Conscientiousness,spearman,two-sided,500,0.161,"[0.07, 0.24]",0.0,0.002,holm,0.951
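
As a rough illustration of what the Holm correction does, the sketch below implements the standard step-down procedure in plain Python: the smallest p-value is multiplied by the number of tests `m`, the next one by `m - 1`, and so on, and the results are then forced to be non-decreasing and capped at 1. The function `holm` and the input p-values are hypothetical; this is not Pingouin's code.

```python
def holm(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[idx])  # multiplier shrinks as p grows
        running_max = max(running_max, candidate)      # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


p_unc = [0.01, 0.04, 0.03, 0.005]  # hypothetical uncorrected p-values
print([round(p, 3) for p in holm(p_unc)])  # [0.03, 0.06, 0.06, 0.02]
```

The table above is consistent with this rule: with 10 tests, the sixth-smallest uncorrected p-value (0.003, `Neuroticism` vs `Agreeableness`) is multiplied by 10 - 5 = 5, which gives the reported `p-corr` of 0.015 up to rounding.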


### Advanced columns selection

One of the advantages of the [pairwise_corr](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html#pingouin.pairwise_corr) function is that it offers a lot of flexibility with regard to column selection. To understand this, we'll start by adding some fake columns to our dataframe:

In [10]:
import numpy as np
np.random.seed(123)
df['Age'] = np.random.randint(18, 70, size=df.shape[0])
df['BMI'] = np.random.randint(18, 45, size=df.shape[0])
df['Gender'] = np.random.randint(0, 2, size=df.shape[0])
df['Ethnicity'] = 'Caucasian'

df.head()

Unnamed: 0,Neuroticism,Extraversion,Openness,Agreeableness,Conscientiousness,Age,BMI,Gender,Ethnicity
0,2.479,4.208,3.938,3.958,3.458,63,28,1,Caucasian
1,2.604,3.188,3.958,3.396,3.229,20,36,0,Caucasian
2,2.812,2.896,3.417,2.75,3.5,46,25,0,Caucasian
3,2.896,3.562,3.521,3.167,2.792,52,27,0,Caucasian
4,3.021,3.333,4.021,3.208,2.854,56,33,0,Caucasian


Now, let's assume that I am interested in looking only at the correlation between `Age` on the one hand and all the other variables on the other hand. This can be done very easily:

In [11]:
pg.pairwise_corr(df, columns='Age')  # Age versus all the other numeric columns

Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,BF10,power
0,Age,Neuroticism,pearson,two-sided,500,-0.036,"[-0.12, 0.05]",0.418,0.078,0.128
1,Age,Extraversion,pearson,two-sided,500,-0.004,"[-0.09, 0.08]",0.921,0.056,0.051
2,Age,Openness,pearson,two-sided,500,0.035,"[-0.05, 0.12]",0.438,0.076,0.121
3,Age,Agreeableness,pearson,two-sided,500,-0.045,"[-0.13, 0.04]",0.313,0.093,0.172
4,Age,Conscientiousness,pearson,two-sided,500,0.059,"[-0.03, 0.15]",0.191,0.132,0.258
5,Age,BMI,pearson,two-sided,500,-0.053,"[-0.14, 0.03]",0.233,0.114,0.222
6,Age,Gender,pearson,two-sided,500,-0.023,"[-0.11, 0.06]",0.611,0.064,0.08


Great! What about if I am interested in looking at `Age` and `Gender` on the one hand and the personality dimensions on the other hand? That's also very easy:

In [12]:
subj = ['Age', 'Gender']
personality = ['Neuroticism', 'Extraversion', 'Openness', 'Agreeableness', 'Conscientiousness']
pg.pairwise_corr(df, columns=[subj, personality])  # Cartesian product between the first and second list

Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,BF10,power
0,Age,Neuroticism,pearson,two-sided,500,-0.036,"[-0.12, 0.05]",0.418,0.078,0.128
1,Age,Extraversion,pearson,two-sided,500,-0.004,"[-0.09, 0.08]",0.921,0.056,0.051
2,Age,Openness,pearson,two-sided,500,0.035,"[-0.05, 0.12]",0.438,0.076,0.121
3,Age,Agreeableness,pearson,two-sided,500,-0.045,"[-0.13, 0.04]",0.313,0.093,0.172
4,Age,Conscientiousness,pearson,two-sided,500,0.059,"[-0.03, 0.15]",0.191,0.132,0.258
5,Gender,Neuroticism,pearson,two-sided,500,0.007,"[-0.08, 0.09]",0.879,0.057,0.053
6,Gender,Extraversion,pearson,two-sided,500,0.011,"[-0.08, 0.1]",0.814,0.058,0.056
7,Gender,Openness,pearson,two-sided,500,-0.046,"[-0.13, 0.04]",0.3,0.096,0.179
8,Gender,Agreeableness,pearson,two-sided,500,0.02,"[-0.07, 0.11]",0.654,0.062,0.073
9,Gender,Conscientiousness,pearson,two-sided,500,0.029,"[-0.06, 0.12]",0.518,0.069,0.099


Also great... but what if I don't want to write out all the column names and am only interested in looking at `Age` and `Gender` versus all the other columns in the dataframe?

In [13]:
pg.pairwise_corr(df, columns=[subj, None])

Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,BF10,power
0,Age,Neuroticism,pearson,two-sided,500,-0.036,"[-0.12, 0.05]",0.418,0.078,0.128
1,Age,Extraversion,pearson,two-sided,500,-0.004,"[-0.09, 0.08]",0.921,0.056,0.051
2,Age,Openness,pearson,two-sided,500,0.035,"[-0.05, 0.12]",0.438,0.076,0.121
3,Age,Agreeableness,pearson,two-sided,500,-0.045,"[-0.13, 0.04]",0.313,0.093,0.172
4,Age,Conscientiousness,pearson,two-sided,500,0.059,"[-0.03, 0.15]",0.191,0.132,0.258
5,Age,BMI,pearson,two-sided,500,-0.053,"[-0.14, 0.03]",0.233,0.114,0.222
6,Gender,Neuroticism,pearson,two-sided,500,0.007,"[-0.08, 0.09]",0.879,0.057,0.053
7,Gender,Extraversion,pearson,two-sided,500,0.011,"[-0.08, 0.1]",0.814,0.058,0.056
8,Gender,Openness,pearson,two-sided,500,-0.046,"[-0.13, 0.04]",0.3,0.096,0.179
9,Gender,Agreeableness,pearson,two-sided,500,0.02,"[-0.07, 0.11]",0.654,0.062,0.073
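
The selection rules used in this section can be summarised with a small sketch built on `itertools`. The function `expand_pairs` is hypothetical: it only mimics how the `columns` argument appears to behave in the examples above (all pairs, one versus all, list versus list, and `None` standing for the remaining columns) and is not Pingouin's implementation.

```python
from itertools import combinations, product


def expand_pairs(columns, all_columns):
    """Return the (X, Y) pairs implied by a `columns` specification (illustrative)."""
    if columns is None:
        # No selection: every unordered pair of columns
        return list(combinations(all_columns, 2))
    if isinstance(columns, str):
        # One column name: that column versus all the others
        return [(columns, c) for c in all_columns if c != columns]
    if len(columns) == 2 and isinstance(columns[0], list):
        first, second = columns
        if second is None:
            # Assumed meaning of None: every column not in the first list
            second = [c for c in all_columns if c not in first]
        # Cartesian product between the two lists
        return list(product(first, second))
    # Flat list of names: every unordered pair within the list
    return list(combinations(columns, 2))


personality = ['Neuroticism', 'Extraversion', 'Openness',
               'Agreeableness', 'Conscientiousness']
numeric = personality + ['Age', 'BMI', 'Gender']
subj = ['Age', 'Gender']

print(len(expand_pairs(None, personality)))             # 10 pairs among the Big Five
print(len(expand_pairs('Age', numeric)))                # 7 pairs: Age versus the rest
print(len(expand_pairs([subj, personality], numeric)))  # 10 pairs: 2 x 5 product
```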


### MultiIndex columns

The [pairwise_corr](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html#pingouin.pairwise_corr) function also works with multi-level (MultiIndex) columns. To illustrate this, let's create a fake multi-level dataframe:

In [14]:
from numpy.random import randint as rdint
np.random.seed(123)

# Create MultiIndex dataframe
columns = pd.MultiIndex.from_tuples([('Behavior', 'Rating'),
                                     ('Behavior', 'ReactionTime'),
                                     ('Physio', 'PupilDilation'),
                                     ('Physio', 'BPM'),
                                     ('Psycho', 'Sleepiness')])

# The dict keys are placeholders: the column names are replaced by the MultiIndex below
data = pd.DataFrame(dict(Rating=rdint(0, 100, size=10),
                         ReactionTime=rdint(200, 500, size=10),
                         PupilDilation=rdint(-5, 5, size=10),
                         BPM=rdint(45, 90, size=10),
                         Sleepiness=rdint(1, 7, size=10)))

data.columns = columns
data

Unnamed: 0_level_0,Behavior,Behavior,Physio,Physio,Psycho
Unnamed: 0_level_1,Rating,ReactionTime,PupilDilation,BPM,Sleepiness
0,66,273,-3,66,1
1,92,232,-1,75,4
2,98,424,3,72,3
3,17,311,-5,79,3
4,83,453,2,78,3
5,57,296,4,57,6
6,86,408,-2,85,3
7,97,268,-1,48,5
8,96,202,1,87,4
9,47,239,-4,50,4


To compute the correlation on the combination of all the columns, one can simply use:

In [15]:
pg.pairwise_corr(data)

Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,BF10,power
0,"(Behavior, Rating)","(Behavior, ReactionTime)",pearson,two-sided,10,0.135,"[-0.54, 0.7]",0.711,0.411,0.065
1,"(Behavior, Rating)","(Physio, PupilDilation)",pearson,two-sided,10,0.542,"[-0.13, 0.87]",0.106,1.234,0.388
2,"(Behavior, Rating)","(Physio, BPM)",pearson,two-sided,10,0.16,"[-0.52, 0.72]",0.66,0.422,0.072
3,"(Behavior, Rating)","(Psycho, Sleepiness)",pearson,two-sided,10,0.076,"[-0.58, 0.67]",0.836,0.394,0.054
4,"(Behavior, ReactionTime)","(Physio, PupilDilation)",pearson,two-sided,10,0.327,"[-0.38, 0.79]",0.356,0.566,0.155
5,"(Behavior, ReactionTime)","(Physio, BPM)",pearson,two-sided,10,0.3,"[-0.41, 0.78]",0.401,0.53,0.136
6,"(Behavior, ReactionTime)","(Psycho, Sleepiness)",pearson,two-sided,10,-0.294,"[-0.78, 0.41]",0.41,0.524,0.132
7,"(Physio, PupilDilation)","(Physio, BPM)",pearson,two-sided,10,0.05,"[-0.6, 0.66]",0.89,0.39,0.051
8,"(Physio, PupilDilation)","(Psycho, Sleepiness)",pearson,two-sided,10,0.424,"[-0.28, 0.83]",0.222,0.753,0.24
9,"(Physio, BPM)","(Psycho, Sleepiness)",pearson,two-sided,10,-0.42,"[-0.83, 0.29]",0.227,0.741,0.235


All the previously mentioned advanced column selection tricks work just as well here. For example, to compute the one versus all:

In [16]:
pg.pairwise_corr(data, columns=('Behavior', 'Rating'))

Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,BF10,power
0,"(Behavior, Rating)","(Behavior, ReactionTime)",pearson,two-sided,10,0.135,"[-0.54, 0.7]",0.711,0.411,0.065
1,"(Behavior, Rating)","(Physio, PupilDilation)",pearson,two-sided,10,0.542,"[-0.13, 0.87]",0.106,1.234,0.388
2,"(Behavior, Rating)","(Physio, BPM)",pearson,two-sided,10,0.16,"[-0.52, 0.72]",0.66,0.422,0.072
3,"(Behavior, Rating)","(Psycho, Sleepiness)",pearson,two-sided,10,0.076,"[-0.58, 0.67]",0.836,0.394,0.054


Or between multiple columns:

In [17]:
pg.pairwise_corr(data, columns=[('Behavior', 'ReactionTime'), ('Psycho', 'Sleepiness')])

Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,BF10,power
0,"(Behavior, ReactionTime)","(Psycho, Sleepiness)",pearson,two-sided,10,-0.294,"[-0.78, 0.41]",0.41,0.524,0.132


In [18]:
pg.pairwise_corr(data, columns=[[('Behavior', 'Rating'), ('Behavior', 'ReactionTime')], None])

Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,BF10,power
0,"(Behavior, Rating)","(Physio, PupilDilation)",pearson,two-sided,10,0.542,"[-0.13, 0.87]",0.106,1.234,0.388
1,"(Behavior, Rating)","(Physio, BPM)",pearson,two-sided,10,0.16,"[-0.52, 0.72]",0.66,0.422,0.072
2,"(Behavior, Rating)","(Psycho, Sleepiness)",pearson,two-sided,10,0.076,"[-0.58, 0.67]",0.836,0.394,0.054
3,"(Behavior, ReactionTime)","(Physio, PupilDilation)",pearson,two-sided,10,0.327,"[-0.38, 0.79]",0.356,0.566,0.155
4,"(Behavior, ReactionTime)","(Physio, BPM)",pearson,two-sided,10,0.3,"[-0.41, 0.78]",0.401,0.53,0.136
5,"(Behavior, ReactionTime)","(Psycho, Sleepiness)",pearson,two-sided,10,-0.294,"[-0.78, 0.41]",0.41,0.524,0.132


And finally between levels, using the [pandas.DataFrame.xs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.xs.html) method:

In [19]:
col_behavior = data.xs('Behavior', axis=1, level=0, drop_level=False).columns.to_list()
col_physio = data.xs('Physio', axis=1, level=0, drop_level=False).columns.to_list()

print('All columns in level "Behavior"', col_behavior)
print('All columns in level "Physio"', col_physio)

# pg.pairwise_corr(data, columns=col_behavior + col_physio)  # All possible correlations
pg.pairwise_corr(data, columns=[col_behavior, col_physio])   # Between-list correlations

All columns in level "Behavior" [('Behavior', 'Rating'), ('Behavior', 'ReactionTime')]
All columns in level "Physio" [('Physio', 'PupilDilation'), ('Physio', 'BPM')]


Unnamed: 0,X,Y,method,alternative,n,r,CI95%,p-unc,BF10,power
0,"(Behavior, Rating)","(Physio, PupilDilation)",pearson,two-sided,10,0.542,"[-0.13, 0.87]",0.106,1.234,0.388
1,"(Behavior, Rating)","(Physio, BPM)",pearson,two-sided,10,0.16,"[-0.52, 0.72]",0.66,0.422,0.072
2,"(Behavior, ReactionTime)","(Physio, PupilDilation)",pearson,two-sided,10,0.327,"[-0.38, 0.79]",0.356,0.566,0.155
3,"(Behavior, ReactionTime)","(Physio, BPM)",pearson,two-sided,10,0.3,"[-0.41, 0.78]",0.401,0.53,0.136
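
Since MultiIndex column labels are plain tuples, the same per-level lists can also be built by filtering on the first element of each tuple. The sketch below works on a hand-written list of tuples that mirrors the columns above; with the real dataframe, one would presumably iterate over `data.columns` instead. The helper `columns_in_level` is hypothetical.

```python
# Column labels written out by hand to keep the example self-contained
columns = [('Behavior', 'Rating'),
           ('Behavior', 'ReactionTime'),
           ('Physio', 'PupilDilation'),
           ('Physio', 'BPM'),
           ('Psycho', 'Sleepiness')]


def columns_in_level(columns, name, level=0):
    """Keep the column tuples whose entry at position `level` equals `name`."""
    return [col for col in columns if col[level] == name]


col_behavior = columns_in_level(columns, 'Behavior')
col_physio = columns_in_level(columns, 'Physio')
print(col_behavior)  # [('Behavior', 'Rating'), ('Behavior', 'ReactionTime')]
print(col_physio)    # [('Physio', 'PupilDilation'), ('Physio', 'BPM')]
```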


***
## Partial correlation

In some cases, you will want to measure the correlation between two variables whilst controlling for the potential influence of other variables (also known as covariates). This can be done easily using the [partial_corr](https://pingouin-stats.org/generated/pingouin.partial_corr.html#pingouin.partial_corr) function.

In [20]:
# Correlation between extraversion and openness whilst controlling for age:
# Pandas-style: df.partial_corr(x='Extraversion', y='Openness', covar='Age')
pg.partial_corr(data=df, x='Extraversion', y='Openness', covar='Age', method='pearson')

Unnamed: 0,n,r,CI95%,p-val
pearson,500,0.267,"[0.18, 0.35]",1.277e-09


In [21]:
# Correlation between extraversion and openness whilst controlling for age and BMI:
pg.partial_corr(data=df, x='Extraversion', y='Openness', covar=['Age', 'BMI'], method='pearson')

Unnamed: 0,n,r,CI95%,p-val
pearson,500,0.266,"[0.18, 0.35]",1.652e-09
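
Conceptually, a partial correlation with one covariate is the correlation between what is left of `x` and `y` once the covariate has been regressed out of both. The standard-library sketch below illustrates this on hypothetical toy data and checks the result against the classical first-order formula. The helper names are illustrative, and Pingouin's own implementation may use a different (equivalent) computation.

```python
import math


def pearson_r(a, b):
    """Plain Pearson correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sxy = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    sxx = sum((ai - mean_a) ** 2 for ai in a)
    syy = sum((bi - mean_b) ** 2 for bi in b)
    return sxy / math.sqrt(sxx * syy)


def residuals(target, covariate):
    """Residuals of a simple least-squares regression of target on covariate."""
    n = len(target)
    mean_t, mean_c = sum(target) / n, sum(covariate) / n
    slope = (sum((c - mean_c) * (t - mean_t) for t, c in zip(target, covariate))
             / sum((c - mean_c) ** 2 for c in covariate))
    return [(t - mean_t) - slope * (c - mean_c) for t, c in zip(target, covariate)]


# Hypothetical toy data: z is the covariate
x = [2, 4, 5, 7, 8, 10]
y = [1, 3, 2, 6, 5, 9]
z = [1, 2, 3, 4, 5, 6]

# Partial correlation: correlate the two sets of residuals
r_resid = pearson_r(residuals(x, z), residuals(y, z))

# Same quantity from the three pairwise correlations
r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
r_formula = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(math.isclose(r_resid, r_formula, abs_tol=1e-9))  # True
```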


Cherry on the cake, the [pairwise_corr](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html#pingouin.pairwise_corr) function also supports partial correlation with the `covar` argument!

In [22]:
# Pandas-style: df.pairwise_corr(covar=['Age', 'Gender', 'BMI'], method='spearman')
pg.pairwise_corr(data=df, covar=['Age', 'Gender', 'BMI'], method='spearman')

Unnamed: 0,X,Y,method,covar,alternative,n,r,CI95%,p-unc
0,Neuroticism,Extraversion,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,-0.329,"[-0.41, -0.25]",5.247e-14
1,Neuroticism,Openness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,-0.016,"[-0.1, 0.07]",0.7287
2,Neuroticism,Agreeableness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,-0.135,"[-0.22, -0.05]",0.002495
3,Neuroticism,Conscientiousness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,-0.365,"[-0.44, -0.29]",4.28e-17
4,Extraversion,Openness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,0.241,"[0.16, 0.32]",5.7e-08
5,Extraversion,Agreeableness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,0.063,"[-0.02, 0.15]",0.1591
6,Extraversion,Conscientiousness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,0.058,"[-0.03, 0.14]",0.2001
7,Openness,Agreeableness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,0.174,"[0.09, 0.26]",9.949e-05
8,Openness,Conscientiousness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,-0.006,"[-0.09, 0.08]",0.8999
9,Agreeableness,Conscientiousness,spearman,"['Age', 'Gender', 'BMI']",two-sided,500,0.163,"[0.08, 0.25]",0.0002718


If you are only interested in the partial correlation matrix (and not the p-values, CI, etc.), an alternative is to use the [pcorr](https://pingouin-stats.org/generated/pingouin.pcorr.html#pingouin.pcorr) method, which Pingouin adds to pandas DataFrames. It returns the matrix of partial correlations, in which each pair of variables is correlated while controlling for all the other variables:

In [23]:
df.pcorr().round(3)

Unnamed: 0,Neuroticism,Extraversion,Openness,Agreeableness,Conscientiousness,Age,BMI,Gender
Neuroticism,1.0,-0.14,0.074,0.057,-0.169,-0.026,0.016,0.014
Extraversion,-0.14,1.0,0.272,0.007,0.046,-0.023,-0.018,0.025
Openness,0.074,0.272,1.0,0.156,-0.043,0.046,-0.056,-0.049
Agreeableness,0.057,0.007,0.156,1.0,0.172,-0.06,-0.001,0.021
Conscientiousness,-0.169,0.046,-0.043,0.172,1.0,0.067,0.043,0.023
Age,-0.026,-0.023,0.046,-0.06,0.067,1.0,-0.052,-0.017
BMI,0.016,-0.018,-0.056,-0.001,0.043,-0.052,1.0,0.067
Gender,0.014,0.025,-0.049,0.021,0.023,-0.017,0.067,1.0
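
A common way to obtain such a matrix in one go is through the inverse of the correlation (or covariance) matrix: if `P` is that inverse, the partial correlation between variables `i` and `j` given all the others is `-P[i][j] / sqrt(P[i][i] * P[j][j])`. The sketch below illustrates this identity in plain Python on a hypothetical 3 x 3 correlation matrix. The helpers `invert` and `partial_corr_matrix` are illustrative, and it is an assumption that Pingouin's `pcorr` follows the same route.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I]
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]  # right half now holds the inverse


def partial_corr_matrix(corr):
    """Partial correlations of each pair, controlling for all other variables."""
    prec = invert(corr)
    n = len(corr)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]


# Hypothetical correlation matrix for three variables (x, y, z)
corr = [[1.0, 0.5, 0.3],
        [0.5, 1.0, 0.4],
        [0.3, 0.4, 1.0]]

pc = partial_corr_matrix(corr)

# With three variables, this must match the first-order formula for r_xy given z
expected_xy = (0.5 - 0.3 * 0.4) / math.sqrt((1 - 0.3 ** 2) * (1 - 0.4 ** 2))
print(round(pc[0][1], 3), round(expected_xy, 3))  # 0.435 0.435
```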


### Semi-partial correlation

With partial correlation, we find the correlation between $x$ and $y$ holding $C$ constant for both $x$ and $y$. Sometimes, however, we want to hold $C$ constant for just $x$ or just $y$. In that case, we compute a semi-partial correlation. While a partial correlation is computed between two residuals, a semi-partial correlation is computed between one residual and another raw (or unresidualized) variable.

In [24]:
# Correlation between extraversion and openness whilst controlling
# Extraversion for age:
df.partial_corr(x='Extraversion', y='Openness', x_covar='Age')

Unnamed: 0,n,r,CI95%,p-val
pearson,500,0.267,"[0.18, 0.35]",1.307e-09
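
The difference with a full partial correlation is easy to see in code: only one of the two variables is residualized. The sketch below, on hypothetical toy data and with illustrative helper names, removes the covariate `z` from `x` only and correlates the residual with the raw `y`, then checks the result against the classical semi-partial formula.

```python
import math


def pearson_r(a, b):
    """Plain Pearson correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sxy = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    sxx = sum((ai - mean_a) ** 2 for ai in a)
    syy = sum((bi - mean_b) ** 2 for bi in b)
    return sxy / math.sqrt(sxx * syy)


def residuals(target, covariate):
    """Residuals of a simple least-squares regression of target on covariate."""
    n = len(target)
    mean_t, mean_c = sum(target) / n, sum(covariate) / n
    slope = (sum((c - mean_c) * (t - mean_t) for t, c in zip(target, covariate))
             / sum((c - mean_c) ** 2 for c in covariate))
    return [(t - mean_t) - slope * (c - mean_c) for t, c in zip(target, covariate)]


# Hypothetical toy data: z is the covariate, removed from x only
x = [2, 4, 5, 7, 8, 10]
y = [1, 3, 2, 6, 5, 9]
z = [1, 2, 3, 4, 5, 6]

# Semi-partial correlation: residualized x against raw y
r_semi = pearson_r(residuals(x, z), y)

# Same quantity from the three pairwise correlations
r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
r_formula = (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz ** 2)

print(math.isclose(r_semi, r_formula, abs_tol=1e-9))  # True
```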


***
## Correlation matrix

If you have a lot of variables, it can be difficult to read the output of the [pairwise_corr](https://pingouin-stats.org/generated/pingouin.pairwise_corr.html#pingouin.pairwise_corr) function. An alternative is to use the [rcorr](https://pingouin-stats.org/generated/pingouin.rcorr.html#pingouin.rcorr) function, which works directly as a Pandas DataFrame method, to obtain a correlation matrix with the r-values on the lower triangle and the p-values on the upper triangle.

In [25]:
df.rcorr()

Unnamed: 0,Neuroticism,Extraversion,Openness,Agreeableness,Conscientiousness,Age,BMI,Gender
Neuroticism,-,**,,,***,,,
Extraversion,-0.131,-,***,,,,,
Openness,0.046,0.267,-,***,,,,
Agreeableness,0.036,0.055,0.159,-,***,,,
Conscientiousness,-0.171,0.065,-0.013,0.159,-,,,
Age,-0.036,-0.004,0.035,-0.045,0.059,-,,
BMI,0.011,-0.033,-0.069,-0.0,0.039,-0.053,-,
Gender,0.007,0.011,-0.046,0.02,0.029,-0.023,0.071,-
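
The stars in the upper triangle encode significance levels. A minimal sketch of such a mapping is shown below, assuming the conventional thresholds of 0.05, 0.01 and 0.001 and a strict comparison; the function `significance_stars` and the example p-values are hypothetical, so check the `rcorr` documentation for the exact defaults.

```python
def significance_stars(p, thresholds=((0.001, '***'), (0.01, '**'), (0.05, '*'))):
    """Map a p-value to a star label, using assumed conventional cut-offs."""
    for cutoff, label in thresholds:  # thresholds are ordered from strictest to loosest
        if p < cutoff:
            return label
    return ''  # not significant at any of the thresholds


# Hypothetical p-values
for p in (0.0004, 0.003, 0.03, 0.2):
    print(p, repr(significance_stars(p)))
```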


In [26]:
# Using Spearman correlation and adjusting the p-values for multiple comparisons
df.rcorr(method='spearman', padjust='holm')

Unnamed: 0,Neuroticism,Extraversion,Openness,Agreeableness,Conscientiousness,Age,BMI,Gender
Neuroticism,-,***,,,***,,,
Extraversion,-0.33,-,***,,,,,
Openness,-0.02,0.243,-,**,,,,
Agreeableness,-0.132,0.062,0.17,-,**,,,
Conscientiousness,-0.365,0.056,-0.007,0.161,-,,,
Age,-0.045,0.003,0.032,-0.048,0.044,-,,
BMI,0.031,-0.054,-0.07,0.017,0.027,-0.053,-,
Gender,0.021,-0.014,-0.043,0.017,0.016,-0.025,0.072,-


In [27]:
# Showing the raw p-values (with 2 decimals) instead of stars
df.rcorr(method='spearman', stars=False, padjust='holm', decimals=2)

Unnamed: 0,Neuroticism,Extraversion,Openness,Agreeableness,Conscientiousness,Age,BMI,Gender
Neuroticism,-,0.00,1.,0.07,0.00,1.,1.,1.
Extraversion,-0.33,-,0.00,1.,1.,1.,1.,1.
Openness,-0.02,0.24,-,0.00,1.,1.,1.,1.
Agreeableness,-0.13,0.06,0.17,-,0.01,1.,1.,1.
Conscientiousness,-0.36,0.06,-0.01,0.16,-,1.,1.,1.
Age,-0.04,0.0,0.03,-0.05,0.04,-,1.,1.
BMI,0.03,-0.05,-0.07,0.02,0.03,-0.05,-,1.
Gender,0.02,-0.01,-0.04,0.02,0.02,-0.02,0.07,-
