In statistics, we often use the Pearson correlation coefficient to measure the linear relationship between two variables. However, sometimes we’re interested in understanding the relationship between two variables while controlling for a third variable.

For example, suppose we want to measure the association between the number of hours a student studies and the final exam score they receive, while controlling for the student’s current grade in the class. In this case, we can use a partial correlation, which measures the linear relationship between hours studied and final exam score after the linear effect of current grade has been removed from both.

In [1]:
import pandas as pd
import numpy as np

data = {'currentGrade':  [82, 88, 75, 74, 93, 97, 83, 90, 90, 80],
        'hours': [4, 3, 6, 5, 4, 5, 8, 7, 4, 6],
        'examScore': [88, 85, 76, 70, 92, 94, 89, 85, 90, 93]
        }

df = pd.DataFrame(data, columns = ["currentGrade", "hours", "examScore"])
df

,currentGrade,hours,examScore
0,82,4,88
1,88,3,85
2,75,6,76
3,74,5,70
4,93,4,92
5,97,5,94
6,83,8,89
7,90,7,85
8,90,4,90
9,80,6,93


To calculate the partial correlation between hours and examScore while controlling for currentGrade, we can use the partial_corr() function from the pingouin package, which uses the following syntax:

* partial_corr(data, x, y, covar)

where:

* data: the pandas DataFrame that holds the variables
* x, y: names of the two columns to correlate
* covar: name of the covariate column (i.e. the variable you’re controlling for); a list of column names can be passed to control for several covariates

In [2]:
# !pip install pingouin


In [3]:
import pingouin as pg

# find partial correlation between hours and exam score while controlling for current grade
pg.partial_corr(data = df, x = "hours", y = "examScore", covar = "currentGrade")

,n,r,CI95%,p-val
pearson,10,0.190626,"[-0.54, 0.76]",0.623228
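
To see where this value comes from, the first-order partial correlation can be rebuilt from the three ordinary Pearson correlations with the textbook formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The sketch below is a hand-rolled check on the same data using only the standard library; the helper names `pearson_r` and `partial_r` are illustrative and are not part of pingouin.

```python
import math

# Same data as the DataFrame above, as plain lists
current_grade = [82, 88, 75, 74, 93, 97, 83, 90, 90, 80]
hours = [4, 3, 6, 5, 4, 5, 8, 7, 4, 6]
exam_score = [88, 85, 76, 70, 92, 94, 89, 85, 90, 93]


def pearson_r(a, b):
    """Ordinary Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / math.sqrt(ss_a * ss_b)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Correlation without controlling for anything, then with currentGrade held fixed
r_zero_order = pearson_r(hours, exam_score)
r_partial = partial_r(hours, exam_score, current_grade)
print(round(r_zero_order, 3), round(r_partial, 3))
```

On this data the zero-order correlation between hours and exam score is slightly negative, while the partial correlation agrees with the 0.191 reported above. With n = 10 and a p-value of about 0.62, neither value is statistically distinguishable from zero.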


In [4]:
# calculate all pairwise partial correlations (each pair controlling for the remaining columns), rounded to three decimal places
df.pcorr().round(3)

,currentGrade,hours,examScore
currentGrade,1.0,-0.311,0.736
hours,-0.311,1.0,0.191
examScore,0.736,0.191,1.0
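
Each off-diagonal entry in this matrix is the partial correlation of one pair of variables controlling for every other column. One standard way to obtain such a matrix is from the inverse of the covariance matrix (the precision matrix) P, using pcorr_ij = -P_ij / sqrt(P_ii * P_jj). The NumPy sketch below applies that identity to the same data; it is meant as an independent check and should not be read as a description of pingouin's internal implementation.

```python
import numpy as np

# Rows are variables, columns are the ten students
values = np.array([
    [82, 88, 75, 74, 93, 97, 83, 90, 90, 80],  # currentGrade
    [4, 3, 6, 5, 4, 5, 8, 7, 4, 6],            # hours
    [88, 85, 76, 70, 92, 94, 89, 85, 90, 93],  # examScore
], dtype=float)

cov = np.cov(values)              # 3x3 sample covariance matrix
precision = np.linalg.inv(cov)    # precision (inverse covariance) matrix

# Scale each entry by the square roots of the matching diagonal terms and flip the sign
scale = np.sqrt(np.diag(precision))
pcorr = -precision / np.outer(scale, scale)
np.fill_diagonal(pcorr, 1.0)      # a variable's partial correlation with itself is 1

print(pcorr.round(3))
```

The rounded result should agree with the `df.pcorr()` output above, including the 0.191 for hours and examScore.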
