# Effect Size Tests (Using Pingouin)

Hypothesis tests report how likely the observed results are under the assumptions of the test. Effect size measures complement hypothesis tests by quantifying how large an effect is.

An effect size estimates the magnitude of an effect as it would be expected to occur in a population. Standardized effect sizes express that magnitude in unit-free terms, so results can be compared across populations and experiments.

## Compute Effect Size

#### Parameters

<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th style="text-align: left">Function</th><th style="text-align: left">Parameter</th><th style="text-align: left">Type</th><th style="text-align: left">Description</th></tr></thead><tbody>
 <tr><td style="text-align: left">compute_effsize</td><td style="text-align: left">x</td><td style="text-align: left">array or list</td><td style="text-align: left">first set of observations</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">y</td><td style="text-align: left">array or list</td><td style="text-align: left">second set of observations</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">paired</td><td style="text-align: left">boolean</td><td style="text-align: left">if True, uses Cohen d-avg formula to correct for repeated measurements</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">eftype</td><td style="text-align: left">string</td><td style="text-align: left">desired output effect size ('none', 'cohen', 'hedges', 'r', 'eta-square', 'odds-ratio', 'AUC', 'CLES')</td></tr>
 <tr><td style="text-align: left">returns:</td><td style="text-align: left">ef</td><td style="text-align: left">float</td><td style="text-align: left">effect size</td></tr>
</tbody></table>

#### Cohen's d from Independent Samples

<p>If x and y are independent, Cohen's d is the difference in means divided by the pooled standard deviation:</p>

<p>$$d = \frac{ \bar{X}-\bar{Y} }{ \sqrt{ \frac{ (n_1-1)\sigma_1^2 + (n_2-1)\sigma_2^2 }{ n_1 + n_2 - 2 } } }$$</p>

<p>Cohen's d is a biased estimate of the population effect size, especially for small samples (e.g., $n \le 20$), where Hedges g is preferable:</p>

<p>$$g = d \left( 1 - \frac{ 3 }{ 4(n_1 + n_2) - 9 } \right)$$</p>

In [3]:
import numpy as np
import pingouin as pg
x = [1, 2, 3, 4]
y = [3, 4, 5, 6, 7]
pg.compute_effsize(x, y, paired=False, eftype='cohen')

-1.707825127659933

Reversing the order of x and y flips the sign of Cohen's d:

In [4]:
pg.compute_effsize(y, x, paired=False, eftype='cohen')

1.707825127659933
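
The pooled-standard-deviation definition of Cohen's d and the Hedges correction can be checked by hand. The following is a minimal sketch using only the Python standard library; the helper names `cohen_d` and `hedges_g` are illustrative and are not part of Pingouin.

```python
import math
import statistics


def cohen_d(x, y):
    """Cohen's d for independent samples, using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    # statistics.variance returns the unbiased (n - 1) sample variance
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)


def hedges_g(d, n1, n2):
    """Small-sample bias correction applied to Cohen's d."""
    return d * (1 - 3 / (4 * (n1 + n2) - 9))


x = [1, 2, 3, 4]
y = [3, 4, 5, 6, 7]
d = cohen_d(x, y)
g = hedges_g(d, len(x), len(y))
print(d, g)
```

With these inputs, `d` agrees with the value returned by `pg.compute_effsize` above up to floating-point rounding, and `g` shrinks it toward zero by the factor 1 - 3/27.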

#### Hedges g from Paired Samples

In [6]:
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 5, 7, 9, 11, 13]
pg.compute_effsize(x, y, paired=True, eftype='hedges')

-0.8222477210374874
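
The paired formula is not spelled out here. As a sketch only, assuming the d-avg variant divides the mean difference by the square root of the average of the two sample variances, and that the Hedges correction then uses n1 + n2 = 2n, the value above can be reproduced by hand. The function name `hedges_g_paired` is hypothetical.

```python
import math
import statistics


def hedges_g_paired(x, y):
    """Hedges g built on a d-avg style Cohen's d (assumed formula, see lead-in)."""
    n = len(x)
    # Average the two unbiased sample variances, then take the square root
    sd_avg = math.sqrt((statistics.variance(x) + statistics.variance(y)) / 2)
    d_avg = (statistics.mean(x) - statistics.mean(y)) / sd_avg
    # Same small-sample correction as for independent samples, with n1 = n2 = n
    return d_avg * (1 - 3 / (4 * (n + n) - 9))


x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 5, 7, 9, 11, 13]
g_paired = hedges_g_paired(x, y)
print(g_paired)
```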

#### Common Language Effect Size

<p>The common language effect size (CLES) is the proportion of pairs in which x > y, where every observation in x is paired with every observation in y. Ties count as half a pair:</p>

<p>$$CL = P(X > Y) + 0.5 \cdot P(X = Y)$$</p>

In [7]:
pg.compute_effsize(x, y, eftype='cles')

0.2857142857142857

In [8]:
pg.compute_effsize(y, x, eftype='cles') # reversed the order of x and y

0.7142857142857143
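
Because the definition is a simple count over all pairs, it can be verified by brute force. This is an illustrative sketch (the helper `cles` is not a Pingouin function) that enumerates every (x, y) pair and counts ties as one half.

```python
from itertools import product


def cles(x, y):
    """Proportion of (x, y) pairs with x > y, counting ties as half a pair."""
    pairs = list(product(x, y))
    greater = sum(1 for a, b in pairs if a > b)
    ties = sum(1 for a, b in pairs if a == b)
    return (greater + 0.5 * ties) / len(pairs)


x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 5, 7, 9, 11, 13]
cl_xy = cles(x, y)
cl_yx = cles(y, x)
print(cl_xy, cl_yx)
```

Of the 49 pairs, 12 have x > y and 4 are ties, giving (12 + 2) / 49. Swapping the arguments gives the complement, so the two values sum to 1.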

## Effect Size from t

#### Parameters

<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th style="text-align: left">Function</th><th style="text-align: left">Parameter</th><th style="text-align: left">Type</th><th style="text-align: left">Description</th></tr></thead><tbody>
 <tr><td style="text-align: left">compute_effsize_from_t</td><td style="text-align: left">tval</td><td style="text-align: left">float</td><td style="text-align: left">t-value</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">nx, ny</td><td style="text-align: left">int</td><td style="text-align: left">group sample sizes</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">N</td><td style="text-align: left">int</td><td style="text-align: left">total sample size (not used if nx and ny are specified)</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">eftype</td><td style="text-align: left">string</td><td style="text-align: left">desired output effect size</td></tr>
 <tr><td style="text-align: left">returns:</td><td style="text-align: left">ef</td><td style="text-align: left">float</td><td style="text-align: left">effect size</td></tr>
</tbody></table>

#### Sample Sizes Known

<p>If both nx and ny are specified, the formula is:</p>

<p>$$d = t \cdot \sqrt{ \frac{1}{n_x} + \frac{1}{n_y} }$$</p>

In [9]:
from pingouin import compute_effsize_from_t
tval, nx, ny = 2.90, 35, 25
d = compute_effsize_from_t(tval, nx=nx, ny=ny, eftype='cohen')
print(d)

0.7593982580212534
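
The conversion is a single line of arithmetic, so it is easy to check without the library. A minimal sketch, where `d_from_t` is an illustrative name rather than a Pingouin function:

```python
import math


def d_from_t(tval, nx, ny):
    """Cohen's d from a t-value when both group sizes are known."""
    return tval * math.sqrt(1 / nx + 1 / ny)


d_manual = d_from_t(2.90, nx=35, ny=25)
print(d_manual)
```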


#### Only Total Sample Size Known

<p>If only N is specified, the formula is:</p>

<p>$$d = \frac{2t}{\sqrt{N}}$$</p>

In [10]:
tval, N = 2.90, 60
d = compute_effsize_from_t(tval, N=N, eftype='cohen')
print(d)

0.7487767802667672
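
The total-sample-size formula is the two-group formula with both groups set to N/2, since 1/(N/2) + 1/(N/2) = 4/N. The sketch below, with illustrative helper names, shows that equivalence and how the estimate differs once the groups are unbalanced.

```python
import math


def d_from_t_total(tval, N):
    """Cohen's d from a t-value when only the total sample size is known."""
    return 2 * tval / math.sqrt(N)


def d_from_t_groups(tval, nx, ny):
    """Cohen's d from a t-value when both group sizes are known."""
    return tval * math.sqrt(1 / nx + 1 / ny)


tval, N = 2.90, 60
d_total = d_from_t_total(tval, N)
d_equal = d_from_t_groups(tval, N // 2, N // 2)  # balanced split of the same N
d_unequal = d_from_t_groups(tval, 35, 25)        # unbalanced split of the same N
print(d_total, d_equal, d_unequal)
```

For a fixed N, a balanced split minimizes 1/nx + 1/ny, so the N-only formula gives the smallest d of any split. Passing the actual group sizes is preferable when they are known.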


## Convert Effect Size

#### Parameters

<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th style="text-align: left">Function</th><th style="text-align: left">Parameter</th><th style="text-align: left">Type</th><th style="text-align: left">Description</th></tr></thead><tbody>
 <tr><td style="text-align: left">convert_effsize</td><td style="text-align: left">ef</td><td style="text-align: left">float</td><td style="text-align: left">original effect size</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">input_type</td><td style="text-align: left">string</td><td style="text-align: left">'r' or 'cohen'</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">output_type</td><td style="text-align: left">string</td><td style="text-align: left">desired output effect size ('cohen', 'hedges', 'r', 'eta-square', 'odds-ratio', 'AUC', 'none')</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">nx, ny</td><td style="text-align: left">int</td><td style="text-align: left">lengths of vectors x and y (required when converting to Hedges g)</td></tr>
 <tr><td style="text-align: left">returns:</td><td style="text-align: left">ef</td><td style="text-align: left">float</td><td style="text-align: left">desired converted effect size</td></tr>
</tbody></table>

#### Cohen's d to Eta-Squared

<p>To convert d to eta-squared:</p>

<p>$$\eta^2 = \frac{ (0.5 d)^2 }{ 1 + (0.5 d)^2 }$$</p>

In [11]:
import pingouin as pg
d = .45
eta = pg.convert_effsize(d, 'cohen', 'eta-square')
print(eta)

0.048185603807257595
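
As a quick check of the formula, the same value can be computed directly. The helper name `d_to_eta_square` is illustrative only.

```python
def d_to_eta_square(d):
    """Convert Cohen's d to eta-squared using (d/2)^2 / (1 + (d/2)^2)."""
    half_sq = (0.5 * d) ** 2
    return half_sq / (1 + half_sq)


eta_manual = d_to_eta_square(0.45)
print(eta_manual)
```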


#### Cohen's d to Hedges g

In [12]:
pg.convert_effsize(.45, 'cohen', 'hedges', nx=10, ny=10)

0.4309859154929578

#### Pearson r to Cohen's d

<p>The formula to convert r to d is given by:</p>

<p>$$d = \frac{2r}{\sqrt{1-r^2}}$$</p>

In [14]:
r = 0.40
d = pg.convert_effsize(r, 'r', 'cohen')
print(d)

0.8728715609439696


#### Cohen's d to Pearson r

<p>To convert d to r:</p>

<p>$$r = \frac{ d }{ \sqrt{ d^2 + \frac{ (n_x + n_y)^2 - 2(n_x + n_y) }{ n_x n_y } } }$$</p>

In [15]:
pg.convert_effsize(d, 'cohen', 'r')

0.4000000000000001
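
The two conversions above are inverses of each other only when the sample-size term equals 4. For equal groups of size n that term is 4 - 4/n, which approaches 4 as n grows. The sketch below assumes the term is replaced by 4 when nx and ny are omitted, which is consistent with the round trip shown above. The function names are illustrative.

```python
import math


def r_to_d(r):
    """Convert a Pearson r to Cohen's d."""
    return 2 * r / math.sqrt(1 - r ** 2)


def d_to_r(d, nx=None, ny=None):
    """Convert Cohen's d to a Pearson r, optionally using the group sizes."""
    if nx is not None and ny is not None:
        a = ((nx + ny) ** 2 - 2 * (nx + ny)) / (nx * ny)
    else:
        # Assumption: use 4, the limit of the term above for equal, large groups
        a = 4
    return d / math.sqrt(d ** 2 + a)


d = r_to_d(0.40)
r_back = d_to_r(d)                 # round trip without sample sizes
r_small = d_to_r(d, nx=10, ny=10)  # small groups: the term is 3.6 instead of 4
print(d, r_back, r_small)
```

With small groups the term drops below 4, so the same d maps to a slightly larger r than in the default case.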

#### Cohen's d to Odds Ratio

<p>To convert d to an odds ratio:</p>

<p>$$OR = \exp\left( \frac{ d \pi }{ \sqrt{3} } \right)$$</p>

In [16]:
pg.convert_effsize(d, 'cohen', 'odds-ratio')

4.870584168175906
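
The same number follows from the formula directly. In this sketch `d` is recomputed from r = 0.40, as in the earlier example, so the block runs on its own.

```python
import math

# Cohen's d corresponding to r = 0.40 (same value as in the example above)
d = 2 * 0.40 / math.sqrt(1 - 0.40 ** 2)

# Odds ratio from Cohen's d: exp(d * pi / sqrt(3))
odds_ratio = math.exp(d * math.pi / math.sqrt(3))
print(odds_ratio)
```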

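#### Cohen's d to Area Under a Curve

<p>To convert d to the area under the curve (AUC):</p>

<p>$$AUC = \mathcal{N}_{cdf}\left(\frac{d}{\sqrt{2}}\right)$$</p>

<p>where $\mathcal{N}_{cdf}$ is the cumulative distribution function of the standard normal distribution.</p>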

In [17]:
pg.convert_effsize(d, 'cohen', 'auc')

0.7314530107786792
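
The standard normal CDF is available in the Python standard library through `statistics.NormalDist`, so the conversion can be checked without Pingouin. As before, `d` is recomputed from r = 0.40 to keep the sketch self-contained.

```python
import math
from statistics import NormalDist

# Cohen's d corresponding to r = 0.40 (same value as in the example above)
d = 2 * 0.40 / math.sqrt(1 - 0.40 ** 2)

# AUC is the standard normal CDF evaluated at d / sqrt(2)
auc = NormalDist().cdf(d / math.sqrt(2))
print(auc)
```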

## Effect Size Confidence Interval

#### Parameters

<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th style="text-align: left">Function</th><th style="text-align: left">Parameter</th><th style="text-align: left">Type</th><th style="text-align: left">Description</th></tr></thead><tbody>
 <tr><td style="text-align: left">compute_esci</td><td style="text-align: left">stat</td><td style="text-align: left">float</td><td style="text-align: left">original effect size; either a correlation coefficient or Cohen-type effect size (Cohen's d or Hedges g)</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">nx, ny</td><td style="text-align: left">int</td><td style="text-align: left">length of vector x and y</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">paired</td><td style="text-align: left">boolean</td><td style="text-align: left">whether effect size was estimated from a paired sample</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">eftype</td><td style="text-align: left">string</td><td style="text-align: left">'r' or 'cohen'</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">confidence</td><td style="text-align: left">float</td><td style="text-align: left">confidence level (e.g. 0.95 for a 95% interval)</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">decimals</td><td style="text-align: left">int</td><td style="text-align: left">number of decimal places to round to</td></tr>
 <tr><td>&nbsp;</td><td style="text-align: left">alternative</td><td style="text-align: left">string</td><td style="text-align: left">'two-sided', 'greater', or 'less'</td></tr>
 <tr><td style="text-align: left">returns:</td><td style="text-align: left">ci</td><td style="text-align: left">array</td><td style="text-align: left">lower and upper bounds of the confidence interval</td></tr>
</tbody></table>

#### For Pearson's r

<p>To compute the parametric confidence interval around a Pearson's r correlation coefficient, first apply Fisher's r-to-z transformation:</p>

<p>$$z = 0.5 \cdot \ln \frac{1+r}{1-r} = \operatorname{arctanh}(r)$$</p>

<p>and compute the standard error</p>

<p>$$SE = \frac{1}{\sqrt{n-3}}$$</p>

<p>The lower and upper confidence limits in z-space are:</p>

<p>$$ci_z = z \pm crit \cdot SE$$</p>

<p>where $crit$ is the critical value of the normal distribution corresponding to the desired confidence level. The limits are then transformed back to r-space with $r = \tanh(z)$.</p>

In [18]:
import pingouin as pg
x = [3, 4, 6, 7, 5, 6, 7, 3, 5, 4, 2]
y = [4, 6, 6, 7, 6, 5, 5, 2, 3, 4, 1]
nx, ny = len(x), len(y)
stat = pg.compute_effsize(x, y, eftype='r')
ci = pg.compute_esci(stat=stat, nx=nx, ny=ny, eftype='r')
print(round(stat, 4), ci)

0.7468 [0.27 0.93]
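
The steps above can be sketched with the standard library alone: transform r to z, build the interval in z-space, then map the limits back with tanh. The helper names are illustrative, and the two-sided 95% level is an assumption chosen to match the example above.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient computed from sums of squares."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_confidence_interval(r, n, confidence=0.95):
    """Two-sided confidence interval for r via the Fisher r-to-z transform."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Back-transform the z-space limits to r-space
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


x = [3, 4, 6, 7, 5, 6, 7, 3, 5, 4, 2]
y = [4, 6, 6, 7, 6, 5, 5, 2, 3, 4, 1]
r = pearson_r(x, y)
lo, hi = r_confidence_interval(r, len(x))
print(round(r, 4), round(lo, 2), round(hi, 2))
```

Because tanh is nonlinear, the resulting interval is not symmetric around r, which is why the upper limit sits closer to the estimate than the lower one.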


#### For Cohen's d

<p>The confidence interval for a Cohen's d effect size is built from its standard error:</p>

<p>$$SE = \sqrt{ \frac{n_x + n_y}{n_x n_y} + \frac{d^2}{2(n_x + n_y)} }$$</p>

<p>The lower and upper confidence limits are then given by:</p>

<p>$$ci_d = d \pm crit \cdot SE$$</p>

<p>where $crit$ is the critical value of the t-distribution corresponding to the desired confidence level.</p>

In [19]:
stat = pg.compute_effsize(x, y, eftype='cohen')
ci = pg.compute_esci(stat, nx=nx, ny=ny, eftype='cohen', decimals=3)
print(round(stat, 4), ci)

0.1538 [-0.737  1.045]
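
The interval can be approximated by hand as a rough check. The Python standard library has no t-distribution quantile function, so this sketch hard-codes the two-sided 95% critical value for 20 degrees of freedom (about 2.086, from a standard t-table). Using nx + ny - 2 degrees of freedom is an assumption, and the helper names are illustrative.

```python
import math
import statistics

# Assumption: two-sided 95% critical value of the t-distribution with
# nx + ny - 2 = 20 degrees of freedom, taken from a standard t-table
T_CRIT_95_DF20 = 2.086


def cohen_d(x, y):
    """Cohen's d for independent samples, using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)


def d_confidence_interval(d, nx, ny, crit):
    """Confidence limits for Cohen's d from its approximate standard error."""
    se = math.sqrt((nx + ny) / (nx * ny) + d ** 2 / (2 * (nx + ny)))
    return d - crit * se, d + crit * se


x = [3, 4, 6, 7, 5, 6, 7, 3, 5, 4, 2]
y = [4, 6, 6, 7, 6, 5, 5, 2, 3, 4, 1]
d = cohen_d(x, y)
lo, hi = d_confidence_interval(d, len(x), len(y), T_CRIT_95_DF20)
print(round(d, 4), round(lo, 3), round(hi, 3))
```

The interval spans zero, so with these data the sign of the effect is not established at the 95% level.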
