
Effect size calculators

My name is James Uanhoro, and I am a PhD student in the Quantitative Research, Evaluation & Measurement (QREM) program within the Educational Studies department at The Ohio State University. My advisor is Professor Ann O'Connell.

This calculator began as a spreadsheet I built for the Introduction to Educational Statistics class, for which I serve(d) as Teaching Assistant. I initially intended it to cover only the Cohen's d family of effect sizes, and my work on that family owes a lot to Daniël Lakens' writings.

I want to thank Professor Cristian Gugiu for originally introducing me to noncentral distributions for calculating confidence intervals in his course on univariate statistics.

And I thank Professor O'Connell for her guidance through her multilevel modeling class and in person with my work here on multilevel models, and for her continued support of my graduate career and this project.

You can contact me at uanhoro.1@osu.edu.

Formulae

Cohen's d family

The formulae for point estimates for the Cohen's d family of effect sizes (d, g) and r were obtained from Lakens (2013). The R package MBESS (Kelley, 2007), accessed via the OpenCPU API, is used to compute confidence intervals using the noncentral t method. The confidence intervals are computed on d rather than g (Cumming, 2012). The formulae for estimating the noncentrality parameter and transforming it into confidence limits around d for:

  • the one-sample t-test and independent-samples t-test are equivalent to equations 4.6 & 4.7 in chapter 4 of Smithson's Confidence Intervals (2003, pp. 33–41);
  • the within-subject designs are equations 1 & 2 in Algina, Keselman, and Penfield (2005).

Confidence Intervals

t is calculated by converting from d, except for the paired-samples test. An OpenCPU API call is made using t as an estimate of the noncentrality parameter. The call uses the conf.limits.nct function within the R MBESS package, which returns lower and upper limits on t. These are converted back to lower and upper limits on d.
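As a rough illustration of the call described above, the sketch below assembles a POST request for conf.limits.nct using only the Python standard library. The function name build_nct_request and the example values are hypothetical, and sending the body as JSON is an assumption about how the calculator talks to OpenCPU. The request is built but not sent.

```python
import json
from urllib import request

# Endpoint as listed in this document.
NCT_URL = "https://public.opencpu.org/ocpu/library/MBESS/R/conf.limits.nct/json"


def build_nct_request(t, df, conf_level=0.95):
    """Build (but do not send) a POST request for MBESS::conf.limits.nct.

    Hypothetical helper: t is passed as the noncentrality parameter (ncp).
    """
    body = json.dumps({"ncp": t, "df": df, "conf.level": conf_level})
    return request.Request(
        NCT_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Arbitrary example values: t = 2.5 from a one-sample test with n = 20.
req = build_nct_request(t=2.5, df=19)
payload = json.loads(req.data)

# To actually send it:
# with request.urlopen(req) as resp:
#     limits = json.load(resp)
```

Building the request separately from sending it keeps the conversion steps easy to check without network access.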

One-sample t-test

$$d = \frac{M - \mu}{s}$$

Confidence Intervals

$$t = d\sqrt{n}$$

https://public.opencpu.org/ocpu/library/MBESS/R/conf.limits.nct/json, body: { ncp: t, df: n - 1, 'conf.level' => confidence_interval }

$$d_L = \frac{t_L}{\sqrt{n}}, \qquad d_U = \frac{t_U}{\sqrt{n}}$$

Notation:

M : sample mean; $\mu$ : population mean; s : sample standard deviation; n : sample size; t : estimate of the noncentrality parameter; $t_L, t_U$ : lower and upper limits on t returned by the call; $d_L, d_U$ : lower and upper limits on Cohen's d
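The conversions for the one-sample case can be sketched in a few lines of Python. This assumes the usual one-sample definitions (d is the mean difference over the sample standard deviation, and t is d times the square root of n). The function names and the numbers are illustrative only, and the limits on t passed in at the end are arbitrary placeholders rather than output from MBESS.

```python
import math


def one_sample_d(mean, mu, sd):
    """Cohen's d for a one-sample design: (M - mu) / s."""
    return (mean - mu) / sd


def d_to_t(d, n):
    """Convert d to t, used as the estimate of the noncentrality parameter."""
    return d * math.sqrt(n)


def t_limits_to_d_limits(t_lower, t_upper, n):
    """Convert limits on t back to limits on d."""
    root_n = math.sqrt(n)
    return t_lower / root_n, t_upper / root_n


d = one_sample_d(mean=105.0, mu=100.0, sd=10.0)
t = d_to_t(d, n=25)

# Placeholder limits on t, standing in for what conf.limits.nct would return.
d_lower, d_upper = t_limits_to_d_limits(0.5, 4.5, n=25)
```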

Independent-samples t-test

equation

equation

equation

Confidence Intervals

equation

https://public.opencpu.org/ocpu/library/MBESS/R/conf.limits.nct/json, body: { ncp: t, df: equation, 'conf.level' => confidence_interval }

equation

Notation:

$M_1$ : mean of group 1; $M_2$ : mean of group 2; $n_1$ : sample size of group 1; $n_2$ : sample size of group 2; $s_1$ : standard deviation of group 1; $s_2$ : standard deviation of group 2; N : sum of the sample sizes of group 1 and group 2; t : estimate of the noncentrality parameter; $d_L, d_U$ : lower and upper limits on Cohen's d
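For readers who want to check the independent-samples numbers by hand, here is a minimal sketch. It assumes the standard pooled-standard-deviation form of Cohen's d, the common small-sample correction for Hedges' g, and the usual d-to-t conversion for two independent groups. These are my assumptions about the formulae involved, not a copy of the calculator's code, and all names are hypothetical.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def cohens_d_independent(m1, m2, s1, s2, n1, n2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    return (m1 - m2) / pooled_sd(s1, n1, s2, n2)


def hedges_g(d, n1, n2):
    """Approximate small-sample bias correction applied to d."""
    return d * (1 - 3 / (4 * (n1 + n2) - 9))


def d_to_t_independent(d, n1, n2):
    """Convert d to t; the matching degrees of freedom are n1 + n2 - 2."""
    return d / math.sqrt(1 / n1 + 1 / n2)


d = cohens_d_independent(m1=10.0, m2=8.0, s1=2.0, s2=2.0, n1=20, n2=20)
g = hedges_g(d, n1=20, n2=20)
t = d_to_t_independent(d, n1=20, n2=20)
```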

Paired-samples t-test

Calculated using average of standard deviations (recommended)

equation

equation

Calculated using repeated measures

equation

equation

Confidence Intervals

equation

equation

https://public.opencpu.org/ocpu/library/MBESS/R/conf.limits.nct/json, body: { ncp: t, df: equation, 'conf.level' => confidence_interval }

equation

Notation:

$M_1$ : mean of group 1; $M_2$ : mean of group 2; $s_1$ : standard deviation of group 1; $s_2$ : standard deviation of group 2; n : number of pairs; r : correlation between group 1 and group 2; $\mathrm{cov}_{12}$ : covariance of group 1 and group 2; t : estimate of the noncentrality parameter; $d_L, d_U$ : lower and upper limits on Cohen's d
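The two paired-samples variants can be sketched as follows. This reflects my reading of the average-of-standard-deviations and repeated-measures forms of d in Lakens (2013), plus the ordinary paired t statistic. Treat it as an illustration under those assumptions, with hypothetical function names, rather than the calculator's exact code.

```python
import math


def sd_of_differences(s1, s2, r):
    """Standard deviation of the difference scores, from s1, s2 and r."""
    return math.sqrt(s1**2 + s2**2 - 2 * r * s1 * s2)


def d_average(m1, m2, s1, s2):
    """d standardized by the average of the two standard deviations."""
    return (m1 - m2) / ((s1 + s2) / 2)


def d_repeated_measures(m1, m2, s1, s2, r):
    """d for repeated measures, which takes the correlation into account."""
    return (m1 - m2) / sd_of_differences(s1, s2, r) * math.sqrt(2 * (1 - r))


def paired_t(m1, m2, s1, s2, r, n):
    """Paired t, computed directly rather than converted from d (df = n - 1)."""
    return (m1 - m2) / (sd_of_differences(s1, s2, r) / math.sqrt(n))


d_av = d_average(12.0, 10.0, 4.0, 4.0)
d_rm = d_repeated_measures(12.0, 10.0, 4.0, 4.0, r=0.5)
t = paired_t(12.0, 10.0, 4.0, 4.0, r=0.5, n=16)
```

If only the covariance is available, r can be recovered as the covariance divided by the product of the two standard deviations.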

Odds-ratio

The R package epitools (Aragon, 2012), accessed via the OpenCPU API, is used to compute the odds ratio and its confidence interval using the oddsratio set of functions. The function called depends on the method the user selects: { midp: oddsratio.midp, fisher: oddsratio.fisher, wald: oddsratio.wald, small: oddsratio.small }. The 2x2 table is transformed into a vector ([00, 01, 10, 11]), which is passed to the selected oddsratio function alongside the chosen confidence level. The function returns the odds ratio and its confidence interval.

The sample call below is for the default method, midp.

https://public.opencpu.org/ocpu/library/epitools/R/oddsratio.midp, body: { 'x' => {Matrix transformed into vector}, 'conf.level' => confidence_interval }
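To make the table-to-vector step concrete, the sketch below flattens a 2x2 table row by row and computes a Wald-type odds ratio with a normal-approximation interval on the log scale. This is a hand-rolled illustration of the general technique, not the epitools code and not the default midp method. It assumes no cell is zero, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def flatten_table(table):
    """Turn [[n00, n01], [n10, n11]] into [n00, n01, n10, n11]."""
    return [cell for row in table for cell in row]


def wald_odds_ratio(table, conf_level=0.95):
    """Cross-product odds ratio with a Wald interval on the log scale."""
    a, b, c, d = flatten_table(table)
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    log_or = math.log(estimate)
    return estimate, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


table = [[10, 20], [30, 40]]
vector = flatten_table(table)
estimate, lower, upper = wald_odds_ratio(table)
```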

ANOVA

These formulae apply only when all your factors are manipulated, not measured, and when there are no covariates. The R package MBESS (Kelley, 2007), accessed via the OpenCPU API, is used to compute confidence intervals using the noncentral F method. The default confidence level is 90%. Because the F-statistic cannot be negative, the corresponding test is one-sided, so the 90% interval is the one that agrees with a test at the 5% significance level (Smithson, 2003, pp. 42–66).

Partial eta-squared

The formula for partial eta-squared is equation 13 from Lakens (2013), while that for its confidence intervals is equation 5.6 in chapter 5 of Smithson's Confidence Intervals (2003, pp. 42–66).

$$\eta_p^2 = \frac{F \times df_{effect}}{F \times df_{effect} + df_{error}}$$

Confidence Intervals

This call to OpenCPU returns the limits on F as noncentrality parameters ($\lambda$), which need to be converted back to partial eta-squared. I use the conf.limits.ncf function within the R MBESS package.

https://public.opencpu.org/ocpu/library/MBESS/R/conf.limits.ncf/json, body: { 'F.value' => F, 'df.1' => df_effect, 'df.2' => df_error, 'conf.level' => confidence_interval }

$$\eta_p^2 = \frac{\lambda}{\lambda + df_{effect} + df_{error} + 1}$$

Notation:

$\eta_p^2$ : partial eta-squared; F : F-statistic; $df_{effect}$ : effect degrees of freedom; $df_{error}$ : error degrees of freedom; $\lambda$ : noncentrality parameter
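Both partial eta-squared steps are simple enough to check by hand. The sketch below assumes the usual F-based expression for the point estimate and the noncentrality-to-proportion conversion I understand Smithson (2003) to give. The function names and example values are hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Point estimate of partial eta-squared from F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def ncp_to_partial_eta_squared(ncp, df_effect, df_error):
    """Convert a noncentrality limit into a limit on partial eta-squared."""
    return ncp / (ncp + df_effect + df_error + 1)


estimate = partial_eta_squared(f_value=5.0, df_effect=2, df_error=27)

# A lower noncentrality limit of 0 maps to a lower limit of 0 on eta-squared.
lower = ncp_to_partial_eta_squared(0.0, df_effect=2, df_error=27)
```

The same conversion is applied to the lower and the upper noncentrality limit in turn.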

Partial omega-squared

This formula for partial omega-squared is equation 10 in Carroll and Nordholm (1975).

$$\omega_p^2 = \frac{F - 1}{F + \dfrac{df_{error} + 1}{df_{effect}}}$$

Notation:

$\omega_p^2$ : partial omega-squared; F : F-statistic; $df_{effect}$ : effect degrees of freedom; $df_{error}$ : error degrees of freedom
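A small sketch of partial omega-squared computed from F alone is given below. It assumes the F-based form in which the total sample size is taken to be the effect degrees of freedom plus the error degrees of freedom plus one. That is my assumption about how the expression is parameterized here, and the function name is hypothetical.

```python
def partial_omega_squared(f_value, df_effect, df_error):
    """Partial omega-squared from F, assuming N = df_effect + df_error + 1."""
    return (f_value - 1) / (f_value + (df_error + 1) / df_effect)


estimate = partial_omega_squared(f_value=5.0, df_effect=2, df_error=27)
```

Note that the estimate is negative whenever F is below 1, which is a known property of omega-squared type estimators.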

Regression OLS

The R package MBESS (Kelley, 2007), accessed via the OpenCPU API, is used to compute confidence intervals using the ci.R2 function. The default confidence level is 90%. Because R-squared cannot be negative, the corresponding test is one-sided, so the 90% interval is the one that agrees with a test at the 5% significance level (Smithson, 2003, pp. 42–66).

R-squared confidence intervals

https://public.opencpu.org/ocpu/library/MBESS/R/ci.R2/json, body: { 'R2' => R2, 'df.1' => df_effect, 'df.2' => df_error, 'conf.level' => confidence_interval }

Notation:

R2 : R-squared; df_effect : effect degrees of freedom; df_error : error degrees of freedom

Hierarchical Linear Modeling / Multilevel Modeling / Mixed Effects Modeling

All analyses related to multilevel models are performed using a Python API I created for the task. The API depends largely on the MixedLM class within the StatsModels package.

Intracluster/Intraclass correlation coefficient (ICC)

ANOVA Method

To calculate the confidence intervals, I used a variation of Searle's method (1971, p. 414 - third equation in Table 9.14) which adjusts for unbalanced data by replacing the number of subjects per cluster in Searle's formula with the weighted mean cluster size - equation 9 in Ukoumunne (2002). All of this is handled by a call to the Python API listed above. The code within the Python API is near-identical to the ICCest function in the R ICC package. The call to the API returns the ICC, an estimate of variance across clusters, an estimate of variance within clusters, lower and upper limits on ICC, the number of clusters used in the analysis, and the weighted mean cluster size.

REML/ML & Optimization method

The Python API performs REML and FEML/ML using the code below from the StatsModels package in Python. The Nelder-Mead optimization method (Nelder & Mead, 1965) is applied by default.

import statsmodels.api as sm

model = sm.MixedLM.from_formula('values ~ 1', df, groups=df['clusters'])

res = model.fit(reml=method, method='nm')

The data are stored in a Pandas dataframe, df; values holds the outcome data and clusters holds the cluster groupings. method is either True to use REML or False to use ML.

The level-2 variance around the intercept, $\tau_{00}$, is obtained using res.cov_re.groups[0]. The within-group variance, $\sigma^2$, is obtained using res.scale. The ICC is then calculated as $\tau_{00} / (\tau_{00} + \sigma^2)$. REML and ML return only the ICC and the variance estimates. All other results are computed using the ANOVA method.

Pseudo R-Squared

The model is run using REML.

Model equations

The Python API constructs the model equations. It also centers the variables based on user-specifications.

For example, consider a model with outcome math_achievement. The level-1 predictors are student_ses and gender, and the level-2 predictor is school_type, which predicts both the intercept and the student_ses slope.

The model equation is: math_achievement ~ student_ses + gender + school_type + student_ses:school_type. Without specifying additional options, this is a random-intercepts model.

Assuming the cluster variable is represented by school, the model (null or fitted) is saved into a variable called model_equation, and the data are stored in a Pandas dataframe called data, the StatsModels code is:

model = sm.MixedLM.from_formula(model_equation, data, groups=data['school'])

res = model.fit(reml=True, method='nm') # The method changes depending on the optimization method selected.
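One way the model equation string could be assembled from the user's choices is sketched below. The helper build_model_equation and its arguments are hypothetical. The actual API may construct the string differently and also handles centering, which is not shown here.

```python
def build_model_equation(outcome, level1=(), level2=(), cross_level=()):
    """Assemble a formula string for MixedLM.from_formula.

    cross_level is a sequence of (level-1 predictor, level-2 predictor)
    pairs, each of which becomes an interaction term.
    """
    terms = list(level1) + list(level2)
    terms += [f"{l1}:{l2}" for l1, l2 in cross_level]
    # With no predictors this falls back to the null (intercept-only) model.
    rhs = " + ".join(terms) if terms else "1"
    return f"{outcome} ~ {rhs}"


model_equation = build_model_equation(
    "math_achievement",
    level1=["student_ses", "gender"],
    level2=["school_type"],
    cross_level=[("student_ses", "school_type")],
)
null_equation = build_model_equation("math_achievement")
```

A level-2 predictor of a level-1 slope enters the formula as a cross-level interaction, which is why school_type appears both on its own and in student_ses:school_type.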

Nakagawa & Schielzeth R-squared

Nakagawa & Schielzeth's marginal and conditional R-squared's for mixed-effects models (Nakagawa & Schielzeth, 2013) are also computed by the Python API. They are an attempt to resolve some of the problems associated with previous formulations of R-squared's in mixed-effects models, including the formulations by Snijders and Bosker (1994). The R-squared's here are for random-intercepts models and random-slopes models based on Johnson's (2014) extension to the work of Nakagawa & Schielzeth (2013).

The marginal R-squared ($R^2_{GLMM(m)}$) is the variance explained by fixed factors, and the conditional R-squared ($R^2_{GLMM(c)}$) is the variance explained by both fixed and random factors. The formulae below differ from the standard expressions for $R^2_{GLMM(m)}$ and $R^2_{GLMM(c)}$ because, for a linear mixed-effects model, there is no distribution-specific variance.

$$R^2_{GLMM(m)} = \frac{\sigma^2_f}{\sigma^2_f + \bar{\sigma}^2_l + \sigma^2_\varepsilon}$$

$$R^2_{GLMM(c)} = \frac{\sigma^2_f + \bar{\sigma}^2_l}{\sigma^2_f + \bar{\sigma}^2_l + \sigma^2_\varepsilon}$$

Notation:

$\sigma^2_f$ is the variance explained by the fixed effects in the model, $\bar{\sigma}^2_l$ is the average of the random effects variance, and $\sigma^2_\varepsilon$ is the residual variance. In a model where the intercept is the only random effect, $\bar{\sigma}^2_l$ resolves to the variance of the random intercept. See Johnson (2014) for the rationale.
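Given the three variance components, both R-squared values reduce to simple ratios. The sketch below assumes the components have already been extracted from the fitted model. The function name and the example numbers are hypothetical.

```python
def nakagawa_r_squared(var_fixed, var_random, var_residual):
    """Return (marginal, conditional) R-squared for a linear mixed model.

    var_fixed: variance explained by the fixed effects
    var_random: average random-effects variance
    var_residual: residual variance
    """
    total = var_fixed + var_random + var_residual
    marginal = var_fixed / total
    conditional = (var_fixed + var_random) / total
    return marginal, conditional


marginal, conditional = nakagawa_r_squared(2.0, 1.0, 1.0)
```

The conditional value can never be smaller than the marginal value, because it adds the random-effects variance to the numerator.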

ICCs

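The residual ICC is calculated from the fitted model. It is the level-2 intercept variance divided by the sum of the level-2 intercept variance and the within-group (residual) variance, with both variances taken from the fitted model rather than the null model.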

Model convergence

At times, the (fitted) models may fail to converge, and at other times the computation may time out. When either happens, try a different optimization method. If your computation repeatedly times out, it is also possible that your dataset is too large for the servers I am using.

References