# stonegold546 / cohens_d_calculators

# Effect size calculators

My name is James Uanhoro, and I am a PhD student in the Quantitative Research, Evaluation & Measurement (QREM) program within the Educational Studies department at The Ohio State University. My advisor is Professor Ann O'Connell.

This is a calculator I originally built as a spreadsheet for the Introduction to Educational Statistics class I serve(d) as Teaching Assistant for. I initially intended that it focus only on the Cohen's d family of effect sizes. My work on the Cohen's d family of effect sizes owes a lot to Daniël Lakens' writings.

I want to thank Professor Cristian Gugiu for originally introducing me to noncentral distributions for calculating confidence intervals in his course on univariate statistics.

And I thank Professor O'Connell for her guidance through her multilevel modeling class and in person with my work here on multilevel models, and for her continued support of my graduate career and this project.

You can contact me at uanhoro.1@osu.edu.

# Formulae

## Cohen's d family

The formulae for point estimates for the Cohen's d family of effect sizes (d, g) and r were obtained from Lakens (2013). The R package `MBESS` (Kelley, 2007) - via the Open CPU API - is used to compute confidence intervals using the noncentral t method. The confidence intervals were computed on d rather than g (Cumming, 2012). The formulae for the estimation of the noncentrality parameter and its transformation to confidence intervals around d for:

• the one-sample t-tests and independent-samples t-test are equivalent to equations 4.6 & 4.7 in chapter 4 of Smithson's Confidence Intervals (2003, pp. 33–41);
• the within-subject designs are equations 1 & 2 in Algina, Keselman, and Penfield (2005).
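As a rough illustration of the point estimates referred to above, the sketch below computes Cohen's d and Hedges' g for two independent groups from summary statistics, using the pooled standard deviation and the small-sample correction described in Lakens (2013). The function name and its arguments are hypothetical and are not taken from this calculator's code.

```python
import math


def independent_samples_effect_sizes(m1, m2, sd1, sd2, n1, n2):
    """Cohen's d, Hedges' g and t for two independent groups (illustrative helper)."""
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    )
    d = (m1 - m2) / pooled_sd
    # Hedges' small-sample correction applied to d
    g = d * (1 - 3 / (4 * (n1 + n2) - 9))
    # Conversion from d to the independent-samples t statistic
    t = d / math.sqrt(1 / n1 + 1 / n2)
    return {"d": d, "g": g, "t": t}


result = independent_samples_effect_sizes(10, 8, 2, 2, 20, 20)
```

The returned `t` is the quantity that would then serve as the estimate of the noncentrality parameter when building a confidence interval.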

### Confidence Intervals

t is calculated by converting from d, except for the paired-samples test. An Open URI API call is made using t as an estimate of the noncentrality parameter. This uses the `conf.limits.nct` function within the R `MBESS` package. It returns lower and upper limits on t, which are converted back to lower and upper limits on d.
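The calculator delegates this step to `conf.limits.nct` in `MBESS`. For readers who want to see the idea without an API call, here is a minimal sketch of the noncentral t method using SciPy instead. It assumes a one-sample design (t = d√n, df = n - 1), and the function name `nct_limits` and the root-finding bracket are my own choices, not part of the calculator.

```python
import math

from scipy import optimize, stats


def nct_limits(t_obs, df, conf_level=0.95):
    """Confidence limits on the noncentrality parameter of a t statistic."""
    alpha = 1 - conf_level

    def solve(target_cdf):
        # The noncentral t CDF at t_obs decreases as the noncentrality
        # parameter grows, so each limit is the root of a monotone function.
        def gap(ncp):
            return stats.nct.cdf(t_obs, df, ncp) - target_cdf

        # Assumed bracket: wide enough for moderate t and df.
        return optimize.brentq(gap, t_obs - 10, t_obs + 10)

    return solve(1 - alpha / 2), solve(alpha / 2)


n = 20
d = 0.5
t = d * math.sqrt(n)  # one-sample conversion from d to t
t_lower, t_upper = nct_limits(t, df=n - 1)
# Convert the limits on t back to limits on d
d_lower, d_upper = t_lower / math.sqrt(n), t_upper / math.sqrt(n)
```

The bracket of ±10 around the observed t is an assumption that holds for typical inputs; very small degrees of freedom or very large t values may need a wider search.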

### One-sample t-test

#### Confidence Intervals

https://public.opencpu.org/ocpu/library/MBESS/R/conf.limits.nct/json, body: { ncp: t, df: n - 1, 'conf.level' => confidence_interval }

### Independent-samples t-test

#### Confidence Intervals

### Paired-samples t-test

#### Calculated using average of standard deviations (recommended)

#### Calculated using repeated measures

#### Confidence Intervals

## Odds-ratio

The R package `epitools` (Aragon, 2012) - via the Open CPU API - is used to compute the odds-ratio and confidence intervals using the `oddsratio` set of functions. The function called depends on the method the user selects: { `midp`: `oddsratio.midp`, `fisher`: `oddsratio.fisher`, `wald`: `oddsratio.wald`, `small`: `oddsratio.small` }. The 2x2 table is transformed into a vector (`[00, 01, 10, 11]`), which is passed to the selected oddsratio function, alongside the preferred confidence level. It returns the odds ratio and its confidence interval.

The sample call below is for the default method, `midp`.

https://public.opencpu.org/ocpu/library/epitools/R/oddsratio.midp, body: { 'x' => {Matrix transformed into vector}, 'conf.level' => confidence_interval }
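To make the inputs and outputs concrete, here is a small sketch of the textbook Wald odds ratio and its normal-approximation confidence interval, computed directly from the `[00, 01, 10, 11]` vector. It is not the default `midp` method and is not guaranteed to match `epitools` output in every detail (for example, in how the table is oriented); the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_odds_ratio(cells, conf_level=0.95):
    """Odds ratio and Wald confidence interval from cells [00, 01, 10, 11]."""
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


odds_ratio, lower, upper = wald_odds_ratio([10, 20, 30, 40])
```

This closed form breaks down when any cell is zero, which is one reason the other methods exist.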

## ANOVA

These formulae apply only when all your factors are manipulated, not measured, and when there are no covariates. The R package `MBESS` (Kelley, 2007) - via the Open CPU API - is used to compute confidence intervals using the noncentral F method. The default confidence level is set to 90%. This is equivalent to the 95% two-sided confidence interval given that the F-statistic cannot be negative (Smithson, 2003, pp. 42–66).
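The noncentral F method can be sketched offline with SciPy, as below. This is an illustration under stated assumptions rather than the calculator's code: the function names are hypothetical, and the conversion from a noncentrality parameter to partial eta-squared, ncp / (ncp + df1 + df2 + 1), is the one I understand Smithson (2003) to describe.

```python
from scipy import optimize, stats


def ncf_limits(f_obs, df1, df2, conf_level=0.90):
    """Confidence limits on the noncentrality parameter of an F statistic."""
    alpha = 1 - conf_level

    def solve(target_cdf):
        # If even the central F puts f_obs at or below the target
        # probability, the limit sits at the boundary of zero.
        if stats.f.cdf(f_obs, df1, df2) <= target_cdf:
            return 0.0

        def gap(nc):
            return stats.ncf.cdf(f_obs, df1, df2, nc) - target_cdf

        # Grow the upper bracket until the CDF drops below the target
        upper_bracket = 1.0
        while gap(upper_bracket) > 0:
            upper_bracket *= 2
        return optimize.brentq(gap, 1e-10, upper_bracket)

    return solve(1 - alpha / 2), solve(alpha / 2)


def ncp_to_partial_eta_squared(ncp, df1, df2):
    """Convert a noncentrality parameter to partial eta-squared."""
    return ncp / (ncp + df1 + df2 + 1)


f_obs, df1, df2 = 5.0, 2, 57
ncp_lower, ncp_upper = ncf_limits(f_obs, df1, df2)
eta_lower = ncp_to_partial_eta_squared(ncp_lower, df1, df2)
eta_upper = ncp_to_partial_eta_squared(ncp_upper, df1, df2)
# Point estimate of partial eta-squared from F and its degrees of freedom
eta_point = f_obs * df1 / (f_obs * df1 + df2)
```

With the default 90% level, the lower limit is zero exactly when the F-test is not significant at the 5% level, which is the reason for the 90% default.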

### Partial eta-squared

The formula for partial eta-squared is equation 13 from Lakens (2013), while that for its confidence intervals is equation 5.6 in chapter 5 of Smithson's Confidence Intervals (2003, pp. 42–66).

#### Confidence Intervals

This call to Open CPU returns the limits on F as noncentrality parameters, which need to be converted back to partial eta-squared. I use the `conf.limits.ncf` function within the R `MBESS` package.

### Partial omega-squared

This formula for partial omega-squared is equation 10 in Carroll and Nordholm (1975).

## Regression OLS

The R package `MBESS` (Kelley, 2007) - via the Open CPU API - is used to compute confidence intervals using the `ci.R2` function. The default confidence interval is set to 90%. This is equivalent to the 95% two-sided confidence interval given that the R-squared cannot be negative (Smithson, 2003, pp. 42–66).

## Hierarchical Linear Modeling / Multilevel Modeling / Mixed Effects Modeling

All analysis related to multilevel models is performed using a Python API I created for the task. The API largely depends on the `MixedLM` function within the `StatsModels` package.

### Intracluster/Intraclass correlation coefficient (ICC)

#### ANOVA Method

To calculate the confidence intervals, I used a variation of Searle's method (1971, p. 414 - third equation in Table 9.14) which adjusts for unbalanced data by replacing the number of subjects per cluster in Searle's formula with the weighted mean cluster size - equation 9 in Ukoumunne (2002). All of this is handled by a call to the Python API listed above. The code within the Python API is near-identical to the `ICCest` function in the R ICC package. The call to the API returns the ICC, an estimate of variance across clusters, an estimate of variance within clusters, lower and upper limits on ICC, the number of clusters used in the analysis, and the weighted mean cluster size.
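A minimal sketch of the ANOVA approach described above follows. It assumes a one-way random-effects layout and a Searle-style interval with the weighted mean cluster size substituted for the cluster size. The function name, column names and returned keys are hypothetical; this is not the code that runs in the Python API.

```python
import pandas as pd
from scipy import stats


def icc_anova(df, value_col="values", cluster_col="clusters", conf_level=0.95):
    """ANOVA estimate of the ICC with a Searle-style confidence interval."""
    groups = df.groupby(cluster_col)[value_col]
    sizes = groups.size()
    a = len(sizes)           # number of clusters
    n_total = sizes.sum()    # total number of observations
    grand_mean = df[value_col].mean()

    # One-way ANOVA mean squares
    ss_between = (sizes * (groups.mean() - grand_mean) ** 2).sum()
    ss_within = ((df[value_col] - groups.transform("mean")) ** 2).sum()
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (n_total - a)

    # Weighted mean cluster size, which adjusts for unbalanced data
    k0 = (n_total - (sizes**2).sum() / n_total) / (a - 1)

    var_between = (ms_between - ms_within) / k0
    icc = var_between / (var_between + ms_within)

    # Interval based on the F ratio, with k0 in place of the cluster size
    alpha = 1 - conf_level
    f_obs = ms_between / ms_within
    f_low = f_obs / stats.f.ppf(1 - alpha / 2, a - 1, n_total - a)
    f_high = f_obs / stats.f.ppf(alpha / 2, a - 1, n_total - a)
    return {
        "icc": icc,
        "var_between": var_between,
        "var_within": ms_within,
        "lower": (f_low - 1) / (f_low + k0 - 1),
        "upper": (f_high - 1) / (f_high + k0 - 1),
        "n_clusters": a,
        "k0": k0,
    }


example = pd.DataFrame({
    "values": [1, 2, 3, 4, 3, 4, 5, 6, 5, 6, 7, 8],
    "clusters": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
})
result = icc_anova(example)
```

For balanced data such as this example, the weighted mean cluster size reduces to the common cluster size.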

#### REML/ML & Optimization method

The Python API performs REML and FEML/ML using the code below from the `StatsModels` package in Python. The Nelder-Mead optimization method (Nelder & Mead, 1965) is applied by default.

    import statsmodels.api as sm

    model = sm.MixedLM.from_formula('values ~ 1', df, groups=df['clusters'])
    # reml=True fits by REML, reml=False by ML; 'nm' selects Nelder-Mead
    res = model.fit(reml=method, method='nm')

The data are stored in a `Pandas` dataframe, `df`; `values` are the outcome data, with `clusters` being the cluster groupings. `method` is either `True` to use REML or `False` to use ML.

The level-2 variance around the intercept, $\tau_{00}$, is obtained using `res.cov_re.groups`; the within-group variance, $\sigma^2$, is obtained using `res.scale`; and the ICC is calculated using the formula $\text{ICC} = \tau_{00} / (\tau_{00} + \sigma^2)$. REML and ML return only the ICC and the variance estimates. All other results are computed using the ANOVA method.

### Pseudo R-Squared

The model is run using REML.

#### Model equations

The Python API constructs the model equations. It also centers the variables based on user-specifications.

For example, consider a model with outcome `math_achievement`; the level-1 predictors are `student_ses` and `gender`, and the level-2 predictor is `school_type`, as a predictor of the `intercept` and of the `student_ses` slope.

The model equation is: `math_achievement ~ student_ses + gender + school_type + student_ses:school_type`. Without specifying additional options, this is a random-intercepts model.
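The construction of the equation above can be sketched with a small helper. This is a hypothetical illustration of how such a string could be assembled, not the code used in the Python API; the function and argument names are my own.

```python
def build_model_equation(outcome, level1, level2=(), cross_level=()):
    """Assemble a formula string from predictor lists (illustrative helper)."""
    terms = list(level1) + list(level2)
    # A level-2 predictor of a level-1 slope enters as an interaction term
    terms += [f"{l1}:{l2}" for l1, l2 in cross_level]
    # With no predictors, fall back to the null (intercept-only) model
    rhs = " + ".join(terms) if terms else "1"
    return f"{outcome} ~ {rhs}"


model_equation = build_model_equation(
    "math_achievement",
    level1=["student_ses", "gender"],
    level2=["school_type"],
    cross_level=[("student_ses", "school_type")],
)
null_equation = build_model_equation("math_achievement", level1=[])
```

A level-2 predictor of the intercept appears as a main effect, while a level-2 predictor of a level-1 slope appears as a cross-level interaction.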

Assuming the cluster variable is represented by `school`, the model (null or fitted) is saved into a variable called `model_equation`, and the data are stored in a `Pandas` dataframe called `data`, the StatsModels code is:

    model = sm.MixedLM.from_formula(model_equation, data, groups=data['school'])
    # The method changes depending on the optimization method selected.
    res = model.fit(reml=True, method='nm')

#### Nakagawa & Schielzeth R-squared

Nakagawa & Schielzeth's marginal and conditional R-squared's for mixed-effects models (Nakagawa & Schielzeth, 2013) are also computed on the Python API. They are an attempt to resolve some of the problems associated with previous formulations of R-squared's in mixed-effects models, including the formulations by Snijders and Bosker (1994). The R-squared's here are for random-intercepts models and random-slopes models based on Johnson's (2014) extension to the work of Nakagawa & Schielzeth (2013).

The marginal R-squared ($R^2_m$) is the variance explained by fixed factors, and the conditional R-squared ($R^2_c$) is the variance explained by both fixed and random factors. The formulae below differ from the standard expressions for $R^2_m$ and $R^2_c$ because for a linear mixed-effects model, there is no distribution-specific variance.

$$R^2_m = \frac{\sigma^2_f}{\sigma^2_f + \bar{\sigma}^2_l + \sigma^2}$$

$$R^2_c = \frac{\sigma^2_f + \bar{\sigma}^2_l}{\sigma^2_f + \bar{\sigma}^2_l + \sigma^2}$$

##### Notation

$\sigma^2_f$ is the variance explained by the fixed effects in the model, $\bar{\sigma}^2_l$ is the average of the random effects variance, and $\sigma^2$ is the residual variance. In a model where the intercept is the only random effect, $\bar{\sigma}^2_l$ resolves to the variance of the random intercept. See Johnson (2014) for rationale.
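The two quantities can be computed from the variance components in a few lines. The sketch below follows my reading of Johnson (2014), taking the average random-effects variance as the mean of the diagonal of ZΣZ'. The function name and its arguments are hypothetical, and the use of the population variance (`ddof=0`) for the fixed-effects predictions is an assumption.

```python
import numpy as np


def nakagawa_r_squared(X, beta, Z, cov_re, resid_var):
    """Marginal and conditional R-squared for a linear mixed-effects model."""
    X, beta, Z, cov_re = map(np.asarray, (X, beta, Z, cov_re))
    # Variance of the fixed-effects predictions (ddof=0 is an assumption)
    var_fixed = np.var(X @ beta)
    # Mean of diag(Z cov_re Z'): the average random-effects variance
    var_random = np.mean(np.einsum("ij,jk,ik->i", Z, cov_re, Z))
    total = var_fixed + var_random + resid_var
    return var_fixed / total, (var_fixed + var_random) / total


# Random-intercept example: Z is a column of ones
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
beta = [1.0, 1.0]
Z = [[1.0], [1.0], [1.0], [1.0]]
cov_re = [[2.0]]
marginal, conditional = nakagawa_r_squared(X, beta, Z, cov_re, resid_var=1.75)
```

With only a random intercept, the averaged term equals the intercept variance, as stated in the notation.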

#### ICCs

The `residual ICC` is calculated from the fitted model. It is:

#### Model convergence

At times, the (fitted) models may fail to converge, and at other times, the results time out. When either happens, try a different optimization method. If your computation repeatedly times out, your dataset may be too large for the servers I am using.