Skip to content
Randomization tests for nonparametric inference
R
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
R
docs/images
man
.gitignore
DESCRIPTION
NAMESPACE
README.md
nymph.Rproj

README.md

nymph

Randomization tests for noNparaMetric INFerence

Installation

devtools::install_git('https://github.com/holub008/nymph')

Purpose

Randomization methods provide a powerful toolset for performing inference on a wide range of statistics with few distributional constraints on the data. I believe these methods are underutilized, partially as a result of poor library support; existing R packages (e.g. coin & perm) provide a narrow range of functionality or obfuscate the underlying simplicity of the methods from the user. nymph aims to provide the practitioner a simple & generic interface to this class of methods.

Examples

Inference

Using the iris dataset, we'd like to investigate if the ratio of length to width of Virginica flower petals are different from that of Virginica sepals. We use the mcrd function to compute ratio differences between petals and sepals.

library(dplyr)

petal_measurements <- iris %>% 
    filter(Species == 'virginica') %>% 
    mutate(len = Petal.Length, 
           width = Petal.Width) %>%
    select(len, width)
sepal_measurements <- iris %>% 
    filter(Species == 'virginica') %>% 
    mutate(len = Sepal.Length,
           width = Sepal.Width) %>%
    select(len, width)

set.seed(55414)
mcrt <- mcrd_test(petal_measurements, sepal_measurements, 
                  lw_ratio = mean(len / width),
                  length_proportion = mean(len / (len + width)))
summary(mcrt)

with results:

Permutation test of 1000 permutations against alternative of two.sided at significance 0.95 
         statistic    ci_lower   ci_upper     actual p_value
          lw_ratio -0.16575289 0.17387946 0.55020960       0
 length_proportion -0.01267495 0.01315527 0.04388741       0

Quite convincing that a difference exists! For a visualization of the observed statistic differences (blue) against their null distributions:

plot(mcrt)

iris_mcrt

Power Analysis

Here we perform a power analysis of a contrived experiment - a two treatment experiment with a minimally impactful effect size of 1 against otherwise standard normal populations.

gen_data_s1 <- function(){ data.frame(x = rnorm(50)) }
gen_data_s2 <- function(){ data.frame(x = rnorm(50, 1)) }
mcrp <- mcrd_power(gen_data_s1, gen_data_s2, mean = mean(x), median = median(x), 
                    test_trials = 1e2)
summary(mcrp, alpha = .05, alternative = 'two.sided')

With result:

Power analysis of 100 experiments with alternative of two.sided at significance 0.05 
Group sizes:
 sample size
      1   50
      2   50
 statistic power average_effect
      mean     1     -0.9829345
    median  0.99     -0.9878372

We can also visualize the distribution of p-values (from repeated simulation of the experiment):

plot(mcrp, statistic = 'median', alternative = 'two.sided', 
     alpha = NULL)

median_p_dist

Or we can visualize how power varies across desired inferential FPRs:

plot(mcrp, statistic = 'median', alternative = 'two.sided', 
     alpha = seq(.01, .2, by = .01))

median_alpha_v_power

Implementation

  • No dependencies
    • Portability & ease of install
    • Functionality should be transparent to the user
    • Suggests the parallel package, which comes standard with R >= 2.14, for accelerated computations
  • S4 object system
    • Ensure consistency & validity across a range of tests
    • Fewer redundant objects for the user to comprehend
  • Non-standard evaluation wrappers for common use cases
    • Reduce boilerplate for interactive use
You can’t perform that action at this time.