experDesign

Lifecycle: stable. Project Status: Active. The project has reached a stable, usable state and is being actively developed.

The goal of experDesign is to help you decide which samples go in which batch, reducing potential batch bias in the analysis.

Installation

To install the latest version from CRAN use:

install.packages("experDesign")

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("llrs/experDesign")

Example

Imagine you have some samples already collected and you want to distribute them across batches:

library("experDesign")
metadata <- expand.grid(height = seq(60, 80, 5), 
                        weight = seq(100, 300, 50),
                        sex = c("Male","Female"))
head(metadata, 15)
#>    height weight  sex
#> 1      60    100 Male
#> 2      65    100 Male
#> 3      70    100 Male
#> 4      75    100 Male
#> 5      80    100 Male
#> 6      60    150 Male
#> 7      65    150 Male
#> 8      70    150 Male
#> 9      75    150 Male
#> 10     80    150 Male
#> 11     60    200 Male
#> 12     65    200 Male
#> 13     70    200 Male
#> 14     75    200 Male
#> 15     80    200 Male

If you block incorrectly and a whole group ends up in a single batch, you will end up with a batch effect. To avoid this, design() helps you assign each sample to a batch (in this case each batch has 24 samples at most). First we can explore the number of samples and the number of batches:

size_data <- nrow(metadata)
size_batch <- 24
(batches <- optimum_batches(size_data, size_batch))
#> [1] 3
# The best number of samples for each batch is lower than the maximum batch size
(size <- optimum_subset(size_data, batches))
#> [1] 17
# The distribution of samples per batch
sizes_batches(size_data, size, batches)
#> [1] 17 17 16
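
The arithmetic behind these numbers can be sketched in base R. This is only an illustration that reproduces the output above, not necessarily how the package computes it internally; the names n_batches, per_batch and sizes are chosen for this sketch.

```r
# Base-R sketch of the batch arithmetic (an assumption, not the package internals)
size_data <- 50   # nrow(metadata) in the example
size_batch <- 24  # maximum number of samples per batch

# Fewest batches that can hold every sample
n_batches <- ceiling(size_data / size_batch)

# Largest batch once the samples are spread evenly
per_batch <- ceiling(size_data / n_batches)

# Even split: the first (size_data %% n_batches) batches get one extra sample
base_size <- size_data %/% n_batches
extra <- size_data %% n_batches
sizes <- base_size + as.integer(seq_len(n_batches) <= extra)

n_batches  # 3
per_batch  # 17
sizes      # 17 17 16
```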

Note that instead of filling two whole batches and leaving only two samples for the third, it distributes all the samples evenly across the three batches that will be needed. We can directly look for the distribution of the samples given our max number of samples per batch:

d <- design(metadata, size_batch)
# It is a list but we can convert it to a vector with:
batch_names(d)
#>  [1] "SubSet3" "SubSet2" "SubSet2" "SubSet1" "SubSet3" "SubSet2" "SubSet1"
#>  [8] "SubSet1" "SubSet2" "SubSet2" "SubSet1" "SubSet2" "SubSet1" "SubSet3"
#> [15] "SubSet1" "SubSet3" "SubSet2" "SubSet1" "SubSet3" "SubSet1" "SubSet2"
#> [22] "SubSet1" "SubSet3" "SubSet2" "SubSet1" "SubSet1" "SubSet1" "SubSet1"
#> [29] "SubSet3" "SubSet2" "SubSet3" "SubSet2" "SubSet3" "SubSet3" "SubSet2"
#> [36] "SubSet1" "SubSet2" "SubSet1" "SubSet3" "SubSet3" "SubSet2" "SubSet3"
#> [43] "SubSet2" "SubSet3" "SubSet3" "SubSet1" "SubSet1" "SubSet2" "SubSet2"
#> [50] "SubSet3"

Naively, one would either fill some batches completely or split the samples in order (the first 17 samples together, then the next 17, and so on), which distributes the groups unevenly. This solution ensures that the samples are randomized. For a more random distribution you can increase the number of iterations performed to calculate it.
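
To see why the in-order split is a problem, here is a minimal base-R sketch (it does not use the package). It assigns the rows of metadata to batches of 17, 17 and 16 in row order and cross-tabulates batch against sex. The names naive_batch and naive_table are made up for this illustration.

```r
metadata <- expand.grid(height = seq(60, 80, 5),
                        weight = seq(100, 300, 50),
                        sex = c("Male", "Female"))

# Naive assignment: fill the batches in row order (17, 17 and 16 samples)
naive_batch <- rep(c("SubSet1", "SubSet2", "SubSet3"), times = c(17, 17, 16))

# Cross-tabulate batch against sex to check how the groups are spread
naive_table <- table(batch = naive_batch, sex = metadata$sex)
naive_table
```

Because expand.grid() varies the last variable slowest, the first 25 rows are all Male and the last 25 all Female. With this naive split SubSet1 contains only males and SubSet3 only females, so sex is confounded with batch.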

If you need space for replicates to control for batch effect you can use:

r <- replicates(metadata, size_batch, 5)
lengths(r)
#> SubSet1 SubSet2 SubSet3 
#>      20      20      20
r
#> $SubSet1
#>  [1]  4  9 10 12 20 21 22 23 25 26 28 29 31 39 40 41 43 45 49 50
#> 
#> $SubSet2
#>  [1]  2  7 13 15 16 18 21 23 24 27 30 33 35 36 37 38 41 47 49 50
#> 
#> $SubSet3
#>  [1]  1  3  5  6  8 11 14 17 19 21 23 32 34 41 42 44 46 48 49 50

This selects the most diverse samples as controls and adds them to the distribution of samples. Note that if a sample is already present in a batch it is not added again, which is why the number of samples per batch differs from the design without replicates.
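
The replicated samples can be recovered as the row indices shared by every batch. The sketch below hard-codes the list printed above so that it runs without the package; with the package loaded, the same Reduce(intersect, r) call should work on the object returned by replicates(). The names controls and unique_per_batch are made up for this illustration.

```r
# The result printed above, copied by hand
r <- list(
  SubSet1 = c(4, 9, 10, 12, 20, 21, 22, 23, 25, 26,
              28, 29, 31, 39, 40, 41, 43, 45, 49, 50),
  SubSet2 = c(2, 7, 13, 15, 16, 18, 21, 23, 24, 27,
              30, 33, 35, 36, 37, 38, 41, 47, 49, 50),
  SubSet3 = c(1, 3, 5, 6, 8, 11, 14, 17, 19, 21,
              23, 32, 34, 41, 42, 44, 46, 48, 49, 50)
)

# Samples present in every batch are the replicates used as controls
controls <- Reduce(intersect, r)
controls  # 21 23 41 49 50

# Samples that appear in one batch only
unique_per_batch <- lapply(r, setdiff, controls)
lengths(unique_per_batch)  # 15 15 15
```

In this output five samples appear in all three batches, and the remaining 45 samples are split 15 per batch.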

Previous work

The CRAN Task View on Experimental Design includes many packages relevant for designing an experiment before collecting data, but none of them addresses how to manage the samples once they have already been collected.

Two packages allow distributing the samples across batches:

  • The OSAT package handles categorical variables but not numeric data. It doesn’t work with our data.

  • The minDiff package, reported on Stats.SE, handles both numeric and categorical data, but it can only optimize for two nominal criteria. It doesn’t work for our data.

If you are still designing the experiment and have not collected any data yet, DeclareDesign might be relevant for you.

See also the question I asked on Bioinformatics.SE before developing the package.

Other

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
