Skip to content

Commit

Permalink
add post on randomised complete block designs and vegan
Browse files Browse the repository at this point in the history
  • Loading branch information
gavinsimpson committed Nov 3, 2014
1 parent 6da9dc1 commit 9bd6c40
Show file tree
Hide file tree
Showing 4 changed files with 793 additions and 0 deletions.
28 changes: 28 additions & 0 deletions _biblio/blog-biblio.bib
Expand Up @@ -291,3 +291,31 @@ @ARTICLE{Minchin1987
year = 1987
}


@ARTICLE{Spackova1998-ad,
title = "Sensitivity of seedling recruitment to moss, litter and dominant
removal in an oligotrophic wet meadow",
author = "\v{S}pa\v{c}kov\'{a}, Iva and Kotorov\'{a}, Ivana and Lep\v{s},
Jan",
journal = "Folia geobotanica",
publisher = "Springer Netherlands",
volume = 33,
number = 1,
pages = "17--30",
month = "1~" # mar,
year = 1998,
issn = "1211-9520",
issn_alt = "1874-9348",
doi = "10.1007/BF02914928"
}


@BOOK{Smilauer2014-ac,
title = "Multivariate Analysis of Ecological Data using {CANOCO} 5",
author = "\v{S}milauer, Petr and Lep\v{s}, Jan",
publisher = "Cambridge University Press",
edition = "2 edition",
month = "17~" # apr,
year = 2014
}

284 changes: 284 additions & 0 deletions _posts/2014-11-03-randomized-complete-block-designs-and-vegan.Rmd
@@ -0,0 +1,284 @@
---
title: "Analysing a randomised complete block design with vegan"
status: publish
layout: post
published: true
type: post
tags:
- vegan
- permute
- permutations
active: blog
category: R
---

```{r, echo = FALSE, include = FALSE, cache = FALSE}
render_jekyll()
opts_knit$set(progress = TRUE, base.url = "{{ site.url }}/assets/img/posts/", base.dir = "/home/gavin/work/web/jekyll/blog/assets/img/posts/")
opts_chunk$set(results = "hold",
fig.path = "randomized-complete-block-design-and-vegan-",
comment = NA,
prompt = FALSE,
cache = TRUE,
cache.path = "../_knitr-cache/randomized-complete-block-design-and-vegan-",
fig.show = "hold",
dev = "png",
dpi = 85,
tidy = FALSE,
fig.height = 7,
fig.width = 7,
dev.args = list(pointsize = 10))
opts_chunk$set(fig.lp = "fig:")
```

It has been a long time coming.
[**Vegan**](http://cran.r-project.org/package=vegan) now has in-built,
native ability to use restricted permutation designs when testing
effects in constrained ordinations and in range of other methods. This
new-found functionality comes courtesy of Jari (mainly) and my efforts
to have vegan permutation routines use the
[**permute**](http://cran.r-project.org/package=permute) package. Jari
also cooked up a standard interface that we can use to drop this and
some extra features neatly into any function we want; this allows us to
have permutation tests run on many CPU cores in parallel, splitting the
computational burden and reducing the run time of tests, and also a
mechanism that allows users to pass a matrix of user-defined
permutations to be used in tests. These new features are now fully
working in the development version of **vegan**, which you can find on
[github](https://github.com/vegandevs/vegan), and which should be
released to CRAN shortly. Ahead of the release, I'm preparing some
examples to show off the new capabilities; first off I look at data
from a randomized, complete block design experiment analysed using RDA
& restricted permutations.

To follow this example locally you'll need to have version 2.1-43 or
later of **vegan** installed. You can grab the [sources from
github](https://github.com/vegandevs/vegan) and build it yourself, or
grab a Windows binary from the [Appveyor Continuous integration
service](https://ci.appveyor.com/project/gavinsimpson/vegan/branch/master/artifacts)
that we're using to test on that platform --- you want the `.zip` file
from the Artefacts. Once you've sorted out the installation, we can
begin.

```{r, load-library}
library("vegan")
library("gdata")
```

We'll need **gdata**, and its `read.xls()` function, to read from the
XLS format files that the data for the example come as.

The data set itself is quite simple and small, consisting of counts on 23 species from 16 plots, and arise from a randomised complete block designed experiment described by Špačková and colleagues [-@Spackova1998-ad] and analysed by [@Smilauer2014-ac] in their recent book using Canoco v5.

The experiment tested the effects on seedling recruitment to a range of treatments

* control
* removal of litter
* removal of the dominant species *Nardus stricta*
* removal of litter and moss (mos couldn't be removed without also removing litter)

The treatments were replicated replicated in four, randomised complete blocks.

The data are available from the accompanying website to the book *Multivariate Analysis of Ecological Data using CANOCO 5* [@Smilauer2014-ac]. They are supplied as XLS format files in a ZIP archive. We can read these into R directly from the website with a little bit of effort

```{r, load-data}
## Download the data zip
furl <- "http://regent.prf.jcu.cz/maed2/chap15.zip"
td <- tempdir()
tf <- tempfile(tmpdir = td, fileext = ".zip")
download.file(furl, tf)
## list the files in the zip, we want the xls version (file 3)
fname <- unzip(tf, list = TRUE)$Name[3]
unzip(tf, files = fname, exdir = td, overwrite = TRUE) # unzip
datpath <- file.path(td, fname) # path to xls
## read the xls file, sheet 2 contains species data, sheet 3 the env
spp <- read.xls(datpath, sheet = 2, skip = 1, row.names = 1)
env <- read.xls(datpath, sheet = 3, row.names = 1)
```

The `block` variable is currently coded as an integer and needs
converting to a factor if we are to use it correctly in the analysis

```{r, transform-block}
env <- transform(env, block = factor(block))
```

The gradient lengths are short,

```{r, decorana}
decorana(spp)
```

motivating the use of redundancy analysis (RDA). Additionally, we may
be interested in how the raw abundance of seedlings change following
experimental manipulation, o we may wish to focus on the proportional
differences between treatments. The first case is handled naturaly by
RDA. The second case will require some form of standardisation by
samples, say by sample totals.

First, let's test the first null hypothesis; that there is no effect of
the treatment on seedling recruitment. This is a simple RDA. We should
take into account the `block` factor when we assess this model for
significance. How we do this illustrates two potential approaches to
performing permutation tests

1. **design**-based permutations, where how the samples are permuted
follows the experimental design, or

2. **model**-based permutations, where the experimental design is
included in the analysis directly and residuals are permuted by simple
randomisation.

There is an important difference between the two approach, one which
I'll touch on shortly.

We'll proceed by fitting the model, conditioning on `block` to remove
between block differences

```{r, fit-conditioned-rda}
mod1 <- rda(spp ~ treatment + Condition(block), data = env)
mod1
```

There is a strong single, linear gradient in the data as evidenced by
the relative magnitudes of the eigenvalues (here expressed as
proportions of the total variance)

```{r, eigenvals}
eigenvals(mod1) / mod1$tot.chi
```

## Design-based permutations

A *design*-based permutation test of these data would be on conditioned on the `block` variable, by restricting permutation of sample only *within* the levels of `block`. In this situation, samples are never permuted between blocks, only within. We can set up this type of permutation design as follows

```{r, how1}
h <- how(blocks = env$block, nperm = 999)
```

Note that we could use the `plots` argument instead of `blocks` to
restrict the permutations in the same way, but using `blocks` is
simpler. I also set the required number of permutations for the test
here.

Constrained ordinations in **vegan** are tested using the `anova()`
function. New in the development version of the package is the
`permutations` argument, which is the key to supplying instructions on
how you want to permute to `anova()`. `permutations` can take a number of different types of instruction

1. an object of class `"how"`, whch contains details of a restricted
permutation design that `shuffleSet()` from the **permute** package will use to generate permutations from, or

2. a number indicating the number of permutations required, in which
case these are simple randomisations with no restriction, unless the
`strata` argument is used, or

3. a matrix of user-specified permutations, 1 row per permutation.

To perform the design-based permutation we'll pass `h`, created
earlier, to `anova()`

```{r, anova1}
set.seed(42)
p1 <- anova(mod1, permutations = h, parallel = 3)
p1
```

Note that I've run this on three cores in parallel; this is another new
feature of the development version of **vegan** and can considerably
reduce the time needed to run permutation tests. I have four cores on
my laptop but left one free for the other software I have running.

The overall permutation test indicates no significant effect of
treatment on the abundance of seedlings. We can test individual axes by
adding `by = "axis"` to the `anova()` call

```{r, anova1-by-axis}
set.seed(24)
p1axis <- anova(mod1, permutations = h, parallel = 3, by = "axis")
p1axis
```

This confirms the earlier impression that there is a single, linear
gradient in the data set. A biplot shows that this axis of variation is
associated with the Moss (& Litter) removal treatment. The variation
between the other treatments lies primarily along axis two and is
substantially less than that associated with the Moss & Litter removal.

```{r, biplot, fig = TRUE, fig.cap = "Figure 1: RDA biplot showing species scores and treatment centroids."}
plot(mod1, display = c("species", "cn"), scaling = 1, type = "n",
xlim = c(-10.5, 1.5))
text(mod1, display = "species", scaling = 1, cex = 0.8)
text(mod1, display = "cn", scaling = 1, col = "blue", cex = 1.2,
labels = c("Control", "Litter+Moss", "Litter", "Removal"))
```

In the above figure, I used `scaling = 1`, so-called *inter-sample
distance scaling*, as this best represents the centroid scores, which
are computed as the treatment-wise average of the sample scores.

## Model-based permutation

The alternative permutation approach, known as model-based
permutations, and would employ free permutation of residuals after the
effects of the covariables have been accounted for. This is justified
because under the null hypothesis, the residuals are freely
exchangeable once the effects of the covariables are removed. There is
a clear advantage of model-based permutations over design-based
permutations; where the sample size is small, as it is here, there
tends to be few blocks and the resulting design-based permutation test
relatively weak compared to the model-based version.

It is simple to switch to model-based permutations, be setting the
blocks indicator in the permutation design to `NULL`, removing the
blocking structure from the design

```{r, remove-blocks}
setBlocks(h) <- NULL # remove blocking
getBlocks(h) # confirm
```

Next we repeat the permutation test using the modified `h`

```{r, model-anova1}
set.seed(51)
p2 <- anova(mod1, permutations = h, parallel = 3)
p2
```

The estimated *p* value is slightly smaller now. The difference between
treatments is predominantly in the Moss & Litter removal with
differences between the control and the other treatments lying along
the insignificant axes

```{r, model-anova-by-axis}
set.seed(83)
p2axis <- anova(mod1, permutations = h, parallel = 3, by = "axis")
p2axis
```

## Chages in relative seedling composition

As mentioned earlier, interest is also, perhaps predominantly, in whether any of the treatments have different species composition. To test this hypothesis we standardise by the sample (row) norm using `decostand()`. Alternatively we could have used `method = "total"` to work with proportional abundances. We then repeat the earlier steps, this time using only model-based permutations owing to their greater power.

```{r, standardised-fits}
spp.norm <- decostand(spp, method = "normalize", MARGIN = 1)
mod2 <- rda(spp.norm ~ treatment + Condition(block), data = env)
mod2
eigenvals(mod2) / mod2$tot.chi
set.seed(76)
anova(mod2, permutations = h, parallel = 3)
```

The results suggest no difference in species composition under the
experimental manipulation.

That's it for this post. In the next example I'll take a look at a more
complex example, one where model-based permutations can't be used to
test all the hypotheses we might want to in an experimental design.

## References

0 comments on commit 9bd6c40

Please sign in to comment.