add post on randomised complete block designs and vegan

gavinsimpson · Nov 3, 2014 · 9bd6c40 · 9bd6c40
1 parent 6da9dc1
commit 9bd6c40
Show file tree

Hide file tree

Showing 4 changed files with 793 additions and 0 deletions.
diff --git a/_biblio/blog-biblio.bib b/_biblio/blog-biblio.bib
@@ -291,3 +291,31 @@ @ARTICLE{Minchin1987
   year      =  1987
 }
 
+
+@ARTICLE{Spackova1998-ad,
+  title     = "Sensitivity of seedling recruitment to moss, litter and dominant
+               removal in an oligotrophic wet meadow",
+  author    = "\v{S}pa\v{c}kov\'{a}, Iva and Kotorov\'{a}, Ivana and Lep\v{s},
+               Jan",
+  journal   = "Folia geobotanica",
+  publisher = "Springer Netherlands",
+  volume    =  33,
+  number    =  1,
+  pages     = "17--30",
+  month     =  "1~" # mar,
+  year      =  1998,
+  issn      = "1211-9520",
+  issn_alt  = "1874-9348",
+  doi       = "10.1007/BF02914928"
+}
+
+
+@BOOK{Smilauer2014-ac,
+  title     = "Multivariate Analysis of Ecological Data using {CANOCO} 5",
+  author    = "\v{S}milauer, Petr and Lep\v{s}, Jan",
+  publisher = "Cambridge University Press",
+  edition   = "2 edition",
+  month     =  "17~" # apr,
+  year      =  2014
+}
+
diff --git a/_posts/2014-11-03-randomized-complete-block-designs-and-vegan.Rmd b/_posts/2014-11-03-randomized-complete-block-designs-and-vegan.Rmd
@@ -0,0 +1,284 @@
+--- 
+title: "Analysing a randomised complete block design with vegan"
+status: publish
+layout: post
+published: true
+type: post
+tags:
+- vegan
+- permute
+- permutations
+active: blog
+category: R
+---
+
+```{r, echo = FALSE, include = FALSE, cache = FALSE}
+render_jekyll()
+opts_knit$set(progress = TRUE, base.url = "{{ site.url }}/assets/img/posts/", base.dir = "/home/gavin/work/web/jekyll/blog/assets/img/posts/")
+opts_chunk$set(results = "hold",
+               fig.path = "randomized-complete-block-design-and-vegan-", 
+               comment = NA,
+               prompt = FALSE,
+               cache = TRUE,
+               cache.path = "../_knitr-cache/randomized-complete-block-design-and-vegan-", 
+               fig.show = "hold",
+               dev = "png",
+               dpi = 85,
+               tidy = FALSE,
+               fig.height = 7,
+               fig.width = 7,
+               dev.args = list(pointsize = 10))
+opts_chunk$set(fig.lp = "fig:")
+```
+
+It has been a long time coming. 
+[**Vegan**](http://cran.r-project.org/package=vegan) now has in-built, 
+native ability to use restricted permutation designs when testing 
+effects in constrained ordinations and in range of other methods. This 
+new-found functionality comes courtesy of Jari (mainly) and my efforts 
+to have vegan permutation routines use the 
+[**permute**](http://cran.r-project.org/package=permute) package. Jari 
+also cooked up a standard interface that we can use to drop this and 
+some extra features neatly into any function we want; this allows us to 
+have permutation tests run on many CPU cores in parallel, splitting the 
+computational burden and reducing the run time of tests, and also a 
+mechanism that allows users to pass a matrix of user-defined 
+permutations to be used in tests. These new features are now fully 
+working in the development version of **vegan**, which you can find on 
+[github](https://github.com/vegandevs/vegan), and which should be 
+released to CRAN shortly. Ahead of the release, I'm preparing some 
+examples to show off the new capabilities; first off I look at data 
+from a randomized, complete block design experiment analysed using RDA 
+& restricted permutations.
+
+To follow this example locally you'll need to have version 2.1-43 or 
+later of **vegan** installed. You can grab the [sources from 
+github](https://github.com/vegandevs/vegan) and build it yourself, or 
+grab a Windows binary from the [Appveyor Continuous integration 
+service](https://ci.appveyor.com/project/gavinsimpson/vegan/branch/master/artifacts) 
+that we're using to test on that platform --- you want the `.zip` file 
+from the Artefacts. Once you've sorted out the installation, we can 
+begin.
+
+```{r, load-library}
+library("vegan")
+library("gdata")
+```
+
+We'll need **gdata**, and its `read.xls()` function, to read from the 
+XLS format files that the data for the example come as.
+
+The data set itself is quite simple and small, consisting of counts on 23 species from 16 plots, and arise from a randomised complete block designed experiment described by &Scaron;pa&#269;kov&aacute; and colleagues [-@Spackova1998-ad] and analysed by [@Smilauer2014-ac] in their recent book using Canoco v5.
+
+The experiment tested the effects on seedling recruitment to a range of treatments
+
+ * control
+ * removal of litter
+ * removal of the dominant species *Nardus stricta*
+ * removal of litter and moss (mos couldn't be removed without also removing litter)
+
+The treatments were replicated replicated in four, randomised complete blocks.
+
+The data are available from the accompanying website to the book *Multivariate Analysis of Ecological Data using CANOCO 5* [@Smilauer2014-ac]. They are supplied as XLS format files in a ZIP archive. We can read these into R directly from the website with a little bit of effort
+
+```{r, load-data}
+## Download the data zip
+furl <- "http://regent.prf.jcu.cz/maed2/chap15.zip"
+td <- tempdir()
+tf <- tempfile(tmpdir = td, fileext = ".zip")
+download.file(furl, tf)
+
+## list the files in the zip, we want the xls version (file 3)
+fname <- unzip(tf, list = TRUE)$Name[3]
+unzip(tf, files = fname, exdir = td, overwrite = TRUE) # unzip
+datpath <- file.path(td, fname)                        # path to xls
+
+## read the xls file, sheet 2 contains species data, sheet 3 the env
+spp <- read.xls(datpath, sheet = 2, skip = 1, row.names = 1)
+env <- read.xls(datpath, sheet = 3, row.names = 1)
+```
+
+The `block` variable is currently coded as an integer and needs 
+converting to a factor if we are to use it correctly in the analysis
+
+```{r, transform-block}
+env <- transform(env, block = factor(block))
+```
+
+The gradient lengths are short,
+
+```{r, decorana}
+decorana(spp)
+```
+
+motivating the use of redundancy analysis (RDA). Additionally, we may 
+be interested in how the raw abundance of seedlings change following 
+experimental manipulation, o we may wish to focus on the proportional 
+differences between treatments. The first case is handled naturaly by 
+RDA. The second case will require some form of standardisation by 
+samples, say by sample totals.
+
+First, let's test the first null hypothesis; that there is no effect of 
+the treatment on seedling recruitment. This is a simple RDA. We should 
+take into account the `block` factor when we assess this model for 
+significance. How we do this illustrates two potential approaches to 
+performing permutation tests
+
+ 1. **design**-based permutations, where how the samples are permuted 
+ follows the experimental design, or
+
+ 2. **model**-based permutations, where the experimental design is 
+ included in the analysis directly and residuals are permuted by simple 
+ randomisation.
+
+There is an important difference between the two approach, one which 
+I'll touch on shortly.
+
+We'll proceed by fitting the model, conditioning on `block` to remove 
+between block differences
+
+```{r, fit-conditioned-rda}
+mod1 <- rda(spp ~ treatment + Condition(block), data = env)
+mod1
+```
+
+There is a strong single, linear gradient in the data as evidenced by 
+the relative magnitudes of the eigenvalues (here expressed as 
+proportions of the total variance)
+
+```{r, eigenvals}
+eigenvals(mod1) / mod1$tot.chi
+```
+
+## Design-based permutations
+
+A *design*-based permutation test of these data would be on conditioned on the `block` variable, by restricting permutation of sample only *within* the levels of `block`. In this situation, samples are never permuted between blocks, only within. We can set up this type of permutation design as follows
+
+```{r, how1}
+h <- how(blocks = env$block, nperm = 999)
+```
+
+Note that we could use the `plots` argument instead of `blocks` to 
+restrict the permutations in the same way, but using `blocks` is 
+simpler. I also set the required number of permutations for the test 
+here.
+
+Constrained ordinations in **vegan** are tested using the `anova()` 
+function. New in the development version of the package is the 
+`permutations` argument, which is the key to supplying instructions on 
+how you want to permute to `anova()`. `permutations` can take a number of different types of instruction
+
+ 1. an object of class `"how"`, whch contains details of a restricted 
+ permutation design that `shuffleSet()` from the **permute** package will use to generate permutations from, or
+
+ 2. a number indicating the number of permutations required, in which 
+ case these are simple randomisations with no restriction, unless the 
+ `strata` argument is used, or
+
+ 3. a matrix of user-specified permutations, 1 row per permutation.
+
+To perform the design-based permutation we'll pass `h`, created 
+earlier, to `anova()`
+
+```{r, anova1}
+set.seed(42)
+p1 <- anova(mod1, permutations = h, parallel = 3)
+p1
+```
+
+Note that I've run this on three cores in parallel; this is another new 
+feature of the development version of **vegan** and can considerably 
+reduce the time needed to run permutation tests. I have four cores on 
+my laptop but left one free for the other software I have running.
+
+The overall permutation test indicates no significant effect of 
+treatment on the abundance of seedlings. We can test individual axes by 
+adding `by = "axis"` to the `anova()` call
+
+```{r, anova1-by-axis}
+set.seed(24)
+p1axis <- anova(mod1, permutations = h, parallel = 3, by = "axis")
+p1axis
+```
+
+This confirms the earlier impression that there is a single, linear 
+gradient in the data set. A biplot shows that this axis of variation is 
+associated with the Moss (& Litter) removal treatment. The variation 
+between the other treatments lies primarily along axis two and is 
+substantially less than that associated with the Moss & Litter removal.
+
+```{r, biplot, fig = TRUE, fig.cap = "Figure 1: RDA biplot showing species scores and treatment centroids."}
+plot(mod1, display = c("species", "cn"), scaling = 1, type = "n",
+     xlim = c(-10.5, 1.5))
+text(mod1, display = "species", scaling = 1, cex = 0.8)
+text(mod1, display = "cn", scaling = 1, col = "blue", cex = 1.2,
+     labels = c("Control", "Litter+Moss", "Litter", "Removal"))
+```
+
+In the above figure, I used `scaling = 1`, so-called *inter-sample 
+distance scaling*, as this best represents the centroid scores, which 
+are computed as the treatment-wise average of the sample scores.
+
+## Model-based permutation
+
+The alternative permutation approach, known as model-based 
+permutations, and would employ free permutation of residuals after the 
+effects of the covariables have been accounted for. This is justified 
+because under the null hypothesis, the residuals are freely 
+exchangeable once the effects of the covariables are removed. There is 
+a clear advantage of model-based permutations over design-based 
+permutations; where the sample size is small, as it is here, there 
+tends to be few blocks and the resulting design-based permutation test 
+relatively weak compared to the model-based version.
+
+It is simple to switch to model-based permutations, be setting the 
+blocks indicator in the permutation design to `NULL`, removing the 
+blocking structure from the design
+
+```{r, remove-blocks}
+setBlocks(h) <- NULL                    # remove blocking
+getBlocks(h)                            # confirm
+```
+
+Next we repeat the permutation test using the modified `h`
+
+```{r, model-anova1}
+set.seed(51)
+p2 <- anova(mod1, permutations = h, parallel = 3)
+p2
+```
+
+The estimated *p* value is slightly smaller now. The difference between 
+treatments is predominantly in the Moss & Litter removal with 
+differences between the control and the other treatments lying along 
+the insignificant axes
+
+```{r, model-anova-by-axis}
+set.seed(83)
+p2axis <- anova(mod1, permutations = h, parallel = 3, by = "axis")
+p2axis
+```
+
+## Chages in relative seedling composition
+
+As mentioned earlier, interest is also, perhaps predominantly, in whether any of the treatments have different species composition. To test this hypothesis we standardise by the sample (row) norm using `decostand()`. Alternatively we could have used `method = "total"` to work with proportional abundances. We then repeat the earlier steps, this time using only model-based permutations owing to their greater power.
+
+```{r, standardised-fits}
+spp.norm <- decostand(spp, method = "normalize", MARGIN = 1)
+
+mod2 <- rda(spp.norm ~ treatment + Condition(block), data = env)
+mod2
+eigenvals(mod2) / mod2$tot.chi
+
+set.seed(76)
+anova(mod2, permutations = h, parallel = 3)
+```
+
+The results suggest no difference in species composition under the 
+experimental manipulation.
+
+That's it for this post. In the next example I'll take a look at a more 
+complex example, one where model-based permutations can't be used to 
+test all the hypotheses we might want to in an experimental design.
+
+## References