---
name: binomen-taxonomy-tools
layout: post
title: binomen - Tools for slicing and dicing taxonomic names
date: 2015-12-08
author: Scott Chamberlain
sourceslug: _drafts/2015-12-08-binomen-taxonomy-tools.Rmd
tags:
- R
- taxonomy
- split-apply-combine
---
```{r echo=FALSE}
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE,
warning = FALSE,
message = FALSE
)
```
The first version of `binomen` is now up on [CRAN][binomencran]. It provides various taxonomic classes for defining a single taxon, multiple taxa, and a taxonomic data.frame. It is designed as a companion to [taxize](https://github.com/ropensci/taxize), which retrieves taxonomic data for taxonomic names from the web.
The classes (S3):
* `taxon`
* `taxonref`
* `taxonrefs`
* `binomial`
* `grouping` (i.e., classification - used different term to avoid conflict with classification in `taxize`)
For example, the `binomial` class is defined by a genus, epithet, authority, and optional full species name and canonical version.
```r
binomial("Poa", "annua", authority="L.")
```
```r
<binomial>
genus: Poa
epithet: annua
canonical:
species:
authority: L.
```
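To make the idea of an S3 class concrete, here is a minimal base R sketch of how a class along these lines could be built. The names `new_binomial()` and `binomial_sketch` are made up for illustration, and this is not how `binomen` itself is implemented; it only shows the general pattern of a list with a class attribute plus a print method.

```r
# Hypothetical constructor; binomen's own binomial() may work differently.
new_binomial <- function(genus, epithet, authority = NULL) {
  stopifnot(is.character(genus), is.character(epithet))
  structure(
    list(genus = genus, epithet = epithet, authority = authority),
    class = "binomial_sketch"
  )
}

# S3 print method: dispatched automatically when the object is printed.
print.binomial_sketch <- function(x, ...) {
  cat("<binomial_sketch>\n")
  cat("  genus: ", x$genus, "\n", sep = "")
  cat("  epithet: ", x$epithet, "\n", sep = "")
  cat("  authority: ", if (is.null(x$authority)) "" else x$authority, "\n", sep = "")
  invisible(x)
}

b <- new_binomial("Poa", "annua", authority = "L.")
print(b)
```

Because the object is just a list underneath, its parts can be reached with `$`, e.g. `b$genus`.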
The package has a suite of functions to work on these taxonomic classes:
* `gethier()` - get hierarchy from a `taxon` class
* `scatter()` - make each row in taxonomic data.frame (`taxondf`) a separate `taxon` object within a single `taxa` object
* `assemble()` - make a `taxa` object into a `taxondf` data.frame
* `pick()` - pick out one or more taxonomic groups
* `pop()` - pop out (drop) one or more taxonomic groups
* `span()` - pick a range between two taxonomic groups (inclusive)
* `strain()` - filter by taxonomic groups, like dplyr's filter
* `name()` - get the taxon name for each `taxonref` object
* `uri()` - get the reference uri for each `taxonref` object
* `rank()` - get the taxonomic rank for each `taxonref` object
* `id()` - get the reference id for each `taxonref` object
The approach in this package is, I suppose, sort of like `split-apply-combine` from `plyr`/`dplyr`, except that it aims to make this easy to do with taxonomic names.
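For anyone who hasn't run into split-apply-combine before, here is a rough base R sketch of the pattern on a small taxonomic table. It doesn't use `binomen` at all; the data and the column name `n_genera` are just for illustration.

```r
# Toy data; the rows are illustrative only.
df <- data.frame(
  family = c("Asteraceae", "Asteraceae", "Fagaceae", "Poaceae", "Poaceae"),
  genus  = c("Helianthus", "Helianthus", "Quercus", "Poa", "Festuca"),
  stringsAsFactors = FALSE
)

# Split: one data.frame per family.
pieces <- split(df, df$family)

# Apply: count the distinct genera within each family.
counts <- lapply(pieces, function(p) {
  data.frame(family = p$family[1], n_genera = length(unique(p$genus)),
             stringsAsFactors = FALSE)
})

# Combine: bind the per-family results back into one data.frame.
result <- do.call(rbind, counts)
rownames(result) <- NULL
result
```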
## Install
For the examples below, install `binomen` from CRAN:
```{r eval=FALSE}
install.packages("binomen")
```
```{r}
library("binomen")
```
## Make a taxon
Make a taxon object
```{r}
(obj <- make_taxon(genus="Poa", epithet="annua", authority="L.",
family='Poaceae', clazz='Poales', kingdom='Plantae', variety='annua'))
```
Index into various parts of the object
The binomial
```{r}
obj$binomial
```
The authority
```{r}
obj$binomial$authority
```
The classification
```{r}
obj$grouping
```
The family
```{r}
obj$grouping$family
```
## Subset taxon objects
Get one or more ranks via `pick()`
```{r}
obj %>% pick(family)
obj %>% pick(family, genus)
```
Drop one or more ranks via `pop()`
```{r}
obj %>% pop(family)
obj %>% pop(family, genus)
```
Get a range of ranks via `span()`
```{r}
obj %>% span(kingdom, family)
```
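The idea behind an inclusive range of ranks can be sketched in base R as follows. The helper `span_ranks()` and the `rank_order` vector are hypothetical, chosen for illustration, and are not taken from `binomen`'s source; the sketch only assumes that ranks have a fixed order from highest to lowest.

```r
# Assumed rank order, highest to lowest; not taken from binomen's source.
rank_order <- c("kingdom", "phylum", "class", "order", "family", "genus", "species")

# Hypothetical helper: keep the ranks from `from` to `to`, inclusive.
span_ranks <- function(x, from, to) {
  idx <- match(c(from, to), rank_order)
  stopifnot(!anyNA(idx), idx[1] <= idx[2])
  wanted <- rank_order[idx[1]:idx[2]]
  x[names(x) %in% wanted]
}

grouping <- c(kingdom = "Plantae", order = "Poales", family = "Poaceae", genus = "Poa")
span_ranks(grouping, "kingdom", "family")
```

Ranks missing from the input (here `phylum` and `class`) are simply skipped, and anything below `family` is dropped.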
Extract classification as a `data.frame`
```{r}
gethier(obj)
```
## Taxonomic data.frames
Make one
```{r}
df <- data.frame(order = c('Asterales','Asterales','Fagales','Poales','Poales','Poales'),
family = c('Asteraceae','Asteraceae','Fagaceae','Poaceae','Poaceae','Poaceae'),
genus = c('Helianthus','Helianthus','Quercus','Poa','Festuca','Holodiscus'),
stringsAsFactors = FALSE)
(df2 <- taxon_df(df))
```
Parse - get the rank `order` via `pick()`
```{r}
df2 %>% pick(order)
```
Get the ranks `order`, `family`, and `genus` via `pick()`
```{r}
df2 %>% pick(order, family, genus)
```
Get a range of names via `span()`, from rank `X` to rank `Y`
```{r}
df2 %>% span(family, genus)
```
Separate each row into a `taxon` class (many `taxon` objects are a `taxa` class)
```{r output.lines=1:20}
scatter(df2)
```
And you can re-assemble a data.frame from the output of `scatter()` with `assemble()`
```{r}
out <- scatter(df2)
assemble(out)
```
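The round trip from data.frame to per-row objects and back can be mimicked in plain base R, which might help show what is going on conceptually. This sketch uses ordinary lists rather than `taxon` objects, so it is only an analogy for `scatter()` and `assemble()`, not a description of how they work internally.

```r
df <- data.frame(
  family = c("Asteraceae", "Fagaceae", "Poaceae"),
  genus  = c("Helianthus", "Quercus", "Poa"),
  stringsAsFactors = FALSE
)

# "Scatter": one plain list per row (binomen would build taxon objects instead).
rows <- lapply(seq_len(nrow(df)), function(i) as.list(df[i, ]))

# "Assemble": bind the per-row lists back into a single data.frame.
back <- do.call(rbind, lapply(rows, as.data.frame, stringsAsFactors = FALSE))
rownames(back) <- NULL

# Compare the rebuilt data.frame with the original.
all.equal(back, df)
```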
## Thoughts?
I'm really curious what people think of `binomen`. I'm not sure how useful this will be in the wild. Try it. Let me know. Thanks much :)
[binomencran]: https://cran.rstudio.com/web/packages/binomen