_drafts/2015-12-08-binomen-taxonomy-tools.Rmd

---
name: binomen-taxonomy-tools
layout: post
title: binomen - Tools for slicing and dicing taxonomic names
date: 2015-12-08
author: Scott Chamberlain
sourceslug: _drafts/2015-12-08-binomen-taxonomy-tools.Rmd
tags:
- R
- taxonomy
- split-apply-combine
---

```{r echo=FALSE}
knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE,
  warning = FALSE,
  message = FALSE
)
```

The first version of `binomen` is now up on [CRAN][binomencran]. It provides various taxonomic classes for defining a single taxon, multiple taxa, and a taxonomic data.frame. It is designed as a companion to [taxize](https://github.com/ropensci/taxize), where you can get taxonomic data on taxonomic names from the web.

The classes (S3):

* `taxon`
* `taxonref`
* `taxonrefs`
* `binomial`
* `grouping` (i.e., classification - used different term to avoid conflict with classification in `taxize`)

For example, the `binomial` class is defined by a genus, epithet, authority, and optional full species name and canonical version.

```r
binomial("Poa", "annua", authority="L.")
```

```r
<binomial>
  genus: Poa
  epithet: annua
  canonical:
  species:
  authority: L.
```

The package has a suite of functions to work on these taxonomic classes:

* `gethier()` - get hierarchy from a `taxon` class
* `scatter()` - make each row in taxonomic data.frame (`taxondf`) a separate `taxon` object within a single `taxa` object
* `assemble()` - make a `taxa` object into a `taxondf` data.frame
* `pick()` - pick out one or more taxonomic groups
* `pop()` - pop out (drop) one or more taxonomic groups
* `span()` - pick a range between two taxonomic groups (inclusive)
* `strain()` - filter by taxonomic groups, like dplyr's filter
* `name()` - get the taxon name for each `taxonref` object
* `uri()` - get the reference uri for each `taxonref` object
* `rank()` - get the taxonomic rank for each `taxonref` object
* `id()` - get the reference uri for each `taxonref` object

The approach in this package I suppose is sort of like `split-apply-combine` from `plyr`/`dplyr`, whereas this is aims to make it easy to do with taxonomic names.

## Install

For examples below, you'll need the development version:

```{r eval=FALSE}
install.packages("binomen")
```

```{r}
library("binomen")
```

## Make a taxon

Make a taxon object

```{r}
(obj <- make_taxon(genus="Poa", epithet="annua", authority="L.",
  family='Poaceae', clazz='Poales', kingdom='Plantae', variety='annua'))
```

Index to various parts of the object

The binomial

```{r}
obj$binomial
```

The authority

```{r}
obj$binomial$authority
```

The classification

```{r}
obj$grouping
```

The family

```{r}
obj$grouping$family
```

## Subset taxon objects

Get one or more ranks via `pick()`

```{r}
obj %>% pick(family)
obj %>% pick(family, genus)
```

Drop one or more ranks via `pop()`

```{r}
obj %>% pop(family)
obj %>% pop(family, genus)
```

Get a range of ranks via `span()`

```{r}
obj %>% span(kingdom, family)
```

Extract classification as a `data.frame`

```{r}
gethier(obj)
```

## Taxonomic data.frame's

Make one

```{r}
df <- data.frame(order = c('Asterales','Asterales','Fagales','Poales','Poales','Poales'),
  family = c('Asteraceae','Asteraceae','Fagaceae','Poaceae','Poaceae','Poaceae'),
  genus = c('Helianthus','Helianthus','Quercus','Poa','Festuca','Holodiscus'),
  stringsAsFactors = FALSE)
(df2 <- taxon_df(df))
```

Parse - get rank order via `pick()`

```{r}
df2 %>% pick(order)
```

get ranks order, family, and genus via `pick()`

```{r}
df2 %>% pick(order, family, genus)
```

get range of names via `span()`, from rank `X` to rank `Y`

```{r}
df2 %>% span(family, genus)
```

Separate each row into a `taxon` class (many `taxon` objects are a `taxa` class)

```{r output.lines=1:20}
scatter(df2)
```

And you can re-assemble a data.frame from the output of `scatter()` with `assemble()`

```{r}
out <- scatter(df2)
assemble(out)
```

## Thoughts?

I'm really curious what people think of `binomen`. I'm not sure how useful this will be in the wild. Try it. Let me know. Thanks much :)

[binomencran]: https://cran.rstudio.com/web/packages/binomen