Skip to content

Commit

Permalink
binomen post
Browse files Browse the repository at this point in the history
  • Loading branch information
sckott committed Dec 8, 2015
1 parent 7ba2a9b commit f465910
Show file tree
Hide file tree
Showing 410 changed files with 24,650 additions and 14,629 deletions.
186 changes: 186 additions & 0 deletions _drafts/2015-12-08-binomen-taxonomy-tools.Rmd
@@ -0,0 +1,186 @@
---
name: binomen-taxonomy-tools
layout: post
title: binomen - Tools for slicing and dicing taxonomic names
date: 2015-12-08
author: Scott Chamberlain
sourceslug: _drafts/2015-12-08-binomen-taxonomy-tools.Rmd
tags:
- R
- taxonomy
- split-apply-combine
---

```{r echo=FALSE}
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE,
warning = FALSE,
message = FALSE
)
```

The first version of `binomen` is now up on [CRAN][binomencran]. It provides various taxonomic classes for defining a single taxon, multiple taxa, and a taxonomic data.frame. It is designed as a companion to [taxize](https://github.com/ropensci/taxize), where you can get taxonomic data on taxonomic names from the web.

The classes (S3):

* `taxon`
* `taxonref`
* `taxonrefs`
* `binomial`
* `grouping` (i.e., classification - used different term to avoid conflict with classification in `taxize`)

For example, the `binomial` class is defined by a genus, epithet, authority, and optional full species name and canonical version.

```r
binomial("Poa", "annua", authority="L.")
```

```r
<binomial>
genus: Poa
epithet: annua
canonical:
species:
authority: L.
```

The package has a suite of functions to work on these taxonomic classes:

* `gethier()` - get hierarchy from a `taxon` class
* `scatter()` - make each row in taxonomic data.frame (`taxondf`) a separate `taxon` object within a single `taxa` object
* `assemble()` - make a `taxa` object into a `taxondf` data.frame
* `pick()` - pick out one or more taxonomic groups
* `pop()` - pop out (drop) one or more taxonomic groups
* `span()` - pick a range between two taxonomic groups (inclusive)
* `strain()` - filter by taxonomic groups, like dplyr's filter
* `name()` - get the taxon name for each `taxonref` object
* `uri()` - get the reference uri for each `taxonref` object
* `rank()` - get the taxonomic rank for each `taxonref` object
* `id()` - get the reference uri for each `taxonref` object

The approach in this package I suppose is sort of like `split-apply-combine` from `plyr`/`dplyr`, whereas this is aims to make it easy to do with taxonomic names.

## Install

For examples below, you'll need the development version:

```{r eval=FALSE}
install.packages("binomen")
```

```{r}
library("binomen")
```

## Make a taxon

Make a taxon object

```{r}
(obj <- make_taxon(genus="Poa", epithet="annua", authority="L.",
family='Poaceae', clazz='Poales', kingdom='Plantae', variety='annua'))
```

Index to various parts of the object

The binomial

```{r}
obj$binomial
```

The authority

```{r}
obj$binomial$authority
```

The classification

```{r}
obj$grouping
```

The family

```{r}
obj$grouping$family
```

## Subset taxon objects

Get one or more ranks via `pick()`

```{r}
obj %>% pick(family)
obj %>% pick(family, genus)
```

Drop one or more ranks via `pop()`

```{r}
obj %>% pop(family)
obj %>% pop(family, genus)
```

Get a range of ranks via `span()`

```{r}
obj %>% span(kingdom, family)
```

Extract classification as a `data.frame`

```{r}
gethier(obj)
```

## Taxonomic data.frame's

Make one

```{r}
df <- data.frame(order = c('Asterales','Asterales','Fagales','Poales','Poales','Poales'),
family = c('Asteraceae','Asteraceae','Fagaceae','Poaceae','Poaceae','Poaceae'),
genus = c('Helianthus','Helianthus','Quercus','Poa','Festuca','Holodiscus'),
stringsAsFactors = FALSE)
(df2 <- taxon_df(df))
```

Parse - get rank order via `pick()`

```{r}
df2 %>% pick(order)
```

get ranks order, family, and genus via `pick()`

```{r}
df2 %>% pick(order, family, genus)
```

get range of names via `span()`, from rank `X` to rank `Y`

```{r}
df2 %>% span(family, genus)
```

Separate each row into a `taxon` class (many `taxon` objects are a `taxa` class)

```{r output.lines=1:20}
scatter(df2)
```

And you can re-assemble a data.frame from the output of `scatter()` with `assemble()`

```{r}
out <- scatter(df2)
assemble(out)
```

## Thoughts?

I'm really curious what people think of `binomen`. I'm not sure how useful this will be in the wild. Try it. Let me know. Thanks much :)

[binomencran]: https://cran.rstudio.com/web/packages/binomen

0 comments on commit f465910

Please sign in to comment.