Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
410 changed files
with
24,650 additions
and
14,629 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,186 @@ | ||
--- | ||
name: binomen-taxonomy-tools | ||
layout: post | ||
title: binomen - Tools for slicing and dicing taxonomic names | ||
date: 2015-12-08 | ||
author: Scott Chamberlain | ||
sourceslug: _drafts/2015-12-08-binomen-taxonomy-tools.Rmd | ||
tags: | ||
- R | ||
- taxonomy | ||
- split-apply-combine | ||
--- | ||
|
||
```{r echo=FALSE} | ||
knitr::opts_chunk$set( | ||
comment = "#>", | ||
collapse = TRUE, | ||
warning = FALSE, | ||
message = FALSE | ||
) | ||
``` | ||
|
||
The first version of `binomen` is now up on [CRAN][binomencran]. It provides various taxonomic classes for defining a single taxon, multiple taxa, and a taxonomic data.frame. It is designed as a companion to [taxize](https://github.com/ropensci/taxize), where you can get taxonomic data on taxonomic names from the web. | ||
|
||
The classes (S3): | ||
|
||
* `taxon` | ||
* `taxonref` | ||
* `taxonrefs` | ||
* `binomial` | ||
* `grouping` (i.e., classification - used different term to avoid conflict with classification in `taxize`) | ||
|
||
For example, the `binomial` class is defined by a genus, epithet, authority, and optional full species name and canonical version. | ||
|
||
```r | ||
binomial("Poa", "annua", authority="L.") | ||
``` | ||
|
||
```r | ||
<binomial> | ||
genus: Poa | ||
epithet: annua | ||
canonical: | ||
species: | ||
authority: L. | ||
``` | ||
|
||
The package has a suite of functions to work on these taxonomic classes: | ||
|
||
* `gethier()` - get hierarchy from a `taxon` class | ||
* `scatter()` - make each row in taxonomic data.frame (`taxondf`) a separate `taxon` object within a single `taxa` object | ||
* `assemble()` - make a `taxa` object into a `taxondf` data.frame | ||
* `pick()` - pick out one or more taxonomic groups | ||
* `pop()` - pop out (drop) one or more taxonomic groups | ||
* `span()` - pick a range between two taxonomic groups (inclusive) | ||
* `strain()` - filter by taxonomic groups, like dplyr's filter | ||
* `name()` - get the taxon name for each `taxonref` object | ||
* `uri()` - get the reference uri for each `taxonref` object | ||
* `rank()` - get the taxonomic rank for each `taxonref` object | ||
* `id()` - get the reference uri for each `taxonref` object | ||
|
||
The approach in this package I suppose is sort of like `split-apply-combine` from `plyr`/`dplyr`, whereas this is aims to make it easy to do with taxonomic names. | ||
|
||
## Install | ||
|
||
For examples below, you'll need the development version: | ||
|
||
```{r eval=FALSE} | ||
install.packages("binomen") | ||
``` | ||
|
||
```{r} | ||
library("binomen") | ||
``` | ||
|
||
## Make a taxon | ||
|
||
Make a taxon object | ||
|
||
```{r} | ||
(obj <- make_taxon(genus="Poa", epithet="annua", authority="L.", | ||
family='Poaceae', clazz='Poales', kingdom='Plantae', variety='annua')) | ||
``` | ||
|
||
Index to various parts of the object | ||
|
||
The binomial | ||
|
||
```{r} | ||
obj$binomial | ||
``` | ||
|
||
The authority | ||
|
||
```{r} | ||
obj$binomial$authority | ||
``` | ||
|
||
The classification | ||
|
||
```{r} | ||
obj$grouping | ||
``` | ||
|
||
The family | ||
|
||
```{r} | ||
obj$grouping$family | ||
``` | ||
|
||
## Subset taxon objects | ||
|
||
Get one or more ranks via `pick()` | ||
|
||
```{r} | ||
obj %>% pick(family) | ||
obj %>% pick(family, genus) | ||
``` | ||
|
||
Drop one or more ranks via `pop()` | ||
|
||
```{r} | ||
obj %>% pop(family) | ||
obj %>% pop(family, genus) | ||
``` | ||
|
||
Get a range of ranks via `span()` | ||
|
||
```{r} | ||
obj %>% span(kingdom, family) | ||
``` | ||
|
||
Extract classification as a `data.frame` | ||
|
||
```{r} | ||
gethier(obj) | ||
``` | ||
|
||
## Taxonomic data.frame's | ||
|
||
Make one | ||
|
||
```{r} | ||
df <- data.frame(order = c('Asterales','Asterales','Fagales','Poales','Poales','Poales'), | ||
family = c('Asteraceae','Asteraceae','Fagaceae','Poaceae','Poaceae','Poaceae'), | ||
genus = c('Helianthus','Helianthus','Quercus','Poa','Festuca','Holodiscus'), | ||
stringsAsFactors = FALSE) | ||
(df2 <- taxon_df(df)) | ||
``` | ||
|
||
Parse - get rank order via `pick()` | ||
|
||
```{r} | ||
df2 %>% pick(order) | ||
``` | ||
|
||
get ranks order, family, and genus via `pick()` | ||
|
||
```{r} | ||
df2 %>% pick(order, family, genus) | ||
``` | ||
|
||
get range of names via `span()`, from rank `X` to rank `Y` | ||
|
||
```{r} | ||
df2 %>% span(family, genus) | ||
``` | ||
|
||
Separate each row into a `taxon` class (many `taxon` objects are a `taxa` class) | ||
|
||
```{r output.lines=1:20} | ||
scatter(df2) | ||
``` | ||
|
||
And you can re-assemble a data.frame from the output of `scatter()` with `assemble()` | ||
|
||
```{r} | ||
out <- scatter(df2) | ||
assemble(out) | ||
``` | ||
|
||
## Thoughts? | ||
|
||
I'm really curious what people think of `binomen`. I'm not sure how useful this will be in the wild. Try it. Let me know. Thanks much :) | ||
|
||
[binomencran]: https://cran.rstudio.com/web/packages/binomen |
Oops, something went wrong.