added vignette, fix #19
sckott committed May 7, 2015
1 parent 391e572 commit d547517
Showing 6 changed files with 751 additions and 0 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -3,3 +3,4 @@
.travis.yml
README.Rmd
appveyor.yml
Makefile
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -12,6 +12,7 @@ URL: https://github.com/ropensci/rerddap
BugReports: http://www.github.com/ropensci/rerddap/issues
LazyData: true
Roxygen: list(wrap = FALSE)
VignetteBuilder: knitr
Imports:
httr,
dplyr,
8 changes: 8 additions & 0 deletions Makefile
@@ -0,0 +1,8 @@
all: move rmd2md

move:
	cp inst/vign/rerddap_vignette.md vignettes/

rmd2md:
	cd vignettes;\
	mv rerddap_vignette.md rerddap_vignette.Rmd
153 changes: 153 additions & 0 deletions inst/vign/rerddap_vignette.Rmd
@@ -0,0 +1,153 @@
<!--
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{rerddap introduction}
-->

```{r echo=FALSE}
library("knitr")
hook_output <- knitr::knit_hooks$get("output")
knitr::knit_hooks$set(output = function(x, options) {
  lines <- options$output.lines
  if (is.null(lines)) {
    return(hook_output(x, options))  # pass to default hook
  }
  x <- unlist(strsplit(x, "\n"))
  more <- "..."
  if (length(lines) == 1) {  # first n lines
    if (length(x) > lines) {
      # truncate the output, but add ....
      x <- c(head(x, lines), more)
    }
  } else {
    x <- c(if (abs(lines[1]) > 1) more else NULL,
           x[lines],
           if (length(x) > lines[abs(length(lines))]) more else NULL
    )
  }
  # paste these lines together
  x <- paste(c(x, ""), collapse = "\n")
  hook_output(x, options)
})
knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE,
  warning = FALSE,
  message = FALSE
)
```

rerddap introduction
====================

`rerddap` is a general-purpose R client for working with ERDDAP servers. ERDDAP is a server built on top of OPeNDAP, which serves some NOAA data. You can get gridded data ([griddap](http://upwell.pfeg.noaa.gov/erddap/griddap/documentation.html)), which lets you query gridded datasets, or table data ([tabledap](http://upwell.pfeg.noaa.gov/erddap/tabledap/documentation.html)), which lets you query tabular datasets. The two interfaces have similarities, but some differences too. We try to provide a similar interface to both data types in `rerddap`.

## netCDF

`rerddap` supports the netCDF format, which is the default when using the `griddap()` function. netCDF is a binary file format, and has a much smaller footprint on your disk than csv. The binary format means it's harder to inspect, but the `ncdf` and `ncdf4` packages make it easy to pull data out of, and write data back into, a netCDF file. Note that the file extension for netCDF files is `.nc`. Whether you choose netCDF or csv won't make much of a difference for small files, but it will for large files.
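
As a rough illustration of working with a netCDF file, the sketch below writes a tiny file with `ncdf4` and reads it back. The variable name `temp`, its units, and the values are made up for this example and do not come from any ERDDAP dataset. It assumes the `ncdf4` package is installed.

```r
library("ncdf4")

# Write a tiny netCDF file to a temporary location (illustrative data only)
path <- tempfile(fileext = ".nc")
lon  <- ncdim_def("longitude", "degrees_east",  c(-80, -75, -70))
lat  <- ncdim_def("latitude",  "degrees_north", c(10, 15, 20))
temp <- ncvar_def("temp", "degC", list(lon, lat), missval = -9999)
nc   <- nc_create(path, temp)
ncvar_put(nc, temp, matrix(1:9, nrow = 3))
nc_close(nc)

# Read it back: open the file, list its variables, extract values, close
nc <- nc_open(path)
names(nc$var)
vals <- ncvar_get(nc, "temp")
dim(vals)
nc_close(nc)
```

The same `nc_open()` / `ncvar_get()` / `nc_close()` sequence should apply to a `.nc` file downloaded by `griddap()`, with the variable names taken from that file.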

## Caching

Downloaded data files are cached in a single hidden directory, `~/.rerddap`, on your machine. It's hidden so that you don't accidentally delete the data, but you can still easily delete the data if you like.
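
To see what is currently cached, or to clear it, something like the following base R sketch should work. It assumes the cached files sit directly inside `~/.rerddap`. The deletion step is commented out on purpose.

```r
# Location of the cache, as described above
cache_dir <- path.expand("~/.rerddap")

# List cached files; this is character(0) if the directory is missing or empty
cached <- list.files(cache_dir, full.names = TRUE)

# Inspect the size and modification time of each cached file
file.info(cached)[, c("size", "mtime")]

# Uncomment to delete every cached file (this cannot be undone)
# unlink(cached)
```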

When you use the `griddap()` or `tabledap()` functions, we construct an MD5 hash from the base URL and any query parameters, so each query is cached separately. Once we have the hash, we look in `~/.rerddap` for a matching hash. If there's a match, we use that file on disk; if not, we make an HTTP request for the data to the ERDDAP server you specify.
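
The lookup described above can be sketched roughly as follows. This is not the package's internal code: the helper names `cache_key()` and `cache_lookup()`, the way the query string is assembled, and the `.nc` file extension are all assumptions made for illustration. It uses the `digest` package to compute the MD5 hash.

```r
library("digest")

# Hypothetical helper: build a cache key from a base URL and query parameters
cache_key <- function(base_url, params) {
  # Sort parameters by name so the same query always yields the same key
  params <- params[order(names(params))]
  # Collapse multi-valued parameters (e.g. a time range) into one string each
  values <- vapply(params, function(p) paste(p, collapse = ","), character(1))
  query <- paste(names(params), values, sep = "=", collapse = "&")
  digest(paste0(base_url, "?", query), algo = "md5", serialize = FALSE)
}

# Hypothetical helper: return the cached file path, or NULL if there is no match
cache_lookup <- function(key, cache_dir = "~/.rerddap", ext = ".nc") {
  path <- file.path(path.expand(cache_dir), paste0(key, ext))
  if (file.exists(path)) path else NULL
}

key <- cache_key(
  "http://upwell.pfeg.noaa.gov/erddap/griddap/noaa_esrl_027d_0fb5_5d38",
  list(time = c("2012-01-01", "2012-01-30"), latitude = c(21, 10))
)
hit <- cache_lookup(key)
if (is.null(hit)) message("no cached file; a request to the server would be needed")
```

Because the key depends on every query parameter, changing any one of them (for example, the time range) produces a different hash and therefore a separate cached file.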

## ERDDAP servers

You can get a data.frame of ERDDAP servers using the function `servers()`. Most, I think, serve some kind of NOAA data, but a few serve data that isn't from NOAA. If you know of more ERDDAP servers, send a pull request or let us know.

## Install

Stable version from CRAN

```{r eval=FALSE}
install.packages("rerddap")
```

Or, the development version from GitHub

```{r eval=FALSE}
devtools::install_github("ropensci/rerddap")
```

```{r}
library("rerddap")
```

## Search

First, you will likely want to search for data, specifying either `griddap` or `tabledap`

```{r}
ed_search(query = 'size', which = "table")
```

```{r}
ed_search(query = 'size', which = "grid")
```

## Information

Then you can get information on a single dataset

```{r output.lines=1:10}
info('whoi_62d0_9d64_c8ff')
```

## griddap (gridded) data

First, get information on a dataset to see time range, lat/long range, and variables.

```{r}
(out <- info('noaa_esrl_027d_0fb5_5d38'))
```

Then query for gridded data using the `griddap()` function

```{r}
(res <- griddap(out,
  time = c('2012-01-01', '2012-01-30'),
  latitude = c(21, 10),
  longitude = c(-80, -70)
))
```

The output of `griddap()` is a list that you can explore further. Get the summary

```{r}
res$summary
```

Get the dimension variables

```{r}
names(res$summary$dim)
```

Get the data.frame (beware: you may want to just look at the `head` of the data.frame if large)

```{r}
res$data
```
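
A minimal sketch of that precaution, using a small synthetic data.frame as a stand-in for `res$data`. The column names and values here are illustrative assumptions, not actual output from the server.

```r
# Synthetic stand-in for a large data.frame such as res$data
big <- data.frame(
  time = rep("2012-01-01", 1000),
  latitude = seq(10, 21, length.out = 1000),
  value = seq_len(1000)
)

dim(big)      # check the size before printing anything
head(big, 5)  # show only the first five rows
```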

## tabledap (tabular) data

```{r output.lines=1:10}
(out <- info('erdCalCOFIfshsiz'))
```

```{r}
(dat <- tabledap(out, 'time>=2001-07-07', 'time<=2001-07-10',
  fields = c('longitude', 'latitude', 'fish_size', 'itis_tsn', 'scientific_name')))
```

Since both `griddap()` and `tabledap()` give back data.frames, it's easy to do downstream manipulation. For example, we can use `dplyr` to filter, summarize, group, and sort:

```{r}
library("dplyr")
dat$fish_size <- as.numeric(dat$fish_size)
tbl_df(dat) %>%
  filter(fish_size > 30) %>%
  group_by(scientific_name) %>%
  summarise(mean_size = mean(fish_size)) %>%
  arrange(desc(mean_size))
```
