Commit
Showing 6 changed files with 751 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,3 +3,4 @@ | |
.travis.yml | ||
README.Rmd | ||
appveyor.yml | ||
Makefile |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
all: move rmd2md | ||
|
||
move: | ||
cp inst/vign/rerddap_vignette.md vignettes/ | ||
|
||
rmd2md: | ||
cd vignettes;\ | ||
mv rerddap_vignette.md rerddap_vignette.Rmd |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
<!-- | ||
%\VignetteEngine{knitr::knitr} | ||
%\VignetteIndexEntry{rerddap introduction} | ||
--> | ||
|
||
```{r echo=FALSE} | ||
library("knitr") | ||
hook_output <- knitr::knit_hooks$get("output") | ||
knitr::knit_hooks$set(output = function(x, options) { | ||
lines <- options$output.lines | ||
if (is.null(lines)) { | ||
return(hook_output(x, options)) # pass to default hook | ||
} | ||
x <- unlist(strsplit(x, "\n")) | ||
more <- "..." | ||
if (length(lines) == 1) { # first n lines | ||
if (length(x) > lines) { | ||
# truncate the output, but add .... | ||
x <- c(head(x, lines), more) | ||
} | ||
} else { | ||
x <- c(if (abs(lines[1]) > 1) more else NULL, | ||
x[lines], | ||
if (length(x) > lines[abs(length(lines))]) more else NULL | ||
) | ||
} | ||
# paste these lines together | ||
x <- paste(c(x, ""), collapse = "\n") | ||
hook_output(x, options) | ||
}) | ||
knitr::opts_chunk$set( | ||
comment = "#>", | ||
collapse = TRUE, | ||
warning = FALSE, | ||
message = FALSE | ||
) | ||
``` | ||
|
||
rerddap introduction | ||
==================== | ||
|
||
`rerddap` is a general purpose R client for working with ERDDAP servers. ERDDAP is a server built on top of OPeNDAP, which serves some NOAA data. You can get gridded data ([griddap](http://upwell.pfeg.noaa.gov/erddap/griddap/documentation.html)), which lets you query from gridded datasets, or table data ([tabledap](http://upwell.pfeg.noaa.gov/erddap/tabledap/documentation.html)), which lets you query from tabular datasets. In terms of how we interface with them, there are similarities, but some differences too. We try to provide a similar interface to both data types in `rerddap`. | ||
|
||
## netCDF | ||
|
||
`rerddap` supports the netCDF format, which is the default when using the `griddap()` function. netCDF is a binary file format, and will have a much smaller footprint on your disk than csv. The binary file format means it's harder to inspect, but the `ncdf` and `ncdf4` packages make it easy to pull data out and write data back into a netCDF file. Note that the file extension for netCDF files is `.nc`. Whether you choose netCDF or csv won't make much of a difference for small files, but it will for large files. | ||
|
||
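As a rough illustration of pulling data out of a netCDF file and writing data into one, here is a minimal round-trip sketch using the `ncdf4` package (assumed to be installed). The variable name `sst`, its units, and the values are made up for the example and do not come from any ERDDAP dataset.

```r
# Round-trip sketch with ncdf4: write a tiny netCDF file, then read it back.
# All names and values below are illustrative assumptions.
library("ncdf4")

nc_file <- tempfile(fileext = ".nc")

# Define one dimension and one variable, then create the file on disk
lat <- ncdim_def(name = "latitude", units = "degrees_north", vals = c(10, 15, 20))
sst <- ncvar_def(name = "sst", units = "degC", dim = list(lat), missval = -9999)
nc <- nc_create(nc_file, vars = list(sst))
ncvar_put(nc, sst, vals = c(26.1, 25.4, 24.8))
nc_close(nc)

# Reopen the binary file and pull the data back out
nc <- nc_open(nc_file)
values <- as.vector(ncvar_get(nc, "sst"))
lats <- as.vector(ncvar_get(nc, "latitude"))
nc_close(nc)

# Combine into a data.frame for inspection
df <- data.frame(latitude = lats, sst = values)
print(df)
```

Because `ncvar_def()` stores values as single-precision floats by default, the values read back may differ from the originals in the last decimal places.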
## Caching | ||
|
||
Downloaded data files are cached in a single hidden directory, `~/.rerddap`, on your machine. It's hidden so that you don't accidentally delete the data, but you can still delete it easily if you like. | ||
|
||
When you use the `griddap()` or `tabledap()` functions, we construct an MD5 hash from the base URL and any query parameters, so that each query is cached separately. Once we have the hash, we look in `~/.rerddap` for a file with a matching hash. If there's a match, we use that file on disk; if there's no match, we make an HTTP request for the data to the ERDDAP server you specify. | ||
|
||
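The lookup described above can be sketched roughly as follows. This is not the package's actual implementation: the helpers `md5_string()` and `cache_path()` are hypothetical, the way the URL and parameters are joined into a key is an assumption, and the dataset URL is only an example.

```r
# Hypothetical helper: MD5 hash of a string, using only base R and 'tools'
md5_string <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(charToRaw(x), tmp)      # write the exact bytes, no trailing newline
  unname(tools::md5sum(tmp))
}

# Hypothetical helper: map a base URL plus query parameters to a cache file.
# How the real package combines these pieces is an assumption here.
cache_path <- function(base_url, query, cache_dir = "~/.rerddap", ext = ".nc") {
  key <- md5_string(paste0(base_url, "?", paste(query, collapse = "&")))
  file.path(path.expand(cache_dir), paste0(key, ext))
}

path <- cache_path(
  "http://upwell.pfeg.noaa.gov/erddap/griddap/noaa_esrl_027d_0fb5_5d38",
  c("time>=2012-01-01", "time<=2012-01-30")
)

if (file.exists(path)) {
  message("cache hit: read ", path)
} else {
  message("cache miss: an HTTP request would be made, then saved to ", path)
}
```

Changing any query parameter changes the hash, so a different query maps to a different file and never overwrites an earlier download.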
## ERDDAP servers | ||
|
||
You can get a data.frame of ERDDAP servers using the function `servers()`. Most, I think, serve some kind of NOAA data, but a few serve data that isn't from NOAA. If you know of more ERDDAP servers, send a pull request or let us know. | ||
|
||
## Install | ||
|
||
Stable version from CRAN | ||
|
||
```{r eval=FALSE} | ||
install.packages("rerddap") | ||
``` | ||
|
||
Or, the development version from GitHub | ||
|
||
```{r eval=FALSE} | ||
devtools::install_github("ropensci/rerddap") | ||
``` | ||
|
||
```{r} | ||
library("rerddap") | ||
``` | ||
|
||
## Search | ||
|
||
First, you likely want to search for data, specifying either `griddap` or `tabledap` datasets via the `which` argument | ||
|
||
```{r} | ||
ed_search(query = 'size', which = "table") | ||
``` | ||
|
||
```{r} | ||
ed_search(query = 'size', which = "grid") | ||
``` | ||
|
||
## Information | ||
|
||
Then you can get information on a single dataset | ||
|
||
```{r output.lines=1:10} | ||
info('whoi_62d0_9d64_c8ff') | ||
``` | ||
|
||
## griddap (gridded) data | ||
|
||
First, get information on a dataset to see time range, lat/long range, and variables. | ||
|
||
```{r} | ||
(out <- info('noaa_esrl_027d_0fb5_5d38')) | ||
``` | ||
|
||
Then query for gridded data using the `griddap()` function | ||
|
||
```{r} | ||
(res <- griddap(out, | ||
time = c('2012-01-01', '2012-01-30'), | ||
latitude = c(21, 10), | ||
longitude = c(-80, -70) | ||
)) | ||
``` | ||
|
||
The output of `griddap()` is a list that you can explore further. Get the summary | ||
|
||
```{r} | ||
res$summary | ||
``` | ||
|
||
Get the dimension variables | ||
|
||
```{r} | ||
names(res$summary$dim) | ||
``` | ||
|
||
Get the data.frame (beware: you may want to just look at the `head` of the data.frame if large) | ||
|
||
```{r} | ||
res$data | ||
``` | ||
|
||
## tabledap (tabular) data | ||
|
||
```{r output.lines=1:10} | ||
(out <- info('erdCalCOFIfshsiz')) | ||
``` | ||
|
||
```{r} | ||
(dat <- tabledap(out, 'time>=2001-07-07', 'time<=2001-07-10', fields = c('longitude', 'latitude', 'fish_size', 'itis_tsn', 'scientific_name'))) | ||
``` | ||
|
||
Since both `griddap()` and `tabledap()` give back data.frames (for `griddap()`, in the `data` element of the returned list), it's easy to do downstream manipulation. For example, we can use `dplyr` to filter, summarize, group, and sort: | ||
|
||
```{r} | ||
library("dplyr") | ||
dat$fish_size <- as.numeric(dat$fish_size) | ||
tbl_df(dat) %>% | ||
filter(fish_size > 30) %>% | ||
group_by(scientific_name) %>% | ||
summarise(mean_size = mean(fish_size)) %>% | ||
arrange(desc(mean_size)) | ||
``` |