added vignette, fix #19

ropensci · May 7, 2015 · d547517
1 parent 391e572
commit d547517
Showing 6 changed files with 751 additions and 0 deletions.
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -3,3 +3,4 @@
 .travis.yml
 README.Rmd
 appveyor.yml
+Makefile
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -12,6 +12,7 @@ URL: https://github.com/ropensci/rerddap
 BugReports: http://www.github.com/ropensci/rerddap/issues
 LazyData: true
 Roxygen: list(wrap = FALSE)
+VignetteBuilder: knitr
 Imports:
     httr,
     dplyr,

diff --git a/Makefile b/Makefile
@@ -0,0 +1,8 @@
+all: move rmd2md
+
+move:
+		cp inst/vign/rerddap_vignette.md vignettes/
+
+rmd2md:
+		cd vignettes;\
+		mv rerddap_vignette.md rerddap_vignette.Rmd
diff --git a/inst/vign/rerddap_vignette.Rmd b/inst/vign/rerddap_vignette.Rmd
@@ -0,0 +1,153 @@
+<!--
+%\VignetteEngine{knitr::knitr}
+%\VignetteIndexEntry{rerddap introduction}
+-->
+
+```{r echo=FALSE}
+library("knitr")
+hook_output <- knitr::knit_hooks$get("output")
+knitr::knit_hooks$set(output = function(x, options) {
+   lines <- options$output.lines
+   if (is.null(lines)) {
+     return(hook_output(x, options))  # pass to default hook
+   }
+   x <- unlist(strsplit(x, "\n"))
+   more <- "..."
+   if (length(lines) == 1) {        # first n lines
+     if (length(x) > lines) {
+       # truncate the output, but add ....
+       x <- c(head(x, lines), more)
+     }
+   } else {
+     x <- c(if (abs(lines[1]) > 1) more else NULL,
+            x[lines],
+            if (length(x) > lines[abs(length(lines))]) more else NULL
+           )
+   }
+   # paste these lines together
+   x <- paste(c(x, ""), collapse = "\n")
+   hook_output(x, options)
+ })
+
+knitr::opts_chunk$set(
+  comment = "#>",
+  collapse = TRUE,
+  warning = FALSE,
+  message = FALSE
+)
+```
+
+rerddap introduction
+====================
+
+`rerddap` is a general purpose R client for working with ERDDAP servers. ERDDAP is a server built on top of OPeNDAP, which serves some NOAA data. You can get gridded data ([griddap](http://upwell.pfeg.noaa.gov/erddap/griddap/documentation.html)), which lets you query gridded datasets, or table data ([tabledap](http://upwell.pfeg.noaa.gov/erddap/tabledap/documentation.html)), which lets you query tabular datasets. In terms of how we interface with them, there are similarities, but some differences too. We try to provide a similar interface to both data types in `rerddap`.
+
+## netCDF
+
+`rerddap` supports the netCDF format, which is the default when using the `griddap()` function. netCDF is a binary file format, and will have a much smaller footprint on your disk than csv. The binary file format means it's harder to inspect, but the `ncdf` and `ncdf4` packages make it easy to pull data out and write data back into a netCDF file. Note that the file extension for netCDF files is `.nc`. Whether you choose netCDF or csv won't make much of a difference for small files, but it will for large files.
+
+## Caching
+
+Data files downloaded are cached in a single hidden directory `~/.rerddap` on your machine. It's hidden so that you don't accidentally delete the data, but you can still easily delete the data if you like. 
+
+When you use the `griddap()` or `tabledap()` functions, we construct an MD5 hash from the base URL and any query parameters, so that each query is cached separately. Once we have the hash, we look in `~/.rerddap` for a matching hash. If there's a match, we use that file on disk; if there's no match, we make an HTTP request for the data to the ERDDAP server you specify.
+
+## ERDDAP servers
+
+You can get a data.frame of ERDDAP servers using the function `servers()`. Most, I think, serve some kind of NOAA data, but a few serve data from other sources. If you know of more ERDDAP servers, send a pull request, or let us know.
+
+## Install
+
+Stable version from CRAN
+
+```{r eval=FALSE}
+install.packages("rerddap")
+```
+
+Or, the development version from GitHub
+
+```{r eval=FALSE}
+devtools::install_github("ropensci/rerddap")
+```
+
+```{r}
+library("rerddap")
+```
+
+## Search
+
+First, you likely want to search for data, specifying either `griddap` or `tabledap`
+
+```{r}
+ed_search(query = 'size', which = "table")
+```
+
+```{r}
+ed_search(query = 'size', which = "grid")
+```
+
+## Information
+
+Then you can get information on a single dataset
+
+```{r output.lines=1:10}
+info('whoi_62d0_9d64_c8ff')
+```
+
+## griddap (gridded) data
+
+First, get information on a dataset to see time range, lat/long range, and variables.
+
+```{r}
+(out <- info('noaa_esrl_027d_0fb5_5d38'))
+```
+
+Then query for gridded data using the `griddap()` function
+
+```{r}
+(res <- griddap(out,
+  time = c('2012-01-01', '2012-01-30'),
+  latitude = c(21, 10),
+  longitude = c(-80, -70)
+))
+```
+
+The output of `griddap()` is a list that you can explore further. Get the summary
+
+```{r}
+res$summary
+```
+
+Get the dimension variables
+
+```{r}
+names(res$summary$dim)
+```
+
+Get the data.frame (beware: you may want to just look at the `head` of the data.frame if large)
+
+```{r}
+res$data
+```
+
+## tabledap (tabular) data
+
+```{r output.lines=1:10}
+(out <- info('erdCalCOFIfshsiz'))
+```
+
+```{r}
+(dat <- tabledap(out, 'time>=2001-07-07', 'time<=2001-07-10', fields = c('longitude', 'latitude', 'fish_size', 'itis_tsn', 'scientific_name')))
+```
+
+Since both `griddap()` and `tabledap()` give back data.frames, it's easy to do downstream manipulation. For example, we can use `dplyr` to filter, summarize, group, and sort:
+
+```{r}
+library("dplyr")
+dat$fish_size <- as.numeric(dat$fish_size)
+tbl_df(dat) %>% 
+  filter(fish_size > 30) %>% 
+  group_by(scientific_name) %>% 
+  summarise(mean_size = mean(fish_size)) %>% 
+  arrange(desc(mean_size))
+```