This package ease the processing of NCAR Catchement Attributes and Meteorlogy for Large-sample Studies (CAMELS) data using R using a set of USGS stream gages (here for more). This package offers a function for downloading the data automatically, are it can be downloaded at the following site:
CAMELS data: https://ral.ucar.edu/solutions/products/camels
671 small - medium size catchments over the contiguous US (CONUS) minimally impacted by human activities.
2 main type of daily time-series:
- Daily atmospheric forcing (source: Daymet, Maurer and NLDAS)
- Hydrologic reponse (source: USGS daily streamflow)
Attributes data (climatologies):
- topography
- climate
- streamflow
- land cover
- soil
- geology
code creating the data: https://github.com/naddor/camels
install.packages("devtools")
devtools::install_github("kylemonper/CamelsQuery")
- download CAMELS data remotely
- run and use the
extract_huc_data
function to query and visualize data from the CAMELS dataset - use the
get_sample_data
function to get usgs streamgauge data
library(CamelsQuery)
Data can be download manually from https://ral.ucar.edu/solutions/products/camels. This package requires that specifically the following be downloaded from there:
- 1.2 CAMELS time series meteorology, observed flow, meta data (.zip)
- 2.0 CAMELS Attributes (.zip)
alternatively the download_camels()
function can be used to automatically download and unzip the data into a folder of the user's choice
### specify new folder to be created
#~ note: this must be a nonexistant new folder within an already existing folder:
#~ here this is simply creating a camels_data folder within the users home directory
data_dir <- "~/CAMELS_data"
download_camels(data_dir)
- basin directory (basin_dir)
- This is the location of the
basin_data_public_v1p2
folder. From this directory you should be able to further navigate to desired daymet mean forcing data folders (labeled 01, 02, 03, etc) via : "~/home/basin_dataset_public_v1p2/basin_mean_forcing/daymet" , and the streamflow folders should be in: "~/home/basin_dataset_public_v1p2/usgs_streamflow" . This exact folder structure is required for the function to work properly (if you used the download function or downloaded from the correct locations mentioned above, this shouldn't be an issue)
- This is the location of the
- attribute directory (attr_dir)
- location of .txt files for data attributes (camels_clim.txt, camels_geol.txt, etc)
- huc ids (huc8_names)
- a vector of 8 digit huc 8 ids to be queried
##~ directories
basin_dir <- "~/CAMELS_data/basin_dataset_public_v1p2"
attr_dir <- "~/CAMELS_data/camels_attributes_v2.0"
##~ list of hucs to query (provided as a vector)
huc8_names <- c("01013500", "08269000", "10259200")
### run function
##~ this returns a named list object with 9 items
data <- extract_huc_data(basin_dir = basin_dir,
attr_dir = attr_dir,
huc8_names = huc8_names)
names(data)
## [1] "mean_forcing_daymet" "usgs_streamflow" "camels_clim"
## [4] "camels_geol" "camels_hydro" "camels_name"
## [7] "camels_soil" "camels_topo" "camels_vege"
mean_forcing <- data$mean_forcing_daymet
### an alternative using [[]] syntax:
##~ mean_forcing <- data[["mean_forcing_daymet"]]
## OR, because this is the first item in the list:
##~ mean_forcing <- data[[1]]
##this returns a tibble/data frame containing the mean forcing data
str(mean_forcing)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 38352 obs. of 12 variables:
## $ ID : chr "01013500" "01013500" "01013500" "01013500" ...
## $ Year : num 1980 1980 1980 1980 1980 1980 1980 1980 1980 1980 ...
## $ Mnth : chr "01" "01" "01" "01" ...
## $ Day : chr "01" "02" "03" "04" ...
## $ Hr : num 12 12 12 12 12 12 12 12 12 12 ...
## $ dayl(s) : num 30173 30253 30344 30408 30413 ...
## $ prcp(mm/day): num 0 0 0 0 0 0 6.69 3.64 0 0 ...
## $ srad(W/m2) : num 153 145 147 146 170 ...
## $ swe(mm) : num 0 0 0 0 0 0 0 0 0 0 ...
## $ tmax(C) : num -6.54 -6.18 -9.89 -10.98 -11.29 ...
## $ tmin(C) : num -16.3 -15.2 -18.9 -19.8 -22.2 ...
## $ vp(Pa) : num 172 186 138 120 118 ...
## furthermore, we can see that each of the hucs we entered are present
unique(mean_forcing$ID)
## [1] "01013500" "08269000" "10259200"
library(ggplot2)
library(lubridate) # for dates
library(janitor) # clean column names
## first, rename huc ID's with location from camels_name, this isn't necessary, but makes for more informative labels
locs <- data[["camels_name"]]
names(locs)
## [1] "gauge_id" "huc_02" "gauge_name"
# clean column names
cleaned_forcing <- clean_names(mean_forcing)
## join data (by ID) to bring in gauge names
cleaned_names <- left_join(cleaned_forcing, locs, by = c("id" = "gauge_id"))
### turn year, month, day columns into single "date" column
mean_forcing_date <- cleaned_names %>%
## join columns, forcing into year, month, day format
mutate(date = ymd(paste(year, mnth, day, sep = "-")))
ggplot(mean_forcing_date, aes(date, prcp_mm_day)) +
geom_line() +
facet_wrap(~gauge_name)
The get_sample_data
is a user-friendly wrapper for a function from the USGS's dataRetrieval function to pull water quality data from selected stream gauges
## read in gauges of interest
gauges <- readr::read_csv("~/CAMELS_data/USGS_trial_sites.csv")
### dataRetrieval functions require sites to be named using an "Agency-Site#" format, this code reformats the trial sites csv into this format
#~ eg: "USGS-01073319""
gauges_new <- gauges %>%
mutate(renamed_site = paste(SiteAgency, "-", SiteNumber, sep = ""))
site_names <- gauges_new$renamed_site
sample_data <- get_sample_data(site_names)
## no sample data found for sites:
## USGS-01100500
## USGS-01100693
## USGS-04182950
## USGS-04182900
## USGS-04182830
## USGS-04181120
## USGS-04180988
## Warning in readNWISpCode(pcodes): The following
## pCodes seem mistyped, and no information was returned:
## 92687,93144,92266,92207,92472,92847,92793,92451
## look at nitrogen species for all sites over time
N_spp <- sample_data %>%
filter(CharacteristicName == "Nitrogen, mixed forms (NH3), (NH4), organic, (NO2) and (NO3)")
ggplot(N_spp, aes(x = ActivityStartDate, y = ResultMeasureValue, color = MonitoringLocationIdentifier)) +
geom_point()