# Background 

Improving data sharing will enable timely access to data, enhance data quality, and create clear channels for better management decisions. Healthy aquatic habitat is critical to fishes, other aquatic species, and water quality. Across the country, long-term, large-scale stream habitat monitoring programs collect data for their specific objectives and within their jurisdictional boundaries. Streams, however, cross jurisdictional boundaries and face unprecedented pressure from a changing climate, multi-use public lands, and development. To respond to these stressors, we need a way to integrate data from multiple sources to create indicators of stream condition across jurisdictional boundaries. As a pilot, we will focus on integrating data from the EPA National Aquatic Resource Surveys (NARS); BLM Aquatic Assessment, Inventory, and Monitoring (AIM); and USFS Aquatic and Riparian Effectiveness Monitoring Program (AREMP) and PACFISH/INFISH Biological Opinion Effectiveness Monitoring Program (PIBO). We will build infrastructure to integrate a subset of metrics collected on public lands in the Western United States and document metadata in MonitoringResources.org.

This Aquatic Habitat Analysis package will integrate aquatic habitat data from multiple projects and provide access to, and analysis of, aquatic habitat status and trends across jurisdictional boundaries.



In [1]:
install.packages("xlsx", dependencies = TRUE)
library(xlsx)

install.packages("openxlsx")
library(openxlsx)

install.packages("tidyverse")
library(tidyverse)

# Packages to create well-formatted tables in RStudio
install.packages("knitr")
library(knitr)
install.packages("kableExtra")
library(kableExtra)

# Packages for downloading and formatting shapefiles
install.packages("downloader")
library(downloader)

#, repos="http://cran.cnr.berkeley.edu"

install.packages("rgdal")
library(rgdal)

install.packages("RCurl")
library(RCurl)




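The setup cell above calls `install.packages()` for every package on each run, which is slow and needs a network connection. As an alternative, here is a minimal sketch that installs only the packages that are missing. The helper names `missing_packages` and `install_if_missing` are hypothetical and not part of this project; `system.file()` and `install.packages()` are standard R functions.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
# system.file() returns "" when a package cannot be found.
missing_packages <- function(pkgs) {
  is_installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!is_installed]
}

# Hypothetical helper: install only what is missing, then report what was installed.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, dependencies = TRUE)
  }
  invisible(to_install)
}

# Base packages ship with R, so nothing is reported as missing here.
missing_packages(c("stats", "utils"))
```

In this notebook, a call such as `install_if_missing(c("xlsx", "openxlsx", "tidyverse"))` followed by the usual `library()` calls could stand in for the repeated installs.
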
# Program Information 
In this Jupyter Notebook, we integrate data from three aquatic monitoring programs: EPA National Aquatic Resource Surveys (NARS); BLM Aquatic Assessment, Inventory, and Monitoring (AIM); and USFS Aquatic and Riparian Effectiveness Monitoring Program (AREMP). We use the data exchange schema shared on ScienceBase: https://www.sciencebase.gov/catalog/item/5e7cf61be4b01d5092751d0f. We include metadata for the PACFISH/INFISH Biological Opinion Effectiveness Monitoring Program (PIBO) but not its data, because the data are not accessible outside the USFS firewall.

See the table below for information about each of the four programs. 
 

In [2]:
#Get and display the program data
wd <- getwd()
metadata_name <- paste0(wd, "/Data/Metadata.xlsx")
program_info <- as_tibble(read.xlsx(metadata_name, 1))
program_info 


Entity,Bureau.of.Land.Management,US.Forest.Service.and.Bureau.of.Land.Management,Environmental.Protection.Agency,US.Forest.Service
Program,"Assessment, Inventory, and Monitoring",Aquatic and Riparian Effective Monitoring Program,National Rivers and Streams Assessment,PACFISH/INFISH Biological Opinion Effectiveness Monitoring
Abbreviation,AIM Aquatic,AREMP,NRSA,PIBO EM
Primary Objective,"A consistent, quantitative approach for determining the attainment of BLM land health standards for perennial wadeable streams and rivers, among other applications. Monitoring objectives are established by project managers and will determine the number of reaches to be sampled and whether a randomized, targeted, or mixed site selection approach is appropriate.",Status and Trend,"A collaborative survey that provides information on the ecological condition of the nation’s rivers and streams and the key stressors that affect them, both on a national and an ecoregional scale. The goals of the NRSA are to determine the extent to which rivers and streams support a healthy biological condition and the extent of major stressors that affect them. The survey supports a longer-term goal: to determine whether our rivers and streams are getting cleaner and how we might best invest in protecting and restoring them.","The primary objective is to determine whether priority biological and physical attributes, processes, and functions of riparian and aquatic systems are being degraded, maintained, or restored in the PIBO monitoring area."
Year Started,2011,2002,2008,1998
Site Length,"20 x bankfull width, a minimum of 150m","20x bankfull width categories, range 150m-500m","20x channel-widths, a minimum of 150m","20x bankfull channel widths, range from 160m-480m"
Spatial Design,"Stratified GRTS, Targeted sites",Non-stratified GRTS,GRTS,"GRTS, Targeted sites"
Target Population,Surveys conducted in wadeable streams within BLM Bruneau Field Offices,Surveys conducted in wadeable stream within the Northwest Forest Plan Area in Watersheds with at least 25% of the 1:100K streams layers within federal ownership.,"The target population consists of all streams and rivers within the 48 contiguous states that have flowing water during the study index period, including major rivers and small streams. Sites must have > 50% of the reach length with standing water, and sites with water in less than 50% of the reach length must be dropped. All sites must be sampled during base flow conditions. The target population excludes: • Tidal rivers and streams up to head of salt (defined as < 0.5 ppt for this study). • Run-of-the-river ponds and reservoirs with greater than seven day residence time.",Target popultion cosists of 6th fieldhydrologic unit code (HUC) watersheds within perennial streams and greater than 50 percent Federal ownership above the sample reach.
Master Sample Details,Resolution ?,,,
Additional Spatial Design Details,,,"The survey design consists of two separate designs to address the dual objectives of: (1)estimating current status and (2) estimating change in status for all flowing waters: • Resample design applied to NRSA 2008/09 and NRSA 2013/14 sites. • New site design for NRSA 2018/19. The survey design is explicitly stratified by state for both designs. The unequal probability categories are specific to the survey design used for the NRSA 2008/09, NRSA 2013/14, and NRSA 2018/19. In all cases the categories are specific combinations of Strahler order categories and nine National Aquatic Resource Survey (NARS) aggregated ecoregions. In addition, a minimum of 20 sites (Resample and New) was guaranteed in each state and a maximum of 75 sites was the limit for an individual state. There are 983 unique sites in the Resample Design and 825 unique sites in the New Site Design. Approximately 10% of the total NRSA sites are scheduled for repeated sampling (revisit sites) in the same year of the two year NRSA field cycle. The sample frame was derived from the medium resolution National Hydrography Dataset (NHD), in particular NHDPlus V2. Additional details on the NRSA survey design are found in the National Rivers and Streams Assessment Survey Design: 2018/19 documents.",
Base Temporal Scale,Annual,Annual,"2008-2009, 2013-2014, 2018-2019",Annual (5-year rotating panel)


<h3> Program Metrics </h3> 
We compiled metadata from each of the four programs describing the metrics each program calculates.


In [3]:
# Load the metric list and crosswalk, then count the metrics in each category
metadata <- read.xlsx(metadata_name, 3)
metadata_summary <- metadata %>%
  count(Category, sort = TRUE)

In [4]:
total_metrics <- sum(metadata_summary$n)
paste("Across the 4 programs, a total of", total_metrics, "metrics are calculated.")

A variety of metrics are calculated within standard categories.

In [5]:
metadata_summary

Category,n
Macroinvertebrates,72
Substrate,31
Channel Characteristics,28
Channel dimensions,25
Wood,24
Location,20
Identification,16
Human disturbance,14
Bed stability,13
Streambanks,13


Only a small subset of metrics is calculated by 3 or more of the programs.

In [6]:
file <- file.path(wd, "Code", "Data_Organize", "create_list_of_metrics.R")
source(file)
# Users can vary this value to see which metrics are shared by at least this many programs
number_of_programs <- 3
metrics_list <- metrics(number_of_programs)
n_metrics <- nrow(metrics_list)
paste("There are", n_metrics, "metrics calculated by", number_of_programs, "or more programs. The table below shows this subset.")


"funs() is soft deprecated as of dplyr 0.8.0
Please use a list of either functions or lambdas: 

  # Simple named list: 
  list(mean = mean, median = median)

  # Auto named with `tibble::lst()`: 
  tibble::lst(mean, median)

  # Using lambdas
  list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))

In [7]:
metrics_list

Category,LongName,Field,AREMPColumn,BLMColumn,EPA2008Column,PIBOColumn,Count.of.Program
Location,From the Dataset Location Identification,verbatimLocation,site_survey_id,UID,UID,RchID,4
Location,From the Data Set Latitude,verbatimLatitude,lattitude,BTMLAT,LAT_DD83,Lat,4
Location,From the dataset Longitude,verbatimLongitude,longitude,BTMLONG,LON_DD83,Long,4
Location,State From the Dataset,State,STATES,ADMIN_ST,STATE,State,4
Location,FS Region or BLM State Office,Region,STATES,State,,Region,3
Darwin Core - Event,Site Identification,locationID,GLOBALID,MS_CD,SITE_ID,SiteID,4
Darwin Core - Event,Sample Year,Year,survey_year,,YEAR,Yr,3
Channel dimensions,Average bankfull width from transects,BFWidth,average_bfwidth,BNKFLL_WT,XBKF_W,Bf,4
Channel dimensions,Gradient of stream reach,Grad,gradient,SLPE,XSLOPE,Grad,4
Channel dimensions,Length of sampling reach,RchLen,REACH_LENGTH,TOT_RCH_LEN,REACHLEN,RchLen,4


Just because a metric is calculated by several programs does not mean the data are compatible. Differences in field data collection methods can affect the metric results. Using work done by the BLM and other programs, we outline compatibility here: <a href="url">Review field method compatibility</a> (this document is not yet complete).

In [8]:
install.packages("sf")
install.packages("httr")
install.packages("data.table")
install.packages("tmap")




In [None]:
library(sf)
 

In [None]:
library(tmap)


In [None]:
 library(httr)

In [None]:
 library(data.table)

# Data Sources 
## Metric Data 

Three of the four habitat programs store metric-level data online: BLM AIM, EPA NARS, and USFS AREMP.

EPA NARS and USFS AREMP post data at a given interval; BLM has an ArcGIS server that it updates yearly.


<h3>EPA Rivers and Streams Data </h3>
EPA collects data across the United States and shares its metric-level data here: https://www.epa.gov/national-aquatic-resource-surveys/data-national-aquatic-resource-surveys. There are datasets for each two-year collection period, 2004-2005 and 2008-2009. The 2018-2019 dataset is not yet posted; we will update this text when it is released.

The EPA data presented some challenges. Each collection period differs: the datasets have different titles, different field names for the same metrics, and different formats. For each collection period, 2004-2005 and 2008-2009, we build one dataset containing the information we want to integrate into this analysis, using the fields outlined in the data exchange schema.
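
To illustrate the renaming step, here is a minimal sketch that maps native EPA 2008-2009 column names to schema field names. The column pairs are taken from the crosswalk table earlier in this notebook; the data values, the `EXTRA_FIELD` column, and the `to_schema` helper are invented for illustration and are not the project's actual code.

```r
# Toy stand-in for one EPA 2008-2009 file; the values are invented.
epa_2008 <- data.frame(
  UID         = c(101, 102),
  LAT_DD83    = c(44.1, 45.3),
  LON_DD83    = c(-121.5, -118.2),
  XBKF_W      = c(6.4, 12.9),
  EXTRA_FIELD = c("a", "b"),   # hypothetical column that is not in the schema
  stringsAsFactors = FALSE
)

# Native EPA column name -> schema field name, as listed in the crosswalk table.
epa_crosswalk <- c(
  UID      = "verbatimLocation",
  LAT_DD83 = "verbatimLatitude",
  LON_DD83 = "verbatimLongitude",
  XBKF_W   = "BFWidth"
)

# Hypothetical helper: keep only crosswalked columns and rename them to schema fields.
to_schema <- function(df, crosswalk) {
  native <- intersect(names(crosswalk), names(df))
  out <- df[native]
  names(out) <- unname(crosswalk[native])
  out
}

epa_schema <- to_schema(epa_2008, epa_crosswalk)
names(epa_schema)
```

A second crosswalk vector with the 2004-2005 column names would let the same helper handle that collection period.
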

You can find the metadata and the data used to create one dataset for each year range on ScienceBase: 
<ul> 
<li><a href="https://www.sciencebase.gov/catalog/item/5ea9d6a082cefae35a21ba5a"> 2004-2005 ScienceBase Item</a></li>
<li><a href="https://www.sciencebase.gov/catalog/item/5e3db6a4e4b0edb47be3d602"> 2008-2009 ScienceBase Item</a></li>
</ul>


<h3>USFS AREMP Data </h3>
AREMP publishes its data as a geodatabase after releasing its five-year report, here: <a href="https://www.fs.fed.us/r6/reo/monitoring/watershed/"> https://www.fs.fed.us/r6/reo/monitoring/watershed/</a>. The data were last updated in 2015. 

You can find the metadata and the data used to create the AREMP dataset on ScienceBase: 
<ul> 
<li><a href="https://www.sciencebase.gov/catalog/item/5e3dbb2ee4b0edb47be3d646"> AREMP ScienceBase Item</a></li>
</ul>

<h3>BLM AIM Data </h3>
BLM publishes its data yearly on an ArcGIS server here: <a href="https://landscape.blm.gov/geoportal/rest/find/document;jsessionid=F4427530B569FCCAEC19400F8C19E345?searchText=isPartOf%3AAIM&contentType=liveData&start=1&max=10&f=searchpage"> https://landscape.blm.gov/geoportal/rest/find/document;jsessionid=F4427530B569FCCAEC19400F8C19E345?searchText=isPartOf%3AAIM&contentType=liveData&start=1&max=10&f=searchpage</a>. 

You can find the metadata and the data used to create the BLM dataset on ScienceBase: 
<ul> 
<li><a href="https://www.sciencebase.gov/catalog/item/5e3c61ffe4b0edb47be0ef27"> BLM ScienceBase Item</a></li>
</ul>

In [None]:
# Build the ArcGIS REST query URL (build_url() is from httr; setattr() is from data.table)
url <- list(hostname = "gis.blm.gov/arcgis/rest/services",
            scheme = "https",
            path = "hydrography/BLM_Natl_AIM_AquADat/MapServer/0/query",
            query = list(
              where = "1=1",
              outFields = "*",
              returnGeometry = "true",
              f = "geojson")) %>%
  setattr("class", "url")
request <- build_url(url)
BLM <- st_read(request, stringsAsFactors = TRUE) # Read the GeoJSON response from the BLM server
data <- as_tibble(BLM)

# Create one Data File 
To complete the analysis, we create one data frame from the three programs.
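
The general idea of combining the programs can be sketched in base R: give every program's table the same columns, tag each row with its program, and stack the rows. This is only an illustration under assumed inputs. The toy tables, their values, and the `stack_programs` helper are hypothetical; the notebook's actual logic lives in `one_data_frame()`, which is not shown here.

```r
# Toy per-program tables already renamed to schema fields; all values are invented.
aremp <- data.frame(
  locationID = c("A-1", "A-2"),
  BFWidth    = c(5.2, 7.8),
  Grad       = c(1.1, 2.4),
  stringsAsFactors = FALSE
)
blm <- data.frame(
  locationID = "B-1",
  BFWidth    = 4.6,          # this toy table has no Grad column
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack a named list of data frames into one data frame.
stack_programs <- function(frames) {
  all_cols <- unique(unlist(lapply(frames, names)))
  filled <- lapply(names(frames), function(program) {
    df <- frames[[program]]
    # Add any column this program lacks, filled with NA.
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- NA
    }
    # Put columns in a common order and record the source program.
    cbind(Program = program, df[all_cols], stringsAsFactors = FALSE)
  })
  do.call(rbind, filled)
}

combined <- stack_programs(list(AREMP = aremp, BLM = blm))
combined
```

With dplyr loaded, `bind_rows(list(AREMP = aremp, BLM = blm), .id = "Program")` gives a similar result in one call.
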

In [None]:
file <- paste0(wd,"/Code/data_organize/creating one dataframe_for loop.R")
#file <- path.expand("Code/data_organize/creating one dataframe_for loop.R")
#source("Code/data_organize/creating one dataframe_for loop.R")
source(file)
data<-one_data_frame()



# Explore In Stream Tributary Habitat Data
As a proof of concept for this analysis package, we picked 2-3 metrics (or categories of metrics?) as a test case to build a data dictionary, integrate data from multiple sources, and design infrastructure to serve data to interested parties. We selected metrics that are responsive to management decisions and have low measurement error as defined in the literature (Kershner and Roper 2010, ##add other references) and in the method comparison (<font color="red">insert link </font>) work completed by this group. Special consideration was given to metrics identified in the <a href="https://docs.google.com/spreadsheets/d/1zeLDBvNtEaw21LR6vcnCEsOZeW5C6cNAzgTpnz8D4XI/edit#gid=596617989">speed dating exercise</a> as of high importance by the three programs. 

Note: I can't get this to run. I think it has something to do with loading packages.



In [None]:
install.packages('leaflet')
library(leaflet)
install.packages('dplyr')
library(dplyr)
install.packages('leaflet.extras')
library(leaflet.extras)
install.packages('DT')
library(DT)
install.packages('ggplot2')
library(ggplot2)

In [None]:
getwd()

In [None]:
#install.packages('shiny', dependencies = TRUE)
library(shiny)
#install.packages('tidyverse')
library(tidyverse)
#install.packages('leaflet')
library(leaflet)
#install.packages('dplyr')
library(dplyr)
#install.packages('leaflet.extras')
library(leaflet.extras)
#install.packages('DT')
library(DT)
#install.packages('ggplot2')
library(ggplot2)

In [1]:
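library(shiny)
wd <- getwd() # Re-create wd in case the kernel was restarted
app <- file.path(wd, "code", "Visualization", "app.R") # file.path() adds the separator that paste0() omitted
runApp(appDir = app)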


ERROR: Error in tag("div", list(...)): argument is missing, with no default
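
The contents of `app.R` are not shown here, so the cause of the error above is unknown. One frequent source of `argument is missing, with no default` in Shiny UI code, at least with package versions from this period, is a trailing comma inside a UI call such as `div(...)` or `fluidPage(...)`. The sketch below reproduces that mechanism in base R; `build_tag` is a hypothetical stand-in, not a Shiny function.

```r
# Stand-in for a UI builder that collects its children through `...`.
build_tag <- function(...) list(...)

# A trailing comma leaves an empty last argument, so evaluating `...` fails.
result <- tryCatch(
  build_tag("title", "body", ),
  error = function(e) conditionMessage(e)
)
result

# Without the trailing comma the same call succeeds.
ok <- build_tag("title", "body")
length(ok)
```

If this is the cause, searching `app.R` for a comma directly before a closing parenthesis is a quick check.
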


In [None]:
# Load the palette for the map
pal <- colorFactor(rainbow(3), data$Program)

m <- data%>% 
  leaflet() %>%
  addTiles() %>%
  addCircles(lng=~verbatimLongitude, lat= ~verbatimLatitude, color=~pal(Program), 
             popup= ~paste0("<b>",  Program, "</b>", 
                            "<br>", "<b>", "EventID ", "</b>",  eventID, "</br>",
                            "<br>", "<b>", "LocationID ", "</b>",locationID,  "</br>",
                            "<br>", "<b>", "Year ", "</b>", Year,    "</br>",
                            "<br>", "<b>", "Date ", "</b>", verbatimEventDate,    "</br>", 
                            "<br>"))%>%
                              addLegend("topright", pal=pal, values= ~Program, opacity =1)
m

In [None]:
state_input <- "OR"
metric <- "D50" # Options: 'BFWDRatio' 'BFHeight' 'WetWidth' 'WetWidthToDepth' 'RPD' 'PctPool' 'Sin' 'PctDry' 'Beaver' 'StreamOrder' 'D50' 'PctFines2' 'PctFines6' 'PoolTailFines2' 'PoolTailFines6' 'LWDFreq' 'LWDVol' 'OEratio' 'MMI' 'BeaverPresent'
stream_power <- "BFWidth" # Options: 'BFWidth' 'Grad'

# Keep one state and only the columns needed for plotting
data_by_state <- data %>%
  filter(State == state_input) %>%
  select(all_of(c(metric, stream_power, "Program")))
          

In [None]:
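# Scatter plot of the selected metric against the stream power variable, colored by program
p <- ggplot(data_by_state, aes(x = .data[[metric]], y = .data[[stream_power]], color = Program)) +
  geom_point() +
  labs(x = metric, y = stream_power)
p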
