<a href="https://colab.research.google.com/github/samsoe/mpg_notebooks/blob/master/yvp_species_richness_WRANGLE_2020.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Documentation

[Readme fixed plot vegetation data](https://docs.google.com/document/d/16-Aq8u9Rudd78fSzfjvpCXyQgE-BstC-d2PjYfmLtcw/edit?usp=sharing)

# Security

* The user must load a `json` file containing the BigQuery API key into the local directory `/content/...`
* The user must have a Google Maps API key to enable mapping. 
   * CAUTION: delete the key from the current instance of the notebook before sharing it.
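
Before authenticating, it can help to confirm that the key file was actually uploaded to the runtime. The following is a minimal sketch; the helper name `key_file_ready` is hypothetical and not part of this notebook or of `bigrquery`.

```r
# Hypothetical helper: report whether a key file is present before attempting
# to authenticate. Returns TRUE/FALSE and prints a hint when the file is missing.
key_file_ready <- function(path) {
  ok <- file.exists(path)
  if (!ok) {
    message("Key file not found: ", path, " (upload it to the runtime first)")
  }
  ok
}

# Example with a path that does not exist
key_file_ready(file.path(tempdir(), "missing_key.json"))
```

Running the check before the authentication cell gives a clearer message than a failed authentication call.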

# Tools

In [1]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.4     ✔ dplyr   1.0.2
✔ tidyr   1.1.2     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



* Remember that the file containing authorization keys for BigQuery must be loaded into the virtual environment manually.

In [2]:
install.packages("bigrquery")
library(bigrquery)

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

also installing the dependencies ‘bit’, ‘bit64’, ‘gargle’, ‘rapidjsonr’




# Source

In this view of the YVP data, species from the cover-based and additional species summaries are vertically combined for each grid point. Because the additional species summary records species presence only, this view is limited entirely to species presence. The result is the plant species richness for each grid point, which is useful for comparing plant communities, finding the locations of rarer species, or identifying grid points where non-native species are just becoming established. After these data are processed, we want to retain knowledge of whether a species was detected during cover-based or additional species surveys so that we can evaluate the potential rarity of a given species. The new variable detection_type allows us to do this.
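
As a rough illustration of the idea, the sketch below stacks two small presence tables and counts distinct species per grid point. The values are invented toy data, not records from the warehouse; only the column names follow the queries in this notebook.

```r
library(dplyr)

# Toy data (invented values) standing in for the two source tables
cover <- tibble(
  grid_point = c(10L, 10L, 12L),
  key_plant_species = c(5L, 37L, 5L),
  detection_type = "cover_est"
)
additional <- tibble(
  grid_point = c(10L, 12L),
  key_plant_species = c(169L, 63L),
  detection_type = "supplemental_obs"
)

# Stack the two tables vertically, keeping detection_type as a label
combined <- bind_rows(cover, additional)

# Richness = number of distinct species recorded at each grid point
richness <- combined %>%
  group_by(grid_point) %>%
  summarize(richness = n_distinct(key_plant_species), .groups = "drop")

richness
```

In the real data a grid point can be surveyed in several years, so richness would more likely be summarized per survey_code than per grid_point alone.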

## Database Connection

In [3]:
# BigQuery API Key
bq_auth(path = "/content/mpg-data-warehouse-api_key-master.json")

In [4]:
Sys.setenv(BIGQUERY_TEST_PROJECT = "mpg-data-warehouse")

In [5]:
billing <- bq_test_project()

### yvp_vegetation_cover

In [28]:
sql_vegetation_cover <- 
"
SELECT
  CONCAT(plot_code, \" \", date) AS survey_code,
  plot_code,
  SUBSTR(SAFE_CAST(date AS STRING), 0, 4) AS year,
  plot_loc,
  plot_rep,
  grid_point,
  (\"cover_est\") AS detection_type,
  key_plant_species
FROM
  `mpg-data-warehouse.vegetation_fixed_plot_yvp.yvp_vegetation_cover`
WHERE
  cover_pct != 0
"

In [29]:
bq_vegetation_cover <- bq_project_query(billing, sql_vegetation_cover)

In [30]:
tb_vegetation_cover <- bq_table_download(bq_vegetation_cover)

In [31]:
df_vegetation_cover <- as.data.frame(tb_vegetation_cover)

In [32]:
df_vegetation_cover %>% glimpse() 

Rows: 27,281
Columns: 8
$ survey_code       <chr> "YVP N348 2017-07-18", "YVP 10 2017-06-09", "YVP 10…
$ plot_code         <chr> "YVP N348", "YVP 10", "YVP 10", "YVP 10", "YVP 10",…
$ year              <chr> "2017", "2017", "2019", "2018", "2018", "2018", "20…
$ plot_loc          <chr> "N", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ plot_rep          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ grid_point        <int> 348, 10, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12…
$ detection_type    <chr> "cover_est", "cover_est", "cover_est", "cover_est",…
$ key_plant_species <int> 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
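
The query, download, and convert steps repeat for every source table in this notebook. One possible way to shorten them is a small wrapper, sketched below. The name `bq_query_df` is hypothetical; it only chains the same `bigrquery` calls used in the cells above and assumes authentication and the `billing` project are already set up.

```r
# Hypothetical convenience wrapper around the three-step pattern used here:
# run the query, download the result, convert it to a data frame.
bq_query_df <- function(billing, sql) {
  job <- bigrquery::bq_project_query(billing, sql)
  as.data.frame(bigrquery::bq_table_download(job))
}

# Usage (requires authentication and a billing project, as set up above):
# df_vegetation_cover <- bq_query_df(billing, sql_vegetation_cover)
```

Keeping the steps separate, as the notebook does, makes it easier to inspect the intermediate objects when a query fails.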


### yvp_additional_species

In [33]:
sql_additional_species <- "
SELECT 
  CONCAT(plot_code, \" \", date) AS survey_code,
  plot_code,
  SUBSTR(SAFE_CAST(date AS STRING), 0, 4) AS year,
  plot_loc,
  plot_rep,
  grid_point,
  (\"supplemental_obs\") AS detection_type,
  key_plant_species
FROM
  `mpg-data-warehouse.vegetation_fixed_plot_yvp.yvp_additional_species`
"

In [34]:
bq_additional_species <- bq_project_query(billing, sql_additional_species)

In [35]:
tb_additional_species <- bq_table_download(bq_additional_species)

In [36]:
df_additional_species <- as.data.frame(tb_additional_species)

In [37]:
df_additional_species %>% glimpse()

Rows: 1,990
Columns: 8
$ survey_code       <chr> "YVP 10 2020-06-27", "YVP 10 2020-06-27", "YVP 10 2…
$ plot_code         <chr> "YVP 10", "YVP 10", "YVP 10", "YVP 10", "YVP 12", "…
$ year              <chr> "2020", "2020", "2020", "2020", "2020", "2020", "20…
$ plot_loc          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ plot_rep          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ grid_point        <int> 10, 10, 10, 10, 12, 12, 12, 12, 12, 20, 20, 20, 20,…
$ detection_type    <chr> "supplemental_obs", "supplemental_obs", "supplement…
$ key_plant_species <int> 169, 163, 72, 16, 365, 63, 212, 250, 220, 315, 163,…


### location_position_classification

In [38]:
sql_position_classification <- "
SELECT
  grid_point,
  aspect_mean_deg,
  elevation_mean_m,
  slope_mean_deg,
  cover_type_2016_gridVeg,
  type3_vegetation_indicators,
  type4_indicators_history
FROM
  `mpg-data-warehouse.grid_point_summaries.location_position_classification`
"

In [39]:
bq_position_classification <- bq_project_query(billing, sql_position_classification)

In [40]:
tb_position_classification <- bq_table_download(bq_position_classification)

In [41]:
df_position_classification <- as.data.frame(tb_position_classification)

In [42]:
df_position_classification %>% glimpse()

Rows: 582
Columns: 7
$ grid_point                  <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13…
$ aspect_mean_deg             <dbl> 334.7050, 45.3030, 221.3340, 290.4890, 28…
$ elevation_mean_m            <dbl> 1395.64, 1456.09, 1126.90, 1166.33, 1179.…
$ slope_mean_deg              <dbl> 28.44230, 12.22630, 4.25130, 2.68361, 4.2…
$ cover_type_2016_gridVeg     <chr> "woodland/forest", "non-irrigated grassla…
$ type3_vegetation_indicators <chr> "mixed canopy conifer", "uncultivated gra…
$ type4_indicators_history    <chr> "mixed canopy conifer", "uncultivated gra…


### vegetation_species_metadata

In [43]:
sql_species_metadata <- "
SELECT
  key_plant_species,
  key_plant_code,
  plant_name_sci,
  plant_name_common,
  plant_native_status,
  plant_life_cycle,
  plant_life_form
FROM
  `mpg-data-warehouse.vegetation_species_metadata.vegetation_species_metadata`"

In [44]:
bq_species_metadata <- bq_project_query(billing, sql_species_metadata)

In [45]:
tb_species_metadata <- bq_table_download(bq_species_metadata)

In [46]:
df_species_metadata <- as.data.frame(tb_species_metadata)

In [47]:
df_species_metadata %>% glimpse()

Rows: 754
Columns: 7
$ key_plant_species   <int> 360, 13, 26, 53, 738, 75, 76, 746, 83, 88, 86, 87…
$ key_plant_code      <chr> "NV", "AGRSCA", "ANDGER", "ARIPUR", "BOUCUR", "BO…
$ plant_name_sci      <chr> "no vegetation", "Agrostis scabra", "Andropogon g…
$ plant_name_common   <chr> "no vegetation", "rough bentgrass", "big bluestem…
$ plant_native_status <chr> "none", "native", "native", "native", "native", "…
$ plant_life_cycle    <chr> "unknown", "perennial", "perennial", "perennial",…
$ plant_life_form     <chr> "none", "graminoid", "graminoid", "graminoid", "g…


# Wrangle

With the data from yvp_vegetation_cover, species lists must first be summarized as distinct key_plant_species values within survey_code values. This is because the raw data are estimated in 10 subplots per transect, and species names will often be redundant among subplots. Then the yvp_vegetation_cover data can be vertically bound to the yvp_additional_species data after some light coercion of field names.
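
A minimal sketch of that first summarizing step, on invented toy data: one species recorded in two subplots of the same survey collapses to a single row. Naming the key columns inside `distinct()` is an illustrative choice here; the notebook cells below call `distinct()` on all columns instead.

```r
library(dplyr)

# Toy subplot records (invented): species 37 was recorded in two subplots
subplots <- tibble(
  survey_code = c("YVP 10 2017-06-09", "YVP 10 2017-06-09", "YVP 10 2017-06-09"),
  key_plant_species = c(5L, 37L, 37L)
)

# One row per species per survey
species_list <- subplots %>%
  distinct(survey_code, key_plant_species)

species_list
```

Note that `distinct()` with named columns drops all other columns unless `.keep_all = TRUE` is supplied.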

One caution with these data: according to protocol, a plant species is included in the additional species table only if it was not found during cover-based surveys. In practice, I assume that this rule is routinely violated, because it isn’t easy to remember all the species surveyed, nor is it efficient to check time after time. It’s important that we eliminate duplicate species for a given grid point. When duplication exists, default to detection_type = “cover_est”. This prevents upward bias of richness estimates and makes downstream analyses less complicated. An operation that again summarizes distinct key_plant_species values within survey_code values is therefore necessary. For additional information on this point, please see the instructions for a similar operation with the point-intercept data in the gridVeg [Readme](https://docs.google.com/document/d/1JWnhxNjeSQZkSnGhtHP68i_l1mDj4vPFMBdUvGqN0TA/edit#heading=h.hnb7ex8jlp42).
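
The "default to cover_est" rule can be sketched on invented toy data as follows. This is one possible formulation (sort so that cover-based rows come first, then keep the first row per survey and species); it is not the approach the cells in this notebook take, which build a mask of duplicated pairs.

```r
library(dplyr)

# Toy combined detections (invented): species 16 in survey "A" was recorded
# by both methods; the other two detections are supplemental only.
combined <- tibble(
  survey_code = c("A", "A", "A", "B"),
  key_plant_species = c(16L, 16L, 433L, 16L),
  detection_type = c("supplemental_obs", "cover_est",
                     "supplemental_obs", "supplemental_obs")
)

resolved <- combined %>%
  # FALSE sorts before TRUE, so "cover_est" rows come first within each pair
  arrange(survey_code, key_plant_species, detection_type != "cover_est") %>%
  # keep the first row for each survey/species pair
  distinct(survey_code, key_plant_species, .keep_all = TRUE)

resolved
```

With this formulation, a pair that was only ever recorded as a supplemental observation keeps exactly one row instead of being removed.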


## Remove duplicates

### yvp_vegetation_cover

In [48]:
df_vegetation_cover %>% glimpse()

Rows: 27,281
Columns: 8
$ survey_code       <chr> "YVP N348 2017-07-18", "YVP 10 2017-06-09", "YVP 10…
$ plot_code         <chr> "YVP N348", "YVP 10", "YVP 10", "YVP 10", "YVP 10",…
$ year              <chr> "2017", "2017", "2019", "2018", "2018", "2018", "20…
$ plot_loc          <chr> "N", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ plot_rep          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ grid_point        <int> 348, 10, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12…
$ detection_type    <chr> "cover_est", "cover_est", "cover_est", "cover_est",…
$ key_plant_species <int> 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …


In [51]:
df_vegetation_cover %>%
  group_by(survey_code, key_plant_species) %>%
  arrange(survey_code, key_plant_species)

| survey_code (chr) | plot_code (chr) | year (chr) | plot_loc (chr) | plot_rep (chr) | grid_point (int) | detection_type (chr) | key_plant_species (int) |
|---|---|---|---|---|---|---|---|
| YVP 10 2017-06-09 | YVP 10 | 2017 | NA | NA | 10 | cover_est | 5 |
| YVP 10 2017-06-09 | YVP 10 | 2017 | NA | NA | 10 | cover_est | 37 |
| YVP 10 2017-06-09 | YVP 10 | 2017 | NA | NA | 10 | cover_est | 37 |
| YVP 10 2017-06-09 | YVP 10 | 2017 | NA | NA | 10 | cover_est | 39 |
| YVP 10 2017-06-09 | YVP 10 | 2017 | NA | NA | 10 | cover_est | 39 |
| YVP 10 2017-06-09 | YVP 10 | 2017 | NA | NA | 10 | cover_est | 51 |
| YVP 10 2017-06-09 | YVP 10 | 2017 | NA | NA | 10 | cover_est | 82 |
| YVP 10 2017-06-09 | YVP 10 | 2017 | NA | NA | 10 | cover_est | 84 |
| YVP 10 2017-06-09 | YVP 10 | 2017 | NA | NA | 10 | cover_est | 90 |
| YVP 10 2017-06-09 | YVP 10 | 2017 | NA | NA | 10 | cover_est | 90 |


In [53]:
# remove duplicate species records per survey
df_vegetation_cover <- df_vegetation_cover %>%
  group_by(survey_code) %>%
  distinct() %>%
  arrange(desc(survey_code), key_plant_species) %>% glimpse()

Rows: 6,622
Columns: 8
Groups: survey_code [233]
$ survey_code       <chr> "YVP NC294 2020-05-09", "YVP NC294 2020-05-09", "YV…
$ plot_code         <chr> "YVP NC294", "YVP NC294", "YVP NC294", "YVP NC294",…
$ year              <chr> "2020", "2020", "2020", "2020", "2020", "2020", "20…
$ plot_loc          <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "…
$ plot_rep          <chr> "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "…
$ grid_point        <int> 294, 294, 294, 294, 294, 294, 294, 294, 294, 294, 2…
$ detection_type    <chr> "cover_est", "cover_est", "cover_est", "cover_est",…
$ key_plant_species <int> 5, 20, 52, 57, 72, 74, 82, 174, 183, 187, 220, 230,…


### yvp_additional_species

In [54]:
df_additional_species %>% glimpse()

Rows: 1,990
Columns: 8
$ survey_code       <chr> "YVP 10 2020-06-27", "YVP 10 2020-06-27", "YVP 10 2…
$ plot_code         <chr> "YVP 10", "YVP 10", "YVP 10", "YVP 10", "YVP 12", "…
$ year              <chr> "2020", "2020", "2020", "2020", "2020", "2020", "20…
$ plot_loc          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ plot_rep          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ grid_point        <int> 10, 10, 10, 10, 12, 12, 12, 12, 12, 20, 20, 20, 20,…
$ detection_type    <chr> "supplemental_obs", "supplemental_obs", "supplement…
$ key_plant_species <int> 169, 163, 72, 16, 365, 63, 212, 250, 220, 315, 163,…


In [56]:
# remove duplicates per survey
df_additional_species <- df_additional_species %>%
  group_by(survey_code) %>%
  distinct() %>%
  arrange(desc(survey_code), key_plant_species) %>% glimpse()

Rows: 1,977
Columns: 8
Groups: survey_code [236]
$ survey_code       <chr> "YVP NC294 2020-05-09", "YVP NC294 2020-05-09", "YV…
$ plot_code         <chr> "YVP NC294", "YVP NC294", "YVP NC294", "YVP NC294",…
$ year              <chr> "2020", "2020", "2020", "2020", "2020", "2020", "20…
$ plot_loc          <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "…
$ plot_rep          <chr> "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "…
$ grid_point        <int> 294, 294, 294, 294, 294, 294, 294, 294, 294, 294, 2…
$ detection_type    <chr> "supplemental_obs", "supplemental_obs", "supplement…
$ key_plant_species <int> 31, 36, 84, 178, 183, 216, 316, 342, 362, 484, 31, …


## Combine dataframes

In [57]:
species_richness <- union(df_vegetation_cover, df_additional_species) %>% glimpse()

Rows: 8,599
Columns: 8
Groups: survey_code [245]
$ survey_code       <chr> "YVP NC294 2020-05-09", "YVP NC294 2020-05-09", "YV…
$ plot_code         <chr> "YVP NC294", "YVP NC294", "YVP NC294", "YVP NC294",…
$ year              <chr> "2020", "2020", "2020", "2020", "2020", "2020", "20…
$ plot_loc          <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "…
$ plot_rep          <chr> "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "…
$ grid_point        <int> 294, 294, 294, 294, 294, 294, 294, 294, 294, 294, 2…
$ detection_type    <chr> "cover_est", "cover_est", "cover_est", "cover_est",…
$ key_plant_species <int> 5, 20, 52, 57, 72, 74, 82, 174, 183, 187, 220, 230,…


In [58]:
# look for duplicates
duplicate_detections <- species_richness %>%
  group_by(survey_code) %>%
  count(key_plant_species) %>%
  filter(n > 1) %>%
  arrange(survey_code, key_plant_species) %>% glimpse()

Rows: 292
Columns: 3
Groups: survey_code [115]
$ survey_code       <chr> "YVP 10 2018-07-12", "YVP 10 2018-07-12", "YVP 10 2…
$ key_plant_species <int> 16, 433, 16, 220, 433, 16, 163, 169, 212, 433, 39, …
$ n                 <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …


## Remove redundant detections

In [59]:
duplicate_detections <- duplicate_detections %>%
  mutate(mask = paste(survey_code, key_plant_species)) %>%
  select(!n) %>%
  ungroup() %>% glimpse()

Rows: 292
Columns: 3
$ survey_code       <chr> "YVP 10 2018-07-12", "YVP 10 2018-07-12", "YVP 10 2…
$ key_plant_species <int> 16, 433, 16, 220, 433, 16, 163, 169, 212, 433, 39, …
$ mask              <chr> "YVP 10 2018-07-12 16", "YVP 10 2018-07-12 433", "Y…


In [60]:
# subset surveys with no redundancy
species_richness_x <- species_richness %>%
  ungroup() %>%
  mutate(mask = paste(survey_code, key_plant_species)) %>%
  filter(!mask %in% duplicate_detections$mask) %>%
  arrange(survey_code, key_plant_species) %>% glimpse()

Rows: 8,015
Columns: 9
$ survey_code       <chr> "YVP 10 2017-06-09", "YVP 10 2017-06-09", "YVP 10 2…
$ plot_code         <chr> "YVP 10", "YVP 10", "YVP 10", "YVP 10", "YVP 10", "…
$ year              <chr> "2017", "2017", "2017", "2017", "2017", "2017", "20…
$ plot_loc          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ plot_rep          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ grid_point        <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
$ detection_type    <chr> "cover_est", "cover_est", "cover_est", "cover_est",…
$ key_plant_species <int> 5, 37, 39, 51, 72, 82, 84, 90, 153, 163, 169, 187, …
$ mask              <chr> "YVP 10 2017-06-09 5", "YVP 10 2017-06-09 37", "YVP…


In [61]:
# subset surveys with redundancy
species_richness_y <- species_richness %>%
  ungroup() %>%
  mutate(mask = paste(survey_code, key_plant_species)) %>%
  filter(mask %in% duplicate_detections$mask, detection_type != "supplemental_obs") %>%
  arrange(survey_code, key_plant_species) %>% glimpse()

Rows: 289
Columns: 9
$ survey_code       <chr> "YVP 10 2018-07-12", "YVP 10 2018-07-12", "YVP 10 2…
$ plot_code         <chr> "YVP 10", "YVP 10", "YVP 10", "YVP 10", "YVP 10", "…
$ year              <chr> "2018", "2018", "2019", "2019", "2019", "2020", "20…
$ plot_loc          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ plot_rep          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ grid_point        <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 112, 112, 1…
$ detection_type    <chr> "cover_est", "cover_est", "cover_est", "cover_est",…
$ key_plant_species <int> 16, 433, 16, 220, 433, 16, 163, 169, 212, 433, 39, …
$ mask              <chr> "YVP 10 2018-07-12 16", "YVP 10 2018-07-12 433", "Y…


In [62]:
# union x and y
species_richness <- union(species_richness_x, species_richness_y)

In [63]:
species_richness <- species_richness %>%
  select(!mask) %>%
  group_by(survey_code) %>%
  arrange(survey_code, key_plant_species) %>% glimpse()

Rows: 8,304
Columns: 8
Groups: survey_code [245]
$ survey_code       <chr> "YVP 10 2017-06-09", "YVP 10 2017-06-09", "YVP 10 2…
$ plot_code         <chr> "YVP 10", "YVP 10", "YVP 10", "YVP 10", "YVP 10", "…
$ year              <chr> "2017", "2017", "2017", "2017", "2017", "2017", "20…
$ plot_loc          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ plot_rep          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ grid_point        <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
$ detection_type    <chr> "cover_est", "cover_est", "cover_est", "cover_est",…
$ key_plant_species <int> 5, 37, 39, 51, 72, 82, 84, 90, 153, 163, 169, 187, …
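
The printed counts deserve a second look: 292 survey/species pairs were flagged as duplicated, but only 289 rows were retained from them. This suggests (an inference from the row counts, not something verified against the data) that at least three pairs had no `cover_est` row at all and were dropped entirely by the `detection_type != "supplemental_obs"` filter. The sketch below shows one way to check for remaining duplicates and for pairs lost along the way. Both helper names are hypothetical and the data are invented.

```r
library(dplyr)

# Hypothetical helper: number of survey/species pairs that occur more than once
count_duplicate_pairs <- function(df) {
  df %>%
    ungroup() %>%
    count(survey_code, key_plant_species) %>%
    filter(n > 1) %>%
    nrow()
}

# Hypothetical helper: pairs present before de-duplication but absent after
dropped_pairs <- function(before, after) {
  before %>%
    ungroup() %>%
    distinct(survey_code, key_plant_species) %>%
    anti_join(ungroup(after), by = c("survey_code", "key_plant_species"))
}

# Toy data (invented): species 2 has two supplemental rows that differ only
# in plot_rep, so it is "duplicated" without any cover_est row.
before <- tibble(
  survey_code = c("A", "A", "A", "A"),
  key_plant_species = c(1L, 1L, 2L, 2L),
  detection_type = c("cover_est", "supplemental_obs",
                     "supplemental_obs", "supplemental_obs"),
  plot_rep = c(NA, NA, "C", NA)
)

# Mimic the filter applied to duplicated pairs (every pair here is duplicated)
after <- before %>% filter(detection_type != "supplemental_obs")

count_duplicate_pairs(after)   # no duplicated pairs remain
dropped_pairs(before, after)   # species 2 was lost entirely
```

If `dropped_pairs()` returns rows on the real data, those species would need to be added back with a single `supplemental_obs` row each.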


# Join 

## location_position_classification

In [65]:
species_richness <- species_richness %>%
  left_join(df_position_classification, by = c("grid_point" = "grid_point"))

## vegetation_species_metadata

In [67]:
species_richness <- species_richness %>%
  left_join(df_species_metadata, by = c("key_plant_species" = "key_plant_species"))
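
A left join keeps every detection even when the lookup table has no matching key, filling the joined columns with NA. A quick way to spot such gaps is an `anti_join()`, sketched below on toy data. Species key 999 is invented for illustration; the two codes shown follow the output printed later in this notebook.

```r
library(dplyr)

# Toy detections: key 999 is an invented key with no metadata record
detections <- tibble(key_plant_species = c(5L, 37L, 999L))
metadata <- tibble(
  key_plant_species = c(5L, 37L),
  key_plant_code = c("ACHMIL", "ANTE_SP")
)

# Every detection is kept; unmatched keys get NA in the joined columns
joined <- detections %>%
  left_join(metadata, by = "key_plant_species")

# Detections whose key has no metadata record
unmatched <- detections %>%
  anti_join(metadata, by = "key_plant_species")

unmatched
```

The same check applies to the grid_point join against location_position_classification.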

In [68]:
names(species_richness)

In [71]:
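# reorder columns; only the columns listed here are kept, so the fields
# joined from location_position_classification are dropped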
species_richness <- species_richness %>%
  select(survey_code, plot_code, year, plot_loc, plot_rep, grid_point, detection_type, key_plant_species, key_plant_code, plant_name_sci,
         plant_name_common, plant_native_status, plant_life_cycle, plant_life_form)

In [72]:
species_richness %>% glimpse()

Rows: 8,304
Columns: 14
Groups: survey_code [245]
$ survey_code         <chr> "YVP 10 2017-06-09", "YVP 10 2017-06-09", "YVP 10…
$ plot_code           <chr> "YVP 10", "YVP 10", "YVP 10", "YVP 10", "YVP 10",…
$ year                <chr> "2017", "2017", "2017", "2017", "2017", "2017", "…
$ plot_loc            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ plot_rep            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ grid_point          <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1…
$ detection_type      <chr> "cover_est", "cover_est", "cover_est", "cover_est…
$ key_plant_species   <int> 5, 37, 39, 51, 72, 82, 84, 90, 153, 163, 169, 187…
$ key_plant_code      <chr> "ACHMIL", "ANTE_SP", "APEINT", "ARECON", "BALSAG"…
$ plant_name_sci      <chr> "Achillea millefolium

In [74]:
# update year datatype to integer
species_richness$year <- as.integer(species_richness$year)

In [76]:
# filter for 2020 only
species_richness %>%
  filter(year == 2020) %>%
  glimpse()

Rows: 2,200
Columns: 14
Groups: survey_code [58]
$ survey_code         <chr> "YVP 10 2020-06-27", "YVP 10 2020-06-27", "YVP 10…
$ plot_code           <chr> "YVP 10", "YVP 10", "YVP 10", "YVP 10", "YVP 10",…
$ year                <int> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2…
$ plot_loc            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ plot_rep            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ grid_point          <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1…
$ detection_type      <chr> "cover_est", "cover_est", "cover_est", "cover_est…
$ key_plant_species   <int> 5, 16, 37, 39, 51, 52, 72, 82, 90, 153, 163, 169,…
$ key_plant_code      <chr> "ACHMIL", "ALLCER", "ANTE_SP", "APEINT", "ARECON"…
$ plant_name_sci      <chr> "Achillea millefolium"

In [79]:
summary(species_richness)

 survey_code         plot_code              year        plot_loc        
 Length:8304        Length:8304        Min.   :2017   Length:8304       
 Class :character   Class :character   1st Qu.:2018   Class :character  
 Mode  :character   Mode  :character   Median :2019   Mode  :character  
                                       Mean   :2019                     
                                       3rd Qu.:2020                     
                                       Max.   :2020                     
   plot_rep           grid_point    detection_type     key_plant_species
 Length:8304        Min.   :  7.0   Length:8304        Min.   :  3.0    
 Class :character   1st Qu.:110.0   Class :character   1st Qu.:125.0    
 Mode  :character   Median :209.0   Mode  :character   Median :250.0    
                    Mean   :249.6                      Mean   :266.4    
                    3rd Qu.:395.0                      3rd Qu.:405.0    
                    Max.   :571.0                  

# Output

In [78]:
# Output 2020-11-03 ES
write_csv(species_richness, file = "yvp_species_richness-WRANGLE-2020.csv")
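
A CSV file does not store column types, so the integer `year` set above is not guaranteed to come back as an integer when the file is read again. The sketch below writes a one-row toy table to a temporary file and reads it back with explicit column types. The file name and values are invented for illustration.

```r
library(readr)

# Toy one-row table (invented values) with the same kinds of columns
toy <- data.frame(
  survey_code = "YVP 10 2020-06-27",
  year = 2020L,
  key_plant_species = 5L
)

# Write to a temporary location rather than the working directory
out_path <- file.path(tempdir(), "toy_species_richness.csv")
write_csv(toy, out_path)

# Read back with explicit types so year and the species key stay integers
check <- read_csv(
  out_path,
  col_types = cols(
    survey_code = col_character(),
    year = col_integer(),
    key_plant_species = col_integer()
  )
)

check
```

Anyone loading the exported file downstream may want to declare column types in the same way.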