
# Security

* The user must load a `json` file containing the BigQuery API key into the local directory `/content/...`
* The user must have a Google Maps API key to enable mapping.
   * **Caution:** make sure the key is deleted from the current notebook instance before sharing.

# Tools

In [None]:
library(tidyverse)

* Remember that the file containing the BigQuery authorization key must be loaded into the virtual environment manually.

In [None]:
install.packages("bigrquery")
library(bigrquery)

# Source

## Database Connection

In [None]:
# BigQuery API Key
bq_auth(path = "/content/mpg-data-warehouse-api_key-master.json")

In [None]:
Sys.setenv(BIGQUERY_TEST_PROJECT = "mpg-data-warehouse")

In [None]:
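# bq_test_project() returns the project ID stored in BIGQUERY_TEST_PROJECT;
# queries run with bq_project_query() are billed to this project
billing <- bq_test_project()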

### gridVeg_plant_intercepts

In [None]:
sql_plant_intercepts <- 
"
  SELECT 
    survey_ID,
    grid_point,
    key_plant_code,
    plant_native_status,
    plant_life_cycle,
    plant_life_form,
    intercepts_pct
  FROM 
    `mpg-data-warehouse.vegetation_gridVeg_summaries.gridVeg_plant_intercepts`
"

In [None]:
bq_plant_intercepts <- bq_project_query(billing, sql_plant_intercepts)

In [None]:
tb_plant_intercepts <- bq_table_download(bq_plant_intercepts)

In [None]:
df_plant_intercepts <- as.data.frame(tb_plant_intercepts)

In [None]:
head(df_plant_intercepts, n=4)

|   | survey_ID | grid_point | key_plant_code | plant_native_status | plant_life_cycle | plant_life_form | intercepts_pct |
|---|---|---|---|---|---|---|---|
|   | `<chr>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` |
| 1 | 436 | 1 | HEUCYL | native | perennial | forb | 2.5 |
| 2 | 436 | 1 | ALLCER | native | perennial | forb | 0.5 |
| 3 | 436 | 1 | GEUTRI | native | perennial | forb | 1.0 |
| 4 | 436 | 1 | ERIG_SP | native | unknown | forb | 0.5 |


In [None]:
df_plant_intercepts %>%
  distinct(survey_ID) %>%
  count()

| n |
|---|
| `<int>` |
| 1244 |
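The `distinct() %>% count()` pipeline returns the count as a one-row table. When a plain number is more convenient (for example, to reuse in a later check), `dplyr::n_distinct()` gives the same count. A minimal sketch on hypothetical toy records (`toy`, `toy_count` and `n_surveys` are illustrative names, not part of the notebook):

```r
library(dplyr)

# Hypothetical toy records: three surveys, two of them with several rows
toy <- data.frame(survey_ID = c("436", "436", "833", "833", "901"),
                  intercepts_pct = c(2.5, 0.5, 1.0, 4.0, 0.5))

# One-row table, as produced by the notebook cell above
toy_count <- toy %>% distinct(survey_ID) %>% count()

# The same count as a plain integer
n_surveys <- n_distinct(toy$survey_ID)
```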


# Wrangle

Start with the view `vegetation_gridVeg_summaries:gridVeg_plant_intercepts` and remove records where `key_plant_code` = “NV” (this corresponds to `key_plant_species` = 360).

In [None]:
df_plant_functional_groups <- df_plant_intercepts %>%
  filter(key_plant_code != "NV")

In [None]:
# inspect the NV records that the filter removes
df_plant_intercepts %>%
  filter(key_plant_code == "NV")

In [None]:
# check unique survey IDs
df_plant_functional_groups %>%
  distinct(survey_ID) %>%
  count()

| n |
|---|
| `<int>` |
| 1242 |
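The survey count falls from 1244 to 1242 after the filter, which implies that two surveys contained only `NV` records and therefore disappear from the functional-group table entirely. The toy example below (hypothetical IDs and values) reproduces the effect and uses `anti_join()` to list the surveys that were dropped:

```r
library(dplyr)

# Hypothetical intercept records; survey "999" recorded only "NV"
toy <- data.frame(
  survey_ID      = c("436", "436", "833", "999"),
  key_plant_code = c("HEUCYL", "NV", "ALLCER", "NV"),
  intercepts_pct = c(2.5, 1.0, 0.5, 4.0)
)

toy_filtered <- toy %>% filter(key_plant_code != "NV")

# Surveys present before the filter but absent afterwards
dropped <- toy %>%
  distinct(survey_ID) %>%
  anti_join(distinct(toy_filtered, survey_ID), by = "survey_ID")
```

On the real data, the same `anti_join()` against `df_plant_functional_groups` would identify the two affected surveys.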


Recode the levels of `plant_life_cycle` to simplify them: any combined life cycle (for example, “annual biennial”) becomes “multiple”. The recoded values are listed in the Readme.


In [None]:
df_plant_functional_groups <- df_plant_functional_groups %>%
  mutate(plant_life_cycle = ifelse(plant_life_cycle %in% c("biennial perennial",
                                                           "annual perennial",
                                                           "annual biennial perennial",
                                                           "annual biennial"),
                                   "multiple", plant_life_cycle))
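The recode maps every combined life cycle to `"multiple"` and leaves single life cycles (including `"biennial"`) unchanged. A base-R sketch on a hypothetical vector (`life_cycle` and `recoded` are illustrative names) makes the mapping explicit:

```r
# Combined life cycles collapsed by the recode above
combined_cycles <- c("biennial perennial", "annual perennial",
                     "annual biennial perennial", "annual biennial")

# Hypothetical input values
life_cycle <- c("annual", "annual biennial", "perennial",
                "biennial", "biennial perennial", "unknown")

recoded <- ifelse(life_cycle %in% combined_cycles, "multiple", life_cycle)
# gives: annual, multiple, perennial, biennial, multiple, unknown
```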

Group the data by `survey_ID`, `plant_native_status`, `plant_life_cycle`, and `plant_life_form`. Sum `intercepts_pct` within each group and divide each sum by 4 to obtain `detection_rate`, the number of detections per 100 possible intercepts.

In [None]:
df_plant_functional_groups %>%
  group_by(survey_ID, plant_native_status, plant_life_cycle, plant_life_form) %>%
  summarise(intercepts_pct_sum = sum(intercepts_pct), detection_rate = sum(intercepts_pct)/4) %>%
  filter(survey_ID == "436")

`summarise()` regrouping output by 'survey_ID', 'plant_native_status', 'plant_life_cycle' (override with `.groups` argument)



| survey_ID | plant_native_status | plant_life_cycle | plant_life_form | intercepts_pct_sum | detection_rate |
|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| 436 | native | annual | forb | 3.5 | 0.875 |
| 436 | native | multiple | forb | 1.0 | 0.25 |
| 436 | native | perennial | forb | 8.5 | 2.125 |
| 436 | native | perennial | graminoid | 44.5 | 11.125 |
| 436 | native | perennial | shrub | 10.5 | 2.625 |
| 436 | native | perennial | tree | 2.0 | 0.5 |
| 436 | native | unknown | forb | 0.5 | 0.125 |
| 436 | nonnative | annual | forb | 1.5 | 0.375 |
| 436 | nonnative | annual | graminoid | 5.5 | 1.375 |
| 436 | nonnative | multiple | forb | 1.0 | 0.25 |
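The division by 4 can be checked by hand against the survey 436 rows above; for example, native perennial graminoids sum to 44.5, and 44.5 / 4 = 11.125. The same check on a few of the group sums copied from the table:

```r
# Group sums for survey 436 (native annual forb, native perennial forb,
# native perennial graminoid, native perennial shrub), copied from the table above
intercepts_pct_sum <- c(3.5, 8.5, 44.5, 10.5)

# detection_rate per 100 possible intercepts = group sum / 4
detection_rate <- intercepts_pct_sum / 4
# gives: 0.875 2.125 11.125 2.625
```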


In [None]:
df_plant_functional_groups <- df_plant_functional_groups %>%
  group_by(survey_ID, plant_native_status, plant_life_cycle, plant_life_form) %>%
  # detection_rate: detections per 100 possible intercepts
  summarise(intercepts_pct_sum = sum(intercepts_pct),
            detection_rate = intercepts_pct_sum / 4)

In [None]:
df_plant_functional_groups %>% glimpse()

Rows: 9,015
Columns: 6
Groups: survey_ID, plant_native_status, plant_life_cycle [5,560]
$ survey_ID           <chr> "012C5FAD-2451-41B0-9E2F-432D1ECEB55C", "012C5FAD…
$ plant_native_status <chr> "native", "native", "native", "native", "native",…
$ plant_life_cycle    <chr> "annual", "multiple", "perennial", "perennial", "…
$ plant_life_form     <chr> "forb", "forb", "forb", "graminoid", "shrub", "fo…
$ intercepts_pct_sum  <dbl> 6.5, 1.0, 14.5, 5.0, 8.5, 5.0, 46.0, 15.0, 25.0, …
$ detection_rate      <dbl> 1.625, 0.250, 3.625, 1.250, 2.125, 1.250, 11.500,…
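The `Groups:` line shows that `summarise()` dropped only the last grouping variable, so the result is still grouped by `survey_ID`, `plant_native_status` and `plant_life_cycle`; that is why an `ungroup()` is needed before `complete()` in the Group Fill step. In dplyr 1.0 and later, `.groups = "drop"` returns an ungrouped result directly. A sketch on hypothetical records (`toy` and `summary_tbl` are illustrative names):

```r
library(dplyr)

# Hypothetical intercept records
toy <- data.frame(
  survey_ID           = c("436", "436", "436", "833"),
  plant_native_status = c("native", "native", "nonnative", "native"),
  plant_life_cycle    = c("perennial", "perennial", "annual", "perennial"),
  plant_life_form     = c("forb", "forb", "graminoid", "forb"),
  intercepts_pct      = c(2.5, 1.0, 5.5, 0.5)
)

summary_tbl <- toy %>%
  group_by(survey_ID, plant_native_status, plant_life_cycle, plant_life_form) %>%
  summarise(intercepts_pct_sum = sum(intercepts_pct),
            detection_rate = intercepts_pct_sum / 4,
            .groups = "drop")   # result is ungrouped, and no regrouping message

group_vars(summary_tbl)  # character(0): no grouping left
```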


Next, make sure that every combination of functional groups found in the data is represented in each `survey_ID`. For groups that were not detected in a survey, fill `detection_rate` with 0. The template below lists the functional-group categories and the full set of possible combinations.
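`tidyr::complete()` handles this: crossing `survey_ID` with `nesting()` of the three functional-group columns creates every survey × observed-combination pair, and `fill` sets the new rows to 0 instead of `NA`. The names in `fill` need to match the columns being filled (here `intercepts_pct_sum` and `detection_rate`). A toy sketch with hypothetical values:

```r
library(dplyr)
library(tidyr)

# Hypothetical summarised data: survey "833" lacks the nonnative annual graminoid group
toy <- tibble(
  survey_ID           = c("436", "436", "833"),
  plant_native_status = c("native", "nonnative", "native"),
  plant_life_cycle    = c("perennial", "annual", "perennial"),
  plant_life_form     = c("forb", "graminoid", "forb"),
  intercepts_pct_sum  = c(3.5, 5.5, 0.5),
  detection_rate      = c(0.875, 1.375, 0.125)
)

filled <- toy %>%
  complete(survey_ID,
           nesting(plant_native_status, plant_life_cycle, plant_life_form),
           fill = list(intercepts_pct_sum = 0, detection_rate = 0))

# 2 surveys x 2 observed combinations = 4 rows; the added row holds zeros
filled
```

Using `nesting()` rather than crossing the three columns separately keeps only combinations that actually occur somewhere in the data.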

### Template Survey

In [None]:
# one row per functional-group combination (30 rows)
n_combos <- 30
survey_ID <- rep('xxx-xxx', n_combos)
grid_point <- rep(01234, n_combos)
key_plant_code <- rep('NA', n_combos)  # not included in template_survey below
plant_native_status <- c(rep('native', 11), rep('nonnative', 11), rep('unknown', 8))
plant_life_cycle <- c('annual', 'annual', 'biennial', 'multiple', 'multiple', rep('perennial', 5), 'unknown',  # native
                      'annual', 'annual', 'biennial', 'multiple', rep('perennial', 5), 'unknown', 'unknown',   # nonnative
                      'annual', 'perennial', 'perennial', rep('unknown', 5))                                    # unknown
plant_life_form <- c('forb', 'graminoid', 'forb', 'forb', 'graminoid', 'forb', 'graminoid', 'shrub', 'tree', 'vine', 'forb',      # native
                     'forb', 'graminoid', 'forb', 'forb', 'forb', 'graminoid', 'shrub', 'tree', 'vine', 'forb', 'graminoid',      # nonnative
                     'forb', 'forb', 'graminoid', 'forb', 'graminoid', 'shrub', 'tree', 'unknown')                               # unknown

template_survey <- data.frame(survey_ID, grid_point, plant_native_status, plant_life_cycle, plant_life_form)
template_survey$intercepts_pct <- 0
template_survey$detection_rate <- 0

In [None]:
template_survey

| survey_ID | grid_point | plant_native_status | plant_life_cycle | plant_life_form | intercepts_pct | detection_rate |
|---|---|---|---|---|---|---|
| `<fct>` | `<dbl>` | `<fct>` | `<fct>` | `<fct>` | `<dbl>` | `<dbl>` |
| xxx-xxx | 1234 | native | annual | forb | 0 | 0 |
| xxx-xxx | 1234 | native | annual | graminoid | 0 | 0 |
| xxx-xxx | 1234 | native | biennial | forb | 0 | 0 |
| xxx-xxx | 1234 | native | multiple | forb | 0 | 0 |
| xxx-xxx | 1234 | native | multiple | graminoid | 0 | 0 |
| xxx-xxx | 1234 | native | perennial | forb | 0 | 0 |
| xxx-xxx | 1234 | native | perennial | graminoid | 0 | 0 |
| xxx-xxx | 1234 | native | perennial | shrub | 0 | 0 |
| xxx-xxx | 1234 | native | perennial | tree | 0 | 0 |
| xxx-xxx | 1234 | native | perennial | vine | 0 | 0 |


## Group Fill

In [None]:
# fill names must match existing columns (intercepts_pct_sum, not intercepts_pct)
df_plant_functional_groups <- df_plant_functional_groups %>%
  ungroup() %>%
  complete(survey_ID, 
           nesting(plant_native_status, plant_life_cycle, plant_life_form), 
           fill = list(intercepts_pct_sum = 0, detection_rate = 0))

In [None]:
# reference table mapping each survey_ID to its grid_point
df_grid_point_ref <- df_plant_intercepts %>%
  distinct(survey_ID, grid_point)
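Because `summarise()` keeps only the grouping columns, `grid_point` has to be joined back from this reference table. A minimal sketch of that join on hypothetical values, assuming each `survey_ID` maps to exactly one `grid_point`; the duplicate check guards that assumption, since a survey with several grid points would duplicate rows in the join:

```r
library(dplyr)

# Hypothetical summarised table without grid_point
summary_tbl <- tibble(survey_ID = c("436", "436", "833"),
                      plant_life_form = c("forb", "graminoid", "forb"),
                      detection_rate = c(0.875, 1.375, 0.125))

# Hypothetical reference table: one grid_point per survey
grid_ref <- tibble(survey_ID = c("436", "833"), grid_point = c(1L, 57L))

# The join preserves the row count only if survey_ID is unique in grid_ref
stopifnot(!any(duplicated(grid_ref$survey_ID)))

with_points <- summary_tbl %>% left_join(grid_ref, by = "survey_ID")
```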

In [None]:
# df_plant_functional_groups %>%
#   select(plant_native_status, plant_life_cycle, plant_life_form) %>%
#   distinct(plant_native_status, plant_life_cycle, plant_life_form)

In [None]:
# summarise() dropped grid_point, so join it back by survey_ID
# (there is no grid_point column to deselect first)
df_plant_functional_groups <- df_plant_functional_groups %>%
  left_join(df_grid_point_ref, by = "survey_ID")

In [None]:
df_plant_functional_groups %>%
  select(plant_native_status, plant_life_cycle, plant_life_form) %>%
  distinct(plant_native_status, plant_life_cycle, plant_life_form) %>%
  count()

| n |
|---|
| `<int>` |
| 25 |


In [None]:
df_plant_functional_groups %>%
  filter(survey_ID == "833") %>%
  select(survey_ID, plant_native_status, plant_life_cycle, plant_life_form) %>%
  distinct(survey_ID, plant_native_status, plant_life_cycle, plant_life_form) %>%
  count()

| n |
|---|
| `<int>` |
| 25 |


In [None]:
df_plant_functional_groups %>%
  distinct(survey_ID) %>%
  count()

| n |
|---|
| `<int>` |
| 1242 |


Ultimately, the number of rows should equal the number of functional-group combinations multiplied by the number of surveys: 25 × 1242 = 31,050.
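This check can be automated with `stopifnot()`. The sketch below uses a hypothetical completed table with a single grouping column for brevity; on the real data the combinations would come from `distinct(df_plant_functional_groups, plant_native_status, plant_life_cycle, plant_life_form)`. The names `filled`, `n_surveys` and `n_groups` are illustrative:

```r
library(dplyr)
library(tidyr)

# Hypothetical completed table: 2 surveys x 3 functional-group combinations
filled <- expand_grid(survey_ID = c("436", "833"),
                      plant_life_form = c("forb", "graminoid", "shrub")) %>%
  mutate(detection_rate = 0)

n_surveys <- n_distinct(filled$survey_ID)
n_groups  <- nrow(distinct(filled, plant_life_form))

# Total rows = surveys x combinations, and each survey carries every combination once
stopifnot(nrow(filled) == n_surveys * n_groups)
stopifnot(all(count(filled, survey_ID)$n == n_groups))
```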