<a href="https://colab.research.google.com/github/samsoe/mpg_notebooks/blob/master/gridVeg_plant_functional_groups.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Security

* Upload the `json` file containing the BigQuery API key to the local directory `/content/...`.
* A Google Maps API key is required to enable mapping.
   * **Caution:** delete the key from the current notebook instance before sharing the notebook.
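A minimal sketch of both precautions, assuming the key path used in the Database Connection section below. The helper `require_key_file()` and the environment variable name `GOOGLE_MAPS_API_KEY` are hypothetical and not part of `bigrquery` or any mapping package. Reading the Maps key from the environment keeps the key itself out of the notebook's saved cells.

```r
# Hypothetical helper: stop early with a clear message if the BigQuery key
# file has not been uploaded to the Colab session yet.
require_key_file <- function(path) {
  if (!file.exists(path)) {
    stop("Key file not found: ", path,
         ". Upload it to the session before calling bq_auth().")
  }
  invisible(path)
}

# Hypothetical variable name. Returns NA when the variable is not set, so the
# key never has to be typed into a cell that is saved with the notebook.
maps_key <- Sys.getenv("GOOGLE_MAPS_API_KEY", unset = NA)
if (is.na(maps_key)) message("GOOGLE_MAPS_API_KEY is not set; maps cannot be drawn.")
```

Calling `require_key_file("/content/mpg-data-warehouse-api_key-master.json")` just before `bq_auth()` turns a forgotten upload into an explicit error.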

# Tools

In [79]:
library(tidyverse)

* Remember that the file containing the BigQuery authorization key must be uploaded to the virtual environment manually.

In [80]:
install.packages("bigrquery")
library(bigrquery)

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)



# Source

## Database Connection

In [83]:
# Authenticate with the BigQuery key file uploaded to /content
bq_auth(path = "/content/mpg-data-warehouse-api_key-master.json")

In [84]:
# Project that is billed for the queries below
Sys.setenv(BIGQUERY_TEST_PROJECT = "mpg-data-warehouse")

In [85]:
# bq_test_project() returns the value of BIGQUERY_TEST_PROJECT set above
billing <- bq_test_project()

### gridVeg_plant_intercepts

In [86]:
sql_plant_intercepts <- 
"
  SELECT 
    survey_ID,
    grid_point,
    key_plant_code,
    plant_native_status,
    plant_life_cycle,
    plant_life_form,
    intercepts_pct
  FROM 
    `mpg-data-warehouse.vegetation_gridVeg_summaries.gridVeg_plant_intercepts`
"

In [90]:
bq_plant_intercepts <- bq_project_query(billing, sql_plant_intercepts)

In [91]:
tb_plant_intercepts <- bq_table_download(bq_plant_intercepts)

In [92]:
df_plant_intercepts <- as.data.frame(tb_plant_intercepts)

In [93]:
head(df_plant_intercepts, n=4)

| survey_ID (chr) | grid_point (int) | key_plant_code (chr) | plant_native_status (chr) | plant_life_cycle (chr) | plant_life_form (chr) | intercepts_pct (dbl) |
|---|---|---|---|---|---|---|
| 436 | 1 | HEUCYL | native | perennial | forb | 2.5 |
| 436 | 1 | ALLCER | native | perennial | forb | 0.5 |
| 436 | 1 | GEUTRI | native | perennial | forb | 1.0 |
| 436 | 1 | ERIG_SP | native | unknown | forb | 0.5 |


# Wrangle

Start from the view `vegetation_gridVeg_summaries.gridVeg_plant_intercepts` (downloaded above as `df_plant_intercepts`). Remove records where `key_plant_code` is `"NV"` (this corresponds to `key_plant_species` = 360).

In [143]:
df_plant_functional_groups <- df_plant_intercepts %>%
  filter(key_plant_code != "NV")
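`filter()` keeps only rows where the condition is `TRUE`, so `key_plant_code != "NV"` also drops any record whose `key_plant_code` is missing (the comparison returns `NA`). Whether the view contains such records is not shown here. If it might, and they should be kept, `%in%` treats `NA` as a non-match. A base R illustration on hypothetical codes:

```r
# Hypothetical codes, including a missing value
codes <- c("HEUCYL", "NV", NA)

# != propagates NA, so filter() would drop the third record as well
keep_neq <- codes != "NV"

# %in% never returns NA, so only the "NV" record is excluded
keep_in <- !(codes %in% "NV")
```

In the pipeline this would read `filter(!(key_plant_code %in% "NV"))`.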

Simplify the levels of `plant_life_cycle` by recoding every multi-cycle level to `"multiple"` (the recoded values are listed in the Readme).


In [144]:
df_plant_functional_groups <- df_plant_functional_groups %>%
  mutate(plant_life_cycle = ifelse(plant_life_cycle == "biennial perennial" |
                                   plant_life_cycle == "annual perennial" |
                                   plant_life_cycle == "annual biennial perennial" |
                                   plant_life_cycle == "annual biennial"
                                   , "multiple", plant_life_cycle))
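The four `==` tests can be collapsed into a single `%in%` lookup against a vector of the multi-cycle labels. A sketch of an equivalent helper (the name `recode_life_cycle()` is hypothetical); like the `ifelse()` above, it leaves missing values as `NA`:

```r
# Life-cycle labels collapsed to "multiple" (the same four labels as above)
multi_cycle <- c("biennial perennial", "annual perennial",
                 "annual biennial perennial", "annual biennial")

# Replace any multi-cycle label with "multiple"; other values pass through
recode_life_cycle <- function(x) ifelse(x %in% multi_cycle, "multiple", x)

recode_life_cycle(c("annual", "annual biennial", "perennial", NA))
```

Inside the pipeline this becomes `mutate(plant_life_cycle = recode_life_cycle(plant_life_cycle))`, and adding a new multi-cycle label only means extending `multi_cycle`.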

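Group the data by `survey_ID`, `plant_native_status`, `plant_life_cycle`, and `plant_life_form`. Within each group, sum `intercepts_pct` and divide the sum by 4 to obtain `detection_rate`, the detection rate per 100 possible intercepts. Because `mutate()` is used rather than `summarise()`, every plant record is kept and carries its group's `detection_rate`.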

In [145]:
df_plant_functional_groups %>%
  group_by(survey_ID, plant_native_status, plant_life_cycle, plant_life_form) %>%
  mutate(detection_rate = sum(intercepts_pct)/4) %>%
  head()

| survey_ID (chr) | grid_point (int) | key_plant_code (chr) | plant_native_status (chr) | plant_life_cycle (chr) | plant_life_form (chr) | intercepts_pct (dbl) | detection_rate (dbl) |
|---|---|---|---|---|---|---|---|
| 436 | 1 | HEUCYL | native | perennial | forb | 2.5 | 2.125 |
| 436 | 1 | ALLCER | native | perennial | forb | 0.5 | 2.125 |
| 436 | 1 | GEUTRI | native | perennial | forb | 1.0 | 2.125 |
| 436 | 1 | ERIG_SP | native | unknown | forb | 0.5 | 0.125 |
| 436 | 1 | ARESER | nonnative | annual | forb | 1.0 | 0.375 |
| 436 | 1 | GALAPA | native | annual | forb | 1.5 | 0.875 |


In [146]:
df_plant_functional_groups <- df_plant_functional_groups %>%
  group_by(survey_ID, plant_native_status, plant_life_cycle, plant_life_form) %>%
  mutate(detection_rate = sum(intercepts_pct)/4)
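The cells above produce one row per plant record. If one row per survey and functional group is wanted instead, the same arithmetic can be done with `aggregate()`, as in this base R sketch on a hypothetical toy subset (values chosen for illustration, not real survey totals):

```r
# Hypothetical toy subset: one survey, three functional groups
toy <- data.frame(
  survey_ID           = "436",
  plant_native_status = c("native", "native", "native", "nonnative"),
  plant_life_cycle    = c("perennial", "perennial", "unknown", "annual"),
  plant_life_form     = "forb",
  intercepts_pct      = c(2.5, 0.5, 0.5, 1.0)
)

# Sum intercepts_pct within each group (one output row per group)
groups <- aggregate(
  intercepts_pct ~ survey_ID + plant_native_status + plant_life_cycle + plant_life_form,
  data = toy, FUN = sum
)

# Divide by 4, as in the notebook, to get detections per 100 possible intercepts
groups$detection_rate <- groups$intercepts_pct / 4
groups
```

The `dplyr` equivalent replaces `mutate()` with `summarise(detection_rate = sum(intercepts_pct) / 4, .groups = "drop")`. Later cells in this notebook, which fill `intercepts_pct` and restore `grid_point`, are written for the row-level layout that `mutate()` produces.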

Next, make sure that every combination of functional groups found in the data is represented in every `survey_ID`, and set `detection_rate` (and `intercepts_pct`) to 0 for groups that were not detected in a survey. The following tables show the categories of functional groups and the full list of all possible combinations.

In [147]:
df_plant_functional_groups <- df_plant_functional_groups %>%
  ungroup() %>%
  complete(survey_ID, 
           nesting(plant_native_status, plant_life_cycle, plant_life_form), 
           fill = list(intercepts_pct = 0, detection_rate = 0))
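`nesting()` restricts the expansion to the functional-group combinations that occur somewhere in the data, so each survey ends up with the observed combinations rather than the full cross product of the three variables. A toy illustration with `tidyr::complete()` on hypothetical data:

```r
library(tidyr)

# Hypothetical toy data: survey A has two groups, survey B has one
toy <- data.frame(
  survey_ID           = c("A", "A", "B"),
  plant_native_status = c("native", "nonnative", "native"),
  plant_life_cycle    = c("perennial", "annual", "perennial"),
  plant_life_form     = c("forb", "forb", "graminoid"),
  detection_rate      = c(0.5, 0.25, 1.0)
)

# Every survey_ID gets every observed (status, cycle, form) combination;
# combinations missing from a survey are added with detection_rate = 0
filled <- complete(
  toy, survey_ID,
  nesting(plant_native_status, plant_life_cycle, plant_life_form),
  fill = list(detection_rate = 0)
)
filled
```

Here three combinations are observed, giving 3 × 2 = 6 rows, whereas crossing all three variables would give 2 × 2 × 2 = 8 combinations per survey. Columns not listed in `fill`, such as `grid_point` and `key_plant_code` in the notebook, are `NA` on the added rows, which is why `grid_point` is restored from a lookup table next.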

In [158]:
# Lookup table of grid_point for each survey_ID. Rows added by complete()
# have grid_point = NA, so the column is restored from this table.
df_grid_point_ref <- df_plant_intercepts %>%
  distinct(survey_ID, grid_point)

In [161]:
# Drop the incomplete grid_point column and rejoin it from the lookup table
df_plant_functional_groups <- df_plant_functional_groups %>%
  select(!grid_point) %>%
  left_join(df_grid_point_ref, by = "survey_ID")
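The rejoin assumes each `survey_ID` maps to a single `grid_point`; the notebook does not check this, and if a survey appeared at several grid points the join would duplicate rows. A minimal base R check of that assumption (the helper name is hypothetical; `dplyr` 1.1.0 and later can also enforce it with `left_join(..., relationship = "many-to-one")`):

```r
# Hypothetical helper: expects distinct (survey_ID, grid_point) pairs and
# stops if any survey_ID has more than one grid_point
check_one_grid_point <- function(ref) {
  dup <- anyDuplicated(ref$survey_ID)
  if (dup > 0) {
    stop("survey_ID ", ref$survey_ID[dup], " has more than one grid_point")
  }
  invisible(TRUE)
}

# Usage on the lookup table built above:
# check_one_grid_point(df_grid_point_ref)
```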



In [197]:
df_plant_functional_groups %>%
  select(plant_native_status, plant_life_cycle, plant_life_form) %>%
  distinct(plant_native_status, plant_life_cycle, plant_life_form) %>%
  count()

n
<int>
25


In [196]:
df_plant_functional_groups %>%
  filter(survey_ID == "833") %>%
  select(survey_ID, plant_native_status, plant_life_cycle, plant_life_form) %>%
  distinct(survey_ID, plant_native_status, plant_life_cycle, plant_life_form) %>%
  count()

n
<int>
25


In [192]:
df_plant_functional_groups %>%
  distinct(survey_ID) %>%
  count()

n
<int>
1242
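These cells check the number of combinations overall (25) and for a single survey (833). A base R sketch that checks every survey at once; the helper name is hypothetical, and it returns `TRUE` only if each `survey_ID` has every combination that appears in the table:

```r
# Hypothetical helper: TRUE if every survey_ID has all observed functional groups
every_survey_complete <- function(df) {
  group_cols <- c("plant_native_status", "plant_life_cycle", "plant_life_form")
  # Number of distinct functional-group combinations in the whole table
  n_groups <- nrow(unique(df[group_cols]))
  # Distinct (survey, group) pairs, then the number of groups per survey
  pairs <- unique(df[c("survey_ID", group_cols)])
  per_survey <- table(pairs$survey_ID)
  all(per_survey == n_groups)
}

# Usage: every_survey_complete(df_plant_functional_groups)
```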
