
# Documentation
* [Readme - vegetation point transect survey](https://docs.google.com/document/d/1JWnhxNjeSQZkSnGhtHP68i_l1mDj4vPFMBdUvGqN0TA/edit?usp=sharing): View: gridVeg_plant_binary_matrix

# Security

* The user must load a `json` file containing the BigQuery API key into the local directory `/content/...`
* The user must have a Google Maps API key to enable mapping.
   * CAUTION: make sure the key is deleted from the current instance of the notebook before sharing.

# Tools

In [58]:
library(tidyverse)

* Remember that the file containing the BigQuery authorization key must be loaded into the virtual environment manually.
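Because the key file is uploaded by hand, a missing or misnamed file is a common cause of authentication failures. A minimal pre-flight check before calling `bq_auth()` might look like the sketch below; `key_file_ready()` is a hypothetical helper, not part of bigrquery, and it assumes the key is a `.json` file in the Colab `/content/` directory.

```r
# Hypothetical helper: confirm the manually uploaded key file is present
# and has a .json extension before calling bq_auth(path = ...).
key_file_ready <- function(path) {
  is.character(path) && length(path) == 1 &&
    grepl("\\.json$", path) && file.exists(path)
}

# Example usage with the path used later in this notebook.
key_path <- "/content/mpg-data-warehouse-api_key-master.json"
if (!key_file_ready(key_path)) {
  message("Key file not found: upload it to /content/ before authenticating.")
}
```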

In [59]:
install.packages("bigrquery")
library(bigrquery)




# Source

## Database Connection

In [60]:
# BigQuery API Key
bq_auth(path = "/content/mpg-data-warehouse-api_key-master.json")

In [61]:
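# Project that BigQuery queries are billed to
Sys.setenv(BIGQUERY_TEST_PROJECT = "mpg-data-warehouse")

In [62]:
# bq_test_project() returns the project named in BIGQUERY_TEST_PROJECT
billing <- bq_test_project()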

### vegetation_point_intercept_gridVeg

In [None]:
# con_point_intercept <- dbConnect(
#   bigrquery::bigquery(),
#   project = "mpg-data-warehouse",
#   dataset = "vegetation_point_intercept_gridVeg",
#   billing = billing
# )

In [None]:
# dbListTables(con_point_intercept)

### gridVeg_species_richness

In [63]:
sql_species_richness <- 
"
  SELECT
    survey_ID,
    grid_point,
    key_plant_code
  FROM
    `mpg-data-warehouse.vegetation_gridVeg_summaries.gridVeg_species_richness`
  GROUP BY
    survey_ID,
    grid_point,
    key_plant_code
  ORDER BY
    grid_point
"
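The query groups by every selected column without using any aggregate function, so it behaves like `SELECT DISTINCT`: one row per survey, grid point, and species code. The same deduplication can be reproduced locally with base R's `unique()`, for example to spot-check a downloaded table. The records below are made up for illustration.

```r
# Toy long-format records (made-up values), including one exact duplicate.
records <- data.frame(
  survey_ID      = c("436", "436", "436", "437"),
  grid_point     = c(1L, 1L, 1L, 2L),
  key_plant_code = c("ARECON", "ARECON", "CAMROT", "ARECON"),
  stringsAsFactors = FALSE
)

# One row per (survey_ID, grid_point, key_plant_code) combination,
# which is what GROUP BY over all selected columns returns.
deduped <- unique(records)

# Sort by grid_point, mirroring the ORDER BY clause.
deduped <- deduped[order(deduped$grid_point), ]
nrow(deduped)  # 3
```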

In [64]:
bq_species_richness <- bq_project_query(billing, sql_species_richness)

In [65]:
tb_species_richness <- bq_table_download(bq_species_richness)

In [66]:
df_species_richness <- as.data.frame(tb_species_richness)

In [75]:
head(df_species_richness, n=4)

| | survey_ID `<chr>` | grid_point `<int>` | key_plant_code `<chr>` |
|---|---|---|---|
| 1 | 436 | 1 | ANTE_SP |
| 2 | 436 | 1 | ARECON |
| 3 | 436 | 1 | CAMROT |
| 4 | 436 | 1 | COLLIN |


In [68]:
dim(df_species_richness)

In [69]:
df_species_richness %>%
  distinct(key_plant_code) %>%
  count()

| n `<int>` |
|---|
| 553 |


### gridVeg_survey_metadata

In [70]:
sql_survey_metadata <- 
"
  SELECT
    survey_ID,
    year,
    survey_sequence
  FROM
    `mpg-data-warehouse.vegetation_point_intercept_gridVeg.gridVeg_survey_metadata`
"

In [71]:
bq_survey_metadata <- bq_project_query(billing, sql_survey_metadata)

In [72]:
tb_survey_metadata <- bq_table_download(bq_survey_metadata)

In [73]:
df_survey_metadata <- as.data.frame(tb_survey_metadata)

In [74]:
head(df_survey_metadata, n=4)

| | survey_ID `<chr>` | year `<int>` | survey_sequence `<chr>` |
|---|---|---|---|
| 1 | F31C56A8-912D-410C-A17D-4C2DD75F71A4 | 2016 | 2016 |
| 2 | A19E87E6-A89C-4993-B550-802226730D54 | 2016 | 2016 |
| 3 | 6F1D71D3-9F87-4C93-B179-A12C8938D18D | 2016 | 2016 |
| 4 | 9C67C9F1-1E89-4FD2-ADC0-0390E0022D62 | 2016 | 2016 |


### location_position_classification

In [76]:
sql_position_class <- 
"
  SELECT
    grid_point,
    aspect_mean_deg,
    elevation_mean_m,
    slope_mean_deg,
    cover_type_2016_gridVeg,
    type3_vegetation_indicators,
    type4_indicators_history
  FROM
    `mpg-data-warehouse.grid_point_summaries.location_position_classification`
"

In [77]:
bq_position_class <- bq_project_query(billing, sql_position_class)

In [78]:
tb_position_class <- bq_table_download(bq_position_class)

In [79]:
df_position_class <- as.data.frame(tb_position_class)

In [81]:
head(df_position_class, n=4)

| | grid_point `<int>` | aspect_mean_deg `<dbl>` | elevation_mean_m `<dbl>` | slope_mean_deg `<dbl>` | cover_type_2016_gridVeg `<chr>` | type3_vegetation_indicators `<chr>` | type4_indicators_history `<chr>` |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 334.705 | 1395.64 | 28.4423 | woodland/forest | mixed canopy conifer | mixed canopy conifer |
| 2 | 2 | 45.303 | 1456.09 | 12.2263 | non-irrigated grasslands | uncultivated grassland native or degraded | uncultivated grassland native or degraded |
| 3 | 3 | 221.334 | 1126.9 | 4.2513 | shrubland | uncultivated grassland native or degraded | uncultivated grassland native or degraded |
| 4 | 4 | 290.489 | 1166.33 | 2.68361 | shrubland | uncultivated grassland native or degraded | uncultivated grassland native or degraded |


# Wrangle

## species_richness

Producing this view follows the same approach as the abundance matrix. Begin with two fields from the gridVeg_species_richness view: survey_ID and key_plant_code. When the table is pivoted to a wide, matrix format, survey_ID identifies the rows and key_plant_code supplies the column names (grid_point is carried along as an additional identifier). The result is a table with one row for each survey_ID (1244 rows as of 2020) and one column for each plant species. Of the 764 possible species codes, not all occur in the point-intercept data: the abundance matrix had 489, and the species richness query above returns 553 distinct codes. Cells where a species was detected during a particular survey are filled with 1. Because most species are absent from any given grid point, most cells in this table would otherwise contain no data; by convention, these cells are filled with zeroes.
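As a cross-check on the `pivot_wider()` step, the same presence/absence layout can be built in base R by cross-tabulating survey_ID against key_plant_code and converting counts to 0/1. This is a sketch on made-up detections; it leaves out grid_point, assuming each survey_ID belongs to exactly one grid point.

```r
# Made-up long-format detections: one row per species detected in a survey.
detections <- data.frame(
  survey_ID      = c("436", "436", "437", "438", "438"),
  key_plant_code = c("ARECON", "CAMROT", "ARECON", "CAMROT", "COLLIN"),
  stringsAsFactors = FALSE
)

# Cross-tabulate surveys (rows) against species codes (columns);
# table() fills combinations that never occur with 0.
counts <- table(detections$survey_ID, detections$key_plant_code)

# Convert counts to presence (1) / absence (0) as a plain integer matrix.
binary <- unclass(counts > 0) * 1L
binary
```

`table()` sorts both dimensions alphabetically, whereas `pivot_wider()` orders species columns by first appearance, so compare the two results by name rather than by position.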

We use key_plant_code instead of key_plant_species here because the species become column headers, and column headers cannot be related (joined) to plant_species_metadata. The taxonomic code is therefore used to keep the columns interpretable. A consequence is that species codes cannot be updated through the relationship with plant_species_metadata, so this view must be refreshed manually whenever any species code changes.
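Because the link to plant_species_metadata is lost once codes become column names, a set difference can flag when the matrix has drifted out of date. This sketch assumes the current metadata codes are available as a character vector; both vectors below are hypothetical stand-ins.

```r
# Hypothetical inputs. In the notebook, matrix_codes could be taken as
# setdiff(names(df_plant_binary_matrix), c("survey_ID", "grid_point"))
# right after the pivot; metadata_codes would come from plant_species_metadata.
matrix_codes   <- c("ARECON", "CAMROT", "COLLIN", "ANTE_SP")
metadata_codes <- c("ARECON", "CAMROT", "COLLIN_NEW", "ANTE_SP")

# Column headers that no longer exist in the metadata;
# a non-empty result means the view needs a manual refresh.
stale_codes <- setdiff(matrix_codes, metadata_codes)
stale_codes  # "COLLIN"
```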


In [50]:
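df_plant_binary_matrix <- df_species_richness %>%
  # Flag every (survey, species) record as a detection
  mutate(selected = 1) %>%
  # One row per survey_ID/grid_point, one column per species code;
  # species not detected in a survey are filled with 0
  pivot_wider(names_from = key_plant_code, values_from = selected, values_fill = 0)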

## Join 

### gridVeg_survey_metadata

In [52]:
df_plant_binary_matrix <- df_plant_binary_matrix %>%
  left_join(df_survey_metadata, by = "survey_ID")
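A left join keeps every row of the matrix, but it will duplicate rows if a survey_ID appears more than once in the metadata. A quick guard, sketched here in base R with made-up rows (`merge(all.x = TRUE)` is base R's left join), is to confirm the lookup key is unique and that the row count is unchanged after joining.

```r
# Made-up stand-ins for the binary matrix and the survey metadata.
matrix_rows <- data.frame(survey_ID = c("436", "437"), ARECON = c(1, 0),
                          stringsAsFactors = FALSE)
metadata <- data.frame(survey_ID = c("436", "437"),
                       year = c(2011L, 2011L),
                       survey_sequence = c("2011-12", "2011-12"),
                       stringsAsFactors = FALSE)

# The lookup table must have at most one row per key for a 1:1 join.
stopifnot(anyDuplicated(metadata$survey_ID) == 0)

# Left join: keep all matrix rows, add year and survey_sequence.
joined <- merge(matrix_rows, metadata, by = "survey_ID", all.x = TRUE)

# The join should not add or drop rows.
stopifnot(nrow(joined) == nrow(matrix_rows))
```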



### location_position_classification

In [83]:
df_plant_binary_matrix <- df_plant_binary_matrix %>%
  left_join(df_position_class, by = "grid_point")
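A grid_point missing from location_position_classification fails silently: the left join keeps the row and fills the site attributes with NA. Listing unmatched keys before joining makes this visible. The vectors below are hypothetical.

```r
# Hypothetical grid points present in the matrix and in the site table.
matrix_points <- c(1L, 2L, 3L, 999L)
site_points   <- c(1L, 2L, 3L, 4L)

# match() returns NA for keys with no partner in the lookup table;
# those rows would receive NA site attributes after a left join.
unmatched <- matrix_points[is.na(match(matrix_points, site_points))]
unmatched  # 999
```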



## Order Columns

In [104]:
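# Order columns by name: survey identifiers, site attributes, then all species
df_plant_binary_matrix <- df_plant_binary_matrix %>%
  select(survey_ID, year, survey_sequence, grid_point,
         aspect_mean_deg:type4_indicators_history, everything())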

In [105]:
dim(df_plant_binary_matrix)

In [106]:
head(df_plant_binary_matrix, n=4)

| survey_ID `<chr>` | year `<int>` | survey_sequence `<chr>` | grid_point `<int>` | aspect_mean_deg `<dbl>` | elevation_mean_m `<dbl>` | slope_mean_deg `<dbl>` | cover_type_2016_gridVeg `<chr>` | type3_vegetation_indicators `<chr>` | type4_indicators_history `<chr>` | ⋯ | GLYGRA `<dbl>` | MONFIS `<dbl>` | AGOS_SP `<dbl>` | MERCIL `<dbl>` | ACOCOL `<dbl>` | TRICAN `<dbl>` | SENSER `<dbl>` | VERBA_SP `<dbl>` | GAUCOC `<dbl>` | LAPOCC `<dbl>` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 436 | 2011 | 2011-12 | 1 | 334.705 | 1395.64 | 28.4423 | woodland/forest | mixed canopy conifer | mixed canopy conifer | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| E7CBF688-32B4-4AC1-A888-DE83B00B2302 | 2016 | 2016 | 1 | 334.705 | 1395.64 | 28.4423 | woodland/forest | mixed canopy conifer | mixed canopy conifer | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 437 | 2011 | 2011-12 | 2 | 45.303 | 1456.09 | 12.2263 | non-irrigated grasslands | uncultivated grassland native or degraded | uncultivated grassland native or degraded | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 83C90DAA-88FF-4256-9ACC-4BAC0F9EF456 | 2016 | 2016 | 2 | 45.303 | 1456.09 | 12.2263 | non-irrigated grasslands | uncultivated grassland native or degraded | uncultivated grassland native or degraded | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
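Because every species column holds 0 or 1, column and row sums answer common questions directly: how many surveys detected each species, and how many species each survey recorded. The sketch below uses a made-up miniature with the same layout (identifiers first, species after); the `site_cols` split is an assumption, and on the real table it would list all ten non-species columns.

```r
# Made-up miniature of the final layout: identifiers, then species columns.
mini <- data.frame(
  survey_ID  = c("436", "437", "438"),
  grid_point = c(1L, 2L, 3L),
  ARECON     = c(1, 1, 0),
  CAMROT     = c(0, 1, 1),
  COLLIN     = c(0, 0, 1),
  stringsAsFactors = FALSE
)

# Columns that are not species indicators (assumed names).
site_cols <- c("survey_ID", "grid_point")
species <- mini[, setdiff(names(mini), site_cols)]

# Sanity check: every species cell is 0 or 1.
stopifnot(all(unlist(species) %in% c(0, 1)))

# Number of surveys in which each species was detected.
detections_per_species <- colSums(species)

# Number of species detected in each survey (species richness).
richness_per_survey <- rowSums(species)
```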


# Output

In [107]:
write_csv(df_plant_binary_matrix, file = "gridVeg_plant_binary_matrix-WRANGLE.csv")
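The output above shows that survey_ID mixes numeric-looking legacy IDs (such as 436) with GUID strings, and survey_sequence mixes values such as 2011-12 and 2016. When the CSV, or a subset of it, is read back, type guessing can therefore turn these identifiers into integers. A base-R sketch of declaring the identifier columns explicitly, using a temporary file and made-up rows:

```r
# Made-up rows containing only numeric-looking IDs: the case where
# type guessing converts identifiers to integers.
out <- data.frame(survey_ID = c("436", "437"),
                  survey_sequence = c("2016", "2016"),
                  ARECON = c(1, 0),
                  stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")
write.csv(out, path, row.names = FALSE)

# Without colClasses, read.csv() guesses integer for both ID columns.
guessed <- read.csv(path)

# Declaring the identifier columns as character keeps them as text.
typed <- read.csv(path, colClasses = c(survey_ID = "character",
                                       survey_sequence = "character"))
```

With readr, the `col_types` argument of `read_csv()` serves the same purpose.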