<a href="https://colab.research.google.com/github/samsoe/mpg_notebooks/blob/master/gridVeg_groundCover_abundance_matrix_WRANGLE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Documentation

This notebook structures the point-intercept data as a species-by-samples matrix, a layout suited to multivariate analysis and visualization. Here the "species" are ground cover types, one per column, and the samples are surveys, one per row, identified by survey ID. The survey IDs join to survey metadata, so the analyst can subset or cluster the response data as an analysis requires.

See the appropriate section in the [Readme](https://docs.google.com/document/d/1JWnhxNjeSQZkSnGhtHP68i_l1mDj4vPFMBdUvGqN0TA/edit#heading=h.b1khpgg2so3y) for more information.
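
As a minimal sketch of how survey IDs can tie the matrix to survey metadata, the example below joins a toy matrix to a hypothetical metadata table and subsets it. The `habitat` column, the table names, and all values are invented for illustration and are not taken from the data warehouse.

```r
library(dplyr)

# Toy species-by-samples matrix: one row per survey, one column per cover type
# (values are invented for illustration)
toy_matrix <- tibble(
  survey_ID = c("1", "2", "3"),
  BG = c(2.0, 0.0, 11.0),
  L  = c(32.0, 84.0, 47.5)
)

# Hypothetical survey metadata keyed by survey_ID
toy_meta <- tibble(
  survey_ID = c("1", "2", "3"),
  habitat   = c("grassland", "forest", "grassland")
)

# Join the metadata to the matrix, then subset to one habitat
toy_grassland <- toy_matrix %>%
  left_join(toy_meta, by = "survey_ID") %>%
  filter(habitat == "grassland")
```

Keeping `survey_ID` as a character column in both tables avoids type mismatches in the join.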

# Tools

In [None]:
# Package and library installation
packages_needed = c("tidyverse", "rjson", "knitr") # comma delimited vector of package names
packages_installed = packages_needed %in% rownames(installed.packages())

# Install anything missing, then attach every package
if (any(!packages_installed)) {
  install.packages(packages_needed[!packages_installed])
}
for (pkg in packages_needed) {
  library(pkg, character.only = TRUE)
}

In [2]:
# Package and library installation
packages_needed = c("bigrquery") # comma delimited vector of package names
packages_installed = packages_needed %in% rownames(installed.packages())

# Install anything missing, then attach every package
if (any(!packages_installed)) {
  install.packages(packages_needed[!packages_installed])
}
for (pkg in packages_needed) {
  library(pkg, character.only = TRUE)
}

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

also installing the dependencies ‘bit’, ‘bit64’, ‘gargle’, ‘rapidjsonr’




# Source

## Database Connection

In [3]:
# BigQuery API Key
bq_auth(path = "/content/mpg-data-warehouse-api_key-master.json")

In [4]:
# bq_test_project() reads the project ID from this environment variable
Sys.setenv(BIGQUERY_TEST_PROJECT = "mpg-data-warehouse")

In [5]:
# Project ID used for billing in the queries below
billing <- bq_test_project()

## Survey Data: Ground Cover

In [6]:
con_ground <- dbConnect(
  bigrquery::bigquery(),
  project = "mpg-data-warehouse",
  dataset = "vegetation_gridVeg_summaries",
  billing = billing
)

In [7]:
dbListTables(con_ground)

In [8]:
sql_ground <- 
  "
  SELECT * 
  FROM `mpg-data-warehouse.vegetation_gridVeg_summaries.gridVeg_groundCover_intercepts`
  "
bq_ground <- bq_project_query(billing, sql_ground)
tb_ground <- bq_table_download(bq_ground)
df_ground <- as.data.frame(tb_ground) %>%
  arrange(year, grid_point, intercept_ground_code) %>%
  glimpse()

Rows: 23,636
Columns: 7
$ survey_ID             <chr> "436", "436", "436", "436", "436", "436", "436"…
$ year                  <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…
$ survey_sequence       <chr> "2011-12", "2011-12", "2011-12", "2011-12", "20…
$ grid_point            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ intercept_ground_code <chr> "BG", "BV", "G", "L", "LIC", "M", "M/L", "OTHER…
$ ground_group          <chr> "inorganic", "vas_plant", "inorganic", "litter"…
$ intercepts_pct        <dbl> 2.0, 3.0, 3.5, 32.0, 0.0, 0.0, 54.0, 0.0, 2.5, …


# Wrangle
- Pivot the data frame into a species-by-samples matrix
- Fix inconsistent codes for mosses and lichens
  - Before 2015, intercepts of moss or lichen were coded as "M/L" for "moss/lichen". After 2015, mosses, liverworts, and hornworts were coded as "M", and lichens were coded as "LIC". The two coding schemes cannot be compared directly across years. Because the old "M/L" code cannot be split into its components, entries of "M" or "LIC" must be re-coded to match it. The name "M/L" contains a slash, so it is not a syntactic R column name and must be wrapped in backticks; a new column, "M_L", replaces it. Here, this is accomplished by using `mutate()` to sum the "LIC", "M", and "M/L" columns into the new "M_L" column, after which the three original columns are dropped.
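
The re-coding can be illustrated on a toy data set. This is a minimal sketch with invented values, not the survey data: survey "a" uses the old combined code and survey "b" uses the newer separate codes. It also shows why filling absent codes with 0 matters, since a sum that includes `NA` would itself be `NA`.

```r
library(dplyr)
library(tidyr)

# Toy long-format data (values are invented for illustration)
toy_long <- tibble(
  survey_ID = c("a", "a", "b", "b", "b"),
  intercept_ground_code = c("M/L", "L", "M", "LIC", "L"),
  intercepts_pct = c(10, 90, 4, 6, 90)
)

toy_wide <- toy_long %>%
  pivot_wider(
    id_cols = survey_ID,
    names_from = intercept_ground_code,
    values_from = intercepts_pct,
    values_fill = 0  # absent codes become 0 rather than NA, so the sum is defined
  ) %>%
  mutate(M_L = LIC + M + `M/L`) %>%  # backticks are required because of the slash
  select(-LIC, -M, -`M/L`)
```

Both toy surveys end up with a single `M_L` value, regardless of which coding scheme they used.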


In [9]:
df_ground_wide <-
  df_ground %>% 
  select(-ground_group) %>% 
  pivot_wider(
    id_cols = c(survey_ID, year, survey_sequence, grid_point),
    names_from = intercept_ground_code,
    values_from = intercepts_pct,
    values_fill = 0
  ) %>%
  mutate(M_L = LIC + M + `M/L`) %>% 
  select(survey_ID, year, survey_sequence, grid_point, BG, BV, G, L, M_L, everything(), -LIC, -M, -`M/L`) %>% 
  glimpse()

Rows: 1,244
Columns: 21
$ survey_ID       <chr> "436", "437", "561", "560", "559", "558", "695", "438…
$ year            <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…
$ survey_sequence <chr> "2011-12", "2011-12", "2011-12", "2011-12", "2011-12"…
$ grid_point      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17…
$ BG              <dbl> 2.0, 2.5, 0.0, 1.0, 0.0, 0.0, 1.0, 11.0, 0.0, 1.5, 0.…
$ BV              <dbl> 3.0, 18.5, 5.5, 12.0, 7.5, 2.0, 12.0, 12.5, 10.0, 8.5…
$ G               <dbl> 3.5, 5.0, 2.5, 10.5, 0.0, 0.0, 0.5, 7.5, 1.0, 2.5, 0.…
$ L               <dbl> 32.0, 43.0, 84.0, 63.0, 82.5, 87.0, 76.0, 47.5, 81.5,…
$ M_L             <dbl> 54.0, 23.5, 3.5, 9.5, 0.5, 7.5, 3.0, 5.0, 4.5, 1.0, 1…
$ OTHER           <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0

In [10]:
# Show any rows that contain missing values
df_ground_wide[!complete.cases(df_ground_wide), ]

“number of rows of result is not a multiple of vector length (arg 2)”


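| survey_ID | year | survey_sequence | grid_point | BG | BV | G | L | M_L | OTHER | ⋯ | S | SC | SD | SE | SH | SU | WDL | WDS | WDSTUMP | WDT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<int>` | `<chr>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | ⋯ | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |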
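
The same kind of check can be written as explicit counts, which are easier to assert on than a printed (empty) data frame. The sketch below uses a toy table with one deliberately missing value. The uniqueness check assumes that each survey ID should appear in only one row, which is an assumption about the data rather than something this notebook verifies.

```r
# Toy wide table standing in for df_ground_wide (values are invented)
toy_wide <- data.frame(
  survey_ID = c("1", "2", "3"),
  BG = c(2.0, 0.0, 11.0),
  L  = c(32.0, 84.0, NA)
)

# Count rows with at least one missing value
n_incomplete <- sum(!complete.cases(toy_wide))

# TRUE when no survey ID is repeated (assumed to be one row per survey)
ids_unique <- anyDuplicated(toy_wide$survey_ID) == 0
```

For a clean matrix, `n_incomplete` would be 0 and `ids_unique` would be `TRUE`.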


# Output

In [11]:
# updated output 2021-01-26
write_csv(df_ground_wide, file = "gridVeg_groundCover_abundance_matrix_WRANGLE.csv")

“The `path` argument of `write_csv()` is deprecated as of readr 1.4.0.
Please use the `file` argument instead.
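
When the CSV is read back for analysis, `read_csv()` guesses column types, so an ID such as "436" may be read as a number rather than a character string. A minimal sketch of a round trip that pins the type is shown below. It uses a toy table and a temporary file, assumes readr 1.4.0 or later (for the `file` argument), and assumes all non-ID columns are numeric.

```r
library(readr)

# Toy table standing in for df_ground_wide (values are invented)
toy_wide <- data.frame(
  survey_ID = c("436", "437"),
  BG = c(2.0, 2.5),
  stringsAsFactors = FALSE
)

# Write to a temporary file, then read back with survey_ID forced to character
tmp <- tempfile(fileext = ".csv")
write_csv(toy_wide, file = tmp)
toy_back <- read_csv(
  tmp,
  col_types = cols(survey_ID = col_character(), .default = col_double())
)
```

The real matrix also carries `year`, `survey_sequence`, and `grid_point`, which would need their own entries in `cols()`.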
