#29: Centralize gathering, cleaning and exporting of raw data in data-raw/data.Rmd.

Commit ec61533 (parent 493b6fe): 9 changed files with 412 additions and 108 deletions.

---
title: "Gather raw data, clean it, and export it to data/"
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Overview

This document exports data. It takes raw data from data-raw/, cleans it, and saves an .rda version of each dataset in data/. To update all exported data, do either of two things: (1) click Knit (Ctrl+Shift+K), which also updates data.md, or (2) choose Run > Run All (Ctrl+Alt+R). To update a single dataset, run only its code chunk. (Learn more about R Markdown documents [here](https://rmarkdown.rstudio.com/lesson-1.html).)
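
Each export below goes through `usethis::use_data()`, which saves an object to `data/<name>.rda` inside the package project. The following sketch mimics that round trip with base R's `save()` and `load()` in a temporary directory, so it runs outside a package project. The toy data frame `toy` is hypothetical, and `compress = "bzip2"` is assumed to match the default of `use_data()`. It is shown as a plain `r` block, so knitting does not evaluate it.

```r
# Hypothetical toy data standing in for one exported subset.
toy <- data.frame(
  species = c("sp_a", "sp_b"),
  wsg = c(0.5, 0.6),
  stringsAsFactors = FALSE
)

# usethis::use_data(toy) would write data/toy.rda; here we write to a temp dir.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
rda_path <- file.path(data_dir, "toy.rda")
save(toy, file = rda_path, compress = "bzip2")

# load() restores the object under its original name and returns that name.
env <- new.env()
loaded <- load(rda_path, envir = env)
print(loaded)
```

The object name stored in the .rda file is what users later get from `data()` or lazy loading, which is why each dataset is exported under its final name.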

# Gather raw master data and clean it

Read the raw master data, then clean it.

```{r master}
master <- readr::read_csv(here::here("data-raw/allotemp_main.csv"))
```

```{r master-clean}
# FIXME: Remove ambiguity of this code chunk (#29; https://goo.gl/rmuzmH)
# Eliminate rows where family or species is unknown.
# To inspect the values, use unique(master$family) and unique(master$species).
# Note: this filter matches only the literal value "Unkown" in `family`;
# other spellings (e.g. "Unknown") are kept, and rows with a missing (NA)
# family are dropped by subset().
master <- subset(master, family != "Unkown")
# TODO: Change the name of the "equation" column to "equation_form".
```
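
The comment in `master-clean` asks to drop rows whose family *or* species is unknown, while the code filters only on `family`. Below is a minimal sketch of the broader filter on a hypothetical toy table, assuming unknown values are spelled either `"Unkown"` or `"Unknown"`; the real spellings in `allotemp_main.csv` should be checked with `unique()` first. It is for illustration only and is not evaluated when knitting.

```r
# Hypothetical toy version of the master table.
master_toy <- data.frame(
  family  = c("Fagaceae", "Unkown", "Pinaceae", NA),
  species = c("sp_a", "sp_b", "Unknown", "sp_d"),
  stringsAsFactors = FALSE
)

# Spellings treated as "unknown" (assumption: verify with unique() on real data).
unknown_labels <- c("Unkown", "Unknown")

# %in% returns FALSE for NA, so rows with an NA family are kept here.
is_unknown <- master_toy$family %in% unknown_labels |
  master_toy$species %in% unknown_labels
master_clean <- master_toy[!is_unknown, , drop = FALSE]

# For comparison, the chunk's current filter keeps the row whose species is
# "Unknown" and drops the row whose family is NA.
current <- subset(master_toy, family != "Unkown")
```

Whether NA families should also be dropped is a separate decision; if so, add `& !is.na(master_toy$family)` to the kept rows.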

# Export subsets of the clean master data

This section exports subsets of the clean master data to data/. Each dataset is documented in R/data.R, which produces a help file that can be accessed from the R console with `?name-of-the-dataset` and from the Functions Index tab of the __allodb__ website.
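
roxygen2 documents a dataset with a comment block placed above a string that holds the dataset's name. Below is a minimal sketch of what an entry in `R/data.R` could look like for `equations`; the title and description are placeholders, not the package's actual documentation.

```r
#' Allometric equations
#'
#' Placeholder description: allometric equations exported from
#' data-raw/data.Rmd.
#'
#' @format A data frame with one column per entry in `equations_cols`.
"equations"
```

Running `devtools::document()` (or `roxygen2::roxygenise()`) then generates the corresponding help file in `man/`.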

## Dataset `equations`

Allometric equations. This table does not include sites; whether it should include species is still undecided.

```{r equations}
# TODO: Add a column after 'equation_form' that combines
# coefficients + formula so we get "unique" equations, then give each a unique id.
equations_cols <- c(
  "equation_id",
  "model_parameters",
  "biomass_units_original",
  "regression_model",
  "other_equations_tested",
  "log (biomass)",
  "d",
  "dbh_min_cm",
  "dbh_max_cm",
  "n_trees",
  "dbh_units_original",
  "equation",
  "equation_grouping",
  "bias correction _CF"
)
# Select the equation columns from the clean master table and export them.
equations <- master[equations_cols]
usethis::use_data(equations, overwrite = TRUE)
# Export the metadata that accompanies `equations`.
equations_metadata <- readr::read_csv(
  here::here("data-raw/data_equations_metadata.csv")
)
usethis::use_data(equations_metadata, overwrite = TRUE)
```
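
The TODO in the `equations` chunk suggests combining each equation's formula with its coefficients to identify unique equations and then numbering them. Below is a minimal sketch of that idea on a toy table; the columns `equation_form` and `coefficients`, their values, and the new `equation_uid` column are all hypothetical. It is for illustration only and is not evaluated when knitting.

```r
# Hypothetical toy table; formulas and coefficients are made up.
eq_toy <- data.frame(
  equation_form = c("a*dbh^b", "a*dbh^b", "exp(a + b*log(dbh))", "a*dbh^b"),
  coefficients  = c("a=0.1;b=2.4", "a=0.1;b=2.4", "a=-2.1;b=2.5", "a=0.2;b=2.3"),
  stringsAsFactors = FALSE
)

# One key per row: formula plus coefficients.
key <- paste(eq_toy$equation_form, eq_toy$coefficients, sep = " | ")

# Rows with identical keys share an id; ids follow order of first appearance.
eq_toy$equation_uid <- match(key, unique(key))
```

Using `match()` against `unique()` keeps the ids stable as long as row order is stable; sorting the keys first would make them independent of row order instead.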

## Dataset `missing_values`

```{r missing-values}
missing_values_metadata <- readr::read_csv(
  here::here("data-raw/data_missing_values_metadata.csv")
)
usethis::use_data(missing_values_metadata, overwrite = TRUE)
```

## Dataset `references`

References. Each reference links to the wood density table through an id; the author's raw reference table also includes sites, as a sanity check.

```{r references}
# TODO: Add table.
# usethis::use_data(references, overwrite = TRUE)
references_metadata <- readr::read_csv(
  here::here("data-raw/data_references_metadata.csv")
)
usethis::use_data(references_metadata, overwrite = TRUE)
```
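
References are described as linking to the wood density table through an id. Below is a minimal sketch of such a link with base R's `merge()`, assuming a hypothetical shared key `ref_id` on two toy tables; the real key column name is not given here. It is for illustration only and is not evaluated when knitting.

```r
# Hypothetical toy tables sharing a `ref_id` key.
references_toy <- data.frame(
  ref_id = c(1, 2),
  citation = c("Author A (year)", "Author B (year)"),
  stringsAsFactors = FALSE
)
wsg_toy <- data.frame(
  species = c("sp_a", "sp_b", "sp_c"),
  wsg = c(0.5, 0.6, 0.7),
  ref_id = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Inner join: each wood density row gains the citation it points to.
wsg_with_refs <- merge(wsg_toy, references_toy, by = "ref_id")
```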

## Dataset `sites_info`

```{r sites-info}
# Basic information on ForestGEO sites.
# TODO: Add table. See https://goo.gl/ic7uya.
# usethis::use_data(sites_info, overwrite = TRUE)
```

## Dataset `sitespecies`

Site-species table. It includes non-tropical sites and links to the `equations` table through `equation_id`.

```{r sitespecies}
sitespecies_cols <- c(
  "site",
  "family",
  "species",
  "species_code",
  "life_form",
  "model_parameters",
  "allometry_development_method",
  "equation_id",
  "regression_model",
  "wsg",
  "wsg_id"
)
sitespecies <- master[sitespecies_cols]
usethis::use_data(sitespecies, overwrite = TRUE)
sitespecies_metadata <- readr::read_csv(
  here::here("data-raw/data_sitespecies_metadata.csv")
)
usethis::use_data(sitespecies_metadata, overwrite = TRUE)
```
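
Because `sitespecies` links to `equations` through `equation_id`, it can help to check for ids that have no matching equation before joining. Below is a minimal sketch on hypothetical toy tables (the ids, sites, and formulas are made up). It is for illustration only and is not evaluated when knitting.

```r
# Hypothetical toy tables; "e9" deliberately has no matching equation.
sitespecies_toy <- data.frame(
  site = c("site_1", "site_1", "site_2"),
  species = c("sp_a", "sp_b", "sp_a"),
  equation_id = c("e1", "e2", "e9"),
  stringsAsFactors = FALSE
)
equations_toy <- data.frame(
  equation_id = c("e1", "e2"),
  equation = c("a*dbh^b", "exp(a + b*log(dbh))"),
  stringsAsFactors = FALSE
)

# Ids used in sitespecies but missing from equations would be silently
# dropped by an inner join, so list them first.
dangling <- setdiff(sitespecies_toy$equation_id, equations_toy$equation_id)
if (length(dangling) > 0) {
  message("equation_id without a matching equation: ",
          paste(dangling, collapse = ", "))
}

# Left join (all.x = TRUE) keeps every site-species row; unmatched rows get NA.
joined <- merge(sitespecies_toy, equations_toy, by = "equation_id", all.x = TRUE)
```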

## Dataset `wsg`

Wood density (wood specific gravity, `wsg`). With this script and the master table we take `wsg` only for temperate sites; these values will later be merged with those for tropical sites.

```{r wsg}
wsg_cols <- c(
  "wsg_id",
  "family",
  "species",
  "wsg",
  "wsg_specificity",
  "variable",
  "site"
)
wsg <- master[wsg_cols]
usethis::use_data(wsg, overwrite = TRUE)
wsg_metadata <- readr::read_csv(
  here::here("data-raw/data_wsg_metadata.csv")
)
usethis::use_data(wsg_metadata, overwrite = TRUE)
```
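
The planned merge of temperate and tropical wood density values could look roughly like the sketch below, assuming a hypothetical `wsg_tropical` table with the same columns; the toy values and the `region` column are illustrative additions, not part of the exported `wsg` dataset. It is for illustration only and is not evaluated when knitting.

```r
# Hypothetical toy tables; the tropical one lists its columns in another order.
wsg_temperate <- data.frame(
  wsg_id = c("w1", "w2"),
  species = c("sp_a", "sp_b"),
  wsg = c(0.5, 0.6),
  stringsAsFactors = FALSE
)
wsg_tropical <- data.frame(
  species = "sp_c",
  wsg_id = "w3",
  wsg = 0.7,
  stringsAsFactors = FALSE
)

# rbind() on data frames matches columns by name and errors if the names
# differ, so check that both tables share the same columns first.
stopifnot(setequal(names(wsg_temperate), names(wsg_tropical)))

# Tag each row with its origin before stacking.
wsg_temperate$region <- "temperate"
wsg_tropical$region <- "tropical"
wsg_all <- rbind(wsg_temperate, wsg_tropical)

# wsg_id should stay unique after the merge.
stopifnot(anyDuplicated(wsg_all$wsg_id) == 0)
```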