#29: Centralize gathering, cleaning and exporting of raw data in data-raw/data.Rmd.
maurolepore committed Mar 1, 2018
1 parent 493b6fe commit ec61533
Showing 9 changed files with 412 additions and 108 deletions.
8 changes: 0 additions & 8 deletions data-raw/allotemp_main.R

This file was deleted.

148 changes: 148 additions & 0 deletions data-raw/data.Rmd
@@ -0,0 +1,148 @@
---
title: "Gather raw data, clean it, and export it to data/"
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Overview

This document takes the raw data in data-raw/, cleans it, and saves an .rda version of each dataset in data/. To update all exported data, do either of two things: (1) click Knit (Ctrl+Shift+K), which also updates data.md; or (2) choose Run > Run All (Ctrl+Alt+R). To update a single dataset, run only the code chunks that produce it. (Learn more about R Markdown documents [here](https://rmarkdown.rstudio.com/lesson-1.html).)
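`usethis::use_data()` writes each object to `data/<name>.rda`. The sketch below mimics that round trip with base R's `save()` and `load()` in a temporary directory, so it can run outside a package. The toy data frame is hypothetical, and `use_data()` itself does more work (for example, checking that it runs inside a package) that is not reproduced here.

```r
# Hypothetical toy dataset standing in for one of the exported tables
toy <- data.frame(
  species = c("sp_a", "sp_b"),
  wsg = c(0.5, 0.6),
  stringsAsFactors = FALSE
)

# Write it to <tempdir>/data/toy.rda, mirroring the data/<name>.rda layout
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
rda_path <- file.path(data_dir, "toy.rda")
save(toy, file = rda_path)

# Load it into a fresh environment; load() returns the names it restored
env <- new.env()
loaded_names <- load(rda_path, envir = env)
```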

# Gather raw master data and clean it

Read the raw master data, then clean it.

```{r master}
master <- readr::read_csv(here::here("data-raw/allotemp_main.csv"))
```

```{r master-clean}
# FIXME: Remove ambiguity of this code chunk (#29; https://goo.gl/rmuzmH)
# Eliminate rows where family or species is unknown (inspect the values with
# unique(master$species)). Currently only `family` is filtered.
master <- subset(master, family != "Unkown")
# TODO: Change the name of the "equation" column to "equation_form".
```
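The comments in the chunk above describe two steps that are only partly implemented: dropping rows whose family or species is unknown, and renaming `equation` to `equation_form`. Below is a minimal base-R sketch on a hypothetical toy table. It assumes unknown values are spelled either `"Unkown"` (the spelling the chunk filters on) or `"Unknown"`; the actual values in the raw data have not been checked. Note that the `equations` chunk still selects a column named `equation`, so renaming it would require updating that chunk too.

```r
# Hypothetical toy version of the master table
master_toy <- data.frame(
  family = c("Fagaceae", "Unkown", "Pinaceae"),
  species = c("sp_a", "sp_b", "Unknown"),
  equation = c("eq_a", "eq_b", "eq_c"),
  stringsAsFactors = FALSE
)

# Assumed spellings of "unknown" in the raw data
unknown_values <- c("Unkown", "Unknown")

# Keep rows where neither family nor species is unknown.
# Unlike subset(), which drops rows whose condition is NA,
# %in% returns FALSE for NA, so rows with NA family/species are kept.
is_known <- !(master_toy$family %in% unknown_values) &
  !(master_toy$species %in% unknown_values)
master_toy <- master_toy[is_known, , drop = FALSE]

# Rename the "equation" column to "equation_form"
names(master_toy)[names(master_toy) == "equation"] <- "equation_form"
```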

# Export subsets of the clean master data

This section exports subsets of the clean master data to data/. Each dataset is documented in R/data.R to produce a help file that can be accessed from the R console with `?name-of-the-dataset` and from the Functions Index tab of the __allodb__ website.
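Each export below selects columns by name with `master[cols]`, so every name in a `*_cols` vector must match a column of `master` exactly (note the spaces in `"log (biomass)"` and `"bias correction _CF"`). A hypothetical helper, `select_cols()`, that checks the names first and reports every missing column at once might look like this:

```r
# Hypothetical helper: select columns by name, failing with a clear
# message that lists every requested column not present in `data`
select_cols <- function(data, cols) {
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  data[cols]
}

# Toy example; check.names = FALSE keeps the space in "log (biomass)"
toy_eq <- data.frame(
  equation_id = 1:2,
  "log (biomass)" = c("y", "n"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
subset_ok <- select_cols(toy_eq, c("equation_id", "log (biomass)"))
```

With this helper, a typo such as `"log(biomass)"` stops with `Missing columns: log(biomass)` instead of an error about the subset as a whole.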

## Dataset `equations`

Allometric equations (excludes sites; whether it should include species is still undecided).

```{r equations}
# TODO: Add a column after 'equation_form' that combines coefficients and
# formula, so we get "unique" equations; then give each one a unique ID.
equations_cols <- c(
  "equation_id",
  "model_parameters",
  "biomass_units_original",
  "regression_model",
  "other_equations_tested",
  "log (biomass)",
  "d",
  "dbh_min_cm",
  "dbh_max_cm",
  "n_trees",
  "dbh_units_original",
  "equation",
  "equation_grouping",
  "bias correction _CF"
)
equations <- master[equations_cols]
usethis::use_data(equations, overwrite = TRUE)
equations_metadata <- readr::read_csv(
  here::here("data-raw/data_equations_metadata.csv")
)
usethis::use_data(equations_metadata, overwrite = TRUE)
```
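The TODO at the top of the chunk above asks for a column that combines coefficients and formula so that identical equations share one ID. A minimal sketch, assuming hypothetical `equation_form` and `coefficients` columns (neither is in `equations_cols` yet): paste the two into a key, then number the distinct keys in order of first appearance with `match()`.

```r
# Hypothetical toy table: the first two rows describe the same equation
eq_toy <- data.frame(
  equation_form = c("a * dbh^b", "a * dbh^b", "a * dbh^b"),
  coefficients = c("a=0.1;b=2.4", "a=0.1;b=2.4", "a=0.2;b=2.3"),
  stringsAsFactors = FALSE
)

# Combine formula and coefficients into a single key per row
eq_toy$equation_key <- paste(eq_toy$equation_form, eq_toy$coefficients, sep = " | ")

# Number distinct keys 1, 2, ... in order of first appearance
eq_toy$equation_id <- match(eq_toy$equation_key, unique(eq_toy$equation_key))
```

Using `match()` against `unique()` keeps IDs stable for a given row order; if IDs must survive reordering of the raw table, a persistent lookup table of keys would be needed instead.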

## Dataset `missing_values`

```{r missing-values}
missing_values_metadata <- readr::read_csv(
  here::here("data-raw/data_missing_values_metadata.csv")
)
usethis::use_data(missing_values_metadata, overwrite = TRUE)
```

## Dataset `references`

References (linked to the wood density table by an ID; my raw reference table also includes sites, for my own sanity!).

```{r references}
# TODO: Add table.
# usethis::use_data(references, overwrite = TRUE)
references_metadata <- readr::read_csv(
  here::here("data-raw/data_references_metadata.csv")
)
usethis::use_data(references_metadata, overwrite = TRUE)
```

## Dataset `sites_info`

```{r sites-info}
# Basic info on ForestGEO sites
# TODO: Add table. See https://goo.gl/ic7uya.
# usethis::use_data(sites_info, overwrite = TRUE)
```

## Dataset `sitespecies`

Site-species (includes non-tropical sites; linked to the equations table by equation ID).

```{r sitespecies}
sitespecies_cols <- c(
  "site",
  "family",
  "species",
  "species_code",
  "life_form",
  "model_parameters",
  "allometry_development_method",
  "equation_id",
  "regression_model",
  "wsg",
  "wsg_id"
)
sitespecies <- master[sitespecies_cols]
usethis::use_data(sitespecies, overwrite = TRUE)
sitespecies_metadata <- readr::read_csv(
  here::here("data-raw/data_sitespecies_metadata.csv")
)
usethis::use_data(sitespecies_metadata, overwrite = TRUE)
```
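Because `sitespecies` carries `equation_id`, the equation for each site-species row can be recovered with a join against `equations`. A minimal sketch using base R's `merge()` on hypothetical toy tables (the real tables have many more columns):

```r
# Hypothetical toy tables sharing the equation_id key
sitespecies_toy <- data.frame(
  site = c("site_a", "site_a", "site_b"),
  species = c("sp_1", "sp_2", "sp_1"),
  equation_id = c(10, 11, 10),
  stringsAsFactors = FALSE
)
equations_toy <- data.frame(
  equation_id = c(10, 11),
  equation = c("eq_x", "eq_y"),
  stringsAsFactors = FALSE
)

# Inner join: attach to each site-species row the equation with the same ID
joined <- merge(sitespecies_toy, equations_toy, by = "equation_id")
```

`merge()` performs an inner join by default, so site-species rows whose `equation_id` has no match in `equations` are dropped; `all.x = TRUE` would keep them with `NA` in the equation columns.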

## Dataset `wsg`

Wood density (with this script and master table we only take wsg for temperate sites; these values will later be merged with the tropical ones).

```{r wsg}
wsg_cols <- c(
  "wsg_id",
  "family",
  "species",
  "wsg",
  "wsg_specificity",
  "variable",
  "site"
)
wsg <- master[wsg_cols]
usethis::use_data(wsg, overwrite = TRUE)
wsg_metadata <- readr::read_csv(
  here::here("data-raw/data_wsg_metadata.csv")
)
usethis::use_data(wsg_metadata, overwrite = TRUE)
```
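The note before the `wsg` chunk says temperate wood-density values will later be merged with tropical ones. Assuming both tables share the same columns and use non-overlapping `wsg_id` values (an assumption; the tropical table is not part of this commit), stacking them with `rbind()` after a couple of checks is one option:

```r
# Hypothetical temperate and tropical wood-density tables with matching columns
wsg_temperate <- data.frame(
  wsg_id = 1:2,
  species = c("sp_1", "sp_2"),
  wsg = c(0.5, 0.6),
  stringsAsFactors = FALSE
)
wsg_tropical <- data.frame(
  wsg_id = 3L,
  species = "sp_3",
  wsg = 0.7,
  stringsAsFactors = FALSE
)

# Stop early if the tables differ in columns or reuse IDs
stopifnot(setequal(names(wsg_temperate), names(wsg_tropical)))
stopifnot(!any(wsg_tropical$wsg_id %in% wsg_temperate$wsg_id))

# rbind() on data frames matches columns by name
wsg_all <- rbind(wsg_temperate, wsg_tropical)
```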
