
Table arrangement and conventions #85

Closed
teixeirak opened this issue Mar 25, 2019 · 2 comments

Comments

@teixeirak
Member

Non-urgent question for @gonzalezeb: can we move equation_taxa to the equations table? I recognize that it's handy to have in sitespecies, but it's also necessary to interpret the equations table, and it's a property of the equation.

@teixeirak changed the title from "move equation_taxa to equations table" to "Arrangement of tables" Mar 26, 2019
@teixeirak changed the title from "Arrangement of tables" to "Table arrangement and conventions" Mar 26, 2019
@teixeirak
Member Author

teixeirak commented Mar 26, 2019

More broadly than my comment above, I'm finding the tables difficult to work with from a "user" standpoint, and in some cases I think they may be missing important information. Here's my running list (including a revised version of the point above):

  • The equations table needs a field to indicate the taxa on which each equation was developed. This is (partially) given in the sitespecies table (equation_taxa), but on looking at the chojnacky_2014_ugbe equations, I noted that it includes only the taxon to which the species in question belongs, not full information about the taxa used to develop the equation. For example, it doesn't (1) differentiate between "Abies < 0.35 spg" (eq. 4872ed) and "Abies≥ 0.35 spg" (eq. 74dd65), or (2) indicate that taxa other than the target taxa were included in the allometry ("Fabaceae/ Juglandaceae, other"; eq. 7f7777). This will be something for Erika to decide on, but here's my recommendation: (1) move equation_taxa to the equations table, and modify it to indicate when multiple taxa were included (e.g., "Fabaceae/ Juglandaceae"); (2) add a field such as original_equation_id to the equations table so that equations can be easily matched to those in the original publication (for chojnacky_2014_ugbe, this would be, e.g., "Abies≥ 0.35 spg"); (3) add a field to the sitespecies table briefly describing the type of trees for which the equation was developed (e.g., "Abies≥ 0.35 spg", "Fabaceae/ Juglandaceae, other", "Pinus saplings"). The purpose of this field is to let users evaluate how closely a given species is matched to an equation without having to merge tables and examine several fields.

  • Related to the above (and partly solved if we follow my recommendations) is the issue of the sitespecies table indicating the appropriate DBH range. We had originally planned for this table to have its own minDBH and maxDBH fields, indicating the DBH range over which the equation would be applied. This range may differ from the min and max in the equations table when (1) there is more than one option for a given DBH or (2) we decide that the best option is to extrapolate beyond the min/max of the equation we have. We are now writing/planning the code to automate this (e.g., issue #17, "Deal with disconnects between equations that switch when trees cross a size threshhold"), so maybe those fields are no longer needed (although they could still be useful for communicating to the user which equation is being applied). If we implement recommendation 3 from the item above, that would at least give a sense of which equations are being applied (e.g., "Pinus saplings" and "Pinus >5cm").

  • The warning field in the equations table currently holds notes/warnings to ourselves. I thought the purpose of that field was to generate warnings to the user when potentially untrustworthy equations are used. Should we rename it notes? Or do we need a separate field for warnings about the equations themselves?

  • Related to the above, it seems that the sitespecies table would be an appropriate place for a warning field. Warnings would be generated when there's no truly appropriate equation for a certain species. An example warning (currently in notes) might be "using allometries for Pseudotsuga menziesii as generic small conifer proxy for Pinus longaeva at utah". One would not generate such a warning when using the same equation for small Pseudotsuga menziesii.

  • I noted that some ref_id values differ from the convention we were planning to follow (e.g., means_22_NA, whereas the convention is [last name of first author_publication year_first letter of first four words in title]). What are these intended to capture? My guess is a particular equation within a publication, in which case implementing suggestion 2 in the first item above (adding an original_equation_id field to the equations table) would let you specify the equation number within a publication while describing the source according to the convention we had planned.
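The DBH-range question in the second item can be made concrete with a small R sketch. This is only an illustration, not allodb code: the helper pick_equation(), the description column, and the equation ids and DBH ranges are all hypothetical. It assumes the bounds are stored as character strings (dbh_min_cm and dbh_max_cm show up as <chr> in the master() output further down) and treats each range as half-open, so a tree sitting exactly on a threshold matches only one equation.

```r
# Hypothetical candidate equations for one species. Column names mirror the
# equations table; ids, descriptions, and DBH ranges are made up.
candidates <- data.frame(
  equation_id = c("eq_sapling", "eq_large"),
  description = c("Pinus saplings", "Pinus >5cm"),
  dbh_min_cm  = c("0", "5"),   # character, like dbh_min_cm in master()
  dbh_max_cm  = c("5", "80"),
  stringsAsFactors = FALSE
)

# Return the ids of equations whose half-open range [dbh_min_cm, dbh_max_cm)
# contains `dbh`. An empty result means no equation covers this size, which is
# the case where we would have to decide whether to extrapolate.
pick_equation <- function(candidates, dbh) {
  lo <- as.numeric(candidates$dbh_min_cm)
  hi <- as.numeric(candidates$dbh_max_cm)
  candidates$equation_id[dbh >= lo & dbh < hi]
}

pick_equation(candidates, 3)    # "eq_sapling"
pick_equation(candidates, 12)   # "eq_large"
pick_equation(candidates, 100)  # character(0): outside every range
```

Keeping the description column next to the ids is what would let a user see "Pinus saplings" vs. "Pinus >5cm" without merging tables.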
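The planned ref_id convention is mechanical enough to express as a function. The R sketch below is a hypothetical helper (make_ref_id() is not part of allodb), and it assumes that author names and initials are lowercased, as in the existing id jenkins_2004_cdod, and that "words" are simply whitespace-separated tokens. Titles beginning with punctuation or very short words would need extra handling.

```r
# Hypothetical helper (not part of allodb): build a ref_id following the
# planned convention
#   [last name of first author]_[publication year]_[first letter of first four words in title]
make_ref_id <- function(first_author_last_name, year, title) {
  # Split the title on whitespace and keep at most its first four words
  words <- head(strsplit(trimws(title), "\\s+")[[1]], 4)
  # Take the first letter of each word and lowercase the result
  initials <- tolower(paste(substr(words, 1, 1), collapse = ""))
  paste(tolower(first_author_last_name), year, initials, sep = "_")
}

# Made-up title, for illustration only
make_ref_id("Smith", 2010, "Biomass equations for eastern hardwoods")
# "smith_2010_befe"
```

With per-equation labels moved into a separate field such as original_equation_id, ref_id can stay one value per publication.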

@maurolepore
Member

maurolepore commented Mar 26, 2019

Sorry for jumping into your conversation. I just thought it'd be useful to clarify that the individual tables are a low-level detail that users shouldn't need to know about. The individual tables are developer-oriented (i.e., they make our life easier by letting us maintain normalized data). Users can access all the information through a higher-level interface such as master() or whatever wrapper you think is useful. Just name the table you want users to have, and we can create it on the fly from the low-level tables we maintain and provide an evocative wrapper. This approach follows how databases maintain tables and views (more info at #78 (comment)).

library(allodb)
library(tidyverse)

glimpse(
  allodb::master()
)
#> Joining `equations` and `sitespecies` by 'equation_id'; then `sites_info` by 'site'.
#> Observations: 769
#> Variables: 43
#> $ ref_id                               <chr> "jenkins_2004_cdod", "jen...
#> $ equation_id                          <chr> "2060ea", "2060ea", "a4d8...
#> $ equation_allometry                   <chr> "10^(1.1891+1.419*(log10(...
#> $ equation_form                        <chr> "10^(a+b*(log10(dbh^c)))"...
#> $ dependent_variable_biomass_component <chr> "Total aboveground biomas...
#> $ independent_variable                 <chr> "DBH", "DBH", "DBH", "DBH...
#> $ allometry_specificity                <chr> "Species", "Species", "Sp...
#> $ geographic_area                      <chr> "Ohio, USA", "Ohio, USA",...
#> $ dbh_min_cm                           <chr> "0.21", "0.21", "0.19", "...
#> $ dbh_max_cm                           <chr> "5.73", "5.73", "3.86", "...
#> $ sample_size                          <chr> NA, NA, NA, NA, NA, NA, N...
#> $ dbh_units_original                   <chr> "cm", "cm", "cm", "cm", "...
#> $ biomass_units_original               <chr> "g", "g", "g", "g", "g", ...
#> $ allometry_development_method         <chr> "harvest", "harvest", "ha...
#> $ regression_model                     <chr> NA, NA, NA, NA, NA, NA, N...
#> $ other_equations_tested               <chr> NA, NA, NA, NA, NA, NA, N...
#> $ log_biomass                          <chr> NA, NA, NA, NA, NA, NA, N...
#> $ bias_corrected                       <chr> "1", "1", "1", "1", "1", ...
#> $ bias_correction_factor               <chr> "1.056", "1.056", "1.016"...
#> $ notes_fitting_model                  <chr> NA, NA, NA, NA, NA, NA, N...
#> $ original_data_availability           <chr> NA, NA, NA, NA, NA, NA, N...
#> $ warning                              <chr> NA, NA, NA, NA, NA, NA, N...
#> $ site                                 <chr> "lilly dicky", "tyson", "...
#> $ family                               <chr> "Sapindaceae", "Sapindace...
#> $ species                              <chr> "Acer rubrum", "Acer rubr...
#> $ species_code                         <chr> "316", "acerub", "318", "...
#> $ life_form                            <chr> "Tree", "Tree", "Tree", "...
#> $ equation_group                       <chr> "Expert", "Expert", "Expe...
#> $ equation_taxa                        <chr> "Acer rubrum", "Acer rubr...
#> $ notes_on_species                     <chr> NA, NA, NA, NA, NA, NA, N...
#> $ wsg_id                               <chr> NA, NA, NA, NA, NA, NA, N...
#> $ wsg_specificity                      <chr> NA, NA, NA, NA, NA, NA, N...
#> $ id                                   <chr> NA, NA, NA, NA, NA, "34",...
#> $ Site                                 <chr> NA, NA, NA, NA, NA, "SCBI...
#> $ lat                                  <chr> NA, NA, NA, NA, NA, "38.8...
#> $ long                                 <chr> NA, NA, NA, NA, NA, "-78....
#> $ UTM_Zone                             <chr> NA, NA, NA, NA, NA, "17",...
#> $ UTM_X                                <chr> NA, NA, NA, NA, NA, "7475...
#> $ UTM_Y                                <chr> NA, NA, NA, NA, NA, "4308...
#> $ intertropical                        <chr> NA, NA, NA, NA, NA, "Othe...
#> $ size.ha                              <chr> NA, NA, NA, NA, NA, NA, N...
#> $ E                                    <chr> NA, NA, NA, NA, NA, "1.57...
#> $ wsg.site.name                        <chr> NA, NA, NA, NA, NA, NA, "...

As for how you can quickly see the information you care about, use either the high-level master() function or join whichever tables you are interested in:

library(allodb)
library(tidyverse)

equations %>% left_join(sitespecies)
#> Joining, by = "equation_id"
#> # A tibble: 769 x 32
#>    ref_id equation_id equation_allome~ equation_form dependent_varia~
#>    <chr>  <chr>       <chr>            <chr>         <chr>           
#>  1 jenki~ 2060ea      10^(1.1891+1.41~ 10^(a+b*(log~ Total abovegrou~
#>  2 jenki~ 2060ea      10^(1.1891+1.41~ 10^(a+b*(log~ Total abovegrou~
#>  3 jenki~ a4d879      10^(1.2315+1.63~ 10^(a+b*(log~ Total abovegrou~
#>  4 jenki~ a4d879      10^(1.2315+1.63~ 10^(a+b*(log~ Total abovegrou~
#>  5 jenki~ c59e03      exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#>  6 jenki~ c59e03      exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#>  7 jenki~ c59e03      exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#>  8 jenki~ c59e03      exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#>  9 jenki~ c59e03      exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#> 10 jenki~ c59e03      exp(7.217+1.514~ exp(a+b*log(~ Stem biomass (w~
#> # ... with 759 more rows, and 27 more variables:
#> #   independent_variable <chr>, allometry_specificity <chr>,
#> #   geographic_area <chr>, dbh_min_cm <chr>, dbh_max_cm <chr>,
#> #   sample_size <chr>, dbh_units_original <chr>,
#> #   biomass_units_original <chr>, allometry_development_method <chr>,
#> #   regression_model <chr>, other_equations_tested <chr>,
#> #   log_biomass <chr>, bias_corrected <chr>, bias_correction_factor <chr>,
#> #   notes_fitting_model <chr>, original_data_availability <chr>,
#> #   warning <chr>, site <chr>, family <chr>, species <chr>,
#> #   species_code <chr>, life_form <chr>, equation_group <chr>,
#> #   equation_taxa <chr>, notes_on_species <chr>, wsg_id <chr>,
#> #   wsg_specificity <chr>
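The message printed by master() above describes a two-step left join: equations with sitespecies by equation_id, then the result with sites_info by site. A rough base-R sketch of that "view built on the fly" idea is below; it runs without allodb, the three toy tables and their values are hypothetical stand-ins, and master_like() is an illustrative name, not an allodb function.

```r
# Toy stand-ins for the three low-level tables (all values hypothetical).
toy_equations <- data.frame(
  equation_id   = c("e1", "e2"),
  equation_form = c("a*dbh^b", "exp(a+b*log(dbh))"),
  stringsAsFactors = FALSE
)
toy_sitespecies <- data.frame(
  equation_id = c("e1", "e1", "e2"),
  site        = c("site_a", "site_b", "site_a"),
  species     = c("Species one", "Species one", "Species two"),
  stringsAsFactors = FALSE
)
toy_sites_info <- data.frame(
  site = "site_a",
  lat  = 38.9,
  stringsAsFactors = FALSE
)

# Left-join equations to sitespecies by equation_id, then left-join the
# result to sites_info by site; rows without site info get NA.
master_like <- function(equations, sitespecies, sites_info) {
  step1 <- merge(equations, sitespecies, by = "equation_id", all.x = TRUE)
  merge(step1, sites_info, by = "site", all.x = TRUE)
}

joined <- master_like(toy_equations, toy_sitespecies, toy_sites_info)
nrow(joined)  # 3: one row per sitespecies row
```

Because the wrapper only composes joins, the normalized tables stay the single source of truth and any user-facing table can be regenerated from them.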

If you don't want to use R and instead prefer to explore the database online, we can certainly build a simple shiny app that joins the tables in the background while the user simply points and clicks. Something along the lines of https://shiny.rstudio.com/gallery/datatables-demo.html, except that the check boxes would select tables to join rather than columns to show.


3 participants