Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Units problem to tackle #42

Closed
gonzalezeb opened this issue Sep 19, 2018 · 47 comments
Closed

Units problem to tackle #42

gonzalezeb opened this issue Sep 19, 2018 · 47 comments

Comments

@gonzalezeb
Copy link
Contributor

An issue (more of a reminder to myself) to tackle is to convert biomass units from original equations to the final output unit we want allodb to give (kg, Mg). That conversion factor (convert from inches, mm to cm, etc) should be incorporated in the equation. Of special attention is the DBH used to built the original allometry. We have two options:

  1. leave it as it and the site will have to convert
  2. or better, alert the site that their input dbh should be in cm and we "rewrite" all equations to reflect that
@maurolepore
Copy link
Member

Great to know you are thinking about this issue. I note some things also as a reminder for further discussion.

I think it's crucial to ensure that the output of the equations is always in the same unit. That allows summarizing the resulting biomass across rows. For example, to sum the biomass of all species in a census the biomass of all species must be in the same unit.

Does any equation convert units? Assuming that no equation converts units, the output unit will only depend on the input unit. So the important question is this: Do different equations expect input in different units?

If all equations expect input in the same unit, then we can ask users to convert their data to whatever units we want. Users can convert their entire DBH column with, for example, with measurements::conv_unit().

The problem, I think, is if different equations expect different units. We can't ask users to convert different rows of the DBH column in different way. That's too hard to do and error prone. To deal with this problem I see two options:

  1. Use not the original equations but a version of them that expect input in the same unit.
  2. Use the original equations, then automatically convert the known temporary-output unit to a consistent final-output unit. What users get is only the final, consistent unit.

@gonzalezeb
Copy link
Contributor Author

I believe option one is the best solution: modify the original equation so the output is a standard metric (cm, kg). We already have dbh and biomass units from original pub listed in allodb_master (although not all of them need conversion) so it should be simple to do it programmatically (I will try first!).
We then can request users that their input DBH should be in cm.

Shortly I will add few more temperate sites to allodb_master so we can start working on that.

@teixeirak
Copy link
Member

Actually, I think its better tow go with option two, which I see as simpler and less error-prone. I'd like to see a workflow like this:

  1. user inputs diameter data in cm (or maybe mm, in which case it gets converted to cm).
  2. based on the input (i.e., diameter) units of the original equation, DBH is multiplied by the appropriate conversion factor (e.g., all equations using inches get DBH*0.39)
  3. biomass is calculated according to original equation
  4. based on output units of original equation, biomass is multiplied by the appropriate conversion factor

@teixeirak
Copy link
Member

The motivations for this option are:

  1. avoid potential errors in converting the equations
  2. avoid situations where original equation may have reduced accuracy under converted units
  3. make it easy to go back and compare equations against original publications

@maurolepore
Copy link
Member

maurolepore commented Sep 20, 2018

Thanks for the feedback. In the light of @teixeirak suggestion (use solution 2 of my previous comment) you may disregard this comment. But I write it anyway because I had already devloped the example and it illustrates a useful function: dplyr::case_when().

RE: @gonzalezeb's #42 (comment), the implementation would look something like this:

library(glue)
library(dplyr)

# Dummy database showing that different equations expect different units.
db <- tribble(
  ~equations, ~units,
   "2 * dbh",   "mm",
   "5 * dbh",    "m"
)

db
#> # A tibble: 2 x 2
#>   equations units
#>   <chr>     <chr>
#> 1 2 * dbh   mm   
#> 2 5 * dbh   m

# Implementing solution 1 of my previous comment: The original equations remain the same
# but we create a copy that converts the output of all equations to the same unit -- based
# on the expected input-unit.
db_edited <- db %>% 
  mutate(equations_edited_cm = case_when(
      units == "mm" ~ glue("{equations} / 10"),
      units == "m"  ~ glue("{equations} * 10")
    )
  )

db_edited
#> # A tibble: 2 x 3
#>   equations units equations_edited_cm
#>   <chr>     <chr> <S3: glue>         
#> 1 2 * dbh   mm    2 * dbh / 10       
#> 2 5 * dbh   m     5 * dbh * 10

@maurolepore
Copy link
Member

And this illustrates the implementation of solution 2 (my previous last comment above), by which we first calculate biomass with the original function and then convert the result to a consistent unit.

library(dplyr)

db <- tribble(
  ~equations, ~units,
   "2 * dbh",   "mm",
   "5 * dbh",    "m"
)
db
#> # A tibble: 2 x 2
#>   equations units
#>   <chr>     <chr>
#> 1 2 * dbh   mm   
#> 2 5 * dbh   m

# Little helper function to hide unimportant details
evaluate_column <- function(column, dbh = dbh) {
  dummy_dbh <- 10
  eval(parse(text = db[[column]]), envir = list(dbh = dummy_dbh))
}

# Implementation
mutate(db,
  biomass_asis = evaluate_column("equations"),
  biomass_cm = case_when(
    units == "mm" ~ biomass_asis / 10,
    units ==  "m" ~ biomass_asis * 10
  )
)
#> # A tibble: 2 x 4
#>   equations units biomass_asis biomass_cm
#>   <chr>     <chr>        <dbl>      <dbl>
#> 1 2 * dbh   mm              50          5
#> 2 5 * dbh   m               50        500

Created on 2018-09-20 by the reprex package (v0.2.1)

@maurolepore
Copy link
Member

Summary

This summarizes all comments above and a meeting with @gonzalezeb.

Issues:

  1. The units may be different of the dbh values from different users.

  2. Right now, different equations in allodb expect dbh values in different units.

  3. Output should be always in the same unit. This allows summarizing the resulting biomass across rows. For example, to sum the biomass of all species in a census the biomass of all species must be in the same unit.

Possible solution

  1. Ask users to provide dbh values in a specific a unit, say, in centimeters: dbh_cm. Users can convert units with e.g. measurements::conv_unit(). This would keep the code focused, simple, and less prone to bugs.

  2. For each equation, the code can use a conversion factor to convert the input (dbh_cm) from cm to the unit that is specific for that equation: dbh_eq . This conversion factor should be stored in the equations table in a column, say, cf_dbh.

  3. To output biomass in consistent units, the code must finally convert the biomass from the equation-specific unit to "cm" (for consistency with the input we expect). This conversion factor should also be stored in the equations table in the column, say, cf_bmss.

Example

(The actual numbers make no sense)

USER DATA      EQUATIONS TABLE
---------      -------------------------------------
 dbh_cm        cf_dbh          equation      cf_bmss
  <dbl>         <dbl>             <chr>        <dbl>
---------      -------------------------------------
    55           0.39   "888 + (dbh_eq)"        2.54
    66              1   "999 + (dbh_eq)"           1
...


# Each row:
# * INTERNAL COMPUTATION BEFORE APLYING THE EQUAITON
dbh_cm * cf_dbh = dbh_eq

# * INTERNAL COMPUTATION AFTER APLYING THE EQUAITON
biomass_eq * cf_bmss = biomass_cm

@teixeirak
Copy link
Member

This sounds good to me.

@maurolepore
Copy link
Member

I just came across the quantities package, which may be handy for dealing with units:
https://github.com/r-quantities/quantities/blob/master/README.md

@maurolepore
Copy link
Member

@gonzalezeb , please see if this is getting close to what you think the code should do.

# EXPLORATION
# Here I just expresses my thoughts in code

library(tidyverse)

# * Good: The columns giving units have no misisng values
# * Unexpected: Some units are not of distance (e.g. cm) but of area (e.g. cm^2)
allodb::equations %>%
  select(matches("unit")) %>%
  unique()
#> # A tibble: 11 x 2
#>    dbh_units_original biomass_units_original
#>    <chr>              <chr>                 
#>  1 cm                 g                     
#>  2 in                 g                     
#>  3 cm                 kg                    
#>  4 in                 lb                    
#>  5 mm                 kg                    
#>  6 in^2               lb                    
#>  7 in                 kg                    
#>  8 cm^2               g                     
#>  9 cm                 Mg                    
#> 10 cm                 gm                    
#> 11 m                  t

# * Conversion factors not in the table. I'll do the conversion myself
allodb::equations %>%
  select(matches("unit|convert|correct|factor")) %>%
  head(3)
#> # A tibble: 3 x 4
#>   dbh_units_original biomass_units_orig~ bias_corrected bias_correction_fa~
#>   <chr>              <chr>               <chr>          <chr>              
#> 1 cm                 g                   1              1.056              
#> 2 cm                 g                   1              1.016              
#> 3 cm                 g                   1              included in model

# QUESTIONS
# 1. What should we do when `dbh_original_units` are in units of area, e.g.
# [cm^2]? For example, if the user inputs `DBH` in [cm], and the value of
# `dbh_units_original` is `cm^2`, Should the code simply square the input?
# 
# 2. Wouldn't that square `DBH` twice if `equation_form` is `a * (DBH^2)^b`?
allodb::equations %>%
  dplyr::select(matches("dbh_unit"), matches("equation_form")) %>%
  dplyr::filter(grepl("2", dbh_units_original))
#> # A tibble: 4 x 2
#>   dbh_units_original equation_form
#>   <chr>              <chr>        
#> 1 in^2               a*(DBH^2)^b  
#> 2 in^2               a*(DBH^2)^b  
#> 3 in^2               a*(DBH^2)^b  
#> 4 cm^2               a+b*BA

# EXAMPLE
# 3. Is this what you would do?

# User inputs dbh in [cm] (i.e. we arbitrary request [cm])
(cns <- tibble(dbh_cm = c(100, 200, 300)))
#> # A tibble: 3 x 1
#>   dbh_cm
#>    <dbl>
#> 1    100
#> 2    200
#> 3    300

# Here `dbh_unit_original` indicates that the dbh input to the equation should
# be in [in^2] and the output will be in [lb] (same as [lbs], right?)
eqn <- allodb::equations %>%
  dplyr::select(matches("dbh_unit|biomass_unit|equation_allometry")) %>% 
  dplyr::filter(dbh_units_original == "in^2")
eqn
#> # A tibble: 3 x 3
#>   equation_allometry      dbh_units_original biomass_units_original
#>   <chr>                   <chr>              <chr>                 
#> 1 3.08355*(dbh^2)^1.14915 in^2               lb                    
#> 2 2.17565*(dbh^2)^1.2481  in^2               lb                    
#> 3 2.56795*(dbh^2)^1.18685 in^2               lb

# So we convert the input from [cm] to [in^2] and the output to from [lbs] to
# [g](for example)
cm_to_inch_squared <- function(x) {
  measurements::conv_unit(x, from = "cm", to = "inch")^2
}
lbs_to_g <- function(x) {
  measurements::conv_unit(x, from = "lbs", to = "g")
}

eqn %>% 
  mutate(
    dbh = cm_to_inch_squared(cns$dbh_cm),
    # We evaluate dbh in the original units
    biomass_original = eval(parse(text = equation_allometry), envir = eqn),
    # We convert the original output units to the units we choose (e.g. [g])
    biomass = lbs_to_g(biomass_original)
  )
#> # A tibble: 3 x 6
#>   equation_allome~ dbh_units_origi~ biomass_units_o~    dbh
#>   <chr>            <chr>            <chr>             <dbl>
#> 1 3.08355*(dbh^2)~ in^2             lb                1550.
#> 2 2.17565*(dbh^2)~ in^2             lb                6200.
#> 3 2.56795*(dbh^2)~ in^2             lb               13950.
#> # ... with 2 more variables: biomass_original <dbl>, biomass <dbl>

Created on 2019-02-25 by the reprex package (v0.2.1)

@maurolepore
Copy link
Member

allo_evaluate() now warns that we still don't handle units.

You can see this in action at https://forestgeo.github.io/fgeo.biomass/ (search for allo_evaluate()).

@teixeirak
Copy link
Member

The units issue is one that should be solvable in a a few minutes. Do you need any guidance on this?

@maurolepore
Copy link
Member

Do you need any guidance on this?

I can certainly move this forward but to be sure the result is correct I need @gonzalezeb or @teixeirak to answer these questions:

#42 (comment)

image

@teixeirak
Copy link
Member

If inputs are in units of area (cm2), we can either (1) calculate cm^2 before passing to the equation or (2) modify the equation so that the squaring takes place in the equation. The second option would be slightly tidier, but either works. I think the answer depends on how much work it would be to go back and change those equations. We (@gonzalezeb ) may need to check all of them anyway, as the example you gave where DHB is squared twice is suspicious.
Of course, when the input is basal area (BA), that's pi*(DBH/2)^2.

@maurolepore
Copy link
Member

where DHB is squared twice is suspicious

This is what I’m mainly asking. (I’m not concerned about the implementation. I’d go with the approach I showed in the example and we can refactor it anytime later.)

@maurolepore
Copy link
Member

when the input is basal area (BA), that's pi*(DBH/2)^2

Do you prefer me to replace this via code in fgeo.biomass, or for you to change it in allodb? That is, wherever it now says “BA” replace it with “pi*(DBH/2)^2”?

@teixeirak
Copy link
Member

I'd go ahead and do it in allodb (unless @gonzalezeb has any objection). Be sure to change the input units when you do that.

@maurolepore maurolepore moved this from To do to In progress in Minimum Viable Product ASAP Mar 6, 2019
@maurolepore
Copy link
Member

@gonzalezeb,

I assume that [t] means ton, right? Which of the following "_ton" alternatives do you think is the right one? I have no idea what "short_ton" or "long_ton" means and unfortunetaly
?measurements::conv_unit_options doesn't give details on this.

measurements::conv_unit_options$mass
#>  [1] "Da"         "fg"         "pg"         "ng"         "ug"        
#>  [6] "mg"         "g"          "kg"         "Mg"         "Gg"        
#> [11] "Tg"         "Pg"         "carat"      "metric_ton" "oz"        
#> [16] "lbs"        "short_ton"  "long_ton"   "stone"

Created on 2019-03-06 by the reprex package (v0.2.1)

@gonzalezeb
Copy link
Contributor Author

I though I changed all t to to metric _ton already.

@maurolepore
Copy link
Member

@gonzalezeb, what unit is [gm]?

sort(unique(allodb::equations$biomass_units_original))
#> [1] "g"  "gm" "kg" "lb" "Mg" "t"

Created on 2019-03-06 by the reprex package (v0.2.1)

And which of these do you think is the right replacement?

measurements::conv_unit_options$mass
#>  [1] "Da"         "fg"         "pg"         "ng"         "ug"        
#>  [6] "mg"         "g"          "kg"         "Mg"         "Gg"        
#> [11] "Tg"         "Pg"         "carat"      "metric_ton" "oz"        
#> [16] "lbs"        "short_ton"  "long_ton"   "stone"

Created on 2019-03-06 by the reprex package (v0.2.1)

@maurolepore
Copy link
Member

I though I changed all t to to metric _ton already

Likely I just need to update pull and reinstall allodb

@maurolepore
Copy link
Member

maurolepore commented Mar 6, 2019

FYI, I'm now working on replacing the units in allodb with the units that the measurements package knows of. This is to avoid manually inserting conversion factors and instead reuse available tools (i.e. measurements). I think this is the safest approach. You may or may not want to change that in allodb.

library(fgeo.biomass)

unique(
  allodb::equations$dbh_units_original
)
#> [1] "cm"   "in"   "mm"   "in^2" "cm^2" "m"

unique(
  fixme_units(allodb::equations$dbh_units_original)
)
#> [1] "cm"    "inch"  "mm"    "inch2" "cm2"   "m"

unique(
  fixme_units(allodb::equations$biomass_units_original)
)
#> [1] "g"                   "kg"                  "FIXME: Unknown unit"
#> [4] "Mg"

# All valid unites
valid_units()
#> $acceleration
#>  [1] "mm_per_sec2"   "cm_per_sec2"   "m_per_sec2"    "km_per_sec2"  
#>  [5] "grav"          "inch_per_sec2" "ft_per_sec2"   "mi_per_sec2"  
#>  [9] "kph_per_sec"   "mph_per_sec"  
#> 
#> $angle
#> [1] "degree" "radian" "grad"   "arcmin" "arcsec" "turn"  
#> 
#> $area
#>  [1] "nm2"      "um2"      "mm2"      "cm2"      "m2"       "hectare" 
#>  [7] "km2"      "inch2"    "ft2"      "yd2"      "acre"     "mi2"     
#> [13] "naut_mi2"
#> 
#> $coordinate
#> [1] "dec_deg"     "deg_dec_min" "deg_min_sec"
#> 
#> $count
#> [1] "fmol" "pmol" "nmol" "umol" "mmol" "mol" 
#> 
#> $duration
#>  [1] "nsec" "usec" "msec" "sec"  "min"  "hr"   "day"  "wk"   "mon"  "yr"  
#> [11] "dec"  "cen"  "mil"  "Ma"  
#> 
#> $energy
#> [1] "J"    "kJ"   "erg"  "cal"  "Cal"  "Wsec" "kWh"  "MWh"  "BTU" 
#> 
#> $flow
#>  [1] "ml_per_sec"  "ml_per_min"  "ml_per_hr"   "l_per_sec"   "l_per_min"  
#>  [6] "l_per_hr"    "m3_per_sec"  "m3_per_min"  "m3_per_hr"   "Sv"         
#> [11] "gal_per_sec" "gal_per_min" "gal_per_hr"  "ft3_per_sec" "ft3_per_min"
#> [16] "ft3_per_hr" 
#> 
#> $length
#>  [1] "angstrom" "nm"       "um"       "mm"       "cm"       "dm"      
#>  [7] "m"        "km"       "inch"     "ft"       "yd"       "fathom"  
#> [13] "mi"       "naut_mi"  "au"       "light_yr" "parsec"   "point"   
#> 
#> $mass
#>  [1] "Da"         "fg"         "pg"         "ng"         "ug"        
#>  [6] "mg"         "g"          "kg"         "Mg"         "Gg"        
#> [11] "Tg"         "Pg"         "carat"      "metric_ton" "oz"        
#> [16] "lbs"        "short_ton"  "long_ton"   "stone"     
#> 
#> $power
#>  [1] "uW"          "mW"          "W"           "kW"          "MW"         
#>  [6] "GW"          "erg_per_sec" "cal_per_sec" "cal_per_hr"  "Cal_per_sec"
#> [11] "Cal_per_hr"  "BTU_per_sec" "BTU_per_hr"  "hp"         
#> 
#> $pressure
#>  [1] "uatm"  "atm"   "Pa"    "hPa"   "kPa"   "torr"  "mmHg"  "inHg" 
#>  [9] "cmH2O" "inH2O" "mbar"  "bar"   "dbar"  "psi"  
#> 
#> $speed
#>  [1] "mm_per_sec"   "cm_per_sec"   "m_per_sec"    "km_per_sec"  
#>  [5] "inch_per_sec" "ft_per_sec"   "kph"          "km_per_hr"   
#>  [9] "mph"          "mi_per_hr"    "km_per_day"   "mi_per_day"  
#> [13] "knot"         "mach"         "light"       
#> 
#> $temperature
#> [1] "C" "F" "K" "R"
#> 
#> $volume
#>  [1] "ul"        "ml"        "dl"        "l"         "cm3"      
#>  [6] "dm3"       "m3"        "km3"       "us_tsp"    "us_tbsp"  
#> [11] "us_oz"     "us_cup"    "us_pint"   "us_quart"  "us_gal"   
#> [16] "inch3"     "ft3"       "mi3"       "imp_tsp"   "imp_tbsp" 
#> [21] "imp_oz"    "imp_cup"   "imp_pint"  "imp_quart" "imp_gal"

Created on 2019-03-06 by the reprex package (v0.2.1)

@gonzalezeb
Copy link
Contributor Author

Sure, I was going to do it manually in the csv file but you already got it.

@gonzalezeb
Copy link
Contributor Author

--We (@gonzalezeb ) may need to check all of them anyway, as the example you gave where DHB is squared twice is suspicious

Maybe I am misreading the papers where I am taking equations. For example, for Russell_2005 (equation Id=c94845) it defines:
The regression equation for Point Reyes National Seashore was biomass (grams) = 51.68 +0.02 basal area (centimeters2) with an adjusted R2 = 0.53 and P= 0.01

Similar case for the Jenkins (jenkins_2004_cdod) equations.

@teixeirak
Copy link
Member

Note: Mg = metric_ton. (I tend to use Mg because it avoids any potential confusion with U.S. tons.)

@maurolepore
Copy link
Member

@gonzalezeb ,

I'm reinstalling allodb to ensure I have the latest commits. I find 'gm' in equation_id == b30744. Do you see the same?

remotes::install_github("forestgeo/allodb")
#> Skipping install of 'allodb' from a github remote, the SHA1 (6929cc95) has not changed since last install.
#>   Use `force = TRUE` to force installation

library(tidyverse)

allodb::equations %>% 
  filter(biomass_units_original == "gm") %>% 
  select(equation_id, matches("unit"))
#> # A tibble: 1 x 3
#>   equation_id dbh_units_original biomass_units_original
#>   <chr>       <chr>              <chr>                 
#> 1 b30744      cm                 gm

Created on 2019-03-06 by the reprex package (v0.2.1)

@maurolepore
Copy link
Member

maurolepore commented Mar 6, 2019

@gonzalezeb,

We discussed that the code should expect users to provide dbh in [cm]. There are a few equations that seem to need squared inches. How should we handle this? That is, where should the code take the second dimension to convert from a unit of lengh to a unit of area?

library(tidyverse)

allodb::equations %>% 
  filter(dbh_units_original == "in^2") %>% 
  select(dbh_units_original, matches("equ"))
#> # A tibble: 3 x 5
#>   dbh_units_origi~ equation_allome~ equation_id equation_form
#>   <chr>            <chr>            <chr>       <chr>        
#> 1 in^2             3.08355*(dbh^2)~ f3e023      a*(DBH^2)^b  
#> 2 in^2             2.17565*(dbh^2)~ 8eca60      a*(DBH^2)^b  
#> 3 in^2             2.56795*(dbh^2)~ e8ebc2      a*(DBH^2)^b  
#> # ... with 1 more variable: other_equations_tested <chr>

For now I'm just dropping the rows where dbh can't be converted to the unit expected by the equation (notice the last warning "#> Warning: Dropping 828 rows where units can't be converted").

library(fgeo.biomass)

census <- fgeo.biomass::scbi_tree1
species <- fgeo.biomass::scbi_species
census_species <- add_species(census, species, site = "scbi")
#> `sp` now stores Latin species names

allo_find(census_species)
#> Warning: Dropping 58 equations that can't be evaluated.
#> Identify failing equations with `fixme_pull_failing_eqn(allodb::master())`
#> Joining, by = c("sp", "site")
#> Warning:   The input and output datasets have different number of rows:
#>   * Input: 40283.
#>   * Output: 30229.
#> Warning: Dropping 828 rows where units can't be converted
#> # A tibble: 29,401 x 11
#>    eqn_type rowid site  sp      dbh equation_id eqn   eqn_source
#>    <chr>    <int> <chr> <chr> <dbl> <chr>       <chr> <chr>     
#>  1 species      4 scbi  nyss~  53.1 8da09d      1.54~ default   
#>  2 species     21 scbi  liri~  91.4 34fe5a      1.02~ default   
#>  3 species     29 scbi  acer~ 326.  7c72ed      exp(~ default   
#>  4 species     38 scbi  frax~  42.8 0edaff      0.16~ default   
#>  5 species     72 scbi  acer~ 289.  7c72ed      exp(~ default   
#>  6 species     77 scbi  quer~ 251.  07dba7      1.56~ default   
#>  7 species     79 scbi  tili~ 187.  3f99ba      1.44~ default   
#>  8 species     79 scbi  tili~ 475   76d19b      0.00~ default   
#>  9 species     84 scbi  frax~ 170.  0edaff      0.16~ default   
#> 10 species     89 scbi  fagu~  10.7 74186d      2.03~ default   
#> # ... with 29,391 more rows, and 3 more variables:
#> #   anatomic_relevance <chr>, dbh_unit <chr>, bms_unit <chr>

Created on 2019-03-06 by the reprex package (v0.2.1)

@gonzalezeb
Copy link
Contributor Author

I'm reinstalling allodb to ensure I have the latest commits. I find 'gm' in equation_id == b30744. Do you see the same?

No, I see 'g'. Committed in be8160f, line 144 (jan 16, 2019)

@gonzalezeb
Copy link
Contributor Author

#42 (comment). This was my fault, I just checked the original reference and dbh unit should be inch.
Same with this #42 (comment). dbh unit shoudl be cm.
I fixed the equation table, committed here 33f8411

@maurolepore
Copy link
Member

RE: #42 (comment)

Great, thanks! No things are the way they ought to be, i.e. I no longer see 'gm' in equation_id == b30744.

I was out of sync because I hadn't updated data/. To update data/ I need to run data-raw/data.R after you update any .csv file. Also I need to update the strored references in tests/testthat/. This extra step is a little annoying but It's a good way to force myself to slow down a bit and see precisely what's changed. For now, I'll leave the workflow untouched. Maybe I'll rethink later.

remotes::install_github("forestgeo/allodb")
#> Skipping install of 'allodb' from a github remote, the SHA1 (3fe53456) has not changed since last install.
#>   Use `force = TRUE` to force installation

library(tidyverse)

allodb::equations %>% 
  filter(biomass_units_original == "gm") %>% 
  select(equation_id, matches("unit"))
#> # A tibble: 0 x 3
#> # ... with 3 variables: equation_id <chr>, dbh_units_original <chr>,
#> #   biomass_units_original <chr>

Created on 2019-03-07 by the reprex package (v0.2.1)

@maurolepore
Copy link
Member

maurolepore commented Mar 7, 2019

Awesome!

Now that I'm in sync with allodb I confirm that I no longer see [in^2] or [cm^].

library(tidyverse)

allodb::equations %>% 
  select(matches("unit")) %>% 
  unique()
#> # A tibble: 8 x 2
#>   dbh_units_original biomass_units_original
#>   <chr>              <chr>                 
#> 1 cm                 g                     
#> 2 inch               g                     
#> 3 cm                 kg                    
#> 4 inch               lbs                   
#> 5 mm                 kg                    
#> 6 inch               kg                    
#> 7 cm                 Mg                    
#> 8 cm                 metric_ton

Created on 2019-03-07 by the reprex package (v0.2.1)


Following Krista's #42 (comment), should [metric_ton] be replaced by [Mg]?

@maurolepore maurolepore moved this from In progress to Needs review in Minimum Viable Product ASAP Mar 7, 2019
@maurolepore maurolepore moved this from Needs review to In progress in Minimum Viable Product ASAP Mar 12, 2019
@maurolepore
Copy link
Member

This comment by Krista suggests that the output biomass should be in [kg]. @gonzalezeb and @teixeirak can you confirm?

Right now the output is in [g].

@gonzalezeb
Copy link
Contributor Author

yes, output should be in [kg]

@maurolepore
Copy link
Member

allo_evaluate() now output biomass in [kg].

library(tidyverse)
library(fgeo.biomass)

census <- fgeo.biomass::scbi_tree1 %>% dplyr::sample_n(30)
species <- fgeo.biomass::scbi_species
with_equations <- census %>% 
  add_species(species, "scbi") %>% 
  allo_find()
#> `sp` now stores Latin species names
#> Joining, by = c("sp", "site")
#> Warning: The input and output datasets have different number of rows:
#> * Input: 30.
#> * Output: 21.
#> Converting `dbh` based on `dbh_unit`.



# Notice this message
allo_evaluate(with_equations)
#> Assuming `dbh` units in [cm] (to convert units see `?measurements::conv_unit()`).
#> `biomass` values are given in [kg].
#> # A tibble: 21 x 12
#>    eqn_type rowid site  sp      dbh equation_id eqn   eqn_source
#>    <chr>    <int> <chr> <chr> <dbl> <chr>       <chr> <chr>     
#>  1 species      7 scbi  frax~  77.3 973c05      2.36~ default   
#>  2 species     26 scbi  liri~  60.6 34fe5a      1.02~ default   
#>  3 genus        9 scbi  cary~  50.1 9c4cc9      10^(~ default   
#>  4 genus       10 scbi  cary~  85.8 9c4cc9      10^(~ default   
#>  5 genus       27 scbi  ulmu~  38.0 455839      2.04~ default   
#>  6 mixed_h~     2 scbi  asim~  26.3 ae65ed      exp(~ default   
#>  7 mixed_h~     5 scbi  asim~  18.6 ae65ed      exp(~ default   
#>  8 mixed_h~    11 scbi  asim~  14.4 ae65ed      exp(~ default   
#>  9 mixed_h~    14 scbi  carp~  18.7 ae65ed      exp(~ default   
#> 10 mixed_h~    18 scbi  asim~  13.6 ae65ed      exp(~ default   
#> # ... with 11 more rows, and 4 more variables: anatomic_relevance <chr>,
#> #   dbh_unit <chr>, bms_unit <chr>, biomass <dbl>

Created on 2019-03-12 by the reprex package (v0.2.1)

@gonzalezeb
Copy link
Contributor Author

I think the units issue is solved now but I just noticed that in the example right above it says: Assuming dbh units in [cm] but scbi.tree1 dbh in mm.
Maybe we need to specify in fgeo.biomass right before Finding the best available allometric-equations that anyone using the package need to ensure that dbh units should be in cm (and give the function to do it)..
what do you think?

@maurolepore
Copy link
Member

maurolepore commented Mar 12, 2019

... Assuming dbh units in [cm] but scbi.tree1 dbh in mm.

Yeah, that's what I imagined as all other ForestGEO sites seem to also record dbh in [mm]. Considering that fgeo.biomass is focuses on ForestGEO sites, What is the reason you prefer to expect [cm]?

@teixeirak
Copy link
Member

I think that fgeo.biomass should ensure that all units are converted to cm before passing to allodb. That is, input to allodb function should be in cm. Allodb can then convert cm to whatever units are used in the equation.

@maurolepore
Copy link
Member

input ... should be in cm

Why expect input data in [cm] instead of [mm]?

I'm sure you have a good reason but I would love to know what it is. I ask because ForestGEO tables record dbh in [mm]. If we expect [mm] most users will be ready to go. If we expect [cm] most users will need to first convert from [cm] to [mm].

... can then convert cm to whatever units are used in the equation.

This is already implemented (although in fgeo.biomass -- not in allodb)

@maurolepore maurolepore moved this from In progress to Needs review in Minimum Viable Product ASAP Mar 13, 2019
@maurolepore maurolepore moved this from Needs review to Done in Minimum Viable Product ASAP Mar 13, 2019
@gonzalezeb
Copy link
Contributor Author

It is true, most forestgeo sites record their dbh in mm, but some in cm. I was only inclined to use cm because it is more universal.

@teixeirak
Copy link
Member

I agree. I suppose it doesn't matter. What's essential is that anyone using the package needs to input the units of their DBH measurements.

@maurolepore
Copy link
Member

Okay, allo_find() gains the argument dbh_unit. For now it defaults to [mm] to favor ForestGEO users. If you still prefer [cm] I can change the default.

We can also default to NULL and force all users to give a value. It's a bit annoying but certainly makes users think more carefully about what they are doing. Let me know if you prefer this option.

library(fgeo.biomass)

census <- dplyr::sample_n(fgeo.biomass::scbi_tree1, 30)
species <- fgeo.biomass::scbi_species

census_species <- census %>% 
  add_species(species, site = "scbi")
#> Adding `site`.
#> Overwriting `sp`; it now stores Latin species names.
#> Adding `rowid`.

with_equations_mm <- allo_find(census_species)
#> Assuming `dbh` data in [mm].
#> Joining, by = c("sp", "site")
#> Converting `dbh` based on `dbh_unit`.

with_equations_cm <- allo_find(census_species, dbh_unit = "cm")
#> Assuming `dbh` data in [cm].
#> Joining, by = c("sp", "site")
#> Converting `dbh` based on `dbh_unit`.

Created on 2019-03-13 by the reprex package (v0.2.1)

@teixeirak
Copy link
Member

I doubt this would be the best place in the code to do it, but it is essential that users enter dbh units at some point. Alternatively, it would be easy to detect automatically which is used (minimum DBH values = 10 (mm) or 1 (cm).

@maurolepore
Copy link
Member

I like the idea of guessing the units (with a message).

forestgeo/fgeo.biomass#20

@teixeirak
Copy link
Member

teixeirak commented Mar 14, 2019 via email

@maurolepore maurolepore moved this from Done to Needs review in Minimum Viable Product ASAP Mar 14, 2019
@maurolepore maurolepore moved this from Needs review to Done in Minimum Viable Product ASAP Mar 14, 2019
@maurolepore
Copy link
Member

See forestgeo/fgeo.biomass#20

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

No branches or pull requests

3 participants