Add generic equations and mechanism to select them #72
Comments
Good! Mostly as a note to myself, some useful code was in |
Agreed. I think we'll want that functionality, but user should be able to enter the command as to how to prioritize equations just once. |
It seems that generic equations are already associated with sites and species. That seems like a contradiction (how can an equation be generic and at the same time site- and species-specific?), but it is what I need to match each species and site with an equation, so I won't complain; I just note it as something I need to understand better.
library(tidyverse)
#> Warning: package 'purrr' was built under R version 3.5.3
allodb::master_tidy() %>%
select(
site,
equation_group,
species,
equation_id,
dependent_variable_biomass_component
) %>%
filter(equation_group == "Generic") %>%
unique()
#> Joining `equations` and `sitespecies` by 'equation_id'; then `sites_info` by 'site'.
#> # A tibble: 232 x 5
#> site equation_group species equation_id dependent_variable_bio~
#> <chr> <chr> <chr> <chr> <chr>
#> 1 lilly d~ Generic Asimina tri~ ae65ed Total aboveground biom~
#> 2 lilly d~ Generic Carpinus ca~ ae65ed Total aboveground biom~
#> 3 lilly d~ Generic Celtis occi~ ae65ed Total aboveground biom~
#> 4 lilly d~ Generic Paulownia t~ ae65ed Total aboveground biom~
#> 5 lilly d~ Generic Rhus typhina ae65ed Total aboveground biom~
#> 6 scbi Generic Ailanthus a~ ae65ed Total aboveground biom~
#> 7 scbi Generic Asimina tri~ ae65ed Total aboveground biom~
#> 8 scbi Generic Berberis th~ ae65ed Total aboveground biom~
#> 9 scbi Generic Carpinus ca~ ae65ed Total aboveground biom~
#> 10 scbi Generic Celtis occi~ ae65ed Total aboveground biom~
#> # ... with 222 more rows
Created on 2019-03-21 by the reprex package (v0.2.1) |
Here's the plan that we've worked out:
Erika has already reviewed the current species lists for temperate sites and assigned the best available equations, which in some cases are generic. There is value in keeping all of these links to generic equations in the sitespecies table because they indicate that (1) the species is present at the site and (2) the species has been reviewed, and the generic equation was determined to be the best available option. Thus, what we currently have stays as is.
Generic equations can also be applied to any site within a specified region (e.g., temperate North America), including for stems at ForestGEO sites where the DBH is greater than the upper DBH limit of the expert-selected equation. For these, the sitespecies table will contain records with:
Note that for species that are specifically assigned a generic equation, the record in the sitespecies table is superfluous from a coding perspective. However, it's important data for someone who wants to pull up the list of species and associated allometries for a given site, and it disambiguates species that have never been reviewed (e.g., if a new species shows up at a site) from those that have been reviewed but found to have no specifically appropriate allometry.
Does this make sense? If it would be useful, I could add an example to the sitespecies table. |
Just to make sure we are on the same page: I assume that the information that distinguishes "Expert" from "Generic" equations is already encoded in the column
library(tidyverse)
#> Warning: package 'purrr' was built under R version 3.5.3
library(allodb)
sitespecies %>%
filter(equation_group == "Expert") %>%
select(equation_group, everything())
#> # A tibble: 540 x 11
#> equation_group site family species species_code life_form equation_id
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Expert Lill~ Sapin~ Acer r~ 316 Tree 7c72ed
#> 2 Expert Lill~ Sapin~ Acer r~ 316 Tree 2060ea
#> 3 Expert Lill~ Sapin~ Acer s~ 318 Tree a4d879
#> 4 Expert Lill~ Rosac~ Amelan~ 356 Tree c59e03
#> 5 Expert Lill~ Rosac~ Amelan~ 356 Tree 96c0af
#> 6 Expert Lill~ Rosac~ Amelan~ 356 Tree 529234
#> 7 Expert Lill~ Jugla~ Carya ~ 409 Tree 9c4cc9
#> 8 Expert Lill~ Jugla~ Carya ~ 402 Tree 9c4cc9
#> 9 Expert Lill~ Jugla~ Carya ~ 403 Tree 9c4cc9
#> 10 Expert Lill~ Jugla~ Carya ~ 407 Tree 9c4cc9
#> # ... with 530 more rows, and 4 more variables: equation_taxa <chr>,
#> # notes_on_species <chr>, wsg_id <chr>, wsg_specificity <chr>
sitespecies %>%
filter(equation_group == "Generic") %>%
select(equation_group, everything())
#> # A tibble: 232 x 11
#> equation_group site family species species_code life_form equation_id
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Generic Lill~ Annon~ Asimin~ 367 Tree ae65ed
#> 2 Generic Lill~ Betul~ Carpin~ 391 Tree ae65ed
#> 3 Generic Lill~ Jugla~ Carya ~ <NA> Tree 1c1ac8
#> 4 Generic Lill~ Canna~ Celtis~ 462 Tree ae65ed
#> 5 Generic Lill~ Fabac~ Cercis~ 471 Tree 7f7777
#> 6 Generic Lill~ Rosac~ Cratae~ 500 Shrub f08fff
#> 7 Generic Lill~ Laura~ Linder~ 609 Shrub f08fff
#> 8 Generic Lill~ Paulo~ Paulow~ 712 Tree ae65ed
#> 9 Generic Lill~ Rosac~ Prunus~ 762 Tree f08fff
#> 10 Generic Lill~ Anaca~ Rhus t~ 899 Shrub ae65ed
#> # ... with 222 more rows, and 4 more variables: equation_taxa <chr>,
#> # notes_on_species <chr>, wsg_id <chr>, wsg_specificity <chr> .
Just noticing that I don't yet see a column encoding region.
library(tidyverse)
#> Warning: package 'purrr' was built under R version 3.5.3
library(allodb)
master() %>%
select(matches("region"))
#> Joining `equations` and `sitespecies` by 'equation_id'; then `sites_info` by 'site'.
#> # A tibble: 769 x 0 .
Okay, I'll have to work around mismatches. Remember that this is how the code allocates an equation from allodb to each
If on top of this you expect to use different kinds of generic equations, say "any-temperate", and "any-tropical", then the code needs to somehow know which species are temperate and which are tropical. That could take quite some time to do.
Can you say this again? I don't understand. There seem to be two kinds of generic equations. If this is the case, consider whether you can design the table in a way that is straightforward to code against. The more ambiguity there is in a single variable (one column of a table in allodb encoding more than one thing), the harder it is to write readable and reliable code.
How about you encode whether an equation id should not be used by the code? All I need is a 1-column table of the equations to skip.
equations_to_skip
"abc123"
"opq321"
... .
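The skip-list idea above could be applied with an anti-join. This is only a minimal sketch on fake data: the `candidates` table, the ids `def456` and `ghi789`, and the choice of `equation_id` as the column name of the skip table are assumptions made for illustration, not the actual structure in allodb or fgeo.biomass.

```r
library(dplyr)

# Hypothetical candidate equations for two trees (fake data).
candidates <- tibble::tribble(
  ~rowid, ~equation_id,
  1,      "abc123",
  1,      "def456",
  2,      "opq321",
  2,      "ghi789"
)

# One-column table of equation ids the code should ignore.
# The column name `equation_id` is an assumption.
equations_to_skip <- tibble::tibble(equation_id = c("abc123", "opq321"))

# Keep only the candidates whose id is not listed in `equations_to_skip`.
usable <- anti_join(candidates, equations_to_skip, by = "equation_id")
usable
```

Nothing is deleted from the source tables; the skipped equations are only left out of the candidates used in the calculation.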
Thanks, an example will certainly help. Best is a minimal, fake example that captures the essence of what you want to convey. |
I'll let this conversation develop a bit to better understand the next actions. For now, I'll drop the generic equations that shouldn't get mixed with expert equations (see forestgeo/fgeo.biomass#28) which should immediately improve the accuracy of the |
Please don't drop anything we currently have; it's all wanted (i.e., Erika has determined those to be the best options). I'll clarify more when I get a chance. |
By drop I mean exclude from the calculation. I won't touch anything in allodb. Right now, the
# Now (incorrect)
rowid  site    species    dbh  equation  equation_group  biomass
1      "scbi"  "Aaa aaa"  10   dbh * 10  "Generic"       100
1      "scbi"  "Aaa aaa"  10   dbh * 10  "expert"        100
---
biomass result = 200
# Soon (correct)
rowid  site    species    dbh  equation  equation_group  biomass
1      "scbi"  "Aaa aaa"  10   dbh * 10  "expert"        100
---
biomass result = 100
|
No, please don't exclude. The "Generic" equations are still expert-selected (i.e., identified by Erika as the best available). |
Here's an example of what we want (using a real example):
Here's how we want it to work in several cases: |
Here is one example: The row 236 of the user's data has a single tree of
The temporary approach I suggest is to forget about generic equations until we can handle them correctly. Here, the code would on the fly drop the row where
What do you think?
Full reprex with more examples
library(tidyverse)
#> Warning: package 'purrr' was built under R version 3.5.3
library(allodb)
library(fgeo.biomass)
set.seed(1)
census <- fgeo.biomass::scbi_tree1 %>% dplyr::sample_n(1000)
species <- fgeo.biomass::scbi_species
census_species <- census %>% add_species(species, site = "scbi")
#> Adding `site`.
#> Overwriting `sp`; it now stores Latin species names.
#> Adding `rowid`.
bad <- allo_find(census_species)
#> Assuming `dbh` in [mm] (required to find dbh-specific equations).
#> * Searching equations according to site and species.
#> Warning: Can't find equations matching these species:
#> carya sp, quercus prinus, ulmus sp, unidentified unk
#> * Refining equations according to dbh.
#> Warning: Can't find equations for 664 rows (inserting `NA`).
bad %>%
select(
rowid, equation_id,
site,
sp,
dbh,
matches("dbh.*mm$"),
is_generic,
anatomic_relevance
) %>%
add_count(rowid) %>%
filter(n > 1 & rowid %in% c("236", "811", "336")) %>%
select(-n)
#> # A tibble: 6 x 9
#> rowid equation_id site sp dbh dbh_min_mm dbh_max_mm is_generic
#> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl> <lgl>
#> 1 236 7f7777 scbi robi~ 143. 40 420 TRUE
#> 2 236 333c34 scbi robi~ 143. 142. 259. FALSE
#> 3 336 f08fff scbi prun~ 37.6 30 640 TRUE
#> 4 336 8aecca scbi prun~ 37.6 3.7 68.3 FALSE
#> 5 811 f08fff scbi sass~ 37.3 30 640 TRUE
#> 6 811 2c092b scbi sass~ 37.3 4 84.9 FALSE
#> # ... with 1 more variable: anatomic_relevance <chr>
Created on 2019-03-21 by the reprex package (v0.2.1) |
This is a case, and there will be others like it, where there are two equations for different size classes that overlap in dbh range. This relates to issue #17, and ultimately I'd prefer to use the approach described there (i.e., switch equations at the point where they cross). Until that is done, please give precedence to the expert-selected equation. |
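As a rough sketch of that interim rule (not the fgeo.biomass implementation), the code could keep the equations whose DBH range contains the tree's DBH and then, within each tree, prefer the non-generic one. The first six rows below copy the values printed in the reprex above, rounded as displayed; the tree with `rowid` 999 is a hypothetical addition showing the fallback to a generic equation when the DBH exceeds the expert equation's upper limit.

```r
library(dplyr)

candidates <- tibble::tribble(
  ~rowid, ~equation_id, ~dbh, ~dbh_min_mm, ~dbh_max_mm, ~is_generic,
  236,    "7f7777",     143,  40,          420,         TRUE,
  236,    "333c34",     143,  142,         259,         FALSE,
  336,    "f08fff",     37.6, 30,          640,         TRUE,
  336,    "8aecca",     37.6, 3.7,         68.3,        FALSE,
  811,    "f08fff",     37.3, 30,          640,         TRUE,
  811,    "2c092b",     37.3, 4,           84.9,        FALSE,
  # Hypothetical tree: too large for the expert equation's range.
  999,    "7f7777",     300,  40,          420,         TRUE,
  999,    "333c34",     300,  142,         259,         FALSE
)

chosen <- candidates %>%
  # Keep equations whose dbh range contains the tree's dbh.
  filter(dbh >= dbh_min_mm, dbh <= dbh_max_mm) %>%
  group_by(rowid) %>%
  # FALSE sorts before TRUE, so expert equations come first within each tree.
  arrange(is_generic, .by_group = TRUE) %>%
  # Take one equation per tree: expert if available, otherwise generic.
  slice(1) %>%
  ungroup()

chosen
```

Each tree ends up with exactly one equation, so no biomass is counted twice. If a tree had several applicable expert equations, `slice(1)` would pick one arbitrarily, so that case would need its own rule.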
Note that, in the long run, it won't be a stable solution to sum equations under the assumption that they describe different biomass components. Rather, we will need to create equations describing how the |
Sorry, I don't understand this, but you say it's not urgent so I'll let it sit for now. |
I've made an issue (#82) to remind us of this later. |
RE your #72 (comment): Awesome! Thanks for taking the time to develop an example. It's a great reminder of the basic logic you expect. It should be clear by now, but I highlight that your comment (#72 (comment)) describes decisions about different trees, whereas my example (#72 (comment)) describes decisions about a single tree. |
Following #73 (comment), here are the newly added generic equations, with some comments and questions.
a. (forestgeo/fgeo.biomass#31)
b. Rows 3-10 require no action. They will be handled correctly once (forestgeo/fgeo.biomass#31) is implemented. That is, the code already knows how to find equations for each row in a census dataset by matching allodb tables by
c. Row 11 is also no problem. That equation will be used for every row of the census data containing "Abies sp." (i.e. when the user's ForestGEO census table has a code in
d. Rows 1-2 are problematic. How will the code decide which of the two to use? @teixeirak, is your idea to match by family? I.e., if a tree belongs to Fabaceae, match row 1, and if it belongs to Juglandaceae, match row 2?
library(allodb)
library(tidyverse)
sitespecies %>%
filter(str_detect(site, "any")) %>%
select(site, equation_group, family, species, equation_id, equation_group)
#> # A tibble: 11 x 5
#> site equation_group family species equation_id
#> <chr> <chr> <chr> <chr> <chr>
#> 1 any temperate NA Generic Fabaceae <NA> 7f7777
#> 2 any temperate NA Generic Juglandaceae <NA> 1c1ac8
#> 3 any temperate NA Generic Pinaceae Abies balsamea 4872ed
#> 4 any temperate NA Generic Pinaceae Abies fraseri 4872ed
#> 5 any temperate NA Generic Pinaceae Abies lasiocar~ 4872ed
#> 6 any temperate NA Generic Pinaceae Abies amabilis 74dd65
#> 7 any temperate NA Generic Pinaceae Abies concolor 74dd65
#> 8 any temperate NA Generic Pinaceae Abies grandis 74dd65
#> 9 any temperate NA Generic Pinaceae Abies magnifica 74dd65
#> 10 any temperate NA Generic Pinaceae Abies procera 74dd65
#> 11 any temperate NA Generic Pinaceae Abies sp. 74dd65
Created on 2019-03-26 by the reprex package (v0.2.1) |
Sorry for the delayed response. Somehow this landed in my junk folder. Regarding (d), yes, the idea is to match by family. But I suppose you need a list of the genera in each family (filled in as in row 11)? |
Thanks! Yes, you are right. The code can't know which species belong to which family. If this means too much duplicated data, you may pull
library(tidyverse)
family <- tribble(
~family, ~species,
"Aea", "A a",
"Aea", "A b",
"Aea", "A c",
)
sitespecies <- tribble(
~species, ~more_columns,
"A a", "whatever",
"A c", "whatever",
)
left_join(sitespecies, family)
#> Joining, by = "species"
#> # A tibble: 2 x 3
#> species more_columns family
#> <chr> <chr> <chr>
#> 1 A a whatever Aea
#> 2 A c whatever Aea
Created on 2019-03-27 by the reprex package (v0.2.1) |
I'll leave this up to @gonzalezeb. There will be multiple instances with these generic equations where we'll need to list the genera in a family. I bet there's some existing resource that we could source. |
Oh, sure, you make me realize that I can find all species in a family from the species table, which the code requires anyway to match codes with species names. You can disregard my previous message.
fgeo.biomass::scbi_species
#> # A tibble: 73 x 10
#> sp Latin Genus Species Family SpeciesID Authority IDLevel syn subsp
#> <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr> <lgl> <lgl>
#> 1 acne Acer~ Acer negundo Sapin~ 1 "" species NA NA
#> 2 acpl Acer~ Acer platan~ Sapin~ 2 "" species NA NA
#> 3 acru Acer~ Acer rubrum Sapin~ 3 "" species NA NA
#> 4 acsp Acer~ Acer sp Sapin~ 4 "" Multip~ NA NA
#> 5 aial Aila~ Aila~ altiss~ Simar~ 5 "" species NA NA
#> 6 amar Amel~ Amel~ arborea Rosac~ 6 "" species NA NA
#> 7 astr Asim~ Asim~ triloba Annon~ 7 "" species NA NA
#> 8 beth Berb~ Berb~ thunbe~ Berbe~ 8 "" species NA NA
#> 9 caca Carp~ Carp~ caroli~ Betul~ 9 "" species NA NA
#> 10 caco Cary~ Carya cordif~ Jugla~ 10 "" species NA NA
#> # ... with 63 more rows
Created on 2019-03-27 by the reprex package (v0.2.1) |
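To illustrate matching family-level generic equations through the species table, here is a small sketch on fake data. The `species` table only mimics the `sp`, `Latin`, and `Family` columns of `fgeo.biomass::scbi_species`. The codes `ceca` and `rops` are hypothetical. The two equation ids are the Fabaceae and Juglandaceae records from the "any temperate NA" listing above.

```r
library(dplyr)

# Fake species table with the columns needed for the match.
species <- tibble::tribble(
  ~sp,    ~Latin,                 ~Family,
  "ceca", "Cercis canadensis",    "Fabaceae",
  "rops", "Robinia pseudoacacia", "Fabaceae",
  "caco", "Carya cordiformis",    "Juglandaceae"
)

# Family-level generic equations (species left unspecified in sitespecies).
generic_by_family <- tibble::tribble(
  ~family,        ~equation_id,
  "Fabaceae",     "7f7777",
  "Juglandaceae", "1c1ac8"
)

# Attach the family-level equation to every species of that family.
matched <- left_join(species, generic_by_family, by = c("Family" = "family"))
matched
```

With this approach the sitespecies table would not need to list every genus or species in a family, because the family membership comes from the user's species table.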
@teixeirak, is
library(tidyverse)
library(allodb)
sitespecies %>%
filter(str_detect(site, "any")) %>%
select(site, family, species, equation_id)
#> # A tibble: 11 x 4
#> site family species equation_id
#> <chr> <chr> <chr> <chr>
#> 1 any temperate NA Fabaceae <NA> 7f7777
#> 2 any temperate NA Juglandaceae <NA> 1c1ac8
#> 3 any temperate NA Pinaceae Abies balsamea 4872ed
#> 4 any temperate NA Pinaceae Abies fraseri 4872ed
#> 5 any temperate NA Pinaceae Abies lasiocarpa 4872ed
#> 6 any temperate NA Pinaceae Abies amabilis 74dd65
#> 7 any temperate NA Pinaceae Abies concolor 74dd65
#> 8 any temperate NA Pinaceae Abies grandis 74dd65
#> 9 any temperate NA Pinaceae Abies magnifica 74dd65
#> 10 any temperate NA Pinaceae Abies procera 74dd65
#> 11 any temperate NA Pinaceae Abies sp. 74dd65
Created on 2019-03-27 by the reprex package (v0.2.1) |
Assuming I understand your question right, we could replace "NA" with "any". |
Wait. I could get a little more specific in the 'species' column for some generic equations. I will need to look for more details in another table in the Chojnacky publication. |
That's what I did for Abies (to differentiate by wood gravity). I don't think it matters for Fabaceae/Juglandaceae, but I was a bit confused by their categories on that one and would appreciate your review. |
I just closed forestgeo/fgeo.biomass#31 (Convert . |
Note that we may run into some other issues as we continue entering Chojnacky equations. In particular, I'm not sure how to deal with the "woodland" category (that will make sense to @gonzalezeb, not @maurolepore). |
Chojnacky 2014 noted that the woodland equations seemed to predict low biomass values, based likely on errors in the old data they used. As long as we specify this in our publication we can include them. |
Closing because this issue was solved by the new weighting system. |
We need to add the Chojnacky et al. 2014 equations to the sitespecies table, both to give users the option to go with the generic option and to handle issue #69.
Necessary steps:
1- @gonzalezeb, add the equations to the table. For site, put something like "Any" or "NA". For species, fill in the specificity for which the equation is designed (e.g., "Picea spp.").
2- @maurolepore, we'll then need a mechanism in the code to identify and assign these equations. They should be selected when (1) they are the only equation available for the species at a given DBH or (2) the user selects the generic equation option.