Integrate allodb and bmss #1
This post breaks down the process of calculating biomass from a table of allometric equations (that we provide) and a table with dbh measurements (that the user provides). This is to show more clearly what I think is missing from our table of equations. |
The parameters a, b, c, or d are in our master table. These are the coefficients used for the regression equations compiled from the original publications. We originally thought that we could replace these parameters in the actual equation, i.e., instead of having a*DBH^b in the equation column we would have 4.13741*DBH^1.08876. If you think it is best, I will modify data.Rmd so those columns are included in the equation table. |
Great! Yes, I agree that 4.13741*DBH^1.08876 is better than a*DBH^b. So we still need to do the conversion, right? I'll have a look to see how big the challenge is to do this in R. Do you have any other suggestions? I recently learned about OpenRefine -- which might be good to keep in mind. |
Yes, we still need to do the conversion. Maybe by ‘concatenating’ the needed columns using paste? I will give it a try.
From: Mauro Lepore [mailto:notifications@github.com]
Sent: Monday, March 19, 2018 3:48 PM
To: forestgeo/allodb <allodb@noreply.github.com>
Cc: Gonzalez, Erika B. <GonzalezEB@si.edu>; Mention <mention@noreply.github.com>
Subject: Re: [forestgeo/allodb] Integrate allodb and bmss (#36)
Great! Yes, I agree that 4.13741*DBH^1.08876 is better than a*DBH^b.
So we still need to do the conversion, right? I'll have a look to see how big is the challenge to do this in R.
Do you have any other suggestion? I recently learned about OpenRefine<http://openrefine.org/> -- which might be good to keep in mind.
|
Good news, a quick try shows that we can replace the parameters a-d. This relies on matching strings, so we have to be careful to avoid mismatches. For example, if we replace a with A in apple we get Apple. But if we replace a with A in america we get AmericA. But I still wonder where to get some other parameters. All of this is in section 2 of this document |
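To make the mismatch risk concrete, here is a minimal sketch of whole-word replacement in R. The helper `replace_param()` and the lowercase equation `a*(dbh^b)` are hypothetical, made up for illustration; only the two coefficient values come from this thread. It is one possible approach, not necessarily what the replacement code actually does.

```r
# Hypothetical helper (not part of allodb): substitute one parameter,
# matching it only as a whole word so letters inside other names stay untouched.
replace_param <- function(equation, name, value) {
  pattern <- paste0("\\b", name, "\\b")
  gsub(pattern, format(value, scientific = FALSE), equation, perl = TRUE)
}

# Made-up lowercase equation, chosen because "dbh" contains both "b" and "d"
eqn <- "a*(dbh^b)"

# Naive replacement also rewrites the "b" inside "dbh"
naive <- gsub("b", "1.08876", eqn, fixed = TRUE)

# Whole-word replacement leaves "dbh" alone
safe <- replace_param(replace_param(eqn, "a", 4.13741), "b", 1.08876)

naive
safe
```

With the equations as currently written (uppercase DBH, DBA, BA) the naive version may happen to work, but the whole-word pattern would keep working if variable names change case later.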
I fixed the "other parameters", which were actually part of the original equations. I rewrote equations that needed to take a more general form (i.e., ln(BST)=a+bln(DBH) changed to exp(a+bln(DBH))). |
Great! OK, then I'll soon rerun the replacement -- which should result in equations that are only a function of |
There are now 16 unique equations but notice that 11 and 12 have 2 extra variables (WD and Bk) that are embedded within the formula.
|
Now, also notice that some equations are a function of DBA (diameter at base) or BA (basal area). Calculating BA is a step that needs to happen before the actual biomass calculation. Valentine already wrote code to:
|
Mmm, this raises too many questions. There are tiny details that can easily lead to confusion and produce a completely wrong result. I think we should meet one day and work on this together -- with Excel, R or whatever -- until we clean all equations. Then you can continue to collect new equations and format them consistently with what we achieve in our meeting. What do you think? If OK, when would it work for you? For our records, here I list the questions that come to my mind right away.
# literal copy paste
exp(a+bln(DBH))
# I think that by `bln` you mean `b * ln(...)`.
# Here I use parentheses to explicitly show precedence rules:
exp( a + ( b * ln(DBH) ) )
# Is this the same?
ln_dbh <- ln(DBH)
b_times_ln_dbh <- b * ln_dbh
exp(a + b_times_ln_dbh)
|
I ran distinct(master, equation) again and got the correct equations:
1 a*(DBH^2)^b
2 a*(DBH^b)
3 a+b*DBH+c*(DBH^d)
4 a*(DBA^b)
5 exp(a+b*ln(DBH))
6 exp(a+b*DBH+c*(ln(DBH^d)))
7 10^a+b*(log10(DBH^c))
8 a+b*DBH
9 a+b*BA
10 exp(a+b*ln(DBA))
11 exp(a+(b*ln(DBH)))*419.814*1.22
12 exp(a+(b*ln(DBH)))*645.704*1.05
13 10^a*DBH^b
14 a+(b*DBH)+c*(DBH^2)+d*(DBH^3)
15 NA
16 exp(a+(b*(ln(pi*DBH))))
17 exp(a+b*(DBH/DBH+c))
But I will have to check on the precedence rules. |
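One way to check precedence is to let R parse and evaluate the equation strings directly. The sketch below is an assumption about how that could be done, not code from allodb: `eval_equation()` is a hypothetical helper and the coefficient values are made up. Base R has no `ln()`, so the helper maps `ln` to `log()` (the natural log). In R, `^` binds tighter than `*` and `+`, so equation 7 as written is read as (10^a) + b*log10(DBH^c).

```r
# Hypothetical helper: evaluate an equation stored as text.
# `values` is a named list such as list(a = 1, b = 2, DBH = 10).
eval_equation <- function(equation, values) {
  # Base R has no ln(); alias it to log(), the natural logarithm
  env <- c(values, list(ln = log))
  eval(parse(text = equation), envir = env, enclos = baseenv())
}

# Equation 5 with made-up coefficients: exp(0 + 2 * log(3))
eq5 <- eval_equation("exp(a+b*ln(DBH))", list(a = 0, b = 2, DBH = 3))

# Equation 7 as written: parsed as (10^a) + b * log10(DBH^c)
vals <- list(a = 1, b = 1, c = 1, DBH = 10)
eq7_as_written <- eval_equation("10^a+b*(log10(DBH^c))", vals)

# If the publication meant 10^(a + b * log10(DBH^c)), parentheses are needed
eq7_grouped <- eval_equation("10^(a+b*(log10(DBH^c)))", vals)

c(eq5 = eq5, as_written = eq7_as_written, grouped = eq7_grouped)
```

With these made-up values the two readings of equation 7 differ (11 versus 100), so which one the original publication intended matters.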
BA and DBA are not the same, so we will need to reconsider strategies. |
Great! This addresses almost all my concerns. Here are the few things that I still need to ask you or think about:
DBH <- 2
c <- 1
# Notice how parentheses produce different results
DBH / DBH + c
#> [1] 2
DBH / (DBH + c)
#> [1] 0.6666667
My updated notes are in section 2 here. |
Function to calculate basal area: https://forestgeo.github.io/fgeo.abundance/reference/basal_area.html |
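For reference, the geometry behind basal area is just the area of a circle with the stem's diameter. The sketch below assumes a circular stem; `basal_area_sketch()` is a hypothetical name and is not necessarily how the function linked above is implemented.

```r
# Sketch only: basal area of a circular stem from its diameter.
# The result is in squared units of `dbh` (for example mm^2 if dbh is in mm).
basal_area_sketch <- function(dbh) {
  pi * (dbh / 2)^2
}

# Vectorized over stems
ba <- basal_area_sketch(c(10, 20))
ba
```

Because the result inherits the units of dbh, any equation that is a function of BA needs the dbh units checked before this step.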
I incorporated coefficients a-d in equations (in a temporary column called equation_final); however, I still need to work on more changes, i.e., if the original equations used non-metric units then we will need to convert to metric. This is another issue to consider when calculating biomass. |
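As an illustration of the unit problem, here is a minimal sketch that wraps a non-metric equation so it accepts dbh in centimeters and returns kilograms. The equation and its coefficients are made up, and `biomass_lb()` and `biomass_kg()` are hypothetical names; only the conversion factors (1 in = 2.54 cm, 1 lb = 0.45359237 kg) are standard.

```r
# Standard conversion factors
IN_PER_CM <- 1 / 2.54
KG_PER_LB <- 0.45359237

# Made-up equation, assumed to be published for dbh in inches
# and biomass in pounds
biomass_lb <- function(dbh_in) {
  2.5 * dbh_in^2.1
}

# Metric wrapper: convert the input before, and the output after,
# applying the original equation
biomass_kg <- function(dbh_cm) {
  biomass_lb(dbh_cm * IN_PER_CM) * KG_PER_LB
}

biomass_kg(c(2.54, 25.4))
```

The input conversion has to happen inside the equation's argument, not on the result, because the equation is nonlinear in dbh.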
Summary and moving forward
I believe that most of the comments above have been addressed except this:
@gonzalezeb, I'm particularly interested in item 3. Is this one of the things you would like to talk about during my visit to SCBI? Or have we already talked about this? Do you want to chat and refresh my memory before my visit to SCBI? |
@maurolepore, |
Here I attempt to answer my own question. My conclusion is that we will have all tables linked once we populate the column
LINK VIA SPECIES
Users provide In turn,
LINK VIA SITE
Users may provide a string of length 1 giving the name of the site where the data comes from. (We may vectorize over this argument to allow multiple sites -- but let's worry about that later.) That string would populate a new column |
Yes, equation_id is the link between tables, see description here. But, yes, I haven't populated the equation_id. That's part of our conversation next week, as many more sites and equations need to be populated in the allodb_master. On another note, I like the idea of LINK VIA SPECIES because that opens the use of allodb not just to ForestGEO sites but to anyone who wants to use it, after selecting a region (for example). There is currently a limitation with species codes: for a few sites whose species lists I got from the ForestGEO website, I don't have codes, so I will need to contact the PIs for input. I have so much work to do! |
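To picture how equation_id could link the tables, here is a small sketch with base R's `merge()`. All three tables, their column names other than equation_id, and every value in them are made up for illustration; the real allodb tables may be structured differently.

```r
# Made-up user data: one row per stem
census <- data.frame(
  sp = c("sp_a", "sp_b", "sp_c"),
  dbh = c(326, 135, 50),
  stringsAsFactors = FALSE
)

# Made-up lookup: which equation applies to which species at which site
site_species <- data.frame(
  site = "scbi",
  sp = c("sp_a", "sp_b"),
  equation_id = c("eq1", "eq2"),
  stringsAsFactors = FALSE
)

# Made-up equations table, keyed by equation_id
equations <- data.frame(
  equation_id = c("eq1", "eq2"),
  equation = c("exp(a+b*ln(DBH))", "a*(DBH^b)"),
  stringsAsFactors = FALSE
)

# Link via species first, then via equation_id; all.x = TRUE keeps stems
# that have no matching equation so they can be reported, not silently dropped
with_id <- merge(census, site_species, by = "sp", all.x = TRUE)
linked <- merge(with_id, equations, by = "equation_id", all.x = TRUE)
linked
```

Here sp_c ends up with a missing equation, which is the kind of gap the species-code limitation mentioned above would produce.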
We can discuss this in person, but I write one idea to clarify my thinking and as a reminder. I think we need a system for assigning
|
@teixeirak and all, I had expected to resume work on this today but I'll start tomorrow. I had to wrap up a few things that I really needed to get out from my head and into code. I'll keep you posted. |
Follows issue #58 PR ropensci/allodb#61. Today I set things up in allodb. Most importantly, I updated tests and drafted this report to capture my progress. Tomorrow I'll be working on bmss to adapt the code to work by default with tables from allodb (instead of the dummy tables I had created for testing purposes). |
Today I wrote some functions to compute biomass with data from SCBI and site-level equations from allodb. For an example see README. At least for now, the code lives in allodb instead of bmss because it mostly restructures and combines data from allodb. The logic that bmss has is not yet available. Instead, the code follows a simple path to computing biomass. Tomorrow I'll revisit this code with a fresh brain, and will test it more. Then I'll think about what's next. |
Today I wrote code that starts integrating allodb and bmss:
This is still work in progress but what's important is that I'm now able to reuse the logic Sean, Gabriel, and I developed a while ago. That logic will surely change but it's a good starting point. See this updated README to see an example for equations at the species-level and for equations at all-levels. The code now lives in allodb but will likely move to bmss -- which will make allodb a very independent package (right now it's not). |
Today I made the code more flexible and simpler than it used to be in bmss. You can see an example in README. The most important change is not exposed to the user -- it is the ability to order a list of dataframes by index or element-name and then reduce the list to a single dataframe, where each row overwrites the others of lower order-priority. In practice, this allows us to match the user-data with equations of different types, then let the user decide what type of equation overwrites which other type. The result is simpler logic and greater flexibility.
library(allodb)
library(dplyr)
prio <- list(
prio1 = tibble(rowid = 1:1, x = "prio1"),
prio2 = tibble(rowid = 1:2, x = "prio2"),
prio3 = tibble(rowid = 1:3, x = "prio3")
)
rowbind_inorder(prio)
#> # A tibble: 3 x 2
#> rowid x
#> <int> <chr>
#> 1 1 prio1
#> 2 2 prio2
#> 3 3 prio3
# 2 overwrites over 1; 3 is dropped
rowbind_inorder(prio, c(2, 1))
#> # A tibble: 2 x 2
#> rowid x
#> <int> <chr>
#> 1 1 prio2
#> 2 2 prio2
What the user does see is a summary of the available equations of each type, in the form of a nested dataframe. This is what it looks like:
eqn <- get_equations(census_species)
eqn
#> # A tibble: 5 x 2
#> eqn_type data
#> <chr> <list>
#> 1 species <tibble [8,930 x 8]>
#> 2 genus <tibble [5,642 x 8]>
#> 3 mixed_hardwood <tibble [5,516 x 8]>
#> 4 family <tibble [10,141 x 8]>
#> 5 woody_species <tibble [0 x 8]>
Then it's up to the user to use the default priority order or change it.
default_order <- c(
"species",
"genus",
"family",
"mixed_hardwood",
"woody_species"
)
pick_best_equations(eqn, order = default_order)
#> # A tibble: 30,229 x 8
#> rowid site sp dbh equation_id eqn eqn_source eqn_type
#> <int> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 4 scbi nyssa syl~ 135 8da09d 1.5416 * ~ default species
#> 2 21 scbi liriodend~ 232. 34fe5a 1.0259 * ~ default species
#> 3 29 scbi acer rubr~ 326. 7c72ed exp(4.589~ default species
#> 4 38 scbi fraxinus ~ 42.8 0edaff 0.1634 * ~ default species
#> 5 72 scbi acer rubr~ 289. 7c72ed exp(4.589~ default species
#> 6 77 scbi quercus a~ 636. 07dba7 1.5647 * ~ default species
#> 7 79 scbi tilia ame~ 475 3f99ba 1.4416 * ~ default species
#> 8 79 scbi tilia ame~ 475 76d19b 0.004884 ~ default species
#> 9 84 scbi fraxinus ~ 170. 0edaff 0.1634 * ~ default species
#> 10 89 scbi fagus gra~ 27.2 74186d 2.0394 * ~ default species
#> # ... with 30,219 more rows
There are also a few other convenient functions. I'll tidy this up, move the code out of allodb and park the project for a bit until you give some feedback. I'll let you know. |
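For readers who want to see the ordering-and-overwriting idea without installing anything, here is a rough base R sketch. `rowbind_inorder_sketch()` is a hypothetical stand-in, not the actual `rowbind_inorder()`; it assumes every dataframe has a `rowid` column and that the first dataframe in the chosen order wins. It reproduces the two small outputs shown above, but the real function may differ in other cases.

```r
# Hypothetical stand-in: stack the dataframes in priority order, then keep
# only the first (highest-priority) row seen for each rowid.
rowbind_inorder_sketch <- function(dfs, priority = seq_along(dfs)) {
  stacked <- do.call(rbind, unname(dfs[priority]))
  out <- stacked[!duplicated(stacked$rowid), , drop = FALSE]
  out <- out[order(out$rowid), , drop = FALSE]
  rownames(out) <- NULL
  out
}

prio <- list(
  prio1 = data.frame(rowid = 1:1, x = "prio1", stringsAsFactors = FALSE),
  prio2 = data.frame(rowid = 1:2, x = "prio2", stringsAsFactors = FALSE),
  prio3 = data.frame(rowid = 1:3, x = "prio3", stringsAsFactors = FALSE)
)

# Default order: prio1 wins for rowid 1, prio2 for rowid 2, prio3 for rowid 3
by_default <- rowbind_inorder_sketch(prio)

# prio2 takes priority over prio1; prio3 is dropped
reordered <- rowbind_inorder_sketch(prio, c(2, 1))

by_default
reordered
```

Indexing with `dfs[priority]` accepts positions or element names, which is what lets the user express the priority either way.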
This is great! What I can't see (not sure if it is incorporated) is the issue about units #42. But now I realize I didn't include a "conversion factor" column in the equation table that would finally tackle the problem. Also, I am doing the tedious exercise of checking equations "by hand" and I am making some changes to correct estimates (for example eq |
You're right, #42 and other issues are still undone. What I've done so far is basic and exploratory. But -- after some polishing -- I will have set the road for the rest to come. |
Today I pre-released allodb 0.0.0.9004.
Now allodb is lightweight again (although it depends on two packages that may later be removed), and focused on hosting tables -- not on computing with those tables. I moved code from allodb to a new package, fgeo.biomass. I deprecated the old bmss package. It remains as a private repository of ideas but the implementation of those ideas in fgeo.biomass is now totally different -- simpler and more flexible. Some issues will gradually move from allodb to fgeo.biomass. |
@gonzalezeb and @teixeirak, Today I pre-released fgeo.biomass 0.0.0.9000. With this I finish this iteration of the integration between allodb and fgeo.biomass. There is still a lot to do and I already plan some improvements. But before I continue it would be great to get feedback from you and whoever you want to share this work with. |
Closing because this issue is unfocused. We may later extract the bits we need and follow up. |
@gonzalezeb,
Where in the table should the code look for the parameters in the column `equation`?
It is clear that `DBH` is a measurement that the user must provide for each stem. But it is not clear where the other parameters come from. Should we give them in the `equations` table? Should the user get them from somewhere else and feed them into our code?
For example, where should the code get `a` from? Or `b`, or `d`? Also, is there a lookup table to know what each of those parameters means?