Populate equation_id #47

Closed
maurolepore opened this issue Sep 26, 2018 · 4 comments

@maurolepore
Member

maurolepore commented Sep 26, 2018

Follows https://github.com/forestgeo/allodb/issues/36#issuecomment-423217920.

Here is a good way to generate random ids in R: ids::random_id().

@maurolepore
Member Author

I understand that the unique equations that we want to identify are the unique values of equation_allometry, right? Until we solve this permanently, would it work if I populate equation_id with the values of equation_allometry?

@maurolepore
Member Author

After removing duplicated rows from the equations table, the table still has more rows than unique values of equation_allometry (related to #48). Is this expected?

library(allodb)
equations_table <- as_allodb(equations)
# Not the same:
nrow(equations_table)
#> [1] 178
length(unique(equations_table$equation_allometry))
#> [1] 147

Created on 2018-09-26 by the reprex package (v0.2.1)
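
To see which rows account for the gap between the two counts, something like the following might help. It is only a sketch on a made-up toy table (`equations_toy` and its values are hypothetical, not taken from allodb), using base R:

```r
# Toy stand-in for the equations table; the values are invented for illustration.
equations_toy <- data.frame(
  equation_allometry = c("a * dbh^b", "a * dbh^b", "exp(a + b * log(dbh))"),
  site = c("site_1", "site_2", "site_1"),
  stringsAsFactors = FALSE
)

# Count how many rows share each equation_allometry value.
rows_per_equation <- table(equations_toy$equation_allometry)

# Keep only the equations that appear in more than one row.
shared <- names(rows_per_equation[rows_per_equation > 1])

# These rows are not duplicates (they differ in `site`), yet they repeat an
# equation, which is why nrow() exceeds the number of unique equations.
repeated_rows <- equations_toy[equations_toy$equation_allometry %in% shared, ]
repeated_rows
```

If the real table behaves like this toy one, the extra rows would be the same equation reused under different values of other columns.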

maurolepore added a commit that referenced this issue Sep 27, 2018
* This reduces the size of the data. But still does not normalize the `equations` table because there are more rows than unique values of `equation_allometry` (#47).
@gonzalezeb
Contributor

@maurolepore Maybe we need the equation_id before I start to test the split tables. It is too complicated right now, especially if I want to use the same equation for multiple sites.

I was thinking the equation_id could be something like the first letters of the genus + the first letters of the species + a number. For example, there are 4 equations for Acer rubrum, so we could use acru_001, acru_002, acru_003, acru_004. I think equation_id should not be too long (i.e. not a 14-digit random number!), so maybe the timestamp idea works best.

What do you think?
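
For illustration, ids of the acru_001 form could be built roughly like this. The helper `make_species_ids()` is hypothetical (it is not part of allodb), and it assumes the code is the first two letters of the genus plus the first two letters of the species, as in the Acer rubrum example:

```r
# Hypothetical helper: builds ids like "acru_001" from genus and species names.
make_species_ids <- function(genus, species) {
  # Four-letter code: first two letters of the genus and of the species.
  code <- tolower(paste0(substr(genus, 1, 2), substr(species, 1, 2)))
  # Number the equations within each code, in order of appearance.
  index <- ave(seq_along(code), code, FUN = seq_along)
  sprintf("%s_%03d", code, index)
}

# Four equations for Acer rubrum, as in the example above.
make_species_ids(
  genus = rep("Acer", 4),
  species = rep("rubrum", 4)
)
#> [1] "acru_001" "acru_002" "acru_003" "acru_004"
```

One caveat with this kind of scheme is that two different species can collapse to the same four-letter code, so the counter would then run across both.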

@maurolepore
Member Author

maurolepore commented Oct 1, 2018

@gonzalezeb, I have updated the .csv database (233ef4c). The equations table now has an equation_id column (see). I used random ids of 6 characters. Next time you add a new equation, you could pick an id from data-raw/available_random_ids.csv (here). I avoided ids built from the contents of other columns because, if that information changes, the ids would become misleading. Take my choice as a suggestion; I'm happy to change the approach if you prefer something different.
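
As a rough sketch of the idea (not the code used for the commit), a pool of 6-character random ids and the step of picking an unused one could look like this in base R. The function `random_ids()`, the alphabet, and the pool size are all assumptions made for illustration; ids::random_id(), mentioned earlier in the thread, is another option:

```r
set.seed(47)  # only so the sketch is reproducible

# Hypothetical generator: n distinct random ids of n_char characters each.
random_ids <- function(n, n_char = 6) {
  alphabet <- c(letters, 0:9)
  ids <- character(0)
  # Keep drawing until there are n distinct ids (collisions are dropped).
  while (length(ids) < n) {
    draws <- replicate(
      n - length(ids),
      paste(sample(alphabet, n_char, replace = TRUE), collapse = "")
    )
    ids <- unique(c(ids, draws))
  }
  ids
}

pool <- random_ids(10)            # stand-in for the file of available ids
used <- pool[1:3]                 # pretend these are already in the equations table
available <- setdiff(pool, used)  # ids still free to assign
next_id <- available[1]           # id to give to the next new equation
```

Because the ids carry no information from other columns, editing a species name or site later would leave them valid.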

Closing now but feel free to reopen.
