Create script to do some basic database checking #23
@bpbond, you may also be interested in this one.
Started on this.
Before I go further, @teixeirak, is something like this what you're thinking of? See code at https://github.com/bpbond/ForC/blob/qc-script/scripts/checks.R
From a quick look, yes, this does look like what I have in mind. We will want to generate text or files listing the errors. I'm surprised at the number of errors. When we checked a few months ago, there were none. If you can generate a file listing the errors, I'll figure out what's wrong. I suspect that some very minor changes to plot names (e.g., a trailing space) were introduced.
OK. I will trim white space, implement the other checks on your list, and post the results.
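To illustrate why trimming matters here, the following is a minimal Python sketch (not the R code in checks.R) of comparing plot names with and without whitespace trimming. The plot names and the helper `unmatched_names` are hypothetical and chosen only for illustration.

```python
def unmatched_names(child_names, parent_names):
    """Return trimmed child names that have no match among trimmed parent names."""
    parents = {name.strip() for name in parent_names}
    children = {name.strip() for name in child_names}
    return sorted(children - parents)


# Hypothetical plot names: "Plot A " differs from "Plot A" only by a trailing space.
measurement_plots = ["Plot A ", "Plot B", "Plot C"]
known_plots = ["Plot A", "Plot B"]

# Without trimming, the trailing space is reported as a spurious mismatch.
raw_missing = sorted(set(measurement_plots) - set(known_plots))

# With trimming, only the genuinely missing plot remains.
clean_missing = unmatched_names(measurement_plots, known_plots)

print(raw_missing)
print(clean_missing)
```

Comparing on trimmed copies leaves the stored names untouched, so the report can still show the original values that need correcting.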
OK, the script now checks all the issues above except the last (and I'll do that in a bit). Should I start listing the issues one by one here for you to check and resolve?
I don't understand why you crossed out the comment about site names. It does make sense; specifically, I believe that my MATLAB script that generates ForC_plots from ForC_history (I will move it to the public repository soon) introduces this error for sites whose names include the ' character.
It would be great if you could program the script to generate files reporting the errors.
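One possible shape for such an error report, sketched in Python under stated assumptions: the column names (`check`, `table`, `value`), the check labels, and the example rows are all hypothetical and are not taken from checks.R. The sketch writes to an in-memory buffer; a real run would pass an open file handle instead.

```python
import csv
import io


def write_report(mismatches, handle):
    """Write one CSV row per mismatch and return the number of rows written."""
    writer = csv.writer(handle)
    writer.writerow(["check", "table", "value"])  # hypothetical column layout
    for row in mismatches:
        writer.writerow([row["check"], row["table"], row["value"]])
    return len(mismatches)


# Hypothetical mismatches, for illustration only.
mismatches = [
    {"check": "plot_not_found", "table": "ForC_history", "value": "Plot A "},
    {"check": "site_not_found", "table": "ForC_plots", "value": "O'Example Site"},
]

buffer = io.StringIO()
n_written = write_report(mismatches, buffer)
report_text = buffer.getvalue()
print(report_text)
```

Keeping one row per problem makes the file easy to filter by check type when working through the corrections.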
@teixeirak This is (apostrophes in site names) definitely a factor, but not the only one.
Easy, no problem.
Yes yes yes! I'm going to open an issue with a suggestion in this regard.
I'm going to open separate issues for all the mismatches, OK? Seems like a cleaner way to deal with things.
Yes, thanks.
Note that this isn't currently possible, as the |
@bpbond I am running into errors with https://github.com/forc-db/ForC/blob/master/scripts/checks.R:
I should probably start getting more into Hadley Wickham's packages, but I find his system harder to troubleshoot, so I'll let you handle that! ;-) It might just be that we need to load another library...
@ValentineHerr Uh oh, this is probably my bad. I think @teixeirak renamed those columns and I didn't properly update the script (or the changes didn't get pushed to the PR). Apologies. I'll turn to this in a bit and look.
Fix column name issue found by @ValentineHerr in #23
Line 111 is catching a "meaningless" discrepancy between "Regrowth(_prior)" in HIST and "Regrowth" in HISTORY (and likewise between "Disturbance(_prior)" and "Disturbance").
Is it just for these two, or should there be a general rule that "xxxx(yyy)" always matches "xxxx"?
It is just those two.
> unique(HISTORY$histcat)
[1] "Disturbance" "Regrowth" "Establishment" "No.disturbance" "No.info" "Management"
[7] "Disturbance_prior" "Regrowth_prior"
> unique(HIST$histcat)
[1] "Establishment" "Regrowth(_prior)" "Disturbance(_prior)" "Management" "No.disturbance"
[6] "No.info"
So, is this what is wanted?
Yes. Disturbance_prior matches Disturbance(_prior), and the same goes for Regrowth.
Should "Fertilization_N", "Fertilization_P" etc. in HISTORY all match "Fertilization_X" in HIST?
Yes. Sorry I missed that before.
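The matching rules agreed above can be sketched as a small mapping from HISTORY categories to HIST categories. This is a Python illustration rather than the R in checks.R; the function name `to_hist_category` is hypothetical, and the pattern for the fertilization suffix (one or more letters or digits) is an assumption, since the thread only says "etc.". The category values are the ones printed by `unique()` above.

```python
import re


def to_hist_category(histcat):
    """Map a HISTORY histcat value to the category it should match in HIST."""
    # Only these two categories get the "(_prior)" treatment, per the thread.
    for base in ("Disturbance", "Regrowth"):
        if histcat in (base, base + "_prior"):
            return base + "(_prior)"
    # Assumption: any alphanumeric suffix counts as a fertilization variant.
    if re.fullmatch(r"Fertilization_[A-Za-z0-9]+", histcat):
        return "Fertilization_X"
    return histcat


history_values = [
    "Disturbance", "Regrowth", "Establishment", "No.disturbance",
    "No.info", "Management", "Disturbance_prior", "Regrowth_prior",
]
hist_values = [
    "Establishment", "Regrowth(_prior)", "Disturbance(_prior)",
    "Management", "No.disturbance", "No.info",
]

# HISTORY categories that still have no counterpart in HIST after mapping.
unmatched = sorted({to_hist_category(v) for v in history_values} - set(hist_values))
print(unmatched)
```

Listing the two base categories explicitly, instead of using a general "xxxx(yyy)" rule, follows the answer given above that the exception applies to just those two.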
It would be good to have a script here (public database) that can be run whenever there is a substantial update to the database to check that the structure of the database remains correct and that there are no egregious errors in values.
Here's the start of a list of things to check:
(THIS TABLE NEEDS TO BE COMPLETED BASED ON THE RELATIONSHIP ENTITY DIAGRAM, which needs to be updated.)
Check for records where stands of different age are represented by a single plot (see #22).
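A rough Python sketch of how the stand-age check might work, under several assumptions: the field names (`site`, `plot`, `year`, `stand_age`) are hypothetical, and the sketch treats "same stand" as "same implied establishment year" (measurement year minus stand age), which is one possible reading of the check rather than a rule stated in this issue.

```python
from collections import defaultdict


def plots_with_multiple_stands(records):
    """Return plots whose records imply more than one establishment year."""
    years = defaultdict(set)
    for rec in records:
        # Assumption: a single stand keeps a constant (year - stand_age).
        years[(rec["site"], rec["plot"])].add(rec["year"] - rec["stand_age"])
    return {key: sorted(vals) for key, vals in years.items() if len(vals) > 1}


# Hypothetical records, for illustration only.
records = [
    {"site": "S1", "plot": "P1", "year": 2000, "stand_age": 50},
    {"site": "S1", "plot": "P1", "year": 2010, "stand_age": 60},  # same stand, remeasured
    {"site": "S1", "plot": "P2", "year": 2000, "stand_age": 20},
    {"site": "S1", "plot": "P2", "year": 2000, "stand_age": 80},  # a different stand
]

flagged = plots_with_multiple_stands(records)
print(flagged)
```

Comparing implied establishment years rather than raw ages avoids flagging a plot simply because it was remeasured in a later year.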