SRDB import: count records with consecutive Study_midyear as single plot #183

Closed
teixeirak opened this issue May 14, 2019 · 15 comments

Comments

@teixeirak
Member

@ValentineHerr, in cases where Study_midyear increases consecutively and Age stays the same, we have what the code thinks are separate plots when in fact we're dealing with a multi-year study at the same plot (example: Schanis). Please modify these such that we only get one plot, with age defined based on the first study date.

Please alert me if this causes unexpected problems. I think we've been through this issue before and decided to do it the way it's currently done, but now that I'm looking at the import I'm realizing that this creates a lot of unwanted false separation of plots.
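
The rule requested above can be sketched as follows. This is a minimal Python illustration, not the R code in SRDB_import.R. It assumes each record is a (site, Study_midyear, Age) tuple with whole-number midyears and numeric ages; the function name `collapse_consecutive_years` and the sample records are hypothetical.

```python
from itertools import groupby


def collapse_consecutive_years(records):
    """Merge records that share a site and an age and were measured in
    consecutive years into a single plot.

    records: iterable of (site, study_midyear, age) tuples.
    Returns a list of dicts with the site, the age (taken as the age at
    the first study year of the run) and the years covered by the plot.
    """
    plots = []
    ordered = sorted(records, key=lambda r: (r[0], r[2], r[1]))
    for (site, age), group in groupby(ordered, key=lambda r: (r[0], r[2])):
        years = sorted({r[1] for r in group})
        run = [years[0]]
        for year in years[1:]:
            if year - run[-1] == 1:
                # Consecutive year with an unchanged age: same plot.
                run.append(year)
            else:
                # Gap in years: close the current plot and start a new one.
                plots.append({"site": site, "age": age, "years": run})
                run = [year]
        plots.append({"site": site, "age": age, "years": run})
    return plots


# Hypothetical records, not taken from SRDB.
example = [("A", 2001, 40), ("A", 2002, 40), ("A", 2003, 40), ("A", 2010, 40)]
for plot in collapse_consecutive_years(example):
    print(plot)
```

In this sketch a gap of more than one year starts a new plot, which is one possible reading of "increases consecutively"; the actual import may treat gaps differently.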

@ValentineHerr
Member

I wrote the code to deal with that. I don't know yet what problem it can cause (I need to roll back all changes in ForC and run SRDB_import.R again to figure it out), so leaving this open for now.

@ValentineHerr
Member

I found a tricky example related to this.

This is data from the site "Oak Ridge".
Columns indicate the stand.age, rows the study_midyear.
Numbers in the grid are the number of records in SRDB.

[image: grid of SRDB record counts for Oak Ridge, with stand.age in columns and study_midyear in rows]

We, as humans, can see that the stand with age 10 was sampled in years 1997-2001 but the stand age was not updated over those years. The other plots seem to be independent, and their stand ages should be kept as they are.

I don't know how to code this to handle all potential cases... quickly enough (to be done before I leave tonight).
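
A grid like the one described above, and a check for ages that were left unchanged across consecutive years, can be sketched in Python as follows. This is only an illustration under stated assumptions: records are (study_midyear, stand_age) tuples with whole-number years, the function names are hypothetical, and the sample records are made up rather than the actual Oak Ridge counts.

```python
from collections import Counter


def count_grid(records):
    """Cross-tabulate records by study year (rows) and stand age (columns).

    records: list of (study_midyear, stand_age) tuples.
    Returns (ages, years, grid) where grid[i][j] is the number of records
    for years[i] and ages[j].
    """
    counts = Counter(records)
    years = sorted({year for year, _ in counts})
    ages = sorted({age for _, age in counts})
    grid = [[counts[(year, age)] for age in ages] for year in years]
    return ages, years, grid


def ages_repeated_in_consecutive_years(records):
    """Return the stand ages that appear in at least two consecutive years,
    i.e. candidates for a stand age that was not updated between visits."""
    years_by_age = {}
    for year, age in records:
        years_by_age.setdefault(age, set()).add(year)
    return sorted(
        age
        for age, years in years_by_age.items()
        if any(year + 1 in years for year in years)
    )


# Made-up records for illustration only.
example = [(1997, 10), (1998, 10), (1999, 10), (1997, 55), (2005, 30)]
ages, years, grid = count_grid(example)
print(ages)
for year, row in zip(years, grid):
    print(year, row)
print(ages_repeated_in_consecutive_years(example))
```

A flagged age is only a candidate: whether the other ages at the same site are independent plots still has to be judged case by case.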

@ValentineHerr
Member

ValentineHerr commented May 17, 2019

This is a file with all the situations (273 sites) that my code is not addressing. Most of them, though not all, need to be fixed. (It is easier to look at the file in Excel.)
All other cases (where only one stand age is involved, 204 sites) are fixed.

@teixeirak
Member Author

Got it. This is a tricky one to deal with. Some will obviously require manual review, but I'm wondering if there's any way to facilitate that or code away some of the more common problems.

@teixeirak
Member Author

teixeirak commented May 17, 2019

Here's the start of a list I'm making as I assess these:

Plots that show up on this list but are fine:

a. Examples of legitimately independent plots that are handled correctly:

  • Northern Limestone Alps
  • Buckingham-20
  • BOREAS-SOBS (one younger and one older site, where the older site was re-measured multiple consecutive years)
  • Lavigne-warm (just one plot, where the older site was re-measured multiple consecutive years)

b. those where measurements were made in only one year:

  • Yellowstone NP
  • Vielsalm

c. cases where all plots are from the same study (I don't think you'd ever get slightly inconsistent ages from within the same study).

Examples that seem like they'd be solved by counting as the same plot if date is only 1/2 year off:

  • BOREAS NSA-OBS

  • Coweeta-WS6

  • Coweeta-WS13

  • Coweeta-WS17

  • Coweeta-WS18

  • solution: floor year in plot name (implemented)

Those where 1+ plots have NAC for stand age:

Those that are legitimately complicated/ will require review of original pubs:

@teixeirak
Member Author

The 1/2 year issue seems like it may be easily solvable, and seems to be affecting a good number. Sometimes it would be solved by rounding up, other times by rounding down.

@teixeirak
Member Author

There are a number where measurements were made in only one year. If all records are from the same study, they can be counted as legitimately separate plots and removed from the list.
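
One way to express this filter, as a hedged Python sketch rather than the project's R code: it assumes the review list is a dict mapping each site name to its (study_id, study_midyear) records, and the name `drop_single_year_single_study` and the sample data are hypothetical.

```python
def drop_single_year_single_study(sites):
    """Return the site names that still need review.

    sites: dict mapping site name -> list of (study_id, study_midyear).
    A site is dropped from the list when all of its records come from one
    study and were measured in one year, since its plots can then be
    counted as legitimately separate.
    """
    still_to_review = []
    for site, records in sites.items():
        studies = {study for study, _ in records}
        years = {year for _, year in records}
        if len(studies) == 1 and len(years) == 1:
            continue  # single study, single year: remove from the list
        still_to_review.append(site)
    return still_to_review


# Hypothetical review list.
example = {
    "site_one_year": [(101, 2003), (101, 2003)],
    "site_two_years": [(102, 2003), (102, 2004)],
    "site_two_studies": [(103, 2003), (104, 2003)],
}
print(drop_single_year_single_study(example))
```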

@teixeirak
Member Author

When one of the ages is NAC, it's a likely duplicate with another plot but requires manual review. These should be imported as is and will need to be reviewed later. Please make a list of these.
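
Building that list could look like the Python sketch below. It is an illustration only: it assumes records are (site, age) pairs in which a missing age is stored as the string "NAC", and the function name and sample records are hypothetical.

```python
def sites_with_nac_and_other_age(records):
    """List sites where at least one record has age "NAC" and at least one
    other record has a different age value.

    records: iterable of (site, age) tuples; age is a number or "NAC".
    """
    ages_by_site = {}
    for site, age in records:
        ages_by_site.setdefault(site, set()).add(age)
    return sorted(
        site
        for site, ages in ages_by_site.items()
        if "NAC" in ages and len(ages) > 1
    )


# Hypothetical records, not taken from SRDB.
example = [("X", "NAC"), ("X", 40), ("Y", "NAC"), ("Z", 12), ("Z", 30)]
print(sites_with_nac_and_other_age(example))
```

The records themselves are left untouched, in line with importing these sites as is and reviewing them later.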

@ValentineHerr
Member

Can you show me an example where rounding 1/2 year would solve the problem?
And how to automatically decide if it is supposed to be rounded up or down?
I don't see the trick! :-)

@teixeirak
Member Author

Actually, this isn't dealing with consecutive years; rather, it would come in where you decide whether plots were established in the same year. If

Several examples below. In this case, if you count 1971 as the earlier year of measurement, there's a 24-year difference between the first and second measurements, and also a 24-year difference in stand age. One way to solve this (that would actually be better for plot names) would be to floor estimates of year established. We'd rarely say "stand established in 1940.5" or "stand is 70.5 years old". This false precision is essentially what's causing a lot of these problems.

Coweeta-WS6 ____________________________
,29,5
1971.5,0,1
1995,1,0

Coweeta-WS13 ____________________________
,33,9
1971.5,0,1
1995,1,0

Coweeta-WS17 ____________________________
,15,39
1971.5,1,0
1995,0,1

Coweeta-WS18 ____________________________
,44,68
1971.5,1,0
1995,0,1
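
The arithmetic behind the four Coweeta grids above can be checked with a short Python sketch (not the R import code). Each grid has stand ages in the header row and study midyears in the first column. The sketch assumes the year established is computed as midyear minus age and then floored; the helper name `year_established` is hypothetical.

```python
import math


def year_established(study_midyear, age):
    """Estimated establishment year, floored to a whole year."""
    return math.floor(study_midyear - age)


# (study_midyear, stand_age) pairs read from the grids above.
coweeta = {
    "Coweeta-WS6": [(1971.5, 5), (1995, 29)],
    "Coweeta-WS13": [(1971.5, 9), (1995, 33)],
    "Coweeta-WS17": [(1971.5, 15), (1995, 39)],
    "Coweeta-WS18": [(1971.5, 44), (1995, 68)],
}

for site, observations in coweeta.items():
    unfloored = sorted({midyear - age for midyear, age in observations})
    floored = sorted({year_established(midyear, age) for midyear, age in observations})
    # Without flooring each site yields two establishment years half a year
    # apart; with flooring both measurements map to the same year.
    print(site, unfloored, floored)
```

With whole-number ages, flooring the midyear first and then subtracting the age gives the same year as flooring the difference.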

@ValentineHerr
Member

ok, I'll use floor(Study_midyear) when attributing plot.name

@teixeirak
Member Author

@ValentineHerr, I've updated the list above. Could you please try implementing those (hopefully all pretty easy) and see how the list looks?

@ValentineHerr
Member

ValentineHerr commented Jun 6, 2019

Site names are now listed in
sites_where_age_have_been_changed
sites_with_NAC_and_at_least_one_other_age
sites_legitimately_complicated

Note that the lists are smaller than earlier because I was previously also looking at sites that had no records of interest (they only had variables that are not mapped to ForC).

@teixeirak
Member Author

Thanks. This reduces the number of sites that need review to a manageable number.

@teixeirak
Member Author

I just found an example where a single plot is falsely counted as multiple plots. It is "Kurth_2014_isrr research site in Minnesota USA", "Temperate Deciduous Forest. Stand established around 1950/1951/1952". The same study had plots of age 0-1-2 (measured in 3 consecutive years) that are treated correctly. In SRDB, age 60 is given to all 3 instances of the older plot.
