add package maintenance tracker #41
Wade uses the maintenance element consistently. I will take on the task of looking at the W3D to see whether he stores this in Metabase or it comes straight from GCE Toolbox. I looked at the ERD and did not find maintenance. Regardless, I will suggest using the updateFrequency that is already stored in pkg_mgmt to avoid redundancy. Maybe we should compose a generic way of wording the changeHistory, something like "metadata only update for revision (N)" or "time series update for revision (N)", with a few canned examples. I have not looked at how other LTER sites do this (aside from Wade).
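A minimal Python sketch of the canned-wording idea. The dictionary keys, the third template, and the `canned_comment` helper are hypothetical illustrations, not an agreed convention; the point is that every comment is generated from one small set of templates, so the revision number always appears in the same place.

```python
# Hypothetical canned templates, keyed by the scope of the change.
CANNED_COMMENTS = {
    "data": "time series update for revision {rev}",
    "metadata": "metadata only update for revision {rev}",
    "metadata and data": "metadata and data update for revision {rev}",
}


def canned_comment(scope: str, revision: int, note: str = "") -> str:
    """Return a standardized changeHistory comment, with an optional free-text note."""
    if scope not in CANNED_COMMENTS:
        raise ValueError(f"unknown change scope: {scope!r}")
    if revision < 1:
        raise ValueError("revision must be a positive integer")
    text = CANNED_COMMENTS[scope].format(rev=revision)
    # Append the free-text note after a colon only when one is given.
    return f"{text}: {note}" if note else text


print(canned_comment("data", 37))
print(canned_comment("metadata", 12, "corrected temperature units"))
```

Keeping the free-text note separate from the canned prefix leaves room for the descriptive histories discussed below without losing the predictable part.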
Using updateFrequency would jibe with ESIP's recommendations on data citations, as in: Maslanik, J. and J. Stroeve. 1999, updated daily. Near-Real-Time DMSP SSMIS Daily Polar Gridded Sea Ice Concentrations, Version 1. NASA National Snow and Ice Data Center Distributed Active Archive Center. https://doi.org/10.5067/U8C09DWVX9LM. Accessed 2019-02-14.
Here's the KNB controlled vocab on what's allowed as a
@atn38 I have struggled a bit to fit our descriptive change histories into the EML changeHistory element, as there is not always an easy description of oldValue, changeScope, etc., which are required. So, mostly I have been stuffing it into the maintenance description. But we have a long history of changes to our datasets, ranging from 'appended on 2018 data [initials, date]', to 'appended on 2018 data, removed 2014 data from plots x,y,z due to protocol deviations used in those years', to 'changed temperature units to celsius', to 'updated metadata to correct units and definitions, data remains unchanged'. Perhaps that gives you a flavor of the janitorial work that likely accompanies many long-term datasets. oldValue in those cases is not a scalar or anything that is easy to fill out without diffing the files, which anyone very concerned with exactly what changed should certainly do! Nonetheless, I think the verbal descriptions are helpful in case someone wants to know whether a change means they really ought to redo their analyses. I would possibly lean toward just 'date', 'bywhom', and 'description', and then use code to parse those into the /dataset/maintenance/description element?
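A minimal sketch of that three-field approach, assuming each record carries only `date`, `bywhom`, and `description`, and that each record becomes one `para` inside `maintenance/description`. The records and the `build_maintenance` helper are hypothetical; a real version would read from the Metabase table instead of a list.

```python
import xml.etree.ElementTree as ET

# Hypothetical example records using only the three proposed fields.
records = [
    {"date": "2019-03-02", "bywhom": "jdoe",
     "description": "updated metadata to correct units and definitions, data remains unchanged"},
    {"date": "2019-01-15", "bywhom": "jdoe", "description": "appended on 2018 data"},
]


def build_maintenance(records):
    """Build an EML-style <maintenance> element with one <para> per record."""
    maintenance = ET.Element("maintenance")
    description = ET.SubElement(maintenance, "description")
    # ISO 8601 (YYYY-MM-DD) strings sort chronologically as plain text.
    for rec in sorted(records, key=lambda r: r["date"]):
        para = ET.SubElement(description, "para")
        para.text = f"{rec['date']} [{rec['bywhom']}]: {rec['description']}"
    return maintenance


xml_text = ET.tostring(build_maintenance(records), encoding="unicode")
print(xml_text)
```

Writing one `para` per entry keeps the revisions visually separate instead of running them together in one block of text.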
Precisely what @scelmendorf wrote. To answer @atn38's question, I do not add a maintenance note with every revision if it is just routine, such as adding a year's data to a time series. But if it were automated, or even semi-automated, I would, as I think it would reassure the data user. On rare occasions, we find and correct errors in the data. I always note those. (I also email everyone who registered a download, which PASTA users miss out on.) Also, thank you @atn38 for looking in pkg_mgmt for updateFrequency. I thought it was there, but I was wrong. Let's put it in Metabase then, with no redundancy problem.
Too bad. Here's a draft CREATE TABLE:
If the table is populated as below, the empty cell could be a routine update, as you said, @gastil.
It could be converted via R code (I wrote an MRE to create this) into this EML snippet:
Thoughts? Note that
It helps to work from this example. I have not used the description element that way. Here is an example of what I put in the description element:
In your example, the code concatenates each revision note. I am not sure that is ideal, although by using the para element you can separate them. Some of my time series datasets have revisions into the 50s. The example dataset I have open today is in its 37th revision, with 9 changeHistory trees. And MCR is a relatively young site, started in 2005. I use the changeHistory element, and I butcher its intended use when it does not apply. Note that you can put 'na' for a value. Here are some examples:
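A minimal sketch of the 'na' workaround, assuming the EML 2.x child order changeScope, oldValue, changeDate, comment, with comment optional. Only changeScope and oldValue fall back to 'na' here: as far as we understand the schema, changeDate is typed as a date, so 'na' would not validate there. The `change_history` helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def change_history(change_date: str, change_scope: Optional[str] = None,
                   old_value: Optional[str] = None,
                   comment: Optional[str] = None) -> ET.Element:
    """Build one <changeHistory> element with children in schema order.

    changeScope and oldValue fall back to "na" when they do not apply.
    changeDate is always required (we believe the schema types it as xs:date).
    """
    if not change_date:
        raise ValueError("changeDate must be a real date (YYYY-MM-DD)")
    ch = ET.Element("changeHistory")
    ET.SubElement(ch, "changeScope").text = change_scope or "na"
    ET.SubElement(ch, "oldValue").text = old_value or "na"
    ET.SubElement(ch, "changeDate").text = change_date
    if comment:  # comment is optional, so omit the element when empty
        ET.SubElement(ch, "comment").text = comment
    return ch


entry = change_history("2019-02-14", change_scope="data",
                       comment="changed temperature units to celsius")
print(ET.tostring(entry, encoding="unicode"))
```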
As for the confusion over updateFrequency: that is the expected update frequency of the data time series, not of metadata maintenance such as adding ORCIDs and checksums, fixing typos, redesigning tables, taxonomic updates, etc. I could be OK with updateFrequency being in DataSet. It is 1-1 with the dataset and does not change, or very rarely changes.
Probably too late in the EML 2.2 release train to relax the requirement on changeHistory/oldValue and changeHistory/changeScope, @mobb.
Margaret is traveling, but I am 99.9% sure there cannot be any schema changes at this point. The EML dev crew assured folks they could count on the 2.2 schema being stable; only documentation is left to finish. So we can submit an enhancement request for EML 2.3. Here is my earlier example in table format, with an earlier one included. I see I did not always note which revision. I should have. I will enter 'na' for those.
I suggest we always use YYYY-MM-DD, the ISO 8601 date format. And I use the LTER Network user IDs, which we used to call "lno_uid" back when LNO was the LTER Network Office, before your time. Those take the form flast, sometimes with a number appended. I do not know whether that format is still in use; you may have different user ID formats.
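A small validator for the YYYY-MM-DD convention, assuming we want to reject anything that is not strictly that shape. The regex enforces the shape (strptime alone would accept 2019-2-4), and strptime then rejects impossible dates such as 2019-02-30. The `is_iso_date` name is hypothetical.

```python
import re
from datetime import datetime

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value: str) -> bool:
    """True only for strict YYYY-MM-DD strings that are real calendar dates."""
    if not ISO_DATE.fullmatch(value):
        return False  # wrong shape, e.g. 2019-2-14 or 02/14/2019
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False  # right shape but not a real date, e.g. 2019-02-30
    return True


for candidate in ["2019-02-14", "2019-2-14", "2019-02-30", "02/14/2019"]:
    print(candidate, is_iso_date(candidate))
```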
@gastil I really like using "metadata", "data", or "metadata and data" for So now if we have this table:
Feed it through R code to get this EML. There is no place to put in names, but we can concatenate into
I am merely subtracting 1 from the current revision number to get
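A minimal sketch of the revision-minus-one derivation, assuming the target field is oldValue (the field named in the next reply) and that an empty comment marks a routine update with no changeHistory entry, as discussed earlier in the thread. `RevisionRow` and `change_history_rows` are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class RevisionRow(NamedTuple):
    revision: int
    change_date: str        # YYYY-MM-DD
    change_scope: str       # "data", "metadata", or "metadata and data"
    comment: Optional[str]  # None or empty = routine update


def change_history_rows(rows: Iterable[RevisionRow]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (changeScope, oldValue, changeDate, comment) for non-routine revisions.

    oldValue is derived as the previous revision number (revision - 1).
    """
    for row in rows:
        if not row.comment:
            continue  # routine update, e.g. a year appended to a time series
        yield row.change_scope, str(row.revision - 1), row.change_date, row.comment


rows = [
    RevisionRow(36, "2018-06-01", "data", None),
    RevisionRow(37, "2019-02-14", "metadata and data", "corrected temperature units"),
]
print(list(change_history_rows(rows)))
```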
@atn38 there is no oldScope; I think you meant oldValue. I'm not sure the previous revision number is meaningful if we make a practice of including the current revision number in the comment. Although I am leery of using such an undefined field as comment for structured content, we do not have good options until EML enhances the maintenance section. I know many IMs do not use maintenance at all; some may not even realize it exists. It is not one of the parts of EML that receive much attention. Our local catalog does not even display it; the portal does, if it is included in an EML doc. And yet, if I were a data user of a time series, I know I'd rely on maintenance to give me a heads-up, because time series data is not as uniform year-to-year as some assume. Real stuff happens. Some examples of stuff that happened with one of our core time series are viewable under the 'maintenance' section (near the bottom of the page): https://portal.lternet.edu/nis/metadataviewer?packageid=knb-lter-mcr.4.37
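If every comment includes the current revision number, a script can recover it later. A minimal sketch, assuming comments contain the word "revision" followed by the number, optionally in parentheses as in the "(N)" templates proposed earlier; entries that never noted the revision come back as None, matching the 'na' entries mentioned above. The pattern and the `revision_from_comment` helper are hypothetical.

```python
import re
from typing import Optional

# Matches "revision 37" or "revision (37)", case-insensitively.
REVISION_RE = re.compile(r"\brevision\s*\(?(\d+)\)?", re.IGNORECASE)


def revision_from_comment(comment: str) -> Optional[int]:
    """Return the revision number mentioned in a comment, or None if absent."""
    match = REVISION_RE.search(comment)
    return int(match.group(1)) if match else None


for text in ["time series update for revision 37",
             "metadata only update for revision (12)",
             "changed temperature units to celsius"]:
    print(text, "->", revision_from_comment(text))
```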
@gastil, it is |
Yes. I wish I had consistently included the revision number in all my changeHistory entries. Let's make a practice of that. Actually, if we can define a format or template for how we compose our comment field, then in a later version of EML maybe we can parse and re-enter that content with a script. Or, at minimum, we can converge on how we enter the info into the EML. We can store the info in Metabase in fields more specific than EML currently offers. Back to @atn38's draft of the table, I will add a column for changeScope and the PK and FKs:
Would we want to put a CHECK on ChangeScope to be in (data, metadata, data and metadata)? I put 3 columns in the PK to allow separate data and metadata entries for the same revision. Do you think that is overkill?
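A runnable sketch of the CHECK and 3-column PK idea, using SQLite only so it runs standalone; the table and column names are hypothetical, and the foreign keys to the Metabase dataset table are omitted because that table is not shown here. The CHECK uses the three values listed in the question; the exact wording ('data and metadata' vs. 'metadata and data') would still need to be settled.

```python
import sqlite3

# Hypothetical table; a production Metabase DDL would add schemas and FKs.
DDL = """
CREATE TABLE maintenance_change (
    dataset_id   INTEGER NOT NULL,
    revision     INTEGER NOT NULL,
    change_scope TEXT    NOT NULL
        CHECK (change_scope IN ('data', 'metadata', 'data and metadata')),
    change_date  TEXT    NOT NULL,  -- YYYY-MM-DD
    bywhom       TEXT    NOT NULL,  -- e.g. an LTER Network user ID
    comment      TEXT,
    PRIMARY KEY (dataset_id, revision, change_scope)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
insert = "INSERT INTO maintenance_change VALUES (?, ?, ?, ?, ?, ?)"

# The 3-column PK allows separate data and metadata entries for one revision.
conn.executemany(insert, [
    (4, 37, "data", "2019-02-14", "jdoe", "appended 2018 data"),
    (4, 37, "metadata", "2019-02-14", "jdoe", "corrected unit definitions"),
])

rejected = []
for bad_row in [
    (4, 38, "both", "2019-03-01", "jdoe", None),         # violates the CHECK
    (4, 37, "data", "2019-03-01", "jdoe", "duplicate"),  # violates the PK
]:
    try:
        conn.execute(insert, bad_row)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

print(conn.execute("SELECT COUNT(*) FROM maintenance_change").fetchone()[0], rejected)
```

Whether the 3-column PK is overkill mostly depends on how often a single revision touches both data and metadata; with a 2-column PK, such revisions would need 'data and metadata' in one row.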
What do you all think about putting the maintenance content into the pkg_mgmt schema? It occurs to me that the content feels related. |
Agree that the content feels related, but to play devil's advocate: I think maintenance content fits in Note that the way the views and R code are written right now, populating |
Closing, since #44 and subsequent edits have implemented this feature.
Per request from @scelmendorf and @gastil
Need to add way to track what was done in maintenance updates to data packages.