support for multiple missing value codes in set_attributes #268
Hi An You are correct that Here is a MRE:
IMO adding support for this to Hope that helps! |
Hi An, thanks for filing this issue! Re:
I agree with @jeanetteclark here. Does @jeanetteclark's "manual" solution work for you, or would writing a helper function for your group work? I could definitely help with that if needed. |
@jeanetteclark I think this would be a great addition to the EML package. We need more of these helpers that make it easier to construct common patterns. Your MRE illustrates how much one needs to know about the EML schema to add the metadata to the EML document. Can you explain why you think this should be a separate function rather than an addition to set_attributes? Seems like this could be syntactically handled as:

    attributes <- data.frame(attributeName = 'length_1',
                             attributeDefinition = 'def1',
                             measurementScale = 'ratio',
                             domain = 'numericDomain',
                             unit = 'meter',
                             numberType = 'real',
                             stringsAsFactors = FALSE,
                             missingValueCode = list(c(code = "A", codeExplanation = "exp 1"),
                                                     c(code = "B", codeExplanation = "exp 2")))

Would that work? Probably also nice if it accepted a non-list version as well in case someone only had to add one missing value code (so just |
@mbjones the code you included above generates a somewhat garbled I like the idea of adding a helper function ( Another option would be to add support to Happy to hear arguments for the other side and write the function :) |
Good point. And I forgot that as.data.frame converts the list to columns, which is indeed garbled (lists can be columns, but they have to be added separately I guess). I like your proposal to be consistent with how enumeratedDomain is handled with a separate |
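To illustrate the point about list arguments, here is a minimal base R sketch (not taken from the thread; the object names `garbled` and `ok` are made up for illustration). It shows that a list passed inside `data.frame()` is spread into separate columns, whereas a list assigned afterwards stays a single list-column.

```r
# A list passed to data.frame() is expanded: each list element becomes a column,
# and the single attribute row is recycled to match the vector length.
garbled <- data.frame(
  attributeName = "length_1",
  missingValueCode = list(c(code = "A", codeExplanation = "exp 1"),
                          c(code = "B", codeExplanation = "exp 2")),
  stringsAsFactors = FALSE
)
dim(garbled)  # two rows and three columns instead of one row per attribute

# Assigning the list after construction keeps one row with a list-column.
ok <- data.frame(attributeName = "length_1", stringsAsFactors = FALSE)
ok$missingValueCode <- list(list(c(code = "A", codeExplanation = "exp 1"),
                                 c(code = "B", codeExplanation = "exp 2")))
dim(ok)
```

This is only meant to show why the one-call form comes out garbled; it says nothing about how set_attributes itself would consume such a column.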
Yeah in thinking about it more - adding the argument to |
@amoeba: I'm working on a workflow/package to take a set of data.frames imported from PostgreSQL views (which will come from a metadata database schema designed for LTER sites), then use rEML under the hood to insert info from these dfs into appropriate slots, then validate and write EML docs. The workflow is meant to be reused with different datasets at different sites. @jeanetteclark thank you for the MRE. Good to know that it's possible. I'd be interested in a general solution however, since I'm not writing custom EML-generation scripts for each dataset. I like the idea of an additional argument to |
+1 for @mbjones's suggestion above (updating set_attributes to have a missingValueCodes argument and a workflow paralleling that of the "factors" argument that currently populates codes/definitions for enumeratedDomain) |
I submitted a PR with the addition of this feature for @cboettig and @amoeba to review - thanks for the suggestion @atn38 and @scelmendorf! |
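For readers following along, a hedged sketch of what a call might look like with a separate long-format table, paralleling the "factors" workflow. The argument name `missingValues` appears in the warning message quoted later in this thread; the column names `attributeName`, `code`, and `definition` are an assumption here, since the thread does not spell them out, so check the package documentation for the installed version.

```r
# Attribute table, reusing the single attribute from the example above.
attributes <- data.frame(
  attributeName = "length_1",
  attributeDefinition = "def1",
  measurementScale = "ratio",
  domain = "numericDomain",
  unit = "meter",
  numberType = "real",
  stringsAsFactors = FALSE
)

# One row per (attribute, code) pair. Column names are ASSUMED, not confirmed here.
missingValues <- data.frame(
  attributeName = c("length_1", "length_1"),
  code = c("A", "B"),
  definition = c("exp 1", "exp 2"),
  stringsAsFactors = FALSE
)

# Only call the package if it is installed and exposes a missingValues argument.
if (requireNamespace("EML", quietly = TRUE) &&
    "missingValues" %in% names(formals(EML::set_attributes))) {
  attributeList <- EML::set_attributes(attributes, missingValues = missingValues)
}
```

Keeping the codes in their own table means an attribute with several codes simply has several rows, and attributes without codes need no placeholder values.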
@jeanetteclark, thank you for the unbelievably quick feature add! Let me know if I got the behavior correct: (1) for a given attribute, the If above is true, then it'll be easier for the LTER metabase users to transition! Edit: did some testing, (1) is true but (2) is not. |
Hi @atn38 - yes you have the behavior correct. In my testing, both of your listed cases are true. Have you set the columns in your The code below shows both of your cases:
I'll submit a separate issue to improve the documentation and error checking for correct column names in the |
I should get into the habit of MREs! Usually lots of moving pieces in workflow. And yes -- more documentation for The second case I'm concerned with is more like this:
Interesting behavior here with Also note how two
Hopefully this makes sense! |
Ah yes, thanks for that example. That definitely is not what we want! I'll look into it |
okay @atn38 I made the function smarter to handle these cases. Would you mind installing the package from my fork to see if it works as you expect now?

    devtools::install_github("jeanetteclark/EML")
|
Just installed in a fresh project with packrat enabled, so I'm pretty sure it's your fork. Is there any way to check definitively? Nothing seems to have changed when the above MRE is run. It still returns the `attribute 'length_3' has missing value codes set in both the 'attributes' and 'missingValues' data.frames. Using codes from 'missingValues' data.frame` error, and `length_3` has two incorrect elements.
|
@atn38 I just updated the documentation for the |
Works beautifully now -- thank you! |
(I looked through prev. issues + set_attributes.R and found no reference to this. Apologies if feature's already implemented.)
It's important for scientific interpretation that metadata documents explanations for missing values: "data not collected due to field conditions" is different from "no specimens were found". The EML spec supports repeated missingValueCode elements.
If we want to implement this, what should the input to set_attributes look like? The simplest option might be for the missingValueCode and missingValueCodeExplanation columns in the data.frame supplied to set_attributes to take two paired comma-separated lists of strings and parse them into sets of missingValueCode EML elements.
Folks over at LTER would like to have this feature in a core metadata database design. How this gets done here might bear on how we do this in the database schema and then in the R workflow to generate EML, which uses this package under the hood.
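A rough base R sketch of the paired comma-separated idea described above, under stated assumptions: the helper `split_codes` and the output column names are hypothetical, not part of the EML package, and explanations that themselves contain commas would need a different separator.

```r
# Hypothetical helper: expand paired comma-separated code/explanation strings
# into one row per missing value code.
split_codes <- function(attributes, sep = ",") {
  rows <- lapply(seq_len(nrow(attributes)), function(i) {
    codes <- trimws(strsplit(attributes$missingValueCode[i], sep, fixed = TRUE)[[1]])
    explanations <- trimws(
      strsplit(attributes$missingValueCodeExplanation[i], sep, fixed = TRUE)[[1]]
    )
    # The two lists must pair up one-to-one.
    if (length(codes) != length(explanations)) {
      stop("unpaired codes and explanations for ", attributes$attributeName[i])
    }
    data.frame(
      attributeName = attributes$attributeName[i],
      code = codes,
      codeExplanation = explanations,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

attributes <- data.frame(
  attributeName = c("length_1", "count_1"),
  missingValueCode = c("A, B", "NA"),
  missingValueCodeExplanation = c(
    "data not collected due to field conditions, no specimens were found",
    "not recorded"
  ),
  stringsAsFactors = FALSE
)

missing_values <- split_codes(attributes)
```

The result has one row per code, which is the shape that repeated missingValueCode elements would need; whether parsing belongs inside set_attributes or in a separate step is the open design question.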