Workflow to ensure users write valid EML? #46
@cboettig One potential source of validation errors that you may not have considered is illegal or reserved XML characters (such as `&` and `<`) in user input. Before you write out the XML, all such characters need to be escaped. Does your S4 class handle this escaping automatically when moving data in and out of R data structures?
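To make the escaping step concrete, here is a minimal sketch in Python rather than R, using the standard library's `xml.sax.saxutils.escape`. It also reflects one subtlety: escaping covers reserved characters such as `&` and `<`, but XML 1.0 forbids some control characters (for example U+0000) outright, even as character references, so those have to be removed rather than escaped. The function name `clean_xml_text` is hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Characters outside the XML 1.0 "Char" production cannot appear in a
# document at all, not even as character references, so they are dropped.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_xml_text(value: str) -> str:
    """Return user input that is safe in XML text or a double-quoted attribute."""
    stripped = _INVALID_XML_CHARS.sub("", value)
    # escape() always handles &, < and >; the extra entry also escapes
    # double quotes so the result can sit inside a quoted attribute value.
    return escape(stripped, {'"': "&quot;"})


print(clean_xml_text('Fish & Chips <2014> "raw"\x00'))
# Fish &amp; Chips &lt;2014&gt; &quot;raw&quot;
```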
Yes, it looks like the R XML library automatically escapes these characters. I noticed this somewhat by accident in my example from the README: https://github.com/ropensci/reml/blob/70c1f8b2747515ae32b770007c84c905f1fda3d3/inst/doc/reml_example.xml. I added an HTML-marked-up link for intellectual rights, which you will see escaped there. How would you suggest that section be marked up properly to include a link to the relevant license? Or should I just stick in the whole license text?
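The escaped link in the README example is what any serializer does when markup is handed to it as a plain string. A minimal sketch with Python's `xml.etree.ElementTree` (standing in for the R XML library) contrasts that with building the link as real child elements. The element names `para`, `ulink`, and `citetitle` and the `url` attribute are an assumption based on EML's DocBook-style text type and should be checked against the EML schema; `example.org` is a placeholder URL.

```python
import xml.etree.ElementTree as ET


def rights_as_text(url: str) -> str:
    # Markup passed in as a plain string is treated as character data,
    # so the serializer escapes it (as in the README example).
    rights = ET.Element("intellectualRights")
    rights.text = f'<a href="{url}">license</a>'
    return ET.tostring(rights, encoding="unicode")


def rights_as_markup(url: str) -> str:
    # Building the link as child elements keeps it as real markup.
    # ASSUMPTION: para/ulink/citetitle and the url attribute follow the
    # DocBook-style EML text type; verify against the EML schema.
    rights = ET.Element("intellectualRights")
    para = ET.SubElement(rights, "para")
    para.text = "Released under "
    link = ET.SubElement(para, "ulink", url=url)
    ET.SubElement(link, "citetitle").text = "this license"
    return ET.tostring(rights, encoding="unicode")


print(rights_as_text("https://example.org/license"))
# <intellectualRights>&lt;a href="https://example.org/license"&gt;license&lt;/a&gt;</intellectualRights>
print(rights_as_markup("https://example.org/license"))
# <intellectualRights><para>Released under <ulink url="https://example.org/license"><citetitle>this license</citetitle></ulink></para></intellectualRights>
```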
I think we should aim for a two-fold strategy: (1) mainly provide constructor functions that make it difficult to create invalid EML, while still supporting direct construction for advanced users, and (2) wrap the validation check in the "publish" functions (with an option to turn it off), but not in the regular "write" functions. We will also expose the validation function so end users can run it themselves if they wish. (Re-tagging the question as "publish" instead of "write".)
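The publish/write split could look roughly like the Python sketch below. The names `write_eml` and `publish_eml`, the `validate` flag, and the pluggable `validator` callable are all hypothetical, not the package's API. A real validator might wrap a schema validator such as lxml's `etree.XMLSchema`; here a toy check stands in for it, and the actual upload step is omitted.

```python
import os
import tempfile
from typing import Callable, List

# A validator takes serialized EML and returns a list of error messages
# (empty when the document is valid).
Validator = Callable[[str], List[str]]


def write_eml(xml_text: str, path: str) -> str:
    """Plain write: save the file without validating it."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(xml_text)
    return path


def publish_eml(xml_text: str, path: str, validator: Validator,
                validate: bool = True) -> str:
    """Publish: validate first (unless switched off), then write.

    Uploading to a repository is left out of this sketch.
    """
    if validate:
        errors = validator(xml_text)
        if errors:
            raise ValueError("EML failed validation:\n" + "\n".join(errors))
    return write_eml(xml_text, path)


if __name__ == "__main__":
    def needs_title(text: str) -> List[str]:
        # Toy stand-in for real schema validation.
        return [] if "<title>" in text else ["missing <title>"]

    out = os.path.join(tempfile.mkdtemp(), "eml.xml")
    doc = "<eml><dataset/></eml>"
    write_eml(doc, out)                                 # never validates
    publish_eml(doc, out, needs_title, validate=False)  # explicit opt-out
    try:
        publish_eml(doc, out, needs_title)              # validates by default
    except ValueError as err:
        print(err)
```

Keeping validation out of the plain write path means saving a draft never fails, while publishing checks by default and still lets advanced users opt out.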
In our early discussions about validation, we agreed it was really just part of the developer testing suite. For a user consuming EML, having the software complain that the file isn't valid isn't really helpful; it's better to just give it our best shot anyway. For writing EML, since the file is generated programmatically, we can ensure it is valid... or can we?
The S4 R objects we use mimic the schema, but they don't enforce required vs. optional slots. (In fact, all slots are always "present" in the S4 objects, so an operational definition of "empty" is that the slot holds an empty S4 object (recursively) or a length-0 character, numeric, or logical vector.) A user can create an S4 object and pass it into their EML file, which seems like a useful and powerful option to have, particularly for reusing elements. If the object is missing some required elements, however, the result will be invalid EML.
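The operational definition of "empty" translates into a short recursive check. In this Python sketch, dataclasses stand in for S4 classes (every field always exists, with an empty default, like an S4 slot) and lists stand in for R vectors. The class names and the `REQUIRED` table are hypothetical and far smaller than EML's real rules.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import List


@dataclass
class IndividualName:
    given_name: List[str] = field(default_factory=list)
    sur_name: List[str] = field(default_factory=list)


@dataclass
class Creator:
    individual_name: IndividualName = field(default_factory=IndividualName)
    organization_name: List[str] = field(default_factory=list)


@dataclass
class Dataset:
    title: List[str] = field(default_factory=list)
    creator: Creator = field(default_factory=Creator)
    abstract: List[str] = field(default_factory=list)


# Hypothetical table of required slots per class (not the full EML rules).
REQUIRED = {"Dataset": ["title", "creator"]}


def is_empty(value) -> bool:
    """Empty = an object whose slots are all empty (recursively),
    or a length-0 vector (a list here)."""
    if is_dataclass(value):
        return all(is_empty(getattr(value, f.name)) for f in fields(value))
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return value is None


def missing_required(obj) -> List[str]:
    """Names of required slots that are empty in obj."""
    names = REQUIRED.get(type(obj).__name__, [])
    return [n for n in names if is_empty(getattr(obj, n))]


print(missing_required(Dataset(title=["Fish survey"])))  # ['creator']
```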
We can avoid this in several ways:
One way is to create objects only through a constructor function rather than by calling `new` directly. This is the strategy we employ so far, but we still permit pre-built S4 nodes to be passed to some constructors to facilitate reuse (which bypasses the protection regarding required elements).
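One possible way to keep reuse while closing that gap is for constructors to re-check any pre-built node they receive. The Python sketch below assumes hypothetical constructors `creator()` and `dataset()` over dataclass stand-ins for S4 classes; it illustrates the idea and does not describe what the package currently does.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Creator:
    organization_name: List[str] = field(default_factory=list)


@dataclass
class Dataset:
    title: List[str] = field(default_factory=list)
    creator: List[Creator] = field(default_factory=list)


def creator(organization_name: str) -> Creator:
    """Constructor: refuses to build a Creator without its required field."""
    if not organization_name:
        raise ValueError("creator: organization_name is required")
    return Creator(organization_name=[organization_name])


def dataset(title: str, creators: List[Creator]) -> Dataset:
    """Constructor that accepts pre-built Creator nodes for reuse."""
    if not title:
        raise ValueError("dataset: title is required")
    if not creators:
        raise ValueError("dataset: at least one creator is required")
    # Pre-built nodes may have been made directly with Creator(), which
    # skips the checks in creator(), so they are re-checked here.
    for c in creators:
        if not c.organization_name:
            raise ValueError("dataset: a pre-built creator lacks organization_name")
    return Dataset(title=[title], creator=list(creators))


shared = creator("Example Org")  # built once, reused across datasets
d1 = dataset("Fish survey", [shared])
d2 = dataset("Bird survey", [shared])
```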