Workflow to ensure users write valid EML? #46

cboettig · 2013-09-04T20:39:42Z

In our early discussions about validation, we agreed it was really just part of the developer testing suite. For a user consuming EML, having the software complain that the file isn't valid isn't really helpful; it's best just to give it our best shot anyway. For writing EML, since this is programmatically generated we can ensure it is valid ... or can we?

The S4 R objects we use mimic the schema, but they don't enforce required vs optional slots. (In fact, all slots are always 'present' in the S4 objects, so an operational definition of "empty" is that the slot holds an empty S4 object (recursively) or a length-0 character/numeric/logical vector.) A user can create an S4 object and pass it into their EML file (seems like a useful/powerful option to have, particularly for reusing elements). If the object is missing some required elements, this will create invalid EML.
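A minimal sketch of that operational definition of "empty", using only the base methods package. The classes and the is_empty helper are hypothetical stand-ins for illustration, not reml's actual definitions:

```r
library(methods)

# Hypothetical classes that loosely mimic two nested schema nodes.
setClass("individualName",
         slots = c(givenName = "character", surName = "character"))
setClass("creator",
         slots = c(individualName = "individualName",
                   organizationName = "character"))

# TRUE when a value is "empty" in the sense described above: a length-0
# vector, or an S4 object whose slots are all (recursively) empty.
is_empty <- function(x) {
  if (!isS4(x)) return(length(x) == 0)
  all(vapply(slotNames(x), function(s) is_empty(slot(x, s)), logical(1)))
}

blank  <- new("creator")                                # every slot at its default
filled <- new("creator",
              individualName = new("individualName", surName = "Boettiger"))

is_empty(blank)   # all slots are present but empty
is_empty(filled)  # one nested slot carries a value
```

Note that new("creator") succeeds even though nothing was supplied, which is exactly why a required element can go missing without any complaint from R.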

We can avoid this in several ways:

1. We could write a validation check as part of each S4 method. Rather tedious, this also seems redundant with the schema validation check. On the other hand, this approach provides a nice warning earlier to the advanced user.
2. We could instead write constructor functions for each object. Also tedious, but this allows clear indication of optional and required parameters and can be easier to use than the new() constructor. This is the strategy we employ so far, but we still permit pre-built S4 nodes to be passed to some constructors to facilitate reuse (thereby bypassing the protection regarding required elements).
3. Run the validator by default on calls to write_eml (would require an internet connection or packaging the schema). If we check only by validating the final EML file, the user may have some trouble finding just what they need to change. On the other hand, it is perhaps the surest way to guarantee validity.
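The first two options above can be sketched in base R. This is only an illustration under assumptions: the class dataset_sketch, its slots, and the choice of title as the required element are all hypothetical, not taken from reml or the EML schema.

```r
library(methods)

# Option 1: a validity function attached to the class. We assume here that
# `title` is required and `pubDate` is optional.
setClass("dataset_sketch",
         slots = c(title = "character", pubDate = "character"),
         validity = function(object) {
           if (length(object@title) == 0) {
             "required slot 'title' is empty"
           } else {
             TRUE
           }
         })

# Option 2: a constructor function. Required arguments have no default,
# optional ones default to an empty value.
dataset_sketch <- function(title, pubDate = character(0)) {
  new("dataset_sketch", title = title, pubDate = pubDate)
}

ok <- dataset_sketch(title = "Example data set")

# A pre-built node made with a bare new() call skips the validity check,
# so the problem only surfaces when validObject() is called explicitly.
bare   <- new("dataset_sketch")
result <- tryCatch(validObject(bare), error = function(e) conditionMessage(e))
print(result)
```

The bare new() call is the bypass described in the second option: R runs the validity function only when slots are supplied to new() or when validObject() is called, so any code that accepts pre-built nodes would need to call validObject() itself.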


mbjones · 2013-09-05T01:33:29Z

@cboettig One potential source of validation errors that you may not have considered is the use of illegal XML characters in the user input. Before you write out the XML, all illegal XML characters need to be escaped. Does your S4 class handle this escaping automatically when moving data in and out of R data structures?

cboettig · 2013-09-05T15:49:10Z

Yes, it looks like the R XML library automatically escapes these characters. (I noticed this somewhat by accident in my example from the README: https://github.com/ropensci/reml/blob/70c1f8b2747515ae32b770007c84c905f1fda3d3/inst/doc/reml_example.xml)

I added an HTML-marked-up link for intellectual rights, which you will see escaped there. (How would you suggest that section be marked up properly to include a link to the relevant license? Or should I just stick in the whole license text?)
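A quick way to check the escaping behaviour, assuming the XML package is installed. The node name and the example.org URL are placeholders, not the contents of the linked file:

```r
library(XML)

# Text containing characters that must be escaped in XML content.
rights <- "See <a href='http://example.org/license'>the license</a> & terms"

# Passing the string as a child creates a text node, not parsed markup.
node <- newXMLNode("intellectualRights", rights)

# Serialising escapes the markup characters ...
xml_text <- saveXML(node)
cat(xml_text, "\n")

# ... while reading the value back returns the original string.
identical(xmlValue(node), rights)
```

The serialised output contains entity references such as &lt; and &amp; in place of the raw characters, so the HTML link is carried as plain text and not as child elements.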

cboettig · 2013-10-15T17:24:02Z

I think we aim for a two-fold strategy: (1) mostly provide constructor functions with which it is difficult to make invalid EML, while still supporting direct construction for advanced users, and then (2) wrap the validation check in the "publish" functions (with an option to toggle it off), but not in the regular "write" functions. We also expose the validation function so end users can run it themselves if they wish. (Re-tagging question as "publish" instead of "write").

To Do:

Add validation check to publish functions
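A rough sketch of how the write/publish split could look. All three function names are hypothetical, and validate_sketch is a stand-in that only checks for a non-empty file so the example runs offline; a real check would validate against the EML schema.

```r
# Stand-in validator, exposed so users can call it directly.
# Assumption: a real version would validate against the EML schema.
validate_sketch <- function(file) {
  file.exists(file) && file.info(file)$size > 0
}

# "write" never validates; it just gives it our best shot.
write_sketch <- function(xml_text, file) {
  writeLines(xml_text, file)
  invisible(file)
}

# "publish" validates by default, with an option to switch the check off.
publish_sketch <- function(xml_text, file, validate = TRUE) {
  write_sketch(xml_text, file)
  if (validate && !validate_sketch(file)) {
    stop("EML failed validation; not publishing: ", file)
  }
  # The upload step would go here.
  invisible(file)
}

tmp <- tempfile(fileext = ".xml")
publish_sketch("<eml/>", tmp)                          # passes the stand-in check
publish_sketch(character(0), tmp, validate = FALSE)    # check skipped on request
```

Keeping the validator as its own function means the same check backs both the default publish path and any manual call by an end user.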

cboettig mentioned this issue Oct 15, 2013

Parse and Validate EML against schema #7

cboettig closed this as completed in b80fc6f Dec 3, 2013
