-
-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EML parser passes records which are missing customUnit definitions #355
Comments
Tests are now failing for the EML parser and validator and need to be fixed.
I can verify this for both EML 2.1.1 that @mobb sent, as well as for EML 2.2.0 documents. The issue seems to arise from customUnit using a different XPath selector for its references than other key/keyref pairs in the document, and so unit validation was skipped in the new EMLValidator class. The original EMLParser class (now Deprecated) configuration included the following key/keyref pair check for units: <key name="unitIdentifierKey">
<selector xpath="//unitList/unit"/>
<field xpath="@id"/>
</key>
<keyref name="unitReference" refer="unitIdentifierKey">
<selector xpath="//unit/customUnit"/>
<field xpath="."/>
</keyref> This was not implemented in |
I commited a fix for this to the In the process, I found several erroneous test documents in the test suite, which are now also fixed in the branch. I updated the validation rules documentation to reflect this additional rule which got lost in translation from the previous release. The validation algorithm for units is now: // If a customUnit is referenced in a document, it must have a
// corresponding STMML unit definition in the document with a matching
// @id
ArrayList<String> unitIdentifiers = getXPathValues("//*[local-name() = 'unitList']/*[local-name() = 'unit']/@id");
ArrayList<String> unitReferences = getXPathValues("//unit/customUnit");
for (String s : unitReferences) {
if (!unitIdentifiers.contains(s)) {
errors.add("Invalid: Custom unit definition missing: " + s);
isValid = false;
}
} @cboettig and @amoeba and @mobb Please review and let me know if that looks correct to you. If so, we'll need to be sure to add a corresponding validation check to This set of changes does not change the EML schema, but it does affect any system that might have done a validation check before accepting content, such as Metacat. @taojing2002 will need to incorporate a new built version of the eml.jar into Metacat, and we'll need to redeploy the online parser on the KNB. We will also need to redeploy the EML site documentation to reflect the updated validation documentation. Comments appreciated. |
This fix looks like it will catch docs with missing unit definitions. HOWEVER: I am pretty sure that the previous parse behavior was to match the IMO, |
Hey @mobb thanks for the feedback. I accidentally started with |
@taojing2002 When this upgrade is applied to Metacat on our repositories and for DataONE, it may find that documents that previously passed the EMLValidator no longer do so.... which concerns me, and we need a strategy. It would be interesting to find out which documents fail before we deploy it widely (which we might be able to do with a modified quality check with @gothub that runs the new validator on all DataONE content). |
Sam Csik found this example of a doc missing a few custom unit defs that passed Metacat's validation so I'll add it here https://search.dataone.org/view/doi:10.18739/A29882N5H. |
The EML parser: https://knb.ecoinformatics.org/emlparser/parse
no longer traps datasets which are missing their customUnit definitions in metadata. I.e., this passes the parser (no additionalMetadata section)
:
Test doc is uploaded (with extension .txt for github)
edi.9.0_2.1.1_custom_unit_missing.txt
The text was updated successfully, but these errors were encountered: