Consider changing default xsi:schemaLocation value to point to a remote copy of the eml.xsd
#44
Comments
Thanks Bryce! I'm for moving to
LGTM. As @amoeba said, it is an optional attribute, so you could also just leave it out altogether. But your proposal seems good too.
PR sent. @cboettig take a look when you get a chance?
@cboettig asked a question on this. I don't think we should be doing any string parsing of

<?xml version="1.0"?>
<eml:eml
    packageId="doi:10.xxxx/eml.1.1" system="knb"
    xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dataset id="dataset-01">
    ...
  </dataset>
</eml:eml>

Validation of that structure explicitly starts with validation of the root

In contrast, this structure explicitly is rooted at a different root schema, namely

<?xml version="1.0"?>
<ns2:dataset id="dataset-01"
    xmlns:ns2="https://eml.ecoinformatics.org/eml-dataset-2.2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  ...
</ns2:dataset>

This is everything one needs to know to determine its type, and it can be resolved as long as the validator has been configured with a canonical set of schemas for each of the referenced namespaces. This is usually done via an XML Catalog file, but each parser provides multiple mechanisms to pass in namespace-to-schema-file hashes. If a local schema hasn't been provided for a referenced namespace, an error is rightfully generated. And you might note that the namespace of the

String parsing for namespaces is highly error-prone, because there are a lot of rules about how to determine the namespace for a given XML element or attribute. At a minimum, you have to consider the prefix and namespace string definitions, the

The only piece of information that can always be ignored by an XML validator is
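The namespace-to-schema lookup described in the comment above can be sketched in a few lines. This is only an illustration in Python using the standard library's `xml.etree.ElementTree`, not the code in `emld` or any validator: the `SCHEMAS` table, its file names, and the helper names `split_qname` and `schema_for_root` are assumptions made up for the example. It lets a namespace-aware parser report the root's namespace, then looks that namespace up in a locally configured table and fails loudly when no schema is registered.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace-to-schema table; the file names are illustrative only.
SCHEMAS = {
    "https://eml.ecoinformatics.org/eml-2.2.0": "eml.xsd",
    "https://eml.ecoinformatics.org/eml-dataset-2.2.0": "eml-dataset.xsd",
}


def split_qname(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


def schema_for_root(xml_text):
    """Pick a schema file from the root element's namespace, not from any hint."""
    root = ET.fromstring(xml_text)
    namespace, local = split_qname(root.tag)
    if namespace not in SCHEMAS:
        # No local schema configured for this namespace: report an error.
        raise LookupError(f"no local schema configured for namespace {namespace!r}")
    return namespace, local, SCHEMAS[namespace]


EML_DOC = """<eml:eml packageId="doi:10.xxxx/eml.1.1" system="knb"
    xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dataset id="dataset-01"/>
</eml:eml>"""

DATASET_DOC = """<ns2:dataset id="dataset-01"
    xmlns:ns2="https://eml.ecoinformatics.org/eml-dataset-2.2.0"/>"""

print(schema_for_root(EML_DOC))
print(schema_for_root(DATASET_DOC))
```

Because the parser resolves the prefix itself, the prefix string (`eml`, `ns2`, or anything else) never enters the lookup; only the namespace URI does.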
@mbjones Thanks for this, I agree 💯 on this, which is why I wanted to flag it again before cutting the next release. I think the key issue here is where you observe:
The validator in
(I do wish I had time to dive in here more -- it's always been a bit frustrating that I haven't been able to secure funding or academic credit that would let me allocate more time to this! But I do really, really appreciate all the excellent pull requests here and on the EML package too!)
Stemming from discussion in ropensci#44, we wanted to change the behavior of `eml_validate` so that it no longer guesses which schema to validate against by parsing the `xsi:schemaLocation` value on the root. Now, `eml_validate` determines the schema file in a more XML-aware way, using a combination of the full QName of the root element and any namespaces defined on the root. See `eml_validate` for two new helpers: `find_real_root_name` and `guess_root_schema`. These are really workarounds for a limitation of `xml2` that may be removed in the future. Basically, `xml2` can't give us the full QName of the root element, so we can't know which namespace the root is from. To work around this, we parse the document with regular expressions (hacky but not horrible) to find the QName and go from there.
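The regular-expression workaround described in this commit message can be illustrated roughly as follows. This is a Python sketch under stated assumptions, not the R helpers `find_real_root_name` or `guess_root_schema`: the function name `guess_root` and the simplified `NAME` pattern are hypothetical. It skips the XML declaration, processing instructions, and comments, takes the first start tag as the root, and resolves the root's prefix against the `xmlns` declarations on that same tag. It deliberately ignores corner cases such as `>` inside attribute values.

```python
import re

# Simplified XML name pattern (an assumption; real XML names allow more characters).
NAME = r"[A-Za-z_][\w.\-]*"


def guess_root(xml_text):
    """Return (prefix, local_name, namespace_uri) for the root start tag."""
    # Remove the XML declaration, processing instructions and comments first.
    body = re.sub(r"<\?.*?\?>|<!--.*?-->", "", xml_text, flags=re.DOTALL)
    # The first start tag is the root; capture an optional prefix and the local name.
    tag = re.search(rf"<(?:({NAME}):)?({NAME})[^>]*>", body)
    if tag is None:
        raise ValueError("no root element found")
    prefix, local = tag.group(1), tag.group(2)
    # A prefixed root needs xmlns:prefix; an unprefixed root uses the default xmlns.
    attr = "xmlns" if prefix is None else "xmlns:" + re.escape(prefix)
    decl = re.search(rf"""\s{attr}\s*=\s*["']([^"']*)["']""", tag.group(0))
    return prefix, local, decl.group(1) if decl else None


doc = """<?xml version="1.0"?>
<!-- a comment before the root -->
<ns2:dataset id="dataset-01"
    xmlns:ns2="https://eml.ecoinformatics.org/eml-dataset-2.2.0"/>"""

print(guess_root(doc))
```

The resulting namespace URI, rather than the `xsi:schemaLocation` hint, is then what a schema lookup would key on.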
As discussed in ropensci/EML#292, the current default behavior of `emld` is to set the `xsi:schemaLocation` to values like `https://ecoinformatics.org/eml-2.2.0/ eml.xsd`, which matches the examples in https://github.com/NCEAS/eml but isn't useful for validation tools that follow remote URIs to find a copy of the schema. @scelmendorf had to set a custom value to get validation to work, which I'd argue isn't desirable.

We should discuss whether it makes sense to set a value like `https://eml.ecoinformatics.org/eml-2.2.0/eml.xsd` instead of just `eml.xsd` so that, by default and with no tweaking from the user, validation tools like Oxygen or any XML-aware text editor can validate the documents `EML`/`emld` produces.

What do others think about this?
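To make the proposal concrete: `xsi:schemaLocation` holds whitespace-separated pairs of namespace URI and schema location hint, so switching from `eml.xsd` to a remote copy only changes the second item of a pair. The sketch below uses Python's standard library purely for illustration; it is not how `emld` writes the attribute, the helper names `schema_location_pairs` and `set_schema_location` are hypothetical, and the namespace URI used is an assumption taken from the example documents in this thread.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"  # assumed namespace URI


def schema_location_pairs(root):
    """Return xsi:schemaLocation as a {namespace: location} dict."""
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    return dict(zip(tokens[0::2], tokens[1::2]))


def set_schema_location(root, namespace, location):
    """Set or replace the location hint for one namespace, keeping other pairs."""
    pairs = schema_location_pairs(root)
    pairs[namespace] = location
    value = " ".join(f"{ns} {loc}" for ns, loc in pairs.items())
    root.set(f"{{{XSI}}}schemaLocation", value)


DOC = f"""<eml:eml xmlns:eml="{EML_NS}" xmlns:xsi="{XSI}"
    xsi:schemaLocation="{EML_NS} eml.xsd"
    packageId="doi:10.xxxx/eml.1.1" system="knb">
  <dataset id="dataset-01"/>
</eml:eml>"""

root = ET.fromstring(DOC)
print(schema_location_pairs(root))  # location hint is the bare file name
set_schema_location(root, EML_NS, "https://eml.ecoinformatics.org/eml-2.2.0/eml.xsd")
print(schema_location_pairs(root))  # location hint is now a resolvable URL
```

A tool that follows location hints can fetch the second form directly, whereas the bare `eml.xsd` only resolves if a copy of the schema sits next to the document.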