New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove dependency of eml_validate on schemaLocation #53
Conversation
Stemming from discussion in ropensci#44, we wanted to change the behavior of `eml_validate` to no longer validate by guessing the schema to validate by parsing the `xsi:schemaLocation` value on the root. Now, `eml_validate` determines the schema file to validate with in a more XML-aware way using a combination of the full QName of the root element and any namespaces defined on the root. See eml_validate for two new helpers: `find_real_root_name` and `guess_root_schema`. These are really workarounds for a limitation of `xml2` that may be removed in the future. Basically, `xml2` can't give us the full QName of the root element so we can't know which namespace the root is from. To work around thsi, we parse the document with regular expressions (hacky but not horrible) to find the QName and go from there.
As a note, I ran R CMD CHECK on emld and all is good. I also ran the tests on |
👏 👏 🎉 Thanks for this @amoeba. Now I recall running into the QName problem when trying to get validation to work initially. Still surprised this is missing from the Minor thing, but can you just maybe add a test file that doesn't specify a It's probably a subject for a separate PR and maybe not mission critical enough, but I think that with this fix some of the logic in test-validate.R round trip checks could be simplified. Most importantly, I think there are some test files that had been skipped for wrongly throwing validation errors, and it would be good to get them un-skipped. I think the first step there is probably extending the |
@amoeba also can you maybe give a concise statement of the issue into NEWS.md? Of course can link the PR and related issues. |
Sure thing. Good call on this: The validator still works but the roundtrip test breaks. Should be a quick fix.
Yep! I'll try to wrap this today. |
Also adds a pattern in the roundtrip tests to let us validate and roundtrip docs that are supposed to be invalid (negative tests). Ref ropensci#53
Removes the dep on schemaLocation in the test suite I missed earlier. Ref ropensci#53
27dac4a
to
b6da367
Compare
All ready for a look. See NEWS.md in this PR for a description of the changes. To add tests for docs without We'll have to have a look in the future about cleaning up the validation tests as we've been discussing here and on Slack. |
Awesome, this looks really good to me, and glad to have these issues addressed, even given the limitations with xml2 bindings. I'm happy to merge this and prepare a CRAN update. @mbjones let me know if you want to give this a code review first? |
Last night I read the source for the |
Refactors find_real_root_name and guess_root_schema in light of me figuring out how to actually use `xml2::xml_name`. Much simpler now. Goes with ropensci#52. Also removes guess_ns helper from test-validate.R in favor of `find_real_root_name` which is now an internal helper.
A lot simpler now: c746ad5. |
I just read through and this all sounds like an improvement. I don't have cycles to review this week, so I'd say go for it and we can always tweak it later. I think getting this out will be an improvement that everyone will benefit from. So, 👍 from me @cboettig On a side note, I still don't see why we need to determine the QName of the root node at all to get validation to work. I'll need to look more carefully at the code, but it seems to me that we shouldn't have to tell
So we should be able to add a So, a bit more to discuss here, but maybe we should do that under a new issue? |
Thanks @mbjones. Re: |
@amoeba @mbjones Thanks both for the feedback! Merging this now but please do open another issue to track the above discussion. Sounds like it might be worth a bug report against Unrelated, but you can see |
Stemming from discussion in #44, we wanted to change the behavior of
eml_validate
to no longer validate by guessing the schema to validate by parsing thexsi:schemaLocation
value on the root. Now,eml_validate
determines the schema file to validate with in a more XML-aware way using a combination of the full QName of the root element and any namespaces defined on the root.See eml_validate for two new helpers:
find_real_root_name
andguess_root_schema
. These are really workarounds for a limitation ofxml2
that may be removed in the future. Basically,xml2
can't give us the full QName of the root element so we can't know which namespace the root is from. To work around thsi, we parse the document with regular expressions (hacky but not horrible) to find the QName and go from there.Closes #52