Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upRemove dependency of eml_validate on schemaLocation #53
Conversation
Stemming from discussion in #44, we wanted to change the behavior of `eml_validate` to no longer validate by guessing the schema to validate by parsing the `xsi:schemaLocation` value on the root. Now, `eml_validate` determines the schema file to validate with in a more XML-aware way using a combination of the full QName of the root element and any namespaces defined on the root. See eml_validate for two new helpers: `find_real_root_name` and `guess_root_schema`. These are really workarounds for a limitation of `xml2` that may be removed in the future. Basically, `xml2` can't give us the full QName of the root element so we can't know which namespace the root is from. To work around thsi, we parse the document with regular expressions (hacky but not horrible) to find the QName and go from there.
|
As a note, I ran R CMD CHECK on emld and all is good. I also ran the tests on |
|
Thanks for this @amoeba. Now I recall running into the QName problem when trying to get validation to work initially. Still surprised this is missing from the Minor thing, but can you just maybe add a test file that doesn't specify a It's probably a subject for a separate PR and maybe not mission critical enough, but I think that with this fix some of the logic in test-validate.R round trip checks could be simplified. Most importantly, I think there are some test files that had been skipped for wrongly throwing validation errors, and it would be good to get them un-skipped. I think the first step there is probably extending the |
|
@amoeba also can you maybe give a concise statement of the issue into NEWS.md? Of course can link the PR and related issues. |
Sure thing. Good call on this: The validator still works but the roundtrip test breaks. Should be a quick fix.
Yep! I'll try to wrap this today. |
|
All ready for a look. See NEWS.md in this PR for a description of the changes. To add tests for docs without We'll have to have a look in the future about cleaning up the validation tests as we've been discussing here and on Slack. |
|
Awesome, this looks really good to me, and glad to have these issues addressed, even given the limitations with xml2 bindings. I'm happy to merge this and prepare a CRAN update. @mbjones let me know if you want to give this a code review first? |
|
Last night I read the source for the |
Refactors find_real_root_name and guess_root_schema in light of me figuring out how to actually use `xml2::xml_name`. Much simpler now. Goes with #52. Also removes guess_ns helper from test-validate.R in favor of `find_real_root_name` which is now an internal helper.
|
A lot simpler now: c746ad5. |
|
I just read through and this all sounds like an improvement. I don't have cycles to review this week, so I'd say go for it and we can always tweak it later. I think getting this out will be an improvement that everyone will benefit from. So, On a side note, I still don't see why we need to determine the QName of the root node at all to get validation to work. I'll need to look more carefully at the code, but it seems to me that we shouldn't have to tell
So we should be able to add a So, a bit more to discuss here, but maybe we should do that under a new issue? |
|
Thanks @mbjones. Re: |
|
@amoeba @mbjones Thanks both for the feedback! Merging this now but please do open another issue to track the above discussion. Sounds like it might be worth a bug report against Unrelated, but you can see |
Stemming from discussion in #44, we wanted to change the behavior of
eml_validateto no longer validate by guessing the schema to validate by parsing thexsi:schemaLocationvalue on the root. Now,eml_validatedetermines the schema file to validate with in a more XML-aware way using a combination of the full QName of the root element and any namespaces defined on the root.See eml_validate for two new helpers:
find_real_root_nameandguess_root_schema. These are really workarounds for a limitation ofxml2that may be removed in the future. Basically,xml2can't give us the full QName of the root element so we can't know which namespace the root is from. To work around thsi, we parse the document with regular expressions (hacky but not horrible) to find the QName and go from there.Closes #52