namespace conflict introduced when importing/exporting EML generated under older schema #347

Open
RobLBaker opened this issue Jan 4, 2023 · 0 comments

I ran across this interesting issue with an older EML file. I downloaded the file, imported it using EML::read_eml(), and then wrote it back to .xml using EML::write_eml(). The result was an EML file whose root element declares conflicting schema versions, yet it still passes the EML::eml_validate() check. Steps to reproduce:

I downloaded a data package from EDI: https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-and.4780.4

The file knb-lter-and.4780.4.xml is an EML formatted file. Upon download, the initial eml tag in knb-lter-and.4780.4.xml looks like so:

<eml:eml xmlns:ds="eml://ecoinformatics.org/dataset-2.1.1" xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" packageId="knb-lter-and.4780.4" system="https://pasta.edirepository.org/" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 http://nis.lternet.edu/schemas/EML/eml-2.1.1/eml.xsd">

I then imported to R with EML::read_eml and wrote it back to .xml:

mymeta <- EML::read_eml("knb-lter-and.4780.4.xml", from = "xml")
View(mymeta)
EML::write_eml(mymeta, "exportedEML.xml")

And when I open the new "exportedEML.xml" file I see:

<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.2" xmlns:ds="eml://ecoinformatics.org/dataset-2.1.1" packageId="knb-lter-and.4780.4" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 http://nis.lternet.edu/schemas/EML/eml-2.1.1/eml.xsd" system="https://pasta.edirepository.org">

It appears that even though the xmlns:eml attribute is now eml-2.2.0, the schema location (xsi:schemaLocation=) and xmlns:ds both still indicate the original EML 2.1.1.
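A mismatch like this can be detected by inspecting the root element directly. The following is only a minimal sketch, assuming the xml2 package is installed; the helper name eml_root_versions is hypothetical (it is not part of EML or emld), and the inline XML is an abridged copy of the exported root tag shown above. It pulls the first x.y.z version string out of xmlns:eml, xmlns:ds, and xsi:schemaLocation and flags the file if they disagree.

```r
library(xml2)

# Hypothetical helper (not part of EML or emld): report the version string
# found in each versioned attribute of the root element of an EML document.
eml_root_versions <- function(doc) {
  root <- xml_root(doc)
  ns <- xml_ns(doc)

  # Look up a namespace URI by prefix; NA if the prefix is not declared.
  ns_uri <- function(prefix) unname(unclass(ns)[prefix])

  sources <- c(
    "xmlns:eml" = ns_uri("eml"),
    "xmlns:ds" = ns_uri("ds"),
    "xsi:schemaLocation" = xml_attr(root, "xsi:schemaLocation", ns = ns)
  )

  # Extract the first x.y.z version string from each value.
  pattern <- "[0-9]+\\.[0-9]+\\.[0-9]+"
  vapply(sources, function(x) {
    if (is.na(x)) return(NA_character_)
    m <- regmatches(x, regexpr(pattern, x))
    if (length(m) == 0) NA_character_ else m
  }, character(1))
}

# Abridged copy of the root tag from the exported file, made self-closing so
# that it parses on its own. For a real file, use read_xml("exportedEML.xml").
exported_root <- paste0(
  '<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" ',
  'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ',
  'xmlns:ds="eml://ecoinformatics.org/dataset-2.1.1" ',
  'xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 ',
  'http://nis.lternet.edu/schemas/EML/eml-2.1.1/eml.xsd"/>'
)

versions <- eml_root_versions(read_xml(exported_root))
print(versions)

# More than one distinct version means the root element is inconsistent.
if (length(unique(versions[!is.na(versions)])) > 1) {
  message("Root element mixes EML versions: ",
          paste(unique(versions), collapse = ", "))
}
```

This only compares version strings in the root attributes; it does not validate the document against either schema.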

Both files validate using EML::eml_validate(). I assume this is because the EML package does not actually use the namespace within the EML file to identify the schema to validate against, but instead has that schema hardcoded elsewhere.

I understand it is possible to tell EML to switch between schema versions, but I still think this qualifies as a potential bug. I can see users generating an EML file under one schema and (perhaps years later) updating it under a second schema. In that scenario, this namespace conflict is easily introduced. If the default is to update everything to the latest schema, that should be done consistently.

On a side note, it would be nice to preserve the evolution of an EML file if it is edited under multiple different schemas during its lifetime (for instance, as a data package is incrementally added to and versioned). But I think there is likely a better place to systematically implement that version history than the eml namespace.
