Service Provider: some OAI-PMH responses are not parseable by XOAI due to XML binding conflicts #141

Open
eduardorep opened this issue Mar 31, 2023 · 13 comments
Labels
enhancement New feature or request question Further information is requested service-provider Related to Service Provider implementation

Comments

@eduardorep

Hello, I'd like to know whether you have by any chance faced the issue DSpace#67 and, if so, whether you resolved it or have any knowledge that might help us resolve it.

@pdurbin
Member

pdurbin commented Mar 31, 2023

@eduardorep hi! Do you have a URL we can harvest from to test this?

@eduardorep
Author

@poikilotherm
Member

poikilotherm commented Mar 31, 2023

Hi @eduardorep just to make sure I got this right: we are talking about the service provider here, using it to harvest a resource, aye?

I tried to read up on the other issues and it seems like you want this to parse just fine, right? Have you tried using our new service provider yet, as it already has an updated version of Woodstox, which might change things already?

@eduardorep
Author

Haven't tried yet because we were trying to understand if this would solve our issues. Since using this lib would bring breaking changes, I was just trying to understand if this issue had been tackled explicitly. But it seems like your suggestion might be a viable option. Thank you very much, we will likely try it :) Have a nice one!

@poikilotherm
Member

Please feel free to come back anytime! This is probably something that would affect Dataverse installations around the world. Fixing this would definitely be in scope!

@jfeio

jfeio commented Apr 3, 2023

Hey, so we upgraded our XOAI to use this fork, so that we could test whether the issue described in DSpace#67 still occurs, and I'm afraid it does.

For the records listed in the following response:

https://doaj.org/oai?verb=ListRecords&metadataPrefix=oai_dc&setSpec=TENDOkRlcm1hdG9sb2d5

Processing returns the error: The prefix "xsi" for attribute "xsi:schemaLocation" associated with an element type "oai_dc:dc" is not bound.

This issue seems to be caused by the fact that the namespace "xmlns:xsi" is only defined in the root OAI-PMH element, and not in each oai_dc:dc element.
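To illustrate the diagnosis above, here is a minimal sketch that uses only the JDK's built-in XML parser, not XOAI. It assumes the record parser sees each `oai_dc:dc` element in isolation, which is a reading of the error message rather than something confirmed in this thread. The class name `UnboundPrefixDemo` and the sample XML are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class UnboundPrefixDemo {

    // Hypothetical record payload: uses the xsi prefix but never declares it.
    static final String FRAGMENT =
        "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
        + " xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/oai_dc/"
        + " http://www.openarchives.org/OAI/2.0/oai_dc.xsd\"/>";

    // The same payload inside a root element that declares xmlns:xsi.
    static final String WRAPPED =
        "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
        + "<metadata>" + FRAGMENT + "</metadata></OAI-PMH>";

    /** Returns true if the text is accepted by a namespace-aware parser. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps stderr quiet.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fragment alone parses: " + parses(FRAGMENT));
        System.out.println("inside the root parses: " + parses(WRAPPED));
    }
}
```

The point of the comparison is that the prefix is in scope only while the element stays inside the root that declares it; once the element is handled on its own, a namespace-aware parser has nothing to bind `xsi` to.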

While this issue is ultimately caused by DOAJ's non-compliance with the OAI-PMH specification, it would be great if the XOAI parser could be configured to ignore namespace errors, or to add namespaces defined in the root element to any invalid nodes.

However, I believe this would be a complicated change, and would probably not be relevant for Dataverse. Do correct me if I'm wrong however :)

@pdurbin
Member

pdurbin commented Apr 3, 2023

@eduardorep @jfeio are you aware of other systems besides DOAJ that are out of compliance with the spec in this way? I'm wondering how common of a problem this is.

Are either of you interested in creating a pull request? (If so, before you start, I'd like to hear what @landreev and @poikilotherm think.)

@eduardorep
Author

eduardorep commented Apr 3, 2023

Yes there is another one, ScieloBR: DOAJ/doaj#2186 (comment)

Their website: https://www.scielo.br/

An example of a list record from that repository: https://oaipmh.scielo.org/br/oai?verb=ListRecords&metadataPrefix=oai_dc

Hope this helps!

@poikilotherm
Member

poikilotherm commented Jun 30, 2023

Hi @eduardorep and @jfeio !

I looked into this again today and put some thought into it. Dataverse does not have the problem you describe, as we are not using the record parser in this project, but a custom one.

In the data provider, we had kind of a similar problem: we create some XML files already and wanted to "just include them" in the response. So maybe the same trick would be useful here, too? Would you benefit from using such a CopyElement that would simply transfer the content inside <metadata></metadata> unprocessed?

It would be part of the resulting Record's Metadata. From there, you could write it to a String or whatever you need using an XmlWriter.

As for configuring when to go one way or the other, the Context you provide to the ServiceProvider can hold the information about your choices.
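As a rough illustration of the "copy the content of `<metadata>` as-is" idea, here is a sketch that uses only JDK DOM and JAXP classes. It is not XOAI's `CopyElement` and does not use `XmlWriter`, `Context`, or `ServiceProvider`; the class `MetadataCopier`, its method, and the sample response are hypothetical. The one thing it adds to a plain copy is that prefixed namespace declarations from ancestor elements are carried over, so the copied payload can stand on its own.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MetadataCopier {

    // Hypothetical response: xmlns:xsi is declared on the root only.
    static final String SAMPLE =
        "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
        + "<ListRecords><record><metadata>"
        + "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
        + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
        + " xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/oai_dc/"
        + " http://www.openarchives.org/OAI/2.0/oai_dc.xsd\">"
        + "<dc:title>T</dc:title></oai_dc:dc>"
        + "</metadata></record></ListRecords></OAI-PMH>";

    /** Returns the payload of the first metadata element as standalone XML text. */
    static String copyFirstMetadata(String oaiResponse) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(oaiResponse)));

            Node metadata = doc.getElementsByTagNameNS("*", "metadata").item(0);
            Element payload = firstChildElement(metadata);
            if (payload == null) {
                throw new IllegalArgumentException("no metadata payload found");
            }
            inheritPrefixDeclarations(payload);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(payload), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                 | TransformerException e) {
            throw new IllegalArgumentException("could not copy metadata", e);
        }
    }

    private static Element firstChildElement(Node parent) {
        if (parent == null) {
            return null;
        }
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }

    // Copies xmlns:prefix declarations from ancestors, nearest first, so a closer
    // declaration wins. The default namespace is left out to keep the sketch short.
    private static void inheritPrefixDeclarations(Element payload) {
        for (Node n = payload.getParentNode(); n instanceof Element; n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attr = (Attr) attrs.item(i);
                if (attr.getName().startsWith("xmlns:") && !payload.hasAttribute(attr.getName())) {
                    payload.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI,
                        attr.getName(), attr.getValue());
                }
            }
        }
    }
}
```

Calling `MetadataCopier.copyFirstMetadata(MetadataCopier.SAMPLE)` returns the `oai_dc:dc` element as text. Because the whole response is parsed first, the payload itself is never validated in isolation, which is the property the suggestion above is after.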

@poikilotherm poikilotherm changed the title Problem with XOAI metadata harvesting Service Provider: some OAI-PMH responses are not parseable by XOAI due to XML binding conflicts Jun 30, 2023
@poikilotherm poikilotherm added bug Something isn't working enhancement New feature or request service-provider Related to Service Provider implementation labels Jun 30, 2023
@poikilotherm poikilotherm added question Further information is requested and removed bug Something isn't working labels Jun 30, 2023
@poikilotherm
Member

@eduardorep @jfeio please also feel free to join us on Zulip to discuss this less async. Here's an invite link, see you on the dev channel!

@jfeio

jfeio commented Jun 30, 2023

Hi @poikilotherm! We actually ended up creating a fork of DSpace/xoai, and we adapted the record parser so that it detects whether any given element contains the "xsi" property without declaring its namespace; if this is true, the parser adds the missing declaration to the offending element before validating it.

This solution is not as generic as the solution you are implementing, but for our purposes, it works fine ;)
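The fork's code is not shown in this thread, so the following is only a guess at the general shape of such a patch, written against plain strings with `java.util.regex`. The class `XsiNamespacePatcher` and its method are hypothetical. It looks at the first start tag of a fragment only, and it assumes attribute values contain no literal `<` or `>`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XsiNamespacePatcher {

    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    // First start tag: element name, raw attribute text, then ">" or "/>".
    private static final Pattern START_TAG =
        Pattern.compile("<([A-Za-z_][\\w.:-]*)([^<>]*?)(/?>)");
    // An attribute that uses the xsi prefix, e.g. xsi:schemaLocation="...".
    private static final Pattern USES_XSI = Pattern.compile("\\sxsi:[\\w.-]+\\s*=");
    // An explicit declaration of the xsi prefix.
    private static final Pattern DECLARES_XSI = Pattern.compile("\\sxmlns:xsi\\s*=");

    /**
     * Adds xmlns:xsi to the fragment's first element if that element uses an
     * xsi-prefixed attribute without declaring the prefix. Otherwise the
     * fragment is returned unchanged.
     */
    static String addMissingXsi(String fragment) {
        Matcher tag = START_TAG.matcher(fragment);
        if (!tag.find()) {
            return fragment;
        }
        String attributes = tag.group(2);
        boolean usesXsi = USES_XSI.matcher(attributes).find();
        boolean declaresXsi = DECLARES_XSI.matcher(attributes).find();
        if (!usesXsi || declaresXsi) {
            return fragment;
        }
        // Insert the declaration right after the element name.
        int insertAt = tag.end(1);
        return fragment.substring(0, insertAt)
            + " xmlns:xsi=\"" + XSI_NS + "\""
            + fragment.substring(insertAt);
    }

    public static void main(String[] args) {
        String broken = "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
            + " xsi:schemaLocation=\"a b\"/>";
        System.out.println(addMissingXsi(broken));
    }
}
```

A patch like this is narrow by design: it repairs one known prefix on one element, which matches the trade-off described above between a targeted fix and a more generic one.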

@poikilotherm
Member

Feel free to point me to your implementation or create a pull request. Always happy to add something like this; the fewer forks to maintain, the better.

@pdurbin
Member

pdurbin commented Jul 12, 2023

@jfeio hi! I'm also curious about your implementation. Is the commit online?
