Switching to gdcc/xoai v5.2.0 (9910) #10012
Merged
What this PR does / why we need it:
This PR contains no code changes in the Dataverse source; it only changes the dataverse-parent pom file, switching to the newly released v5.2.0 of the gdcc/xoai library. This fixes the problem affecting (some) OAI records with UTF-8 characters (see the original issue; the actual problem is described in detail in an issue in the gdcc/xoai repo).
Which issue(s) this PR closes:
Closes #9910
Special notes for your reviewer:
Suggestions on how to test this:
The condition is somewhat tricky to reproduce on purpose, but it can be tested very easily using a known affected export and some cheating; I provide the details below.
The bug is triggered when a multi-byte UTF-8 character is split in two at exactly the 1024-byte offset in the harvestable metadata export. Even if you create a test dataset with all the metadata encoded in multi-byte UTF-8 (Chinese, etc.), there is still a chance that this condition will not be met. It could then be triggered by padding the export with extra characters, but it's easier to cheat as follows.
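To illustrate the failure mode described above, here is a minimal Python sketch. It is not the gdcc/xoai code (that library is Java); it only shows, under the assumption that the export is decoded in independent 1024-byte chunks, how a multi-byte character straddling the chunk boundary gets corrupted, and how a stateful decoder avoids this. The function names and the sample text are hypothetical, and Python's U+FFFD replacement character stands in for the question mark seen in the affected record.

```python
import codecs

CHUNK_SIZE = 1024  # the offset named in the bug description


def decode_chunks_naive(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Decode each chunk on its own; a character split across chunks is lost."""
    parts = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # errors="replace" substitutes U+FFFD for the broken byte sequences
        parts.append(chunk.decode("utf-8", errors="replace"))
    return "".join(parts)


def decode_chunks_incremental(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Decode chunk by chunk, carrying incomplete sequences over to the next chunk."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = []
    for start in range(0, len(data), chunk_size):
        parts.append(decoder.decode(data[start:start + chunk_size]))
    parts.append(decoder.decode(b"", final=True))  # flush; raises if input is truncated
    return "".join(parts)


# Hypothetical sample: pad so that the 3-byte apostrophe in "farmer’s"
# starts at byte 1023 and therefore straddles the 1024-byte boundary.
text = "x" * 1017 + "farmer’s field"
data = text.encode("utf-8")

naive = decode_chunks_naive(data)
fixed = decode_chunks_incremental(data)
print("naive decoding intact:", naive == text)
print("incremental decoding intact:", fixed == text)
```

The naive variant loses the apostrophe because neither chunk contains a complete byte sequence for it, which is why the condition only shows up when a character happens to land exactly on the boundary.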
You need an exported dataset that's part of an OAI set. It helps if the dataset is stored on a filesystem; S3 will work too, but a filesystem is easier. doi:10.70122/FK2/9TZE9Y is used below since that was the dataset I used on my local instance.
Verify that the dataset is exported and can be accessed via OAI.
Now replace the cached oai_dc export for the dataset with the attached known problematic export (export_oai_dc.cached.txt; adjust the path as needed as well).
Verify that the error condition is now met by looking at the 1024-byte offset. The end of the record should look like this:
The experiments were carried out on farmer?
i.e., we get a question mark instead of the fancy UTF-8 apostrophe in the word "farmer’s".
To test the "before" state, in the develop branch, open the OAI record above, /oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=doi:10.70122/FK2/9TZE9Y, in Firefox. It should refuse to render the response, complaining about invalid XML.
The OAI record should be rendered properly once the build from this branch is deployed.
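The check at the 1024-byte offset can also be scripted. The following is a rough Python sketch, not part of this PR; the helper name `splits_multibyte_char` is hypothetical. It relies only on the UTF-8 encoding rule that continuation bytes have the bit pattern `10xxxxxx`: if the byte at the offset is a continuation byte, the boundary falls inside a multi-byte character.

```python
def splits_multibyte_char(data: bytes, offset: int = 1024) -> bool:
    """Return True if `offset` falls in the middle of a multi-byte UTF-8 character.

    Assumes `data` is valid UTF-8 as a whole, such as the raw bytes of a
    cached export file read in binary mode.
    """
    if offset <= 0 or offset >= len(data):
        return False  # nothing on one side of the boundary, so no split
    # Continuation bytes look like 0b10xxxxxx; a lead or ASCII byte at
    # `offset` means a new character starts exactly on the boundary.
    return (data[offset] & 0b1100_0000) == 0b1000_0000


# Synthetic examples (hypothetical data, not the attached export):
straddling = ("x" * 1017 + "farmer’s").encode("utf-8")  # apostrophe starts at byte 1023
aligned = ("x" * 1018 + "farmer’s").encode("utf-8")     # apostrophe starts at byte 1024

print(splits_multibyte_char(straddling))
print(splits_multibyte_char(aligned))
```

To apply it to a real export, read the cached file with `open(path, "rb").read()` and pass the bytes in; the path depends on the local storage configuration.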
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: