
Fix redirects to the original repository for "Generic OAI Archive"-type Harvesting Clients #10254

Closed
landreev opened this issue Jan 23, 2024 · 3 comments · Fixed by #10430
Labels
Feature: Harvesting Size: 10 A percentage of a sprint. 7 hours.

Comments

@landreev
Contributor

landreev commented Jan 23, 2024

This is to address the problems discovered while resolving #7624.

At the moment, redirects to the original repository source (when our user clicks on the search card for a harvested record) are broken for ALL datasets harvested from "Generic OAI Archives", i.e., OAI sources that are not other Dataverse instances. What is supposed to happen is a redirect to the persistent id resolver (DOI or Handle), which should in turn redirect to the archival source. Instead, the redirect fails with an internal error in our code. This is because the identifier is processed using the new PidUtil.parseAsGlobalID() method, which does some complicated things to determine which specific provider, out of potentially multiple providers configured locally, owns it; none of that is necessary for harvested records. We should not call that method, but instead revert to the default resolvers that were used before [EDIT: NO, it doesn't bomb on account of the parseAsGlobalID() method! ... but something is bombing in that redirect, and it needs to be fixed] and, more importantly, use the custom resolver URLs that are configurable for a harvesting client.
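The intended behavior described above (hand the harvested record's persistent id to a resolver, preferring a resolver URL configured on the harvesting client, without going through the locally configured PID providers) can be sketched in Java. This is a minimal illustration under stated assumptions, not the actual Dataverse code or the fix that was merged: the class and method names, the `customResolverUrl` parameter, and the example identifiers are hypothetical, and only the `doi:` and `hdl:` protocols are handled.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch, not Dataverse's actual implementation.
public final class HarvestedRedirectSketch {

    // Public default resolvers for the two protocols handled here.
    static final String DEFAULT_DOI_RESOLVER = "https://doi.org/";
    static final String DEFAULT_HANDLE_RESOLVER = "https://hdl.handle.net/";

    /**
     * Builds a redirect URL for a harvested record's persistent id, e.g.
     * "doi:10.5072/FK2/ABCDEF" or "hdl:1902.1/12345".
     * customResolverUrl stands in for a resolver base URL configured on the
     * harvesting client (hypothetical); when non-blank it overrides the default.
     * Returns empty for null or malformed ids and for protocols other than doi/hdl.
     */
    public static Optional<String> redirectUrl(String globalId, String customResolverUrl) {
        if (globalId == null) {
            return Optional.empty();
        }
        int colon = globalId.indexOf(':');
        if (colon <= 0 || colon == globalId.length() - 1) {
            return Optional.empty();
        }
        String protocol = globalId.substring(0, colon).toLowerCase(Locale.ROOT);
        String identifier = globalId.substring(colon + 1);

        String defaultResolver;
        switch (protocol) {
            case "doi": defaultResolver = DEFAULT_DOI_RESOLVER; break;
            case "hdl": defaultResolver = DEFAULT_HANDLE_RESOLVER; break;
            default: return Optional.empty();
        }

        // Prefer the client's custom resolver; normalize it to end with "/".
        String resolver = defaultResolver;
        if (customResolverUrl != null && !customResolverUrl.isBlank()) {
            resolver = customResolverUrl.endsWith("/") ? customResolverUrl : customResolverUrl + "/";
        }
        // No lookup against locally configured PID providers happens here.
        return Optional.of(resolver + identifier);
    }

    public static void main(String[] args) {
        // prints https://doi.org/10.5072/FK2/ABCDEF
        System.out.println(redirectUrl("doi:10.5072/FK2/ABCDEF", null).orElse("(none)"));
        // prints https://resolver.example.org/1902.1/12345
        System.out.println(redirectUrl("hdl:1902.1/12345", "https://resolver.example.org").orElse("(none)"));
        // prints (none)
        System.out.println(redirectUrl("ark:/12345/x", null).orElse("(none)"));
    }
}
```

Keeping the override at the level of the harvesting client means a Generic OAI archive whose ids are best resolved somewhere other than doi.org or hdl.handle.net could still be redirected, while every other client falls back to the public resolvers.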

@landreev
Contributor Author

landreev commented Jan 23, 2024

Steps to reproduce issue 1 above locally:

  1. In the dashboard, create a harvesting client to harvest from https://demo.dataverse.org/oai; pick the set controlTestSet and oai_dc as the format. You can harvest into any existing collection, though it may be easier to create one just for the test. Leave it unscheduled, and make sure to select Generic OAI archive in step 4.
  2. Run the harvest. You should get 7 records total.
  3. Click on the title in the search card for any of the above.

@landreev
Contributor Author

One positive development is that there is a 1-line fix for issue 1, and I'm planning to incorporate it into the 6.1 deployment patch on our prod. This means the redirects are going to start working for harvested records in collections like https://dataverse.harvard.edu/dataverse/srda_harvested once 6.1 is deployed, whenever that happens.

landreev added a commit that referenced this issue Jan 28, 2024
@DS-INRA DS-INRA added this to ⚠️ Needed/Important in Recherche Data Gouv (formerly Data INRAE) Feb 5, 2024
@cmbz cmbz added the Size: 10 A percentage of a sprint. 7 hours. label Mar 14, 2024
@jp-tosca jp-tosca self-assigned this Mar 14, 2024
@landreev
Contributor Author

Edited: removed the second issue, which was not directly related to the broken redirects, and created a new issue for it: #10429.
Will make a quick PR for the broken redirects, since a branch with a fix already exists.

5 participants