Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Add the ability to configure a Harvesting Client to add a custom header to OAI calls #9231

Closed
landreev opened this issue Dec 14, 2022 · 7 comments · Fixed by #9310
Assignees
Labels
pm.GREI-d-1.4.1 NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issues pm.GREI-d-1.4.2 NIH, yr1, aim4, task2: Create working group on packaging standards Size: 80 A percentage of a sprint. 56 hours.
Milestone

Comments

@landreev
Copy link
Contributor

(creating this as a followup to an email discussion; making this a NIH GREI deliverable has been mentioned)

This is a request from/for a specific collaboration - integration of Dataverse with Data Monitor. The idea is that an admin will be able to configure a Harvesting Client with some entry(-ies) to be added as extra HTTP headers when making requests to the OAI archive in question. Specifically in their use case this extra header will contain a special token issued by the remote archive (DM), authorizing the client to receive some extra content (that, I'm assuming, they don't want to be fully public).

It feels like this would be something very doable. On the Dataverse side, I'm seeing this as another entry on the Harvesting Client config form ("Custom Headers"), and another entry in the json format that /api/harvest/clients understands. We'll add another column to the clients db table to store these entries.

The only remotely tricky part here is that it will require a PR into gdcc/xoai as well - because that's where the http requests are cooked. Should still be straightforward.

@mreekie mreekie added NIH OTA: 1.4.1 4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwnThis is an item synched from the product ... Size: 80 A percentage of a sprint. 56 hours. labels Jan 11, 2023
@mreekie
Copy link

mreekie commented Jan 11, 2023

This spans XOAI and dataverse.
Thse are opensource harvesting libraries that dataverse uses.
This introduces some uncertainty into the sizing.
Size this as a time bounded to a sprint so size 80.
We expect that we will be working both sides of the fence.
THere will be some coordination with Oliver and we may need some help from him to accomplish this.

At the end of that time we can resize or break out new issues.
At a minumimum this will out put 2 PRs.
One to DV and one to XOAI

@mreekie mreekie added this to This Sprint 🏃‍♀️ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) via automation Jan 11, 2023
@landreev landreev moved this from This Sprint 🏃‍♀️ to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jan 12, 2023
@landreev
Copy link
Contributor Author

@poikilotherm Hi, I started working on this and I was curious if I could implement it without needing to make any changes on the gdcc.xoai side. I was able to do that, but not in a particularly pretty way. Of course I may have missed something I could have used in your OaiClient implementation to make it prettier. Otherwise, if making it any prettier does require making a PR into gdcc.xoai, I'm happy to do that. So I just want to consult with you quickly on the best way to proceed, if you have a moment.

Your io.gdcc.xoai.serviceprovider.client.JdkHttpOaiClient allows me to pass my own jdk HttpClient.Builder, so in theory I should be able to have it instantiated with a client that is configured to my liking. Unfortunately, I wasn't able to figure out how to make it build it so that it would add a specified header to every request, I'm ashamed to admit. I know how to make it happen with the Apache HttpClient - it literally only takes 2 lines. But, having spent some time reading their docs and crawling through stackoverflow, I wasn't able to find a similarly straightforward way to do that with java.net.http.HttpClient. (am I missing something obvious? - probably).

The xoai JdkHttpOaiClient class is final. So, as a quick experiment I created my own copy of it, with just the extra mechanism for adding custom headers in the execute() methods. I quickly got everything to work, it's pretty straightforward. (https://github.com/IQSS/dataverse/blob/9231-extra-headers-oai-requests/src/main/java/edu/harvard/iq/dataverse/harvest/client/oai/CustomJdkHttpXoaiClient.java). But this is not ideal, obviously, to have this class in the Dataverse tree.

Is there a better solution than checking in my changes above into the XOAI JdkHttpOaiClient? - I figured you may be able to suggest something cleaner.

Thank you, as always!

@poikilotherm
Copy link
Contributor

poikilotherm commented Jan 16, 2023

Basically I see three options:

  1. Let DV have its own implementation of OaiClient (like the one you linked above)
  2. Make JdkHttpXoaiClient non-final and extend in DV codebase
  3. Add custom headers functionality to upstream

I see no reason why this custom header thing wouldn't be a good addition to the basic client in XOAI package. Do you want to create an issue and a pull request?

@landreev
Copy link
Contributor Author

Thank you. I would go with 2. I don't think it is super important for the xoai client class to be final; and it would keep things simple on both ends.
Today is a holiday here, but I will create an issue and put together a PR first thing tomorrow.

@landreev
Copy link
Contributor Author

(That said, 3. is not a bad option either, if you think it would not be a bad thing to have in the library; I'll think about it some more until tomorrow).

@landreev
Copy link
Contributor Author

Made a PR in the xoai project, gdcc/xoai/pull/119.

landreev added a commit that referenced this issue Jan 23, 2023
@scolapasta scolapasta moved this from IQSS Team - In Progress 💻 to Ready for Review in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jan 23, 2023
landreev added a commit that referenced this issue Jan 23, 2023
… change in behavior in the latest gdcc.xoai - that I knew, but had forgotten about over the weekend. (#9231)
@landreev
Copy link
Contributor Author

Let me rebuild and retest with the latest xoai jars real quick.

landreev added a commit that referenced this issue Jan 26, 2023
@pdurbin pdurbin added this to the 5.13 milestone Jan 26, 2023
@mreekie mreekie added pm.GREI-d-1.4.1 NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issues pm.GREI-d-1.4.2 NIH, yr1, aim4, task2: Create working group on packaging standards labels Mar 20, 2023
@mreekie mreekie removed the NIH OTA: 1.4.1 4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwnThis is an item synched from the product ... label Mar 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pm.GREI-d-1.4.1 NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issues pm.GREI-d-1.4.2 NIH, yr1, aim4, task2: Create working group on packaging standards Size: 80 A percentage of a sprint. 56 hours.
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

4 participants