Trying to set up or complete a harvesting client through the API crashes Dataverse #8290

Closed
tjouneau opened this issue Dec 8, 2021 · 1 comment · Fixed by #9174
Labels
Feature: Harvesting NIH OTA DC Grant: The Harvard Dataverse repository: A generalist repository integrated with a Data Commons NIH OTA: 1.4.1 4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwnThis is an item synched from the product ... pm.epic.nih_harvesting pm.GREI-d-1.4.1 NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issues pm.GREI-d-1.4.2 NIH, yr1, aim4, task2: Create working group on packaging standards

tjouneau commented Dec 8, 2021

What steps does it take to reproduce the issue?

  • When does this issue occur?
    While trying to work around the ListSets command not working in the GUI (see "The "ListSets" command fails during the creation of a harvesting client for Zenodo", #8289),
    I first tried to create a harvesting client (here zenodo_lmops).
    I retrieved its JSON representation with
    curl -H X-Dataverse-key:$API_TOKEN -X GET $SERVER_URL/api/harvest/clients/zenodo_lmops
    and then tried to update it with the PUT command:
    curl -H X-Dataverse-key:$API_TOKEN -X PUT -H "Content-Type: application/json" $SERVER_URL/api/harvest/clients/zenodo_lmops --upload-file client.json

where client.json looks like this (I removed only the information about the last harvests):


{
    "nickName": "zenodo_lmops",
    "dataverseAlias": "lmops",
    "type": "oai",
    "harvestUrl": "https://zenodo.org/oai2d",
    "archiveUrl": "https://zenodo.org",
    "archiveDescription": "Moissonné depuis la collection LMOPS de l'entrepôt Zenodo. En cliquant sur ce jeu de données, vous serez redirigé vers Zenodo.",
    "metadataFormat": "oai_dc",
    "set": "user-lmops",
    "schedule": "none",
    "status": "inActive"
}
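
A payload edited by hand can easily end up as invalid JSON (for example, with a dangling comma after the last field). Whether malformed input played any part in the crash is not established in this report, so the following is only a minimal pre-flight sketch: it checks that the file parses as JSON before sending it. The helper names `validate_json` and `put_client` are hypothetical, and the sketch assumes `python3` is available on the client machine; the endpoint and headers are the ones used in the commands above.

```shell
# Hypothetical helper: succeeds only if the file parses as strict JSON.
# Relies on Python's standard json.tool module (an assumption about the
# client machine, not something Dataverse requires).
validate_json() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# Hypothetical wrapper: send the PUT only when the payload is valid JSON.
# $1 = harvesting client nickname, $2 = path to the JSON file.
put_client() {
    if validate_json "$2"; then
        curl -H "X-Dataverse-key:$API_TOKEN" -X PUT \
             -H "Content-Type: application/json" \
             "$SERVER_URL/api/harvest/clients/$1" --upload-file "$2"
    else
        echo "$2 is not valid JSON; request not sent" >&2
        return 1
    fi
}
```

With these definitions loaded, `put_client zenodo_lmops client.json` would issue the same PUT request as above, but only after the file passes the check.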

The answer to the curl command was as follows and Dataverse / Payara went down.

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at 
 root@localhost to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>

The server.log file did not show anything particularly relevant; it simply stops at:

[2021-12-02T09:00:30.635+0100] [Payara 5.2020] [INFO] [] [edu.harvard.iq.dataverse.api.HarvestingClients] [tid: _ThreadID=89 _ThreadName=http-thread-pool::jk-connector(1)] [timeMillis: 1638432030635] [levelValue: 800] [[
  retrieved Harvesting Client zenodo_lmops with the GetHarvestingClient command.]]
  • Which page(s) does it occur on?

  • What happens?

  • To whom does it occur (all users, curators, superusers)?

  • What did you expect to happen?
    I was

Which version of Dataverse are you using?

Any related open or closed issues to this bug report?

@landreev landreev self-assigned this Nov 14, 2022

landreev commented Dec 1, 2022

This was discussed, and the decision was made to keep the Create/Edit/Delete APIs superuser-only. (As implemented, a user with edit permission on the host collection was allowed to create and modify clients.)
From the Slack discussion:

kcondon: Would like help sorting out the behavior of the harvesting client API. As tested, it allows collection admins to create and modify harvesting clients, just not delete them. In the UI, only superusers can do this. Is this what we want? A significant possible downside, without additional coding, is that two collections harvesting from the same source/set would collide and potentially get partial lists, since a dataset can only exist once in the app.

landreev: I can confirm that it’s implemented like this ⬆️ on purpose, but I have no recollection of why. (It’s implemented at the command level, but in the UI only superusers can get to the harvest dashboard.)
Kevin has a point: it’s a bit strange. IMO, this is a bit out of scope, but it’s not too much effort to make the API superuser-only. We were wondering if anyone else has thoughts, etc.
The rationale may have been as simple as “we allow people to add linked content to collections, why not allow them to harvest also…” But Kevin’s argument (what if two different collections decide to harvest from the same remote archive?) does show that it’s impractical.

pdurbin: I’m fine with superuser only for all operations.

Julian: I agree about making the endpoints superuser-only. But do superuser-only endpoints conflict with the user story? If all three endpoints are made superuser-only, will someone want to create a new issue about letting non-superusers manage harvesting clients?

landreev: The more I think about it, the less I can think of any practical value in letting non-superusers create and/or mess with harvesting clients. (And, to be clear, “superuser-only” here means that it’ll stay under /api/harvest/clients, so somebody with a superuser API token, like you, would be able to use it remotely; it’s not going to be a localhost-only API.)
But for non-superuser, collection-level admins, it looks like the scenario should be: if they want some content harvested and shown in their collection, they should ask a support/superuser admin to set up the harvest and get that content; then they can link it into their collection if they so desire. If anyone else wants these harvested datasets to show in their collection, they can link them too. This avoids the mess of two different collections trying to harvest the same archive (and both getting only parts of it, or maybe the one that harvests earlier in the day getting all the datasets, etc.).

@pdurbin pdurbin added this to the 5.13 milestone Dec 2, 2022