Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8339 file data api #9269

Merged
merged 24 commits into from
Jan 25, 2023
Merged

8339 file data api #9269

merged 24 commits into from
Jan 25, 2023

Conversation

sekmiller
Copy link
Contributor

@sekmiller sekmiller commented Jan 9, 2023

What this PR does / why we need it: Adds an api endpoint to get file metadata and info from the datafile in json format

Which issue(s) this PR closes:

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No

Is there a release notes update needed for this change?:

Additional documentation: Doc added to file section of Native API

@coveralls
Copy link

coveralls commented Jan 9, 2023

Coverage Status

Coverage: 19.985% (-0.01%) from 19.999% when pulling 261545a on 8339-file-data-api into be5dfa3 on develop.

@pdurbin pdurbin self-assigned this Jan 11, 2023
@pdurbin pdurbin moved this from Ready for Review to Review 🔎 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jan 11, 2023
Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code, docs, and tests are looking good!

I mentioned to @sekmiller that he's using a non-default format for file DOIs but it's completely legit and shorter in the docs so I think it's fine. Others, please speak up if you think it's important the change. Details in the review.

I ran the code locally and the JSON output looks good to me.

@kcondon one addition to the JSON is categories (file tags in the UI), which I didn't test. If you could try exercising this, I'd appreciate it. I guess we could also add an automated test for categories. Please let us know if you want this.

Here's some example JSON that came out of me running the new API test (all other API test are passing in Jenkins, by the way):

{
  "status": "OK",
  "data": {
    "description": "",
    "label": "stata13-auto-withstrls.dta",
    "restricted": false,
    "version": 1,
    "datasetVersionId": 1,
    "dataFile": {
      "id": 4,
      "persistentId": "",
      "pidURL": "",
      "filename": "stata13-auto-withstrls.dta",
      "contentType": "application/x-stata-13",
      "filesize": 8854,
      "description": "",
      "storageIdentifier": "s3://pdurbin:185a14c1549-769c167d9eeb",
      "rootDataFileId": -1,
      "md5": "6ce6be68c8b8bccf08c1b7ba5e924e5a",
      "checksum": {
        "type": "MD5",
        "value": "6ce6be68c8b8bccf08c1b7ba5e924e5a"
      },
      "creationDate": "2023-01-11"
    }
  }
}


.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.

Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB*:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This format (10.5072/FK2/J8SJZB) is the :DataFilePIDFormat=INDEPENDENT ( https://guides.dataverse.org/en/5.12.1/installation/config.html#datafilepidformat ) but DEPENDENT (10.5072/FK2/J8SJZB/MLGWJO) is the default. INDEPENDENT is shorter, which is nice for docs, but DEPENDENT is more common in the wild, I assume. I'm fine keeping this as is. I just thought I'd point it out.

IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from Review 🔎 to QA ✅ Jan 11, 2023
@pdurbin pdurbin removed their assignment Jan 11, 2023
@mreekie
Copy link

mreekie commented Jan 11, 2023

size for the next sprint: 10
This is just an API that takes a file id and gets a json represenation of the file.

@mreekie mreekie added the Size: 10 A percentage of a sprint. 7 hours. label Jan 11, 2023
@kcondon kcondon self-assigned this Jan 19, 2023
@kcondon
Copy link
Contributor

kcondon commented Jan 20, 2023

Basic endpoint seems to work ok, json is downloaded.
Noticed a couple things:
[x] 1. The doc could maybe be simplified a bit, just two forms, id and persistentId, with env variable and fully expanded examples for each. Our doc sometimes only shows one form (id or persistentId) and mentions the other. Make sure to add export API_TOKEN to the examples.
[Kevin] Our docs' API style has drifted over time. This doc was modeled after the original and older get dataset json endpoint. I think adding the API_TOKEN improves it. It could be further refined but ok for now, functional.
2. Able to download draft json as guest, unclear how to differentiate between draft and published version. Description and path are file metadata but reside in a dataset version. The current perms always fetches the latest, even if draft and as long as file is published, anyone will get it. So, a published dataset with file but which has a draft dataset where the file's description has changed. If a guest user downloads file json, they get the draft version/description. I think the restricted field would also be a potential issue. I've tracked it down to a missing :draft option to the url. So, it should fetch the latest published version unless the :draft flag is passed. I'm not sure how unpublished/draft should be handled, will test dataset json behavior but maybe needs the :draft flag? [SEK] Unable to recreate "guest" issue. When there's a draft of a published file, the guest user gets the published version - just realized this is after I implemented the draft path parameter - so maybe that fixes what you are seeing.

[Kevin] restricted field value seems to report the published state, even if draft has changed. I think that is how we view restricted as a state -whatever the published version's restricted state is determines state across all versions.
[Kevin] unpublished dataset json can be downloaded without the :draft flag with permitted api token. I suppose the :draft flag isn't needed because there is only one option unlike published with draft. [SEK] with implementation. of draft path parameter the user would see the published version. If they leave off the draft param they will get the draft version only if the dataset is unpublished and they have view unpublished perms.

@kcondon kcondon moved this from QA ✅ to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jan 23, 2023
@kcondon kcondon assigned sekmiller and unassigned kcondon Jan 23, 2023
@sekmiller
Copy link
Contributor Author

I implemented 'draft' and updated the doc. I think this is all that is pending.

@sekmiller sekmiller moved this from IQSS Team - In Progress 💻 to QA ✅ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jan 24, 2023
@sekmiller sekmiller removed their assignment Jan 24, 2023
@kcondon kcondon self-assigned this Jan 24, 2023
@kcondon
Copy link
Contributor

kcondon commented Jan 25, 2023

Still seeing the draft description as a user without view draft perms.

@kcondon kcondon moved this from QA ✅ to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jan 25, 2023
@kcondon kcondon assigned sekmiller and unassigned kcondon Jan 25, 2023
@sekmiller sekmiller moved this from IQSS Team - In Progress 💻 to Review 🔎 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jan 25, 2023
@sekmiller sekmiller removed their assignment Jan 25, 2023
@kcondon kcondon self-assigned this Jan 25, 2023
@kcondon
Copy link
Contributor

kcondon commented Jan 25, 2023

Retested and this works fine now. Ready to merge, waiting for integration tests to complete.

@kcondon kcondon merged commit 0183d41 into develop Jan 25, 2023
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from QA ✅ to Done 🚀 Jan 25, 2023
@kcondon kcondon deleted the 8339-file-data-api branch January 25, 2023 20:57
@pdurbin pdurbin added this to the 5.13 milestone Jan 26, 2023
@mreekie
Copy link

mreekie commented Jan 26, 2023

sprint kickoff

  • include in QATail for Jan 11, 2023

@mreekie mreekie added Size: SprintTail Issues that have made it into Ready for QA or further right and removed Size: 10 A percentage of a sprint. 7 hours. labels Jan 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Size: SprintTail Issues that have made it into Ready for QA or further right
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Provide an API for retrieving information about an individual datafile
5 participants