Extend Datasets getDownloadSize API endpoint to support file search criteria and deaccessioned datasets #10014

Merged
merged 5 commits into from Oct 19, 2023

@GPortas GPortas commented Oct 16, 2023

What this PR does / why we need it:

Extends the getDownloadSize endpoint (/api/datasets/{id}/versions/{versionId}/downloadsize) with the following new features:

  • The endpoint now accepts a new optional boolean query parameter, "includeDeaccessioned". When it is set to true, the endpoint also considers deaccessioned dataset versions when looking up the version whose total file download size is requested.

  • The endpoint now supports filtering by criteria. In particular, it accepts the following optional criteria query parameters:

    • contentType
    • accessStatus
    • categoryName
    • tabularTagName
    • searchText
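
As a rough sketch of how these criteria can be combined in a single request, the hypothetical shell helper below prints a curl command instead of running it. The function name build_downloadsize_cmd is not part of Dataverse, and the server URL, dataset id, version, criteria values, and the API_TOKEN variable are placeholder assumptions. It relies on curl's -G and --data-urlencode options so that values such as image/png or a multi-word searchText end up URL-encoded in the query string.

```shell
# Hypothetical helper, not part of Dataverse: prints a curl command for the
# downloadsize endpoint without executing it (a dry run).
# Usage: build_downloadsize_cmd SERVER DATASET_ID VERSION_ID [name=value ...]
build_downloadsize_cmd() {
  bdc_cmd="curl -G -H \"X-Dataverse-key:\$API_TOKEN\""
  bdc_cmd="$bdc_cmd \"$1/api/datasets/$2/versions/$3/downloadsize\""
  shift 3
  # Each remaining argument is one query parameter, e.g. contentType=image/png.
  # With -G, curl appends --data-urlencode values to the URL as a query string.
  for bdc_pair in "$@"; do
    bdc_cmd="$bdc_cmd --data-urlencode \"$bdc_pair\""
  done
  printf '%s\n' "$bdc_cmd"
}

# Example: combine several criteria in one call (all values are placeholders).
build_downloadsize_cmd "http://localhost:8080" 24 1.0 \
  "mode=Archival" "contentType=image/png" "categoryName=Data" \
  "includeDeaccessioned=true"
```

Printing the command rather than executing it keeps the sketch safe to paste; the output can be reviewed and then run in a shell where API_TOKEN is set.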

Which issue(s) this PR closes:

Closes #9995

Special notes for your reviewer:

This extension is required for the SPA to display deaccessioned dataset versions.

Suggestions on how to test this:

Create a dataset.

Upload files to the dataset and test the endpoint via curl.

The following call already worked before this PR. It uses mode=Archival to ignore the original sizes of tabular files:

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "http://localhost:8080/api/datasets/24/versions/1.0/downloadsize?mode=Archival"

Add one of the new criteria options, for example "contentType".

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "http://localhost:8080/api/datasets/24/versions/1.0/downloadsize?mode=Archival&contentType=image/png"

This example uses contentType=image/png. Replace the value with the content type you want to filter by.

Deaccession the dataset and verify that you still obtain the same information as before when you send an API key belonging to a user with the required permissions and specify includeDeaccessioned=true.

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "http://localhost:8080/api/datasets/24/versions/1.0/downloadsize?mode=Archival&contentType=image/png&includeDeaccessioned=true"

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No

Is there a release notes update needed for this change?:

Yes

Additional documentation:

N/A

@GPortas GPortas added the SPA These changes are required for the Dataverse SPA label Oct 16, 2023
@GPortas GPortas added this to Ready for Review ⏩ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) via automation Oct 16, 2023
@GPortas GPortas moved this from Ready for Review ⏩ to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Oct 16, 2023
@GPortas GPortas self-assigned this Oct 16, 2023
@GPortas GPortas changed the title from "Extend Datasets getDownloadSize API endpoint to support file search criteria" to "Extend Datasets getDownloadSize API endpoint to support file search criteria and deaccessioned datasets" Oct 16, 2023
@GPortas GPortas marked this pull request as ready for review October 16, 2023 12:21
@GPortas GPortas moved this from IQSS Team - In Progress 💻 to Ready for Review ⏩ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Oct 16, 2023
@GPortas GPortas removed their assignment Oct 16, 2023
@github-actions
📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9995-file-total-download-size-criteria
ghcr.io/gdcc/configbaker:9995-file-total-download-size-criteria

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@sekmiller sekmiller self-assigned this Oct 16, 2023
@sekmiller sekmiller moved this from Ready for Review ⏩ to In Review 🔎 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Oct 16, 2023
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?tabularTagName=Survey"

Content type filtering is also optionally supported. To return the size of all files available for download matching the requested content type.

Contributor

Does it make sense to list some or all of the possible content types here?

Contributor Author

There are many different content types (https://github.com/IQSS/dataverse/blob/develop/src/main/java/propertyFiles/MimeTypeFacets.properties).

Being something so extensive and variable, I'm not sure if it would be really useful.

Contributor

@sekmiller sekmiller left a comment

Looks good overall. Just a question about adding more to the doc and a reminder to get the latest from dev.

@kcondon
Contributor

kcondon commented Oct 18, 2023

@GPortas @don I'm seeing a build error on Jenkins:
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] SiteMapUtilTest.testUpdateSiteMap:107 Unexpected exception thrown: java.io.FileNotFoundException: /tmp/junit4920710667909896148/docroot/sitemap/sitemap.xml (No such file or directory)
[INFO]
[ERROR] Tests run: 1542, Failures: 1, Errors: 0, Skipped: 32
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 51.555 s
[INFO] Finished at: 2023-10-18T17:25:12-04:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.1.0:test (default-test) on project dataverse: There are test failures.
[ERROR]
[ERROR] Please refer to /home/worker/workspace/IQSS_Dataverse_Internal/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Build step 'Execute shell' marked build as failure
[PostBuildScript] - [INFO] Executing post build scripts.
[PostBuildScript] - [INFO] Build does not have any of the results [SUCCESS]. Did not execute build step #0.
[PostBuildScript] - [INFO] Build does not have any of the results [SUCCESS]. Did not execute build step #1.
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
An attempt to send an e-mail to empty list of recipients, ignored.
Finished: FAILURE

@GPortas
Contributor Author

GPortas commented Oct 19, 2023

@kcondon @donsizemore

I don't see any errors in GitHub Actions or on my localhost. It looks more like a problem with how the directories are configured in Jenkins (?), since the test tries to access a file under the /tmp directory and cannot find it.

Also, the component under test is not related to the changes in this PR.

@kcondon
Contributor

kcondon commented Oct 19, 2023

@GPortas @donsizemore It's building now, thanks.

@kcondon kcondon merged commit ab231ff into develop Oct 19, 2023
10 of 11 checks passed
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from QA ✅ to Done 🚀 Oct 19, 2023
@kcondon kcondon deleted the 9995-file-total-download-size-criteria branch October 19, 2023 14:21
@pdurbin pdurbin added this to the 6.1 milestone Oct 19, 2023