
Globus Bug Fixes #10345

Merged
merged 16 commits into IQSS:develop from GlobalDataverseCommunityConsortium:globusstore on Mar 13, 2024

Conversation

qqmyers
Member

@qqmyers qqmyers commented Feb 27, 2024

Several issues have been discovered in testing the new Globus capabilities in prep for the MOC demo and DCM 2024 meeting:

  • Remote stores: errors in creating exports and a Dataset getThumbnail() failure, both caused by trying to treat a Dataset storageIdentifier as a path when, unlike a DataFile's, it isn't one.
  • The obsolete/undocumented dataverse-inaccessible setting was still used in one place; it has been replaced with the documented files-not-accessible-by-dataverse setting.
  • Previewers/external tools were shown even when files are not accessible to Dataverse; previewers/file tools now only show if the files are accessible or the tool requires auxFileAccess (aux files are accessible via the backing store).
  • The direct and Globus upload APIs only checked the file size/quota limits after the file transfer had happened. The direct upload API now won't send signed URLs for files that are too big, and the Globus API will report both the fileSizeLimit and remainingQuota to the caller. (Since Globus doesn't enforce size limits, we still have to trust that users won't try transferring files that they've been told will be rejected/not added to the dataset.)
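The pre-transfer check described in the last bullet can be sketched as follows (a minimal sketch; `UploadPolicy` and `canIssueUrl` are hypothetical names, not Dataverse's actual API): refuse to issue a signed URL when the declared file size exceeds either the per-file limit or the remaining quota.

```java
// Hypothetical sketch of a pre-transfer size/quota check.
// Names (UploadPolicy, canIssueUrl) are illustrative, not Dataverse's API.
class UploadPolicy {
    private final long fileSizeLimit;   // max bytes per file; -1 = unlimited
    private final long remainingQuota;  // bytes left in the user's quota; -1 = unlimited

    UploadPolicy(long fileSizeLimit, long remainingQuota) {
        this.fileSizeLimit = fileSizeLimit;
        this.remainingQuota = remainingQuota;
    }

    /** Only issue a signed URL when the declared size fits both limits. */
    boolean canIssueUrl(long declaredSize) {
        if (fileSizeLimit >= 0 && declaredSize > fileSizeLimit) {
            return false;  // file exceeds the per-file limit
        }
        if (remainingQuota >= 0 && declaredSize > remainingQuota) {
            return false;  // file would exceed the remaining quota
        }
        return true;
    }
}
```

For Globus, where the server cannot block the transfer itself, the same numbers (fileSizeLimit, remainingQuota) are reported to the caller instead, as the bullet above notes.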

PR is draft until the latest round of testing is done.

Which issue(s) this PR closes:

Closes #10344
Closes #10231

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@coveralls

coveralls commented Feb 27, 2024

Coverage Status

coverage: 20.585% (+0.01%) from 20.574%
when pulling 3be8cac on GlobalDataverseCommunityConsortium:globusstore
into e26fabe on IQSS:develop.

@qqmyers qqmyers marked this pull request as ready for review March 1, 2024 18:03
@qqmyers qqmyers added the Size: 3 A percentage of a sprint. 2.1 hours. label Mar 1, 2024
@qqmyers qqmyers modified the milestones: 6.1.1, 6.2 Mar 1, 2024
@landreev landreev self-requested a review March 4, 2024 01:37
@landreev landreev self-assigned this Mar 4, 2024
@landreev
Contributor

landreev commented Mar 4, 2024

Testing with the NESE Demo Tape pool (a managed store with files inaccessible to Dataverse), the following in the log must be coming from the calculateDetails() method in the Globus service bean, so it appears that it attempts to calculate the checksum once the upload is complete. It's not fatal: it gives up after 3 tries and sets the hash to "Not available in Dataverse", as it should. But is it a problem that it even tries, on a store with files-not-accessible-by-dataverse?

[2024-03-04T15:32:48.367-0500] [Payara 6.2023.8] [SEVERE] [] [] [tid: _ThreadID=1770 _ThreadName=pool-217-thread-1] [timeMillis: 1709584368367] [levelValue: 1000] [[
  java.lang.NullPointerException: Cannot invoke "java.io.InputStream.read(byte[])" because "in" is null
	at edu.harvard.iq.dataverse.util.FileUtil.calculateChecksum(FileUtil.java:735)
	at edu.harvard.iq.dataverse.globus.GlobusServiceBean.calculateDetails(GlobusServiceBean.java:1138)
	at edu.harvard.iq.dataverse.globus.GlobusServiceBean.lambda$calculateDetailsAsync$11(GlobusServiceBean.java:1112)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
]]

[2024-03-04T15:32:48.368-0500] [Payara 6.2023.8] [INFO] [] [edu.harvard.iq.dataverse.globus.GlobusServiceBean] [tid: _ThreadID=1770 _ThreadName=pool-217-thread-1] [timeMillis: 1709584368368] [levelValue: 800] [[
  Cannot invoke "java.io.InputStream.read(byte[])" because "in" is null]]

(The same NullPointerException, with an identical stack trace and log messages, repeats twice more at 15:32:53 and 15:32:58 before the retries stop.)
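The behavior above could be avoided with a guard like the following (a sketch only; the `filesNotAccessible` flag and method shape are assumptions modeled on the documented files-not-accessible-by-dataverse setting and the log output, not the real calculateDetails() code): skip the checksum attempt entirely when the store's files are not readable by Dataverse, instead of opening a null stream and retrying.

```java
// Sketch: skip checksum calculation on stores whose files Dataverse cannot read.
// The flag mirrors the files-not-accessible-by-dataverse setting; the method
// shape is illustrative, not the actual GlobusServiceBean.calculateDetails().
class ChecksumGuard {
    static final String NOT_AVAILABLE = "Not available in Dataverse";

    static String checksumOrPlaceholder(boolean filesNotAccessible,
                                        java.util.function.Supplier<String> calculate) {
        if (filesNotAccessible) {
            // Don't open a stream at all: it would be null and NPE on read().
            return NOT_AVAILABLE;
        }
        return calculate.get();  // normal path: actually compute the checksum
    }
}
```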

@landreev
Contributor

landreev commented Mar 5, 2024

Testing a remote managed store upload workflow: aside from the cast-to-int problem mentioned on Slack, the issue persists where the transfer completes successfully on the Globus side, but Dataverse then fails to save the datafile on its end.
The failure happens inside the /addFiles API, with the following exception stack trace:

java.lang.NullPointerException: Cannot invoke "java.lang.Long.longValue()" because the return value of "edu.harvard.iq.dataverse.FileMetadata.getVersion()" is null
	at edu.harvard.iq.dataverse.util.json.JsonPrinter.json(JsonPrinter.java:672)
	at edu.harvard.iq.dataverse.util.json.JsonPrinter.json(JsonPrinter.java:656)
	at edu.harvard.iq.dataverse.util.json.JsonPrinter.jsonFileMetadatas(JsonPrinter.java:526)
	at edu.harvard.iq.dataverse.util.json.JsonPrinter.jsonDataFileList(JsonPrinter.java:486)
	at edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper.getSuccessResultAsJsonObjectBuilder(AddReplaceFileHelper.java:1904)
	at edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper.addFiles(AddReplaceFileHelper.java:2091)
	at edu.harvard.iq.dataverse.api.Datasets.addFilesToDataset(Datasets.java:4055)
... 

The above is likely a symptom, not the cause, of the failure: the exception is thrown in AddReplaceFileHelper.getSuccessResultAsJsonObjectBuilder(), probably called after a failure to save that went undetected.
Attached is a fine-level server log, starting from the confirmation from Globus that the upload completed.

Looking at the following lines: https://github.com/GlobalDataverseCommunityConsortium/dataverse/blob/6d3769f2a26b70e7108c18b92f39c1838ad8cd28/src/main/java/edu/harvard/iq/dataverse/globus/GlobusServiceBean.java#L820-L852 it looks like when a failure occurs during this /addFiles API call, it is not recorded as such (countError is not incremented). Also, the edit lock is left on the dataset, because the removeDatasetLock() call is inside the catch() block, and no exception is thrown in this case.
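The fix pattern implied here can be sketched like this (hypothetical names, not the actual GlobusServiceBean code): count the failure even when no exception is thrown, and release the lock in a finally block so the dataset is never left locked.

```java
// Sketch: ensure the edit lock is always released and failures are counted,
// even when the /addFiles call fails without throwing. Names are illustrative.
class AddFilesOutcome {
    int countSuccess = 0;
    int countError = 0;

    void processAddFilesResponse(boolean ok, Runnable removeDatasetLock) {
        try {
            if (ok) {
                countSuccess++;
            } else {
                countError++;  // previously only incremented when an exception was caught
            }
        } finally {
            removeDatasetLock.run();  // previously only reached inside catch()
        }
    }
}
```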

It's not immediately clear to me from the server log why the actual failure in /addFiles occurs, but it has to be something simple, no?

server_log.txt

@poikilotherm
Contributor

/push-image thx


github-actions bot commented Mar 6, 2024

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:globusstore
ghcr.io/gdcc/configbaker:globusstore

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@landreev
Contributor

I have confirmed that the above (datafile not created, dataset stuck with an "edit in progress" lock) is happening consistently, not just on my own dev instance but also on other servers, like dataverse-internal, with the same exception stack trace.
I'm wondering if it's something very straightforward, like some necessary fix missing from the PR branch. Or maybe some JVM option or setting that I am missing in my config?

@landreev
Contributor

It finally occurred to me to check, but it looks like it's not just the Globus uploads!
Direct uploads via DVUploader are similarly broken (same exception), i.e. /addFiles appears to be broken in develop (not just in this branch).

This is from beta, which is running the develop branch build:

PROCESSING(F): /Users/landreev/NetBeansProjects/dataverse/test_685.txt
               Does not yet exist on server.
              UPLOADED as: MD5:ab3b010ddaaf87bc09944ad210e5d6eb
CURRENT TOTAL: 1 files :685 bytes
Error response when processing files in Individual files on command line : Bad Request
{"status":"ERROR","message":"Cannot invoke \"java.lang.Long.longValue()\" because the return value of \"edu.harvard.iq.dataverse.FileMetadata.getVersion()\" is null"}

@landreev
Contributor

landreev commented Mar 12, 2024

It's really simple too: all it appears to be is that this JsonPrinter line in 6.1:

.add("version", fmd.getVersion())

is working because the JsonObjectBuilder .add methods are overridden by our own NullSafeJsonBuilder; in develop we somehow lost that, and the .add methods from the native class are used instead.

@landreev
Contributor

(there was a wrong code line copy-and-pasted in the last comment; corrected)

@landreev
Contributor

It is a very recent change, and it is this line, 660 in JsonPrinter.json:

   JsonObjectBuilder builder = jsonObjectBuilder();

jsonObjectBuilder() creates a NullSafeJsonBuilder, but it is cast back to the native JsonObjectBuilder. ☹️
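The mechanism can be reproduced in miniature (a simplified sketch; `Builder` and `NullSafeBuilder` stand in for jakarta.json's JsonObjectBuilder and Dataverse's NullSafeJsonBuilder): overload resolution happens at compile time against the variable's declared type, so a null-safe overload defined only on the subclass is bypassed as soon as the reference is typed as the base interface, and the primitive overload unboxes the null Long with an NPE.

```java
// Simplified illustration of the trap, not Dataverse's actual classes.
// The interface only has a primitive add(String, long); the null-safe builder
// adds an overload taking a boxed Long. Which one the compiler picks depends
// on the variable's DECLARED type, so assigning back to the interface type
// silently re-enables unboxing NPEs on null values.
interface Builder {
    Builder add(String name, long value);  // unboxing a null Long here -> NPE
}

class NullSafeBuilder implements Builder {
    final java.util.Map<String, Long> values = new java.util.HashMap<>();

    public NullSafeBuilder add(String name, long value) {
        values.put(name, value);
        return this;
    }

    // This overload is only visible through the NullSafeBuilder static type.
    public NullSafeBuilder add(String name, Long value) {
        if (value != null) {
            values.put(name, value);  // skip nulls instead of throwing
        }
        return this;
    }
}
```

With `NullSafeBuilder b = ...`, `b.add("version", (Long) null)` picks the null-safe Long overload; with `Builder b = ...`, only `add(String, long)` is visible, and the null is unboxed at the call site, throwing NullPointerException, which matches the stack trace above.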

@landreev
Contributor

With the fix for the main issue merged into develop (#10366), and with the fix for file sizes > 2GB committed, I can now upload files - yay.
(screenshot: Screen Shot 2024-03-12 at 6.19.34 PM)
I'm inclined to go ahead and merge this PR, with the acknowledgement that everything about Globus is rather convoluted, and I have been testing one simple workflow: uploading files to one specific endpoint (NESE test/demo tape) using the Borealis dataverse-globus app. So, apologies if I missed something in the process.

landreev and others added 3 commits March 12, 2024 19:15
A few cosmetic/trivial additions to the guide, if you want them. 
Things like the dataset id query parameters missing from the curl example lines... that are really obvious, and it may be safe to assume that only serious developers are going to read this guide... so, up to you.
cosmetic additions to globus-api.rst
@landreev landreev merged commit 5caf436 into IQSS:develop Mar 13, 2024
11 of 12 checks passed
@landreev landreev removed their assignment Mar 13, 2024
@landreev
Contributor

Just noticed the "size 3" label - lol.

@landreev landreev added Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) and removed Size: 3 A percentage of a sprint. 2.1 hours. labels Mar 14, 2024
@scolapasta scolapasta modified the milestones: 6.1.1, 6.2 Apr 1, 2024
Projects
Status: Done 🧹
Development

Successfully merging this pull request may close these issues.

Direct Upload: missing error check for checksum
Previewers should not show when files are not accessible
5 participants