Globus Store implementation #10162

Merged (98 commits) on Dec 11, 2023

qqmyers
Member

@qqmyers qqmyers commented Dec 1, 2023

What this PR does / why we need it: Implements new Globus functionality to handle

  • use of a standard Globus endpoint managed by Dataverse,
  • referencing files in remote Globus endpoints, and
  • use of an S3 store with the Globus S3 Connector (which existed before), now configured per store rather than as a single per-instance option, with better security

The overall functionality requires use of the Borealis Dataverse-Globus app which is being updated to work with the functionality added in this PR.

Which issue(s) this PR closes:

Closes #9123

Special notes for your reviewer:
~code complete - some docs but more to follow. External doc includes more info.
Suggestions on how to test this:
Without the app, testing requires some manual steps. Basically, Dataverse launches the app like an external tool to support upload (transfer to Dataverse) or download (transfer from Dataverse), so to test one can look at the URL used to launch the app and manually make the Dataverse API calls it would do, along with initiating the Globus transfer it would do (via the standard Globus app). As it sounds, this is tedious, and it requires a properly configured Dataverse instance.

For dev, I have an AWS instance (with associated Globus endpoints) that can be used for testing. Getting on a zoom call is probably the easiest way to walk through it all.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: It adds Globus-related functionality for upload/download if/when the Globus functionality is enabled.

Is there a release notes update needed for this change?: yes - tbd

Additional documentation:

qqmyers added 30 commits May 2, 2023 10:52
Contributor

@landreev landreev left a comment

I'm clicking approve, not because I'm claiming that I understand everything that's going on here, but because I'm doing a combination of review+QA at the same time already.

@@ -0,0 +1,19 @@
Globus support in Dataverse has been expanded to include support for using file-based Globus endpoints, including the case where files are stored on tape and are not immediately accessible,

The setup required to enable Globus is described in the `Community Dataverse-Globus Setup and Configuration document <https://docs.google.com/document/d/1mwY3IVv8_wTspQC0d4ddFrD2deqwr-V5iAGHgOy4Ch8/edit?usp=sharing>`_ and the references therein.
More details of the setup required to enable Globus is described in the `Community Dataverse-Globus Setup and Configuration document <https://docs.google.com/document/d/1mwY3IVv8_wTspQC0d4ddFrD2deqwr-V5iAGHgOy4Ch8/edit?usp=sharing>`_ and the references therein.

As described in that document, Globus transfers can be initiated by choosing the Globus option in the dataset upload panel. (Globus, which does asynchronous transfers, is not available during dataset creation.) Analogously, "Globus Transfer" is one of the download options in the "Access Dataset" menu and optionally the file landing page download menu (if/when supported in the dataverse-globus app).

An overview of the control and data transfer interactions between components was presented at the 2022 Dataverse Community Meeting and can be viewed in the `Integrations and Tools Session Video <https://youtu.be/3ek7F_Dxcjk?t=5289>`_ around the 1 hr 28 min mark.
Member

Is this video still worth watching, given the changes?

Member Author

I think so. I gave a talk in 2023 as well, but 2022 goes into the steps in more detail, so I left it.

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
export JSON_DATA="{"taskIdentifier":"3f530302-6c48-11ee-8428-378be0d9c521", \
Member

Hmm, does this work? We might need single quotes on the outside instead of double.

Contributor

... or have the double quotes on the inside escaped. But yeah, the above will export a JSON string with no double quotes in it.
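
To illustrate the quoting point raised in this thread, here is a minimal shell sketch. It uses only the `taskIdentifier` key visible in the diff and closes the object right after it; the rest of the JSON body is cut off in the excerpt and is deliberately not reconstructed. The variable names and the shortened payload are assumptions for illustration only.

```shell
# Double quotes outside, unescaped double quotes inside: each inner quote
# ends or restarts the string, so the quotes themselves are dropped.
BROKEN="{"taskIdentifier":"3f530302-6c48-11ee-8428-378be0d9c521"}"

# Single quotes outside: the inner double quotes are kept literally.
SINGLE='{"taskIdentifier":"3f530302-6c48-11ee-8428-378be0d9c521"}'

# Double quotes outside with the inner quotes escaped: same value as SINGLE.
ESCAPED="{\"taskIdentifier\":\"3f530302-6c48-11ee-8428-378be0d9c521\"}"

printf '%s\n' "$BROKEN" "$SINGLE" "$ESCAPED"
# {taskIdentifier:3f530302-6c48-11ee-8428-378be0d9c521}
# {"taskIdentifier":"3f530302-6c48-11ee-8428-378be0d9c521"}
# {"taskIdentifier":"3f530302-6c48-11ee-8428-378be0d9c521"}
```

Either of the last two forms keeps the double quotes that a JSON parser needs; the single-quoted form is usually easier to read when no shell variables have to be expanded inside the payload.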

@@ -499,14 +499,14 @@ Logging & Slow Performance

.. _file-storage:

File Storage: Using a Local Filesystem and/or Swift and/or Object Stores and/or Trusted Remote Stores
-----------------------------------------------------------------------------------------------------
File Storage: Using a Local Filesystem and/or Swift and/or Object Stores and/or Trusted Remote Stores and/or Globus Stores
Member

Ha. At some point we might want a more generic title instead of listing each type. 😄 Plus, Swift probably shouldn't be second forever. It's hardly used.

@landreev
Contributor

landreev commented Dec 8, 2023

I'm going to assume that the war file currently deployed on the test instance (from around noon Dec. 7) is the latest build.
The setup and configuration were easier to figure out than I expected; everything makes sense for the most part.
I will be asking questions about specific functionality. The general idea of testing the external app functionality by checking the redirect URLs issued has been perfectly workable so far.

@landreev
Contributor

landreev commented Dec 8, 2023

Also, going to assume that the last Jenkins failure (build 18) is one of those random flukes where the ec2 instance fails to start up in time and has nothing to do with the branch.

@landreev
Contributor

landreev commented Dec 8, 2023

In this dataset on the test instance, doi:10.5072/FK2/0FUH2K, is this the correct behavior? That is, the local download (the link to the native /api/access/datafile/) also showing in the download pulldown:
[Screenshot, 2023-12-08 10:16 AM]

@qqmyers
Member Author

qqmyers commented Dec 8, 2023

My guess is the globusr store does not have the files-not-accessible-by-dataverse flag set to true when it should. (Same reason publish fails as validation is on and not disabled by this flag.)
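
If that guess is right, the fix would be a store-level setting. The sketch below only builds and prints what such a JVM option could look like. It assumes the store id is `globusr` (taken from the comment above) and that the flag follows the usual `dataverse.files.<id>.<option>` naming; neither is confirmed in this thread, and the variable names are hypothetical.

```shell
# Assumed store id, taken from the comment above; adjust to the real id.
STORE_ID="globusr"

# Assumed option name: the flag mentioned in the comment, placed under the
# store's dataverse.files.<id>. prefix.
JVM_OPTION="-Ddataverse.files.${STORE_ID}.files-not-accessible-by-dataverse=true"

echo "$JVM_OPTION"

# On a Payara-based installation, an option like this is typically applied
# with asadmin (shown as a comment only, not executed here):
#   ./asadmin create-jvm-options "$JVM_OPTION"
```

Printing the option first makes it easy to check the store id and flag spelling before changing the server configuration.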

qqmyers and others added 2 commits December 8, 2023 15:44
Co-authored-by: Philip Durbin <philipdurbin@gmail.com>
@landreev
Contributor

landreev commented Dec 8, 2023

[this is a status update per slack discussion] So, we have a few documentation edit requests, some more nitpicky than others. But as for the functionality in the PR, what I could crudely test on my macbook and on Jim’s ec2 instance all worked for me. “Crudely” is the key.
One exception is the use case of a managed Globus store that’s pointed to a Globus endpoint behind the S3 connector (Dataverse can then access the files, so the native APIs etc. work). I concluded that I could not test that scenario, but Jim is going to hook me up with his existing endpoint over the weekend.
So, without claiming that I’m qualified to QA this stuff really thoroughly, I’m fairly close to signing off on it (seeing the experimental nature of the feature as a license of sorts to give myself some breaks on thoroughness).
… once we confirm that the documentation is done, we should go ahead and add the release note to the 6.1 release notes, without waiting for the PR to be merged.

@landreev
Contributor

We have reviewed and cleared the "managed store tied to Globus S3 connector" use case this morning. Merging.

Labels
Size: 30 A percentage of a sprint. 21 hours. (formerly size:33)
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Enhance Globus support for remote endpoints and tape stores
4 participants