Quick start

Note that datasets placed on the testing storage locations used in this guide are readable by anyone, cannot be deleted once deposited, are not backed up, and may disappear at any time.

Reporting issues

Please report any issues with the dtool lookup GUI at https://github.com/livMatS/dtool-lookup-gui/issues, any issues with this documentation at https://github.com/livMatS/RDM-Wiki-public/issues, or to data@livmats.uni-freiburg.de.

When reporting an issue, please include information on your OS, the version information available within the GUI's About dialog, and debug log output if possible. The dtool-lookup-gui offers a logging window. Open it by clicking logging within the "burger" menu. Switch the log level to DEBUG, repeat the actions that lead to the problem you want to report, save the logging output to a file, and attach that log when reporting the issue. Thank you.

Set-up

Navigate to https://github.com/livMatS/dtool-lookup-gui/releases and download the latest release zip file containing the dtool lookup GUI for your OS. Minimum requirements are macOS 10.15, Windows 10, Ubuntu 20.04, or a comparably recent Linux distribution. Unpack the zip archive and launch the application. When you launch the GUI for the first time, it may look quite empty:

Clean start

Open the main menu by clicking the burger menu in the upper right corner and select settings.

Main menu

Download the sample configuration for a testing server instance at dtool.json.

Sample dtool.json

Import it by clicking the import button in the settings dialog

Import config icon

and selecting the downloaded file.

Import config icon

The imported settings will then appear in the dialog.

Import config icon

dtool uses cacheable tokens to facilitate authentication against the lookup server. Click on renew token to fetch such a token. Authenticate with your username and password.

Import config icon

For the testing configuration, it's testuser and test_password. The generated token appears in the settings dialog.

Renewed token

You won't have to authenticate again until the cached token loses its validity.
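To get a feeling for when a cached token loses its validity, the following minimal sketch reads the expiry time from a token. It assumes that the token is a JSON Web Token (JWT) carrying a standard `exp` claim, which this guide does not state; the function name `token_expiry` is made up for illustration. The sketch only decodes the payload and does not verify the signature.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(token):
    """Return the expiry time of a JWT as a UTC datetime, or None.

    Assumption: the token has the usual three dot-separated JWT parts
    and its payload may contain an 'exp' claim (seconds since epoch).
    The signature is NOT verified here.
    """
    payload_b64 = token.split(".")[1]
    # JWTs strip base64 padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    exp = payload.get("exp")
    if exp is None:
        return None
    return datetime.fromtimestamp(exp, tz=timezone.utc)
```

Comparing the returned value with `datetime.now(timezone.utc)` tells you whether the token would need to be renewed.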

After importing the configuration and closing the settings dialog, you will find these settings stored within ~/.config/dtool/dtool.json below your user's home folder. The GUI will list two new base URIs on the left-hand side:

After configuration
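If you want to check what the import has written, the stored settings can be inspected outside the GUI. The sketch below assumes that `dtool.json` is a flat JSON object of setting names and values; the helper names are hypothetical. It prints only the setting names, so that a cached token is not exposed by accident.

```python
import json
from pathlib import Path


def load_dtool_config(path=None):
    """Return the dtool settings as a dict, or {} if the file is missing."""
    if path is None:
        # Location named in this guide.
        path = Path.home() / ".config" / "dtool" / "dtool.json"
    path = Path(path)
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def list_setting_names(config):
    """Return the sorted setting names without their values."""
    return sorted(config)


if __name__ == "__main__":
    for name in list_setting_names(load_dtool_config()):
        print(name)
```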

The prefix indicates the type of protocol used to communicate with the underlying storage infrastructure. s3 points to S3-compatible object storage like Amazon S3, smb points to network storage better known as Windows shares. The testing server instance offers s3://test-bucket and smb://test-share to play around with. Browse those locations by selecting them.

S3 bucket after configuration

The first entry in the list plays a special role. Here you can see and search through all the datasets that have been indexed by the lookup server:

Lookup results after configurations

In the central column you see the list of datasets. On the right-hand side you see a few buttons, the Details, Manifest and Dependencies tabs, and below them the fixed administrative and editable descriptive metadata. The latter is shown as YAML-highlighted text.

Add a local base URI

Add a local folder to the list of base URIs by clicking on the folder icon in the upper left corner

Open local base URI

and selecting the desired location.

Select path as local base URI

To distinguish them from other (remote) endpoints, local base URIs come with the file:// prefix.

Empty local baseURI
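The three prefixes encountered so far (s3, smb and file) are ordinary URI schemes. As a small illustration, the sketch below maps a base URI to the kind of storage it points to, using only the Python standard library. The function name and the descriptive strings are made up for this example; the path in the usage line is a placeholder.

```python
from urllib.parse import urlparse

# Storage kinds for the prefixes mentioned in this guide.
STORAGE_KINDS = {
    "s3": "S3-compatible object storage",
    "smb": "network share (Windows share)",
    "file": "local folder",
}


def describe_base_uri(base_uri):
    """Return a short description of the storage behind a base URI."""
    scheme = urlparse(base_uri).scheme
    return STORAGE_KINDS.get(scheme, "unknown storage type")


if __name__ == "__main__":
    for uri in ("s3://test-bucket", "smb://test-share", "file:///path/to/datasets"):
        print(uri, "->", describe_base_uri(uri))
```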

Copy a dataset from remote to local

Now, copy a dataset from a remote location to your local machine. Select a dataset on the s3-endpoint and download it by choosing your local folder from the copy-button's drop-down menu:

Copy from S3 to local base URI

The dataset will appear at your local base URI:

Copied dataset at local base URI.

Notice the dataset URI entry in the administrative metadata.

Manifest and show tooltip

The Manifest lists all items contained within the dataset. Click on Show to explore the dataset with the local file system browser.

Create a dataset

Download the README.yml template.

Download readme template

Adapt it to your needs in a text editor.

Point the GUI to this template at the bottom of the settings dialog

Open readme template

or just open your ~/.config/dtool/dtool.json in a text editor and set the DTOOL_README_TEMPLATE_FPATH entry to point to your README.yml template:

Modify dtool.json
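Instead of editing the file by hand, the entry can also be set with a few lines of Python. This is a minimal sketch assuming that `dtool.json` is a flat JSON object; `update_dtool_config` is a hypothetical helper, not part of dtool. Only `DTOOL_README_TEMPLATE_FPATH` is taken from this guide. The keys `DTOOL_USER_FULL_NAME` and `DTOOL_USER_EMAIL` mentioned in the usage note are assumptions about how dtool stores name and e-mail address.

```python
import json
from pathlib import Path


def update_dtool_config(config_path, **settings):
    """Merge the given settings into a dtool.json file and return the result."""
    config_path = Path(config_path)
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # Existing entries are kept; entries with the same name are overwritten.
    config.update(settings)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config
```

A call such as `update_dtool_config(Path.home() / ".config" / "dtool" / "dtool.json", DTOOL_README_TEMPLATE_FPATH="/path/to/README.yml")` would set the template path; the assumed name and e-mail keys could be passed in the same way. Make a backup of the file first, since it also holds your other settings.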

Specify your name and e-mail address as well. Next, create a new dataset by selecting your local base URI and clicking the '+' icon in the upper left corner:

Create dataset

Pick a name

Pick dataset name

and notice the new entry in the list of datasets.

After dataset creation

You see a new UUID in bold assigned to the freshly created dataset. This is an important concept. No matter how your dataset is stored, how it's moved around, or how many copies of it are created, this Universally Unique IDentifier will stay with your dataset over its whole lifetime. No other dataset will ever own the same UUID. It hence serves as a persistent identifier, an important building block for implementing the FAIR principles 1.
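To illustrate what such an identifier looks like, the sketch below generates a random (version 4) UUID with the Python standard library and checks whether a string is a well-formed UUID. Whether dtool generates its UUIDs in exactly this way is an assumption; the function names are chosen for this example only.

```python
import uuid


def new_dataset_uuid():
    """Return a freshly generated random UUID as a string."""
    return str(uuid.uuid4())


def is_valid_uuid(text):
    """Return True if text is a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False


if __name__ == "__main__":
    identifier = new_dataset_uuid()
    print(identifier, is_valid_uuid(identifier))
```

A version 4 UUID contains 122 random bits, which is why a collision between two datasets is not a practical concern.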

The UUID is prefixed by an asterisk '*' to mark it as a ProtoDataset. If configured correctly, the README.yml template should appear as descriptive metadata for the fresh dataset with some placeholders automatically filled in.

Enable the metadata editing switch at the bottom and fill in some more descriptive metadata in YAML format.

Editing metadata

Add items to your dataset,

Add items

and freeze it,

Freeze dataset

confirming the warning.

Freeze confirmation

Freezing makes the dataset immutable. The ProtoDataset turns into a Dataset and the asterisk disappears. Its content can no longer be altered. You may inspect the manifest

Manifest

and explore the contents with your file system browser.

Show

Structure of a dataset

The dataset's top level holds the README.yml, the data and the .dtool directories:

Top level content
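The top-level layout named above can be checked with a short script. This is a sketch using only the three entry names given in this guide; the function name is hypothetical and the check says nothing about the validity of the content.

```python
from pathlib import Path

# Top-level entries of a dataset on the local file system, per this guide.
EXPECTED_ENTRIES = {"README.yml": "file", "data": "dir", ".dtool": "dir"}


def missing_top_level_entries(dataset_path):
    """Return the names of expected top-level entries that are absent."""
    root = Path(dataset_path)
    missing = []
    for name, kind in EXPECTED_ENTRIES.items():
        entry = root / name
        present = entry.is_file() if kind == "file" else entry.is_dir()
        if not present:
            missing.append(name)
    return missing
```

An empty list means that the folder has the expected top-level layout.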

The README.yml just contains what you have entered as descriptive metadata in YAML-formatted text:

README.yml content

The data directory holds all items:

data directory content

The .dtool directory contains administrative and structural metadata distributed into several small files.

.dtool content

It is designed to be both machine-processable and human-readable. As such, it holds a README.txt describing the meaning of all items within:

.dtool/README.txt content

The manifest.json holds the sizes and checksums of all items at the point of freezing, making any illegitimate tampering with the items of the frozen dataset immediately noticeable:

.dtool/manifest.json content
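How the manifest makes tampering noticeable can be sketched in a few lines. The example below recomputes size and checksum for each item and compares them with the recorded values. It rests on assumptions not stated in this guide: that manifest.json has a top-level `items` object whose entries carry `relpath`, `size_in_bytes` and `hash`, and that the hash is an MD5 hex digest. Compare with your own manifest.json before relying on it. The function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def md5_hexdigest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_modified_items(dataset_path):
    """Return relpaths of items that no longer match the manifest.

    Assumed manifest layout (check against your own file):
    {"items": {"<id>": {"relpath": ..., "size_in_bytes": ..., "hash": ...}}}
    """
    root = Path(dataset_path)
    manifest_path = root / ".dtool" / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    modified = []
    for item in manifest["items"].values():
        item_path = root / "data" / item["relpath"]
        if (
            not item_path.is_file()
            or item_path.stat().st_size != item["size_in_bytes"]
            or md5_hexdigest(item_path) != item["hash"]
        ):
            modified.append(item["relpath"])
    return sorted(modified)
```

An empty result means that every item listed in the manifest is present and unchanged. Files added to the data directory after freezing are not detected by this sketch.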

For more information on the structure of a dataset, refer to the software authors' publication 2.

Search for a dataset

Copy your frozen dataset to the s3://test-bucket,

Copy from local base URI to S3

and confirm it's there.

Copied dataset at S3 endpoint

Depending on its configuration, the lookup server registers new datasets either immediately or only at certain time intervals. Once the dataset has been registered, it appears in the lookup server's dataset list:

New dataset in index

The lookup server makes the dataset discoverable by its administrative and descriptive metadata. A search query may be plain text that matches content of the README.yml, e.g.

Text search

or formulated more specifically to target certain fields of the README.yml:

Mongo search
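The difference between the two kinds of search can be sketched as follows. The example assumes that a field-specific query is written as a JSON object in MongoDB query syntax, while anything else is treated as plain text; how the GUI actually distinguishes the two is not described in this guide. The field name `readme.owners.name` in the usage lines is purely hypothetical and depends on your README.yml template.

```python
import json


def classify_search(text):
    """Classify a search string as plain text or as a structured query.

    Returns ("query", dict) if text parses as a JSON object,
    otherwise ("text", text).
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return "text", text
    if isinstance(parsed, dict):
        return "query", parsed
    return "text", text


if __name__ == "__main__":
    print(classify_search("simulation results"))
    # Hypothetical field name; $regex is a standard MongoDB query operator.
    print(classify_search('{"readme.owners.name": {"$regex": "Test"}}'))
```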

To understand more about the possibilities for sophisticated querying, continue reading Finding a dataset.

Browse repository from anywhere

livMatS offers a simple web app to explore the content of s3://test-bucket in your browser. Visit https://livmats-data.vm.uni-freiburg.de:4443 and log in with testuser and test_password. You should see a few datasets, among them your creation:

dtool-lookup-webapp

Footnotes

  1. M. D. Wilkinson et al., “The FAIR Guiding Principles for scientific data management and stewardship,” Scientific Data, vol. 3, no. 1, Art. no. 1, Mar. 2016, doi: 10.1038/sdata.2016.18.

  2. T. S. G. Olsson and M. Hartley, “Lightweight data management with dtool,” PeerJ, vol. 7, p. e6562, Mar. 2019, doi: 10.7717/peerj.6562.