Note that datasets placed on the testing storage locations within this guide are globally readable by anyone, cannot be deleted once deposited, are not backed up, and may disappear at any time.
Please report any issues with the dtool lookup GUI at https://github.com/livMatS/dtool-lookup-gui/issues, any issues with this documentation at https://github.com/livMatS/RDM-Wiki-public/issues, or to data@livmats.uni-freiburg.de.
When reporting an issue, please include information on your OS, the version information available within the GUI's About dialog, and debug log output if possible. The dtool-lookup-gui offers a logging window. Open it by clicking Logging within the "burger" menu, switch the log level to DEBUG, go ahead with the actions that lead to the problem you want to report, save the logging output to a file, and attach that log when reporting the issue. Thank you.
Navigate to https://github.com/livMatS/dtool-lookup-gui/releases and download the latest release zip file containing the dtool lookup GUI for your OS. Minimum requirements are macOS 10.15, Windows 10, or Ubuntu 20.04 or a comparably recent Linux distribution. Unpack the zip archive and launch the application. When you launch the GUI for the first time, it may look quite empty:
Open the main menu by clicking the burger icon in the upper right corner and select Settings.
Download the sample configuration for a testing server instance at dtool.json. Import it via the import button in the settings dialog and select the downloaded file. The imported settings will then appear in the dialog.
dtool uses cacheable tokens to facilitate authentication against the lookup server. Click on renew token to fetch such a token and authenticate with your username and password. For the testing configuration, these are testuser and test_password. The generated token appears in the settings dialog. You won't have to authenticate again until the cached token loses its validity.
After importing the configuration and closing the settings dialog, you will find these settings stored in ~/.config/dtool/dtool.json within your home folder. The GUI will list two new base URIs on the left-hand side:
The prefix indicates the protocol used to communicate with the underlying storage infrastructure: s3 points to S3-compatible object storage like Amazon S3, while smb points to network storage better known as Windows shares. The testing server instance offers s3://test-bucket and smb://test-share to play around with. Browse those locations by selecting them.
The first entry in the list plays a special role. Here you can see and search through all the datasets that have been indexed by the lookup server:
In the central column you see the list of datasets. On the right-hand side you see a few buttons, the Details, Manifest, and Dependencies tabs, and below them the fixed administrative and the editable descriptive metadata. The latter is shown as YAML-highlighted text.
Add a local folder to the list of base URIs by clicking the folder icon in the upper left corner and selecting the desired location. To distinguish them from other (remote) endpoints, local base URIs come with the file:// prefix.
Now, copy a dataset from a remote location to your local machine. Select a dataset on the s3-endpoint and download it by choosing your local folder from the copy-button's drop-down menu:
The dataset will appear at your local base URI:
Notice the dataset URI entry in the administrative metadata.
The Manifest lists all items contained within the dataset. Click on Show to explore the dataset with the local file system browser.
Download the README.yml template and adapt it to your needs in a text editor. Point the GUI to this template at the bottom of the settings dialog, or just open your ~/.config/dtool/dtool.json in a text editor and set the DTOOL_README_TEMPLATE_FPATH entry to point to your README.yml template:
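If you prefer editing the file directly, the relevant entries might look like the following sketch. The path and personal details are placeholders, and while DTOOL_USER_FULL_NAME and DTOOL_USER_EMAIL should correspond to the name and e-mail settings mentioned in this guide, treat the exact key names as assumptions to verify against your own dtool.json:

```json
{
  "DTOOL_USER_FULL_NAME": "Test User",
  "DTOOL_USER_EMAIL": "testuser@example.com",
  "DTOOL_README_TEMPLATE_FPATH": "/home/testuser/README.yml"
}
```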
Specify your name and e-mail address as well. Next, create a new dataset by selecting your local base URI and clicking the '+' icon in the upper left corner:
Pick a name and notice the new entry in the list of datasets.
You see a new UUID in bold assigned to the freshly created dataset. This is an important concept. No matter how your dataset is stored, how it's moved around, or how many copies of it are created, this Universally Unique IDentifier will stay with your dataset over its whole lifetime. No other dataset will ever own the same UUID. It hence serves as a persistent identifier, an important building block for implementing the FAIR principles 1.
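The behaviour of such identifiers is easy to reproduce with Python's standard library. The following sketch assumes random version-4 UUIDs, the common choice for this purpose; it illustrates the concept rather than dtool's actual implementation:

```python
import uuid

# Two freshly generated identifiers, as a dataset receives one at creation time.
a = uuid.uuid4()
b = uuid.uuid4()

print(a)          # 36 characters in 8-4-4-4-12 hex groups
print(a.version)  # 4
print(a != b)     # True: with 122 random bits, collisions are practically impossible
```

Because the identifier is random rather than derived from name or location, it survives renaming and copying unchanged.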
The UUID is prefixed by an asterisk '*' to mark it as a ProtoDataset. If configured correctly, the README.yml
template should appear as descriptive metadata for the fresh dataset with some placeholders automatically filled in.
Enable the metadata editing switch at the bottom and fill in some more descriptive metadata in YAML format.
Add items to your dataset and freeze it, confirming the warning.
Freezing means making the dataset immutable. The ProtoDataset turns into a Dataset and the asterisk mark disappears. It is now forbidden to alter the content. You may inspect the manifest
and explore the contents with your file system browser.
The dataset's top level holds the README.yml file and the data and .dtool directories:
The README.yml contains just what you have entered as descriptive metadata in YAML-formatted text:
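For instance, a small descriptive metadata block might read as follows; all field names and values here are invented for illustration:

```yaml
project: dtool-gui-walkthrough
description: A small test dataset created while following this guide.
owners:
  - name: Test User
    email: testuser@example.com
creation_date: '2023-01-01'
```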
The data directory holds all items:
The .dtool directory contains administrative and structural metadata distributed across several small files. It is designed to be both machine-processable and human-readable. As such, it holds a README.txt describing the meaning of all items within:
The manifest.json holds sizes and checksums of all items at the point of freezing, making any illicit tampering with the items of the frozen dataset immediately noticeable:
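The tampering detection boils down to recomputing each item's checksum and comparing it against the recorded value. A simplified, self-contained sketch follows; it assumes MD5 as the hash function and keys items by their relative path, whereas the real manifest uses opaque item identifiers and records additional fields:

```python
import hashlib
import json

def md5_hexdigest(data: bytes) -> str:
    """Checksum as recorded per item in the manifest (assuming MD5)."""
    return hashlib.md5(data).hexdigest()

# At freeze time, the checksum and size of every item are recorded ...
item = b"measurement data"
manifest = {"items": {"data/measurement.txt": {"hash": md5_hexdigest(item),
                                               "size_in_bytes": len(item)}}}
print(json.dumps(manifest, indent=2))

# ... so later verification only needs to recompute and compare:
recorded = manifest["items"]["data/measurement.txt"]["hash"]
print(md5_hexdigest(b"measurement data") == recorded)            # True
print(md5_hexdigest(b"measurement data (altered)") == recorded)  # False
```

Any change to an item's content yields a different digest, so a mismatch against the frozen manifest immediately exposes the alteration.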
For more information on the structure of a dataset, refer to the software authors' publication 2.
Copy your frozen dataset to s3://test-bucket and confirm it is there. Depending on the server's configuration, it will register new datasets either immediately or only at certain time intervals. After that has happened, the dataset will appear in the lookup server's dataset list:
The lookup server makes the dataset discoverable by its administrative and descriptive metadata. A search query may be plain text aimed at the content of the README.yml, e.g.

or it may be formulated more specifically to target certain fields of the README.yml:
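As an illustration only: assuming the lookup server accepts MongoDB-style queries, that the README.yml contains a top-level project field, and that indexed README fields are addressed with a readme. prefix (verify all three against your server's documentation), a field-specific query might look like this:

```json
{ "readme.project": "dtool-gui-walkthrough" }
```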
To understand more about the possibilities for sophisticated querying, continue reading Finding a dataset.
livMatS offers a simple web app to explore the content of s3://test-bucket in
your browser. Visit https://livmats-data.vm.uni-freiburg.de:4443 and
log in with testuser and test_password. You should see a few datasets, among them your creation:
Footnotes
-
M. D. Wilkinson et al., "The FAIR Guiding Principles for scientific data management and stewardship," Scientific Data, vol. 3, no. 1, Art. no. 1, Mar. 2016, doi: 10.1038/sdata.2016.18. ↩
-
T. S. G. Olsson and M. Hartley, "Lightweight data management with dtool," PeerJ, vol. 7, p. e6562, Mar. 2019, doi: 10.7717/peerj.6562. ↩