# Zenodo

The [Zenodo](https://zenodo.org/) repository is a concrete implementation of the `RepositoryInterface`. Other repositories, such as [Figshare](https://figshare.com/), could be implemented against the same interface in the future.

Zenodo provides a sandbox (testing environment) and a production environment. Both work the same way in principle, so a single implementation covers both: `ZenodoRecord`, the interface to a record in Zenodo. Pass `sandbox=True` to use the testing environment.
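One way to see why a single class suffices: the two environments differ only in their host name. The snippet below is a minimal sketch, not the toolbox's actual code. The helper name `deposit_api_url` is hypothetical, and the `/api/deposit/depositions` path is the deposit endpoint from Zenodo's public REST API documentation.

```python
# Hypothetical helper for illustration; not part of h5rdmtoolbox.
PRODUCTION_HOST = "https://zenodo.org"
SANDBOX_HOST = "https://sandbox.zenodo.org"


def deposit_api_url(sandbox: bool = False) -> str:
    """Return the deposit endpoint of the selected Zenodo environment."""
    host = SANDBOX_HOST if sandbox else PRODUCTION_HOST
    return f"{host}/api/deposit/depositions"


print(deposit_api_url(sandbox=True))
```

Everything else (request format, metadata schema, file upload) can stay identical for both environments, which is why a boolean flag is enough.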

The diagram below shows the abstract base class with its abstract methods (indicated by italics). Note that `upload_file` is *not* abstract. It relies on the subclass implementation of `__upload_file__`, which uploads a single file to the repository record. `upload_file()` is a wrapper around it that can additionally generate a metadata file for the uploaded file. We will explore this feature later in this section.

The `RepositoryInterface` also defines how files are accessed. A file object, `RepositoryFile`, provides mandatory properties as well as a download method. A repository implementation (such as the one for Zenodo) must return a dictionary of `RepositoryFile` objects from its `files` property (see the source code for an in-depth explanation and the example at the end of this section).

<img src="../../_static/repo_class_diagram.svg"
     alt="../../_static/repo_class_diagram.svg"
     style="margin-right: 10px; height: 500px;" />
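
The relationship between the concrete wrapper and the abstract upload hook can be illustrated with a small, self-contained sketch. It is not the toolbox's source code: the class `InMemoryRepository`, the constructor of `RepositoryFile`, the signature of `download`, and the order in which the two files are uploaded are assumptions made for demonstration only.

```python
import abc
import pathlib
import tempfile
from typing import Callable, Dict, Optional


class RepositoryFile:
    """Toy stand-in for a file stored in a repository record."""

    def __init__(self, name: str, content: bytes):
        self.name = name
        self._content = content

    def download(self, target_folder) -> pathlib.Path:
        """Write the stored content into target_folder and return the new path."""
        target = pathlib.Path(target_folder) / self.name
        target.write_bytes(self._content)
        return target

    def __repr__(self) -> str:
        return f"RepositoryFile({self.name})"


class RepositoryInterface(abc.ABC):
    """Abstract base class: subclasses supply the repository-specific parts."""

    @property
    @abc.abstractmethod
    def files(self) -> Dict[str, RepositoryFile]:
        """Map file names to RepositoryFile objects."""

    @abc.abstractmethod
    def __upload_file__(self, filename: pathlib.Path) -> None:
        """Upload exactly one file to the record."""

    def upload_file(self, filename, metamapper: Optional[Callable] = None) -> None:
        """Concrete wrapper: upload the file and, optionally, its metadata file."""
        filename = pathlib.Path(filename)
        if metamapper is not None:
            # The mapper writes a metadata file and returns its path.
            self.__upload_file__(pathlib.Path(metamapper(filename)))
        self.__upload_file__(filename)


class InMemoryRepository(RepositoryInterface):
    """Hypothetical implementation that keeps uploads in a dictionary."""

    def __init__(self):
        self._files: Dict[str, RepositoryFile] = {}

    @property
    def files(self) -> Dict[str, RepositoryFile]:
        return self._files

    def __upload_file__(self, filename: pathlib.Path) -> None:
        self._files[filename.name] = RepositoryFile(filename.name, filename.read_bytes())


# Small demonstration with a temporary file
with tempfile.TemporaryDirectory() as tmp:
    data_file = pathlib.Path(tmp) / "data.txt"
    data_file.write_text("1 2 3")
    repo = InMemoryRepository()
    repo.upload_file(data_file)
    print(sorted(repo.files))  # ['data.txt']
```

Because only `__upload_file__` and `files` are abstract, a new repository backend inherits the metadata handling of `upload_file` without reimplementing it.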
     

## Example usage

The example below uploads an HDF5 file to the sandbox server:

In [1]:
from h5rdmtoolbox.repository import zenodo
import h5rdmtoolbox as h5tbx

### 1. Initialize the repository

As mentioned above, we use the testing environment, hence `sandbox=True`:

In [2]:
repo = zenodo.ZenodoRecord(None, sandbox=True)

We create a test HDF5 file, which we will later publish in the repository:

In [3]:
with h5tbx.File() as h5:
    h5.create_dataset('velocity', shape=(10, 30), attrs={'units': 'm/s'})
filename = h5.hdf_filename

### 2. Add repository metadata
The repository needs **metadata**. The Zenodo module provides the class `Metadata` for this purpose. It validates the data expected by the Zenodo API. For required and optional fields, refer to the [API](https://developers.zenodo.org/#representation) or read the `Metadata` docstring carefully. Because a `pydantic` class is used as the parent class, invalid or missing parameters raise errors:

In [4]:
from h5rdmtoolbox.repository.zenodo import metadata
from datetime import datetime

meta = metadata.Metadata(
    version="0.1.0-rc.1+build.1",
    title='[deleteme]h5tbxZenodoInterface',
    description='A toolbox for HDF5-based research data management',
    creators=[metadata.Creator(name="Probst, Matthias",
                      affiliation="KIT - ITS",
                      orcid="0000-0001-8729-0482")],
    contributors=[metadata.Contributor(name="Probst, Matthias",
                              affiliation="KIT - ITS",
                              orcid="0000-0001-8729-0482",
                              type="ContactPerson")],
    upload_type='image',
    image_type='photo',
    access_right='open',
    keywords=['hdf5', 'research data management', 'rdm'],
    publication_date=datetime.now(),
    embargo_date='2020'
)

Finally, apply the metadata to the record:

In [5]:
repo.set_metadata(meta)

### 3. Upload files

*Any* file can be added (uploaded) by calling `upload_file(...)`. It can be a simple text, CSV or binary file. It is often advisable to describe the content in an additional file and thereby provide more (machine-interpretable) information. JSON-LD files are best suited for this, because the format allows describing file content and context in a standardized way.

One of the parameters of `upload_file(...)` is `metamapper`. It expects a function that extracts meta information from the input file. If the parameter `auto_map_hdf` is `True` and an HDF5 file is passed (detected by the file suffixes `.hdf`, `.hdf5` and `.h5`), the built-in converter function is called, which writes a JSON-LD file.

When a `metamapper` function is provided, the target file and the metadata file created by that function are uploaded together.

Adding a metadata file is especially beneficial for large binary files: the user can download and explore the small metadata file quickly.
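
The selection of the metadata function described above can be sketched as follows. This is an illustration under assumptions, not the toolbox's implementation: the function name `pick_metamapper` and the `hdf_converter` argument are hypothetical, and the rule that an explicitly passed `metamapper` takes precedence over `auto_map_hdf` is an assumption.

```python
import pathlib
from typing import Callable, Optional

HDF_SUFFIXES = {".hdf", ".hdf5", ".h5"}  # suffixes named in the text above


def pick_metamapper(filename,
                    metamapper: Optional[Callable] = None,
                    auto_map_hdf: bool = True,
                    hdf_converter: Optional[Callable] = None) -> Optional[Callable]:
    """Return the function that should create the metadata file, or None."""
    if metamapper is not None:
        # Assumption: an explicitly passed mapper wins over the automatic one.
        return metamapper
    if auto_map_hdf and pathlib.Path(filename).suffix in HDF_SUFFIXES:
        # HDF5 file detected by its suffix: fall back to the built-in converter.
        return hdf_converter
    # No metadata file is generated for this upload.
    return None
```

With this logic, a plain CSV file is uploaded on its own, while an HDF5 file is accompanied by a generated metadata file unless `auto_map_hdf` is switched off.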

In [6]:
repo.upload_file(filename)

List the files that were just uploaded to the repository:

In [7]:
repo.files

{'tmp0.jsonld': RepositoryFile(tmp0.jsonld),
 'tmp0.hdf': RepositoryFile(tmp0.hdf)}

### 3b. Custom metamapper

We can of course write and use our own metadata extraction function:

In [8]:
import pathlib

def my_meta_mapper(filename):
    """Very primitive, and not a JSON-LD file, but
    it serves the purpose of demonstration."""
    txt_filename = pathlib.Path(filename).with_suffix('.txt')
    with open(txt_filename, 'w') as f:
        f.write(f'filename: {filename}')
    return txt_filename

In [10]:
repo.upload_file(filename, metamapper=my_meta_mapper)

Check that it worked:

In [11]:
for file in repo.files:
    print(file)

tmp0.jsonld
tmp0.hdf
tmp0.txt
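
A custom mapper can also produce JSON-LD instead of plain text. The following is a minimal sketch using only the standard library. The function name `jsonld_meta_mapper` and the choice of schema.org terms (`MediaObject`, `name`, `contentSize`) are assumptions for illustration, not what the built-in converter writes.

```python
import json
import pathlib


def jsonld_meta_mapper(filename):
    """Write a small JSON-LD description next to `filename` and return its path."""
    filename = pathlib.Path(filename)
    document = {
        "@context": {"schema": "https://schema.org/"},
        "@type": "schema:MediaObject",
        "schema:name": filename.name,
        # File size in bytes, read from the file system
        "schema:contentSize": f"{filename.stat().st_size} B",
    }
    jsonld_filename = filename.with_suffix(".jsonld")
    with open(jsonld_filename, "w") as f:
        json.dump(document, f, indent=2)
    return jsonld_filename
```

It would be passed in the same way as the text-based mapper, i.e. `repo.upload_file(filename, metamapper=jsonld_meta_mapper)`. Note that it writes to the same `.jsonld` file name that the built-in converter produced above.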
