# User-defined metadata

In addition to system metadata, iRODS lets you attach your own metadata to data objects and collections.

You can use this metadata to describe your data and to search for it later. It can also help you keep track of what was the input for an analysis and what is the outcome.

<img src="img/DataObject5.png" width="400">

Technically, iRODS stores metadata as key-value-units triples. Let's investigate this:

## Add metadata to data objects

As always: first we have to create an iRODS session:

In [None]:
from ibridges.interactive import interactive_auth
session = interactive_auth()

Now we can retrieve a data object and inspect its metadata.

In [None]:
from ibridges.path import IrodsPath
from ibridges.meta import MetaData

irods_coll_path = IrodsPath(session, '~').joinpath('demo')
obj = irods_coll_path.joinpath('demofile.txt').dataobject

obj_metadata = MetaData(obj)
print(obj_metadata)

Most probably you will see no metadata in the above cell. **Note that system metadata and user-defined metadata are two different entities in a data object!**
With the command `MetaData(obj)` we only retrieve the user-defined metadata.

<img src="img/DataObject4.png" width="400">

Now we can add some metadata of our own. Each entry is a key-value-units triple:

In [None]:
obj_metadata.add('Key', 'Value', 'Units')
print(obj_metadata)

Sometimes we do not really have `units`, so we can leave this part empty:

In [None]:
obj_metadata.add('Author', 'Christine')
print(obj_metadata)

We can also add a second author:

In [None]:
obj_metadata.add('Author', 'Raoul')
print(obj_metadata)

You see that **iRODS metadata keys can have several values**. That is different from Python dictionaries, where one key can only have one value. **How, then, do you overwrite a value?**
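The difference from a dictionary can be reproduced in plain Python. The sketch below does not talk to iRODS; it only models metadata as a set of `(key, value, units)` tuples, which is an assumption made for illustration and not a description of how iBridges stores metadata internally.

```python
# A dictionary keeps exactly one value per key: the second assignment wins.
as_dict = {}
as_dict["Author"] = "Christine"
as_dict["Author"] = "Raoul"

# Modelling each entry as a (key, value, units) tuple keeps both authors.
# Using None for "no units" is an illustrative choice.
as_triples = set()
as_triples.add(("Author", "Christine", None))
as_triples.add(("Author", "Raoul", None))

authors = sorted(value for key, value, _units in as_triples if key == "Author")
print(as_dict["Author"])  # Raoul
print(authors)            # ['Christine', 'Raoul']
```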

## Overwrite metadata

If you wish to *overwrite* a value, you can first add the new key-value-units triple as above and then remove the old one. To remove an entry, you need to specify the whole triple if the metadata contains a units part. As you will see, the following command fails:

In [None]:
obj_metadata.delete('Key', 'Value')

While this one will succeed:

In [None]:
obj_metadata.delete('Key', 'Value', 'Units')

You can also replace all existing values of a key with **one** new value:

In [None]:
print(obj_metadata)
obj_metadata.set('Author', 'Maarten')
print(obj_metadata)
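Both ways of overwriting can be illustrated with a plain-Python model in which metadata is a set of `(key, value, units)` tuples. The helper names `remove_entry` and `set_entry` are hypothetical; they only mimic the behaviour described above and are not iBridges functions.

```python
def remove_entry(entries, key, value, units=None):
    """Remove one entry; the full triple must match exactly."""
    triple = (key, value, units)
    if triple not in entries:
        raise KeyError(f"no metadata entry {triple}")
    entries.remove(triple)


def set_entry(entries, key, value, units=None):
    """Drop every entry stored under `key`, then add a single new one."""
    for triple in [t for t in entries if t[0] == key]:
        entries.remove(triple)
    entries.add((key, value, units))


entries = {
    ("Key", "Value", "Units"),
    ("Author", "Christine", None),
    ("Author", "Raoul", None),
}

try:
    remove_entry(entries, "Key", "Value")  # units missing: no exact match
except KeyError as err:
    print("failed:", err)

remove_entry(entries, "Key", "Value", "Units")  # full triple: succeeds
set_entry(entries, "Author", "Maarten")         # both authors are replaced
print(entries)  # {('Author', 'Maarten', None)}
```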

## Add metadata to collections

We can use the same functionality for collections:

In [None]:
coll = irods_coll_path.collection
coll_metadata = MetaData(coll)
print(coll_metadata)

In [None]:
coll_metadata.add('TypeOfCollection', 'Results')
print(coll_metadata)

## Which metadata can help you keep an overview?

iRODS metadata can help you keep an overview while you are working with data, especially with many files that are related to each other. There are ontologies which define keywords and links between keywords, like the **[prov-o Ontology](https://www.w3.org/TR/prov-o/#prov-o-at-a-glance)**.

Let's see how we can annotate our test data, so that we know that it is test data.

In [None]:
from datetime import datetime
coll_metadata.add('prov:wasGeneratedBy', 'Christine')
coll_metadata.add('CollectionType', 'testcollection')
obj_metadata.add('prov:SoftwareAgent', 'iRODS jupyter Tutorial')
obj_metadata.add('prov:wasGeneratedBy', 'Maarten')
obj_metadata.add('DataType', 'testdata')

Now we have some more descriptive metadata that gives us hints about the context in which the data was created:

In [None]:
print(coll_metadata)
print(obj_metadata)

## Finding data by their metadata

Metadata not only helps you keep an overview of your data, but can also be used to select and retrieve data. In iBridges you can use the user-defined metadata and some system metadata fields to search for data.

In our first example we list all collections and data objects in our iRODS home.

In [None]:
from ibridges.search import search_data
result = search_data(session, path=session.home)
print(result)

The output is a list of Python dictionaries, where each dictionary describes either a collection or a data object:

1) Collections: `{'COLL_NAME': '/<ZONE>/home/<YOUR PATH>'}`
2) Data Objects: 
    ```
    {'COLL_NAME': '/<ZONE>/home/<YOUR PATH>', 
     'DATA_NAME': '<OBJECT NAME>', 
     'D_DATA_CHECKSUM': '<CHECKSUM>'}
    ```

Now let's try to find data by its metadata. We have to create a Python dictionary with the metadata keys and the metadata values as the search criterion:

In [None]:
key_vals = {'prov:wasGeneratedBy': 'Christine'}
result = search_data(session, key_vals=key_vals)
print(result)

If we do not want to specify the particular value of the key, we can use a *wildcard*. In iRODS the wildcard is `%`.

In [None]:
key_vals = {'prov:wasGeneratedBy': '%'}
result = search_data(session, key_vals=key_vals)
print(result)

Now we also receive the data object that was generated by *Maarten*.
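The matching rule itself can be sketched in plain Python, assuming that `%` behaves like the wildcard of an SQL `LIKE` expression and stands for any sequence of characters, including none. The function `like_match` is a hypothetical helper that only handles `%`; the real matching is done by the iRODS server.

```python
import re


def like_match(pattern, text):
    """Return True if `text` matches `pattern`, where % matches any characters."""
    # Escape every literal part, then join the parts with the regex '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


values = ["Christine", "Maarten"]
print([v for v in values if like_match("Christine", v)])  # ['Christine']
print([v for v in values if like_match("%", v)])          # ['Christine', 'Maarten']
print([v for v in values if like_match("Ma%", v)])        # ['Maarten']
```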

And of course we can combine information about the path and the metadata; the criteria are connected with `and`. The following search retrieves all data objects and collections which are labeled with the metadata key *'prov:wasGeneratedBy'* and whose path has the prefix */nluu12p/home/research-test-christine/*.

In [None]:
key_vals = {'prov:wasGeneratedBy': '%'}
result = search_data(session, path='/nluu12p/home/research-test-christine/%', key_vals=key_vals)
print(result)
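The effect of combining both criteria can be sketched on a hand-written list of records. The records, the zone name `/zone`, the user names and the helper `matches` are made up for illustration; the real search is evaluated by the iRODS server, not in Python.

```python
# Hypothetical records: a path plus a list of (key, value) metadata pairs.
records = [
    {"path": "/zone/home/alice/demo",
     "meta": [("prov:wasGeneratedBy", "Christine")]},
    {"path": "/zone/home/alice/demo/demofile.txt",
     "meta": [("prov:wasGeneratedBy", "Maarten")]},
    {"path": "/zone/home/alice/other.txt", "meta": []},
    {"path": "/zone/home/bob/data.txt",
     "meta": [("prov:wasGeneratedBy", "Bob")]},
]


def matches(record, path_prefix, key):
    """Both conditions must hold: path under the prefix AND key present."""
    under_prefix = record["path"].startswith(path_prefix)  # mimics 'prefix%'
    has_key = any(k == key for k, _value in record["meta"])  # mimics value '%'
    return under_prefix and has_key


hits = [r["path"] for r in records
        if matches(r, "/zone/home/alice/", "prov:wasGeneratedBy")]
print(hits)
# ['/zone/home/alice/demo', '/zone/home/alice/demo/demofile.txt']
```

The record without the key and the record outside the prefix are both dropped, because each of them fails one of the two conditions.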

## Retrieving data

Now that we have the search results as a list of Python dictionaries, we can use the information to create the full iRODS paths and continue working with them, for example to download them.

In [None]:
from ibridges.path import IrodsPath

paths = [IrodsPath(session, r.get('COLL_NAME', '')).joinpath(r.get('DATA_NAME', '')) for r in result]
print(paths)
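Collection records carry no `DATA_NAME`, so it can be useful to tell the two kinds of result apart before acting on them. The sketch below builds plain path strings with `posixpath` from example dictionaries shaped like the search output described earlier; the helper `full_path`, the zone and the paths are illustrative assumptions, and no iRODS session is involved.

```python
import posixpath


def full_path(record):
    """Join collection and object name; collection records have no DATA_NAME."""
    name = record.get("DATA_NAME")
    return posixpath.join(record["COLL_NAME"], name) if name else record["COLL_NAME"]


example_result = [
    {"COLL_NAME": "/zone/home/user/demo"},
    {"COLL_NAME": "/zone/home/user/demo",
     "DATA_NAME": "demofile.txt",
     "D_DATA_CHECKSUM": "<CHECKSUM>"},
]

data_objects = [full_path(r) for r in example_result if "DATA_NAME" in r]
collections = [full_path(r) for r in example_result if "DATA_NAME" not in r]
print(data_objects)  # ['/zone/home/user/demo/demofile.txt']
print(collections)   # ['/zone/home/user/demo']
```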

# Metadata archives

In most cases the user is encouraged to access and manipulate metadata through the `MetaData` class. However, there are some cases where it can be useful to create an archive of all metadata in a collection and all subcollections and data objects. One example might be a backup of the data and metadata on a system that does not support metadata. Another might be to easily transfer metadata from one iRODS system to another. A final use case might be having access to the metadata during computation on a system that is not connected to the internet.

## Creating a metadata archive

In [None]:
from ibridges.data_operations import create_meta_archive

collection_path = IrodsPath(session, "~", "demo")  # iRODS paths are case-sensitive; same collection as above
create_meta_archive(session, collection_path, "meta_archive.json")
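The archive is written to a local file, so it can be inspected without an iRODS connection. A minimal sketch, assuming the file is plain JSON as its extension suggests; the internal layout of the archive is not described here, so the hypothetical helper `describe_json` only reports the top-level structure instead of relying on specific keys.

```python
import json
from pathlib import Path


def describe_json(file_path):
    """Load a JSON file and report its top-level type and, for objects, its keys."""
    with Path(file_path).open(encoding="utf-8") as handle:
        content = json.load(handle)
    keys = sorted(content) if isinstance(content, dict) else []
    return type(content).__name__, keys


archive = Path("meta_archive.json")
if archive.exists():
    print(describe_json(archive))
else:
    print(f"{archive} not found; create the archive first")
```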

## Applying a metadata archive

This restores/overwrites the metadata on the iRODS server with the metadata from the archive. Make sure that the paths of the subcollections and data objects have not changed.

In [None]:
from ibridges.data_operations import apply_meta_archive

apply_meta_archive(session, "meta_archive.json", collection_path)