# FAIR Attributes

According to [F1 of the *FAIR Principles*](https://www.go-fair.org/fair-principles/f1-meta-data-assigned-globally-unique-persistent-identifiers/) attributes shall be assigned to globally unique and persistent identifiers.

Here's what www.g-fair.org says about it:

*Globally unique and persistent identifiers remove ambiguity in the meaning of your published data by assigning a unique identifier to every element of metadata and every concept/measurement in your dataset. In this context, identifiers consist of an internet link (e.g., a URL that resolves to a web page that defines the concept such as a particular human protein). Many data repositories will automatically generate globally unique and persistent identifiers to deposited datasets. Identifiers can help other people understand exactly what you mean, and they allow computers to interpret your data in a meaningful way (i.e., computers that are searching for your data or trying to automatically integrate them). Identifiers are essential to the human-machine interoperation that is key to the vision of Open Science. In addition, identifiers will help others to properly cite your work when reusing your data.*

The *h5rdmtoolbox* allows assigning attributes (and their data) to identifiers. For this, each name and value of an attribute may obtain an IRI (internationalized resource identifier). The following outlines, how it is done.

## Concept

We understand HDF5 objects, attributes and attribute values as RDF triples:
- obj (dataset/group) $\rightarrow$ *subject*
- attribute $\rightarrow$ <u>predicate</u>
- data $\rightarrow$ **object**

Then, we can "explain" the data in the following way:
- The *group "contact"* <u>is</u> **Person**
- The *group "contact"* <u>has ORCiD</u> **\<value\>**
- The *dataset "u"* <u>hasUnit/u> **"m/s"**
- *"m/s"* <u>is</u> **"https://qudt.org/vocab/unit/M-PER-SEC"**
- The *dataset "u"* <u>has kind of quantity/u> **Velocity** (defined by qudt)
- etc.

Let's build such a file:

In [7]:
import h5rdmtoolbox as h5tbx

## Attribute-IRI-Association

An IRI can be assigned during or after attribute creation. Various possibilities are shown below.

Note, that you can type the IRIs yourself, however, it is safer to use implemented namespace objects as provided by *rdflib* (e.g. FOAF). The toolbox provides the NFDI4Ing-supported ontology **metadata4ing (m4i)** which is very useful for engineering data. The toolbox implements it in the same way as *rdflib* does:

In [9]:
from h5rdmtoolbox.namespace import M4I, OBO, QUDT_UNIT, HAS_QKIND
from rdflib.namespace import FOAF

In [11]:
import numpy as np
import time
from datetime import datetime

hdf_filename = h5tbx.utils.generate_temporary_filename()

with h5tbx.File(hdf_filename) as h5:
    person = h5.create_group('contact')
    person.iri = FOAF.Person
    person.attrs.create(name='first_name',
                        predicate=FOAF.firstName,
                        data='Matthias')
    person.attrs.create(name='orcid',
                        predicate=M4I.orcidId,
                        data='https://orcid.org/0000-0001-8729-0482')

    st = datetime.now()
    time.sleep(2)
    ds = h5.create_dataset('random_velocity', data=np.random.random(100))
    et = datetime.now()
    ds.attrs.create('units', predicate=M4I.hasUnit,
                    data='m/s',
                    object=QUDT_UNIT.M_PER_SEC)
    ds.attrs.create('quantity_kind',
                     data='velocity',
                     predicate=M4I.hasKindOfQuantity,
                     object=HAS_QKIND.Velocity)

    proc = h5.create_group('proc_random_number')
    proc.iri = M4I.ProcessingStep
    proc.attrs['has_participants', OBO.has_participant] = h5['contact']
    proc.attrs.create('start_time', data=st,
                      predicate='https://schema.org/startTime')
    proc.attrs.create('end_time', data=et,
                      predicate='https://schema.org/startTime')
    proc.attrs['output', 'http://purl.obolibrary.org/obo/RO_0002234'] = ds
    

## Make use of FAIR metadata

There are three ways, how the above assignment helps us and how we might want to use IRIs:
1. Visual inspection by dumping the content to screen: This will outline the file (meta) content and we can click on the attributes with IRIs, which will explain the attribute (data)
2. We can extract a *JSON-LD* file. This is useful for other processes. We can also investigate this file further with tools like [JSON-LD-playground](https://json-ld.org/playground/).
3. Access IRI in (Python) code

### 1. Visual inspection

The *dump()* method will now add IRI-icons. Click on it and get redirected to the resources:

In [18]:
h5tbx.dump(hdf_filename, collapsed=False)

### 2. JSON-LD extraction

In [19]:
from h5rdmtoolbox import jsonld as h5tbxjld

In [20]:
print(h5tbxjld.dumps(hdf_filename, indent=2))

{
  "@graph": [
    {
      "@id": "file://./tmp2/",
      "https://w3id.org/okn/o/sd#SoftwareVersion": "1.0.2"
    },
    {
      "@id": "file://./tmp2/contact",
      "@type": "http://xmlns.com/foaf/0.1/Person",
      "http://xmlns.com/foaf/0.1/firstName": "Matthias",
      "http://w3id.org/nfdi4ing/metadata4ing#orcidId": "https://orcid.org/0000-0001-8729-0482"
    },
    {
      "@id": "file://./tmp2/proc_random_number",
      "@type": "http://w3id.org/nfdi4ing/metadata4ing#ProcessingStep",
      "https://schema.org/startTime": "2024-01-02T14:55:04.600987",
      "http://purl.obolibrary.org/obo/RO_0000057": "file://./tmp2/contact",
      "http://purl.obolibrary.org/obo/RO_0002234": "file://./tmp2/random_velocity"
    },
    {
      "@id": "file://./tmp2/random_velocity",
      "http://w3id.org/nfdi4ing/metadata4ing#hasKindOfQuantity": "https://qudt.org/vocab/quantitykind/Velocity",
      "http://w3id.org/nfdi4ing/metadata4ing#hasUnit": "https://qudt.org/vocab/unit/M-PER-SEC"
    }
 

## 3. Access IRI in code

You may want to access the IRI of an attribute with the Python:

In [34]:
with h5tbx.File(hdf_filename) as h5:
    person_iri = h5.contact.iri.subject
    orcid_iri = h5.contact.iri['orcid']

In [36]:
person_iri

rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person')

In [37]:
orcid_iri

{'predicate': 'http://w3id.org/nfdi4ing/metadata4ing#orcidId', 'object': None}