# Describing PIV recordings

Let's say you recorded multiple PIV images and put them into a *ZIP* folder. The scenario is shown below. We will describe the data stored in the zip folder using linked-data syntax. The collection of PIV images is called a *dataset*. The information about it will be stored in a JSON-LD file:

![piv_image_dataset_management](piv_image_dataset_management.svg)

The ["PIV Challenge"](https://www.pivchallenge.org/) datasets will serve as real-world examples. We will describe one of them.

Before we start, let's clarify the vocabulary and semantics:

We will use multiple vocabularies and ontologies. At the core is the "Data Catalog Vocabulary" (dcat), which allows describing datasets. The two main objects in our problem, dataset and distribution, are defined by [dcat](https://www.w3.org/TR/vocab-dcat-2/) as follows:
- *dcat:Dataset*: "A collection of data, published or curated by a single agent, and available for access or download in one or more representations."
- *dcat:Distribution*: "A specific representation of a dataset. A dataset might be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above)."

Besides the description of the file objects (*dcat:Distribution*), the dataset has many more properties, such as the creator and a description. We will add all of this below.
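
To make the relation between the two concepts concrete, here is a minimal sketch of a *dcat:Dataset* with one *dcat:Distribution*, written as a plain JSON-LD dictionary with the standard library only. It does not use `pivmetalib`; all `https://example.org/...` identifiers and titles are hypothetical placeholders.

```python
import json

# Hypothetical identifiers and titles, for illustration only
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dcterms": "http://purl.org/dc/terms/",
    },
    "@id": "https://example.org/dataset/1",
    "@type": "dcat:Dataset",
    "dcterms:title": "Example PIV recording",
    # A dataset points to one or more representations (files) of its data
    "dcat:distribution": [
        {
            "@id": "https://example.org/dataset/1/images",
            "@type": "dcat:Distribution",
            "dcterms:title": "Raw PIV images",
            "dcat:downloadURL": {"@id": "https://example.org/files/images.zip"},
        }
    ],
}

jsonld_str = json.dumps(dataset, indent=2)
print(jsonld_str)
```

The classes used in the rest of this notebook generate this kind of structure for us, so we never have to write the JSON-LD by hand.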

## Imports

We will import some namespace modules, which are provided by `pivmetalib`. Each of these namespace modules contains classes representing the concepts of its ontology. For example, `prov` contains the class `Person`, which represents [*prov:Person*](https://www.w3.org/ns/prov#Person). The most important properties of a person, such as the first and last name, the email or a researcher ID, are implemented as class attributes. Learn more about it in the [GettingStarted Notebook](./GettingStarted.ipynb).

In [1]:
from pivmetalib import PIVMETA
from pivmetalib import prov, dcat, pivmeta, m4i

## Data collection

We refer to the [PIV-Challenge](https://www.pivchallenge.org/pub/) website for all information, i.e. data and metadata. Much is written in the README file, but some metadata is also available in the HTML text - you get the problem of scattered data! Using **Case C** as an example, we want to write the metadata using semantic technologies.

Here is a (probably incomplete) list of metadata:
- case/dataset name: "C"
- description: "Strong wall reflection in an impeller (background images and mask are provided), (provided by Stanislas)"
- long description from README: "The set of images is referenced C001_1.tif and C001_2.tif...The two white circles are the two edges of the fixed vaneless diffuser."
- image type: "real"
- number of sets: "1 + 2bg + 1 msk"
- author(s): "Stanislas"
- camera characteristics (see README): "Type: KODAK ES1.0 b & w.....Acquisition software INSIGHT 2.10."

The challenge is to translate this into a common language so that datasets, including those from other sources, become comparable. This is exactly what the `PIVMeta` ontology achieves. Let's dive into building an interoperable description of the dataset:

## Before we start - a word on Standard Names

We will use numerical variables to describe some of the properties and settings of the PIV measurement. Those values are of great importance for understanding the experiment and the analysis. It is therefore important to get the naming right. A precise way of doing this is to assign each variable a well-defined standard name, which is documented in a standard name table online (see more about it [here](https://matthiasprobst.github.io/ssno/)).

Let's download one such standard name table. We will use it in the process of describing the data:

In [None]:
from ssnolib import StandardNameTable

snt = StandardNameTable.download(url="https://zenodo.org/records/14175299/files/Standard_Name_Table_for_Particle_Image_Velociemtry_data.jsonld?download=1", fmt="jsonld")
standard_names = snt.get_standard_names_as_frozen_dataclass()
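
The resulting object lets us pick standard names by attribute access, e.g. `standard_names.focal_length`. The following sketch shows the general idea of such a read-only container using `dataclasses` from the standard library. It is not the `ssnolib` implementation; the names and IRIs are hypothetical.

```python
from dataclasses import make_dataclass, FrozenInstanceError

# Hypothetical standard names mapped to hypothetical IRIs
names = {
    "focal_length": "https://example.org/standard_names/focal_length",
    "sensor_pixel_width": "https://example.org/standard_names/sensor_pixel_width",
}

# Build a frozen dataclass with one string field per standard name
StandardNames = make_dataclass(
    "StandardNames", [(name, str) for name in names], frozen=True
)
demo_names = StandardNames(**names)

# Attribute access gives auto-completion and fails early on misspelled names
print(demo_names.focal_length)

# A frozen dataclass rejects accidental re-assignment
try:
    demo_names.focal_length = "something else"
    is_frozen = False
except FrozenInstanceError:
    is_frozen = True
print(is_frozen)
```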

## Describe the setup

Essentially, the most important components are the camera (optics + sensor) and the laser.

For the **camera**, we are interested in the number and size of the pixels as well as the lens used. Properties of a tool like a camera can be described via [m4i:NumericalVariable](https://nfdi4ing.pages.rwth-aachen.de/metadata4ing/metadata4ing/index.html#NumericalVariable).

We know the **sensor size (in pixels) and the pixel size**:

In [None]:
sensor_width = m4i.NumericalVariable(
    value=1008,  # number of pixels
    label="sensor width",
    hasStandardName=standard_names.sensor_pixel_width
)
sensor_height = m4i.NumericalVariable(
    value=1008,  # number of pixels
    label="sensor height",
    hasStandardName=standard_names.sensor_pixel_height
)

pixel_width = m4i.NumericalVariable(
    value=9.072,
    hasUnit='um',
    hasStandardName=standard_names.ccd_width
)
pixel_height = m4i.NumericalVariable(
    value=9.072,
    hasUnit='um',
    hasStandardName=standard_names.ccd_height
)
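
As a quick plausibility check, the physical sensor size follows from the number of pixels times the pixel size. The sketch below assumes that 1008 is the pixel count per side and that the pixels are square and cover the sensor without gaps; neither assumption is stated in the README.

```python
# Values taken from the variables above
n_pixels = 1008          # pixels per side
pixel_size_um = 9.072    # micrometres

# Assumption: pixels tile the sensor without gaps
sensor_size_mm = n_pixels * pixel_size_um / 1000.0  # convert um to mm
print(f"sensor side length: {sensor_size_mm:.3f} mm")
```

This gives a side length of roughly 9.14 mm.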

The objective can be described using the `pivmeta:Objective` class:

In [None]:
objective = pivmeta.Objective(
    label="Nikkor",
    fnumber='f/2',
    hasParameter=[
        m4i.NumericalVariable(
            label='focal length',
            value=9.072,
            hasUnit='mm',
            hasStandardName=standard_names.focal_length
        )
    ]
)
standard_names.focal_length

In [None]:
print(objective.hasParameter[0])

Finally, we can describe the **camera as a tool** with four parameters and another tool (the objective), which is part of it:

In [None]:
camera = pivmeta.DigitalCamera(  # is a subclass of m4i.Tool, so use hasParameter
    label='KODAK ES1.0 b & w',
    hasParameter=[sensor_width, sensor_height, pixel_width, pixel_height],
    hasPart=objective
)
print(camera.serialize("ttl", context={"ssno": "https://matthiasprobst.github.io/ssno#"}))

Now, let's collect the hardware in the PIV setup:

In [None]:
piv_setup = pivmeta.PIVSetup(hasPart=[camera])

## Author

In [None]:
creator = prov.Person(
    lastName='Stanislas',
    mbox="pivnet-sig32@univ-lille1.fr"
)
creator

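We also record the bit depth of the images, which will be attached to the image distribution as a metric:

In [None]: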
image_bit_depth = m4i.NumericalVariable(label="bit depth", value=8)

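With the setup, the creator and the bit depth at hand, we can describe the dataset and its distributions:

In [None]: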
ds = pivmeta.PIVDataset(
    title='piv-challenge-1-C',
    creator=creator,
    modified="2000-10-28",
    landingPage="https://www.pivchallenge.org/pub/index.html#c",
    description="Strong wall reflection in an impeller (background images and mask are provided), (provided by Stanislas)",
    hasPart=piv_setup,
    distribution=[
        pivmeta.PIVDistribution(
            title='Raw piv image data',
            accessURL='https://www.pivchallenge.org/pub',
            downloadURL='https://www.pivchallenge.org/pub/C/C.zip',
            mediaType='https://www.iana.org/assignments/media-types/image/tiff',
            compressedFormat='application/zip',
            isPIVDistributionType=PIVMETA.Image,  # real (not synthetic) images, see the metadata list above
            numberOfRecords=1,  # It contains one double image
            filenamePattern=r"C[0-9]{3}_[12]\.tif",  # the regex for the filename
            hasMetric=image_bit_depth
        ),
        pivmeta.PIVDistribution(
            title='Mask file',
            isPIVDistributionType=PIVMETA.PIVMask,
            accessURL='https://www.pivchallenge.org/pub',
            downloadURL='https://www.pivchallenge.org/pub/C/C.zip',
            compressedFormat='application/zip',  # https://www.w3.org/TR/vocab-dcat-2/#Property:distribution_compression_format
            mediaType='https://www.iana.org/assignments/media-types/image/tiff'
        ),
        # dcat.Distribution(
        #     label='README file',
        #     title='ReadMe file',
        #     accessURL='https://www.pivchallenge.org/pub',
        #     downloadURL='https://www.pivchallenge.org/pub/E/readmeE.txt'
        # ),
    ]
)
print(ds.serialize(format="ttl"))
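
The filename pattern of a distribution is a regular expression, so it is worth testing it against real filenames. A minimal sketch with Python's `re` module, using the pattern `C[0-9]{3}_[12]\.tif` and the two image names mentioned in the README. Note that a character class such as `[1,2]` would also match a comma, and an unescaped `.` matches any character. The non-matching candidates are made up for the test.

```python
import re

# Three digits after "C", then "_1" or "_2", then a literal ".tif"
pattern = re.compile(r"C[0-9]{3}_[12]\.tif")

candidates = [
    "C001_1.tif",   # first frame of the double image (from the README)
    "C001_2.tif",   # second frame of the double image (from the README)
    "C001_3.tif",   # made up: frame index out of range
    "C001_,.tif",   # made up: would slip through with "[1,2]"
    "C001_1xtif",   # made up: would slip through with an unescaped "."
]

# fullmatch requires the whole filename to match, not just a prefix
matches = [name for name in candidates if pattern.fullmatch(name)]
print(matches)
```

Only the two README filenames remain in `matches`.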

## Export to JSON-LD

The dataset Python object can be written to JSON-LD like so:

In [None]:
with open('piv_challenge.jsonld', 'w') as f:
    # Serialize to JSON-LD and move the blank-node identifiers ("_:...") into the "local" namespace
    json_ld_str = ds.model_dump_jsonld(context={"local": "https://example.org/"}).replace("_:", "local:")
    f.write(json_ld_str)
print(json_ld_str)
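
The `.replace("_:", "local:")` call above turns blank-node identifiers into identifiers within the `local` namespace. The sketch below shows the effect on a small hand-written JSON-LD string, using the standard library only. The node identifiers are hypothetical; the ones generated by the library will look different.

```python
import json

# Hypothetical JSON-LD document with two blank nodes ("_:...")
raw = json.dumps({
    "@context": {"local": "https://example.org/"},
    "@id": "_:N1a2b",
    "@type": "dcat:Dataset",
    "dcat:distribution": [{"@id": "_:N3c4d"}],
})

# Plain string replacement: every "_:" becomes "local:"
renamed = raw.replace("_:", "local:")

doc = json.loads(renamed)
ids = [doc["@id"]] + [dist["@id"] for dist in doc["dcat:distribution"]]
print(ids)
```

Because this is a plain string replacement, it would also change any `_:` that happens to occur inside a title or description, so it is worth checking the output when the metadata contains free text.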

# Integrate into the broader context

The dataset we described is part of the PIV-Challenge. The website https://www.pivchallenge.org hosts many datasets for each PIV-Challenge event, so we can describe it as a data catalog ([dcat:Catalog](https://www.w3.org/TR/vocab-dcat-3/#Class:Catalog)).

`pivmetalib` does not provide this class (maybe in the future). But we can build a class for it (see also [this figure](https://www.w3.org/TR/vocab-dcat-3/#fig-dcat-all-attributes)):

In [None]:
from rdflib import DCAT
from typing import Union, List
from pivmetalib.dcat import Dataset
from ontolutils import Thing
from pydantic import HttpUrl

Event = Thing.build(
    namespace="https://schema.org/",
    namespace_prefix="schema",
    class_name="Event",
    properties=[
        dict(name="location", property_type=str),
        dict(name="startDate", property_type=str),
    ]
)

Catalog = dcat.Resource.build(
    namespace=str(DCAT),
    namespace_prefix="dcat",
    class_name="Catalog",
    properties=[
        dict(name="dataset", property_type=Union[Dataset, List[Dataset], pivmeta.PIVDataset]),
        dict(name="homepage", property_type=HttpUrl, namespace="https://schema.org/", namespace_prefix="schema"),
        dict(name="relation", property_type=Thing, namespace="http://purl.org/dc/terms/", namespace_prefix="dct"),
    ]
)

In [None]:
first_challenge = Event(
    label="1st PIV Challenge (Sept.14-15, 2001, Göttingen, Germany)",
    location="Göttingen, Germany",
    startDate="2001-09-14"  # ISO 8601 date
)
catalog = Catalog(dataset=ds, homepage="https://www.pivchallenge.org", relation=first_challenge)
print(catalog.serialize("ttl"))
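
Since `startDate` is declared as a plain string here, nothing validates its format. Dates are easiest to compare across datasets when they follow ISO 8601 (`YYYY-MM-DD`). A small sketch of how a loosely formatted date could be normalized with the standard library before passing it on; the helper name `to_iso_date` is hypothetical.

```python
from datetime import datetime


def to_iso_date(text: str) -> str:
    """Parse a year-month-day string and return it zero-padded (ISO 8601)."""
    # strptime accepts month and day with or without leading zeros
    return datetime.strptime(text, "%Y-%m-%d").date().isoformat()


iso_start = to_iso_date("2001-9-14")
print(iso_start)
```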