# Describe a PIV recording

Let's say you recorded multiple PIV images and packed them into a *ZIP* archive, as sketched in the figure below. We will describe the data stored in the archive using linked-data syntax. The collection of PIV images is called a *dataset*, and its description will be stored in a JSON-LD file:

![piv_image_dataset_management](piv_image_dataset_management.svg)

The ["PIV Challenge"](https://www.pivchallenge.org/) datasets will serve as real-world examples. We will describe one of them.

Before we start, let's clarify the vocabulary and semantics:

We will use multiple vocabularies and ontologies. At the core is the "Data Catalog Vocabulary" (dcat), which is designed for describing datasets. According to [dcat](https://www.w3.org/TR/vocab-dcat-2/), the two main objects in our problem, dataset and distribution, are defined as follows:
- *dcat:Dataset*: "A collection of data, published or curated by a single agent, and available for access or download in one or more representations."
- *dcat:Distribution*: "A specific representation of a dataset. A dataset might be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above)."
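To make the relation between the two concepts concrete, here is a hand-written, minimal JSON-LD document built with Python's standard `json` module. It only uses the dcat and Dublin Core terms namespaces; the exact structure that `pivmetalib` emits may differ, so treat this as an illustration of the idea rather than the library's output:

```python
import json

# Hand-written illustration (not pivmetalib output): a dcat:Dataset
# that points to its dcat:Distribution objects via dcat:distribution.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dcterms": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dcterms:title": "piv-challenge-1-C",
    "dcat:distribution": [
        {
            # one representation of the dataset: the zipped TIFF images
            "@type": "dcat:Distribution",
            "dcterms:title": "Raw piv image data",
            "dcat:downloadURL": "https://www.pivchallenge.org/pub/C/C.zip",
            "dcat:mediaType": "https://www.iana.org/assignments/media-types/image/tiff",
        },
        {
            # another file belonging to the same dataset
            "@type": "dcat:Distribution",
            "dcterms:title": "ReadMe file",
        },
    ],
}

print(json.dumps(dataset, indent=2))
```

The dataset node carries the descriptive metadata, while each distribution describes one concrete file or archive.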

Besides the description of the file objects (*dcat:Distribution*), a dataset has many more properties, such as its creator and a description. We will add these below.

## Imports

We will import some namespace modules provided by `pivmetalib`. Each namespace module contains classes representing the concepts of its ontology. E.g., `prov` contains the class `Person`, which describes [*prov:Person*](https://www.w3.org/ns/prov#Person). The most important properties of a person, such as the first and last name, the email address or a researcher ID, are implemented as class attributes. Learn more about this in the [GettingStarted Notebook](./GettingStarted.ipynb).

Here's an example for a Person:

In [None]:
from pivmetalib import prov

creator = prov.Person(
    lastName='Okamoto',
    mbox="okamoto@tokai.t.u-tokyo.ac.jp"
)
creator

Other important modules are `dcat` and `pivmeta`. The namespace module `dcat` contains *Distribution* and *Dataset*. The *pivmeta* ontology provides many more PIV-specific concepts. Among others, it defines *PIVImageDistribution*, a subclass of *Distribution*, which tells us that the distribution contains PIV images rather than other regular files such as README files.
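The subclass relation can be pictured with plain Python classes. The dataclasses below are hypothetical, simplified stand-ins (not the `pivmetalib` classes), used only to show why a *PIVImageDistribution* is still a *Distribution*:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins mirroring the ontology: every
# PivImageDistribution is also a Distribution.
@dataclass
class Distribution:
    title: str
    downloadURL: Optional[str] = None

@dataclass
class PivImageDistribution(Distribution):
    # PIV-specific information that a generic Distribution lacks
    imageBitDepth: Optional[int] = None
    filenamePattern: Optional[str] = None

@dataclass
class Dataset:
    title: str
    distribution: List[Distribution] = field(default_factory=list)

example_ds = Dataset(
    title="piv-challenge-1-C",
    distribution=[
        PivImageDistribution(title="Raw piv image data", imageBitDepth=8),
        Distribution(title="ReadMe file"),
    ],
)

# Filtering by the specific class selects only the image data ...
images = [d for d in example_ds.distribution if isinstance(d, PivImageDistribution)]
print([d.title for d in images])  # ['Raw piv image data']
# ... while every entry still counts as a generic distribution.
print(all(isinstance(d, Distribution) for d in example_ds.distribution))  # True
```

Code that only expects a *Distribution* therefore still accepts the image distribution, while PIV-aware code can ask specifically for images.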

Let's import the other modules:

In [None]:
from pivmetalib import dcat  # provides Dataset and Distribution
from pivmetalib import pivmeta  # provides PivImageDistribution and other PIV-specific classes
from pivmetalib import PIVMETA  # the namespace module containing the URI addresses

## Data collection

We take all information, i.e. data and metadata, from the [PIV Challenge](https://www.pivchallenge.org/pub/) website. Most of it is written in the README file, but some metadata is also available in the HTML text.

Here is a (probably incomplete) list of metadata:
- case/dataset name: "C"
- description: "Strong wall reflection in an impeller (background images and mask are provided), (provided by Stanislas)"
- long description from README: "The set of images is referenced C001_1.tif and C001_2.tif...The two white circles are the two edges of the fixed vaneless diffuser."
- image type: "real"
- number of sets: "1 + 2bg + 1 msk"
- author(s): "Stanislas"
- camera characteristics (see README): "Type: KODAK ES1.0 b & w.....Acquisition software	INSIGHT 2.10."
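Some of these entries are free text that has to be turned into structured values. As an example, the hypothetical helper below splits the "number of sets" string into counts. It assumes, based on the case description, that `bg` stands for background images and `msk` for the mask:

```python
import re

def parse_number_of_sets(text: str) -> dict:
    """Split a count such as '1 + 2bg + 1 msk' into labelled numbers.

    Hypothetical helper: 'bg' is read as background images and 'msk'
    as masks; an entry without a suffix is taken as a PIV image set.
    """
    labels = {"": "image_sets", "bg": "background_images", "msk": "masks"}
    counts = {}
    for part in text.split("+"):
        # each part is a number, optionally followed by a letter suffix
        m = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", part)
        if m is None:
            raise ValueError(f"cannot parse {part!r}")
        number, suffix = int(m.group(1)), m.group(2)
        key = labels.get(suffix, suffix)  # unknown suffixes are kept as-is
        counts[key] = counts.get(key, 0) + number
    return counts

print(parse_number_of_sets("1 + 2bg + 1 msk"))
# {'image_sets': 1, 'background_images': 2, 'masks': 1}
```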

The challenge is to translate this into a common language so that datasets become comparable, including datasets from other sources. This is exactly what the *pivmeta* ontology is designed for. Let's dive into building an interoperable description of the dataset:

## Describe the dataset

The package `pivmetalib` implements the [RDF](https://www.w3.org/RDF/) vocabularies as Python objects whose parameters are validated.

Let's see this in action by creating the person who created the dataset:

### Author

In [None]:
creator = prov.Person(
    lastName='Stanislas',
    mbox="pivnet-sig32@univ-lille1.fr"
)
creator
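As a rough idea of what parameter validation means, here is a stdlib-only sketch with a hypothetical, simplified `Person` class. It is not `pivmetalib`'s implementation, and the e-mail check is a deliberately loose assumption:

```python
import re
from dataclasses import dataclass

# Loose e-mail check (assumption): something@something.something
_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

@dataclass
class Person:
    """Hypothetical stand-in that rejects malformed values on creation."""
    lastName: str
    mbox: str

    def __post_init__(self):
        if not self.lastName.strip():
            raise ValueError("lastName must not be empty")
        if not _EMAIL.match(self.mbox):
            raise ValueError(f"invalid e-mail address: {self.mbox!r}")

Person(lastName="Stanislas", mbox="pivnet-sig32@univ-lille1.fr")  # accepted
try:
    Person(lastName="Stanislas", mbox="not-an-email")
except ValueError as err:
    print(err)  # invalid e-mail address: 'not-an-email'
```

Validating at construction time means a malformed description fails early, before it is written to a JSON-LD file.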

### Camera

The most important properties of a camera used for PIV are the sensor size and the lens:

In [None]:
from pivmetalib import m4i
from ontolutils import PIVMETA, QUDT_KIND

In [None]:
sensor_width = pivmeta.NumericalVariable(hasNumericalValue=1008,
                                         hasStandardName="https://matthiasprobst.github.io/pivmeta#sensor_pixel_width")
sensor_height = pivmeta.NumericalVariable(hasNumericalValue=1018,
                                          hasStandardName="https://matthiasprobst.github.io/pivmeta#sensor_pixel_height")

In [None]:
# # don't use PIVMETA; better build an SNT-namespace-like class similar to PIVMETA
# # consider outsourcing this to a separate package (onto_utils), which lets you build namespace classes...

# def standard_variable(name, value, unit):
#     sn = PIVMETA.get(name)  # URI of the requested standard name
#     if unit != '':
#         qk = get_qudt_from_string(unit)
#         return pivmeta.NumericalVariable(hasNumericalValue=value,
#                                          hasStandardName=sn,
#                                          hasUnit=unit)
#     return pivmeta.NumericalVariable(hasNumericalValue=value,
#                                      hasStandardName=sn)

In [None]:
# standard_variable('x_pixel_coordinate', 1.4, 'm/s')
# # download the TTL file (https://matthiasprobst.github.io/pivmeta/ontology.ttl) and find out the quantity type, then verify it!

In [None]:
# standard_variable(name='sensor_pixel_width', value=1008, unit='m/s')
# standard_variable(name='sensor_pixel_width', value=1008, unit='') ## TODO quantity kind must be determined automatically!

In [None]:
camera = pivmeta.DigitalCamera(  # is a subclass of m4i.Tool, so use hasParameter
    label='KODAK ES1.0 b & w',
    fnumber='f/2',
    hasParameter=[
        pivmeta.NumericalVariable(
            hasNumericalValue=1008,
            hasStandardName="https://matthiasprobst.github.io/pivmeta#sensor_pixel_width"),
        pivmeta.NumericalVariable(
            hasNumericalValue=1018,
            hasStandardName="https://matthiasprobst.github.io/pivmeta#sensor_pixel_height"),
        pivmeta.NumericalVariable(
            hasNumericalValue=9.072,
            hasUnit='um',
            hasKindOfQuantity=QUDT_KIND.Length,
            hasStandardName="https://matthiasprobst.github.io/pivmeta#ccd_width"),
        pivmeta.NumericalVariable(
            hasNumericalValue=9.07,
            hasUnit='um',
            hasKindOfQuantity=QUDT_KIND.Length,
            hasStandardName="https://matthiasprobst.github.io/pivmeta#ccd_height"),
        pivmeta.NumericalVariable(
            label='focal length',
            hasNumericalValue=35,
            hasUnit='mm',
            hasKindOfQuantity=QUDT_KIND.Length,
            hasStandardName="https://matthiasprobst.github.io/pivmeta#focal_length",
            hasVariableDescription='Nikkor')
    ]
)
camera.model_dump(exclude_none=True)
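The camera parameters also allow a quick plausibility check. Multiplying the pixel count by the pixel pitch gives the physical sensor size, and dividing the focal length by the f-number gives the diameter of the lens aperture. The values are those of the KODAK camera above; the script is only a worked example:

```python
# Values from the camera description of case C
pixels_x, pixels_y = 1008, 1018   # sensor size in pixels
pitch_um = 9.07                   # pixel pitch in micrometres
focal_length_mm = 35.0
f_number = 2.0                    # f/2

# physical sensor size = number of pixels x pixel pitch (um -> mm)
width_mm = pixels_x * pitch_um / 1000
height_mm = pixels_y * pitch_um / 1000

# aperture diameter D = f / N
aperture_mm = focal_length_mm / f_number

print(f"sensor: {width_mm:.4f} mm x {height_mm:.4f} mm")  # sensor: 9.1426 mm x 9.2333 mm
print(f"aperture diameter: {aperture_mm} mm")               # aperture diameter: 17.5 mm
```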

Note that some model classes provide helper methods. For `DigitalCamera`, `build_minimal` creates a comparable description from a few keyword arguments:

In [None]:
cam = pivmeta.DigitalCamera.build_minimal(
    label='KODAK ES1.0 b & w',
    sensor_pixel_size=[1008, 1018],
    ccd_pixel_size_um=[9.07, 9.07],
    fnumber='f/2',
    focal_length_mm=35,
    image_coding='8 bits'
)
cam.model_dump(exclude_none=True)

In [None]:
cam
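The convenience of such a helper lies in expanding a few keyword arguments into the verbose parameter list of the explicit `DigitalCamera` definition above. The function below is a hypothetical, plain-dict sketch of that idea; it is not how `build_minimal` is implemented in `pivmetalib`:

```python
from typing import Dict, List, Sequence

PIVMETA_NS = "https://matthiasprobst.github.io/pivmeta#"

def camera_parameters(sensor_pixel_size: Sequence[int],
                      ccd_pixel_size_um: Sequence[float],
                      focal_length_mm: float) -> List[Dict]:
    """Hypothetical expansion of short keyword arguments into
    parameter entries that carry pivmeta standard names."""
    w, h = sensor_pixel_size
    cw, ch = ccd_pixel_size_um
    return [
        {"hasNumericalValue": w, "hasStandardName": PIVMETA_NS + "sensor_pixel_width"},
        {"hasNumericalValue": h, "hasStandardName": PIVMETA_NS + "sensor_pixel_height"},
        {"hasNumericalValue": cw, "hasUnit": "um", "hasStandardName": PIVMETA_NS + "ccd_width"},
        {"hasNumericalValue": ch, "hasUnit": "um", "hasStandardName": PIVMETA_NS + "ccd_height"},
        {"hasNumericalValue": focal_length_mm, "hasUnit": "mm",
         "hasStandardName": PIVMETA_NS + "focal_length"},
    ]

for p in camera_parameters([1008, 1018], [9.07, 9.07], 35):
    # print the short standard name, the value and the unit (if any)
    print(p["hasStandardName"].split("#")[1], p["hasNumericalValue"], p.get("hasUnit", ""))
```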

Now, let's describe the complete dataset:

In [None]:
ds = dcat.Dataset(
    title='piv-challenge-1-C',
    creator=creator,
    modified="2000-10-28",
    landingPage="https://www.pivchallenge.org/pub/index.html#c",
    description="Strong wall reflection in an impeller (background images and mask are provided), (provided by Stanislas)",
    distribution=[
        pivmeta.PivImageDistribution(
            title='Raw piv image data',
            downloadURL='https://www.pivchallenge.org/pub/C/C.zip',
            mediaType='https://www.iana.org/assignments/media-types/image/tiff',
            compressedFormat='application/zip',
            pivImageType=PIVMETA.SyntheticImage,
            numberOfRecords=1,  # It contains one double image
            filenamePattern=r"^C\d{3}_\d.tif$",  # the regex for the filename
            imageBitDepth=8
        ),
        pivmeta.PivMaskDistribution(
            title='Mask file',
            downloadURL='https://www.pivchallenge.org/pub/C/C.zip',
            compressedFormat='application/zip',  # https://www.w3.org/TR/vocab-dcat-2/#Property:distribution_compression_format
            mediaType='https://www.iana.org/assignments/media-types/image/tiff',
            filenamePattern="Cmask_1.tif"  # for compressed data
        ),
        dcat.Distribution(
            title='ReadMe file',
            downloadURL='https://www.pivchallenge.org/pub/E/readmeE.txt'
        ),
    ]
)
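The `filenamePattern` values are regular expressions, so they can be checked with Python's `re` module before any data are downloaded. The file list below uses names from the README plus a hypothetical `readme.txt`; it also shows that the unescaped `.` in the image pattern matches any character:

```python
import re

# filenamePattern values used in the dataset description above
image_pattern = re.compile(r"^C\d{3}_\d.tif$")
mask_pattern = re.compile(r"Cmask_1.tif")

# names from the README, plus a hypothetical non-image file
names = ["C001_1.tif", "C001_2.tif", "Cmask_1.tif", "readme.txt"]
print([n for n in names if image_pattern.match(n)])  # ['C001_1.tif', 'C001_2.tif']
print([n for n in names if mask_pattern.match(n)])   # ['Cmask_1.tif']

# The unescaped "." accepts any character; escaping it is stricter.
strict_pattern = re.compile(r"^C\d{3}_\d\.tif$")
print(bool(image_pattern.match("C001_1xtif")))   # True
print(bool(strict_pattern.match("C001_1xtif")))  # False
```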

## Export to JSON-LD

The Python dataset object can be serialized to JSON-LD with `dump_jsonld`, using the pivmeta context:

In [None]:
with open('piv_challenge.jsonld', 'w') as f:
    json_ld_str = ds.dump_jsonld(context='https://raw.githubusercontent.com/matthiasprobst/pivmeta/main/pivmeta_context.jsonld')
    f.write(json_ld_str)
print(json_ld_str)

## Re-use the dataset

Now that we have written the metadata to a file, we want to reuse it, i.e., query it to identify specific data.
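Conceptually, reusing the metadata means searching the JSON-LD graph for nodes of a given type. The following stdlib-only sketch does this on a small, hand-written document. The compact type names are illustrative, and unlike a real JSON-LD or RDF tool it neither expands prefixes via `@context` nor applies subclass reasoning:

```python
import json
from typing import Iterator, List

def iter_nodes(node) -> Iterator[dict]:
    """Yield every JSON object (node) of a JSON-LD document, depth first."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)

def find_by_type(doc: dict, type_name: str) -> List[dict]:
    """Return all nodes whose @type equals (or contains) type_name.

    Simplification: compact names are compared literally.
    """
    hits = []
    for n in iter_nodes(doc):
        t = n.get("@type")
        types = t if isinstance(t, list) else [t]
        if type_name in types:
            hits.append(n)
    return hits

doc = json.loads("""{
  "@type": "dcat:Dataset",
  "dcat:distribution": [
    {"@type": "pivmeta:PivImageDistribution",
     "dcat:downloadURL": "https://www.pivchallenge.org/pub/C/C.zip"},
    {"@type": "dcat:Distribution", "dcterms:title": "ReadMe file"}
  ]
}""")

print([n["dcat:downloadURL"] for n in find_by_type(doc, "pivmeta:PivImageDistribution")])
# ['https://www.pivchallenge.org/pub/C/C.zip']
```

Searching for `dcat:Distribution` in this toy document returns only the README entry, because a plain string comparison does not know that the image distribution is a subclass of *Distribution*.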

### Find distributions within the JSON-LD file

In [None]:
import ontolutils

In [None]:
dss = ontolutils.query(dcat.Dataset, source='piv_challenge.jsonld')
ds = dss[0]
ds.model_dump(exclude_none=True)

In [None]:
ds.creator

In [None]:
image_dists = ontolutils.query(pivmeta.PivImageDistribution, source='piv_challenge.jsonld')
image_dist = image_dists[0]

In [None]:
from pprint import pprint
pprint(image_dist.model_dump())

In [None]:
zip_filename = image_dist.download(dest_filename='imgs.zip', overwrite_existing=False)

In [None]:
import zipfile
import re
from typing import Union
import pathlib

def get_files(img_dir, pattern: Union[re.Pattern, str]):
    """Return all files in img_dir whose name matches the regex pattern."""
    img_dir = pathlib.Path(img_dir)
    if isinstance(pattern, str):
        pattern = re.compile(pattern)
    # keep regular files only; pattern.match anchors at the start of the name
    return [f for f in img_dir.iterdir() if f.is_file() and pattern.match(f.name)]

# extract the downloaded archive into the folder "imgs"
with zipfile.ZipFile(zip_filename, 'r') as zip_ref:
    zip_ref.extractall('imgs')

In [None]:
get_files('imgs', image_dist.filenamePattern)

In [None]:
image_dist.is_synthetic()

In [None]:
mask_dists = ontolutils.query(pivmeta.PivMaskDistribution, source='piv_challenge.jsonld')
mask_dist = mask_dists[0]

In [None]:
get_files('imgs', mask_dist.filenamePattern)
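Instead of extracting the archive first, the same regex filter can be applied to the member names inside the ZIP file. The helper `get_zip_members` is a hypothetical alternative to `get_files`, demonstrated on a small in-memory archive with empty placeholder files named as in case C:

```python
import io
import re
import zipfile
from typing import List, Union

def get_zip_members(zip_source, pattern: Union[re.Pattern, str]) -> List[str]:
    """Return archive member names whose base name matches pattern.

    Only the archive's name list is inspected; nothing is extracted.
    Members in sub-folders are matched on their base name.
    """
    if isinstance(pattern, str):
        pattern = re.compile(pattern)
    with zipfile.ZipFile(zip_source) as zf:
        return [name for name in zf.namelist()
                if not name.endswith("/")  # skip directory entries
                and pattern.match(name.rsplit("/", 1)[-1])]

# Build a small in-memory archive for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in ["C001_1.tif", "C001_2.tif", "Cmask_1.tif"]:
        zf.writestr(name, b"")  # empty placeholder content

print(get_zip_members(buffer, r"^C\d{3}_\d.tif$"))  # ['C001_1.tif', 'C001_2.tif']
print(get_zip_members(buffer, r"Cmask_1.tif"))      # ['Cmask_1.tif']
```

This avoids writing files to disk when only the presence of matching files needs to be checked.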

In [None]:
# import cv2
# import pivimage
# pivimage.PIVImagePair('imgs/C001_1.tif', 'imgs/C001_2.tif').plot()

In [None]:
# import h5rdmtoolbox as h5tbx

In [None]:
# piv_software = pivmeta.PIVSoftware(
#     id='https://www.mypivsoftware.com/',
#     author=prov.Organisation(
#         name='MyPIVSoftware GmbH',
#         mbox='info@mypivsoftware.com',
#         url='https://www.mypivsoftware.com/'
#     ),
#     hasDocumentation='https://www.mypivsoftware.com/download/docs/Manual.pdf',
# )
# from pprint import pprint
# # pprint(piv_software.dump_jsonld())
# # pprint(piv_software.model_dump_namespaced())
# with h5tbx.File('piv_software.h5', 'w') as h5:
#     piv_software.dump_hdf(h5.create_group('my_piv_software'))
#     h5.dump(collapsed=False)

In [None]:
# from h5rdmtoolbox import jsonld
# import pivmetalib
# from pprint import pprint

In [None]:
# with h5tbx.File('piv_software.h5', 'r') as h5:
#     pprint(jsonld.dumpd(h5['my_piv_software'],
#                       context=pivmetalib.get_context_json())['@graph'])
#     pprint(jsonld.dumpd(h5['my_piv_software']))

In [None]:
# with h5tbx.File(mode='w') as h5:
#     g = h5.create_group('my dataset')
#     ds.dump_hdf(g)
#     h5.dump(collapsed=False)

In [None]:
# print(ds.dump_jsonld())

In [None]:
# with h5rdmtoolbox.File('exp1_001.hdf', 'w') as h5:
#     h5.create_dataset(