# Describing PIV recordings

Let's say you recorded multiple PIV images and put them into a *ZIP* folder. The scenario is shown below. We will describe the data stored in the zip folder using linked-data syntax. The collection of PIV images is called a *dataset*. The information about it will be stored in a JSON-LD file:

![piv_image_dataset_management](piv_image_dataset_management.svg)

The ["PIV Challenge"](https://www.pivchallenge.org/) datasets will serve as real-world examples. We will describe one of them.

Before we start, let's clarify the vocabulary and semantics:

We will use multiple vocabularies and ontologies. At the core is the "Data Catalog Vocabulary" (dcat), which allows describing datasets. [dcat](https://www.w3.org/TR/vocab-dcat-2/) defines the two main objects in our problem, dataset and distribution, as follows:
- *dcat:Dataset*: "A collection of data, published or curated by a single agent, and available for access or download in one or more representations."
- *dcat:Distribution*: "A specific representation of a dataset. A dataset might be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above)."
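
To make the relation between the two concepts concrete, here is a minimal sketch of a dataset with two distributions, written by hand as a plain Python dictionary in JSON-LD form. It uses only the standard library. The title and the `example.org` URLs are placeholders and not part of the PIV Challenge data.

```python
import json

# Hand-written JSON-LD: one dataset that is available as two distributions.
# Titles and URLs are placeholders (example.org), not real data.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dcterms": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dcterms:title": "my-piv-recording",
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dcterms:title": "Raw images",
            "dcat:downloadURL": {"@id": "https://example.org/images.zip"},
        },
        {
            "@type": "dcat:Distribution",
            "dcterms:title": "README",
            "dcat:downloadURL": {"@id": "https://example.org/readme.txt"},
        },
    ],
}

print(json.dumps(dataset, indent=2))
```

The dataset carries the general information, while each distribution points to one concrete file. This is the structure we will build with `pivmetalib` objects in the rest of this notebook.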

Besides the description of file objects (*dcat:Distribution*), the dataset has many more properties, such as the creator and a description. We will add all of these below.

## Imports

We will import some namespace modules provided by `pivmetalib`. Each namespace module contains classes representing the concepts of its ontology. For example, `prov` contains the class `Person`, which describes [*prov:Person*](https://www.w3.org/ns/prov#Person). The most important properties of a person, such as the first and last name, the email address or a researcher ID, are implemented as class attributes. Learn more about this in the [GettingStarted Notebook](./GettingStarted.ipynb).

Here is an example of a person who is one of the contributors to the PIV Challenge datasets:

In [1]:
from pivmetalib import prov

creator = prov.Person(
    lastName='Okamoto',
    mbox="okamoto@tokai.t.u-tokyo.ac.jp"
)
creator

Other important modules are `dcat` and `pivmeta`. The namespace module `dcat` contains *Distribution* and *Dataset*. The *pivmeta* ontology provides many more PIV-specific concepts. Among others, it defines *PIVImageDistribution*, a subtype of *Distribution* which indicates that the distribution contains PIV images rather than regular files such as a README.

Let's import the other modules:

In [2]:
from pivmetalib import dcat  # provides Dataset and Distribution
from pivmetalib import pivmeta  # provides PIVImageDistribution, among others
from pivmetalib import PIVMETA  # the namespace module containing the URI addresses

## Data collection

We refer to the [PIV-Challenge](https://www.pivchallenge.org/pub/) website for all information, i.e. data and metadata. Much is written in the README file but some metadata is also available in the HTML text.

Here is a (probably incomplete) list of metadata:
- case/dataset name: "C"
- description: "Strong wall reflection in an impeller (background images and mask are provided), (provided by Stanislas)"
- long description from README: "The set of images is referenced C001_1.tif and C001_2.tif...The two white circles are the two edges of the fixed vaneless diffuser."
- image type: "real"
- number of sets: "1 + 2bg + 1 msk"
- author(s): "Stanislas"
- camera characteristics (see README): "Type: KODAK ES1.0 b & w.....Acquisition software	INSIGHT 2.10."

The challenge is to translate this into a common language so that datasets, including those from other sources, become comparable. This is exactly what the `pivmeta` ontology achieves. Let's build an interoperable description of the dataset:

## Describe the dataset

The package `pivmetalib` implements the [RDF](https://www.w3.org/RDF/) vocabularies as Python objects, and their parameters are validated.

Let's first examine this by creating the person who created the dataset:

## Author

In [3]:
creator = prov.Person(
    lastName='Stanislas',
    mbox="pivnet-sig32@univ-lille1.fr"
)
creator

## Camera

The most important properties of a camera used for PIV are the sensor size and the lens used.

Properties of a tool like a camera can be described via `m4i:NumericalVariable`. 

**Sensor size**

In [4]:
sensor_width = pivmeta.NumericalVariable(value=1008, label="sensor width", description="The width of the camera sensor")
sensor_height = pivmeta.NumericalVariable(value=1008, label="sensor height", description="The height of the camera sensor")

As you can see, a bare numerical variable is not very precise on its own, which is why we added a label and a description. It is even more effective to use standard names from a list of well-defined names. For this, we make use of the [SSNO-Ontology](https://matthiasprobst.github.io/ssno/).

To do so, we first need to download the standard name table that we or our project agreed on:

In [5]:
from ssnolib import StandardNameTable

In [6]:
snt = StandardNameTable.download(url="https://zenodo.org/records/14175299/files/Standard_Name_Table_for_Particle_Image_Velociemtry_data.jsonld?download=1", fmt="jsonld")
sn_dict = snt.get_standard_name_dict()

The standard names "sensor_pixel_width" and "sensor_pixel_height" are the standardized names to be used for the description of the sensor width and height in pixels. The user may refer to the standard name table for more precise information. In addition, the standard name string is an ideal key when searching for specific information. Here are our improved variables:

In [7]:
# use the same keyword (hasStandardName) as the other variables in this notebook
sensor_width = pivmeta.NumericalVariable(value=1008, hasStandardName=sn_dict["sensor_pixel_width"])
sensor_height = pivmeta.NumericalVariable(value=1008, hasStandardName=sn_dict["sensor_pixel_height"])
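
To illustrate why a standard name is easier to search for than a free-text label, here is a small sketch using only the standard library. The `Variable` class and the two helper functions are hypothetical stand-ins for illustration; they are not part of `pivmetalib` or `ssnolib`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variable:
    """Hypothetical stand-in for a numerical variable (not the pivmetalib class)."""
    value: float
    label: Optional[str] = None
    standard_name: Optional[str] = None


variables = [
    Variable(1008, label="sensor width"),
    Variable(1008, label="Width of sensor"),  # same meaning, different wording
    Variable(1008, standard_name="sensor_pixel_width"),
    Variable(1008, standard_name="sensor_pixel_height"),
]


def find_by_label(items, text):
    """Exact label match: misses variables that word the label differently."""
    return [v for v in items if v.label == text]


def find_by_standard_name(items, name):
    """Exact standard-name match: reliable because the name is agreed upon."""
    return [v for v in items if v.standard_name == name]


by_label = find_by_label(variables, "sensor width")
by_name = find_by_standard_name(variables, "sensor_pixel_width")
print(len(by_label), len(by_name))
```

The label search finds only one of the two variables that describe the sensor width, because the other one is worded differently. A standard name avoids this ambiguity as long as everyone uses the same table.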

There is more information about the camera:

In [8]:
pixel_width = pivmeta.NumericalVariable(value=9.072, hasUnit='um', hasStandardName=sn_dict["ccd_width"])
pixel_height = pivmeta.NumericalVariable(value=9.072, hasUnit='um', hasStandardName=sn_dict["ccd_height"])

In [9]:
objective = pivmeta.Objective(
    label="Nikkor",
    fnumber='f/2',
    hasParameter=[pivmeta.NumericalVariable(label='focal length', value=9.072, hasUnit='mm', hasStandardName=sn_dict["focal_length"])]
)

In [10]:
camera = pivmeta.DigitalCamera(  # is a subclass of m4i.Tool, so use hasParameter
    label='KODAK ES1.0 b & w',
    hasParameter=[sensor_width, sensor_height, pixel_width, pixel_height],
    hasPart=objective
)
# print(camera.model_dump_jsonld(exclude_none=True))

Now, let's describe the complete dataset:

In [11]:
piv_setup = pivmeta.PIVSetup(hasPart=[camera])  # hasPart, spelled as in the other cells

In [12]:
ds = pivmeta.PIVDataset(
    title='piv-challenge-1-C',
    creator=creator,
    modified="2000-10-28",
    landingPage="https://www.pivchallenge.org/pub/index.html#c",
    description="Different velocity gradients with spatially varying image quality (provided by Okamoto) < synthetic > [256 x 128]",
    hasPart=piv_setup,
    distribution=[
        pivmeta.PIVImageDistribution(
            label="Raw PIV image data",
            title='Raw piv image data',
            downloadURL='https://www.pivchallenge.org/pub/C/C.zip',
            mediaType='https://www.iana.org/assignments/media-types/image/tiff',
            compressedFormat='application/zip',
            pivImageType=PIVMETA.SyntheticImage,
            numberOfRecords=1,  # It contains one double image
            filenamePattern=r"C[0-9][0-9][0-9]_[1,2].tif",  # the regex for the filename
            imageBitDepth=8
        ),
        pivmeta.PIVMaskDistribution(
            label='Mask file',
            title='Mask file',
            downloadURL='https://www.pivchallenge.org/pub/C/C.zip',
            compressedFormat='application/zip',  # https://www.w3.org/TR/vocab-dcat-2/#Property:distribution_compression_format
            mediaType='https://www.iana.org/assignments/media-types/image/tiff',
            filenamePattern="Cmask_1.tif"  # for compressed data
        ),
        dcat.Distribution(
            label='README file',
            title='ReadMe file',
            downloadURL='https://www.pivchallenge.org/pub/E/readmeE.txt'
        ),
    ]
)
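
Since `filenamePattern` is described as a regular expression, it is worth checking what the pattern actually matches. The sketch below uses Python's `re` module. The names `C001_1.tif`, `C001_2.tif` and `Cmask_1.tif` come from the README and the mask distribution; the other two names and the stricter pattern are made up for illustration.

```python
import re

# The pattern exactly as used in the dataset description above.
pattern = r"C[0-9][0-9][0-9]_[1,2].tif"
# A stricter variant (an assumption, not from the source): no comma in the
# character class, and an escaped dot.
strict = r"C[0-9]{3}_[12]\.tif"

candidates = [
    "C001_1.tif",   # image A of the double image
    "C001_2.tif",   # image B of the double image
    "Cmask_1.tif",  # mask file, must not match
    "C001_,.tif",   # made-up name: "[1,2]" also accepts a comma
    "C001_1xtif",   # made-up name: an unescaped "." accepts any character
]

results = {
    name: (bool(re.fullmatch(pattern, name)), bool(re.fullmatch(strict, name)))
    for name in candidates
}
for name, (loose_ok, strict_ok) in results.items():
    print(f"{name:12s} original={loose_ok} strict={strict_ok}")
```

Both patterns select the two image files and reject the mask. The original pattern is slightly more permissive than intended, which is harmless for this archive but may matter for others.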

## Export to JSON-LD

The dataset Python object can be written to JSON-LD like so:

In [13]:
with open('piv_challenge.jsonld', 'w') as f:
    json_ld_str = ds.model_dump_jsonld(context={"local":"https://example.org/"}).replace("_:", "local:")
    f.write(json_ld_str)
print(json_ld_str)

{
    "@context": {
        "owl": "http://www.w3.org/2002/07/owl#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "dcat": "http://www.w3.org/ns/dcat#",
        "dcterms": "http://purl.org/dc/terms/",
        "pivmeta": "https://matthiasprobst.github.io/pivmeta#",
        "local": "https://example.org/",
        "prov": "http://www.w3.org/ns/prov#",
        "foaf": "http://xmlns.com/foaf/0.1/",
        "m4i": "http://w3id.org/nfdi4ing/metadata4ing#",
        "schema": "https://schema.org/",
        "obo": "http://purl.obolibrary.org/obo/",
        "ssno": "https://matthiasprobst.github.io/ssno#",
        "skos": "http://www.w3.org/2004/02/skos/core#"
    },
    "@type": "pivmeta:PIVDataset",
    "dcterms:title": "piv-challenge-1-C",
    "dcterms:description": "Different velocity gradients with spatially varying image quality (provided by Okamoto) < synthetic > [256 x 128]",
    "dcterms:creator": {
        "@type": "prov:Person",
        "foaf:mbox": "pivnet-sig32@un

## Re-use the dataset

Now that we have written the metadata to a file, we would like to reuse it, i.e. identify specific data.

### Find distribution within JSON-LD file

In [14]:
import ontolutils

In [15]:
loaded_ds = pivmeta.PIVDataset.from_jsonld(source='piv_challenge.jsonld', limit=1)
loaded_ds.model_dump(exclude_none=True)

{'id': 'https://example.org/N09580465f017468b85b8ff17e0e42557',
 'title': 'piv-challenge-1-C',
 'description': 'Different velocity gradients with spatially varying image quality (provided by Okamoto) < synthetic > [256 x 128]',
 'creator': {'id': '_:Nf2c28425d3ae41f8a860cf39410b1003',
  'mbox': 'pivnet-sig32@univ-lille1.fr',
  'last_name': 'Stanislas',
  '@id': 'https://example.org/N570171c3a9a14796b82396cc1011ce44',
  '@type': 'http://www.w3.org/ns/prov#Person'},
 'distribution': [{'@id': 'https://example.org/N8da732699d424b40b63461b8ab896395',
   '@type': 'https://matthiasprobst.github.io/pivmeta#PIVImageDistribution',
   'label': 'Raw PIV image data',
   'title': 'Raw piv image data',
   'downloadURL': 'https://www.pivchallenge.org/pub/C/C.zip',
   'mediaType': 'https://www.iana.org/assignments/media-types/image/tiff',
   'filenamePattern': 'C[0-9][0-9][0-9]_[1,2].tif',
   'pivImageType': 'https://matthiasprobst.github.io/pivmeta#SyntheticImage',
   'imageBitDepth': '8',
   'numberO

In [16]:
loaded_ds.creator

In [18]:
image_dist = pivmeta.PIVImageDistribution.from_jsonld(source='piv_challenge.jsonld', limit=1)
image_dist
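
For comparison, a similar lookup can be sketched without `pivmetalib`, using only the `json` module. The JSON-LD text below is a hand-written stand-in for the exported file. The keys `dcat:distribution` and `dcat:downloadURL` are assumptions about the exported layout, and `find_nodes` is a hypothetical helper.

```python
import json

# Hand-written stand-in for the exported file (layout partly assumed).
jsonld_text = """
{
  "@type": "pivmeta:PIVDataset",
  "dcterms:title": "piv-challenge-1-C",
  "dcat:distribution": [
    {"@type": "pivmeta:PIVImageDistribution",
     "dcterms:title": "Raw piv image data",
     "dcat:downloadURL": "https://www.pivchallenge.org/pub/C/C.zip"},
    {"@type": "pivmeta:PIVMaskDistribution",
     "dcterms:title": "Mask file",
     "dcat:downloadURL": "https://www.pivchallenge.org/pub/C/C.zip"},
    {"@type": "dcat:Distribution", "dcterms:title": "ReadMe file"}
  ]
}
"""


def find_nodes(node, type_name):
    """Recursively collect every dict whose "@type" equals type_name."""
    found = []
    if isinstance(node, dict):
        node_type = node.get("@type")
        types = node_type if isinstance(node_type, list) else [node_type]
        if type_name in types:
            found.append(node)
        for value in node.values():
            found.extend(find_nodes(value, type_name))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_nodes(item, type_name))
    return found


doc = json.loads(jsonld_text)
image_nodes = find_nodes(doc, "pivmeta:PIVImageDistribution")
print(image_nodes[0]["dcterms:title"])
```

This only works if the file uses exactly the prefixes we search for. A proper RDF-aware loader, such as the `from_jsonld` call above, resolves prefixes to full URIs and is therefore the more robust choice.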

In [19]:
from pprint import pprint
pprint(image_dist.model_dump())

{'access_URL': None,
 'byte_size': None,
 'creator': None,
 'description': None,
 'download_URL': Url('https://www.pivchallenge.org/pub/C/C.zip'),
 'filenamePattern': 'C[0-9][0-9][0-9]_[1,2].tif',
 'id': 'https://example.org/N8da732699d424b40b63461b8ab896395',
 'identifier': None,
 'image_bit_depth': 8,
 'keyword': None,
 'label': 'Raw PIV image data',
 'media_type': Url('https://www.iana.org/assignments/media-types/image/tiff'),
 'number_of_records': 1,
 'piv_image_type': Url('https://matthiasprobst.github.io/pivmeta#SyntheticImage'),
 'title': 'Raw piv image data',
 'version': None}


In [20]:
zip_filename = image_dist.download(dest_filename='imgs.zip', overwrite_existing=False)

In [21]:
import zipfile
import pathlib

with zipfile.ZipFile(zip_filename, 'r') as zip_ref:
    zip_ref.extractall('imgs')

In [22]:
image_dist.is_synthetic()

False

In [24]:
mask_dist = pivmeta.PIVMaskDistribution.from_jsonld(source='piv_challenge.jsonld', limit=1)
mask_dist

In [25]:
filenames = sorted(pathlib.Path('imgs').glob(mask_dist.filenamePattern))
filenames

[WindowsPath('imgs/Cmask_1.tif')]
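
The same selection can also be done inside the archive, without extracting everything first. The following sketch builds a tiny in-memory ZIP file with empty placeholder members and filters them with `fnmatch`, which follows glob-style rules like `pathlib.Path.glob`. The helper `members_matching` and the member `readme.txt` are hypothetical.

```python
import fnmatch
import io
import zipfile

# Build a tiny in-memory archive. The members are empty placeholders that
# only mimic the file names; they are not real images.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ["C001_1.tif", "C001_2.tif", "Cmask_1.tif", "readme.txt"]:
        archive.writestr(name, b"")


def members_matching(zip_source, pattern):
    """Return the sorted archive member names matching a glob-style pattern."""
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(
            name for name in archive.namelist()
            if fnmatch.fnmatchcase(name, pattern)
        )


masks = members_matching(buffer, "Cmask_1.tif")
images = members_matching(buffer, "C[0-9][0-9][0-9]_[1,2].tif")
print(masks)
print(images)
```

With a real download, the file name returned by `image_dist.download(...)` could be passed instead of the in-memory buffer.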