# Describe a PIV recording

Let's say you recorded multiple PIV images and put them into a *ZIP* folder. The scenario is shown below. We will describe the data stored in the zip folder using linked-data syntax. The collection of PIV images is called a *dataset*. The information about it will be stored in a JSON-LD file:

![piv_image_dataset_management](piv_image_dataset_management.svg)

The ["PIV Challenge"](https://www.pivchallenge.org/) datasets will serve as real-world examples. We will describe one of them.

Before we start, let's clarify the vocabulary and semantics:

We will use multiple vocabularies and ontologies. At the core, we will use the "Data Catalog Vocabulary" (dcat), which allows describing datasets. The two main objects in our problem, dataset and distribution, are defined by [dcat](https://www.w3.org/TR/vocab-dcat-2/) as follows:
- *dcat:Dataset*: "A collection of data, published or curated by a single agent, and available for access or download in one or more representations."
- *dcat:Distribution*: "A specific representation of a dataset. A dataset might be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above)."

Besides the description of file objects (*dcat:Distribution*), the dataset has many more properties, such as the creator and a description. We will add all of these below.
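To make the relationship between dataset and distribution concrete before we use the library, here is a minimal sketch that builds the same structure as a plain Python dictionary and serializes it with the standard `json` module. It does not use `pivmetalib`. The prefixes are the published DCAT and Dublin Core namespaces; the title and URL are simply the values used later in this notebook.

```python
import json

# Prefix-to-namespace mapping (the JSON-LD "@context")
context = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
}

# One dataset that points to one distribution (the downloadable zip file)
dataset = {
    "@context": context,
    "@type": "dcat:Dataset",
    "dcterms:title": "piv-challenge-1-C",
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dcterms:title": "Raw piv image data",
            "dcat:downloadURL": "https://www.pivchallenge.org/pub/C/C.zip",
        }
    ],
}

jsonld_text = json.dumps(dataset, indent=4)
print(jsonld_text)
```

The dataset is the abstract collection, while each entry under `dcat:distribution` is one concrete, downloadable representation of it.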

## Imports

We will import some namespace modules, which are provided by `pivmetalib`. Each namespace module contains classes representing the concepts of its ontology. For example, `prov` contains the class `Person`, which describes [*prov:Person*](https://www.w3.org/ns/prov#Person). The most important properties of a person, such as the first and last name, the email address or a researcher ID, are implemented as class attributes. Learn more about it in the [GettingStarted Notebook](./GettingStarted.ipynb).

Here's an example for a Person:

In [1]:
from pivmetalib import prov

creator = prov.Person(
    lastName='Okamoto',
    mbox="okamoto@tokai.t.u-tokyo.ac.jp"
)
creator

Other important modules are `dcat` and `pivmeta`. The namespace module `dcat` contains *Distribution* and *Dataset*. The *pivmeta* ontology provides many more PIV-specific concepts. Among others, it defines *PivImageDistribution*, a subtype of *Distribution*, which tells us that the distribution contains PIV images rather than regular files such as a README.

Let's import the other modules:

In [2]:
from pivmetalib import dcat  # provides Dataset and Distribution
from pivmetalib import pivmeta  # provides PivImageDistribution, among others
from ontolutils import PIVMETA  # the namespace module containing the URI addresses

## Data collection

We refer to the [PIV-Challenge](https://www.pivchallenge.org/pub/) website for all information, i.e. data and metadata. Much is written in the README file but some metadata is also available in the HTML text.

Here is a (probably incomplete) list of metadata:
- case/dataset name: "C"
- description: "Strong wall reflection in an impeller (background images and mask are provided), (provided by Stanislas)"
- long description from README: "The set of images is referenced C001_1.tif and C001_2.tif...The two white circles are the two edges of the fixed vaneless diffuser."
- image type: "real"
- number of sets: "1 + 2bg + 1 msk"
- author(s): "Stanislas"
- camera characteristics (see README): "Type: KODAK ES1.0 b & w.....Acquisition software	INSIGHT 2.10."

The challenge is to translate this into a common language so that datasets, also from other sources, become comparable. This is exactly what the `pivmeta` ontology achieves. Let's dive into building an interoperable description of the dataset:

## Describe the dataset

The package `pivmetalib` implements the [RDF](https://www.w3.org/RDF/) vocabularies as Python objects. Their parameters are validated.

Let's first examine this by creating the person who created the dataset:

### Author

In [3]:
creator = prov.Person(
    lastName='Stanislas',
    mbox="pivnet-sig32@univ-lille1.fr"
)
creator
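To illustrate what "validated parameters" means, here is a minimal sketch using only the standard library. `PersonSketch` is a hypothetical stand-in written for this example; it is not the `prov.Person` class of `pivmetalib`, and the real validation rules may differ. The idea is only that a malformed value is rejected when the object is created, not later during export.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonSketch:
    """Hypothetical stand-in for a validated metadata class (not pivmetalib's Person)."""
    lastName: str
    mbox: Optional[str] = None

    def __post_init__(self):
        # Reject obviously malformed e-mail addresses at construction time
        if self.mbox is not None and "@" not in self.mbox:
            raise ValueError(f"mbox does not look like an e-mail address: {self.mbox!r}")


person = PersonSketch(lastName="Stanislas", mbox="pivnet-sig32@univ-lille1.fr")

try:
    PersonSketch(lastName="Stanislas", mbox="not-an-email")
    rejected = False
except ValueError:
    rejected = True

print(person, rejected)
```

Failing early keeps invalid metadata out of the exported JSON-LD file.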

### Camera

The most important properties of a camera used for PIV are the sensor size and the lens used:

In [4]:
from pivmetalib import m4i
from ontolutils import PIVMETA, QUDT_KIND

**Sensor size**

In [5]:
sensor_width = pivmeta.NumericalVariable(value=1008,
                                         standard_name="https://matthiasprobst.github.io/pivmeta#sensor_pixel_width")
# the sensor height refers to the height standard name, not the width one
sensor_height = pivmeta.NumericalVariable(value=1008,
                                          standard_name="https://matthiasprobst.github.io/pivmeta#sensor_pixel_height")

In [6]:
camera = pivmeta.DigitalCamera(  # is a subclass of m4i.Tool, so use hasParameter
    label='KODAK ES1.0 b & w',
    fnumber='f/2',
    hasParameter=[
        sensor_width, sensor_height,
        pivmeta.NumericalVariable(
            value=9.072,
            hasUnit='um',
            hasKindOfQuantity=QUDT_KIND.Length,
            standard_name="https://matthiasprobst.github.io/pivmeta#ccd_width"),
        pivmeta.NumericalVariable(
            value=9.07,
            hasUnit='um',
            hasKindOfQuantity=QUDT_KIND.Length,
            standard_name="https://matthiasprobst.github.io/pivmeta#ccd_height"),
        pivmeta.NumericalVariable(
            label='focal length',
            value=35,
            hasUnit='mm',
            hasKindOfQuantity=QUDT_KIND.Length,
            standard_name="https://matthiasprobst.github.io/pivmeta#focal_length",
            hasVariableDescription='Nikkor')
    ]
)
camera.model_dump(exclude_none=True)

{'id': '_:N54572e9276d048e3a351bea598a890ce',
 'label': 'KODAK ES1.0 b & w',
 'parameter': [{'id': '_:N40663cc2f13545618132a30183120b61', 'value': 1008},
  {'id': '_:Ncc75f0cac0234e3bb668c448c87144b9', 'value': 1008},
  {'id': '_:N8b968042c4354519bb9b2ac2759b2ec5',
   'value': 9.072,
   'unit': 'http://qudt.org/vocab/unit/MicroM',
   'quantity_kind': 'http://qudt.org/vocab/quantitykind/Length'},
  {'id': '_:N4e10e885f5a34a23a65d5ab1ff72c082',
   'value': 9.07,
   'unit': 'http://qudt.org/vocab/unit/MicroM',
   'quantity_kind': 'http://qudt.org/vocab/quantitykind/Length'},
  {'id': '_:N40e140eba9ea4b9bba85400072995d09',
   'label': 'focal length',
   'value': 35,
   'unit': 'http://qudt.org/vocab/unit/MilliM',
   'quantity_kind': 'http://qudt.org/vocab/quantitykind/Length',
   'hasVariableDescription': 'Nikkor'}],
 'hasParameter': [{'id': '_:N40663cc2f13545618132a30183120b61',
   'value': 1008,
   'standard_name': 'https://matthiasprobst.github.io/pivmeta#sensor_pixel_width'},
  {'id': 

In [7]:
# # dont use PIVMETA better build a SNT-Namespace-like-class similar to PIVMETA
# # consider outsourcing this to a separate package onto_utils, which let's you build namespace classes...

# def standard_variable(name, value, unit):
#     sn = PIVMETA.get(name)
#     if unit != '':
#         qk = get_qudt_from_string(unit)
#         return pivmeta.NumericalVariable(value=value,
#                                          standard_name="https://matthiasprobst.github.io/pivmeta#sensor_pixel_width",
#                                          hasUnit=unit)
#     return pivmeta.NumericalVariable(value=value,
#                                      standard_name="https://matthiasprobst.github.io/pivmeta#sensor_pixel_width")

In [8]:
# standard_variable('x_pixel_coordinate', 1.4, 'm/s')
# # download the TTL file (https://matthiasprobst.github.io/pivmeta/ontology.ttl) and find out the quantity type, then verify it!

In [9]:
# standard_variable(name='sensor_pixel_width', value=1008, unit='m/s')
# standard_variable(name='sensor_pixel_width', value=1008, unit='') ## TODO quantity kind must be determined automatically!

Note that some model classes provide a helper method. For `DigitalCamera`, there is `build_minimal`:

In [10]:
cam = pivmeta.DigitalCamera.build_minimal(
    label='KODAK ES1.0 b & w',
    sensor_pixel_size=[1008, 1018],
    ccd_pixel_size_um=[9.07, 9.07],
    fnumber='f/2',
    focal_length_mm=35,
    image_coding='8 bits'
)
cam.model_dump(exclude_none=True)

{'id': '_:N0dc756128778421fb58a235c26d0a310',
 'label': 'KODAK ES1.0 b & w',
 'parameter': [{'id': '_:Nac445d26829340a4aa654fbe3ee0149c',
   'value': 35,
   'unit': 'http://qudt.org/vocab/unit/MilliM',
   'quantity_kind': 'http://qudt.org/vocab/quantitykind/Length'},
  {'id': '_:N978237770d16456ba5b889870318ca84',
   'label': 'sensor_pixel_width',
   'value': 1008},
  {'id': '_:Neb6f5a1b73df4c2aa5a06431fc687f1f',
   'label': 'sensor_pixel_height',
   'value': 1018},
  {'id': '_:Nc7e9dc5e6b624d328ffd3a46c79a3d8d',
   'label': 'ccd_pixel_width',
   'value': 9.07,
   'unit': 'http://qudt.org/vocab/unit/MicroM',
   'quantity_kind': 'http://qudt.org/vocab/quantitykind/Length'},
  {'id': '_:N4898d48a6b61447cb3417255a6197b4f',
   'label': 'ccd_pixel_height',
   'value': 9.07,
   'unit': 'http://qudt.org/vocab/unit/MicroM',
   'quantity_kind': 'http://qudt.org/vocab/quantitykind/Length'},
  {'id': '_:Naeb19a8036884c3b842aa90b369d43f7',
   'label': 'image_coding',
   'value': '8 bits'}],
 'hasP

In [11]:
cam
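The pixel count and the pixel size together give the physical size of the sensor. The following sketch works this out for the values passed to `build_minimal` above, assuming that the 9.07 µm value is the size of a single pixel, as the parameter name `ccd_pixel_size_um` suggests. It is plain arithmetic and does not use `pivmetalib`.

```python
# Values taken from the camera description above
pixels = (1008, 1018)          # sensor size in pixels (width, height)
pixel_pitch_um = (9.07, 9.07)  # assumed size of one pixel in micrometres

# physical size [mm] = number of pixels * pixel pitch [um] / 1000
sensor_size_mm = tuple(n * p / 1000 for n, p in zip(pixels, pixel_pitch_um))
print(f"{sensor_size_mm[0]:.2f} mm x {sensor_size_mm[1]:.2f} mm")
```

This prints `9.14 mm x 9.23 mm`. Storing units and quantity kinds with each value, as done in the camera description, is what makes such a conversion unambiguous.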

Now, let's describe the complete dataset:

In [12]:
ds = dcat.Dataset(
    title='piv-challenge-1-C',
    creator=creator,
    modified="2000-10-28",
    landingPage="https://www.pivchallenge.org/pub/index.html#c",
    description="Different velocity gradients with spatially varying image quality (provided by Okamoto) < synthetic > [256 x 128]",
    distribution=[
        pivmeta.PivImageDistribution(
            title='Raw piv image data',
            downloadURL='https://www.pivchallenge.org/pub/C/C.zip',
            mediaType='https://www.iana.org/assignments/media-types/image/tiff',
            compressedFormat='application/zip',
            pivImageType=PIVMETA.SyntheticImage,
            numberOfRecords=1,  # It contains one double image
            filenamePattern=r"C[0-9][0-9][0-9]_[1,2].tif",  # the regex for the filename
            imageBitDepth=8
        ),
        pivmeta.PivMaskDistribution(
            title='Mask file',
            downloadURL='https://www.pivchallenge.org/pub/C/C.zip',
            compressedFormat='application/zip',  # https://www.w3.org/TR/vocab-dcat-2/#Property:distribution_compression_format
            mediaType='https://www.iana.org/assignments/media-types/image/tiff',
            filenamePattern="Cmask_1.tif"  # for compressed data
        ),
        dcat.Distribution(
            title='ReadMe file',
            downloadURL='https://www.pivchallenge.org/pub/E/readmeE.txt'
        ),
    ]
)
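The `filenamePattern` of the image distribution is a regular expression. The sketch below shows how a consumer of the metadata might apply it with the standard `re` module. Only `C001_1.tif`, `C001_2.tif` and `Cmask_1.tif` are file names mentioned in this notebook; the other two candidates are made up to show that the pattern as written is slightly more permissive than intended. The stricter variant is a suggestion, not the pattern stored in the dataset.

```python
import re

# The pattern exactly as stored in the image distribution
pattern = re.compile(r"C[0-9][0-9][0-9]_[1,2].tif")

candidates = ["C001_1.tif", "C001_2.tif", "Cmask_1.tif", "C001_,.tif", "C001_1xtif"]
matches = [name for name in candidates if pattern.fullmatch(name)]
print(matches)

# A stricter variant: "[12]" excludes the comma and "\." matches a literal dot
strict = re.compile(r"C[0-9]{3}_[12]\.tif")
strict_matches = [name for name in candidates if strict.fullmatch(name)]
print(strict_matches)
```

Inside a character class the comma is a literal character, and an unescaped `.` matches any character, which is why the two made-up names pass the original pattern but not the stricter one.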

## Export to JSON-LD

The dataset Python object can be written to JSON-LD like so:

In [13]:
with open('piv_challenge.jsonld', 'w') as f:
    json_ld_str = ds.model_dump_jsonld(
        context={"@import": "https://raw.githubusercontent.com/matthiasprobst/pivmeta/main/pivmeta_context.jsonld"}
    )
    f.write(json_ld_str)
print(json_ld_str)

{
    "@context": {
        "owl": "http://www.w3.org/2002/07/owl#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "dcat": "http://www.w3.org/ns/dcat#",
        "dcterms": "http://purl.org/dc/terms/",
        "prov": "http://www.w3.org/ns/prov#",
        "@import": "https://raw.githubusercontent.com/matthiasprobst/pivmeta/main/pivmeta_context.jsonld",
        "foaf": "http://xmlns.com/foaf/0.1/",
        "m4i": "http://w3id.org/nfdi4ing/metadata4ing#",
        "schema": "https://schema.org/",
        "pivmeta": "https://matthiasprobst.github.io/pivmeta#"
    },
    "@type": "dcat:Dataset",
    "dcterms:title": "piv-challenge-1-C",
    "dcterms:description": "Different velocity gradients with spatially varying image quality (provided by Okamoto) < synthetic > [256 x 128]",
    "dcterms:creator": {
        "@type": "prov:Person",
        "foaf:mbox": "pivnet-sig32@univ-lille1.fr",
        "foaf:lastName": "Stanislas",
        "@id": "_:N1b6f07b929cc4a509f95d5150e6eac31

## Re-use the dataset

Now that we have written the metadata to a file, we would like to reuse it, i.e. identify specific data:

### Find distribution within JSON-LD file

In [14]:
import ontolutils

In [15]:
ds = dcat.Dataset.from_jsonld(source='piv_challenge.jsonld', limit=1)
ds.model_dump(exclude_none=True)

{'id': '_:N4b73f581a47d44f4a6459aabe0e97110',
 'title': 'piv-challenge-1-C',
 'description': 'Different velocity gradients with spatially varying image quality (provided by Okamoto) < synthetic > [256 x 128]',
 'creator': {'id': '_:Ne3c4e5ffc7ae4735aae87efcce4e2de7',
  'mbox': 'pivnet-sig32@univ-lille1.fr',
  'type': 'http://www.w3.org/ns/prov#Person'},
 'distribution': [{'id': '_:N90d650c377db4dfdbb0ef78a92b7cb97',
   'title': 'Raw piv image data',
   'download_URL': Url('https://www.pivchallenge.org/pub/C/C.zip'),
   'media_type': Url('https://www.iana.org/assignments/media-types/image/tiff'),
   'pivImageType': 'https://matthiasprobst.github.io/pivmeta#SyntheticImage',
   'compressedFormat': 'application/zip',
   'type': 'https://matthiasprobst.github.io/pivmeta#PivImageDistribution',
   'imageBitDepth': '8',
   'filenamePattern': 'C[0-9][0-9][0-9]_[1,2].tif',
   'numberOfRecords': '1'},
  {'id': '_:N68b0d806384e48a69fb7e65c242e8458',
   'title': 'Mask file',
   'download_URL': Url(

In [16]:
ds.creator

In [17]:
image_dist = pivmeta.PivImageDistribution.from_jsonld(source='piv_challenge.jsonld', limit=1)
image_dist

In [18]:
from pprint import pprint
pprint(image_dist.model_dump())

{'access_URL': None,
 'byte_size': None,
 'compressedFormat': 'application/zip',
 'creator': None,
 'description': None,
 'download_URL': Url('https://www.pivchallenge.org/pub/C/C.zip'),
 'filenamePattern': 'C[0-9][0-9][0-9]_[1,2].tif',
 'id': '_:N2dbba4f850d14f838780b853bfbf4894',
 'identifier': None,
 'image_bit_depth': 8,
 'keyword': None,
 'label': None,
 'media_type': Url('https://www.iana.org/assignments/media-types/image/tiff'),
 'number_of_records': 1,
 'piv_image_type': Url('https://matthiasprobst.github.io/pivmeta#SyntheticImage'),
 'title': 'Raw piv image data',
 'version': None}
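To see what "finding a distribution by its type" amounts to, here is a minimal sketch that searches a JSON-LD document with the standard `json` module only. The document is a shortened, hand-written version of the export above, and `find_by_type` is a hypothetical helper written for this example. It is not how `from_jsonld` is implemented: it only compares the compact `@type` string and would miss expanded IRIs or a list of types.

```python
import json

# A shortened, hand-written JSON-LD document in the style of the export above
jsonld_text = """
{
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dcterms": "http://purl.org/dc/terms/",
        "pivmeta": "https://matthiasprobst.github.io/pivmeta#"
    },
    "@type": "dcat:Dataset",
    "dcterms:title": "piv-challenge-1-C",
    "dcat:distribution": [
        {"@type": "pivmeta:PivImageDistribution", "dcterms:title": "Raw piv image data"},
        {"@type": "pivmeta:PivMaskDistribution", "dcterms:title": "Mask file"},
        {"@type": "dcat:Distribution", "dcterms:title": "ReadMe file"}
    ]
}
"""


def find_by_type(node, wanted_type):
    """Recursively collect all JSON objects whose "@type" equals wanted_type."""
    found = []
    if isinstance(node, dict):
        if node.get("@type") == wanted_type:
            found.append(node)
        for value in node.values():
            found.extend(find_by_type(value, wanted_type))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_by_type(item, wanted_type))
    return found


doc = json.loads(jsonld_text)
image_nodes = find_by_type(doc, "pivmeta:PivImageDistribution")
print([n["dcterms:title"] for n in image_nodes])
```

Because the type of each distribution is stored explicitly, the image data can be told apart from the mask and the README without inspecting any file names.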


In [19]:
zip_filename = image_dist.download(dest_filename='imgs.zip', overwrite_existing=False)

In [20]:
import zipfile
import pathlib

with zipfile.ZipFile(zip_filename, 'r') as zip_ref:
    zip_ref.extractall('imgs')
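Instead of extracting everything, the file name pattern from the metadata can be used to select only the image files inside the archive. The sketch below demonstrates the idea on a small zip archive built in memory, so it runs without downloading anything. The member names mimic the files of this case; `readme.txt` and the placeholder contents are made up for the example.

```python
import io
import re
import zipfile

# Build a small zip archive in memory (file names mimic the PIV Challenge case C)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in ["C001_1.tif", "C001_2.tif", "Cmask_1.tif", "readme.txt"]:
        zf.writestr(name, b"placeholder")

# List the members and keep only those matching the image file name pattern
pattern = re.compile(r"C[0-9][0-9][0-9]_[1,2].tif")
with zipfile.ZipFile(buffer, "r") as zf:
    image_members = sorted(n for n in zf.namelist() if pattern.fullmatch(n))

print(image_members)
```

With a real archive, the selected names could be passed to `ZipFile.extractall` through its `members` argument to extract only the images.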

In [21]:
image_dist.is_synthetic()

False

In [22]:
mask_dist = pivmeta.PivMaskDistribution.from_jsonld(source='piv_challenge.jsonld', limit=1)
mask_dist

In [23]:
filenames = sorted(pathlib.Path('imgs').glob(mask_dist.filenamePattern))
filenames

[WindowsPath('imgs/Cmask_1.tif')]