Update docs prov (#119)
* added prov overview to docs

* use images folder for docs
cehbrecht committed Feb 15, 2021
1 parent f25e1df commit 59401af
Showing 7 changed files with 251 additions and 0 deletions.
Binary file added docs/source/_images/prov-example.png
Binary file added docs/source/_images/prov-overview.png
Binary file added docs/source/_images/prov-subset.png
Binary file added docs/source/_images/prov-workflow.png
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -9,6 +9,7 @@
dev_guide
notebooks
processes
prov
changes

Indices and tables
79 changes: 79 additions & 0 deletions docs/source/prov-example.json
@@ -0,0 +1,79 @@
{
"prefix": {
"provone": "http://purl.dataone.org/provone/2015/01/15/ontology#",
"dcterms": "http://purl.org/dc/terms/",
"default": "http://purl.org/roocs/prov#"
},
"agent": {
"copernicus_CDS": {
"prov:type": "prov:Organization",
"dcterms:title": "Copernicus Climate Data Store"
},
"rook": {
"prov:type": "prov:SoftwareAgent",
"dcterms:source": "https://github.com/roocs/rook/releases/tag/v0.2.0"
},
"daops": {
"prov:type": "prov:SoftwareAgent",
"dcterms:source": "https://github.com/roocs/daops/releases/tag/v0.3.0"
}
},
"wasAttributedTo": {
"_:id1": {
"prov:entity": "rook",
"prov:agent": "copernicus_CDS"
}
},
"entity": {
"workflow": {
"prov:type": "provone:Workflow"
},
"c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619": {},
"tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc": [{}, {}],
"tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20170101-20171229.nc": {}
},
"activity": {
"orchestrate": [{
"prov:startedAtTime": "2021-02-15T13:24:33"
}, {
"prov:endedAtTime": "2021-02-15T13:24:57"
}],
"subset_tas_1": {
"time": "2016-01-01/2020-12-30",
"apply_fixes": false
},
"subset_tas_2": {
"time": "2017-01-01/2017-12-30",
"apply_fixes": false
}
},
"wasAssociatedWith": {
"_:id2": {
"prov:activity": "orchestrate",
"prov:agent": "rook",
"prov:plan": "workflow"
},
"_:id3": {
"prov:activity": "subset_tas_1",
"prov:agent": "daops",
"prov:plan": "workflow"
},
"_:id5": {
"prov:activity": "subset_tas_2",
"prov:agent": "daops",
"prov:plan": "workflow"
}
},
"wasDerivedFrom": {
"_:id4": {
"prov:generatedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc",
"prov:usedEntity": "c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619",
"prov:activity": "subset_tas_1"
},
"_:id6": {
"prov:generatedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20170101-20171229.nc",
"prov:usedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc",
"prov:activity": "subset_tas_2"
}
}
}
171 changes: 171 additions & 0 deletions docs/source/prov.rst
@@ -0,0 +1,171 @@
.. _prov:

Provenance
==========

.. contents::
    :local:
    :depth: 1

Introduction
------------

The *rook* processes record `provenance information`_ about the details of each process execution.
This information includes:

* used software and versions (``rook``, ``daops``, ...)
* applied operators like ``subset`` and ``average``
* used input data and parameters (CMIP6 dataset, time, area)
* generated outputs (NetCDF files)
* execution time (start-time and end-time)

This information is described using the `W3C PROV`_ standard and
recorded with the `Python PROV Library`_.

Overview of PROV
----------------

The `W3C PROV Primer`_ document gives an overview of the `W3C PROV`_ standard.

.. image:: _images/prov-overview.png

A PROV document consists of *agents*, *activities* and *entities*.
These can be connected via PROV *relations* like *wasDerivedFrom*.
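
To make this structure concrete, the following is a minimal sketch in plain Python (standard library only, not the `Python PROV Library`_). It lays out one agent, one activity, two entities and one relation in the dictionary layout used by PROV-JSON_. The identifiers ``input_dataset`` and ``output_file`` and the two helper functions are placeholders chosen for illustration; they are not part of *rook*.

```python
import json

# Placeholder identifiers; a real document would use dataset and file names.
doc = {
    "agent": {"daops": {"prov:type": "prov:SoftwareAgent"}},
    "activity": {"subset": {}},
    "entity": {"input_dataset": {}, "output_file": {}},
    # Relations are keyed by blank-node style identifiers such as "_:id1".
    "wasDerivedFrom": {
        "_:id1": {
            "prov:generatedEntity": "output_file",
            "prov:usedEntity": "input_dataset",
            "prov:activity": "subset",
        }
    },
}


def element_ids(document):
    """Return the identifiers of all agents, activities and entities."""
    ids = set()
    for kind in ("agent", "activity", "entity"):
        ids.update(document.get(kind, {}))
    return ids


def dangling_references(document):
    """Return values used in wasDerivedFrom that name no declared element."""
    known = element_ids(document)
    missing = set()
    for relation in document.get("wasDerivedFrom", {}).values():
        missing.update(value for value in relation.values() if value not in known)
    return missing


print(json.dumps(doc, indent=2))
print(dangling_references(doc))
```

The relation only refers to elements by identifier, so a simple check like ``dangling_references`` is enough to spot a relation that points at an undeclared entity or activity.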

Entities
++++++++

W3C PROV
    In PROV, physical, digital, conceptual, or other kinds of thing are called *entities*.

In *rook* we use *entities* for:

* workflow description,
* input datasets and
* resulting output NetCDF files.

Activities
++++++++++

W3C PROV
    *Activities* are how entities come into existence
    and how their attributes change to become new entities,
    often making use of previously existing entities to achieve this.

In *rook* we use *activities* for:

* operators like ``subset`` and ``average``,
* processes like ``orchestrate``, which runs a workflow.

Agents
++++++

W3C PROV
    An *agent* takes a role in an activity such that the agent can be assigned
    some degree of responsibility for the activity taking place.
    An agent can be a person, a piece of software or an organisation.

In *rook* we use *agents* for:

* software like *rook* and *daops*,
* organisations like *Copernicus Climate Data Store*.

Namespaces
++++++++++

W3C PROV
    Using URIs and namespaces, a provenance record can draw from multiple sources on the Web.

We use namespaces to reuse terms from existing vocabularies,
such as ``prov:SoftwareAgent``. The vocabularies used include:

* PROV (by W3C): https://www.w3.org/ns/prov/
* PROVONE (by DataONE_): https://purl.dataone.org/provone/2015/01/15/ontology
* dcterms (Dublin Core Metadata): https://dublincore.org/specifications/dublin-core/dcmi-terms/
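
As an illustration of how a prefixed name maps to a full URI, the sketch below expands names such as ``dcterms:title`` with a simple prefix table. The ``provone`` and ``dcterms`` URIs are taken from the ``prefix`` block of the example document on this page. The ``prov`` entry is added by hand, because that document uses the prefix without declaring it. The function name ``expand`` is a hypothetical helper, not an API of *rook* or the `Python PROV Library`_.

```python
# Prefix table modelled on the "prefix" block of a PROV-JSON document.
# The "prov" entry is an addition made for this sketch.
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "provone": "http://purl.dataone.org/provone/2015/01/15/ontology#",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(qualified_name, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI; return unknown names unchanged."""
    prefix, separator, local = qualified_name.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local
    return qualified_name


print(expand("prov:SoftwareAgent"))  # http://www.w3.org/ns/prov#SoftwareAgent
print(expand("dcterms:title"))       # http://purl.org/dc/terms/title
```

Names without a known prefix, such as plain identifiers like ``rook``, are returned unchanged in this sketch.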

Subset Example
++++++++++++++

.. image:: _images/prov-subset.png

The *activity* ``subset`` is started by the software *agent* ``daops`` (a Python library),
which was triggered by ``rook`` (a data-reduction service).

The NetCDF file *entity* ``tas_day_...nc`` was derived from the ``c3s-cmip6`` dataset *entity*
using the *activity* ``subset``.

Workflow Example
++++++++++++++++

.. image:: _images/prov-workflow.png

W3C PROV Plans
    Activities may follow pre-defined procedures, such as recipes, tutorials, instructions, or workflows.
    PROV refers to these, in general, as *plans*.

In W3C PROV, workflows are called *plans*.

The *activity* ``orchestrate`` is started by the *agent* ``rook``. It uses
a workflow document *entity* (the *plan*), which consists of a ``subset`` and an ``average``
*activity*. These activities are started by the software *agent* ``daops``.
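
The association between activities, agents and the plan can be read back from the ``wasAssociatedWith`` relations of a PROV-JSON_ document. The sketch below uses plain dictionaries. The relation keys and the ``average`` entry mirror the figure above and are illustrative values, not output produced by *rook*. The helper ``activities_by_agent`` is hypothetical.

```python
from collections import defaultdict

# Illustrative relations modelled on the workflow figure.
was_associated_with = {
    "_:id1": {"prov:activity": "orchestrate", "prov:agent": "rook", "prov:plan": "workflow"},
    "_:id2": {"prov:activity": "subset", "prov:agent": "daops", "prov:plan": "workflow"},
    "_:id3": {"prov:activity": "average", "prov:agent": "daops", "prov:plan": "workflow"},
}


def activities_by_agent(relations, plan):
    """Group the activities that follow `plan` by the agent associated with them."""
    grouped = defaultdict(list)
    for relation in relations.values():
        if relation.get("prov:plan") == plan:
            grouped[relation["prov:agent"]].append(relation["prov:activity"])
    return dict(grouped)


print(activities_by_agent(was_associated_with, "workflow"))
```

Grouping by agent makes the division of work visible: ``rook`` orchestrates, while ``daops`` runs the individual operators.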

Example: Workflow with Subsetting Operators
-------------------------------------------

The rooki_ client for ``rook`` provides example notebooks_ that execute processes
and display the resulting provenance information.

You can run the ``orchestrate`` process to execute a workflow with subsetting operators
and show the provenance document:

.. code-block:: python
    :linenos:
    :emphasize-lines: 14-17

    from rooki import operators as ops
    wf = ops.Subset(
        ops.Subset(
            ops.Input(
                'tas', ['c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619']
            ),
            time="2016-01-01/2020-12-30",
        ),
        time="2017-01-01/2017-12-30",
    )
    resp = wf.orchestrate()
    # show URLs of output files
    resp.download_urls()
    # show URL to provenance document
    resp.provenance()
    # show URL to provenance image
    resp.provenance_image()

The response of the process includes a provenance document in PROV-JSON_ format:

.. literalinclude:: prov-example.json
    :language: JSON


This provenance document can also be displayed as an image:

.. image:: _images/prov-example.png
    :alt: Provenance Example
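
The derivation chain in this document can also be followed programmatically. The sketch below copies the two ``wasDerivedFrom`` relations from the example document and walks them from the final output file back to the source dataset. The helper ``lineage`` is hypothetical and assumes that each entity is generated by at most one relation, which holds for this example but is not guaranteed in general.

```python
# Relations copied from the "wasDerivedFrom" block of the example document.
was_derived_from = {
    "_:id4": {
        "prov:generatedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc",
        "prov:usedEntity": "c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619",
        "prov:activity": "subset_tas_1",
    },
    "_:id6": {
        "prov:generatedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20170101-20171229.nc",
        "prov:usedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc",
        "prov:activity": "subset_tas_2",
    },
}


def lineage(entity, relations):
    """Follow wasDerivedFrom links from `entity` back to its original source.

    Returns (activity, used_entity) pairs, most recent step first.
    Assumes each entity is generated by at most one relation.
    """
    generated_by = {r["prov:generatedEntity"]: r for r in relations.values()}
    steps = []
    seen = set()  # guards against cycles in malformed documents
    while entity in generated_by and entity not in seen:
        seen.add(entity)
        relation = generated_by[entity]
        steps.append((relation["prov:activity"], relation["prov:usedEntity"]))
        entity = relation["prov:usedEntity"]
    return steps


final_output = "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20170101-20171229.nc"
for activity, source in lineage(final_output, was_derived_from):
    print(f"{activity} <- {source}")
```

For the example document this yields two steps: ``subset_tas_2`` applied to the 2016-2020 file, and ``subset_tas_1`` applied to the ``c3s-cmip6`` dataset.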


Related Work in Other Projects
------------------------------

The ESMValTool_ project records provenance information for scientific workflows that are run as diagnostics.

The Climate4Impact_ project uses provenance to record the workflow of staging data and creating Jupyter notebooks.

.. _`provenance information`: https://www.dataone.org/uploads/DWS2015Provenance.pdf
.. _`Python PROV Library`: https://pypi.org/project/prov/
.. _`W3C PROV`: https://www.w3.org/TR/prov-dm/
.. _`W3C PROV Primer`: https://www.w3.org/TR/2013/NOTE-prov-primer-20130430/
.. _PROV-JSON: https://openprovenance.org/prov-json/
.. _DataONE: https://www.dataone.org/
.. _rooki: https://rooki.readthedocs.io/en/latest/
.. _notebooks: https://nbviewer.jupyter.org/github/roocs/rooki/tree/master/notebooks/demo/
.. _ESMValTool: https://docs.esmvaltool.org/en/latest/community/diagnostic.html?highlight=provenance#recording-provenance
.. _Climate4Impact: https://is.enes.org/files/C4ISWIRRLTraining.pdf
