* added prov overview to docs
* use images folder for docs
Showing 7 changed files with 251 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
dev_guide | ||
notebooks | ||
processes | ||
prov | ||
changes | ||
|
||
Indices and tables | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
{ | ||
"prefix": { | ||
"provone": "http://purl.dataone.org/provone/2015/01/15/ontology#", | ||
"dcterms": "http://purl.org/dc/terms/", | ||
"default": "http://purl.org/roocs/prov#" | ||
}, | ||
"agent": { | ||
"copernicus_CDS": { | ||
"prov:type": "prov:Organization", | ||
"dcterms:title": "Copernicus Climate Data Store" | ||
}, | ||
"rook": { | ||
"prov:type": "prov:SoftwareAgent", | ||
"dcterms:source": "https://github.com/roocs/rook/releases/tag/v0.2.0" | ||
}, | ||
"daops": { | ||
"prov:type": "prov:SoftwareAgent", | ||
"dcterms:source": "https://github.com/roocs/daops/releases/tag/v0.3.0" | ||
} | ||
}, | ||
"wasAttributedTo": { | ||
"_:id1": { | ||
"prov:entity": "rook", | ||
"prov:agent": "copernicus_CDS" | ||
} | ||
}, | ||
"entity": { | ||
"workflow": { | ||
"prov:type": "provone:Workflow" | ||
}, | ||
"c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619": {}, | ||
"tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc": [{}, {}], | ||
"tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20170101-20171229.nc": {} | ||
}, | ||
"activity": { | ||
"orchestrate": [{ | ||
"prov:startedAtTime": "2021-02-15T13:24:33" | ||
}, { | ||
"prov:endedAtTime": "2021-02-15T13:24:57" | ||
}], | ||
"subset_tas_1": { | ||
"time": "2016-01-01/2020-12-30", | ||
"apply_fixes": false | ||
}, | ||
"subset_tas_2": { | ||
"time": "2017-01-01/2017-12-30", | ||
"apply_fixes": false | ||
} | ||
}, | ||
"wasAssociatedWith": { | ||
"_:id2": { | ||
"prov:activity": "orchestrate", | ||
"prov:agent": "rook", | ||
"prov:plan": "workflow" | ||
}, | ||
"_:id3": { | ||
"prov:activity": "subset_tas_1", | ||
"prov:agent": "daops", | ||
"prov:plan": "workflow" | ||
}, | ||
"_:id5": { | ||
"prov:activity": "subset_tas_2", | ||
"prov:agent": "daops", | ||
"prov:plan": "workflow" | ||
} | ||
}, | ||
"wasDerivedFrom": { | ||
"_:id4": { | ||
"prov:generatedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc", | ||
"prov:usedEntity": "c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619", | ||
"prov:activity": "subset_tas_1" | ||
}, | ||
"_:id6": { | ||
"prov:generatedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20170101-20171229.nc", | ||
"prov:usedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc", | ||
"prov:activity": "subset_tas_2" | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,171 @@ | ||
.. _prov: | ||
|
||
Provenance | ||
========== | ||
|
||
.. contents:: | ||
    :local: | ||
    :depth: 1 | ||
|
||
Introduction | ||
------------ | ||
|
||
The *rook* processes record `provenance information`_ about the details of each process execution. | ||
This information includes: | ||
|
||
* the software used and its versions (``rook``, ``daops``, ...) | ||
* the operators applied, such as ``subset`` and ``average`` | ||
* the input data and parameters used (CMIP6 dataset, time, area) | ||
* the outputs generated (NetCDF files) | ||
* the execution time (start time and end time) | ||
|
||
This information is described according to the `W3C PROV`_ standard, using | ||
the `Python PROV Library`_. | ||
|
||
Overview of PROV | ||
---------------- | ||
|
||
The `W3C PROV Primer`_ document gives an overview of the `W3C PROV`_ standard. | ||
|
||
.. image:: _images/prov-overview.png | ||
|
||
A PROV document consists of *agents*, *activities* and *entities*. | ||
These can be connected via PROV *relations* like *wasDerivedFrom*. | ||
|
||
Entities | ||
++++++++ | ||
|
||
W3C PROV | ||
    In PROV, physical, digital, conceptual, or other kinds of thing are called *entities*. | ||
|
||
In *rook*, we use *entities* for: | ||
|
||
* the workflow description, | ||
* the input datasets, and | ||
* the resulting output NetCDF files. | ||
|
||
Activities | ||
++++++++++ | ||
|
||
W3C PROV | ||
    *Activities* are how entities come into existence | ||
    and how their attributes change to become new entities, | ||
    often making use of previously existing entities to achieve this. | ||
|
||
In *rook*, we use *activities* for: | ||
|
||
* operators like ``subset`` and ``average``, | ||
* processes like ``orchestrate``, which runs a workflow. | ||
|
||
Agents | ||
++++++ | ||
|
||
W3C PROV | ||
    An *agent* takes a role in an activity such that the agent can be assigned | ||
    some degree of responsibility for the activity taking place. | ||
    An agent can be a person, a piece of software or an organisation. | ||
|
||
In *rook*, we use *agents* for: | ||
|
||
* software like *rook* and *daops*, | ||
* organisations like *Copernicus Climate Data Store*. | ||
|
||
Namespaces | ||
++++++++++ | ||
|
||
W3C PROV | ||
    Using URIs and namespaces, a provenance record can draw from multiple sources on the Web. | ||
|
||
We use namespaces to reuse terms from existing vocabularies, | ||
such as ``prov:SoftwareAgent``. The vocabularies used include: | ||
|
||
* PROV (by W3C): https://www.w3.org/ns/prov/ | ||
* PROVONE (by DataONE_): https://purl.dataone.org/provone/2015/01/15/ontology | ||
* dcterms (Dublin Core Metadata): https://dublincore.org/specifications/dublin-core/dcmi-terms/ | ||
|
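As a rough illustration of how a prefixed name resolves to a full URI, here is a minimal sketch. The ``expand`` helper and the ``PREFIXES`` table are hypothetical names introduced for this example and are not part of *rook*. The ``prov`` URI is the standard W3C PROV namespace, and the other two are the prefix URIs that appear in the example provenance document in this commit.

```python
# Hypothetical prefix table for illustration only.
# "prov" is the standard W3C PROV namespace; the other URIs are copied
# from the "prefix" section of the example provenance document.
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "provone": "http://purl.dataone.org/provone/2015/01/15/ontology#",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(qualified_name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'prov:SoftwareAgent' to a full URI."""
    prefix, _, local = qualified_name.partition(":")
    if not local or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qualified_name!r}")
    return prefixes[prefix] + local


print(expand("prov:SoftwareAgent"))
print(expand("dcterms:title"))
```

Keeping the prefix table separate from the names is what lets one provenance record mix terms from several vocabularies without ambiguity.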
||
Subset Example | ||
++++++++++++++ | ||
|
||
.. image:: _images/prov-subset.png | ||
|
||
The *activity* ``subset`` is started by the software *agent* ``daops`` (a Python library), | ||
which is triggered by ``rook`` (a data-reduction service). | ||
|
||
The NetCDF file *entity* ``tas_day_...nc`` was derived from the ``c3s-cmip6`` dataset *entity* | ||
by the *activity* ``subset``. | ||
|
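The relations described above could be written down in a PROV-JSON-like structure roughly as follows. This is a trimmed sketch built with the standard library only, not the document that *rook* actually emits: the relation identifiers (``_:id1``, ``_:id2``) are arbitrary, the dataset and file names are borrowed from the example provenance document in this commit, and the link between ``rook`` and ``daops`` is left out because the source does not show how it is recorded.

```python
import json

# Names borrowed from the example provenance document; used for illustration.
DATASET = "c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619"
OUTPUT = "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc"

subset_prov = {
    # The software agent responsible for the activity.
    "agent": {"daops": {"prov:type": "prov:SoftwareAgent"}},
    # Input dataset and generated NetCDF file.
    "entity": {DATASET: {}, OUTPUT: {}},
    # The operator that was applied, with its parameters.
    "activity": {"subset": {"time": "2016-01-01/2020-12-30"}},
    # subset was carried out by daops.
    "wasAssociatedWith": {
        "_:id1": {"prov:activity": "subset", "prov:agent": "daops"}
    },
    # The output file was derived from the dataset via subset.
    "wasDerivedFrom": {
        "_:id2": {
            "prov:generatedEntity": OUTPUT,
            "prov:usedEntity": DATASET,
            "prov:activity": "subset",
        }
    },
}

print(json.dumps(subset_prov, indent=2))
```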
||
Workflow Example | ||
++++++++++++++++ | ||
|
||
.. image:: _images/prov-workflow.png | ||
|
||
W3C PROV Plans | ||
    Activities may follow pre-defined procedures, such as recipes, tutorials, instructions, or workflows. | ||
    PROV refers to these, in general, as *plans*. | ||
|
||
In W3C PROV, workflows are called *plans*. | ||
|
||
The *activity* ``orchestrate`` is started by the *agent* ``rook``. It uses | ||
a workflow document *entity* (the *plan*), which consists of a ``subset`` and an ``average`` | ||
*activity*. These activities are started by the software *agent* ``daops``. | ||
|
||
Example: Workflow with Subsetting Operators | ||
------------------------------------------- | ||
|
||
The rooki_ client for ``rook`` provides example notebooks_ that execute processes | ||
and display the provenance information. | ||
|
||
You can run the ``orchestrate`` process to execute a workflow with subsetting operators | ||
and show the provenance document: | ||
|
||
.. code-block:: python | ||
    :linenos: | ||
    :emphasize-lines: 14-17 | ||
|
||
    from rooki import operators as ops | ||
    wf = ops.Subset( | ||
        ops.Subset( | ||
            ops.Input( | ||
                'tas', ['c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619'] | ||
            ), | ||
            time="2016-01-01/2020-12-30", | ||
        ), | ||
        time="2017-01-01/2017-12-30", | ||
    ) | ||
    resp = wf.orchestrate() | ||
    # show URLs of output files | ||
    resp.download_urls() | ||
    # show URL to provenance document | ||
    resp.provenance() | ||
    # show URL to provenance image | ||
    resp.provenance_image() | ||
|
||
The response of the process includes a provenance document in PROV-JSON_ format: | ||
|
||
.. literalinclude:: prov-example.json | ||
    :language: JSON | ||
|
||
|
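To see how such a document can be read back, the sketch below follows the ``wasDerivedFrom`` relations from an output file to the dataset it ultimately came from. It uses only the standard library and a trimmed copy of the relations from the example document. The ``derivation_chain`` helper is a hypothetical name for this illustration and is not part of *rook* or *rooki*.

```python
# Trimmed copy of the "wasDerivedFrom" section of the example document.
prov_doc = {
    "wasDerivedFrom": {
        "_:id4": {
            "prov:generatedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc",
            "prov:usedEntity": "c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619",
            "prov:activity": "subset_tas_1",
        },
        "_:id6": {
            "prov:generatedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20170101-20171229.nc",
            "prov:usedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc",
            "prov:activity": "subset_tas_2",
        },
    }
}


def derivation_chain(doc, entity):
    """Follow wasDerivedFrom links from an entity back to its original source."""
    # Map each generated entity to the entity it was derived from.
    parents = {
        rel["prov:generatedEntity"]: rel["prov:usedEntity"]
        for rel in doc.get("wasDerivedFrom", {}).values()
    }
    chain = [entity]
    while chain[-1] in parents:
        source = parents[chain[-1]]
        if source in chain:  # stop if the relations form a cycle
            break
        chain.append(source)
    return chain


for name in derivation_chain(
    prov_doc, "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20170101-20171229.nc"
):
    print(name)
```

Starting from the 2017 file, the chain passes through the 2016 to 2020 file and ends at the ``c3s-cmip6`` dataset, which matches the two nested ``Subset`` operators in the workflow.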
||
This provenance document can also be displayed as an image: | ||
|
||
.. image:: _images/prov-example.png | ||
    :alt: Provenance Example | ||
|
||
|
||
Related Work in Other Projects | ||
------------------------------ | ||
|
||
The ESMValTool_ project records provenance information for scientific workflows that are run as diagnostics. | ||
|
||
The Climate4Impact_ project uses provenance to record the workflow of staging data and creating Jupyter notebooks. | ||
|
||
.. _`provenance information`: https://www.dataone.org/uploads/DWS2015Provenance.pdf | ||
.. _`Python PROV Library`: https://pypi.org/project/prov/ | ||
.. _`W3C PROV`: https://www.w3.org/TR/prov-dm/ | ||
.. _`W3C PROV Primer`: https://www.w3.org/TR/2013/NOTE-prov-primer-20130430/ | ||
.. _PROV-JSON: https://openprovenance.org/prov-json/ | ||
.. _DataONE: https://www.dataone.org/ | ||
.. _rooki: https://rooki.readthedocs.io/en/latest/ | ||
.. _notebooks: https://nbviewer.jupyter.org/github/roocs/rooki/tree/master/notebooks/demo/ | ||
.. _ESMValTool: https://docs.esmvaltool.org/en/latest/community/diagnostic.html?highlight=provenance#recording-provenance | ||
.. _Climate4Impact: https://is.enes.org/files/C4ISWIRRLTraining.pdf |