Runnable examples of YW provenance queries highlighted in poster for DataONE AHM 2016.
The purpose of this demo is to show how YesWorkflow (YW) queries can use the prospective provenance created by YW together with retrospective provenance to answer questions that cannot be answered by either kind of provenance alone.
The prospective provenance in this demo is created by YW which models conventional scripts and programs as scientific workflows. YW can provide a number of the benefits of using a scientific workflow management system without having to rewrite scripts and other scientific software. A YW user simply adds special YW comments to existing scripts. These comments declare how data is used and results produced, step by step, by the script. Then, YW interprets these comments and produces graphical output that reveals the stages of computation and the flow of data in the script.
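As a hypothetical illustration (the function, file names, and URIs below are made up and do not come from the demo repo), YW annotations are ordinary comments embedded in a script. A minimal Python sketch might look like this:

```python
# Hypothetical YW-annotated script. YW reads only the structured
# @-comments; the Python code itself is ignored by YW.

# @BEGIN compute_mean
# @IN sample_file @URI file:data/sample.csv
# @OUT summary @URI file:results/summary.txt
def compute_mean(values):
    # Ordinary Python; YW's view of this step comes entirely
    # from the @BEGIN/@IN/@OUT comments surrounding it.
    return sum(values) / len(values)

print(compute_mean([1.0, 2.0, 3.0]))
# @END compute_mean
```

Running YW on a script annotated this way yields the prospective provenance (the declared steps and data flow) without executing the script.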
There are various approaches to capturing retrospective provenance. Retrospective provenance observables, e.g., from DataONE RunManagers (file-level), ReproZip (OS-level), or noWorkflow (Python code-level), yield only isolated fragments of the overall data lineage and processing history. In this demo, two types of retrospective provenance observables are used: yw-recon, which can search the file system for files that match the URI templates declared for @IN and @OUT ports in the script, and the DataONE RunManager, which can record the list of input and output files for a script run.
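To make the yw-recon idea concrete, here is a small sketch of how a URI template with `{variable}` placeholders could be turned into a pattern that matches concrete file paths. This is an assumption about the general approach, not yw-recon's actual code, and the template and path are invented for illustration:

```python
import re

def template_to_regex(template):
    # Split on {name} placeholders; escape the literal parts and
    # turn each placeholder into a named capture group.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        pattern += re.escape(part) if i % 2 == 0 else "(?P<%s>[^/]+)" % part
    return re.compile("^" + pattern + "$")

# A file found on disk can then be matched back to the declared port.
rx = template_to_regex("run/raw/{cassette_id}/{sample_id}.img")
m = rx.match("run/raw/q55/sample_1.img")
print(m.group("cassette_id"), m.group("sample_id"))
```

Matching reconstructed file paths against declared templates is what lets the file-level observables be joined with the prospective model of the script.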
Repository Layout
|examples/||contains examples demonstrating the queries in the queries/ folder|
|queries/||contains the scripts for the nine demo queries|
|rules/||contains a set of Prolog rules for generating prospective YesWorkflow views|
The example subfolders also share a typical folder structure. Subfolders and files that all <my_example> folders have:
|script/||the example script or scripts that make up <my_example>|
|facts/||the YW facts for <my_example>, generated by running YW on the example script(s)|
|views/||materialized views for <my_example>|
|recon/||reconstructed provenance used for <my_example>|
|results/||all artifacts generated by make.sh|
|supplementary/||a folder with supplementary files and information about the example|
|clean.sh||removes generated demo artifacts for <my_example>|
|make.sh||creates demo artifacts for <my_example>|
Note: after running make.sh, you can use git status to see which demo artifacts have just been created.
```
simulate_data_collection/
├── clean.sh
├── facts
│   ├── yw_extract_facts.P
│   └── yw_model_facts.P
├── make.sh
├── results
├── script
│   ├── calibration.img
│   ├── cassette_q55_spreadsheet.csv
│   └── simulate_data_collection.py
└── views
    └── yw_views.P
```
Installing, Browsing, and Running the Demo
- The following free software is required in order to run this demo:
XSB: a Logic Programming and Deductive Database system for Unix and Windows. It is available at the [XSB homepage](http://xsb.sourceforge.net). The download and installation page for XSB is [here](http://xsb.sourceforge.net/downloads/downloads.html).
SQLite: a high-reliability, embedded, zero-configuration, public-domain SQL database engine. It is available at the [SQLite homepage](https://www.sqlite.org).
- The following open-source packages are used in our demo project.
- Clone the repo to your local machine using the command: `git clone https://github.com/idaks/dataone-ahm-2016-poster.git`
Running the Demo
Go to the examples/ folder. We have provided four examples here:
- One MATLAB example
- Three Python examples
Go to one of the above examples. First, run the cleaning script by calling ./clean.sh. Then run the demo example by calling ./make.sh.
Developing your own Demo
Copy your example folder under the examples/ folder (there are already four examples there). Then:
- Reorganize the directory layout of your example to match simulate_data_collection.
- Create a recon/ folder containing the reconstructed provenance for your example.
- Copy the two script files clean.sh and make.sh from simulate_data_collection or one of the other existing examples into your own example folder.
- Edit make.sh and customize the script name, output file name, and parameter data object name for your example.
Please read the Query README in the demo repo.
How to run the Demo using Docker
We have created a Docker image (`yesworkflow/provenance-demo`) to help readers explore the YesWorkflow provenance queries demonstrated here. In the `yesworkflow/provenance-demo` image, XSB, Graphviz, YesWorkflow, noWorkflow, and the DataONE demo queries are pre-installed. Using this image, users can boot up a Docker container and run the demo provenance queries within seconds, without having to install packages manually.
Here are instructions for each OS:
As part of this installation process, you'll need to use a shell prompt. A special version of the shell comes pre-configured for Docker commands; use it whenever you run a Docker command. Here is how to open it:
- Mac OS – launch the `Docker Quickstart Terminal` application from Launchpad.
- Linux – launch any bash shell prompt; `docker` will already be available.
- Windows – click the `Docker Quickstart Terminal` icon on your desktop.
Downloading Docker image
Users can use the following command to download the image from Docker Hub, a hosting service for Docker images similar to GitHub for code. The command syntax is `docker pull IMAGE_NAME`. The name of our current provenance query image is `yesworkflow/provenance-demo`, so type the following into a shell prompt:

```
docker pull yesworkflow/provenance-demo
```

This will download the image from Docker Hub.
Running a container from a Docker image
Once the image is downloaded, users can run it with the `docker run` command. Executing `docker run` creates a Docker container that is isolated from the user's local computer. Here are some configuration options for `docker run`:
- `-i`: interactive session
- `-v H:C`: mount the host path `H` on your computer at the path `C` inside the Docker container
The full command to run the provenance query looks like:

```
docker run -it -v $HOME:$HOME yesworkflow/provenance-demo
```
Then, users can go to ... to check the query results.