Runnable examples of YW provenance queries highlighted in poster for DataONE AHM 2016.

dataone-ahm-2016-poster

Introduction

This demo shows how YesWorkflow (YW) queries can combine the prospective provenance created by YW with retrospective provenance to answer questions that cannot be answered from prospective or retrospective provenance alone.

The prospective provenance in this demo is created by YW, which models conventional scripts and programs as scientific workflows. YW provides a number of the benefits of a scientific workflow management system without requiring users to rewrite scripts and other scientific software. A YW user simply adds special YW comments to existing scripts. These comments declare, step by step, how the script uses data and produces results. YW then interprets these comments and produces graphical output that reveals the stages of computation and the flow of data in the script.
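As a minimal sketch of what such annotations can look like, the Python script below marks up a two-step computation with YW comments. The step and data names (`scale_workflow`, `drop_missing`, `raw_readings`, and so on) are hypothetical and are not taken from the demo scripts; only the `@begin`, `@end`, `@in`, `@param`, `@out`, and `@desc` keywords belong to YW. Because the annotations are ordinary comments, the script runs the same way with or without them.

```python
# @begin scale_workflow @desc Scale raw readings by a calibration factor (hypothetical example).
# @in raw_readings
# @param factor
# @out scaled_readings


def scale_workflow(raw_readings, factor):
    # @begin drop_missing @desc Discard readings that were not recorded.
    # @in raw_readings
    # @out valid_readings
    valid_readings = [r for r in raw_readings if r is not None]
    # @end drop_missing

    # @begin apply_factor @desc Multiply each valid reading by the factor.
    # @in valid_readings
    # @param factor
    # @out scaled_readings
    scaled_readings = [r * factor for r in valid_readings]
    # @end apply_factor

    return scaled_readings


# @end scale_workflow

if __name__ == "__main__":
    print(scale_workflow([1.0, None, 2.5], factor=2))
```

Each `@begin`/`@end` pair delimits one step, and a data name that is an `@out` of one step and an `@in` of the next (here `valid_readings`) is what connects the two steps in the resulting workflow graph.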

There are various approaches to capturing retrospective provenance. Retrospective provenance observables, e.g., from DataONE RunManagers (file-level), ReproZip (OS-level), or noWorkflow (Python code-level), yield only isolated fragments of the overall data lineage and processing history. This demo uses two types of retrospective provenance observables: yw-recon and DataONE RunManager. yw-recon searches the file system for files that match the URI templates declared for @IN and @OUT ports in the script. DataONE RunManager, on the other hand, records a list of input and output files for a script run.
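The idea of matching files against a URI template can be sketched in a few lines of Python. This is an illustration only, not yw-recon's actual implementation: the helper names `template_to_regex` and `match_paths`, the example template, and the example paths are all hypothetical. The sketch assumes that each `{variable}` appears once in the template, is a valid Python identifier, and never spans a `/`.

```python
import re


def template_to_regex(template):
    """Compile a URI template such as 'raw/{sample_id}.csv' into a regex.

    Each {variable} becomes a named group that matches text within one
    path segment. Assumes every variable name appears only once.
    """
    # re.split with a capturing group alternates literal text and variable names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)       # literal text between variables
        else:
            pattern += f"(?P<{part}>[^/]+)"  # template variable
    return re.compile(pattern)


def match_paths(template, paths):
    """Return (path, variable bindings) for every path that matches the template."""
    regex = template_to_regex(template)
    matches = []
    for path in paths:
        found = regex.fullmatch(path)
        if found:
            matches.append((path, found.groupdict()))
    return matches


# Hypothetical template and candidate paths, for illustration only.
template = "run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw"
paths = [
    "run/raw/q55/DRT240/e10000/image_001.raw",
    "run/raw/q55/DRT240/notes.txt",
]
for path, bindings in match_paths(template, paths):
    print(path, bindings)
```

In this sketch the candidate paths are given as a list; on a real file system they could come from `os.walk` or `pathlib.Path.rglob`. The variable bindings recovered from each matching path are what tie an individual file back to a port declared in the script.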

Repository Layout

Directory Description
examples/ examples demonstrating the queries in the queries/ folder
queries/ scripts for the nine demo queries
rules/ a set of Prolog rules: prospective YesWorkflow rules and views (yw_rules.P and yw_views.P), retrospective reconstruction rules (recon_rules.P), graph rendering rules (gv_rules.P), and graph population rules (yw_graph_rules.P)

The example subfolders share a common folder structure:

dataone-ahm-2016-poster/examples/<my_example>/

Subfolders and scripts in a <my_example> folder:

Directory Description
script/ the example script or scripts that make up <my_example>
facts/ the YW facts for <my_example>, generated by running YW on the example script(s)
views/ materialized views for <my_example>
recon/ reconstructed provenance used for <my_example>
results/ all artifacts generated by make.sh
supplementary/ a folder with supplementary files and information about the example
clean.sh removes generated demo artifacts for <my_example>
make.sh creates demo artifacts for <my_example>

Note: after running clean.sh and make.sh, you can use git status to see what demo artifacts have just been created.

simulate_data_collection/
├── clean.sh
├── facts
│   ├── yw_extract_facts.P
│   └── yw_model_facts.P
├── make.sh
├── results
├── script
│   ├── calibration.img
│   ├── cassette_q55_spreadsheet.csv
│   └── simulate_data_collection.py
└── views
    └── yw_views.P

Installing, Browsing, and Running the Demo

Installing

  1. The following free software is required to run this demo.
  2. The following open-source packages are used in our demo project.
  3. Clone the dataone-ahm-2016-poster git repo to your local machine using the command: git clone https://github.com/idaks/dataone-ahm-2016-poster.git

Running the Demo

  1. Go to the examples/ folder. We have provided four examples here:

    • One MATLAB example (C3C4/)
    • Three Python examples (LIGO/, Twitter/, and simulate_data_collection/)
  2. Go to one of the above examples. First, run the cleaning script by calling bash clean.sh or ./clean.sh.

  3. Run the demo example by calling bash make.sh or ./make.sh.
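The clean-then-make sequence above can also be driven from Python, which makes it easy to see which artifacts a run produced. The following is a minimal sketch: `list_files` and `run_example` are hypothetical helpers, not part of the demo, and the sketch assumes that `bash` is on the PATH and that `clean.sh` and `make.sh` sit at the top of the example folder.

```python
import subprocess
from pathlib import Path


def list_files(folder):
    """Return the set of files under folder, as paths relative to it."""
    return {p.relative_to(folder) for p in folder.rglob("*") if p.is_file()}


def run_example(example_dir):
    """Run clean.sh and then make.sh in example_dir; return the files make.sh created."""
    example_dir = Path(example_dir)
    # Remove artifacts from earlier runs so that only fresh files are reported.
    subprocess.run(["bash", "clean.sh"], cwd=example_dir, check=True)
    before = list_files(example_dir)
    subprocess.run(["bash", "make.sh"], cwd=example_dir, check=True)
    after = list_files(example_dir)
    return sorted(str(p) for p in after - before)
```

For example, `run_example("examples/simulate_data_collection")` would return the relative paths of the newly created files. Files that already existed and were only overwritten are not reported; `git status` remains the more complete check.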

Developing your own Demo

  1. Copy your example folder into the examples/ folder. There are already four examples there: C3C4, LIGO, Twitter, and simulate_data_collection.

  2. Reorganize the directory layout of your example to match that of C3C4, LIGO, and simulate_data_collection. Create a recon/ folder that contains your reconfacts.P.

  3. Copy the two script files clean.sh and make.sh from simulate_data_collection, or from another of the existing examples, into your own example folder.

  4. Open make.sh and customize the script name, output file name, and parameter data object name for your example.

  5. Run bash make.sh.
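Before running make.sh on a new example, it can help to check that the folder has the expected layout. The sketch below is a hypothetical helper, not part of the demo; the lists of expected folders and files are assumptions taken from the layout table in this README and may need adjusting, since not every existing example contains every folder.

```python
from pathlib import Path

# Assumed from the layout table in this README; adjust for your example.
EXPECTED_DIRS = ("script", "facts", "views", "recon", "results")
EXPECTED_FILES = ("clean.sh", "make.sh")


def missing_entries(example_dir):
    """Return the expected folders and files that example_dir does not contain."""
    example_dir = Path(example_dir)
    missing = [name + "/" for name in EXPECTED_DIRS
               if not (example_dir / name).is_dir()]
    missing += [name for name in EXPECTED_FILES
                if not (example_dir / name).is_file()]
    return missing
```

Calling `missing_entries("examples/<my_example>")` with your own folder name returns an empty list when everything is in place, and otherwise names the entries that still need to be created.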

Demo Queries

Please read the Query README in the demo repo.

How to run the Demo using Docker

We have created a Docker image (yesworkflow/provenance-demo) to help readers explore the provenance queries demonstrated with YesWorkflow. The image comes with XSB, Graphviz, YesWorkflow, noWorkflow, and the DataONE demo queries installed. Using this image, users can start a Docker container and run the demo provenance queries within seconds, without installing any packages manually.

Installing Docker

Here are instructions for each OS:

As part of the installation process, you’ll need to use a shell prompt. There’s a special version of the shell that comes pre-configured for Docker commands, and Docker commands should be run from that shell. Here is how to open it:

  • Mac OS – launch the Docker Quickstart Terminal application from Launchpad.
  • Linux – launch any bash shell prompt, and docker will already be available.
  • Windows – click the Docker Quickstart Terminal icon on your desktop.

Downloading Docker image

Users can download the image from Docker Hub, a registry for Docker images that plays a role similar to GitHub. The command syntax is docker pull IMAGE_NAME, and the name of our current provenance query image is yesworkflow/provenance-demo. Type the following command into a shell prompt:

docker pull yesworkflow/provenance-demo

This downloads the image from Docker Hub to your local machine.

Running a container from a Docker image

Once the image is downloaded, users can run it using the command docker run. Executing docker run creates a Docker container that is isolated from the user's local computer. Here are some configuration options for docker run:

  • -i: interactive session
  • -t: allocate a pseudo-TTY (terminal)
  • -v H:C: mount the host path on your computer H at the path C inside the Docker container.

The full command to run the provenance queries looks like this:

docker run -it -v $HOME:$HOME yesworkflow/provenance-demo

Then, users can go to ... to check the query results.