- Install dependencies
pip install -e .
- Initialize disdat
dsdt init
- Start Jupyter Notebooks
jupyter notebook
- Open example notebooks
- Run a simple example pipeline from the command line to create an output bundle named return_targets
cd disdat-examples
dsdt apply pipelines.return_targets.ReturnTargets
dsdt ls -v return_targets
NAME PROC_NAME OWNER DATE COMMITTED UUID TAGS
return_targets ReturnTargets____ca7a191361 kyocum 06-04-19 20:13:34 False bef67232-86b6-4847-a2db-bf55eadc674b
- Now dockerize the pipeline (assuming you remain in the repo's top-level directory and Docker is installed on your system).
dsdt dockerize .
- Now run the dockerized version of the pipeline.
dsdt run -f . pipelines.return_targets.ReturnTargets
dsdt ls -v return_targets
NAME PROC_NAME OWNER DATE COMMITTED UUID TAGS
return_targets ReturnTargets____ca7a191361 root 06-04-19 20:17:26 False 96abb085-bbdd-48b6-917d-d51d2c8ac744
return_targets ReturnTargets____ca7a191361 kyocum 06-04-19 20:13:34 False bef67232-86b6-4847-a2db-bf55eadc674b
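Both rows share the same PROC_NAME but have different owners and UUIDs, so the dockerized run added a second bundle next to the local one. If you want to script against this listing, the sketch below parses the text shown above and picks the most recent bundle. It is a minimal sketch under stated assumptions: `parse_ls` is a hypothetical helper, columns are assumed to be whitespace-separated exactly as displayed, and DATE is assumed to be month-day-year. The `dsdt ls` text layout is not a documented stable format, so prefer a programmatic interface if your Disdat version provides one.

```python
from datetime import datetime

# Output of `dsdt ls -v return_targets`, copied from the walkthrough above.
LS_OUTPUT = """\
NAME PROC_NAME OWNER DATE COMMITTED UUID TAGS
return_targets ReturnTargets____ca7a191361 root 06-04-19 20:17:26 False 96abb085-bbdd-48b6-917d-d51d2c8ac744
return_targets ReturnTargets____ca7a191361 kyocum 06-04-19 20:13:34 False bef67232-86b6-4847-a2db-bf55eadc674b
"""


def parse_ls(text):
    """Hypothetical helper: turn `dsdt ls -v` rows into dicts.

    Assumes whitespace-separated columns; DATE spans two tokens (day, time).
    """
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        name, proc_name, owner, day, clock, committed, uuid = fields[:7]
        rows.append({
            "name": name,
            "proc_name": proc_name,
            "owner": owner,
            # Assumption: DATE is month-day-year (06-04-19 -> 2019-06-04).
            "date": datetime.strptime(f"{day} {clock}", "%m-%d-%y %H:%M:%S"),
            "committed": committed == "True",
            "uuid": uuid,
            "tags": fields[7:],  # empty when the bundle has no tags
        })
    return rows


bundles = parse_ls(LS_OUTPUT)
latest = max(bundles, key=lambda b: b["date"])
print(latest["owner"], latest["uuid"])  # the dockerized run, owned by root
```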
Notice that the run command requires the directory containing setup.py (just like dsdt dockerize), and that we added -f to force the entire pipeline to re-run.
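To repeat the local run, dockerize, and container run cycle without retyping each step, you could drive the CLI from a short script. The sketch below is hypothetical: `run_workflow` and its `dry_run` flag are illustrative names, and the script only replays the exact dsdt commands used above. It assumes you run it from the disdat-examples directory with dsdt and Docker installed. With `dry_run=True` (the default) it only returns the argument lists it would execute.

```python
import shlex
import subprocess

# The CLI steps from this walkthrough, in order (copied verbatim from above).
WORKFLOW = [
    "dsdt apply pipelines.return_targets.ReturnTargets",
    "dsdt dockerize .",
    "dsdt run -f . pipelines.return_targets.ReturnTargets",
    "dsdt ls -v return_targets",
]


def run_workflow(steps, dry_run=True):
    """Hypothetical helper: run each command in order, stopping at the first failure.

    With dry_run=True nothing is executed; the parsed argv lists are returned.
    """
    executed = []
    for step in steps:
        argv = shlex.split(step)
        executed.append(argv)
        if not dry_run:
            # check=True raises CalledProcessError if a dsdt command fails.
            subprocess.run(argv, check=True)
    return executed


for argv in run_workflow(WORKFLOW):
    print(" ".join(argv))
```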
The pipelines directory also contains the mnist.py and nlp_spacy.py pipelines.
Unlike the notebook examples, we will run the MNIST and spaCy examples using the CLI.
First, create an example-context data context (the same one used in the examples above) to hold our data, and switch to it:
$ dsdt context example-context
$ dsdt switch example-context
We've adapted the TensorFlow Keras MNIST example here, breaking it down into three steps in pipelines/mnist.py, which you will see as three classes:
- GetTFDS: This downloads the MNIST dataset from TensorFlow Datasets (TFDS) and stores the files in a bundle named mnist.
- Train: This PipeTask depends on the GetTFDS task and trains a simple Keras neural network on its data. It stores the saved model in an output bundle called mnist-trained.
- Evaluate: This PipeTask depends on both upstream tasks. It restores the model and evaluates it, returning a loss and an accuracy in its output bundle mnist-evaluation.
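Because each task declares its upstream tasks, requesting only the last one is enough: the scheduler works backwards through the dependencies and runs them in an order that respects them. The plain-Python sketch below reproduces that ordering for the three tasks described above using the standard library's graphlib (Python 3.9+). It is an illustration of the dependency graph only. It is not Disdat or Luigi code, and `DEPENDS_ON` and `OUTPUT_BUNDLE` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Upstream dependencies of each task, as described in the list above.
DEPENDS_ON = {
    "GetTFDS": set(),
    "Train": {"GetTFDS"},
    "Evaluate": {"GetTFDS", "Train"},
}

# Output bundle each task writes.
OUTPUT_BUNDLE = {
    "GetTFDS": "mnist",
    "Train": "mnist-trained",
    "Evaluate": "mnist-evaluation",
}

# static_order() yields each task only after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
for task in order:
    print(f"{task} -> {OUTPUT_BUNDLE[task]}")
```

Here each step has exactly one ready task, so the order is always GetTFDS, Train, Evaluate.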
To run all three steps, tell the Disdat CLI to execute the last step:
$ dsdt apply pipelines.mnist.Evaluate
[ . . . lots of output . . . ]
===== Luigi Execution Summary =====
Scheduled 4 tasks of which:
* 4 ran successfully:
- 1 DriverTask(...)
- 1 Evaluate(...)
- 1 GetTFDS(...)
- 1 Train(...)
This progress looks :) because there were no failed tasks or missing dependencies
You've now produced three bundles. Use dsdt ls to list them, and dsdt cat to see what's inside each one: all of our output files and values.
$ dsdt ls m.*
mnist-evaluation
mnist-trained
mnist
$ dsdt cat mnist-evaluation
[0.08208457 0.97430003]
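The mnist-evaluation bundle holds two numbers. The sketch below turns the printed value into named quantities. It assumes the first value is the loss and the second the accuracy, which matches the order given for Evaluate above and the magnitudes shown. The raw string is copied from the output above.

```python
# Output of `dsdt cat mnist-evaluation`, copied from above.
raw = "[0.08208457 0.97430003]"

# Assumption: the bundle stores [loss, accuracy]. Strip brackets, split on whitespace.
loss, accuracy = (float(x) for x in raw.strip("[]").split())
print(f"loss = {loss:.4f}, accuracy = {accuracy:.2%}")
```

This reports an accuracy of about 97.43% for the trained model.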
The spaCy example illustrates how you might include additional packages or data inside your Disdat container. In this case we have created a MANIFEST.in file that tells setuptools to include the data in pipelines/en_core_web.
This trivial example simply shows how to use the pkg_resources module (which ships with setuptools, not the Python standard library) to get the spaCy en_core_web data. You can run this example via
$ dsdt apply pipelines.nlp_spacy.SimpleNLP
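The example uses pkg_resources, which recent setuptools releases mark as deprecated. The sketch below swaps in the standard library's importlib.resources (Python 3.9+) to locate data shipped inside an installed package. `package_data_path` is a hypothetical helper. It assumes the package is installed as a regular directory (as with pip install -e . from the first step) rather than from a zip archive. The demo uses the stdlib json package as a stand-in so it runs anywhere. In this repo you would call it as `package_data_path("pipelines", "en_core_web")`. Whether that directory can be passed straight to spacy.load depends on how the model files are laid out inside it.

```python
from importlib.resources import files
from pathlib import Path


def package_data_path(package: str, resource: str) -> Path:
    """Hypothetical helper: filesystem path of a file or directory inside an installed package.

    Assumes a regular (non-zipped) install, so the resource exists on disk.
    """
    return Path(str(files(package) / resource))


# Stand-in demo with a stdlib package; in this repo you might use
# package_data_path("pipelines", "en_core_web") instead.
print(package_data_path("json", "__init__.py"))
```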