
Commit 2002b49

Imporvements for Conda tutorials.
1 parent e6efb0f commit 2002b49


5 files changed

+231
-21
lines changed


docs/_writing_conda_overview.rst

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
.. note:: *Why Conda?*
2+
3+
Many different package managers could potentially be targeted here, but we focus on Conda_
4+
for a few key reasons.
5+
6+
* No compilation at install time - binaries with their dependencies and libraries
7+
* Support for all operating systems
8+
* Easy to manage multiple versions of the same recipe
9+
* HPC-ready: no root privileges needed
10+
* Easy-to-write YAML recipes
11+
* Vibrant communities
12+
13+
.. note:: **Conda Terminology**
14+
15+
.. figure:: http://galaxyproject.github.io/training-material/topics/dev/images/miniconda_vs_anaconda.png
16+
:alt: Diagram describing the relationship between Conda, Miniconda, and Anaconda.
17+
18+
Conda *recipes* build *packages* that are published to *channels*.
19+
20+
Planemo is set up to target a few channels by default, including ``iuc``, ``bioconda``,
21+
``conda-forge``, and ``defaults``. The whole dependency management scheme outlined here works a lot
22+
better if packages can be found in one of these "best practice" channels.

docs/_writing_dependencies_conda.rst

Lines changed: 2 additions & 0 deletions
@@ -41,6 +41,8 @@ will attempt to install Conda, check for referenced packages (such as
4141
perspective but other dependency resolution techniques are covered in
4242
the `Galaxy docs <https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__.
4343

44+
.. include:: _writing_conda_overview.rst
45+
4446
We can check if the requirements on a tool are available in best practice
4547
Conda channels using an extended form of the ``planemo lint`` command. Passing the
4648
``--conda_requirements`` flag will ensure all listed requirements are found.

docs/_writing_dependencies_conda_cwl.rst

Lines changed: 202 additions & 17 deletions
@@ -11,25 +11,25 @@ Specifying and Using `Software Requirements`_
1111

1212
.. note:: Why not just use containers?
1313

14-
Containers are great, use containers (be it Docker, Singularity, etc.) whenever possible to
15-
increase reproducibility and portability - but building ad hoc containers to support CWL
16-
tools has some limitations that this document describes a process for addressing.
14+
Containers are great; use containers (be it Docker_, Singularity_, etc.) whenever possible to
15+
increase reproducibility and portability of your tools and workflows. Building ad hoc containers
16+
to support CWL tools (e.g. custom ``Dockerfile`` definitions) has serious limitations; in the next
17+
tutorial on containers we will argue that using Biocontainers_ built or discovered
18+
from your tool's `Software Requirements`_ is a superior approach.
1719

18-
There are technical reasons to describe `Software Requirements`_ in addition or in lieu
19-
of just using ad hoc containers - it will allow your tool to be used in environments without
20-
container runtimes available and the containers built from Conda software requirements are very
21-
likely to be "best practice" (e.g. smaller than ad hoc containers). Perhaps the most important
22-
reasons are less technical however such as reducing the opaqueness of traditional Docker
23-
containers.
20+
Besides leading to better containers, there are other reasons to describe
21+
`Software Requirements`_ as well: doing so allows your tool to be used in environments without
22+
container runtimes available and provides valuable and actionable metadata about the computation
23+
described by the tool.
2424

25-
Read more about this in our preprint `Practical computational reproducibility in the life sciences
26-
<https://www.biorxiv.org/content/early/2017/10/10/200683>`__
25+
Read more about this whole dependency stack in our preprint `Practical computational reproducibility
26+
in the life sciences <https://www.biorxiv.org/content/early/2017/10/10/200683>`__
2727

28-
The Common Workflow Language specification loosely describes
28+
The `Common Workflow Language`_ specification loosely describes
2929
`Software Requirements`_ - a way to map CWL hints to packages, environment
3030
modules, or any other mechanism to describe dependencies for running a tool
3131
outside of a container. The large and active Galaxy tool development community
32-
has built a library and set of best practices for describing dependencies
32+
has built an open source library and set of best practices for describing dependencies
3333
for Galaxy that should work just as well for CWL. The library has been integrated
3434
with cwltool_ and Toil_ to enable CWL tool authors and users to leverage the
3535
power and flexibility of the Galaxy dependency management and best practices.
@@ -49,9 +49,12 @@ a ``SoftwareRequirement`` in the form the following the YAML fragment::
4949
version:
5050
- "1.2"
5151

52-
Planemo (and cwltool_ and Toil_) can interpret these ``SoftwareRequirement`` in varoius ways
53-
including as Conda packages and install Conda packages referenced this way (such as ``seqtk``),
54-
and install them as needed for tool testing.
52+
Planemo (and cwltool_ and Toil_) can interpret these ``SoftwareRequirement`` annotations in various ways
53+
including as Conda packages. When interpreting these as Conda packages,
54+
these runtimes can set up isolated, reproducible Conda environments for tool execution with the correct
55+
packages installed (e.g. ``seqtk`` in the above example).
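
As a rough illustration of that mapping, the sketch below turns one ``SoftwareRequirement`` package entry into a Conda environment name and a ``conda create`` command line. The naming pattern and arguments mirror the ``planemo conda_install`` log shown in this tutorial (``conda create -y --name __seqtk@1.2 seqtk=1.2``); the function name ``conda_create_command`` and the choice to use only the first listed version are assumptions made for this example, not the actual Planemo, cwltool, or Toil implementation.

```python
def conda_create_command(package, versions, conda_exe="conda"):
    """Build a `conda create` command for one SoftwareRequirement package.

    Hypothetical helper for illustration only: it uses the first listed
    version and the `__<package>@<version>` environment naming seen in
    the `planemo conda_install` output.
    """
    version = versions[0]
    env_name = "__{}@{}".format(package, version)
    return [
        conda_exe, "create", "-y",
        "--name", env_name,
        "{}={}".format(package, version),
    ]


# Package entry as it appears in the YAML fragment above.
requirement = {"package": "seqtk", "version": ["1.2"]}
command = conda_create_command(requirement["package"], requirement["version"])
print(" ".join(command))  # conda create -y --name __seqtk@1.2 seqtk=1.2
```

Because each package and version pair gets its own named environment, different tools can depend on different versions of the same package without conflicting.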
56+
57+
.. include:: _writing_conda_overview.rst
5558

5659
We can check if the requirements on a tool are available in best practice
5760
Conda channels using an extended form of the ``planemo lint`` command (``planemo lint`` was
@@ -81,7 +84,7 @@ cwltool_ and Toil_).
8184

8285
$ planemo conda_install seqtk_seq.cwl
8386
Install conda target CondaTarget[seqtk,version=1.2]
84-
/home/john/miniconda2/bin/conda create -y --name __seqtk@1.2 seqtk=1.2
87+
/home/john/miniconda3/bin/conda create -y --name __seqtk@1.2 seqtk=1.2
8588
Fetching package metadata ...............
8689
Solving package specifications: ..........
8790

@@ -178,6 +181,184 @@ demonstrating using this tool.
178181
Since ``seqtk`` isn't on the path and we did not use a container, we can see the SoftwareRequirement
179182
resolution was successful and it found the environment we previously installed with ``conda_install``.
180183

184+
This can be used outside of Planemo testing as well. The following invocation shows running a job
185+
with cwltool_ using an environment like the one created above:
186+
187+
::
188+
189+
$ cwltool --no-container --beta-conda-dependencies seqtk_seq.cwl seqtk_seq_job.yml
190+
/Users/john/workspace/planemo/.venv/bin/cwltool 1.0.20180508202931
191+
Resolved 'seqtk_seq.cwl' to 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl'
192+
No handlers could be found for logger "rdflib.term"
193+
[job seqtk_seq.cwl] /private/tmp/docker_tmpDQYeqC$ seqtk \
194+
seq \
195+
-a \
196+
/private/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/tmpQwBqPo/stg8cf2282a-d807-4f90-b94d-feeda004cacd/2.fastq > /private/tmp/docker_tmpDQYeqC/out
197+
PREFIX=/Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/cwltool_deps/_conda
198+
installing: python-3.6.3-h47c878a_7 ...
199+
Python 3.6.3 :: Anaconda, Inc.
200+
installing: ca-certificates-2017.08.26-ha1e5d58_0 ...
201+
installing: conda-env-2.6.0-h36134e3_0 ...
202+
installing: libcxxabi-4.0.1-hebd6815_0 ...
203+
installing: tk-8.6.7-h35a86e2_3 ...
204+
installing: xz-5.2.3-h0278029_2 ...
205+
installing: yaml-0.1.7-hc338f04_2 ...
206+
installing: zlib-1.2.11-hf3cbc9b_2 ...
207+
installing: libcxx-4.0.1-h579ed51_0 ...
208+
installing: openssl-1.0.2n-hdbc3d79_0 ...
209+
installing: libffi-3.2.1-h475c297_4 ...
210+
installing: ncurses-6.0-hd04f020_2 ...
211+
installing: libedit-3.1-hb4e282d_0 ...
212+
installing: readline-7.0-hc1231fa_4 ...
213+
installing: sqlite-3.20.1-h7e4c145_2 ...
214+
installing: asn1crypto-0.23.0-py36h782d450_0 ...
215+
installing: certifi-2017.11.5-py36ha569be9_0 ...
216+
installing: chardet-3.0.4-py36h96c241c_1 ...
217+
installing: idna-2.6-py36h8628d0a_1 ...
218+
installing: pycosat-0.6.3-py36hee92d8f_0 ...
219+
installing: pycparser-2.18-py36h724b2fc_1 ...
220+
installing: pysocks-1.6.7-py36hfa33cec_1 ...
221+
installing: python.app-2-py36h54569d5_7 ...
222+
installing: ruamel_yaml-0.11.14-py36h9d7ade0_2 ...
223+
installing: six-1.11.0-py36h0e22d5e_1 ...
224+
installing: cffi-1.11.2-py36hd3e6348_0 ...
225+
installing: setuptools-36.5.0-py36h2134326_0 ...
226+
installing: cryptography-2.1.4-py36h842514c_0 ...
227+
installing: wheel-0.30.0-py36h5eb2c71_1 ...
228+
installing: pip-9.0.1-py36h1555ced_4 ...
229+
installing: pyopenssl-17.5.0-py36h51e4350_0 ...
230+
installing: urllib3-1.22-py36h68b9469_0 ...
231+
installing: requests-2.18.4-py36h4516966_1 ...
232+
installing: conda-4.3.31-py36_0 ...
233+
installation finished.
234+
Fetching package metadata .................
235+
Solving package specifications: .
236+
237+
Package plan for installation in environment /Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/cwltool_deps/_conda:
238+
239+
The following packages will be UPDATED:
240+
241+
conda: 4.3.31-py36_0 --> 4.3.33-py36_0 conda-forge
242+
243+
conda-4.3.33-p 100% |#################################################################| Time: 0:00:00 1.13 MB/s
244+
245+
246+
Package plan for installation in environment /Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/cwltool_deps/_conda/envs/__seqtk@1.2:
247+
248+
The following NEW packages will be INSTALLED:
249+
250+
seqtk: 1.2-1 bioconda
251+
zlib: 1.2.11-0 conda-forge
252+
253+
254+
[job seqtk_seq.cwl] completed success
255+
{
256+
"output1": {
257+
"checksum": "sha1$322e001e5a99f19abdce9f02ad0f02a17b5066c2",
258+
"basename": "out",
259+
"location": "file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/out",
260+
"path": "/Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/out",
261+
"class": "File",
262+
"size": 150
263+
}
264+
}
265+
Final process status is success
266+
267+
This demonstrates that cwltool will install the packages needed on the first run. If we rerun cwltool, it will
268+
reuse that previous environment.
269+
270+
::
271+
272+
$ cwltool --no-container --beta-conda-dependencies seqtk_seq.cwl seqtk_seq_job.yml
273+
/Users/john/workspace/planemo/.venv/bin/cwltool 1.0.20180508202931
274+
Resolved 'seqtk_seq.cwl' to 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl'
275+
No handlers could be found for logger "rdflib.term"
276+
[job seqtk_seq.cwl] /private/tmp/docker_tmp4vvE_i$ seqtk \
277+
seq \
278+
-a \
279+
/private/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/tmpcvQ3Ph/stg2ef3a21c-9fb0-4099-88c2-36e24719901d/2.fastq > /private/tmp/docker_tmp4vvE_i/out
280+
[job seqtk_seq.cwl] completed success
281+
{
282+
"output1": {
283+
"checksum": "sha1$322e001e5a99f19abdce9f02ad0f02a17b5066c2",
284+
"basename": "out",
285+
"location": "file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/out",
286+
"path": "/Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/out",
287+
"class": "File",
288+
"size": 150
289+
}
290+
}
291+
Final process status is success
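
The difference between the two cwltool runs above (install on the first run, reuse on the second) can be pictured with a small sketch. It assumes that an environment counts as installed when its directory exists under ``<prefix>/envs/``, which matches the ``.../cwltool_deps/_conda/envs/__seqtk@1.2`` path in the first log. The helper names ``env_path`` and ``needs_install`` are hypothetical, and this is not how cwltool itself is implemented.

```python
import os
import tempfile


def env_path(conda_prefix, env_name):
    """Directory where a named Conda environment lives under a prefix."""
    return os.path.join(conda_prefix, "envs", env_name)


def needs_install(conda_prefix, env_name):
    """Assume an environment must be created only if its directory is missing."""
    return not os.path.isdir(env_path(conda_prefix, env_name))


# Simulate two runs against a throwaway prefix instead of a real Conda install.
with tempfile.TemporaryDirectory() as prefix:
    first = needs_install(prefix, "__seqtk@1.2")   # nothing installed yet
    os.makedirs(env_path(prefix, "__seqtk@1.2"))   # stand-in for `conda create`
    second = needs_install(prefix, "__seqtk@1.2")  # environment is now found

print(first, second)  # True False
```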
292+
293+
And the same thing is possible with Toil_.
294+
295+
::
296+
297+
$ cwltoil --no-container --beta-conda-dependencies seqtk_seq.cwl seqtk_seq_job.yml
298+
jlaptop17.local 2018-05-23 15:27:25,754 MainThread INFO toil.lib.bioio: Root logger is at level 'INFO', 'toil' logger at level 'INFO'.
299+
jlaptop17.local 2018-05-23 15:27:25,785 MainThread INFO toil.jobStores.abstractJobStore: The workflow ID is: '92328fb2-33b7-44cd-879f-41d8cbf94555'
300+
Resolved 'seqtk_seq.cwl' to 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl'
301+
jlaptop17.local 2018-05-23 15:27:25,787 MainThread INFO cwltool: Resolved 'seqtk_seq.cwl' to 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl'
302+
jlaptop17.local 2018-05-23 15:27:27,002 MainThread WARNING rdflib.term: http://schema.org/docs/!DOCTYPE html does not look like a valid URI, trying to serialize this will break.
303+
jlaptop17.local 2018-05-23 15:27:27,396 MainThread INFO rdflib.plugins.parsers.pyRdfa: Current options:
304+
preserve space : True
305+
output processor graph : True
306+
output default graph : True
307+
host language : RDFa Core
308+
accept embedded RDF : False
309+
check rdfa lite : False
310+
cache vocabulary graphs : False
311+
312+
jlaptop17.local 2018-05-23 15:27:29,797 MainThread INFO toil.common: Using the single machine batch system
313+
jlaptop17.local 2018-05-23 15:27:29,798 MainThread WARNING toil.batchSystems.singleMachine: Limiting maxCores to CPU count of system (8).
314+
jlaptop17.local 2018-05-23 15:27:29,798 MainThread WARNING toil.batchSystems.singleMachine: Limiting maxMemory to physically available memory (17179869184).
315+
jlaptop17.local 2018-05-23 15:27:29,808 MainThread INFO toil.common: Created the workflow directory at /var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/toil-92328fb2-33b7-44cd-879f-41d8cbf94555-132281828025877
316+
jlaptop17.local 2018-05-23 15:27:29,808 MainThread WARNING toil.batchSystems.singleMachine: Limiting maxDisk to physically available disk (202669449216).
317+
jlaptop17.local 2018-05-23 15:27:29,815 MainThread INFO toil.common: User script ModuleDescriptor(dirPath='/Users/john/workspace/planemo/.venv/lib/python2.7/site-packages', name='toil.cwl.cwltoil', fromVirtualEnv=True) belongs to Toil. No need to auto-deploy it.
318+
jlaptop17.local 2018-05-23 15:27:29,816 MainThread INFO toil.common: No user script to auto-deploy.
319+
jlaptop17.local 2018-05-23 15:27:29,816 MainThread INFO toil.common: Written the environment for the jobs to the environment file
320+
jlaptop17.local 2018-05-23 15:27:29,816 MainThread INFO toil.common: Caching all jobs in job store
321+
jlaptop17.local 2018-05-23 15:27:29,816 MainThread INFO toil.common: 0 jobs downloaded.
322+
jlaptop17.local 2018-05-23 15:27:29,911 MainThread INFO toil: Running Toil version 3.15.0-0e3a87e738f5e0e7cff64bfdad337d592bd92704.
323+
jlaptop17.local 2018-05-23 15:27:29,911 MainThread INFO toil.realtimeLogger: Real-time logging disabled
324+
jlaptop17.local 2018-05-23 15:27:29,937 MainThread INFO toil.toilState: (Re)building internal scheduler state
325+
2018-05-23 15:27:29,937 - toil.toilState - INFO - (Re)building internal scheduler state
326+
jlaptop17.local 2018-05-23 15:27:29,938 MainThread INFO toil.leader: Found 1 jobs to start and 0 jobs with successors to run
327+
2018-05-23 15:27:29,938 - toil.leader - INFO - Found 1 jobs to start and 0 jobs with successors to run
328+
jlaptop17.local 2018-05-23 15:27:29,938 MainThread INFO toil.leader: Checked batch system has no running jobs and no updated jobs
329+
2018-05-23 15:27:29,938 - toil.leader - INFO - Checked batch system has no running jobs and no updated jobs
330+
jlaptop17.local 2018-05-23 15:27:29,938 MainThread INFO toil.leader: Starting the main loop
331+
2018-05-23 15:27:29,938 - toil.leader - INFO - Starting the main loop
332+
jlaptop17.local 2018-05-23 15:27:29,939 MainThread INFO toil.leader: Issued job 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl' seqtk seq e/V/jobsxUpTU with job batch system ID: 0 and cores: 1, disk: 3.0 G, and memory: 2.0 G
333+
2018-05-23 15:27:29,939 - toil.leader - INFO - Issued job 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl' seqtk seq e/V/jobsxUpTU with job batch system ID: 0 and cores: 1, disk: 3.0 G, and memory: 2.0 G
334+
jlaptop17.local 2018-05-23 15:27:31,409 MainThread INFO toil.leader: Job ended successfully: 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl' seqtk seq e/V/jobsxUpTU
335+
2018-05-23 15:27:31,409 - toil.leader - INFO - Job ended successfully: 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl' seqtk seq e/V/jobsxUpTU
336+
jlaptop17.local 2018-05-23 15:27:31,411 MainThread INFO toil.leader: Finished the main loop: no jobs left to run
337+
2018-05-23 15:27:31,411 - toil.leader - INFO - Finished the main loop: no jobs left to run
338+
jlaptop17.local 2018-05-23 15:27:31,411 MainThread INFO toil.serviceManager: Waiting for service manager thread to finish ...
339+
2018-05-23 15:27:31,411 - toil.serviceManager - INFO - Waiting for service manager thread to finish ...
340+
jlaptop17.local 2018-05-23 15:27:31,946 MainThread INFO toil.serviceManager: ... finished shutting down the service manager. Took 0.535056114197 seconds
341+
2018-05-23 15:27:31,946 - toil.serviceManager - INFO - ... finished shutting down the service manager. Took 0.535056114197 seconds
342+
jlaptop17.local 2018-05-23 15:27:31,947 MainThread INFO toil.statsAndLogging: Waiting for stats and logging collator thread to finish ...
343+
2018-05-23 15:27:31,947 - toil.statsAndLogging - INFO - Waiting for stats and logging collator thread to finish ...
344+
jlaptop17.local 2018-05-23 15:27:31,960 MainThread INFO toil.statsAndLogging: ... finished collating stats and logs. Took 0.0131621360779 seconds
345+
2018-05-23 15:27:31,960 - toil.statsAndLogging - INFO - ... finished collating stats and logs. Took 0.0131621360779 seconds
346+
jlaptop17.local 2018-05-23 15:27:31,961 MainThread INFO toil.leader: Finished toil run successfully
347+
2018-05-23 15:27:31,961 - toil.leader - INFO - Finished toil run successfully
348+
{
349+
"output1": {
350+
"checksum": "sha1$322e001e5a99f19abdce9f02ad0f02a17b5066c2",
351+
"basename": "out",
352+
"nameext": "",
353+
"nameroot": "out",
354+
"http://commonwl.org/cwltool#generation": 0,
355+
"location": "file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/out",
356+
"class": "File",
357+
"size": 150
358+
}
359+
jlaptop17.local 2018-05-23 15:27:31,972 MainThread INFO toil.common: Successfully deleted the job store: <toil.jobStores.fileJobStore.FileJobStore object at 0x10554d490>
360+
}2018-05-23 15:27:31,972 - toil.common - INFO - Successfully deleted the job store: <toil.jobStores.fileJobStore.FileJobStore object at 0x10554d490>
361+
181362
.. include:: _writing_conda_search.rst
182363

183364
----------------------------------------------------------------
@@ -236,6 +417,10 @@ not work properly without modification.
236417
.. include:: _writing_conda_recipe_complete.rst
237418

238419
.. _Software Requirements: https://www.commonwl.org/v1.0/CommandLineTool.html#SoftwareRequirement
420+
.. _BioContainers: http://biocontainers.pro/
421+
.. _Docker: https://www.docker.com/
422+
.. _Singularity: https://singularity.lbl.gov/
423+
.. _Common Workflow Language: https://www.commonwl.org/
239424
.. _seqtk: https://github.com/lh3/seqtk
240425
.. _fleeqtk: https://github.com/jmchilton/fleeqtk
241426
.. _Bioconda: https://github.com/bioconda/bioconda-recipes
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
2+
input1:
3+
class: File
4+
path: test-data/2.fastq
Lines changed: 1 addition & 4 deletions
@@ -1,9 +1,6 @@
11

22
- doc: test generated from example command
3-
job:
4-
input1:
5-
class: File
6-
path: test-data/2.fastq
3+
job: seqtk_seq_job.json
74
outputs:
85
output1:
96
path: test-data/2.fasta
