
Commit b814ca8: backlog
mr-c committed Apr 12, 2016 (1 parent: 6801e51)
Showing 4 changed files with 173 additions and 1 deletion.
about.rst: 1 addition, 1 deletion
@@ -9,6 +9,6 @@ CWL Community Engineer.

.. raw:: html

-   <a href="https://impactstory.org/MichaelRCrusoe"><img src="https://impactstory.org/logo/small" width="200" /></a>
+   <a href="https://impactstory.org/u/0000-0002-2961-9670"><img src="https://impactstory.org/logo/small" width="200" /></a>


community-engineer-update.rst: new file, 74 additions
.. post:: 2016-04-08
   :tags: update
   :author: me
   :location: Brussels, Belgium

***************************************
CWL Community Engineer Six Month Update
***************************************

Hello everyone. It has been a while since my last update. Here is what has been
happening in the CWL world since I began working full-time on the project.

Draft 3 was released, and the community has committed to releasing version 1.0
of the standards before ISMB. I will be the release driver.

I met with a potential fiscal sponsor, the Software Freedom Conservancy, and
submitted an application for the CWL project to become a member project of that
501(c)(3) organization.

I've been taking full advantage of my base in Europe (first Romania and now
Belgium) to raise the profile and improve the perception of CWL in the European
life science computing community. I presented at two leading centers
(SciLifeLab in Stockholm and the Flanders ExaScience Lab in Belgium) and
participated in four ELIXIR-sponsored hackathons: Amsterdam, Netherlands;
Freiburg, Germany; Copenhagen, Denmark; and Trondheim, Norway.

A CWL subgroup of academic cluster users is working out what changes are needed
to support non-containerized tool execution. Non-cloud support for CWL will be
critical for wider adoption. I have enjoyed coordinating this group, and I was
able to host a visiting Australian graduate student (Kevin Murray) who has
gotten Docker containers to work on older platforms without needing to upgrade
them or use ``root`` privileges.

CWL is spreading into the wider F/OSS tech world thanks to a partnership with
the Debian-Med community, the leading community packagers of bioinformatics
tools and workflows. In support of this partnership I applied for and received
official status within that community (“Debian Maintainer”), and my application
for full status (“Debian Developer”) is in progress.

Roman Valls Guimera and I have started a subproject to automatically produce
CWL descriptions for Python tools that use Python’s standard argument parser
(``argparse``). This is now a proposed Google Summer of Code project; hopefully
GSoC will fund a student to work on it over the summer.
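
The core of the argparse-to-CWL idea can be sketched in a few lines of Python.
This is a minimal sketch, not the subproject's actual code: it reads the
private ``parser._actions`` list, maps a few argparse types to CWL types (an
assumed mapping), marks non-required arguments optional with a ``"null"``
union, and prints the description as JSON, which CWL loaders accept because
JSON is valid YAML. Subparsers, ``nargs``, ``choices`` and file-typed arguments
are ignored, and the ``count-reads`` tool is hypothetical.

.. code-block:: python

   import argparse
   import json

   # Assumed mapping from argparse ``type=`` callables to CWL type names.
   # Arguments declared without ``type=`` are plain strings in argparse.
   TYPE_MAP = {None: "string", str: "string", int: "int", float: "float"}


   def parser_to_cwl(parser, base_command):
       """Return a minimal CommandLineTool description as a dict (sketch only)."""
       inputs = []
       position = 0
       # ``_actions`` is a private argparse attribute listing every argument.
       for action in parser._actions:
           if isinstance(action, argparse._HelpAction):
               continue  # --help has no CWL counterpart
           if isinstance(action, argparse._StoreTrueAction):
               cwl_type = "boolean"
           else:
               cwl_type = TYPE_MAP.get(action.type, "string")
           if action.option_strings:
               # Use the longest spelling, e.g. "--min-length" rather than "-m".
               binding = {"prefix": max(action.option_strings, key=len)}
           else:
               position += 1  # positional arguments keep their order
               binding = {"position": position}
           if not action.required:
               cwl_type = ["null", cwl_type]  # a "null" union marks it optional
           inputs.append({"id": action.dest, "type": cwl_type,
                          "inputBinding": binding})
       return {"class": "CommandLineTool", "baseCommand": base_command,
               "inputs": inputs, "outputs": []}


   if __name__ == "__main__":
       demo = argparse.ArgumentParser(prog="count-reads")  # hypothetical tool
       demo.add_argument("reads")
       demo.add_argument("-m", "--min-length", type=int, default=0)
       demo.add_argument("--verbose", action="store_true")
       print(json.dumps(parser_to_cwl(demo, "count-reads"), indent=2))

A generated description like this would still need hand-written ``outputs``,
since argparse says nothing about the files a tool produces.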

Speaking of GSoC, I agreed to co-mentor (with Stian Soiland-Reyes) another
student's project to add CWL support to the Apache Taverna project.

New implementations and SDKs:

Paul Gross's Java re-implementation has already found a couple of issues with
the specifications, and fixes have been incorporated.

I sketched out a plan for using Peter Amstutz’s “schema salad” tool to
auto-generate code that represents the CWL object model in as many languages
as we need. This is a critical first step toward autogenerated SDKs in
multiple languages.
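
To make the code-generation plan concrete, here is a toy generator that turns
record definitions into Python dataclass source. It is a minimal sketch under
stated assumptions: the input is a hypothetical, heavily simplified schema only
loosely modelled on schema salad "record" definitions, and nothing here uses
schema salad's own code or API.

.. code-block:: python

   # A hypothetical, heavily simplified schema, loosely modelled on schema
   # salad "record" definitions; real schemas also carry documentation,
   # inheritance, enums, JSON-LD context and more.
   SCHEMA = [
       {"name": "InputParameter", "type": "record",
        "fields": [{"name": "id", "type": "string"},
                   {"name": "label", "type": ["null", "string"]}]},
   ]

   PY_TYPES = {"string": "str", "int": "int", "float": "float",
               "boolean": "bool"}


   def py_type(schema_type):
       """Translate a schema type (a name or a union list) to an annotation."""
       if isinstance(schema_type, list):
           names = [PY_TYPES[t] for t in schema_type if t != "null"]
           inner = names[0] if len(names) == 1 else "Union[%s]" % ", ".join(names)
           return "Optional[%s]" % inner if "null" in schema_type else inner
       return PY_TYPES[schema_type]


   def generate_python(schema):
       """Emit Python source with one dataclass per record in the schema."""
       lines = ["from dataclasses import dataclass",
                "from typing import Optional, Union"]
       for record in schema:
           if record["type"] != "record":
               continue  # enums, unions etc. are out of scope for this sketch
           lines += ["", "", "@dataclass", "class %s:" % record["name"]]
           # Dataclass fields without defaults must precede those with
           # defaults, so required fields go first (sorted() is stable).
           fields = sorted(record["fields"],
                           key=lambda f: py_type(f["type"]).startswith("Optional["))
           for field in fields:
               annotation = py_type(field["type"])
               default = " = None" if annotation.startswith("Optional[") else ""
               lines.append("    %s: %s%s" % (field["name"], annotation, default))
           if not fields:
               lines.append("    pass")
       return "\n".join(lines) + "\n"


   if __name__ == "__main__":
       print(generate_python(SCHEMA))

The same walk over the schema could emit Java, Go or any other language by
swapping the type table and the line templates, which is what makes a single
machine-readable schema attractive for multi-language SDKs.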

Reference implementation improvements:

I finished reviewing Peter Amstutz’s ``cwltool`` and “schema salad”. I am
maturing his work by adding Python 3 compatibility, type checking, code
cleanups, and documentation.

Other CWL impacts:

I sent a letter of support for Dr. Bernhard Renard of the Robert Koch Institute
(Germany) and his “Collaborative Benchmarking of Bioinformatics Tools and
Workflows (CoBe)” project, which uses CWL as a core technology.

GA4GH container registry API project: CWL is a key component and is seen as a
leader on the metadata issue; many CWL community members participate in the
project's weekly call.

Logo acquired, Twitter account created, domain name purchased.

Continuous testing of CWL implementations: Peter Amstutz and I have set up
https://ci.commonwl.org to test the conformance of CWL implementations on a
continuous basis.
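
Conceptually, a conformance test runs an implementation on a test job and
compares the output object it reports with an expected one. A minimal sketch of
that comparison follows; it is an assumption about the approach, not the code
behind ci.commonwl.org. File objects are matched on ``checksum`` and ``size``
only, since file paths legitimately differ between implementations and runs.

.. code-block:: python

   def outputs_match(expected, actual):
       """Recursively compare an expected CWL output object with an actual one.

       Assumption: File objects match on "checksum" and "size" only (when the
       expected object lists them); paths and other extra keys are ignored.
       """
       if isinstance(expected, dict) and expected.get("class") == "File":
           return (isinstance(actual, dict)
                   and actual.get("class") == "File"
                   and all(actual.get(key) == expected[key]
                           for key in ("checksum", "size") if key in expected))
       if isinstance(expected, dict):
           return (isinstance(actual, dict)
                   and expected.keys() == actual.keys()
                   and all(outputs_match(expected[k], actual[k]) for k in expected))
       if isinstance(expected, list):
           return (isinstance(actual, list)
                   and len(expected) == len(actual)
                   and all(outputs_match(e, a) for e, a in zip(expected, actual)))
       return expected == actual

Outside File objects the check is strict: any missing or extra key in a plain
record counts as a failure.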

cwl-paris-hackathon.rst: new file, 16 additions
.. post:: 2016-03-21
   :tags: events
   :author: me
   :location: Paris, France

******************************************************
Technical Hackathon: Tools, Workflows and Workbenches
******************************************************

A hackathon will take place on 18-20 May 2016 at the Institut Pasteur in Paris.
It will bring together developers from the ELIXIR Tools & Data Services
Registry, Galaxy, Taverna, Arvados, CWL, ReGaTE and the EDAM ontology, along
with Galaxy instance providers from ELIXIR and beyond, to promote collaboration
and technical development. `Further details to follow
<https://www.elixir-europe.org/events/technical-hackathon-tools-workflows-and-workbenches>`__.

tacc-201511.rst: new file, 82 additions
.. post::
   :tags: weekly-update
   :author: me
   :location: Austin

********************************************
Summary of TACC Life Science Computing visit
********************************************

We met with John Fonner, Joe Stubbs, Matt Vaughn, Rion Dooley and Victor
Eijkhout.

They were very positive about CWL and have agreed to become a CWL partner;
effort will come from TACC and possibly also from iPlant. They are also working
on an app directory.

Existing capabilities:

From the CWL perspective, `The Agave Platform <http://agaveapi.co/>`__ is a
multi-tenant, multi-execution-environment remote job runner. The primary use
case is submitting jobs via a command-line tool; tool options are specified in
a JSON-formatted plain-text file.

Their workflow manager is called ``endofday``. It started as a Nextflow-based
Docker orchestration program and is now a pydoit-based orchestrator for Docker
and Agave applications. It is designed for long-running analysis steps, not
(web) services.

Areas of concern:

How and where should a generic CWL tool description be linked to a particular
tenant? This will also be a concern for other platforms that don't use Docker,
such as Galaxy.

Their asks:

1. Site-specific configuration.
2. Python and Java SDKs/libraries, autogenerated from the spec, for parsing
   CWL files.
3. Documentation on how to run the test suite by hand.
4. A best practices document: imports at the top; IDs defined explicitly for
   each tool.
5. Reduced syntax verbosity via implicit namespacing. [Does Draft 3 satisfy
   that?]
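
Part of the requested best practices (explicit IDs) is mechanically checkable.
A hypothetical linter sketch, assuming the CWL document has already been parsed
into plain dicts and lists (for example with a YAML parser) and that workflow
steps are a list, as in Draft 3; the "imports at the top" rule is not checked.

.. code-block:: python

   def missing_ids(document):
       """List the places in a parsed CWL document that lack an explicit "id"."""
       problems = []

       def visit(process, path):
           if "id" not in process:
               problems.append(path)
           for index, step in enumerate(process.get("steps", [])):
               step_path = "%s/steps[%d]" % (path, index)
               if "id" not in step:
                   problems.append(step_path)
               if isinstance(step.get("run"), dict):  # inline tool description
                   visit(step["run"], step_path + "/run")

       visit(document, "document")
       return problems

Reporting paths rather than raising on the first problem lets a best practices
check list every missing ID in one pass.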


Follow-up: John Fonner and others will present Agave and their workflow system
to the CWL group during the December 1st video chat. They will meet privately
after that to organize; MRC to follow up on Dec

A lot of the discussion was about the collaboration model between the larger
CWL community and specific implementations: how will tool and workflow
descriptions be shared?

For implementations not using Docker, one collaboration model is to fork each
tool description when that tool is installed, adding implementation-specific
fields that indicate which tenant the tool is installed to, along with any other
required details. If a tool is installed multiple times, the tool ID would be
changed so that workflows can reference each installation unambiguously. In
this model, workflows from outside sources would also be customized to refer to
these platform-specific tools.

Concerns were raised about the portability of such workflows outside the
implementations that produced them.
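
The fork-per-installation model, and the reason it worries people, can be
sketched in Python. Everything here is hypothetical: the ``site:deployment``
extension field, the ID suffix scheme, the tool ID and the tenant name are
illustrative only, and workflows are assumed to reference tools by ID in a
string ``run`` field.

.. code-block:: python

   import copy


   def fork_for_install(tool, tenant, install_label):
       """Return a site-specific copy of a community tool description."""
       forked = copy.deepcopy(tool)  # the community copy is never modified
       # A per-installation ID lets workflows refer to each installed copy.
       forked["id"] = "%s-%s" % (tool["id"], install_label)
       # Hypothetical namespaced extension field holding deployment details.
       forked["site:deployment"] = {"tenant": tenant}
       return forked


   def retarget_workflow(workflow, id_map):
       """Copy a workflow, pointing string ``run`` references at forked IDs."""
       retargeted = copy.deepcopy(workflow)
       for step in retargeted.get("steps", []):
           run = step.get("run")
           if isinstance(run, str) and run in id_map:
               step["run"] = id_map[run]
       return retargeted


   if __name__ == "__main__":
       community_tool = {"class": "CommandLineTool", "id": "bwa-mem"}
       local_tool = fork_for_install(community_tool, "tacc.prod", "ls5")
       workflow = {"class": "Workflow",
                   "steps": [{"id": "align", "run": "bwa-mem"}]}
       local_workflow = retarget_workflow(workflow,
                                          {"bwa-mem": local_tool["id"]})
       print(local_workflow["steps"][0]["run"])  # bwa-mem-ls5

The rewritten workflow now points at ``bwa-mem-ls5``, which exists only at the
site that installed it; that is the portability problem in miniature.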

Another proposal was to add a stanza to the job document, alongside the
identifier of which workflow or tool to run (already approved for Draft 3).

However, this could get quite unwieldy for users, especially for complex
workflows with many steps and applications.

This information could instead be added on a per-tool basis to the command-line
tool description document, but that would require changing the tool IDs from
those of the community-maintained copies, thus breaking the portability of
workflows that reference such tools.
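
The job-document proposal might look like the hypothetical sketch below. The
``placement`` stanza, its keys and the tenant names are invented for
illustration, and the Draft 3 field that identifies the workflow or tool to run
is left out because its exact name isn't needed here.

.. code-block:: python

   # Hypothetical job document: ordinary inputs plus a "placement" stanza that
   # says where each workflow step should run.
   JOB = {
       "reads": {"class": "File", "path": "sample.fastq"},
       "placement": {
           "default": "tacc.prod",
           "steps": {"align": "tacc.hpc"},
       },
   }


   def tenant_for_step(job, step_id):
       """Return the tenant for a step, falling back to the stanza's default."""
       placement = job.get("placement", {})
       return placement.get("steps", {}).get(step_id, placement.get("default"))


   if __name__ == "__main__":
       for step_id in ("align", "sort"):
           print(step_id, tenant_for_step(JOB, step_id))

Even with a default, a workflow whose steps run in different places needs one
entry per such step, which is where the unwieldiness for users comes from.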

Misc questions:

How do we mark an input as required or optional? (Is this the
``type: [null, ...]`` trick?)
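
In CWL Draft 3 an input is made optional by giving it a union type that
includes ``"null"``, for example ``type: ["null", File]``. An implementation
might then check a job document for missing required inputs roughly as follows;
a minimal sketch that also treats an input with a ``default`` as not required,
using hypothetical input names.

.. code-block:: python

   def is_optional(parameter):
       """True if a CWL parameter's type is a union containing "null"."""
       cwl_type = parameter.get("type")
       return isinstance(cwl_type, list) and "null" in cwl_type


   def missing_required_inputs(inputs, job):
       """IDs of inputs that are required but absent from the job document."""
       return [p["id"] for p in inputs
               if not is_optional(p) and "default" not in p and p["id"] not in job]


   INPUTS = [
       {"id": "reads", "type": "File"},
       {"id": "adapters", "type": ["null", "File"]},  # optional
       {"id": "threads", "type": "int", "default": 1},
   ]

   if __name__ == "__main__":
       print(missing_required_inputs(INPUTS, {"threads": 4}))  # ['reads']
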

They would like to be able to feed an output document back in as a new input
document to reproduce or redo an analysis automatically. Great idea, and easily
doable by adding the input document to the output object and updating the spec
to say that the output stanza (if any) should be ignored on input objects.
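
The proposed round trip could work as sketched here: the output document is the
input document plus an output stanza, and loading a job simply drops that
stanza. The ``outputs`` key and the input names are hypothetical; a real spec
change would need to reserve the stanza's name so it cannot collide with an
input ID.

.. code-block:: python

   import copy


   def make_output_document(job, outputs):
       """Return the input document with the run's results added alongside it."""
       document = copy.deepcopy(job)  # keep the original inputs untouched
       document["outputs"] = copy.deepcopy(outputs)  # hypothetical stanza name
       return document


   def load_job(document):
       """Read a job document, ignoring any output stanza (per the proposal)."""
       job = copy.deepcopy(document)
       job.pop("outputs", None)
       return job


   if __name__ == "__main__":
       first_job = {"reads": {"class": "File", "path": "sample.fastq"},
                    "min_length": 20}
       result = make_output_document(first_job, {"count": 42})
       assert load_job(result) == first_job  # re-running reproduces the inputs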
