CWL Discussion #27

Closed
rowlandm opened this issue Jul 20, 2018 · 19 comments

Comments

@rowlandm
Contributor

rowlandm commented Jul 20, 2018

There was feedback recently around people working on CWL. I think the time is right to have a chat around this at a National level. Can we make CWL a viable option for RSEs?

So far the people I know who have registered interest are:

  • Paul Hancock from Curtin, WA - Astronomy
  • Anthony Truskinger from QUT, QLD - Ecology
  • Marc Marenda from UoM, VIC - Veterinary Science
  • Pasi Korhonen from UoM, VIC - Veterinary Science

Please add your name, organisation and domain if you are interested, and let's see if we can arrange a time to meet up and discuss.

@rowlandm
Contributor Author

Ideas for an agenda could include:

  • Who is using it now?
  • What problems are you having?
  • What could we do as a community to contribute to it?
  • Who would be keen to work together? Who has the time?
  • What would a project like this look like? E.g. two or three students working on multiple workflows across multiple universities

@rowlandm
Contributor Author

I know Evan Thomas at WEHI is working with CWL too:
https://github.com/WEHI-ResearchComputing/wehi-cwltools

@jimmybgammyknee

We (with help from University of Adelaide ITS) are currently developing a workflow management environment that runs with WDL and Cromwell (essentially CWL), so I'm happy to contribute.

@rowlandm
Contributor Author

Hi all,

I have put together an idea for an ARDC proposal that people could look at and comment on.
https://docs.google.com/document/d/1mspUNllbQIWCX5aOx-O2ocImSmSMSpnPFPNMvIgXDsE/edit?usp=sharing

You should all be able to comment.

I will be submitting this at close of business on the 23rd of July.

@manodeep
Collaborator

@rowlandm Do you have any template/guidelines for ARDC proposals?

Separately, will you please clarify this sentence at the top: "I think the time is right to see if we can't at least have a chat around this at a National scale to see if we can't work together to make CWL a viable alternative for RSEs in Australia." There are too many double negatives in there; perhaps you mean to make CWL a viable option for RSEs?

@rowlandm
Contributor Author

No - at the moment it's internal, I'm unsure if I'm supposed to be doing this ;)

@rowlandm
Contributor Author

Did you change it already @manodeep - I can't see it :)

@manodeep
Collaborator

The second sentence in the comment where you created this issue -- look at the very first comment on this issue. "There was feedback recently around people working on CWL. I think the time is right to see if we can't at least have a chat around this at a National scale to see if we can't work together to make CWL a viable alternative for RSEs in Australia."

@rowlandm
Contributor Author

oh right! lol - I thought you meant the document :) Will change now.

@jimmybgammyknee

Looks good @rowlandm. In "Why this idea is worth considering?", it would be great to include that we see CWL filling the need of researchers developing workflows in all research spaces, given that it's flexible enough to be implemented within most current computing options for researchers (locally maintained machines, HPCs and cloud infrastructure). Also, some benchmarking with job schedulers (Slurm, SGE, etc.) would be super handy imo.

@rowlandm
Contributor Author

Cheers @jimmybgammyknee - added in your comments.

@manodeep
Collaborator

manodeep commented Jul 22, 2018

Coming from an astronomy background, I had never had any exposure to CWL until recently. In the astro community in general, we tend to gravitate towards using conda Python for an equivalent CWL-like experience. For instance, all unit/regression tests for well-known astronomy packages (e.g., astropy) use continuous integration (CI). Since such CI instances are virtual machines, i.e., they mostly come without any software pre-installed, we have to specify exactly what software versions need to be installed. Frequently these installs have to be made without root access.

For instance, my code-base, Corrfunc, has the following lines in the CI specification. Those lines fully specify the OS, the compiler, the GSL dependency, the Python version and the NumPy version required to install and run Corrfunc.

Is "CWL" meant to be a superset, designed from the ground up to capture a more diverse set of workflows?

@rowlandm
Contributor Author

Basically, it's like an API standard that the CWL project wants existing and new workflow engines (Apache Airflow, Cromwell, Rabix, Taverna) to implement.

So if you are working in Cromwell, but want to run it in Taverna, you should be able to export your workflow info in CWL format and import it into Taverna.

https://www.commonwl.org/
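
As a rough illustration of what such an exchange format looks like, here is a minimal sketch of a CWL tool description built in Python. CWL documents can be written as YAML or JSON, so a plain dict dumped as JSON is enough for illustration. The wrapped tool (`echo`), the input name `message` and the helper `to_cwl_json` are assumptions chosen for the example, not anything from this thread.

```python
import json

# Minimal CWL tool description written as a plain dict.
# Assumption: the tool being wrapped is `echo` with one string input.
echo_tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        "message": {
            "type": "string",
            # Place the value as the first positional argument.
            "inputBinding": {"position": 1},
        }
    },
    "outputs": [],
}


def to_cwl_json(document):
    """Serialise a CWL document dict to JSON text (hypothetical helper)."""
    return json.dumps(document, indent=2)


if __name__ == "__main__":
    # The printed text could be saved as e.g. echo.cwl and handed to a CWL runner.
    print(to_cwl_json(echo_tool))
```

The point is that the description is plain data, independent of any one engine, which is what makes moving a workflow between engines plausible in principle.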

@rowlandm
Contributor Author

For example, in Galaxy, if you had a CWL workflow, you could potentially import it into Galaxy and then any tools would be downloaded from the bioconda archive and installed via conda automatically. That is the dream!
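
A hedged sketch of how a tool description can declare its software dependency: CWL defines a `SoftwareRequirement` hint listing package names and versions, which a runner may use to resolve packages (for example through conda). Whether and how a given engine acts on the hint depends on that engine. The package `samtools`, the version `1.9` and the helper `with_software_hint` are illustrative assumptions.

```python
import json


def with_software_hint(tool, package, version):
    """Return a copy of a CWL tool dict with a SoftwareRequirement hint added.

    Hypothetical helper: it only edits the document, it installs nothing.
    """
    updated = dict(tool)
    hints = dict(updated.get("hints", {}))
    hints["SoftwareRequirement"] = {
        "packages": [{"package": package, "version": [version]}]
    }
    updated["hints"] = hints
    return updated


# Assumed example tool: prints the samtools version.
base_tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": ["samtools", "--version"],
    "inputs": [],
    "outputs": [],
}

if __name__ == "__main__":
    print(json.dumps(with_software_hint(base_tool, "samtools", "1.9"), indent=2))
```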

@manodeep
Collaborator

That seems very bioinformatics-specific. Are all of those tools that you named closed-source?

@manodeep
Collaborator

This is my current understanding of CWL :) --

If your workflow already involves only Python + installable packages (via pip/conda), then running on multiple infrastructures is fairly straightforward. However, this is not the norm -- and then CWL can translate across various existing packages (each with their own conventions/nomenclature).

Is that broadly correct?

@rowlandm
Contributor Author

Some of them are open source, not sure which ones exactly.

It's getting used in Astronomy, though not much. Gijs Molenaar has been working on it as part of the LOFAR telescope and Paul Hancock is looking into it from Curtin in WA.

https://medium.com/@gijzelaerr/portable-radio-astronomy-data-processing-pipelines-4e6ba8b00ca3

@rowlandm
Contributor Author

Yeah, in theory. Not so much in practice!

@rowlandm
Contributor Author

Also the idea is to install the dependencies on the fly with conda. So much easier in Python than R!

3 participants