CommonWL #1310

Open
gijzelaerr opened this issue Feb 7, 2017 · 11 comments

Comments

@gijzelaerr

gijzelaerr commented Feb 7, 2017

Hi! More of a question than an issue. Currently we are modelling our data processing workflow using the Common Workflow Language (CommonWL, CWL). We hope this gives us the most portability between the various workflow management tools available. I can't really find any work or talk about integrating Pachyderm with CWL. Have there ever been any thoughts about doing this? How likely is it that a 'standard' like CWL would get adopted for Pachyderm?

Thanks!

@mr-c

mr-c commented Feb 7, 2017

@jdoliner
Member

jdoliner commented Feb 7, 2017

CWL is something that's been on our radar for a while and is definitely something we're interested in exploring. A few questions I have:

  • You can of course run CWL workflows on Pachyderm by packaging up the runtime into a container and running it like that. Is that appealing as a stopgap measure or does that have too many drawbacks to be useful?
  • One obvious drawback is that this wouldn't be distributed. How do CWL implementations distribute the workload? Can they distribute individual steps in the workflow or just the different nodes of the workflow?
  • Would running a side cluster of one of the implementations on their site be an appealing way to run CWL workloads? This is a pattern that's emerging in Pachyderm right now for incorporating other protocols into Pachyderm pipelines.
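
As an illustration of the stopgap in the first bullet, here is a minimal sketch that builds a Pachyderm pipeline spec running an entire CWL workflow as one step. It assumes the `input`/`pfs` pipeline spec layout of later Pachyderm releases and the `cwltool` reference runner; the function name `wrap_cwl_runner`, the image `example/cwltool:latest`, the repo name, and the file names are all hypothetical placeholders, not anything proposed in this thread.

```python
import json


def wrap_cwl_runner(name, repo, image, workflow="workflow.cwl", job="job.yml"):
    """Build a pipeline spec that runs a whole CWL workflow as a single step.

    Assumption: the workflow file and its job file are both committed to `repo`,
    and `image` (hypothetical) has cwltool installed.
    """
    return {
        "pipeline": {"name": name},
        "transform": {
            "image": image,
            # Pachyderm mounts the input repo under /pfs/<repo> and collects
            # results from /pfs/out, so cwltool is pointed at both.
            "cmd": [
                "cwltool", "--outdir", "/pfs/out",
                f"/pfs/{repo}/{workflow}", f"/pfs/{repo}/{job}",
            ],
        },
        # Glob "/" treats the whole repo as one datum, so a single worker runs
        # everything. This is the "not distributed" drawback noted above.
        "input": {"pfs": {"repo": repo, "glob": "/"}},
    }


if __name__ == "__main__":
    spec = wrap_cwl_runner("cwl-wrapper", "cwl-inputs", "example/cwltool:latest")
    print(json.dumps(spec, indent=2))
```

The single `/` glob is the deliberate choice here: the CWL runner needs to see the workflow, the job file, and all data at once, which is exactly why this approach gives up per-step distribution.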

@mr-c

mr-c commented Feb 8, 2017

Cool, glad to hear it.

  1. FYI: CWL is a standard, not a runtime. True, one could use any CWL implementation inside a container treating the entire workflow as a single step, but then you'd lose granularity.
  2. CWL implementations distribute their workload according to implementation-specific heuristics, as they may be running on a single node, a high-throughput computing cluster (grid), or one or more cloud providers. The CWL model is rich enough to provide them with lots of information to make these choices. Some implementations will implement CWL's scatter/gather across nodes, yes.
  3. I'll leave this for @gijzelaerr and others to answer, but from what I can see there should be enough information in CWL descriptions to convert into Pachyderm workflows.

Let me know if you'd like a real time video chat, I'd be happy to go into more depth and share about how various projects have implemented CWL support so far.
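
To make the scatter/gather point concrete, the sketch below shows one way a converter might map a CWL workflow step onto Pachyderm's glob patterns: a step with `scatter` becomes one datum per top-level item (`/*`), and any other step sees its input as a single datum (`/`). This mapping is an assumption for illustration, not an established equivalence, and the helper name `glob_for_step` is hypothetical.

```python
def glob_for_step(step):
    """Pick a Pachyderm glob pattern for one CWL workflow step (heuristic).

    `step` is a CWL WorkflowStep already parsed into a dict.
    """
    scatter = step.get("scatter")
    if not scatter:
        # No scatter: the step consumes its input as a whole.
        return "/"
    if isinstance(scatter, str):
        scatter = [scatter]
    if len(scatter) > 1:
        # Scattering over several inputs depends on scatterMethod, which this
        # sketch does not attempt to translate.
        raise NotImplementedError("multi-input scatter is not handled")
    # One scattered input: let Pachyderm split the repo into per-item datums.
    return "/*"


if __name__ == "__main__":
    align = {"run": "align.cwl", "scatter": "reads",
             "in": {"reads": "fastq_files"}, "out": ["bam"]}
    merge = {"run": "merge.cwl", "in": {"bams": "align/bam"}, "out": ["merged"]}
    print(glob_for_step(align), glob_for_step(merge))
```

The gather half falls out of the second case: a downstream step without `scatter` reads the upstream output repo as one datum, which plays the role of collecting the scattered results.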

@samuell

samuell commented Mar 7, 2017

Just to chime in with my 5c: Based on my experimentation with CWL and my brief reading of Pachyderm workflow examples, I get the impression that they are actually very similar.

I would even think just a converter from CWL yaml/json to Pachyderm yaml would not be very hard at all.

CWL is also doing the definition of inputs / outputs (in-ports / out-ports in some contexts) in a more re-usable and composable way, as far as I can see, so I think pachy could probably learn a thing or two there as well, for any future updates of its format.

But all in all, the formats seem very similar.
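
A rough sketch of what such a converter could look like for the simplest case, a single CWL `CommandLineTool`, is given below. It is an illustration under stated assumptions, not a working tool: it takes the CWL document as an already-parsed dict, handles only `File` inputs, ignores `inputBinding` ordering and prefixes, ignores `outputs` entirely, and assumes the `input`/`pfs`/`cross` pipeline spec layout of later Pachyderm releases. The function name `cwl_tool_to_pipeline` and the `repos` mapping are hypothetical.

```python
import json
import shlex


def cwl_tool_to_pipeline(tool, name, repos):
    """Translate a CWL CommandLineTool dict into a Pachyderm pipeline spec.

    `repos` (hypothetical) maps each File input id to the repo that holds it.
    """
    if tool.get("class") != "CommandLineTool":
        raise ValueError("only CommandLineTool documents are handled")

    # DockerRequirement may appear under requirements or hints, in list or map form.
    image = None
    for section in ("requirements", "hints"):
        entries = tool.get(section) or []
        if isinstance(entries, dict):
            entries = [{**(v or {}), "class": k} for k, v in entries.items()]
        for entry in entries:
            if entry.get("class") == "DockerRequirement" and image is None:
                image = entry.get("dockerPull")
    if image is None:
        raise ValueError("no DockerRequirement.dockerPull found")

    base = tool.get("baseCommand", [])
    if isinstance(base, str):
        base = [base]
    # Only plain-string arguments are carried over; binding objects are skipped.
    args = [a for a in tool.get("arguments", []) if isinstance(a, str)]

    inputs = tool.get("inputs", {})
    if isinstance(inputs, list):
        inputs = {item["id"]: item for item in inputs}
    file_ids = []
    for key, val in inputs.items():
        typ = val if isinstance(val, str) else val.get("type")
        if typ == "File":
            file_ids.append(key)
    if not file_ids:
        raise ValueError("no File inputs to map onto Pachyderm repos")

    # Each File input becomes a repo globbed per file; the shell glob below
    # expands to whatever file the current datum exposes under /pfs/<repo>.
    pfs_inputs = [{"pfs": {"repo": repos[i], "glob": "/*"}} for i in file_ids]
    paths = [f"/pfs/{repos[i]}/*" for i in file_ids]
    command = " ".join([shlex.quote(p) for p in base + args] + paths)

    return {
        "pipeline": {"name": name},
        "transform": {
            "image": image,
            "cmd": ["sh"],
            # CWL tools write outputs to their working directory, so run in /pfs/out.
            "stdin": ["cd /pfs/out", command],
        },
        "input": pfs_inputs[0] if len(pfs_inputs) == 1 else {"cross": pfs_inputs},
    }


if __name__ == "__main__":
    tool = {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": ["wc", "-l"],
        "hints": {"DockerRequirement": {"dockerPull": "ubuntu:16.04"}},
        "inputs": {"infile": {"type": "File", "inputBinding": {"position": 1}}},
        "outputs": [],
    }
    print(json.dumps(cwl_tool_to_pipeline(tool, "count-lines", {"infile": "texts"}), indent=2))
```

Even this small case leaves out expressions, secondary files, stdout capture, and non-file parameters, and those are the places where a real converter would have to decide how CWL concepts land in Pachyderm.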

@OliverEvans96

OliverEvans96 commented Jul 12, 2017

Any further thought here? Has anyone attempted a CWL to Pachyderm yaml converter? Does it seem like there's sufficient feature parity for this to work well? I think it would make Pachyderm quite appealing to interface smoothly with the CWL standard.

@dannykwells

Big Pachyderm user here - we are using it throughout all of our pipelines. Are there any plans to support CWL soon? This language is becoming the lingua franca of computational genomics and support for it is extremely important for our continued use of Pachyderm.

@stale

stale bot commented Feb 14, 2018

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 14, 2018
@gijzelaerr
Author

well that would be a shame :)

@stale stale bot removed the stale label Feb 14, 2018
@marcadella

marcadella commented May 26, 2019

As stated above, it'd just be a matter of implementing a converter from CWL to Pachyderm's pipeline definition. I think it would be quite useful for Pachyderm's community indeed.
A quick Google search did not bring up any results. Has anyone seen a project somewhere?

@mr-c

mr-c commented May 26, 2019

@marcadella I'm not aware of any, but that doesn't mean that such a thing hasn't been started.

I am happy to assist anybody writing a CWL converter!

@JoeyZwicker
Member

While I'm not very familiar with CWL, I'm not actually sure this would be very easy to implement, as there are many concepts that don't seem to translate cleanly from CWL to Pachyderm.

A first implementation of this would make for an excellent PR from someone who's more familiar with CWL and we'd be happy to give further guidance from there.


8 participants