Using Make to increase Automation & Reproducibility by Shaun Jackman #6

Closed
ttimbers opened this Issue Jun 22, 2015 · 31 comments


ttimbers commented Jun 22, 2015

Details

Where: Simon Fraser University, Burnaby Campus, SSB 7172
When: 3:30 pm - 4:30 pm, Tuesday, August 11, 2015
Livestream: Google Hangouts On Air
Public notepad for this session can be accessed here.
Lesson notes can be found here.

Dependencies: Bash Shell and Make

Setup

Windows

Bash

Install Git for Windows by downloading and running the installer (http://msysgit.github.io/). This will provide you with both Git and Bash in the Git Bash program.

Make

  1. Download make.exe from here: https://github.com/msysgit/msysgit/blob/master/bin/make.exe?raw=true

  2. Place it in the bin directory where you installed Git Bash e.g. C:\Program Files (x86)\Git\bin

  3. To test: open a Git Bash window, type make, and press Enter

  4. You should see the following message

    make: *** No targets specified and no makefile found. Stop.
    

    This means that Make was successfully installed. Otherwise, you'll see this error message:

    bash: make: command not found
    

Mac OS X

Bash

The default shell in all versions of Mac OS X is bash, so no need to install anything. You access bash from the Terminal (found in /Applications/Utilities). You may want to keep Terminal in your dock for this workshop.

Make

Make doesn't come by default with OS X. You need to install a package called Command Line Tools. If you're running one of the latest versions of OS X (10.9 or later), you can easily do so by running the following command:

xcode-select --install

You can find out which version of OS X you're running by clicking the Apple icon at the top-left of your screen and selecting "About This Mac".

However, if you're running an older version of OS X (10.8 or earlier), you have a bit more work to do:

  1. Visit the Downloads section of the Apple Developer website and login using your Apple ID. If you don't have one, you can create a dummy one without giving personal information.
  2. Search for "command line tools" using the search box on the left.
  3. Find the Command Line Tools for your version of OS X. Avoid versions labelled as beta; these might not work as expected.
  4. Double-click that Command Line Tools entry (or click the plus sign on the left) and download the file using the link on the right.
  5. Install the Command Line Tools by mounting the downloaded DMG file and launching the installer.

Linux

Bash and make are typically included with any Linux distribution. Also, bash is usually the default shell. If not, you can just open a Terminal window and run the command bash.
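Whichever platform you're on, you can confirm that Make works end-to-end by saving a minimal Makefile and running it. This is just an illustration (the file and target names are made up); note that the indented recipe line must start with a tab, not spaces:

```make
# Save this as "Makefile" in an empty directory, then run "make" there.
# The line below the target must begin with a tab character.
hello:
	@echo Hello, Make!
```

If everything is set up correctly, running make in that directory should print "Hello, Make!".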

@brunogrande added the workshop label Jul 8, 2015

jstaf commented Aug 7, 2015

@BillMills mentioned that you guys are webcasting your meetups now. Any chance you're going to be webcasting this one?

I'm at UBC and was considering getting a Zipcar and driving over to SFU for this one, but if you're webcasting it, that wouldn't be necessary... (hello from the UBC group, by the way!). I've been writing up my thesis using a LaTeX template and makefile, and figured it'd help if I actually knew a bit more about how the makefile works (besides just typing make and watching magic happen).

Also, I don't think make comes with OS X distributions (outrageous, I know...). You need to install Xcode and the Command Line Tools before you can use it.

ttimbers commented Aug 7, 2015

@kazi11 we most certainly will be! The link for the webcast will appear in the issue above on Monday or Tuesday morning. And yes, I think you're right about make not coming with OS X by default... I'll fix the note about that. Thanks!

jstaf commented Aug 7, 2015

Awesome - looking forward to actually seeing how make works!

brunogrande commented Aug 8, 2015

Hello, @kazi11. I updated the instructions in the issue above for installing the Command Line Tools on OS X (which include make). I also added the link to the Google Hangouts On Air where we will be livestreaming the event. I believe I can set it up to receive any questions you might have and I could monitor it. No guarantees though, since I haven't tried it before.

jstaf commented Aug 12, 2015

Hey, great tutorial! I can see how this could be very useful.

A few last questions (basically, I'm curious about using Make for sequencing pipelines):

  • Can you tell Make to run a step in parallel for all targets at a certain step (ideally with X number of targets to build at a time, and Y number of processors per target)?
  • How do you get Make to wait for a set of inputs before doing another step? Say you hypothetically needed to have Make build 6 intermediate files, and then do a computation that uses all 6 intermediates to produce a single output. How do you detect when the intermediates are done?
  • Is there any way to force Make to rebuild a project from scratch besides just deleting all the starting files?
  • Is there a way to have Make evaluate a target once it's built, and stop a pipeline if the file doesn't meet a requirement (basically some kind of "if/else statement")?

I know how to do most of these things in shell, but Make seems like it could be cleaner and more, um... "elegant".

brunogrande commented Aug 12, 2015

Hello, @kazi11. I'll try to answer your questions, but @sjackman can certainly pitch in.

  1. You can readily parallelize make using the --jobs/-j flag. You can specify the number of jobs that may run simultaneously, which typically shouldn't be larger than the number of available processor cores. make automatically detects which steps can run in parallel, so you don't need to worry about that. For instance, in today's example, running make --jobs 2 wordsEn.tsv wordsFr.tsv would run the analysis for both English and French words simultaneously, as they don't depend on each other.

  2. If you need a set of input files before doing a given step, you can simply add these files to the list of dependencies for a make rule, i.e. what comes after the colon. Make should then wait for all of them to be ready before proceeding. For example:

    output.txt: input1.txt input2.txt input3.txt
        python analyze.py input1.txt input2.txt input3.txt > output.txt
  3. A quick check of the make manual points to the --always-make/-B flag, which is supposed to "unconditionally make all targets." I tried it, and it does what you're looking for.

  4. I'm not sure about this one, but I imagine you could add another command to a given make rule that returns a non-zero exit code if a condition isn't met. That should cause make to stop at that point.
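The fourth point can be sketched as a rule whose recipe checks its own output and exits non-zero if the check fails. This is a hypothetical example (the file names and the emptiness check are made up for illustration):

```make
# If the test fails, make stops here and no downstream rules run.
# Deleting the bad file matters: otherwise make would consider the
# target up to date on the next run.
filtered.txt: raw.txt
	grep -v '^#' raw.txt > filtered.txt
	@test -s filtered.txt || { echo "Error: filtered.txt is empty" >&2; rm -f filtered.txt; exit 1; }
```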

sjackman commented Aug 13, 2015

Is there any way to force Make to rebuild a project from scratch besides just deleting all the starting files?

I often use --always-make/-B in conjunction with --dry-run/-n to see all the commands that would be run for the pipeline, without actually running them. I'll then copy and paste one command to run it manually. Running make after that will run all the commands dependent on that one command that I've rerun manually.

I'm not sure about this one, but I can't imagine you not being able to add another command for a given make rule that returns a non-zero exit code if a condition isn't met. This should cause the Makefile to terminate at that point.

I was asked this question in person as well after the lesson. Bruno's answer is good.


radaniba commented Aug 13, 2015

I have two remarks to @sjackman :)

1- You mentioned a point I am very sensitive to in your title. Although I agree that make can help automate things, I don't think one can demonstrate that automation and reproducibility actually correlate, unless you cover other things in your talk beyond chaining tasks together using make. Reproducibility is sexy, but often confused with repeatability.

2- make can be fun to use at a small scale, but do you think it is scalable to big analytics jobs? It may be worth mentioning some alternatives, or the pros and cons of using make, in any material associated with the course.

dfornika commented Aug 13, 2015

@radaniba

make can be fun to use at a small scale, do you think it is scalable to cover big analytics jobs. It is may be worth mentioning in any material associated with the course some alternatives or kind of pros/cons of using make.

There is a qmake utility for Sun Grid Engine that might be useful for scaling up make-based pipelines.

radaniba commented Aug 13, 2015

@dfornika thank you. I like snakemake too (same spirit), which turns out to be efficient as well.

sjackman commented Aug 13, 2015

@radaniba I agree with your first point. Repeatability is a prerequisite to reproducibility, but not the whole thing.

do you think it [make] is scalable to cover big analytics jobs

Yes. I use make for nearly all of my analytics. A comprehensive list of alternatives to make and their pros/cons would be a useful community resource. We could start a collaborative one in a wiki on a GitHub repository, if you'd like to get the ball rolling.

radaniba commented Aug 13, 2015

let's do that, thanks @sjackman

sjackman commented Aug 13, 2015

@radaniba: @pditommaso has a fantastic list of pipeline tools, although it's conspicuously missing make. https://github.com/pditommaso/awesome-pipeline

radaniba commented Aug 13, 2015

Ouuuuuh! I like that! Thanks for sharing.

Personally, I use Ruffus and I am totally happy with it, even though we went through a lot of issues with drmaa etc., but Leo (the author of Ruffus) is pretty open to changes and very helpful (it is important for open-source package maintainers to keep their 'clients' happy).

I think at the end of the day it's pick-and-stick until you see the need to change to a new engine; as long as the job is done efficiently, I am ready to use them all :)

Would you like to help me add support for some of them by adding them to the CodersCrowd docker image, and create a series of blog posts reviewing these beasts? Like once a week? It would be a nice way of digging deeper into each of them to see how practical they are in real life (it won't be easy to go through the documentation etc., but that's also part of the game: are they easy to learn or not :) )

pditommaso commented Aug 13, 2015

@radaniba That's really a great idea. Count on my help regarding Nextflow.

@sjackman Thanks for mentioning the list. As for make, I didn't include it because the list started as a collection of niche projects and tools specifically designed for workflow and pipeline management, whereas make, strictly speaking, is a build automation tool.

radaniba commented Aug 13, 2015

@sjackman @pditommaso Sounds exciting. I will create a repo and link it to this issue, with a plan of action. If you don't mind, I can send you invitations to be authors on the CodersCrowd blog; you can send me your best emails. I can be reached at aradwen [a___$__t] gmail [d_o--**_t] com (making it harder for email extractor bots :)

sjackman commented Aug 13, 2015

I'm afraid that I won't have the time to contribute blog posts myself, but I'm excited to see the result of these blog posts.

jstaf commented Aug 14, 2015

Just wanted to drop by and say thanks for answering my q's about make, @sjackman @brunogrande

sjackman commented Aug 14, 2015

No worries. Happy to help. Feel free to fire any other questions you have my way, either via GitHub or Twitter @sjackman.

rdocking commented Aug 17, 2015

Hey Shaun et al: if you do decide to start another curated list of workflow managers, here's what I've collected over the last few years:

  • NextFlow - "Nextflow is a fluent DSL modelled around the UNIX pipe concept, that simplifies writing parallel and scalable pipelines in a portable manner."
  • Galaxy (PSU)
  • SeqWare (OICR)
  • Firehose (Broad)
  • Taverna (EBI)
  • Snakemake - 'This project aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern domain specific specification language (DSL) in python style' (see also snakemake_notes.md)
  • Luigi - 'Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.' Developed by Spotify.
  • bpipe - 'Bpipe provides a platform for running big bioinformatics jobs that consist of a series of processing stages - known as 'pipelines''
  • ruffus - 'Ruffus is a Computation Pipeline library for python. It is open-sourced, powerful and user-friendly, and widely used in science and bioinformatics.'
  • GenePattern - 'GenePattern pipelines allow you to capture, automate, and share the complex series of steps required to analyze genomic data.'
  • Workflow Description Language - 'The Workflow Description Language is a domain specific language for describing tasks and workflows.'
  • Common Workflow Language - 'an informal, multi-vendor working group consisting of various organizations and individuals that have an interest in portability of data analysis workflows'
  • GNU Make - 'GNU Make is a tool which controls the generation of executables and other non-source files of a program from the program's source files.'
  • BigDataScript - 'We introduce the BigDataScript (BDS) programming language for data processing pipelines, which improves abstraction from hardware resources and assists with robustness'
  • sake - 'Sake is a way to easily design, share, build, and visualize workflows with intricate interdependencies. Sake is self-documenting because the instructions for building a project also serve as the documentation of the project's workflow.'
  • cpipe - 'Cpipe: a shared variant detection pipeline designed for diagnostic settings'
  • bcbio - 'Validated, scalable, community developed variant calling and RNA-seq analysis.'
  • Paired Ends Genomics - 'Cloud computing. Big data. So big. Such science. Wow.' (If not immediately obvious, this is a joke, which will be slightly funnier if you're familiar with the Twitter-wars they're alluding to)

sjackman commented Sep 30, 2015

@rdocking Do you have a subjective take on which of these tools you prefer? My first and second preferences are Make and Snakemake.

rdocking commented Sep 30, 2015

@sjackman - my (very) subjective take on some of these:

  • I spent a fair amount of effort developing some analysis pipelines with Snakemake. At first I liked it a lot, since I really liked being able to seamlessly move from a local host onto an SGE cluster. After spending more effort on it, I was a bit less enthusiastic: I found that it worked well for certain kinds of tasks, but it was fairly difficult to debug when things didn't work. It seems like it's still actively developed, but it's not nearly as easy to find help online as it is for, say, Make.
  • My own research work has been much more on the prototyping / data-analysis side lately, so I've really started to dig in and try to learn Make. It's been good for the reasons I mentioned above: it's relatively easy to find answers online for common issues.
  • For more long-lived / re-usable / production pipelines, I'd want something that could easily be ported between a single host / cluster environment / container / cloud / etc. I'm still open to trying new things for that, since it seems difficult to scale Make to that kind of environment.

sjackman commented Sep 30, 2015

Have you tried http://www.nextflow.io/? I believe it's meant to run easily on a host / container / cloud. I don't know about a traditional cluster. I tried it briefly. My two beefs: I didn't much like the DSL (very subjective opinion), and anything fancy required learning the Groovy language.

rdocking commented Sep 30, 2015

I only tried NextFlow briefly as well - I'd agree that the DSL seemed like it would involve a bit of a learning curve. Right now, I'm mostly hoping that something like Snakemake / bcbio / CWL gets more community traction and documentation, so that I can implement things in a language I'm more familiar with (Python in this case).

pditommaso commented Sep 30, 2015

Let me add that Nextflow (I'm the author of it) has built-in support for the most commonly used resource/cluster managers (Sun/Univa Grid Engine, Platform LSF, SLURM, PBS and Torque). Thus the same pipeline can run on any of these clusters.

Regarding the DSL, in my opinion what confuses novice users is not so much the Groovy language as the streaming/reactive model it is based on, which can seem a bit fancy at the beginning.
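For reference, switching executors in Nextflow is a configuration-level change rather than a pipeline change; a minimal sketch of a `nextflow.config` (the queue name here is a made-up example):

```
// nextflow.config - hypothetical sketch
process {
    executor = 'slurm'   // or 'sge', 'lsf', 'pbs', ...
    queue    = 'long'    // hypothetical queue name
}
```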

sjackman commented Sep 30, 2015

Let me add that I think you're doing great work with Nextflow, Paolo. Do you think it would be possible to implement the Make language as an alternative front end to Nextflow? Not necessarily all the GNU Make extensions, which are a bit nuts at times, but the basic and standard Make language. This feature would be a big deal for me. If you're interested, here's the standard: IEEE Std 1003.1 utilities/make
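For illustration, here is a small throwaway example that stays inside that portable subset: a POSIX suffix rule with the `$<` and `$@` automatic variables, rather than a GNU pattern rule (directory and file names are made up):

```shell
# Build a tiny project using only POSIX make features.
mkdir -p /tmp/posix-make-demo && cd /tmp/posix-make-demo
printf '%s\n' 'hello' > greeting.in
printf 'all: greeting.out\n\n.SUFFIXES: .in .out\n.in.out:\n\ttr a-z A-Z < $< > $@\n' > Makefile
make
cat greeting.out
```

The same Makefile runs unchanged under GNU make, BSD make, and other POSIX-conforming implementations, which is exactly the portability that restricting yourself to the standard subset buys.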

pditommaso commented Sep 30, 2015

Hi Shaun, in principle it should be possible. Actually, @lindenb implemented something similar to what you are proposing, though as far as I know it's little more than an experiment, and I have no clue how it would cope with a complex Makefile.

That said, I think a complete mapping of the full Make specification to Nextflow would require considerable effort, and at the same time I'm skeptical about the "quality" of code generated by an automatic translation from Make to Nextflow, given the different models of the two tools. I can hardly find the resources for that.

lindenb commented Sep 30, 2015

Yes, Paolo is right: the make-to-Nextflow translation is not optimized, but it's a good start for converting one workflow to another, or for quickly using the features of Nextflow (e.g. running on a cluster).

On a side note, I've created stylesheets to convert make to Snakemake, Markdown, plain Makefile, HTML, etc. ( https://github.com/lindenb/xml-patch-make#xsl-stylesheets )

sjackman commented Oct 2, 2015

Very cool. I use your tool makefile2graph quite often, particularly for teaching, or for inclusion in the README.md of a pipeline repository.
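For anyone reading along, the usual invocation looks roughly like this (it assumes the `make2graph` binary from lindenb/makefile2graph and Graphviz's `dot` are on your PATH; the two-target Makefile here is a made-up example):

```shell
mkdir -p /tmp/make-graph-demo && cd /tmp/make-graph-demo
printf 'all: a b\n\na:\n\ttouch a\n\nb: a\n\ttouch b\n' > Makefile
# A debug dry run prints the dependency walk that make2graph parses:
make -Bnd > make.log
# make2graph turns that log into Graphviz dot syntax, e.g.:
#   make2graph < make.log | dot -Tpng -o graph.png
grep -c 'touch' make.log
```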

lindenb commented Oct 2, 2015

@sjackman thanks. The patch https://github.com/lindenb/xml-patch-make plus XSLT can generate the same kind of output and more, but I need more time to translate the output to CWL and other pipeline formats.

sjackman commented Jul 22, 2017

Update: graph2cwl converts a Makefile to an older draft of CWL.
https://github.com/lindenb/xml-patch-make/blob/master/stylesheets/graph2cwl.xsl
