diff --git a/docs/conf.py b/docs/conf.py index 109968b46d..ad33529753 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -40,6 +40,8 @@ myst_heading_anchors = 3 rediraffe_redirects = { + 'basic.md': 'overview.md', + 'getstarted.md': 'install.md', 'dsl2.md': 'dsl1.md' } diff --git a/docs/developer/index.md b/docs/developer/index.md index 8d2c7d6979..48dd278d22 100644 --- a/docs/developer/index.md +++ b/docs/developer/index.md @@ -142,7 +142,7 @@ If you need to test changes to the `nextflow` launcher script, you can run it di ### Groovy REPL -The `groovysh` command provides a command-line REPL that you can use to play around with Groovy code independently of Nextflow. The `groovyConsole` command provides a graphical REPL similar to `nextflow console`. These commands require a standalone Groovy distribution, which can be installed as described for Java in {ref}`Getting started `. +The `groovysh` command provides a command-line REPL that you can use to play around with Groovy code independently of Nextflow. The `groovyConsole` command provides a graphical REPL similar to `nextflow console`. These commands require a standalone Groovy distribution, which can be installed as described for Java in {ref}`Getting started `. :::{note} If you are using WSL, you must also install an X server for Windows, such as [VcXsrv](https://sourceforge.net/projects/vcxsrv/) or [Xming](http://www.straightrunning.com/XmingNotes/), in order to use these commands. diff --git a/docs/developer/nextflow.ast.md b/docs/developer/nextflow.ast.md index e6c99f5bec..5048ca6a8d 100644 --- a/docs/developer/nextflow.ast.md +++ b/docs/developer/nextflow.ast.md @@ -23,7 +23,7 @@ You can see the effect of Nextflow's AST transforms by using the Nextflow consol 3. Execute the script 4. Go to **Script** > **Inspect AST** -Here is the example from {ref}`getstarted-first`: +Here is the example from {ref}`your-first-script`: ```groovy params.str = 'Hello world!' 
diff --git a/docs/dsl1.md b/docs/dsl1.md index 5635590b3d..50cea72d4e 100644 --- a/docs/dsl1.md +++ b/docs/dsl1.md @@ -18,7 +18,7 @@ export NXF_DEFAULT_DSL=2 ## Processes and workflows -In DSL1, a process definition is also the process invocation. Process inputs and outputs are connected to channels using `from` and `into`. Here is the {ref}`getstarted-first` example written in DSL1: +In DSL1, a process definition is also the process invocation. Process inputs and outputs are connected to channels using `from` and `into`. Here is the {ref}`your-first-script` example written in DSL1: ```groovy nextflow.enable.dsl=1 diff --git a/docs/getstarted.md b/docs/getstarted.md deleted file mode 100644 index 2b182f28c4..0000000000 --- a/docs/getstarted.md +++ /dev/null @@ -1,211 +0,0 @@ -(getstarted-page)= - -# Getting started - -(getstarted-requirement)= - -## Requirements - -Nextflow can be used on any POSIX compatible system (Linux, macOS, etc). It requires Bash 3.2 (or later) and [Java 11 (or later, up to 21)](http://www.oracle.com/technetwork/java/javase/downloads/index.html) to be installed. - -For the execution in a cluster of computers, the use of a shared file system is required to allow the sharing of tasks input/output files. - -Nextflow can also be run on Windows through [WSL](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux). - -:::{tip} -We recommend that you install Java through [SDKMAN!](https://sdkman.io/), and that you use the latest LTS version of Corretto or Temurin. See [this website](https://whichjdk.com/) for more information. While other Java distros may work at first or even most of the time, many users have experienced issues that are difficult to debug and are usually resolved by using one of the recommended distros. 
- -To install Corretto 17: - -```bash -sdk install java 17.0.6-amzn -``` - -To install Temurin 17: - -```bash -sdk install java 17.0.6-tem -``` -::: - -(getstarted-install)= - -## Installation - -Nextflow is distributed as a self-installing package, which means that it does not require any special installation procedure. - -It only needs two easy steps: - -1. Download the executable package by copying and pasting either one of the following commands in your terminal window: `wget -qO- https://get.nextflow.io | bash` - - Or, if you prefer `curl`: `curl -s https://get.nextflow.io | bash` - - This will create the `nextflow` main executable file in the current directory. - -2. Make the binary executable on your system by running `chmod +x nextflow`. - -3. Optionally, move the `nextflow` file to a directory accessible by your `$PATH` variable (this is only required to avoid remembering and typing the full path to `nextflow` each time you need to run it). - -:::{tip} -Set `export CAPSULE_LOG=none` to make the dependency installation logs less verbose. -::: - -:::{tip} -If you don't have `curl` or `wget`, you can also download the Nextflow launcher script from the [project releases page](https://github.com/nextflow-io/nextflow/releases/latest) on GitHub, in lieu of step 1. -::: - -:::{tip} -To avoid downloading the dependencies, you can also use the `nextflow-VERSION-all` distribution available for every Nextflow release on Github. - -1. Go to the [Github releases page](https://github.com/nextflow-io/nextflow/releases) and expand the `Assets` section for a specific release. -2. Copy the URL of the `nextflow-VERSION-all` asset and enter the download command in your terminal, e.g. `wget -qO- ASSET-URL`. It will create the completely self-contained `nextflow-VERSION-all` executable file in the current directory. 
-::: - -## Updates - -Having Nextflow installed in your computer you can update to the latest version using the following command: - -```bash -nextflow self-update -``` - -:::{tip} -You can temporarily switch to a specific version of Nextflow by prefixing the `nextflow` command with the `NXF_VER` environment variable. For example: - -```bash -NXF_VER=20.04.0 nextflow run hello -``` -::: - -## Stable and Edge releases - -A *stable* version of Nextflow is released on a six-months basic schedule, in the 1st and 3rd quarter of every year. - -Along with the stable release, an *edge* version is released on a monthly basis. This version is useful to test and use most recent updates and experimental features. - -To use the latest edge release run the following snippet in your shell terminal: - -```bash -export NXF_EDGE=1 -nextflow self-update -``` - -(getstarted-first)= - -## Your first script - -Copy the following example into your favorite text editor and save it to a file named `tutorial.nf`: - -```{literalinclude} snippets/your-first-script.nf -:language: groovy -``` - -:::{note} -For versions of Nextflow prior to `22.10.0`, you must explicitly enable DSL2 by adding `nextflow.enable.dsl=2` to the top of the script or by using the `-dsl2` command-line option. -::: - -This script defines two processes. The first splits a string into 6-character chunks, writing each one to a file with the prefix `chunk_`, and the second receives these files and transforms their contents to uppercase letters. The resulting strings are emitted on the `result` channel and the final output is printed by the `view` operator. - -Execute the script by entering the following command in your terminal: - -```console -$ nextflow run tutorial.nf - -N E X T F L O W ~ version 22.10.0 -executor > local (3) -[69/c8ea4a] process > splitLetters [100%] 1 of 1 ✔ -[84/c8b7f1] process > convertToUpper [100%] 2 of 2 ✔ -HELLO -WORLD! 
-``` - -You can see that the first process is executed once, and the second twice. Finally the result string is printed. - -It's worth noting that the process `convertToUpper` is executed in parallel, so there's no guarantee that the instance processing the first split (the chunk `Hello`) will be executed before the one processing the second split (the chunk `world!`). - -Thus, it is perfectly possible that you will get the final result printed out in a different order: - -``` -WORLD! -HELLO -``` - -:::{tip} -The hexadecimal string, e.g. `22/7548fa`, is the unique hash of a task, and the prefix of the directory where the task is executed. You can inspect a task's files by changing to the directory `$PWD/work` and using this string to find the specific task directory. -::: - -(getstarted-resume)= - -### Modify and resume - -Nextflow keeps track of all the processes executed in your pipeline. If you modify some parts of your script, only the processes that are actually changed will be re-executed. The execution of the processes that are not changed will be skipped and the cached result used instead. - -This helps a lot when testing or modifying part of your pipeline without having to re-execute it from scratch. 
- -For the sake of this tutorial, modify the `convertToUpper` process in the previous example, replacing the process script with the string `rev $x`, so that the process looks like this: - -```groovy -process convertToUpper { - input: - path x - output: - stdout - - """ - rev $x - """ -} -``` - -Then save the file with the same name, and execute it by adding the `-resume` option to the command line: - -```bash -nextflow run tutorial.nf -resume -``` - -It will print output similar to this: - -``` -N E X T F L O W ~ version 22.10.0 -executor > local (2) -[69/c8ea4a] process > splitLetters [100%] 1 of 1, cached: 1 ✔ -[d0/e94f07] process > convertToUpper [100%] 2 of 2 ✔ -olleH -!dlrow -``` - -You will see that the execution of the process `splitLetters` is actually skipped (the process ID is the same), and its results are retrieved from the cache. The second process is executed as expected, printing the reversed strings. - -:::{tip} -The pipeline results are cached by default in the directory `$PWD/work`. Depending on your script, this folder can take up a lot of disk space. It's a good idea to clean this folder periodically, as long as you know you won't need to resume any pipeline runs. -::: - -For more information, see the {ref}`cache-resume-page` page. - -(getstarted-params)= - -### Pipeline parameters - -Pipeline parameters are simply declared by prepending to a variable name the prefix `params`, separated by dot character. Their value can be specified on the command line by prefixing the parameter name with a double dash character, i.e. `--paramName` - -For the sake of this tutorial, you can try to execute the previous example specifying a different input string parameter, as shown below: - -```bash -nextflow run tutorial.nf --str 'Bonjour le monde' -``` - -The string specified on the command line will override the default value of the parameter. 
The output will look like this: - -``` -N E X T F L O W ~ version 22.10.0 -executor > local (4) -[8b/16e7d7] process > splitLetters [100%] 1 of 1 ✔ -[eb/729772] process > convertToUpper [100%] 3 of 3 ✔ -m el r -edno -uojnoB -``` - -:::{versionchanged} 20.11.0-edge -Any `.` (dot) character in a parameter name is interpreted as the delimiter of a nested scope. For example, `--foo.bar Hello` will be interpreted as `params.foo.bar`. If you want to have a parameter name that contains a `.` (dot) character, escape it using the back-slash character, e.g. `--foo\.bar Hello`. -::: diff --git a/docs/index.md b/docs/index.md index a8a113ad90..ab59d37cbd 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,5 +1,5 @@ -# Nextflow's documentation! +# Nextflow reference documentation [![Nextflow CI](https://github.com/nextflow-io/nextflow/workflows/Nextflow%20CI/badge.svg)](https://github.com/nextflow-io/nextflow/actions/workflows/build.yml?query=branch%3Amaster+event%3Apush) [![Nextflow version](https://img.shields.io/github/release/nextflow-io/nextflow.svg?colorB=58bd9f&style=popout)](https://github.com/nextflow-io/nextflow/releases/latest) @@ -10,23 +10,18 @@ Nextflow is a workflow system for creating scalable, portable, and reproducible workflows. -## Rationale - -The rise of big data has made it increasingly necessary to be able to analyze and perform experiments on large datasets in a portable and reproducible manner. - -Parallelization and distributed computing are the best ways to tackle this challenge, but the tools commonly available to computational scientists often lack good support for these techniques, or they provide a model that fits poorly with the needs of computational scientists and often require knowledge of complex tools and APIs. - -The Nextflow language is inspired by [the Unix philosophy](https://en.wikipedia.org/wiki/Unix_philosophy), in which many simple command line tools can be chained together into increasingly complex tasks. 
Similarly, a Nextflow script consists of composing many simple processes into increasingly complex pipelines. Each process executes a given tool or scripting language, and by specifying the process inputs and outputs, Nextflow coordinates the execution of tasks for you. - -The Nextflow runtime integrates with many popular execution platforms (HPC schedulers, cloud providers) and software tools (Git, Docker, Conda), allowing you to fully describe a computational pipeline with all of its dependencies and run it in nearly any environment -- write once, run anywhere. +- Get an {ref}`overview ` of Nextflow and its key concepts. +- Get started with Nextflow by {ref}`installing ` it and running {ref}`your first script `. +- Check out [this blog post](https://www.nextflow.io/blog/2023/learn-nextflow-in-2023.html) for even more resources on how to learn Nextflow. ```{toctree} :hidden: :caption: Introduction :maxdepth: 1 -getstarted -basic +overview +install +your-first-script ``` ```{toctree} diff --git a/docs/install.md b/docs/install.md new file mode 100644 index 0000000000..affa669acd --- /dev/null +++ b/docs/install.md @@ -0,0 +1,124 @@ +(install-page)= + +# Installation + +(install-requirements)= + +## Requirements + +Nextflow can be used on any POSIX-compatible system (Linux, macOS, etc), and on Windows through [WSL](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux). It requires Bash 3.2 (or later) and [Java 11 (or later, up to 21)](http://www.oracle.com/technetwork/java/javase/downloads/index.html) to be installed. You can see which version you have using the following command: + +```bash +java -version +``` + +If you don't have a compatible version of Java installed on your computer, we recommend that you install it through [SDKMAN!](https://sdkman.io/), and that you use the latest LTS version of Temurin. See [this website](https://whichjdk.com/) for more information. + +To install Java with SDKMAN: + +1. 
Install SDKMAN: + + ```bash + curl -s https://get.sdkman.io | bash + ``` + +2. Open a new terminal and install Java: + + ```bash + sdk install java 17.0.10-tem + ``` + +3. Confirm that Java is installed correctly: + + ```bash + java -version + ``` + +(install-nextflow)= + +## Install Nextflow + +Nextflow is distributed as a self-installing package, in order to make the installation process as simple as possible: + +1. Install Nextflow: + + ```bash + curl -s https://get.nextflow.io | bash + ``` + + This will create the `nextflow` executable in the current directory. + + :::{tip} + You can set `export CAPSULE_LOG=none` to make the installation logs less verbose. + ::: + +2. Make Nextflow executable: + + ```bash + chmod +x nextflow + ``` + +3. Move Nextflow into a directory on your `$PATH`: + + ```bash + sudo mv nextflow /usr/local/bin + ``` + +4. Confirm that Nextflow is installed correctly: + + ```bash + nextflow info + ``` + +## Updates + +With Nextflow installed in your environment, you can update to the latest version using the following command: + +```bash +nextflow self-update +``` + +You can also temporarily switch to a specific version of Nextflow with the `NXF_VER` environment variable. For example: + +```bash +NXF_VER=23.10.0 nextflow run hello +``` + +## Stable and edge releases + +A *stable* version of Nextflow is released every six months, in the 4th and 10th month of each year. + +Additionally, an *edge* version is released on a monthly basis. The edge releases can be used to access the latest updates and experimental features. + +To use the latest edge release, set `NXF_EDGE=1` when updating: + +```bash +NXF_EDGE=1 nextflow self-update +``` + +You can also use `NXF_VER` to switch to any edge release: + +```bash +$ nextflow info +``` + +## Standalone distribution + +Nextflow has a set of {ref}`core plugins ` which are downloaded at runtime by default. 
There is also a standalone distribution (i.e. the `all` distribution) which comes pre-packaged with all core plugins. This distribution is mainly useful for offline environments. + +The installer for the `all` distribution can be found on the [GitHub releases page](https://github.com/nextflow-io/nextflow/releases), under the "Assets" section for a specific release. The installation procedure is the same as for the standard distribution, only using this URL instead of `https://get.nextflow.io`: + +```bash +export NXF_VER=23.10.0 +curl -s https://github.com/nextflow-io/nextflow/releases/download/v$NXF_VER/nextflow-$NXF_VER-all +``` + +:::{warning} +The `all` distribution does not support third-party plugins. Only the {ref}`core plugins ` are supported. +::: diff --git a/docs/basic.md b/docs/overview.md similarity index 74% rename from docs/basic.md rename to docs/overview.md index ca71302697..cc9e2b9ab0 100644 --- a/docs/basic.md +++ b/docs/overview.md @@ -1,10 +1,14 @@ -# Basic concepts +(overview-page)= -Nextflow is a reactive workflow framework and a programming [DSL](http://en.wikipedia.org/wiki/Domain-specific_language) that eases the writing of data-intensive computational pipelines. +# Overview -It is designed around the idea that the Linux platform is the lingua franca of data science. Linux provides many simple but powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. +## Why Nextflow? -Nextflow extends this approach, adding the ability to define complex program interactions and a high-level parallel computational environment based on the *dataflow* programming model. +The rise of big data has made it increasingly necessary to be able to analyze and perform experiments on large datasets in a portable and reproducible manner. 
Parallelization and distributed computing are the best ways to tackle this challenge, but the tools commonly available to computational scientists often lack good support for these techniques, or they provide a model that fits poorly with the needs of computational scientists and often require knowledge of complex tools and APIs. Nextflow was created to address these challenges. + +The Nextflow language is inspired by [the Unix philosophy](https://en.wikipedia.org/wiki/Unix_philosophy), in which many simple command line tools can be chained together into increasingly complex tasks. Similarly, a Nextflow script consists of composing many simple processes into increasingly complex pipelines. Each process executes a given tool or scripting language, and by specifying the process inputs and outputs, Nextflow coordinates the execution of tasks for you. + +The Nextflow runtime integrates with many popular execution platforms (HPC schedulers, cloud providers) and software tools (Git, Docker, Conda), allowing you to fully describe a computational pipeline with all of its dependencies and run it in nearly any environment -- write once, run anywhere. 
## Processes and channels @@ -71,18 +75,19 @@ In other words, Nextflow provides an abstraction between the pipeline's function The following batch schedulers are supported: -- [Open grid engine](http://gridscheduler.sourceforge.net/) -- [Univa grid engine](http://www.univa.com/) +- [Open Grid Engine](http://gridscheduler.sourceforge.net/) +- [Univa Grid Engine](http://www.univa.com/) - [Platform LSF](http://www.ibm.com/systems/technicalcomputing/platformcomputing/products/lsf/) -- [Linux SLURM](https://computing.llnl.gov/linux/slurm/) +- [SLURM](https://computing.llnl.gov/linux/slurm/) - [Flux Framework](https://flux-framework.org/) -- [PBS Works](http://www.pbsworks.com/gridengine/) +- [PBS](http://www.pbsworks.com/gridengine/) - [Torque](http://www.adaptivecomputing.com/products/open-source/torque/) - [HTCondor](https://research.cs.wisc.edu/htcondor/) The following cloud platforms are supported: - [Amazon Web Services (AWS)](https://aws.amazon.com/) +- [Microsoft Azure](https://azure.microsoft.com/) - [Google Cloud Platform (GCP)](https://cloud.google.com/) - [Kubernetes](https://kubernetes.io/) @@ -96,10 +101,6 @@ Nextflow scripting is an extension of the [Groovy programming language]( - - - ## Configuration options Pipeline configuration properties are defined in a file named `nextflow.config` in the pipeline execution directory. diff --git a/docs/plugins.md b/docs/plugins.md index cf6fa4aa95..3d1e4200ef 100644 --- a/docs/plugins.md +++ b/docs/plugins.md @@ -4,6 +4,8 @@ Nextflow has a plugin system that allows the use of extensible components that are downloaded and installed at runtime. +(plugins-core)= + ## Core plugins The following functionalities are provided via plugin components, and they make part of the Nextflow *core* plugins: @@ -377,7 +379,7 @@ nextflow run -plugins nf-hello To use Nextflow plugins in an offline environment: -1. {ref}`Download Nextflow ` and install it on a system with an internet connection. 
Do not use the "all" package, as this does not allow the use of custom plugins. +1. {ref}`Download Nextflow ` and install it on a system with an internet connection. Do not use the "all" package, as this does not allow the use of custom plugins. 2. Download any additional plugins by running `nextflow plugin install `. Alternatively, simply run your pipeline once and Nextflow will download all of the plugins that it needs. diff --git a/docs/your-first-script.md b/docs/your-first-script.md new file mode 100644 index 0000000000..6b8cfcf10a --- /dev/null +++ b/docs/your-first-script.md @@ -0,0 +1,115 @@ +(your-first-script)= + +# Your first script + +## Run a pipeline + +This script defines two processes. The first splits a string into 6-character chunks, writing each one to a file with the prefix `chunk_`, and the second receives these files and transforms their contents to uppercase letters. The resulting strings are emitted on the `result` channel and the final output is printed by the `view` operator. Copy the following example into your favorite text editor and save it to a file named `tutorial.nf`: + +```{literalinclude} snippets/your-first-script.nf +:language: groovy +``` + +Execute the script by entering the following command in your terminal: + +```console +$ nextflow run tutorial.nf + +N E X T F L O W ~ version 23.10.0 +executor > local (3) +[69/c8ea4a] process > splitLetters [100%] 1 of 1 ✔ +[84/c8b7f1] process > convertToUpper [100%] 2 of 2 ✔ +HELLO +WORLD! +``` + +:::{note} +For versions of Nextflow prior to `22.10.0`, you must explicitly enable DSL2 by adding `nextflow.enable.dsl=2` to the top of the script or by using the `-dsl2` command-line option. +::: + +You can see that the first process is executed once, and the second twice. Finally the result string is printed. 
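For readers who want to see the two steps outside of Nextflow, here is a rough plain-shell sketch of what the text describes: split a string into 6-byte chunks with the prefix `chunk_`, then convert each chunk to uppercase. The function name `split_and_upper` and the use of `split` and `tr` are assumptions made for illustration; the actual commands in `snippets/your-first-script.nf` are not shown in this diff and may differ.

```bash
#!/usr/bin/env bash
# Plain-shell sketch of the two steps described above.
# Hypothetical helper, not part of Nextflow or of the tutorial script.
split_and_upper() {
    local input="$1" tmp_dir chunk
    tmp_dir="$(mktemp -d)"
    # Step 1: split the string into 6-byte chunks named chunk_aa, chunk_ab, ...
    printf '%s' "$input" | (cd "$tmp_dir" && split -b 6 - chunk_)
    # Step 2: convert each chunk to uppercase, printing one chunk per line.
    for chunk in "$tmp_dir"/chunk_*; do
        tr '[:lower:]' '[:upper:]' < "$chunk"
        printf '\n'
    done
    # Remove the temporary chunk files.
    rm -rf "$tmp_dir"
}
```

Calling `split_and_upper 'Hello world!'` prints the uppercased chunks one per line. Unlike Nextflow, this loop handles the chunks one after another in sorted order, so the output order is always the same.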
+ +It's worth noting that the process `convertToUpper` is executed in parallel, so there's no guarantee that the instance processing the first split (the chunk `Hello`) will be executed before the one processing the second split (the chunk `world!`). Thus, you may see the final result printed in a different order: + +``` +WORLD! +HELLO +``` + +:::{tip} +The hexadecimal string, e.g. `22/7548fa`, is the unique hash of a task, and the prefix of the directory where the task is executed. You can inspect a task's files by changing to the directory `$PWD/work` and using this string to find the specific task directory. +::: + +(getstarted-resume)= + +## Modify and resume + +Nextflow keeps track of all the processes executed in your pipeline. If you modify some parts of your script, only the processes that are actually changed will be re-executed. The execution of the processes that are not changed will be skipped and the cached result used instead. This helps a lot when testing or modifying part of your pipeline without having to re-execute it from scratch. + +For the sake of this tutorial, modify the `convertToUpper` process in the previous example, replacing the process script with the string `rev $x`, like so: + +```groovy +process convertToUpper { + input: + path x + output: + stdout + + """ + rev $x + """ +} +``` + +Then save the file with the same name, and execute it by adding the `-resume` option to the command line: + +```bash +nextflow run tutorial.nf -resume +``` + +It will print output similar to this: + +``` +N E X T F L O W ~ version 23.10.0 +executor > local (2) +[69/c8ea4a] process > splitLetters [100%] 1 of 1, cached: 1 ✔ +[d0/e94f07] process > convertToUpper [100%] 2 of 2 ✔ +olleH +!dlrow +``` + +You will see that the execution of the process `splitLetters` is skipped (its task hash is the same as in the previous run), and its results are retrieved from the cache. The second process is executed as expected, printing the reversed strings.
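The short task hash shown in the log (for example `69/c8ea4a`) is a prefix of the task directory under the work directory, so a shell glob is enough to resolve it to the full path. The helper below is a minimal sketch; `find_task_dir` is a hypothetical name and is not a Nextflow command.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Nextflow): resolve a short task hash such
# as "69/c8ea4a" to the full task directory under the given work directory.
find_task_dir() {
    local work_dir="$1" short_hash="$2" match
    # The short hash is a path prefix, so expand it with a glob.
    for match in "$work_dir"/"$short_hash"*/; do
        # When nothing matches, the pattern stays literal and is skipped here.
        if [ -d "$match" ]; then
            printf '%s\n' "${match%/}"
        fi
    done
}
```

For example, `cd "$(find_task_dir "$PWD/work" 69/c8ea4a)"` changes into the directory of the cached `splitLetters` task, assuming exactly one directory matches the prefix.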
+ +:::{tip} +The pipeline results are cached by default in the directory `$PWD/work`. Depending on your script, this folder can take up a lot of disk space. It's a good idea to clean this folder periodically, as long as you know you won't need to resume any pipeline runs. +::: + +For more information, see the {ref}`cache-resume-page` page. + +(getstarted-params)= + +## Pipeline parameters + +Pipeline parameters are declared by prefixing a variable name with `params`, separated by a dot character (for example, `params.str`). Their values can be specified on the command line by prefixing the parameter name with a double dash, e.g. `--paramName`. + +For the sake of this tutorial, you can try to execute the previous example specifying a different input string parameter, as shown below: + +```bash +nextflow run tutorial.nf --str 'Bonjour le monde' +``` + +The string specified on the command line will override the default value of the parameter. The output will look like this: + +``` +N E X T F L O W ~ version 23.10.0 +executor > local (4) +[8b/16e7d7] process > splitLetters [100%] 1 of 1 ✔ +[eb/729772] process > convertToUpper [100%] 3 of 3 ✔ +m el r +edno +uojnoB +``` + +:::{versionchanged} 20.11.0-edge +Any `.` (dot) character in a parameter name is interpreted as the delimiter of a nested scope. For example, `--foo.bar Hello` will be interpreted as `params.foo.bar`. If you want to have a parameter name that contains a `.` (dot) character, escape it using the backslash character, e.g. `--foo\.bar Hello`. +:::