# Introduction to Nextflow Lab

### 1. Set up your environment

Install Mambaforge.

In [None]:
! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh
! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge

In [None]:
! rm Mambaforge-$(uname)-$(uname -m).sh

In [None]:
# Add Mambaforge to your PATH. You need to redo this each session; if your window times out, rerun this cell.
import os
os.environ["PATH"] += os.pathsep + os.environ["HOME"]+"/mambaforge/bin"
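
Because the cell above appends to `PATH` every time it runs, rerunning it after a timeout adds the same directory again. This is harmless but clutters the variable. A minimal sketch of an idempotent version follows; the helper name `add_to_path` is made up for this example, and it assumes the `$HOME/mambaforge` install prefix used above.

```python
import os

def add_to_path(directory, environ=os.environ):
    """Append directory to PATH only if it is not already listed."""
    entries = [e for e in environ.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        entries.append(directory)
        environ["PATH"] = os.pathsep.join(entries)
    return environ["PATH"]

# Assumption: Mambaforge was installed to $HOME/mambaforge, as in the cell above.
add_to_path(os.path.join(os.path.expanduser("~"), "mambaforge", "bin"))
```

Running this cell several times leaves a single `mambaforge/bin` entry in `PATH`.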

Install nf-core tools, Nextflow, and SRA Tools (used to download data from the SRA database).

In [None]:
!mamba install -c bioconda nf-core nextflow sra-tools -y

In [None]:
! nextflow -version

### 2. Clone an nf-core repository

Download an example nf-core repo and look over how it is organized. Look at main.nf, the workflows dir, and the modules dir.

In [None]:
! git clone https://github.com/nf-core/rnaseq.git

### 3. Run RNAseq and look at the work dir

In [None]:
! nextflow run nf-core/rnaseq -profile test,docker --outdir rnaseq_out/

In [None]:
! ls work/e4/69d7f6*
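
The hash in the path above (`e4/69d7f6`) belongs to one particular run, so the task directory will have a different name on your machine. The sketch below is one way to find your own task directories and check which of the hidden files Nextflow writes there (`.command.sh`, `.command.run`, `.command.log`, `.exitcode`) are present. The helper names `list_task_dirs` and `summarize_task` are made up for this example, and it assumes the default `work` directory in the folder where you launched Nextflow.

```python
from pathlib import Path

# Hidden files Nextflow writes into each task directory.
TASK_FILES = [".command.sh", ".command.run", ".command.log", ".exitcode"]

def list_task_dirs(work_dir="work"):
    """Return task directories (work/<2 hex chars>/<rest of hash>), newest first."""
    root = Path(work_dir)
    if not root.is_dir():
        return []
    # Only two-character hex prefixes, so folders such as work/conda are skipped.
    dirs = [d for d in root.glob("[0-9a-f][0-9a-f]/*") if d.is_dir()]
    return sorted(dirs, key=lambda d: d.stat().st_mtime, reverse=True)

def summarize_task(task_dir):
    """Map each well-known task file name to whether it exists in task_dir."""
    return {name: (Path(task_dir) / name).exists() for name in TASK_FILES}

# Show the five most recently modified task directories, if any exist.
for task in list_task_dirs()[:5]:
    print(task, summarize_task(task))
```

Opening `.command.sh` in one of the listed directories shows the exact script that Nextflow ran for that task.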

### 4. Create an nf-core template repository

Now let's create a repo from the nf-core template and start building out a workflow from it.

In [None]:
! nf-core create -h

In [None]:
! nf-core create -n nfcoretutorial -o nfcore-tutorial -d 'This repo is a demo of the nf core template' -a 'William Welch Deloitte' --plain

Now we have a template. The template has a lot of extra stuff, so if you were doing this for a production pipeline, you would need to go through each directory carefully and clean things up.
The goal now is to build a simple variant calling workflow for mpox viral sequences.

### 5. Set up your input data

In [None]:
cd nfcore-tutorial

In [None]:
! cp ../assets/ON56* assets/
! cp ../assets/samplesheet.csv assets/
! cp ../assets/illumina_adapters.fasta assets/

In [None]:
! fasterq-dump -f -O data -e 8 SRR23873775

In [None]:
!gzip data/SRR23873775_1.fastq
!gzip data/SRR23873775_2.fastq

In [None]:
ls data/

### 6. Add your first modules
Instructions can be [found here](https://nf-co.re/docs/contributing/tutorials/adding_modules_to_pipelines).
Modules are the building blocks of a workflow. Here we will add four modules that:
+ Clean our data with fastp
+ Index our reference with bwa
+ Align to the reference sequence with bwa mem
+ Call variants with iVar

In [None]:
! nf-core modules install fastp
! nf-core modules install bwa/index
! nf-core modules install bwa/mem
! nf-core modules install ivar/variants

Note that the module files are added under modules/nf-core/. Along the same lines, if you need to develop a custom module to run a Python or bash script, you can create a module that calls your script and put it under modules/local.

The output also gives you the include statements you need to add to the workflow file: 
```
include { FASTP                       } from '../modules/nf-core/fastp/main'
include { BWA_INDEX                   } from '../modules/nf-core/bwa/index/main'
include { BWA_MEM                     } from '../modules/nf-core/bwa/mem/main'
include { IVAR_VARIANTS               } from '../modules/nf-core/ivar/variants/main'
```

Next, we need to call these modules from within the workflow. Add the FASTP call to your workflow file below the existing FASTQC call, like this:
```
    //
    // MODULE: Run FastQC
    //
    FASTQC (
        INPUT_CHECK.out.reads
    )
    ch_versions = ch_versions.mix(FASTQC.out.versions.first())

    CUSTOM_DUMPSOFTWAREVERSIONS (
        ch_versions.unique().collectFile(name: 'collated_versions.yml')
    )

    //
    // MODULE: Run Fastp
    //
    FASTP (
        READS
    )
    ch_versions = ch_versions.mix(FASTP.out.versions.first())

    CUSTOM_DUMPSOFTWAREVERSIONS (
        ch_versions.unique().collectFile(name: 'collated_versions.yml')
    )
```

Note what we are doing here. We call the module and pass it the inputs declared in the modules/nf-core/fastp/main.nf file. For the output channels, we look at the same main.nf file to see what is emitted. In this case we want the trimmed reads, so the output channel is FASTP.out.reads.

Now try to do the same thing for BWA_INDEX, BWA_MEM, and IVAR_VARIANTS. If you get stuck, the answers are below.

If you get an error like ``Process `NFCORE_TUTORIAL:TUTORIAL:IVAR_VARIANTS` declares 5 input channels but 1 were specified``, it just means that you need to pass more inputs in the workflow call, because the main.nf for that process expects more values. Some of these inputs are flags that switch an optional output on or off, so you need to pass `true` or `false` for them in the call.
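
To see how many inputs a module expects before wiring it into the workflow, you can count the declarations in its `input:` block. The sketch below is a rough text scan, not a Nextflow parser, and the helper name `count_inputs` is made up for this example. The sample string is an abbreviated excerpt in the shape of the fastp module.

```python
def count_inputs(module_text):
    """Count declarations between the `input:` label and the next block label."""
    count = 0
    in_block = False
    for raw in module_text.splitlines():
        line = raw.strip()
        if line == "input:":
            in_block = True
        elif line in ("output:", "when:", "script:"):
            in_block = False
        elif in_block and line and not line.startswith("//"):
            # Each non-empty, non-comment line is treated as one input declaration.
            count += 1
    return count

# Abbreviated, illustrative excerpt of a module file.
sample = """
process FASTP {
    input:
    tuple val(meta), path(reads)
    path  adapter_fasta
    val   save_trimmed_fail
    val   save_merged

    output:
    tuple val(meta), path('*.fastp.fastq.gz'), emit: reads
}
"""
print(count_inputs(sample))  # 4
```

On a real module you could pass in the file contents instead, for example `count_inputs(open("modules/nf-core/fastp/main.nf").read())`. The call in your workflow then needs that many arguments.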


<details id=0>
<summary><h2>Answers</h2></summary>

    //
    // MODULE: Run bwa index
    //
    BWA_INDEX (
        params.reference
    )
    ch_versions = ch_versions.mix(BWA_INDEX.out.versions.first())
    ch_index = BWA_INDEX.out.index

    CUSTOM_DUMPSOFTWAREVERSIONS (
        ch_versions.unique().collectFile(name: 'collated_versions.yml')
    )
    
    //
    // MODULE: Run bwa mem
    //
    BWA_MEM (
        ch_index
    )
    ch_versions = ch_versions.mix(BWA_MEM.out.versions.first())
    ch_bam = BWA_MEM.out.bam

    CUSTOM_DUMPSOFTWAREVERSIONS (
        ch_versions.unique().collectFile(name: 'collated_versions.yml')
    )

    //
    // MODULE: Run IVAR Variants
    //
    IVAR_VARIANTS (
        ch_bam
    )
    ch_versions = ch_versions.mix(IVAR_VARIANTS.out.versions.first())
    ch_variants = IVAR_VARIANTS.out.tsv

    CUSTOM_DUMPSOFTWAREVERSIONS (
        ch_versions.unique().collectFile(name: 'collated_versions.yml')
    )


</details>

You also need to add a few parameters to get these modules to work. You can add default params in the nextflow.config file to point to the files we moved into the assets dir. Your References section should look like this: 
```
    // References
    fasta                      = "${launchDir}/assets/ON563414_mpox_reference.fasta"
    fai                        = "${launchDir}/assets/ON563414_mpox_reference.fasta.fai"
    gff                        = "${launchDir}/assets/ON563414_mpox_reference.gff"
    adapters                   = "${launchDir}/assets/illumina_adapters.fasta"
    genome                     = null
```

Also add the following output channel after INPUT_CHECK:

```
ch_reads = INPUT_CHECK.out.reads
```


### 7. Run our workflow

In [None]:
!nextflow run main.nf -profile docker --input assets/samplesheet.csv --outdir results -resume

### 8. Troubleshoot our workflow

You should get an error for FASTP about a mismatch in the number of input channels. This is because the module's main.nf file requires four input channels, but you specified only one in the main workflow. So you need to edit nfcoretutorial.nf to pass four channels. For example, the input block of the fastp module looks like this:
```
    tuple val(meta), path(reads)
    path  adapter_fasta
    val   save_trimmed_fail
    val   save_merged
```
You need to pass the adapter FASTA and then pass `false` for each of the other two inputs, like this. Then rerun the cell above:
```
    FASTP (
        ch_reads,
        params.adapters,
        false,
        false
    )
```

You need to change the BWA_INDEX input channel, changing `reference` to `fasta`:

```
    BWA_INDEX (
        params.fasta
    )
```

Make sure the following block appears only once, at the very end of the workflow:

```
    CUSTOM_DUMPSOFTWAREVERSIONS (
        ch_versions.unique().collectFile(name: 'collated_versions.yml')
    )
```
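
A quick way to confirm that a process is called only once is to count its call sites in the workflow file. The sketch below uses a regular expression for this. The helper name `count_calls` is made up for this example, and the `demo` string stands in for the contents of your workflow file.

```python
import re

def count_calls(workflow_text, process_name):
    """Count lines that begin with `process_name (`, i.e. process call sites."""
    pattern = re.compile(
        r"^[ \t]*" + re.escape(process_name) + r"\s*\(", re.MULTILINE
    )
    return len(pattern.findall(workflow_text))

# Illustrative stand-in for a workflow file with a duplicated call.
demo = """
include { CUSTOM_DUMPSOFTWAREVERSIONS } from '../modules/nf-core/custom/dumpsoftwareversions/main'

    FASTP (
        ch_reads
    )
    CUSTOM_DUMPSOFTWAREVERSIONS (
        ch_versions.unique().collectFile(name: 'collated_versions.yml')
    )
    CUSTOM_DUMPSOFTWAREVERSIONS (
        ch_versions.unique().collectFile(name: 'collated_versions.yml')
    )
"""
print(count_calls(demo, "CUSTOM_DUMPSOFTWAREVERSIONS"))  # 2
```

The `include` line is not counted because it does not start with the process name. A result greater than 1 means there are duplicate blocks left to delete.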

If you get an error about the versions that is related to `first`, replace the version lines with the form below, which drops `.first()`:

```
    ch_versions = ch_versions.mix(BWA_MEM.out.versions)
```


### 9. Giving up in despair and just using the premade nf-core template

If you are running in GitHub Codespaces, memory is limited, so unfortunately the pipeline will not run to completion there. We included a more complete version of the workflow, called nfcore-tutorial-template, that gets further but still does not run all the way through. That said, hopefully at this point you have a better understanding of how the pieces of a workflow fit together.