## Homework

Download the reads of all experiments (SRP127360) from the article <a href="https://www.nature.com/articles/s41598-018-23226-4"><b>Evaluation of two main RNA-seq approaches for gene quantification in clinical RNA sequencing: polyA+ selection versus rRNA depletion</b></a>, which compares two library preparation protocols for bulk RNA-Seq: rRNA depletion (=<code>total</code>) and selection of polyA transcripts (=<code>polyA+</code>).

Use a Nextflow pipeline to complete the assignment.

## Key data for the assignment

Accession number of the study: SRP127360.

Total RNA-seq of colon: SRR6410607,SRR6410608,SRR6410609,SRR6410610,SRR6410615,SRR6410616,SRR6410617,SRR6410618.

polyA+ mRNA-seq of colon: SRR6410603,SRR6410604,SRR6410605,SRR6410606,SRR6410611,SRR6410612,SRR6410613,SRR6410614.
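The two run lists above can be turned into a small sample sheet that records which protocol each run belongs to. The following is a minimal sketch; the column names `run` and `protocol` and the helper `sample_sheet` are illustrative choices, not part of the assignment.

```python
import csv
import io

# Run accessions copied from the two lists above.
TOTAL_RUNS = (
    "SRR6410607,SRR6410608,SRR6410609,SRR6410610,"
    "SRR6410615,SRR6410616,SRR6410617,SRR6410618"
).split(",")
POLYA_RUNS = (
    "SRR6410603,SRR6410604,SRR6410605,SRR6410606,"
    "SRR6410611,SRR6410612,SRR6410613,SRR6410614"
).split(",")

# Map every run accession to its library preparation protocol.
protocol_by_run = {run: "total" for run in TOTAL_RUNS}
protocol_by_run.update({run: "polyA+" for run in POLYA_RUNS})


def sample_sheet(mapping):
    """Render the run-to-protocol mapping as CSV text (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["run", "protocol"])
    for run in sorted(mapping):
        writer.writerow([run, mapping[run]])
    return buffer.getvalue()


print(sample_sheet(protocol_by_run))
```

Keeping the protocol next to each accession makes it easier to label samples later, for example when comparing QC reports between the two protocols.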

## SRA Toolkit

Download SRA Toolkit, a set of tools for fetching runs from the Sequence Read Archive and converting them to .fastq files.

In [None]:
!wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/3.0.0/sratoolkit.3.0.0-ubuntu64.tar.gz
!tar -xvzf sratoolkit.3.0.0-ubuntu64.tar.gz

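Configure SRA Toolkit. `vdb-config -i` opens an interactive text interface, so it may fail in a notebook cell that has no terminal attached, as the crash below suggests; `fasterq-dump` still runs afterwards.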

In [None]:
!sratoolkit.3.0.0-ubuntu64/bin/vdb-config -i

2022-11-18T18:07:08 vdb-config.3.0.0 fatal: SIGNAL - Segmentation fault


In [None]:
!sratoolkit.3.0.0-ubuntu64/bin/fasterq-dump SRR3900953

spots read      : 4,941,237
reads read      : 9,882,474
reads written   : 9,882,474
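The summary above can be checked with a little arithmetic: dividing the number of reads by the number of spots gives the reads per spot, and a value of 2 is what one would expect for a paired-end run. A minimal sketch, where `parse_fasterq_summary` is a hypothetical helper that parses the three lines printed above:

```python
def parse_fasterq_summary(text):
    """Parse 'key : 1,234' lines into a dict of integer counts."""
    counts = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        counts[key.strip()] = int(value.strip().replace(",", ""))
    return counts


# Summary copied from the fasterq-dump output above.
summary = """\
spots read      : 4,941,237
reads read      : 9,882,474
reads written   : 9,882,474
"""

counts = parse_fasterq_summary(summary)
reads_per_spot = counts["reads read"] / counts["spots read"]
print(reads_per_spot)
```

Equal values for `reads read` and `reads written` also indicate that no reads were dropped during conversion.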


In [None]:
!tail SRR3900953_1.fastq

+SRR3900953.4941235 8_2316_17189_101340_2 length=60
CCCCCGGGCGGGGGGGGGCGGGGGEG@@FGGGGGGF>FGEFGGG>EGAGBFGGCF>GBG1
@SRR3900953.4941236 8_2316_17335_101333_2 length=60
CATGCGGTTTGGATGTGTTTGTTGAATGCAAGCCTGTGGAGGCGTTAACGTCTCAGTTAC
+SRR3900953.4941236 8_2316_17335_101333_2 length=60
BBCCCEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@SRR3900953.4941237 8_2316_17416_101338_2 length=60
CATGCCTGGTGTAAGGAAAATATCTGAGAACCGTCAGTGCAAAATCCATGCAATGTGGCC
+SRR3900953.4941237 8_2316_17416_101338_2 length=60
BCCBCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEG
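Each FASTQ record in the output above spans four lines: an `@` header, the sequence, a `+` separator line, and one quality character per base. The sketch below parses such records and computes a mean quality score; it assumes the common Phred+33 encoding, which the source does not state explicitly, and the function names are illustrative.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) tuples from FASTQ lines."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header[1:].split()[0], seq, qual


def mean_phred(quality, offset=33):
    """Mean quality score, assuming Phred+offset encoding (33 is an assumption)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


# One record taken from the tail output above; the quality string is
# "BBCCCE" followed by 54 "G" characters (60 in total).
record = [
    "@SRR3900953.4941236 8_2316_17335_101333_2 length=60",
    "CATGCGGTTTGGATGTGTTTGTTGAATGCAAGCCTGTGGAGGCGTTAACGTCTCAGTTAC",
    "+SRR3900953.4941236 8_2316_17335_101333_2 length=60",
    "BBCCCE" + "G" * 54,
]

for read_id, seq, qual in parse_fastq(record):
    print(read_id, len(seq), round(mean_phred(qual), 1))
```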


## Nextflow

Install Nextflow.

In [None]:
!curl -fsSL https://get.nextflow.io | bash

      N E X T F L O W
      version 22.10.2 build 5832
      created 13-11-2022 18:13 UTC 
      cite doi:10.1038/nbt.3820
      http://nextflow.io


Nextflow installation completed. Please note:
- the executable file `nextflow` has been created in the folder: /content
- you may complete the installation by moving it to a directory in your $PATH



### First version of the Nextflow pipeline

The first version of the script downloads the .fastq files for a given SRA accession and saves them in `/content/results/reads`.

```
params.SRA = "SRR000000"
params.results_dir = "results/"

log.info ""
log.info "  Q U A L I T Y   C O N T R O L  "
log.info "================================="
log.info "SRA number         : ${params.SRA}"
log.info "Results location   : ${params.results_dir}"

process DownloadFastQ {
  publishDir "${params.results_dir}"

  output:
    path "reads/*"

  script:
    """
    /content/sratoolkit.3.0.0-ubuntu64/bin/fasterq-dump ${params.SRA} -O reads/
    """
}

workflow {
  DownloadFastQ()
}
```

In [None]:
!rm -r results
!./nextflow run my_pipeline_ver_1.nf --SRA SRR3900953

rm: cannot remove 'results': No such file or directory
N E X T F L O W  ~  version 22.10.2
Launching `my_pipeline_ver_1.nf` [distracted_rosalind] DSL2 - revision: 00b75134cc

  Q U A L I T Y   C O N T R O L  
SRA number         : SRR3900953
Results location   : results/
executor >  local (1)
[85/779691] process > DownloadFastQ [100%] 1 of 1 ✔



### Second version of the Nextflow pipeline

This version adds a step that generates FastQC reports for all downloaded reads.

```
params.SRA = "SRR000000"
params.results_dir = "results/"

log.info ""
log.info "  Q U A L I T Y   C O N T R O L  "
log.info "================================="
log.info "SRA number         : ${params.SRA}"
log.info "Results location   : ${params.results_dir}"

process DownloadFastQ {
  publishDir "${params.results_dir}"

  output:
    path "reads/*"

  script:
    """
    /content/sratoolkit.3.0.0-ubuntu64/bin/fasterq-dump ${params.SRA} -O reads/
    """
}

process QC {
  publishDir "${params.results_dir}"

  input:
    path x

  output:
    path "qc/*.html"

  script:
    """
    mkdir qc
    /content/FastQC/fastqc -o qc $x
    """
}

workflow {
  DownloadFastQ()
  QC( DownloadFastQ.out.collect() )
}
```

#### Downloading and installing FastQC

In [None]:
!wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip
!unzip fastqc_v0.11.9.zip
!chmod +x FastQC/fastqc
!mkdir qc
!FastQC/fastqc -o qc SRR3900953_1.fastq SRR3900953_2.fastq
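FastQC writes a `<name>_fastqc.html` report and a `<name>_fastqc.zip` archive for each input file into the `-o` directory. Before passing that directory to a downstream tool, it can help to confirm that the reports were actually produced. The following is a minimal sketch; `fastqc_reports` is a hypothetical helper, not part of FastQC.

```python
from pathlib import Path


def fastqc_reports(qc_dir):
    """Return sample names that have both FastQC outputs (.zip and .html)."""
    qc_dir = Path(qc_dir)
    if not qc_dir.is_dir():
        return []
    suffix = "_fastqc.zip"
    samples = []
    for zip_path in sorted(qc_dir.glob("*" + suffix)):
        sample = zip_path.name[: -len(suffix)]
        if (qc_dir / f"{sample}_fastqc.html").exists():
            samples.append(sample)
    return samples


found = fastqc_reports("qc")
if found:
    print("FastQC reports found for:", ", ".join(found))
else:
    print("No FastQC reports in qc/; check the input file names.")
```

An empty result usually points to a problem with the input paths, for example a file extension that does not match what `fasterq-dump` wrote.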

#### Running the second version of the script

In [None]:
!rm -r results
!./nextflow run my_pipeline_ver_2.nf --SRA SRR3900953

N E X T F L O W  ~  version 22.10.2
Launching `my_pipeline_ver_2.nf` [ridiculous_raman] DSL2 - revision: 4457d4ca78

  Q U A L I T Y   C O N T R O L  
SRA number         : SRR3900953
Results location   : results/
executor >  local (2)
[49/2dac8d] process > DownloadFastQ [100%] 1 of 1 ✔
[4d/00e925] process > QC            [100%] 1 of 1

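### Third version of the Nextflow pipeline

This version accepts a comma-separated list of SRA accessions in `--SRA`, runs FastQC on each run, and combines the reports with MultiQC.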

```
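params.SRA = "SRR000000"   // placeholder default, as in the earlier versions; override with --SRA
params.results_dir = "results/"
// --SRA takes one accession or a comma-separated list of accessions
SRA_list = params.SRA.split(",")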

log.info ""
log.info "  Q U A L I T Y   C O N T R O L  "
log.info "================================="
log.info "SRA number         : ${SRA_list}"
log.info "Results location   : ${params.results_dir}"

process DownloadFastQ {
  publishDir "${params.results_dir}"

  input:
    val sra

  output:
    path "${sra}/*"

  script:
    """
    /content/sratoolkit.3.0.0-ubuntu64/bin/fasterq-dump ${sra} -O ${sra}/
    """
}

process QC {
  input:
    path x

  output:
    path "qc/*"

  script:
    """
    mkdir qc
    /content/FastQC/fastqc -o qc $x
    """
}

process MultiQC {
  publishDir "${params.results_dir}"

  input:
    path x

  output:
    path "multiqc_report.html"

  script:
    """
    multiqc $x
    """
}

workflow {
  data = Channel.of( SRA_list )
  DownloadFastQ(data)
  QC( DownloadFastQ.out )
  MultiQC( QC.out.collect() )
}
```
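Because the script above splits `params.SRA` on commas, all sixteen colon runs of SRP127360 can in principle be passed in a single invocation. The sketch below only assembles and prints that command line; it relies on the run accessions forming the contiguous range SRR6410603 to SRR6410618, and it does not start the pipeline.

```python
import shlex

# Accessions SRR6410603..SRR6410618 cover both the total and polyA+ runs.
runs = [f"SRR{number}" for number in range(6410603, 6410619)]

# Same invocation style as the notebook cells, with a comma-separated list.
command = ["./nextflow", "run", "my_pipeline_ver_3.nf", "--SRA", ",".join(runs)]
print(shlex.join(command))
```

The printed line can be pasted into a notebook cell after a leading `!`.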

#### Downloading and installing MultiQC

In [None]:
!pip3 install multiqc -q
!multiqc qc

  Building wheel for spectra (setup.py) ... done
  Building wheel for colormath (setup.py) ... done
  Building wheel for lzstring (setup.py) ... done

  /// MultiQC 🔍 | v1.13

|           multiqc | Search path : /content/qc
|         searching | 0% 0/0
|           multiqc | No analysis results found. Cleaning up..
|           multiqc | MultiQC complete


#### Running the third version of the script

In [None]:
!rm -r results
!./nextflow run my_pipeline_ver_3.nf --SRA SRR6410604

N E X T F L O W  ~  version 22.10.2
Launching `my_pipeline_ver_3.nf` [jovial_mahavira] DSL2 - revision: 30656d3ce8

  Q U A L I T Y   C O N T R O L  
SRA number         : [SRR6410604]
Results location   : results/
executor >  local (2)
[7e/51da7a] process > DownloadFastQ (1) [100%] 1 of 1 ✔
[82/d847b6] process > QC (1)            [  0%] 0 of 1
[-        ] process > MultiQC           -
