Hands-On Tutorial - "Nextflow DSL1 is no longer supported — Update your script to DSL2" #255

Closed
felixm3 opened this issue Aug 14, 2023 · 8 comments

Comments

@felixm3

felixm3 commented Aug 14, 2023

Hello,

I'm new to Nextflow and trying to run the Hands-On Tutorial here: https://training.nextflow.io/hands_on/

I'm getting this message though:


N E X T F L O W  ~  version 23.04.1
Nextflow DSL1 is no longer supported — Update your script to DSL2

How do I update my script to DSL2 please?

Thanks in advance.

@mribeirodantas
Member

Hello, @felixm3. This material is very outdated. Maybe we should take it down if there isn't time to update it... 😢

But to answer your question, the official Nextflow documentation has a section on DSL2 here.

@felixm3
Author

felixm3 commented Aug 15, 2023

I see.

Is there an updated training somewhere else that I can use to learn Nextflow, please?

@mribeirodantas
Member

Yes. There are two trainings here: the community training, which is very complete and up to date, and the hands-on one that you're using.

@mribeirodantas
Member

I'm updating the hands-on workshop training, @felixm3. You can check the complete script converted to DSL2 in the final_main.nf file in this draft PR here.

@felixm3
Author

felixm3 commented Aug 16, 2023

Thank you for the updates!

It now appears to be running; however, halfway through it stops with an error:


% nextflow run main.nf  
N E X T F L O W  ~  version 23.04.1
Launching `main.nf` [grave_hugle] DSL2 - revision: 083c39247f
executor >  local (10)
[69/91f61c] process > prepare_genome_samtools   [100%] 1 of 1 ✔
[59/d23d44] process > prepare_genome_picard     [100%] 1 of 1 ✔
[ad/843fc6] process > prepare_star_genome_index [100%] 1 of 1 ✔
[da/c82cf6] process > prepare_vcf_file          [100%] 1 of 1 ✔
[62/118f6a] process > rnaseq_mapping_star (6)   [100%] 1 of 1, failed: 1
[-        ] process > rnaseq_gatk_splitNcigar   -
[-        ] process > rnaseq_gatk_recalibrate   -
[-        ] process > rnaseq_call_variants      -
[-        ] process > post_process_vcf          -
[-        ] process > prepare_vcf_for_ase       -
[-        ] process > ASE_knownSNPs             -
ERROR ~ Error executing process > 'rnaseq_mapping_star (2)'

Caused by:
  Process `rnaseq_mapping_star (2)` terminated with an error exit status (137)

Command executed:

  # ngs-nf-dev Align reads to genome
  STAR --genomeDir genome_dir          --readFilesIn ENCSR000CPO1_1.fastq.gz ENCSR000CPO1_2.fastq.gz          --runThreadN 1          --readFilesCommand zcat          --outFilterType BySJout          --alignSJoverhangMin 8          --alignSJDBoverhangMin 1          --outFilterMismatchNmax 999
  
  # 2nd pass (improve alignments using table of splice junctions and create a new index)
  mkdir genomeDir
  STAR --runMode genomeGenerate          --genomeDir genomeDir          --genomeFastaFiles genome.fa          --sjdbFileChrStartEnd SJ.out.tab          --sjdbOverhang 75          --runThreadN 1
  
  # Final read alignments
  STAR --genomeDir genomeDir          --readFilesIn ENCSR000CPO1_1.fastq.gz ENCSR000CPO1_2.fastq.gz          --runThreadN 1          --readFilesCommand zcat          --outFilterType BySJout          --alignSJoverhangMin 8          --alignSJDBoverhangMin 1          --outFilterMismatchNmax 999          --outSAMtype BAM SortedByCoordinate          --outSAMattrRGline ID:ENCSR000CPO1 LB:library PL:illumina PU:machine SM:GM12878
  
  # Index the BAM file
  samtools index Aligned.sortedByCoord.out.bam

Command exit status:
  137

Command output:
  Aug 16 16:55:56 ..... started STAR run
  Aug 16 16:55:56 ..... loading genome

Command error:
  WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
  /bin/bash: line 0: export: `Documents/com~apple~CloudDocs/Bioinformatics': not a valid identifie
  /bin/bash: line 0: export: `Research/nextflow/training/hands-on/bin': not a valid identifier
  Aug 16 16:55:56 ..... started STAR run
  Aug 16 16:55:56 ..... loading genome
  .command.sh: line 3:    11 Killed                  STAR --genomeDir genome_dir --readFilesIn ENCSR000CPO1_1.fastq.gz ENCSR000CPO1_2.fastq.gz --runThreadN 1 --readFilesCommand zcat --outFilterType BySJout --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999

Work dir:
  /Users/felixm/Library/Mobile Documents/com~apple~CloudDocs/Bioinformatics Research/nextflow/training/hands-on/work/a5/dba5978f207ddcaa1653f17a2dafd2

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details


I'm running this on a MacBook Pro with an Apple M2 Max (Apple Silicon).

@mribeirodantas
Member

Exit status 137 means the process was killed (128 + 9, i.e. SIGKILL), which here points to Docker not having enough memory. You can increase the limit in your Docker configuration (Docker Desktop), but you shouldn't really be running x86 Docker images on Apple Silicon. It's a pain: it requires emulation, is usually much slower, tends to freeze, and so on. I work on macOS too, and I always run my pipelines on Linux machines, such as in Gitpod. For learning, I'd strongly suggest you use Gitpod.
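
For reference, here is a minimal sketch of how an exit status like 137 can be read. By shell convention, a status above 128 means the command was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL), which matches the `Killed` line in the log. The function name `describe_exit_status` is hypothetical and is not part of Nextflow or Docker.

```shell
#!/bin/sh
# Sketch: interpret an exit status following the usual shell convention.
# describe_exit_status is a hypothetical helper used only for illustration.
describe_exit_status() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "success"
  elif [ "$status" -gt 128 ]; then
    # Statuses above 128 conventionally mean "terminated by signal (status - 128)".
    echo "terminated by signal $((status - 128))"
  else
    echo "failed with status $status"
  fi
}

describe_exit_status 137   # prints: terminated by signal 9
```

The signal alone does not say who sent it. A memory limit being hit inside the container is a common cause, but treat that as a likely explanation rather than a certainty.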

@felixm3
Author

felixm3 commented Aug 17, 2023

I'm not familiar with Gitpod but will definitely look it up. Thank you for bringing it to my attention.

I increased the memory in my Docker Desktop settings and you're right, it is quite slow.

It does run, however, but throws an error in the very last step...

Any idea how to get around this final error, please?

% nextflow run main.nf
N E X T F L O W  ~  version 23.04.1
Launching `main.nf` [distraught_fourier] DSL2 - revision: 083c39247f
executor >  local (27)
[b9/20b1d0] process > prepare_genome_samtools                [100%] 1 of 1 ✔
[d7/6088d8] process > prepare_genome_picard                  [100%] 1 of 1 ✔
[57/80f966] process > prepare_star_genome_index              [100%] 1 of 1 ✔
[92/e2fd78] process > prepare_vcf_file                       [100%] 1 of 1 ✔
[d4/c4b7bf] process > rnaseq_mapping_star (4)                [100%] 6 of 6 ✔
[cd/2b1040] process > rnaseq_gatk_splitNcigar (ENCSR000CPO2) [100%] 6 of 6 ✔
[5d/e05ce3] process > rnaseq_gatk_recalibrate (ENCSR000CPO1) [100%] 3 of 3
[77/bb5fa7] process > rnaseq_call_variants (ENCSR000COR)     [100%] 1 of 1
[85/7cd5be] process > post_process_vcf (ENCSR000CPO)         [100%] 1 of 1
[d2/9c3999] process > prepare_vcf_for_ase (ENCSR000CPO)      [100%] 1 of 1, failed: 1
[-        ] process > ASE_knownSNPs                          -
ERROR ~ Error executing process > 'prepare_vcf_for_ase (ENCSR000CPO)'

Caused by:
  Process `prepare_vcf_for_ase (ENCSR000CPO)` terminated with an error exit status (127)

Command executed:

  awk 'BEGIN{OFS="	"} $4~/B/{print $1,$2,$3}' commonSNPs.diff.sites_in_files  > test.bed
  
  vcftools --vcf final.vcf --bed test.bed --recode --keep-INFO-all --stdout > known_snps.vcf
  
  grep -v '#'  known_snps.vcf | awk -F '\t' '{print $10}'                 |awk -F ':' '{print $2}'|perl -ne 'chomp($_);                 @v=split(/\,/,$_); if($v[0]!=0 ||$v[1] !=0)                {print  $v[1]/($v[1]+$v[0])."\n"; }' |awk '$1!=1'                 >AF.4R
  
  gghist.R -i AF.4R -o AF.histogram.pdf

Command exit status:
  127

Command output:
  (empty)

Command error:
  WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
  /bin/bash: line 0: export: `Documents/com~apple~CloudDocs/Bioinformatics': not a valid identifie
  /bin/bash: line 0: export: `Research/nextflow/training/hands-on/bin': not a valid identifier
  
  VCFtools - 0.1.14
  (C) Adam Auton and Anthony Marcketta 2009
  
  Parameters as interpreted:
  	--vcf final.vcf
  	--recode-INFO-all
  	--recode
  	--stdout
  	--bed test.bed
  
  After filtering, kept 1 out of 1 Individuals
  Outputting VCF file...
  	Read 51 BED file entries.
  After filtering, kept 50 out of a possible 444 Sites
  Run Time = 0.00 seconds
  .command.sh: line 8: gghist.R: command not found

Work dir:
  /Users/felixm/Library/Mobile Documents/com~apple~CloudDocs/Bioinformatics Research/nextflow/training/hands-on/work/d2/9c399978caa0db1327a7bc02a969b8

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

@mribeirodantas
Member

For some reason it can't find gghist.R. Does it have execution permission? Are you running from the folder of the GitHub repository, in the folder that the tutorial tells you to use? I've finished fixing the code and the training. You can follow it from there now 🥳
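
A minimal sketch of the two checks asked about above, plus one more that is only an assumption: the `export: ... not a valid identifier` lines in the log are consistent with the project path containing spaces (`Mobile Documents`, `Bioinformatics Research`), which may keep the pipeline's `bin` folder from being added to `PATH`. Exit status 127 is what the shell reports for "command not found". The function names `script_status` and `path_has_space` are hypothetical, and `bin/gghist.R` assumes the script sits in the `bin` folder next to `main.nf`, as the `hands-on/bin` fragment in the log suggests.

```shell
#!/bin/sh
# Sketch of two local checks for a helper script that a task cannot find.
# Both function names are hypothetical and only illustrate the checks.

# Succeeds (exit 0) if the given path contains a space character.
path_has_space() {
  case "$1" in
    *" "*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Prints "missing", "not-executable", or "ok" for the given script path.
script_status() {
  if [ ! -e "$1" ]; then
    echo "missing"
  elif [ ! -x "$1" ]; then
    echo "not-executable"
  else
    echo "ok"
  fi
}

# Example: run from the folder that contains main.nf.
script_status "bin/gghist.R"
if path_has_space "$PWD"; then
  echo "project path contains spaces: $PWD"
fi
```

If the script is reported as `not-executable`, `chmod +x bin/gghist.R` addresses the permission question. If the path does contain spaces, cloning the repository into a folder without spaces is one thing to try.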
