
--star_index argument is ignored with --aligner 'star_rsem' option #568

Closed
ftucos opened this issue Feb 11, 2021 · 2 comments

ftucos commented Feb 11, 2021

Description of the bug

In my pipeline I specified the paths to both the STAR and the RSEM index, since they were generated independently.
This is what the command looks like:

...
STAR_INDEX=/mnt/disks/data/resources/genomes/STAR/GRCh38_overhang_50bp
RSEM_INDEX=/mnt/disks/data/resources/genomes/RSEM/GRCh38_gencode

$NEXTFLOW run nf-core/rnaseq \
	--input data/samplesheet.csv \
	--fasta $GENOME_SEQUENCE \
	--gtf $GENOME_ANNOTATION --gencode \
	--star_index $STAR_INDEX \
	--rsem_index $RSEM_INDEX \
	--aligner 'star_rsem' \
	--skip_markduplicates \
	-profile docker \
	--max_cpus 16 \
	--max_memory '100.GB'

As you can see, the STAR index was generated manually with STAR in a separate folder, not with RSEM.
The problem I'm facing is that the pipeline calls STAR with the path to the RSEM index instead of the STAR index.

I think this happens because, with the option --aligner 'star_rsem', STAR is not called directly but through the RSEM command rsem-calculate-expression, so the --star_index argument is ignored.

Expected behaviour

I'm aware that the simple fix is to generate the STAR index together with the RSEM one using rsem-prepare-reference, but I think it would be worth adding a warning in case someone specifies a STAR index together with the --aligner 'star_rsem' option.
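
A minimal sketch of the kind of check being requested, written as a shell pre-flight function that could sit in a launch script before the nextflow call. The function name warn_if_star_index_ignored and the warning text are hypothetical; this is not the pipeline's own implementation.

```shell
# Hypothetical pre-flight check, not part of nf-core/rnaseq.
# Arguments: <aligner> <star_index path, may be empty>
# Prints a warning to stderr when a STAR index is supplied together
# with the star_rsem aligner; prints nothing otherwise.
warn_if_star_index_ignored() {
    local aligner="$1"
    local star_index="$2"
    if [ "$aligner" = "star_rsem" ] && [ -n "$star_index" ]; then
        echo "WARNING: --star_index is ignored with --aligner star_rsem;" \
             "STAR is run against the directory given by --rsem_index." >&2
    fi
    return 0
}

# Example call, using the variable from the command above:
#   warn_if_star_index_ignored "star_rsem" "$STAR_INDEX"
```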

Log files

Error executing process > 'RNASEQ:QUANTIFY_RSEM:RSEM_CALCULATEEXPRESSION (RT112_siNUMB_R1)'

Caused by:
  Process `RNASEQ:QUANTIFY_RSEM:RSEM_CALCULATEEXPRESSION (RT112_siNUMB_R1)` terminated with an error exit status (255)

Command executed:

  INDEX=`find -L ./ -name "*.grp" | sed 's/.grp//'`
  rsem-calculate-expression \
      --num-threads 6 \
      --temporary-folder ./tmp/ \
      --strandedness reverse \
      --paired-end \
      --star --star-output-genome-bam --star-gzipped-read-file --estimate-rspd --seed 1 \
      RT112_siNUMB_R1_1_val_1.fq.gz RT112_siNUMB_R1_2_val_2.fq.gz \
      $INDEX \
      RT112_siNUMB_R1
  
  rsem-calculate-expression --version | sed -e "s/Current version: RSEM v//g" > rsem.version.txt

Command exit status:
  255

Command output:
 STAR --genomeDir ./GRCh38_gencode  --outSAMunmapped Within  --outFilterType BySJout 
 --outSAMattributes NH HI AS NM MD  --outFilterMultimapNmax 20  --outFilterMismatchNmax 999 
 --outFilterMismatchNoverLmax 0.04  --alignIntronMin 20  --alignIntronMax 1000000  --alignMatesGapMax 1000000 
 --alignSJoverhangMin 8  --alignSJDBoverhangMin 1  --sjdbScore 1  --runThreadN 6  --genomeLoad NoSharedMemory 
 --outSAMtype BAM Unsorted  --quantMode TranscriptomeSAM  --outSAMheaderHD @HD VN:1.4 SO:unsorted
  --outFileNamePrefix ./tmp//RT112_siNUMB_R1  --readFilesCommand zcat 
  --readFilesIn RT112_siNUMB_R1_1_val_1.fq.gz RT112_siNUMB_R1_2_val_2.fq.gz
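
The log is consistent with the following reading: the RSEM reference prefix is found by locating the .grp file in the staged index folder, and STAR is then pointed at the folder containing that prefix. The sketch below reproduces this lookup so it can be checked before launching the pipeline. The function names are hypothetical, and the assumption that --genomeDir is the parent directory of the prefix is inferred from the log above, not taken from the RSEM documentation.

```shell
# Hypothetical helpers, not part of the pipeline.

# Print the RSEM reference prefix found under directory $1,
# i.e. the path of the first *.grp file with the literal suffix removed.
rsem_prefix() {
    local grp
    grp=$(find -L "$1" -name '*.grp' | head -n 1)
    [ -n "$grp" ] || return 1
    printf '%s\n' "${grp%.grp}"
}

# Print the directory STAR would be pointed at (assumption: the
# parent directory of the RSEM prefix, as the log above suggests).
star_genome_dir() {
    local prefix
    prefix=$(rsem_prefix "$1") || return 1
    dirname "$prefix"
}

# Example call:
#   star_genome_dir "$RSEM_INDEX"
```

If the printed folder does not also contain the STAR genome files, the STAR step cannot find an index there, regardless of what --star_index points to.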

System

  • Hardware: Google Cloud Engine VM
  • OS: Ubuntu 20

Nextflow Installation

  • Version: nextflow-21.02.0-edge-all

Container engine

  • Engine: Docker
  • Version: 20.10.3, build 48d30b5
ftucos added the bug label Feb 11, 2021

drpatelh (Member) commented

Hi @ftucos! Well spotted, and apologies for the inconvenience. Yes, the pipeline currently has a number of workarounds to deal with STAR indices. I implemented things this way mainly because the pipeline has to support an older version of STAR for legacy purposes, so that it works with the AWS iGenomes references we provide as standard (see footnote here). However, because we don't have any RSEM indices on AWS iGenomes, the RSEM module can use the latest version of STAR, so I added functionality to that module to build the entire index, including the STAR files (see #511).

In any case, to keep the Nextflow implementation simple, I would suggest placing the STAR and RSEM indices in the same folder and providing that folder via --rsem_index.
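
As a rough way to confirm that a folder follows this suggestion, the sketch below checks that it holds both an RSEM reference (a .grp file, which is what the pipeline searches for) and a few of the files STAR writes when generating a genome. The function name has_combined_index is hypothetical, and the file list is a spot check, not a full validation of either index.

```shell
# Hypothetical check, not part of nf-core/rnaseq.
# Returns 0 if directory $1 looks like a combined RSEM + STAR index.
has_combined_index() {
    local dir="$1"
    local f
    # RSEM reference: at least one <prefix>.grp file directly in the folder
    [ -n "$(find -L "$dir" -maxdepth 1 -name '*.grp' | head -n 1)" ] || return 1
    # STAR genome: a few of the files written by STAR genome generation
    for f in Genome SA SAindex genomeParameters.txt; do
        [ -e "$dir/$f" ] || return 1
    done
    return 0
}

# Example call:
#   has_combined_index "$RSEM_INDEX" || echo "STAR files missing from $RSEM_INDEX" >&2
#
# One way to build such a folder in a single step (the prefix "genome" is an
# assumed name; --gtf, --star and --num-threads are rsem-prepare-reference options):
#   rsem-prepare-reference --gtf "$GENOME_ANNOTATION" --star --num-threads 16 \
#       "$GENOME_SEQUENCE" "$RSEM_INDEX/genome"
```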

drpatelh added the documentation label and removed the bug label Feb 17, 2021
drpatelh added this to the 3.1 milestone Apr 11, 2021

drpatelh (Member) commented

Hi @ftucos. As you suggested, the pipeline will now generate a warning when --aligner star_rsem, --rsem_index and --star_index are provided together. Thanks for reporting!


drpatelh added a commit to drpatelh/nf-core-rnaseq that referenced this issue Apr 14, 2021
drpatelh added a commit to drpatelh/nf-core-rnaseq that referenced this issue Apr 15, 2021