Samtools error during run #583

brettChapman · 2021-03-22T01:25:21Z

Hi

I'm running rnaseq v3. I'm receiving an error during the samtools indexing:

Caused by:
  Process `RNASEQ:QUANTIFY_RSEM:BAM_SORT_SAMTOOLS:SAMTOOLS_INDEX (Shoot_R2)` terminated with an error exit status (1)

Command executed:

  samtools index Shoot_R2.sorted.bam
  echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//' > samtools.version.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  [E::hts_idx_check_range] Region 537013004..537013118 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6
  [E::sam_index] Read 'A00783:502:HJWN2DSXY:4:1558:11107:14027' with ref_name='chr2H', ref_length=672273650, flags=163, pos=537013005 cannot be indexed
  samtools index: failed to create index for "Shoot_R2.sorted.bam": Numerical result out of range

Work dir:
  /data/RGT_Planet_RNA-seq_align/work/6d/c8923f873699700a2c6e0d5e4292c2

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

Is there a way to set the indexing to CSI instead if BAI?

I ran the workflow with the following command:

#!/bin/bash
#SBATCH --nodes=1
# allow use of all the memory on the node
#SBATCH --ntasks-per-node=32
#SBATCH --mem=0
# request all CPU cores on the node
#SBATCH --exclusive
# Customize --time --partition as appropriate
#SBATCH --partition=large

#srun -n 1 nextflow run nf-core/rnaseq -profile test,singularity

genome=/data/RGT_Planet_RNA-seq_align/genome/RGT_Planet_pseudomolecule_v1.fasta
gff=/data/RGT_Planet_RNA-seq_align/GFF/RGT_Planet.gff

srun -n 1 nextflow run nf-core/rnaseq --max_cpus ${SLURM_NTASKS_PER_NODE} --input samples.csv --fasta ${genome} --gff ${gff} --skip_trimming --save_reference --save_align_intermeds --aligner star_rsem -profile singularity --outdir RGT_Planet_nextflow_rnaseq_out

I skip trimming because we've already trimmed the reads with fastp.

Thank you for any help you can provide.

The text was updated successfully, but these errors were encountered:

brettChapman · 2021-03-23T02:32:18Z

I've now tried running using salmon instead of RSEM, and I still get an error about the index. Is there a way to change the behavior of samtools to use CSI indexing using a config file? or would indexing with CSI screw up all downstream analysis?

drpatelh · 2021-03-24T05:31:13Z

Copying across from Slack for my future self. "Hi @brettChapman ! I suspect that the indexing is failing due to the size of the genome? This has propped up for #nanoseq too for the Opposum genome I think 🤔 The trick was to get the pipeline to use an older version of SAMtools via a custom.config but I am not sure if that will break any downstream steps. I suspect it will because we have to explicitly name the channels by file extension with DSL2 so this really needs to be factored into the SAMTOOLS_INDEX module. This is a relatively new feature for SAMtools so will have to look into it and I will try and add it in for the next release when I get around to finessing that all up. How long is a piece of genome!"

samtools/samtools#1011 (comment)

drpatelh · 2021-05-07T11:17:57Z

Added a boolean --bam_csi_index parameter to the pipeline in #597 which will create and use *.csi indices in the pipeline as opposed to the conventional *.bai indices. Please use this parameter if you have genomes with really long chromosomes.

You may still get the error in the issue below but that needs to be fixed upstream in bx-python:
#608

However, to bypass the above error, you can customise the --rseqc_modules parameter to skip the steps that are erroring too e.g. --rseqc_modules 'bam_stat,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication'

brettChapman added the bug Something isn't working label Mar 22, 2021

drpatelh added this to the 3.1 milestone Apr 11, 2021

drpatelh mentioned this issue Apr 16, 2021

Add ability to use CSI indices in pipeline for genomes with super long chromosomes #597

Merged

drpatelh closed this as completed Apr 16, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Samtools error during run #583

Samtools error during run #583

brettChapman commented Mar 22, 2021

brettChapman commented Mar 23, 2021

drpatelh commented Mar 24, 2021 •

edited

drpatelh commented May 7, 2021

Samtools error during run #583

Samtools error during run #583

Comments

brettChapman commented Mar 22, 2021

brettChapman commented Mar 23, 2021

drpatelh commented Mar 24, 2021 • edited

drpatelh commented May 7, 2021

drpatelh commented Mar 24, 2021 •

edited