Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Samtools error during run #583

Closed
brettChapman opened this issue Mar 22, 2021 · 3 comments · Fixed by #597
Closed

Samtools error during run #583

brettChapman opened this issue Mar 22, 2021 · 3 comments · Fixed by #597
Labels
bug Something isn't working
Milestone

Comments

@brettChapman
Copy link

Hi

I'm running rnaseq v3. I'm receiving an error during the samtools indexing:

Caused by:
  Process `RNASEQ:QUANTIFY_RSEM:BAM_SORT_SAMTOOLS:SAMTOOLS_INDEX (Shoot_R2)` terminated with an error exit status (1)

Command executed:

  samtools index Shoot_R2.sorted.bam
  echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//' > samtools.version.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  [E::hts_idx_check_range] Region 537013004..537013118 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6
  [E::sam_index] Read 'A00783:502:HJWN2DSXY:4:1558:11107:14027' with ref_name='chr2H', ref_length=672273650, flags=163, pos=537013005 cannot be indexed
  samtools index: failed to create index for "Shoot_R2.sorted.bam": Numerical result out of range

Work dir:
  /data/RGT_Planet_RNA-seq_align/work/6d/c8923f873699700a2c6e0d5e4292c2

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

Is there a way to set the indexing to CSI instead if BAI?

I ran the workflow with the following command:

#!/bin/bash
#SBATCH --nodes=1
# allow use of all the memory on the node
#SBATCH --ntasks-per-node=32
#SBATCH --mem=0
# request all CPU cores on the node
#SBATCH --exclusive
# Customize --time --partition as appropriate
#SBATCH --partition=large

#srun -n 1 nextflow run nf-core/rnaseq -profile test,singularity

genome=/data/RGT_Planet_RNA-seq_align/genome/RGT_Planet_pseudomolecule_v1.fasta
gff=/data/RGT_Planet_RNA-seq_align/GFF/RGT_Planet.gff

srun -n 1 nextflow run nf-core/rnaseq --max_cpus ${SLURM_NTASKS_PER_NODE} --input samples.csv --fasta ${genome} --gff ${gff} --skip_trimming --save_reference --save_align_intermeds --aligner star_rsem -profile singularity --outdir RGT_Planet_nextflow_rnaseq_out 

I skip trimming because we've already trimmed the reads with fastp.

Thank you for any help you can provide.

@brettChapman brettChapman added the bug Something isn't working label Mar 22, 2021
@brettChapman
Copy link
Author

I've now tried running using salmon instead of RSEM, and I still get an error about the index. Is there a way to change the behavior of samtools to use CSI indexing using a config file? or would indexing with CSI screw up all downstream analysis?

@drpatelh
Copy link
Member

drpatelh commented Mar 24, 2021

Copying across from Slack for my future self. "Hi @brettChapman ! I suspect that the indexing is failing due to the size of the genome? This has propped up for #nanoseq too for the Opposum genome I think 🤔 The trick was to get the pipeline to use an older version of SAMtools via a custom.config but I am not sure if that will break any downstream steps. I suspect it will because we have to explicitly name the channels by file extension with DSL2 so this really needs to be factored into the SAMTOOLS_INDEX module. This is a relatively new feature for SAMtools so will have to look into it and I will try and add it in for the next release when I get around to finessing that all up. How long is a piece of genome!"

samtools/samtools#1011 (comment)

@drpatelh
Copy link
Member

drpatelh commented May 7, 2021

Added a boolean --bam_csi_index parameter to the pipeline in #597 which will create and use *.csi indices in the pipeline as opposed to the conventional *.bai indices. Please use this parameter if you have genomes with really long chromosomes.

You may still get the error in the issue below but that needs to be fixed upstream in bx-python:
#608

However, to bypass the above error, you can customise the --rseqc_modules parameter to skip the steps that are erroring too e.g. --rseqc_modules 'bam_stat,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants