bedtools coverage fails when gff file and genome file are not sorted the same way #1037

Closed
5 tasks done
IdoBar opened this issue Dec 12, 2023 · 1 comment · Fixed by #1052
Labels
bug Something isn't working

Comments

@IdoBar
Contributor

IdoBar commented Dec 12, 2023

Check Documentation

I have checked the following places for your error:

Description of the bug

Bedtools coverage fails when there's a mismatch between the sorting order of the genome and the gff files.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Use the following flags in an Eager run:
--fasta "https://ftp.ensembl.org/pub/grch37/current/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna_sm.primary_assembly.fa.gz" \
--anno_file "https://ftp.ensembl.org/pub/grch37/current/gff3/homo_sapiens/Homo_sapiens.GRCh37.87.gff3.gz" \
--run_bedtools_coverage
  2. See error:
Error executing process > 'bedtools (AB_libmerged)'

Caused by:
  Process `bedtools (AB_libmerged)` terminated with an error exit status (1)

Command executed:

  ## Create genome file from bam header
  samtools view -H AB_udghalf_libmerged_rmdup.bam | grep '@SQ' | sed 's#@SQ	SN:\|LN:##g' > genome.txt
  
  ##  Run bedtools
  bedtools coverage -nonamecheck -g genome.txt -sorted -a Homo_sapiens.GRCh37.87.gff3 -b AB_udghalf_libmerged_rmdup.bam | pigz -p 1 > "AB_udghalf_libmerged_rmdup".breadth.gz
  bedtools coverage -nonamecheck -g genome.txt -sorted -a Homo_sapiens.GRCh37.87.gff3 -b AB_udghalf_libmerged_rmdup.bam -mean | pigz -p 1 > "AB_udghalf_libmerged_rmdup".depth.gz

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error: Sorted input specified, but the file Homo_sapiens.GRCh37.87.gff3 has the following record with a different sort order than the genomeFile genome.txt
  GL000192.1	GRCh37	supercontig	1	547496	.	.	.	ID=supercontig:GL000192.1;Alias=NT_167207.1

Expected behaviour

I expect bedtools coverage to complete successfully.
I was able to work around this by removing the -sorted flag, so that bedtools no longer requires the gff file and the genome file to share the same sort order.
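
An alternative to dropping -sorted would be to reorder the gff records so that they follow the contig order of genome.txt before calling bedtools. The following is only a minimal sketch of that idea and has not been tested inside the pipeline. The function name `sort_gff_by_genome` and the output file name are hypothetical. It assumes a tab-separated genome file with the contig name in the first column (as produced by the samtools/sed command above), and it makes two choices that may not suit every use case: comment/directive lines are dropped, and records on contigs absent from the genome file are discarded.

```shell
# Reorder GFF records to follow the contig order of a genome file,
# then by start coordinate within each contig.
# Hypothetical helper: $1 = genome file (contig<TAB>length), $2 = GFF file.
sort_gff_by_genome() {
  genome="$1"
  gff="$2"
  awk -F'\t' -v OFS='\t' '
    NR == FNR { rank[$1] = FNR; next }   # first file: contig name -> line number in genome file
    /^#/ { next }                        # drop GFF comment and directive lines
    ($1 in rank) { print rank[$1], $0 }  # prefix the rank; skip contigs not in the genome file
  ' "$genome" "$gff" |
    sort -t "$(printf '\t')" -k1,1n -k5,5n |  # field 1 = rank, field 5 = GFF start column
    cut -f2-                                   # strip the rank prefix again
}

# Possible usage (file names are placeholders):
#   sort_gff_by_genome genome.txt Homo_sapiens.GRCh37.87.gff3 > annotation.genome_sorted.gff3
```

The reordered file would then be passed to -a in place of the original gff, keeping -sorted and -g genome.txt as they are.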

Log files

Have you provided the following extra information/files:

  • The command used to run the pipeline
  • The .nextflow.log file
  • The exact error:

System

  • Hardware: HPC
  • Executor: PBSpro
  • OS: RHEL
  • Version 7.6

Nextflow Installation

  • Version: 21.10.6 build 5660

Container engine

  • Engine: Singularity
  • version: 3.5.3
  • Image tag: nfcore/eager:2.5.0

Additional context

Fixed the problem by removing the -sorted flag from the command, see #1036
A similar change should be made to the DSL2 version of the bedtools module, in line 24 of modules/nf-core/bedtools/main.nf, but I haven't tested it.

Thanks, Ido

@IdoBar IdoBar added the bug Something isn't working label Dec 12, 2023
@TCLamnidis TCLamnidis added this to the 2.5.1 - Bopfingen (patch) milestone Feb 8, 2024
@TCLamnidis TCLamnidis mentioned this issue Feb 16, 2024
7 tasks
@TCLamnidis
Collaborator

Fixed by #1052
