
v1.0.1 – Hefty mûmakil (patch 1)

Released by @muffato on 19 Oct 01:11.

Overview

The pipeline takes a CSV file that contains the assembly accession numbers, Ensembl species names (as they may differ from the Tree of Life ones!), output directories, and geneset versions.
Assembly accession numbers are optional. If one is missing, the pipeline assumes it can be retrieved from a file named ACCESSION in the standard location on disk.
The pipeline downloads the Fasta files of the genes (cdna, cds, and protein sequences) as well as the GFF3 file.
All files are compressed with bgzip and indexed: the Fasta files with samtools faidx, the GFF3 file with tabix.
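A minimal Python sketch of the samplesheet handling described above, not the pipeline's own code. The column names (`assembly_accession`, `outdir`) are hypothetical, since the release notes name the fields but not the CSV headers. The sketch also assumes that the "standard location" of the ACCESSION file is the row's output directory.

```python
import csv
import io
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical column names: the release notes list the fields, not the headers.
ACCESSION_COL = "assembly_accession"
OUTDIR_COL = "outdir"


def resolve_accession(row: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the row's accession, falling back to an ACCESSION file.

    Assumption: the "standard location" is the row's output directory.
    """
    accession = (row.get(ACCESSION_COL) or "").strip()
    if accession:
        return accession
    accession_file = Path(row.get(OUTDIR_COL) or ".") / "ACCESSION"
    if accession_file.is_file():
        return accession_file.read_text().strip()
    return None  # nothing found; a real pipeline would report an error here


def read_samplesheet(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse the CSV text and fill in any missing accession numbers."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row[ACCESSION_COL] = resolve_accession(row)
        rows.append(row)
    return rows
```

An empty accession cell and a missing column are treated the same way, so both fall back to the ACCESSION file.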

Steps involved:

  • Download the GFF3 file and the gene sequences, in Fasta format, from
    Ensembl.
  • Compress and index all Fasta files with bgzip, samtools faidx, and
    samtools dict.
  • Compress and index the GFF3 file with bgzip and tabix.
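The compression and indexing steps above can be sketched as the command lines below, built in Python. This is an illustration rather than the pipeline's own implementation. The `.dict` file name is a hypothetical convention, and tabix only indexes a GFF3 file that is sorted by sequence name and start position.

```python
import shlex
from typing import List


def fasta_commands(fasta: str) -> List[List[str]]:
    """Commands to compress and index one Fasta file."""
    compressed = fasta + ".gz"  # bgzip replaces <fasta> with <fasta>.gz
    dict_file = fasta.rsplit(".", 1)[0] + ".dict"  # hypothetical naming
    return [
        ["bgzip", fasta],
        ["samtools", "faidx", compressed],  # writes <fasta>.gz.fai and .gzi
        ["samtools", "dict", "-o", dict_file, compressed],  # sequence dictionary
    ]


def gff3_commands(gff3: str) -> List[List[str]]:
    """Commands to compress and index one (sorted) GFF3 file."""
    compressed = gff3 + ".gz"
    return [
        ["bgzip", gff3],
        ["tabix", "-p", "gff", compressed],  # writes <gff3>.gz.tbi
    ]


if __name__ == "__main__":
    # Print the commands for one hypothetical file of each type.
    for cmd in fasta_commands("cdna.fa") + gff3_commands("genes.gff3"):
        print(shlex.join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)`. Keeping commands as lists rather than strings avoids shell quoting issues.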

Fixed since v1.0.0

When a samplesheet is provided, the individual command-line parameters are no longer processed.
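A sketch of the precedence rule in this fix, modelled in Python rather than taken from the pipeline. When samplesheet rows are given, they are used as-is and any individual parameters are set aside. Reporting the names of the ignored parameters is an assumption added for illustration, and the parameter names in the usage block are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def select_inputs(samplesheet_rows: Optional[List[Row]],
                  cli_params: Row) -> Tuple[List[Row], List[str]]:
    """Return the inputs to process and the names of ignored parameters.

    A provided samplesheet takes precedence over individual parameters.
    """
    if samplesheet_rows is not None:
        # Parameters that were set but will not be processed.
        ignored = sorted(name for name, value in cli_params.items() if value)
        return samplesheet_rows, ignored
    # No samplesheet: the individual parameters describe a single input.
    return [cli_params], []


if __name__ == "__main__":
    sheet = [{"ensembl_species_name": "species_a"}]
    params = {"geneset_version": "v1", "outdir": ""}  # hypothetical names
    print(select_inputs(sheet, params))
```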

Dependencies

All dependencies are automatically fetched by Singularity.

  • bgzip
  • samtools
  • tabix
  • python3
  • wget
  • awk
  • gzip