kussell-lab/AssemblyAlignmentGenerator

AssemblyAlignmentGenerator

This program generates core-gene alignments from a list of assemblies. It downloads the genomic sequences from ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/ and re-annotates them using Prokka. It then uses Roary to generate the pan-genome and extracts the core genome, the set of genes that appear in all the assemblies. The protein sequences of each core gene are aligned with MUSCLE, and the protein alignments are then back-translated to DNA sequences.
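The back-translation step can be illustrated with a minimal sketch. It is not the program's own code: the function name `back_translate`, the gap character, and the handling of a trailing stop codon are assumptions made for illustration. The idea is to walk along the aligned protein sequence, emit the next codon of the coding sequence for every residue, and emit three gap characters for every gap.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Hypothetical helper for illustration; not part of AssemblyAlignmentGenerator.
    """
    residues = aligned_protein.replace(gap, "")
    # Assumption: if the CDS is exactly one codon longer than the protein,
    # the extra codon is a trailing stop codon and is dropped.
    if len(cds) == 3 * (len(residues) + 1):
        cds = cds[:-3]
    if len(cds) != 3 * len(residues):
        raise ValueError("CDS length does not match the protein length")

    # Split the CDS into codons and consume one codon per aligned residue.
    codons = iter([cds[i:i + 3] for i in range(0, len(cds), 3)])
    out = []
    for aa in aligned_protein:
        out.append(gap * 3 if aa == gap else next(codons))
    return "".join(out)


if __name__ == "__main__":
    print(back_translate("M-KL", "ATGAAACTGTAA"))
```

Because every protein column becomes exactly three DNA columns, all sequences of a gene keep the same aligned length and the reading frame is preserved.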

Installation

The program is written in Bash, Go, and Python. It requires the following programs (the pipeline described above uses Prokka, Roary, and MUSCLE):

and Python libraries:

  • pip install --user tqdm biopython

and Go libraries:

  • go get -u github.com/cheggaaa/pb
  • go get -u github.com/mattn/go-sqlite3
  • go get -u gopkg.in/alecthomas/kingpin.v2
  • go get -u github.com/kussell-lab/biogo/seq

A Dockerfile is also provided for building a Docker image (see https://docs.docker.com/ for how to use Docker). The Dockerfile also shows how to install this program on Ubuntu 17.10.

Usage

AssemblyAlignmentGenerate <assembly summary file> <accession list file> <output directory> <output prefix>

  • <assembly summary file> can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt;
  • <accession list file> contains a list of assembly accessions;
  • <output directory> is the directory in which the results are written;
  • <output prefix> is the prefix of the result file names.
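To show how the first two inputs relate, here is a minimal sketch that looks up the download location of each requested accession in the assembly summary. It is not the program's own code. It assumes that the accession list holds one accession per line, and that the summary is tab-separated with a `#`-prefixed header line naming the columns `assembly_accession` and `ftp_path`. The function names `read_accessions` and `ftp_paths` are hypothetical.

```python
def read_accessions(lines):
    """Return the non-empty, stripped lines (assumed: one accession per line)."""
    return [ln.strip() for ln in lines if ln.strip()]


def ftp_paths(summary_lines, accessions):
    """Map each requested accession to its ftp_path column in the summary."""
    wanted = set(accessions)
    header = None
    found = {}
    for line in summary_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # The column names sit on a comment line; other comment lines are skipped.
            fields = line.lstrip("# ").split("\t")
            if "assembly_accession" in fields:
                header = fields
            continue
        if header is None or not line:
            continue
        row = dict(zip(header, line.split("\t")))
        if row["assembly_accession"] in wanted:
            found[row["assembly_accession"]] = row.get("ftp_path", "")
    return found


if __name__ == "__main__":
    # Made-up rows for illustration; real files have many more columns.
    summary = [
        "#   comment line\n",
        "# assembly_accession\tasm_name\tftp_path\n",
        "GCF_TEST_1.1\tasmA\tftp://example.invalid/GCF_TEST_1.1\n",
        "GCF_TEST_2.1\tasmB\tftp://example.invalid/GCF_TEST_2.1\n",
    ]
    print(ftp_paths(summary, read_accessions(["GCF_TEST_2.1\n", "\n"])))
```

Accessions that are missing from the summary are simply absent from the returned dictionary, so comparing its keys with the accession list is a quick sanity check before a long run.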

The output is an XMFA file containing the final DNA alignments of the core genes. It is written to <output directory>/<output prefix>_core.xmfa.
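For downstream processing, the XMFA output can be read block by block. The sketch below is a minimal reader, not part of this program. It assumes the common XMFA layout in which each sequence starts with a `>` header line, sequence text may span several lines, and a line starting with `=` closes an alignment block. The header is returned as an opaque string because its exact layout in this program's output is not described here. The function name `read_xmfa` is hypothetical.

```python
def read_xmfa(lines):
    """Yield alignment blocks, each a list of (header, sequence) pairs."""
    block, header, chunks = [], None, []
    # A sentinel '=' closes a final block that lacks its own terminator.
    for line in list(lines) + ["="]:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line[0] in ">=":
            # A new header or a block terminator ends the current sequence.
            if header is not None:
                block.append((header, "".join(chunks)))
            header, chunks = (line[1:].strip(), []) if line[0] == ">" else (None, [])
            if line[0] == "=" and block:
                yield block
                block = []
        elif header is not None:
            chunks.append(line)


if __name__ == "__main__":
    example = [">a gene1", "ATG-", "AA", ">b gene1", "ATGG", "AA", "=",
               ">a gene2", "CC", "="]
    for i, blk in enumerate(read_xmfa(example)):
        print(i, blk)
```

To read a real result, pass an open file handle for <output directory>/<output prefix>_core.xmfa in place of the example list.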
