Metagenome read simulation of multiple synthetic communities

nick-youngblut/MGSIM

MGSIM

Metagenome read simulation of multiple synthetic communities

REFERENCE

DOI

DESCRIPTION

Straightforward simulation of metagenome data from a collection of reference bacterial/archaeal genomes.

Highlights

  • Can simulate Illumina, PacBio, and/or Nanopore reads
    • For Illumina, synthetic long reads (read clouds) can also be simulated
  • Generate communities differing in:
    • Sequencing depth
    • Richness
    • Beta diversity

The workflow:

  • [optional] Download reference genomes
  • Format reference genomes
    • e.g., rename contigs
  • Simulate communities
  • Simulate reads for each community

INSTALLATION

Dependencies

See environment.yml for a list of dependencies.

You can install via:

mamba env create -f environment.yml -n mgsim

mamba resolves environments much faster than conda, but conda env create accepts the same arguments if mamba is not available.

Install

via pip

pip install MGSIM

via setup.py

python setup.py install

Testing

Running the tests requires:

  • conda-forge::pytest>=5.3
  • conda-forge::pytest-console-scripts>=1.2

From the MGSIM base directory, run pytest to execute all of the tests.

To run tests on a particular test file:

pytest -s --script-launch-mode=subprocess path/to/the/test/file

Example:

pytest -s --script-launch-mode=subprocess ./tests/test_Reads.py

HOW-TO

See all subcommands:

MGSIM --list

Download genomes

MGSIM genome_download -h

Simulate communities

MGSIM communities -h

Simulate reads for each genome in each community

Simulating Illumina, PacBio, and/or Nanopore reads

MGSIM reads -h

Simulating haplotagging reads (aka read-cloud data)

MGSIM ht_reads -h

Tutorial

Reference genome download

Create Taxon-accession table

mkdir -p tutorial

cat <<-EOF > tutorial/taxon_accession.tsv
Taxon	Accession
Escherichia coli O104-H4	NC_018658.1
Clostridium perfringens ATCC.13124	NC_008261
Methanosarcina barkeri [MS]	NZ_CP009528.1
EOF
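
The table must be tab-delimited, and a heredoc pasted from a web page can silently turn tabs into spaces. The following is a minimal sketch of a sanity check, assuming the two-column Taxon/Accession layout shown above. The function name check_taxon_table is hypothetical and not part of MGSIM, and the demo runs on a throwaway copy rather than the tutorial file.

```shell
# Hypothetical helper (not part of MGSIM): succeed only if every line of the
# table has exactly two tab-separated fields; report offending lines otherwise.
check_taxon_table() {
  awk -F'\t' '
    BEGIN { bad = 0 }
    NF != 2 {
      printf "line %d: expected 2 tab-separated fields, found %d\n", NR, NF
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Demo on a temporary two-line table (header plus one tutorial row)
demo_dir=$(mktemp -d)
printf 'Taxon\tAccession\n' > "$demo_dir/taxon_accession.tsv"
printf 'Escherichia coli O104-H4\tNC_018658.1\n' >> "$demo_dir/taxon_accession.tsv"
check_taxon_table "$demo_dir/taxon_accession.tsv" && echo "table OK"
```

For the tutorial file itself, the call would be check_taxon_table tutorial/taxon_accession.tsv.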

Download genomes

MGSIM genome_download -d tutorial/ tutorial/taxon_accession.tsv > tutorial/genomes.tsv

Simulate communities

Simulate 2 communities

MGSIM communities --n-comm 2 tutorial/genomes.tsv tutorial/communities
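
Before simulating reads, it can be useful to confirm that the relative abundances in each simulated community add up as expected. The sketch below sums an abundance column per community. The column names Community and Perc_rel_abund are assumptions about the header of tutorial/communities_abund.txt (inspect the file with head first), the function name sum_abund_by_community is hypothetical, and the demo table is synthetic rather than real MGSIM output.

```shell
# Hypothetical helper (not part of MGSIM): sum an abundance column per community.
#   $1: tab-delimited abundance table with a header line
#   $2: name of the community column (assumed, check your file)
#   $3: name of the abundance column (assumed, check your file)
sum_abund_by_community() {
  awk -F'\t' -v comm="$2" -v abund="$3" '
    NR == 1 {
      # Locate the two columns by header name
      for (i = 1; i <= NF; i++) {
        if ($i == comm)  c = i
        if ($i == abund) a = i
      }
      if (!c || !a) { print "column not found" > "/dev/stderr"; exit 1 }
      next
    }
    { total[$c] += $a }
    END { for (k in total) printf "%s\t%.2f\n", k, total[k] }
  ' "$1" | sort
}

# Demo on a synthetic table with two communities
demo_dir=$(mktemp -d)
printf 'Community\tTaxon\tPerc_rel_abund\n' > "$demo_dir/abund.txt"
printf '1\ttaxon_A\t60.5\n1\ttaxon_B\t39.5\n' >> "$demo_dir/abund.txt"
printf '2\ttaxon_A\t25\n2\ttaxon_B\t75\n' >> "$demo_dir/abund.txt"
sum_abund_by_community "$demo_dir/abund.txt" Community Perc_rel_abund
```

If the header of the real file uses different names, pass those names as the second and third arguments.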

Simulate reads

Illumina reads

MGSIM reads tutorial/genomes.tsv --sr-seq-depth 1e5 tutorial/communities_abund.txt tutorial/illumina_reads/

PacBio reads

MGSIM reads tutorial/genomes.tsv --pb-seq-depth 1e3 tutorial/communities_abund.txt tutorial/pacbio_reads/

Nanopore reads

MGSIM reads tutorial/genomes.tsv --np-seq-depth 1e3 tutorial/communities_abund.txt tutorial/nanopore_reads/
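
A quick way to check the result of any of the three runs is to count the reads that were written. The sketch below assumes uncompressed FASTQ output with the standard four lines per record and file names ending in .fq or .fastq. The layout of the output directories is not described here, so the helper searches recursively. Both function names are hypothetical and not part of MGSIM, and the demo uses a tiny synthetic file.

```shell
# Hypothetical helper (not part of MGSIM): count records in one FASTQ file,
# assuming the standard 4-lines-per-record layout.
count_fastq_records() {
  local n_lines
  n_lines=$(wc -l < "$1")
  echo $(( n_lines / 4 ))
}

# Print "<file><TAB><record count>" for every FASTQ file below a directory.
# The *.fq / *.fastq naming is an assumption; adjust it to the actual output.
report_read_counts() {
  find "$1" -type f \( -name '*.fq' -o -name '*.fastq' \) | sort |
    while IFS= read -r fq; do
      printf '%s\t%s\n' "$fq" "$(count_fastq_records "$fq")"
    done
}

# Demo on a synthetic FASTQ file containing two records
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$demo_dir/R1.fq"
report_read_counts "$demo_dir"
```

For the tutorial output, the call would be report_read_counts tutorial/illumina_reads/ (or the PacBio or Nanopore directory).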

LICENSE

See LICENSE
