diff --git a/LICENSE b/LICENSE index 133587d..5322bc3 100644 --- a/LICENSE +++ b/LICENSE @@ -1,27 +1,43 @@ -Copyright (c) 2014, MarBL -All rights reserved. +PURPOSE + +Harvest is a suite of core-genome alignment and visualization tools +for quickly analyzing intraspecific microbial genomes. + +COPYRIGHT LICENSE + +Copyright © 2014, Battelle National Biodefense Institute (BNBI); +all rights reserved. Authored by: Brian Ondov, Todd Treangen, and +Adam Phillippy + +This Software was prepared for the Department of Homeland Security +(DHS) by the Battelle National Biodefense Institute, LLC (BNBI) as +part of contract HSHQDC-07-C-00020 to manage and operate the National +Biodefense Analysis and Countermeasures Center (NBACC), a Federally +Funded Research and Development Center. Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - -* Redistributions of source code must retain the above copyright notice, this - list of conditions and the following disclaimer. - -* Redistributions in binary form must reproduce the above copyright notice, - this list of conditions and the following disclaimer in the documentation - and/or other materials provided with the distribution. - -* Neither the name of the {organization} nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE -FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +modification, are permitted provided that the following conditions are +met: + +1. Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright +notice, this list of conditions and the following disclaimer in the +documentation and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
diff --git a/README.md b/README.md index 5e87088..37ad279 100644 --- a/README.md +++ b/README.md @@ -1,66 +1,10 @@ -Harvest -======= +## Harvest + +![](https://raw.githubusercontent.com/marbl/gingr/master/html/img/harvest.png) Harvest is a suite of core-genome alignment and visualization tools for quickly analyzing thousands of intraspecific -microbial genomes. Harvest includes: Parsnp, a fast core-genome -multi-aligner, Gingr, a dynamic visual platform, and harvest-tools, providing both a reference compressed binary archive and format conversion tools. - -##Release status - -07/21/14: `v1.0` - -##Harvest suite download: - -* OSX (10.7 or newer): - * https://github.com/marbl/harvest/releases/download/v1.0/harvest-OSX64.tar.gz - * MD5 sum: 31c2af7272897fd1926758a4e9399e46 -* Linux/*nix: - * https://github.com/marbl/harvest/releases/download/v1.0/harvest-Linux64.tar.gz - * MD5 sum: 170b054d6c5656df64e5c9f2c8c4072c - -##Harvest suite documentation (in prep): - -* http://harvest.readthedocs.org/en/latest/ - -##Individual Harvest components: - -1. **Parsnp** - * Description: core genome aligner - * Project url: http://github.com/marbl/parsnp - * Language: C, C++, Python - * OSX prebuilt binary: - * https://github.com/marbl/parsnp/releases/download/v1.0/parsnp-OSX64.gz - * MD5 sum: 08c19b4d12e8199b2ce098550b4100e6 - * Linux prebuilt binary: - * https://github.com/marbl/parsnp/releases/download/v1.0/parsnp-Linux64.gz - * MD5 sum: f82b6b9dae456fe9263ee6214b2633af - -2. 
**Gingr** - * Description: GUI, interactive visualization of multiple alignments, phylogenies and variants (SNPs etc) - * Gingr is able to display: - * Newick formatted trees - * XMFA formatted multi-alignments (with synteny view) - * VCF formatted variants - * Harvest tools GGR format - * Project url: http://github.com/marbl/gingr - * Language: C++ - * OSX prebuilt binary: - * https://github.com/marbl/gingr/releases/download/v1.0/gingr-OSX64.app.zip - * MD5 sum: 84c8d3818b48656132b5adf9a125783d - * Linux prebuilt binary: - * https://github.com/marbl/gingr/releases/download/v1.0/gingr-Linux64.gz - * MD5 sum: 3e4fdab0319be4c927b8738c2223f15b - -3. **Harvest tools** - * Description: binary format and conversion utilities - * Project url: http://github.com/marbl/harvest-tools - * Language: C++, Python - * OSX prebuilt binary: - * https://github.com/marbl/harvest-tools/releases/download/v1.0/harvesttools-OSX64.gz - * MD5 sum: 4cc638e05c82bab24f2f68317347661e - * Linux prebuilt binary: - * https://github.com/marbl/harvest-tools/releases/download/v1.0/harvesttools-Linux64.gz - * MD5 sum: 4f41d17dd49eb8be7861a0325bed9910 +microbial genomes. +For more information, see [harvest.readthedocs.org](http://harvest.readthedocs.org) diff --git a/docs/content/gingr.rst b/docs/content/gingr.rst index 84fe623..04a388c 100644 --- a/docs/content/gingr.rst +++ b/docs/content/gingr.rst @@ -1,21 +1,37 @@ -==================================================== -Gingr: Interactive visualization of multi-alignments -==================================================== +===== +Gingr +===== -Project home page: https://github.com/marbl/gingr +**Interactive visualization of alignments, trees and variants** -Gingr is an interactive tool for exploring large-scale phylogenies in tandem with their corresponding multi-alignments. 
Gingr can display informative overviews for hundreds or thousands of genomes, while allowing researchers to move quickly to more detailed views of specific subclades and genomic regions, even down to the nucleotide level of their multi-alignments. Additionally, its dynamic display of variants allows interactive selection of various filters, such as indels, poorly aligned regions and suspected sites of recombination. Gingr works chiefly in tandem with Parsnp, an efficient tool for core-genome multi-alignment and phylogenetic reconstruction. It is also applicable, however, to other analytical tools, accepting standard file formats such as multi-Fasta, XMFA, Newick and VCF. +.. image:: gingr/logo_med.png -Contents: +Gingr is an interactive tool for exploring large-scale phylogenies in tandem with their corresponding multi-alignments. Gingr can display informative overviews for hundreds or thousands of genomes, while allowing researchers to move quickly to more detailed views of specific subclades and genomic regions, even down to the nucleotide level of their multi-alignments. Additionally, its dynamic display of variants allows interactive selection of various filters, such as indels, poorly aligned regions and suspected sites of recombination. Gingr works chiefly in tandem with Parsnp, an efficient tool for core-genome multi-alignment and phylogenetic reconstruction. It is also applicable, however, to other analytical tools, accepting standard file formats such as multi-Fasta, XMFA, Newick and VCF. + +.. image:: gingr/screen.png + :width: 462 + :height: 290 + + +**Download (v1.2)** + + * `gingr-OSX64-v1.2.zip `_ + * `gingr-Linux64-v1.2.tar.gz `_ + +**Documentation** .. toctree:: :maxdepth: 2 - gingr/quickstart - gingr/components - gingr/installation - gingr/paramaters - gingr/faq + gingr/requirements gingr/tutorial - gingr/source - gingr/license + gingr/types + +**Resources** + +`All releases`_ ~ `Source code`_ ~ `Report an issue`_ + +.. 
_All releases: https://github.com/marbl/gingr/releases +.. _Source code: https://github.com/marbl/gingr +.. _Report an issue: https://github.com/marbl/gingr/issues + diff --git a/docs/content/gingr/browsing.rst b/docs/content/gingr/browsing.rst new file mode 100644 index 0000000..b062631 --- /dev/null +++ b/docs/content/gingr/browsing.rst @@ -0,0 +1,20 @@ +Browsing a Gingr file +--------------------- +* Download :download:`Gingr input file ` + +* Open in Gingr (File->Open) + +.. image:: ggr.png + +* The phylogeny appears on the left. Hover over a clade to highlight and outline the corresponding tracks to the right. + +.. image:: clade.png + +* Click to zoom in on the clade + +.. image:: zoomed.png + +* The multiple alignment appears on the right, shown as a SNP heatmap when zoomed out. To see the full alignment, zoom in with the mouse wheel or by selecting a region in the ruler. + +.. image:: bases.png + diff --git a/docs/content/gingr/doc.rst b/docs/content/gingr/doc.rst new file mode 100644 index 0000000..14f0725 --- /dev/null +++ b/docs/content/gingr/doc.rst @@ -0,0 +1,15 @@ +==================================================== +Gingr: Interactive visualization of multi-alignments +==================================================== + +Project home page: https://github.com/marbl/gingr + +Gingr is an interactive tool for exploring large-scale phylogenies in tandem with their corresponding multi-alignments. Gingr can display informative overviews for hundreds or thousands of genomes, while allowing researchers to move quickly to more detailed views of specific subclades and genomic regions, even down to the nucleotide level of their multi-alignments. Additionally, its dynamic display of variants allows interactive selection of various filters, such as indels, poorly aligned regions and suspected sites of recombination. Gingr works chiefly in tandem with Parsnp, an efficient tool for core-genome multi-alignment and phylogenetic reconstruction. 
It is also applicable, however, to other analytical tools, accepting standard file formats such as multi-Fasta, XMFA, Newick and VCF. + +Contents: + +.. toctree:: + :maxdepth: 2 + + gingr/quickstart + gingr/types diff --git a/docs/content/gingr/flowchart.png b/docs/content/gingr/flowchart.png new file mode 100644 index 0000000..84956c5 Binary files /dev/null and b/docs/content/gingr/flowchart.png differ diff --git a/docs/content/gingr/importing.rst b/docs/content/gingr/importing.rst new file mode 100644 index 0000000..0c7b972 --- /dev/null +++ b/docs/content/gingr/importing.rst @@ -0,0 +1,42 @@ +Importing other files +--------------------- +* Create a new workspace (File->New) + +.. image:: new.png + +* Download the data files + + * Alignment: :download:`xmfa ` + * Reference: :download:`fasta ` + * Annotations: :download:`genbank ` + * Phylogeny: :download:`newick ` + +* Open the XMFA alignment (File->Open). Since XMFA files can be accompanied by reference files, the Open dialog will appear. Choose the Fasta file as the reference in this window. + +.. image:: open.png + +* The preview panes allow you to ensure that the header for the reference is the same as the first sequence in the XMFA. This allows sequences between LCBs to be shown and allows annotations to be added later. + +.. image:: xmfa.png + +* The track highlighted in blue ("england.gbk.fna.srt") is the current reference for variants. Select a new reference by right-clicking on a track. + +.. image:: reref.png + +* Next, import the phylogenetic tree (File->Open) + +.. image:: tree.png + +* Reroot the tree at the midpoint (Tree->Reroot at midpoint) + +.. image:: reroot.png + +* The tree will now be balanced at the center of the longest path + +.. image:: rerooted.png + +* Finally, import the annotations (File->Open) + +.. 
image:: annotated.png + +* The workspace can be saved to share or return to later (File->Save) diff --git a/docs/content/gingr/logo_med.png b/docs/content/gingr/logo_med.png new file mode 100644 index 0000000..d593545 Binary files /dev/null and b/docs/content/gingr/logo_med.png differ diff --git a/docs/content/gingr/open-gingr.png b/docs/content/gingr/open-gingr.png new file mode 100644 index 0000000..6ef125b Binary files /dev/null and b/docs/content/gingr/open-gingr.png differ diff --git a/docs/content/gingr/open.png b/docs/content/gingr/open.png new file mode 100644 index 0000000..0086408 Binary files /dev/null and b/docs/content/gingr/open.png differ diff --git a/docs/content/gingr/quickstart.rst b/docs/content/gingr/quickstart.rst index 631804f..e69de29 100644 --- a/docs/content/gingr/quickstart.rst +++ b/docs/content/gingr/quickstart.rst @@ -1,91 +0,0 @@ -Quickstart -========== - -Before you run ---------------- - - 1. To run Gingr OSX, you will need to right click to open and bypass the unsigned developer notice: - - * Future releases will be signed - -Download, install & run ------------------------ -Parsnp is distributed as a precompiled binary that should be devoid of external dependencies (all included in dist). The three steps below represent the fastest way to start using the software: - -On OSX: -""""""" - 1. wget https://github.com/marbl/gingr/releases/download/v1.0/gingr-OSX64.app.zip - 2. unzip gingr-OSX64.app.zip - -On Linux: -""""""""" - - 1. wget https://github.com/marbl/gingr/releases/download/v1.0/gingr-Linux64.gz - 2. gzip -d gingr-Linux64.gz - -Basic usage: -"""""""""""" - - 1. On OSX simply click on Gingr app (right click to bypass unsigned developer notice) - 2. On Linux, simply run:: - ./gingr-Linux64 - - -Browsing a Gingr file --------------------- -* Download :download:`Gingr input file ` - -* Open in Gingr (File->Open) - -.. image:: ggr.png - -* The phylogeny appears on the left. 
Hover over a clade to highlight and outline the corresponding tracks to the right. - -.. image:: clade.png - -* Click to zoom in on the clade - -.. image:: zoomed.png - -* The multiple alignment appears on the right, shown as a SNP heatmap when zoomed out. To see the full alignment, zoom in with the mouse wheel or by selecting a region in the ruler. - -.. image:: bases.png - -Importing other files ---------------------- -* Create a new workspace (File->New) - -.. image:: new.png - -* Download the data files - - * Alignment: :download:`xmfa ` - * Reference: :download:`fasta ` - * Annotations: :download:`genbank ` - * Phylogeny: :download:`newick ` - -* Import the alignment with the refrence (File->Import Alignment (XMFA & Fasta)) - -.. image:: xmfa.png - -* The track highlighted in blue ("england.gbk.fna.srt") is the current reference for variants. Select a new reference by right-clicking on a track. - -.. image:: reref.png - -* Next, import the phylogenetic tree (File->Import tree (Newick)) - -.. image:: tree.png - -* Reroot the tree at the midpoint (Tree->Reroot at midpoint) - -.. image:: reroot.png - -* The tree will now be balanced at the center of the longest path - -.. image:: rerooted.png - -* Finally, import the annotations (File->Import annotations (Genbank)) - -.. image:: annotated.png - -* The workspace can be saved to share or return to later (File->Save) diff --git a/docs/content/gingr/quickstart.rst~ b/docs/content/gingr/quickstart.rst~ deleted file mode 100644 index 4f24904..0000000 --- a/docs/content/gingr/quickstart.rst~ +++ /dev/null @@ -1,37 +0,0 @@ -Quickstart -========== - -Browsing a Gingr file --------------------- -* Download [parsnp.ggr] -* Open in Gingr (File->Open) -* [ggr.png] -* The phylogeny appears on the left. Hover over a clade to highlight and outline the corresponding tracks to the right. 
-* [clade.png] -* Click to zoom in on the clade -* [zoomed.png] -* The multiple alignment appears on the right, shown as a SNP heatmap when zoomed out. To see the full alignment, zoom in with the mouse wheel or by selecting a region in the ruler. -* [bases.png] - -Importing other files ---------------------- -* Create a new workspace (File->New) -* [new.png] -* Download the data files - * Alignment: [parsnp.xmfa] - * Reference: [england1.fna] - * Annotations: [england1.gbk] - * Tree: [parsnp.tree] -* Import the alignment with the refrence (File->Import Alignment (XMFA & Fasta)) -* [xmfa.png] -* The track highlighted in blue ("england.gbk.fna.srt") is the current reference for variants. Select a new reference by right-clicking on a track. -* [reref.png] -* Next, import the phylogenetic tree (File->Import tree (Newick)) -* [tree.png] -* Reroot the tree at the midpoint (Tree->Reroot at midpoint) -* [reroot.png] -* The tree will now be balanced at the center of the longest path -* [rerooted.png] -* Finally, import the annotations (File->Import annotations (Genbank)) -* [annotated.png] -* The workspace can be saved to share or return to later (File->Save) diff --git a/docs/content/gingr/requirements.rst b/docs/content/gingr/requirements.rst new file mode 100644 index 0000000..d5a16f6 --- /dev/null +++ b/docs/content/gingr/requirements.rst @@ -0,0 +1,20 @@ +Requirements +------------ + +Mac +""" +* OS X 10.7 (Lion) or later (requires 64-bit architecture) + +Linux +""""" +* 64-bit architecture +* Common distributions + * The Gingr binary should work with most recent (within ~5 years) versions of common Linux distributions, e.g.: + * CentOS (6+) + * Ubuntu (9+) + * Fedora (10+) + * ...and many others +* Source + * If the Gingr binary does not work on a particular distribution, + it may be possible to build from `source `_ + * gcc 4.8+ is required for building diff --git a/docs/content/gingr/running.rst b/docs/content/gingr/running.rst new file mode 100644 index
0000000..e60ea98 --- /dev/null +++ b/docs/content/gingr/running.rst @@ -0,0 +1,24 @@ +Running Gingr +------------- + +Mac OS X +"""""""" +* Gingr.app can be moved to the Applications folder if desired +* Double-click Gingr.app to run +* Depending on your security settings, there may be a warning that Gingr is not from the Mac App Store or is from an unidentified developer. To run it anyway: + * Right-click on Gingr.app + * Select "Open" from the menu + * Click the "Open" button at the next prompt + +|img_open| + +Linux +""""" +* From the desktop + * Click on the "gingr" binary +* From a terminal + * Navigate to the folder with the "gingr" binary + * Run "./gingr" + +.. |img_open| image:: open-gingr.png + diff --git a/docs/content/gingr/screen-small.png b/docs/content/gingr/screen-small.png new file mode 100644 index 0000000..af09c90 Binary files /dev/null and b/docs/content/gingr/screen-small.png differ diff --git a/docs/content/gingr/screen.png b/docs/content/gingr/screen.png new file mode 100644 index 0000000..7a9104c Binary files /dev/null and b/docs/content/gingr/screen.png differ diff --git a/docs/content/gingr/tutorial.rst b/docs/content/gingr/tutorial.rst index 7828b03..01e0121 100644 --- a/docs/content/gingr/tutorial.rst +++ b/docs/content/gingr/tutorial.rst @@ -1 +1,8 @@ -fixme \ No newline at end of file +Tutorial +======== + +.. toctree:: + + running + browsing + importing diff --git a/docs/content/gingr/types.rst b/docs/content/gingr/types.rst new file mode 100644 index 0000000..e2f415b --- /dev/null +++ b/docs/content/gingr/types.rst @@ -0,0 +1,18 @@ +File formats +============ + +The flowchart below describes the various file formats that can be imported or +exported to/from Gingr (or the ``harvesttools`` command-line utility). + +.. image:: flowchart.png + +* Alignments + * Core only: The Gingr file format stores core alignments, or alignments that involve all genomes. When alignments are loaded, blocks that are not core will be discarded.
+ * When importing MAF alignments, the first sequence of the first core block is used as the reference. Each reference contig will be padded with Ns up to its first LCB and between subsequent LCBs. If alignment blocks overlap in reference coordinate space, the block seen earlier in the file will be kept; the later one will be ignored. + * Multi-fasta: This format does not store rearrangement information, so the alignment is treated as a single LCB. +* References + * XMFA files can be accompanied by Fasta reference files to provide sequence between LCBs and to allow Genbank annotations (which must have matching GI numbers) to be loaded later. Genbank files that contain sequence can also be used as references. + * Multi-fasta alignments will use the first sequence as the reference. Genbank annotations can be loaded later if the GIs match. +* Variants + * VCF files must be imported with a Fasta reference. The only fields imported are sequence identifier (CHROM), position (POS), reference allele (REF), alternate alleles (ALT), quality (QUAL), filters (FILTER, including ##FILTER specifications in the header), and genotype (GT); all other information is ignored. Additionally, since VCF does not store complete alignment information, any insertions larger than one base will be replaced by an LCB boundary when importing. If a genotype is diploid or polyploid, only the first haplotype is used in the multi-alignment (the others are ignored). Symbolic alleles and breakends are currently unsupported and will also be ignored. When writing to VCF, only the imported fields will be populated. Indel output is not yet implemented, so indels will be skipped when writing. + * The multi-fasta SNP output is the same format as multi-fasta alignments, but only contains columns with unfiltered ("PASS") variants (like a Mauve SNP file). This is useful for generating phylogenetic trees, but does not contain positional information or rearrangements.
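The MAF import rules in types.rst (non-core blocks are discarded, the earlier of two overlapping blocks wins, and each reference contig is padded with Ns up to its first LCB and between LCBs) can be illustrated with a minimal Python sketch. It assumes a single reference contig and half-open [start, end) reference intervals; the `Block` type and both functions are hypothetical and are not the HarvestTools implementation.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Block(NamedTuple):
    """Hypothetical, simplified MAF block: the genomes it contains and its
    half-open [start, end) interval on a single reference contig."""
    genomes: FrozenSet[str]
    start: int
    end: int


def select_core_blocks(blocks: List[Block], all_genomes: FrozenSet[str]) -> List[Block]:
    """Drop non-core blocks; when two core blocks overlap on the reference,
    keep the one that appears earlier in the file."""
    kept: List[Block] = []
    for block in blocks:  # iterate in file order
        if block.genomes != all_genomes:
            continue  # not core: the block does not involve every genome
        if any(block.start < k.end and k.start < block.end for k in kept):
            continue  # overlaps a block seen earlier, so this later one is ignored
        kept.append(block)
    return sorted(kept, key=lambda b: b.start)


def padding_gaps(core_blocks: List[Block]) -> List[Tuple[int, int]]:
    """Reference intervals padded with Ns: before the first block and
    between consecutive blocks (expects blocks sorted by start)."""
    gaps: List[Tuple[int, int]] = []
    cursor = 0
    for block in core_blocks:
        if block.start > cursor:
            gaps.append((cursor, block.start))
        cursor = max(cursor, block.end)
    return gaps


everyone = frozenset({"A", "B", "C"})
blocks = [
    Block(everyone, 100, 200),
    Block(frozenset({"A", "B"}), 0, 50),  # not core: discarded
    Block(everyone, 150, 300),            # overlaps the first block: ignored
    Block(everyone, 400, 500),
]
core = select_core_blocks(blocks, everyone)
gaps = padding_gaps(core)
print([(b.start, b.end) for b in core], gaps)
# [(100, 200), (400, 500)] [(0, 100), (200, 400)]
```

Keeping blocks in file order before sorting is what makes "earlier wins" deterministic; sorting only afterwards lets the padding step walk the reference left to right.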
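The VCF import rules in types.rst can likewise be sketched for a single data line. This is a simplified illustration under stated assumptions, not Gingr's parser: `import_vcf_record` is a hypothetical name, a multi-base insertion is approximated as an allele at least two bases longer than REF and marked with a placeholder string rather than a real LCB boundary, the breakend check only looks for brackets, and reporting a missing (`.`) call as `None` is an assumption of this sketch.

```python
import re


def is_symbolic_or_breakend(allele: str) -> bool:
    """Rough check for VCF symbolic alleles (<DEL>) and bracketed breakends."""
    return allele.startswith("<") or "[" in allele or "]" in allele


def import_vcf_record(line: str) -> dict:
    """Hypothetical helper: keep CHROM, POS, REF, ALT, QUAL, FILTER and one
    allele per sample taken from the first haplotype of GT."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, filt = fields[:7]
    alleles = [ref] + alt.split(",")
    # Per the VCF spec, GT must be the first FORMAT key when present.
    has_gt = len(fields) > 8 and fields[8].split(":")[0] == "GT"
    calls = []
    for sample in fields[9:]:
        gt = sample.split(":")[0] if has_gt else "."
        first = re.split(r"[/|]", gt)[0]  # diploid/polyploid: first haplotype only
        if first == ".":
            calls.append(None)  # missing call (assumption: reported as None)
            continue
        allele = alleles[int(first)]
        if is_symbolic_or_breakend(allele):
            calls.append(None)  # unsupported allele: ignored
        elif len(allele) - len(ref) > 1:
            calls.append("LCB_BOUNDARY")  # placeholder for a multi-base insertion
        else:
            calls.append(allele)
    return {"CHROM": chrom, "POS": int(pos), "REF": ref, "ALT": alleles[1:],
            "QUAL": qual, "FILTER": filt, "GT": calls}


record = import_vcf_record("chr1\t10\t.\tA\tG,ATTT,<DEL>\t50\tPASS\t.\tGT\t1/0\t2|2\t3\t./.")
print(record["GT"])  # ['G', 'LCB_BOUNDARY', None, None]
```

INFO and every FORMAT key other than GT are dropped, mirroring the statement that all other information is ignored on import.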
diff --git a/docs/content/harvest-tools.rst b/docs/content/harvest-tools.rst index ead417b..2eddaa5 100644 --- a/docs/content/harvest-tools.rst +++ b/docs/content/harvest-tools.rst @@ -1,12 +1,20 @@ -================================================================= -harvest-tools: binary archive and format conversion tool -================================================================= +============ +HarvestTools +============ +**Archiving and postprocessing** -Project home page: https://github.com/marbl/harvest-tools +HarvestTools is a utility for creating and interfacing with Gingr files, which +are efficient archives that the Harvest Suite uses to store reference-compressed +multi-alignments, phylogenetic trees, filtered variants and annotations. Though +designed for use with Parsnp and Gingr, HarvestTools can also be used for +generic conversion between standard bioinformatics file formats. -harvest-tools primarily serves as a reference-based compression tool for large genomic datasets. In addition, it serves as a proxy between several alignment and variant & annotated feature formats. While it was designed for use with both Parsnp & Gingr, it can also be used as standalone tool to efficiently compress multiple alignments, variants, phylogenies and annotations to a single convenient binary archive. +**Download (v1.2)** -Contents: + * `harvesttools-OSX64-v1.2.zip `_ + * `harvesttools-Linux64-v1.2.tar.gz `_ + +**Documentation** .. toctree:: :maxdepth: 2 @@ -19,3 +27,12 @@ Contents: harvest/tutorial harvest/source harvest/license + +**Resources** + +`All releases`_ ~ `Source code`_ ~ `Report an issue`_ + +.. _All releases: https://github.com/marbl/harvest-tools/releases +.. _Source code: https://github.com/marbl/harvest-tools +.. 
_Report an issue: https://github.com/marbl/harvest-tools/issues + diff --git a/docs/content/harvest.png b/docs/content/harvest.png new file mode 100644 index 0000000..ba84ff9 Binary files /dev/null and b/docs/content/harvest.png differ diff --git a/docs/content/harvest/quickstart.rst b/docs/content/harvest/quickstart.rst index 9fef850..b15fb1a 100644 --- a/docs/content/harvest/quickstart.rst +++ b/docs/content/harvest/quickstart.rst @@ -19,14 +19,14 @@ harvest-tools is distributed as a precompiled binary. The three steps below repr On OSX: """"""" - 1. wget https://github.com/marbl/harvest-tools/releases/download/v1.0/harvesttools-OSX64.gz - 2. gzip -d harvesttools-OSX64.gz + 1. wget https://github.com/marbl/harvest-tools/releases/download/v1.2/harvesttools-OSX64-v1.2.zip + 2. unzip harvesttools-OSX64-v1.2.zip On Linux: """"""""" - 1. wget https://github.com/marbl/harvest-tools/releases/download/v1.0/harvesttools-Linux64.gz - 2. gzip -d harvesttools-Linux64.gz + 1. wget https://github.com/marbl/harvest-tools/releases/download/v1.2/harvesttools-Linux64-v1.2.tar.gz + 2.
tar -xvf harvesttools-Linux64-v1.2.tar.gz Basic usage: """""""""""" @@ -54,24 +54,27 @@ With harvest-tools file as input, fasta formatted SNP file as output:: Command-line parameters: """"""""""""""""""""""""" - - -b: ,,"" - - -B: - - -f: - - -F: - - -g: - - -h: (show this help) - - -i: - - -m: - - -n: - - -N: - - --midpoint-reroot - - -o: - - -q: (quiet mode) - - -S: - - -v: - - -V: - - -x: - - -X: + - -i + - -b ,,"" + - -B + - -f + - -F + - -g + - -a + - -m + - -M + - -n + - -N + - --midpoint-reroot (reroot the tree at its midpoint after loading) + - -o + - -S + - -u 0/1 (update the branch values to reflect genome length) + - -v + - -V + - -x + - -X + - -h (show this help) + - -q (quiet mode) Primary output files ------------- diff --git a/docs/content/parsnp.rst b/docs/content/parsnp.rst index 6e15081..2363a82 100644 --- a/docs/content/parsnp.rst +++ b/docs/content/parsnp.rst @@ -1,6 +1,7 @@ -========================================= -Parsnp: rapid core genome multi-alignment -========================================= +====== +Parsnp +====== +**Rapid core genome multi-alignment** Project home page: https://github.com/marbl/parsnp diff --git a/docs/content/parsnp/faq.rst b/docs/content/parsnp/faq.rst index 83f0f07..9a41804 100644 --- a/docs/content/parsnp/faq.rst +++ b/docs/content/parsnp/faq.rst @@ -17,6 +17,10 @@ Q. **How can I visualize the results?** A. Gingr (http://github.com/marbl/gingr) can open Parsnp output and provide an interactive display of multi-alignments, variants and the phylogenetic tree estimated from the core genome alignment. +Q. **What % of genome X is aligned? What is the core genome alignment size?** + + A. The log output lists a coverage value for each genome, giving the percentage of that genome that is included in the core genome alignment. Note that this includes both the MUSCLE-aligned regions and the maximal unique matches (MUMs).
The core genome alignment size can then be calculated by multiplying a genome's coverage value (expressed as a fraction) by that genome's length. + Q. **Only a small percentage (<40%) of the reference genome covered by the alignments, huh?** A. Parsnp is a conservative core genome alignment method that necessarily requires that all genomes are present in each aligned regions. The focus is on aligning 1000s of closely related bacterial strains quickly while maintaining sensitivity comparable to existing WGA methods. In additon, the core genome has been shown to contain as few as 30-40% of the gene content (even in very closely-related clades) due to reductive genome evolution and/or a large accessory genome (with plenty of IS/phage elements). However, for increased sensitivity w.r.t aligned regions, and alignments containing subsets, both Mugsy and Mauve are terrific tools for the job. diff --git a/docs/content/parsnp/installation.rst b/docs/content/parsnp/installation.rst index fc279ca..c231c4c 100644 --- a/docs/content/parsnp/installation.rst +++ b/docs/content/parsnp/installation.rst @@ -27,7 +27,7 @@ Before you start, if running OSX Mavericks, OpenMP is not supported via Clang, s * Install Macports, then: - sudo port install gcc49 - - sudo port gcc-select mp-gcc49 + - sudo port select gcc mp-gcc49 * (or) Install Homebrew, then: @@ -48,7 +48,7 @@ Once OpenMP support is added, the first (required!) step is to build libMUSCLE:: cd muscle ./autogen.sh - ./configure --prefix=`pwd` + ./configure --prefix=`pwd` CXXFLAGS='-fopenmp' make install Then, build Parsnp:: diff --git a/docs/content/parsnp/quickstart.rst b/docs/content/parsnp/quickstart.rst index 02f44e7..cbf4925 100644 --- a/docs/content/parsnp/quickstart.rst +++ b/docs/content/parsnp/quickstart.rst @@ -1,6 +1,8 @@ Quickstart ========== +Note: If you are currently using a Parsnp version prior to **v1.2** or Harvest version prior to **v1.1.2**, you should update to use the latest release before proceeding.
The latest release includes a critical fix for FastTree2; see http://darlinglab.org/blog/2015/03/23/not-so-fast-fasttree.html for further details. + Before you run --------------- @@ -25,14 +27,14 @@ Parsnp is distributed as a precompiled binary that should be devoid of external On OSX: """"""" - 1. wget https://github.com/marbl/parsnp/releases/download/v1.0/parsnp-OSX64.gz - 2. unzip -d parsnp-OSX64.gz + 1. wget https://github.com/marbl/parsnp/releases/download/v1.2/parsnp-OSX64-v1.2.tar.gz + 2. tar -xvf parsnp-OSX64-v1.2.tar.gz On Linux: """"""""" - 1. wget https://github.com/marbl/parsnp/releases/download/v1.0/parsnp-Linux64.gz - 2. unzip -d parsnp-Linux64.gz + 1. wget https://github.com/marbl/parsnp/releases/download/v1.2/parsnp-Linux64-v1.2.tar.gz + 2. tar -xvf parsnp-Linux64-v1.2.tar.gz Basic usage: """""""""""" @@ -48,12 +50,19 @@ Parsnp quick start for three example scenarios. With reference & genbank file:: - parsnp -g -d -p + parsnp -g -d -p + +NOTE: + + 1. GenBank files are currently expected to have GI numbers for indexing. As a result, annotations from custom GenBank files (not downloaded from NCBI) will not appear in Gingr, though the alignment should still work. The dependency on GIs is expected to change in future versions. + 2. GenBank files can only be specified for the reference genome. + 3. -g and -r are mutually exclusive; you can provide either a Fasta file or a GenBank file for your reference genome, but not both. + 4. All non-reference genomes are supplied with the -d parameter. These genomes *must* be in Fasta format and located within the specified directory. With reference but without genbank file:: parsnp -r -d -p - + Autorecruit reference to a draft assembly:: parsnp -q -d -p @@ -63,41 +72,44 @@ Command-line parameters: Input/output:: - -c = : (c)urated genome directory, use all genomes in dir and ignore MUMi?
(default = NO) - -d = : (d)irectory containing genomes/contigs/scaffolds - -g = : Gen(b)ank file(s) (gbk), comma separated list (default = None) - -o = : output directory? default [./P_CURRDATE_CURRTIME] - -q = : (optional) specify (assembled) query genome to use, in addition to genomes found in genome dir (default = NONE) - -r = : (r)eference genome (set to ! to pick random one from genome dir) + -c = : (c)urated genome directory, use all genomes in dir and ignore MUMi? (default = NO) + -d = : (d)irectory containing genomes/contigs/scaffolds + -r = : (r)eference genome (set to ! to pick random one from genome dir) + -g = : Gen(b)ank file(s) (gbk), comma separated list (default = None) + -o = : output directory? default [./P_CURRDATE_CURRTIME] + -q = : (optional) specify (assembled) query genome to use, in addition to genomes found in genome dir (default = NONE) + MUMi:: - -M = : calculate MUMi and exit? overrides all other choices! (default: NO) - -U = : max (M)UMi distance (default: autocutoff based on distribution of MUMi values) + -U = : max MUMi distance value for MUMi distribution + -M = : calculate MUMi and exit? overrides all other choices! (default: NO) + -i = : max MUM(i) distance (default: autocutoff based on distribution of MUMi values) MUM search:: - -a = : min (a)NCHOR length (default = 1.1*Log(S)) - -C = : maximal cluster D value? (default=100) - -z = : min LCB si(z)e? (default = 25) + -a = : min (a)NCHOR length (default = 1.1*Log(S)) + -C = : maximal cluster D value? (default=100) + -z = : min LCB si(z)e? (default = 25) LCB alignment:: - -D = : maximal diagonal difference? Either percentage (e.g. 0.2) or bp (e.g. 100bp) (default = 0.12) - -e = greedily extend LCBs? experimental! (default = NO) - -n = : alignment program (default: libMUSCLE) + -D = : maximal diagonal difference? Either percentage (e.g. 0.2) or bp (e.g. 100bp) (default = 0.12) + -e = greedily extend LCBs? experimental! 
(default = NO) + -n = : alignment program (default: libMUSCLE) + -u = : output unaligned regions? .unaligned (default: NO) -SNP filters:: +Recombination filtration:: - -R = : disable (R)epeat filtering? - -x = : enable recombination filtering? (default: NO) + -x = : enable filtering of SNPs located in PhiPack identified regions of recombination? (default: NO) Misc:: - -h = : (h)elp: print this message - -P = : max partition size? limits memory usage (default= 15000000) - -p = : number of threads to use? (default= 1) - -v = : (v)erbose output? (default = NO) + -h = : (h)elp: print this message and exit + -p = : number of threads to use? (default= 1) + -P = : max partition size? limits memory usage (default= 15000000) + -v = : (v)erbose output? (default = NO) + -V = : output (V)ersion and exit Output Files ------------- diff --git a/docs/index.rst b/docs/index.rst index 7264337..e42a848 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -3,53 +3,42 @@ You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. -================================================================= -Harvest software suite for rapid genome alignment and visualization -================================================================= +Harvest +======= -Project home page: https://github.com/marbl/harvest +.. image:: content/harvest.png -============== -Citation -============== +Harvest is a suite of core-genome alignment and visualization tools +for quickly analyzing thousands of intraspecific microbial +genomes, including variant calls, recombination +detection, and phylogenetic trees. -Preprint: +.. image:: content/gingr/screen.png + :width: 462 + :height: 290 - Treangen TJ, Ondov BD, Koren S, Phillippy AM. - "Rapid Core-Genome Alignment and Visualization for Thousands of Microbial Genomes." - *bioRxiv* (2014). 
doi: http://dx.doi.org/10.1101/007351 +**Tools** -============== -Release status -============== +* `Parsnp `_ - Core-genome alignment and analysis +* `Gingr `_ - Interactive visualization of alignments, trees and variants +* `HarvestTools `_ - Archiving and postprocessing -07/21/14: `v1.0` +**Citation** -================= -Overview -================= + Treangen TJ, Ondov BD, Koren S, Phillippy AM. + The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. + Genome Biology, 15 (11), 1-15 [`PDF `_] -Harvest is a suite of core-genome alignment and visualization tools -for quickly analyzing thousands of intraspecific microbial -genomes. Harvest includes Parsnp, a fast core-genome multi-aligner, harvest-tools, a binary archive format and format conversion tool, -and Gingr, a dynamic visual platform. Combined they provide -rapid core-genome alignments, variant calls, recombination -detection, and phylogenetic trees. -.. image:: example.png - :target: https://raw.githubusercontent.com/marbl/harvest/master/docs/example.png +**Download (v1.1.2, 24-Mar-2015)** -Contents: +* `Harvest-OSX64-v1.1.2.tar.gz `_ +* `Harvest-Linux64-v1.1.2.tar.gz `_ .. toctree:: - :maxdepth: 5 - :numbered: + :hidden: - content/hardware - content/installation - content/gingr content/parsnp + content/gingr content/harvest-tools - content/faq -