Improving genome bins through the combination of different binning programs


Publication

Dependencies:

Change Log:

Version 1.2 (2017-11-30):

  • Binning_refiner has been simplified to keep only its core functions, which makes it much easier to install and use. Hope you enjoy it :)

Important notice:

  • In the original version of Binning_refiner, a BLAST-based approach (as described in its publication) was used to identify the same contig across input bin sets. Because Binning_refiner is designed to refine bins derived from the same set of assemblies, and the BLAST step is time-consuming (especially for large datasets), the current version identifies the same contig across bin sets by its ID rather than with blastn. This makes Binning_refiner much faster to run and easier to install.
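To illustrate what matching contigs by ID means, here is a minimal Python sketch (not the code of Binning_refiner.py itself). It assumes each bin set has already been read into a dict mapping bin names to contig IDs, and it groups contigs by the combination of input bins they were assigned to, so each distinct combination becomes one candidate refined bin. The function name refine_by_contig_id is hypothetical, and discarding contigs that were not binned by every program is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A bin set maps each bin name to the contig IDs (FASTA header IDs) it holds.
BinSet = Dict[str, List[str]]


def refine_by_contig_id(bin_sets: List[BinSet]) -> Dict[Tuple[str, ...], List[str]]:
    """Group contigs by the combination of input bins they were assigned to.

    All bin sets are assumed to come from the same assembly, so an identical
    contig ID refers to an identical sequence and no BLAST search is needed.
    """
    # contig ID -> {index of bin set: bin name in that set}
    # (a contig listed in two bins of one set keeps the last one seen)
    membership: Dict[str, Dict[int, str]] = defaultdict(dict)
    for set_index, bin_set in enumerate(bin_sets):
        for bin_name, contig_ids in bin_set.items():
            for contig_id in contig_ids:
                membership[contig_id][set_index] = bin_name

    refined: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for contig_id, bins in membership.items():
        # Assumption: contigs left unbinned by any program are discarded.
        if len(bins) == len(bin_sets):
            key = tuple(bins[i] for i in range(len(bin_sets)))
            refined[key].append(contig_id)
    return dict(refined)


metabat = {"mb1": ["c1", "c2", "c3"], "mb2": ["c4", "c5"]}
mycc = {"my1": ["c1", "c2"], "my2": ["c3", "c4", "c5", "c6"]}
for bins, contigs in refine_by_contig_id([metabat, mycc]).items():
    print(bins, contigs)
# ('mb1', 'my1') ['c1', 'c2']
# ('mb1', 'my2') ['c3']
# ('mb2', 'my2') ['c4', 'c5']
```

Here c6 is dropped because only one of the two toy bin sets contains it. Keying refined bins by a tuple of input bin names keeps the origin of every contig visible; the real tool names its output bins with the -prefix value instead.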

How to install:

  1. Install Python and Biopython.

     # for Katana users from UNSW, simply run
     $ module load python/3.5.2
    
  2. Download Binning_refiner.py to any location you like; it is ready to run:

     $ python path/to/Binning_refiner.py -h
    
  3. If you want to see the correlations between your input bin sets (figure below), you also need R and the following two R packages: optparse and googleVis.

Help information:

    python Binning_refiner.py -h
      -h, --help      show this help message and exit
      -1              first bin folder name
      -2              second bin folder name
      -3              third bin folder name
      -x1             file extension for bin set 1, default: fasta
      -x2             file extension for bin set 2, default: fasta
      -x3             file extension for bin set 3, default: fasta
      -prefix         prefix of refined bins, default: Refined
      -ms             minimal size for refined bins, default: 524288 (0.5Mbp)
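The default -ms value of 524288 equals 0.5 × 1024 × 1024, so the "0.5Mbp" label counts 1 Mbp as 1,048,576 bp rather than 1,000,000 bp. If you want a different cutoff on the same scale, a small helper can compute the value to pass to -ms. The name mbp_to_ms is hypothetical and not part of Binning_refiner; the binary convention is inferred from the documented default.

```python
def mbp_to_ms(mbp: float) -> int:
    """Convert a size in Mbp to a -ms value, counting 1 Mbp as 1024 * 1024 bp.

    This convention is inferred from the documented default
    (524288 = 0.5 Mbp); it is an assumption, not taken from the script.
    """
    return int(round(mbp * 1024 * 1024))


print(mbp_to_ms(0.5))  # 524288, the documented default
print(mbp_to_ms(1))    # 1048576
```

For example, a 1 Mbp cutoff on this scale would be passed as -ms 1048576.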

How to run:

  1. All bins in one folder must have the same file extension.

  2. Binning_refiner is now compatible with both Python 2 and Python 3.

     # For two binning programs (e.g. MetaBAT and MyCC)
     python Binning_refiner.py -1 MetaBAT -2 MyCC -x1 fa -prefix Refined
    
     # For three binning programs (e.g. MetaBAT, MyCC and CONCOCT)
     python Binning_refiner.py -1 MetaBAT -2 MyCC -3 CONCOCT -x1 fa -x3 fa -prefix Refined
    

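If you call Binning_refiner from a larger Python pipeline, the two commands above can be assembled programmatically. The sketch below is a hypothetical helper (build_refiner_command and files_with_other_extensions are not part of Binning_refiner). It uses only the flags listed in the help information, omits -xN when the extension is the default fasta, and checks rule 1 by listing files whose extension differs. Limiting the input to two or three folders follows the -1/-2/-3 options shown in the help.

```python
import os
from typing import List, Optional, Sequence


def files_with_other_extensions(folder: str, ext: str) -> List[str]:
    """List files in `folder` that do not end with `.ext` (rule 1 above)."""
    return sorted(
        name
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
        and not name.endswith("." + ext)
    )


def build_refiner_command(
    bin_dirs: Sequence[str],
    extensions: Optional[Sequence[str]] = None,
    prefix: str = "Refined",
    min_size: Optional[int] = None,
    script: str = "Binning_refiner.py",
) -> List[str]:
    """Assemble a Binning_refiner.py call for two or three bin folders."""
    if not 2 <= len(bin_dirs) <= 3:
        raise ValueError("expected two or three bin folders")
    exts = list(extensions) if extensions is not None else ["fasta"] * len(bin_dirs)
    if len(exts) != len(bin_dirs):
        raise ValueError("need one extension per bin folder")

    cmd = ["python", script]
    # -1, -2, -3: bin folder names
    for i, folder in enumerate(bin_dirs, start=1):
        cmd += [f"-{i}", folder]
    # -x1, -x2, -x3: only needed when the extension is not the default "fasta"
    for i, ext in enumerate(exts, start=1):
        if ext != "fasta":
            cmd += [f"-x{i}", ext]
    cmd += ["-prefix", prefix]
    # -ms: minimum refined bin size; omitted to keep the tool's default
    if min_size is not None:
        cmd += ["-ms", str(min_size)]
    return cmd


# Reproduces the two-program example above
print(" ".join(build_refiner_command(["MetaBAT", "MyCC"], ["fa", "fasta"])))
# python Binning_refiner.py -1 MetaBAT -2 MyCC -x1 fa -prefix Refined
```

Passing the returned list to subprocess.run(cmd, check=True) runs the tool and raises an error if it exits with a non-zero status. Building a list rather than a single string avoids shell quoting problems with folder names that contain spaces.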
Output files:

  1. All refined bins larger than the defined bin size cutoff (-ms).

  2. The IDs of the contigs in each refined bin.

  3. The size of each refined bin and the input bins its contigs come from.

  4. You may want to run get_sankey_plot.R to visualize the correlations between your input bin sets (figure below). This requires R and two R packages: optparse and googleVis.

     # Example command
     Rscript get_sankey_plot.R -f GoogleVis_Sankey_0.5Mbp.csv -x 800 -y 1000
    

    [Figure: Sankey plot]
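The size filter behind the first output and the input-bin correlations behind the Sankey plot can be approximated with a short sketch. This is a hypothetical illustration, not the implementation in Binning_refiner.py or get_sankey_plot.R. It reads contig lengths from FASTA lines with plain Python (the installation step asks for Biopython, which this sketch avoids to stay self-contained), keeps refined bins whose total length reaches the -ms cutoff (treating the cutoff as inclusive is an assumption), and sums the length of contigs shared by each pair of input bins, which is the kind of flow a Sankey plot draws. The From/To/Weight column names are placeholders; the actual layout of GoogleVis_Sankey_0.5Mbp.csv is not described here.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig ID: sequence length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Contig ID = first whitespace-delimited word of the header
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def keep_large_bins(
    refined: Dict[str, List[str]],
    lengths: Dict[str, int],
    min_size: int = 524288,
) -> Dict[str, int]:
    """Return {bin name: total bp} for refined bins of at least min_size bp."""
    sizes = {name: sum(lengths[c] for c in contigs) for name, contigs in refined.items()}
    return {name: size for name, size in sizes.items() if size >= min_size}


def sankey_links(
    set1: Dict[str, List[str]],
    set2: Dict[str, List[str]],
    lengths: Dict[str, int],
) -> List[Tuple[str, str, int]]:
    """Sum the length of contigs shared by each (set 1 bin, set 2 bin) pair."""
    bin_in_set2 = {c: b for b, contigs in set2.items() for c in contigs}
    flows: Dict[Tuple[str, str], int] = defaultdict(int)
    for bin1, contigs in set1.items():
        for c in contigs:
            if c in bin_in_set2:
                flows[(bin1, bin_in_set2[c])] += lengths[c]
    return sorted((a, b, w) for (a, b), w in flows.items())


def links_to_csv(links: List[Tuple[str, str, int]]) -> str:
    """Render links as CSV text with placeholder column names."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["From", "To", "Weight"])
    writer.writerows(links)
    return buf.getvalue()
```

Weighting each link by shared sequence length, rather than by contig count, keeps a few long contigs from being outweighed by many short ones.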