# Binning COL_032024 seq data

Goal: bin a single assembly with a combination of MetaBAT2, MaxBin2 and CONCOCT, then use DAS Tool to integrate the results of these binning algorithms into an optimized, non-redundant set of bins.

Software:

MetaBAT2: https://bitbucket.org/berkeleylab/metabat/src/master/README.md

MaxBin2: https://flowcraft.readthedocs.io/en/latest/user/components/maxbin2.html

CONCOCT: https://github.com/BinPro/CONCOCT; https://concoct.readthedocs.io/en/latest/usage.html

CheckM: https://github.com/Ecogenomics/CheckM/wiki

DAS_tool: https://github.com/cmks/DAS_Tool
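The last step of the goal is DAS Tool. It takes one contig-to-bin table per binner plus the assembly. The sketch below assumes the DAS Tool 1.1.x command line: `-i` takes comma-separated tables, `-l` takes matching comma-separated labels, `-c` takes the contigs FASTA and `-o` takes an output prefix. `dastool_cmd` is a hypothetical helper that only prints the command, so the table/label pairing can be checked before a job is submitted.

```bash
# Hypothetical helper: print (do not run) a DAS_Tool command from label=table pairs.
# Assumes the DAS Tool 1.1.x CLI flags -i, -l, -c and -o.
dastool_cmd() {
    local contigs="$1" outprefix="$2" tables="" labels="" pair
    shift 2
    for pair in "$@"; do
        # text before the first '=' is the label, everything after it is the table path
        labels="${labels:+$labels,}${pair%%=*}"
        tables="${tables:+$tables,}${pair#*=}"
    done
    echo "DAS_Tool -i $tables -l $labels -c $contigs -o $outprefix"
}
```

For example, `dastool_cmd dlab.contigs-fixed.fsa DAS_Tool/dlab metabat2=metabat2.tsv concoct=concoct.tsv` prints a single line that can go into an sbatch script. Building both lists from the same pairs keeps each label in the same position as its table.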

In [None]:
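# INSTALLATION: binning env (MetaBAT2 + CheckM)

module load conda/latest
conda create -n binning python=3.7
conda activate binning
conda install -c bioconda metabat2
conda install -c bioconda checkm-genome
# CheckM also needs its reference database: download it once, then point CheckM at it with `checkm data setRoot <db_dir>`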
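Jobs in this notebook switch between two conda envs, so a missing executable tends to show up only after a job has been queued. Below is a small hypothetical check (not part of any of these tools) to run interactively in each env before submitting.

```bash
# Hypothetical helper: report each command that is not on PATH in the active env.
# Returns 1 if anything is missing, 0 otherwise.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        # command -v succeeds only if the name resolves to an executable, builtin or function
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

In the binning env: `require_tools metabat2 jgi_summarize_bam_contig_depths checkm`. In concoct_env: `require_tools concoct cut_up_fasta.py concoct_coverage_table.py merge_cutup_clustering.py extract_fasta_bins.py`.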


**Started with dlab**

### MetaBAT2

https://bitbucket.org/berkeleylab/metabat/src/master/README.md

In [None]:
#!/bin/bash
#SBATCH -c 24  # Number of Cores per Task
#SBATCH --mem=50G  # Requested Memory
#SBATCH -p cpu  # Partition
#SBATCH -t 24:00:00  # Job time limit
#SBATCH --mail-type=ALL
#SBATCH -o /work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/binning/dlab/slurm-metabat2binning-%j.out  # %j = job ID

module load conda/latest
conda activate binning

#set parameters for binning:
SAMPLENAME="dlab"
BINDIR="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/binning/${SAMPLENAME}/MetaBAT2_bins"
mkdir -p $BINDIR
CONTIGPATH="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/mapping/${SAMPLENAME}"
CONTIGFILE="${SAMPLENAME}.contigs-fixed.fsa"

#create depth file for MetaBat2
jgi_summarize_bam_contig_depths --outputDepth $BINDIR/MetaBAT2_depth.txt $CONTIGPATH/*.bam

#run MetaBAT2 with a minimum contig length of 1500 bp (-m; the docs say it should be >=1500); all other parameters, including the 200 kb minimum bin size (-s), stay at their defaults
metabat2 -i $CONTIGPATH/$CONTIGFILE -a $BINDIR/MetaBAT2_depth.txt \
-o $BINDIR/metabat2 -m 1500

# MetaBAT2 (v2:2.17)
# default parameters:
#  -m [ --minContig ] arg (=2500)    Minimum size of a contig for binning (should be >=1500).
#  --maxP arg (=95)                  Percentage of 'good' contigs considered for binning decided by connection
#                                    among contigs. The greater, the more sensitive.
#  --minS arg (=60)                  Minimum score of a edge for binning (should be between 1 and 99). The 
#                                    greater, the more specific.
#  --maxEdges arg (=200)             Maximum number of edges per node. The greater, the more sensitive.
#  --pTNF arg (=0)                   TNF probability cutoff for building TNF graph. Use it to skip the 
#                                    preparation step. (0: auto).
#  -x [ --minCV ] arg (=1)           Minimum mean coverage of a contig in each library for binning.
#  --minCVSum arg (=1)               Minimum total effective mean coverage of a contig (sum of depth over 
#                                    minCV) for binning.
#  -s [ --minClsSize ] arg (=200000) Minimum size of a bin as the output.
#  -t [ --numThreads ] arg (=0)      Number of threads to use (0: use all cores).

#this runs CheckM immediately after and puts the results alongside your bins
checkm lineage_wf -x fa -t 3 $BINDIR/ $BINDIR/bins-stats

# JOB-ID: 26840000
# bash script file name: /nikea/COL/bash_scripts/Col_metabat2_binning.sh
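It can help to know how much of the assembly even passes the length cutoff before comparing bins. `jgi_summarize_bam_contig_depths` writes a tab-separated table. Its first three columns are `contigName`, `contigLen` and `totalAvgDepth`, followed by a mean and a variance column for each BAM. The sketch below uses a hypothetical helper name and counts the contigs that clear the `-m 1500` cutoff used above.

```bash
# Hypothetical helper: summarize a MetaBAT2 depth file against a minimum contig length.
summarize_depth() {
    local depth_file="$1" min_len="${2:-1500}"
    awk -F'\t' -v min="$min_len" '
        NR == 1 { next }                        # skip the header row
        { total++ }                             # every contig in the file
        $2 + 0 >= min + 0 { n++; bp += $2 }     # contigs at or above the cutoff
        END { printf "contigs=%d eligible=%d eligible_bp=%.0f\n", total, n, bp }
    ' "$depth_file"
}
```

Inside the job script this would run right after the depth file is written, e.g. `summarize_depth $BINDIR/MetaBAT2_depth.txt 1500`.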

MetaBAT2 documentation: \
In MetaBAT 2, parameter optimization should be unnecessary, though a few parameters are exposed so that advanced users can experiment with them. \
Decrease -m (--minContig) when both the assembly and the bins formed with the default value are of very good quality. \
Decrease --maxP and --maxEdges when both the assembly and the bins are of very poor quality. \
Increase --maxEdges when completeness is low; for many datasets the developers typically use 500. \
Increase --minS when both the assembly and the bins are of very poor quality. \
Set --noAdd when the small or leftover contigs added to bins cause too much contamination. \
Set --pTNF to a positive number (1-99) to skip the TNF graph-building preparation step; otherwise it is decided automatically from --maxP. Use this to reproduce a previous result. \
Set --seed to a positive number to reproduce the result exactly; otherwise a random seed is set each time.
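The tuning notes above become extra flags on the same `metabat2` call. The sketch below is a hypothetical helper that prints a re-run command instead of running it. The values `--maxEdges 500` and `--seed 1` are illustrative and have not been tested on dlab.

```bash
# Hypothetical helper: print (do not run) a MetaBAT2 re-run command with tuning flags.
metabat2_tuned_cmd() {
    local contigs="$1" depth="$2" outprefix="$3"
    # -m 1500         same minimum contig length as the dlab run above
    # --maxEdges 500  docs: increase when completeness is low (500 is their typical value)
    # --seed 1        fixed seed so the binning can be reproduced exactly
    # -v              verbose log
    echo "metabat2 -i $contigs -a $depth -o $outprefix -m 1500 --maxEdges 500 --seed 1 -v"
}
```

Use a separate output prefix, for example `MetaBAT2_bins_tuned/metabat2`. Otherwise CheckM (`-x fa` on the directory) would mix the tuned bins with the first run's.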

In [None]:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bin Id                 Marker lineage            # genomes   # markers   # marker sets    0     1    2    3   4   5+   Completeness   Contamination   Strain heterogeneity  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  metabat2.4         k__Bacteria (UID2495)            2993        140            85         6    131   3    0   0   0       94.71            2.55               0.00          
  metabat2.1    c__Alphaproteobacteria (UID3305)      564         349           230         25   324   0    0   0   0       91.35            0.00               0.00          
  metabat2.18        k__Bacteria (UID3187)            2258        188           117         51   136   1    0   0   0       70.98            0.05               0.00          
  metabat2.25   c__Gammaproteobacteria (UID4267)      119         544           284        177   365   2    0   0   0       62.59            0.70              50.00          
  metabat2.8    c__Alphaproteobacteria (UID3305)      564         349           230        122   212   14   1   0   0       61.86            3.93              76.47          
  metabat2.6         k__Bacteria (UID2570)            433         274           183        114   156   4    0   0   0       58.91            1.28               0.00          
  metabat2.11         k__Bacteria (UID203)            5449        104            58         74    30   0    0   0   0       44.83            0.00               0.00          
  metabat2.13        k__Bacteria (UID1453)            901         171           117        106    62   2    1   0   0       21.99            0.66               0.00          
  metabat2.14          k__Archaea (UID2)              207         149           107        131    18   0    0   0   0       11.05            0.00               0.00          
  metabat2.2          k__Bacteria (UID203)            5449        104            58         91    13   0    0   0   0        8.83            0.00               0.00          
  metabat2.22          k__Archaea (UID2)              207         149           107        144    5    0    0   0   0        1.94            0.00               0.00          
  metabat2.9              root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.7              root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.5              root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.3              root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.24             root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.23             root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.21             root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.20             root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.19             root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.17             root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.16             root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.15             root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.12             root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
  metabat2.10             root (UID1)                 5656         56            24         56    0    0    0   0   0        0.00            0.00               0.00          
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
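The table can be filtered on completeness and contamination to pick bins worth carrying forward. The sketch below assumes the table above has been saved to a plain text file, for example copied from the Slurm log. The helper name is hypothetical, and the defaults are the MIMAG medium-quality thresholds (>=50% completeness, <10% contamination).

```bash
# Hypothetical helper: list bins from a saved CheckM table that pass the thresholds.
filter_checkm_bins() {
    local table="$1" min_comp="${2:-50}" max_cont="${3:-10}"
    awk -v minc="$min_comp" -v maxc="$max_cont" '
        # Data rows end with Completeness, Contamination, Strain heterogeneity, so fields
        # are counted from the end (the lineage column can contain spaces).
        # Header, dashed separator and blank lines fail the numeric test and are skipped.
        NF >= 4 && $(NF-2) ~ /^[0-9.]+$/ && $(NF-2) + 0 >= minc + 0 && $(NF-1) + 0 < maxc + 0 {
            printf "%s\t%s\t%s\n", $1, $(NF-2), $(NF-1)
        }
    ' "$table"
}
```

With the defaults, the MetaBAT2 table above keeps metabat2.4, metabat2.1, metabat2.18, metabat2.25, metabat2.8 and metabat2.6. `filter_checkm_bins table.txt 90 5` approximates the completeness/contamination part of the MIMAG high-quality criteria.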

In [None]:
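#note for later:
#This might be useful for preparing the MetaBAT2 results for DAS Tool, which takes a tab-separated contig_id<TAB>bin_id table
SAMPLENAME="dlab"
BINDIR="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/binning/${SAMPLENAME}/MetaBAT2_bins"   # MetaBAT2 wrote its bins here as metabat2.<N>.fa
: > "$BINDIR/contig_bins.tsv"   # empty the file first so reruns do not append duplicate lines
for i in "$BINDIR"/metabat2.*.fa ; do
    bin=$(basename "$i" .fa | sed 's/\./_/g')   # metabat2.4.fa -> metabat2_4
    grep ">" "$i" | cut -d ">" -f2 | cut -d " " -f1 | sed "s/$/\t$bin/" \
    >> "$BINDIR/contig_bins.tsv"
done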
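Because the loop writes with `>>`, a rerun or a leftover file can leave duplicate rows. The hypothetical check below is meant to run before the table is passed on. It confirms that each line has exactly two tab-separated columns and that no contig appears twice.

```bash
# Hypothetical helper: validate a contig_id<TAB>bin_id table; returns 1 on any problem.
check_contig2bin() {
    awk -F'\t' '
        NF != 2    { printf "line %d: expected 2 columns, got %d\n", NR, NF; bad = 1 }
        seen[$1]++ { printf "line %d: contig %s assigned more than once\n", NR, $1; bad = 1 }
        END        { exit bad }
    ' "$1"
}
```

It prints the offending lines and returns non-zero, so `check_contig2bin contig_bins.tsv || exit 1` can stop a job early.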

### CONCOCT
https://concoct.readthedocs.io/en/latest/installation.html

In [None]:
# INSTALLATION (CONCOCT is recommended to be installed in its own isolated env)
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

conda create -n concoct_env python=3 concoct
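The job script below cuts contigs with `cut_up_fasta.py -c 10000 -o 0 --merge_last`. Based on our reading of the CONCOCT source, this is what happens to a single contig:

- Contigs shorter than two chunk lengths stay whole.
- Otherwise, full 10 kb pieces are taken until the remainder is under 20 kb.
- That remainder becomes the last piece.

The hypothetical helper below only reports piece lengths under that assumption. The `.bed` file written with `-b` is the authoritative record.

```bash
# Sketch of our reading of cut_up_fasta.py with -o 0 --merge_last (not its actual code):
# print the piece lengths one contig of length LEN would be split into.
cut_up_lengths() {
    local len="$1" chunk="${2:-10000}" start=0
    # Contigs shorter than two chunk lengths are written out whole
    if [ "$len" -lt $((2 * chunk)) ]; then
        echo "$len"
        return 0
    fi
    # Take full chunks while at least two chunk lengths remain; the remainder
    # (between one and two chunk lengths) becomes the final, merged piece
    while [ $((len - start)) -ge $((2 * chunk)) ]; do
        echo "$chunk"
        start=$((start + chunk))
    done
    echo $((len - start))
}
```

For example, `cut_up_lengths 25000` prints 10000 and 15000, so some pieces can be almost twice the nominal chunk size.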

In [None]:
#!/bin/bash
#SBATCH -c 24  # Number of Cores per Task
#SBATCH --mem=50G  # Requested Memory
#SBATCH -p cpu  # Partition
#SBATCH -t 24:00:00  # Job time limit
#SBATCH --mail-type=ALL
#SBATCH -o /work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/binning/dlab/slurm-concoctbinning-%j.out  # %j = job ID

module load conda/latest
conda activate concoct_env

#set parameters
SAMPLENAME="dlab"
BINPATH="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/binning/${SAMPLENAME}/CONCOCT_bins"
mkdir -p $BINPATH
CONTIGPATH="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/mapping/${SAMPLENAME}"
CONTIGFILE="${SAMPLENAME}.contigs-fixed.fsa"
BAMPATH="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/mapping/${SAMPLENAME}"

TEMPDIR="/project/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL_files/binning/concoct_${SAMPLENAME}_temp"
mkdir -p $TEMPDIR

#cut the contigs into 10 kb pieces for CONCOCT (-c 10000 = chunk size, -o 0 = no overlap, --merge_last = merge the short final piece into the previous one)
#-b writes a BED file of the pieces for the coverage step; the cut FASTA goes to TEMPDIR so CheckM (-x fa on BINPATH) does not mistake it for a bin
cut_up_fasta.py $CONTIGPATH/$CONTIGFILE -c 10000 -o 0 --merge_last -b $BINPATH/${SAMPLENAME}_contigs_cut.bed > $TEMPDIR/${SAMPLENAME}_contigs_cut.fa

#estimate the coverage of each piece in each sample (the BAM files must be sorted and indexed)
#each failure branch exits with its own non-zero code; a bare `exit` would return echo's status (0) and the job would look successful
concoct_coverage_table.py $BINPATH/${SAMPLENAME}_contigs_cut.bed $BAMPATH/*.bam > $BINPATH/coverage_table_${SAMPLENAME}.tsv || { echo 'Exit code 2: failed to create coverage file, exiting.' >&2; exit 2; }

#run CONCOCT, merge the piece-level clusters back onto the original contigs, then write one FASTA file per bin
concoct --composition_file $TEMPDIR/${SAMPLENAME}_contigs_cut.fa --coverage_file $BINPATH/coverage_table_${SAMPLENAME}.tsv -t 3 -b $TEMPDIR || { echo 'Exit code 3: CONCOCT failed to run, exiting.' >&2; exit 3; }
merge_cutup_clustering.py $TEMPDIR/clustering_gt1000.csv > $TEMPDIR/${SAMPLENAME}_clustering_merged.csv || { echo 'Exit code 4: failed to merge clusters, exiting.' >&2; exit 4; }
extract_fasta_bins.py $CONTIGPATH/$CONTIGFILE $TEMPDIR/${SAMPLENAME}_clustering_merged.csv --output_path $BINPATH || { echo 'Exit code 5: Bins were not extracted, exiting.' >&2; exit 5; }

# CheckM is installed in the binning env, so switch environments
conda deactivate

conda activate binning

#this runs CheckM immediately after and puts the results alongside your bins
checkm lineage_wf -x fa -t 3 $BINPATH  $BINPATH/CheckM_stats

# JOB-ID: 26840140
# bash script file name: /nikea/COL/bash_scripts/Col_concoct_binning.sh
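To feed CONCOCT into DAS Tool alongside MetaBAT2, the merged clustering needs the same contig_id<TAB>bin_id layout as the MetaBAT2 table. `merge_cutup_clustering.py` writes a CSV of contig and cluster IDs, which in the versions we have seen starts with a `contig_id,cluster_id` header line. Below is a minimal conversion sketch. The helper name is hypothetical, and the `concoct_` bin prefix is our choice, mirroring `metabat2_N`.

```bash
# Hypothetical helper: convert CONCOCT's merged clustering CSV to contig_id<TAB>bin_id.
concoct_csv_to_contig2bin() {
    awk -F',' '
        NR == 1 && $1 == "contig_id" { next }              # skip the header row if present
        NF == 2 { printf "%s\tconcoct_%s\n", $1, $2 }      # prefix cluster IDs so bin names are unique
    ' "$1"
}
```

For example, `concoct_csv_to_contig2bin $TEMPDIR/${SAMPLENAME}_clustering_merged.csv > $BINPATH/contig_bins.tsv`.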