## Co-assembly with metaspades for benchmarking purposes

Sequences have been run through every step of 1Col_assembly_032024 except the megahit assembly. Using **pstr** for the benchmark.

https://ablab.github.io/spades/running.html

In [None]:
#INSTALLATION
module load conda/latest
conda create -n spades_env
conda activate spades_env
conda install -c bioconda spades

In [None]:
#!/bin/bash
#SBATCH -c 24  # Number of Cores per Task
#SBATCH --mem=350G  # Requested Memory
#SBATCH -p cpu  # Partition
#SBATCH -t 48:00:00  # Job time limit
#SBATCH --mail-type=ALL
#SBATCH -o /work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/assembly/pstr/slurm-spades-assembly-%j.out  # %j = job ID

module load conda/latest
conda activate spades_env

SAMPLENAME="pstr"
WORKDIR="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/assembly/${SAMPLENAME}"

#ASSEMBLE reads into contigs with metaspades
spades.py --meta -m 350 \
-1 "$WORKDIR"/"$SAMPLENAME"_reads_R1_ALL.fastq.gz \
-2 "$WORKDIR"/"$SAMPLENAME"_reads_R2_ALL.fastq.gz \
-o $WORKDIR/metaspades_assembly

#working through an error: when using --continue, only the -o flag is needed:
#spades.py --continue -o $WORKDIR/metaspades_assembly

#memory issues: the run keeps stopping at the k55 step. The default SPAdes memory limit is 250 GB, so raise it to match the SLURM memory request above.
#spades.py -m 350 --restart-from k55 \
#-o $WORKDIR/metaspades_assembly


conda deactivate
echo "Metaspades assembly completed!"

# JOB-ID: 27160933
# bash script file name: nikea/COL/bash_scripts/Col_spades_assemble.sh

Script failed (27042504); changed it so that I'm only working in the /work directory.
Doubled the requested memory and ran again (27119856). It keeps getting stuck on the spades-core step at k55. With the --continue flag you can't change any parameters, so the run stopped at the same place: I had not changed the memory limit in the spades.py call itself. Re-ran with the --restart-from flag and added -m 350 (27160933).
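
Since the stall came from the spades.py memory limit not matching the SLURM request, one option is to derive `-m` from the job's own allocation. A minimal sketch, assuming SLURM exports `SLURM_MEM_PER_NODE` (in MB) for jobs submitted with `--mem`; the function name `spades_mem_gb` is made up for illustration and was not part of the original script:

```shell
# Convert SLURM's per-node memory (reported in MB) into whole GB for spades.py -m.
# Falls back to 250, the SPAdes default limit, when the variable is unset
# (for example when testing outside of a SLURM job).
spades_mem_gb() {
    mem_mb="${SLURM_MEM_PER_NODE:-}"
    if [ -z "$mem_mb" ]; then
        echo 250
    else
        echo $(( mem_mb / 1024 ))
    fi
}

# Hypothetical usage inside the job script:
# spades.py --meta -m "$(spades_mem_gb)" -1 R1.fastq.gz -2 R2.fastq.gz -o "$WORKDIR"/metaspades_assembly
```

With `--mem=350G` this should give `-m 350`, so the two numbers cannot drift apart when only the `#SBATCH` line is edited.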

Comparing the megahit and metaspades assemblies. Not sure yet whether to continue with the contigs or the scaffolds from metaspades, so both are included in the comparison.

In [None]:
#!/bin/bash
#SBATCH -c 24  # Number of Cores per Task
#SBATCH --mem=50G  # Requested Memory
#SBATCH -p cpu  # Partition
#SBATCH -t 24:00:00  # Job time limit
#SBATCH --mail-type=ALL
#SBATCH -o /work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/assembly/pstr/slurm-metaquast-spades-%j.out  # %j = job ID

module load conda/latest
conda activate assembly

SAMPLENAME="pstr"
WORKDIR="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/assembly/${SAMPLENAME}"
MEGAHIT="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/assembly/${SAMPLENAME}/megahit_assembly"
METASPADES="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/assembly/${SAMPLENAME}/metaspades_assembly"

metaquast -o $WORKDIR/quast_comparison_output \
-l "megahit, metaspadescont, metaspadesscaf" -t 12 \
$MEGAHIT/"$SAMPLENAME".contigs.fa $METASPADES/contigs.fasta $METASPADES/scaffolds.fasta


# Job ID:27211407
# bash script file name: nikea/COL/bash_scripts/Col_comp_quast.sh
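
Alongside metaquast, a quick way to see how many contigs in each assembly clear a given length cutoff is to count them directly from the FASTA files. This is only a rough sketch with plain awk, not a replacement for the metaquast report; the function name `contigs_over_length` is hypothetical:

```shell
# Count contigs and total bases at or above a minimum length in a FASTA file.
# Handles sequences that are wrapped over several lines.
# Usage: contigs_over_length <fasta> <min_len>
contigs_over_length() {
    awk -v min="$2" '
        /^>/ {
            # A new header closes the previous record, so tally it if long enough.
            if (seen && len >= min) { n++; total += len }
            len = 0; seen = 1; next
        }
        { len += length($0) }
        END {
            # Tally the last record, which has no header after it.
            if (seen && len >= min) { n++; total += len }
            printf "%d contigs, %d bp\n", n, total
        }
    ' "$1"
}

# Hypothetical usage on the uncompressed assembly outputs:
# contigs_over_length "$METASPADES"/contigs.fasta 1000
# contigs_over_length "$MEGAHIT"/"$SAMPLENAME".contigs.fa 1000
```

Running it with a few different cutoffs gives a feel for how much of each assembly a length filter would remove.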

**Mapping on metaspades co-assembly (contigs file)**

In [None]:
#!/bin/bash
#SBATCH -c 24  # Number of Cores per Task
#SBATCH --mem=180G  # Requested Memory
#SBATCH -p cpu  # Partition
#SBATCH -t 24:00:00  # Job time limit
#SBATCH --mail-type=ALL
#SBATCH -o /work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/mapping/pstr/metaspades/slurm-mapping-%j.out  # %j = job ID

module load conda/latest
conda activate anvio-8

SAMPLENAME="pstr"
READSPATH="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/assembly/${SAMPLENAME}/repaired"
CONTIGPATH="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/assembly/${SAMPLENAME}/metaspades_assembly"
CONTIGFILE="contigs.fasta"
WORKPATH="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/mapping/${SAMPLENAME}/metaspades"
mkdir -p "$WORKPATH"
XTRAFILES="/project/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL_files/mapping/${SAMPLENAME}/metaspades"
mkdir -p "$XTRAFILES"
LISTPATH="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/"
SAMPLELIST="032024_pstr_sampleids.txt" 
 
anvi-script-reformat-fasta $CONTIGPATH/$CONTIGFILE -o $WORKPATH/"${SAMPLENAME}.metascontigs-fixed.fsa" -l 1000 --simplify-names --report-file $WORKPATH/contig-rename-report-txt

#fixes deflines (filters contigs and reformats so naming is cleaner)
#filtering seq length 1000bp...need to play around with filtering based on bp length
#deflines = sequence definition line. comes directly before its associated sequence in a fasta file


FIXEDCON="${SAMPLENAME}.metascontigs-fixed.fsa"

cd $WORKPATH
#this builds an index of your contigs, which only needs to happen once
bowtie2-build $FIXEDCON "$SAMPLENAME"_contigs
# will not accept path before contigs file - must be in the correct dir 

while IFS= read -r SAMPLEID; do
    #aligns reads to your contigs and collects the alignments in a .sam file
    bowtie2 --threads 11 -x "$SAMPLENAME"_contigs -1 $READSPATH/"${SAMPLEID}"_host_removed_R1.tagged_filter_ready.fastq.gz -2 $READSPATH/"${SAMPLEID}"_host_removed_R2.tagged_filter_ready.fastq.gz -S $XTRAFILES/"${SAMPLEID}".sam
    #make sure to point it to the index not the FIXEDCON file (-x parameter)
    
    #converts your sam file to a bam file, but it's neither sorted nor indexed, so we use an anvi'o script to do so:
    samtools view -F 4 -b -S $XTRAFILES/"${SAMPLEID}".sam -o $WORKPATH/"${SAMPLEID}"-RAW.bam
   
    #index and sort your bam file
    anvi-init-bam $WORKPATH/"${SAMPLEID}"-RAW.bam -o $WORKPATH/"${SAMPLEID}".bam
    
    rm $WORKPATH/"${SAMPLEID}"-RAW.bam
done < "$LISTPATH/${SAMPLELIST}"
echo "Mapping success!"

#JOB ID: 27211616
#bash script: nikea/COL/bash_scripts/Col_metaspades_mapping.sh

Much lower mapping rate compared to the megahit assembly.
pstr: 36-40.6% alignment rate (compared to 70-80% mapped on the megahit co-assembly).
Going to run metabat2 anyway just to see, but I think megahit is the way to go.
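
The per-sample alignment rates can be pulled out of the job log rather than read off by eye. A small sketch, assuming bowtie2's summary (which it writes to stderr and which ends in a line like `NN.NN% overall alignment rate`) landed in the SLURM `-o` file; the function name `alignment_rates` is made up for illustration:

```shell
# Print the overall alignment rate (percent, without the % sign) for every
# bowtie2 run recorded in a log file, one value per line, in log order.
# Usage: alignment_rates <logfile>
alignment_rates() {
    grep -o '[0-9.]*% overall alignment rate' "$1" | cut -d'%' -f1
}

# Hypothetical usage on the mapping job's log:
# alignment_rates "$WORKPATH"/slurm-mapping-27211616.out
```

The values come out in the same order as the sample IDs in the sample list, since the loop maps them one after another.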

**Binning**

In [None]:
#!/bin/bash
#SBATCH -c 24  # Number of Cores per Task
#SBATCH --mem=50G  # Requested Memory
#SBATCH -p cpu  # Partition
#SBATCH -t 24:00:00  # Job time limit
#SBATCH --mail-type=ALL
#SBATCH -o /work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/binning/pstr/metaspades/slurm-metabat2binning-%j.out  # %j = job ID

module load conda/latest
conda activate binning

#set parameters for binning:
SAMPLENAME="pstr"
BINDIR="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/binning/${SAMPLENAME}/metaspades/MetaBAT2_bins"
mkdir -p $BINDIR
CONTIGPATH="/work/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL/mapping/${SAMPLENAME}/metaspades"
CONTIGFILE="${SAMPLENAME}.metascontigs-fixed.fsa"

#create depth file for MetaBat2
jgi_summarize_bam_contig_depths --outputDepth $BINDIR/MetaBAT2_depth.txt $CONTIGPATH/*.bam

#MetaBAT2 with minimum contig length (-m) set to 1500 (has to be >=1500); all other parameters, including min bin size, left at their defaults
metabat2 -i $CONTIGPATH/$CONTIGFILE -a $BINDIR/MetaBAT2_depth.txt \
-o $BINDIR/metabat2 -m 1500

# MetaBAT2 (v2:2.17)
# default parameters:
#-m [ --minContig ] arg (=2500)    Minimum size of a contig for binning (should be >=1500).
#  --maxP arg (=95)                  Percentage of 'good' contigs considered for binning decided by connection
#                                    among contigs. The greater, the more sensitive.
#  --minS arg (=60)                  Minimum score of a edge for binning (should be between 1 and 99). The 
#                                    greater, the more specific.
#  --maxEdges arg (=200)             Maximum number of edges per node. The greater, the more sensitive.
#  --pTNF arg (=0)                   TNF probability cutoff for building TNF graph. Use it to skip the 
#                                    preparation step. (0: auto).
#  -x [ --minCV ] arg (=1)           Minimum mean coverage of a contig in each library for binning.
#  --minCVSum arg (=1)               Minimum total effective mean coverage of a contig (sum of depth over 
#                                    minCV) for binning.
#  -s [ --minClsSize ] arg (=200000) Minimum size of a bin as the output.
#  -t [ --numThreads ] arg (=0)      Number of threads to use (0: use all cores).

#this runs CheckM immediately after and puts the results alongside your bins
checkm lineage_wf -x fa -t 3 $BINDIR/ $BINDIR/CheckM-bins-stats

# JOB-ID:27212034
# bash script file name: /nikea/COL/bash_scripts/Col_metaspades_metabat2_binning.sh

Fewer bins, with lower CheckM completeness scores, than the megahit co-assembly (highest completeness here is 3.45%). Better to stick with megahit!

In [None]:
----------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bin Id           Marker lineage      # genomes   # markers   # marker sets    0    1    2   3   4   5+   Completeness   Contamination   Strain heterogeneity  
----------------------------------------------------------------------------------------------------------------------------------------------------------------
  metabat2.2    k__Bacteria (UID203)      5449        104            58         92   12   0   0   0   0        3.45            0.00               0.00          
  metabat2.5     k__Archaea (UID2)        207         149           107        147   2    0   0   0   0        1.40            0.00               0.00          
  metabat2.11    k__Archaea (UID2)        207         148           106        146   2    0   0   0   0        1.26            0.00               0.00          
  metabat2.9        root (UID1)           5656         56            24         56   0    0   0   0   0        0.00            0.00               0.00          
  metabat2.8        root (UID1)           5656         56            24         56   0    0   0   0   0        0.00            0.00               0.00          
  metabat2.7        root (UID1)           5656         56            24         56   0    0   0   0   0        0.00            0.00               0.00          
  metabat2.6        root (UID1)           5656         56            24         56   0    0   0   0   0        0.00            0.00               0.00          
  metabat2.4        root (UID1)           5656         56            24         56   0    0   0   0   0        0.00            0.00               0.00          
  metabat2.3        root (UID1)           5656         56            24         56   0    0   0   0   0        0.00            0.00               0.00          
  metabat2.10       root (UID1)           5656         56            24         56   0    0   0   0   0        0.00            0.00               0.00          
  metabat2.1        root (UID1)           5656         56            24         56   0    0   0   0   0        0.00            0.00               0.00          
----------------------------------------------------------------------------------------------------------------------------------------------------------------
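
To check a table like the one above against a completeness/contamination cutoff without reading every row, the bins can be filtered with awk. This is a sketch that assumes the plain-text layout shown above, where each data row ends in completeness, contamination and strain heterogeneity; the function name `checkm_filter` is hypothetical:

```shell
# Print bin IDs from a CheckM lineage_wf table whose completeness is at least
# min_comp and whose contamination is at most max_cont.
# Usage: checkm_filter <table.txt> <min_comp> <max_cont>
checkm_filter() {
    awk -v comp="$2" -v cont="$3" '
        # Data rows end in three numeric columns: completeness, contamination,
        # strain heterogeneity. Header and separator rows fail this check.
        NF >= 3 && $(NF-2) ~ /^[0-9.]+$/ && $(NF-1) ~ /^[0-9.]+$/ {
            if ($(NF-2) + 0 >= comp + 0 && $(NF-1) + 0 <= cont + 0) print $1
        }
    ' "$1"
}

# Hypothetical usage, with the table saved to a text file:
# checkm_filter checkm_table.txt 50 10
```

With a commonly used cutoff of at least 50% completeness and at most 10% contamination, none of the bins in the table above pass, which matches the decision to stay with megahit.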

Deleted most files, but moved /metaspades_assembly to /project/pi_sarah_gignouxwolfsohn_uml_edu/nikea/COL_files/assembly to keep it a bit longer, just in case.