# [AdvancedBioinformatics1] SPAdes

This Google Colab notebook gives a short introduction to SPAdes, an assembler for single-cell sequencing (SCS) data developed by the Center for Algorithmic Biotechnology (CAB, https://cab.spbu.ru/software/spades/). It accompanies my presentation for the Advanced Bioinformatics I lecture.

0. Setup
1. Installation
2. Usage Test
3. Example
4. Visualization

## 00. Setup
Because the sample data file offered by CAB requires authentication to download, I downloaded it to my local machine and then uploaded it to my own Google Drive.

For the benchmark, I used the MDA single-cell E. coli dataset: 6.3 Gb, 29M reads, 2x100 bp, insert size ~270 bp (Illumina Genome Analyzer IIx).
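
As a rough sense of scale, the dataset figures above can be turned into a mean sequencing depth. This is a back-of-the-envelope sketch under two assumptions that are not stated in the dataset description: that "29M reads" counts individual reads (not pairs), and that the genome is about 4.64 Mbp, the approximate size of E. coli K-12. The function name mean_coverage is only illustrative.

```python
def mean_coverage(n_reads, read_len, genome_size):
    """Mean depth = total sequenced bases / genome length."""
    return n_reads * read_len / genome_size

N_READS = 29_000_000      # from the dataset description (assumed to be single reads)
READ_LEN = 100            # 2x100 bp
GENOME_SIZE = 4_640_000   # assumption: approximate E. coli K-12 genome size in bp

print(f"mean coverage: about {mean_coverage(N_READS, READ_LEN, GENOME_SIZE):.0f}x")
```

With MDA single-cell data the depth is typically far from uniform along the genome, so this mean says little about the least-covered regions; that is the situation the --sc flag is meant for.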

The drive is mounted here so that the later steps can read the file.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
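
Once the drive is mounted, it can be worth confirming that the sample file is actually visible before starting a long assembly. A minimal sketch, assuming the file sits directly under /content/drive/MyDrive as in the later cells; find_sample is a hypothetical helper, not part of Colab or SPAdes.

```python
from pathlib import Path

def find_sample(drive_dir, stem="ecoli_mda_lane1.fastq"):
    """Return the sample FASTQ (plain first, then .bz2) under drive_dir, or None."""
    base = Path(drive_dir)
    for name in (stem, stem + ".bz2"):
        candidate = base / name
        if candidate.is_file():
            return candidate
    return None

# Prints None when the drive is not mounted or the file is missing.
print(find_sample("/content/drive/MyDrive"))
```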


## 01. Installation
SPAdes requires a 64-bit Linux system or macOS with Python pre-installed (supported versions are Python 2.7 and Python 3.2 or higher). To obtain SPAdes, you can either download the binaries or download the source code and compile it yourself. This notebook downloads the Linux binaries.
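
The requirements above can be checked from inside the notebook before downloading anything. This is a small sketch that only encodes the conditions listed in this section; spades_supported is a hypothetical helper name.

```python
import platform
import sys

def spades_supported(system, is_64bit, py_version):
    """Check a platform against the requirements listed in this section."""
    os_ok = system in ("Linux", "Darwin")  # platform.system() reports "Darwin" on macOS
    py = tuple(py_version[:2])
    py_ok = py == (2, 7) or py >= (3, 2)
    return os_ok and is_64bit and py_ok

ok = spades_supported(platform.system(), sys.maxsize > 2**32, sys.version_info)
print("SPAdes requirements met:", ok)
```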

In [2]:
%%bash
# Install SPAdes from the Linux binaries
wget http://cab.spbu.ru/files/release3.15.3/SPAdes-3.15.3-Linux.tar.gz
tar -xzf SPAdes-3.15.3-Linux.tar.gz

--2021-12-13 04:12:33--  http://cab.spbu.ru/files/release3.15.3/SPAdes-3.15.3-Linux.tar.gz
Resolving cab.spbu.ru (cab.spbu.ru)... 195.70.219.98
Connecting to cab.spbu.ru (cab.spbu.ru)|195.70.219.98|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 30447126 (29M) [application/octet-stream]
Saving to: ‘SPAdes-3.15.3-Linux.tar.gz’

     0K .......... .......... .......... .......... ..........  0%  221K 2m14s
    50K .......... .......... .......... .......... ..........  0%  441K 1m41s
   100K .......... .......... .......... .......... ..........  0% 61.6M 67s
   150K .......... .......... .......... .......... ..........  0%  445K 67s
   200K .......... .......... .......... .......... ..........  0% 32.8M 54s
   250K .......... .......... .......... .......... ..........  1% 77.1M 45s
   300K .......... .......... .......... .......... ..........  1%  153M 38s
   350K .......... .......... .......... .......... ..........  1%  454K 41s
   400K .......... .......

QUAST, a genome assembly evaluation tool, is also installed here to visualize the results of the SPAdes assembly.

QUAST can be installed with conda; this notebook instead downloads and unpacks the source tarball.

In [11]:
%%bash
# Install QUAST
wget https://downloads.sourceforge.net/project/quast/quast-5.0.2.tar.gz
tar -xzf quast-5.0.2.tar.gz

--2021-12-13 04:22:32--  https://downloads.sourceforge.net/project/quast/quast-5.0.2.tar.gz
Resolving downloads.sourceforge.net (downloads.sourceforge.net)... 204.68.111.105
Connecting to downloads.sourceforge.net (downloads.sourceforge.net)|204.68.111.105|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://iweb.dl.sourceforge.net/project/quast/quast-5.0.2.tar.gz [following]
--2021-12-13 04:22:33--  https://iweb.dl.sourceforge.net/project/quast/quast-5.0.2.tar.gz
Resolving iweb.dl.sourceforge.net (iweb.dl.sourceforge.net)... 192.175.120.182, 2607:f748:10:12::5f:2
Connecting to iweb.dl.sourceforge.net (iweb.dl.sourceforge.net)|192.175.120.182|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34210765 (33M) [application/x-gzip]
Saving to: ‘quast-5.0.2.tar.gz’

     0K .......... .......... .......... .......... ..........  0% 1.41M 23s
    50K .......... .......... .......... .......... ..........  0% 2.83M 17s
   100K ..........

In [1]:
# QUAST requires matplotlib
!pip install matplotlib



## 02. Usage Test

In [None]:
%%bash
# Prepare the sample FASTQ file
# MDA single-cell E. coli, 6.3 Gb, 29M reads, 2x100 bp, insert size ~270 bp (Illumina)
cd /content/drive/MyDrive
bunzip2 ecoli_mda_lane1.fastq.bz2

Colab Notebooks
ecoli_mda_lane1.fastq
ecoli_mda_lane1.fastq.bz2


In [2]:
%%bash
# Check the sample FASTQ file
cd /content/drive/MyDrive
head ecoli_mda_lane1.fastq

@EAS18:1:1:1:0:331:0/1
NGGAGTTGTTCAAAATCATAATCACCGGTNTCCTCTCCGCTGCTTACCCCCAGCGAGGCACNNATCCGTATCGTCNNNNNNNNTGCGACNNACNCCCCTC
+
BaR\U`bbbbbb^[X`bbbabXbbbaaIKBbbbbbK`ISaXIaab__bbbba\\TIHV_aaBBbBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@EAS18:1:1:1:0:331:0/2
NCNNNNTNGNNNNNNNNNNNNNGATTANTGACNNNTNNGGNNNNCAGCCGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@EAS18:1:1:1:0:761:0/1
NGGATTACGGGTCAACGTTAGAACATCAANCATTAAAGGGGGGTATTTCAAGGTCGGCTCCATGCAGACTGGCGTCNNNNNNNCCAAGCNNCCNACCTAT
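
The output above shows that the file is interlaced: each /1 read is immediately followed by its /2 mate, which is why the assembly cells pass it with --12. The sketch below parses 4-line FASTQ records, checks that pairing, and measures the fraction of N bases. It uses the first ten bases of the reads shown above as toy input; the helper names are illustrative, and the parser loads all lines into memory, so it is only meant for small excerpts such as the output of head.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) for each complete 4-line record."""
    stripped = [ln.rstrip("\n") for ln in lines if ln.strip()]
    # A trailing incomplete record (as produced by `head`) is ignored.
    for i in range(0, len(stripped) - 3, 4):
        header, seq, plus, qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record at line %d" % (i + 1))
        yield header[1:], seq, qual

def is_interlaced(records):
    """True if records come as consecutive name/1, name/2 pairs."""
    names = [name for name, _, _ in records]
    if len(names) % 2:
        return False
    return all(
        a.endswith("/1") and b.endswith("/2") and a[:-2] == b[:-2]
        for a, b in zip(names[0::2], names[1::2])
    )

def n_fraction(seq):
    """Fraction of uncalled bases (N) in a read."""
    return seq.count("N") / len(seq) if seq else 0.0

# Toy excerpt: reads above truncated to 10 bases; raw string because of the backslash.
SAMPLE = [
    "@EAS18:1:1:1:0:331:0/1", "NGGAGTTGTT", "+", r"BaR\U`bbbb",
    "@EAS18:1:1:1:0:331:0/2", "NCNNNNTNGN", "+", "BBBBBBBBBB",
    "@EAS18:1:1:1:0:761:0/1", "NGGATTACGG",
]

records = list(read_fastq(SAMPLE))
print(len(records), is_interlaced(records), [n_fraction(s) for _, s, _ in records])
```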


In [3]:
%%bash
# Verify the SPAdes installation
cd SPAdes-3.15.3-Linux/bin/
./spades.py --test





Command line: ./spades.py	--test	

System information:
  SPAdes version: 3.15.3
  Python version: 3.9.5
  OS: Linux-5.4.104+-x86_64-with-glibc2.27

Output dir: /content/SPAdes-3.15.3-Linux/bin/spades_test
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
  Standard mode
  For multi-cell/isolate data we recommend to use '--isolate' option; for single-cell MDA data use '--sc'; for metagenomic data use '--meta'; for RNA-Seq use '--rna'.
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/content/SPAdes-3.15.3-Linux/share/spades/test_dataset/ecoli_1K_1.fq.gz']
      right reads: ['/content/SPAdes-3.15.3-Linux/share/spades/test_dataset/ecoli_1K_2.fq.gz']
      interlaced reads: not specified
      single reads: not specified
      merged reads: not specified
Read error correction parameters:
  Iterations: 1
  PHRED offset will be auto-detected
  Corrected reads will be compressed
Assembly parameters:


In [4]:
%%bash
# Print the SPAdes usage message
cd SPAdes-3.15.3-Linux/bin/
./spades.py

SPAdes genome assembler v3.15.3

Usage: spades.py [options] -o <output_dir>

Basic options:
  -o <output_dir>             directory to store all the resulting files (required)
  --isolate                   this flag is highly recommended for high-coverage isolate and multi-cell data
  --sc                        this flag is required for MDA (single-cell) data
  --meta                      this flag is required for metagenomic data
  --bio                       this flag is required for biosyntheticSPAdes mode
  --corona                    this flag is required for coronaSPAdes mode
  --rna                       this flag is required for RNA-Seq data
  --plasmid                   runs plasmidSPAdes pipeline for plasmid detection
  --metaviral                 runs metaviralSPAdes pipeline for virus detection
  --metaplasmid               runs metaplasmidSPAdes pipeline for plasmid detection in metagenomic datasets (equivalent for --meta --plasmid)
  --rnaviral                  this flag

In [5]:
%%bash
# Verify the QUAST installation
cd /content/quast-5.0.2/
./quast.py test_data/contigs_1.fasta \
               test_data/contigs_2.fasta \
               -r test_data/reference.fasta.gz \
               -g test_data/genes.gff

/content/quast-5.0.2/./quast.py test_data/contigs_1.fasta test_data/contigs_2.fasta -r test_data/reference.fasta.gz -g test_data/genes.gff

Version: 5.0.2, 0bb1dd1b

System information:
  OS: Linux-5.4.104+-x86_64-with-glibc2.27 (linux_64)
  Python version: 3.9.5
  CPUs number: 2

Started: 2021-12-13 04:29:26

Logging to /content/quast-5.0.2/quast_results/results_2021_12_13_04_29_26/quast.log
NOTICE: Maximum number of threads is set to 1 (use --threads option to set it manually)

CWD: /content/quast-5.0.2
Main parameters: 
  MODE: default, threads: 1, minimum contig length: 500, minimum alignment length: 65, \
  ambiguity: one, threshold for extensive misassembly size: 1000

Reference:
  /content/quast-5.0.2/test_data/reference.fasta.gz ==> reference

Contigs:
  Pre-processing...
  1  test_data/contigs_1.fasta ==> contigs_1
  2  test_data/contigs_2.fasta ==> contigs_2

module 'cgi' has no attribute 'escape'
Traceback (most recent call last):
  File "/content/quast-5.0.2/./quast.py", li

ERROR! exception caught!

In case you have troubles running QUAST, you can write to quast.support@cab.spbu.ru
or report an issue on our GitHub repository https://github.com/ablab/quast/issues
Please provide us with quast.log file from the output directory.
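
The error in the log above, "module 'cgi' has no attribute 'escape'", is consistent with the Python version it reports (3.9.5): cgi.escape was removed from the standard library in Python 3.8, and html.escape is its replacement. One possible workaround is to patch the affected source files. The sketch below is an assumption-laden illustration, not an official QUAST fix: it assumes the failing call lives in a .py file under /content/quast-5.0.2 that contains its own "import cgi" line, and the helper names patch_cgi_escape and patch_tree are hypothetical. Note that html.escape also escapes quotes by default, which cgi.escape did not.

```python
import re
from pathlib import Path

def patch_cgi_escape(source):
    """Return source text with cgi.escape redirected to html.escape."""
    if "cgi.escape" not in source:
        return source
    patched = source.replace("cgi.escape", "html.escape")
    # Add "import html" right after the first "import cgi" line, same indentation.
    patched, count = re.subn(
        r"^([ \t]*)(import cgi\b.*)$",
        r"\1\2\n\1import html",
        patched,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no 'import cgi' line found; patch this file by hand")
    return patched

def patch_tree(root):
    """Patch every .py file under root in place; return the files that changed."""
    changed = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        new_text = patch_cgi_escape(text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed

# In this notebook the call would be: patch_tree("/content/quast-5.0.2")
```

Installing a newer QUAST release or running QUAST under an older Python are alternatives that avoid editing the sources.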


## 03. Example

./spades.py [options] -o \<output_dir>

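The available options are described in the official documentation
(https://cab.spbu.ru/files/release3.15.3/manual.html). Frequently used ones include:

Basic options: --isolate, --sc, --meta, --plasmid, --metaplasmid

Pipeline options: --only-error-correction, --only-assembler

Advanced options: -t (threads), -m (memory limit in Gb), -k (k-mer sizes), --phred-offset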

Three sets of k-mer sizes (-k) were tested. The input file and all other settings were the same in every run.

1) k=21,33,55 (not specified, DEFAULT)

2) k=21,33,55,77,99,123

3) k=21,33,55,77
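
The SPAdes manual requires every value passed to -k to be odd, less than 128, and listed in ascending order. The sketch below checks a -k string against those three rules before a long run is started. The second helper, fits_read_length, is my own sanity check rather than a documented SPAdes rule: it keeps only k values shorter than the read length, since a k-mer longer than a read cannot be extracted from it. Both function names are hypothetical.

```python
def parse_k_list(text):
    """Parse a -k string such as '21,33,55' and validate it."""
    ks = [int(part) for part in text.split(",")]
    for k in ks:
        if k % 2 == 0:
            raise ValueError("k must be odd: %d" % k)
        if k >= 128:
            raise ValueError("k must be less than 128: %d" % k)
    if any(a >= b for a, b in zip(ks, ks[1:])):
        raise ValueError("k values must be in ascending order")
    return ks

def fits_read_length(ks, read_len):
    """Keep the k values that are shorter than the reads (own sanity check)."""
    return [k for k in ks if k < read_len]

for setting in ("21,33,55", "21,33,55,77,99,123", "21,33,55,77"):
    ks = parse_k_list(setting)
    print(setting, "->", fits_read_length(ks, read_len=100))
```

For the 2x100 bp reads used here, only the second setting contains a value (123) that exceeds the read length.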

In [None]:
%%bash
# k=21,33,55 (default; -k not specified)
cd SPAdes-3.15.3-Linux/bin/
./spades.py --sc --12 /content/drive/MyDrive/ecoli_mda_lane1.fastq -o /content/SPA_Result/

![picture](https://drive.google.com/uc?id=1DeQvPjDEHcaphztI5WqdPBJlEnhpX2IS)

In [None]:
%%bash
# k=21,33,55,77,99,123
cd SPAdes-3.15.3-Linux/bin/
./spades.py --sc -k 21,33,55,77,99,123 --12 /content/drive/MyDrive/ecoli_mda_lane1.fastq -o /content/SPA_Result2/

![picture](https://drive.google.com/uc?id=1mFwlabnzjNjc8ge3goZg57lczwUtYndh)

In [None]:
%%bash
# k=21,33,55,77
cd SPAdes-3.15.3-Linux/bin/
./spades.py --sc -k 21,33,55,77 --12 /content/drive/MyDrive/ecoli_mda_lane1.fastq -o /content/SPA_Result3/

![picture](https://drive.google.com/uc?id=1I7afnjUScsTG3rOmi_3yKmxqDSJ5Kkjq)
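
Because the three runs differ only in -k and in the output directory, the command lines can also be generated from one helper. This is a sketch that only builds and prints the commands; it uses the flags already shown in this section (--sc, -k, -t, --12, -o), while the helper name spades_command and the RUNS table are illustrative. The output directories are assumed to be the ones the QUAST cell in the next section reads from.

```python
import shlex

def spades_command(reads, out_dir, k_values=None, threads=None, spades="./spades.py"):
    """Build the argument list for one single-cell SPAdes run on interlaced reads."""
    cmd = [spades, "--sc"]
    if k_values:
        cmd += ["-k", ",".join(str(k) for k in k_values)]
    if threads:
        cmd += ["-t", str(threads)]
    cmd += ["--12", reads, "-o", out_dir]
    return cmd

READS = "/content/drive/MyDrive/ecoli_mda_lane1.fastq"
RUNS = {
    "/content/SPA_Result/": None,                       # default k
    "/content/SPA_Result2/": [21, 33, 55, 77, 99, 123],
    "/content/SPA_Result3/": [21, 33, 55, 77],
}

for out_dir, k_values in RUNS.items():
    print(shlex.join(spades_command(READS, out_dir, k_values)))
```

To execute a command instead of printing it, the list could be passed to subprocess.run(cmd, check=True, cwd="SPAdes-3.15.3-Linux/bin").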

## 04. Visualization

As mentioned already, another tool named QUAST will be used for visualization.

In [None]:
# QUAST
# quast.py -o <output_dir> <input data>
# Scaffolds are recommended as input for research data after sequencing and assembly
!/content/quast-5.0.2/quast.py -o /content/QUAST_Result /content/SPA_Result/scaffolds.fasta /content/SPA_Result2/scaffolds.fasta /content/SPA_Result3/scaffolds.fasta

![picture](https://drive.google.com/uc?id=1imIdyHatcaUooHnPNCwQzSThpXiLIXOP)

![picture](https://drive.google.com/uc?id=1Uk0EoeJ-6J2qAwPFHbZTsrQGFHJCOKkr)

![picture](https://drive.google.com/uc?id=1J5bdNCewghLHtlx7eqYt6bcWB0iiQoD_)

![picture](https://drive.google.com/uc?id=1O584lhwiqAgWHR9uV85lFmdNHivfgwxR)
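
One of the summary statistics QUAST reports for each assembly is N50. To make that number easier to read, here is a minimal sketch of how it can be computed from a FASTA file: N50 is the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembly length. The function names are illustrative, and the min_len default of 500 mirrors the "minimum contig length: 500" setting shown in the QUAST log earlier in this notebook.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

def n50(lengths, min_len=0):
    """N50 of the contigs that are at least min_len long (0 if none remain)."""
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    total = sum(kept)
    running = 0
    for length in kept:
        running += length
        if 2 * running >= total:
            return length
    return 0

# Toy example; for a real run, pass open("/content/SPA_Result/scaffolds.fasta")
# to fasta_lengths and use n50(lengths, min_len=500).
toy = [">a", "ACGTACGT", "AC", ">b", "GGGG", ">c", "TT"]
print(fasta_lengths(toy), n50(fasta_lengths(toy)))
```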