## Improving the consensus sequence of a draft genome assembly using Nanopolish

In this notebook we build a draft genome assembly with Canu and then try to improve its consensus sequence. [Nanopolish](https://github.com/jts/nanopolish) combines the signal-level ONT data, the basecalled reads, and the draft assembly to generate an improved assembly.

The first step is to get the draft assembly. This can be done with any assembly tool for ONT data. Here we are using [Canu](https://github.com/marbl/canu):

In [None]:
canu -p agalactiae \
     -d data/agalactiae/canu_output \
     genomeSize=4.6m \
     useGrid=false \
     minReadLength=50 \
     minOverlapLength=50 \
     -nanopore-raw data/agalactiae/merged-output-full.fastq
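
Before moving on, it can help to confirm that the assembler actually wrote a non-empty contigs file. The sketch below is one way to do that with plain awk; `fasta_summary` is a hypothetical helper name, and the contigs path is assumed to be the one implied by the `-p` and `-d` options above.

```shell
# Hypothetical helper: print "<number of sequences><TAB><total bases>" for a FASTA file.
# Uses only awk, so it does not depend on any assembly tool being installed.
fasta_summary() {
    awk '
        /^>/ { n++; next }                              # header line: one more sequence
        { gsub(/[ \t\r]/, ""); total += length($0) }    # sequence line: add its bases
        END { printf "%d\t%d\n", n, total }
    ' "$1"
}

# Assumed output path, derived from -p agalactiae and -d data/agalactiae/canu_output
draft=data/agalactiae/canu_output/agalactiae.contigs.fasta
if [ -s "$draft" ]; then
    fasta_summary "$draft"
fi
```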

Before running nanopolish, we need to preprocess the reads and the assembly. We use the [BWA aligner](https://github.com/lh3/bwa) to align the reads to the draft and produce the input files that nanopolish expects.

First, we index the draft assembly so that BWA can align the basecalled reads against it:


In [None]:
bwa index data/agalactiae/canu_output/agalactiae.contigs.fasta

Next, we align the basecalled reads to the draft assembly. The output is piped to samtools, which sorts the alignments, and the sorted BAM file is then indexed:

In [None]:
bwa mem -x ont2d -t 2 data/agalactiae/canu_output/agalactiae.contigs.fasta data/agalactiae/merged-output-full.fastq \
    | samtools sort -o reads.sorted.bam -T reads.tmp -
samtools index reads.sorted.bam
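
Because the later steps read the sorted BAM and its index, a quick check that both files exist and are non-empty can catch a failed alignment early. This is a minimal sketch: `require_files` is a hypothetical helper, and `reads.sorted.bam.bai` is assumed to be the index name, which is what `samtools index` writes next to the BAM by default.

```shell
# Hypothetical helper: return 0 only if every file given exists and is non-empty.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# File names taken from the alignment step; the .bai name is an assumption
# based on the default behaviour of `samtools index`.
require_files reads.sorted.bam reads.sorted.bam.bai \
    || echo "alignment step did not finish cleanly" >&2
```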

With the alignment in place, we build an index that maps each basecalled read to its signal-level data (the directory containing the original FAST5 files):

In [None]:
nanopolish index -d data/EColi/R9/Data_1D/E_coli_K12_1D_R9.2_SpotON_2/downloads/pass \
                               data/EColi/R9/Data_1D/ecoli_reads.fasta


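Now we can polish the draft assembly. `nanopolish_makerange.py` splits the draft into regions, and GNU parallel runs `nanopolish variants --consensus` on each region: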

In [None]:
python3 nanopolish_makerange.py data/agalactiae/canu_output/agalactiae.contigs.fasta \
    | parallel --results nanopolish.results -P 2 \
        nanopolish/nanopolish variants --consensus polished.{1}.fa \
            -w {1} \
            -r data/agalactiae/merged-output-full.fastq \
            -b reads.sorted.bam \
            -g data/agalactiae/canu_output/agalactiae.contigs.fasta \
            -t 4 \
            --min-candidate-frequency 0.1
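
To make the `{1}` placeholder less opaque: the first command in the pipeline prints one region per line, and parallel substitutes each region into `-w {1}` and into the output file name. The sketch below illustrates how such `contig:start-end` windows could be derived from a FASTA file. It is not the nanopolish script itself; `make_windows` is a hypothetical helper, and the default window size and overlap are illustrative assumptions.

```shell
# Hypothetical helper: print overlapping "contig:start-end" windows (0-based)
# for every sequence in a FASTA file.
# WINDOW and OVERLAP defaults are illustrative, not read from nanopolish.
make_windows() {
    awk -v window="${WINDOW:-50000}" -v overlap="${OVERLAP:-200}" '
        # Emit all windows for the sequence collected so far.
        function flush(   s, e) {
            if (name == "") return
            for (s = 0; s < len; s += window) {
                e = s + window + overlap
                if (e > len - 1) e = len - 1    # clip the last window to the contig end
                printf "%s:%d-%d\n", name, s, e
            }
        }
        /^>/ { flush(); name = substr($1, 2); len = 0; next }   # new contig header
        { len += length($0) }                                   # accumulate contig length
        END { flush() }
    ' "$1"
}

# Draft assembly path taken from the commands above
draft=data/agalactiae/canu_output/agalactiae.contigs.fasta
if [ -s "$draft" ]; then
    make_windows "$draft" | head -n 3
fi
```

Each printed line corresponds to one `nanopolish variants` job, so the number of lines gives a rough idea of how many jobs parallel will schedule.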