# Processing Nanopore reads

## Quality control

In [None]:
!NanoStat --summary ./sequencing_summary/*_sequencing_summary.txt --readtype 1D

## Demultiplexing using Porechop

First we use Porechop to demultiplex the reads. The next cell prints its usage information. In Jupyter notebooks, a leading `!` tells the notebook to run the rest of the line in the system shell.

In [None]:
!porechop -h

The following command looks for FASTQ files in the `fastq_pass` directory and writes the demultiplexed reads to the `work` directory (set with `-b`). It uses 16 threads (choose a value appropriate for your machine) and, with `--require_two_barcodes`, only assigns a read to a barcode bin when the barcode is found at both ends of the read.

In [None]:
!porechop -i ./fastq_pass -b work --format fastq -t 16 --require_two_barcodes
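
The thread count of 16 is specific to one machine. As a minimal sketch, assuming you want to leave a couple of cores free for the notebook itself, the value could be derived from the CPU count that Python reports. The helper name `pick_threads` and its `reserve` parameter are hypothetical and not part of Porechop.

```python
import os

def pick_threads(reserve=2, cpu_count=None):
    """Return a thread count that leaves `reserve` cores free (never below 1)."""
    # os.cpu_count() can return None on some platforms, so fall back to 1
    total = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, total - reserve)

threads = pick_threads()
```

The result can then be passed to a shell command with curly brace expansion, for example `-t {threads}`.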

Check Porechop's output to make sure the kit and barcodes it inferred match what you expect. The following cell lists the output files.

In [None]:
!ls ./work
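
Beyond listing the files, it can help to see how many reads landed in each bin. The following is a minimal sketch, assuming the output files are uncompressed FASTQ with exactly four lines per record; the function names `count_fastq_reads` and `reads_per_bin` are illustrative only.

```python
import glob
import os

def count_fastq_reads(path):
    """Count reads in an uncompressed FASTQ file, assuming 4 lines per record."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4

def reads_per_bin(directory="work"):
    """Map each FASTQ file name in `directory` to its read count."""
    paths = sorted(glob.glob(os.path.join(directory, "*.fastq")))
    return {os.path.basename(p): count_fastq_reads(p) for p in paths}

reads_per_bin()
```

A bin with very few reads may not be worth assembling.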

## Make a list of barcodes

I extract the detected barcodes from the `work` directory to use later.

In [None]:
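import glob
import os

# Porechop writes one FASTQ file per detected barcode (matched here as work/BC*.fastq)
fastq_files = sorted(glob.glob("work/BC*.fastq"))
# Strip the directory and the extension to leave just the barcode name
barcodes = [os.path.splitext(os.path.basename(fq))[0] for fq in fastq_files]
barcodes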

## Overlap using minimap2

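The following loops through the barcodes. IPython replaces `{bc}` in a `!` command with the value of the Python variable `bc`. Note that the overlap files can be very large, so the output is compressed with `gzip` before it is written.

In [None]:
for bc in barcodes:
    !minimap2 -x ava-ont -t 16 work/{bc}.fastq work/{bc}.fastq | gzip -1 > work/{bc}.paf.gz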
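
Before assembling, a quick look at each overlap file can confirm that overlaps were actually found. The sketch below counts the records in a PAF file and averages column 11 (the alignment block length in the PAF format). It checks the gzip magic number so that it works on both compressed and plain-text files; the helper names are hypothetical.

```python
import gzip

def open_maybe_gzip(path):
    """Open a text file that may or may not be gzip-compressed."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "rt")

def summarize_paf(path):
    """Return (number of overlaps, mean alignment block length) for a PAF file."""
    n_overlaps = 0
    total_block_len = 0
    with open_maybe_gzip(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:
                continue  # skip malformed or truncated lines
            n_overlaps += 1
            total_block_len += int(fields[10])  # column 11: alignment block length
    mean_len = total_block_len / n_overlaps if n_overlaps else 0.0
    return n_overlaps, mean_len
```

For one barcode this could be called as `summarize_paf(f"work/{bc}.paf.gz")`.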

## Assemble using miniasm

In [None]:
for bc in barcodes:
    !miniasm -o 300 -m 50 -i 0.1 -f work/{bc}.fastq work/{bc}.paf.gz > work/{bc}.gfa

## Convert GFA files to FASTA

In [None]:
for bc in barcodes:
    !awk '$1 == "S" {print ">"$2"\n"$3}' work/{bc}.gfa > work/{bc}.gfa.fas
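
The same conversion can be sketched in pure Python, which avoids mixing shell quoting with the notebook's own `$` and `{}` expansion. This is an illustrative alternative rather than the method used above: it assumes tab-separated GFA records, writes each segment (`S`) line as one FASTA entry, and uses the hypothetical function name `gfa_to_fasta`.

```python
def gfa_to_fasta(gfa_path, fasta_path):
    """Write every segment (S) record of a GFA file as a FASTA entry."""
    n_written = 0
    with open(gfa_path) as gfa, open(fasta_path, "w") as fasta:
        for line in gfa:
            fields = line.rstrip("\n").split("\t")
            # Segment lines look like: S <name> <sequence> [optional tags]
            if len(fields) >= 3 and fields[0] == "S":
                fasta.write(f">{fields[1]}\n{fields[2]}\n")
                n_written += 1
    return n_written
```

For one barcode this could be called as `gfa_to_fasta(f"work/{bc}.gfa", f"work/{bc}.gfa.fas")`; the return value is the number of sequences written.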

## Polish using medaka