### This notebook aligns the original MinION reads, trims ARTIC primers, and calls variants.

We start from the raw MinION reads. The commands below go from the input archive to the trimmed, sorted BAM files:
- Input: Zika_iSNV_R9.4_albacore2.3.1.tgz
- Output: bam/BC01.trimmed.sorted.new.bam, bam/BC02.trimmed.sorted.new.bam, bam/BC03.trimmed.sorted.new.bam

In [None]:
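# In my machine learning env (conda activate ml), install the following packages.
# -y skips the confirmation prompt, which a notebook cell cannot answer.
!conda install -y -c conda-forge -c bioconda porechop bwa samtools pysam pyvcf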

In [None]:
%%bash 
wget http://nanopore.s3.climb.ac.uk/Zika_iSNV_R9.4_albacore2.3.1.tgz
tar xvfz Zika_iSNV_R9.4_albacore2.3.1.tgz
cat zika-isnv/fast5_r94/workspace/pass/*.fastq > fast5_r94.pass.fastq

In [None]:
# Dedup
!python scripts/dedup.py fast5_r94.pass.fastq > fast5_r94.dedup.pass.fastq
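
The contents of `scripts/dedup.py` are not shown in this notebook. As a rough illustration of what a FASTQ deduplication step can look like, here is a minimal sketch that keeps the first record seen for each read ID. It assumes well-formed four-line FASTQ records and that duplicates share the same read ID; the function name `dedup_fastq` is hypothetical and the actual script may use different criteria.

```python
def dedup_fastq(lines):
    """Yield 4-line FASTQ records, keeping only the first record per read ID.

    Assumption: every record is exactly four lines (no wrapped sequences).
    """
    seen = set()
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # The read ID is the first whitespace-delimited token after '@'.
            read_id = record[0][1:].split()[0]
            if read_id not in seen:
                seen.add(read_id)
                yield record
            record = []


# Toy input: the record for read "r1" appears twice.
example = [
    "@r1 ch=1", "ACGT", "+", "!!!!",
    "@r2", "GGCC", "+", "####",
    "@r1 ch=1", "ACGT", "+", "!!!!",
]
unique_records = list(dedup_fastq(example))
```

On a real file, the same generator could be fed an open file handle and each yielded record written back out with newline separators.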

In [None]:
%%bash 
THREADS=6
porechop -b fast5_r94.dedup.pass_porechop -t $THREADS -i fast5_r94.dedup.pass.fastq

# This step hit a notebook wall-time error; run it in a terminal instead.

In [1]:
!bwa index refs/ZIKV_REF.fasta

[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.00 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index refs/ZIKV_REF.fasta
[main] Real time: 0.006 sec; CPU: 0.012 sec


In [6]:
!mkdir -p bam

In [1]:
%%bash
# Align reads -> Trim ARTIC primers, one barcode at a time
THREADS=6
for BC in BC01 BC02 BC03; do
    # Align with the bwa mem ont2d preset, then sort and index
    bwa mem -x ont2d -t $THREADS refs/ZIKV_REF.fasta fast5_r94.dedup.pass_porechop/$BC.fastq | samtools view -bS - | samtools sort - -o bam/$BC.sorted.bam
    samtools index bam/$BC.sorted.bam
    # Trim ARTIC primers using the scheme BED file, then sort and index
    python scripts/align_trim.py --normalise 1000 refs/ZikaAsian.scheme.bed <bam/$BC.sorted.bam 2>/dev/null | samtools view -bS - | samtools sort - -o bam/$BC.trimmed.sorted.new.bam
    samtools index bam/$BC.trimmed.sorted.new.bam
done

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 128292 sequences (60000649 bp)...
[M::process] read 128154 sequences (60000189 bp)...
[M::mem_process_seqs] Processed 128292 reads in 175.737 CPU sec, 29.925 real sec
[M::process] read 127468 sequences (60000698 bp)...
[M::mem_process_seqs] Processed 128154 reads in 174.107 CPU sec, 29.013 real sec
[M::process] read 127206 sequences (60000111 bp)...
[M::mem_process_seqs] Processed 127468 reads in 175.736 CPU sec, 29.294 real sec
[M::process] read 92435 sequences (43378178 bp)...
[M::mem_process_seqs] Processed 127206 reads in 175.185 CPU sec, 29.200 real sec
[M::mem_process_seqs] Processed 92435 reads in 127.632 CPU sec, 21.305 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -x ont2d -t 6 refs/ZIKV_REF.fasta fast5_r94.dedup.pass_porechop/BC01.fastq
[main] Real time: 140.629 sec; CPU: 828.827 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 129310 sequences (60000034 bp)...
[M::process] read 1

In [4]:
%%bash 
# Remove intermediate files 
rm Zika_iSNV_R9.4_albacore2.3.1.tgz 
rm -rf zika-isnv 
rm fast5_r94.pass.fastq 
rm fast5_r94.dedup.pass.fastq 
rm -rf fast5_r94.dedup.pass_porechop 
rm bam/BC01.sorted.ba* bam/BC02.sorted.ba* bam/BC03.sorted.ba*

With the three trimmed, sorted BAM files, we then use freqs_modified.py to generate variant tables containing the following columns:
- VariantCov
- ForwardVariantCov
- ReverseVariantCov
- RefCov
- ForwardRefCov
- ReverseRefCov
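
To make the six columns concrete, here is a minimal sketch of how such per-strand counts could be tallied at a single position. It is not the code of `freqs_modified.py`; the function `strand_counts` and its input format (a list of `(base, is_reverse)` tuples) are assumptions made for illustration. In practice the observations would come from a pileup over the BAM file.

```python
def strand_counts(observations, ref_base, variant_base):
    """Tally reference and variant coverage, overall and per strand.

    observations: iterable of (base, is_reverse) tuples at one position.
    Bases matching neither ref_base nor variant_base are ignored.
    """
    counts = {
        "VariantCov": 0, "ForwardVariantCov": 0, "ReverseVariantCov": 0,
        "RefCov": 0, "ForwardRefCov": 0, "ReverseRefCov": 0,
    }
    for base, is_reverse in observations:
        if base == variant_base:
            kind = "Variant"
        elif base == ref_base:
            kind = "Ref"
        else:
            continue
        strand = "Reverse" if is_reverse else "Forward"
        counts[kind + "Cov"] += 1
        counts[strand + kind + "Cov"] += 1
    return counts


# Toy pileup: reference base A, candidate variant G, one unrelated T.
pileup = [("A", False), ("A", True), ("G", False), ("G", False), ("T", True)]
counts = strand_counts(pileup, ref_base="A", variant_base="G")
```

Splitting coverage by strand makes it possible to spot variants supported by reads from only one direction, which is a common sign of a systematic error rather than a true variant.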

In [2]:
%%bash 
python scripts/freqs_modified.py --snpfreqmin 0.03 bam/BC01.trimmed.sorted.new.bam refs/ZIKV_REF.fasta > BC01_modified.variants.0.03.new.txt
python scripts/freqs_modified.py --snpfreqmin 0.03 bam/BC02.trimmed.sorted.new.bam refs/ZIKV_REF.fasta > BC02_modified.variants.0.03.new.txt
python scripts/freqs_modified.py --snpfreqmin 0.03 bam/BC03.trimmed.sorted.new.bam refs/ZIKV_REF.fasta > BC03_modified.variants.0.03.new.txt


In [None]:
%%bash 
python scripts/freqs.py --snpfreqmin 0.03 bam/BC01.trimmed.sorted.new.bam refs/ZIKV_REF.fasta > BC01_confirm.variants.0.03.new.txt
python scripts/freqs.py --snpfreqmin 0.03 bam/BC02.trimmed.sorted.new.bam refs/ZIKV_REF.fasta > BC02_confirm.variants.0.03.new.txt
python scripts/freqs.py --snpfreqmin 0.03 bam/BC03.trimmed.sorted.new.bam refs/ZIKV_REF.fasta > BC03_confirm.variants.0.03.new.txt


### Compare 
We compare the variant tables BC01/02/03_modified/confirm.variants.0.03.new.txt (called from the raw reads in this notebook) to the variant tables BC01/02/03_modified/confirm.variants.0.03.txt (called from the BC01/02/03.trimmed.sorted.bam files provided by [Nicholas J. Loman](https://github.com/nickloman/zika-isnv))

(https://github.com/hanmei5191/Grubaugh2019_reanalysis_MinION/tree/master/start_from_trimmed.sorted.bam). Only 1–2 positions were seen as inconsistent between varia
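
A comparison of this kind can be sketched as a set difference over the calls in two tables. The snippet below is only an illustration: it assumes tab-separated tables with a header row, and the column names `Pos` and `Alt` are hypothetical placeholders that would need to be replaced with the actual headers of the variant tables.

```python
import csv
import io


def load_calls(handle, pos_col="Pos", alt_col="Alt"):
    """Read a tab-separated variant table into a set of (position, allele) pairs.

    pos_col and alt_col are assumed column names, not taken from the real tables.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {(int(row[pos_col]), row[alt_col]) for row in reader}


def compare_calls(first, second):
    """Split two call sets into shared calls and calls unique to each table."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Toy tables standing in for a *.new.txt file and its counterpart.
new_table = io.StringIO("Pos\tAlt\n100\tG\n250\tT\n")
old_table = io.StringIO("Pos\tAlt\n100\tG\n300\tC\n")
result = compare_calls(load_calls(new_table), load_calls(old_table))
```

With real files, `open(path)` handles can be passed to `load_calls` in place of the `io.StringIO` objects.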

### [Conclusion] The variant tables BC01/02/03.variants.0.03.txt uploaded by [Nicholas J. Loman](https://github.com/nickloman/zika-isnv) were not generated from the BC01/02/03.trimmed.sorted.bam files in the same repo.