# Processing Nanopore reads

## Quality control

In [1]:
!NanoStat --summary ./sequencing_summary/*_sequencing_summary.txt --readtype 1D

General summary:        
Active channels:                  511.0
Mean read length:                 640.2
Mean read quality:                  9.3
Median read length:               522.0
Median read quality:                9.6
Number of reads:              712,189.0
Read length N50:                  547.0
Total bases:              455,930,734.0
Number, percentage and megabases of reads above quality cutoffs
>Q5:	685675 (96.3%) 447.4Mb
>Q7:	643445 (90.3%) 423.4Mb
>Q10:	268512 (37.7%) 178.9Mb
>Q12:	6642 (0.9%) 3.7Mb
>Q15:	0 (0.0%) 0.0Mb
Top 5 highest mean basecall quality scores and their read lengths
1:	14.3 (482; 2cb5928a-b0d9-4bfc-8d9b-5cfafc7b16e3)
2:	14.3 (227; d26d8f41-5b0c-4395-8d1c-9a744e9b8ccd)
3:	14.1 (518; afa43673-b926-438f-8297-e98f3048d2a0)
4:	14.0 (508; 58abfb0d-59a4-499a-9086-ce1ef1f00037)
5:	14.0 (496; b619f9fe-d965-44a5-9b52-68763dfe929c)
Top 5 longest reads and their mean basecall quality score
1:	9437 (3.1; 62fe07ab-b1e8-4aac-ba03-13551a2d2d85)
2:
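
The `Read length N50` reported above is the length L such that reads of length L or longer together contain at least half of all sequenced bases. The following is a minimal sketch of that calculation, assuming the read lengths are already available as a list of integers. The function name `read_length_n50` is illustrative and not part of NanoStat, whose own implementation may differ in edge cases.

```python
def read_length_n50(lengths):
    """Return the N50 of a collection of read lengths."""
    if not lengths:
        raise ValueError("need at least one read length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest read down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy example: 20 bases in total, so the threshold is 10 bases.
print(read_length_n50([2, 3, 4, 5, 6]))  # 6 + 5 = 11 >= 10, so N50 is 5
```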

## Demultiplexing using Porechop

First we use Porechop to demultiplex the reads. The cell below prints its usage information. In Jupyter notebooks, a leading `!` tells the notebook to run the rest of the line in the shell.

In [None]:
!porechop -h

The following reads the FASTQ files in the `fastq_pass` directory and writes the demultiplexed reads to the `work` directory (`-b`), one file per barcode. It uses 16 threads (`-t`; choose a value appropriate for your machine) and only assigns a read to a barcode if the barcode is found at both ends of the read (`--require_two_barcodes`).

In [None]:
!porechop -i ./fastq_pass -b work --format fastq -t 16 --require_two_barcodes
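
If the thread count should follow the machine rather than being fixed at 16, the command can be assembled in Python first. This is only a sketch: the helper `porechop_command` is hypothetical, it uses just the options shown in the cell above, and it builds the argument list without running Porechop.

```python
import os


def porechop_command(input_dir, output_dir, threads=None):
    """Build the Porechop demultiplexing command as an argument list."""
    if threads is None:
        # Assumption: use every available core; lower this on shared machines.
        threads = os.cpu_count() or 1
    return [
        "porechop",
        "-i", input_dir,
        "-b", output_dir,
        "--format", "fastq",
        "-t", str(threads),
        "--require_two_barcodes",
    ]


cmd = porechop_command("./fastq_pass", "work", threads=16)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` instead of using the `!` syntax.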

Check Porechop's output to make sure the inferred kit and barcodes match those used in the run. The following lists the output files.

In [2]:
!ls ./work

BC01.fastq  BC04.fastq	BC07.fastq  BC10.fastq	none.fastq
BC02.fastq  BC05.fastq	BC08.fastq  BC11.fastq
BC03.fastq  BC06.fastq	BC09.fastq  BC12.fastq
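
A quick way to see how the reads were distributed across these files is to count the records in each one. The sketch below assumes every FASTQ record occupies exactly four lines (no wrapped sequences); the function names are illustrative.

```python
import glob


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming four lines per record."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def reads_per_file(pattern="work/*.fastq"):
    """Map each FASTQ path matching the pattern to its read count."""
    return {path: count_fastq_reads(path) for path in sorted(glob.glob(pattern))}


# Prints nothing if the pattern matches no files.
for path, n_reads in reads_per_file().items():
    print(path, n_reads)
```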


## Make a list of barcodes

I extract the detected barcodes from the `work` directory to use later.

In [1]:
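import glob
# All demultiplexed files, including none.fastq (reads not assigned to a barcode)
all_files = glob.glob("work/*.fastq")
# Only the files for reads that were assigned to a barcode
fastq_files = glob.glob("work/BC*.fastq")
# Strip the directory and extension: "work/BC01.fastq" -> "BC01"
barcodes = [fq.split("/")[1].split(".")[0] for fq in fastq_files]
barcodes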

['BC10',
 'BC08',
 'BC04',
 'BC07',
 'BC03',
 'BC09',
 'BC06',
 'BC01',
 'BC12',
 'BC11',
 'BC05',
 'BC02']
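
As the output shows, `glob` returns files in arbitrary order. If a stable order is preferred, a possible alternative is to use `pathlib` and sort the names. This is a sketch, and `list_barcodes` is an illustrative name rather than something used elsewhere in this notebook.

```python
from pathlib import Path


def list_barcodes(work_dir="work"):
    """Return sorted barcode names (e.g. 'BC01') for work_dir/BC*.fastq."""
    # Path.stem drops the directory and the final extension.
    return sorted(path.stem for path in Path(work_dir).glob("BC*.fastq"))


print(list_barcodes())
```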

## Summarise each output file

In [4]:
for f in all_files:
    !echo {f};NanoStat -t 16 --fastq {f}; echo "\n"

work/BC10.fastq
General summary:        
Mean read length:                465.5
Mean read quality:                10.1
Median read length:              399.0
Median read quality:              10.1
Number of reads:              43,176.0
Read length N50:                 413.0
Total bases:              20,097,790.0
Number, percentage and megabases of reads above quality cutoffs
>Q5:	43176 (100.0%) 20.1Mb
>Q7:	43083 (99.8%) 20.1Mb
>Q10:	23133 (53.6%) 10.8Mb
>Q12:	2108 (4.9%) 0.9Mb
>Q15:	4 (0.0%) 0.0Mb
Top 5 highest mean basecall quality scores and their read lengths
1:	16.7 (130)
2:	15.6 (129)
3:	15.1 (351)
4:	15.1 (352)
5:	15.0 (124)
Top 5 longest reads and their mean basecall quality score
1:	2376 (10.8)
2:	1858 (10.4)
3:	1811 (7.2)
4:	1784 (10.1)
5:	1768 (8.9)
work/BC08.fastq
General summary:        
Mean read length:                454.1
Mean read quality:                10.1
Median read length:              400.0
Median read quality:              10.1
Number of reads:              31,

work/BC05.fastq
General summary:        
Mean read length:                517.0
Mean read quality:                10.0
Median read length:              401.0
Median read quality:              10.1
Number of reads:              43,403.0
Read length N50:                 417.0
Total bases:              22,438,276.0
Number, percentage and megabases of reads above quality cutoffs
>Q5:	43403 (100.0%) 22.4Mb
>Q7:	43288 (99.7%) 22.4Mb
>Q10:	23440 (54.0%) 12.2Mb
>Q12:	1699 (3.9%) 0.7Mb
>Q15:	0 (0.0%) 0.0Mb
Top 5 highest mean basecall quality scores and their read lengths
1:	14.8 (362)
2:	14.6 (401)
3:	14.2 (362)
4:	14.2 (378)
5:	14.2 (382)
Top 5 longest reads and their mean basecall quality score
1:	3769 (7.2)
2:	2837 (7.3)
3:	2711 (9.1)
4:	2584 (10.0)
5:	2579 (10.6)
work/BC02.fastq
General summary:        
Mean read length:                528.4
Mean read quality:                10.0
Median read length:              403.0
Median read quality:              10.1
Number of reads:              67,2

## Identify references using Mash

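The following screens each barcode's reads against the reference sketch `refs/denv.msh` using Mash. The loop itself is ordinary Python; inside a `!` command, curly braces such as `{bc}` are replaced by the value of the corresponding Python variable before the command is passed to the shell.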

In [3]:
for bc in barcodes:
    !echo {bc};mash screen -p 16 -w refs/denv.msh work/{bc}.fastq > work/{bc}.screen

Loading refs/denv.msh...
   75866 distinct hashes.
Streaming from work/BC10.fastq...
   Estimated distinct k-mers in pool: 6851362
Summing shared...
Reallocating to winners...
Computing coverage medians...
Writing output...
Loading refs/denv.msh...
   75866 distinct hashes.
Streaming from work/BC08.fastq...
   Estimated distinct k-mers in pool: 4951116
Summing shared...
Reallocating to winners...
Computing coverage medians...
Writing output...
Loading refs/denv.msh...
   75866 distinct hashes.
Streaming from work/BC04.fastq...
   Estimated distinct k-mers in pool: 13764173
Summing shared...
Reallocating to winners...
Computing coverage medians...
Writing output...
Loading refs/denv.msh...
   75866 distinct hashes.
Streaming from work/BC07.fastq...
   Estimated distinct k-mers in pool: 7904535
Summing shared...
Reallocating to winners...
Computing coverage medians...
Writing output...
Loading refs/denv.msh...
   75866 distinct hashes.
Streaming from work/BC03.fastq...
   Estimated disti
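
To make the `{bc}` substitution concrete, the sketch below builds the same command strings in plain Python with `str.format`. It only approximates what IPython does when it expands the `!` line, and the helper name `mash_screen_command` is hypothetical.

```python
def mash_screen_command(bc, threads=16, sketch="refs/denv.msh", work_dir="work"):
    """Return the shell command that the notebook cell runs for one barcode."""
    template = (
        "mash screen -p {threads} -w {sketch} "
        "{work}/{bc}.fastq > {work}/{bc}.screen"
    )
    return template.format(threads=threads, sketch=sketch, work=work_dir, bc=bc)


for bc in ["BC01", "BC02"]:
    print(mash_screen_command(bc))
```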

The following sorts each Mash result in descending order of identity (the first column), takes the query ID (the fifth column) of the top hit, and extracts that sequence from the reference FASTA file.

In [10]:
for bc in barcodes:
    !samtools faidx refs/denv.fas `sort -gr work/{bc}.screen | cut -f 5 | head -1` > work/{bc}_ref.fa
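
The same selection can be done in Python, which makes it easier to inspect the identity of the chosen reference. This sketch assumes the tab-separated `mash screen` column order of identity, shared hashes, median multiplicity, p-value, query ID and query comment. The function `top_mash_hit` is illustrative, and ties may be resolved differently from `sort -gr`.

```python
def top_mash_hit(screen_path):
    """Return (identity, query_id) for the highest-identity row, or None."""
    best = None
    with open(screen_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            identity, query_id = float(fields[0]), fields[4]
            # Keep the first row seen with the highest identity.
            if best is None or identity > best[0]:
                best = (identity, query_id)
    return best
```

For example, `top_mash_hit("work/BC10.screen")` would return the identity and ID of the reference that the shell pipeline above selects, provided there are no ties.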

## Map using GraphMap

In [11]:
for bc in barcodes:
    !graphmap align -t 16 -r work/{bc}_ref.fa -d work/{bc}.fastq -o work/{bc}.graphmap.sam

[13:16:30 BuildIndexes] Loading reference sequences.
[13:16:30 SetupIndex_] Building the index for shape: '11110111101111'.
[13:16:30 Create] Allocated memory for a list of 5344 seeds (128 bits each) (0.00001 sec, diff: 0.00004 sec).
[13:16:30 Create] Memory consumption: [currentRSS = 3 MB, peakRSS = 3 MB]
[13:16:30 Create] Collecting seeds.
[13:16:30 Create] Minimizer seeds will be used. Minimizer window is 5.
[13:16:30 Create] [currentRSS = 3 MB, peakRSS = 3 MB] Sequence: 2/2, len: 10686, name: 'KF954945'
[13:16:30 Create] Final memory allocation after collecting seeds: [currentRSS = 4 MB, peakRSS = 4 MB]
[13:16:30 Create] Sorting the seeds using 16 threads.
[13:16:30 Create] Generating the hash table.
[13:16:30 Create] Calculating the distribution statistics for key counts.
[13:16:30 Create] Index statistics: average key count = 1.001240, max key count = 2.000000, std dev = 0.035191, percentil (99.00%) (count cutoff) = 2.000000
[13:16:30 Create] Memory consumption: [currentRSS = 4 M

[13:18:27 ProcessReads] Batch of 45504 reads (52 MiB) loaded in 0.41 sec. (20920056 bases)
[13:18:27 ProcessReads] Memory consumption: [currentRSS = 72 MB, peakRSS = 72 MB]
[13:18:27 ProcessReads] Using 16 threads.
[13:19:02 ProcessReads] [CPU time: 564.51 sec, RSS: 84 MB] Read: 45504/45504 (100.00%) [m: 45483, u: 21]                                                           

[13:19:03 ProcessReads] Memory consumption: [currentRSS = 84 MB, peakRSS = 85 MB]

[13:19:03 ProcessReads] All reads processed in 564.74 sec (or 9.41 CPU min).
[13:19:03 BuildIndexes] Loading reference sequences.
[13:19:03 SetupIndex_] Building the index for shape: '11110111101111'.
[13:19:03 Create] Allocated memory for a list of 5347 seeds (128 bits each) (0.00001 sec, diff: 0.00004 sec).
[13:19:03 Create] Memory consumption: [currentRSS = 3 MB, peakRSS = 3 MB]
[13:19:03 Create] Collecting seeds.
[13:19:03 Create] Minimizer seeds will be used. Minimizer window is 5.
[13:19:03 Create] [currentRSS = 3 MB, peakRSS

[13:20:11 ProcessReads] Batch of 14497 reads (16 MiB) loaded in 0.28 sec. (12359416 bases)
[13:20:11 ProcessReads] Memory consumption: [currentRSS = 25 MB, peakRSS = 25 MB]
[13:20:11 ProcessReads] Using 16 threads.
[13:20:21 ProcessReads] [CPU time: 170.10 sec, RSS: 37 MB] Read: 14497/14497 (100.00%) [m: 14479, u: 18]                                                           

[13:20:21 ProcessReads] Memory consumption: [currentRSS = 37 MB, peakRSS = 38 MB]

[13:20:21 ProcessReads] All reads processed in 170.28 sec (or 2.84 CPU min).
[13:20:22 BuildIndexes] Loading reference sequences.
[13:20:22 SetupIndex_] Building the index for shape: '11110111101111'.
[13:20:22 Create] Allocated memory for a list of 5344 seeds (128 bits each) (0.00001 sec, diff: 0.00004 sec).
[13:20:22 Create] Memory consumption: [currentRSS = 3 MB, peakRSS = 3 MB]
[13:20:22 Create] Collecting seeds.
[13:20:22 Create] Minimizer seeds will be used. Minimizer window is 5.
[13:20:22 Create] [currentRSS = 3 MB, peakRSS

[13:22:59 ProcessReads] Batch of 67233 reads (76 MiB) loaded in 0.52 sec. (10516216 bases)
[13:22:59 ProcessReads] Memory consumption: [currentRSS = 103 MB, peakRSS = 103 MB]
[13:22:59 ProcessReads] Using 16 threads.
[13:23:49 ProcessReads] [CPU time: 796.15 sec, RSS: 122 MB] Read: 67233/67233 (100.00%) [m: 67071, u: 162]                                                         

[13:23:49 ProcessReads] Memory consumption: [currentRSS = 122 MB, peakRSS = 123 MB]

[13:23:49 ProcessReads] All reads processed in 796.36 sec (or 13.27 CPU min).
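
The `[m: ..., u: ...]` counters in the GraphMap log can be turned into a mapping rate. The sketch below assumes that `m` and `u` are the numbers of mapped and unmapped reads, which is an interpretation of the log rather than something stated in it; the names are illustrative.

```python
import re

# Matches counters such as "[m: 45483, u: 21]".
MAPPED_RE = re.compile(r"\[m:\s*(\d+),\s*u:\s*(\d+)\]")


def mapped_fraction(log_line):
    """Return mapped / (mapped + unmapped) for a log line, or None."""
    match = MAPPED_RE.search(log_line)
    if match is None:
        return None
    mapped, unmapped = (int(group) for group in match.groups())
    total = mapped + unmapped
    return mapped / total if total else None


line = (
    "[13:19:02 ProcessReads] [CPU time: 564.51 sec, RSS: 84 MB] "
    "Read: 45504/45504 (100.00%) [m: 45483, u: 21]"
)
print(f"{mapped_fraction(line):.4f}")
```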


## Convert to BAM, sort and index

In [13]:
for bc in barcodes:
    !samtools view -bS work/{bc}.graphmap.sam | samtools sort - -o work/{bc}.graphmap.bam

In [None]:
for bc in barcodes:
    !samtools index work/{bc}.graphmap.bam

## Extract consensus from each BAM

In [3]:
for bc in barcodes:
    !echo {bc}
    !kindel consensus work/{bc}.graphmap.bam > work/{bc}_consensus.fa

BC10
loading sequences: 43119it [00:09, 4388.51it/s]
building consensus: 100%|█████████████| 10686/10686 [00:00<00:00, 110877.20it/s]
options:
- bam_path: work/BC10.graphmap.bam
- realign: False
- min_depth: 2
- min_overlap: 7
- clip_decay_threshold: 0.1
- trim_ends: False
- uppercase: False
- min, max observed depth[50:-50]: 14, 4786
observations:
- ambiguous sites: 
- insertion sites: 8463, 10648
- deletion sites: 48, 994, 1642, 2494, 4008, 4613, 4708, 5446, 5733, 7053, 7574, 7825, 8058, 8460, 8739, 8829, 9616, 9638, 10563, 10674, 10675, 10679, 10684
- clip-dominant regions: 

BC08
loading sequences: 31333it [00:07, 4473.98it/s]
building consensus: 100%|█████████████| 10674/10674 [00:00<00:00, 106760.46it/s]
options:
- bam_path: work/BC08.graphmap.bam
- realign: False
- min_depth: 2
- min_overlap: 7
- clip_decay_threshold: 0.1
- trim_ends: False
- uppercase: False
- min, max observed depth[50:-50]: 0, 3850
observations:
- ambiguous sites: 2089, 2090, 2091, 2092, 2093, 2094, 2095, 209

BC01
loading sequences: 14479it [00:03, 4041.14it/s]
building consensus: 100%|█████████████| 10723/10723 [00:00<00:00, 103237.74it/s]
options:
- bam_path: work/BC01.graphmap.bam
- realign: False
- min_depth: 2
- min_overlap: 7
- clip_decay_threshold: 0.1
- trim_ends: False
- uppercase: False
- min, max observed depth[50:-50]: 0, 2324
observations:
- ambiguous sites: 9957, 10652, 10653, 10654, 10655, 10656, 10657, 10658, 10659, 10660, 10661, 10662, 10663, 10664, 10665, 10666, 10667, 10668, 10669, 10670, 10671, 10672, 10673, 10674, 10675, 10676, 10677, 10678, 10679, 10680, 10681, 10682, 10683, 10684, 10685, 10686, 10687, 10688, 10689, 10690, 10691, 10692, 10693, 10694, 10695, 10696, 10697, 10698, 10699, 10700, 10701, 10702, 10703, 10704, 10705, 10706, 10707, 10708, 10709, 10710, 10711, 10712, 10713, 10714, 10715, 10716, 10717, 10718, 10719, 10720, 10721, 10722
- insertion sites: 6359, 6379, 6448, 6459, 6489, 6768, 9682, 9809, 10652
- deletion sites: 1, 109, 1299, 1371, 1656, 1661, 1963, 
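
The reports above differ in their minimum observed depth: 14 for BC10 but 0 for BC08 and BC01, the two samples shown with ambiguous sites. A small sketch for flagging such barcodes is given below. It assumes the kindel report text has been captured per barcode (for example by redirecting stderr to a file) and that the report line keeps the format shown above; the function names are illustrative.

```python
import re

DEPTH_RE = re.compile(r"min, max observed depth\[50:-50\]:\s*(\d+),\s*(\d+)")


def observed_depth(report_text):
    """Return (min_depth, max_depth) parsed from a kindel report, or None."""
    match = DEPTH_RE.search(report_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def low_depth_barcodes(reports, min_depth=2):
    """List barcodes whose minimum observed depth is below min_depth."""
    flagged = []
    for bc, text in reports.items():
        depth = observed_depth(text)
        if depth is not None and depth[0] < min_depth:
            flagged.append(bc)
    return flagged


# Depth lines copied from the output above.
reports = {
    "BC10": "- min, max observed depth[50:-50]: 14, 4786",
    "BC08": "- min, max observed depth[50:-50]: 0, 3850",
    "BC01": "- min, max observed depth[50:-50]: 0, 2324",
}
print(low_depth_barcodes(reports))
```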

In [10]:
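from Bio import SeqIO

# Rename each consensus record to its barcode and combine them into one FASTA file
records = []
for bc in barcodes:
    record = SeqIO.read("work/" + bc + "_consensus.fa", format="fasta")
    record.id = bc
    record.name = bc
    record.description = bc
    records.append(record)
# SeqIO.write returns the number of records written
SeqIO.write(records, "work/consensus.fa", format="fasta")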

12
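
Before aligning, it can be useful to check how long each consensus is and how many ambiguous characters it contains. The following sketch does this with a small pure-Python FASTA reader, treating any character other than A, C, G or T as ambiguous. That definition is an assumption and may not match how downstream tools count ambiguous characters; the function names are illustrative.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ambiguity_summary(path):
    """Map each record ID to (length, number of non-ACGT characters)."""
    summary = {}
    for header, seq in read_fasta(path):
        words = header.split()
        record_id = words[0] if words else ""
        n_ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
        summary[record_id] = (len(seq), n_ambiguous)
    return summary
```

Calling `ambiguity_summary("work/consensus.fa")` would then give one `(length, ambiguous)` pair per barcode.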

## Concatenate sequences with references

In [13]:
!cat refs/denv.fas work/consensus.fa > work/consensus_withrefs.fa

## Align sequences

In [14]:
!fftnsi --thread 16 work/consensus_withrefs.fa > work/consensus_withrefs.fa.fftnsi

nthread = 16
nthreadpair = 16
nthreadtb = 16
ppenalty_ex = 0
stacksize: 8192 kb
generating a scoring matrix for nucleotide (dist=200) ... done
Gap Penalty = -1.53, +0.00, +0.00



Making a distance matrix ..

There are 6263 ambiguous characters.
  601 / 632 (thread   14)
done.

Constructing a UPGMA tree (efffree=0) ... 
  630 / 632
done.

Progressive alignment 1/2... 
STEP   459 / 631 (thread   11)
Reallocating..done. *alloclen = 23557
STEP   601 / 631 (thread    0)
Reallocating..done. *alloclen = 24895

done.

Making a distance matrix from msa.. 
  600 / 632 (thread   13)
done.

Constructing a UPGMA tree (efffree=1) ... 
  630 / 632
done.

Progressive alignment 2/2... 
STEP   487 / 631 (thread    4)
Reallocating..done. *alloclen = 23413
STEP   601 / 631 (thread    2)
Reallocating..done. *alloclen = 24653

done.

disttbfast (nuc) Version 7.407
alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
16 thread(s)

generating a scoring matrix for nucleotide (dist=200) ... d


002-0821-0 (thread    5) identical     002-0822-0 (thread    8) identical     002-0823-1 (thread    4) identical     002-0824-1 (thread    6) identical     002-0825-1 (thread    1) identical     002-0826-0 (thread    2) identical     002-0827-0 (thread    3) identical     002-0828-1 (thread    5) identical     002-0829-0 (thread    8) identical     002-0830-1 (thread    4) identical     002-0831-0 (thread    6) identical     002-0831-0 (thread    7) identical     002-0833-0 (thread    2) identical     002-0834-1 (thread    1) identical     002-0835-1 (thread    3) identical     002-0836-0 (thread    5) identical     002-0837-1 (thread    8) identical     002-0838-1 (thread    2) identical     002-0838-1 (thread    6) identical     002-0840-0 (thread    7) identical     002-0841-1 (thread    3) identical     002-0842-1 (thread    8) identical     002-0843-0 (thread    5) identical     002-0844-0 (thread    1) identical     002-0845-1 (thre

002-1260-1 (thread    7) identical     
Converged.

Reached 2
Segment 306/318 10226-10334
001-1260-1 (thread    5) identical     001-0024-0 (thread    2) identical     001-0250-0 (thread    7) identical     001-0481-1 (thread    6) identical     001-0715-0 (thread    5) identical     001-0924-1 (thread    2) identical     001-1102-0 (thread    7) identical     001-1232-0 (thread    5) identical     
Converged.
Segment 307/318 10334-10370
001-1260-0 (thread    1) identical     001-0091-1 (thread    1) identical     ntical      ad    4) identical     
Converged.
Segment 308/318 10370-10451
001-1260-1 (thread    4) identical     001-0038-0 (thread    4) identical     ntical     001-0707-1 (thread    7) identical     001-0997-0 (thread    4) identical     001-1209-1 (thread    6) identical     
Converged.
Segment 309/318 10451-10569
001-1260-1 (thread    3) identical     001-0021-1 (thread    6) identical     001-0272-1 (thread    7) identical     001-0482-1 (thread    7) identical     001

002-1224-0 (thread    1) identical     001-0005-0 (thread    8) identical     001-0015-0 (thread    3) identical     001-0032-0 (thread    8) identical     001-0048-1 (thread    5) identical     001-0065-1 (thread    8) identical     001-0078-0 (thread    3) identical     001-0090-0 (thread    5) identical     001-0107-1 (thread    1) identical     001-0121-1 (thread    5) identical     001-0132-1 (thread    6) identical     001-0144-0 (thread    4) identical     001-0152-1 (thread    8) identical     001-0163-1 (thread    5) identical     001-0170-0 (thread    4) identical     001-0186-1 (thread    5) identical     001-0200-1 (thread    7) identical     001-0214-0 (thread    8) identical     001-0228-0 (thread    6) identical     001-0242-0 (thread    5) identical     001-0257-1 (thread    7) identical     001-0270-0 (thread    8) identical     001-0284-1 (thread    5) identical     001-0312-0 (thread    3) identical     001-0325-1 (thread    5) identical     001-0338-0 (thread    1) 

002-1260-0 (thread    1) worse         002-1232-1 (thread    2) identical     002-1235-0 (thread    5) worse      002-1238-1 (thread    5) identical     002-1241-0 (thread    5) identical     002-1244-1 (thread    5) identical     002-1248-0 (thread    8) worse      002-1250-1 (thread    4) worse      002-1251-1 (thread    6) identical     002-1252-0 (thread    2) worse      002-1255-0 (thread    5) worse      002-1256-1 (thread    2) identical     002-1257-1 (thread    6) worse      002-1258-0 (thread    4) worse      
Reached 2
done
dvtditr (nuc) Version 7.407
alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
8 thread(s)


Strategy:
 FFT-NS-i (Standard)
 Iterative refinement method (max. 2 iterations)

If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.

The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich re
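
Before building a tree, it may be worth confirming that the file MAFFT wrote is a usable alignment, meaning every sequence has the same length. The following is a minimal sketch rather than part of the original workflow: the helper names `read_fasta` and `summarize_alignment` are hypothetical, and the path is assumed to be the alignment file passed to IQ-TREE in the next step.

```python
import os
from collections import Counter


def read_fasta(path):
    """Return a list of (name, sequence) tuples from a FASTA file."""
    records = []
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Store the previous record before starting a new one.
                if name is not None:
                    records.append((name, "".join(chunks)))
                header = line[1:].split()
                name, chunks = (header[0] if header else ""), []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def summarize_alignment(records):
    """Count sequences and report whether they all share one length."""
    lengths = Counter(len(seq) for _, seq in records)
    return {
        "n_sequences": len(records),
        "lengths": sorted(lengths),
        "is_aligned": len(lengths) == 1,
    }


# Assumed path: the alignment used as IQ-TREE input in this notebook.
alignment_path = "work/consensus_withrefs.fa.fftnsi"
if os.path.exists(alignment_path):
    print(summarize_alignment(read_fasta(alignment_path)))
```

If `is_aligned` is true, the sequence count and the single length reported here should agree with the number of sequences and columns that IQ-TREE prints when it reads the same file.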

## Make a quick tree

The following builds a quick maximum-likelihood tree from the MAFFT alignment with IQ-TREE, using the GTR+G4 substitution model (`-m`), 4 threads (`-nt`; you should choose something appropriate for your machine), the fast search mode (`-fast`), and `work/consensus_withrefs` as the prefix for the output files (`-pre`).

In [None]:
!iqtree -s work/consensus_withrefs.fa.fftnsi -m GTR+G4 -nt 4 -pre work/consensus_withrefs -fast

IQ-TREE multicore version 1.6.9 for Linux 64-bit built Dec 19 2018
Developed by Bui Quang Minh, Nguyen Lam Tung, Olga Chernomor,
Heiko Schmidt, Dominik Schrempf, Michael Woodhams.

Host:    0710d30140b8 (AVX512, FMA3, 187 GB RAM)
Command: iqtree -s work/consensus_withrefs.fa.fftnsi -m GTR+G4 -nt 4 -pre work/consensus_withrefs -fast
Seed:    572812 (Using SPRNG - Scalable Parallel Random Number Generator)
Time:    Fri Mar  8 14:22:44 2019
Kernel:  AVX+FMA - 4 threads (48 CPU cores detected)

Reading alignment file work/consensus_withrefs.fa.fftnsi ... Fasta format detected
Alignment most likely contains DNA/RNA sequences
Alignment has 632 sequences with 12722 columns, 8082 distinct patterns
6616 parsimony-informative, 1590 singleton sites, 4516 constant sites
          Gap/Ambiguity  Composition  p-value
   1  A75711     15.75%    passed     86.97%
   2  AB074760   15.62%    passed     50.05%
   3  AB074761   15.62%    passed     94.48%
   4  AB178040   15.62%    passed     97.51%
   5 


Create initial parsimony tree by phylogenetic likelihood library (PLL)... 1.195 seconds

NOTE: 662 MB RAM (0 GB) is required!
Estimate model parameters (epsilon = 0.500)
1. Initial log-likelihood: -507324.088
