# MiSeq Assembly

Since the blast analysis isn't working well, a de novo assembly is the next best thing to get an accurate representation of the in situ strain information.

After I talked with Mike Schatz, he recommended SPAdes, so I'll give that a shot.

In [None]:
cd /data1/share/scratch/miseq-assembly/flub

spades.py --threads 20 --mem 110 -1 f3.r1.fastq -2 f3.r2.fastq -o f3-spades
#spades.py --threads 20 --mem 110 -1 f3.r1.fastq -2 f3.r2.fastq --meta -o f3-spades

LOL so this produced 4996 contigs and the longest contig was complete nonsense. Looks like some normalization needs to take place. 

The workflow that Mike implemented (with good results) is as follows:

Isolate the PB2 segment alignments:
$ samtools view -h miseq-minion.bam H3N2-PB2 | samtools view -Sb - -o pb2.bam

Compute average coverage:
$ samtools depth pb2.bam | awk '{s+=$3;c++} END{print s/c}'
7760.49

Downsample to keep 0.7% of the pairs (about 100x coverage)
$ samtools view -hs .007 pb2.bam | samtools view -Sb - -o pb2.100x.bam

Check the coverage (99.06x):
$ samtools depth pb2.100x.bam | awk '{s+=$3;c++} END{print s/c}'
99.0674

Convert to fastq:
$ bamtools convert -in pb2.100x.bam -format fastq > pb2.100x.fastq

Assemble with spades (this could have just been k=21):
$ ~/build/packages/SPAdes-3.7.1-Linux/bin/spades.py -s pb2.100x.fastq -o spades

Align the K21 contigs to the references:
$ nucmer -maxmatch all-flu.complete.fasta spades/K21/final_contigs.fasta

Check the alignments:
$ show-coords out.delta
/home/mschatz/flu/all-flu.complete.fasta /home/mschatz/flu/pb2/spades/K21/final_contigs.fasta
NUCMER

    [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  | [TAGS]
=====================================================================================
       1     2284  |       52     2335  |     2284     2284  |   100.00  | H3N2-PB2     NODE_2_length_2398_cov_89.1927
       1     2341  |       28     2368  |     2341     2341  |    84.11  | H1N1-PB2     NODE_2_length_2398_cov_89.1927

I then checked the other short contig using the EBI vector screening tools:
http://www.ebi.ac.uk/Tools/sss/ncbiblast/vectors.html
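
For reference, the average-coverage step in this workflow can be written in Python. This is only a sketch that mirrors the awk one-liner; the function name `mean_depth` and the sample text are made up for illustration. Like the awk version, it averages only over the positions that `samtools depth` actually prints, so positions with zero coverage are not counted unless `samtools depth -a` is used.

```python
def mean_depth(depth_text):
    """Mean of the depth column over the positions listed in `samtools depth` output.

    Each line is expected to be: reference <tab> position <tab> depth.
    """
    total = 0
    count = 0
    for line in depth_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        _ref, _pos, depth = line.split("\t")[:3]
        total += int(depth)
        count += 1
    if count == 0:
        raise ValueError("no positions in depth output")
    return total / count


# Hypothetical three-position example (not real data from this run).
sample = "H3N2-PB2\t1\t90\nH3N2-PB2\t2\t100\nH3N2-PB2\t3\t110\n"
print(mean_depth(sample))
```

In practice the text would come from something like `samtools depth pb2.bam > pb2.depth.txt`, read back in with `open(...).read()`.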

# FluB Spades Downsample Assembly

In [None]:
cd /home/alan/projects/MinION-notebook/miseq-assembly/flub/spades

# we should be able to use the quads references since we're just trying to partition the reads based on segment
ln -s ~/projects/MinION-notebook/clinical-analysis/miseq/flub/flub.quads-references.sort.bam
ln -s ~/projects/MinION-notebook/clinical-analysis/miseq/flub/flub.quads-references.sort.bam.bai

# separate reads by segment
for i in HA NA NP PA NS MP PB1 PB2; do samtools view -h flub.quads-references.sort.bam $i \
| samtools view -Sb - -o $i.bam; done

# calculate avg cov
for i in HA NA NP PA NS MP PB1 PB2; do echo -ne $i'\t' && samtools depth $i.bam | awk '{s+=$3;c++} END{print s/c}'; done

Segment | Avg Cov | Avg Cov / 100 (fold reduction to reach 100x)
------- | ------- | ---------
HA | 4591.07 | 45.9107
NA | 7268.27 | 72.6827
NP | 1364.96 | 13.6496
PA | 7768.4 | 77.684
NS | 7711.55 | 77.1155
MP | 7419.86 | 74.1986
PB1 | 6508.11 | 65.0811
PB2 | 6856.19 | 68.5619

In [18]:
cd /home/alan/projects/MinION-notebook/miseq-assembly/flub/spades

samtools view -hs .007 PB2.bam | samtools view -Sb - -o PB2.100x.bam
samtools view -hs .007 PB1.bam | samtools view -Sb - -o PB1.100x.bam
samtools view -hs .004 HA.bam | samtools view -Sb - -o HA.100x.bam
samtools view -hs .003 NA.bam | samtools view -Sb - -o NA.100x.bam
samtools view -hs .08 NP.bam | samtools view -Sb - -o NP.100x.bam
samtools view -hs .003 MP.bam | samtools view -Sb - -o MP.100x.bam
samtools view -hs .005 PA.bam | samtools view -Sb - -o PA.100x.bam
samtools view -hs .005 NS.bam | samtools view -Sb - -o NS.100x.bam

for i in HA NA NP PA NS MP PB1 PB2; do echo -ne $i'\t' && samtools depth $i.100x.bam | awk '{s+=$3;c++} END{print s/c}'; done

for i in HA NA NP NS MP PB1 PB2 PA; do bamtools convert -in $i.100x.bam -format fastq > $i.100x.fastq; done 

HA	114.77
NA	94.5931
NP	110.015
PA	114.514
NS	104.313
MP	98.0843
PB1	98.4909
PB2	106.83
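
The fractions above were picked by hand and land within about 15% of 100x. A small helper can nudge each one closer using the coverage that was just measured. This is a sketch under the assumption that coverage scales linearly with the fraction of pairs kept; `rescale_fraction` is a hypothetical name, not part of samtools. Note that the fractions that worked are much smaller than 100 divided by the reported mean (for HA, 100/4591 is about 0.022, yet 0.004 gave about 115x). That suggests the reported means understate the real depth, possibly because older samtools releases cap `samtools depth` at 8000 reads per position by default.

```python
def rescale_fraction(fraction, observed_cov, target_cov=100.0):
    """Rescale a subsampling fraction so the expected coverage hits target_cov.

    Assumes coverage is proportional to the fraction of read pairs kept.
    The result is clamped to 1.0 because you cannot keep more than all reads.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if observed_cov <= 0:
        raise ValueError("observed coverage must be positive")
    return min(1.0, fraction * target_cov / observed_cov)


# (fraction used, coverage observed) taken from the cell and output above.
pilot = {"HA": (0.004, 114.77), "NA": (0.003, 94.5931), "NP": (0.08, 110.015)}
for segment, (fraction, coverage) in pilot.items():
    print(segment, round(rescale_fraction(fraction, coverage), 5))
```

The rescaled value would then be passed back to `samtools view -s` for a second pass.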


In [None]:
cd /home/alan/projects/MinION-notebook/miseq-assembly/flub/spades 

for i in HA NA NP NS MP PB1 PB2 PA; do \
/opt/bioinformatics-software/SPAdes-3.7.1-Linux/bin/spades.py -s $i.100x.fastq -o $i-spades; done

In [None]:
cd /home/alan/projects/MinION-notebook/miseq-assembly/flub/spades 

grep -c '>' */scaffolds.fasta

for i in HA NA NP PA NS MP PB1 PB2; do nucmer -maxmatch -p $i quads-study-reference-sequence.fasta \
$i-spades/scaffolds.fasta ; done

In [None]:
cd /home/alan/projects/MinION-notebook/miseq-assembly/flub/spades 

for i in HA NA NP PA NS MP PB1 PB2; do show-coords -cl $i.delta; done | grep -vP '/|NUCMER'

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | TAGS
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|-----
1 | 1758 | 1970 | 216 | 1758 | 1755 | 89.08 | 1758 | 2193 | 100.00 | 80.03 | FluB-HA NODE_1_length_2193_cov_74.5484_ID_3
1 | 1401 | 1748 | 348 | 1401 | 1401 | 94.22 | 1401 | 1931 | 100.00 | 72.55 | FluB-NA NODE_1_length_1931_cov_52.219_ID_3
1 | 1683 | 213 | 1895 | 1683 | 1683 | 98.99 | 1683 | 2271 | 100.00 | 74.11 | FluB-NP NODE_1_length_2271_cov_53.1926_ID_7
203 | 296 | 207 | 300 | 94 | 94 | 100.00 | 1683 | 501 | 5.59 | 18.76 | FluB-NP NODE_2_length_501_cov_0.925134_ID_9
1 | 2181 | 2392 | 212 | 2181 | 2181 | 96.52 | 2181 | 2696 | 100.00 | 80.90 | FluB-PA NODE_1_length_2696_cov_56.6069_ID_3
11 | 1027 | 280 | 1296 | 1017 | 1017 | 97.64 | 1027 | 1559 | 99.03 | 65.23 | FluB-NS NODE_1_length_1559_cov_46.3457_ID_3
1 | 1076 | 187 | 1262 | 1076 | 1076 | 95.26 | 1076 | 1510 | 100.00 | 71.26 | FluB-MP NODE_1_length_1510_cov_51.0195_ID_3
1 | 2259 | 2463 | 205 | 2259 | 2259 | 92.87 | 2259 | 2665 | 100.00 | 84.77 | FluB-PB1 NODE_1_length_2665_cov_52.4622_ID_35
2134 | 2259 | 127 | 2 | 126 | 126 | 95.24 | 2259 | 253 | 5.58 | 49.80 | FluB-PB1 NODE_2_length_253_cov_217.238_ID_49
1 | 2313 | 2521 | 209 | 2313 | 2313 | 93.13 | 2313 | 2690 | 100.00 | 85.99 | FluB-PB2 NODE_1_length_2690_cov_62.9645_ID_37
197 | 343 | 300 | 160 | 147 | 141 | 93.20 | 2313 | 436 | 6.36 | 32.34 | FluB-PB2 NODE_2_length_436_cov_1.11974_ID_43
2032 | 2157 | 127 | 2 | 126 | 126 | 93.65 | 2313 | 254 | 5.45 | 49.61 | FluB-PB2 NODE_3_length_254_cov_41.1969_ID_53

These are pretty damn close to the quads study references! But these are the CDS... wish I had thought about that before. At least this is a proof of concept that the approach works well, even with structural differences. I'll wrap this up in a shell script and then use the blast references to see what comes of the full segments.
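
As a rough idea of what wrapping this up could look like, here is a sketch that only builds the per-segment command lines and does not run anything. It is an illustration, not the script that was actually written. `build_commands` and its arguments are hypothetical names; the file naming follows the cells above, and the single subsampling fraction per segment still has to be chosen by the caller.

```python
import shlex


def build_commands(bam, segment, fraction, spades="spades.py"):
    """Return the shell commands for one segment: split, subsample, convert, assemble."""
    seg_bam = f"{segment}.bam"
    sub_bam = f"{segment}.100x.bam"
    fastq = f"{segment}.100x.fastq"
    q = shlex.quote  # quote file names in case they contain odd characters
    return [
        # pull out the alignments for this segment (BAM must be sorted and indexed)
        f"samtools view -b -o {q(seg_bam)} {q(bam)} {q(segment)}",
        # keep roughly `fraction` of the read pairs
        f"samtools view -b -s {fraction:.6f} -o {q(sub_bam)} {q(seg_bam)}",
        # convert the subsampled alignments back to FASTQ
        f"bamtools convert -in {q(sub_bam)} -format fastq > {q(fastq)}",
        # assemble the subsampled reads as single-end input, as in the cells above
        f"{spades} -s {q(fastq)} -o {q(segment + '-spades')}",
    ]


for command in build_commands("flub.quads-references.sort.bam", "PB2", 0.007):
    print(command)
```

Printing the commands first makes it easy to eyeball them before handing them to `subprocess.run(..., shell=True)` or pasting them into a shell.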

## Spades Downsample Assembly with 11-9 Blast Hits

These steps were executed on my laptop so no paths are provided.

### FluB

Prior to doing anything else, in an effort to clean up the smaller contigs that were produced by the assembly step above, the data were trimmed of their adaptors using Trimmomatic.

In [None]:
trimmomatic PE -threads 6 flub.r1.fastq.gz flub.r2.fastq.gz flub.trimmed.r1.fastq flub.trimmed.se.r1.fastq \
flub.trimmed.r2.fastq flub.trimmed.se.r2.fastq ILLUMINACLIP:/usr/local/share/trimmomatic/adapters/TruSeq3-PE-2.fa:2:30:10 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25

The MinION 11-9 blast hits were used to segregate the MiSeq reads into their segments using bwa mem.

In [None]:
bwa mem -t 6 -x intractg 11-9-top-seg-hits.fasta flub.trimmed.r1.fastq flub.trimmed.r2.fastq > fluB.11-9-top-seg-hits.sam

The pipeline was then implemented in Python and executed as follows:

In [None]:
python downsampleAssembler.py -r 11-9-top-seg-hits.fasta -b fluB.11-9-top-seg-hits.sort.bam

cat *spades-out/contigs.fasta > fluB.11-9-top-seg-hits.contigs.fasta

nucmer --maxmatch 11-9-top-seg-hits.fasta fluB.11-9-top-seg-hits.contigs.fasta

show-coords -cl out.delta

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | TAGS
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|-----
4 | 1830 | 21 | 1847 | 1827 | 1827 | 99.51 | 1834 | 1905 | 99.62 | 95.91 | HA	NODE_1_length_1905_cov_52.721_ID_5
1 | 1190 | 7 | 1196 | 1190 | 1190 | 99.66 | 1190 | 1206 | 100.00 | 98.67 | MP	NODE_1_length_1206_cov_57.0389_ID_5
1 | 1551 | 1569 | 19 | 1551 | 1551 | 99.61 | 1551 | 1576 | 100.00 | 98.41 | NA	NODE_1_length_1576_cov_54.3161_ID_5
1 | 1809 | 1839 | 31 | 1809 | 1809 | 99.61 | 1809 | 1861 | 100.00 | 97.21 | NP	NODE_1_length_1861_cov_52.0934_ID_5
1 | 1096 | 7 | 1102 | 1096 | 1096 | 99.82 | 1096 | 1157 | 100.00 | 94.73 | NS	NODE_1_length_1157_cov_54.4689_ID_5
1 | 2181 | 2285 | 105 | 2181 | 2181 | 99.95 | 2181 | 2327 | 100.00 | 93.73 | PA	NODE_1_length_2327_cov_57.0414_ID_5
290 | 379 | 1 | 90 | 90 | 90 | 98.89 | 1834 | 2486 | 4.91 | 3.62 | HA	NODE_1_length_2486_cov_48.1852_ID_147
1 | 2313 | 121 | 2433 | 2313 | 2313 | 99.83 | 2313 | 2486 | 100.00 | 93.04 | PB1	NODE_1_length_2486_cov_48.1852_ID_147
50 | 176 | 254 | 128 | 127 | 127 | 100.00 | 2313 | 254 | 5.49 | 50.00 | PB1	NODE_2_length_254_cov_28.3622_ID_150
2032 | 2158 | 127 | 1 | 127 | 127 | 100.00 | 2313 | 254 | 5.49 | 50.00 | PB1	NODE_2_length_254_cov_28.3622_ID_150
132 | 258 | 1 | 127 | 127 | 127 | 98.43 | 2313 | 252 | 5.49 | 50.40 | PB1	NODE_3_length_252_cov_10.552_ID_153
2240 | 2313 | 126 | 199 | 74 | 74 | 100.00 | 2313 | 252 | 3.20 | 29.37 | PB1	NODE_3_length_252_cov_10.552_ID_153
2240 | 2313 | 1 | 74 | 74 | 74 | 100.00 | 2313 | 144 | 3.20 | 51.39 | PB1	NODE_4_length_144_cov_44.3529_ID_156
1 | 2369 | 2423 | 55 | 2369 | 2369 | 97.55 | 2369 | 2432 | 100.00 | 97.41 | PB2	NODE_1_length_2432_cov_47.3761_ID_65
19 | 145 | 253 | 127 | 127 | 127 | 96.06 | 2369 | 253 | 5.36 | 50.20 | PB2	NODE_2_length_253_cov_170.183_ID_68
2153 | 2281 | 129 | 1 | 129 | 129 | 99.22 | 2369 | 253 | 5.45 | 50.99 | PB2	NODE_2_length_253_cov_170.183_ID_68

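All of the full-length contigs are very similar to their references (97.55-99.95% identity). Note that the first 90 bases of the PB1 contig (NODE_1_length_2486) also align to positions 290-379 of the HA reference for some reason.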
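
Picking the full-length contig for each segment out of a table like this can be automated. The sketch below is a minimal example, assuming the rows have already been parsed into tuples; `best_per_reference` and the 90% reference-coverage cutoff are my own choices for illustration. The four rows are copied from the table above.

```python
rows = [
    # (reference, contig, % identity, % of reference covered)
    ("HA", "NODE_1_length_1905_cov_52.721_ID_5", 99.51, 99.62),
    ("HA", "NODE_1_length_2486_cov_48.1852_ID_147", 98.89, 4.91),
    ("PB1", "NODE_1_length_2486_cov_48.1852_ID_147", 99.83, 100.00),
    ("PB1", "NODE_2_length_254_cov_28.3622_ID_150", 100.00, 5.49),
]


def best_per_reference(rows, min_cov_ref=90.0):
    """For each reference keep the alignment with the highest (COV R, % IDY).

    Alignments covering less than min_cov_ref percent of the reference are
    ignored, which drops short fragments and cross-segment hits.
    """
    best = {}
    for ref, contig, identity, cov_ref in rows:
        if cov_ref < min_cov_ref:
            continue
        current = best.get(ref)
        if current is None or (cov_ref, identity) > (current[2], current[1]):
            best[ref] = (contig, identity, cov_ref)
    return best


for ref, (contig, identity, cov_ref) in best_per_reference(rows).items():
    print(ref, contig, identity, cov_ref)
```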

Node 2 in PB2 and nodes 2 and 3 in PB1 are putative DIs (defective interfering segments). Node 4 in PB1 is garbage? It may also be one half of a DI, since its first 74 bases align to the end of the PB1 reference (positions 2240-2313).
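
The pattern behind the putative DI calls is a short contig with two alignments to the same reference that sit far apart on that reference. Below is a sketch of that check. It is only a heuristic of my own, not a validated DI caller; the 500 bp gap threshold is an arbitrary assumption and the contig names are shortened. The coordinates are the S1/E1 values from the table above.

```python
from collections import defaultdict

alignments = [
    # (reference, contig, ref start, ref end)
    ("PB1", "NODE_2", 50, 176),
    ("PB1", "NODE_2", 2032, 2158),
    ("PB1", "NODE_3", 132, 258),
    ("PB1", "NODE_3", 2240, 2313),
    ("PB1", "NODE_4", 2240, 2313),
    ("PB2", "NODE_2", 19, 145),
    ("PB2", "NODE_2", 2153, 2281),
]


def di_candidates(alignments, min_gap=500):
    """Return (reference, contig) pairs whose alignments skip >= min_gap reference bases."""
    spans_by_contig = defaultdict(list)
    for ref, contig, start, end in alignments:
        spans_by_contig[(ref, contig)].append((min(start, end), max(start, end)))
    candidates = []
    for key, spans in spans_by_contig.items():
        spans.sort()
        # gap between the end of one alignment and the start of the next
        gaps = [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]
        if gaps and max(gaps) >= min_gap:
            candidates.append(key)
    return sorted(candidates)


print(di_candidates(alignments))
```

A contig with a single alignment, like node 4 in PB1, is never flagged by this check, so it would still need a manual look.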

#### Contig Resequencing

Now that we have long assemblies we can go forward with resequencing to see if they're erroneous or not. This will involve mapping the raw reads back to the contigs and visually inspecting them. I can also implement a haplotype module in the assembly script that will help out, though I'd need to think of one first...

In [None]:
# the reference file has only the complete segments, not the DIs
bwa index fluB.11-9-top-seg-hits.full-segs.contigs.fasta

bwa mem -t 6 -x intractg fluB.11-9-top-seg-hits.full-segs.contigs.fasta flub.trimmed.r1.fastq flub.trimmed.r2.fastq \
> fluB.11-9-top-seg-hits.full-segs.contigs.sam

cat flub.trimmed.se.r1.fastq flub.trimmed.se.r2.fastq > fluB.trimmed.se.fastq

bwa mem -t 6 -x intractg fluB.11-9-top-seg-hits.full-segs.contigs.fasta fluB.trimmed.se.fastq \
> fluB.11-9-top-seg-hits.full-segs.contigs.se.sam

for i in $(ls *sam | perl -pe 's/\.sam//g'); do samtools view -b -o $i.bam $i.sam && \
samtools sort -o $i.sort.bam $i.bam && samtools index $i.sort.bam; done

rm *sam *contigs.bam *se.bam

samtools merge fluB.11-9-top-seg-hits.full-segs.contigs.merged.bam \
fluB.11-9-top-seg-hits.full-segs.contigs.sort.bam fluB.11-9-top-seg-hits.full-segs.contigs.se.sort.bam

samtools index fluB.11-9-top-seg-hits.full-segs.contigs.merged.bam

![title](docs/fluB.11-9-top-seg-hits.full-segs.contigs-PB2.png)
![title](docs/fluB.11-9-top-seg-hits.full-segs.contigs-PB1.png)
![title](docs/fluB.11-9-top-seg-hits.full-segs.contigs-PA.png)
![title](docs/fluB.11-9-top-seg-hits.full-segs.contigs-HA.png)
![title](docs/fluB.11-9-top-seg-hits.full-segs.contigs-NA.png)
![title](docs/fluB.11-9-top-seg-hits.full-segs.contigs-NP.png)
![title](docs/fluB.11-9-top-seg-hits.full-segs.contigs-MP.png)
![title](docs/fluB.11-9-top-seg-hits.full-segs.contigs-NS.png)

These look wonderful with regard to their sequence content. However, some of the termini are shortened for some strange reason. Not sure what happened, but I'm assuming it's safe to go forward with tuning the parameters to capture these sequences.

The variable position in the NS segment has a T represented on both strands which is suggestive of an actual variant since it's not in the primer regions.
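
The both-strands check can also be done without a viewer, by counting the alternate base on each strand in the read-bases column of `samtools mpileup` output. In that column, uppercase letters are mismatches on the forward strand and lowercase letters are mismatches on the reverse strand. This is a sketch: `strand_counts` is a hypothetical helper and the example string is made up, not taken from the NS pileup.

```python
def strand_counts(bases, alt):
    """Count alt-base observations on the forward and reverse strands.

    `bases` is the read-bases column (column 5) of one mpileup line.
    """
    alt = alt.upper()
    forward = reverse = 0
    i = 0
    while i < len(bases):
        char = bases[i]
        if char == "^":
            i += 2  # read-start marker is followed by one mapping-quality character
            continue
        if char in "+-":
            # indel: sign, length digits, then that many inserted/deleted bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            if j == i + 1:
                i += 1  # no length given; treat as a stray character
                continue
            i = j + int(bases[i + 1:j])
            continue
        if char == alt:
            forward += 1
        elif char == alt.lower():
            reverse += 1
        i += 1
    return forward, reverse


print(strand_counts("..T,t^It$+2TTt", "T"))
```

A variant seen only on one strand is more likely to be an artifact, so requiring a nonzero count in both positions is a cheap first filter.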

### H3N2-70

In [None]:
trimmomatic PE -threads 6 c3.r1.fastq.gz c3.r2.fastq.gz c3.trimmed.r1.fastq c3.trimmed.se.r1.fastq \
c3.trimmed.r2.fastq c3.trimmed.se.r2.fastq ILLUMINACLIP:/usr/local/share/trimmomatic/adapters/TruSeq3-PE-2.fa:2:30:10 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25

# Input Read Pairs: 814706 Both Surviving: 364547 (44.75%) Forward Only Surviving: 408332 (50.12%) Reverse Only Surviving: 757 (0.09%) Dropped: 41070 (5.04%)

bwa index 6-4-h3n2-70-top-seg-hits.fasta
bwa mem -t 6 -x intractg 6-4-h3n2-70-top-seg-hits.fasta c3.trimmed.r1.fastq c3.trimmed.r2.fastq > c3.flu-6-4-h3n2-70.fludb.top-seg-hits.sam
samtools view -b -o c3.flu-6-4-h3n2-70.fludb.top-seg-hits.bam c3.flu-6-4-h3n2-70.fludb.top-seg-hits.sam
samtools sort -o c3.flu-6-4-h3n2-70.fludb.top-seg-hits.sort.bam c3.flu-6-4-h3n2-70.fludb.top-seg-hits.bam
samtools index c3.flu-6-4-h3n2-70.fludb.top-seg-hits.sort.bam

python ~/projects/MinION-Flu-Analysis/scripts/downsampleAssembler.py -r 6-4-h3n2-70-top-seg-hits.fasta -b c3.flu-6-4-h3n2-70.fludb.top-seg-hits.sort.bam

cat */contigs.fasta > c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.fasta
nucmer --maxmatch 6-4-h3n2-70-top-seg-hits.fasta c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.fasta

show-coords -clT out.delta | perl -pe 's/\[|\]//g' | perl -pe 's/\t/ | /g'

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | TAG R | TAG Q
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|-------|------
1 | 1727 | 1857 | 131 | 1727 | 1727 | 96.29 | 1727 | 1877 | 100.00 | 92.01 | HA | NODE_1_length_1877_cov_56.0194_ID_5
859 | 957 | 1 | 99 | 99 | 99 | 97.98 | 1727 | 1877 | 5.73 | 5.27 | HA | NODE_1_length_1877_cov_56.0194_ID_5
1 | 1027 | 1136 | 110 | 1027 | 1027 | 98.05 | 1027 | 1142 | 100.00 | 89.93 | MP | NODE_1_length_1142_cov_50.7803_ID_5
835 | 933 | 99 | 1 | 99 | 99 | 98.99 | 1027 | 1142 | 9.64 | 8.67 | MP | NODE_1_length_1142_cov_50.7803_ID_5
1 | 1466 | 7 | 1473 | 1466 | 1467 | 97.34 | 1466 | 1483 | 100.00 | 98.92 | NA | NODE_1_length_1483_cov_45.219_ID_117
1 | 109 | 7 | 115 | 109 | 109 | 99.08 | 1466 | 382 | 7.44 | 28.53 | NA | NODE_2_length_382_cov_153.267_ID_120
1208 | 1466 | 113 | 372 | 259 | 260 | 98.46 | 1466 | 382 | 17.67 | 68.06 | NA | NODE_2_length_382_cov_153.267_ID_120
1 | 138 | 7 | 144 | 138 | 138 | 99.28 | 1466 | 321 | 9.41 | 42.99 | NA | NODE_3_length_321_cov_79.7629_ID_123
1301 | 1466 | 145 | 311 | 166 | 167 | 98.80 | 1466 | 321 | 11.32 | 52.02 | NA | NODE_3_length_321_cov_79.7629_ID_123
1 | 1566 | 1701 | 137 | 1566 | 1565 | 98.72 | 1566 | 1711 | 100.00 | 91.47 | NP | NODE_1_length_1711_cov_50.0783_ID_5
353 | 478 | 126 | 1 | 126 | 126 | 96.83 | 1566 | 1711 | 8.05 | 7.36 | NP | NODE_1_length_1711_cov_50.0783_ID_5
1 | 890 | 11 | 900 | 890 | 890 | 98.31 | 890 | 1032 | 100.00 | 86.24 | NS | NODE_1_length_1032_cov_47.7735_ID_5
566 | 688 | 910 | 1032 | 123 | 123 | 99.19 | 1566 | 1032 | 7.85 | 11.92 | NP | NODE_1_length_1032_cov_47.7735_ID_5
1 | 2166 | 7 | 2172 | 2166 | 2166 | 96.95 | 2232 | 2172 | 97.04 | 99.72 | PA | NODE_1_length_2172_cov_41.7242_ID_137
1 | 138 | 7 | 144 | 138 | 138 | 95.65 | 2232 | 453 | 6.18 | 30.46 | PA | NODE_2_length_453_cov_80.8589_ID_140
2040 | 2232 | 143 | 336 | 193 | 194 | 97.42 | 2232 | 453 | 8.65 | 42.83 | PA | NODE_2_length_453_cov_80.8589_ID_140
1059 | 1165 | 453 | 347 | 107 | 107 | 98.13 | 2340 | 453 | 4.57 | 23.62 | PB1 | NODE_2_length_453_cov_80.8589_ID_140
45 | 171 | 1 | 127 | 127 | 127 | 96.06 | 2232 | 253 | 5.69 | 50.20 | PA | NODE_3_length_253_cov_30.754_ID_143
1674 | 1800 | 127 | 253 | 127 | 127 | 98.43 | 2232 | 253 | 5.69 | 50.20 | PA | NODE_3_length_253_cov_30.754_ID_143
1 | 2340 | 40 | 2380 | 2340 | 2341 | 97.31 | 2340 | 2396 | 100.00 | 97.70 | PB1 | NODE_1_length_2396_cov_52.5677_ID_5
1 | 2280 | 52 | 2331 | 2280 | 2280 | 97.76 | 2280 | 2500 | 100.00 | 91.20 | PB2 | NODE_1_length_2500_cov_48.6941_ID_15
1675 | 1801 | 2374 | 2500 | 127 | 127 | 98.43 | 2280 | 2500 | 5.57 | 5.08 | PB2 | NODE_1_length_2500_cov_48.6941_ID_15
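
The Trimmomatic summary line recorded in the cell above shows that about half of the pairs lost their reverse mate, which is why the single-end survivors get mapped separately and merged further down. Pulling the counts out of that line makes it easy to compare samples. This is a sketch; `parse_trimmomatic_summary` is a hypothetical helper and the regular expression only covers the fields seen in this notebook.

```python
import re

SUMMARY = (
    "Input Read Pairs: 814706 Both Surviving: 364547 (44.75%) "
    "Forward Only Surviving: 408332 (50.12%) "
    "Reverse Only Surviving: 757 (0.09%) Dropped: 41070 (5.04%)"
)

FIELD = re.compile(
    r"(Input Read Pairs|Both Surviving|Forward Only Surviving|"
    r"Reverse Only Surviving|Dropped): (\d+)"
)


def parse_trimmomatic_summary(line):
    """Return a dict mapping each summary field name to its read-pair count."""
    return {name: int(value) for name, value in FIELD.findall(line)}


counts = parse_trimmomatic_summary(SUMMARY)
for name, value in counts.items():
    print(f"{name}: {value}")
```

The four outcome categories should add up to the input pair count, which is a quick sanity check on a pasted line.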

In [None]:
bwa index c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.fasta
bwa mem -t 6 -x intractg c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.fasta c3.trimmed.r1.fastq c3.trimmed.r2.fastq \
> c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.sam
samtools view -b -o c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.bam c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.sam
samtools sort -o c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.sort.bam c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.bam
samtools index c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.sort.bam

cat c3.trimmed.se.r1.fastq c3.trimmed.se.r2.fastq > c3.trimmed.se.fastq

bwa mem -t 6 -x intractg c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.fasta c3.trimmed.se.fastq \
> c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.se.sam

for i in $(ls *sam | perl -pe 's/\.sam//g'); do samtools view -b -o $i.bam $i.sam && \
samtools sort -o $i.sort.bam $i.bam && samtools index $i.sort.bam; done

samtools merge c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.merged.sort.bam \
c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.sort.bam c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.se.sort.bam

samtools index c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.merged.sort.bam

### H3N2-90

In [None]:
trimmomatic PE -threads 6 f1.r1.fastq.gz f1.r2.fastq.gz f1.trimmed.r1.fastq f1.trimmed.se.r1.fastq \
f1.trimmed.r2.fastq f1.trimmed.se.r2.fastq ILLUMINACLIP:/usr/local/share/trimmomatic/adapters/TruSeq3-PE-2.fa:2:30:10 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25

#Input Read Pairs: 600706 Both Surviving: 316954 (52.76%) Forward Only Surviving: 233196 (38.82%) Reverse Only Surviving: 757 (0.13%) Dropped: 49799 (8.29%)

bwa index 8-13-h3n2-90-top-seg-hits.fasta
bwa mem -t 6 -x intractg 8-13-h3n2-90-top-seg-hits.fasta f1.trimmed.r1.fastq f1.trimmed.r2.fastq > f1.8-13-h3n2-90-top-seg-hits.sam
samtools view -b -o f1.8-13-h3n2-90-top-seg-hits.bam f1.8-13-h3n2-90-top-seg-hits.sam
samtools sort -o f1.8-13-h3n2-90-top-seg-hits.sort.bam f1.8-13-h3n2-90-top-seg-hits.bam
samtools index f1.8-13-h3n2-90-top-seg-hits.sort.bam

python ~/projects/MinION-Flu-Analysis/scripts/downsampleAssembler.py -r 8-13-h3n2-90-top-seg-hits.fasta -b f1.8-13-h3n2-90-top-seg-hits.sort.bam

cat */contigs.fasta > f1.8-13-h3n2-90-top-seg-hits.contigs.fasta
nucmer --maxmatch 8-13-h3n2-90-top-seg-hits.fasta f1.8-13-h3n2-90-top-seg-hits.contigs.fasta

show-coords -clT out.delta | perl -pe 's/\[|\]//g' | perl -pe 's/\t/ | /g'

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | TAG R | TAG Q
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|-------|------
1 | 1762 | 7 | 1768 | 1762 | 1762 | 99.21 | 1762 | 1779 | 100.00 | 99.04 | HA | NODE_1_length_1779_cov_56.2324_ID_5
1 | 1027 | 7 | 1033 | 1027 | 1027 | 99.61 | 1027 | 1045 | 100.00 | 98.28 | MP | NODE_1_length_1045_cov_61.0294_ID_5
1 | 1466 | 7 | 1472 | 1466 | 1466 | 99.45 | 1466 | 1486 | 100.00 | 98.65 | NA | NODE_1_length_1486_cov_52.6196_ID_5
1 | 1566 | 1652 | 87 | 1566 | 1566 | 99.68 | 1566 | 1674 | 100.00 | 93.55 | NP | NODE_1_length_1674_cov_48.8474_ID_5
1 | 890 | 14 | 903 | 890 | 890 | 99.55 | 890 | 1012 | 100.00 | 87.94 | NS | NODE_1_length_1012_cov_51.4633_ID_5
443 | 545 | 910 | 1012 | 103 | 103 | 99.03 | 890 | 1012 | 11.57 | 10.18 | NS | NODE_1_length_1012_cov_51.4633_ID_5
1 | 2232 | 8 | 2241 | 2232 | 2234 | 99.46 | 2232 | 2254 | 100.00 | 99.11 | PA | NODE_1_length_2254_cov_55.5205_ID_5
1 | 2341 | 9 | 2349 | 2341 | 2341 | 99.66 | 2341 | 2363 | 100.00 | 99.07 | PB1 | NODE_1_length_2363_cov_49.4275_ID_5
1 | 2341 | 7 | 2347 | 2341 | 2341 | 99.66 | 2341 | 2357 | 100.00 | 99.32 | PB2 | NODE_1_length_2357_cov_50.7623_ID_284
1 | 209 | 334 | 126 | 209 | 209 | 98.56 | 2341 | 340 | 8.93 | 61.47 | PB2 | NODE_2_length_340_cov_28.9953_ID_287
1906 | 2032 | 127 | 1 | 127 | 127 | 99.21 | 2341 | 340 | 5.43 | 37.35 | PB2 | NODE_2_length_340_cov_28.9953_ID_287
246 | 451 | 1 | 201 | 206 | 201 | 97.09 | 2341 | 325 | 8.80 | 61.85 | PB2 | NODE_3_length_325_cov_1.4596_ID_290
1844 | 1970 | 199 | 325 | 127 | 127 | 99.21 | 2341 | 325 | 5.43 | 39.08 | PB2 | NODE_3_length_325_cov_1.4596_ID_290
1941 | 2067 | 254 | 128 | 127 | 127 | 99.21 | 2341 | 254 | 5.43 | 50.00 | PB2 | NODE_4_length_254_cov_1_ID_293

In [None]:
bwa index f1.8-13-h3n2-90-top-seg-hits.contigs.fasta
bwa mem -t 6 -x intractg f1.8-13-h3n2-90-top-seg-hits.contigs.fasta f1.trimmed.r1.fastq f1.trimmed.r2.fastq > f1.8-13-h3n2-90-top-seg-hits.contigs.sam
samtools view -b -o f1.8-13-h3n2-90-top-seg-hits.contigs.bam f1.8-13-h3n2-90-top-seg-hits.contigs.sam
samtools sort -o f1.8-13-h3n2-90-top-seg-hits.contigs.sort.bam f1.8-13-h3n2-90-top-seg-hits.contigs.bam
samtools index f1.8-13-h3n2-90-top-seg-hits.contigs.sort.bam

cat f1.trimmed.se.r1.fastq f1.trimmed.se.r2.fastq > f1.trimmed.se.fastq

bwa mem -t 6 -x intractg f1.8-13-h3n2-90-top-seg-hits.contigs.fasta f1.trimmed.se.fastq \
> f1.8-13-h3n2-90-top-seg-hits.contigs.se.sam

for i in $(ls *sam | perl -pe 's/\.sam//g'); do samtools view -b -o $i.bam $i.sam && \
samtools sort -o $i.sort.bam $i.bam && samtools index $i.sort.bam; done

samtools merge f1.8-13-h3n2-90-top-seg-hits.contigs.merged.sort.bam \
f1.8-13-h3n2-90-top-seg-hits.contigs.sort.bam f1.8-13-h3n2-90-top-seg-hits.contigs.se.sort.bam

samtools index f1.8-13-h3n2-90-top-seg-hits.contigs.merged.sort.bam

There's a lot more noise in this H3N2-90 run than the other samples (which have noise that is limited to the termini).

### H1N1

In [None]:
cd /home/alan/projects/MinION-notebook/clinical-analysis/miseq/h1n1

java -jar /opt/bioinformatics-software/Trimmomatic-0.35/trimmomatic-0.35.jar PE -threads 30 h1n1.r1.fastq.gz \
h1n1.r2.fastq.gz a1.trimmed.r1.fastq a1.trimmed.se.r1.fastq a1.trimmed.r2.fastq a1.trimmed.se.r2.fastq \
ILLUMINACLIP:/opt/bioinformatics-software/Trimmomatic-0.35/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 \
SLIDINGWINDOW:4:15 MINLEN:25

# Input Read Pairs: 434829 Both Surviving: 83917 (19.30%) Forward Only Surviving: 328756 (75.61%) Reverse Only Surviving: 277 (0.06%) Dropped: 21879 (5.03%)

bwa index 6-4-h1n1-top-seg-hits.fasta
bwa mem -t 6 -x intractg 6-4-h1n1-top-seg-hits.fasta a1.trimmed.r1.fastq a1.trimmed.r2.fastq > a1.flu-6-4-h1n1.fludb.top-seg-hits.sam
samtools view -b -o a1.flu-6-4-h1n1.fludb.top-seg-hits.bam a1.flu-6-4-h1n1.fludb.top-seg-hits.sam
samtools sort -o a1.flu-6-4-h1n1.fludb.top-seg-hits.sort.bam a1.flu-6-4-h1n1.fludb.top-seg-hits.bam
samtools index a1.flu-6-4-h1n1.fludb.top-seg-hits.sort.bam

python ~/projects/MinION-Flu-Analysis/scripts/downsampleAssembler.py -r 6-4-h1n1-top-seg-hits.fasta -b a1.flu-6-4-h1n1.fludb.top-seg-hits.sort.bam

cat */contigs.fasta > a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.fasta
nucmer --maxmatch 6-4-h1n1-top-seg-hits.fasta a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.fasta

show-coords -clT out.delta | perl -pe 's/\[|\]//g' | perl -pe 's/\t/ | /g'

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | TAG R | TAG Q
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|-------|------
1 | 1777 | 1792 | 16 | 1777 | 1777 | 99.44 | 1777 | 1803 | 100.00 | 98.56 | HA | NODE_1_length_1803_cov_54.429_ID_5
5 | 1031 | 12 | 1038 | 1027 | 1027 | 99.42 | 1031 | 1048 | 99.61 | 98.00 | MP | NODE_1_length_1048_cov_55.8284_ID_5
1 | 1410 | 1447 | 38 | 1410 | 1410 | 99.79 | 1410 | 1478 | 100.00 | 95.40 | NA | NODE_1_length_1478_cov_57.8579_ID_5
1 | 1497 | 1529 | 33 | 1497 | 1497 | 99.87 | 1497 | 1582 | 100.00 | 94.63 | NP | NODE_1_length_1582_cov_57.0117_ID_5
1 | 863 | 873 | 11 | 863 | 863 | 99.88 | 863 | 908 | 100.00 | 95.04 | NS | NODE_1_length_908_cov_58.1498_ID_5
1 | 2233 | 2242 | 10 | 2233 | 2233 | 99.55 | 2233 | 2253 | 100.00 | 99.11 | PA | NODE_1_length_2253_cov_54.9699_ID_5
1 | 2327 | 28 | 2355 | 2327 | 2328 | 98.28 | 2327 | 2370 | 100.00 | 98.23 | PB1 | NODE_1_length_2370_cov_58.1881_ID_5
4 | 2341 | 2353 | 16 | 2338 | 2338 | 99.19 | 2341 | 2367 | 99.87 | 98.77 | PB2 | NODE_1_length_2367_cov_58.792_ID_5

In [None]:
bwa index a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.fasta
bwa mem -t 6 -x intractg a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.fasta a1.trimmed.r1.fastq a1.trimmed.r2.fastq > a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.sam
samtools view -b -o a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.bam a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.sam
samtools sort -o a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.sort.bam a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.bam
samtools index a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.sort.bam

cat a1.trimmed.se.r?.fastq > a1.trimmed.se.fastq
bwa mem -t 6 -x intractg a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.fasta a1.trimmed.se.fastq > a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.se.sam
samtools view -b -o a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.se.bam a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.se.sam
samtools sort -o a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.sort.se.bam a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.se.bam
samtools index a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.sort.se.bam

samtools merge a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.merged.sort.bam \
a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.sort.bam a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.sort.se.bam

samtools index a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.merged.sort.bam

### H3N2-39

In [None]:
trimmomatic PE -threads 6 a39.r1.fastq.gz a39.r2.fastq.gz a39.trimmed.r1.fastq a39.trimmed.se.r1.fastq \
a39.trimmed.r2.fastq a39.trimmed.se.r2.fastq ILLUMINACLIP:/usr/local/share/trimmomatic/adapters/TruSeq3-PE-2.fa:2:30:10 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25

# Input Read Pairs: 199626 Both Surviving: 196456 (98.41%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 3170 (1.59%)

for i in HA NA NP NS MP PA PB1 PB2; do python ~/projects/MinION-notebook/scripts/read-fludb-blastxml.py flu-4-5.fludb.xml \
| sort -nrk4 | grep -m1 $i; done | cut -f1 | grep --no-group-separator -A1 -F -f - \
~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta  > flu-4-5.fludb.top-seg-hits.fasta

# Input Read Pairs: 687459 Both Surviving: 394107 (57.33%) Forward Only Surviving: 266824 (38.81%) Reverse Only Surviving: 524 (0.08%) Dropped: 26004 (3.78%)

bwa index flu-4-5.fludb.top-seg-hits.fasta
bwa mem -t 6 -x intractg flu-4-5.fludb.top-seg-hits.fasta a39.trimmed.r1.fastq a39.trimmed.r2.fastq > a39.flu-4-5.fludb.top-seg-hits.sam
samtools view -b -o a39.flu-4-5.fludb.top-seg-hits.bam a39.flu-4-5.fludb.top-seg-hits.sam
samtools sort -o a39.flu-4-5.fludb.top-seg-hits.sort.bam a39.flu-4-5.fludb.top-seg-hits.bam
samtools index a39.flu-4-5.fludb.top-seg-hits.sort.bam

python ~/projects/MinION-Flu-Analysis/scripts/downsampleAssembler.py -r flu-4-5.fludb.top-seg-hits.fasta -b a39.flu-4-5.fludb.top-seg-hits.sort.bam

cat */contigs.fasta > a39.flu-4-5.fludb.top-seg-hits.contigs.fasta
nucmer --maxmatch flu-4-5.fludb.top-seg-hits.fasta a39.flu-4-5.fludb.top-seg-hits.contigs.fasta

show-coords -clT out.delta | perl -pe 's/\[|\]//g' | perl -pe 's/\t/ | /g'

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | TAG R | TAG Q
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|-------|------
1 | 1701 | 68 | 1768 | 1701 | 1701 | 99.76 | 1701 | 1812 | 100.00 | 93.87 | HA | NODE_1_length_1812_cov_53.0825_ID_5
1 | 1027 | 10 | 1036 | 1027 | 1027 | 99.61 | 1027 | 1064 | 100.00 | 96.52 | MP | NODE_1_length_1064_cov_47.7118_ID_5
1 | 1466 | 7 | 1472 | 1466 | 1466 | 99.45 | 1466 | 1581 | 100.00 | 92.73 | NA | NODE_1_length_1581_cov_50.3865_ID_5
715 | 813 | 1581 | 1483 | 99 | 99 | 98.99 | 1466 | 1581 | 6.75 | 6.26 | NA | NODE_1_length_1581_cov_50.3865_ID_5
1 | 1566 | 1583 | 18 | 1566 | 1566 | 99.55 | 1566 | 1591 | 100.00 | 98.43 | NP | NODE_1_length_1591_cov_55.653_ID_5
1 | 890 | 904 | 15 | 890 | 890 | 99.21 | 890 | 920 | 100.00 | 96.74 | NS | NODE_1_length_920_cov_57.1173_ID_5
1 | 2232 | 2249 | 17 | 2232 | 2233 | 99.55 | 2232 | 2255 | 100.00 | 99.02 | PA | NODE_1_length_2255_cov_54.7998_ID_58
218 | 344 | 253 | 127 | 127 | 127 | 100.00 | 2232 | 253 | 5.69 | 50.20 | PA | NODE_2_length_253_cov_4.4127_ID_61
1362 | 1488 | 127 | 1 | 127 | 127 | 100.00 | 2232 | 253 | 5.69 | 50.20 | PA | NODE_2_length_253_cov_4.4127_ID_61
1 | 2274 | 45 | 2318 | 2274 | 2274 | 99.43 | 2274 | 2378 | 100.00 | 95.63 | PB1 | NODE_1_length_2378_cov_47.6579_ID_5
1 | 2280 | 2369 | 90 | 2280 | 2280 | 99.82 | 2280 | 2412 | 100.00 | 94.53 | PB2 | NODE_1_length_2412_cov_52.5873_ID_5

In [None]:
bwa index a39.flu-4-5.fludb.top-seg-hits.contigs.fasta
bwa mem -t 6 -x intractg a39.flu-4-5.fludb.top-seg-hits.contigs.fasta a39.trimmed.r1.fastq a39.trimmed.r2.fastq > a39.flu-4-5.fludb.top-seg-hits.contigs.sam
samtools view -b -o a39.flu-4-5.fludb.top-seg-hits.contigs.bam a39.flu-4-5.fludb.top-seg-hits.contigs.sam
samtools sort -o a39.flu-4-5.fludb.top-seg-hits.contigs.sort.bam a39.flu-4-5.fludb.top-seg-hits.contigs.bam
samtools index a39.flu-4-5.fludb.top-seg-hits.contigs.sort.bam

## Identify Variants from Each Assembly

Now that we have an "accurate" representation of the real sequences, we can identify the unique variants for each strain in order to assess the MinION's ability to do intrahost SNV calling and phasing, and to determine the required coverage and make platform comparisons.

Based on the alignment of all strains, the FluB segment is easily identifiable and won't be considered when looking for specific variants.

![title](docs/miseq-contigs-PB1-alignment.png)

### H3N2-39 vs H3N2-90

In [None]:
nucmer a39.flu-4-5.fludb.top-seg-hits.contigs.fasta f1.8-13-h3n2-90-top-seg-hits.contigs.fasta

# sets minimum alignment length to 500 (weeding out the crap)
delta-filter -l 500 out.delta > out.filter

show-coords -clT out.filter | perl -pe 's/\[|\]//g' | perl -pe 's/\t/ | /g'

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | TAG R | TAG Q
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|-------|------
33 | 1811 | 1 | 1779 | 1779 | 1779 | 99.83 | 1812 | 1779 | 98.18 | 100.00 | HA-h3n2-39 | HA-h3n2-90
4 | 1048 | 1 | 1045 | 1045 | 1045 | 99.90 | 1064 | 1045 | 98.21 | 100.00 | MP-h3n2-39 | MP-h3n2-90
1 | 1481 | 1 | 1481 | 1481 | 1481 | 99.32 | 1581 | 1486 | 93.67 | 99.66 | NA-h3n2-39 | NA-h3n2-90
9 | 1589 | 78 | 1658 | 1581 | 1581 | 99.87 | 1591 | 1674 | 99.37 | 94.44 | NP-h3n2-39 | NP-h3n2-90
6 | 917 | 912 | 1 | 912 | 912 | 99.78 | 920 | 1012 | 99.13 | 90.12 | NS-h3n2-39 | NS-h3n2-90
4 | 2255 | 2254 | 2 | 2252 | 2253 | 99.69 | 2255 | 2254 | 99.87 | 99.96 | PA-h3n2-39 | PA-h3n2-90
13 | 2375 | 1 | 2363 | 2363 | 2363 | 99.66 | 2378 | 2363 | 99.37 | 100.00 | PB1-h3n2-39 | PB1-h3n2-90
46 | 2402 | 2357 | 1 | 2357 | 2357 | 99.87 | 2412 | 2357 | 97.72 | 100.00 | PB2-h3n2-39 | PB2-h3n2-90

In [None]:
show-snps -T out.filter

P1 | SUB | SUB | P2 | BUFF | DIST | R | Q | FRM R | FRM Q | TAG R | TAG Q
---|-----|-----|----|------|------|---|---|-------|-------|-------|------
739 | G | A | 707 | 456 | 707 | 0 | 0 | 1 | 1 | HA-h3n2-39 | HA-h3n2-90
1195 | A | G | 1163 | 386 | 617 | 0 | 0 | 1 | 1 | HA-h3n2-39 | HA-h3n2-90
1581 | G | A | 1549 | 231 | 231 | 0 | 0 | 1 | 1 | HA-h3n2-39 | HA-h3n2-90
877 | A | G | 874 | 172 | 172 | 0 | 0 | 1 | 1 | MP-h3n2-39 | MP-h3n2-90
10 | A | G | 10 | 10 | 10 | 0 | 0 | 1 | 1 | NA-h3n2-39 | NA-h3n2-90
74 | A | T | 74 | 2 | 74 | 0 | 0 | 1 | 1 | NA-h3n2-39 | NA-h3n2-90
76 | C | T | 76 | 2 | 76 | 0 | 0 | 1 | 1 | NA-h3n2-39 | NA-h3n2-90
340 | G | A | 340 | 6 | 340 | 0 | 0 | 1 | 1 | NA-h3n2-39 | NA-h3n2-90
346 | G | A | 346 | 6 | 346 | 0 | 0 | 1 | 1 | NA-h3n2-39 | NA-h3n2-90
430 | A | G | 430 | 84 | 430 | 0 | 0 | 1 | 1 | NA-h3n2-39 | NA-h3n2-90
676 | G | A | 676 | 12 | 676 | 0 | 0 | 1 | 1 | NA-h3n2-39 | NA-h3n2-90
688 | T | C | 688 | 12 | 688 | 0 | 0 | 1 | 1 | NA-h3n2-39 | NA-h3n2-90
1102 | C | T | 1102 | 3 | 385 | 0 | 0 | 1 | 1 | NA-h3n2-39 | NA-h3n2-90
1105 | A | G | 1105 | 3 | 382 | 0 | 0 | 1 | 1 | NA-h3n2-39 | NA-h3n2-90
491 | C | T | 560 | 88 | 491 | 0 | 0 | 1 | 1 | NP-h3n2-39 | NP-h3n2-90
579 | T | C | 648 | 88 | 579 | 0 | 0 | 1 | 1 | NP-h3n2-39 | NP-h3n2-90
248 | T | C | 670 | 243 | 248 | 0 | 0 | 1 | -1 | NS-h3n2-39 | NS-h3n2-90
657 | T | A | 261 | 261 | 261 | 0 | 0 | 1 | -1 | NS-h3n2-39 | NS-h3n2-90
37 | . | T | 2220 | 15 | 35 | 0 | 0 | 1 | -1 | PA-h3n2-39 | PA-h3n2-90
52 | A | G | 2205 | 15 | 50 | 0 | 0 | 1 | -1 | PA-h3n2-39 | PA-h3n2-90
594 | T | C | 1663 | 108 | 592 | 0 | 0 | 1 | -1 | PA-h3n2-39 | PA-h3n2-90
702 | T | C | 1555 | 108 | 700 | 0 | 0 | 1 | -1 | PA-h3n2-39 | PA-h3n2-90
1875 | G | A | 382 | 293 | 381 | 0 | 0 | 1 | -1 | PA-h3n2-39 | PA-h3n2-90
2168 | T | C | 89 | 78 | 88 | 0 | 0 | 1 | -1 | PA-h3n2-39 | PA-h3n2-90
2246 | T | C | 11 | 10 | 10 | 0 | 0 | 1 | -1 | PA-h3n2-39 | PA-h3n2-90
776 | T | C | 764 | 24 | 764 | 0 | 0 | 1 | 1 | PB1-h3n2-39 | PB1-h3n2-90
800 | A | G | 788 | 24 | 788 | 0 | 0 | 1 | 1 | PB1-h3n2-39 | PB1-h3n2-90
863 | C | T | 851 | 63 | 851 | 0 | 0 | 1 | 1 | PB1-h3n2-39 | PB1-h3n2-90
1093 | G | A | 1081 | 133 | 1081 | 0 | 0 | 1 | 1 | PB1-h3n2-39 | PB1-h3n2-90
1226 | C | T | 1214 | 133 | 1150 | 0 | 0 | 1 | 1 | PB1-h3n2-39 | PB1-h3n2-90
1586 | A | G | 1574 | 285 | 790 | 0 | 0 | 1 | 1 | PB1-h3n2-39 | PB1-h3n2-90
1871 | T | C | 1859 | 138 | 505 | 0 | 0 | 1 | 1 | PB1-h3n2-39 | PB1-h3n2-90
2009 | A | G | 1997 | 138 | 367 | 0 | 0 | 1 | 1 | PB1-h3n2-39 | PB1-h3n2-90
1236 | T | G | 1167 | 256 | 1167 | 0 | 0 | 1 | -1 | PB2-h3n2-39 | PB2-h3n2-90
1492 | A | C | 911 | 101 | 911 | 0 | 0 | 1 | -1 | PB2-h3n2-39 | PB2-h3n2-90
1593 | C | T | 810 | 101 | 810 | 0 | 0 | 1 | -1 | PB2-h3n2-39 | PB2-h3n2-90
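
For a quick per-segment tally of the `show-snps` rows above, something like this could work. It's only a sketch: it assumes the pipe-delimited rows exactly as rendered here (reference tag in the second-to-last column), and `count_snps_per_segment` is a made-up helper name, not one of the notebook scripts. Indel rows (a `.` in a SUB column) are counted like any other row.

```python
from collections import Counter

def count_snps_per_segment(rows):
    """Count show-snps rows per reference tag (second-to-last column)."""
    counts = Counter()
    for row in rows:
        fields = [f.strip() for f in row.split("|")]
        # Skip header, separator, and blank lines: data rows have 12 fields
        # and start with a numeric reference position.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        counts[fields[-2]] += 1
    return counts

# A few rows copied from the table above
rows = [
    "P1 | SUB | SUB | P2 | BUFF | DIST | R | Q | FRM | TAGS",
    "739 | G | A | 707 | 456 | 707 | 0 | 0 | 1 | 1 | HA-h3n2-39 | HA-h3n2-90",
    "1195 | A | G | 1163 | 386 | 617 | 0 | 0 | 1 | 1 | HA-h3n2-39 | HA-h3n2-90",
    "877 | A | G | 874 | 172 | 172 | 0 | 0 | 1 | 1 | MP-h3n2-39 | MP-h3n2-90",
    "37 | . | T | 2220 | 15 | 35 | 0 | 0 | 1 | -1 | PA-h3n2-39 | PA-h3n2-90",
]
snp_counts = count_snps_per_segment(rows)
for tag, n in sorted(snp_counts.items()):
    print(tag, n)
```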

### H3N2-70 vs H3N2-90

In [None]:
nucmer c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.fasta f1.8-13-h3n2-90-top-seg-hits.contigs.fasta
delta-filter -l 500 out.delta > out.filter
show-coords -clT out.filter | perl -pe 's/\[|\]//g' | perl -pe 's/\t/ | /g'

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | REF TAG | QRY TAG
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|---------|--------
101 | 1877 | 1777 | 1 | 1777 | 1777 | 96.17 | 1877 | 1779 | 94.67 | 99.89 | HA-h3n2-70 | HA-h3n2-90
101 | 1142 | 1042 | 1 | 1042 | 1042 | 97.79 | 1142 | 1045 | 91.24 | 99.71 | MP-h3n2-70 | MP-h3n2-90
1 | 1483 | 1 | 1482 | 1483 | 1482 | 97.03 | 1483 | 1486 | 100.00 | 99.73 | NA-h3n2-70 | NA-h3n2-90
128 | 1707 | 78 | 1658 | 1580 | 1581 | 98.48 | 1711 | 1674 | 92.34 | 94.44 | NP-h3n2-70 | NP-h3n2-90
1 | 909 | 4 | 912 | 909 | 909 | 98.24 | 1032 | 1012 | 88.08 | 89.82 | NS-h3n2-70 | NS-h3n2-90
1 | 2172 | 2 | 2173 | 2172 | 2172 | 96.87 | 2172 | 2254 | 100.00 | 96.36 | PA-h3n2-70 | PA-h3n2-90
34 | 2394 | 3 | 2363 | 2361 | 2361 | 97.29 | 2396 | 2363 | 98.54 | 99.92 | PB1-h3n2-70 | PB1-h3n2-90
19 | 2374 | 1 | 2356 | 2356 | 2356 | 97.75 | 2500 | 2357 | 94.24 | 99.96 | PB2-h3n2-70 | PB2-h3n2-90

### H3N2-70 vs H3N2-39

In [None]:
nucmer c3.flu-6-4-h3n2-70.fludb.top-seg-hits.contigs.fasta a39.flu-4-5.fludb.top-seg-hits.contigs.fasta
delta-filter -l 500 out.delta > out.filter
show-coords -clT out.filter | perl -pe 's/\[|\]//g' | perl -pe 's/\t/ | /g'

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | REF TAG | QRY TAG
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|---------|--------
101 | 1877 | 1809 | 33 | 1777 | 1777 | 96.12 | 1877 | 1812 | 94.67 | 98.07 | HA-h3n2-70 | HA-h3n2-39
101 | 1142 | 1045 | 4 | 1042 | 1042 | 97.89 | 1142 | 1064 | 91.24 | 97.93 | MP-h3n2-70 | MP-h3n2-39
1 | 1482 | 1 | 1481 | 1482 | 1481 | 97.10 | 1483 | 1581 | 99.93 | 93.67 | NA-h3n2-70 | NA-h3n2-39
128 | 1709 | 9 | 1591 | 1582 | 1583 | 98.36 | 1711 | 1591 | 92.46 | 99.50 | NP-h3n2-70 | NP-h3n2-39
1 | 909 | 914 | 6 | 909 | 909 | 98.02 | 1032 | 920 | 88.08 | 98.80 | NS-h3n2-70 | NS-h3n2-39
1 | 2172 | 2255 | 84 | 2172 | 2172 | 96.92 | 2172 | 2255 | 100.00 | 96.32 | PA-h3n2-70 | PA-h3n2-39
34 | 2396 | 15 | 2377 | 2363 | 2363 | 97.04 | 2396 | 2378 | 98.62 | 99.37 | PB1-h3n2-70 | PB1-h3n2-39
19 | 2374 | 2402 | 47 | 2356 | 2356 | 97.62 | 2500 | 2412 | 94.24 | 97.68 | PB2-h3n2-70 | PB2-h3n2-39

### Assembly Comparison Summary

H3N2-39 and H3N2-90 are very similar (every segment > 99% identical) so these will be the true test. The others aren't so similar and should easily segregate out.

* H3N2-90 vs H3N2-70
    * HA 96.17
    * NA 97.03
* H3N2-70 vs H3N2-39
    * HA 96.12
    * NA 97.10
* H3N2-90 vs H3N2-39
    * HA 99.83
    * NA 99.32
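
The HA/NA numbers in this summary were copied by hand from the `% IDY` column of the `show-coords` tables. A minimal sketch of pulling them out automatically is below; it assumes the pipe-delimited rows as rendered in this notebook and tags of the form `HA-h3n2-70`, and `identity_by_segment` is a hypothetical helper name.

```python
def identity_by_segment(rows):
    """Map segment name (e.g. 'HA') to % IDY from pipe-delimited show-coords rows."""
    result = {}
    for row in rows:
        fields = [f.strip() for f in row.split("|")]
        # Data rows have 13 fields and start with a numeric coordinate.
        if len(fields) < 13 or not fields[0].isdigit():
            continue
        segment = fields[-2].split("-")[0]  # 'HA-h3n2-70' -> 'HA'
        result[segment] = float(fields[6])  # 7th column is % IDY
    return result

# HA and NA rows from the H3N2-70 vs H3N2-90 table
rows_70_vs_90 = [
    "101 | 1877 | 1777 | 1 | 1777 | 1777 | 96.17 | 1877 | 1779 | 94.67 | 99.89 | HA-h3n2-70 | HA-h3n2-90",
    "1 | 1483 | 1 | 1482 | 1483 | 1482 | 97.03 | 1483 | 1486 | 100.00 | 99.73 | NA-h3n2-70 | NA-h3n2-90",
]
idy = identity_by_segment(rows_70_vs_90)
for seg in ("HA", "NA"):
    print(f"{seg} {idy[seg]:.2f}")
```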

## Map MinION to MiSeq Assembly

Mapping to the MiSeq assembly should let us see the exact discrepancies between the platforms without the bias that comes from using an unrelated reference sequence, presumably.

As the header says, I'm going to map the MinION data to the MiSeq assemblies and compare their consensus sequences. They should be the most similar sequences we've had thus far.

### FluB

In [None]:
cd /home/alan/projects/MinION-notebook/clinical-analysis/consensus-comparison/flub/minion-to-miseq-assembly

ln -s ~/projects/MinION-notebook/miseq-assembly/flub/flub-11-9-blast-hits-assembly/fluB.11-9-top-seg-hits.full-segs.contigs.fasta
ln -s ~/projects/MinION-notebook/raw-data/MinION/11-9/flu-11-9.2d.fastq

bwa index fluB.11-9-top-seg-hits.full-segs.contigs.fasta
bwa mem -t 30 -x ont2d fluB.11-9-top-seg-hits.full-segs.contigs.fasta flu-11-9.2d.fastq > flu-11-9.miseq-assembly.sam

samtools view -b -o flu-11-9.miseq-assembly.bam flu-11-9.miseq-assembly.sam
samtools sort -o flu-11-9.miseq-assembly.sort.bam flu-11-9.miseq-assembly.bam 
samtools index flu-11-9.miseq-assembly.sort.bam

rm flu-11-9.miseq-assembly.bam flu-11-9.miseq-assembly.sam

for i in HA NA NS NP MP PA PB1 PB2; do grep -A1 --no-group-separator $i fluB.11-9-top-seg-hits.full-segs.contigs.fasta \
> $i.fasta && python ~/projects/MinION-notebook/scripts/bamfile_consensus_generate.py flu-11-9.miseq-assembly.sort.bam $i \
>> $i.fasta && muscle -quiet -in $i.fasta -out $i.aln && python ~/projects/MinION-notebook/scripts/pairwise-pid.py \
$i.aln fasta; done

RefName | RefLen | #Variants | #Deletions
--------|--------|-----------|-----------
HA | 1905 | 0 | 9
NA | 1576 | 0 | 4
NS | 1157 | 1 | 25
NP | 1861 | 2 | 1
MP | 1206 | 0 | 3
PA | 2327 | 0 | 0
PB1 | 2486 | 1 | 67
PB2 | 2432 | 33 | 22
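
The variant and deletion counts above come from `pairwise-pid.py`, which isn't reproduced in this notebook. The sketch below is NOT that script, just my assumption of what this kind of count looks like for two gapped, equal-length sequences taken from an alignment: a "variant" is a column where both sequences have a base and they differ, and a "deletion" is a gap in the query (here, the MinION consensus) opposite a reference base. The function name and the toy sequences are made up.

```python
def compare_aligned(ref, query):
    """Compare two gapped, equal-length sequences column by column."""
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have the same length")
    variants = deletions = insertions = matches = 0
    for r, q in zip(ref.upper(), query.upper()):
        if r == "-" and q == "-":
            continue            # gap in both sequences, nothing to compare
        if q == "-":
            deletions += 1      # base missing from the query
        elif r == "-":
            insertions += 1     # extra base in the query
        elif r == q:
            matches += 1
        else:
            variants += 1
    columns = matches + variants + deletions + insertions
    pid = 100.0 * matches / columns if columns else 0.0
    return {"variants": variants, "deletions": deletions,
            "insertions": insertions, "pid": pid}

# Toy alignment: one mismatch (T->A) and one deleted base
stats = compare_aligned("ACGTACGTAC", "ACGAAC-TAC")
print(stats)
```

Whether gaps count against percent identity is a design choice; here they do, so the toy pair comes out at 8 matches over 10 columns.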

This STILL didn't work, but at least we have solid proof that the MinION can't do this. It may be fixable by exploring the method/parameter space for different aligners, but who knows how long that'll take.

The NS variant is at the end of the alignment where there's extremely low coverage.

The NP variants are at the beginning of extremely poly(G) regions. 
![title](docs/FluB-NP-miseq-assembly-vs-minion-consensus.png)

The PB1 variant has good coverage but is at the end of the alignment. 
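
Since the NP variants sit next to poly(G) stretches, it might be worth listing the homopolymer runs in each assembled segment and checking whether other variants land near them too. A rough sketch follows; the `min_len=5` cutoff is an arbitrary assumption, `homopolymer_runs` is a made-up name, and the sequence is a toy string rather than the real NP contig.

```python
import re

def homopolymer_runs(seq, min_len=5):
    """Return (start, end, base, length) for single-base runs of at least min_len.

    Coordinates are 0-based with an exclusive end.
    """
    runs = []
    for m in re.finditer(r"A+|C+|G+|T+", seq.upper()):
        length = m.end() - m.start()
        if length >= min_len:
            runs.append((m.start(), m.end(), m.group()[0], length))
    return runs

toy = "ACGTGGGGGGATTTTTCA"
runs = homopolymer_runs(toy)
for start, end, base, length in runs:
    print(f"poly({base}) x{length} at {start}-{end}")
```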


### H3N2-39

In [None]:
cd /home/alan/projects/MinION-notebook/clinical-analysis/consensus-comparison/h3n2-39/minion-to-miseq-assembly

ln -s /home/alan/projects/MinION-notebook/miseq-assembly/h3n2-39/a39.flu-4-5.fludb.top-seg-hits.contigs.fasta
ln -s ~/projects/MinION-notebook/raw-data/MinION/4-5/flu-4-5.2d.fastq

bwa index a39.flu-4-5.fludb.top-seg-hits.contigs.fasta
bwa mem -t 30 -x ont2d a39.flu-4-5.fludb.top-seg-hits.contigs.fasta flu-4-5.2d.fastq > flu-4-5.miseq-assembly.sam

samtools view -b -o flu-4-5.miseq-assembly.bam flu-4-5.miseq-assembly.sam
samtools sort -o flu-4-5.miseq-assembly.sort.bam flu-4-5.miseq-assembly.bam 
samtools index flu-4-5.miseq-assembly.sort.bam

rm flu-4-5.miseq-assembly.bam flu-4-5.miseq-assembly.sam

for i in HA NA NS NP MP PA PB1 PB2; do grep -A1 --no-group-separator $i a39.flu-4-5.fludb.top-seg-hits.contigs.fasta \
> $i.fasta && python ~/projects/MinION-notebook/scripts/bamfile_consensus_generate.py flu-4-5.miseq-assembly.sort.bam $i \
>> $i.fasta && muscle -quiet -in $i.fasta -out $i.aln && python ~/projects/MinION-notebook/scripts/pairwise-pid.py \
$i.aln fasta; done

Seg | Len | Reads Mapped 
----|-----|-------------
HA | 1812 | 21
MP | 1064 | 65
NA | 1581 | 11
NP | 1591 | 12
NS | 920 | 8
PA | 2255 | 9
PB1 | 2378 | 10
PB2 | 2412 | 3
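
A table like the one above can be built from `samtools idxstats`, which prints one tab-separated line per reference: name, length, mapped reads, unmapped reads. Here's a minimal sketch of turning that into the pipe table. The example text is hand-built from two rows of the table (the contig names and the zeros in the last column are placeholders, not real output), and `parse_idxstats` is a hypothetical helper.

```python
def parse_idxstats(text):
    """Parse `samtools idxstats` text into (name, length, mapped) tuples."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue  # final line holds reads with no reference assigned
        records.append((name, int(length), int(mapped)))
    return records

# Placeholder input in idxstats layout (see note above)
example = "HA-h3n2-39\t1812\t21\t0\nMP-h3n2-39\t1064\t65\t0\n*\t0\t0\t0\n"
records = parse_idxstats(example)

print("Seg | Len | Reads Mapped")
print("----|-----|-------------")
for name, length, mapped in records:
    print(f"{name} | {length} | {mapped}")
```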

RefName | RefLen | #Variants | #Deletions
--------|--------|-----------|-----------
HA | 1812 | 2 | 26
NA | 1581 | 1 | 106
NS | 920 | 2 | 17
NP | 1591 | 0 | 15
MP | 1064 | 0 | 9
PA | 2255 | 9 | 36
PB1 | 2378 | 23 | 46
PB2 | 2412 | 87 | 85

### H3N2-70

In [None]:
cd /home/alan/projects/MinION-notebook/clinical-analysis/consensus-comparison/h3n2-70/minion-to-miseq-assembly

ln -s /home/alan/projects/MinION-notebook/miseq-assembly/h3n2-39/a39.flu-4-5.fludb.top-seg-hits.contigs.fasta
ln -s ~/projects/MinION-notebook/raw-data/MinION/4-5/flu-4-5.2d.fastq

bwa index a39.flu-4-5.fludb.top-seg-hits.contigs.fasta
bwa mem -t 30 -x ont2d a39.flu-4-5.fludb.top-seg-hits.contigs.fasta flu-4-5.2d.fastq > flu-4-5.miseq-assembly.sam

samtools view -b -o flu-4-5.miseq-assembly.bam flu-4-5.miseq-assembly.sam
samtools sort -o flu-4-5.miseq-assembly.sort.bam flu-4-5.miseq-assembly.bam 
samtools index flu-4-5.miseq-assembly.sort.bam

rm flu-4-5.miseq-assembly.bam flu-4-5.miseq-assembly.sam

for i in HA NA NS NP MP PA PB1 PB2; do grep -A1 --no-group-separator $i a39.flu-4-5.fludb.top-seg-hits.contigs.fasta \
> $i.fasta && python ~/projects/MinION-notebook/scripts/bamfile_consensus_generate.py flu-4-5.miseq-assembly.sort.bam $i \
>> $i.fasta && muscle -quiet -in $i.fasta -out $i.aln && python ~/projects/MinION-notebook/scripts/pairwise-pid.py \
$i.aln fasta; done

Seg | Len | Reads Mapped 
----|-----|-------------
HA | 1812 | 21
MP | 1064 | 65
NA | 1581 | 11
NP | 1591 | 12
NS | 920 | 8
PA | 2255 | 9
PB1 | 2378 | 10
PB2 | 2412 | 3

RefName | RefLen | #Variants | #Deletions
--------|--------|-----------|-----------
HA | 1812 | 2 | 26
NA | 1581 | 1 | 106
NS | 920 | 2 | 17
NP | 1591 | 0 | 15
MP | 1064 | 0 | 9
PA | 2255 | 9 | 36
PB1 | 2378 | 23 | 46
PB2 | 2412 | 87 | 85

### H1N1

In [None]:
cd /home/alan/projects/MinION-notebook/clinical-analysis/consensus-comparison/h1n1/minion-to-miseq-assembly

bwa index a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.fasta
bwa mem -t 30 -x ont2d a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.fasta flu-6-4.2d.fastq >  flu-6-4.miseq-assembly.sam

samtools view -b -o  flu-6-4.miseq-assembly.bam  flu-6-4.miseq-assembly.sam
samtools sort -o  flu-6-4.miseq-assembly.sort.bam  flu-6-4.miseq-assembly.bam 
samtools index  flu-6-4.miseq-assembly.sort.bam

rm  flu-6-4.miseq-assembly.bam  flu-6-4.miseq-assembly.sam

# interrogate pairwise alignments
for i in HA NA NS NP MP PA PB1 PB2; do grep -A1 --no-group-separator $i a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.fasta \
> $i.fasta && python ~/projects/MinION-notebook/scripts/bamfile_consensus_generate.py  flu-6-4.miseq-assembly.sort.bam $i \
>> $i.fasta && muscle -quiet -in $i.fasta -out $i.aln && python ~/projects/MinION-notebook/scripts/pairwise-pid.py \
$i.aln fasta; done

# look at it with nucmer
for i in HA NA NS NP MP PA PB1 PB2; do python ~/projects/MinION-notebook/scripts/bamfile_consensus_generate.py \
flu-6-4.miseq-assembly.sort.bam $i; done > flu-6-4.miseq-assembly.consensus.fasta

nucmer a1.flu-6-4-h1n1.fludb.top-seg-hits.contigs.fasta flu-6-4.miseq-assembly.consensus.fasta
show-coords -lcT <(delta-filter -q -l 500 out.delta ) | perl -pe 's/\t/ | /g'

Reads Mapped to MiSeq Assembly:

Seg | Len | Reads Mapped 
----|-----|-------------
HA | 1803 | 608
MP | 1048 | 3890
NA | 1478 | 367
NP | 1582 | 805
NS | 908 | 2623
PA | 2253 | 401
PB1 | 2370 | 316
PB2 | 2367 | 313

MinION Reads Mapped to Blast Hits:

Seg | Len | Reads Mapped 
----|-----|-------------
MP | 1031 | 3887
PA | 2233 | 393
PB2 | 2341 | 308
PB1 | 2327 | 297
NA | 1410 | 245
NS | 863 | 2607
NP | 1497 | 805
HA | 1777 | 531

MiSeq Assembly vs MinION to MiSeq Assembly Consensus (pairwise alignments)

RefName | RefLen | #Variants | #Deletions
--------|--------|-----------|-----------
HA | 1803 | 5 | 23
NA | 1478 | 2 | 6
NS | 908 | 1 | 5
NP | 1582 | 6 | 2
MP | 1048 | 76 | 5
PA | 2253 | 279 | 25
PB1 | 2370 | 116 | 21
PB2 | 2367 | 200 | 18

MiSeq Assembly vs MinION to MiSeq Assembly Consensus (nucmer)

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | REF TAG | QRY TAG
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|---------|--------
1 | 1803 | 1 | 1803 | 1803 | 1803 | 98.56 | 1803 | 1803 | 100.00 | 100.00 | HA | HA
1 | 1048 | 1 | 1048 | 1048 | 1048 | 92.27 | 1048 | 1048 | 100.00 | 100.00 | MP | MP
1 | 1478 | 1 | 1478 | 1478 | 1478 | 99.46 | 1478 | 1478 | 100.00 | 100.00 | NA | NA
1 | 1579 | 1 | 1579 | 1579 | 1579 | 99.56 | 1582 | 1582 | 99.81 | 99.81 | NP | NP
1 | 908 | 1 | 908 | 908 | 908 | 99.34 | 908 | 908 | 100.00 | 100.00 | NS | NS
1 | 2253 | 1 | 2253 | 2253 | 2253 | 86.55 | 2253 | 2253 | 100.00 | 100.00 | PA | PA
1 | 2370 | 1 | 2370 | 2370 | 2370 | 94.30 | 2370 | 2370 | 100.00 | 100.00 | PB1 | PB1
1 | 2367 | 1 | 2367 | 2367 | 2367 | 90.83 | 2367 | 2367 | 100.00 | 100.00 | PB2 | PB2

MinION Blast Consensus vs MiSeq Blast Consensus:

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | REF TAG | QRY TAG
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|---------|--------
1 | 1777 | 1 | 1777 | 1777 | 1777 | 98.48 | 1777 | 1777 | 100.00 | 100.00 | HA | HA
3 | 1029 | 5 | 1031 | 1027 | 1027 | 92.11 | 1029 | 1031 | 99.81 | 99.61 | MP | MP
1 | 1410 | 1 | 1410 | 1410 | 1410 | 99.57 | 1410 | 1410 | 100.00 | 100.00 | NA | NA
1 | 1497 | 1 | 1497 | 1497 | 1497 | 99.40 | 1497 | 1497 | 100.00 | 100.00 | NP | NP
1 | 863 | 1 | 863 | 863 | 863 | 99.54 | 863 | 863 | 100.00 | 100.00 | NS | NS
1 | 2233 | 1 | 2233 | 2233 | 2233 | 86.39 | 2233 | 2233 | 100.00 | 100.00 | PA | PA
1 | 2327 | 1 | 2327 | 2327 | 2327 | 93.51 | 2327 | 2327 | 100.00 | 100.00 | PB1 | PB1
4 | 2341 | 4 | 2341 | 2338 | 2338 | 90.12 | 2341 | 2341 | 99.87 | 99.87 | PB2 | PB2

So there are more reads mapped to the MiSeq assembly (9,323 vs 9,073 for the blast hits) but just as many variants... Percent identity improves slightly for most segments (PB1 gains the most, 93.51 to 94.30) but it's nothing to write home about.
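
To put a number on "slight improvement", the per-segment `% IDY` values from the two nucmer tables above can be subtracted directly. This is just arithmetic on the values already in the tables; the dictionary names are mine.

```python
# % IDY copied from the two nucmer tables above
assembly_idy = {"HA": 98.56, "MP": 92.27, "NA": 99.46, "NP": 99.56,
                "NS": 99.34, "PA": 86.55, "PB1": 94.30, "PB2": 90.83}
blast_idy = {"HA": 98.48, "MP": 92.11, "NA": 99.57, "NP": 99.40,
             "NS": 99.54, "PA": 86.39, "PB1": 93.51, "PB2": 90.12}

# Positive delta = mapping to the MiSeq assembly gave a closer consensus
delta = {seg: round(assembly_idy[seg] - blast_idy[seg], 2) for seg in assembly_idy}
mean_delta = round(sum(delta.values()) / len(delta), 2)
worse = sorted(seg for seg, d in delta.items() if d < 0)

for seg, d in delta.items():
    print(f"{seg}: {d:+.2f}")
print(f"mean: {mean_delta:+.2f}, worse with assembly: {worse}")
```

The mean gain works out to about 0.22 percentage points, and NA and NS actually get slightly worse.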

## Lab Strains

Since the MiSeq assemblies and the MinION consensus sequences (generated with the MiSeq assemblies) don't agree, a way to validate the methods is to test against the lab strains, since we have their reference sequences. Basically this is to see how similar the MiSeq assembly is to the reference sequence, and then to test the MinION consensus (from both the blast analysis and the MiSeq assembly mapping) against the reference.

In [None]:
for i in $(ls *r1.fastq.gz | perl -pe 's/\.r1.+//g'); do trimmomatic PE -threads 6 $i.r1.fastq.gz $i.r2.fastq.gz \
$i.trimmed.r1.fastq $i.trimmed.se.r1.fastq $i.trimmed.r2.fastq $i.trimmed.se.r2.fastq \
ILLUMINACLIP:/usr/local/share/trimmomatic/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25; done

for i in *fasta; do bwa index $i; done

for i in h1n1 h3n2 flub; do bwa mem -t 6 -x intractg $i-ref.fasta $i-standard.trimmed.r1.fastq $i-standard.trimmed.r2.fastq \
> $i.sam && samtools view -b -o $i.bam $i.sam && samtools sort -o $i.sort.bam $i.bam && samtools index $i.sort.bam \
&& rm -f $i.sam $i.bam; done

for i in *bam; do python ~/projects/MinION-Flu-Analysis/scripts/downsampleAssembler.py \
-r quads-study-reference-sequence.fasta -b $i; done

cat H1N1-assembly.fasta H3N2-assembly.fasta FluB-assembly.fasta > all-miseq-assemblies.fasta

bwa mem -t 6 -x intractg quads-study-reference-sequence.fasta flu-mix-standard.trimmed.r1.fastq \
flu-mix-standard.trimmed.r2.fastq > flu-mix.sam && samtools view -b -o flu-mix.bam flu-mix.sam && \
samtools sort -o flu-mix.sort.bam flu-mix.bam && samtools index flu-mix.sort.bam && rm -f flu-mix.sam flu-mix.bam

# map MinION data to assemblies and generate consensus sequences

bwa mem -t 6 -x ont2d all-miseq-assemblies.fasta flu-4-20.fastq > flu-4-20.sam && samtools view -b -o flu-4-20.bam flu-4-20.sam \
&& samtools sort -o flu-4-20.sort.bam flu-4-20.bam && samtools index flu-4-20.sort.bam

for i in $(samtools idxstats flu-4-20.sort.bam | cut -f1 | grep -v '*'); do \
python ~/projects/MinION-Flu-Analysis/scripts/bamfile_consensus_generate.py flu-4-20.sort.bam $i; done > flu-4-20.consensus.fasta

nucmer all-miseq-assemblies.fasta flu-4-20.consensus.fasta
show-coords -lcdT <( delta-filter -q -l 500 out.delta) | perl -pe 's/\t/ | /g'

# original MinION consensus sequences - mapped to the reference sequences
scp zoidberg.bio.nyu.edu:/home/alan/projects/MinION-notebook/quads-analysis/minion/all-minion-4-20.consensus.fasta ./
nucmer all-miseq-assemblies.fasta all-minion-4-20.consensus.fasta
show-coords -lcdT <( delta-filter -q -l 500 out.delta) | perl -pe 's/\t/ | /g'

MinION to MiSeq Assemblies Consensus vs MiSeq Assemblies

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | REF TAG | QRY TAG
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|---------|--------
1 | 1034 | 1 | 1034 | 1034 | 1034 | 99.52 | 1034 | 1034 | 100.00 | 100.00 | NODE_1_length_1034_cov_55.4653_ID_36 | NODE_1_length_1034_cov_55.4653_ID_36
1 | 1054 | 1 | 1054 | 1054 | 1054 | 99.15 | 1054 | 1054 | 100.00 | 100.00 | NODE_1_length_1054_cov_52.2665_ID_5 | NODE_1_length_1054_cov_52.2665_ID_5
1 | 909 | 1 | 909 | 909 | 909 | 99.01 | 1090 | 937 | 83.39 | 97.01 | NODE_1_length_1090_cov_44.5161_ID_5 | NODE_1_length_1090_cov_44.5161_ID_5
75 | 1145 | 8 | 1078 | 1071 | 1071 | 99.16 | 1145 | 1078 | 93.54 | 99.35 | NODE_1_length_1145_cov_60.6287_ID_5 | NODE_1_length_1145_cov_60.6287_ID_5
1 | 1217 | 1 | 1217 | 1217 | 1217 | 98.69 | 1217 | 1217 | 100.00 | 100.00 | NODE_1_length_1217_cov_56.7908_ID_5 | NODE_1_length_1217_cov_56.7908_ID_5
121 | 1269 | 1 | 1149 | 1149 | 1149 | 99.39 | 1269 | 1149 | 90.54 | 100.00 | NODE_1_length_1269_cov_51.0648_ID_5 | NODE_1_length_1269_cov_51.0648_ID_5
9 | 1520 | 1 | 1512 | 1512 | 1512 | 99.34 | 1520 | 1512 | 99.47 | 100.00 | NODE_1_length_1520_cov_51.8306_ID_5 | NODE_1_length_1520_cov_51.8306_ID_5
1 | 1581 | 1 | 1581 | 1581 | 1581 | 99.87 | 1581 | 1581 | 100.00 | 100.00 | NODE_1_length_1581_cov_57.4464_ID_5 | NODE_1_length_1581_cov_57.4464_ID_5
4 | 1583 | 4 | 1583 | 1580 | 1580 | 99.49 | 1583 | 1583 | 99.81 | 99.81 | NODE_1_length_1583_cov_51.6312_ID_5 | NODE_1_length_1583_cov_51.6312_ID_5
29 | 1676 | 26 | 1673 | 1648 | 1648 | 99.21 | 1676 | 1673 | 98.33 | 98.51 | NODE_1_length_1676_cov_52.7844_ID_5 | NODE_1_length_1676_cov_52.7844_ID_5
62 | 1883 | 1 | 1822 | 1822 | 1822 | 99.34 | 1883 | 1822 | 96.76 | 100.00 | NODE_1_length_1883_cov_55.9431_ID_5 | NODE_1_length_1883_cov_55.9431_ID_5
75 | 1893 | 1 | 1819 | 1819 | 1819 | 98.30 | 1893 | 1819 | 96.09 | 100.00 | NODE_1_length_1893_cov_51.9638_ID_5 | NODE_1_length_1893_cov_51.9638_ID_5
1 | 1906 | 1 | 1906 | 1906 | 1906 | 98.85 | 1906 | 1906 | 100.00 | 100.00 | NODE_1_length_1906_cov_50.5003_ID_5 | NODE_1_length_1906_cov_50.5003_ID_5
1 | 1988 | 1 | 1988 | 1988 | 1988 | 98.99 | 1988 | 1988 | 100.00 | 100.00 | NODE_1_length_1988_cov_51.8023_ID_3 | NODE_1_length_1988_cov_51.8023_ID_3
25 | 2307 | 1 | 2283 | 2283 | 2283 | 99.26 | 2307 | 2283 | 98.96 | 100.00 | NODE_1_length_2307_cov_53.339_ID_5 | NODE_1_length_2307_cov_53.339_ID_5
1 | 2268 | 1 | 2268 | 2268 | 2268 | 98.85 | 2330 | 2283 | 97.34 | 99.34 | NODE_1_length_2330_cov_43.5202_ID_5 | NODE_1_length_2330_cov_43.5202_ID_5
1 | 2356 | 1 | 2356 | 2356 | 2356 | 99.02 | 2357 | 2356 | 99.96 | 100.00 | NODE_1_length_2357_cov_53.6274_ID_5 | NODE_1_length_2357_cov_53.6274_ID_5
1 | 2360 | 1 | 2360 | 2360 | 2360 | 98.69 | 2361 | 2360 | 99.96 | 100.00 | NODE_1_length_2361_cov_58.5027_ID_5 | NODE_1_length_2361_cov_58.5027_ID_5
1 | 2391 | 1 | 2391 | 2391 | 2391 | 98.95 | 2391 | 2391 | 100.00 | 100.00 | NODE_1_length_2391_cov_52.3843_ID_72 | NODE_1_length_2391_cov_52.3843_ID_72
1 | 2348 | 1 | 2348 | 2348 | 2348 | 98.98 | 2395 | 2348 | 98.04 | 100.00 | NODE_1_length_2395_cov_53.6261_ID_5 | NODE_1_length_2395_cov_53.6261_ID_5
1 | 2412 | 1 | 2412 | 2412 | 2412 | 98.47 | 2412 | 2412 | 100.00 | 100.00 | NODE_1_length_2412_cov_51.4709_ID_5 | NODE_1_length_2412_cov_51.4709_ID_5
1 | 2383 | 1 | 2383 | 2383 | 2383 | 99.03 | 2453 | 2386 | 97.15 | 99.87 | NODE_1_length_2453_cov_52.9497_ID_5 | NODE_1_length_2453_cov_52.9497_ID_5
1 | 2392 | 1 | 2392 | 2392 | 2392 | 99.29 | 2461 | 2392 | 97.20 | 100.00 | NODE_1_length_2461_cov_50.4027_ID_61 | NODE_1_length_2461_cov_50.4027_ID_61
1 | 890 | 1 | 890 | 890 | 890 | 99.66 | 893 | 893 | 99.66 | 99.66 | NODE_1_length_893_cov_69.4517_ID_5 | NODE_1_length_893_cov_69.4517_ID_5

MinION to Reference Sequences Consensus vs MiSeq Assemblies

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | REF TAG | QRY TAG
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|---------|--------
110 | 1867 | 1758 | 1 | 1758 | 1758 | 98.58 | 1906 | 1758 | 92.24 | 100.00 | NODE_1_length_1906_cov_50.5003_ID_5 | FluB-HA
107 | 1182 | 1076 | 1 | 1076 | 1076 | 98.42 | 1217 | 1076 | 88.41 | 100.00 | NODE_1_length_1217_cov_56.7908_ID_5 | FluB-MP
215 | 1615 | 1401 | 1 | 1401 | 1401 | 99.64 | 1676 | 1401 | 83.59 | 100.00 | NODE_1_length_1676_cov_52.7844_ID_5 | FluB-NA
1 | 1662 | 1662 | 1 | 1662 | 1662 | 98.98 | 1988 | 1683 | 83.60 | 98.75 | NODE_1_length_1988_cov_51.8023_ID_3 | FluB-NP
197 | 1223 | 1 | 1027 | 1027 | 1027 | 99.61 | 1269 | 1027 | 80.93 | 100.00 | NODE_1_length_1269_cov_51.0648_ID_5 | FluB-NS
106 | 2286 | 2181 | 1 | 2181 | 2181 | 99.40 | 2395 | 2181 | 91.06 | 100.00 | NODE_1_length_2395_cov_53.6261_ID_5 | FluB-PA
28 | 2286 | 1 | 2259 | 2259 | 2259 | 98.94 | 2391 | 2259 | 94.48 | 100.00 | NODE_1_length_2391_cov_52.3843_ID_72 | FluB-PB1
30 | 2342 | 1 | 2313 | 2313 | 2313 | 98.49 | 2412 | 2313 | 95.90 | 100.00 | NODE_1_length_2412_cov_51.4709_ID_5 | FluB-PB2
153 | 1853 | 1701 | 1 | 1701 | 1701 | 98.35 | 1893 | 1701 | 89.86 | 100.00 | NODE_1_length_1893_cov_51.9638_ID_5 | H1N1-HA
37 | 1018 | 1 | 982 | 982 | 982 | 99.29 | 1054 | 982 | 93.17 | 100.00 | NODE_1_length_1054_cov_52.2665_ID_5 | H1N1-MP
1 | 1008 | 1008 | 1 | 1008 | 1008 | 99.50 | 1034 | 1410 | 97.49 | 71.49 | NODE_1_length_1034_cov_55.4653_ID_36 | H1N1-NA
33 | 1529 | 1497 | 1 | 1497 | 1497 | 99.87 | 1581 | 1497 | 94.69 | 100.00 | NODE_1_length_1581_cov_57.4464_ID_5 | H1N1-NP
37 | 859 | 838 | 1 | 823 | 838 | 97.85 | 893 | 838 | 92.16 | 100.00 | NODE_1_length_893_cov_69.4517_ID_5 | H1N1-NS
122 | 2272 | 2151 | 1 | 2151 | 2151 | 99.54 | 2307 | 2151 | 93.24 | 100.00 | NODE_1_length_2307_cov_53.339_ID_5 | H1N1-PA
54 | 2327 | 2274 | 1 | 2274 | 2274 | 99.16 | 2357 | 2274 | 96.48 | 100.00 | NODE_1_length_2357_cov_53.6274_ID_5 | H1N1-PB1
40 | 2319 | 1 | 2280 | 2280 | 2280 | 99.74 | 2461 | 2280 | 92.65 | 100.00 | NODE_1_length_2461_cov_50.4027_ID_61 | H1N1-PB2
141 | 1841 | 1701 | 1 | 1701 | 1701 | 99.24 | 1883 | 1701 | 90.33 | 100.00 | NODE_1_length_1883_cov_55.9431_ID_5 | H3N2-HA
132 | 890 | 1 | 759 | 759 | 759 | 99.87 | 1145 | 759 | 66.29 | 100.00 | NODE_1_length_1145_cov_60.6287_ID_5 | H3N2-MP
80 | 1486 | 1407 | 1 | 1407 | 1407 | 99.43 | 1520 | 1407 | 92.57 | 100.00 | NODE_1_length_1520_cov_51.8306_ID_5 | H3N2-NA
54 | 1550 | 1 | 1497 | 1497 | 1497 | 99.53 | 1583 | 1497 | 94.57 | 100.00 | NODE_1_length_1583_cov_51.6312_ID_5 | H3N2-NP
181 | 873 | 693 | 1 | 693 | 693 | 98.85 | 1090 | 693 | 63.58 | 100.00 | NODE_1_length_1090_cov_44.5161_ID_5 | H3N2-NS
493 | 2181 | 1 | 1689 | 1689 | 1689 | 99.11 | 2330 | 1689 | 72.49 | 100.00 | NODE_1_length_2330_cov_43.5202_ID_5 | H3N2-PA
32 | 2305 | 1 | 2274 | 2274 | 2274 | 98.77 | 2361 | 2274 | 96.32 | 100.00 | NODE_1_length_2361_cov_58.5027_ID_5 | H3N2-PB1
39 | 2318 | 1 | 2280 | 2280 | 2280 | 99.12 | 2453 | 2280 | 92.95 | 100.00 | NODE_1_length_2453_cov_52.9497_ID_5 | H3N2-PB2

MiSeq Assemblies vs Reference Sequences

S1 | E1 | S2 | E2 | LEN 1 | LEN 2 | % IDY | LEN R | LEN Q | COV R | COV Q | REF TAG | QRY TAG
---|----|----|----|-------|-------|-------|-------|-------|-------|-------|---------|--------
1 | 1008 | 1008 | 1 | 1008 | 1008 | 100.00 | 1410 | 1034 | 71.49 | 97.49 | H1N1-NA | NODE_1_length_1034_cov_55.4653_ID_36
1 | 982 | 37 | 1018 | 982 | 982 | 100.00 | 982 | 1054 | 100.00 | 93.17 | H1N1-MP | NODE_1_length_1054_cov_52.2665_ID_5
1 | 693 | 873 | 181 | 693 | 693 | 100.00 | 693 | 1090 | 100.00 | 63.58 | H3N2-NS | NODE_1_length_1090_cov_44.5161_ID_5
1 | 759 | 132 | 890 | 759 | 759 | 100.00 | 759 | 1145 | 100.00 | 66.29 | H3N2-MP | NODE_1_length_1145_cov_60.6287_ID_5
1 | 1076 | 1182 | 107 | 1076 | 1076 | 100.00 | 1076 | 1217 | 100.00 | 88.41 | FluB-MP | NODE_1_length_1217_cov_56.7908_ID_5
1 | 1027 | 197 | 1223 | 1027 | 1027 | 100.00 | 1027 | 1269 | 100.00 | 80.93 | FluB-NS | NODE_1_length_1269_cov_51.0648_ID_5
1 | 1407 | 1486 | 80 | 1407 | 1407 | 100.00 | 1407 | 1520 | 100.00 | 92.57 | H3N2-NA | NODE_1_length_1520_cov_51.8306_ID_5
1 | 1497 | 1529 | 33 | 1497 | 1497 | 100.00 | 1497 | 1581 | 100.00 | 94.69 | H1N1-NP | NODE_1_length_1581_cov_57.4464_ID_5
1 | 1497 | 54 | 1550 | 1497 | 1497 | 99.93 | 1497 | 1583 | 100.00 | 94.57 | H3N2-NP | NODE_1_length_1583_cov_51.6312_ID_5
1 | 1401 | 1615 | 215 | 1401 | 1401 | 100.00 | 1401 | 1676 | 100.00 | 83.59 | FluB-NA | NODE_1_length_1676_cov_52.7844_ID_5
1 | 1701 | 1841 | 141 | 1701 | 1701 | 99.94 | 1701 | 1883 | 100.00 | 90.33 | H3N2-HA | NODE_1_length_1883_cov_55.9431_ID_5
1 | 1701 | 1853 | 153 | 1701 | 1701 | 99.88 | 1701 | 1893 | 100.00 | 89.86 | H1N1-HA | NODE_1_length_1893_cov_51.9638_ID_5
1 | 1758 | 1867 | 110 | 1758 | 1758 | 100.00 | 1758 | 1906 | 100.00 | 92.24 | FluB-HA | NODE_1_length_1906_cov_50.5003_ID_5
1 | 1662 | 1662 | 1 | 1662 | 1662 | 99.94 | 1683 | 1988 | 98.75 | 83.60 | FluB-NP | NODE_1_length_1988_cov_51.8023_ID_3
1 | 2151 | 2272 | 122 | 2151 | 2151 | 100.00 | 2151 | 2307 | 100.00 | 93.24 | H1N1-PA | NODE_1_length_2307_cov_53.339_ID_5
1 | 1689 | 493 | 2181 | 1689 | 1689 | 100.00 | 1689 | 2330 | 100.00 | 72.49 | H3N2-PA | NODE_1_length_2330_cov_43.5202_ID_5
1 | 2274 | 2327 | 54 | 2274 | 2274 | 100.00 | 2274 | 2357 | 100.00 | 96.48 | H1N1-PB1 | NODE_1_length_2357_cov_53.6274_ID_5
1 | 2274 | 32 | 2305 | 2274 | 2274 | 99.91 | 2274 | 2361 | 100.00 | 96.32 | H3N2-PB1 | NODE_1_length_2361_cov_58.5027_ID_5
1 | 2259 | 28 | 2286 | 2259 | 2259 | 100.00 | 2259 | 2391 | 100.00 | 94.48 | FluB-PB1 | NODE_1_length_2391_cov_52.3843_ID_72
1 | 2181 | 2286 | 106 | 2181 | 2181 | 100.00 | 2181 | 2395 | 100.00 | 91.06 | FluB-PA | NODE_1_length_2395_cov_53.6261_ID_5
1 | 2313 | 30 | 2342 | 2313 | 2313 | 100.00 | 2313 | 2412 | 100.00 | 95.90 | FluB-PB2 | NODE_1_length_2412_cov_51.4709_ID_5
1 | 2280 | 39 | 2318 | 2280 | 2280 | 100.00 | 2280 | 2453 | 100.00 | 92.95 | H3N2-PB2 | NODE_1_length_2453_cov_52.9497_ID_5
1 | 2280 | 40 | 2319 | 2280 | 2280 | 100.00 | 2280 | 2461 | 100.00 | 92.65 | H1N1-PB2 | NODE_1_length_2461_cov_50.4027_ID_61
1 | 838 | 859 | 37 | 838 | 823 | 98.21 | 838 | 893 | 100.00 | 92.16 | H1N1-NS | NODE_1_length_893_cov_69.4517_ID_5

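Based on these results the assemblies look essentially correct: over the aligned regions, 18 of the 24 segments are 100% identical to their reference, and all but H1N1-NS (98.21%) are at least 99.88% identical, so there's nothing to worry about there. However, the MinION data are just too noisy to get an accurate consensus sequence from these. Error correction may work but I don't know if we can afford to go down that rabbit hole.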
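
As a sanity check on the assembly, the `% IDY` column of the "MiSeq Assemblies vs Reference Sequences" table can be tallied. The values below are copied from that table; the variable names are mine, and identity here only covers the aligned region of each segment.

```python
# % IDY per reference segment, from the MiSeq Assemblies vs Reference Sequences table
miseq_vs_ref_idy = {
    "H1N1-NA": 100.00, "H1N1-MP": 100.00, "H3N2-NS": 100.00, "H3N2-MP": 100.00,
    "FluB-MP": 100.00, "FluB-NS": 100.00, "H3N2-NA": 100.00, "H1N1-NP": 100.00,
    "H3N2-NP": 99.93, "FluB-NA": 100.00, "H3N2-HA": 99.94, "H1N1-HA": 99.88,
    "FluB-HA": 100.00, "FluB-NP": 99.94, "H1N1-PA": 100.00, "H3N2-PA": 100.00,
    "H1N1-PB1": 100.00, "H3N2-PB1": 99.91, "FluB-PB1": 100.00, "FluB-PA": 100.00,
    "FluB-PB2": 100.00, "H3N2-PB2": 100.00, "H1N1-PB2": 100.00, "H1N1-NS": 98.21,
}

# Segments that are not a perfect match over the aligned region
imperfect = {seg: idy for seg, idy in miseq_vs_ref_idy.items() if idy < 100.0}
worst = min(miseq_vs_ref_idy, key=miseq_vs_ref_idy.get)

print(f"{len(miseq_vs_ref_idy) - len(imperfect)} of {len(miseq_vs_ref_idy)} segments at 100%")
for seg, idy in sorted(imperfect.items(), key=lambda kv: kv[1]):
    print(f"{seg}: {idy:.2f}")
```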

One thing I could do, just to see the efficacy of the method, is to blast the MinION lab sequences and run the whole alignment pipeline on those, though I don't see the point since the clinical results are as bad as they are, even when aligning to the MiSeq assemblies!