## MinION Blast Results Comparison

The purpose of this analysis is to see whether different subtypes mixed in the same pool are contaminating the results of the Flu blast queries. That is, does an H1N1 MinION read yield any other subtype as a blast hit? This addresses the unlikely possibility that similarities between the sequenced subtypes produce hits to strains that are not actually present in the sample, and thus erroneous consensus sequences.

To do this, the MinION reads will first be segregated by alignment to the MiSeq consensus sequences. A blast analysis will then be run on a subtype-specific subset of reads (maybe even segment-specific?) to identify possible overlap between the subtypes. Finally, consensus sequences will be generated from those blast results and compared to the MiSeq consensus sequences (as well as to other blast results from an entire flowcell's worth of data) to interrogate the discrepancies.

In [None]:
cd /home/alan/projects/MinION-notebook/clinical-analysis/blast-result-comparison

# using data from the consensus comparison analysis to segregate the reads prior to blasting.
samtools view -b -o flu-11-9.2d.mapped-only.bam -F 4 flu-11-9.2d.sort.bam
samtools fasta flu-11-9.2d.mapped-only.bam > flu-11-9.2d.mapped-only.fasta

blastn -db ~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta -query flu-11-9.2d.mapped-only.fasta -out flu-11-9.2d.mapped-only.blastn.xml -num_threads 40 -evalue 0.00005 -outfmt 5 -culling_limit 2 -max_target_seqs 1
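
The XML written by `-outfmt 5` can be tallied with the standard library alone. The sketch below is not the `read-fludb-blastxml.py` script used in this notebook; it only shows one way to pull the first hit per query out of BLAST XML. The `subtype|segment` hit definitions in the demo are made up, since the real header format of the FluDB FASTA is not shown here.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def top_hits(xml_source):
    """Yield (query_name, hit_definition) for the first hit of each query."""
    # iterparse streams the file, so large BLAST outputs do not sit in memory.
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query = elem.findtext("Iteration_query-def")
        hit = elem.find("Iteration_hits/Hit")  # None when the query had no hit
        if hit is not None:
            yield query, hit.findtext("Hit_def")
        elem.clear()  # free the finished iteration


def tally(xml_source, label=lambda hit_def: hit_def):
    """Count top hits, grouped by whatever `label` extracts from Hit_def."""
    return Counter(label(hit_def) for _, hit_def in top_hits(xml_source))


# Tiny hand-written example (hypothetical hit definitions).
DEMO_XML = b"""<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-def>read1</Iteration_query-def>
<Iteration_hits><Hit><Hit_def>FluB|HA|strainX</Hit_def></Hit>
<Hit><Hit_def>H1N1|HA|strainY</Hit_def></Hit></Iteration_hits></Iteration>
<Iteration><Iteration_query-def>read2</Iteration_query-def>
<Iteration_hits><Hit><Hit_def>FluB|HA|strainZ</Hit_def></Hit></Iteration_hits></Iteration>
<Iteration><Iteration_query-def>read3</Iteration_query-def>
<Iteration_hits></Iteration_hits></Iteration>
</BlastOutput_iterations></BlastOutput>"""

demo_counts = tally(io.BytesIO(DEMO_XML), label=lambda d: tuple(d.split("|")[:2]))
print(demo_counts)
```

Passing a file path instead of a `BytesIO` object works the same way, since `iterparse` accepts either.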

In [18]:
cd /home/alan/projects/MinION-notebook/clinical-analysis/blast-result-comparison

python ~/projects/MinION-notebook/scripts/read-fludb-blastxml.py flu-11-9.2d.mapped-only.blastn.xml | cut -f2,3 | sort | uniq -c

    352 FluB	HA
     83 FluB	MP
    257 FluB	NA
     24 FluB	NP
    179 FluB	NS
    169 FluB	PA
     57 FluB	PB1
    232 FluB	PB2
      1 H1N1	NA
      1 H1N1	NS
     22 H1N1	PB1


There are 24 hits to H1N1 among reads that were classified as FluB based on the alignment (24 of 1377 hits, about 1.7%). All but two are to H1N1 PB1; NA and NS are the exceptions. This is not nearly enough to cause a discrepancy in the consensus generation. There are 218 reads that mapped to the PB1 segment, so 24 shouldn't blip on the radar. Based on the blast results from the consensus comparison, this could have some effect on the PB1 region, but that would require more than two reads hitting the same sequence, which isn't the case here. I think we can claim that this does not have an effect on the consensus generation for FluB.
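
As a quick check on the numbers in the table above, the off-subtype share can be computed directly from the `uniq -c` output. This is only a sketch: the counts are copied from the output above, and the function name is made up for illustration.

```python
# Counts copied from the `uniq -c` output above: count, subtype, segment.
COUNTS = """\
352 FluB HA
83 FluB MP
257 FluB NA
24 FluB NP
179 FluB NS
169 FluB PA
57 FluB PB1
232 FluB PB2
1 H1N1 NA
1 H1N1 NS
22 H1N1 PB1
"""


def off_subtype(counts_text, expected="FluB"):
    """Return (off_subtype_hits, total_hits, fraction) for a uniq -c style table."""
    total = off = 0
    for line in counts_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        n, subtype = int(fields[0]), fields[1]
        total += n
        if subtype != expected:
            off += n
    return off, total, off / total


off, total, fraction = off_subtype(COUNTS)
print(f"{off} of {total} hits are not FluB ({fraction:.2%})")
```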

With the same methodology, I'm going to check whether reads assigned to one segment hit other segments, i.e. whether there is any cross contamination between segments.

In [10]:
seg='NP'

cd /home/alan/projects/MinION-notebook/clinical-analysis/blast-result-comparison/segment-cross-contamination

samtools view flu-11-9.2d.mapped-only.bam | grep $seg | cut -f1,10 | perl -pe 's/^(.+)\t(.+)/>$1\n$2/g' > flu-11-9.2d.mapped-only.$seg.fasta

blastn -db ~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta -query flu-11-9.2d.mapped-only.$seg.fasta -out flu-11-9.2d.mapped-only.$seg.blastn.xml -outfmt 5 -evalue 0.00005 -culling_limit 2 -max_target_seqs 1 -num_threads 40

python ~/projects/MinION-notebook/scripts/read-fludb-blastxml.py flu-11-9.2d.mapped-only.$seg.blastn.xml | cut -f3 | sort | uniq -c

     24 NP
      1 PA
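
The `grep $seg` step matches the segment label anywhere on the SAM line, so a label that also occurs in a read name or quality string would be picked up as well. Below is a minimal sketch of restricting the match to the reference-name column (SAM field 3). It assumes the reference names contain the segment label as a delimited token, e.g. a hypothetical `FluB_NP`; the actual reference names in this BAM are not shown here.

```python
import re


def sam_to_fasta(sam_lines, segment):
    """Yield FASTA records for reads whose reference name carries `segment`."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, rname, seq = fields[0], fields[2], fields[9]
        # Compare against whole tokens so "PB1" never matches inside another word.
        if segment in re.split(r"[^A-Za-z0-9]+", rname):
            yield f">{qname}\n{seq}\n"


# Hypothetical records: read2 maps to PA but has "NP" in its quality string.
DEMO_SAM = [
    "@SQ\tSN:FluB_NP\tLN:1800\n",
    "read1\t0\tFluB_NP\t1\t60\t4M\t*\t0\t0\tACGT\t!!!!\n",
    "read2\t0\tFluB_PA\t1\t60\t4M\t*\t0\t0\tTTGA\t!NP!\n",
]

demo_fasta = "".join(sam_to_fasta(DEMO_SAM, "NP"))
print(demo_fasta, end="")
```

In practice the lines would come from `samtools view flu-11-9.2d.mapped-only.bam` piped into the script and read from `sys.stdin`.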


In [11]:
seg='PB2'

cd /home/alan/projects/MinION-notebook/clinical-analysis/blast-result-comparison/segment-cross-contamination

samtools view flu-11-9.2d.mapped-only.bam | grep $seg | cut -f1,10 | perl -pe 's/^(.+)\t(.+)/>$1\n$2/g' > flu-11-9.2d.mapped-only.$seg.fasta

blastn -db ~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta -query flu-11-9.2d.mapped-only.$seg.fasta -out flu-11-9.2d.mapped-only.$seg.blastn.xml -outfmt 5 -evalue 0.00005 -culling_limit 2 -max_target_seqs 1 -num_threads 40

python ~/projects/MinION-notebook/scripts/read-fludb-blastxml.py flu-11-9.2d.mapped-only.$seg.blastn.xml | cut -f3 | sort | uniq -c

      2 HA
      1 NS
     23 PB1
    242 PB2


In [12]:
seg='PB1'

cd /home/alan/projects/MinION-notebook/clinical-analysis/blast-result-comparison/segment-cross-contamination

samtools view flu-11-9.2d.mapped-only.bam | grep $seg | cut -f1,10 | perl -pe 's/^(.+)\t(.+)/>$1\n$2/g' > flu-11-9.2d.mapped-only.$seg.fasta

blastn -db ~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta -query flu-11-9.2d.mapped-only.$seg.fasta -out flu-11-9.2d.mapped-only.$seg.blastn.xml -outfmt 5 -evalue 0.00005 -culling_limit 2 -max_target_seqs 1 -num_threads 40

python ~/projects/MinION-notebook/scripts/read-fludb-blastxml.py flu-11-9.2d.mapped-only.$seg.blastn.xml | cut -f3 | sort | uniq -c

     57 PB1


In [13]:
seg='NS'

cd /home/alan/projects/MinION-notebook/clinical-analysis/blast-result-comparison/segment-cross-contamination

samtools view flu-11-9.2d.mapped-only.bam | grep $seg | cut -f1,10 | perl -pe 's/^(.+)\t(.+)/>$1\n$2/g' > flu-11-9.2d.mapped-only.$seg.fasta

blastn -db ~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta -query flu-11-9.2d.mapped-only.$seg.fasta -out flu-11-9.2d.mapped-only.$seg.blastn.xml -outfmt 5 -evalue 0.00005 -culling_limit 2 -max_target_seqs 1 -num_threads 40

python ~/projects/MinION-notebook/scripts/read-fludb-blastxml.py flu-11-9.2d.mapped-only.$seg.blastn.xml | cut -f3 | sort | uniq -c

    182 NS


In [14]:
seg='NA'

cd /home/alan/projects/MinION-notebook/clinical-analysis/blast-result-comparison/segment-cross-contamination

samtools view flu-11-9.2d.mapped-only.bam | grep $seg | cut -f1,10 | perl -pe 's/^(.+)\t(.+)/>$1\n$2/g' > flu-11-9.2d.mapped-only.$seg.fasta

blastn -db ~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta -query flu-11-9.2d.mapped-only.$seg.fasta -out flu-11-9.2d.mapped-only.$seg.blastn.xml -outfmt 5 -evalue 0.00005 -culling_limit 2 -max_target_seqs 1 -num_threads 40

python ~/projects/MinION-notebook/scripts/read-fludb-blastxml.py flu-11-9.2d.mapped-only.$seg.blastn.xml | cut -f3 | sort | uniq -c 

    259 NA


In [15]:
seg='PA'

cd /home/alan/projects/MinION-notebook/clinical-analysis/blast-result-comparison/segment-cross-contamination

samtools view flu-11-9.2d.mapped-only.bam | grep $seg | cut -f1,10 | perl -pe 's/^(.+)\t(.+)/>$1\n$2/g' > flu-11-9.2d.mapped-only.$seg.fasta

blastn -db ~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta -query flu-11-9.2d.mapped-only.$seg.fasta -out flu-11-9.2d.mapped-only.$seg.blastn.xml -outfmt 5 -evalue 0.00005 -culling_limit 2 -max_target_seqs 1 -num_threads 40

python ~/projects/MinION-notebook/scripts/read-fludb-blastxml.py flu-11-9.2d.mapped-only.$seg.blastn.xml | cut -f3 | sort | uniq -c

      1 HA
      1 NP
    170 PA


In [16]:
seg='HA'

cd /home/alan/projects/MinION-notebook/clinical-analysis/blast-result-comparison/segment-cross-contamination

samtools view flu-11-9.2d.mapped-only.bam | grep $seg | cut -f1,10 | perl -pe 's/^(.+)\t(.+)/>$1\n$2/g' > flu-11-9.2d.mapped-only.$seg.fasta

blastn -db ~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta -query flu-11-9.2d.mapped-only.$seg.fasta -out flu-11-9.2d.mapped-only.$seg.blastn.xml -outfmt 5 -evalue 0.00005 -culling_limit 2 -max_target_seqs 1 -num_threads 40

python ~/projects/MinION-notebook/scripts/read-fludb-blastxml.py flu-11-9.2d.mapped-only.$seg.blastn.xml | cut -f3 | sort | uniq -c

    352 HA
      3 MP
      1 NA
      1 PA


In [17]:
seg='MP'

cd /home/alan/projects/MinION-notebook/clinical-analysis/blast-result-comparison/segment-cross-contamination

samtools view flu-11-9.2d.mapped-only.bam | grep $seg | cut -f1,10 | perl -pe 's/^(.+)\t(.+)/>$1\n$2/g' > flu-11-9.2d.mapped-only.$seg.fasta

blastn -db ~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta -query flu-11-9.2d.mapped-only.$seg.fasta -out flu-11-9.2d.mapped-only.$seg.blastn.xml -outfmt 5 -evalue 0.00005 -culling_limit 2 -max_target_seqs 1 -num_threads 40

python ~/projects/MinION-notebook/scripts/read-fludb-blastxml.py flu-11-9.2d.mapped-only.$seg.blastn.xml | cut -f3 | sort | uniq -c

      2 HA
     85 MP


Very little cross contamination. PB2 has the most (26 of 268 hits, 23 of them to PB1), but still nothing close to affecting the consensus.
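
To put the seven outputs above side by side, the off-segment share per query segment can be computed as follows. The counts are copied from the cell outputs above; the helper name is hypothetical.

```python
# Per query segment: counts of blast hits by hit segment (from the cells above).
HITS = {
    "NP": {"NP": 24, "PA": 1},
    "PB2": {"HA": 2, "NS": 1, "PB1": 23, "PB2": 242},
    "PB1": {"PB1": 57},
    "NS": {"NS": 182},
    "NA": {"NA": 259},
    "PA": {"HA": 1, "NP": 1, "PA": 170},
    "HA": {"HA": 352, "MP": 3, "NA": 1, "PA": 1},
    "MP": {"HA": 2, "MP": 85},
}


def off_segment_summary(hits):
    """Return (segment, off_hits, total_hits, fraction) rows, worst first."""
    rows = []
    for segment, counts in hits.items():
        total = sum(counts.values())
        off = total - counts.get(segment, 0)
        rows.append((segment, off, total, off / total))
    return sorted(rows, key=lambda row: row[3], reverse=True)


summary = off_segment_summary(HITS)
for segment, off, total, fraction in summary:
    print(f"{segment}\t{off}/{total}\t{fraction:.1%}")
```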

## Blast MiSeq

To identify whether or not the samples we have are mixed infections, I'm going to blast the MiSeq reads to see if that sheds light on anything.

In [None]:
cd /home/alan/projects/MinION-notebook/miseq-blast/flub

python ~/projects/MinION-notebook/MinION-Flu-Analysis/scripts/interleave-fastq.py f3.r1.fastq f3.r2.fastq > f3.interleaved.fastq

perl ~/projects/MinION-notebook/MinION-Flu-Analysis/scripts/1line-fasta.pl f3.interleaved.fastq > f3.interleaved.fasta

blastn \
-db ~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta \
-query f3.interleaved.fasta \
-out f3.interleaved.fludb.blastn.xml \
-outfmt 5 \
-num_threads 25 \
-culling_limit 2 \
-max_target_seqs 1 \
-evalue 0.00005

# python ~/projects/MinION-notebook/MinION-Flu-Analysis/scripts/read-fludb-blastxml.py \
# f3.interleaved.fludb.blastn.xml \
# > f3.interleaved.fludb.blastn.seg-counts.tsv
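
For reference, the interleave-then-convert step amounts to the following. This is a sketch only, not the `interleave-fastq.py` or `1line-fasta.pl` scripts used above; it assumes plain four-line FASTQ records and that both files list the pairs in the same order.

```python
import io
from itertools import islice


def fastq_records(handle):
    """Yield (name, sequence) from a four-line-per-record FASTQ handle."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        yield record[0].rstrip()[1:], record[1].rstrip()


def interleave_to_fasta(r1, r2, out):
    """Write R1/R2 pairs alternately as one-line FASTA records."""
    for (name1, seq1), (name2, seq2) in zip(fastq_records(r1), fastq_records(r2)):
        out.write(f">{name1}\n{seq1}\n>{name2}\n{seq2}\n")


# Made-up reads for illustration.
demo_r1 = io.StringIO("@pair1/1\nACGT\n+\nIIII\n@pair2/1\nGGCC\n+\nIIII\n")
demo_r2 = io.StringIO("@pair1/2\nTTAA\n+\nIIII\n@pair2/2\nCCGG\n+\nIIII\n")
demo_out = io.StringIO()
interleave_to_fasta(demo_r1, demo_r2, demo_out)
print(demo_out.getvalue(), end="")
```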

The blast found only ~0.15% FluA hits in the sample, too few to be the source of the noise seen in the FluB consensus generation, so the noise must come from something else.

There were 1314651 total hits, 1942 being FluA.

These results made me concerned about potential sequence overlap between FluA and FluB. However, if overlap were the cause, I would expect one part of one or two segments to be similar rather than hits across all segments. These results suggest that this is one of:
A) an actual mixed infection
B) cross contamination on the prep side
C) erroneous demuxing (most probable, IMO). I will check this out.

FluA hits per segment (count, then segment number):
182 1
165 2
231 3
329 4
247 5
151 6
412 7
225 8
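
The arithmetic behind these figures can be checked in a few lines. This sketch assumes the eight rows above are FluA hit counts followed by a segment number, which fits the fact that they sum to the 1942 FluA hits reported earlier.

```python
TOTAL_HITS = 1314651  # all blast hits for the sample
FLUA_HITS = 1942      # hits to FluA

# Assumed layout of the rows above: segment number -> FluA hit count.
SEGMENT_COUNTS = {1: 182, 2: 165, 3: 231, 4: 329, 5: 247, 6: 151, 7: 412, 8: 225}

flua_fraction = FLUA_HITS / TOTAL_HITS
segment_total = sum(SEGMENT_COUNTS.values())
shares = {seg: n / FLUA_HITS for seg, n in SEGMENT_COUNTS.items()}

print(f"FluA share of all hits: {flua_fraction:.3%}")
print(f"Per-segment counts sum to {segment_total}")
for seg, share in shares.items():
    print(f"segment {seg}: {share:.1%} of FluA hits")
```

Every segment is represented, with shares ranging from roughly 8% to 21%, which is the "all segments" pattern discussed above rather than a hit concentrated in one region.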

Also, there may be adaptor readthrough...

To assess whether this is due to demuxing I'm going to blast the lab strain of FluB and see if there's any FluA in there.

Also, after talking to Adam, we thought it would be a good idea to test the plasmid to see if there's any cross contamination from other segments. This will presumably have to be run against the nt database to avoid overfitting? We'll have to see. I'll run it against the FluDB for now.

In [None]:
cd /home/alan/projects/MinION-notebook/miseq-blast/lab-strain-clinical-flub

#from khmer 
interleave-reads.py flub-standard.r1.fastq.gz flub-standard.r2.fastq.gz > flub-standard.fastq

#from fastx
fastq_to_fasta -n -i flub-standard.fastq -o flub-standard.fasta

blastn \
-db ~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta \
-query flub-standard.fasta \
-out flub-standard.fludb.blastn.xml \
-outfmt 5 \
-num_threads 25 \
-culling_limit 2 \
-max_target_seqs 1 \
-evalue 0.00005

In [None]:
cd /home/alan/projects/MinION-notebook/miseq-blast/plasmid

#from khmer 
interleave-reads.py plasmid.r1.fastq.gz plasmid.r2.fastq.gz > plasmid.fastq

#from fastx
fastq_to_fasta -n -i plasmid.fastq -o plasmid.fasta

blastn \
-db ~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta \
-query plasmid.fasta \
-out plasmid.fludb.blastn.xml \
-outfmt 5 \
-num_threads 15 \
-culling_limit 2 \
-max_target_seqs 1 \
-evalue 0.00005
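
Since the same blastn invocation is repeated for every sample, one option is to build the argument list in one place and reuse it. The sketch below only assembles the command with the flags already used in this notebook; the function name is made up, and actually running it (for example with `subprocess.run(cmd, check=True)`) requires blastn on the PATH.

```python
import os
import shlex

FLUDB = "~/projects/MinION-notebook/clinical-analysis/fludb/all-H1N1-H3N2-FluB-full-segs.fasta"


def blastn_command(query, out, threads, db=FLUDB):
    """Return the blastn argument list used throughout this notebook."""
    return [
        "blastn",
        "-db", os.path.expanduser(db),  # the shell is not expanding ~ for us
        "-query", query,
        "-out", out,
        "-outfmt", "5",
        "-num_threads", str(threads),
        "-culling_limit", "2",
        "-max_target_seqs", "1",
        "-evalue", "0.00005",
    ]


cmd = blastn_command("plasmid.fasta", "plasmid.fludb.blastn.xml", threads=15)
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell line continuations entirely, so a dropped trailing backslash cannot silently cut the command short.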