## Evaluating whether Bowtie and Bowtie2 work differently in the filtering stage

**20170515**

According to [the Bowtie2 FAQ web page](http://bowtie-bio.sourceforge.net/bowtie2/faq.shtml), the main differences between Bowtie and Bowtie2 are:
* Bowtie 2 fully supports gapped alignment with affine gap penalties. Number of gaps and gap lengths are not restricted, except via the user-supplied scoring scheme. Bowtie 1 only finds ungapped alignments.
* For reads longer than about 50 bp Bowtie 2 is generally faster, more sensitive, and uses less memory than Bowtie 1. For relatively short reads (e.g. less than 50 bp) Bowtie 1 is sometimes faster and/or more sensitive.
* Bowtie 2 supports a "local" alignment mode, which doesn't require that reads align end-to-end. This produces alignments that might be "trimmed" (or "soft clipped") at one or both extremes in a way that optimizes alignment score. Bowtie 2 also supports an "end-to-end" alignment mode which, like Bowtie 1, requires that the read align entirely.
* There is no upper limit on read length in Bowtie 2. Bowtie 1 had an upper limit of around 1000 bp.
* Bowtie 2 does away with Bowtie 1's notion of alignment "stratum". In Bowtie 2 all alignments lie along a continuous spectrum of alignment scores.
* There is no longer a distinction between "end-to-end" and "Maq-like" modes as in Bowtie 1. There is just one scoring scheme, similar to Needleman-Wunsch and Smith-Waterman.
* Bowtie 2's paired-end alignment mode is more flexible than Bowtie 1's. For example, for pairs that do not align in a paired fashion, it will attempt to find unpaired alignments for each mate.
* Bowtie 2 does not align colorspace reads.
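
To get a feel for what "affine gap penalties" and a "continuous spectrum of alignment scores" mean in practice, here is a minimal sketch of end-to-end scoring. It assumes what I understand to be Bowtie 2's documented defaults (mismatch penalty up to 6, gap open 5, gap extend 3, minimum score function `L,-0.6,-0.6`); these should be checked against the manual for the installed version. The 90 bp read length is a hypothetical value, not taken from my data, and the sketch ignores quality-dependent mismatch penalties and `N` penalties.

```python
# Sketch of Bowtie 2 end-to-end scoring under ASSUMED default parameters.
# Verify these values against the Bowtie 2 manual before relying on them.
MISMATCH_PENALTY_MAX = 6   # assumed --mp maximum (high-quality mismatch)
GAP_OPEN = 5               # assumed --rdg / --rfg open penalty
GAP_EXTEND = 3             # assumed --rdg / --rfg extend penalty


def gap_penalty(length, gap_open=GAP_OPEN, gap_extend=GAP_EXTEND):
    """Affine penalty for one gap of `length` bases: open + length * extend."""
    return gap_open + length * gap_extend


def min_score(read_length, intercept=-0.6, slope=-0.6):
    """Linear minimum-score function (assumed default L,-0.6,-0.6)."""
    return intercept + slope * read_length


def alignment_score(mismatches, gap_lengths):
    """End-to-end score: 0 for a perfect match, minus all penalties."""
    penalty = mismatches * MISMATCH_PENALTY_MAX
    penalty += sum(gap_penalty(n) for n in gap_lengths)
    return -penalty


def passes(read_length, mismatches, gap_lengths):
    """True if the alignment score meets the minimum-score threshold."""
    return alignment_score(mismatches, gap_lengths) >= min_score(read_length)


if __name__ == "__main__":
    read_length = 90  # hypothetical read length
    print("threshold:", min_score(read_length))
    print("3 mismatches + one 2 bp gap:", alignment_score(3, [2]),
          passes(read_length, 3, [2]))
```

Under these assumptions a single 2 bp gap costs 11 and three mismatches cost 18, so a read carrying both would still clear the threshold for a 90 bp read. Bowtie 1 would not report that alignment at all, because it only finds ungapped alignments.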

Given that Bowtie2 fully supports gapped alignments, I think it's worth evaluating how much this affects the filtering stage in our pipeline, especially given the amount of genetic variation and indels in my data compared to that of my peers.

I verified that the output SAM file formats are about the same, so I'm going to run the Bowtie filtering steps in this notebook using both Bowtie and Bowtie2 to compare results.
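
One way the comparison could be done is sketched below: collect the names of reads that aligned in each SAM file and look at the overlap. This is only an outline, not the filtering code from our pipeline. It relies on the standard SAM layout (read name in column 1, bitwise flag in column 2, flag bit `0x4` meaning unmapped); the function names and the file paths passed to them are hypothetical.

```python
def aligned_read_names(sam_path):
    """Return the set of read names with at least one reported alignment."""
    names = set()
    with open(sam_path) as handle:
        for line in handle:
            if line.startswith("@"):
                continue  # skip header lines, if any
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:
                continue  # not a complete SAM record
            flag = int(fields[1])
            if not flag & 0x4:  # 0x4 set means the read is unmapped
                names.add(fields[0])
    return names


def compare_alignments(bowtie1_sam, bowtie2_sam):
    """Split aligned reads into shared and aligner-specific sets."""
    b1 = aligned_read_names(bowtie1_sam)
    b2 = aligned_read_names(bowtie2_sam)
    return {
        "both": b1 & b2,
        "bowtie1_only": b1 - b2,
        "bowtie2_only": b2 - b1,
    }
```

Reads that land in `bowtie2_only` would be the first place to look for loci that align only when gaps are allowed.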

### Bowtie2

I'm going to use the same Bowtie index and FASTA file of loci to run an alignment with Bowtie2, and compare the results to those in the Bowtie filtering step of this [CRAGIG_RUN1 notebook](https://github.com/nclowell/RAD_Scallops/blob/master/CRAGIG_run1/Notebooks/data_processing/CRAGIG_RUN1.ipynb).
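
A caveat worth keeping in mind: Bowtie 2's command line is not a drop-in replacement for Bowtie 1's. Its usage message asks for the index via `-x`, unpaired reads via `-U`, and the SAM output via `-S`, and it notes that Bowtie 1 and Bowtie 2 indexes are not compatible, so a separate index would need to be built with `bowtie2-build`. The sketch below assembles what an equivalent call might look like. The index prefix `batch_100_bt2` is a hypothetical name for such a rebuilt index. As far as I can tell there is no direct counterpart to Bowtie 1's `-v`, since Bowtie 2 controls stringency through its scoring options instead.

```python
import shlex


def bowtie2_command(index_prefix, reads_fasta, out_sam):
    """Build an argument list for an unpaired, FASTA-input Bowtie 2 run."""
    return [
        "bowtie2",
        "-f",                # reads are in FASTA format
        "--no-hd",           # suppress SAM header lines
        "-x", index_prefix,  # Bowtie 2 index prefix (built with bowtie2-build)
        "-U", reads_fasta,   # unpaired reads
        "-S", out_sam,       # SAM output file
    ]


# "batch_100_bt2" is a hypothetical prefix for a rebuilt Bowtie 2 index.
cmd = bowtie2_command(
    "batch_100_bt2",
    "seqsforBOWTIE.fa",
    "batch_100_BOWTIEout_bowtie2_test.sam",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once the index exists; building the list separately makes it easy to inspect the command before running it.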


In [10]:
cd /mnt/hgfs/SHARED_FOLDER/WorkingFolder/Bowtie/bowtie-1.2

/mnt/hgfs/SHARED_FOLDER/WorkingFolder/Bowtie/bowtie-1.2


In [11]:
# -f: input reads are in FASTA format
# -v: maximum number of mismatches allowed
# --sam: print alignments in SAM format
# --sam-nohead: suppress the SAM header
!bowtie2 -f -v 3 --sam --sam-nohead \
batch_100 \
seqsforBOWTIE.fa \
batch_100_BOWTIEout_bowtie2_test.sam

(INFO): Before arg handling:
(INFO):   Wrapper args:
[  ]
(INFO):   Binary args:
[ -f -v 3 --sam --sam-nohead batch_100 seqsforBOWTIE.fa batch_100_BOWTIEout_bowtie2_test.sam ]
(INFO): After arg handling:
(INFO):   Binary args:
[ -f -v 3 --sam --sam-nohead batch_100 seqsforBOWTIE.fa batch_100_BOWTIEout_bowtie2_test.sam ]
(INFO): Cannot find any index option (--reference-string, --ref-string or -x) in the given command line.
(INFO): Using the small index (1.1.bt2).
(INFO): "/usr/bin/bowtie2-align-s" --wrapper basic-0 -f -v 3 --sam --sam-nohead batch_100 seqsforBOWTIE.fa batch_100_BOWTIEout_bowtie2_test.sam
Bowtie 2 version 2.2.6 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: 
  bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]

  <bt2-idx>  Index filename prefix (minus trailing .X.bt2).
             NOTE: Bowtie 1 and Bowtie 2 indexes are not compatible.
  <m1>       Files with #1 mates, paired with files in <m2>.
             C