# Identifying Outlier Loci with OutFLANK
#### stacks batch 8

This notebook contains the procedures and code used to identify outlier loci in the final filtered genepop file. The comparisons are:

1. East v. West
2. West (south / west coast)
3. East coastal

<br>

**Programs used:** OutFLANK
<br>
[GitHub](https://github.com/whitlock/OutFLANK/blob/master/R/OutFLANK.R) 

[PDF Manual](https://github.com/whitlock/OutFLANK/blob/master/OutFLANK%20readme.pdf)


<br>

### (1) Convert Genepop file to OutFLANK file format.
Luckily, OutFLANK has a nice R function for this. However, you still need to convert your Genepop file into a specific format before passing it to that R function. The following Python script takes a genepop file and a population map, and outputs three of the inputs for the OutFLANK function `MakeDiploidFSTMat()`. These are: 
1. a file containing a matrix of individuals (rows) x loci (columns) without headings. Genotypes are coded as `0`, `1`, `2`, or `9` (missing). 
2. a file where each locus name is on a new line, as a string. This can be read directly into R as a list.
3. a file where each sample's population name is on a new line (same order as the matrix rows). This can also be read directly into R as a list. 
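
The genotype coding in the first file can be sketched as follows. This is a minimal illustration, not the code in `convert_genepop_to_SNPmat.py`: the helper names are hypothetical, and it assumes that each value counts copies of the `02` allele and that `0000` marks a missing genotype.

```python
# Hypothetical helpers for illustration; not taken from convert_genepop_to_SNPmat.py.
MISSING = 9  # code used for a missing genotype in the SNP matrix


def genotype_to_count(genotype):
    """Convert a 4-digit genepop genotype (e.g. '0102') to 0, 1, 2, or 9.

    Assumption: the value is the number of copies of the '02' allele.
    """
    genotype = genotype.strip()
    if len(genotype) != 4:
        raise ValueError("expected a 4-digit genotype, got %r" % genotype)
    alleles = (genotype[:2], genotype[2:])
    if "00" in alleles:
        return MISSING
    if not set(alleles) <= {"01", "02"}:
        raise ValueError("genotype %r is not coded as 01/02" % genotype)
    return alleles.count("02")


def genepop_rows_to_matrix(rows):
    """Convert rows of genotypes (one row per individual) to a 0/1/2/9 matrix."""
    return [[genotype_to_count(g) for g in row] for row in rows]


# Two individuals, three loci.
rows = [["0101", "0102", "0000"],
        ["0202", "0201", "0101"]]
snpmat = genepop_rows_to_matrix(rows)
for row in snpmat:
    print(" ".join(str(value) for value in row))
```

Which allele is counted is arbitrary here; the point is only that every genotype becomes a single integer, with `9` reserved for missing data.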

<br>
The script below has one issue: it only works if all loci are coded with `0101` / `0202` genotypes. This is NOT the case for genotypes called by stacks. As a result, I used an extra R script to convert my stacks genotypes to the `0101/0202` format. 
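
For readers without that R script, the recoding step might look like the Python sketch below. It is not the R script used here; it assumes stacks writes each allele as a two-digit code (for example `01` to `04`), that `00` means missing, and that every locus is biallelic. The function name is hypothetical.

```python
def recode_locus(genotypes):
    """Recode one locus so its two observed alleles become '01' and '02'.

    `genotypes` is a list of 4-digit genepop genotypes for a single locus.
    Assumption: the lower allele code becomes '01' and the higher becomes '02'.
    """
    observed = sorted({allele
                       for g in genotypes
                       for allele in (g[:2], g[2:])
                       if allele != "00"})
    if len(observed) > 2:
        raise ValueError("locus has more than two alleles: %s" % observed)
    mapping = {allele: "0%d" % (i + 1) for i, allele in enumerate(observed)}
    mapping["00"] = "00"  # keep missing alleles as missing
    return [mapping[g[:2]] + mapping[g[2:]] for g in genotypes]


# One locus scored in four individuals, with alleles 01 and 03 present.
locus = ["0103", "0303", "0000", "0101"]
recoded = recode_locus(locus)
print(recoded)
```

The recoding is done per locus because the pair of alleles present differs from one locus to the next.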



In [1]:
pwd

u'/mnt/hgfs/PCod-Compare-repo/notebooks'

In [2]:
cd ../analyses/outliers

/mnt/hgfs/PCod-Compare-repo/analyses/outliers


In [3]:
!python convert_genepop_to_SNPmat.py -h

usage: convert_genepop_to_SNPmat.py [-h] [-i INPUT] [-p POPMAP] [-o OUTPUT]
                                    [-ol OUTLOCUSNAMES] [-op OUTPOPNAMES]

produce SNPmat file, and files containing loci / population lists for OutFLANK
outlier analysis.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        genepop file that you want to run through OutFLANK
  -p POPMAP, --popmap POPMAP
                        population map from stacks (each line has sample - tab
                        - population
  -o OUTPUT, --output OUTPUT
                        bash shell script file name. must have file extension
                        .sh
  -ol OUTLOCUSNAMES, --outLocusNames OUTLOCUSNAMES
                        text file with the name of each locus on each line, to
                        be read into R
  -op OUTPOPNAMES, --outPopNames OUTPOPNAMES
                        text file with the name of each samp

In [21]:
!python convert_genepop_to_SNPmat.py \
-i OutFLANK/batch_8_final_filtered_aligned_genepop_forOutflank_eastwest_bypop.txt \
-p ../../scripts/PopMap_EastWest_bypop.txt \
-o OutFLANK/batch_8_eastwest_SNPmat.txt \
-ol OutFLANK/batch_8_eastwest_SNPmat_locusnames.txt \
-op OutFLANK/batch_8_eastwest_SNPmat_popnames.txt

Subsample of loci 2018-03-23 13:45:35

Done creating SNPmat file.


In [18]:
!python convert_genepop_to_SNPmat.py \
-i OutFLANK/batch_8_final_filtered_aligned_genepop_forOutflank_west.txt \
-p ../../scripts/PopMap_West_bypop.txt \
-o OutFLANK/batch_8_west_bypop_SNPmat.txt \
-ol OutFLANK/batch_8_west_bypop_SNPmat_locusnames.txt \
-op OutFLANK/batch_8_west_bypop_SNPmat_popnames.txt

Subsample of loci 2018-03-23 13:45:35

Done creating SNPmat file.


In [19]:
!python convert_genepop_to_SNPmat.py \
-i OutFLANK/batch_8_final_filtered_aligned_genepop_forOutflank_east.txt \
-p ../../scripts/PopMap_EastCoastal.txt \
-o OutFLANK/batch_8_east_SNPmat.txt \
-ol OutFLANK/batch_8_east_SNPmat_locusnames.txt \
-op OutFLANK/batch_8_east_SNPmat_popnames.txt

Subsample of loci 2018-03-23 13:45:35

Done creating SNPmat file.


<br>


### (2) Run OutFLANK and produce a summary file containing outliers.
I used [this R script](https://github.com/mfisher5/PCod-Korea-repo/blob/master/analyses/R/OutFLANK_KorPCod_MF.R), which is well annotated. 
<br>


<br>
### OutFLANK output

**East v. West**: found NO OUTLIERS
<br>
**West**: found 17 outliers
<br>
**East Coastal**: found 64 outliers


<br>
### Lesson from OutFLANK
When I ran the western sites as two populations (south, west), I didn't get any outliers. For OutFLANK to detect outliers, I had to go back to the "PopMap" files that I made for the SNPmat file conversions above and assign each sampling site to its own population. 
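
The difference between the two population maps can be sketched in Python. The sample names, site labels, and helper functions below are made up for illustration; the only detail taken from this notebook is the stacks popmap layout of sample, tab, population on each line.

```python
# Hypothetical records: (sample name, sampling site, region).
records = [
    ("sampleA_01", "siteA", "south"),
    ("sampleA_02", "siteA", "south"),
    ("sampleB_01", "siteB", "west"),
    ("sampleC_01", "siteC", "west"),
]


def popmap_lines(records, by="site"):
    """Build popmap lines (sample<TAB>population) grouped by site or region."""
    column = {"site": 1, "region": 2}[by]
    return ["%s\t%s" % (rec[0], rec[column]) for rec in records]


def count_populations(lines):
    """Count the distinct population labels in a list of popmap lines."""
    return len({line.split("\t")[1] for line in lines})


by_site = popmap_lines(records, by="site")
by_region = popmap_lines(records, by="region")
print("populations by site:", count_populations(by_site))
print("populations by region:", count_populations(by_region))
```

Checking the number of distinct populations in a popmap before converting is a quick way to confirm that each sampling site is being treated as its own population.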