### Running in Docker container on Ostrich

#### Started Docker container with the following command:

```docker run -p 8888:8888 -v /Users/sam/data/:/data -v /Users/sam/owl_home/:/owl_home -v /Users/sam/owl_web/:/owl_web -v /Users/sam/gitrepos:/gitrepos -it f99537d7e06a```

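The command publishes port 8888 so Jupyter Notebook can be reached from the host, and mounts volumes that make my data directory, my data files on Owl/home and Owl/web, and my Git repos directory (which holds my Jupyter Notebook GitHub repo) accessible inside the Docker container.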

Once the container was running, I started Jupyter Notebook with the following command inside the Docker container:

```jupyter notebook```

The Docker container is configured so that this command launches Jupyter Notebook on port 8888 without opening a browser.

The Docker container runs an image built from this [Dockerfile (Git commit 443bc42)](https://github.com/sr320/LabDocs/blob/443bc425cd36d23a07cf12625f38b7e3a397b9be/code/dockerfiles/Dockerfile.bio).

In [1]:
%%bash
date

Fri Dec  2 05:06:52 UTC 2016


### Check computer specs

In [2]:
%%bash
hostname

4bd1957ce190


In [3]:
%%bash
lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Model name:            Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
Stepping:              5
CPU MHz:               2260.998
BogoMIPS:              4521.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K


### Load Rmagics capability for Jupyter Notebook

In [2]:
%load_ext rpy2.ipython

### View format of Fst file comparing all three populations

In [6]:
%%bash
head /data/20161117_oly_gbs_vcf_analysis/1HL_1NF_1SN.fst

CHROM	POS	WEIR_AND_COCKERHAM_FST
1	8	-0.00721894
7	26	-nan
7	126	0.0330309
13	12	-0.0931512
13	38	-0.0296922
13	43	-0.0246937
13	49	-0.0156452
13	51	-0.0156452
13	57	-0.0156452


#### Negative Fst values are effectively equivalent to an Fst value of 0 (the Fst scale runs from 0 to 1; it's a correlation measurement).

#### For calculation purposes (and following [Katherine Silliman's notebook](https://github.com/ksil91/2016_Notebook/blob/master/2bRAD%20Subset%20Population%20Structure%20Analysis.ipynb)), I'll convert all negative Fst values to 0.

### Calculate mean Fst values between populations

#### The cells below use R to calculate the mean Fst values.

#### First, the .fst file is read into a variable (an R data frame)

#### Second, R finds all Fst values less than 0 and sets them to 0

#### Third, R calculates and prints the mean Fst value for the populations being compared, ignoring NaN values (`na.rm = TRUE`)
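
For readers who want to follow the three steps above outside of R, here is a minimal Python sketch of the same idea. It is not the code used in this notebook: the function name `mean_clamped_fst` is hypothetical, the sample text reuses only the first few rows shown by `head` above, and it assumes a tab-delimited file with a `WEIR_AND_COCKERHAM_FST` column.

```python
import csv
import io
import math

# First rows of 1HL_1NF_1SN.fst as shown by `head` above (tab-delimited).
SAMPLE_FST = (
    "CHROM\tPOS\tWEIR_AND_COCKERHAM_FST\n"
    "1\t8\t-0.00721894\n"
    "7\t26\t-nan\n"
    "7\t126\t0.0330309\n"
    "13\t12\t-0.0931512\n"
)


def mean_clamped_fst(handle, column="WEIR_AND_COCKERHAM_FST"):
    """Read an .fst table, set negative Fst values to 0, and return the mean.

    NaN entries are skipped, similar to mean(..., na.rm = TRUE) in R.
    """
    values = []
    for row in csv.DictReader(handle, delimiter="\t"):
        fst = float(row[column])      # float() parses "-nan" as NaN
        if math.isnan(fst):
            continue                  # step 3 ignores missing estimates
        values.append(max(fst, 0.0))  # step 2: negative values become 0
    return sum(values) / len(values) if values else float("nan")


sample_mean = mean_clamped_fst(io.StringIO(SAMPLE_FST))
print(f"Mean FST across these loci: {sample_mean}")
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` wrapper, for example `open(path, newline="")`.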

### Mean Fst between all three populations

In [8]:
%%R
all_fst <- read.table("/data/20161117_oly_gbs_vcf_analysis/1HL_1NF_1SN.fst",header = TRUE)
all_fst[which(all_fst$WEIR_AND_COCKERHAM_FST < 0),3] <- 0
all_fst

       CHROM POS WEIR_AND_COCKERHAM_FST
1          1   8            0.00000e+00
2          7  26                    NaN
3          7 126            3.30309e-02
4         13  12            0.00000e+00
5         13  38            0.00000e+00
6         13  43            0.00000e+00
7         13  49            0.00000e+00
8         13  51            0.00000e+00
9         13  57            0.00000e+00
10        13  60            0.00000e+00
11        13 136            0.00000e+00
12        13 137            0.00000e+00
13        13 149            0.00000e+00
14        13 151            0.00000e+00
15        17 159            0.00000e+00
16        21   3            1.39247e-02
17        21   9            1.39247e-02
18        21  18            1.39715e-02
19        21  28            1.39715e-02
20        21  36            1.39715e-02
21        21  37            2.89234e-02
22        21  52            1.39715e-02
23        21  59            1.39715e-02
24        21  68            3.23994e-03


In [9]:
%%R
print(paste("Mean FST across these loci:",mean(all_fst$WEIR_AND_COCKERHAM_FST, na.rm= TRUE),sep=" "))

[1] "Mean FST across these loci: 0.139539326187024"


### Mean Fst between HL and NF populations

In [5]:
%%R
1HL_NF_fst <- read.table("/data/20161117_oly_gbs_vcf_analysis/1HL_NF.fst",header = TRUE)
1HL_NF_fst[which(1HL_NF_fst$WEIR_AND_COCKERHAM_FST < 0),3] <- 0
1HL_NF_fst


Error in (function (file = "", n = NULL, text = NULL, prompt = "?", keep.source = getOption("keep.source"),  : 
  <text>:1:15: unexpected symbol
1: withVisible({1HL_NF_fst
                  ^


##### R variable names can't start with a digit, so try names that don't start with a number.
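
As a rough illustration of the naming rule behind the parse error above, the Python sketch below checks whether a string looks like a syntactic R name. The helper `looks_like_valid_r_name` and its regular expression are my own ASCII-only approximation, not an R API: they do not cover locale-specific letters or reserved words such as `if` or `TRUE`.

```python
import re

# ASCII-only approximation of R's rule for syntactic names:
# letters, digits, "." and "_" only; the name must start with a letter,
# or with "." that is not followed by a digit.
R_NAME = re.compile(r"(?:[A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*")


def looks_like_valid_r_name(name):
    """Return True if `name` matches the approximate pattern above."""
    return R_NAME.fullmatch(name) is not None


for candidate in ("1HL_NF_fst", "HL_NF_fst"):
    print(candidate, looks_like_valid_r_name(candidate))
```

Under this approximation, `1HL_NF_fst` is rejected and `HL_NF_fst` is accepted, which matches what happened in the two cells here.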

In [6]:
%%R
HL_NF_fst <- read.table("/data/20161117_oly_gbs_vcf_analysis/1HL_NF.fst",header = TRUE)
HL_NF_fst[which(HL_NF_fst$WEIR_AND_COCKERHAM_FST < 0),3] <- 0
print(paste("Mean FST across these loci:",mean(HL_NF_fst$WEIR_AND_COCKERHAM_FST, na.rm= TRUE),sep=" "))

[1] "Mean FST across these loci: 0.143075552548742"


### Mean Fst between HL and SN populations

In [7]:
%%R
HL_SN_fst <- read.table("/data/20161117_oly_gbs_vcf_analysis/1HL_SN.fst",header = TRUE)
HL_SN_fst[which(HL_SN_fst$WEIR_AND_COCKERHAM_FST < 0),3] <- 0
print(paste("Mean FST across these loci:",mean(HL_SN_fst$WEIR_AND_COCKERHAM_FST, na.rm= TRUE),sep=" "))

[1] "Mean FST across these loci: 0.155234939276722"


### Mean Fst between NF and SN populations

In [8]:
%%R
NF_SN_fst <- read.table("/data/20161117_oly_gbs_vcf_analysis/1NF_1SN.fst",header = TRUE)
NF_SN_fst[which(NF_SN_fst$WEIR_AND_COCKERHAM_FST < 0),3] <- 0
print(paste("Mean FST across these loci:",mean(NF_SN_fst$WEIR_AND_COCKERHAM_FST, na.rm= TRUE),sep=" "))

[1] "Mean FST across these loci: 0.117889300124951"


---
#### It's important to note that the mean Fst values calculated above are actually higher than they would be if we used the raw data.

#### We converted all negative Fst values to 0, which raises the mean Fst. I'm not sure that is the correct approach.

#### I'd like to discuss this with Katherine to get her thoughts on why she made that conversion and how she thinks it affects this analysis.
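
To make the size of that effect concrete, here is a small Python sketch comparing the mean with and without the conversion. It uses only the nine Fst values visible in the `head` output of `1HL_1NF_1SN.fst` earlier in this notebook, so the numbers are illustrative and are not the full-dataset means. The helper `nan_mean` is a hypothetical stand-in for R's `mean(x, na.rm = TRUE)`.

```python
import math

# Fst values from the `head` output of 1HL_1NF_1SN.fst shown earlier.
raw = [-0.00721894, float("nan"), 0.0330309, -0.0931512, -0.0296922,
       -0.0246937, -0.0156452, -0.0156452, -0.0156452]


def nan_mean(values):
    """Mean of the non-NaN values."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept)


# Set negative values to 0; NaN compares False with "< 0" and is left as NaN.
clamped = [0.0 if v < 0 else v for v in raw]

raw_mean = nan_mean(raw)
clamped_mean = nan_mean(clamped)
print(f"raw mean:     {raw_mean:.7f}")
print(f"clamped mean: {clamped_mean:.7f}")
```

On these nine rows the raw mean is negative while the clamped mean is slightly positive, so the conversion can change the sign of the mean as well as its size.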

### Identify loci with Fst > 0.4

#### Per [Katherine's notebook](https://github.com/ksil91/2016_Notebook/blob/master/2bRAD%20Subset%20Population%20Structure%20Analysis.ipynb): "Which SNPs have elevated FST (> 0.4)? These can be putative 'outlier loci'"
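
The threshold filter itself is simple; a minimal Python sketch of it follows. The function name `high_fst` is hypothetical, and the rows are a small illustrative subset copied from the three-population outputs in this notebook, not the full table.

```python
# (CHROM, POS, Fst) rows copied from the three-population comparison;
# an illustrative subset only.
loci = [
    (7, 26, float("nan")),
    (7, 126, 0.0330309),
    (31, 13, 1.0),
    (35, 100, 0.401453),
    (41, 64, 0.431555),
]


def high_fst(rows, threshold=0.4):
    """Keep rows whose Fst exceeds the threshold.

    NaN > threshold is False, so loci without an Fst estimate are dropped.
    """
    return [row for row in rows if row[2] > threshold]


outliers = high_fst(loci)
for chrom, pos, fst in outliers:
    print(chrom, pos, fst)
print("Number of loci above threshold:", len(outliers))
```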

#### All populations

In [10]:
%%R
all_fst <- read.table("/data/20161117_oly_gbs_vcf_analysis/1HL_1NF_1SN.fst",header = TRUE)
all_fst_high <-all_fst[which(all_fst$WEIR_AND_COCKERHAM_FST > 0.4),]
all_fst_high

       CHROM POS WEIR_AND_COCKERHAM_FST
27        31  13               1.000000
28        31  41               1.000000
29        31  45               1.000000
30        31  47               1.000000
31        31  54               1.000000
32        31  59               1.000000
33        31  63               1.000000
35        35  12               0.451931
36        35 100               0.401453
37        35 138               0.401453
42        41  21               0.622865
43        41  26               0.663462
44        41  38               0.663462
45        41  46               0.490196
46        41  51               0.622865
47        41  55               0.503226
48        41  58               0.663462
49        41  59               0.490196
51        41  63               0.622865
52        41  64               0.431555
53        41  76               0.622865
56        41 107               0.653465
57        41 111               0.653465
58        41 117               0.503226


#### HL and NF populations

In [13]:
%%R
HL_NF_fst_high <-HL_NF_fst[which(HL_NF_fst$WEIR_AND_COCKERHAM_FST > 0.4),]
HL_NF_fst_high

       CHROM POS WEIR_AND_COCKERHAM_FST
26        29  48               0.500000
27        31  13               1.000000
28        31  41               1.000000
29        31  45               1.000000
30        31  47               1.000000
31        31  54               1.000000
32        31  59               1.000000
33        31  63               1.000000
35        35  12               0.665302
36        35 100               0.590226
37        35 138               0.590226
38        37  52               0.600000
39        39   4               0.490360
40        39  12               0.490360
41        39  48               0.612903
56        41 107               0.666667
57        41 111               0.666667
59        41 134               0.666667
70        53 108               0.458268
91        65  23               1.000000
93        65  55               1.000000
140       91  34               0.455253
141       91  48               0.455253
148      119  14               0.666667


In [16]:
%%R
nrow(HL_NF_fst_high)

[1] 10347


#### HL and SN populations

In [14]:
%%R
HL_SN_fst_high <-HL_SN_fst[which(HL_SN_fst$WEIR_AND_COCKERHAM_FST > 0.4),]
HL_SN_fst_high

       CHROM POS WEIR_AND_COCKERHAM_FST
42        41  21               0.734807
43        41  26               0.734807
44        41  38               0.734807
45        41  46               0.627907
46        41  51               0.734807
47        41  55               0.589177
48        41  58               0.734807
49        41  59               0.627907
51        41  63               0.734807
52        41  64               0.445087
53        41  76               0.734807
58        41 117               0.589177
131       87   9               0.600000
132       87  12               0.600000
133       87  14               0.600000
134       87  18               0.600000
135       87  22               0.600000
136       87  28               0.600000
137       87  53               0.600000
138       87  90               0.600000
148      119  14               0.750000
149      119  27               0.750000
151      119  44               0.750000
154      125  36               0.428571


In [17]:
%%R
nrow(HL_SN_fst_high)

[1] 11685


#### NF and SN populations

In [15]:
%%R
NF_SN_fst_high <-NF_SN_fst[which(NF_SN_fst$WEIR_AND_COCKERHAM_FST > 0.4),]
NF_SN_fst_high

       CHROM POS WEIR_AND_COCKERHAM_FST
42        41  21               0.818182
46        41  51               0.818182
51        41  63               0.818182
52        41  64               0.818182
53        41  76               0.818182
56        41 107               0.818182
57        41 111               0.818182
59        41 134               0.818182
92        65  39               1.000000
113       77   4               0.442244
114       77  25               0.442244
150      119  35               0.618058
153      125  23               0.600000
154      125  36               0.428571
156      125  61               0.428571
157      125  62               0.428571
158      125  73               0.600000
159      125  83               0.600000
160      125 106               0.600000
161      125 107               0.600000
163      125 110               1.000000
193      139  26               0.670152
194      139  44               0.617630
195      139  47               0.609658


In [18]:
%%R
nrow(NF_SN_fst_high)

[1] 9752
