# Popgen analyses for batch 6.30

These analyses were run on the batch 6 final filtered genepop where individuals missing **>30%** of their data were removed from analysis.

The number of samples retained for each sampling site is as follows: 

|sampling site|# samples| # replicates |
|:-------------|:---------:|:----:|
|Pohang 2015|30|1|
|Geoje 2015|33|-|
|Namhae 2015|16|-|
|Yellow Sea Block 2016|23|1|
|Jukbyeon 2007|22|3|
|Jinhae Bay 2007|32|1|
|Jinhae Bay 2008|26|5|
|Boryeong 2007|22|-|
|Geoje 2014|22|10|


All analyses were completed with replicates removed. 

<br>
<br>
This notebook includes the following analyses: 

1. DAPC for entire sample set
2. DAPC for southern samples only
3. DAPC for 2015/2016 samples only
4. DAPC for 2007/2008 samples only


<br>
<br>
###  (Prep: Generate Genepop file without replicates)

My filtered genepop file contains replicates, so I need to remove those before starting the analyses. Where the replicates came from the 500 ng vs. 300 ng protocols, I retained the 300 ng individuals.

In [1]:
cd ../analyses

/mnt/hgfs/Pacific cod/DataAnalysis/PCod-Korea-repo/analyses


In [2]:
# Genepop sample IDs end with a comma, so the IDs listed for removal include it
remove_individuals = [
    "PO010715_02_rep,", "YS121315_12,",
    "JUK07_02_rep.1,", "JUK07_09_rep.1,", "JUK07_16_rep.1,",
    "JB121807_08_rep,",
    "JB021108_05_rep,", "JB021108_35_rep,", "JB021108_36_rep.1,", "JB021108_37_rep.1,", "JB021108_46_rep.1,",
    "GEO020414_11,", "GEO020414_14,", "GEO020414_15,", "GEO020414_16,", "GEO020414_17,",
    "GEO020414_23,", "GEO020414_24,", "GEO020414_25,", "GEO020414_8,", "GEO020414_9,",
]

orig_path = "../stacks_b6_wgenome/batch_6.filteredMAF_filteredLoci_30filteredIndivids_filteredHWE.gen"
new_path = "batch_6.filteredMAF_filteredLoci_30filteredIndivids_filteredHWE_noreps.gen"

# "with" closes both files even if an error occurs part-way through
with open(orig_path, "r") as orig_genepop, open(new_path, "w") as new_genepop:
    for line in orig_genepop:
        fields = line.split()
        # Keep blank lines, header/locus/Pop lines, and every individual not listed for removal
        if not fields or fields[0] not in remove_individuals:
            new_genepop.write(line)
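
As a sanity check on the filtering step above, the number of individuals left in each population block can be counted directly from the genepop text. This is a minimal sketch, not part of the original workflow: the function name `count_individuals_per_pop` and the tiny example file are made up for illustration, and it assumes the usual genepop layout in which each population starts with a `Pop` line and each individual line contains a comma after the sample ID.

```python
def count_individuals_per_pop(lines):
    """Return a list with the number of individuals in each Pop block.

    Assumes standard genepop layout: title and locus lines first,
    then a "Pop" line before each population's individuals.
    """
    counts = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped.lower() == "pop":
            counts.append(0)  # start a new population block
        elif counts and "," in stripped:
            counts[-1] += 1  # individual line: "ID, genotype genotype ..."
    return counts


# Tiny made-up genepop file, used only to illustrate the function
example = [
    "example title line",
    "locus1",
    "locus2",
    "Pop",
    "A_01, 0101 0202",
    "A_02, 0101 0102",
    "pop",
    "B_01, 0202 0202",
]
counts = count_individuals_per_pop(example)
print(counts, sum(counts))
```

Passing the open `_noreps.gen` file handle instead of `example` should give nine counts that match the table at the top of this notebook and sum to 226, if the file was written as intended.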

### DAPC for entire sample set

In R, I ran the script `DAPC_script_PCod_7-14.r`. 

Below is an excerpt of the script text, run for ALL individuals in the population: 

In [None]:

# First set working directory
setwd("D:/Pacific cod/DataAnalysis/PCod-Korea-repo/analyses")

# Load all necessary R packages
install.packages("ape")
install.packages("ade4")
install.packages("adegenet")
library(ape)
library(ade4)
library(adegenet)
library(diveRsity)
library(doParallel)
library(foreach)
library(genetics)
library(hierfstat)
library(httpuv)
library(iterators)
library(sendplot)
library(xlsx)
library(pegas)
library(plotrix)

###################################################################################
# Let's first run a DAPC with all individuals and all loci, without replicates

## read in genepop
b6i30<-read.genepop("batch_6.filteredMAF_filteredLoci_30filteredIndivids_filteredHWE_noreps.gen")
summary(b6i30)

## note that there are 226 individuals in the file: POH = 30, GE15 = 33, NAM = 16, YS = 23, JUK = 22, JBE = 32, JBL = 26, BOR = 22, GE14 = 22

pop_groups <- as.factor(c(rep("POH15",30),rep("GE15",33),rep("NAM15",16),rep("YS16",23),rep("JUK",22),rep("JBE",32), rep("JBL",26), rep("BOR07", 22), rep("GE14", 22)))
pop_labels <- c("Pohang '15", "Geoje '15", "Namhae '15", "YellowSea '16", "Jukbyeon '07", "Jin. Bay '07 Early", "Jin. Bay '07 Late", "Boryeong '07", "Geoje '14")
pop_cols <- c("seagreen1","mediumorchid1","darkgoldenrod","firebrick4","chartreuse","deepskyblue", "deepskyblue4", "coral1", "mediumorchid4")

## run dapc
dapc_all <- dapc(b6i30,b6i30$pop,n.pca=75,n.da=9) ##Retain N/3 PCs (226/3, rounded to 75), then identify optimal number by optim.a.score
## find optimal number of principal components
test_a_score <- optim.a.score(dapc_all)


![a_score_img](https://github.com/mfisher5/PCod-Korea-repo/blob/master/analyses/dapc_b6_ascore.png?raw=true)

In [None]:
## run dapc only on optimal number of principal components
dapc_all <- dapc(b6i30,b6i30$pop,n.pca=12,n.da=9) ##n.pca set to the optimal number of PCs from optim.a.score (12 here)

#2D plot WITH ALL POPULATIONS
scatter(dapc_all,scree.da=TRUE,scree.pos = "topright", cellipse=0,leg=FALSE,label=c("POH15","GE15","NAM15","YS16","JUK07","JBE", "JBL","BOR07", "GE14"), posi.da="bottomleft",csub=2,col=pop_cols,cex=1.5,clabel=1,pch=c(19),solid=1)
legend(x = -10, y = 7, bty='n', legend = pop_labels,pch=c(19),col=pop_cols,cex=1)

In [None]:
dapc_all$var  ### Proportion of variance conserved by the principal components
#0.1281183
dapc_all$eig[1]/sum(dapc_all$eig)  ### Variance explained by first discriminant function
# 0.6403211
dapc_all$eig[2]/sum(dapc_all$eig)  ### Variance explained by second discriminant function
#0.2893712

![DAPC_plot](https://github.com/mfisher5/PCod-Korea-repo/blob/master/analyses/b6_dapc_all.png?raw=true)
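
The "variance explained" values reported for the discriminant functions are each eigenvalue divided by the sum of all eigenvalues, which is what `dapc_all$eig[1]/sum(dapc_all$eig)` computes in R. The sketch below shows the same arithmetic in Python. The eigenvalues are hypothetical round numbers chosen for illustration, not the values from this analysis, and `proportion_explained` is a made-up helper name.

```python
def proportion_explained(eigenvalues):
    """Return each eigenvalue as a fraction of the eigenvalue total."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


# Hypothetical eigenvalues (NOT the ones from dapc_all$eig)
eig = [64.0, 29.0, 7.0]
props = proportion_explained(eig)

# Share of the discriminant variance captured by the first two axes,
# i.e. the two axes drawn in a 2D DAPC scatter plot
first_two = props[0] + props[1]
print(props, first_two)
```

With the values reported above, the first two discriminant functions together account for 0.6403211 + 0.2893712 = 0.9296923 of the variance among the discriminant functions. This is a separate quantity from `dapc_all$var`, which is the proportion of the original variance conserved by the retained principal components.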

### DAPC with southern samples only

In [1]:
cd ../analyses

/mnt/hgfs/Pacific cod/DataAnalysis/PCod-Korea-repo/analyses


In [2]:
# Populations to drop so that only the southern sites remain
remove_pops = ["Boryeong07", "YellowSea16", "Jukbyeon07"]

remove_individuals = []
with open("../scripts/PopMap_L1-4.txt", "r") as popmap:
    for line in popmap:
        fields = line.split()
        # popmap columns: sample ID, population; genepop IDs carry a trailing comma
        if len(fields) >= 2 and fields[1] in remove_pops:
            remove_individuals.append(fields[0] + ",")

orig_path = "batch_6.filteredMAF_filteredLoci_30filteredIndivids_filteredHWE_noreps.gen"
new_path = "batch_6.filteredMAF_filteredLoci_30filteredIndivids_filteredHWE_noreps_south.gen"

with open(orig_path, "r") as orig_genepop, open(new_path, "w") as new_genepop:
    for line in orig_genepop:
        fields = line.split()
        # Keep blank lines, header/locus/Pop lines, and every individual not listed for removal
        if not fields or fields[0] not in remove_individuals:
            new_genepop.write(line)
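
Both filtering cells in this notebook follow the same pattern: build a list of genepop-style sample IDs, then copy every line whose first field is not in that list. The sketch below pulls that pattern into two small functions so it can be reused for other subsets. The function names and the toy popmap and genepop lines are hypothetical, and the popmap is assumed to be whitespace-separated with the sample ID in the first column and the population in the second, as in the cell above.

```python
def genepop_ids_for_populations(popmap_lines, populations):
    """Collect genepop-style IDs (with trailing comma) for the given populations."""
    ids = set()
    for line in popmap_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] in populations:
            ids.add(fields[0] + ",")
    return ids


def filter_genepop(lines, remove_ids):
    """Return the lines whose first field is not one of remove_ids."""
    kept = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] in remove_ids:
            continue  # drop this individual
        kept.append(line)
    return kept


# Toy inputs, made up for illustration only
popmap = ["A_01\tNorth", "A_02\tNorth", "B_01\tSouth", ""]
genepop = [
    "example title line",
    "locus1",
    "Pop",
    "A_01, 0101",
    "A_02, 0102",
    "Pop",
    "B_01, 0202",
]

to_remove = genepop_ids_for_populations(popmap, {"North"})
south_only = filter_genepop(genepop, to_remove)
print(south_only)
```

One thing the toy output makes visible: when every individual in a population is removed, its `Pop` line stays behind with no individuals under it. The same happens in the southern-only file for the three dropped populations, so it may be worth checking that the empty blocks do not affect how the file is read downstream.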

Running the DAPC on this southern-only genepop file gave the following:

Proportion of variance conserved by the principal components : 0.2721199

Variance explained by first discriminant function : 0.4473815

Variance explained by second discriminant function : 0.2783839

![dapc_south](https://github.com/mfisher5/PCod-Korea-repo/blob/master/analyses/b6_dapc_south.png?raw=true)