/
PopGenHelpR_benchmarking.Rmd
204 lines (132 loc) · 7.29 KB
/
PopGenHelpR_benchmarking.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
---
title: "Benchmarking PopGenHelpR with adegenet, hierfstat, mmod, and StAMPP"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{PopGenHelpR_benchmarking}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
options(rmarkdown.html_vignette.check_title = FALSE)
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Purpose
We will compare the performance of `PopGenHelpR` with other R packages available on CRAN. Below, we list the packages we will compare `PopGenHelpR` with and the statistics in each comparison.
**F~st~ and Nei's D**
- [StAMPP](https://cran.r-project.org/web/packages/StAMPP/index.html) ([Pembleton et al., 2013](https://doi.org/10.1111/1755-0998.12129))
**Jost's D**
- [mmod](https://cran.r-project.org/web/packages/mmod/index.html) (Winter et al., 2017)
**Expected and Observed Heterozygosity**
- [hierfstat](https://cran.r-project.org/web/packages/hierfstat/index.html) ([Goudet, 2005](https://doi.org/10.1111/j.1471-8286.2004.00828.x))
- [adegenet](https://cran.r-project.org/web/packages/adegenet/index.html) ([Jombart, 2008](https://doi.org/10.1093/bioinformatics/btn129))
### Let's Begin
First we will load the packages and the data in `PopGenHelpR`. The data comes from [Farleigh et al. (2021)]( https://doi.org/10.1111/mec.16070). We also load `vcfR` to convert between data formats ([Knaus & Grunwald, 2017](https://doi.org/10.1111/1755-0998.12549)).
```{r setup}
# Load the packages
library(PopGenHelpR)
library(adegenet)
library(hierfstat)
library(StAMPP)
library(mmod)
library(vcfR)
# Load the data
data("HornedLizard_VCF")
data("HornedLizard_Pop")
```
## *F~ST~* and *Nei's D* Comparison
We will compare `PopGenHelpR` and `StAMPP`. Both packages use the formulas from Weir and Cockerham (1984) ane Nei (1972) to calculate *F~ST~* and *Nei's D*, respectively but we want to make sure that our estimates are consistent across packages.
First, we need to format the data for `StAMPP`
```{r StAMPP formatting, out.width= "750px", out.height= "750px", echo=FALSE, eval=TRUE, fig.align='center'}
# StAMPP uses genlight objects
Glight <- vcfR2genlight(HornedLizard_VCF)
# Set the population information and ploidy
Glight@pop <- as.factor(HornedLizard_Pop$Population)
ploidy(Glight) <- 2
```
Now we can calculate our statistics. Let's start with *F~ST~*.
```{r Fst, echo=TRUE, eval=TRUE}
PGH_fst <- Differentiation(dat = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "Fst")
Stmp_fst <- stamppFst(Glight, nboots = 0)
```
Let's inspect the results.
```{r Fst comparison, echo=TRUE, eval=TRUE}
PGH_fst$Fst
Stmp_fst
# Is there a difference between the two?
Fst_comparison <- PGH_fst$Fst-Stmp_fst
summary(Fst_comparison)
```
Now we move onto *Nei's D*. We can use the same genlight that we created for the *F~ST~* calculations. We will calculate *Nei's D* for both population's and individual's.
```{r ND, echo=TRUE, eval=TRUE}
PGH_ND <- Differentiation(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "NeisD")
# StAMPP population Nei's D
Stmp_popND <- stamppNeisD(Glight)
# StAMPP individual Nei's D
Stmp_indND <- stamppNeisD(Glight, pop = FALSE)
```
Compare the results like we did with *F~ST~*. Note that `PopGenHelpR` only reports result on the lower triangular element so we will set the upper triangular element of the `Stmp_popND` and `Stmp_indND` objects to NA.
```{r ND comparison, echo = TRUE, eval=TRUE}
# Population comparison
PGH_ND$NeisD_pop
Stmp_popND
# Set StAMPP upper diagnoals to NA
Stmp_popND[upper.tri(Stmp_popND)] <- NA
Stmp_indND[upper.tri(Stmp_indND)] <- NA
popND_comparison <- PGH_ND$NeisD_pop-Stmp_popND
summary(popND_comparison)
# Get the mean difference
mean(popND_comparison, na.rm = T)
# Individual comparison, uncomment if you want to see it
#PGH_ND$NeisD_ind
#Stmp_indND
indND_comparison <- PGH_ND$NeisD_ind - Stmp_indND
mean(indND_comparison, na.rm = T)
```
We see that the difference is very small and is because of rounding. Let's move onto *Jost's D* comparison with `mmod`.
## *Jost's D* Comparison
We will compare `PopGenHelpR` and `mmod`. Both packages use the formulas from Jost (2008). `mmod` uses genind objects so we have to do some format conversion first.
```{r JD, echo=TRUE, eval=TRUE}
Genind <- vcfR2genind(HornedLizard_VCF)
Genind@pop <- as.factor(HornedLizard_Pop$Population)
ploidy(Genind) <- 2
# Calculate Jost's D
PGH_JD <- Differentiation(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "JostsD")
mmod_JD <- pairwise_D(Genind)
PGH_JD$JostsD
mmod_JD
# Compare differences mathematically
PGH_JD$JostsD[2:3,1] - mmod_JD[1:2]
PGH_JD$JostsD[2,2] - mmod_JD[3]
```
Estimate's are very similar between `PopGenHelpR` and `mmod`, we will move onto heterozygosity.
## *Expected* and *Observed Heterozygosity* Comparison
We will compare `PopGenHelpR`, `hierfstat`, and `adegenet`. Again, the packages use the same formula's, so we expect similar if not identical results. `hierfstat` uses it's own format, so we will convert the data before calculations. Luckily we can convert the genind object from *Jost's D* comparisons.
```{r Heterozygosity, echo=TRUE, eval=TRUE}
Hstat <- genind2hierfstat(Genind)
### Calculate heterozygosities
# Expected
PGH_He <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "He")
# Observed
PGH_Ho <- Heterozygosity(data = HornedLizard_VCF, pops = HornedLizard_Pop, statistic = "Ho")
Hstat_hets <- basic.stats(Hstat)
Hstat_Ho <- colMeans(Hstat_hets$Ho)
He_adnet <- Hs(Genind)
PGH_He$He_perpop$Expected.Heterozygosity-He_adnet
PGH_Ho$Ho_perpop$Observed.Heterozygosity-Hstat_Ho
```
We again see very small differences between the estimates.
Please reach out to Keaka Farleigh (farleik@miamioh.edu) if you have any questions, and please see the references and acknowledgments below.
## References
Farleigh, K., Vladimirova, S. A., Blair, C., Bracken, J. T., Koochekian, N., Schield, D. R., ... & Jezkova, T. (2021). The effects of climate and demographic history in shaping genomic variation across populations of the Desert Horned Lizard (Phrynosoma platyrhinos). Molecular Ecology, 30(18), 4481-4496.
Goudet, J. (2005). hierfstat, a package for R to compute and test hierarchical F‐statistics. Molecular ecology notes, 5(1), 184-186.
Jost, L. (2008). GST and its relatives do not measure differentiation. Molecular ecology, 17(18), 4015-4026.
Knaus, B. J., & Grünwald, N. J. (2017). vcfr: a package to manipulate and visualize variant call format data in R. Molecular ecology resources, 17(1), 44-53.
Nei, M. (1972). Genetic distance between populations. The American Naturalist, 106(949), 283-292.
Pembleton, L. W., Cogan, N. O., & Forster, J. W. (2013). St AMPP: An R package for calculation of genetic differentiation and structure of mixed‐ploidy level populations. Molecular ecology resources, 13(5), 946-952.
Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. evolution, 1358-1370.
Winter, D., Green, P., Kamvar, Z., & Gosselin, T. (2017). mmod: modern measures of population differentiation (Version 1.3.3).
## Acknowledgements
We thank the authors of `hierfstat`, `mmod`, `StAMPP`, and all of the package dependencies. They provided inspiration for `PopGenHelpR` and their commitment to open science made it possible to develop and benchmark our package.