Niche Models
============
Introduction
------------
This R Markdown file performs the steps needed to calculate niche
overlap between the extant ungulates. We implement two different approaches
to characterize the niche of each species:
1. Maxent species distribution models, and
2. the Outlying Mean Index (OMI)
The following libraries and scripts need to be loaded:
```{r libraries}
library(raster, quietly = T)
library(knitr, quietly = T)
library(maxent, quietly = T)
library(maps, quietly = T)
library(rJava, quietly = T)
library(maptools, quietly = T)
library(jsonlite, quietly = T)
library(caret, quietly = T)
library(ENMeval, quietly = T)
library(repmis, quietly = T)
library(CoordinateCleaner, quietly = T)
library(dismo, quietly = T)
library(virtualspecies, quietly = T)
library(sp, quietly = T)
library(rgeos, quietly = T)
library(ape, quietly = T)
library(adehabitatMA, quietly = T)
library(ade4, quietly = T)
library(SDMTools, quietly = T)
library(factoextra, quietly = T)
library(ecospat, quietly = T)
library(phytools, quietly = T)
library(rphylopic, quietly = T)
library(whisker, quietly = T)
library(rgdal, quietly=T)
library(dplyr, quietly=T)
# our local code. the goal is that this will eventually be properly portable
# code that lives inside the ./script folder of the repository
source("../script/MaxEnt_function.R")
```
In addition, here we define the global variables that we will reuse throughout
the code:
```{r globals}
# this is the location of the root directory of the git repository in the local
# file system. normally you would be running the code from within rstudio, using
# a project that is initialized within that directory, so the working directory
# should automatically be set.
REPO_HOME <- paste(getwd(), "/../", sep = "")
```
Environmental data
------------------
The environmental data used in this research are based on climatic variables,
topography, and soil characteristics. Climatic information about the present
was extracted from the widely used Bioclim dataset, which includes 19
bioclimatic layers. The datasets contain information such as 'precipitation
in the driest quarter' or 'maximum temperatures of the coldest month' and are
constructed based on monthly remote sensing data between 1950 and 2000 (Hijmans
et al., 2005, Title et al., 2018). The dataset can be downloaded directly with
the getData() function from the `raster` package. The spatial resolution `res`
can be set to 30 arc-seconds or to 2.5, 5, or 10 arc-minutes.
```{r getdata}
# This function from the raster package loads bioclimatic layers. It first
# attempts to do this from files in data/GIS/wc5, but if these are not found it
# will attempt to download them and cache them locally. Since the files are
# small enough to commit to github we will normally have these files already.
# This could change if we move to a resolution that is finer than 5 arcminutes,
# as the layer files will then grow above 15MB.
gis.layers <- raster::getData(
"worldclim",
var = "bio",
res = 5,
path = paste(REPO_HOME, "/data/GIS", sep = ""),
download = T
)
```
The other environmental datasets that we used are the new ENVIREM variables that
give additional climatic information to the Bioclim datasets. We used median
elevation variables from the Harmonized World Soil Database (HWSD), which are
based on NASA’s Shuttle Radar Topographic Mission to calculate worldwide slope
and aspect. We used indirect height measures such as slope and aspect because
the height variables are directly correlated with the temperature Bioclim
datasets. To capture soil characteristics, we used organic carbon, pH (CaCl2), bulk
density and clay percentage datasets obtained from the land-atmosphere
interaction research group at Sun Yat-sen University.
```{r add_layers}
# The locations of the TIFF files with the layers described directly above
files.names <- list.files(paste(REPO_HOME, "/data/GIS/5_deg", sep = ""))
# Turn the file names into layer names: strip the prefix, which might include
# the resolution, and strip the file extension
gis.layers.names <- files.names
gis.layers.names <- gsub('current_5arcmin_', '', gis.layers.names)
gis.layers.names <- gsub('\\.tif$', '', gis.layers.names)
# Combine the layer names with those we've already read from BIOCLIM
gis.layers.names <- c(names(gis.layers),gis.layers.names)
# Iterate over files
for (i in 1:length(files.names)) {
# Stack with previously read layers
gis.layers <- stack(
gis.layers,
# Read as raster
raster(
# Construct file name
paste(REPO_HOME, "/data/GIS/5_deg/", files.names[i], sep = "")
)
)
}
# Apply all names
names(gis.layers) <- gis.layers.names
rm(gis.layers.names, files.names, i)
```
Taxon data
----------
In a project that includes many taxa, such as this one, it is helpful if we can
work incrementally by having separate input and output files per taxon. To set
this up we first read in the list of taxa from `/data/filtered/taxa.txt` and
create corresponding output directories:
```{r load_taxa}
# read listing
taxa <- scan(paste(REPO_HOME, "/data/filtered/taxa.txt", sep = ""), sep = "\n", what = character())
# make output directories, if needed
for (i in 1:length(taxa)) {
taxon.dir.name <- sprintf("%s/results/per_species/%s/", REPO_HOME, taxa[i])
if (!dir.exists(taxon.dir.name)) {
dir.create(taxon.dir.name, recursive = T)
}
}
rm(i, taxon.dir.name)
```
When calculating the species distribution models, a dataframe containing all
the occurrence points is needed; it is created and filled below:
```{r combine_occurences}
# create an empty dataframe
occurrences.df = data.frame(
matrix(
vector(), 0, 4,
dimnames=list(
c(),
c("gbif_id", "taxon_name", "decimal_latitude", "decimal_longitude")
)
),
stringsAsFactors=F
)
# loop to add the occurrence data to the empty dataframe
for ( i in 1:length(taxa) ) {
csv.file <- sprintf("%s/data/filtered/%s.csv", REPO_HOME, taxa[i])
csv <- read.csv(csv.file, header = T)
occurrences.df <- rbind(occurrences.df, csv)
}
rm(csv.file,csv,i)
```
Below, the raw occurrence data are plotted on a map, which goes to
`/results/per_species/<taxon>/occurrences.png`:
```{r plot_points}
data(wrld_simpl)
for (i in 1:length(taxa)) {
# construct the name of the occurrences file for the current taxon
taxon <- taxa[i]
occ.map.file <- sprintf("%s/results/per_species/%s/occurrences.png", REPO_HOME, taxon)
# generate the map
if (!file.exists(occ.map.file)) {
taxon.name <- gsub( "_", " ", taxon)
# select the longitude and latitude of the focal taxon from the data frame
taxon.occurrences <- dplyr::filter(occurrences.df, taxon_name == taxon.name)
lon <- dplyr::select(taxon.occurrences, decimal_longitude)$decimal_longitude
lat <- dplyr::select(taxon.occurrences, decimal_latitude)$decimal_latitude
# plot a simple map ±5 degrees long and lat around the occurrences
png(file = occ.map.file)
plot(
wrld_simpl,
axes = TRUE,
xlim = c(min(lon) - 5, max(lon) + 5),
ylim = c(min(lat) - 5, max(lat) + 5),
col = "gray93",
main = sprintf("Filtered occurrences for %s", taxon.name)
)
points(lon, lat, col = "red", pch = 20, cex = 0.75)
box()
dev.off()
# clean up
rm(taxon.name,taxon.occurrences,lat,lon)
}
rm(occ.map.file, taxon)
}
rm(i, wrld_simpl)
```
Maxent: species distribution modeling
-------------------------------------
For this research we used the Maximum Entropy (MaxEnt) machine learning
algorithm version 3.3.3 to construct SDMs for all available species. Previous
research has demonstrated that the MaxEnt technique performs well when using
presence-only data to estimate the relationships between environmental
predictors and the occurrences of species (Elith et al., 2011, Phillips et al.,
2006, Tognelli et al., 2009, Conolly et al., 2012). This widely used machine
learning algorithm handles complex interactions between response and predictor
variables efficiently and works well even with few occurrence points (Elith et
al., 2011, Elith et al., 2006, Wisz et al., 2008).
Before the maxent model is constructed, the environmental rasters are cropped
to the extent of the occurrence points of the focal species plus a buffer
around that extent. Buffers that are too small can underestimate edge effects,
while buffers that are too large risk drowning favorable environmental
conditions in noise. We therefore set the width of the buffer in proportion to
the maximum distance between any two occurrence points, which accounts for both
widely and narrowly distributed species.
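The sketch below illustrates the idea; `lon` and `lat` are assumed to hold the
occurrence coordinates of the focal species, and the proportionality constant
is illustrative (the actual logic lives in `Maxent_function()`):
```{r crop_sketch, eval=FALSE}
# largest pairwise distance between occurrences (in degrees)
max.dist <- max(dist(cbind(lon, lat)))
# buffer width proportional to it; 0.5 is an illustrative constant
buffer <- 0.5 * max.dist
occ.extent <- extent(
  min(lon) - buffer, max(lon) + buffer,
  min(lat) - buffer, max(lat) + buffer
)
cropped.env <- crop(gis.layers, occ.extent)
```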
The cropped environmental data are checked for collinearity, because only
uncorrelated environmental raster layers should be used in the SDM (Dormann et
al., 2013, Raes & Aguirre-Gutierrez, 2018). To remove correlated layers, the
removeCollinearity function from the virtualspecies R package version 1.4-4 was
used (Leroy et al., 2016). Environmental variables with a Pearson correlation
coefficient above 0.7 were grouped and one variable per group was randomly
chosen, resulting in a raster stack of uncorrelated environmental rasters.
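A minimal sketch of this filtering step, assuming `cropped.env` is the cropped
stack from the previous sketch:
```{r collinearity_sketch, eval=FALSE}
# group layers whose pairwise Pearson's r exceeds 0.7 and keep one per group
selected.vars <- virtualspecies::removeCollinearity(
  cropped.env,
  multicollinearity.cutoff = 0.7,
  select.variables = TRUE, # return one randomly chosen layer name per group
  plot = FALSE
)
uncorrelated.env <- raster::subset(cropped.env, selected.vars)
```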
Afterwards the occurrence dataset is split into k-fold partitions: a training
dataset containing 75% of the data and a test dataset containing the remaining
25%. The maxent model is constructed with the maxent function from the dismo R
package (Hijmans & Elith, 2013). The function extracts abiotic environmental
data for the training occurrence locations and for 1000 randomly sampled
background locations, resulting in a maxent model object that can be used to
predict which other locations are suitable.
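A hedged sketch of the partitioning and fitting, with `occ.xy` standing in for
a two-column matrix of occurrence coordinates:
```{r maxent_fit_sketch, eval=FALSE}
fold <- kfold(occ.xy, k = 4) # 4 folds: 75% training, 25% testing
train <- occ.xy[fold != 1, ]
test <- occ.xy[fold == 1, ]
# 1000 randomly sampled background locations within the clipped environment
bg <- randomPoints(uncorrelated.env, n = 1000)
spmod <- dismo::maxent(x = uncorrelated.env, p = train, a = bg)
```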
To assess model performance we used the area under the receiver operating
characteristic (ROC) curve, also known as the AUC, which is widely used to
estimate the discriminatory ability of models (Fourcade et al., 2014). The AUC
is interpreted as the probability that a randomly chosen presence point (the
location of a species occurrence) receives a higher predicted suitability than
a randomly chosen absence point (Raes & Aguirre-Gutierrez, 2018). For this
comparison, 1000 absence points were randomly sampled within the study area.
The test occurrence datasets were used to evaluate model performance with the
evaluate function of the dismo R package (Hijmans & Elith, 2013). Only SDMs
with an AUC value higher than 0.7 were used in the further steps, since such
models are generally accepted as useful (Swets et al., 2000).
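Continuing the sketch above, the evaluation step could look as follows:
```{r maxent_eval_sketch, eval=FALSE}
eval <- dismo::evaluate(p = test, a = bg, model = spmod, x = uncorrelated.env)
eval@auc > 0.7 # TRUE for the models we retain
```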
The script below loops through each species in the list, assesses the niche
model, and classifies it into one of two options:
1. "valid"
2. "invalid"
The Maxent function used in this loop can be viewed and adapted in our GitHub
repository in this [script](../script/MaxEnt_function.R).
```{r maxent1}
# first we create two empty lists: one for models with low accuracy (training
# AUC at or below the null-model threshold computed below) and one for models
# with high accuracy (training AUC above that threshold)
list_species_model_low_accuracy <- list()
list_species_model_high_accuracy <- list()
# iterate over the taxa; indices 135 and 151 are skipped
for (i in c(1:134, 136:150, 152:154)) {
name <- taxa[i]
valid.model.file <- sprintf("%s/results/per_species/%s/valid_maxent_model.rda", REPO_HOME, name)
invalid.model.file <- sprintf("%s/results/per_species/%s/invalid_maxent_model.rda", REPO_HOME, name)
# check if we have the model cached
if (!file.exists(valid.model.file) && !file.exists(invalid.model.file)) {
message(name)
# read occurrences
csv.file <- sprintf("%s/data/filtered/%s.csv", REPO_HOME, name)
species_occurrence <- read.csv(csv.file, header = T)
# Run maxent:
# - clip environment to ± 9 degrees lat/long relative to the occurrences
# - remove collinear layers
# - prepare k-folds cross-validation partition
# - run dismo::maxent()
species_model <- Maxent_function(species_occurrence, gis.layers)
# select species model from the output list
spmod <- species_model[[1]]
# select the training AUC (row 5 of the maxent results matrix)
trainingAUC <- spmod@results[[5, 1]]
## null model
# - select occurrence points within the clip environment
# - remove the focal species from the dataframe with all species
# - randomly sample as many points as the focal species has occurrences
# - run the maxent model 100 times
taxon.name <- gsub("_", " ", name)
occurrence_samples <- nrow(species_occurrence)
visited_areas <- subset(occurrences.df, occurrences.df$taxon_name != taxon.name)
modelenvironment <- species_model[[2]]
polygonextent <- species_model[[4]]
AUCvalues <- nullModel_without_spatial(visited_areas, modelenvironment, polygonextent, occurrence_samples, rep = 100)
auc <- sort(unlist(AUCvalues), decreasing = FALSE)
# the 95th of the 100 sorted null AUC values is the one-sided 95% threshold
confidence.interval <- auc[95]
# populate two stacks, one with the models with a low accuracy and one with a high accuracy
if (trainingAUC > confidence.interval) {
model <- species_model[[1]]
list_species_model_high_accuracy[[name]] <- model
save(model, file = valid.model.file)
}
else {
model <- species_model[[1]]
list_species_model_low_accuracy[[name]] <- model
save(model, file = invalid.model.file)
}
# valid model was previously constructed, load it
} else if (file.exists(valid.model.file)) {
env <- new.env()
nm <- load(valid.model.file, env)[[1]]
list_species_model_high_accuracy[[name]] <- env[[nm]]
# invalid model was previously constructed, load it
} else if (file.exists(invalid.model.file)) {
env <- new.env()
nm <- load(invalid.model.file, env)[[1]]
list_species_model_low_accuracy[[name]] <- env[[nm]]
}
}
```
```{r list_aucvalues}
# collect the training AUC values of the valid and invalid models into one CSV
output_AUC_valid <- data.frame(matrix(ncol = 3, nrow = length(list_species_model_high_accuracy)))
colnames(output_AUC_valid) <- c("taxon", "trainingAUC", "validation")
AUC.csv <- paste(REPO_HOME, "/results/maxent/AUCvalues.csv", sep = "")
for (i in 1:length(list_species_model_high_accuracy)) {
open_species_model <- list_species_model_high_accuracy[[i]]
name <- names(list_species_model_high_accuracy[i])
taxon.name <- gsub("_", " ", name)
trainingAUC <- open_species_model@results[[5, 1]]
output_AUC_valid[i,] <- c(taxon.name, trainingAUC, "valid")
}
output_AUC_invalid <- data.frame(matrix(ncol = 3, nrow = length(list_species_model_low_accuracy)))
colnames(output_AUC_invalid) <- c("taxon", "trainingAUC", "validation")
for (i in 1:length(list_species_model_low_accuracy)) {
open_species_model <- list_species_model_low_accuracy[[i]]
name <- names(list_species_model_low_accuracy[i])
taxon.name <- gsub("_", " ", name)
trainingAUC <- open_species_model@results[[5, 1]]
output_AUC_invalid[i,] <- c(taxon.name, trainingAUC, "invalid")
}
combined_auc <- rbind(output_AUC_invalid, output_AUC_valid)
write.csv(combined_auc, file = AUC.csv)
```
The models are now stacked in a list but can be viewed and examined as follows:
```{r plot_importance_and_response}
# Select the first model in the list
open_species_model <- list_species_model_high_accuracy[[1]]
# name of the dataset plotted
name <- paste("Variable importance for", names(list_species_model_high_accuracy[1]))
# plot the variable importance
plot(open_species_model, main = name)
# plot the response curve to see what the range of important values is per abiotic dataset
response(open_species_model)
# the model views can also be saved to files in a loop:
for (i in 1:length(list_species_model_high_accuracy)) {
open_model <- list_species_model_high_accuracy[[i]]
name <- names(list_species_model_high_accuracy[i])
name2 <- paste("Variable importance")
variable.map.file <- sprintf("%s/results/per_species/%s/valid_maxent_variable_importance.png", REPO_HOME, name)
if (!file.exists(variable.map.file)) {
png(file = variable.map.file)
plot(open_model, main = name2)
dev.off()
}
response.map.file <- sprintf("%s/results/per_species/%s/valid_maxent_response_curve.png", REPO_HOME, name)
if (!file.exists(response.map.file)) {
png(file = response.map.file, res = 80, units = "px")
response(open_model)
dev.off()
}
}
```
The outcome of the species distribution models can be used to predict which
areas are suitable for each species. The loop below creates prediction rasters
for the models with a high prediction accuracy.
```{r plot_suitability}
# combine all prediction rasters, either by doing the projection or by reading from a file
prediction_rasters <- stack()
taxon.names <- names(list_species_model_high_accuracy)
realms <- readOGR(paste(REPO_HOME, "/data/GIS/Realms/newRealms.shp", sep = ""))
realms <- spTransform(realms, CRS("+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"))
plot(realms)
# again skip indices 135 and 151
for (i in c(1:134, 136:150, 152:154)) {
# file name: load if exists, create otherwise
valid.model.prediction <- sprintf("%s/results/per_species/%s/valid_maxent_prediction.rda", REPO_HOME, taxon.names[i])
if (file.exists(valid.model.prediction)) {
# load file
message(sprintf("loading prediction for taxon %s from file %s", taxon.names[i], valid.model.prediction))
env <- new.env()
nm <- load(valid.model.prediction, envir = env)[[1]]
prediction_rasters <- stack(prediction_rasters, env[[nm]])
}
else {
message(sprintf("computing prediction for taxon %s, will write to file %s", taxon.names[i], valid.model.prediction))
open_species_model <- list_species_model_high_accuracy[[i]]
# select non correlated layers from model environment in the world environment
noncorrelated <- names(open_species_model@presence)
subsetworldEnv <- subset(gis.layers, noncorrelated)
# make projection of the whole earth based on the maxent SDM
species.pred <- dismo::predict(open_species_model, subsetworldEnv)
# limit the predictions to zoogeographic regions (Realms)
occurrence.csv <- sprintf("%s/data/filtered/%s.csv", REPO_HOME, taxon.names[i])
occ <- read.csv(occurrence.csv, header = T)
bindlonglat <- as.data.frame(cbind(occ[, c("decimal_longitude", "decimal_latitude")]))
points <- bindlonglat
points$decimal_longitude <- as.numeric(as.character(points$decimal_longitude))
points$decimal_latitude <- as.numeric(as.character(points$decimal_latitude))
coordinates(points) <- ~ decimal_longitude + decimal_latitude
proj4string(points) <- CRS("+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs")
polys.sub <- realms[!is.na(sp::over(realms, sp::geometry(points))), ]
polys.sub <- gUnaryUnion(polys.sub)
# set all the values outside the polygon to 0
maskedprediction <- mask(species.pred, polys.sub, updatevalue = 0, updateNA = TRUE)
# remove the 10% lowest prediction values under the occurrence points
projection.values <- extract(maskedprediction, points, method = "simple")
sort.projection.values <- sort(unlist(projection.values), decreasing = FALSE)
percentage.number <- round(0.1 * nrow(occ), digits = 0)
threshold.value <- sort.projection.values[percentage.number]
maskedprediction[maskedprediction < threshold.value] <- 0
# save file, stack projection contents
save(maskedprediction, file = valid.model.prediction)
prediction_rasters <- stack(prediction_rasters, maskedprediction)
}
}
names(prediction_rasters) <- taxon.names[c(1:134, 136:150, 152:154)]
```
Now that we have the prediction rasters we can write them to files:
```{r write_predictions}
for (i in 1:length(names(prediction_rasters))) {
taxon.name <- names(prediction_rasters)[[i]]
prediction.map.file <- sprintf("%s/results/per_species/%s/prediction_map.png", REPO_HOME, taxon.name)
if (!file.exists(prediction.map.file)) {
png(file = prediction.map.file)
plot(prediction_rasters[[i]])
dev.off()
}
prediction.occurence.map.file <- sprintf("%s/results/per_species/%s/prediction_occurence_map.png", REPO_HOME, taxon.name)
if (!file.exists(prediction.occurence.map.file)) {
# select the correct occurrence dataset
csv.file <- sprintf("%s/data/filtered/%s.csv", REPO_HOME, taxon.name)
occ <- read.csv(csv.file, header = T)
lon <- occ$decimal_longitude
lat <- occ$decimal_latitude
png(file = prediction.occurence.map.file)
plot(prediction_rasters[[i]])
points(lon, lat, col = "red", pch = 20, cex = 0.2)
box()
dev.off()
}
}
```
Write a README file for each taxon to the GitHub repository:
```{r readme_files}
# generate a README.md per taxon from a whisker template
for ( i in 1:length(taxa)) {
name <- taxa[i]
readme.file <- sprintf('%s/results/per_species/%s/README.md', REPO_HOME, taxa[i])
if(!file.exists(readme.file)) {
taxa_remove <- gsub("_", " ", name)
template <-
'# {{taxa_name}}
![](image_taxa.png)
## Distribution of occurrence points
The following map shows the distribution of the filtered occurrence points for {{taxa_name}} used in the Maxent model.
![](occurrences.png)
## Variable importance
The variable importance graph shows the relative importance of the abiotic raster layers in the Maxent model for {{taxa_name}}.
![](valid_maxent_variable_importance.png)
## Response curve
The response curve graphs show the response of the Maxent model for {{taxa_name}} to different values in the abiotic raster layers.
![](valid_maxent_response_curve.png)
## Prediction map
The first map shows the predicted suitable areas on earth based on the niche preferences for {{taxa_name}} calculated in the Maxent model. The second map shows the suitable area map with the original occurrence points.
![](prediction_map.png)
![](prediction_occurence_map.png)
'
data <- list( taxa_name = taxa_remove)
text <- whisker.render(template, data)
writeLines(text, readme.file)
}
}
```
To assess whether the predicted suitable niche spaces differ between species, we
calculate their niche overlap. We use Schoener's D, since it has been suggested
to be the index best suited to maxent SDM outputs (Rödder & Engler, 2011). The
index is based on the model prediction maps and ranges from 0 (no overlap) to 1
(complete overlap).
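For reference: if $p_{X,i}$ and $p_{Y,i}$ are the suitability scores of species
$X$ and $Y$ in grid cell $i$, each surface normalized so that its scores sum to
1 over the study area, then Schoener's D is

$$D(p_X, p_Y) = 1 - \frac{1}{2} \sum_i \left| p_{X,i} - p_{Y,i} \right|$$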
```{r calc_dist}
# location of the triangular niche overlap matrix file
overlap.file <- paste(REPO_HOME, "/results/maxent/overlap.csv", sep = "")
if (!file.exists(overlap.file)) {
  # file doesn't exist, calculate Schoener's D and store the file
  overlap <- calc.niche.overlap(prediction_rasters, stat = "D")
write.table(overlap, file = overlap.file, sep = ",", quote = F)
} else {
# file exists, read it
overlap <- read.table(overlap.file, sep = ",")
}
```
Having calculated the Schoener's D overlap values, we can now convert them into
a symmetrical distance matrix (1 - D) and perform neighbor-joining clustering to
visualize the distances in niche space.
```{r cluster_dist}
dendrogram.file <- paste(REPO_HOME, "/results/maxent/dendrogram.tree", sep = "")
if (!file.exists(dendrogram.file)) {
# create a distance matrix, i.e. the inverse of the overlap
dmatrix <- as.dist(1 - overlap)
  # build an unrooted neighbor-joining tree
  tree <- nj(dmatrix)
write.tree(tree, file = dendrogram.file)
} else {
tree <- read.tree(file = dendrogram.file)
}
```
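The resulting tree can then be inspected directly in R, for example:
```{r plot_tree_sketch, eval=FALSE}
# draw the unrooted neighbor-joining tree of niche distances
plot(tree, type = "unrooted", cex = 0.5)
```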