---
title: "Using spatsoc in social network analysis"
author: "Alec Robitaille, Quinn Webber and Eric Vander Wal"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    number_sections: false
    toc: false
vignette: >
  %\VignetteIndexEntry{Using spatsoc for social network analysis}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---
```{r knitropts, include = FALSE}
knitr::opts_chunk$set(message = TRUE,
warning = FALSE,
eval = FALSE,
echo = TRUE)
```
`spatsoc` can be used in social network analysis to generate gambit of the group format data from GPS relocation data, perform data stream randomization and generate group by individual matrices.
Gambit of the group format data is generated using the grouping functions:
* `group_times`
* `group_pts`
* `group_lines`
* `group_polys`
Data stream randomization is performed using the `randomizations` function.
Group by individual matrices are generated using the `get_gbi` function.
# Generate gambit of the group data
spatsoc provides users with one temporal (`group_times`) and three spatial (`group_pts`, `group_lines`, `group_polys`) functions to generate gambit of the group data from GPS relocations. Users can consider spatial grouping at three different scales combined with an appropriate temporal grouping threshold. The gambit of the group data is then used to generate a group by individual matrix and build the network.
## 1. Load packages and prepare data
`spatsoc` expects a `data.table` for all `DT` arguments and date time columns to be formatted `POSIXct`.
```{r}
## Load packages
library(spatsoc)
library(data.table)
library(asnipe)
library(igraph)
## Read data as a data.table
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))
## Cast datetime column to POSIXct
DT[, datetime := as.POSIXct(datetime)]
## Calculate the year of the relocation
DT[, yr := year(datetime)]
```
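Before grouping, it can help to confirm that the input meets these expectations. The following is a minimal sketch on a made-up table (the `toy` object and its values are hypothetical, not the package data). Setting the time zone explicitly is one way to avoid depending on the session default.

```{r}
library(data.table)

## Hypothetical toy data standing in for GPS relocations
toy <- data.table(
  ID = c('A', 'B', 'A'),
  X = c(0, 10, 20),
  Y = c(0, 10, 20),
  datetime = c('2016-11-01 00:00:54', '2016-11-01 00:01:10',
               '2017-01-05 12:00:00')
)

## Cast the character column to POSIXct with an explicit time zone
toy[, datetime := as.POSIXct(datetime, tz = 'UTC')]

## Check: a data.table with a POSIXct date time column
stopifnot(is.data.table(toy), inherits(toy$datetime, 'POSIXct'))

## Year of each relocation
toy[, yr := year(datetime)]
toy
```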
Next, we will group relocations temporally with `group_times` and spatially with one of `group_pts`, `group_lines` or `group_polys`. Note: the spatial grouping functions are mutually exclusive, so select only one at a time.
## 2. a) `group_pts`
Point based grouping calculates the distance between relocations in each timegroup. The relevant temporal and spatial grouping thresholds depend on the species and study system. In this case, relocations within 5 minutes and 50 meters are grouped together.
```{r}
## Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
## Spatial groups
group_pts(
DT,
threshold = 50,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup'
)
```
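To build intuition for distance-based grouping, here is a conceptual sketch in base R on made-up coordinates. It assumes a chain rule (individuals linked through intermediate neighbours share a group) and uses single-linkage clustering as a stand-in. It illustrates the idea and does not describe how `group_pts` is implemented.

```{r}
## Hypothetical relocations for four individuals in one timegroup (meters)
xy <- matrix(
  c(0, 0,
    30, 0,
    60, 0,
    500, 500),
  ncol = 2, byrow = TRUE,
  dimnames = list(c('A', 'B', 'C', 'D'), c('X', 'Y'))
)

## Pairwise Euclidean distances
d <- dist(xy)

## Single-linkage clustering cut at the distance threshold:
## individuals connected through a chain of neighbours closer than
## the threshold end up in the same group
grp <- cutree(hclust(d, method = 'single'), h = 50)
grp
```

Here A and C are 60 m apart but are both within 50 m of B, so all three share a group, while D is alone.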
## 2. b) `group_lines`
Line based grouping measures the intersection of (optionally buffered) trajectories for each individual in each timegroup. Longer temporal thresholds are used to measure, for example, intersecting daily trajectories.
```{r, eval = FALSE}
# UTM zone for relocations
utm <- '+proj=utm +zone=36 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs'
## Group relocations by julian day
group_times(DT, datetime = 'datetime', threshold = '1 day')
## Group lines for each individual and julian day
group_lines(
DT,
threshold = 50,
projection = utm,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup',
sortBy = 'datetime'
)
```
## 2. c) `group_polys`
Polygon based grouping generates home ranges using `adehabitatHR` and measures their intersection or proportional overlap. Longer temporal thresholds are used to create seasonal, monthly or yearly home ranges.
```{r, eval = FALSE}
# UTM zone for relocations
utm <- '+proj=utm +zone=36 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs'
## Option 1: area = FALSE and home range intersection 'group' column added to DT
group_polys(
DT,
area = FALSE,
hrType = 'mcp',
hrParams = list(percent = 95),
projection = utm,
id = 'ID',
coords = c('X', 'Y')
)
## Option 2: area = TRUE
# results must be assigned to a new variable
# data.table returned has ID1, ID2 and proportion and area overlap
areaDT <- group_polys(
DT,
area = TRUE,
hrType = 'mcp',
hrParams = list(percent = 95),
projection = utm,
id = 'ID',
coords = c('X', 'Y')
)
```
# Build observed network
Once we've created groups using `group_times` and one of the spatial grouping functions, we can generate a group by individual matrix.
The following code chunk showing `get_gbi` can be used for outputs from any of `group_pts`, `group_lines` or `group_polys(area = FALSE)`. For the purpose of this vignette however, we will consider the outputs from `group_pts` ([2. a)](#a-group_pts)) for the following code chunk.
Note: we show this example creating the group by individual matrix and network for only 2016 to illustrate how `spatsoc` can be used for simpler data with no splitting of temporal or spatial subgroups (e.g.: yearly, population). See the random network section for how to use `spatsoc` in social network analysis for multi-year or other complex data.
## 3. `get_gbi`
```{r}
## Subset DT to only year 2016
subDT <- DT[yr == 2016]
## Generate group by individual matrix
# group column generated by spatsoc::group_pts
gbiMtrx <- get_gbi(DT = subDT, group = 'group', id = 'ID')
```
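To show what a group by individual matrix looks like, here is a small sketch that builds one by hand with `data.table::dcast`. The `toy` data and object names are hypothetical, and this is an illustration of the format rather than the internals of `get_gbi`.

```{r}
library(data.table)

## Hypothetical gambit of the group data: one row per individual per group
toy <- data.table(
  ID = c('A', 'B', 'A', 'C', 'B', 'C'),
  group = c(1, 1, 2, 2, 3, 3)
)

## Cast to wide: one row per group, one column per individual,
## counting rows so that presence is 1 and absence is 0
wide <- dcast(toy, group ~ ID, value.var = 'ID', fun.aggregate = length)

## Keep the individual columns and convert to a matrix
gbi <- as.matrix(wide[, c('A', 'B', 'C'), with = FALSE])
rownames(gbi) <- wide$group
gbi
```

Each row is a group and each column an individual: group 1 contains A and B, group 2 contains A and C, and group 3 contains B and C.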
Note: `spatsoc::get_gbi` is identical in function to `asnipe::get_group_by_individual`, but is more efficient (some benchmarks measuring >10x improvements) thanks to `data.table::dcast`.
## 4. `asnipe::get_network`
Next, we can use `asnipe::get_network` to build the observed social network. Ensure that the argument "data_format" is "GBI". Use other arguments that are relevant to your analysis. Here, we calculate a simple ratio index (SRI).
```{r}
## Generate observed network
net <- get_network(gbiMtrx,
data_format = "GBI",
association_index = "SRI")
```
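As a rough illustration of what a simple ratio index measures, the sketch below computes it by hand in base R from a made-up group by individual matrix. It assumes each row is one sampling period, so the index reduces to the number of groups containing both individuals divided by the number of groups containing either. This is a simplification for intuition and not a substitute for `asnipe::get_network`.

```{r}
## Hypothetical group by individual matrix: 4 groups, 3 individuals
gbi <- matrix(
  c(1, 1, 0,
    1, 1, 0,
    1, 0, 1,
    0, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c('A', 'B', 'C'))
)

## Number of groups in which each pair was observed together
together <- crossprod(gbi)

## Number of groups in which each individual was observed
n <- diag(together)

## Simple ratio index: groups together / groups where either was seen
sri <- together / (outer(n, n, '+') - together)
diag(sri) <- 0
sri
```

In this toy matrix, A and B are together in 2 of the 4 groups where either occurs (0.5), A and C in 1 of 3, and B and C never co-occur (0).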
# Data stream randomization
Three types of data stream randomization are provided by `spatsoc`'s `randomizations` function:
* step: randomizes identities of relocations between individuals within each time step.
* daily: randomizes identities of relocations between individuals within each day.
* trajectory: randomizes daily trajectories within individuals ([Spiegel et al. 2016](https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.12553)).
The results of `randomizations` must be assigned. The function returns the `id` and `datetime` columns provided (and anything provided to `splitBy`). In addition, columns 'observed' and 'iteration' are returned indicating observed rows and which iteration rows correspond to (where 0 is the observed).
As with spatial grouping functions, these methods are mutually exclusive. Pick one `type` and rebuild the network after randomization.
Note: the `coords` argument is only required for trajectory type randomization, since after randomizing with this method, the 'coords' are needed to redo spatial grouping (with `group_pts`, `group_lines` or `group_polys`).
## 5. a) `type = 'step'`
`'step'` randomizes identities of relocations between individuals within each time step. The `datetime` argument expects an integer group created by `group_times`. The `group` argument expects the column name of the group generated from the spatial grouping functions.
Three columns are returned when `type = 'step'` along with `id`, `datetime` and `splitBy` columns:
* 'randomID' - randomly selected ID from IDs within each time step
* 'observed' - observed rows (TRUE/FALSE)
* 'iteration' - which iteration rows correspond to (0 is observed)
```{r}
# Calculate year column to ensure randomization only occurs within years since data spans multiple years
DT[, yr := year(datetime)]
## Step type randomizations
# providing 'timegroup' (from group_times) as datetime
# splitBy = 'yr' to force randomization only within year
randStep <- randomizations(
DT,
type = 'step',
id = 'ID',
group = 'group',
coords = NULL,
datetime = 'timegroup',
iterations = 3,
splitBy = 'yr'
)
```
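The idea behind step randomization can be sketched on made-up data with plain `data.table`: identities are shuffled among the rows of each time step, so every time step keeps the same set of individuals. The `toy` data are hypothetical and this is a conceptual sketch, not the code used by `randomizations`.

```{r}
library(data.table)

## Hypothetical relocations: three individuals in each of two timegroups
toy <- data.table(
  ID = rep(c('A', 'B', 'C'), times = 2),
  timegroup = rep(1:2, each = 3),
  group = c(1, 1, 2, 3, 4, 4)
)

## Shuffle identities among the rows of each timegroup;
## sample.int avoids the special case of sample() on a single value
set.seed(1)
toy[, randomID := ID[sample.int(.N)], by = timegroup]
toy
```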
## 5. b) `type = 'daily'`
`'daily'` randomizes identities of relocations between individuals within each day. The `datetime` argument expects a datetime `POSIXct` format column.
Four columns are returned when `type = 'daily'` along with `id`, `datetime` and `splitBy` columns:
* 'randomID' - randomly selected ID for each day
* 'jul' - julian day
* 'observed' - observed rows (TRUE/FALSE)
* 'iteration' - which iteration rows correspond to (0 is observed)
```{r}
# Calculate year column to ensure randomization only occurs within years since data spans multiple years
DT[, yr := year(datetime)]
## Daily type randomizations
# splitBy = 'yr' to force randomization only within year
randDaily <- randomizations(
DT,
type = 'daily',
id = 'ID',
group = 'group',
coords = NULL,
datetime = 'datetime',
splitBy = 'yr',
iterations = 20
)
```
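Daily randomization differs from step randomization in that each individual receives a single random identity for a whole day. A conceptual sketch on made-up data follows. The `toy` and `swaps` objects are hypothetical and the approach is an illustration, not the implementation in `randomizations`.

```{r}
library(data.table)

## Hypothetical relocations: three individuals, two relocations on each of two days
stamps <- c('2016-11-01 06:00:00', '2016-11-01 18:00:00',
            '2016-11-02 06:00:00', '2016-11-02 18:00:00')
toy <- data.table(
  ID = rep(c('A', 'B', 'C'), each = 4),
  datetime = as.POSIXct(rep(stamps, times = 3), tz = 'UTC')
)

## Julian day of each relocation
toy[, jul := yday(datetime)]

## One random identity per individual per day
set.seed(1)
swaps <- unique(toy[, .(ID, jul)])
swaps[, randomID := ID[sample.int(.N)], by = jul]

## Attach the daily random identity to every relocation
toy[swaps, on = .(ID, jul), randomID := i.randomID]
toy
```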
## 5. c) `type = 'trajectory'`
`'trajectory'` randomizes daily trajectories within individuals ([Spiegel et al. 2016](https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.12553)). The `datetime` argument expects a datetime `POSIXct` format column.
Five columns are returned when `type = 'trajectory'` along with `id`, `datetime` and `splitBy` columns:
* random date time ("random" prefixed to *datetime* argument)
* 'jul' - observed julian day
* 'observed' - observed rows (TRUE/FALSE)
* 'iteration' - which iteration rows correspond to (0 is observed)
* 'randomJul' - random julian day relocations are swapped to from observed julian day
```{r}
# Calculate year column to ensure randomization only occurs within years since data spans multiple years
DT[, yr := year(datetime)]
## Trajectory type randomization
randTraj <- randomizations(
DT,
type = 'trajectory',
id = 'ID',
group = NULL,
coords = c('X', 'Y'),
datetime = 'datetime',
splitBy = 'yr',
iterations = 20
)
```
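Trajectory randomization keeps identities fixed and instead shuffles which day each daily trajectory falls on, within each individual. The sketch below illustrates this on made-up data. Column names mirror those described above, but the objects are hypothetical and this is not the implementation in `randomizations`.

```{r}
library(data.table)

## Hypothetical relocations: two individuals, one relocation on each of three days
stamps <- c('2016-11-01 06:00:00', '2016-11-02 07:00:00',
            '2016-11-03 08:00:00')
toy <- data.table(
  ID = rep(c('A', 'B'), each = 3),
  datetime = as.POSIXct(rep(stamps, times = 2), tz = 'UTC')
)
toy[, jul := yday(datetime)]

## Shuffle which day each daily trajectory is assigned to, within individual
set.seed(1)
days <- unique(toy[, .(ID, jul)])
days[, randomJul := jul[sample.int(.N)], by = ID]
toy[days, on = .(ID, jul), randomJul := i.randomJul]

## Shift each relocation to its random day, keeping the time of day
toy[, randomdatetime := datetime + (randomJul - jul) * 86400]
toy
```

Because the coordinates travel with each relocation while its date changes, spatial groups have to be recalculated afterwards.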
# Build random network
Once we've randomized the data stream with `randomizations`, we can build the random network.
We will use the `get_gbi` function directly when `type` is either 'step' or 'daily'. For `type = 'trajectory'`, we will recalculate spatial groups with one of `group_pts`, `group_lines`, `group_polys` for the randomized data. In this case, the example shows `group_pts`.
Since we want to create a group by individual matrix for each random iteration (and in this case, each year), we will use `mapply` to work on subsets of the randomized data.
Note: building the random networks depends on the `type` used and therefore, the following chunks are mutually exclusive. Use the one that corresponds to the randomization type you used above.
## 6. a) `type = 'step'`
`randomizations` with `type = 'step'` returns a 'randomID' which should be used instead of the observed 'ID' to generate the group by individual matrix.
After `get_gbi`, we use `asnipe::get_network` to build the random network.
```{r}
## Create a data.table of unique combinations of iteration and year, excluding observed rows
iterYearLs <- unique(randStep[!(observed), .(iteration, yr)])
## Generate group by individual matrix
# for each combination of iteration number and year
# 'group' generated by spatsoc::group_pts
# 'randomID' used instead of observed ID (type = 'step')
gbiLs <- mapply(FUN = function(i, y) {
  get_gbi(randStep[iteration == i & yr == y],
          'group', 'randomID')
  },
  # full column name used rather than relying on partial matching of `$iter`
  i = iterYearLs$iteration,
  y = iterYearLs$yr,
  SIMPLIFY = FALSE
)
## Generate a list of random networks
netLs <- lapply(gbiLs, FUN = get_network,
data_format = "GBI", association_index = "SRI")
```
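The `mapply` pattern used here can be tried in isolation on made-up data. In this sketch (with hypothetical `toy`, `combos` and `subsets` objects) each list element is simply the subset for one combination of iteration and year. Naming the elements is an optional addition that can make results easier to trace.

```{r}
library(data.table)

## Hypothetical randomized output: two iterations in each of two years
toy <- data.table(
  iteration = rep(c(1, 2, 1, 2), each = 2),
  yr = rep(c(2016, 2017), each = 4),
  randomID = c('A', 'B', 'B', 'A', 'A', 'C', 'C', 'A')
)

## Unique combinations to iterate over
combos <- unique(toy[, .(iteration, yr)])

## One list element per combination, here simply the subset itself
subsets <- mapply(
  FUN = function(i, y) toy[iteration == i & yr == y],
  i = combos$iteration,
  y = combos$yr,
  SIMPLIFY = FALSE
)

## Name each element so results can be traced back to their subset
names(subsets) <- combos[, paste(yr, iteration, sep = '_')]
names(subsets)
```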
## 6. b) `type = 'daily'`
`randomizations` with `type = 'daily'` returns a 'randomID' which should be used instead of the observed 'ID' to generate the group by individual matrix.
After `get_gbi`, we use `asnipe::get_network` to build the random network.
In this case, we will generate a fake column representing a "population" to show how we can translate the `mapply` chunk above to three (or more) variables.
```{r}
## Generate fake population
randDaily[, population := sample(1:2, .N, replace = TRUE)]
## Create a data.table of unique combinations of iteration, year, and population, excluding observed rows
iterYearLs <- unique(randDaily[!(observed), .(iteration, yr, population)])
## Generate group by individual matrix
# for each combination of iteration number, year and population
# 'group' generated by spatsoc::group_pts
# 'randomID' used instead of observed ID (type = 'daily')
gbiLs <- mapply(FUN = function(i, y, p) {
  get_gbi(randDaily[iteration == i &
                      yr == y & population == p],
          'group', 'randomID')
  },
  i = iterYearLs$iteration,
  y = iterYearLs$yr,
  p = iterYearLs$population,
  SIMPLIFY = FALSE
)
## Generate a list of random networks
netLs <- lapply(gbiLs, FUN = get_network,
data_format = "GBI", association_index = "SRI")
```
## 6. c) `type = 'trajectory'`
`randomizations` with `type = 'trajectory'` returns a random date time which should be used instead of the observed date time to generate random gambit of the group data.
First, we pass the randomized data to `group_times` using the random date time for `datetime`, then recalculate spatial groups (here with `group_pts`) within each iteration.
After `get_gbi`, we use `asnipe::get_network` to build the random network.
```{r}
## Randomized temporal groups
# 'datetime' is the randomdatetime produced by randomizations(type = 'trajectory')
group_times(randTraj, datetime = 'randomdatetime', threshold = '5 minutes')
## Randomized spatial groups
# 'iteration' used in splitBy to ensure only points within each iteration are grouped
group_pts(randTraj, threshold = 50, id = 'ID', coords = c('X', 'Y'),
timegroup = 'timegroup', splitBy = 'iteration')
## Create a data.table of unique combinations of iteration and year, excluding observed rows
iterYearLs <- unique(randTraj[!(observed), .(iteration, yr)])
## Generate group by individual matrix
# for each combination of iteration number and year
# 'group' generated by spatsoc::group_pts
# 'ID' used since datetimes were randomized within individuals
gbiLs <- mapply(FUN = function(i, y) {
  get_gbi(randTraj[iteration == i & yr == y],
          'group', 'ID')
  },
  i = iterYearLs$iteration,
  y = iterYearLs$yr,
  SIMPLIFY = FALSE
)
## Generate a list of random networks
netLs <- lapply(gbiLs, FUN = get_network,
data_format = "GBI", association_index = "SRI")
```
# Network metrics
Finally, we can calculate some network metrics. Please note that there are many ways of interpreting, analyzing and measuring networks, so this will simply show one option.
## 7. Calculate observed network metrics
To calculate observed network metrics, use the network (`net`) produced in [4.](#asnipeget_network) from 2016 data.
```{r}
## Generate graph
g <- graph.adjacency(net, 'undirected',
diag = FALSE, weighted = TRUE)
## Metrics for all individuals
observed <- data.table(
centrality = evcent(g)$vector,
strength = graph.strength(g),
degree = degree(g),
ID = names(degree(g)),
yr = subDT[, unique(yr)]
)
```
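For intuition about two of these metrics, strength and degree can be computed by hand from a weighted, undirected association matrix with a zero diagonal. The sketch below uses a made-up matrix and base R only. The object names are hypothetical and chosen to avoid masking the `igraph` functions.

```{r}
## Hypothetical association matrix (e.g. SRI values) for three individuals
toyNet <- matrix(
  c(0,    0.5, 0.25,
    0.5,  0,   0,
    0.25, 0,   0),
  ncol = 3, byrow = TRUE,
  dimnames = list(c('A', 'B', 'C'), c('A', 'B', 'C'))
)

## Strength: sum of the association weights of each individual
strengthByHand <- rowSums(toyNet)

## Degree: number of individuals with a non-zero association
degreeByHand <- rowSums(toyNet > 0)

data.frame(ID = rownames(toyNet), strengthByHand, degreeByHand)
```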
## 8. Calculate random network metrics
With the list of random networks from [6.](#build-random-network), we can generate a list of graphs with `igraph::graph.adjacency` (for example) and calculate random network metrics.
This example uses the `netLs` created by [6. a)](#a-type-step-1) which was split by year and iteration.
```{r}
## Generate graph and calculate network metrics
mets <- lapply(seq_along(netLs), function(n) {
g <- graph.adjacency(netLs[[n]], 'undirected',
diag = FALSE, weighted = TRUE)
data.table(
centrality = evcent(g)$vector,
strength = graph.strength(g),
degree = degree(g),
ID = names(degree(g)),
iteration = iterYearLs$iteration[[n]],
yr = iterYearLs$yr[[n]]
)
})
## Metrics for all individuals across all iterations and years
random <- rbindlist(mets)
## Mean values for each individual and year
meanMets <- random[, lapply(.SD, mean), by = .(ID, yr),
.SDcols = c('centrality', 'strength', 'degree')]
```
## 9. Compare observed and random metrics
Instead of calculating observed and random metrics separately (shown in [7.](#calculate-observed-network-metrics) and [8.](#calculate-random-network-metrics)), we can calculate metrics for both at the same time and compare.
This chunk expects the outputs from [5. a)](#a-type-step), skipping steps 6.-8.
Note: by removing the `!(observed)` subset from `randStep` performed in [6. a)](#a-type-step-1), we will include observed rows where `iteration == 0`. This will return a `gbiLs` that contains matrices for both the observed data (iteration 0) and the random iterations, so their metrics end up in the same `data.table`.
```{r}
## Create a data.table of unique combinations of iteration and year, including observed and random rows
iterYearLs <- unique(randStep[, .(iteration, yr)])
## Generate group by individual matrix
# for each combination of iteration and year
# 'group' generated by spatsoc::group_pts
# 'randomID' used instead of observed ID (type = 'step')
gbiLs <- mapply(FUN = function(i, y) {
get_gbi(randStep[iteration == i & yr == y],
'group', 'randomID')
},
i = iterYearLs$iteration,
y = iterYearLs$yr,
SIMPLIFY = FALSE
)
## Generate a list of random networks
netLs <- lapply(gbiLs, FUN = get_network,
data_format = "GBI", association_index = "SRI")
## Generate graph and calculate network metrics
mets <- lapply(seq_along(netLs), function(n) {
g <- graph.adjacency(netLs[[n]], 'undirected',
diag = FALSE, weighted = TRUE)
data.table(
centrality = evcent(g)$vector,
strength = graph.strength(g),
ID = names(degree(g)),
iteration = iterYearLs$iteration[[n]],
yr = iterYearLs$yr[[n]]
)
})
## Observed and random for all individuals across all iterations and years
out <- rbindlist(mets)
## Split observed and random
out[, observed := iteration == 0]
## Mean values for each individual and year, by observed/random
meanMets <- out[, lapply(.SD, mean), by = .(ID, yr, observed),
.SDcols = c('centrality', 'strength')]
```
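One possible way to summarize such a comparison is to ask, for each individual, how often the random values are at least as large as the observed value. The sketch below uses made-up metric values. The `toy` and `comparison` objects and the choice of summary are assumptions for illustration, not a recommended test.

```{r}
library(data.table)

## Hypothetical metrics: iteration 0 is observed, iterations 1-4 are random
toy <- data.table(
  ID = rep(c('A', 'B'), each = 5),
  iteration = rep(0:4, times = 2),
  strength = c(0.9, 0.2, 0.4, 0.3, 1.0,
               0.5, 0.6, 0.7, 0.4, 0.8)
)
toy[, observed := iteration == 0]

## For each individual: observed value, mean of random values, and the
## proportion of random values at least as large as the observed value
comparison <- toy[, .(
  observedStrength = strength[observed],
  meanRandomStrength = mean(strength[!observed]),
  propRandomGreater = mean(strength[!observed] >= strength[observed])
), by = ID]
comparison
```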