-
Notifications
You must be signed in to change notification settings - Fork 6
/
dd-paper.Rmd
598 lines (480 loc) · 24.2 KB
/
dd-paper.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
---
title: 'Distance decay: assessing existing and novel functional forms for modeling uptake of active transport'
author: "Robin Lovelace & David Lovelace"
output:
word_document: default
html_document:
keep_md: yes
toc: yes
pdf_document:
fig_caption: yes
toc: yes
bibliography: references.bib
---
```{r, include=FALSE}
# title: 'Distance decay in active travel: an exploration of functional forms
# for modelling individual and aggregate level data'
# Researchers using the *dd* terminology have gone beyond mere verbal
# descriptions of the relationship between trip distance and frequency.
pkgs <- c("gdata", "dplyr", "tidyr", "ggmap", "grid", "png", "knitr")
lapply(pkgs, library, character.only = T)
opts_knit$set(root.dir = "../")
opts_chunk$set(echo = FALSE)
```
# Abstract
\clearpage
# Introduction
Trips of short distances tend to be more frequent than trips of long distances.
This is a fundamental feature of transport behaviour, observed in the majority of
transport systems worldwide. The concept is neatly captured by the term
*distance decay* (*dd*) which means literally that
the proportion of trips of any given type tends to *decay*
to zero as distance tends to infinity.
```{r, echo=FALSE}
# Because speed
# is limited, trips of infinite distance would take an
# infinite amount of time, so
# *dd* can be seen as a universal law. It is the purpose of this paper to review
# existing
```
*Distance decay* --- along with the synonymous *deterence function* [@simini_universal_2012]
--- builds on older terms expressing the same fundamental truth,
such as the 'friction of distance' and the 'gravity model' (described in the
subsequent section).
Unlike these established terms, $dd$ encourages quantitative
exploration of *how* the proportion of trips
($p$) declines with distance ($d$) [@iacono_models_2008],
beyond the simple observation that the
slope is negative beyond a certain threshold distance.
A resergence of interest in the distance-frequency relationship has
revisited and refined and in some cases sought to overturn pre-existing
(and often context-specific) formulations, such as
$p = \alpha e^{\beta d}$ [@wilson_family_1971] and $p = \alpha d^{\beta}$
[@fotheringham_spatial_1981].
```{r, echo=FALSE}
# Look into changing this to a partial variable
# Convert ds in xs
# $$
# \frac{\delta p}{\delta d} < 0
# $$
# where $d$ is distance (km), $X$ is a set of explanatory variables, $tr$
# is the trip rate, expressed either in absolute numbers or as a proportion of
# all trips for a given trip distance and $e$ is stochastic error.
```
This equation can be refined.
Subscripts can be added to the terms, representing
how distance decay changes in response to a range of variables.
These variables operate at a range of levels.
The type of trip (e.g. time of day,
purpose), the characteristics of the people making the trip
(e.g. age and sex) and the location and physical surroundings of the
trip (e.g. hilliness, transport infrastructure) have all been found to
affect the shape of distance decay curves [@fingleton_beyond_2005; @fotheringham_spatial_1981].
Critically for understanding transport systems is the *mode*, or 'method'
of transport. Mode refers to the
choice of 'vehicle' (e.g. walking, cycling, car driving) and has been found,
unsurprisingly, to dramatically affect
the speed travelled during and amount of effort required for trips of different
distances. The *dd* concept is especially applicable to
active travel modes due to physiological limits on human mechanical power
output and the slow speed of these modes.
```{r, echo=FALSE}
# Todo: add some theoretical dd curves here
```
# The distance decay literature
A variety of terms have been used to express
the idea underlying the $dd$ concept outlined in this paper.
These including the 'first law of geography', the 'friction of distance'
and 'the gravity model'. It is worth reviewing these terms before describing
work refering to $dd$ directly, to put the term in its wider historical
context. The final part of this section reviews recent literature
explicitly using the term 'distance decay' to explore the concept
## The gravity model
The 'gravity model' of movement patterns
helped to quantify and generalise early incarnations if *dd* [@zipf_p1_1946].
This rule (it is sometimes referred to as the 'gravity law')
suggests that travel incentives are roughly analogous to
Newtonian gravitation [@ravenstein_law_1885]. The resulting formulae imply that
the total movement between two places
($T_{ij}$ between origin $i$ and destination $j$) is proportional
to the product of their populations ($m_i$ and $n_j$),
divided by a power of the distance between them:
$$
T_{ij} = \frac{m_i n_j}{d^\beta}
$$
as it has sometimes been called has been a rich source of theoretical and methodological advance in many fields, primarily
urban modelling but also in fields as diverse as highway planning [@jung_gravity_2008],
national transport of minerals [@Zuo2013]
and spatial epidemiology [@balcan_multiscale_2009].
## The radiation model
Despite dissenting voices --- including the statement that "a strict gravity
model simply did not work" for modelling urban systems and that some subsequent
refinements to the gravity model were "fudge factors" [@wilson_land-use/transport_1998] ---
the gravity model has been one of the dominant tools for understanding
urban mobility over the past 100 years [@masucci2013gravity]. A recent
development in this field has been the 'radiation model', which has been
found to fit travel to work and other flow data well
[@simini_universal_2012]. This new formula for estimating flow rates between geographic
zones is interesting in its ommission of distance as an explicit explanatory
variable. Instead, the radiation model uses the
number 'intervening opportunities' ($IO$)
as a proxy for $dd$ the denominator to estimate flow:
$$
dd \approx (m_i+s_{ij})(m_i+n_j+s_{ij})
$$
```{r, echo=FALSE}
# a concept that is familiar to conventional transportation planning practice as an
# integral part of methods to model the distribution of trips
```
A recent study compared the parameter-free radiation model against the
gravity model on a large intra-city (London) dataset on commuting. It was
found that neither model produced a satisfactory fit with the data,
leading to the conclusion that "commuting at the city scale still lacks a
valid model and that further research is required to understand
the mechanism behind urban mobility" [@masucci2013gravity].
The 'first law of geography' and the related concept of the 'friction
of distance' are alternative yet closely related (i.e. not mutually
exclusive) terms for exploring $dd$ th
## The first law of geography
Tobler's famous **first law of geography** states that
"everything is related to everything
else, but near things are more related than distant things" [@tobler_computer_1970].
The phrase implies that interaction between things declines
with distance without specifying how or why. Clearly,
the increased frequency of communications and transport between places that
are close to each other can help explain spatial autocorrelation
[@miller_toblers_2004-1]. However, unlike $dd$,
the 'first law' says little about the way in which
relatedness between entities declines with distace.
Moreover, in a world of accelerating globalisation
under the auspices of the ongoing 'digital revolution', the relevance of the
'first law' to the system-level processes it was proposed to explain has come under scrutiny [@westlund_brief_2013].
```{r, echo=FALSE}
# and related quantitative measures of Tobler's law
```
To overcome this issue, the scope of this paper is limited to transportation
of people and goods, opposed to immaterial communication.
This simplifies the issue and makes it more tangible.
Due to fundament physical limits on the efficiency with which matter
can be moved [@mackay_sustainable_2009], and a limited supply of energy
(especially pertinent in an era of resource over-extraction and
climate change), such physical transport
will always be limited to some degree by distance.
This notion of travel frequency tending to zero as distance tends to
infinity, present in the 'friction of distance' terminology, is also
encapsulated by the more recent phrase 'distance decay' used in this paper.
## The friction of distance
The concept of the 'friction of distance' has been in use for over 100
enjoying steady (albeit slowing) growth in use until the early 2000s
(Fig. 1). The term helps to explain why Tobler's Law is true,
implying that it is more *difficult* (e.g. in terms of energy use)
for distant things to interact. 'Friction' is thus defined as
"the difficulty of moving a volume (passengers or goods)"
[@Muvingi2012].
The friction of distance is predominantly used as a synonym for $dd$,
and to some extent the slowing use of the term illustrated in Fig. 1
can be seen as a consequence of the latter simply replacing the former.
@cliff_evaluating_1974, for example, refer to "the friction of distance
parameter in gravity models" whereas @griffith_explorations_1980 and @McCall2006
refer to 'distance decay' in spatial interaction and gravity models
respectively: the terms are largely interchangeable.
However, the question of precisely *how* the metaphorical friction
increases with distance has been tackled to a lesser extent in
the (generally older) literature using the term. The recent $dd$ literature,
by contrast, is largely focussed this question, as
described in the subsequent section.
## Recent distance decay literature
As illustrated in Fig. 1, $dd$ has grown rapidly as a term in the academic
literature over the past 50 years, compared with the well-established
"gravity model" terminology. By the 1970s, it seems that
$dd$ overtook the "friction of distance" amongst transport researchers.
Although Tobler's 'First Law' of geography has gained rapid acceptance
since its inception in the 1970s, it has a far lower (by a factor
of 5) rate of use than $dd$ in the transport literature. In other words,
a quantitative review of the literature demonstrates that $dd$ is an
up-and-coming term in transport studies. This section does not attempt
a full overview of the 10,000+ articles using the $dd$
terminology in the academic literature, instead focussing on a few key
and highly cited works that helped set the agenda for $dd$ research.
"friction of distance" terms. (which tends to assign importance to interaction
between places rather than the impact of distance)
```{r, echo=FALSE}
brt <- theme(axis.text.x = element_text(angle=20, color = "black"),
panel.background = element_rect(fill = "white"),
panel.grid.major.y = element_line(color = "grey", linetype=3))
ddt <- read.csv("input-data/dd-terms-scholar.csv")
ggplot(ddt) +
geom_line(aes(x = Decade, y = Frequency, color = Term, group = Term)) +
scale_y_log10() + brt +
ylab("Number of 'transport' papers in Google Scholar")
```
# Defining distance decay
This section moves from verbal definitions of *dd* towards formal definitions
which enable the functional form of distance decay curves to be derived using
regression analysis. Within the general understanding that trips become less
frequent with distance (beyond some threshold limit) there are
various ways of describing the dependent variable that *dd* functions seek to
explain. *dd* can be understood as:
- the absolute
number of trips expected for any given distance band ($db$):
$f(d) = T_{db}$
- the proportion of *all* trips that are made for a given distance band:
$f(d) = T_{db} / T$
- the proportion of trips *within a given distance band* made by a particular trip
type (e.g. walking): \newline $f(d) = Twalk_{db} / T_{db}$
Of these definitions the second is the most generalisable, being a proportion
($0 \geq p \leq 1$)
that tends to zero with increasing distance for any mode of transport.
The third definition of *dd* also estimates a proportion with the same constraints,
but is less generalisable: the result will not always tend to zero
(the proportion of trips by air travel, for example, will tend to one for
large trip distances). The first and simplest definition is the
least generalisable as it is highly dependent on the total amount of travel.
For each of these definitions the *functional form* of the *dd* equation
can take many different forms. These are explored in the next section.
<!-- We use a variety of datasets for this study. Primarily, we use data on distance -->
<!-- vs *clc* (current level of cycling) derived from the census (Fig. 1). -->
```{r, fig.cap="The relationship between distance and current level of cycling amongst 8 different groups."}
f <- "data-sources/150304_SegmentationCompare_8cat.xlsx"
nts_seg <- read.xls(f, sheet = 2) # load in the .xlsx file with dist:pcycle values
nts_seg <- rename(nts_seg, dist = X) # rename the "X" column as dist (miles)
ntss <- gather(nts_seg, segment, clc, -dist)
p1 <- ggplot(aes(x = dist), data = ntss) + geom_line(aes(y = clc)) +
facet_wrap(~ segment)
```
# Functional forms of distance decay
[@martinez_new_2013] identified 4 main functional forms commonly used in the
literature to characterise distance decay curves:
- Exponential functions, $e^{\beta x}$ [e.g. @wilson_family_1971]
- Power functions, $x^{\beta}$ [e.g. @fotheringham_spatial_1981]
- Tanner functions, $x^{\beta_1}e^{\beta_2 x}$ [e.g. @openshaw_optimal_1977]
- Box-Cox functions, $exp(\beta \frac{x^{\gamma} - 1}{\gamma})$ when the parameter $\gamma \neq 0$ and $x^{\beta}$ when $\gamma = 0$
In addition to these, @martinez_new_2013 advocated their own distance decay function, a modified version of the generalised logistic or 'Richards' function:
$$
f(x) = \frac{1}{(1 + Qe^{-B(x - M)})^{1/v}}
$$
Each of the 4 parameters in this model can be altered to ensure optimal
fit and each has a separate meaning:
- $B$ is analogous to $\beta$ in the other functions
- $Q$ represents the intercept
- $v$ affects the skewness of the curve
- $M$ is the *x* value of maximum growth
## A linear model
The simplest model to fit to the data is a linear model. For illustrative
purposes we will fit a linear model to the data presented in Fig. 1,
with different intercepts and gradients for each of the groups (Fig. 2):
```{r, fig.cap="Linear model (red lines) fitted to the data, with intercept and gradient parameters estimated for each group."}
# nb: the '*' means estimate all par
m1 <- lm(clc ~ dist * segment, data = ntss)
ntss$m1 <- m1$fitted.values
p1 + geom_line(aes(y = m1$fitted.values), color = "red")
```
The intercepts and gradients for each group are presented in table TODOxx:
```{r, include=FALSE}
summary(m1)
```
The above numbers are equations that describe the relationship between
distance and *clc* for each group. In the `Mal_Young_NC` group, for example,
$$
clc ~ (0.0941 - 0.0682) + (-0.00377 + 0.00301) * dist
$$
Let's double-check this makes sense:
```{r, fig.cap="Demonstration of the meaning of the parameters produced by the 'summary()' function for the linear model (m1) for the group 'Mal_Young_NC'.", fig.height=3, fig.width=4}
mync <- filter(ntss, segment == "Mal_Young_NC")
mync$m1 <- (0.0941 - 0.0682) + (-0.00377 + 0.00301) * mync$dist
plot(mync$dist, mync$clc)
lines(mync$dist, mync$m1)
```
Considering the linear model is so simple, an adjusted R squared value of
0.58 is not bad!
Now we will progress to fit slightly more complicated polynomial models.
## Cubic polynomial models
The results of the cubic models are displayed in Fig. 4.
```{r, fig.cap="Different versions of the cubic polynomial distance decay function fitted to the data with per-covariate parameters estimated for linear, square, cubic and all power terms (red, green, blue and yellow lines respectively)."}
m2 <- lm(clc ~ dist * segment + I(dist^2) + I(dist^3), data = ntss)
m3 <- lm(clc ~ dist + I(dist^2) * segment + I(dist^3), data = ntss)
m4 <- lm(clc ~ dist + I(dist^2) + I(dist^3) * segment, data = ntss)
m5 <- lm(clc ~ dist * segment + I(dist^2) * segment + I(dist^3) *
segment, data = ntss)
p1 + geom_line(aes(y = m2$fitted.values), color = "red") +
geom_line(aes(y = m3$fitted.values), color = "green") +
geom_line(aes(y = m4$fitted.values), color = "blue") +
geom_line(aes(y = m5$fitted.values), color = "yellow", size = 3, alpha = 0.5)
```
Note that there are different ways to fit parameters to the model: we can change
one parameter value for every group, or we can change many. In the finat model
presented in yellow if Fig. 4, we changed all 4 parameters in response to every
group. Thus we have calculated 32 parameter values! This is not a problem: we
can extract each formula from the coefficients. Lets extract them for the
`Fem_Old_NC` group, for example:
```{r}
c5 <- coefficients(m5)
c5[grep("Fem_Old_NC", names(c5))]
```
```{r, echo=FALSE, eval=FALSE}
# Which model fits best
summary(m1)
summary(m2)
summary(m3)
summary(m4)
summary(m5)
```
Of the cubic models fitted, the one with 32 parameters (8 for each parameter in
the general model) fits by far the best with the data, with an adjusted R-squared
value of 0.89.
From the preceeding analysis, it is clear that a 4 parameter polynomial model
fits sufficiently well for modelling: after all, we are interested in a simple
way to increase the update of cycling not *fit curves to the current rate of cycling*.
However, let's exploring how much better the curve can fit our data,
which are admitedly noisy for the small groups.
## Cubic polynomial with cube and square root terms
The model fit (in terms of adjusted R-squared) is improved slightly by adding
sqare-root, cubic-root and square- cubic-root terms, from 0.89 to 0.91 (0.914,
0.913, and 0.917 respectively). Interestingly,
the model fit barely changes between square-root, cube-root or cube- and square-root
versions of the model and introduces some
unexpected inflection points (wiggles in the red line) in the fitted
curve (Fig. 5). This implies that a 6 parameter model is overkill
and unnecessarily complex. Can the problems of overfitting and overcomplexity
be resolved by using a different distance decay function?
That is the subject of the next section.
```{r, fig.cap="6 parameter cubic polynomial model with cube and square root terms", fig.height=3, fig.width=4}
m6 <- lm(clc ~ dist * segment + I(dist^2) * segment + I(dist^3) *
segment + I(dist^0.333) * segment, data = ntss)
m7 <- lm(clc ~ dist * segment + I(dist^2) * segment + I(dist^3) *
segment + I(dist^0.5) * segment, data = ntss)
m8 <- lm(clc ~ dist * segment + I(dist^2) * segment + I(dist^3) *
segment + I(dist^0.5) * segment + I(dist^0.333) + segment, data = ntss)
p1 + geom_line(aes(y = m8$fitted.values), color = "red")
# summary(m6) ; summary(m7); summary(m8)
length(coefficients(m6))
```
## Fitting to the log of pcycle
Because of the non-linearity evident in distance decay, fitting to the log
of the cycling level may be appropriate. A problem with this approach is that
it fails to capture the fact that the rate of cycling can peak with a distance
above 0. This could be problematic with respect to cycling's interaction with
walking but, as long there is a sufficient number of short trips in the
"training" data, this should not be a serious problem. A potential solution
would be to cap the probability of cycling at a specific level. Using
the log-square root function in a logit model could also solve this issue.
```{r, echo=FALSE}
# TODO: fit to clc = log(d)
# pcycle = a * exp(-k * d)
```
$$
clc = \alpha e^{\beta d}
$$
$$
log(clc) = log(\alpha) + \beta d
$$
This fits reasonably well with the data...
$$
clc = \alpha e^{\beta d + \gamma d^{1/2}}
$$
$$
log(clc) = log(\alpha) + \beta d + \gamma d^{1/2}
$$
```{r, fig.cap="Log-linear approaches to distance decay with 2 and 3 parameter models (red and green lines respectively)."}
ntss$clc[ntss$clc == 0] <- 0.0001
m11 <- lm(log(clc) ~ dist * segment, data = ntss)
m12 <- lm(log(clc) ~ dist * segment + I(dist^2) * segment, data = ntss, )
p1 + geom_line(aes(y = exp(m11$fitted.values)), color = "red") +
geom_line(aes(y = exp(m12$fitted.values)), color = "green") + ylim(c(0,0.15))
```
```{r, echo=FALSE}
# Testing what happens when the distance goes very high
xtest <- mutate(ntss, dist = dist^2)
predm12 <- predict(object = m12, newdata = xtest)
pred12 <- lm(log(clc) ~ dist * segment + I(dist^0.5) * segment, data = xtest)
p12 <- glm(log(clc) ~ dist * segment + I(dist^0.5) * segment, data = ntss)
p1 + geom_line(aes(y = exp(m11$fitted.values)), color = "red") +
geom_line(aes(y = exp(predm12)), color = "green") + ylim(c(0,0.15))
p1 + geom_line(aes(y = exp(m11$fitted.values)), color = "red") +
geom_line(aes(x = dist^2, y = exp(p12$fitted.values)), color = "green") + ylim(c(0,0.15))
```
As can be seen from the output of the log-linear-square-root model,
the parameters change in a predictable way to changes in the shape of the distance
decay curve:
```{r, include=FALSE}
summary(m12)
```
```{r, echo=FALSE, eval=FALSE}
summary(m11) # model fit - not shown
summary(m12) # model fit - not shown
summary(m13) # model fit - not shown
# m14 <- lm(log(clc) ~ dist + I(dist^2) * segment, data = ntss)
# summary(m14) # model fit - not shown
# geom_line(aes(y = m13$m$fitted()), color = "blue")
# Models not understood
# m12 <- lm(clc ~ -1 + dist / I(exp(dist)) * segment, data = ntss)
# m11 <- lm(clc ~ dist / I(exp(dist)) * segment, data = ntss)
# m12 <- lm(log(clc) ~ dist + I(exp(dist)) * segment, data = ntss)
# m13 <- lm(log(clc) ~ dist + I(log(dist)) * segment, data = ntss)
```
As shown in Fig. 7, the parameters of the 'log linear square-root' are meaningful.
The intercept, log-lin and log-sqrt terms affect the intercept, short-term distance
decay and longer-term distance decay respectively. The function fits the census flow data quite well.
```{r, fig.cap="Impacts of changing the parameters of the 'log-linear-square-root' function. The green lines represent increases of 0.25 times the the original values derived from the Leeds dataset and the red lines represent -0.1 times the original value.", echo=FALSE}
grid.raster(readPNG("figures/log-sqrt-params.png"))
```
## Exponential decay
A feature present in all relationships between frequency and distance
of trip is decay to zero as distance increases to infinity. This is especially
the case with active travel modes such as cycling. Therefore some kind of
exponential decay function may be suitable. This section fits various
functions that have an exponential decay term to the data to see which perform
well.
The simplest type of exponential decay model is one in which a linear
term (dominant in over short distances) is combined with an exponential decay:
$$
clc = \frac{\beta_1 dist}{exp(\beta_2)}
$$
Where each $\beta$ term is estimated for each group, as with the twin parameter
linear model illustrated Fig. 1. We can fit this to the data using
*non-linear regression* (e.g. by using `nls()` in R).
```{r}
od_all = readRDS("~/npct/pct-outputs-regional-R/commute/msoa/cambridgeshire/l.Rds")@data %>% as_data_frame()
od = select(od_all, id, all, bicycle, d = rf_dist_km) %>%
mutate(., pcycle = bicycle / all)
dd_exp1 = function(x, a1, a2, a0 = 0) {
px = a1 * x
ex = exp(a2 * x)
px / ex + a0
}
measure_fit <- function(par, mod, data) {
if(length(par) == 1) {
diff <- data$pcycle - mod(data$d, par[1])
} else if(length(par) == 2) {
diff <- data$pcycle - mod(data$d, par[1], par[2])
} else if(length(par) == 3) {
diff <- data$pcycle - mod(data$d, par[1], par[2], par[3])
} else {
diff <- data$pcycle - mod(data$d, par[1], par[2], par[3], par[4])
}
sqrt(mean(diff ^ 2))
}
par = c(2, 1.5)
fit = optim(par, fn = measure_fit, data = od, mod = dd_exp1)
plot(od$d, dd_exp1(od$d, fit$par[1], fit$par[2]))
```
# Characterising distance decay curves
From the preceding analysis the log-linear square root equation (*LSQR*) is the
preferred *dd* functional form, of the options tested so far. The subsequent
analysis therefore uses *LSQR* as the basis for exploring the key feature of
*dd* curves, although the the same analysis could be applied to alternative
functions.
# Applications
Using the functional forms for distance decay as an input to
predict binary outcomes is relatively simple.
"We can transform the output of a linear regression to be
suitable for probabilities by using a logit link
function" [@manning_logistic_2007]:
$$
logit(p) = ln \left( \frac{p}{1 - p} \right)
$$
We can use this knowledge to fit the aformentioned models to raw NTS
data, with "Cycle trip" as a binary variable, for example using
`family = "binomial"` in R's `glm()` function.
# Discussion and conclusions
# References