documents/dd-paper.Rmd

---
title: 'Distance decay: assessing existing and novel functional forms for modeling uptake of active transport'
author: "Robin Lovelace & David Lovelace"
output:
  word_document: default
  html_document: 
    keep_md: yes
    toc: yes
  pdf_document:
    fig_caption: yes
    toc: yes
bibliography: references.bib
---

```{r, include=FALSE}
# title: 'Distance decay in active travel: an exploration of functional forms
  # for modelling individual and aggregate level data'

# Researchers using the *dd* terminology have gone beyond mere verbal
# descriptions of the relationship between trip distance and frequency.
pkgs <- c("gdata", "dplyr", "tidyr", "ggmap", "grid", "png", "knitr")
# attach the packages used in the analysis without printing the return values
invisible(lapply(pkgs, library, character.only = TRUE))
opts_knit$set(root.dir = "../")
opts_chunk$set(echo = FALSE)
```

# Abstract

\clearpage

# Introduction

Trips of short distances tend to be more frequent than trips of long distances.
This is a fundamental feature of transport behaviour, observed in the majority of
transport systems worldwide. The concept is neatly captured by the term
*distance decay* (*dd*) which means literally that
the proportion of trips of any given type tends to *decay*
to zero as distance tends to infinity. 

```{r, echo=FALSE}
# Because speed
# is limited, trips of infinite distance would take an
# infinite amount of time, so
# *dd* can be seen as a universal law. It is the purpose of this paper to review
# existing 
```


*Distance decay*, along with the synonymous *deterrence function* [@simini_universal_2012],
builds on older terms expressing the same fundamental truth,
such as the 'friction of distance' and the 'gravity model' (described in the
subsequent section). 
Unlike these established terms, $dd$ encourages quantitative
exploration of *how* the proportion of trips
($p$) declines with distance ($d$) [@iacono_models_2008],
beyond the simple observation that the
slope is negative beyond a certain threshold distance. 
A resurgence of interest in the distance-frequency relationship has
revisited, refined and in some cases sought to overturn pre-existing
(and often context-specific) formulations, such as
$p = \alpha e^{\beta d}$ [@wilson_family_1971] and $p = \alpha d^{\beta}$
[@fotheringham_spatial_1981].


```{r, echo=FALSE}
# Look into changing this to a partial variable 
# Convert ds in xs
# $$
# \frac{\delta p}{\delta d} < 0 
# $$
# where $d$ is distance (km), $X$ is a set of explanatory variables, $tr$
# is the trip rate, expressed either in absolute numbers or as a proportion of
# all trips for a given trip distance and $e$ is stochastic error.
```

These formulations can be refined.
Subscripts can be added to the terms to represent
how distance decay changes in response to a range of variables.
These variables operate at a range of levels.
The type of trip (e.g. time of day,
purpose), the characteristics of the people making the trip
(e.g. age and sex) and the location and physical surroundings of the
trip (e.g. hilliness, transport infrastructure) have all been found to
affect the shape of distance decay curves [@fingleton_beyond_2005; @fotheringham_spatial_1981].
Critical for understanding transport systems is the *mode*, or 'method',
of transport. Mode refers to the
choice of 'vehicle' (e.g. walking, cycling, car driving) and has been found,
unsurprisingly, to dramatically affect
the speed of travel and the amount of effort required for trips of different
distances. The *dd* concept is especially applicable to
active travel modes due to physiological limits on human mechanical power
output and the slow speed of these modes.

```{r, echo=FALSE}
# Todo: add some theoretical dd curves here
```

# The distance decay literature

A variety of terms have been used to express
the idea underlying the $dd$ concept outlined in this paper.
These include the 'first law of geography', the 'friction of distance'
and 'the gravity model'. It is worth reviewing these terms before describing
work referring to $dd$ directly, to put the term in its wider historical
context. The final part of this section reviews recent literature
explicitly using the term 'distance decay' to explore the concept.

## The gravity model

The 'gravity model' of movement patterns
helped to quantify and generalise early incarnations of *dd* [@zipf_p1_1946].
This rule (it is sometimes referred to as the 'gravity law')
suggests that travel incentives are roughly analogous to
Newtonian gravitation [@ravenstein_law_1885]. The resulting formulae imply that
the total movement between two places
($T_{ij}$ between origin $i$ and destination $j$) is proportional
to the product of their populations ($m_i$ and $n_j$),
divided by a power of the distance between them:

$$
T_{ij} \propto \frac{m_i n_j}{d_{ij}^\beta}
$$

The gravity model has been a rich source of theoretical and methodological advances in many fields, primarily
urban modelling but also in fields as diverse as highway planning [@jung_gravity_2008],
national transport of minerals [@Zuo2013]
and spatial epidemiology [@balcan_multiscale_2009].
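
The proportionality described above can be sketched in a few lines of R. This is a minimal illustration only: the function name `gravity_flow()`, the constant of proportionality `k` and all input values are assumptions made for the example and are not part of the analysis in this paper.

```r
# Hypothetical helper: unconstrained gravity model flows.
# m = origin populations, n = destination populations,
# d = matrix of distances (rows = origins, columns = destinations),
# beta = distance exponent, k = assumed constant of proportionality.
gravity_flow <- function(m, n, d, beta = 2, k = 1) {
  k * outer(m, n) / d^beta
}

m_toy <- c(1000, 5000)               # made-up origin populations
n_toy <- c(2000, 4000, 8000)         # made-up destination populations
d_toy <- matrix(c(1, 2, 4,
                  2, 1, 8), nrow = 2, byrow = TRUE)  # made-up distances (km)

t_toy <- gravity_flow(m_toy, n_toy, d_toy, beta = 2)
t_toy
```

With $\beta = 2$, doubling the distance between two places cuts the estimated flow to a quarter, all else being equal.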

## The radiation model

Despite dissenting voices --- including the statement that "a strict gravity
model simply did not work" for modelling urban systems and that some subsequent
refinements to the gravity model were "fudge factors" [@wilson_land-use/transport_1998] ---
the gravity model has been one of the dominant tools for understanding
urban mobility over the past 100 years [@masucci2013gravity]. A recent
development in this field has been the 'radiation model', which has been
found to fit travel to work and other flow data well
[@simini_universal_2012]. This new formula for estimating flow rates between geographic
zones is interesting in its omission of distance as an explicit explanatory
variable. Instead, the radiation model uses the
number of 'intervening opportunities' ($IO$)
as a proxy for $dd$ in the denominator used to estimate flow:

$$
dd \approx (m_i+s_{ij})(m_i+n_j+s_{ij})
$$

where $s_{ij}$ is the total population living no further from the origin $i$ than
the destination $j$ is, excluding the populations of $i$ and $j$ themselves.

```{r, echo=FALSE}
# a concept that is familiar to conventional transportation planning practice as an
# integral part of methods to model the distribution of trips 
```
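
To make the role of intervening opportunities concrete, the sketch below computes radiation model flows for a toy system of three zones. It assumes the flow equation as we understand it from @simini_universal_2012, in which the flow from $i$ to $j$ is the number of trips leaving $i$ multiplied by $m_i n_j$ and divided by the denominator above, with $s_{ij}$ taken as the population no further from $i$ than $j$ is (excluding $i$ and $j$). The function name `radiation_flow()`, the use of a single population vector for both origins and destinations, and the input values are hypothetical.

```r
# Hypothetical helper: radiation model flows between zones.
# pop = zone populations, d = square matrix of distances between zones,
# trips_out = total trips leaving each zone (assumed equal to pop here).
radiation_flow <- function(pop, d, trips_out = pop) {
  n_zones <- length(pop)
  flow <- matrix(0, n_zones, n_zones)
  for (i in seq_len(n_zones)) {
    for (j in seq_len(n_zones)) {
      if (i == j) next
      # s_ij: population no further from i than j is, excluding i and j
      inside <- d[i, ] <= d[i, j]
      inside[c(i, j)] <- FALSE
      s_ij <- sum(pop[inside])
      flow[i, j] <- trips_out[i] * pop[i] * pop[j] /
        ((pop[i] + s_ij) * (pop[i] + pop[j] + s_ij))
    }
  }
  flow
}

pop_toy  <- c(100, 200, 300)             # made-up zone populations
dist_toy <- as.matrix(dist(c(0, 1, 3)))  # zones placed on a line at 0, 1 and 3 km
rad_toy  <- radiation_flow(pop_toy, dist_toy)
rad_toy
```

Note that distance enters only through the ranking of zones used to calculate $s_{ij}$: moving the third zone from 3 km to 30 km would leave the estimated flows unchanged.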

A recent study compared the parameter-free radiation model against the
gravity model on a large intra-city (London) dataset on commuting. It was
found that neither model produced a satisfactory fit with the data,
leading to the conclusion that "commuting at the city scale still lacks a
valid model and that further research is required to understand
the mechanism behind urban mobility" [@masucci2013gravity].
The 'first law of geography' and the related concept of the 'friction
of distance' are alternative yet closely related (i.e. not mutually
exclusive) terms for exploring $dd$.

## The first law of geography

Tobler's famous **first law of geography** states that
"everything is related to everything
else, but near things are more related than distant things" [@tobler_computer_1970].
The phrase implies that interaction between things declines
with distance without specifying how or why. Clearly,
the increased frequency of communications and transport between places that
are close to each other can help explain spatial autocorrelation
[@miller_toblers_2004-1]. However, unlike $dd$,
the 'first law' says little about the way in which
relatedness between entities declines with distance.
Moreover, in a world of accelerating globalisation
under the auspices of the ongoing 'digital revolution', the relevance of the
'first law' to the system-level processes it was proposed to explain has come under scrutiny [@westlund_brief_2013].

```{r, echo=FALSE}
# and related quantitative measures of Tobler's law
```

To overcome this issue, the scope of this paper is limited to transportation
of people and goods, as opposed to immaterial communication.
This simplifies the issue and makes it more tangible.
Due to fundamental physical limits on the efficiency with which matter
can be moved [@mackay_sustainable_2009], and a limited supply of energy
(especially pertinent in an era of resource over-extraction and
climate change), such physical transport
will always be limited to some degree by distance.
This notion of travel frequency tending to zero as distance tends to
infinity, present in the 'friction of distance' terminology, is also
encapsulated by the more recent phrase 'distance decay' used in this paper.

## The friction of distance

The concept of the 'friction of distance' has been in use for over 100 years,
enjoying steady (albeit slowing) growth in use until the early 2000s
(Fig. 1). The term helps to explain why Tobler's Law is true,
implying that it is more *difficult* (e.g. in terms of energy use)
for distant things to interact. 'Friction' is thus defined as
"the difficulty of moving a volume (passengers or goods)"  
[@Muvingi2012].
The friction of distance is predominantly used as a synonym for $dd$,
and to some extent the slowing use of the term illustrated in Fig. 1
can be seen as a consequence of the latter simply replacing the former.
@cliff_evaluating_1974, for example, refer to "the friction of distance
parameter in gravity models" whereas @griffith_explorations_1980 and @McCall2006 
refer to 'distance decay' in spatial interaction and gravity models
respectively: the terms are largely interchangeable.
However, the question of precisely *how* the metaphorical friction 
increases with distance has been tackled to a lesser extent in
the (generally older) literature using the term. The recent $dd$ literature,
by contrast, is largely focussed on this question, as
described in the subsequent section.

## Recent distance decay literature

As illustrated in Fig. 1, $dd$ has grown rapidly as a term in the academic
literature over the past 50 years, compared with the well-established
"gravity model" (which tends to assign importance to interaction
between places rather than the impact of distance) and
"friction of distance" terms. By the 1970s, it seems that
$dd$ overtook the "friction of distance" amongst transport researchers.
Although Tobler's 'First Law' of geography has gained rapid acceptance
since its inception in the 1970s, it has a far lower (by a factor
of 5) rate of use than $dd$ in the transport literature. In other words,
a quantitative review of the literature demonstrates that $dd$ is an
up-and-coming term in transport studies. This section does not attempt
a full overview of the 10,000+ articles using the $dd$
terminology in the academic literature, instead focussing on a few key
and highly cited works that helped set the agenda for $dd$ research.

```{r, echo=FALSE}
brt <-  theme(axis.text.x = element_text(angle=20, color = "black"),
        panel.background = element_rect(fill = "white"),
        panel.grid.major.y = element_line(color = "grey", linetype=3))
ddt <- read.csv("input-data/dd-terms-scholar.csv")
ggplot(ddt) +
  geom_line(aes(x = Decade, y = Frequency, color = Term, group = Term)) +
  scale_y_log10() + brt +
  ylab("Number of 'transport' papers in Google Scholar")
```


# Defining distance decay

This section moves from verbal definitions of *dd* towards formal definitions
which enable the functional form of distance decay curves to be derived using
regression analysis. Within the general understanding that trips become less
frequent with distance (beyond some threshold limit) there are
various ways of describing the dependent variable that *dd* functions seek to
explain. *dd* can be understood as:

- the absolute
number of trips expected for any given distance band ($db$):
$f(d) = T_{db}$
- the proportion of *all* trips that are made for a given distance band:
$f(d) = T_{db} / T$
- the proportion of trips *within a given distance band* made by a particular trip
type (e.g. walking): \newline $f(d) = Twalk_{db} / T_{db}$

Of these definitions the second is the most generalisable, being a proportion
($0 \leq p \leq 1$)
that tends to zero with increasing distance for any mode of transport.
The third definition of *dd* also estimates a proportion with the same constraints,
but is less generalisable: the result will not always tend to zero
(the proportion of trips by air travel, for example, will tend to one for
large trip distances). The first and simplest definition is the
least generalisable as it is highly dependent on the total amount of travel.
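
The three definitions can be illustrated with a small simulated trip dataset. The sketch below is not based on the data used in this paper: the trip distances, the distance bands and the assumed probability of walking are all made up for the example.

```r
set.seed(1)
# Made-up individual trips: distance in km, with walking assumed to become
# less likely as distance increases
toy_trips <- data.frame(dist = rexp(500, rate = 0.2))
toy_trips$mode <- ifelse(runif(500) < exp(-0.5 * toy_trips$dist), "walk", "other")

# Assign each trip to a distance band (db)
toy_trips$db <- cut(toy_trips$dist, breaks = c(0, 2, 5, 10, Inf),
                    include.lowest = TRUE)

t_db    <- table(toy_trips$db)                             # definition 1: T_db
p_db    <- t_db / sum(t_db)                                # definition 2: T_db / T
walk_db <- table(toy_trips$db[toy_trips$mode == "walk"])   # walking trips per band
p_walk  <- walk_db / t_db                                  # definition 3: Twalk_db / T_db

round(cbind(t_db, p_db, p_walk), 3)
```

Only the second and third measures are bounded between 0 and 1, and only the second is guaranteed to sum to 1 across distance bands.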

For each of these definitions the *dd* equation
can take many different *functional forms*. These are explored in the next section.


<!-- We use a variety of datasets for this study. Primarily, we use data on distance -->
<!-- vs *clc* (current level of cycling) derived from the census (Fig. 1). -->

```{r, fig.cap="The relationship between distance and current level of cycling amongst 8 different groups."}
f <- "data-sources/150304_SegmentationCompare_8cat.xlsx"
nts_seg <- read.xls(f, sheet = 2) # load in the .xlsx file with dist:pcycle values
nts_seg <- rename(nts_seg, dist = X) # rename the "X" column as dist (miles)
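ntss <- gather(nts_seg, segment, clc, -dist) # reshape to long format: one row per distance-segment pair
p1 <- ggplot(ntss, aes(x = dist)) + geom_line(aes(y = clc)) +
  facet_wrap(~ segment)
p1 # print the base plot so that the figure and its caption are rendered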
```

# Functional forms of distance decay

@martinez_new_2013 identified 4 main functional forms commonly used in the
literature to characterise distance decay curves:

- Exponential functions, $e^{\beta x}$ [e.g. @wilson_family_1971]
- Power functions, $x^{\beta}$ [e.g. @fotheringham_spatial_1981]
- Tanner functions, $x^{\beta_1}e^{\beta_2 x}$ [e.g. @openshaw_optimal_1977]
- Box-Cox functions, $\exp(\beta \frac{x^{\gamma} - 1}{\gamma})$ when the parameter $\gamma \neq 0$ and $x^{\beta}$ when $\gamma = 0$

In addition to these, @martinez_new_2013 advocated their own distance decay function, a modified version of the generalised logistic or 'Richards' function:

$$
f(x) = \frac{1}{(1 + Qe^{-B(x - M)})^{1/v}}
$$

Each of the 4 parameters in this model can be altered to ensure optimal
fit and each has a separate meaning:

- $B$ is analogous to $\beta$ in the other functions
- $Q$ represents the intercept
- $v$ affects the skewness of the curve
- $M$ is the *x* value of maximum growth
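
For reference, the five functional forms listed above can be written as R functions. This is a sketch for illustration: the function names and the parameter values used to generate the curves are assumptions, not estimates from the data or from the cited papers.

```r
# Functional forms from the list above, written as functions of distance x
dd_exponential <- function(x, beta) exp(beta * x)
dd_power       <- function(x, beta) x^beta
dd_tanner      <- function(x, beta1, beta2) x^beta1 * exp(beta2 * x)
dd_boxcox      <- function(x, beta, gamma) {
  if (gamma == 0) x^beta else exp(beta * (x^gamma - 1) / gamma)
}
# Modified generalised logistic ('Richards') function
dd_richards    <- function(x, B, Q, v, M) 1 / (1 + Q * exp(-B * (x - M)))^(1 / v)

x_toy <- seq(0.5, 20, by = 0.5)  # made-up distances (km)
dd_curves <- data.frame(
  x           = x_toy,
  exponential = dd_exponential(x_toy, beta = -0.2),
  power       = dd_power(x_toy, beta = -1.5),
  tanner      = dd_tanner(x_toy, beta1 = 0.5, beta2 = -0.3),
  boxcox      = dd_boxcox(x_toy, beta = -1.5, gamma = 0.5),
  richards    = dd_richards(x_toy, B = -0.5, Q = 1, v = 1, M = 5)
)
head(dd_curves)
```

With these illustrative values the Tanner function rises before it decays, while the Richards function needs a negative $B$ to decline with distance.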


## A linear model

The simplest model to fit to the data is a linear model. For illustrative
purposes we will fit a linear model to the data presented in Fig. 1,
with different intercepts and gradients for each of the groups (Fig. 2):

```{r, fig.cap="Linear model (red lines) fitted to the data, with intercept and gradient parameters estimated for each group."}
# nb: the '*' means estimate main effects and their interaction,
# i.e. a separate intercept and gradient for each segment
m1 <- lm(clc ~ dist * segment, data = ntss) 
ntss$m1 <- m1$fitted.values
p1 + geom_line(aes(y = m1$fitted.values), color = "red")
```

The intercepts and gradients for each group are presented in table TODOxx:

```{r, include=FALSE}
summary(m1)
```

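The coefficients reported by `summary(m1)` define a linear equation relating
distance to *clc* for each group: the intercept and gradient of a group are the
baseline values plus the offsets estimated for that group.
In the `Mal_Young_NC` group, for example,

$$
clc \approx (0.0941 - 0.0682) + (-0.00377 + 0.00301) \times dist
$$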

Let's double-check this makes sense: 

```{r, fig.cap="Demonstration of the meaning of the parameters produced by the 'summary()' function for the linear model (m1) for the group 'Mal_Young_NC'.", fig.height=3, fig.width=4}
mync <- filter(ntss, segment == "Mal_Young_NC")
mync$m1 <- (0.0941 - 0.0682) + (-0.00377 + 0.00301)  * mync$dist
plot(mync$dist, mync$clc)
lines(mync$dist, mync$m1)
```
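
The same logic can be checked programmatically. The sketch below uses made-up data with two hypothetical groups (`A` and `B`) and assumes R's default treatment contrasts, under which the first factor level is the baseline. It shows that adding a group's offsets to the baseline coefficients reproduces the intercept and gradient obtained by fitting that group on its own.

```r
set.seed(42)
# Made-up data: two groups with known intercepts and gradients plus noise
toy <- data.frame(
  dist    = rep(1:20, times = 2),
  segment = factor(rep(c("A", "B"), each = 20))
)
toy$clc <- ifelse(toy$segment == "A",
                  0.10 - 0.004 * toy$dist,
                  0.03 - 0.001 * toy$dist) + rnorm(40, sd = 0.001)

toy_m <- lm(clc ~ dist * segment, data = toy)
b <- coef(toy_m)

# Group B parameters = baseline (group A) values + group B offsets
intercept_B <- b[["(Intercept)"]] + b[["segmentB"]]
gradient_B  <- b[["dist"]] + b[["dist:segmentB"]]

# Compare with a model fitted to group B alone
b_alone <- coef(lm(clc ~ dist, data = subset(toy, segment == "B")))
rbind(combined = c(intercept_B, gradient_B), alone = unname(b_alone))
```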


Considering the linear model is so simple, an adjusted R squared value of
0.58 is not bad! 

Now we will progress to fit slightly more complicated polynomial models.

## Cubic polynomial models

The results of the cubic models are displayed in Fig. 4.

```{r, fig.cap="Different versions of the cubic polynomial distance decay function fitted to the data with per-covariate parameters estimated for linear, square, cubic and all power terms (red, green, blue and yellow lines respectively)."}
m2 <- lm(clc ~ dist * segment + I(dist^2) + I(dist^3), data = ntss) 
m3 <- lm(clc ~ dist + I(dist^2) * segment + I(dist^3), data = ntss) 
m4 <- lm(clc ~ dist + I(dist^2) + I(dist^3) * segment, data = ntss) 
m5 <- lm(clc ~ dist * segment + I(dist^2) * segment + I(dist^3) * 
    segment, data = ntss) 
p1 + geom_line(aes(y = m2$fitted.values), color = "red") +
  geom_line(aes(y = m3$fitted.values), color = "green") +
  geom_line(aes(y = m4$fitted.values), color = "blue") +
  geom_line(aes(y = m5$fitted.values), color = "yellow", size = 3, alpha = 0.5)
```

Note that there are different ways to fit parameters to the model: we can change
one parameter value for every group, or we can change many. In the final model,
presented in yellow in Fig. 4, we changed all 4 parameters in response to every
group. Thus we have calculated 32 parameter values! This is not a problem: we
can extract each formula from the coefficients. Let's extract them for the
`Fem_Old_NC` group, for example:

```{r}
c5 <- coefficients(m5)
c5[grep("Fem_Old_NC", names(c5))]
```

```{r, echo=FALSE, eval=FALSE}
# Which model fits best
summary(m1)
summary(m2)
summary(m3)
summary(m4)
summary(m5)
```

Of the cubic models fitted, the one with 32 parameters (8 for each parameter in
the general model) fits the data best by far, with an adjusted R-squared
value of 0.89.

From the preceding analysis, it is clear that a 4 parameter polynomial model
fits sufficiently well for modelling: after all, we are interested in a simple
way to increase the uptake of cycling, not to *fit curves to the current rate of cycling*.

However, let's explore how much better the curve can fit our data,
which are admittedly noisy for the small groups. 

## Cubic polynomial with cube and square root terms

The model fit (in terms of adjusted R-squared) is improved slightly, from 0.89 to 0.91,
by adding a square-root term, a cube-root term, or both (0.914,
0.913 and 0.917 respectively). Interestingly,
the fit barely changes between the square-root, cube-root and combined
versions of the model, and the additional terms introduce some
unexpected inflection points (wiggles in the red line) in the fitted
curve (Fig. 5). This implies that a 6 parameter model is overkill
and unnecessarily complex. Can the problems of overfitting and overcomplexity
be resolved by using a different distance decay function?
That is the subject of the next section.

```{r, fig.cap="6 parameter cubic polynomial model with cube and square root terms", fig.height=3, fig.width=4}
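# m6: cubic polynomial plus a cube-root term, all estimated per segment
m6 <- lm(clc ~ dist * segment + I(dist^2) * segment + I(dist^3) * 
    segment + I(dist^0.333) * segment, data = ntss)
# m7: cubic polynomial plus a square-root term, all estimated per segment
m7 <- lm(clc ~ dist * segment + I(dist^2) * segment + I(dist^3) * 
    segment + I(dist^0.5) * segment, data = ntss)
# m8: both root terms; as written, the cube-root term is shared across segments
m8 <- lm(clc ~ dist * segment + I(dist^2) * segment + I(dist^3) * 
    segment + I(dist^0.5) * segment + I(dist^0.333) + segment, data = ntss) 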
p1 + geom_line(aes(y = m8$fitted.values), color = "red") 
# summary(m6) ; summary(m7); summary(m8)
length(coefficients(m6))
```

## Fitting to the log of pcycle

Because of the non-linearity evident in distance decay, fitting to the log
of the cycling level may be appropriate. A problem with this approach is that
it fails to capture the fact that the rate of cycling can peak with a distance
above 0. This could be problematic with respect to cycling's interaction with
walking but, as long as there is a sufficient number of short trips in the
"training" data, this should not be a serious problem. A potential solution
would be to cap the probability of cycling at a specific level. Using
the log-square root function in a logit model could also solve this issue.

```{r, echo=FALSE}
# TODO: fit to clc = log(d)
# pcycle = a * exp(-k * d)
```

$$
clc = \alpha e^{\beta d}
$$

$$
\log(clc) = \log(\alpha) + \beta d
$$

This fits reasonably well with the data...

$$
clc = \alpha e^{\beta d + \gamma  d^{1/2}}
$$

$$
\log(clc) = \log(\alpha) + \beta d + \gamma  d^{1/2}
$$
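
Taking logs turns the exponential form into a model that is linear in its parameters, so it can be fitted with `lm()`. The sketch below demonstrates this on simulated data for the two-parameter case: the 'true' values of $\alpha$ and $\beta$, the multiplicative noise and the object names are assumptions chosen for illustration.

```r
set.seed(2)
toy_d      <- seq(0.5, 20, by = 0.5)  # made-up distances
alpha_true <- 0.12                    # assumed value of alpha
beta_true  <- -0.15                   # assumed value of beta
# Simulate clc = alpha * exp(beta * d) with multiplicative noise
toy_clc <- alpha_true * exp(beta_true * toy_d) *
  exp(rnorm(length(toy_d), sd = 0.05))

# Fit the log-linear form: log(clc) = log(alpha) + beta * d
toy_fit   <- lm(log(toy_clc) ~ toy_d)
alpha_hat <- exp(coef(toy_fit)[["(Intercept)"]])  # back-transform the intercept
beta_hat  <- coef(toy_fit)[["toy_d"]]
c(alpha = alpha_hat, beta = beta_hat)
```

The intercept is estimated on the log scale, so it must be exponentiated to recover $\alpha$. The same applies to fitted values, which are on the log scale until passed through `exp()`.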


```{r, fig.cap="Log-linear approaches to distance decay with 2 and 3 parameter models (red and green lines respectively)."}
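ntss$clc[ntss$clc == 0] <- 0.0001 # replace zeros with a small value so that log(clc) is defined
m11 <- lm(log(clc) ~ dist * segment, data = ntss) # log-linear model
# log-linear-square-root model, matching the d^(1/2) term in the equation above
m12 <- lm(log(clc) ~ dist * segment + I(dist^0.5) * segment, data = ntss)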
p1 + geom_line(aes(y = exp(m11$fitted.values)), color = "red") +
  geom_line(aes(y = exp(m12$fitted.values)), color = "green") + ylim(c(0,0.15))
```

```{r, echo=FALSE}
# Testing what happens when the distance goes very high
xtest <- mutate(ntss, dist = dist^2)
predm12 <- predict(object = m12, newdata = xtest)

pred12 <- lm(log(clc) ~ dist * segment + I(dist^0.5) * segment, data = xtest)
p12 <- glm(log(clc) ~ dist * segment + I(dist^0.5) * segment, data = ntss)

p1 + geom_line(aes(y = exp(m11$fitted.values)), color = "red") +
  geom_line(aes(y = exp(predm12)), color = "green") + ylim(c(0,0.15))

p1 + geom_line(aes(y = exp(m11$fitted.values)), color = "red") +
  geom_line(aes(x = dist^2, y = exp(p12$fitted.values)), color = "green") + ylim(c(0,0.15))
```


As can be seen from the output of the log-linear-square-root model,
the parameters change in a predictable way in response to changes in the shape of the distance
decay curve:

```{r, include=FALSE}
summary(m12)
```


```{r, echo=FALSE, eval=FALSE}
summary(m11) # model fit - not shown
summary(m12) # model fit - not shown
summary(m13) # model fit - not shown

# m14 <- lm(log(clc) ~ dist + I(dist^2) * segment, data = ntss)
# summary(m14) # model fit - not shown
  # geom_line(aes(y = m13$m$fitted()), color = "blue") 
# Models not understood
# m12 <- lm(clc ~ -1 + dist / I(exp(dist)) * segment, data = ntss)
# m11 <- lm(clc ~ dist / I(exp(dist)) * segment, data = ntss)
# m12 <- lm(log(clc) ~ dist + I(exp(dist)) * segment, data = ntss)
# m13 <- lm(log(clc) ~ dist + I(log(dist)) * segment, data = ntss)
```

As shown in Fig. 7, the parameters of the 'log-linear-square-root' function are meaningful.
The intercept, log-lin and log-sqrt terms affect the intercept, short-distance
decay and longer-distance decay respectively. The function fits the census flow data quite well.

```{r, fig.cap="Impacts of changing the parameters of the 'log-linear-square-root' function. The green lines represent increases of 0.25 times the original values derived from the Leeds dataset and the red lines represent -0.1 times the original value.", echo=FALSE}
grid.raster(readPNG("figures/log-sqrt-params.png"))
```

## Exponential decay

A feature present in all relationships between frequency and distance
of trip is decay to zero as distance increases to infinity. This is especially
the case with active travel modes such as cycling. Therefore some kind of
exponential decay function may be suitable. This section fits various
functions that have an exponential decay term to the data to see which perform
well.

The simplest type of exponential decay model is one in which a linear
term (dominant over short distances) is combined with an exponential decay:

$$
clc = \frac{\beta_1 dist}{\exp(\beta_2 dist)} 
$$

where each $\beta$ term is estimated for each group, as with the twin parameter
linear model illustrated in Fig. 2. We can fit this to the data using
*non-linear regression* (e.g. by using `nls()` in R).

```{r}
# Load origin-destination commute data (the path is specific to the authors' setup)
od_all = readRDS("~/npct/pct-outputs-regional-R/commute/msoa/cambridgeshire/l.Rds")@data %>% as_data_frame()
od = select(od_all, id, all, bicycle, d = rf_dist_km) %>% 
  mutate(., pcycle = bicycle / all) # proportion of commuters who cycle

# Linear growth combined with exponential decay, plus an optional constant a0
dd_exp1 = function(x, a1, a2, a0 = 0) {
  px = a1 * x
  ex = exp(a2 * x)
  px / ex + a0
}
# Root mean square error between observed pcycle and the predictions of
# the model function `mod` evaluated with the parameter vector `par`
measure_fit <- function(par, mod, data) {
  pred <- do.call(mod, c(list(data$d), as.list(par)))
  sqrt(mean((data$pcycle - pred) ^ 2))
}
par = c(2, 1.5) # starting values for a1 and a2
fit = optim(par, fn = measure_fit, data = od, mod = dd_exp1)
plot(od$d, dd_exp1(od$d, fit$par[1], fit$par[2]))
```
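
As an alternative to minimising the error with `optim()`, the same functional form can be fitted with `nls()`, as mentioned above. The sketch below does this on simulated data rather than the commuting data: the 'true' parameter values, the noise level and the starting values are assumptions, and `nls()` may fail to converge on real data if the starting values are far from the solution.

```r
set.seed(3)
# Simulated data from clc = a1 * d / exp(a2 * d) with assumed parameters
sim <- data.frame(d = seq(0.5, 30, by = 0.5))
sim$pcycle <- 0.05 * sim$d / exp(0.3 * sim$d) + rnorm(nrow(sim), sd = 0.002)

# Non-linear least squares fit; starting values chosen by eye for this example
fit_nls <- nls(pcycle ~ a1 * d / exp(a2 * d), data = sim,
               start = list(a1 = 0.04, a2 = 0.25))
coef(fit_nls)

# The fitted curve peaks at d = 1 / a2
peak_d <- 1 / coef(fit_nls)[["a2"]]
peak_d
```

Unlike the polynomial and log-linear models, this form rises from zero at $d = 0$ to a peak before decaying, which may suit modes such as cycling that compete with walking over very short distances.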


# Characterising distance decay curves

From the preceding analysis the log-linear square root equation (*LSQR*) is the
preferred *dd* functional form, of the options tested so far. The subsequent
analysis therefore uses *LSQR* as the basis for exploring the key features of
*dd* curves, although the same analysis could be applied to alternative
functions.

# Applications

Using the functional forms for distance decay as an input to
predict binary outcomes is relatively simple.
"We can transform the output of a linear regression to be
suitable for probabilities by using a logit link
function" [@manning_logistic_2007]:

$$
logit(p) = \ln \left( \frac{p}{1 - p} \right)
$$

We can use this knowledge to fit the aforementioned models to raw NTS
data, with "Cycle trip" as a binary variable, for example using
`family = "binomial"` in R's `glm()` function.
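
A minimal sketch of this approach is given below, using simulated trips rather than NTS data. The linear and square-root distance terms follow the *LSQR* form, but the coefficient values used to simulate the data, the sample size and the variable names are all assumptions made for illustration.

```r
set.seed(4)
n_trips <- 5000
sim_trips <- data.frame(dist = runif(n_trips, 0.2, 20))  # made-up distances (km)

# Assumed 'true' relationship on the logit scale, with linear and sqrt terms
eta <- -3 + 1.2 * sqrt(sim_trips$dist) - 0.35 * sim_trips$dist
sim_trips$cycle <- rbinom(n_trips, size = 1, prob = plogis(eta))  # 1 = cycle trip

# Logistic regression: the logit link keeps predicted probabilities in (0, 1)
m_logit <- glm(cycle ~ dist + I(dist^0.5), family = "binomial", data = sim_trips)

# Predicted probability of cycling for trips of 1, 5 and 15 km
p_hat <- predict(m_logit, newdata = data.frame(dist = c(1, 5, 15)),
                 type = "response")
round(unname(p_hat), 3)
```

Because the model is fitted to individual binary outcomes, no zero-replacement step is needed before taking logs, and the predictions cannot fall outside the range of valid probabilities.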

# Discussion and conclusions


# References